Re: [PHP] A no brainer...



# edlazor@xxxxxxxxx / 2006-10-16 16:40:34 -0700:

On Oct 16, 2006, at 6:20 PM, Roman Neuhauser wrote:
Modern filesystems cope well with large directories (plus it's
quite trivial to derive a directory hierarchy from the
filenames). Looking at the numbers produced by timing various
operations in a directory with exactly 100,000 files on software
RAID 1 (2 SATA disks) on my desktop, I'd say this concern is
completely baseless.
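To illustrate deriving a directory hierarchy from filenames, here is a minimal Python sketch. It assumes session files are named `sess_<id>` and that you want one directory level per leading character of the ID. This is similar in spirit to PHP's `N;/path` form of `session.save_path`, but `session_path` and its `depth` parameter are hypothetical names, not PHP's code.

```python
import os


def session_path(base_dir: str, session_id: str, depth: int = 2) -> str:
    """Map a session ID to a nested file path (hypothetical helper).

    Each of the first `depth` characters of the ID becomes one directory
    level, so e.g. "abc123" with depth=2 lands in <base_dir>/a/b/.
    """
    if not 0 <= depth <= len(session_id):
        raise ValueError("depth must be between 0 and len(session_id)")
    levels = list(session_id[:depth])  # one directory per leading character
    return os.path.join(base_dir, *levels, "sess_" + session_id)


if __name__ == "__main__":
    print(session_path("/tmp/sessions", "abc123"))  # on POSIX: /tmp/sessions/a/b/sess_abc123
```

With depth 2 and hexadecimal IDs, sessions spread over 16 * 16 = 256 leaf directories, so no single directory has to hold all of them.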

I knew that you could get PHP to use a directory structure for the
session data files, but hearing that you can have 100k files in a
single directory and not run into performance issues or problems is
news to me. Which OS are you running?

FreeBSD. What do your tests show, on what OS/version/FS?

It still uses files, but hopefully you don't hit them very often,
especially when you're dealing with the same table records.

An RDBMS is basically required to hit the disk with the data on
commit. One of the defining features of an RDBMS, Durability, says
that once you commit, the data is there no matter what. Even if the
host OS crashes right after the commit has been acknowledged, the
data must survive.
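As a rough file-level analogy for what "hit the disk on commit" costs, here is a Python sketch of a durable write on a POSIX system. It writes to a temporary file, `fsync`s it, atomically renames it into place, then `fsync`s the directory. `durable_write` is a hypothetical helper. Real databases typically get durability from a write-ahead/redo log rather than from this exact sequence.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so the new contents survive an OS crash
    once this returns (hypothetical helper; POSIX semantics assumed)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()  # push Python's buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to push it to the disk
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # The rename is recorded in the directory, so sync the directory too.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Each `fsync` typically waits on the physical disk. That wait is what a session table updated on every request would pay on every commit.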

You can turn on query caching in MySQL, but this will give you
*nothing* for purposes of session storage.

Unless session storage is used to save time in retrieving data,
right? I'm seeing your point on the writing, but what about reading?

I think it would be kind of fun to run some actual tests.

Check out the query cache in the MySQL 5.0 manual, it clearly
says that any modification of a table (INSERT, UPDATE, ALTER
TABLE...) will invalidate all cache entries that use that table.
IOW, any request from any visitor that starts or updates a session
invalidates the query cache entries for all sessions.

The maximum number of cache hits for any single cache entry depends
on the number of requests a visitor can produce in sequence without
updating the session table, the number of concurrent visitors, the
request frequency...

You're likely to top out at 1 cache hit for any entry, and all the
others will be purged with 0 cache hits.
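Here is a toy Python model of the invalidation behaviour described above. It assumes a simplified cache keyed by exact query text, where any write to a table purges every entry that reads it. `TableQueryCache` is hypothetical and is not MySQL's implementation. In the demo, three visitors take turns, and each request reads and then updates its own session row.

```python
from collections import defaultdict


class TableQueryCache:
    """Toy table-level query cache (hypothetical, not MySQL's code).

    Results are keyed by exact query text; any write to a table purges
    every cached result that reads from that table.
    """

    def __init__(self) -> None:
        self.entries: dict[str, str] = {}  # query text -> table it reads
        self.by_table: defaultdict[str, set[str]] = defaultdict(set)
        self.hits: defaultdict[str, int] = defaultdict(int)  # live entries
        self.purged_hits: list[int] = []  # final hit count of each purged entry

    def read(self, query: str, table: str) -> bool:
        """Return True on a cache hit; on a miss, cache the result."""
        if query in self.entries:
            self.hits[query] += 1
            return True
        self.entries[query] = table
        self.by_table[table].add(query)
        return False

    def write(self, table: str) -> None:
        """An INSERT/UPDATE on `table` invalidates all entries using it."""
        for query in self.by_table.pop(table, set()):
            del self.entries[query]
            self.purged_hits.append(self.hits.pop(query, 0))


if __name__ == "__main__":
    cache = TableQueryCache()
    for request in range(9):
        visitor = request % 3  # three visitors taking turns
        query = f"SELECT data FROM sessions WHERE id = 'v{visitor}'"
        cache.read(query, "sessions")  # load the session
        cache.write("sessions")  # save it back at the end of the request
    print(cache.purged_hits)  # [0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Every entry is purged with 0 hits. An entry only scores a hit if the same read is repeated before the next write anywhere on the table.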

Also, having raw data is always faster than having to process it
before you can use it.

I don't know what that means.

Bytes in files on disk are as raw as it gets: you get one round
trip, process -> kernel -> process. Compare that with the
communication protocol MySQL (or any other DB) uses, where data is
marshalled by the client and unmarshalled by the server, plus the
overhead of the database process(es) taking part in the write...

If you pull a record from the db, you can access the data directly.
Or you can query the db, get the serialized data, deserialize it,
and then access the data.

That's not really filesystem vs. database, that's "to serialize or
not to serialize".
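Here is a small Python sketch of the "to serialize or not to serialize" point. It uses JSON as the serializer and SQLite (via the standard `sqlite3` module) as a stand-in for MySQL; both are assumptions for illustration. PHP's own session serializer would behave the same way for this purpose. An opaque blob needs deserializing whether it comes from a file or from a DB column. Only data stored in ordinary columns comes back ready to use.

```python
import json
import os
import sqlite3
import tempfile

session = {"user_id": 42, "cart": [101, 202]}

# An opaque serialized blob: the same bytes whichever backend stores it.
blob = json.dumps(session).encode("utf-8")

# Backend 1: a plain file. Reading gives bytes that still need decoding.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sess_abc")
    with open(path, "wb") as f:
        f.write(blob)
    with open(path, "rb") as f:
        from_file = json.loads(f.read())

# Backend 2: the same blob in a DB column. It still needs decoding.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sess_blob (id TEXT PRIMARY KEY, data BLOB)")
db.execute("INSERT INTO sess_blob VALUES (?, ?)", ("abc", blob))
(raw,) = db.execute("SELECT data FROM sess_blob WHERE id = ?", ("abc",)).fetchone()
from_db_blob = json.loads(raw)

# Backend 3: ordinary typed columns. The value comes back ready to use,
# at the cost of a schema (only user_id is stored here, for brevity).
db.execute("CREATE TABLE sess_cols (id TEXT PRIMARY KEY, user_id INTEGER)")
db.execute("INSERT INTO sess_cols VALUES (?, ?)", ("abc", session["user_id"]))
(user_id,) = db.execute("SELECT user_id FROM sess_cols WHERE id = ?", ("abc",)).fetchone()
db.close()
```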

I tested this previously and found the database to be faster.
The references I gave supported this and listed additional benefits.

The article from Chris Shiflett contains zero quantifications of the
purported performance benefits.

Things change, though, especially with technology. It seems like we
should be able to test this pretty easily. I actually think it would
be fun to do as well. Do you have a box we can test this on?
Meanwhile, I'll check one of my boxes to see if I can use it. If
nothing else, it'll be interesting to see whether two systems report
the same results.

Yes, I can provide a testbed; just post a proposed testing
methodology.
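One possible starting point for such a methodology, sketched in Python: time N write-then-read cycles of a session-sized payload against plain files and against a database. SQLite stands in for MySQL here only to keep the sketch self-contained, so its numbers would not transfer directly. The function names, N = 200 and the 512-byte payload are arbitrary assumptions.

```python
import os
import sqlite3
import tempfile
import time


def bench_files(directory: str, n: int, payload: bytes, fsync: bool) -> float:
    """Time n write-then-read cycles using one file per session."""
    start = time.perf_counter()
    for i in range(n):
        path = os.path.join(directory, f"sess_{i:08d}")
        with open(path, "wb") as f:
            f.write(payload)
            if fsync:
                f.flush()
                os.fsync(f.fileno())  # match a DB commit's durability
        with open(path, "rb") as f:
            if f.read() != payload:
                raise RuntimeError(f"read-back mismatch for {path}")
    return time.perf_counter() - start


def bench_sqlite(db_path: str, n: int, payload: bytes) -> float:
    """Time n write-then-read cycles, one committed transaction per write."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, data BLOB)"
        )
        start = time.perf_counter()
        for i in range(n):
            sid = f"sess_{i:08d}"
            with conn:  # commits on success, like one request saving its session
                conn.execute(
                    "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
                    (sid, payload),
                )
            row = conn.execute(
                "SELECT data FROM sessions WHERE id = ?", (sid,)
            ).fetchone()
            if row is None or row[0] != payload:
                raise RuntimeError(f"read-back mismatch for {sid}")
        return time.perf_counter() - start
    finally:
        conn.close()


if __name__ == "__main__":
    n, payload = 200, os.urandom(512)
    for label, fsync in (("files, no fsync", False), ("files, fsync", True)):
        with tempfile.TemporaryDirectory() as d:
            print(f"{label:16}: {bench_files(d, n, payload, fsync):.3f}s")
    with tempfile.TemporaryDirectory() as d:
        elapsed = bench_sqlite(os.path.join(d, "sessions.db"), n, payload)
        print(f"{'sqlite':16}: {elapsed:.3f}s")
```

The `fsync` flag matters. SQLite syncs on commit by default, so comparing it only against un-synced file writes would mix the durability question with the storage question. Running the same script on two boxes would also show whether the two systems agree.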

--
How many Vietnam vets does it take to screw in a light bulb?
You don't know, man. You don't KNOW.
Cause you weren't THERE. http://bash.org/?255991