

Benchmarking (in general)

I just wanted to follow up on Davey's post about extending PDO, because I think he isn't being very clear about what he's doing (no offence, Davey!)

Davey's Cortex framework allows you to pass in either a DSN or an already-instantiated-PDO object to its data object class, and Davey's post claims, quite rightly, that it is faster to not create brand new connections each time you create an instance of his framework objects.

Let's see if we can come up with a slightly more scientific and perhaps more fair set of tests.

When benchmarking it's important to get some decent numbers. If your test is over in less than 1 second, your readings are probably wildly inaccurate because your system may not have had a chance to properly utilize hardware or OS level caches or otherwise adjust in the same way that it would be doing under a consistent level of load.

If you're running a quick test, try to make it last more than 10 seconds. If you want better numbers, run the test for longer; 5 or 10 minutes should give pretty decent results for a given code fragment, and an hour is probably as good as you could ever hope for.

Here's a simple test harness that runs for approximately 5 minutes:

    $start = time();
    $deadline = $start + (5 * 60); // run for 5 minutes
    $iters = 0;
    do {
        // ... perform the task being measured here ...
        $iters++;
    } while (time() <= $deadline);
    $end = time();
    $diff = $end - $start;
    printf("Ran %.2f iterations per minute (%d/%d)\n",
        (60.0 * $iters) / $diff, $iters, $diff);

This harness simply repeats a task until the time limit is more or less up, and then summarizes how many times it managed to run within the specified time, normalizing it to iterations per minute.

Notice that I'm not particularly bothered about sub-second time intervals here, because they don't really have much impact when compared to a 5 minute duration--5 minutes plus or minus half a second is still near as damn it 5 minutes.

Our first test creates some kind of object that does some kind of work on a database connection. We'll make this one re-connect to the database each time; this is equivalent to Davey's extending PDO case:

   // represents some object in your framework
   class TestObject {
      var $db;
      function __construct($db) {
         $this->db = $db;
      }
      function doWork() {
         # Limited to 100 rows, because the connection cost
         # will be lost in the noise of the fetch otherwise
         array_reverse($this->db->query("select * from words LIMIT 100")->fetchAll());
      }
   }

   function something() {
       global $dsn, $user, $pass;
       $db = new PDO($dsn, $user, $pass);
       $obj = new TestObject($db);
       $obj->doWork();
   }

The next test uses the same test object class, but caches the PDO instance. This is equivalent to Davey's call proxying case:

   function something() {
       global $dsn, $user, $pass;
       static $db = null;
       if ($db === null) $db = new PDO($dsn, $user, $pass);
       $obj = new TestObject($db);
       $obj->doWork();
   }

The third test uses persistent connections; this is equivalent to Davey's extending PDO case, but "smarter"; even though $db falls out of scope and is destroyed at the end of each call to the something() function, the underlying connection is cached so that subsequent calls don't need to re-connect. This is transparent to the calling script, except for the extra parameter to the constructor, and is generally a very good thing to do with database connections:

   function something() {
       global $dsn, $user, $pass;
       $db = new PDO($dsn, $user, $pass, array(PDO::ATTR_PERSISTENT => true));
       $obj = new TestObject($db);
       $obj->doWork();
   }

Here are the results I got; since I'm lazy I'm only running mine for about 30 seconds each. I used a sqlite database with the contents of /usr/share/dict/words inserted into it (234937 words).

   one:   Ran 46734.19 iterations per minute (24146/31)
   two:   Ran 68504.52 iterations per minute (35394/31)
   three: Ran 64689.68 iterations per minute (33423/31)

The results speak for themselves; initiating a connection every time you want to do some work is the slowest. Caching the connection in a PHP variable is faster than making persistent connections with PDO, because it doesn't need to create a new object each time. Persistent connections are "almost" as fast as caching in PHP variables; they need to create a new object but still reference the same connection internally.

It's worth mentioning that benchmarks are tricky things. For instance, if you take out the "LIMIT 100" clause from the SELECT statement, the connection overhead becomes so small in comparison to the time it takes to fetch the data that all the tests wind up the same (about 18 iterations per minute with my test data). Similarly, if you limit the fetch to 1 row, you'll see a more exaggerated difference in the numbers, because the benchmark script is exercising your system differently.

If you're running against mysql, the differences between test one and test two will be greater because there is more overhead in establishing a connection over a socket than there is for sqlite to open up a file or two. You'll see a bigger difference again when connecting to Oracle, because it does a pretty hefty amount of work at connection time.

The main lesson to be learned here is that benchmarking an artificial code fragment will give you artificial results; they can help you gauge how fast something will run in general, but the answers you get back depend on the questions you're asking. If you don't ask appropriate questions, or enough of them (e.g. Davey's quick tests didn't include persistent connections), you're not going to get all the information you need to tune your application effectively.

PS: There's a fourth test case that I didn't cover; it's important and probably yields the best results out of all the cases presented here. Anyone care to suggest what that case might be?

moderated blogs, a.k.a censorship

I'm very disappointed to see that my (perfectly reasonable) comments have not been "accepted" on a certain blog where a technical discussion has been taking place.

At first I thought it was just because the blog owner was offline for the night, but now I see that someone else's comments, posted after mine, have appeared.

I'd taken a low-key approach to this "blog conversation" (by not responding to each comment with an entry on my blog) because I don't particularly like to "do my laundry in public".

The sad fact is that I won't trust moderated blogs to be fair or honest again in the future.

Release Engineering

One of the things I spend a lot of time doing is release engineering. If you're not familiar with the term, it's the process of taking some software product and packaging it. Piece of cake? Far from it.

The packaging has to be perfect. It needs to handle all kinds of quirky problems on the different flavours of systems that you're planning to install on. You need to expect the unexpected (you have no idea exactly what people have done to their systems), handle all errors gracefully, preserve everything that might be important, and at the end of the day, it just needs to work.

I've spent many hours building and rebuilding packages on different combinations of operating system flavours and versions, watching the build output scrolling by for several minutes just to verify the smallest change in the packaging configuration, rolling back virtual machines, installing, uninstalling, reinstalling ad infinitum. Tedious but essential. And it's not easy.

So, why have I been going on about all this? The recent (or not so recent) changes in the PEAR infrastructure have introduced a new package2.xml file to describe packages. For reasons of backwards compatibility, the older package.xml file still needs to be maintained. This instantly makes the release engineering section of my brain nervous... it's going to be all too easy to forget to update both of these files when it's time to ship the software, especially when shipping time means releasing 8 packages at once (just taking pdo as an example). That means that I'll need to edit 16 package files and 8 C source files when I want to push a release. I have a hard enough time editing the 8 source files and 8 package files I already have right now, so I'm not looking forward to this--it's going to be unmaintainable.

It would seem pretty reasonable to expect to be able to automate this process a little; why can't the package.xml file be generated from the package2.xml file, or vice versa? I've yet to hear a good answer on this, but I've been told that we can't. I don't buy it, which is one of the reasons why I'm blogging this.

One suggestion is to drop support for package.xml and only go with package2.xml. If someone has an older version of PEAR that "works for them(tm)" and they use it to download a pecl package today (and it works), they'd expect for it to work if that package had a minor bug fix next week. This would not be the case if switch-to-package2.xml day happened in the middle of that week.

For it to suddenly stop working in this situation is wrong because their environment didn't change. It's our breakage, and even if it is intentional, we didn't give them any notice that we're doing this.

There is no good reason why we can't provide an auto-generated package.xml file, especially if the package2.xml describes the same package as the package.xml file.
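To illustrate how little would be involved, here's a rough sketch of such a generator. The element names below are illustrative only--the real package.xml and package2.xml schemas differ in more ways than this--but it shows that a mechanical translation of the shared fields is perfectly feasible:

```php
<?php
// Hypothetical sketch: copy the fields common to both formats from
// package2.xml into a minimal package.xml skeleton. Element names
// are illustrative, not the real PEAR schemas.
$v2 = simplexml_load_file('package2.xml');

$v1 = new SimpleXMLElement('<package version="1.0"/>');
$v1->addChild('name', (string) $v2->name);
$v1->addChild('summary', (string) $v2->summary);
$v1->addChild('description', (string) $v2->description);

$release = $v1->addChild('release');
$release->addChild('version', (string) $v2->version->release);
$release->addChild('state', (string) $v2->stability->release);
$release->addChild('notes', (string) $v2->notes);

file_put_contents('package.xml', $v1->asXML());
```

The fiddly part is mapping the file lists and dependencies between the two formats, but that's exactly the kind of tedious, error-prone work that should be done by a tool rather than by hand.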

Why can't we make things easier instead of harder?

(and that means that the automatic conversion needs to be part of the "pear package" tool, or that the tool provides clear, simple instructions on how to ensure that any pre-requisites are met).

Running PHP as a Service on Win32

[Update: I wrote some docs for the php manual]

So, you've written some kind of super-duper daemon process in PHP, perhaps using the event extension and stream_socket_server(). On Unix, it's quite a simple matter to have it run from init (or maybe inetd) when your machine starts... but doing the same on windows isn't possible without some hacks. Until now.

Last night I put together 2 new extensions for windows; the first of these is called win32service and it allows you to run your PHP scripts from the "Service Control Manager" (SCM). The SCM is roughly analogous to the init process on unix, in that it runs tasks on startup and monitors their status, optionally restarting them if something goes awry.

I've included a sample service script that demonstrates minimal usage. Before you can run a script as a service, you need to register it with the SCM; in the sample, you do this by running it with the "install" argument on the command line. Once installed, you can use either the services MMC snap-in (run services.msc, or look for it under "Administrative Tools") or the good old fashioned "net" command to launch or stop the service. I prefer the latter:

   net start dummyphp

The output from the command should indicate the service started correctly; use the task manager to verify this--you should see a php.exe process running as SYSTEM. This dummy service does nothing much; it just sleeps and waits for the SCM to tell it to stop; let's do that now:

   net stop dummyphp

Again, the output from that command should indicate that the service stopped, and your task manager should no longer show php.exe running as SYSTEM. Now that we've proved that it works, we should remove the service from the SCM; running the script with the "uninstall" argument will do this.

It's all pretty straight-forward; the most complicated part is the win32_create_service() function; the first argument is an array that describes the service; the following keys are supported:

  • service - the short name of the service. You can't have two services with the same name.
  • display - the display name of the service.
  • user - the account name under which to run the service. If omitted, runs as the local system account
  • password - the password to match the "user" setting.
  • path - full path to the binary to run. If omitted, the full path to the currently running process will be used (typically php.exe)
  • params - command line parameters to pass to the binary. You'll probably want to specify the full path to a PHP script, plus some parameter to indicate that the script should run as a service.

(there are some more keys but they're not fully supported yet)
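As a sketch, the "install" step might look something like the following. The service name, display name and script path here are my own assumptions for illustration--adjust them for your setup, and check the extension's documentation for the exact return convention:

```php
<?php
// Hypothetical install step for a service called "dummyphp";
// requires the win32service extension on Windows.
$rc = win32_create_service(array(
    'service' => 'dummyphp',                    // unique short name
    'display' => 'Sample PHP Service',          // name shown in services.msc
    'params'  => 'c:\\services\\dummy.php run', // script plus a "run as service" flag
    // 'path' omitted: defaults to the currently running php.exe
));
// inspect $rc to confirm the service was registered successfully
var_dump($rc);
```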

When it comes to actually running your service, you should call win32_start_service_ctrl_dispatcher() and pass in the name of the service. This function checks in with the SCM; it is especially important that you do this as soon as possible in your script, as the SCM blocks while it waits for you to check in--you can cause system-wide issues if you take too long.

While your service is running, you should periodically check to see if the SCM has requested that you stop. One way to do this is to wrap the main body of your service code in a while loop like this:

   while (WIN32_SERVICE_CONTROL_STOP != win32_get_last_control_message()) {
      // do stuff here, but try not to take more than a few seconds
   }

If you already have an event loop, you can fold the above into your application. If you're using the event extension, you can schedule a recurrent timed event to check for the stop condition.
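Putting the pieces together, the body of a minimal service might be sketched like this (using the sample's "dummyphp" service name; error handling omitted for brevity):

```php
<?php
// Check in with the SCM as early as possible, then poll for a stop
// request in the main loop.
win32_start_service_ctrl_dispatcher('dummyphp');

while (WIN32_SERVICE_CONTROL_STOP != win32_get_last_control_message()) {
    // do one unit of work, keeping each pass short so that a stop
    // request from the SCM is noticed promptly
    sleep(1);
}
```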

And that's pretty much all there is to say for now. I strongly recommend that you look through the MSDN documentation on services; it's very valuable background information.

The binaries for PHP 5 should show up in the next couple of hours.

Enjoy :)

PDO Slides from php|works 2005

You can find the slides from my PDO talk here.

While I'm here talking about PDO, I wanted to give you a heads up. PDO 1.0RC2 will move from PDO_XXX style constants to using class based constants like PDO::XXX. The reason for this change is that it will fit in with namespace support in PHP when it is implemented.
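Concretely, code written against the RC1-style constants will need a mechanical rename; for example, setting PDO's error-mode attribute:

```php
<?php
// Before (1.0RC1 style): pseudo-namespaced global constants
// $db->setAttribute(PDO_ATTR_ERRMODE, PDO_ERRMODE_EXCEPTION);

// After (1.0RC2 style): class constants on the PDO class
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
```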

Yes, this should have been changed before 1.0RC1, but we only just talked about it over the last couple of days. Sorry.

This change has not yet been made, I'm just letting you know in advance; we'll be pushing out a release with this change over the next week, after making sure we're not going to make another change like this.

PDO 1.0RC1

I spent pretty much all day working through the outstanding PDO bugs. There are now no known problems with any of the mainstream drivers, so I've bumped the version numbers up to 1.0RC1 and released them via PECL (ODBC and OCI package releases are pending build/install tests).

Please make the effort to download and use them so that we can catch any remaining bugs; we're getting very close to PHP 5.1 stable, and I really want you to catch any problems with the way that you use it before the release date, rather than the day after.

I've also been fleshing out the PDO docs in the PHP online manual; those changes will be picked up when it is next built and pushed to the mirrors.


Updated oci8 extension now in PECL

You may have heard of the Zend Core for Oracle; as part of that project the oci8 extension for PHP received a lot of attention from teams of people from Zend, Oracle and OmniTI. As a result, the oci8 extension is now more robust, more performant, better documented and more thoroughly tested than ever before (the list of closed bugs is enormous).

Although the extension is currently marked as beta, it's a significant improvement over the older versions that have shipped with PHP.

The updated code is available now via PECL, and will compile against PHP 4 and PHP 5 (Windows users can obtain the updated extension DLL via snaps).

Installation on unix should be as simple as:

   # pear install oci8-beta

However, if you compiled the oci8 extension statically, you will need to recompile PHP using --with-oci8=shared (or --without-oci8) before this will work.
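For example, the rebuild might look something like this; the configure line is only a sketch--add back whatever other options your existing build uses:

```shell
# re-run configure with oci8 built as a shared module
# (plus your usual configure options)
./configure --with-oci8=shared
make
make install

# then install the updated extension from PECL
pear install oci8-beta
```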

Updated documentation will be appearing at your local PHP documentation mirror as soon as the manual build finishes.

Many thanks to Antony Dovgal for doing the lion's share of the implementation and testing; good work Tony!

php dtrace 1.0.3

I added two new probe parameters to the dtrace module; they don't do anything useful with PHP 4, but if you're running PHP 5, they add the classname (arg3) and a scoping operator (arg4) (either "::", "->" or "") that are filled in when making a call to an object or a static call to a class.

This means that you can use the concatenation of arg3, arg4 and arg0 (in that order) to form the full name of the method or function being called.

I can probably fill in those things for PHP 4, but I didn't have the time to do that yet; I pushed this release because I'm likely to get too busy to do it otherwise :)

I also refined the build process very slightly; if you're running on 64-bit, you should have better luck.

One of these days (maybe at php|works) I'll write up some more docs on using dtrace. If you're on solaris and want to try it, let me know and I'll give you some pointers. If you're not on solaris, you can't use dtrace :)