
in SF for the Zend/PHP Conference 2005

It's been a long, busy weekend for me; we had our whole house carpeted on Saturday, which meant moving all our furniture around. I also built a new bed for Xander; I'm sore from the physical effort and mentally tired from not having any time to unwind. I was up at 4am this morning in preparation for my flight to SF at 7am, and had the "pleasure" of the never-ending morning that goes hand in hand with flying west. All that combined with the customary airplane headache has me feeling pretty beat up.

I'm looking forward to the conference; meeting the usual crowd is always good... and this time around we're "business focused", which should put a bit of a different spin on things... hopefully for the better :-)

PDO Driver Howto online

Thanks to Bill Abt and Rick McGuire of IBM, preliminary documentation for writing a PDO driver is now part of the PHP manual.

It was originally written against an earlier incarnation of PDO, so some parts may not be 100% correct. If you're thinking of adding support for other databases to PHP, then you should read the howto and give it a whirl. If you find an inaccuracy, drop a line to pecl-dev@lists.php.net and we'll set you straight and fix the docs.

Recommendations for XSL-FO -> PDF?

[Update: thanks for the suggestions; we went with RenderX and have it running from cron to rebuild our product manual as it changes]

I've been playing around with DocBook this weekend, converting our product manual to HTML and PDF. I'm using the DocBook XSL stylesheets to convert to HTML and XSL-FO, and then using an FO processor to convert from XSL-FO to PDF.
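
Incidentally, the XSLT half of this doesn't strictly need an external tool; PHP's xsl extension drives libxslt, the same engine as the usual xsltproc. Here's a minimal sketch--the stylesheet path is a placeholder for wherever your system keeps the DocBook XSL stylesheets:

<?php
   // DocBook -> XSL-FO using ext/xsl; adjust the stylesheet path to
   // match where the DocBook XSL stylesheets live on your system
   $xsl = new DOMDocument();
   $xsl->load('/usr/share/xml/docbook/xsl-stylesheets/fo/docbook.xsl');
   $doc = new DOMDocument();
   $doc->load('manual.xml');
   $proc = new XSLTProcessor();
   $proc->importStylesheet($xsl);
   file_put_contents('manual.fo', $proc->transformToXML($doc));
   ?>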

Apache FOP is a free FO processor, but the version that Gentoo emerged for me borks on our manual; it either stops generating pages after page 36, or spins in an infinite loop.

I also tried XMLroff, which is the only C-based FO processor I've found (it's built on libxml2). It segfaults straight away for me, so it's not immediately useful; maybe a future release will work.

I've downloaded trial versions of two commercial offerings; RenderX XEP and Lunasil XINC.

RenderX seems to work ok, but blanks out every odd page after page 11, so it's a little bit hard to figure out if we want to pay $300 for the full version. It does look promising though, and the price doesn't sound that bad.

Lunasil XINC appears to be based on an older version of Apache FOP, and doesn't have support for PDF bookmarks. It works though, which is more than can be said for the real Apache FOP that I tried. Lunasil XINC is only $95 for the full version.

Does anyone else have any experience in this area and care to share it? Has anyone dared to implement XSL-FO -> PDF using PHP?

Benchmarking (in general)

I just wanted to follow up on Davey's post about extending PDO, because I think he isn't being very clear about what he's doing (no offence, Davey!)

Davey's Cortex framework allows you to pass in either a DSN or an already-instantiated-PDO object to its data object class, and Davey's post claims, quite rightly, that it is faster to not create brand new connections each time you create an instance of his framework objects.

Let's see if we can come up with a slightly more scientific and perhaps more fair set of tests.

When benchmarking it's important to get some decent numbers. If your test is over in less than 1 second, your readings are probably wildly inaccurate, because your system may not have had a chance to properly utilize hardware or OS level caches, or otherwise adjust in the way that it would under a consistent level of load.

If you're running a quick test, try to make it last more than 10 seconds. If you want better numbers, run the test for longer; 5 or 10 minutes should give pretty decent results for a given code fragment, and an hour is probably as good as you could ever hope for.

Here's a simple test harness that runs for approximately 5 minutes:

<?php
    $start = time();
    $deadline = $start + (5 * 60); // run for 5 minutes
    $iters = 0;
    do {
      something(); // the code fragment under test; defined per-test below
      ++$iters;
    } while (time() <= $deadline);
    $end = time();
    $diff = $end - $start;
    printf("Ran %.2f iterations per minute (%d/%d)\n",
        (60.0 * $iters) / $diff, $iters, $diff);
    ?>

This harness simply repeats a task until the time limit is more or less up, and then summarizes how many times it managed to run within the specified time, normalizing it to iterations per minute.

Notice that I'm not particularly bothered about sub-second time intervals here, because they don't really have much impact when compared to a 5 minute duration--5 minutes plus or minus half a second is still near as damn it 5 minutes.

Our first test creates some kind of object that does some kind of work on a database connection. We'll make this one re-connect to the database each time; this is equivalent to Davey's extending PDO case:

<?php
   // represents some object in your framework
   class TestObject {
      public $db;
      function __construct($db) {
         $this->db = $db;
      }
      function doWork() {
         // Limited to 100 rows, because the connection cost
         // will be lost in the noise of the fetch otherwise
         array_reverse($this->db->query("select * from words LIMIT 100")->fetchAll());
      }
   }
   function something() {
       // $dsn, $user and $pass hold your connection settings
       global $dsn, $user, $pass;
       $db = new PDO($dsn, $user, $pass);
       $obj = new TestObject($db);
       $obj->doWork();
   }
   ?>

The next test uses the same test object class, but caches the PDO instance. This is equivalent to Davey's call proxying case:

<?php
   function something() {
       global $dsn, $user, $pass;
       static $db = null;
       if ($db === null) $db = new PDO($dsn, $user, $pass);
       $obj = new TestObject($db);
       $obj->doWork();
   }
   ?>

The third test uses persistent connections; this is equivalent to Davey's extending PDO case, but "smarter"; even though $db falls out of scope and is destroyed at the end of each call to the something() function, the underlying connection is cached so that subsequent calls don't need to re-connect. This is transparent to the calling script, except for the extra parameter to the constructor, and is generally a very good thing to do with database connections:

<?php
   function something() {
       global $dsn, $user, $pass;
       $db = new PDO($dsn, $user, $pass, array(PDO::ATTR_PERSISTENT => true));
       $obj = new TestObject($db);
       $obj->doWork();
   }
   ?>

Here are the results I got; since I'm lazy I'm only running mine for about 30 seconds each. I used a sqlite database with the contents of /usr/share/dict/words inserted into it (234937 words); there's a sketch of how to build such a database just after the numbers.

   one:   Ran 46734.19 iterations per minute (24146/31)
   two:   Ran 68504.52 iterations per minute (35394/31)
   three: Ran 64689.68 iterations per minute (33423/31)
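
If you want to reproduce the test data, something along these lines will do it. It's a sketch, and the single-column schema for the words table is my assumption--the queries above only need a select on it to return rows:

<?php
   // build the sqlite test database; the one-column schema is an
   // assumption, but it's all the benchmark queries require
   $db = new PDO('sqlite:words.db');
   $db->exec('CREATE TABLE words (word TEXT)');
   $db->beginTransaction(); // one big transaction: much faster bulk insert
   $stmt = $db->prepare('INSERT INTO words VALUES (?)');
   foreach (file('/usr/share/dict/words') as $word) {
       $stmt->execute(array(trim($word)));
   }
   $db->commit();
   ?>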

The results speak for themselves; if you're initiating connections every time you want to do some work, it's the slowest. If you cache the connection in a PHP variable it's faster than making persistent connections to PDO, because it doesn't need to create a new object each time. Persistent connections are "almost" as fast as caching in PHP variables; they need to create a new object but still reference the same connection internally.

It's worth mentioning that benchmarks are tricky things. For instance, if you take out the "LIMIT 100" clause from the SELECT statement, the connection overhead becomes so small in comparison to the time it takes to fetch the data that all the tests wind up the same (about 18 iterations per minute with my test data). Similarly, if you limit the fetch to 1 row, you'll see a more exaggerated difference in the numbers, because the benchmark script is exercising your system differently.

If you're running against mysql, the difference between test one and test two will be greater, because there is more overhead in establishing a connection over a socket than there is for sqlite to open up a file or two. You'll see a bigger difference again when connecting to Oracle, because it does a pretty hefty amount of work at connection time.

The main lesson to be learned here is that benchmarking an artificial code fragment will give you artificial results; they can help you gauge how fast something will run in general, but the answers you get back depend on the questions you're asking. If you don't ask appropriate or even enough questions (e.g. Davey's quick tests didn't include persistent connections), you're not going to get all the information you need to tune your application effectively.

PS: There's a fourth test case that I didn't cover; it's important and probably yields the best results out of all the cases presented here. Anyone care to suggest what that case might be?

moderated blogs, a.k.a. censorship

I'm very disappointed to see that my (perfectly reasonable) comments have not been "accepted" on a certain blog where a technical discussion has been taking place.

At first I thought it was just because the blog owner was offline for the night, but now I see that someone else's comments, posted after mine, have appeared.

I'd taken a low-key approach to this "blog conversation" (by not responding to each comment with an entry on my blog) because I don't particularly like to "do my laundry in public".

The sad fact is that I won't trust moderated blogs to be fair or honest again in the future.

Release Engineering

One of the things I spend a lot of time doing is release engineering. If you're not familiar with the term, it's the process of taking some software product and packaging it. Piece of cake? Far from it.

The packaging has to be perfect. It needs to handle all kinds of quirky problems on the different flavours of systems that you're planning to install on. You need to expect the unexpected (you have no idea exactly what people have done to their systems), handle all errors gracefully, preserve everything that might be important, and at the end of the day, it just needs to work.

I've spent many hours building and rebuilding packages on different combinations of operating system flavours and versions, watching the build output scrolling by for several minutes just to verify the smallest change in the packaging configuration, rolling back virtual machines, installing, uninstalling, reinstalling ad infinitum. Tedious but essential. And it's not easy.

So, why have I been going on about all this? The recent (or not so recent) changes in the PEAR infrastructure have introduced a new package2.xml file to describe packages. For reasons of backwards compatibility, the older package.xml file still needs to be maintained. This instantly makes the release engineering section of my brain nervous... it's going to be all too easy to forget to update both of these files when it's time to ship the software, especially when shipping time means releasing 8 packages at once (just taking pdo as an example). That means that I'll need to edit 16 package files and 8 C source files when I want to push a release. I have a hard enough time editing the 8 source files and 8 package files I already have right now, so I'm not looking forward to this--it's going to be unmaintainable.

It would seem pretty reasonable to expect to be able to automate this process a little; why can't the package.xml file be generated from the package2.xml file, or vice versa? I've yet to hear a good answer on this, but I've been told that we can't. I don't buy it, which is one of the reasons why I'm blogging this.
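
Just to show that the mechanics aren't hard, here's a rough proof of concept. It only maps a handful of fields; a real converter would also need to handle the file list, dependencies and the changelog:

<?php
   // proof-of-concept only: copies a few common fields from a
   // package2.xml into a v1-style package.xml
   function first_text($doc, $tag) {
       $n = $doc->getElementsByTagName($tag)->item(0);
       return $n ? trim($n->textContent) : '';
   }
   $v2 = new DOMDocument();
   $v2->load('package2.xml');
   $v1 = new DOMDocument('1.0');
   $pkg = $v1->appendChild($v1->createElement('package'));
   $pkg->setAttribute('version', '1.0');
   foreach (array('name', 'summary', 'description') as $tag) {
       $pkg->appendChild($v1->createElement($tag, first_text($v2, $tag)));
   }
   $rel = $pkg->appendChild($v1->createElement('release'));
   // in package2.xml the first <release> element holds the version number
   $rel->appendChild($v1->createElement('version', first_text($v2, 'release')));
   $v1->save('package.xml');
   ?>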

One suggestion is to drop support for package.xml and only go with package2.xml. If someone has an older version of PEAR that "works for them(tm)" and they use it to download a pecl package today (and it works), they'd expect it to work if that package had a minor bug fix next week. This would not be the case if switch-to-package2.xml day happened in the middle of that week.

For it to suddenly stop working in this situation is wrong because their environment didn't change. It's our breakage, and even if it is intentional, we didn't give them any notice that we're doing this.

There is no good reason why we can't provide an auto-generated package.xml file, especially if the package2.xml describes the same package as the package.xml file.

Why can't we make things easier instead of harder?

(and that means that the automatic conversion needs to be part of the "pear package" tool, or that the tool provides clear, simple instructions on how to ensure that any pre-requisites are met).

Running PHP as a Service on Win32

[Update: I wrote some docs for the PHP manual]

So, you've written some kind of super-duper daemon process in PHP, perhaps using the event extension and stream_socket_server(). On Unix, it's quite a simple matter to have it run from init (or maybe inetd) when your machine starts... but doing the same on Windows isn't possible without some hacks. Until now.

Last night I put together 2 new extensions for Windows; the first of these is called win32service, and it allows you to run your PHP scripts from the "Service Control Manager" (SCM). The SCM is roughly analogous to the init process on Unix, in that it runs tasks on startup and monitors their status, optionally restarting them if something goes awry.

I've included a sample service script that demonstrates minimal usage. Before you can run a script as a service, you need to register it with the SCM; in the sample, you do this by running it with the "install" argument on the command line. Once installed, you can use either the services MMC snap-in (run services.msc, or look for it under "Administrative Tools") or the good old-fashioned "net" command to launch or stop the service. I prefer the latter:

   net start dummyphp

The output from the command should indicate that the service started correctly; use the task manager to verify this--you should see a php.exe process running as SYSTEM. This dummy service does nothing much; it just sleeps and waits for the SCM to tell it to stop. Let's do that now:

   net stop dummyphp

Again, the output from that command should indicate that the service stopped, and your task manager should no longer show php.exe running as SYSTEM. Now that we've proved that it works, we should remove the service from the SCM; running the script with the "uninstall" argument will do this.

It's all pretty straightforward; the most complicated part is the win32_create_service() function. Its first argument is an array that describes the service; the following keys are supported (a registration sketch follows the list):

  • service - the short name of the service. You can't have two services with the same name.
  • display - the display name of the service.
  • user - the account name under which to run the service. If omitted, runs as the local system account.
  • password - the password to match the "user" setting.
  • path - full path to the binary to run. If omitted, the full path to the currently running process will be used (typically php.exe).
  • params - command line parameters to pass to the binary. You'll probably want to specify the full path to a PHP script, plus some parameter to indicate that the script should run as a service.

(there are some more keys but they're not fully supported yet)
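
For illustration, registering a service by hand might look something like this. The names and paths here are placeholders, and testing the return value against WIN32_NO_ERROR reflects my assumption about how errors are reported:

<?php
   // hypothetical registration; the service/display names and script
   // path are placeholders, and the WIN32_NO_ERROR check is assumed
   $rc = win32_create_service(array(
       'service' => 'dummyphp',            // short name; must be unique
       'display' => 'Dummy PHP service',   // name shown in services.msc
       // 'path' omitted: defaults to the currently running php.exe
       'params'  => 'c:\\scripts\\dummy.php run', // script plus a flag
   ));
   if ($rc != WIN32_NO_ERROR) {
       die("failed to register service: $rc\n");
   }
   ?>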

When it comes to actually running your service, you should call win32_start_service_ctrl_dispatcher() and pass in the name of the service. This function checks in with the SCM; it is especially important that you do this as soon as possible in your script, as the SCM blocks while it waits for you to check in--you can cause system-wide issues if you take too long.

While your service is running, you should periodically check to see if the SCM has requested that you stop. One way to do this is to wrap the main body of your service code in a while loop like this:

<?php
   // check in with the SCM as early as possible; the name must match
   // the one the service was registered under
   win32_start_service_ctrl_dispatcher('dummyphp');

   while (WIN32_SERVICE_CONTROL_STOP != win32_get_last_control_message()) {
      // do stuff here, but try not to take more than a few seconds
   }
?>

If you already have an event loop, you can fold the above into your application. If you're using the event extension, you can schedule a recurrent timed event to check for the stop condition.

And that's pretty much all there is to say for now. I strongly recommend that you look through the MSDN documentation on services; it's very valuable background information.

The binaries for PHP 5 should show up under http://snaps.php.net/win32/PECL_5_0/ in the next couple of hours.

Enjoy :)

PDO Slides from php|works 2005

You can find the slides from my PDO talk here.

While I'm here talking about PDO, I wanted to give you a heads-up: PDO 1.0RC2 will move from PDO_XXX style constants to class-based constants like PDO::XXX. The reason for this change is that it will fit in with namespace support in PHP when that is implemented.
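
To illustrate with the case-folding attribute (assuming $db is a connected PDO instance), the rename looks like this:

<?php
   // old style, 1.0RC1 and earlier:
   $db->setAttribute(PDO_ATTR_CASE, PDO_CASE_LOWER);

   // new style, from 1.0RC2 onwards:
   $db->setAttribute(PDO::ATTR_CASE, PDO::CASE_LOWER);
   ?>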

Yes, this should have been changed before 1.0RC1, but we only just talked about it over the last couple of days. Sorry.

This change has not yet been made; I'm just letting you know in advance. We'll be pushing out a release with this change over the next week, after making sure we're not going to make another change like this.