
2005

Zend Framework

There's a lot of speculation flying around about the Zend Framework; some positive, some not so positive. I'm getting a little bit sick of hearing people spread misinformation about it; I know that official news on the framework is pretty sparse, and that the speculation is mostly as a result of that lack of information, so I'm going to say a few words to try to improve the situation.

So, why yet-another-framework? Clearly, it's useful to have a standard library of components that you can drop into your applications. Why is Zend getting behind this? One of the major reasons is that, as PHP attracts larger and larger businesses, there is a greater need for such a component library. One big problem for big business is that it's pretty much impossible for them to determine just how "legal" the existing libraries are.

What do I mean by this? When you have an anyone-can-commit-or-submit policy, you have no idea where the newly contributed code has come from. How do you know for sure that it wasn't stolen from someone's place of work, or taken from an application without respecting its license? (eg: "borrowing" some code from a GPL app and shoving it into a BSDish framework, without changing the BSDish code to GPL).

It makes a lot of sense to control the repository and build in accountability as part of the submission process. By having contributors sign an agreement that forces them to take responsibility for the code they commit, the framework and its users are insulated from any potential legal recourse that might arise. And because the people committing the code are aware of that liability, they'll take greater pains to ensure that their code is legally allowed to be contributed. The end result is "Clean IP", which is immediately much more appealing to anyone that takes their business seriously.

Of course, you need some kind of accountable body to take care of the paperwork for the submission process, both to process new contributors and to be there in case of some kind of audit. If you're a business that takes legal matters seriously, are you going to trust a bunch of guys that probably haven't even met each other in real life to maintain clean IP, or a company backed by other PHP related businesses?

Aside from clean IP, there are also questions of code reliability and stability: is the code any good, is the API going to be subject to wild changes between releases, and what kind of testing and QA procedures are in place? Can you trust that they'll be adhered to?

So, there are a lot of business reasons behind the decision to create the Zend Framework, what about technical merit? Contrary to some of the hot air that's been blowing around the blogs, there is code already, and it's actually pretty good. Here's a directory listing from my CVS checkout, to give you a taste of what's already implemented:

   % ls                                                                          
   CVS            ZDBAdapter      ZLog             ZTemplate
   ZActiveRecord  ZException.php  ZPageController  ZUri
   ZController    ZInputFilter    ZSearch

There are plans for AJAX, SMTP, and web services components, among others. As you can probably deduce from the names, the components already implemented include the ActiveRecord pattern, some glue for MVC, flexible logging and templating, and a security-related input filtering class. ZSearch provides document indexing capabilities, to make it easy to implement custom search engines for arbitrary documents stored in arbitrary storage containers.

One of the goals for the project is to keep everything clear and simple to use, without forcing you to adopt the entire framework throughout your application; it doesn't impose itself on your app, and doesn't require any configuration files to deploy and use.

I'm not going to reveal any more about the code than this right now; one of the reasons that the code isn't open at the moment is to keep the initial work manageable and focused--too many cooks spoil the broth, as they say.

So there we have a bit of a sneak peek and some background on the Zend Framework. It's undergoing active development, with multiple code and documentation commits going in daily. I can't give you any more detail on the schedule; you'll just have to stay tuned.

Upcoming PHP-on-Windows webcasts next week

One of the people that I met at ZendCon was Joe Stagner, who's been using PHP since before he started work at Microsoft. Joe gave a talk entitled "PHP Rocking in the Windows World" which went down quite well. I'm sad to say that I missed it--I got caught up talking to a bunch of people and lost track of time.

Joe is running a series of PHP-on-Windows webcasts next week on MSDN:

MSDN Webcast: Comparing PHP with ASP.NET

MSDN Webcast: Building and Running PHP Applications on Windows

MSDN Webcast: Extending PHP Applications Using Microsoft Technologies

Back from ZendCon

So, I'm back from ZendCon. I was unsure how this business-focused conference would turn out, and I'm glad to say that it seemed to go down quite well. There were a lot of "biz" sessions, and that made it feel like there wasn't much technical content at the conference proper--but that was compensated for by the technical tutorials that took place on Tuesday.

One of the things I really like about conferences is the opportunity to meet people face to face. Aside from the usual PHP conference circuit crowd (you know who you are!) it was good to see some faces that I haven't seen since my last visit to SF (php{con west 2003). It was also good to put faces to names and voices that I'd only previously dealt with over email, telephone or even just through reading RSS feeds.

I did a lot of "networking" at this conference compared to others that I've attended, having had some good, positive discussions with folks from Oracle, IBM, Microsoft and others about what they're doing, what we're doing and some general plans for the future--nothing top secret or earth shattering (that I know of!) just positive :)

I think one of the biggest things to take home from the conference (besides the swag) is the message that big business recognizes PHP as a platform for big business. The presence of these big names at the conference not only helps to improve the perception of PHP as an "Enterprise" platform, but also helps to validate the efforts of everyone that has contributed to PHP (usually on a volunteer basis) over the last 10 years--well done to us all!

PDO Slides from ZendCon

(hmm, I could have sworn I posted these the other day; maybe the wifi cut out just as I hit "save")

You can find the slides from the PDO talk here.

While I'm here talking about PDO, I wanted to give you a heads-up: PDO 1.0RC2 will move from PDO_XXX style constants to class-based constants like PDO::XXX. The reason for this change is that it will fit in with namespace support in PHP when it is implemented.
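To illustrate the shape of the change, here's a rough sketch using the fetch-mode constant as an example (the query itself is just for illustration):

   <?php
   // Old style (1.0RC1 and earlier): global PDO_XXX constants
   $rows = $db->query("select * from words")->fetchAll(PDO_FETCH_ASSOC);

   // New style (1.0RC2 onwards): constants on the PDO class itself
   $rows = $db->query("select * from words")->fetchAll(PDO::FETCH_ASSOC);
   ?>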

Yes, this should have been changed before 1.0RC1: Sorry.

This change has not yet been made to the packages available via PECL, but is present in the PHP 5.1 branch--including PHP 5.1RC3.

[yes, this text is pretty much a rip-off of an earlier blog entry I made]

in SF for the Zend/PHP Conference 2005

It's been a long, busy weekend for me; we had our whole house carpeted on Saturday, which meant moving all our furniture around. I also built a new bed for Xander; I'm sore from the physical effort and mentally tired from not having any time to unwind. I was up at 4am this morning in preparation for my flight to SF at 7am, and had the "pleasure" of the never-ending morning that goes hand in hand with flying west. All that, combined with the customary airplane headache, has me feeling pretty beat up.

I'm looking forward to the conference; meeting the usual crowd is always good... and this time around we're "business focused", which should put a bit of a different spin on things... hopefully for the better :-)

PDO Driver Howto online

Thanks to Bill Abt and Rick McGuire of IBM, preliminary documentation for writing a PDO driver is now part of the PHP manual.

It was originally written against an earlier incarnation of PDO, so some parts may not be 100% correct. If you're thinking of adding support for other databases to PHP, then you should read the howto and give it a whirl. If you find an inaccuracy, drop a line to pecl-dev@lists.php.net and we'll set you straight and fix the docs.

Recommendations for XSL-FO -> PDF?

[Update: thanks for the suggestions; we went with RenderX and have it running from cron to rebuild our product manual as it changes]

I've been playing around with DocBook this weekend, converting our product manual to HTML and PDF. I'm using the docbook xsl stylesheets to convert to HTML and XSL-FO, and then using an FO processor to convert from XSL-FO to PDF.
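Incidentally, the DocBook-to-FO half of that pipeline can also be driven from PHP using ext/xsl; here's a minimal sketch (the stylesheet path is just an example, and I haven't tried this on a full manual):

   <?php
   // Apply the DocBook XSL "fo" stylesheet to a DocBook source, producing
   // XSL-FO; a separate FO processor then turns the .fo output into PDF.
   $xsl = new DOMDocument();
   $xsl->load('/usr/share/sgml/docbook/xsl-stylesheets/fo/docbook.xsl'); // example path

   $doc = new DOMDocument();
   $doc->load('manual.xml'); // your DocBook source

   $proc = new XSLTProcessor();
   $proc->importStylesheet($xsl);
   file_put_contents('manual.fo', $proc->transformToXML($doc));
   ?>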

Apache FOP is a free FO processor, but the version that gentoo emerged for me borks on our manual; it either stops generating pages after page 36, or spins in an infinite loop.

I also tried XMLroff, which is the only C based FO processor I've found (based on libxml2). It segfaults straight away for me, so it's not immediately useful; maybe a future release will work.

I've downloaded trial versions of two commercial offerings; RenderX XEP and Lunasil XINC.

RenderX seems to work ok, but blanks out every odd page after page 11, so it's a little bit hard to figure out if we want to pay $300 for the full version. It does look promising though, and the price doesn't sound that bad.

Lunasil XINC appears to be based on an older version of Apache FOP, and doesn't have support for PDF bookmarks. It works though, which is more than can be said for the real Apache FOP that I tried. Lunasil XINC is only $95 for the full version.

Does anyone else have any experience in this area and care to share it? Has anyone dared to implement XSL-FO -> PDF using PHP ?

Benchmarking (in general)

I just wanted to follow up on Davey's post about extending PDO, because I think he isn't being very clear about what he's doing (no offence, Davey!)

Davey's Cortex framework allows you to pass in either a DSN or an already-instantiated PDO object to its data object class, and Davey's post claims, quite rightly, that it is faster to not create brand new connections each time you create an instance of his framework objects.

Let's see if we can come up with a slightly more scientific and perhaps more fair set of tests.

When benchmarking, it's important to get some decent numbers. If your test is over in less than 1 second, your readings are probably wildly inaccurate, because your system may not have had a chance to properly utilize hardware or OS-level caches, or otherwise adjust in the same way that it would under a consistent level of load.

If you're running a quick test, try to make it last more than 10 seconds. If you want to get better numbers, run the test for longer; 5 or 10 minutes should give pretty decent results for a given code fragment, and an hour is probably as good as you could ever hope for.

Here's a simple test harness that runs for approximately 5 minutes:

<?php
    $start = time();
    $deadline = $start + (5 * 60); // run for 5 minutes
    $iters = 0;
    do {    
      something();
      ++$iters;
    } while (time() <= $deadline);
    $end = time();
    $diff = $end - $start;
    printf("Ran %.2f iterations per minute (%d/%d)\\n",
        (60.0 * $iters) / $diff, $iters, $diff);
    ?>

This harness simply repeats a task until the time limit is more or less up, and then summarizes how many times it managed to run within the specified time, normalizing it to iterations per minute.

Notice that I'm not particularly bothered about sub-second time intervals here, because they don't really have much impact when compared to a 5 minute duration--5 minutes plus or minus half a second is still, as near as damn it, 5 minutes.

Our first test creates some kind of object that does some kind of work on a database connection. We'll make this one re-connect to the database each time; this is equivalent to Davey's extending PDO case:

<?php
   // represents some object in your framework
   class TestObject {
      var $db;
      function __construct($db) {
         $this->db = $db;
      }
      function doWork() {
         # Limited to 100 rows, because the connection cost
         # will be lost in the noise of the fetch otherwise
         array_reverse($this->db->query("select * from words LIMIT 100")->fetchAll());
      }
   }
   function something() {
       global $dsn, $user, $pass; // connection details, assumed to be defined earlier in the script
       $db = new PDO($dsn, $user, $pass);
       $obj = new TestObject($db);
       $obj->doWork();
   }
   ?>

The next test uses the same test object class, but caches the PDO instance. This is equivalent to Davey's call proxying case:

<?php
   function something() {
       global $dsn, $user, $pass; // as before, assumed to be defined in the enclosing script
       static $db = null;
       if ($db === null) $db = new PDO($dsn, $user, $pass);
       $obj = new TestObject($db);
       $obj->doWork();
   }
   ?>

The third test uses persistent connections; this is equivalent to Davey's extending PDO case, but "smarter"; even though $db falls out of scope and is destroyed at the end of each call to the something() function, the underlying connection is cached so that subsequent calls don't need to re-connect. This is transparent to the calling script, except for the extra parameter to the constructor, and is generally a very good thing to do with database connections:

<?php
   function something() {
       global $dsn, $user, $pass; // as before, assumed to be defined in the enclosing script
       $db = new PDO($dsn, $user, $pass, array(PDO::ATTR_PERSISTENT => true));
       $obj = new TestObject($db);
       $obj->doWork();
   }
   ?>

Here are the results I got; since I'm lazy I'm only running mine for about 30 seconds each. I used a sqlite database with the contents of /usr/share/dict/words inserted into it (234937 words); a rough sketch of how that might be set up is below, followed by the numbers.
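Something along these lines will do it (a sketch only; I'm assuming a single-column words table, which is all the SELECT in the tests needs):

   <?php
   // Illustrative setup: load /usr/share/dict/words into a sqlite database
   $db = new PDO('sqlite:/tmp/words.db');
   $db->exec('CREATE TABLE words (word VARCHAR(64))');
   $db->beginTransaction();
   $stmt = $db->prepare('INSERT INTO words (word) VALUES (?)');
   foreach (file('/usr/share/dict/words') as $word) {
       $stmt->execute(array(trim($word)));
   }
   $db->commit();
   ?>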

   one:   Ran 46734.19 iterations per minute (24146/31)
   two:   Ran 68504.52 iterations per minute (35394/31)
   three: Ran 64689.68 iterations per minute (33423/31)

The results speak for themselves: if you're initiating connections every time you want to do some work, it's the slowest. If you cache the connection in a PHP variable, it's faster than making persistent connections to PDO, because it doesn't need to create a new object each time. Persistent connections are "almost" as fast as caching in PHP variables; they need to create a new object, but still reference the same connection internally.

It's worth mentioning that benchmarks are tricky things. For instance, if you take out the "LIMIT 100" clause from the SELECT statement, the connection overhead becomes so small in comparison to the time it takes to fetch the data that all the tests wind up the same (about 18 iterations per minute with my test data). Similarly, if you limit the fetch to 1 row, you'll see a more exaggerated difference in the numbers, because the benchmark script is exercising your system differently.

If you're running against mysql, the differences between test one and test two will be greater, because there is more overhead in establishing a connection over a socket than there is for sqlite to open up a file or two. You'll see a bigger difference again when connecting to Oracle, because it does a pretty hefty amount of work at connection time.

The main lesson to be learned here is that benchmarking an artificial code fragment will give you artificial results; they can help you gauge how fast something will run in general, but the answers you get back depend on the questions you're asking. If you don't ask appropriate questions, or enough of them (eg: Davey's quick tests didn't include persistent connections), you're not going to get all the information you need to tune your application effectively.

PS: There's a fourth test case that I didn't cover; it's important and probably yields the best results out of all the cases presented here. Anyone care to suggest what that case might be?

moderated blogs, a.k.a. censorship

I'm very disappointed to see that my (perfectly reasonable) comments have not been "accepted" on a certain blog where a technical discussion has been taking place.

At first I thought it was just because the blog owner was offline for the night, but now I see that someone else's comments, posted after mine, have appeared.

I'd taken a low-key approach to this "blog conversation" (by not responding to each comment with an entry on my blog) because I don't particularly like to "do my laundry in public".

The sad fact is that I won't trust moderated blogs to be fair or honest again in the future.

Release Engineering

One of the things I spend a lot of time doing is release engineering. If you're not familiar with the term, it's the process of taking some software product and packaging it. Piece of cake? Far from it.

The packaging has to be perfect. It needs to handle all kinds of quirky problems on the different flavours of systems that you're planning to install on. You need to expect the unexpected (you have no idea exactly what people have done to their systems), handle all errors gracefully, preserve everything that might be important, and at the end of the day, it just needs to work.

I've spent many hours building and rebuilding packages on different combinations of operating system flavours and versions, watching the build output scrolling by for several minutes just to verify the smallest change in the packaging configuration, rolling back virtual machines, installing, uninstalling, reinstalling ad infinitum. Tedious but essential. And it's not easy.

So, why have I been going on about all this? The recent (or not so recent) changes in the PEAR infrastructure have introduced a new package2.xml file to describe packages. For reasons of backwards compatibility, the older package.xml file still needs to be maintained. This instantly makes the release engineering section of my brain nervous... it's going to be all too easy to forget to update both of these files when it's time to ship the software, especially when shipping time means releasing 8 packages at once (just taking pdo as an example). That means that I'll need to edit 16 package files and 8 C source files when I want to push a release. I have a hard enough time editing the 8 source files and 8 package files I already have right now, so I'm not looking forward to this--it's going to be unmaintainable.

It would seem pretty reasonable to expect to be able to automate this process a little; why can't the package.xml file be generated from the package2.xml file, or vice versa? I've yet to hear a good answer on this, but I've been told that we can't. I don't buy it, which is one of the reasons why I'm blogging this.

One suggestion is to drop support for package.xml and only go with package2.xml. But if someone has an older version of PEAR that "works for them(tm)" and they use it to download a pecl package today (and it works), they'd expect it to work if that package had a minor bug fix next week. That would not be the case if switch-to-package2.xml day happened in the middle of that week.

For it to suddenly stop working in this situation is wrong because their environment didn't change. It's our breakage, and even if it is intentional, we didn't give them any notice that we're doing this.

There is no good reason why we can't provide an auto-generated package.xml file, especially if the package2.xml describes the same package as the package.xml file.

Why can't we make things easier instead of harder?

(and that means that the automatic conversion needs to be part of the "pear package" tool, or that the tool provides clear, simple instructions on how to ensure that any pre-requisites are met).