php

2006 from Wez's perspective

Here's 2006 from my point of view. I did tinker a bit with the appearance of my blog, but that's not really a high point of the year for me. My year went a bit like this:

I started the year with a couple of EvilDesk releases, which in turn generated some snarky feedback from a couple of people in the PHP community, cooling my drive for working on PHP. Work and family pressures didn't help to restore my earlier level of PHP activity, and to be honest, some words and actions in the PHP community over the year didn't really help either.

MessageLabs chose us to provide their MTA infrastructure, which saw me back in the UK a couple of times at the start of the year while we worked together to plan part of the architecture.

I ported Solaris' memory manager (umem) to the "other" platforms (linux, windows and bsdish systems).

My older brother suffered a set-back this year, and I wish I could have helped him out a lot more than I did.

I started to test-drive google calendar, but that petered out because I can't put company confidential information in there. It's a shame, because it works well.

I finally got some of my writings published in a book, although not an entire book of my own.

I spoke at MySQLUC, OSCON, php|works and zendcon and attended the first MS web dev summit. Memorable moments around these events include being stuck in the seedier part of Phoenix for a night on the way to MySQLUC (not a fond memory to be sure!), awful karaoke at the Sun party @OSCON, excellent home-made sushi and xbox 360 on a 120" screen at a friend's home in Seattle on the way back from OSCON, a really good British style pub in Toronto during php|works, reasonable karaoke at zendcon, Andrei's birthday party after zendcon and meeting Don Box at the MS web dev summit.

Our 4 year old son Xander started at pre-school and is doing well.

The death of my faithful toshiba m30 saw me adopt an intel based macbook, partly for the native unix environment and partly to force me to learn about the oddities of its runtime linker. I still have complaints about the way certain things work, but on the whole I am a happy user, made happier by Parallels because I still need Windows based software.

OmniTI has grown a decent amount this year, and we moved premises and now have two business units--OmniTI the Computer Consulting company and Message Systems the messaging company; both arms of the company have done well this year and are set to continue doing so in the future. Work continues to be fun, interesting and challenging, with great people on staff (that's all of them, not just the "internet celebrities").

I've been the architect of several large pieces of infrastructure for Ecelerity this year, one of which is in the realm of meta-meta programming (gives your brain a workout, guaranteed!). I'm looking forward to seeing the fruits of this labor in 2007, and to getting around to work on some more of the juicy ideas we continue to have for expanding and improving things.

What about PHP? I've been working on a unicode enabled version of PDO for the preview release of PHP 6. This should be completed soon, and I look forward to continued improvements in the PDO drivers and, in particular, the OCI driver which is long overdue some TLC. I've also been toying with something OSX specific for PHP that just isn't close to ready yet; maybe that will be something I can share in the first quarter of 2007.

Here's to a prosperous 2007!

Coding for source control

Hot on the heels of my Coding for Coders entry (focused on C), here's another on coding for source control.

When you have a large code base in a source control system (like subversion), you'll find that things go easier if you adopt a few coding practices that work hand-in-hand with the way that version control works.

Embrace branches and tags

You really should investigate how to use the branching and tagging feature in your source control system. A typical practice is to do development in trunk and have a branch for each major version of the code (eg: a 1.0 branch, 2.0 branch and so on), tagging that branch each time you reach a significant point in development and each time you ship code. Depending on your project, you might branch for minor versions too (eg: 1.2, 1.3).

Think in terms of changesets

If you're working on a bug fix or implementing a feature, it's good practice to distill the net development effort down to a single patch for the tree. The set of changes in that patch is the changeset that implements the bug fix or feature.

Once you have the changeset, you can look at applying it to one of your branches so that you can ship the fixed/enhanced product.

Trivial fixes can usually be implemented with a single commit to the repository, but more complex changesets might span a number of commits. It's important to track the commits so that your changeset is easier to produce.

We use trac for our development ticket tracking. It's easy to configure trac/subversion to add a commit hook that allows developers to reference a ticket in their commit messages and then have all the commits related to that ticket show up as comments when viewing the ticket. You can then merge each commit into your working copy and then check in the resulting changeset.
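As a rough illustration of what such a hook has to do, here is a minimal sketch that pulls ticket numbers out of a commit message. The "#123" reference style and the function name ticket_refs() are assumptions for the example; this is not the hook that ships with trac.

```php
<?php
// Collect the ticket numbers referenced as "#123" in a commit message.
// The "#<number>" convention is assumed here; adjust it to whatever your
// tracker's commit hook actually recognizes.
function ticket_refs(string $message): array
{
    preg_match_all('/#(\d+)/', $message, $matches);
    // De-duplicate while keeping the order of first appearance.
    return array_values(array_unique(array_map('intval', $matches[1])));
}

print_r(ticket_refs('Fix merge bug, refs #42 and #7; see #42'));
```

With the ticket numbers in hand, a hook can attach the commit to each ticket, which is what makes it easy to gather all the commits for a changeset later.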

If one or more of your developers are making extensive changes, it's a good idea for them to do their work in their own branches. That way they won't step on each other's toes during development. You might also want to look at creating a branch per big ticket--this will allow you to exploit the diffing/merging features of your source control system to keep track of the overall changeset.

Code with merging in mind

When you're making code changes, try to think ahead to how the patch will look, and how easy it will be for your source control system to manage merging that code.

A few suggestions:

  • if you have a list of things to update, break the list up so that each item has its own line.
  • if the list has a separator character (eg: a comma), include the separator on the last line of the list.
  • if you're adding to a list, add to the end if possible.
  • avoid changing whitespace; try to have your patch reflect functional changes only.

Your goal is to minimize the patch so that it represents the smallest possible set of changed lines. If you can avoid touching peripheral lines around your changeset, you reduce the risk of running into conflicts when you merge.

Get into the habit of diffing your changes against the repository while you work, and certainly always diff before you commit. If you find changed lines that are not essential for the patch (whitespace in particular), take them out!

Here's an example from a makefile:

      SOURCES = one.c two.c three.c

This is nice and readable at first, but over time this line may grow to include a large number of source files. People will tend to add to the end at first, and perhaps alphabetically when the number of files increases. The resulting diff shows a single modified line but won't really show you what changed on that line. Things get difficult when two changesets affect that line; you'll get a conflict because the source control system doesn't know how to merge them.

      # this is better
      SOURCES = \
        one.c \
        two.c \
        three.c \

Each item now has its own line. By getting into the habit of adding at the end, complete with separator or continuation character, you help the merge process: each item you add will be a single-line diff, and the source control system will know that you're adding it at the end, greatly improving the chances of a successful merge.

Adding at the end isn't the golden rule so much as making sure that everyone adds consistently. Often, order is important, so adding at the end isn't going to help you. By adding in a consistent manner, you reduce the chances of touching the same lines as another changeset and thus reduce the chances of a conflict.

Here's the same example, but in PHP:

      $foo = array("one", "two", "three");

better:

      $foo = array(
              "one",
              "two",
              "three",
             );

Dangling commas are good! :)
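To see why, here is a small sketch that compares the lines touched when appending "three" to a two-item list, with and without the dangling comma. The changed_lines() helper is a hypothetical, deliberately naive line comparison (not a real diff algorithm), written only for this illustration.

```php
<?php
// Naive line comparison, not a real diff: it only reports which lines are
// present on one side and missing from the other.
function changed_lines(array $old, array $new): array
{
    return array(
        'removed' => array_values(array_diff($old, $new)),
        'added'   => array_values(array_diff($new, $old)),
    );
}

// Without a dangling comma, appending "three" also rewrites the "two" line.
$noComma = changed_lines(
    array('$foo = array(', '  "one",', '  "two"', ');'),
    array('$foo = array(', '  "one",', '  "two",', '  "three"', ');')
);

// With a dangling comma, the same change adds exactly one line.
$dangling = changed_lines(
    array('$foo = array(', '  "one",', '  "two",', ');'),
    array('$foo = array(', '  "one",', '  "two",', '  "three",', ');')
);

printf("no comma: %d removed, %d added\n", count($noComma['removed']), count($noComma['added']));
printf("dangling: %d removed, %d added\n", count($dangling['removed']), count($dangling['added']));
```

The version without the dangling comma touches the existing "two" line, so a second changeset that also appends to the list will collide with it.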

Keep the diff readable

Don't take the concept of small diffs too literally--if you can express your change on a single line that is 1024 characters long you've made the merge easier at the expense of making it really hard to review what the change does. This basically boils down to making sure that you stick to the coding standards that have been established for the project.

Don't sacrifice human readability for the sake of easier merging.

If you find that you need to merge a changeset to more than one branch (say you have a bug fix to apply to 2.0 and 2.0.1) then it's often easier to merge to 2.0 first, resolve any conflicts, commit and merge the 2.0 changeset into 2.0.1 rather than the trunk changeset direct to 2.0.1.

These practices aren't obtrusive and will help you when you need to merge a changeset from one branch to another.

I don't pretend to know everything; these are just a couple of tidbits I thought I'd share. If you have other similar advice, I'd like to hear it--feel free to post a comment.

parser and lexer generators for PHP

[Update: I've put these parser/lexer tools on BitBucket and Github; enjoy!]

From time to time, I find that I need to put a parser together. Most of the time I find that I need to do this in C for performance, but other times I just want something convenient, like PHP, and have been out of luck.

This thanksgiving I set out to remedy this and adapted lemon to optionally emit PHP code, and likewise with JLex.

You need a C compiler to build lemon and a java compiler and runtime to build and run JLexPHP, but after having translated your .y and .lex files with these tools, you're left with a pure PHP parser and lexer implementation.

The parser and lexer generators are available under a BSDish license, from both BitBucket and Github.

See enclosed README files for more information.

HTTP POST from PHP, without cURL

Update May 2010: This is one of my most popular blog entries, so it seems worthwhile to modernize it a little. I've added an example of a generic REST helper that I've been using in a couple of places below the original do_post_request function in this entry. Enjoy!

I don't think we do a very good job of evangelizing some of the nice things that the PHP streams layer does in the PHP manual, or even in general. At least, every time I search for the code snippet that allows you to do an HTTP POST request, I don't find it in the manual and resort to reading the source. (You can find it if you search for "HTTP wrapper" in the online documentation, but that's not really what you think you're searching for when you're looking).

So, here's an example of how to send a POST request with straight up PHP, no cURL:
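A minimal sketch of that approach, not the original listing: it builds an HTTP stream context with stream_context_create() and sends the request with file_get_contents(). The function names (build_post_context_options, http_post_sketch) and the form-encoded body are assumptions made for illustration.

```php
<?php
// Build the "http" context options for a form-encoded POST.
// Kept separate from the request itself so it can be inspected on its own.
function build_post_context_options(array $fields, array $headers = array()): array
{
    $headers[] = 'Content-Type: application/x-www-form-urlencoded';
    return array('http' => array(
        'method'  => 'POST',
        'header'  => implode("\r\n", $headers),
        'content' => http_build_query($fields),
    ));
}

// Send the POST through the streams layer and return the response body.
function http_post_sketch(string $url, array $fields, array $headers = array()): string
{
    $context  = stream_context_create(build_post_context_options($fields, $headers));
    $response = @file_get_contents($url, false, $context);
    if ($response === false) {
        throw new RuntimeException("POST to $url failed");
    }
    return $response;
}
```

Calling http_post_sketch('http://example.com/submit', array('name' => 'value')) would send the request; the URL here is a placeholder. The only moving part is the context passed as the third argument, which tells the HTTP wrapper to use POST and what to send.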

I'm looking for another Dark Apprentice

I'm looking for someone who wants to hone their existing 3+ years of C hacking and debugging skills on some of the fastest, most highly stressed core infrastructure applications ever created.

The full job description is available on the OmniTI Careers page.

A successful applicant for the position will join the ranks of my Dark Apprentices and will have the opportunity to learn and develop skills such as:

  • Performant, scalable thinking. Writing and troubleshooting code that runs in high stress environments.
  • Sith debugging. Mastering the inner mysteries to deduce ways to effectively reproduce and resolve otherwise impossible problems.
  • All the fun and happy details of the various email specs.
  • Dry wit. You'll have the option of picking up some of my British humour.

There's plenty of scope for developing these skills and more.

If you're interested in this position, or know someone else that might be, please direct resumes to jobs[at]messagesystems.com.

(I hope the folks on planets mysql and php don't mind the cross posting; we do do work with both PHP and mysql, so it's not totally off topic. Thanks for reading!)

On the road to San Jose for ZendCon'06

I'm currently sitting in Atlanta airport (because it's on the way to San Jose from BWI, obviously).

I really enjoyed last year's conference, so I have great expectations this year. I'll be giving the short version of my PDO talk again this year (but this time, in shiny Keynote on my shiny macbook).

I think I'll try to attend the session "Managing PHP and PHP Applications on Windows" to see what the folks at Microsoft have to say about that, and "Unlocking The Enterprise Using PHP and Messaging and Queuing" to see what IBM have planned there. Outside of the sessions, I'm going to sit down with Andrei and Sara to discuss implementing Unicode for PDO in PHP 6.

Ah, time to board. See you there if you're there!

Background/batch/workflow processing with PDO::PGSQL

One of the other things I've been looking at is ways to implement background processing in PHP. In my recent talk on sending mail from PHP I mention that you want to avoid sending mail directly from a web page. A couple of people have asked me how to implement that, and one of the suggestions I have is to queue your mail in a database table and have some other process act on that table.

The idea is that you have a PHP CLI script that, in an infinite loop, sleeps for a short time then polls the database to see if it needs to do some work. While that will work just fine, wouldn't it be great if the database woke you up only when you needed to do some work?
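The polling version can be sketched roughly like this. The two callables ($fetchPending, which returns the pending jobs, and $handle, which processes one job) are hypothetical placeholders for your own database code, and $maxPolls exists only so the loop can be bounded when experimenting.

```php
<?php
// Polling worker sketch: sleep, ask for pending jobs, handle them, repeat.
// Pass null for $maxPolls to loop forever, as a real CLI worker would.
function poll_for_work(callable $fetchPending, callable $handle, int $sleepSeconds = 5, ?int $maxPolls = null): int
{
    $handled = 0;
    for ($polls = 0; $maxPolls === null || $polls < $maxPolls; $polls++) {
        sleep($sleepSeconds);
        // Every pass costs a query against the database, even when idle.
        foreach ($fetchPending() as $job) {
            $handle($job);
            $handled++;
        }
    }
    return $handled;
}
```

The sleep interval is the whole trade-off: a shorter sleep means lower latency but more wasted queries.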

I've been working on a patch originally contributed by David Begley that adds support for LISTEN/NOTIFY processing to the Postgres PDO driver. With the patch you can write a CLI script that looks a bit like this:

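<?php
   // Connect using the libpq defaults (no parameters in the DSN).
   $db = new PDO('pgsql:');
   // Subscribe this connection to the "work" notification channel.
   $db->exec('LISTEN work');
   // Handle anything that was queued before we started listening.
   dispatch_work();
   while (true) {
      // Block until a notification arrives or the timeout expires; an array
      // result means a NOTIFY was received. Note: current PHP documents
      // this timeout argument in milliseconds.
      if (is_array($db->pgsqlGetNotify(PDO::FETCH_NUM, 360))) {
          dispatch_work();
      }
   }
?>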

This script will effectively sleep for 360 seconds, or until someone else issues a 'NOTIFY work' query against the database, like this:

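<?php
   // $db is a PDO connection to the same database; $params holds the row values.
   $db->beginTransaction();
   $q = $db->prepare('insert into work(...) values (...)');
   $q->execute($params);
   // The notification is only delivered if the transaction commits.
   $db->exec('NOTIFY work');
   $db->commit();
?>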

When the transaction commits, the CLI script will wake up and return an array containing 'work' and a process id; the script will then call dispatch_work() which is some function that queries the database to find out exactly what it needs to do, and then does it.
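For illustration, dispatch_work() might look something like the sketch below. The table layout (work(id, payload, done)), the function name dispatch_work_sketch and its parameters are all assumptions; it also assumes PDO is in exception error mode and that only one worker is running, since nothing here stops two workers from picking up the same row.

```php
<?php
// Hypothetical schema: work(id INTEGER PRIMARY KEY, payload TEXT, done INTEGER DEFAULT 0).
// Fetch every row that has not been handled yet, pass its payload to
// $handler, then mark the row as done. Returns the number of rows handled.
function dispatch_work_sketch(PDO $db, callable $handler): int
{
    $rows = $db->query('SELECT id, payload FROM work WHERE done = 0 ORDER BY id')
               ->fetchAll(PDO::FETCH_ASSOC);
    $mark = $db->prepare('UPDATE work SET done = 1 WHERE id = ?');
    foreach ($rows as $row) {
        $handler($row['payload']);
        $mark->execute(array($row['id']));
    }
    return count($rows);
}
```

Calling it once before entering the loop, as the CLI script does, picks up anything queued while the worker was not running.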

This technique allows you to save CPU resources on the database server by avoiding repeated queries against the server. The classic polling overhead trade-off is to increase the time interval between polls at the cost of increased latency. The LISTEN/NOTIFY approach is vastly superior; you do zero work until the database wakes you up to do it--and it wakes you up almost immediately after the NOTIFY statement is committed. The transactional tie-in is nice; if something causes your insert to be rolled back, your NOTIFY will roll back too.

Once PHP 5.2.0 is out the door (it's too late to sneak it into the release candidate), you can expect to see a PECL release of PDO::PGSQL with this feature.

Identity/Authentication and PHP OpenSSL updates in the pipeline

I've been idly daydreaming about improving my blog. This is something (the daydreaming) I've been doing for some time with George and more recently with Chris. There are a number of things that I want to change (that aren't really worth talking about right now), but one of the main things is adding support for emerging authentication technologies.

I've had support for external authenticators on this blog for a couple of years now--you can login using your php.net cvs username and password if you wish. Why do I have an external authentication mechanism? I don't want to maintain a user database just for my blog. It's more moving parts and requires things like sending email pings to random email addresses (which could be abused by malicious folks) and mechanisms for resetting or retrieving a forgotten password. Not to mention that it's yet another username/password to be remembered by the person doing the commenting.

More and more people are beginning to think the same way that I did back in the spring of 2004 and so we have technologies such as TypeKey and OpenID emerging to make life simpler. TypeKey is a service provided by SixApart to allow third parties to assert that someone is a verified user of their services. OpenID is an open protocol that allows anyone to authenticate a user against an OpenID server. OpenID is a decentralized protocol; there is no central managing body and anyone can run an OpenID server.

There are a couple of OpenID services out there; I'm using the VeriSign OpenID server for my online identity because it very clearly puts me in control over what profile information is released to the site requesting authentication.

I found TypeKey easier to implement than OpenID, but I like OpenID more because I can use my own URL for my identity, and I'm not forced to register with a single authentication provider. TypeKey also exposes its authentication scheme via OpenID, so if you only implement OpenID, you can still authenticate TypeKey users.

At the time of writing, TypeKey doesn't support the simple registration extension for OpenID so you have to prompt the user for their name/email. If you use native TypeKey authentication you get the name/email automatically.

These are browser-based authentication technologies, similar to the Yahoo! browser-based auth scheme, but have the advantage that you get identity information in addition to the authentication result. That means that, using yahoo bbauth, while you get an opaque token on successful auth, you still don't know the name of the user, their email address or even a web page URL. That's part of the design of bbauth, protecting the privacy of yahoo users, but at the same time limiting the utility of the scheme for lazy programmers like me. My goal is to display the user's nickname and blog url in the comments section of my blog without building the machinery for sending out email verification. Yahoo bbauth doesn't currently support that, but I've heard that Y! are looking into expanding that in the future.

I've been looking at the PHP implementations of consumers of each of those technologies and, at least to my eyes, they're screaming for some better support from PHP. I've been working on a patch for the openssl extension that provides functions for verifying DSA signatures and performing the steps of the Diffie-Hellman key exchange algorithm which are used in TypeKey and OpenID respectively. Once this patch is mainstream it will eliminate the need for performing big number math in PHP script.
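To give a feel for the Diffie-Hellman steps mentioned above, here is a toy sketch in plain PHP. The tiny prime and the private values are illustrative assumptions; real exchanges use very large numbers, which is exactly where script-level big number math becomes painful. The mod_pow() helper is hypothetical and is not one of the functions from the patch.

```php
<?php
// Modular exponentiation by repeated squaring; safe only for small integers.
function mod_pow(int $base, int $exp, int $mod): int
{
    $result = 1;
    $base %= $mod;
    while ($exp > 0) {
        if ($exp & 1) {
            $result = ($result * $base) % $mod;
        }
        $base = ($base * $base) % $mod;
        $exp >>= 1;
    }
    return $result;
}

// Toy public parameters (assumed for illustration only).
$p = 23; // prime modulus
$g = 5;  // generator

// Each side picks a private value and publishes g^private mod p.
$alicePrivate = 4;
$bobPrivate   = 3;
$alicePublic  = mod_pow($g, $alicePrivate, $p);
$bobPublic    = mod_pow($g, $bobPrivate, $p);

// Each side combines its own private value with the other's public value
// and arrives at the same shared secret without ever transmitting it.
$aliceSecret = mod_pow($bobPublic, $alicePrivate, $p);
$bobSecret   = mod_pow($alicePublic, $bobPrivate, $p);

var_dump($aliceSecret === $bobSecret);
```

With realistically sized numbers the multiplications overflow native integers, so a script has to fall back on an arbitrary-precision library for every step.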

I have plans to release the patched openssl extension via PECL in the near future, so you won't have to wait for PHP 5.2.1 to use it.

MS Web Dev Summit

For the past couple of days I've been in (rainy|sunny) Seattle attending a web development summit hosted on the Microsoft campus in Redmond. Microsoft invited a number of "influentials" from web development communities outside of the usual MS camps; the folks attending were mostly of a PHP background, but there was one Rails guy and a couple of others with more of a .Net background.

At first you'd think that MS had set out to brainwash us all into talking about how great their new bits are. While that was true to a certain extent, they were very keen to find out what we all thought about those bits--did they suck? how could they be improved? and so on.

For me, the more interesting parts included:

Feature focus on IIS7

The IIS7 that will ship with Vista is designed to make things easier for a web developer. There are some innovations like per-directory configuration files called web.config files. These are effectively an XML equivalent to Apache .htaccess files and will make things much easier for transporting configuration from a local dev box up to a staging or even production server. The IIS guys re-engineered the core of IIS to run in a modular fashion, making it much easier to build in custom authentication or URL rewriting facilities, for example.

This may not sound like a big deal to apache users, but it's a significant stride in the right direction as far as feature parity between apache and IIS is concerned--it makes it easier to create an app that will run "the same" on IIS as it does on Apache.

Oh yes, FastCGI support is planned to ship with IIS7.

LINQ

LINQ can be described as SQL integration at the programming level. But it's more than that; the LINQ language extensions to C# allow you to structure queries across disparate data sources. If you have an array of in-memory data and a SQL table, you can join and query across both those things as though they were one data source. It sounds very interesting; you can find out more at http://msdn.microsoft.com/data/ref/linq/.

CardSpace (formerly known as InfoCard)

CardSpace is a new identity technology that will be integrated into browsers (IE7 will ship with it, and I've been told that there is a firefox plugin). The technology uses cryptography to put you firmly in control of your personal and financial information. For instance, if you're buying something online the authorization for that transaction takes place between you and your bank/credit provider and they issue a cryptographically signed token that the seller can use as confirmation of the transfer of funds. The seller never even has an inkling of what your credit card details are, eliminating the risk of identity theft.

It's an interesting technology. If you google for "cardspace php" you can find some PHP code that accepts CardSpace data. I was talking to Rob Richards about this last week in Toronto; you can see working CardSpace/InfoCard authentication on his blog.

Feature focus on IE7

I don't have too much to say about this except that, like IIS 7, a lot of the visible changes are primarily playing catch-up to opensource alternatives. Again, it's definitely a step in the right direction, but feels a bit like "so what?" right now. The IE7 guys made a point of saying that they are committed to making IE a better browser and that they are aware of its current shortcomings. IE7 will ship in Q4 2006 and they already have a roadmap for the next two versions of IE. Again, good news.

Expression Web

You can think of this as being something like Microsoft's equivalent to Dreamweaver. (disclaimer: I haven't really touched DW for some time, and barely scratched the surface, so I could be a bit off-base here). Expression Web is part of a suite of tools aimed at designers rather than coders. It looks like a very nice tool for editing HTML and CSS, and the folks behind it stressed repeatedly that a fundamental principle behind the tool is to generate standards compliant xhtml and css.

Expression has a nice natural editor that intelligently creates and re-uses style classes according to your preferences, generating good, clean markup. One particularly nice feature was visualization of the box model; it's possible to drag and change padding and margins for elements in the page.

Summing up

Looks like Microsoft have some interesting bits heading our way in the near future. More importantly, this event helped to underscore an attitude shift within Microsoft that has been taking place over the last couple of years. People like Brian Goldfarb and Joe Stagner have played an important role in sending the message that Microsoft are genuinely interested in making the Windows platform more appealing for non-Microsoft technologies like PHP, python and ruby.

php|works 2006 - slides online

Another php|works is done. As always, Marco puts together a good conference. An interesting mix of speakers and attendees, a good selection of talks and some fun activities--the PHP trivia quiz was fun to watch (speakers were not allowed to compete) with some tough questions and a great prize (a brand new macbook!).

The extending PHP session I was covering for Sara seemed to go ok; in my experience there's typically only 1 or 2 people that are seriously following the content in these sessions, with the rest either snoozing or feeling overwhelmed. It is a tough topic to cover, even in 3 hours. I used Sara's slides, but the pacing was a bit aggressive, so we wound up spending a bit more time doing some real time extension hacking instead of following the slides too closely.

The PDO talk was the same as usual, and my new talk, on best mailing practices (affectionately known as "not PDO" by the rest of the speakers) had a decent turn-out with people actually scribbling down notes.

I think I only managed to attend two other talks; Sebastian's AOP talk (although I had to cut out pretty early to make a phone call) and Zak's talk on licensing, which very clearly explained things like copyright and licensing that every developer should know.

On my return journey, I had the pleasure of meeting Eli White (PHP Hacker @ Digg, author of "PHP 5 in Practice") at the gate for the flight back home. By a strange quirk of fate I hadn't seen Eli at all at the conference, but with ample time at the gate, and on the plane (another quirk of fate had us sitting next to each other), we made up for that.

You can find my PDO and Mail talks up at the OmniTI talks page: http://omniti.com/resources/talks and you can find the extending PHP slides up at furlong-golemon-extending-php.pdf.