Rules of Play – Faster is_serialized()

It is interesting to see what happens when you throw a bunch of developers on a specific problem over time. While developing free software you can often benefit from things like code review and collective code ownership as well as just to learn from each other.

In WordPress this can be seen on multiple layers. Errors found in the underlying language get reported back to the developers of PHP. Bugs or flaws in WordPress itself get often reviewed by multiple developers and hopefully in the end fixed in a way that suits most users. It’s not always working perfectly, but the point is, that the review by a large userbase and multiple developers can help to get things running. Even if the codebase of the main application is not designed perfectly. In the end, there is always a lot to learn for the individual developer taking place in this process, because there is different feedback, from experienced, just good ideas and even bad examples can be fruitful.

There was a pretty interesting run for the is_serialized() function in WordPress in the last months. The function and it’s use has been often a source for problems which could not always be solved because some of them tend to be structural or by design, or we just do not get it going. Anyway, even under these circumstances, optimization can happen w/o throwing in the towel. I would rate this as an expensive process as design issues tend to be expensive when not solved at the design level, but as worpdress is far away from hitting changes in that direction, you’re left with what you have. And that’s real life somehow. I mean you can do blogging with it pretty well and that is what it’s for. And it’s interesting to see that expensive processes can be shouldered in a free software community quite well.

This is where the Story goes

The interesting story I’d like to tell starts here: In March Rasmus Lerdorf had a talk at the ConFoo.ca conference about PHP Performance (and a similar one in July). He exemplary showed how to optimize the computational performance of WordPress. And while doing so, he quickly hacked the is_serialized() function of WordPress to gain more performance. He did actually found out about it having performance issues while profiling the application (this video demonstrates how).

It’s not that it was only Rasmus who was aware of the bottleneck. Sambaurs, a WordPress developer, reported something similar in July in the project’s bugtracker: is_serialized runs a preg_match every time it is called which is slow (#14429). He didn’t do the same hack as Rasmus, in his proposed change he tweaked the regular expressions to make them more fast. Rasmus did use some PHP 5 function to make his improvement. So let’s say these two proposals were unrelated.

Then it was Denis de Bernardy, another WordPress known developer who chimed in and pointed to some changes I proposed some month in advance regarding that function in another ticket (#9930). We discussed regular expression related performance issues as well in the older ticket but it was not directly related to the performance of is_serialized(). But this already started some discussion. Is the old suggestion applicable? Does it solve the issue? Mine got a bad rating. Well it probably didn’t even solve a performance issue, I must admit😉.

In between Sambaurs was kind of disappointed as well:

My code is also bad. This isn’t faster at all after doing some reduction testing. In which case it’s probably best to stick with what is there.

But then Ryan must have heard about Rasmus talk. He found the existing ticket and uploaded a patch that was incorporating Rasmus’ approach in a way that it won’t improve for PHP 4 but for PHP 5. And I assume that he somehow must have been influenced by Rasmus talk because he actually profiled the changes. So we already were a party discussing the same problem with different influents:

Sambaurs just was struggeling with the performance of regexes. Rasmus just made a talk and showed how simple things can be for the user to hack core. Ryan wanted to improve wordpress. Denis with his practical view on things. And I had a backpack full of unserialize problem tickets and the knowledge gained therein.

I started to contribute to the ticket after Ryan made his proposal. I was a bit unconfident with it. I had viewed Rasmus talk as well and while doing so, I became aware of the strpbrk function for the first time. It’s always cool to discover something new, but as well as Denis I was not convinced that strpbrk is the way to go. I smelled that it might be faster (and therefore valid for Rasmus to gain some speed here quickly), but that some simple string comparison might be even more fast. As Denis shared a part of that view (or the other way round probably), and this ticket somehow caught my attention, I wanted to give it a go from my side.

What followed was a lineup of patches that combined multiple approaches we knew of already, doing changes step by step and through the whole function. Within 24 hours I added various patches to iterate over the code. In the end I was pretty confident that the last patch would be the fastest is_serialized() function on planet wordpress. For me the job was done – or was it just me who was done?

Well I need to admit that I would have loved to see that the ticket got some traction then because Ryan as having commit access did already put Rasmus approach into the core while I was sure that mine was better. Instead of a quick commit we discussed the proposed changes a bit and a time of inactivity started again.

So you might expect a happy end now? Well, as it showed, the party was not complete yet. Two weeks ago Duck_ somehow was attracked by the ticket. He did a decent review of the whole code that we all together added to the ticket so far, proposed even a new patch and did measure the speed. He could even optimize this a bit more by making the regular expressions more lightweight and by saving some preg_match checks even and using simple char comparisons. Basically the main issue Sambaurs was concerned about when he created the ticket.

Ryan just did another commit some days ago: [16300].

Thoughts

This story is somehow exemplary. The topics vary but what it shows is, that people who even do not know of each other are working together to solve (selfmade? ;)) problems. We’ve been seven in this ticket: Rasmus, Sambauers, Denis, Nacin, Ryan, Duck_ and me. Some of them might even do not know about their part. What we share is just to use a software and the possibility to share, study and improve it freely. You do not always find the best solutions, but in the end, the rate isn’t that bad.

And what this story shows as well is: For commercial projects or the single user it might not make sense to do optimizations with such a detail. In a free software project, there are other rules of play. Existing installations will benefit from the change. And who says that the story ends here? We still have some false positives that might need some care. Valid serialized floats are still reported false.

— hakre

Image Credits: Les Musiciens ca. 1595 by Caravaggio

This entry was posted in Hacking The Core, Pressed and tagged , , , , , , , , . Bookmark the permalink.

One Response to Rules of Play – Faster is_serialized()

  1. Pingback: Serialization in Options: Cant see the Wood for the Trees | hakre on wordpress

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s