WordPress Outage Feedback

WP.COM was down for the second time this year at the end of last week. Well, that’s certainly not news any longer, and that’s good: the dust of the buzz has settled a bit. What’s interesting about the outage for me is that it got plenty of attention from the enterprise (maybe because of TechCrunch’s coverage and the 3.0 release being so close?).

An analysis of the bugs in the rolled-out code was quickly available: a SaaS provider like Automattic should handle mistakes in software, and therefore its rollout processes, more carefully, so that code and database design do not become the single point of failure. One suggestion, for example, is not to deliver to all hotels at once. So it’s an architectural problem. Sound, and fairly common sense.
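
The suggestion above, not delivering to everyone at once, can be sketched as a staged rollout. This is a minimal sketch and not how Automattic actually deploys: the stage percentages, the function names and the `deploy` and `healthy` callbacks are all hypothetical and only illustrate the idea.

```python
import hashlib
from typing import Callable, Iterable, List

# Hypothetical stage sizes: cumulative percentage of blogs that
# have received the new code after each stage.
STAGES = [1, 10, 50, 100]


def bucket(blog_id: int) -> int:
    """Map a blog id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(str(blog_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def blogs_in_stage(blog_ids: Iterable[int], low: int, high: int) -> List[int]:
    """Return the blogs whose bucket falls in [low, high)."""
    return [b for b in blog_ids if low <= bucket(b) < high]


def staged_rollout(
    blog_ids: Iterable[int],
    deploy: Callable[[int], None],
    healthy: Callable[[int], bool],
) -> bool:
    """Deploy stage by stage and stop at the first unhealthy stage.

    Returns True only if every stage was deployed and passed its check.
    """
    blog_ids = list(blog_ids)
    previous = 0
    for percent in STAGES:
        batch = blogs_in_stage(blog_ids, previous, percent)
        for blog_id in batch:
            deploy(blog_id)
        if not all(healthy(blog_id) for blog_id in batch):
            return False  # halt: later stages never receive the bad code
        previous = percent
    return True
```

With a rollout like this, a change that damages the blogs it touches is caught on the first small batch instead of on every blog at once.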

What I have missed so far in these discussions is a closer look at the WordPress codebase and how its inner guts might be related to the problems named.

“The cause of the outage was a very unfortunate code change that overwrote some key options in the options table for a number of blogs. We brought the site down to prevent damage and have been bringing blogs back after we’ve verified that they’re 100% okay.”

Quote Source: Matt Mullenweg by email

Just to give this a personal tint: throw a term like quality control at a WordPress core developer and you’re pretty close to “see your name on the shit-list” sooner or later. Or, to give another example, watch how even minor-looking stuff is left unpatched (even when its security-related nature turns out later).

But you do not need to get that personally involved. Most of WordPress’s data structures, flows and programming paradigms are unspecified. Did I write most? Maybe there isn’t a single sheet of specs. So it’s no wonder the project’s test suite is badly broken, has not worked properly for ages and, because of this, isn’t even run periodically any longer. Not that I’m advocating TDD here; I mean, just run some tests, because otherwise you do not know what works, or whether anything does.
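
To make “just run some tests” concrete, here is a minimal sketch of a regression test for the kind of failure described in the quote above, a code change overwriting key options. It is not WordPress code: the options table is modelled as a plain dict, `apply_change` is a hypothetical helper, and `siteurl` and `home` are picked as “key options” purely for illustration, since the quote does not say which options were affected.

```python
import unittest

# Assumption: which options count as "key" is not stated in the source;
# "siteurl" and "home" are chosen only for illustration.
PROTECTED_OPTIONS = frozenset({"siteurl", "home"})


def apply_change(options: dict, change: dict) -> dict:
    """Return a copy of options with change applied.

    Hypothetical helper: refuses to touch protected keys and never
    modifies the dict that was passed in.
    """
    blocked = PROTECTED_OPTIONS.intersection(change)
    if blocked:
        raise ValueError(f"refusing to overwrite: {sorted(blocked)}")
    updated = dict(options)
    updated.update(change)
    return updated


class ProtectedOptionsTest(unittest.TestCase):
    def setUp(self):
        self.options = {
            "siteurl": "http://example.com",
            "home": "http://example.com",
            "blogname": "Old name",
        }

    def test_ordinary_option_is_updated(self):
        result = apply_change(self.options, {"blogname": "New name"})
        self.assertEqual(result["blogname"], "New name")
        self.assertEqual(result["siteurl"], "http://example.com")

    def test_key_option_is_not_overwritten(self):
        with self.assertRaises(ValueError):
            apply_change(self.options, {"siteurl": ""})
        # The original store must be left untouched.
        self.assertEqual(self.options["siteurl"], "http://example.com")
```

Saved to a file, this runs with `python -m unittest`. The point is not the helper itself but that a check like this fails before a rollout rather than after it.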

Historically, WordPress tests by pushing out a new release. You can imagine that this costs a lot of resources and is far from being effective or pleasing. WordPress.com is said to have some sort of SVN-based roll-out and is now running close to the community version’s trunk. What is the strategy? Beta testing done by the 10+ million WP instances on .com?

The main problem is that improvements on these strategically important development topics are mostly initiatives of single individuals who do not get much support in the project. By far not the support it would take to look after them properly, successfully and continuously.

So developers make mistakes, but Mullenweg runs the project by playing the “take care when it’s broken” card very often. It is he personally who speaks about “rapid-fire incidents”. That does not seem to fit together well: reactive project management can be okay during emerging periods and when shouldered by a strong team, but a seven-year-old codebase has its own history and probably needs some more love, right?


Update: It took some time. About four days after “All hands were called to deck”, WP.COM posted about the downtime (14 Jun 2010, 10:28). Not much news in there: 99% of sites were unavailable for an hour, the rest came up a bit after that (whatever that means), and the tricky ones were worked on until the morning (this Monday morning?).


1 Response to WordPress Outage Feedback

  1. inibukansaya.sungguh says:

    i think wordpress should have some kind of test suite or a better structure that makes it easier to debug.

    According to my friend, a seasoned php coder, wordpress sucks because of how it handles separation of concerns. he said learning every wordpress quirk seems to be a waste of time.
