Serialization in Options: Can’t See the Wood for the Trees

Last night, lying in the spa, I put my book aside and it popped into my mind:

We were so busy looking for a practicable, fast and general approach to check whether an option value in WordPress is serialized that we completely missed an important fact: option values are only serialized if they are an array, an object, or already a serialized value.

That is by design. The function responsible is maybe_serialize() (in functions.php):

/**
 * Serialize data, if needed.
 *
 * @since 2.0.5
 *
 * @param mixed $data Data that might be serialized.
 * @return mixed A scalar data
 */
function maybe_serialize( $data ) {
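	// Arrays and objects are always serialized.
	if ( is_array( $data ) || is_object( $data ) )
		return serialize( $data );

	// Strings that already look serialized get serialized again (double
	// encoding), so maybe_unserialize() hands back the original string.
	if ( is_serialized( $data ) )
		return serialize( $data );

	// Everything else (NULL, scalars, resources) is stored as-is.
	return $data;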
}

So only true arrays, true objects, and strings that look like serialized data (is_serialized() has some false positives, which I silently ignore here) get serialized. A small table for an overview:

Data                        Serialized in option?
NULL                        No
Scalar*                     No
Resource                    No
Array                       Yes
Object                      Yes
String (serialized data)    Yes, double-encoded

Table: Data types and their serialized state in the options database table. *The exception to this rule is strings that contain serialized data.
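To see the table in action, here is a small self-contained PHP sketch. It copies the logic of maybe_serialize() quoted above, but since the real is_serialized() lives in WordPress core, it swaps in a deliberately simplified checker. demo_is_serialized() and demo_maybe_serialize() are hypothetical names for this illustration only, and the checker is much looser than the real is_serialized().

```php
<?php
// Hypothetical, simplified stand-in for WordPress' is_serialized(); for this
// demo only. It looks at the first two bytes and the last byte.
function demo_is_serialized( $data ) {
	if ( ! is_string( $data ) )
		return false;
	$data = trim( $data );
	if ( 'N;' === $data )
		return true;
	// Apart from "N;", the shortest serialized values look like "b:0;".
	if ( strlen( $data ) < 4 || ':' !== $data[1] )
		return false;
	$last = substr( $data, -1 );
	if ( ';' !== $last && '}' !== $last )
		return false;
	return in_array( $data[0], array( 'b', 'i', 'd', 's', 'a', 'O' ), true );
}

// Same structure as maybe_serialize() above, using the stand-in check.
function demo_maybe_serialize( $data ) {
	if ( is_array( $data ) || is_object( $data ) )
		return serialize( $data );
	if ( demo_is_serialized( $data ) )
		return serialize( $data ); // double encoding
	return $data;
}

$samples = array(
	'NULL'                     => null,
	'Scalar (int)'             => 42,
	'Scalar (string)'          => 'hello',
	'Array'                    => array( 1, 2 ),
	'Object'                   => new stdClass(),
	'String (serialized data)' => 'a:0:{}',
);

// Print what would end up in the options table for each sample.
foreach ( $samples as $label => $value ) {
	printf( "%-26s %s\n", $label, var_export( demo_maybe_serialize( $value ), true ) );
}
```

Only the last three samples come back as serialized strings. The already serialized 'a:0:{}' gets wrapped once more and becomes s:6:"a:0:{}";.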

But its counterpart, maybe_unserialize(), takes the full-blown approach with is_serialized() again: the function that checks for any kind of serialized value, not only for arrays, objects or strings.

And is_serialized() is exactly the function that was tweaked a lot in Ticket #14429 (on my blog: Rules of Play – Faster is_serialized()).

So to make maybe_unserialize() faster (it is run on the option values), it needs an is_serialized_maybe( $data ) that only checks for serialized arrays, objects and strings, and so reaches a conclusion sooner. Because the test mainly looks at the first bytes, this reduces the cases from seven to three. Beyond that, we can make more precise assumptions:

  • The data is already trimmed, so we can spare the trim() call.
  • Tokens can be compared against a static map, so there is no need for a switch construct.
  • The NULL special case no longer needs to be handled at all.
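Here is a minimal sketch of what is_serialized_maybe() could look like under the assumptions in the list above. This is not the patch attached to Ticket #16504. It only illustrates the idea: check the second byte for the colon, look up the first byte in a static map that holds the three relevant tokens (a, O, s) and the byte each form must end with, and skip trim() because the data is assumed to be trimmed already. maybe_unserialize_fast() is a hypothetical name for a maybe_unserialize() variant built on top of it.

```php
<?php
// Hypothetical sketch, not the Ticket #16504 patch. Only the three tokens
// relevant for option values are recognised: a (array), O (object) and
// s (string). Assumption: option values arrive already trimmed, so no trim().
function is_serialized_maybe( $data ) {
	// Static map instead of a switch: token => required final byte.
	static $tokens = array( 'a' => '}', 'O' => '}', 's' => ';' );

	// The shortest candidate, an empty array "a:0:{}", is 6 bytes long.
	if ( ! is_string( $data ) || strlen( $data ) < 6 )
		return false;

	// Every candidate starts with "<token>:"; NULL ("N;") is not handled.
	if ( ':' !== $data[1] || ! isset( $tokens[ $data[0] ] ) )
		return false;

	// Cheap last-byte check; like is_serialized(), this is a heuristic.
	return $tokens[ $data[0] ] === substr( $data, -1 );
}

// Hypothetical maybe_unserialize() variant using the cheaper check.
function maybe_unserialize_fast( $original ) {
	if ( is_serialized_maybe( $original ) )
		return @unserialize( $original );
	return $original;
}
```

The check is as heuristic as is_serialized(). A string such as a:foo} passes it, unserialize() then fails, and the @-suppressed call returns false.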

So I quickly turned is_serialized() into a first iteration of is_serialized_maybe() and gave it a run on my testbed.

It’s a first approach; maybe the static map would do a good job in is_serialized() as well. For a start, I’ve opened Ticket #16504 – Faster maybe_unserialize and provided a patch.

What’s the impact? I measured how often maybe_unserialize() is actually called:

Location                          Calls
Blog Homepage                       298
Admin Dashboard                     247
Admin Media Library (10 items)      478
Admin Media Library (20 items)      809
Admin Media Library (30 items)     1068

Table: Number of calls to maybe_unserialize() on a standard WordPress install (only some test content, little configuration, no widgets, default theme, no menu)

Let the games begin ;)

Read On: Rules of Play – Faster is_serialized() (17 Nov 2010; by hakre)

This entry was posted in Hacking The Core, Pressed.
