The SimpleXMLElement Magic Wonder World in PHP

PHP’s SimpleXML ships with a lot of magic to simplify access to an XML document’s element and attribute node values. Some criticize this and suggest using the DOM library instead. The DOM library, on the other hand, even though it can do everything you’d tend to do with an XML document, is pretty verbose (and yes, verbosity is a critique of XML as well). Sure, there are many nice libraries wrapping the DOM library, and one of these is SimpleXML.

From a data-type perspective, SimpleXMLElement is quite an interesting one, figuratively speaking. It’s something like a hierarchical data structure, one that comes with its own query mechanism via the xpath() method. It can be iterated and traversed, nodes can be added and leaves unset, as if it were an array or a stdClass. And it comes with a serializer built in, to and from XML.
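To make that list a bit more tangible, here is a minimal sketch that touches each of those operations once. The document content and the element name item are made up for illustration only:

```php
<?php
// Parse: XML string -> object tree
$doc = new SimpleXMLElement('<doc><item id="1">a</item><item id="2">b</item></doc>');

// Query: xpath() returns an array of SimpleXMLElement objects
$hits = $doc->xpath('//item[@id="2"]');
$second = (string) $hits[0];

// Iterate: children behave like a list, attributes like array keys
foreach ($doc->item as $item) {
    echo $item['id'], ' = ', $item, "\n";
}

// Add a node, then unset another one as if it were an array entry
$doc->addChild('item', 'c');
unset($doc->item[0]);

// Serialize: object tree -> XML string
echo $doc->asXML();
```

After the unset() only two item elements are left: the one with id 2 and the newly added one.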

Internally, it’s fully backed by C code on top of libxml, so it’s also pretty fast and perhaps also fine on memory (at least I hope).

It speaks Unicode in the popular UTF-8 encoding you know from the web and if you need to, it can even convert to other encodings.
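A small sketch of what that means on the reading side, assuming a document that declares ISO-8859-1 (the sample document is invented for this illustration): whatever the input encoding is, the string values you get out of SimpleXML are UTF-8.

```php
<?php
// "\xE4" is the single byte for "ä" in ISO-8859-1
$xml = '<?xml version="1.0" encoding="ISO-8859-1"?>' . "<doc>\xE4</doc>";
$doc = new SimpleXMLElement($xml);

// The value comes back as UTF-8, where "ä" is the two bytes C3 A4
$value = (string) $doc;
echo bin2hex($value), "\n";
```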

And one of its magic properties is that it’s one of those classes in PHP that can be cast from one class to another. This works by converting one (subclass of) SimpleXMLElement to another subclass of it by sending it through DOM (the aforementioned sister library):

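class Foo extends SimpleXMLElement {}
class Bar extends SimpleXMLElement {}

$foo  = new Foo("<doc/>");
$via  = dom_import_simplexml($foo);          // Foo -> DOMElement
$cast = simplexml_import_dom($via, 'Bar');   // DOMElement -> Bar
var_dump(get_class($cast)); # string(3) "Bar"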

This is actually true not only for SimpleXMLElement but, to a certain degree, also for the node classes in a DOMDocument. But this post is about SimpleXMLElement, so just saying.

I have to say it: with so much simplification and magic, there is a price to pay, and there are limitations, too. The constructor is final, so you can’t override it. No way :). This hinders you in terms of “classic” object inheritance. One way out is to decorate the elements, but even though I did this in the past, it doesn’t feel equally good. It might also be more work than first thought. Most often, SimpleXMLElement is extended just to sugar in some methods, so a full-featured decoration is often not worth it. So this is a limitation. ERR_TOO_MUCH_MAGIC comes to mind.
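For the record, a decoration could look roughly like the following. This is only a bare-bones sketch to show the idea: the class name ElementDecorator and its methods are made up for this post, and a real decorator would also have to cover iteration, array access, string casting and so on, which is exactly the extra work mentioned above.

```php
<?php
// Hypothetical decorator: wraps an element instead of extending it
class ElementDecorator
{
    private $element;
    private $data; // a normal PHP property, free to hold arrays or objects

    public function __construct(SimpleXMLElement $element)
    {
        $this->element = $element;
    }

    // Forward child access, wrapping the child again
    public function __get($name)
    {
        return new self($this->element->$name);
    }

    // Forward method calls (asXML(), getName(), ...) to the wrapped element
    public function __call($method, array $args)
    {
        return $this->element->$method(...$args);
    }

    public function setData($data) { $this->data = $data; }
    public function getData()      { return $this->data; }
}

$doc = new ElementDecorator(new SimpleXMLElement('<doc><bar>baz</bar></doc>'));
$doc->setData(['any' => 'array']);
echo $doc->bar->getName(), "\n";
```

Note that the data lives on the decorator object here, not on the node: every access to $doc->bar creates a fresh decorator that knows nothing about data set on an earlier one.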

And some argue that, as a data structure, you can’t use it as an array or object store, since all class properties and array indexes represent either XML element or attribute nodes, which only accept scalar (actually stringy) values.

Storing Arrays and Objects in a SimpleXMLElement

Let me elaborate on that last point a little. It’s normally not possible to store array or object data inside a SimpleXMLElement. As it couldn’t be serialized as XML, it’s forbidden by default:

class Foo extends SimpleXMLElement
{
}

$foo  = new Foo("<doc/>");
# Warning: It is not yet possible to assign complex types to properties
$foo->bar = $foo;

If you now think that creating a private field and assigning the data to it would be a solution, that will teach you about another limitation: there are no private fields with a SimpleXMLElement. Its fields are all exposed XML nodes, so all you can store there are strings.

But wouldn’t it be nice to actually be able to store some objects in there? Let’s elaborate a bit on the internals, which is how I discovered some nice properties of the document model in PHP and its use from within SimpleXML.

The SimpleXMLElement is really only a shell around some other object. It can perhaps be described as a Flyweight (as in the pattern): an interface, factory and object manager for the underlying document nodes. And the document in turn can be represented as a DOMNode, which again is a shell/interface around the underlying document node managed by libxml. This is the underlying structure of not only SimpleXMLElements but also the tree structure of the DOM. The PHP SimpleXML/DOM extensions manage all these document nodes nicely for us.
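That both shells sit on top of the same node is easy to see in a small sketch: a change made through the DOM side is immediately visible on the SimpleXML side. The attribute name id and the values are just examples:

```php
<?php
$sxml = new SimpleXMLElement('<doc><bar/></doc>');

// Get the DOM shell for the very same <bar> node
$dom = dom_import_simplexml($sxml->bar);

// Change the node through DOM ...
$dom->setAttribute('id', '42');
$dom->nodeValue = 'hello';

// ... and read the change back through SimpleXML
var_dump((string) $sxml->bar['id']);
var_dump((string) $sxml->bar);
```

No copying or re-parsing happens in between; both objects are views on one document.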

Now, if it is possible to turn a SimpleXMLElement into a DOMNode, it is then also possible to assign data to a document node without creating a new element, as would be the case on the level of SimpleXMLElement. This is thanks to PHP’s object model with its dynamic properties (every object in PHP is actually somewhat of an array/hash):

class Foo extends SimpleXMLElement
{
    function setData($data) {
        $element = dom_import_simplexml($this);
        $element->data = $data;
    }
}

$foo = new Foo("<doc/>");
$foo->addChild('bar')->setData($foo);

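This does actually work, but the data won’t persist yet (presumably because the DOMNode object returned by dom_import_simplexml() is thrown away, together with its dynamic properties, as soon as setData() returns). What is necessary to keep the dynamic property data in memory within the DOM is to add a circular reference to the DOMNode; let’s call it circref in this example. It’s then possible to write and read the data: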

    ...

    function setData($data) {
        $element = dom_import_simplexml($this);
        $element->data    = $data;
        $element->circref = $element;
    }

    function getData() {
        $element = dom_import_simplexml($this);
        return $element->data;
    }

    ...

The usage example demonstrates that it is now possible to store (or attach) an object to the document node accessed via Simplexml:

$foo = new Foo("<doc/>");
$foo->addChild('bar')->setData($foo);

var_dump($foo->bar->getData());

# class Foo#1 (1) {
#   public $bar =>
#   class Foo#2 (0) {
#   }
# }

For those who hate SimpleXML but still read this far: as already written earlier, the same principle works with pure DOMDocument / DOMNode as well, just in case you want to re-use the node-based data structure and need to add (object) information to it. All you need is the circular reference to keep the association between the data and the node in memory. And it’s really within the same document:

# obtaining the data via DOMDocument
$doc = dom_import_simplexml($foo)->ownerDocument;

$bar = $doc->getElementsByTagName('bar')->item(0);
var_dump($bar->data);

As you can imagine, the same applies to XPath queries, both via DOMXPath and SimpleXMLElement.
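A quick sketch of why that is: whichever of the two XPath routes you take, you end up at the same underlying document node. The sample document is again just an example:

```php
<?php
$sxml = new SimpleXMLElement('<doc><bar/></doc>');
$doc  = dom_import_simplexml($sxml)->ownerDocument;

// Route 1: query via DOMXPath
$xpath  = new DOMXPath($doc);
$viaDom = $xpath->query('//bar')->item(0);

// Route 2: query via SimpleXMLElement::xpath(), then switch to the DOM shell
$hits      = $sxml->xpath('//bar');
$viaSimple = dom_import_simplexml($hits[0]);

// Both routes point at the same node of the same document
$same = $viaDom->isSameNode($viaSimple);
var_dump($same);
```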
