SimpleXML and JSON Encode in PHP – Part III and End

The previous two parts (Part I; Part II) outlined PHP’s standard behaviour when JSON encoding a SimpleXMLElement with json_encode().

As outlined, this does not always fit the encoding needs, and workarounds were shown for some of the potential problems. However, those worked by modifying the XML document instead of the JSON serialization.

By default, the data and structure that json_encode() produces follow exactly the rules of casting a SimpleXMLElement to an array. This is because internally (see lxr json.c) json_encode() performs this cast and then builds the JSON output based on that structure.
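The equivalence is easy to check. A small sketch with a trivial document: encoding the element directly and encoding its (object) (array) cast give the same string, including the known quirk that the empty `<b/>` becomes an empty object.

```php
<?php
$xml = simplexml_load_string('<root><a>1</a><a>2</a><b/></root>');

// Both calls go through the same SimpleXML property table:
// duplicate names are grouped, text-only children become strings.
$direct = json_encode($xml);
$cast   = json_encode((object) (array) $xml);

echo $direct, "\n";          // {"a":["1","2"],"b":{}}
var_dump($direct === $cast); // bool(true)
```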

Luckily, since PHP 5.4 the JsonSerializable interface allows hooking in exactly at that point. Instead of the standard array cast, a more tailored array or object, or even a string or number, can be returned: anything json_encode() would normally accept. This makes it easy to create a custom JSON encoding by extending from SimpleXMLElement and implementing the interface, as I will show now.
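To see where that hook sits, here is a minimal sketch outside of SimpleXML. The class Temperature is hypothetical and only shows that jsonSerialize() may return a plain scalar, which json_encode() then encodes in place of the object. Without the interface, an object with only a private property would encode as {}. The #[\ReturnTypeWillChange] line avoids the PHP 8.1+ deprecation for the untyped method and is read as an ordinary comment by PHP versions before 8.

```php
<?php
// Hypothetical class, only to illustrate the JsonSerializable hook.
class Temperature implements JsonSerializable
{
    private $celsius;

    public function __construct($celsius)
    {
        $this->celsius = $celsius;
    }

    #[\ReturnTypeWillChange]
    public function jsonSerialize()
    {
        // Return a plain number instead of the default object encoding.
        return $this->celsius;
    }
}

echo json_encode(['today' => new Temperature(21)]), "\n"; // {"today":21}
```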

JsonSerializable as a JSON XML encoder

The following boilerplate code shows how to implement such a serialization. In this example, the standard array casting is used:

/**
 * Class JsonXMLElement
 */
class JsonXMLElement extends SimpleXMLElement implements JsonSerializable
{

    /**
     * Specify data which should be serialized to JSON
     *
     * @return mixed data which can be serialized by json_encode.
     */
    public function jsonSerialize()
    {
        return (object) (array) $this;
    }
}

This is really only boilerplate code: with such an implementation the JSON is encoded exactly as before, with exactly the same characteristics. But before changing these characteristics, here is a usage example of such a JSON XML encoder:

$buffer = <<<BUFFER
<root>
    <!-- no comment -->
    <element attribute="variable">
        element 1
    </element>
    <element>
        element 2
    </element>
</root>
BUFFER;

$xml = simplexml_load_string($buffer, 'JsonXMLElement');

echo json_encode($xml, JSON_PRETTY_PRINT), "\n";

Note that in this example the SimpleXMLElement is created with the simplexml_load_string() function and its class-name parameter, so the new sub-type JsonXMLElement is used. When executed, this example shows that the JSON is still the same as without that parameter:

{
    "comment": {},
    "element": [
        "\n        element 1\n    ",
        "\n        element 2\n    "
    ]
}

Changing JSON Encoding Rules

So now it’s time to change this standard implementation, which is done by changing the public function jsonSerialize(). Given the problems highlighted in the first two parts, a useful routine would return an array or object structure that can represent text (including CDATA), attributes and children at once. It should also ignore comments and processing instructions.

For the conflicting case where an element contains text as well as children or attributes, a special property named "@text" is added so that the text is preserved, comparable to what is already done with "@attributes".

Such an implementation then would look like:

    ...

    /**
     * Specify data which should be serialized to JSON
     *
     * @return mixed data which can be serialized by json_encode.
     */
    public function jsonSerialize()
    {
        $array = array();

        // json encode attributes if any.
        if ($attributes = $this->attributes()) {
            $array['@attributes'] = iterator_to_array($attributes);
        }

        // json encode child elements if any. group on duplicate names as an array.
        foreach ($this as $name => $element) {
            if (isset($array[$name])) {
                if (!is_array($array[$name])) {
                    $array[$name] = [$array[$name]];
                }
                $array[$name][] = $element;
            } else {
                $array[$name] = $element;
            }
        }

        // json encode non-whitespace element simplexml text values.
        $text = trim($this);
        if (strlen($text)) {
            if ($array) {
                $array['@text'] = $text;
            } else {
                $array = $text;
            }
        }

        // return empty elements as NULL (self-closing or empty tags);
        // the strict comparison keeps a text value of "0" intact
        if ($array === array()) {
            $array = NULL;
        }

        return $array;
    }

    ...

An awkward test case, on which the standard JSON encoding fails in many places, shows that this implementation makes a difference in the right direction:

<root attribute="variable">
    <!-- no comment -->
    <comment>test<!-- no comment --></comment>
    <!-- no comment -->
    <?php processing instruction ?>
    <element>
        test
        <child />
        <child />
    </element>
    <element><![CDATA[cdata]]> test</element>
    <element>
        <child>text</child>
        test
        <child attribute="variable">text</child>
    </element>
</root>

Encoded with the implementation above, this results in:

{
    "@attributes": {
        "attribute": "variable"
    },
    "comment": "test",
    "element": [
        {
            "child": [
                null,
                null
            ],
            "@text": "test"
        },
        "cdata test",
        {
            "child": [
                "text",
                {
                    "@attributes": {
                        "attribute": "variable"
                    },
                    "@text": "text"
                }
            ],
            "@text": "test"
        }
    ]
}

This now preserves the data much better, as the conflict between text nodes on the one side and child element or attribute nodes on the other is better balanced. It also completely drops the awkward comments and processing instructions that were originally returned as fake nodes. And CDATA no longer needs special treatment.
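For reference, the fragments can be assembled into one runnable file. This is a sketch built from the code above with two small deviations, marked in the comments: attribute values are cast with strval (as the decorator version later in this post does), and only an empty array becomes NULL, so a text value like "0" survives (compare the reader comment below the post). The shortened test document keeps a comment, a processing instruction, CDATA and mixed content.

```php
<?php
class JsonXMLElement extends SimpleXMLElement implements JsonSerializable
{
    #[\ReturnTypeWillChange]
    public function jsonSerialize()
    {
        $array = array();

        // attributes, if any; deviation: values cast to plain strings
        if ($attributes = $this->attributes()) {
            $array['@attributes'] = array_map('strval', iterator_to_array($attributes));
        }

        // child elements only (comments and PIs are not iterated);
        // duplicate names are grouped into a list
        foreach ($this as $name => $element) {
            if (isset($array[$name])) {
                if (!is_array($array[$name])) {
                    $array[$name] = [$array[$name]];
                }
                $array[$name][] = $element;
            } else {
                $array[$name] = $element;
            }
        }

        // non-whitespace text (CDATA included) is the value or "@text"
        $text = trim((string) $this);
        if (strlen($text)) {
            if ($array) {
                $array['@text'] = $text;
            } else {
                $array = $text;
            }
        }

        // deviation: only a truly empty element is encoded as NULL
        return $array === array() ? NULL : $array;
    }
}

$buffer = <<<'XML'
<root attribute="variable">
    <!-- no comment -->
    <?php processing instruction ?>
    <element><![CDATA[cdata]]> test</element>
    <element>
        test
        <child />
    </element>
</root>
XML;

$xml = simplexml_load_string($buffer, 'JsonXMLElement');
echo json_encode($xml), "\n";
```

For this input the sketch should print {"@attributes":{"attribute":"variable"},"element":["cdata test",{"child":null,"@text":"test"}]}, with the comment and the processing instruction gone.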

But this is not yet advanced enough. For security purposes, let’s introduce a maximum depth to encode, plus some settings to decide whether to encode attributes and whether to add the extra text property, purely as examples.

Even though the previous example class is already quite nice, SimpleXMLElement is technically not well suited for extension through inheritance. The options to introduce would require private members, but a class extending SimpleXMLElement can’t make use of its own member properties at all, private or not, because SimpleXMLElement handles all property access itself. The same applies to magic methods like __set(), which can’t be used either.
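A quick way to see the problem is a hypothetical subclass that tries to remember an option in a member property. The class name OptionsXMLElement and the method setDepth() are made up for this sketch. The assignment inside the method does not create a property; SimpleXML's property handling turns it into a new child element:

```php
<?php
// Hypothetical subclass that tries to store an option on $this.
class OptionsXMLElement extends SimpleXMLElement
{
    public function setDepth($depth)
    {
        // Looks like a property write, but SimpleXML intercepts it
        // and appends a <depth> child element to the document instead.
        $this->depth = $depth;
    }
}

$xml = new OptionsXMLElement('<root/>');
$xml->setDepth(2);

// The serialized document now contains a <depth>2</depth> child.
echo $xml->asXML();
```

Reading $xml->depth afterwards returns a SimpleXMLElement for that new element, not the integer that was assigned, and the "option" would even end up in the JSON output.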

Cutting The Gordian Knot

So instead of using inheritance to control the serialization, aggregation is used. In this case it means that the JsonSerializable implementation becomes a decorator of a SimpleXMLElement instead of being a child class.

For such cases I have boilerplate code, a SimpleXMLElementDecorator base class, that I only need to extend to create a more fine-grained decorator. However, to just add the JSON encoding, a full-blown decorator is not really necessary. So I leave such a decorator base class out of this post and instead just define a new type that is a decorator in name only, as an example.

To change from inheritance to aggregation, the new class gets a constructor, and the jsonSerialize() method is moved in and changed to operate on the subject SimpleXMLElement. This is rather straightforward and requires only little modification. The following PHP code example contains all the needed, subtle changes:

/**
 * Class JsonSimpleXMLElementDecorator
 *
 * Implement JsonSerializable for SimpleXMLElement as a Decorator
 */
class JsonSimpleXMLElementDecorator implements JsonSerializable
{
    /**
     * @var SimpleXMLElement
     */
    private $subject;

    public function __construct(SimpleXMLElement $element) {
        $this->subject = $element;
    }

    /**
     * Specify data which should be serialized to JSON
     *
     * @return mixed data which can be serialized by json_encode.
     */
    public function jsonSerialize() {
        $subject = $this->subject;

        $array = array();

        // json encode attributes if any.
        if ($attributes = $subject->attributes()) {
            $array['@attributes'] = array_map('strval', iterator_to_array($attributes));
        }

        // traverse into children if applicable
        $children = $subject;

        // json encode child elements if any. group on duplicate names as an array.
        foreach ($children as $name => $element) {
            $decorator = new self($element);

            if (isset($array[$name])) {
                if (!is_array($array[$name])) {
                    $array[$name] = [$array[$name]];
                }
                $array[$name][] = $decorator;
            } else {
                $array[$name] = $decorator;
            }
        }

        // json encode non-whitespace element simplexml text values.
        $text = trim($subject);
        if (strlen($text)) {
            if ($array) {
                $array['@text'] = $text;
            } else {
                $array = $text;
            }
        }

        // return empty elements as NULL (self-closing or empty tags);
        // the strict comparison keeps a text value of "0" intact
        if ($array === array()) {
            $array = NULL;
        }

        return $array;
    }
}

The usage example needs to be adapted as well, since it now operates on a standard SimpleXMLElement again, which the decorator wraps:

$xml = new SimpleXMLElement($buffer);
$xml = new JsonSimpleXMLElementDecorator($xml);

echo json_encode($xml, JSON_PRETTY_PRINT), "\n";

This encodes the same JSON as above, with the difference that it is now a type of its own that does not fall under the restrictions SimpleXMLElement has due to its magic. The Gordian Knot has been cut: private members can be added, and with them decisions based on new parameters.

One can continue in multiple ways from here on. I will keep it deliberately not so well designed, to demonstrate how settings can be inherited by the child decorators in this recursive encoding situation. The following example is not that well designed because it opens up its inner workings based on configuration values, which means a lot of decisions have to be made inside the method. A better alternative would be to divide the large jsonSerialize() method into smaller methods; subclasses of such a base class could then provide various JSON encoding styles, based on the concrete subtype. Keep in mind that the following is not the best way, just one way to add different behaviour.
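The alternative design mentioned here, splitting jsonSerialize() into smaller methods that subtypes override, could look roughly like the following sketch. All class and method names are hypothetical. The key detail is new static(), which wraps every child in the same concrete subtype as its parent, so a single subclass controls the style for the whole tree.

```php
<?php
// Hypothetical base decorator: plain style, drops attributes and mixed-content text.
class XmlJsonDecorator implements JsonSerializable
{
    protected $subject;

    public function __construct(SimpleXMLElement $element)
    {
        $this->subject = $element;
    }

    #[\ReturnTypeWillChange]
    public function jsonSerialize()
    {
        return $this->encodeText($this->encodeAttributes() + $this->encodeChildren());
    }

    protected function encodeAttributes()
    {
        return [];
    }

    protected function encodeChildren()
    {
        $array = [];
        foreach ($this->subject as $name => $element) {
            // "new static" keeps the concrete subtype for the whole tree
            $array[$name][] = new static($element);
        }
        // a name that occurs only once is not wrapped in a list
        foreach ($array as $name => $list) {
            if (count($list) === 1) {
                $array[$name] = $list[0];
            }
        }
        return $array;
    }

    protected function encodeText(array $array)
    {
        $text = trim((string) $this->subject);
        if ($array === []) {
            return $text === '' ? null : $text;
        }
        return $array;
    }
}

// Hypothetical subtype: adds "@attributes" and keeps mixed-content text as "@text".
class XmlJsonAttributeDecorator extends XmlJsonDecorator
{
    protected function encodeAttributes()
    {
        $attributes = $this->subject->attributes();
        if (!$attributes) {
            return [];
        }
        return ['@attributes' => array_map('strval', iterator_to_array($attributes))];
    }

    protected function encodeText(array $array)
    {
        $text = trim((string) $this->subject);
        if ($text !== '' && $array !== []) {
            $array['@text'] = $text;
        }
        return parent::encodeText($array);
    }
}

$xml = new SimpleXMLElement('<root id="1"><item>a</item><item type="x">b</item><note>hi<b/></note></root>');
echo json_encode(new XmlJsonDecorator($xml)), "\n";
echo json_encode(new XmlJsonAttributeDecorator($xml)), "\n";
```

The style is then chosen by the type alone instead of by option flags; adding another style means adding another subclass that overrides one or two of the hooks.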

JSON Encoding Based on Parameters

As mentioned, the design is not that good: instead of extending some base class, this is a full implementation that adds options/parameters. It’s more straightforward but requires rewriting a lot. First, the options need to be defined and added to the constructor, otherwise there is no way to pass them in. Additionally, some setters are added so the options can be changed later:

/**
 * Class JsonSimpleXMLElementDecorator
 *
 * Implement JsonSerializable for SimpleXMLElement as a Decorator
 */
class JsonSimpleXMLElementDecorator implements JsonSerializable
{
    const DEF_DEPTH = 512;

    private $options = ['@attributes' => TRUE, '@text' => TRUE, 'depth' => self::DEF_DEPTH];

    /**
     * @var SimpleXMLElement
     */
    private $subject;

    public function __construct(SimpleXMLElement $element, $useAttributes = TRUE, $useText = TRUE, $depth = self::DEF_DEPTH) {

        $this->subject = $element;

        if (!is_null($useAttributes)) {
            $this->useAttributes($useAttributes);
        }
        if (!is_null($useText)) {
            $this->useText($useText);
        }
        if (!is_null($depth)) {
            $this->setDepth($depth);
        }
    }

    public function useAttributes($bool) {
        $this->options['@attributes'] = (bool)$bool;
    }

    public function useText($bool) {
        $this->options['@text'] = (bool)$bool;
    }

    public function setDepth($depth) {
        $this->options['depth'] = (int)max(0, $depth);
    }

This is rather straightforward. For convenience, I store the settings internally as an array. This is useful later on to pass the settings along to new objects of the same type, which is necessary in the jsonSerialize() method. With each new level, the depth needs to be lowered by one, and when it reaches zero, all children of the element are dropped.
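Passing the settings along can rely on PHP's array union operator, which is what the jsonSerialize() method of this class uses: for duplicate keys the left-hand array wins, so a child overrides only 'depth' and inherits everything else. A sketch with plain arrays, assuming the option keys of this class:

```php
<?php
// The parent's settings, using the same keys as the decorator class.
$parentOptions = ['@attributes' => TRUE, '@text' => TRUE, 'depth' => 2];

// Union: keys from the left array win, missing keys come from the right.
$childOptions = ['depth' => $parentOptions['depth'] - 1] + $parentOptions;

var_dump($childOptions['depth']); // int(1)
var_dump($childOptions['@text']); // bool(true)
```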

Besides handling the depth, decisions on whether to add "@attributes" or "@text" need to be made. Here is the code of the modified jsonSerialize() method, which also completes the class definition:

    /**
     * Specify data which should be serialized to JSON
     *
     * @return mixed data which can be serialized by json_encode.
     */
    public function jsonSerialize() {
        $subject = $this->subject;

        $array = array();

        // json encode attributes if any.
        if ($this->options['@attributes']) {
            if ($attributes = $subject->attributes()) {
                $array['@attributes'] = array_map('strval', iterator_to_array($attributes));
            }
        }

        // traverse into children if applicable
        $children      = $subject;
        $this->options = (array)$this->options;
        $depth         = $this->options['depth'] - 1;
        if ($depth <= 0) {
            $children = [];
        }

        // json encode child elements if any. group on duplicate names as an array.
        foreach ($children as $name => $element) {
            /* @var SimpleXMLElement $element */
            $decorator          = new self($element);
            $decorator->options = ['depth' => $depth] + $this->options;

            if (isset($array[$name])) {
                if (!is_array($array[$name])) {
                    $array[$name] = [$array[$name]];
                }
                $array[$name][] = $decorator;
            } else {
                $array[$name] = $decorator;
            }
        }

        // json encode non-whitespace element simplexml text values.
        $text = trim($subject);
        if (strlen($text)) {
            if ($array) {
                $this->options['@text'] && $array['@text'] = $text;
            } else {
                $array = $text;
            }
        }

        // return empty elements as NULL (self-closing or empty tags);
        // the strict comparison keeps a text value of "0" intact
        if ($array === array()) {
            $array = NULL;
        }

        return $array;
    }
}

So now the JsonSimpleXMLElementDecorator has options to control whether to include @attributes and @text members, as well as the depth, independent of the depth parameter json_encode() has since PHP 5.5. Let’s see a new usage example and the resulting JSON. The XML is still the awkward one from above:

$xml = new SimpleXMLElement($buffer);
$xml = new JsonSimpleXMLElementDecorator($xml, FALSE, FALSE, 2);

echo json_encode($xml, JSON_PRETTY_PRINT), "\n";

In order, the parameters mean: do not use @attributes, do not use @text, and set the depth to 2. Accordingly, the result contains no @text or @attributes members and is limited in depth:

{
    "comment": "test",
    "element": [
        "test",
        "cdata test",
        "test"
    ]
}
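For contrast, json_encode()'s own depth parameter (available since PHP 5.5) does not truncate anything: if the data nests deeper than allowed, the whole encoding fails. A small sketch with a plain nested array:

```php
<?php
$data = ['a' => ['b' => ['c' => 1]]]; // three levels of nesting

$tooShallow = json_encode($data, 0, 2);
$error      = json_last_error(); // read before the next call resets it

$enough = json_encode($data, 0, 3);

var_dump($tooShallow);                 // bool(false)
var_dump($error === JSON_ERROR_DEPTH); // bool(true)
echo $enough, "\n";                    // {"a":{"b":{"c":1}}}
```

The decorator's depth option instead cuts off the children at the configured level and still returns valid JSON.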

Closing Notes

And that’s it for now. This closes the third part and therefore my little three-part series about JSON encoding a SimpleXMLElement. The final concepts do not shy away from comparison with well-established libraries like Zend_JSON, for example, which also uses SimpleXMLElement under the hood for XML to JSON conversion.

This json_encode() scenario not only shows how to make use of the JsonSerializable interface, it also covers the processing necessary to turn a SimpleXMLElement into an array without too much code. The last class given is already quite extensive; in real-life scenarios such an encoder can often be much smaller (compare with "PHP convert XML to JSON group when there is one child" on Stackoverflow for an example).

Hope this was helpful to read and have fun!

This entry was posted in Developing, PHP Development, PHP Development, Pressed, Tools.

4 Responses to SimpleXML and JSON Encode in PHP – Part III and End

  1. Pingback: The SimpleXMLElement Magic Wonder World in PHP | hakre on wordpress

  2. Brian Matovu says:

    Hello,

    Am glad you wrote these articles 5 years ago, I have been stuck with these unique scenarios and now I can have some breathing space.

    I wish to incorporate it into my Laravel package [https://github.com/mtvbrianking/laravel-xml] so that it can help the entire opensource community.

    • hakre says:

      You’re welcome. It’s not that SimpleXML is always the best tool for XML processing but hopefully this series offers a near complete picture of the mechanics for JSON encoding a SimpleXMLElement. Be aware that I did not cover XML Namespaces (XMLNS) extensively in this series, just saying.

  3. Sindh PK says:

    Below if condition:
    // return empty elements as NULL (self-closing or empty tags)
    if (!$array) {
    $array = NULL;
    }

    Should be:

    if (empty($array) && !is_numeric($array) && !is_bool($array)) {
    $array = NULL;
    }

    So, it’ll not convert 0 to {"fieldvalue":null}
