DOMDocument schema and DTD validation in PHP can make use of libxml2’s Catalog support feature.
A catalog is basically an XML file that tells the parser where to find DTD and XSD files on the local disk. It can map a “logical” name such as -//w3c//dtd html 4.01 transitional//en (from the common <!doctype html public "-//w3c//dtd html 4.01 transitional//en"> doctype) to a concrete file on disk, or map a remote URI such as http://www.w3.org/2002/08/xhtml/xhtml1-transitional.xsd to a local copy of that file.
The second case is extremely useful because the World Wide Web Consortium (W3C) adds an arbitrary delay of 30 seconds when serving these files: most libraries (including PHP’s DOMDocument extension) do not cache the remote files, which results in millions of hits on its servers each day.
Because of that delay, and because you should always validate against local resources for performance reasons, it is practically not feasible to validate against XSD files without such a catalog.
Setting up the Catalog
The catalog for libxml (the library behind PHP’s DOMDocument object) is specified via an environment variable called XML_CATALOG_FILES. It must be set in the environment in which the PHP script is executed. Setting it from within the PHP script, for example with putenv('XML_CATALOG_FILES=...'), does not work.
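One way to make sure the variable is part of the environment a script runs in is to start that script as a child process with the variable already set. The following is only a sketch under assumptions: runWithCatalog() is a hypothetical helper name, the array form of the proc_open() command needs PHP 7.4 or later, and the script path and catalog location in the usage comment are placeholders.

```php
<?php
/**
 * Hypothetical helper: run a PHP script in a child process whose
 * environment contains XML_CATALOG_FILES.
 *
 * Returns [exit code, captured standard output].
 */
function runWithCatalog(string $script, string $catalogFiles): array
{
    // Start from a copy of the parent environment, then add the variable.
    $env = getenv();
    $env['XML_CATALOG_FILES'] = $catalogFiles;

    // Capture only stdout; stderr is inherited from the parent process.
    $descriptors = [1 => ['pipe', 'w']];

    $process = proc_open([PHP_BINARY, $script], $descriptors, $pipes, null, $env);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start the child PHP process');
    }

    $stdout = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $exitCode = proc_close($process);

    return [$exitCode, $stdout];
}

// Usage (placeholder paths):
// [$code, $output] = runWithCatalog('validate.php', '/path/to/schema/catalog.xml');
```

The child process sees the variable from its very first instruction, which is the condition described above. Setting the variable in the shell or in the web server configuration achieves the same thing without a wrapper.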
The variable can also point to multiple catalog files; the filenames are separated by spaces. If a filename itself contains a space, the workaround is to encode it as a file URI:
C:\Documents and Settings\hakre\PhpstormProjects\schema-validation/schema/catalog.xml
file://C:/Documents%20and%20Settings/hakre/PhpstormProjects/schema-validation/schema/catalog.xml
In this example, setting XML_CATALOG_FILES to file://C:/Documents%20and%20Settings/hakre/PhpstormProjects/schema-validation/schema/catalog.xml successfully loads catalog.xml.
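The encoding step can be sketched as a small helper. The function name catalogPathToFileUri() is hypothetical, and the output deliberately mirrors the file://C:/ form used in the example above rather than claiming to cover every URI edge case.

```php
<?php
/**
 * Hypothetical helper: turn a filesystem path into a file URI
 * in which spaces and other special characters are percent-encoded.
 */
function catalogPathToFileUri(string $path): string
{
    // Normalise Windows separators so both path styles are handled alike.
    $path = str_replace('\\', '/', $path);

    $segments = explode('/', $path);
    foreach ($segments as $i => $segment) {
        // Keep a leading drive letter such as "C:" unencoded.
        if ($i === 0 && preg_match('/^[A-Za-z]:$/', $segment)) {
            continue;
        }
        // Percent-encode each segment on its own so the slashes survive.
        $segments[$i] = rawurlencode($segment);
    }

    return 'file://' . implode('/', $segments);
}

echo catalogPathToFileUri(
    'C:\Documents and Settings\hakre\PhpstormProjects\schema-validation/schema/catalog.xml'
), "\n";
// file://C:/Documents%20and%20Settings/hakre/PhpstormProjects/schema-validation/schema/catalog.xml
```

The resulting string contains no spaces, so it can be listed in XML_CATALOG_FILES next to other catalog entries.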
Defining the Catalog
The catalog itself is an XML file (Wikipedia: XML Catalog). This little example makes use of two schemas, XHTML 1.0 and XML. Both XSD files are stored in the same directory as the catalog.xml file:
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <system systemId="http://www.w3.org/2002/08/xhtml/xhtml1-transitional.xsd"
            uri="xhtml1-transitional.xsd"/>
    <system systemId="http://www.w3.org/2001/xml.xsd"
            uri="xml.xsd"/>
</catalog>
This example shows that you can use relative references to the XSD files. With the environment variable set and the catalog file in place, validation is now straightforward:
<?php
/**
 * Validate with a catalog
 */
$doc = new DOMDocument();
$doc->load('test-data.xml');
$isValid = $doc->schemaValidate('test-schema.xsd');
var_dump($isValid);
And that’s basically it. PHP also offers a workaround that uses a callback function to resolve public and system identifiers. However, once the catalog.xml file is set up, I found it much more convenient than the callback function.
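For comparison, here is a minimal sketch of the callback approach, assuming it refers to libxml_set_external_entity_loader() (available since PHP 5.4). The function resolveLocalSchema(), the $map array and the assumption that the XSD files sit next to the script are illustrative only.

```php
<?php
/**
 * Hypothetical resolver: maps a system identifier to a local file.
 * Returns a path to load, or null to refuse loading the resource.
 */
function resolveLocalSchema(?string $publicId, ?string $systemId, array $map): ?string
{
    if ($systemId !== null && isset($map[$systemId])) {
        return $map[$systemId];   // known remote schema: use the local copy
    }
    if ($systemId !== null && !preg_match('~^(https?|ftp)://~i', $systemId)) {
        return $systemId;         // local paths are passed through unchanged
    }
    return null;                  // unknown remote resource: do not fetch it
}

// Assumed layout: the XSD files are stored next to this script.
$map = [
    'http://www.w3.org/2002/08/xhtml/xhtml1-transitional.xsd' => __DIR__ . '/xhtml1-transitional.xsd',
    'http://www.w3.org/2001/xml.xsd'                          => __DIR__ . '/xml.xsd',
];

// Register the callback; libxml consults it when it loads external resources.
libxml_set_external_entity_loader(
    function ($publicId, $systemId, $context) use ($map) {
        return resolveLocalSchema($publicId, $systemId, $map);
    }
);
```

With the loader registered, a later schemaValidate() call should resolve the two mapped schemas from disk. Unlike the catalog, this mapping lives in PHP code and has to be registered in every script that validates.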