hckr.fyi // thoughts

Recursion in XML Schema

by Michael Szul on

XML Schema is a powerful validation tool for XML documents, and it is virtually a requirement if you accept third-party XML as incoming data for a web service. Many shy away from the technology because of its verbosity and complexity, but it offers the granularity necessary for fine-tuned validation.

With XML, many different markup structures are possible, including an element nested inside another element of the same name. For example, maybe you have a <container> element that can exist within itself:

<container>
    <container>
        <container />
    </container>
</container>
    

It can be difficult at first to see how to validate this kind of nesting with XML Schema. It requires a global element declaration that references itself (and is also referenced wherever else in the schema the element may appear).

Below is an example:

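<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- Global element: declared directly under <xs:schema>, so it can be referenced -->
    <xs:element name="container">
        <xs:complexType>
            <xs:sequence>
                <!-- Self-reference: a container may hold any number of containers -->
                <xs:element ref="container" minOccurs="0" maxOccurs="unbounded" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>

    <!-- Document root: holds the top-level containers -->
    <xs:element name="containers">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="container" minOccurs="0" maxOccurs="unbounded" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>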
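
As a quick check, a document along these lines should validate against the schema above. The nesting depth and the number of siblings are arbitrary choices made for illustration:

```xml
<!-- Root element declared by the second <xs:element> in the schema -->
<containers>
    <!-- Each level is matched by the self-referencing "container" declaration -->
    <container>
        <container>
            <container />
        </container>
    </container>
    <!-- maxOccurs="unbounded" allows any number of siblings -->
    <container />
</containers>
```

Because every reference uses minOccurs="0", an empty <containers /> is also valid, and the recursion ends wherever a <container> has no children.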
    

One thing you'll notice right away in the schema is that the <container> element is declared as a direct child of <xs:schema>, which makes it a global element. A ref can only point to a global element. The global declaration carries the name attribute, while every other place where the element is allowed to appear uses ref instead.

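The second <xs:element> declaration is the root of our document: <containers>. Within this structure we make our first reference to the global element, and the global element in turn references itself, creating the recursive validation. Because XML Schema lets any global element serve as a document root, a document whose root is <container>, like the first example, also validates against this schema.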
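
If you would rather not expose <container> as a global element, the recursion can go through a named complex type instead of a ref. This is a minimal sketch of that alternative, not the approach used above, and the type name containerType is a made-up name for illustration:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- Named (global) type; "containerType" is a hypothetical name -->
    <xs:complexType name="containerType">
        <xs:sequence>
            <!-- Local element whose type is the enclosing type: this is the recursion -->
            <xs:element name="container" type="containerType" minOccurs="0" maxOccurs="unbounded" />
        </xs:sequence>
    </xs:complexType>

    <!-- The root reuses the same type, so it accepts nested containers -->
    <xs:element name="containers" type="containerType" />
</xs:schema>
```

The trade-off is that <container> is now a local element, so only <containers> can be the root of a valid document.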