SQL and XML—representing trees

I'll use an organizational hierarchy to highlight differences between SQL and XML, and look into conversions between them.

XML files are trees of nodes. XML is good for representing tree-structured data. But rarely is one tree the only structure in a database, so in general, a relational database (e.g. SQL) is better at maintaining database integrity.

We'll look closer at two tree structures: relation-like trees and recursive trees.

Relation-like trees

An XML document follows the SQL relation structure closely, if the root represents the relation, the second level nodes the tuples, and the third level the values in each tuple.

Recursive trees

Recursive trees are trees where the same node type appears at different levels, and there is no limit on how deep the tree may be. Relational DBs at OK for representing that structure, but SQL is not a good tool to ensure the following two integrity constraints:

XML is very good at enforcing these particular integrity constraints.

The pecking-order example

In the tree version of the pecking-order, pecking-order.xml.txt, we can easily see that Þór is a middle manager, and that Røskva is one of the three at the bottom. pecking-order.xml

In the relation-like normalized.xml Óðin is seen as top dog because of his "NULL" boss, getting the rest of the picture is difficult for a human, even with only 6 persons involved. (Rendered as HTML below, because IE6 doesn't understand CSS table styling of XML.) normalized.html

The name "normalized.xml" does not indicate that this is a 'standard' or 'good' XML file. On the contrary: it indicates that the XML structure is tweaked to match the structure of a database relation in 1st normal form.

An insignificant observation: When you focus on a person in the tree version, you see his subordinates (and his name). When you focus on a person in a relation, you see his boss (and—again—his name).

XSLT style sheets

normalize.xsl

normalize.xsl takes XML files conforming to pecking-order.xsd, and transforms them to relation-like XML like normalized.xml. No information is lost in the transformation (left as an exercise for the student...), but XML's power to validate the tree is removed. That's why I didn't bother to define an XML schema for the normalized form—we wouldn't be able to validate the tree anyway. And a human with an XML editor should write to the pecking-order.xsd, normalized documents should be created automatically.

comma.xsl

comma.xsl is essentially the same as normalize.xsl. The structure of the output is the same, but the file is not XML—it is a 'comma-separated' text file, suited to import into e.g. Microsoft Access 97.

2tree.xsl

2tree.xsl is close to being the inverse of normalize.xsl. (In general, XSLTs are not invertible functions.) If there are no circles in the data, the root xsl:param of 2tree.xsl keeps its default "NULL" value (and because the sequence of persons carries no meaning), normalize and 2tree are inverse functions.

Any person that is part of a circle—and its subordinates—are dropped by 2tree. But such data would not be produced by normalize in the first place.

XML integration in (or for) SQL products

SQL Products

Microsoft Access 97

Microsoft Access 2002 (Office XP pro)

Microsoft SQL Server 2000

Oracle 9i rel 2

XML data (complete with validation against XML schema) can be used as a data type in SQL. But if the pecking-order is placed as XML data within SQL, it will still be XML, and can be queried with XPath only, not with SQL. You can find a person's boss with thw XPath parent::node(), but not with the SQL select. I'd really want that data to be both SQL and XML, so both XPath and SQL queries would work. (And both DOM and SQL updates.)

XML data can be constrained by relational constraints. So we can ensure that every person in an XML pecking-order is present in the main person relation. But the Oracle whitepaper does not seem to say that the opposite is true; we probably can't contstrain relational data by XML. We need a way to ensure the integrity of the person relation by requiring that every (relational) person is present in the XML pecking-order. That could be done, if an XML index could be refered as as foreign key from SQL. XPath to specify parts of your document to create indexes for XPath searches. I guess that means that the XML index can't be sed as an SQL key.

The other approach in Oracle, is the XML view. That gives us a choice of querying the same data with either XPath og SQL. I have to find out what happens to the XML view when the relational data is curcular. The XSLT transformation 2tree.xsl simply drops any data that is part of a circle, maybe Oracle does the same? And is the XML view updateable by the open/save commands of an XML editor?

On one hand, the existence of an XML view should imply integrity constraints on the relational data: no circular data, and one root node only.

On the other hand, having view definitions change integrity constraints is rather ugly—adding or deleting views should not change the database structure.

When there is an XML view on relational data ,why isn't there a relational view on XML data?

Third Party Products

XML Integrator from IBM

Profium Select