XML—still waiting for Godot?

A whitepaper describing Styrheim's position on XML as of spring 2002
Written by Jan Egil Kristiansen
styrheim@post.olivant.fo

XML is a mature standard. But…

The XML standard, as a file or document (meta-)format, is now mature. Yet the most interesting things we could do with XML remain hard or impossible: client-side choice of which XSL style to apply to the XML data, or of which XML data to give to the style sheet; metadata for parts of documents; and integrating XHTML in other XML formats and XML in XHTML.

Lack of XML data

So far, very little data is published in XML form. The richest source of XML data may well be WAP pages, which are also XML, almost by accident. But some XML pages are available, e.g. Macbeth and other Shakespeare plays. Plays are obvious candidates for XSL styling, as different readers will have different needs. An actor might want his own lines printed in bold, a stage manager will put emphasis on the stage directions, while someone reading the play as literature will want a more neutral presentation.
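
As a sketch of the actor's view, the XSLT 1.0 style sheet below prints one role's lines in bold. It assumes the element names SPEECH, SPEAKER, LINE and STAGEDIR used in the commonly circulated XML editions of the plays; the parameter name role and its default value are illustrative choices, not part of any published style sheet.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Hypothetical parameter: the role whose lines are printed in bold. -->
  <xsl:param name="role" select="'MACBETH'"/>

  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- One paragraph per speech. -->
  <xsl:template match="SPEECH">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

  <xsl:template match="SPEAKER">
    <xsl:value-of select="."/>
    <xsl:text>: </xsl:text>
    <br/>
  </xsl:template>

  <!-- A line is bold only when its speech belongs to the chosen role.
       XSLT 1.0 does not allow $role inside a match pattern, so the
       test is done inside the template instead. -->
  <xsl:template match="LINE">
    <xsl:choose>
      <xsl:when test="../SPEAKER = $role">
        <b><xsl:apply-templates/></b>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates/>
      </xsl:otherwise>
    </xsl:choose>
    <br/>
  </xsl:template>

  <!-- Stage directions stay visible but unobtrusive. -->
  <xsl:template match="STAGEDIR">
    <i>[<xsl:value-of select="."/>]</i>
  </xsl:template>
</xsl:stylesheet>
```

Elements without a template of their own (titles, cast list) fall through to XSLT's built-in rules, which simply copy their text. A stage manager's style sheet could keep the same structure and move the emphasis to the STAGEDIR template.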

XML and the net before the dotcom bubble

My crude analysis:

XML has no place in a business selling banner ads. It is simply too easy to strip them off. But for anyone having real information that they wish to share with the world, XML will be a good tool. Obvious examples are public service sites like whitehouse.gov and http://landsbank.fo. But commercial companies can also use genuine information in their marketing, like ferry and bus schedules. Another example should be television schedules: if your programming is worth watching, your Nielsen rating will increase when people know it's worth watching. With schedules published in XML, it is feasible for third-party sites to combine schedules from any TV channel. (With today's heterogeneous HTML sites, I'd spend an hour figuring out what's on my cable tonight, and most of that information would be for another time zone. Jubii.dk once tried to make a general TV schedule, but had to abandon the project.)

Client-side XML transformation

In Netscape 6 and IE6, we can transform an XML page in the browser for display as HTML. That is done by specifying an xml-stylesheet processing instruction in the XML document. This solution is very rigid: the author of the XML page controls how it will be displayed, so he might just as well publish his page as HTML. The only situation where I'd prefer XML with a fixed style over plain HTML is when it reduces bandwidth consumption: when the content changes, only the data has to be fetched again, while the style sheet can stay cached.
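
A minimal sketch of such a document follows. The file name pricelist.xsl and the element names are invented for illustration. The type value text/xsl is the one commonly used with IE; treat it as an assumption for other browsers.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The processing instruction sits in the prolog, before the root
     element, and names the style sheet the browser should apply. -->
<?xml-stylesheet type="text/xsl" href="pricelist.xsl"?>
<pricelist>
  <product>
    <name>Some aquavit</name>
    <price>300</price>
    <volume>70</volume>
    <alcohol>40</alcohol>
  </product>
</pricelist>
```

The reference points from the data to the style, which is exactly why the choice of presentation stays with the author of the XML page.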

It would be much more interesting to specify the XML page from the style sheet. Then I could roll my own presentation of Vinmonopolet's liquor prices. For instance, I could express the price per cl of pure alcohol: they do publish the data needed to compute it, but they would never want to present it that way.
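
The per-centilitre calculation could be sketched as an XSLT 1.0 style sheet like the one below. The input format (pricelist, product, name, price, volume, alcohol) is hypothetical and is not Vinmonopolet's actual markup. For a 70 cl bottle at 40 percent costing 300, the pure alcohol is 70 × 0.40 = 28 cl, so the figure shown would be 300 / 28, or 10.71.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Assumed input (hypothetical format):
       <pricelist>
         <product>
           <name>Some aquavit</name>
           <price>300</price>      price per bottle
           <volume>70</volume>     bottle size in cl
           <alcohol>40</alcohol>   percent alcohol by volume
         </product>
       </pricelist> -->
  <xsl:template match="/pricelist">
    <html>
      <body>
        <table>
          <tr>
            <th>Product</th>
            <th>Price per cl pure alcohol</th>
          </tr>
          <!-- Cheapest alcohol first; alcohol-free products are
               skipped to avoid dividing by zero. -->
          <xsl:for-each select="product[alcohol &gt; 0]">
            <xsl:sort select="price div (volume * alcohol div 100)"
                      data-type="number"/>
            <tr>
              <td><xsl:value-of select="name"/></td>
              <td>
                <xsl:value-of select="format-number(
                    price div (volume * alcohol div 100), '0.00')"/>
              </td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```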

JavaScript code driving IE6's Microsoft.xmldom ActiveX object can load external XML and XSL files, transform the XML with the style sheet, and present the result. At this time, programming the XML DOM on the client is difficult, and the pages run slowly and are very browser-dependent. A phone directory example requires Internet Explorer 6, with security modified to allow cross-domain data. (It didn't work the last time I tested it; ripping data from the pages of others requires a certain amount of maintenance.)

The W3C has long had plans for other mechanisms than the xml-stylesheet processing instruction (used in the liquor example), but so far, all we've got is proprietary DOM scripting.

Server-side XML transformation

Server-side XSL/T transformations are available today, from Macromedia's JRun/Neos, from Caucho.com's Resin, from ASP and indeed from any JSP server.

For the end user, there is no benefit—what is delivered from the server is just old-fashioned HTML.

Once a developer has learned the XSL/T language, she will probably prefer to have ASP-like tools generate raw, data-only XML documents, and let XSL/T add all cosmetic detail. But switching to XSL/T is hard—it's not a normal procedural language like Java or Cobol. The functional language ML is the only language I know that is anything like XSL/T.
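
To illustrate why the switch is hard: XSL/T variables cannot be reassigned, so what a Java programmer would write as a loop with a running total becomes a recursive template. The sketch below assumes a hypothetical input of the form <order><item price="10" qty="2"/><item price="5" qty="3"/></order>; the names order, item, price, qty and total are invented for the example. For that input it outputs 35.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/order">
    <xsl:call-template name="total">
      <xsl:with-param name="items" select="item"/>
    </xsl:call-template>
  </xsl:template>

  <!-- No loop counter and no assignment: the running sum travels
       along as a parameter, and the template calls itself on the
       remaining items until the node-set is empty. -->
  <xsl:template name="total">
    <xsl:param name="items"/>
    <xsl:param name="sum" select="0"/>
    <xsl:choose>
      <xsl:when test="$items">
        <xsl:call-template name="total">
          <xsl:with-param name="items"
              select="$items[position() &gt; 1]"/>
          <xsl:with-param name="sum"
              select="$sum + $items[1]/@price * $items[1]/@qty"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$sum"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

An ML programmer would recognise the shape at once: an accumulator argument and a base case for the empty list.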

Off-line server-side transformation

For many applications, both the XML source and the XSL/T style may be static. In that case, the XSL transformation may run off-line, and the resulting HTML files can be deployed on a plain vanilla web server. Sablotron from www.gingerall.cz is a nice off-line tool. Off-line XSL transformation may serve as a crude content management system for static web sites.

August 2003: Browsing styled XML

IE 6

IE 6 can handle both CSS and XSLT styling of XML, including a CSS style sheet output from XSLT processing.

But it has no way to choose between alternative style sheets; it always picks the last stylesheet processing instruction.

Opera 7.11

Opera 7.11 gives a choice between alternative style sheets, but does not handle XSLT at all. It uses the first stylesheet processing instruction as the default, even if it is XSLT.

Mozilla 1.5b

Mozilla gives the choice between alternative style sheets, but only for CSS: if any XSLT style sheet is given, all the choices disappear.

XSLT styling works, but take care how you define comments in the style sheet. Use

		<style type="text/css">
			<![CDATA[
				<!--
					...
				-->
			]]>
		</style>

rather than

		<style type="text/css">
			<xsl:comment>
				<![CDATA[
					...
				]]>
			</xsl:comment>
		</style>

Mozilla uses the first stylesheet processing instruction as the default.

Current pragmatic use of XSLT client styling

To make IE use XSLT, the XSLT processing instruction must be the last one.

To make Mozilla use XSLT, the XSLT processing instruction must be the first one.

If the first instruction is XSLT, the default view in Opera will display everything as plain inline text.

To conclude: We can get IE to work with one of the other browsers, but not both, because the other browsers are not compatible with each other on XSLT styling. And to keep Opera, we will always have to include a CSS style.
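
Going by the observations above, one arrangement that keeps IE and Opera could look like the sketch below. The file names plain.css and fancy.xsl and the element names are invented for illustration, and the browser behaviour described in the comments is taken from the notes in this section, not from any specification.

```xml
<?xml version="1.0"?>
<!-- First instruction: CSS. Opera 7.11 and Mozilla 1.5b are reported
     to take the first instruction as their default. -->
<?xml-stylesheet type="text/css" href="plain.css"?>
<!-- Last instruction: XSLT. IE 6 is reported to pick the last one. -->
<?xml-stylesheet type="text/xsl" href="fancy.xsl"?>
<pricelist>
  <product>
    <name>Some aquavit</name>
    <price>300</price>
  </product>
</pricelist>
```

With this ordering, Mozilla falls back to the CSS view instead of XSLT. Moving the XSLT instruction to the top would give Mozilla its transformation but leave Opera with the plain inline display.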

Cross-site styling

There is still no easy way to declare that data from a second site is to be styled with a sheet from a third.