Adding catalog support to Saxon

Jirka Kosek



Table of Contents

Motivation
Saxon 6.5.1 and above
Saxon 6.x
Feedback
Future plans

Motivation

If you do not know what SGML/XML catalogs are good for, or whether you really need them, read the excellent article If You Can Name It, You Can Claim It! by Norman Walsh. Norman developed a set of Java classes that implement catalog mapping. The syntax of catalog files is described in OASIS Technical Resolution 9401:1997 (Amendment 2 to TR 9401). This page describes how to extend the popular XSLT processor Saxon to support catalogs.

Note

Please note that Norm has released a new resolver, which has many advanced features and can be integrated more seamlessly with Saxon and other XML and XSLT processors. In this light, the following information may seem obsolete, but it might still be useful for someone.

Currently I know of two slightly different ways to add catalog support to Saxon. One works with all versions of Saxon 6.x; the other works only with Saxon 6.5.1 and above (it may also work with some earlier versions, but I have not tested it). I recommend using the latest version, because versions of Saxon earlier than 6.4.3 had a serious bug in the implementation of the import/include instructions.

Saxon 6.5.1 and above

Installation

Saxon allows you to replace the parser used to read XML files and XSLT stylesheets. Usually you need catalogs only for input XML files. To use Saxon with catalog files, follow these steps:

Procedure 1.

  1. Install Saxon from the address mentioned above. Use the pure Java version, not Instant Saxon.

  2. Download the Java archive saxoncatalog.jar with catalog support and save it somewhere on your drive.

  3. Now you can run Saxon with the following command:

    java -cp /path/to/saxoncatalog.jar;/path/to/saxon.jar
    -Dxml.catalog.files=your_catalog_files com.icl.saxon.StyleSheet
    -x cz.kosek.CatalogXMLReader normal saxon parameters

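    Of course, the command must be entered on a single line, not split across several lines as shown here. The semicolon is the class path separator on Windows; on Unix, use a colon instead.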

If you use Saxon on a regular basis, you will probably want to create a batch file or shell script to invoke it. If the path to your catalogs is already stored in the environment variable SGML_CATALOG_FILES, create something like saxon.bat on Windows:

@java -cp /path/to/saxoncatalog.jar;/path/to/saxon.jar -Dxml.catalog.files=%SGML_CATALOG_FILES% com.icl.saxon.StyleSheet -x cz.kosek.CatalogXMLReader  %1 %2 %3 %4 %5 %6 %7 %8 %9

Or, if you are a happy Unix user:

#!/bin/sh
java -cp /path/to/saxoncatalog.jar:/path/to/saxon.jar -Dxml.catalog.files="$SGML_CATALOG_FILES" com.icl.saxon.StyleSheet -x cz.kosek.CatalogXMLReader "$@"

This version of catalog support uses Saxon's internal parser, Ælfred. It is currently also able to handle all encodings supported by your JVM, so there is no problem using it with documents in national encodings.

Implementation

Saxon is able to use a user-supplied parser to read XML files. This parser must implement the XMLReader interface from SAX2. My catalog solution uses the existing implementation of this interface provided by Ælfred. I created a descendant of the SAXDriver class in which I replaced the entity resolver with the resolver implemented by the Java Catalog Classes.

The whole trick is one simple class, stored in CatalogXMLReader.java.

// Written by Jirka Kosek, jirka@kosek.cz
// NO WARRANTY! This class is in the public domain.

package cz.kosek;

import java.io.*;
import com.icl.saxon.aelfred.SAXDriver;
import com.arbortext.catalog.Catalog;
import com.arbortext.catalog.CatalogEntityResolver;

public class CatalogXMLReader extends SAXDriver
{
  // Shared by all reader instances.
  static Catalog catalog = new Catalog();
  static CatalogEntityResolver resolver = new CatalogEntityResolver();

  public CatalogXMLReader()
  {
    try
    {
      // Load the catalogs named by the xml.catalog.files system property.
      catalog.loadSystemCatalogs();
      resolver.setCatalog(catalog);
    }
    catch (IOException e)
    {
      System.err.println("Error loading catalogs: " + e.getMessage());
    }
    // Make the parser resolve external entities through the catalog.
    this.setEntityResolver(resolver);
  }
}

Because of some problems in the internals of the Ælfred parser, I had to modify Norm's catalog classes slightly so that they do not open streams in the InputSource but only return the new, resolved system identifier. If you are interested in these changes, you can e-mail me.
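
The idea behind that modification can be illustrated with a minimal sketch. This is not the patched code from Norm's classes: the class name SystemIdOnlyResolver and its in-memory map are hypothetical stand-ins for a real catalog lookup. The sketch only shows an entity resolver that returns an InputSource carrying the resolved system identifier, without opening any stream, so that the parser opens the resource itself.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver, for illustration only. A real implementation
// would look identifiers up in the loaded catalog files.
public class SystemIdOnlyResolver implements EntityResolver {

    // Stand-in for the catalog: public identifier -> local URI.
    private final Map<String, String> publicIdToUri = new HashMap<>();

    public void addMapping(String publicId, String uri) {
        publicIdToUri.put(publicId, uri);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String resolved = (publicId == null) ? null : publicIdToUri.get(publicId);
        if (resolved == null) {
            // No match: the parser falls back to the original system identifier.
            return null;
        }
        // Only the system identifier is set. No byte or character stream
        // is opened here; that is left to the parser.
        InputSource source = new InputSource(resolved);
        source.setPublicId(publicId);
        return source;
    }
}
```

An instance of such a resolver could be passed to setEntityResolver in place of the catalog resolver when experimenting with this behaviour.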

Saxon 6.x

Warning

This modification uses the Crimson parser. As I found out, Crimson contains some bugs, and newer versions (1.1.1+) report warnings that should not be reported as warnings. Using it with DocBook files, for example, is therefore very annoying. I recommend using Saxon 6.5.1 and its built-in parser instead.

Installation

Saxon allows you to replace the parser used to read XML files and XSLT stylesheets. Usually you need catalogs only for input XML files. To use Saxon with catalog files, follow these steps:

Procedure 2.

  1. Install Saxon from the address mentioned above. Use the pure Java version, not Instant Saxon.

  2. Download the Java archive kosek.jar with catalog support and save it somewhere on your drive.

  3. Download the Crimson parser. Crimson is part of the JAXP package from Sun and can also be downloaded separately from the Crimson pages. You need the file named crimson.jar. For your convenience, I have placed a copy of this file on my site.

  4. Now you can run Saxon with the following command:

    java -cp /path/to/crimson.jar;/path/to/kosek.jar;/path/to/saxon.jar
    -Dxml.catalog.files=your_catalog_files com.icl.saxon.StyleSheet
    -x cz.kosek.CatalogXMLReader normal saxon parameters

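    Of course, the command must be entered on a single line, not split across several lines as shown here. The semicolon is the class path separator on Windows; on Unix, use a colon instead.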

If you use Saxon on a regular basis, you will probably want to create a batch file or shell script to invoke it. If the path to your catalogs is already stored in the environment variable SGML_CATALOG_FILES, create something like saxon.bat on Windows:

@java -cp /path/to/crimson.jar;/path/to/kosek.jar;/path/to/saxon.jar -Dxml.catalog.files=%SGML_CATALOG_FILES% com.icl.saxon.StyleSheet -x cz.kosek.CatalogXMLReader  %1 %2 %3 %4 %5 %6 %7 %8 %9

Or, if you are a happy Unix user:

#!/bin/sh
java -cp /path/to/crimson.jar:/path/to/kosek.jar:/path/to/saxon.jar -Dxml.catalog.files="$SGML_CATALOG_FILES" com.icl.saxon.StyleSheet -x cz.kosek.CatalogXMLReader "$@"

As a side effect of using the Crimson parser, you can use all encodings supported by your JVM in your XML and XSLT files.

Implementation

Saxon is able to use a user-supplied parser to read XML files. This parser must implement the XMLReader interface from SAX2. My catalog solution uses the existing implementation of this interface provided by Crimson. I created a descendant of the Crimson XMLReader class in which I replaced the entity resolver with the resolver implemented by the Java Catalog Classes.

The whole trick is one simple class, stored in CatalogXMLReader.java. To build it, you need the Crimson parser and the Java Catalog Classes mentioned above.

Feedback

If you find a bug in the program or a way to improve it, please feel free to contact me by e-mail at <jirka@kosek.cz>.

Future plans

I would like to make the class JAXP compatible, so that any JAXP-enabled parser could be used with it. Unfortunately, I have no idea how to implement this at the moment. I also have little motivation to do so, because Norm plans to release a cool and flexible catalog package very soon.
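
For readers who want to experiment, one possible approach (an assumption, not something this package implements) is to wrap whatever parser JAXP provides in a SAX2 XMLFilterImpl and install the entity resolver on the filter. The class name JaxpCatalogReader is hypothetical, and the resolver used by the no-argument constructor is a placeholder that resolves nothing; a catalog-backed resolver would go in its place. Whether Saxon's -x option accepts such a filter has not been verified here.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch: an XMLReader that delegates parsing to the
// JAXP-selected parser and routes entity resolution through a resolver.
public class JaxpCatalogReader extends XMLFilterImpl {

    // No-argument constructor, so the class can be instantiated by name.
    // The placeholder resolver returns null, which means "not resolved".
    public JaxpCatalogReader() throws ParserConfigurationException, SAXException {
        this((publicId, systemId) -> null);
    }

    public JaxpCatalogReader(EntityResolver resolver)
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // Use whichever parser JAXP selects as the underlying reader.
        setParent(factory.newSAXParser().getXMLReader());
        // XMLFilterImpl forwards resolveEntity calls to this resolver.
        setEntityResolver(resolver);
    }
}
```

The filter registers itself with the parent parser when parse is called, so the resolver is consulted for external entities without any parser-specific subclassing.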

© Jiří Kosek 2001-2002