Chapter 10.  Create low-level SOAP web services.

10.1.  Describe the functions and capabilities of the APIs included within JAXP.

The Java API for XML Processing (JAXP) is for processing XML data using applications written in the Java programming language. JAXP leverages the parser standards Simple API for XML Parsing (SAX) and Document Object Model (DOM) so that you can choose to parse your data as a stream of events or to build an object representation of it. JAXP also supports the Extensible Stylesheet Language Transformations (XSLT) standard, giving you control over the presentation of the data and enabling you to convert the data to other XML documents or to other formats, such as HTML. JAXP also provides namespace support, allowing you to work with DTDs that might otherwise have naming conflicts. Finally, as of version 1.4, JAXP implements the Streaming API for XML (StAX) standard.

Designed to be flexible, JAXP allows you to use any XML-compliant parser from within your application. It does this with what is called a pluggability layer, which lets you plug in an implementation of the SAX or DOM API. The pluggability layer also allows you to plug in an XSL processor, letting you control how your XML data is displayed.

The main JAXP APIs are defined in the javax.xml.parsers package. That package contains vendor-neutral factory classes, SAXParserFactory, DocumentBuilderFactory, and TransformerFactory, which give you a SAXParser, a DocumentBuilder, and an XSLT transformer, respectively. DocumentBuilder, in turn, creates a DOM-compliant Document object.

Simple API for XML APIs

To start the process, an instance of the SAXParserFactory class is used to generate an instance of the SAXParser.

Figure 10.1. SAX APIs

SAX APIs


The parser wraps a SAXReader object. When the parser's parse() method is invoked, the reader invokes one of several callback methods implemented in the application. Those methods are defined by the interfaces ContentHandler, ErrorHandler, DTDHandler, and EntityResolver.

Here is a summary of the key SAX APIs:

  • SAXParserFactory

    A SAXParserFactory object creates an instance of the parser determined by the system property, javax.xml.parsers.SAXParserFactory.

  • SAXParser

    The SAXParser interface defines several kinds of parse() methods. In general, you pass an XML data source and a DefaultHandler object to the parser, which processes the XML and invokes the appropriate methods in the handler object.

  • SAXReader

    The SAXParser wraps a SAXReader. Typically, you do not care about that, but every once in a while you need to get hold of it using SAXParser's getXMLReader() so that you can configure it. It is the SAXReader that carries on the conversation with the SAX event handlers you define.

  • DefaultHandler

    Not shown in the diagram, a DefaultHandler implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces (with null methods), so you can override only the ones you are interested in.

  • ContentHandler

    Methods such as startDocument, endDocument, startElement, and endElement are invoked when an XML tag is recognized. This interface also defines the methods characters() and processingInstruction(), which are invoked when the parser encounters the text in an XML element or an inline processing instruction, respectively.

  • ErrorHandler

    Methods error(), fatalError(), and warning() are invoked in response to various parsing errors. The default error handler throws an exception for fatal errors and ignores other errors (including validation errors). This is one reason you need to know something about the SAX parser, even if you are using the DOM. Sometimes, the application may be able to recover from a validation error. Other times, it may need to generate an exception. To ensure the correct handling, you will need to supply your own error handler to the parser.

  • DTDHandler

    Defines methods you will generally never be called upon to use. Used when processing a DTD to recognize and act on declarations for an unparsed entity.

  • EntityResolver

    The resolveEntity method is invoked when the parser must identify data identified by a URI. In most cases, a URI is simply a URL, which specifies the location of a document, but in some cases the document may be identified by a URN - a public identifier, or name, that is unique in the web space. The public identifier may be specified in addition to the URL. The EntityResolver can then use the public identifier instead of the URL to find the document - for example, to access a local copy of the document if one exists.

When to Use SAX

SAX is fast and efficient, but its event model makes it most useful for state-independent filtering. For example, a SAX parser calls one method in your application when an element tag is encountered and calls a different method when text is found. If the processing you are doing is state-independent (meaning that it does not depend on the elements that have come before), then SAX works fine.

On the other hand, for state-dependent processing, where the program needs to do one thing with the data under element A but something different with the data under element B, then a pull parser such as the Streaming API for XML (StAX) would be a better choice. With a pull parser, you get the next node, whatever it happens to be, at any point in the code that you ask for it. So it is easy to vary the way you process text (for example), because you can process it multiple places in the program.

SAX requires much less memory than DOM, because SAX does not construct an internal representation (tree structure) of the XML data, as a DOM does. Instead, SAX simply sends data to the application as it is read; your application can then do whatever it wants to do with the data it sees.

Pull parsers and the SAX API both act like a serial I/O stream. You see the data as it streams in, but you cannot go back to an earlier position or leap ahead to a different position. In general, such parsers work well when you simply want to read data and have the application act on it.

But when you need to modify an XML structure - especially when you need to modify it interactively - an in-memory structure makes more sense. DOM is one such model. However, although DOM provides many powerful capabilities for large-scale documents (like books and articles), it also requires a lot of complex coding.

For simpler applications, that complexity may well be unnecessary. For faster development and simpler applications, one of the object-oriented XML-programming standards, such as JDOM (http://www.jdom.org) and DOM4J (http://dom4j.org/), might make more sense.

Parsing an XML File Using SAX

The following code sets up the parser and gets it started:

static public void main(String[] args) throws Exception {
	SAXParserFactory spf = SAXParserFactory.newInstance();
	spf.setNamespaceAware(true);
	SAXParser saxParser = spf.newSAXParser();
	XMLReader xmlReader = saxParser.getXMLReader();
    xmlReader.setContentHandler(new MyHandler());
    xmlReader.parse(fileURL);
}
					

These lines of code create a SAXParserFactory instance, as determined by the setting of the javax.xml.parsers.SAXParserFactory system property. The factory to be created is set up to support XML namespaces by setting setNamespaceAware to true, and then a SAXParser instance is obtained from the factory by invoking its newSAXParser() method. Then you obtain an XMLReader instance for your parser by invoking your SAXParser instance's getXMLReader() method. The XMLReader then registers the MyHandler class as its content handler, so that the actions performed by the parser will be those of the startDocument(), startElement(), and endDocument(), etc methods. Finally, the XMLReader tells the parser which document to parse by passing it the location of the XML file in question, in the form of the file URL.

The SAXParserFactory can be set up such that it uses a validating parser instead of the default non-validating parser:

SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
spf.setValidating(true);
SAXParser saxParser = spf.newSAXParser();
					

Document Object Model APIs

You use the javax.xml.parsers.DocumentBuilderFactory class to get a DocumentBuilder instance, and you use that instance to produce a Document object that conforms to the DOM specification. The builder you get, in fact, is determined by the system property javax.xml.parsers.DocumentBuilderFactory, which selects the factory implementation that is used to produce the builder. (The platform's default value can be overridden from the command line.)

Figure 10.2. DOM APIs

DOM APIs


You can also use the DocumentBuilder.newDocument() method to create an empty Document that implements the org.w3c.dom.Document interface. Alternatively, you can use one of the builder's parse methods to create a Document from existing XML data.

When to Use DOM

The Document Object Model standard is, above all, designed for documents (for example, articles and books). In addition, the JAXP 1.4.2 implementation supports XML Schema, something that can be an important consideration for any given application.

On the other hand, if you are dealing with simple data structures and if XML Schema is not a big part of your plans, then you may find that one of the more object-oriented standards, such as JDOM or dom4j, is better suited for your purpose.

From the start, DOM was intended to be language-neutral. Because it was designed for use with languages such as C and Perl, DOM does not take advantage of Java's object-oriented features. That fact, in addition to the distinction between documents and data, also helps to account for the ways in which processing a DOM differs from processing a JDOM or dom4j structure.

Parsing an XML File Using DOM

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

dbf.setNamespaceAware(true);
dbf.setValidating(true;

DocumentBuilder db = dbf.newDocumentBuilder();

Document doc = db.parse(new File(filename));
					

Extensible Stylesheet Language Transformations APIs

A TransformerFactory object is instantiated and used to create a Transformer. The source object is the input to the transformation process. A source object can be created from a SAX reader, from a DOM, or from an input stream.

Figure 10.3. XSLT APIs

XSLT APIs


Similarly, the result object is the result of the transformation process. That object can be a SAX event handler, a DOM, or an output stream.

When the transformer is created, it can be created from a set of transformation instructions, in which case the specified transformations are carried out. If it is created without any specific instructions, then the transformer object simply copies the source to the result.

Streaming API for XML APIs

StAX is the latest API in the JAXP family, and provides an alternative to SAX, DOM, TrAX, and DOM for developers looking to do high-performance stream filtering, processing, and modification, particularly with low memory and limited extensibility requirements.

To summarize, StAX provides a standard, bidirectional pull parser interface for streaming XML processing, offering a simpler programming model than SAX and more efficient memory management than DOM. StAX enables developers to parse and modify XML streams as events, and to extend XML information models to allow application-specific additions.

Comparing StAX to Other JAXP APIs

As an API in the JAXP family, StAX can be compared, among other APIs, to SAX, TrAX, and JDOM. Of the latter two, StAX is not as powerful or flexible as TrAX or JDOM, but neither does it require as much memory or processor load to be useful, and StAX can, in many cases, outperform the DOM-based APIs.

With this in mind, the closest comparisons can be made between StAX and SAX, and it is here that StAX offers features that are beneficial in many cases; some of these include:

  • StAX-enabled clients are generally easier to code than SAX clients. While it can be argued that SAX parsers are marginally easier to write, StAX parser code can be smaller and the code necessary for the client to interact with the parser simpler.

  • StAX is a bidirectional API, meaning that it can both read and write XML documents. SAX is read only, so another API is needed if you want to write XML documents.

  • SAX is a push API, whereas StAX is pull.

Table 10.1.  XML Parser API Feature Summary

FeatureStAXSAXDOMTrAX

API Type

Pull, streaming

Push, streaming

In memory tree

XSLT Rule

Ease of Use

High

Medium

High

Medium

XPath Capability

No

No

Yes

Yes

CPU and Memory Efficiency

Good

Good

Varies

Varies

Forward Only

Yes

Yes

No

No

Read XML

Yes

Yes

Yes

Yes

Write XML

Yes

No

Yes

Yes

Create, Read, Update, Delete

No

No

Yes

No


The StAX API exposes methods for iterative, event-based processing of XML documents. XML documents are treated as a filtered series of events, and infoset states can be stored in a procedural fashion. Moreover, unlike SAX, the StAX API is bidirectional, enabling both reading and writing of XML documents.

The StAX API is really two distinct API sets: a cursor API and an iterator API.

Cursor API

As the name implies, the StAX cursor API represents a cursor with which you can walk an XML document from beginning to end. This cursor can point to one thing at a time, and always moves forward, never backward, usually one infoset element at a time.

The two main cursor interfaces are XMLStreamReader and XMLStreamWriter. XMLStreamReader includes accessor methods for all possible information retrievable from the XML Information model, including document encoding, element names, attributes, namespaces, text nodes, start tags, comments, processing instructions, document boundaries, and so forth; for example:

public interface XMLStreamReader {
    public int next() throws XMLStreamException;
    public boolean hasNext() throws XMLStreamException;
    public String getText();
    public String getLocalName();
    public String getNamespaceURI();
    // ... other methods not shown
}
					

You can call methods on XMLStreamReader, such as getText and getName, to get data at the current cursor location. XMLStreamWriter provides methods that correspond to StartElement and EndElement event types; for example:

public interface XMLStreamWriter {
    public void writeStartElement(String localName) throws XMLStreamException;
    public void writeEndElement() throws XMLStreamException;
    public void writeCharacters(String text) throws XMLStreamException;
    // ... other methods not shown
}
					

The cursor API mirrors SAX in many ways. For example, methods are available for directly accessing string and character information, and integer indexes can be used to access attribute and namespace information. As with SAX, the cursor API methods return XML information as strings, which minimizes object allocation requirements.

Iterator API

The StAX iterator API represents an XML document stream as a set of discrete event objects. These events are pulled by the application and provided by the parser in the order in which they are read in the source XML document.

The base iterator interface is called XMLEvent, and there are subinterfaces for each event type. The primary parser interface for reading iterator events is XMLEventReader, and the primary interface for writing iterator events is XMLEventWriter. The XMLEventReader interface contains five methods, the most important of which is nextEvent, which returns the next event in an XML stream. XMLEventReader implements java.util.Iterator, which means that returns from XMLEventReader can be cached or passed into routines that can work with the standard Java Iterator; for example:

public interface XMLEventReader extends Iterator {
    public XMLEvent nextEvent() throws XMLStreamException;
    public boolean hasNext();
    public XMLEvent peek() throws XMLStreamException;
    ...
}
					

Similarly, on the output side of the iterator API, you have:

public interface XMLEventWriter {
    public void flush() throws XMLStreamException;
    public void close() throws XMLStreamException;
    public void add(XMLEvent e) throws XMLStreamException;
    public void add(Attribute attribute) throws XMLStreamException;
    ...
}
					

Comparing Cursor and Iterator APIs

Before choosing between the cursor and iterator APIs, you should note a few things that you can do with the iterator API that you cannot do with the cursor API:

  • Objects created from the XMLEvent subclasses are immutable, and can be used in arrays, lists, and maps, and can be passed through your applications even after the parser has moved on to subsequent events.

  • You can create subtypes of XMLEvent that are either completely new information items or extensions of existing items but with additional methods.

  • You can add and remove events from an XML event stream in much simpler ways than with the cursor API.

Similarly, keep some general recommendations in mind when making your choice:

  • If you are programming for a particularly memory-constrained environment, like Java ME, you can make smaller, more efficient code with the cursor API.

  • If performance is your highest priority — for example, when creating low-level libraries or infrastructure — the cursor API is more efficient.

  • If you want to create XML processing pipelines, use the iterator API.

  • If you want to modify the event stream, use the iterator API.

  • If you want your application to be able to handle pluggable processing of the event stream, use the iterator API.

  • In general, if you do not have a strong preference one way or the other, using the iterator API is recommended because it is more flexible and extensible, thereby "future-proofing" your applications.

Professional hosting         'Oracle Certified Expert Web Services Developer 6' Quiz     Free SCDJWS 5.0 Guide