only after it has parsed the whole document. However, once the document has been created in memory, it can be navigated and changed. A DOM parser is an example of a tree-based parser.
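
The tree-based approach described above can be sketched with the DOM implementation in Python's standard library (xml.dom.minidom). This is only an illustrative sketch, not the API of any parser discussed in this chapter: the sample document, the element and attribute names, and the helper rename_first_item are all hypothetical.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, used only for illustration.
XML = "<order><item sku='A1'>Widget</item><item sku='B2'>Gadget</item></order>"


def rename_first_item(xml_text, new_name):
    """Parse the whole document into a tree, read from it, then modify it."""
    # The complete document is parsed into memory before anything is returned.
    doc = parseString(xml_text)

    # Navigate: nodes can be visited in any order, as often as needed.
    items = doc.getElementsByTagName("item")
    skus = [item.getAttribute("sku") for item in items]

    # Change: replace the text content of the first <item> element.
    items[0].firstChild.data = new_name

    # Serialize the modified tree back to XML text.
    return skus, doc.documentElement.toxml()


skus, updated = rename_first_item(XML, "Sprocket")
print(skus)
print(updated)
```

Because the whole tree is held in memory, the code can read the attributes first and then go back to change an earlier node, at the cost of memory proportional to the document size.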

Event-based parsing

An event-based parser processes the document as it encounters each tag. This gives a data-centric view of the XML. Whenever an element or tag is encountered, it (or its contents) can be processed; however, the parser cannot backtrack once a tag has been passed. The parser returns the element, its attributes, and its contents. An event-based parser never attempts to build a structure of the data, and therefore its memory requirements are lower. It is useful when one is looking in the document only for certain elements. A SAX parser is an example of an event-based parser.
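
The event-based approach can be sketched with the SAX interface in Python's standard library (xml.sax). This is a minimal sketch under stated assumptions, not the API of any parser discussed in this chapter: the sample document, the ItemHandler class, and the collect_items helper are hypothetical names chosen for illustration.

```python
import xml.sax

# Hypothetical sample document, used only for illustration.
XML = (
    b"<order><item sku='A1'>Widget</item>"
    b"<note>rush</note>"
    b"<item sku='B2'>Gadget</item></order>"
)


class ItemHandler(xml.sax.ContentHandler):
    """Collects (sku, text) pairs for <item> elements and ignores the rest."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._sku = None
        self._text = []

    def startElement(self, name, attrs):
        # Called once for each opening tag, in document order.
        if name == "item":
            self._in_item = True
            self._sku = attrs.get("sku")
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_item:
            self._text.append(content)

    def endElement(self, name):
        # Once the closing tag has passed, the parser cannot go back to it.
        if name == "item":
            self.items.append((self._sku, "".join(self._text)))
            self._in_item = False


def collect_items(xml_bytes):
    handler = ItemHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.items


print(collect_items(XML))
```

No tree is built here: the handler keeps only the data it is interested in, which is why this style suits searches for a few specific elements.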

One of the most popular XML parsers on the market is the Apache XML Project’s Xerces. The Xerces parsers provide XML parsing and generation, and are fully validating parsers available for both Java and C++. They implement the W3C XML and DOM (Level 1 and 2) standards, as well as the SAX (Level 2) standard. The parsers also support XML Schema. Xerces has been incorporated into the IBM set of products (WebSphere, Application Studio, and DB2).

Another parser is IBM’s XML Parser for Java (XML4J), together with its C++ counterpart (XML4C). XML4J is a validating XML parser written in 100% pure Java, whereas XML4C is a validating XML parser written in C++. Each provides classes for parsing, generating, manipulating, and validating XML documents. Both parsers support the XML 1.0 Recommendation and associated standards (DOM 1.0, SAX 1.0, DOM 2.0). XML4J contains implementations of DOM Level 2, SAX Level 2, and parts of W3C XML Schema, but these are experimental at this stage. XML4C is supported on most operating systems, including AIX and Linux.

XML4J and Xerces are both open source and share the same code base; XML4J has the latest code enhancements, while Xerces has been through production-level testing.

2.2 DTD and XML Schema

DTDs and XML Schema are both used to describe structured information; however, in the last two years, acceptance of XML Schema has gained momentum. Both DTDs and schemas are building blocks for XML documents and consist of elements, tags, attributes, and entities.

XML Schemas evolved to overcome limitations in DTDs. The W3C has published three documents, the latest update being in May 2001:

Chapter 2. Technologies in XML 19
