2.1 XML Processor (parser)

An XML processor can be either a validating or a non-validating parser. Both kinds of parsers must report well-formedness violations in the XML documents they read. According to the XML 1.0 specification:

http://www.w3.org/TR/REC-xml#proc-types

“Validating processors must, at user option, report violations of the constraints expressed by the declarations in the DTD, and failures to fulfill the validity constraints given in this specification. To accomplish this, validating XML processors must read and process the entire DTD and all external parsed entities referenced in the document.”

“Non-validating processors are required to check only the document entity, including the entire internal DTD subset, for well-formedness. While they are not required to check the document for validity, they are required to process all the declarations they read in the internal DTD subset and in any parameter entity that they read, up to the first reference to a parameter entity that they do not read; that is to say, they must use the information in those declarations to normalize attribute values, include the replacement text of internal entities, and supply default attribute values. Except when standalone="yes", they must not process entity declarations or attribute-list declarations encountered after a reference to a parameter entity that is not read, since the entity may have contained overriding declarations.”

From the definition above, a validating parser must read the entire DTD and check the XML document against it. A non-validating parser need not read the external DTD, but it must still use the declarations it does read, for example to normalize attribute values and supply default values for attributes. Both kinds of parsers check the document for well-formedness.
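The following is a minimal sketch of that last point, not a definitive implementation. It assumes the standard JAXP API (javax.xml.parsers) and the parser bundled with the Java runtime; the class name DefaultAttributeDemo, the method readCurrency, and the sample price document are hypothetical and exist only for illustration. The parser runs in non-validating mode, yet the default declared in the internal DTD subset should still be supplied for the currency attribute.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DefaultAttributeDemo {

    // Hypothetical sample document: the internal DTD subset declares
    // a default value "USD" for the currency attribute, and the
    // <price> element itself does not specify that attribute.
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE price [\n"
        + "  <!ELEMENT price (#PCDATA)>\n"
        + "  <!ATTLIST price currency CDATA \"USD\">\n"
        + "]>\n"
        + "<price>9.99</price>";

    // Parses the document without validation and returns the value
    // of the currency attribute on the root element.
    static String readCurrency(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // non-validating mode (the JAXP default)
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute("currency");
    }

    public static void main(String[] args) throws Exception {
        // The attribute is absent from the element, so any value printed
        // here comes from the declaration in the internal DTD subset.
        System.out.println(readCurrency(XML));
    }
}
```

Because the declaration lives in the internal subset, the parser reads it even though no validation takes place. A default declared only in an external DTD that a non-validating parser chooses not to read might not be applied.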

Most parsers can be run in either validating or non-validating mode. Validating XML documents is crucial during the development and testing stages of the software development life cycle. However, validation has a performance cost. In production, once the reliability of a system's data is established, validation can be turned off, particularly when the documents are governed by complex DTDs or XML Schemas. Some parsers are non-validating by default.
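As a rough illustration of switching between the two modes, the sketch below parses the same document twice through the standard JAXP API. It is only one possible approach under stated assumptions: the class name ValidationToggleDemo, the method countValidityErrors, and the sample book document are hypothetical. The document is well-formed but not valid, because the DTD requires a title child that is missing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class ValidationToggleDemo {

    // Hypothetical sample document: well-formed, but invalid because
    // the DTD says <book> must contain exactly one <title>.
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE book [\n"
        + "  <!ELEMENT book (title)>\n"
        + "  <!ELEMENT title (#PCDATA)>\n"
        + "]>\n"
        + "<book/>";

    // Parses the document and returns how many recoverable (validity)
    // errors the parser reported. Fatal (well-formedness) errors are
    // rethrown so that they abort the parse in either mode.
    static int countValidityErrors(String xml, boolean validating) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating); // turn DTD validation on or off
        DocumentBuilder builder = factory.newDocumentBuilder();

        final int[] errors = {0};
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored in this sketch */ }
            public void error(SAXParseException e) { errors[0]++; }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });

        builder.parse(new InputSource(new StringReader(xml)));
        return errors[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("non-validating: " + countValidityErrors(XML, false) + " error(s)");
        System.out.println("validating:     " + countValidityErrors(XML, true) + " error(s)");
    }
}
```

In a real application the flag would more likely come from configuration, so that development and test builds validate while production builds skip that cost.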

Parsers fall into two types: tree-based and event-based. These are discussed further in Chapter 3; here is an overview:

Tree-based parsing

In tree-based parsing, the parser attempts to create a hierarchical structure for the entire document. For a huge document, this will be extremely memory-intensive. The parser will make the elements and attributes available

18 The XML Files: Development of XML/XSL Applications Using WebSphere Studio
