From 5880b059e9a156336daf32a73bed72def6ba90f2 Mon Sep 17 00:00:00 2001 From: marha Date: Fri, 18 Oct 2013 13:24:37 +0200 Subject: Added expat-2.1.0 --- expat/doc/expat.png | Bin 0 -> 1027 bytes expat/doc/reference.html | 2390 +++++++++++++++++++++++++++++++++++++++++++ expat/doc/style.css | 101 ++ expat/doc/valid-xhtml10.png | Bin 0 -> 2368 bytes expat/doc/xmlwf.1 | 251 +++++ expat/doc/xmlwf.sgml | 468 +++++++++ 6 files changed, 3210 insertions(+) create mode 100644 expat/doc/expat.png create mode 100644 expat/doc/reference.html create mode 100644 expat/doc/style.css create mode 100644 expat/doc/valid-xhtml10.png create mode 100644 expat/doc/xmlwf.1 create mode 100644 expat/doc/xmlwf.sgml (limited to 'expat/doc') diff --git a/expat/doc/expat.png b/expat/doc/expat.png new file mode 100644 index 000000000..5bc0726cf Binary files /dev/null and b/expat/doc/expat.png differ diff --git a/expat/doc/reference.html b/expat/doc/reference.html new file mode 100644 index 000000000..8811a3397 --- /dev/null +++ b/expat/doc/reference.html @@ -0,0 +1,2390 @@ + + + + + + Expat XML Parser + + + + + + + + + + + + + + +
(Expat logo)
Release 2.0.1
+
+ +

Expat is a library, written in C, for parsing XML documents. It's +the underlying XML parser for the open source Mozilla project, Perl's +XML::Parser, Python's xml.parsers.expat, and +other open-source XML parsers.

+ +

This library is the creation of James Clark, who's also given us +groff (an nroff look-alike), Jade (an implementation of ISO's DSSSL +stylesheet language for SGML), XP (a Java XML parser package), and XT (a +Java XSL engine). James was also the technical lead on the XML +Working Group at W3C that produced the XML specification.

+ +

This is free software, licensed under the MIT/X Consortium license. You may download it +from the Expat home page. +

+ +

The bulk of this document was originally commissioned as an article +by XML.com. They graciously allowed +Clark Cooper to retain copyright and to distribute it with Expat. +This version has been substantially extended to include documentation +on features which have been added since the original article was +published, and additional information on using the original +interface.

+ +
+

Table of Contents

+ + +
+

Overview

+ +

Expat is a stream-oriented parser. You register callback (or +handler) functions with the parser and then start feeding it the +document. As the parser recognizes parts of the document, it will +call the appropriate handler for that part (if you've registered one.) +The document is fed to the parser in pieces, so you can start parsing +before you have all the document. This also allows you to parse really +huge documents that won't fit into memory.

+ +

Expat can be intimidating due to the many kinds of handlers and +options you can set. But you only need to learn four functions in +order to do 90% of what you'll want to do with it:

+ +
+ +
XML_ParserCreate
+
Create a new parser object.
+ +
XML_SetElementHandler
+
Set handlers for start and end tags.
+ +
XML_SetCharacterDataHandler
+
Set handler for text.
+ +
XML_Parse
+
Pass a buffer full of document to the parser
+
+ +

These functions and others are described in the reference part of this document. The reference +section also describes in detail the parameters passed to the +different types of handlers.

+ +

Let's look at a very simple example program that only uses 3 of the +above functions (it doesn't need to set a character handler.) The +program outline.c prints an +element outline, indenting child elements to distinguish them from the +parent element that contains them. The start handler does all the +work. It prints two indenting spaces for every level of ancestor +elements, then it prints the element and attribute +information. Finally it increments the global Depth +variable.

+ +
+int Depth;
+
+void XMLCALL
+start(void *data, const char *el, const char **attr) {
+  int i;
+
+  for (i = 0; i < Depth; i++)
+    printf("  ");
+
+  printf("%s", el);
+
+  for (i = 0; attr[i]; i += 2) {
+    printf(" %s='%s'", attr[i], attr[i + 1]);
+  }
+
+  printf("\n");
+  Depth++;
+}  /* End of start handler */
+
+ +

The end tag simply does the bookkeeping work of decrementing +Depth.

+
+void XMLCALL
+end(void *data, const char *el) {
+  Depth--;
+}  /* End of end handler */
+
+ +

Note the XMLCALL annotation used for the callbacks. +This is used to ensure that Expat and the callbacks are using the +same calling convention in case the compiler options used for Expat +itself and the client code are different. Expat tries not to care +what the default calling convention is, though it may require that it +be compiled with a default convention of "cdecl" on some platforms. +For code which uses Expat, however, the calling convention is +specified by the XMLCALL annotation on most platforms; +callbacks should be defined using this annotation.

+ +

The XMLCALL annotation was added in Expat 1.95.7, but +existing working Expat applications don't need to add it (since they +are already using the "cdecl" calling convention, or they wouldn't be +working). The annotation is only needed if the default calling +convention may be something other than "cdecl". To use the annotation +safely with older versions of Expat, you can conditionally define it +after including Expat's header file:

+ +
+#include <expat.h>
+
+#ifndef XMLCALL
+#if defined(_MSC_EXTENSIONS) && !defined(__BEOS__) && !defined(__CYGWIN__)
+#define XMLCALL __cdecl
+#elif defined(__GNUC__)
+#define XMLCALL __attribute__((cdecl))
+#else
+#define XMLCALL
+#endif
+#endif
+
+ +

After creating the parser, the main program just has the job of +shoveling the document to the parser so that it can do its work.

+ +
+

Building and Installing Expat

+ +

The Expat distribution comes as a compressed (with GNU gzip) tar +file. You may download the latest version from SourceForge. After +unpacking this, cd into the directory. Then follow either the Win32 +directions or Unix directions below.

+ +

Building under Win32

+ +

If you're using the GNU compiler under cygwin, follow the Unix +directions in the next section. Otherwise if you have Microsoft's +Developer Studio installed, then from Windows Explorer double-click on +"expat.dsp" in the lib directory and build and install in the usual +manner.

+ +

Alternatively, you may download the Win32 binary package that +contains the "expat.h" include file and a pre-built DLL.

+ +

Building under Unix (or GNU)

+ +

First you'll need to run the configure shell script in order to +configure the Makefiles and headers for your system.

+ +

If you're happy with all the defaults that configure picks for you, +and you have permission on your system to install into /usr/local, you +can install Expat with this sequence of commands:

+ +
+./configure
+make
+make install
+
+ +

There are some options that you can provide to this script, but the +only one we'll mention here is the --prefix option. You +can find out all the options available by running configure with just +the --help option.

+ +

By default, the configure script sets things up so that the library +gets installed in /usr/local/lib and the associated +header file in /usr/local/include. But if you were to +give the option, --prefix=/home/me/mystuff, then the +library and header would get installed in +/home/me/mystuff/lib and +/home/me/mystuff/include respectively.

+ +

Configuring Expat Using the Pre-Processor

+ +

Expat's feature set can be configured using a small number of +pre-processor definitions. The definition of these symbols does not +affect the set of entry points for Expat, only the behavior of the API +and the definition of character types in the case of +XML_UNICODE_WCHAR_T. The symbols are:

+ +
+
XML_DTD
+
Include support for using and reporting DTD-based content. If +this is defined, default attribute values from an external DTD subset +are reported and attribute value normalization occurs based on the +type of attributes defined in the external subset. Without +this, Expat has a smaller memory footprint and can be faster, but will +not load external entities or process conditional sections. This does +not affect the set of functions available in the API.
+ +
XML_NS
+
When defined, support for the Namespaces in XML +specification is included.
+ +
XML_UNICODE
+
When defined, character data reported to the application is +encoded in UTF-16 using wide characters of the type +XML_Char. This is implied if +XML_UNICODE_WCHAR_T is defined.
+ +
XML_UNICODE_WCHAR_T
+
If defined, causes the XML_Char character type to be +defined using the wchar_t type; otherwise, unsigned +short is used. Defining this implies +XML_UNICODE.
+ +
XML_LARGE_SIZE
+
If defined, causes the XML_Size and XML_Index +integer types to be at least 64 bits in size. This is intended to support +processing of very large input streams, where the return values of +XML_GetCurrentByteIndex, +XML_GetCurrentLineNumber and +XML_GetCurrentColumnNumber +could overflow. It may not be supported by all compilers, and is turned +off by default.
+ +
XML_CONTEXT_BYTES
+
The number of input bytes of markup context which the parser will +ensure are available for reporting via XML_GetInputContext. This is +normally set to 1024, and must be set to a positive integer. If this +is not defined, the input context will not be available and XML_GetInputContext will +always report NULL. Without this, Expat has a smaller memory +footprint and can be faster.
+ +
XML_STATIC
+
On Windows, this should be set if Expat is going to be linked +statically with the code that calls it; this is required to get all +the right MSVC magic annotations correct. This is ignored on other +platforms.
+ +
XML_ATTR_INFO
+
If defined, makes the additional function XML_GetAttributeInfo available +for reporting attribute byte offsets.
+
+ +
+

Using Expat

+ +

Compiling and Linking Against Expat

+ +

Unless you installed Expat in a location not expected by your +compiler and linker, all you have to do to use Expat in your programs +is to include the Expat header (#include <expat.h>) +in your files that make calls to it and to tell the linker that it +needs to link against the Expat library. On Unix systems, this would +usually be done with the -lexpat argument. Otherwise, +you'll need to tell the compiler where to look for the Expat header +and the linker where to find the Expat library. You may also need to +take steps to tell the operating system where to find this library at +run time.

+ +

On a Unix-based system, here's what a Makefile might look like when +Expat is installed in a standard location:

+ +
+CC=cc
+LDFLAGS=
+LIBS= -lexpat
+xmlapp: xmlapp.o
+        $(CC) $(LDFLAGS) -o xmlapp xmlapp.o $(LIBS)
+
+ +

If you installed Expat in, say, /home/me/mystuff, then +the Makefile would look like this:

+ +
+CC=cc
+CFLAGS= -I/home/me/mystuff/include
+LDFLAGS=
+LIBS= -L/home/me/mystuff/lib -lexpat
+xmlapp: xmlapp.o
+        $(CC) $(LDFLAGS) -o xmlapp xmlapp.o $(LIBS)
+
+ +

You'd also have to set the environment variable +LD_LIBRARY_PATH to /home/me/mystuff/lib (or +to ${LD_LIBRARY_PATH}:/home/me/mystuff/lib if +LD_LIBRARY_PATH already has some directories in it) in order to run +your application.

+ +

Expat Basics

+ +

As we saw in the example in the overview, the first step in parsing +an XML document with Expat is to create a parser object. There are three functions in the Expat API for creating a +parser object. However, only two of these (XML_ParserCreate and XML_ParserCreateNS) can be used for +constructing a parser for a top-level document. The object returned +by these functions is an opaque pointer (i.e. "expat.h" declares it as +void *) to data with further internal structure. In order to free the +memory associated with this object you must call XML_ParserFree. Note that if you have +provided any user data that gets stored in the +parser, then your application is responsible for freeing it prior to +calling XML_ParserFree.

+ +

The objects returned by the parser creation functions are good for +parsing only one XML document or external parsed entity. If your +application needs to parse many XML documents, then it needs to create +a parser object for each one. The best way to deal with this is to +create a higher level object that contains all the default +initialization you want for your parser objects.

+ +

Walking through a document hierarchy with a stream oriented parser +will require a good stack mechanism in order to keep track of current +context. For instance, to answer the simple question, "What element +does this text belong to?" requires a stack, since the parser may have +descended into other elements that are children of the current one and +has encountered this text on the way out.

+ +

The things you're likely to want to keep on a stack are the +currently opened element and its attributes. You push this +information onto the stack in the start handler and you pop it off in +the end handler.
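One possible shape for such a stack is sketched here as a linked list of frames. The names Frame, ElementStack, stack_push, stack_pop and stack_current are illustrative only and are not part of Expat. The sketch stores just the element name; attributes could be copied into the frame in the same way. Names are copied because the sketch does not assume that strings passed to a handler stay valid after the handler returns.

```c
#include <stdlib.h>
#include <string.h>

/* One entry per currently open element. */
typedef struct Frame {
  char *name;            /* private copy of the element name */
  struct Frame *parent;  /* enclosing element, or NULL for the root */
} Frame;

typedef struct {
  Frame *top;   /* innermost open element */
  int depth;    /* number of open elements */
} ElementStack;

/* Push a copy of 'name'; meant to be called from the start handler.
   Returns 0 if memory runs out, 1 otherwise. */
int
stack_push(ElementStack *st, const char *name) {
  Frame *f = malloc(sizeof *f);
  if (f == NULL)
    return 0;
  f->name = malloc(strlen(name) + 1);
  if (f->name == NULL) {
    free(f);
    return 0;
  }
  strcpy(f->name, name);
  f->parent = st->top;
  st->top = f;
  st->depth++;
  return 1;
}

/* Pop the innermost element; meant to be called from the end handler. */
void
stack_pop(ElementStack *st) {
  Frame *f = st->top;
  if (f == NULL)
    return;
  st->top = f->parent;
  st->depth--;
  free(f->name);
  free(f);
}

/* Name of the element that contains the current parse position. */
const char *
stack_current(const ElementStack *st) {
  return st->top ? st->top->name : NULL;
}
```

With this in place, a character data handler can answer "What element does this text belong to?" by calling stack_current on the stack kept in its user data.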

+ +

For some tasks, it is sufficient to just keep information on what +the depth of the stack is (or would be if you had one.) The outline +program shown above presents one example. Another such task would be +skipping over a complete element. When you see the start tag for the +element you want to skip, you set a skip flag and record the depth at +which the element started. When the end tag handler encounters the +same depth, the skipped element has ended and the flag may be +cleared. If you follow the convention that the root element starts at +1, then you can use the same variable for skip flag and skip +depth.

+ +
+void
+init_info(Parseinfo *info) {
+  info->skip = 0;
+  info->depth = 1;
+  /* Other initializations here */
+}  /* End of init_info */
+
+void XMLCALL
+rawstart(void *data, const char *el, const char **attr) {
+  Parseinfo *inf = (Parseinfo *) data;
+
+  if (! inf->skip) {
+    if (should_skip(inf, el, attr)) {
+      inf->skip = inf->depth;
+    }
+    else
+      start(inf, el, attr);     /* This does rest of start handling */
+  }
+
+  inf->depth++;
+}  /* End of rawstart */
+
+void XMLCALL
+rawend(void *data, const char *el) {
+  Parseinfo *inf = (Parseinfo *) data;
+
+  inf->depth--;
+
+  if (! inf->skip)
+    end(inf, el);              /* This does rest of end handling */
+
+  if (inf->skip == inf->depth)
+    inf->skip = 0;
+}  /* End rawend */
+
+ +

Notice in the above example the difference in how depth is +manipulated in the start and end handlers. The end tag handler should +be the mirror image of the start tag handler. This is necessary to +properly model containment. Since, in the start tag handler, we +incremented depth after the main body of start tag code, then +in the end handler, we need to manipulate it before the main +body. If we'd decided to increment it first thing in the start +handler, then we'd have had to decrement it last thing in the end +handler.

+ +

Communicating between handlers

+ +

In order to be able to pass information between different handlers +without using globals, you'll need to define a data structure to hold +the shared variables. You can then tell Expat (with the XML_SetUserData function) to pass a +pointer to this structure to the handlers. This is the first +argument received by most handlers. In the reference section, an argument to a callback function is named +userData and has type void * if the user +data is passed; it will have the type XML_Parser if the +parser itself is passed. When the parser is passed, the user data may +be retrieved using XML_GetUserData.

+ +

One common case where multiple calls to a single handler may need +to communicate using an application data structure is the case when +content passed to the character data handler (set by XML_SetCharacterDataHandler) needs to be accumulated. A +common first-time mistake with any of the event-oriented interfaces to +an XML parser is to expect all the text contained in an element to be +reported by a single call to the character data handler. Expat, like +many other XML parsers, reports such data as a sequence of calls; +there's no way to know when the end of the sequence is reached until a +different callback is made. A buffer referenced by the user data +structure proves to be both an effective and convenient place to accumulate +character data.

+ + + + +

XML Version

+ +

Expat is an XML 1.0 parser, and as such never complains based on +the value of the version pseudo-attribute in the XML +declaration, if present.

+ +

If an application needs to check the version number (to support +alternate processing), it should use the XML_SetXmlDeclHandler function to +set a handler that uses the information in the XML declaration to +determine what to do. This example shows how to check that only a +version number of "1.0" is accepted:

+ +
+static int wrong_version;
+static XML_Parser parser;
+
+static void XMLCALL
+xmldecl_handler(void            *userData,
+                const XML_Char  *version,
+                const XML_Char  *encoding,
+                int              standalone)
+{
+  static const XML_Char Version_1_0[] = {'1', '.', '0', 0};
+
+  int i;
+
+  for (i = 0; i < (sizeof(Version_1_0) / sizeof(Version_1_0[0])); ++i) {
+    if (version[i] != Version_1_0[i]) {
+      wrong_version = 1;
+      /* also clear all other handlers: */
+      XML_SetCharacterDataHandler(parser, NULL);
+      ...
+      return;
+    }
+  }
+  ...
+}
+
+ +

Namespace Processing

+ +

When the parser is created using the XML_ParserCreateNS function, Expat +performs namespace processing. Under namespace processing, Expat +consumes xmlns and xmlns:... attributes, +which declare namespaces for the scope of the element in which they +occur. This means that your start handler will not see these +attributes. Your application can still be informed of these +declarations by setting namespace declaration handlers with XML_SetNamespaceDeclHandler.

+ +

Element type and attribute names that belong to a given namespace +are passed to the appropriate handler in expanded form. By default +this expanded form is a concatenation of the namespace URI, the +separator character (which is the 2nd argument to XML_ParserCreateNS), and the local +name (i.e. the part after the colon). Names with undeclared prefixes +are not well-formed when namespace processing is enabled, and will +trigger an error. Unprefixed attribute names are never expanded, +and unprefixed element names are only expanded when they are in the +scope of a default namespace.

+ +

However if XML_SetReturnNSTriplet has been called with a non-zero +do_nst parameter, then the expanded form for names with +an explicit prefix is a concatenation of: URI, separator, local name, +separator, prefix.
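A handler that receives expanded names usually has to take them apart again. The sketch below does this for both forms described above. ExpandedName and split_expanded_name are hypothetical names, the code assumes an 8-bit XML_Char, and it relies on the separator being a non-null character that cannot occur inside the namespace URI.

```c
#include <string.h>

/* Parts of an expanded name. The pointers refer into the original string;
   'uri' and 'local' are not null terminated, so use the lengths. */
typedef struct {
  const char *uri;     /* NULL when the name is in no namespace */
  size_t uri_len;
  const char *local;
  size_t local_len;
  const char *prefix;  /* NULL unless a triplet was reported */
} ExpandedName;

/* Split 'name' at 'sep'. 'sep' must not be '\0'. */
void
split_expanded_name(const char *name, char sep, ExpandedName *out) {
  const char *first = strchr(name, sep);
  const char *second;

  out->uri = NULL;
  out->uri_len = 0;
  out->prefix = NULL;

  if (first == NULL) {          /* no separator: name was not expanded */
    out->local = name;
    out->local_len = strlen(name);
    return;
  }

  out->uri = name;
  out->uri_len = (size_t)(first - name);
  out->local = first + 1;

  second = strchr(out->local, sep);
  if (second == NULL) {         /* URI, separator, local name */
    out->local_len = strlen(out->local);
  } else {                      /* URI, separator, local name, separator, prefix */
    out->local_len = (size_t)(second - out->local);
    out->prefix = second + 1;
  }
}
```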

+ +

You can set handlers for the start of a namespace declaration and +for the end of a scope of a declaration with the XML_SetNamespaceDeclHandler +function. The StartNamespaceDeclHandler is called prior to the start +tag handler and the EndNamespaceDeclHandler is called after the +corresponding end tag that ends the namespace's scope. The namespace +start handler gets passed the prefix and URI for the namespace. For a +default namespace declaration (xmlns='...'), the prefix will be null. +The URI will be null for the case where the default namespace is being +unset. The namespace end handler just gets the prefix for the closing +scope.

+ +

These handlers are called for each declaration. So if, for +instance, a start tag had three namespace declarations, then the +StartNamespaceDeclHandler would be called three times before the start +tag handler is called, once for each declaration.

+ +

Character Encodings

+ +

While XML is based on Unicode, and every XML processor is required +to recognize UTF-8 and UTF-16 (1 and 2 byte encodings of Unicode), +other encodings may be declared in XML documents or entities. For the +main document, an XML declaration may contain an encoding +declaration:

+
+<?xml version="1.0" encoding="ISO-8859-2"?>
+
+ +

External parsed entities may begin with a text declaration, which +looks like an XML declaration with just an encoding declaration:

+
+<?xml encoding="Big5"?>
+
+ +

With Expat, you may also specify an encoding at the time of +creating a parser. This is useful when the encoding information may +come from a source outside the document itself (like a higher level +protocol.)

+ +

There are four built-in encodings +in Expat: UTF-8, UTF-16, ISO-8859-1, and US-ASCII.

+ + +

Anything else discovered in an encoding declaration or in the +protocol encoding specified in the parser constructor, triggers a call +to the UnknownEncodingHandler. This handler gets passed +the encoding name and a pointer to an XML_Encoding data +structure. Your handler must fill in this structure and return +XML_STATUS_OK if it knows how to deal with the +encoding. Otherwise the handler should return +XML_STATUS_ERROR. The handler also gets passed a pointer +to an optional application data structure that you may indicate when +you set the handler.

+ +

Expat places the following restrictions on the character encodings that it can +support through the XML_Encoding structure:

+
    +
  1. Every ASCII character that can appear in a well-formed XML document +must be represented by a single byte, and that byte must correspond to +its ASCII encoding (except for the characters $@\^'{}~).
  2. Characters must be encoded in 4 bytes or less.
  3. All characters encoded must have Unicode scalar values less than or +equal to 65535 (0xFFFF). This does not apply to the built-in support +for UTF-16 and UTF-8.
  4. No character may be encoded by more than one distinct sequence of +bytes.
+ +

XML_Encoding contains an array of integers that +correspond to the 1st byte of an encoding sequence. If the value in +the array for a byte is zero or positive, then the byte is a single +byte encoding that encodes the Unicode scalar value contained in the +array. A -1 in this array indicates a malformed byte. If the value is +-2, -3, or -4, then the byte is the beginning of a 2, 3, or 4 byte +sequence respectively. Multi-byte sequences are sent to the convert +function pointed at in the XML_Encoding structure. This +function should return the Unicode scalar value for the sequence or -1 +if the sequence is malformed.

+ +

One pitfall that novice Expat users are likely to fall into is that +although Expat may accept input in various encodings, the strings that +it passes to the handlers are always encoded in UTF-8 or UTF-16 +(depending on how Expat was compiled). Your application is responsible +for any translation of these strings into other encodings.

+ +

Handling External Entity References

+ +

Expat does not read or parse external entities directly. Note that +any external DTD is a special case of an external entity. If you've +set no ExternalEntityRefHandler, then external entity +references are silently ignored. Otherwise, it calls your handler with +the information needed to read and parse the external entity.

+ +

Your handler isn't actually responsible for parsing the entity, but +it is responsible for creating a subsidiary parser with XML_ExternalEntityParserCreate that will do the job. This +returns an instance of XML_Parser that has handlers and +other data structures initialized from the parent parser. You may then +use XML_Parse or XML_ParseBuffer calls against this +parser. Since external entities may refer to other external entities, +your handler should be prepared to be called recursively.

+ +

Parsing DTDs

+ +

In order to parse parameter entities, before starting the parse, +you must call XML_SetParamEntityParsing with one of the following +arguments:

+
+
XML_PARAM_ENTITY_PARSING_NEVER
+
Don't parse parameter entities or the external subset
+
XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE
+
Parse parameter entities and the external subset unless +standalone was set to "yes" in the XML declaration.
+
XML_PARAM_ENTITY_PARSING_ALWAYS
+
Always parse parameter entities and the external subset
+
+ +

In order to read an external DTD, you also have to set an external +entity reference handler as described above.

+ +

Temporarily Stopping Parsing

+ +

Expat 1.95.8 introduces a new feature: it's now possible to stop +parsing temporarily from within a handler function, even if more data +has already been passed into the parser. Applications for this +include

+ + + +

To take advantage of this feature, the main parsing loop of an +application needs to support this specifically. It cannot be +supported with a parsing loop compatible with Expat 1.95.7 or +earlier (though existing loops will continue to work without +supporting the stop/resume feature).

+ +

An application that uses this feature for a single parser will have +the rough structure (in pseudo-code):

+ +
+fd = open_input()
+p = create_parser()
+
+if parse_xml(p, fd) {
+  /* suspended */
+
+  int suspended = 1;
+
+  while (suspended) {
+    do_something_else()
+    if ready_to_resume() {
+      suspended = continue_parsing(p, fd);
+    }
+  }
+}
+
+ +

An application that may resume any of several parsers based on +input (either from the XML being parsed or some other source) will +certainly have more interesting control structures.

+ +

This C function could be used for the parse_xml +function mentioned in the pseudo-code above:

+ +
+#include <unistd.h>   /* read() */
+#include <expat.h>
+
+#define BUFF_SIZE 10240
+
+/* Parse a document from the open file descriptor 'fd' until the parse
+   is complete (the document has been completely parsed, or there's
+   been an error), or the parse is stopped.  Return non-zero when
+   the parse is merely suspended.
+*/
+int
+parse_xml(XML_Parser p, int fd)
+{
+  for (;;) {
+    int last_chunk;
+    int bytes_read;
+    enum XML_Status status;
+
+    void *buff = XML_GetBuffer(p, BUFF_SIZE);
+    if (buff == NULL) {
+      /* handle error... */
+      return 0;
+    }
+    bytes_read = read(fd, buff, BUFF_SIZE);
+    if (bytes_read < 0) {
+      /* handle error... */
+      return 0;
+    }
+    status = XML_ParseBuffer(p, bytes_read, bytes_read == 0);
+    switch (status) {
+      case XML_STATUS_ERROR:
+        /* handle error... */
+        return 0;
+      case XML_STATUS_SUSPENDED:
+        return 1;
+    }
+    if (bytes_read == 0)
+      return 0;
+  }
+}
+
+ +

The corresponding continue_parsing function is +somewhat simpler, since it only needs to deal with the return code from +XML_ResumeParser; it can +delegate the input handling to the parse_xml +function:

+ +
+/* Continue parsing a document which had been suspended.  The 'p' and
+   'fd' arguments are the same as passed to parse_xml().  Return
+   non-zero when the parse is suspended.
+*/
+int
+continue_parsing(XML_Parser p, int fd)
+{
+  enum XML_Status status = XML_ResumeParser(p);
+  switch (status) {
+    case XML_STATUS_ERROR:
+      /* handle error...; XML_GetErrorCode(p) returns
+         XML_ERROR_NOT_SUSPENDED if the parser was not suspended */
+      return 0;
+    case XML_STATUS_SUSPENDED:
+      return 1;
+  }
+  return parse_xml(p, fd);
+}
+
+ +

Now that we've seen what a mess the top-level parsing loop can +become, what have we gained? Very simply, we can now use the XML_StopParser function to stop +parsing, without having to go to great lengths to avoid additional +processing that we're expecting to ignore. As a bonus, we get to stop +parsing temporarily, and come back to it when we're +ready.

+ +

To stop parsing from a handler function, use the XML_StopParser function. This function +takes two arguments: the parser being stopped and a flag indicating +whether the parse can be resumed in the future.

+ + + + +
+ + +

Expat Reference

+ +

Parser Creation

+ +
+XML_Parser XMLCALL
+XML_ParserCreate(const XML_Char *encoding);
+
+
+Construct a new parser. If encoding is non-null, it specifies a +character encoding to use for the document. This overrides the document +encoding declaration. There are four built-in encodings: +
    +
  • US-ASCII
  • UTF-8
  • UTF-16
  • ISO-8859-1
+Any other value will invoke a call to the UnknownEncodingHandler. +
+ +
+XML_Parser XMLCALL
+XML_ParserCreateNS(const XML_Char *encoding,
+                   XML_Char sep);
+
+
+Constructs a new parser that has namespace processing in effect. Namespace +expanded element names and attribute names are returned as a concatenation +of the namespace URI, sep, and the local part of the name. This +means that you should pick a character for sep that can't be part +of a URI. Since Expat does not check namespace URIs for conformance, the +only safe choice for a namespace separator is a character that is illegal +in XML. For instance, '\xFF' is not legal in UTF-8, and +'\xFFFF' is not legal in UTF-16. There is a special case when +sep is the null character '\0': the namespace URI and +the local part will be concatenated without any separator - this is intended +to support RDF processors. It is a programming error to use the null separator +with namespace triplets.
+ +
+XML_Parser XMLCALL
+XML_ParserCreate_MM(const XML_Char *encoding,
+                    const XML_Memory_Handling_Suite *ms,
+                    const XML_Char *sep);
+
+
+typedef struct {
+  void *(XMLCALL *malloc_fcn)(size_t size);
+  void *(XMLCALL *realloc_fcn)(void *ptr, size_t size);
+  void (XMLCALL *free_fcn)(void *ptr);
+} XML_Memory_Handling_Suite;
+
+
+

Construct a new parser using the suite of memory handling functions +specified in ms. If ms is NULL, then use the +standard set of memory management functions. If sep is +non-NULL, then namespace processing is enabled in the created parser +and the character pointed at by sep is used as the separator between +the namespace URI and the local part of the name.

+
+ +
+XML_Parser XMLCALL
+XML_ExternalEntityParserCreate(XML_Parser p,
+                               const XML_Char *context,
+                               const XML_Char *encoding);
+
+
+Construct a new XML_Parser object for parsing an external +general entity. Context is the context argument passed in a call to an +ExternalEntityRefHandler. Other state information such as handlers, +user data, namespace processing is inherited from the parser passed as +the 1st argument. So you shouldn't need to call any of the behavior +changing functions on this parser (unless you want it to act +differently than the parent parser). +
+ +
+void XMLCALL
+XML_ParserFree(XML_Parser p);
+
+
+Free memory used by the parser. Your application is responsible for +freeing any memory associated with user data. +
+ +
+XML_Bool XMLCALL
+XML_ParserReset(XML_Parser p,
+                const XML_Char *encoding);
+
+
+Clean up the memory structures maintained by the parser so that it may +be used again. After this has been called, parser is +ready to start parsing a new document. All handlers are cleared from +the parser, except for the unknownEncodingHandler. The parser's external +state is re-initialized except for the values of ns and ns_triplets. +This function may not be used on a parser created using XML_ExternalEntityParserCreate; it will return XML_FALSE in that case. Returns +XML_TRUE on success. Your application is responsible for +dealing with any memory associated with user data. +
+ +

Parsing

+ +

To state the obvious: the three parsing functions XML_Parse, +XML_ParseBuffer and +XML_GetBuffer must not be called from within a handler +unless they operate on a separate parser instance, that is, one that +did not call the handler. For example, it is OK to call the parsing +functions from within an XML_ExternalEntityRefHandler, +if they apply to the parser created by +XML_ExternalEntityParserCreate.

+ +

Note: the len argument passed to these functions +should be considerably less than the maximum value for an integer, +as it could create an integer overflow situation if the added +lengths of a buffer and the unprocessed portion of the previous buffer +exceed the maximum integer value. Input data at the end of a buffer +will remain unprocessed if it is part of an XML token for which the +end is not part of that buffer.

+ +
+enum XML_Status XMLCALL
+XML_Parse(XML_Parser p,
+          const char *s,
+          int len,
+          int isFinal);
+
+
+enum XML_Status {
+  XML_STATUS_ERROR = 0,
+  XML_STATUS_OK = 1,
+  XML_STATUS_SUSPENDED = 2
+};
+
+
+Parse some more of the document. The string s is a buffer +containing part (or perhaps all) of the document. The number of bytes of s +that are part of the document is indicated by len. This means +that s doesn't have to be nul-terminated. It also means that +if len is larger than the number of bytes in the block of +memory that s points at, then a memory fault is likely. The +isFinal parameter informs the parser that this is the last +piece of the document. Frequently, the last piece is empty (i.e. +len is zero). +If a parse error occurred, it returns XML_STATUS_ERROR. +Otherwise it returns XML_STATUS_OK. +
+ +
+enum XML_Status XMLCALL
+XML_ParseBuffer(XML_Parser p,
+                int len,
+                int isFinal);
+
+
+This is just like XML_Parse, +except in this case Expat provides the buffer. By obtaining the +buffer from Expat with the XML_GetBuffer function, the application can avoid double +copying of the input. +
+ +
+void * XMLCALL
+XML_GetBuffer(XML_Parser p,
+              int len);
+
+
+Obtain a buffer of size len to read a piece of the document +into. A NULL value is returned if Expat can't allocate enough memory for +this buffer. This has to be called prior to every call to +XML_ParseBuffer. A +typical use would look like this: + +
+for (;;) {
+  int bytes_read;
+  void *buff = XML_GetBuffer(p, BUFF_SIZE);
+  if (buff == NULL) {
+    /* handle error */
+  }
+
+  bytes_read = read(docfd, buff, BUFF_SIZE);
+  if (bytes_read < 0) {
+    /* handle error */
+  }
+
+  if (! XML_ParseBuffer(p, bytes_read, bytes_read == 0)) {
+    /* handle parse error */
+  }
+
+  if (bytes_read == 0)
+    break;
+}
+
+
+ +
+enum XML_Status XMLCALL
+XML_StopParser(XML_Parser p,
+               XML_Bool resumable);
+
+
+ +

Stops parsing, causing XML_Parse or XML_ParseBuffer to return. Must be called from within a +call-back handler, except when aborting (when resumable +is XML_FALSE) an already suspended parser. Some +call-backs may still follow because they would otherwise get +lost, including +

    +
  • the end element handler for empty elements when stopped in the + start element handler,
  • the end namespace declaration handler when stopped in the end + element handler,
  • the character data handler when stopped in the character data handler + while making multiple call-backs on a contiguous chunk of characters,
+and possibly others.

+ +

This can be called from most handlers, including DTD related +call-backs, except when parsing an external parameter entity and +resumable is XML_TRUE. Returns +XML_STATUS_OK when successful, +XML_STATUS_ERROR otherwise. The possible error codes +are:

+
+
XML_ERROR_SUSPENDED
+
when suspending an already suspended parser.
+
XML_ERROR_FINISHED
+
when the parser has already finished.
+
XML_ERROR_SUSPEND_PE
+
when suspending while parsing an external PE.
+
+ +

Since the stop/resume feature requires application support in the +outer parsing loop, it is an error to call this function for a parser +not being handled appropriately; see Temporarily Stopping Parsing for more information.

+ +

When resumable is XML_TRUE then parsing +is suspended, that is, XML_Parse and XML_ParseBuffer return XML_STATUS_SUSPENDED. +Otherwise, parsing is aborted, that is, XML_Parse and XML_ParseBuffer return +XML_STATUS_ERROR with error code +XML_ERROR_ABORTED.

+ +

Note: +This will be applied to the current parser instance only, that is, if +there is a parent parser then it will continue parsing when the +external entity reference handler returns. It is up to the +implementation of that handler to call XML_StopParser on the parent parser +(recursively), if one wants to stop parsing altogether.

+ +

When suspended, parsing can be resumed by calling XML_ResumeParser.

+ +

New in Expat 1.95.8.

+
+ +
+enum XML_Status XMLCALL
+XML_ResumeParser(XML_Parser p);
+
+
+

Resumes parsing after it has been suspended with XML_StopParser. Must not be called from +within a handler call-back. Returns the same status codes as XML_Parse or XML_ParseBuffer. An additional error +code, XML_ERROR_NOT_SUSPENDED, will be returned if the +parser was not currently suspended.

+ +

Note: +This must be called on the most deeply nested child parser instance +first, and on its parent parser only after the child parser has +finished, to be applied recursively until the document entity's parser +is restarted. That is, the parent parser will not resume by itself +and it is up to the application to call XML_ResumeParser on it at the +appropriate moment.

+ +

New in Expat 1.95.8.

+
+ +
+void XMLCALL
+XML_GetParsingStatus(XML_Parser p,
+                     XML_ParsingStatus *status);
+
+
+enum XML_Parsing {
+  XML_INITIALIZED,
+  XML_PARSING,
+  XML_FINISHED,
+  XML_SUSPENDED
+};
+
+typedef struct {
+  enum XML_Parsing parsing;
+  XML_Bool finalBuffer;
+} XML_ParsingStatus;
+
+
+

Returns the status of the parser with respect to being initialized, +parsing, finished, or suspended, and whether the final buffer is being +processed. The status parameter must not be +NULL.

+ +

New in Expat 1.95.8.

+
+ + +

Handler Setting

+ +

Although handlers are typically set prior to parsing and left alone, an +application may choose to set or change the handler for a parsing event +while the parse is in progress. For instance, your application may choose +to ignore all text not descended from a para element. One +way it could do this is to set the character handler when a para start tag +is seen, and unset it for the corresponding end tag.

+ +

A handler may be unset by providing a NULL pointer to the +appropriate handler setter. None of the handler setting functions have +a return value.

+ +

Your handlers will receive strings in arrays of type +XML_Char. This type is conditionally defined in expat.h as +either char, wchar_t or unsigned short. +The first implies UTF-8 encoding; the latter two imply UTF-16 encoding. +Note that you'll receive them in this form regardless of the original +encoding of the document.

+ +
+
+void XMLCALL
+XML_SetStartElementHandler(XML_Parser p,
+                           XML_StartElementHandler start);
+
+
+typedef void
+(XMLCALL *XML_StartElementHandler)(void *userData,
+                                   const XML_Char *name,
+                                   const XML_Char **atts);
+
+

Set handler for start (and empty) tags. Attributes are passed to the start +handler as a pointer to a vector of char pointers. Each attribute seen in +a start (or empty) tag occupies 2 consecutive places in this vector: the +attribute name followed by the attribute value. These pairs are terminated +by a null pointer.

+

Note that an empty tag generates a call to both start and end handlers +(in that order).
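
The pair layout of the attribute vector can be sketched with two small helpers. This is only an illustration, assuming the default build in which XML_Char is char; the names find_attr and count_attrs are hypothetical and not part of Expat.

```c
#include <stddef.h>
#include <string.h>

/* Look up one attribute in the NULL-terminated name/value vector that a
 * start element handler receives. Returns the value, or NULL if absent.
 * Assumes the default build, where XML_Char is char (UTF-8).
 * find_attr is a hypothetical helper, not an Expat function. */
const char *find_attr(const char **atts, const char *name)
{
    size_t i;

    /* Entries come in pairs: atts[i] is a name, atts[i + 1] its value. */
    for (i = 0; atts[i] != NULL; i += 2) {
        if (strcmp(atts[i], name) == 0)
            return atts[i + 1];
    }
    return NULL;
}

/* Count the attributes (pairs, not array slots) in the vector.
 * count_attrs is likewise hypothetical. */
size_t count_attrs(const char **atts)
{
    size_t n = 0;

    while (atts[2 * n] != NULL)
        n++;
    return n;
}
```

A start handler would typically call find_attr(atts, "id") or similar on the atts argument it was given; the pointers are only meant to be used during that call.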

+
+ +
+
+void XMLCALL
+XML_SetEndElementHandler(XML_Parser p,
+                         XML_EndElementHandler end);
+
+
+typedef void
+(XMLCALL *XML_EndElementHandler)(void *userData,
+                                 const XML_Char *name);
+
+

Set handler for end (and empty) tags. As noted above, an empty tag +generates a call to both start and end handlers.

+
+ +
+
+void XMLCALL
+XML_SetElementHandler(XML_Parser p,
+                      XML_StartElementHandler start,
+                      XML_EndElementHandler end);
+
+

Set handlers for start and end tags with one call.

+
+ +
+
+void XMLCALL
+XML_SetCharacterDataHandler(XML_Parser p,
+                            XML_CharacterDataHandler charhndl);
+
+
+typedef void
+(XMLCALL *XML_CharacterDataHandler)(void *userData,
+                                    const XML_Char *s,
+                                    int len);
+
+

Set a text handler. The string your handler receives +is NOT nul-terminated. You have to use the length argument +to deal with the end of the string. A single block of contiguous text +free of markup may still result in a sequence of calls to this handler. +In other words, if you're searching for a pattern in the text, it may +be split across calls to this handler. Note: Setting this handler to NULL +may NOT immediately terminate call-backs if the parser is currently +processing such a single block of contiguous markup-free text, as the parser +will continue calling back until the end of the block is reached.
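
Because the text is not nul-terminated and may arrive in several pieces, a handler usually appends each piece to a buffer of its own. The following is a minimal sketch of that idea, assuming XML_Char is char; TextBuf and on_text are hypothetical names, and the function has the shape of XML_CharacterDataHandler apart from the XMLCALL annotation, which would be added when compiling against expat.h.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer kept in the parser's user data. */
typedef struct {
    char  *data;   /* nul-terminated accumulated text, or NULL */
    size_t len;    /* bytes stored, not counting the terminator */
    int    failed; /* set to 1 if an allocation failed */
} TextBuf;

/* Has the shape of XML_CharacterDataHandler when XML_Char is char
 * (add XMLCALL when compiling against expat.h). Appends exactly len
 * bytes, because s is not nul-terminated. */
void on_text(void *userData, const char *s, int len)
{
    TextBuf *buf = (TextBuf *)userData;
    char *grown;

    if (buf->failed || len <= 0)
        return;
    grown = (char *)realloc(buf->data, buf->len + (size_t)len + 1);
    if (grown == NULL) {
        buf->failed = 1;   /* keep the old data; the caller checks this flag */
        return;
    }
    memcpy(grown + buf->len, s, (size_t)len);
    buf->len += (size_t)len;
    grown[buf->len] = '\0';
    buf->data = grown;
}
```

Searching the accumulated buffer in the end element handler, rather than each piece as it arrives, avoids missing a pattern that is split across calls.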

+
+ +
+
+void XMLCALL
+XML_SetProcessingInstructionHandler(XML_Parser p,
+                                    XML_ProcessingInstructionHandler proc);
+
+
+typedef void
+(XMLCALL *XML_ProcessingInstructionHandler)(void *userData,
+                                            const XML_Char *target,
+                                            const XML_Char *data);
+
+
+

Set a handler for processing instructions. The target is the first word +in the processing instruction. The data is the rest of the characters in +it after skipping all whitespace after the initial word.

+
+ +
+
+void XMLCALL
+XML_SetCommentHandler(XML_Parser p,
+                      XML_CommentHandler cmnt);
+
+
+typedef void
+(XMLCALL *XML_CommentHandler)(void *userData,
+                              const XML_Char *data);
+
+

Set a handler for comments. The data is all text inside the comment +delimiters.

+
+ +
+
+void XMLCALL
+XML_SetStartCdataSectionHandler(XML_Parser p,
+                                XML_StartCdataSectionHandler start);
+
+
+typedef void
+(XMLCALL *XML_StartCdataSectionHandler)(void *userData);
+
+

Set a handler that gets called at the beginning of a CDATA section.

+
+ +
+
+void XMLCALL
+XML_SetEndCdataSectionHandler(XML_Parser p,
+                              XML_EndCdataSectionHandler end);
+
+
+typedef void
+(XMLCALL *XML_EndCdataSectionHandler)(void *userData);
+
+

Set a handler that gets called at the end of a CDATA section.

+
+ +
+
+void XMLCALL
+XML_SetCdataSectionHandler(XML_Parser p,
+                           XML_StartCdataSectionHandler start,
+                           XML_EndCdataSectionHandler end);
+
+

Sets both CDATA section handlers with one call.

+
+ +
+
+void XMLCALL
+XML_SetDefaultHandler(XML_Parser p,
+                      XML_DefaultHandler hndl);
+
+
+typedef void
+(XMLCALL *XML_DefaultHandler)(void *userData,
+                              const XML_Char *s,
+                              int len);
+
+ +

Sets a handler for any characters in the document which wouldn't +otherwise be handled. This includes both data for which no handlers +can be set (like some kinds of DTD declarations) and data which could +be reported but which currently has no handler set. The characters +are passed exactly as they were present in the XML document except +that they will be encoded in UTF-8 or UTF-16. Line boundaries are not +normalized. Note that a byte order mark character is not passed to the +default handler. There are no guarantees about how characters are +divided between calls to the default handler: for example, a comment +might be split between multiple calls. Setting the handler with +this call has the side effect of turning off expansion of references +to internally defined general entities. Instead these references are +passed to the default handler.

+ +

See also XML_DefaultCurrent.

+
+ +
+
+void XMLCALL
+XML_SetDefaultHandlerExpand(XML_Parser p,
+                            XML_DefaultHandler hndl);
+
+
+typedef void
+(XMLCALL *XML_DefaultHandler)(void *userData,
+                              const XML_Char *s,
+                              int len);
+
+

This sets a default handler, but doesn't inhibit the expansion of +internal entity references. The entity reference will not be passed +to the default handler.

+ +

See also XML_DefaultCurrent.

+
+ +
+
+void XMLCALL
+XML_SetExternalEntityRefHandler(XML_Parser p,
+                                XML_ExternalEntityRefHandler hndl);
+
+
+typedef int
+(XMLCALL *XML_ExternalEntityRefHandler)(XML_Parser p,
+                                        const XML_Char *context,
+                                        const XML_Char *base,
+                                        const XML_Char *systemId,
+                                        const XML_Char *publicId);
+
+

Set an external entity reference handler. This handler is also +called for processing an external DTD subset if parameter entity parsing +is in effect. (See +XML_SetParamEntityParsing.)

+ +

The context parameter specifies the parsing context in +the format expected by the context argument to XML_ExternalEntityParserCreate. The context string is +valid only until the handler returns, so if the referenced entity is +to be parsed later, it must be copied. context is NULL +only when the entity is a parameter entity, which is how one can +differentiate between general and parameter entities.

+ +

The base parameter is the base to use for relative +system identifiers. It is set by XML_SetBase and may be NULL. The +publicId parameter is the public id given in the entity +declaration and may be NULL. systemId is the system +identifier specified in the entity declaration and is never NULL.

+ +

There are a couple of ways in which this handler differs from +others. First, this handler returns a status indicator (an +integer). XML_STATUS_OK should be returned for successful +handling of the external entity reference. Returning +XML_STATUS_ERROR indicates failure, and causes the +calling parser to return an +XML_ERROR_EXTERNAL_ENTITY_HANDLING error.

+ +

Second, instead of having the user data as its first argument, it +receives the parser that encountered the entity reference. This, along +with the context parameter, may be used as arguments to a call to +XML_ExternalEntityParserCreate. Using the returned +parser, the body of the external entity can be recursively parsed.

+ +

Since this handler may be called recursively, it should not save +information in global or static variables.

+
+ +
+void XMLCALL
+XML_SetExternalEntityRefHandlerArg(XML_Parser p,
+                                   void *arg);
+
+
+

Set the argument passed to the ExternalEntityRefHandler. If +arg is not NULL, it is the new value passed to the +handler set using XML_SetExternalEntityRefHandler; if arg is +NULL, the argument passed to the handler function will be the parser +object itself.

+ +

Note: +The type of arg and the type of the first argument to the +ExternalEntityRefHandler do not match. This function takes a +void * to be passed to the handler, while the handler +accepts an XML_Parser. This is a historical accident, +but will not be corrected before Expat 2.0 (at the earliest) to avoid +causing compiler warnings for code that's known to work with this +API. It is the responsibility of the application code to know the +actual type of the argument passed to the handler and to manage it +properly.

+
+ +
+
+void XMLCALL
+XML_SetSkippedEntityHandler(XML_Parser p,
+                            XML_SkippedEntityHandler handler);
+
+
+typedef void
+(XMLCALL *XML_SkippedEntityHandler)(void *userData,
+                                    const XML_Char *entityName,
+                                    int is_parameter_entity);
+
+

Set a skipped entity handler. This is called in two situations:

+
    +
  1. An entity reference is encountered for which no declaration + has been read and this is not an error.
  2. An internal entity reference is read, but not expanded, because + XML_SetDefaultHandler + has been called.
+

The is_parameter_entity argument will be non-zero for +a parameter entity and zero for a general entity.

Note: skipped +parameter entities in declarations and skipped general entities in +attribute values cannot be reported, because the event would be out of +sync with the reporting of the declarations or attribute values.

+
+ +
+
+void XMLCALL
+XML_SetUnknownEncodingHandler(XML_Parser p,
+                              XML_UnknownEncodingHandler enchandler,
+                              void *encodingHandlerData);
+
+
+typedef int
+(XMLCALL *XML_UnknownEncodingHandler)(void *encodingHandlerData,
+                                      const XML_Char *name,
+                                      XML_Encoding *info);
+
+typedef struct {
+  int map[256];
+  void *data;
+  int (XMLCALL *convert)(void *data, const char *s);
+  void (XMLCALL *release)(void *data);
+} XML_Encoding;
+
+

Set a handler to deal with encodings other than the built-in set. This should be done before +XML_Parse or XML_ParseBuffer has been called on the +given parser.

If the handler knows how to deal with an encoding +with the given name, it should fill in the info data +structure and return XML_STATUS_OK. Otherwise it +should return XML_STATUS_ERROR. The handler will be called +at most once per parsed (external) entity. The optional application +data pointer encodingHandlerData will be passed back to +the handler.

+ +

The map array contains information for every possible leading +byte in a byte sequence. If the corresponding value is >= 0, then it's +a single byte sequence and the byte encodes that Unicode value. If the +value is -1, then that byte is invalid as the initial byte in a sequence. +If the value is -n, where n is an integer > 1, then n is the number of +bytes in the sequence and the actual conversion is accomplished by a +call to the function pointed at by convert. This function may return -1 +if the sequence itself is invalid. The convert pointer may be null if +there are only single byte codes. The data parameter passed to the convert +function is the data pointer from XML_Encoding. The +string s is NOT nul-terminated and points at the sequence of +bytes to be converted.
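
The three kinds of map entry can be illustrated with a made-up single-byte encoding. This is a sketch only: the encoding (ASCII plus one extra byte for the euro sign) is invented for the example, and fill_map and sequence_length are hypothetical helpers, not Expat functions.

```c
/* Fill an XML_Encoding-style map for a made-up single-byte encoding:
 * bytes 0x00-0x7F are ASCII, byte 0x80 is the euro sign (U+20AC), and
 * every other byte is invalid. The encoding itself is hypothetical. */
void fill_map(int map[256])
{
    int i;

    for (i = 0; i < 0x80; i++)
        map[i] = i;          /* >= 0: one byte, value is the code point */
    map[0x80] = 0x20AC;
    for (i = 0x81; i < 256; i++)
        map[i] = -1;         /* -1: invalid as a leading byte */
}

/* Interpret one map entry: returns the sequence length in bytes,
 * or 0 if the byte cannot start a sequence. */
int sequence_length(int entry)
{
    if (entry >= 0)
        return 1;            /* single byte */
    if (entry == -1)
        return 0;            /* invalid leading byte */
    return -entry;           /* -n: n-byte sequence, needs convert */
}
```

An unknown encoding handler for such an encoding would fill info->map this way and could leave convert as NULL, since no multi-byte sequences exist.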

+ +

The function pointed at by release is called by the +parser when it is finished with the encoding. It may be NULL.

+
+ +
+
+void XMLCALL
+XML_SetStartNamespaceDeclHandler(XML_Parser p,
+			         XML_StartNamespaceDeclHandler start);
+
+
+typedef void
+(XMLCALL *XML_StartNamespaceDeclHandler)(void *userData,
+                                         const XML_Char *prefix,
+                                         const XML_Char *uri);
+
+

Set a handler to be called when a namespace is declared. Namespace +declarations occur inside start tags. But the namespace declaration start +handler is called before the start tag handler for each namespace declared +in that start tag.

+
+ +
+
+void XMLCALL
+XML_SetEndNamespaceDeclHandler(XML_Parser p,
+			       XML_EndNamespaceDeclHandler end);
+
+
+typedef void
+(XMLCALL *XML_EndNamespaceDeclHandler)(void *userData,
+                                       const XML_Char *prefix);
+
+

Set a handler to be called when leaving the scope of a namespace +declaration. This will be called, for each namespace declaration, +after the handler for the end tag of the element in which the +namespace was declared.

+
+ +
+
+void XMLCALL
+XML_SetNamespaceDeclHandler(XML_Parser p,
+                            XML_StartNamespaceDeclHandler start,
+                            XML_EndNamespaceDeclHandler end);
+
+

Sets both namespace declaration handlers with a single call.

+
+ +
+
+void XMLCALL
+XML_SetXmlDeclHandler(XML_Parser p,
+		      XML_XmlDeclHandler xmldecl);
+
+
+typedef void
+(XMLCALL *XML_XmlDeclHandler)(void            *userData,
+                              const XML_Char  *version,
+                              const XML_Char  *encoding,
+                              int             standalone);
+
+

Sets a handler that is called for XML declarations and also for +text declarations discovered in external entities. The way to +distinguish is that the version parameter will be NULL +for text declarations. The encoding parameter may be NULL +for an XML declaration. The standalone argument will +contain -1, 0, or 1 indicating respectively that there was no +standalone parameter in the declaration, that it was given as no, or +that it was given as yes.

+
+ +
+
+void XMLCALL
+XML_SetStartDoctypeDeclHandler(XML_Parser p,
+			       XML_StartDoctypeDeclHandler start);
+
+
+typedef void
+(XMLCALL *XML_StartDoctypeDeclHandler)(void           *userData,
+                                       const XML_Char *doctypeName,
+                                       const XML_Char *sysid,
+                                       const XML_Char *pubid,
+                                       int            has_internal_subset);
+
+

Set a handler that is called at the start of a DOCTYPE declaration, +before any external or internal subset is parsed. Both sysid +and pubid may be NULL. The has_internal_subset parameter +will be non-zero if the DOCTYPE declaration has an internal subset.

+
+ +
+
+void XMLCALL
+XML_SetEndDoctypeDeclHandler(XML_Parser p,
+			     XML_EndDoctypeDeclHandler end);
+
+
+typedef void
+(XMLCALL *XML_EndDoctypeDeclHandler)(void *userData);
+
+

Set a handler that is called at the end of a DOCTYPE declaration, +after parsing any external subset.

+
+ +
+
+void XMLCALL
+XML_SetDoctypeDeclHandler(XML_Parser p,
+			  XML_StartDoctypeDeclHandler start,
+			  XML_EndDoctypeDeclHandler end);
+
+

Set both doctype handlers with one call.

+
+ +
+
+void XMLCALL
+XML_SetElementDeclHandler(XML_Parser p,
+			  XML_ElementDeclHandler eldecl);
+
+
+typedef void
+(XMLCALL *XML_ElementDeclHandler)(void *userData,
+                                  const XML_Char *name,
+                                  XML_Content *model);
+
+
+enum XML_Content_Type {
+  XML_CTYPE_EMPTY = 1,
+  XML_CTYPE_ANY,
+  XML_CTYPE_MIXED,
+  XML_CTYPE_NAME,
+  XML_CTYPE_CHOICE,
+  XML_CTYPE_SEQ
+};
+
+enum XML_Content_Quant {
+  XML_CQUANT_NONE,
+  XML_CQUANT_OPT,
+  XML_CQUANT_REP,
+  XML_CQUANT_PLUS
+};
+
+typedef struct XML_cp XML_Content;
+
+struct XML_cp {
+  enum XML_Content_Type		type;
+  enum XML_Content_Quant	quant;
+  const XML_Char *		name;
+  unsigned int			numchildren;
+  XML_Content *			children;
+};
+
+

Sets a handler for element declarations in a DTD. The handler gets +called with the name of the element in the declaration and a pointer +to a structure that contains the element model. It is the +application's responsibility to free this data structure using +XML_FreeContentModel.

+ +

The model argument is the root of a tree of +XML_Content nodes. If type equals +XML_CTYPE_EMPTY or XML_CTYPE_ANY, then +quant will be XML_CQUANT_NONE, and the other +fields will be zero or NULL. If type is +XML_CTYPE_MIXED, then quant will be +XML_CQUANT_NONE or XML_CQUANT_REP and +numchildren will contain the number of elements that are +allowed to be mixed in and children points to an array of +XML_Content structures that will all have type +XML_CTYPE_NAME with no quantification. Only the root node can be type +XML_CTYPE_EMPTY, XML_CTYPE_ANY, or +XML_CTYPE_MIXED.

+ +

For type XML_CTYPE_NAME, the name field +points to the name and the numchildren and +children fields will be zero and NULL. The +quant field will indicate any quantifiers placed on the +name.

+ +

Types XML_CTYPE_CHOICE and XML_CTYPE_SEQ +indicate a choice or sequence respectively. The +numchildren field indicates how many nodes are in the choice +or sequence, and children points to the nodes.
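
A recursive walk over the model tree might look like the sketch below. The type declarations are copied from the listing above, with char standing in for XML_Char, only so that the example compiles on its own; real code would take them from expat.h. The helpers count_names and quant_char are hypothetical.

```c
#include <stddef.h>

/* Declarations copied from the listing above so the sketch compiles on
 * its own; real code would get them from expat.h, with XML_Char in
 * place of char. */
enum XML_Content_Type {
    XML_CTYPE_EMPTY = 1, XML_CTYPE_ANY, XML_CTYPE_MIXED,
    XML_CTYPE_NAME, XML_CTYPE_CHOICE, XML_CTYPE_SEQ
};
enum XML_Content_Quant {
    XML_CQUANT_NONE, XML_CQUANT_OPT, XML_CQUANT_REP, XML_CQUANT_PLUS
};
typedef struct XML_cp XML_Content;
struct XML_cp {
    enum XML_Content_Type  type;
    enum XML_Content_Quant quant;
    const char            *name;
    unsigned int           numchildren;
    XML_Content           *children;
};

/* Count the element names mentioned anywhere in a content model. */
unsigned int count_names(const XML_Content *node)
{
    unsigned int i, total = 0;

    if (node->type == XML_CTYPE_NAME)
        return 1;
    /* EMPTY and ANY have numchildren == 0, so the loop is skipped. */
    for (i = 0; i < node->numchildren; i++)
        total += count_names(&node->children[i]);
    return total;
}

/* Map a quantifier to the character used in DTD syntax (0 for none). */
char quant_char(enum XML_Content_Quant q)
{
    switch (q) {
    case XML_CQUANT_OPT:  return '?';
    case XML_CQUANT_REP:  return '*';
    case XML_CQUANT_PLUS: return '+';
    default:              return '\0';
    }
}
```

In an element declaration handler the walk would start at the model argument, and the model would still need to be released with XML_FreeContentModel afterwards.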

+
+ +
+
+void XMLCALL
+XML_SetAttlistDeclHandler(XML_Parser p,
+                          XML_AttlistDeclHandler attdecl);
+
+
+typedef void
+(XMLCALL *XML_AttlistDeclHandler)(void           *userData,
+                                  const XML_Char *elname,
+                                  const XML_Char *attname,
+                                  const XML_Char *att_type,
+                                  const XML_Char *dflt,
+                                  int            isrequired);
+
+

Set a handler for attlist declarations in the DTD. This handler is +called for each attribute. So a single attlist declaration +with multiple attributes declared will generate multiple calls to this +handler. The elname parameter returns the name of the +element for which the attribute is being declared. The attribute name +is in the attname parameter. The attribute type is in the +att_type parameter. It is the string representing the +type in the declaration with whitespace removed.

+ +

The dflt parameter holds the default value. It will be +NULL in the case of "#IMPLIED" or "#REQUIRED" attributes. You can +distinguish these two cases by checking the isrequired +parameter, which will be true in the case of "#REQUIRED" attributes. +Attributes which are "#FIXED" will also have a true +isrequired, but they will have the non-NULL fixed value +in the dflt parameter.
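
The combinations of dflt and isrequired can be summarized in a small classifier. This is a sketch assuming XML_Char is char; the enum AttDefaultKind and the function classify_default are hypothetical names introduced for the example.

```c
#include <stddef.h>

/* Hypothetical labels for the four default-declaration forms. */
typedef enum {
    ATT_IMPLIED,   /* #IMPLIED: no default, attribute optional */
    ATT_REQUIRED,  /* #REQUIRED: no default, attribute mandatory */
    ATT_FIXED,     /* #FIXED "value" */
    ATT_DEFAULT    /* plain default value */
} AttDefaultKind;

/* Classify the dflt/isrequired pair passed to an attlist declaration
 * handler. */
AttDefaultKind classify_default(const char *dflt, int isrequired)
{
    if (dflt == NULL)
        return isrequired ? ATT_REQUIRED : ATT_IMPLIED;
    return isrequired ? ATT_FIXED : ATT_DEFAULT;
}
```

An attlist declaration handler could call classify_default(dflt, isrequired) once per attribute to decide how to record the declaration.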

+
+ +
+
+void XMLCALL
+XML_SetEntityDeclHandler(XML_Parser p,
+			 XML_EntityDeclHandler handler);
+
+
+typedef void
+(XMLCALL *XML_EntityDeclHandler)(void           *userData,
+                                 const XML_Char *entityName,
+                                 int            is_parameter_entity,
+                                 const XML_Char *value,
+                                 int            value_length, 
+                                 const XML_Char *base,
+                                 const XML_Char *systemId,
+                                 const XML_Char *publicId,
+                                 const XML_Char *notationName);
+
+

Sets a handler that will be called for all entity declarations. +The is_parameter_entity argument will be non-zero in the +case of parameter entities and zero otherwise.

+ +

For internal entities (<!ENTITY foo "bar">), +value will be non-NULL and systemId, +publicId, and notationName will all be NULL. +The value string is not nul-terminated; the length is +provided in the value_length parameter. Do not use +value_length to test for internal entities, since it is +legal to have zero-length values. Instead check for whether or not +value is NULL.

The notationName +argument will have a non-NULL value only for unparsed entity +declarations.
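
The tests described above can be sketched as a classifier for the three kinds of entity declaration. It assumes XML_Char is char, and it treats a declaration with neither a value nor a notation as an external parsed entity, which follows from the two rules above but is not spelled out in this text. EntityKind and classify_entity are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical labels for the kinds of entity declaration. */
typedef enum {
    ENTITY_INTERNAL,        /* <!ENTITY foo "bar"> */
    ENTITY_EXTERNAL_PARSED, /* external identifier without NDATA */
    ENTITY_UNPARSED         /* external identifier with NDATA */
} EntityKind;

/* Classify an entity declaration from the value and notationName
 * arguments of the entity declaration handler. */
EntityKind classify_entity(const char *value, const char *notationName)
{
    if (value != NULL)
        return ENTITY_INTERNAL;  /* even if value_length is 0 */
    if (notationName != NULL)
        return ENTITY_UNPARSED;
    return ENTITY_EXTERNAL_PARSED;
}
```

When printing an internal entity's value, a bounded format such as "%.*s" with value_length is one way to respect the missing terminator.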

+
+ +
+
+void XMLCALL
+XML_SetUnparsedEntityDeclHandler(XML_Parser p,
+                                 XML_UnparsedEntityDeclHandler h);
+
+
+typedef void
+(XMLCALL *XML_UnparsedEntityDeclHandler)(void *userData,
+                                         const XML_Char *entityName, 
+                                         const XML_Char *base,
+                                         const XML_Char *systemId,
+                                         const XML_Char *publicId,
+                                         const XML_Char *notationName);
+
+

Set a handler that receives declarations of unparsed entities. These +are entity declarations that have a notation (NDATA) field:

+ +
+<!ENTITY logo SYSTEM "images/logo.gif" NDATA gif>
+
+

This handler is obsolete and is provided for backwards +compatibility. Use XML_SetEntityDeclHandler instead.

+
+ +
+
+void XMLCALL
+XML_SetNotationDeclHandler(XML_Parser p,
+                           XML_NotationDeclHandler h);
+
+
+typedef void
+(XMLCALL *XML_NotationDeclHandler)(void *userData, 
+                                   const XML_Char *notationName,
+                                   const XML_Char *base,
+                                   const XML_Char *systemId,
+                                   const XML_Char *publicId);
+
+

Set a handler that receives notation declarations.

+
+ +
+
+void XMLCALL
+XML_SetNotStandaloneHandler(XML_Parser p,
+                            XML_NotStandaloneHandler h);
+
+
+typedef int 
+(XMLCALL *XML_NotStandaloneHandler)(void *userData);
+
+

Set a handler that is called if the document is not "standalone". +This happens when the document has an external subset or a reference to a +parameter entity, but does not have standalone set to "yes" in an XML +declaration. If this handler returns XML_STATUS_ERROR, +then the parser will report an XML_ERROR_NOT_STANDALONE +error.

+
+ +

Parse position and error reporting functions

+ +

These are the functions you'll want to call when the parse +functions return XML_STATUS_ERROR (a parse error has +occurred), although the position reporting functions are useful outside +of errors. The position reported is the byte position (in the original +document or entity encoding) of the first of the sequence of +characters that generated the current event (or the error that caused +the parse functions to return XML_STATUS_ERROR). The +exceptions are callbacks triggered by declarations in the document +prologue, in which case the exact position reported is somewhere in the +relevant markup, but not necessarily as meaningful as for other +events.

+ +

The position reporting functions are accurate only outside of the +DTD. In other words, they usually return bogus information when +called from within a DTD declaration handler.

+ +
+enum XML_Error XMLCALL
+XML_GetErrorCode(XML_Parser p);
+
+
+Return what type of error has occurred. +
+ +
+const XML_LChar * XMLCALL
+XML_ErrorString(enum XML_Error code);
+
+
+Return a string describing the error corresponding to code. +The code should be one of the enums that can be returned from +XML_GetErrorCode. +
+ +
+XML_Index XMLCALL
+XML_GetCurrentByteIndex(XML_Parser p);
+
+
+Return the byte offset of the position. This always corresponds to +the values returned by XML_GetCurrentLineNumber and XML_GetCurrentColumnNumber. +
+ +
+XML_Size XMLCALL
+XML_GetCurrentLineNumber(XML_Parser p);
+
+
+Return the line number of the position. The first line is reported as +1. +
+ +
+XML_Size XMLCALL
+XML_GetCurrentColumnNumber(XML_Parser p);
+
+
+Return the offset, from the beginning of the current line, of +the position. +
+ +
+int XMLCALL
+XML_GetCurrentByteCount(XML_Parser p);
+
+
+Return the number of bytes in the current event. Returns +0 if the event is inside a reference to an internal +entity and for the end-tag event for empty element tags (the latter can +be used to distinguish empty-element tags from empty elements using +separate start and end tags). +
+ +
+const char * XMLCALL
+XML_GetInputContext(XML_Parser p,
+                    int *offset,
+                    int *size);
+
+
+ +

Returns the parser's input buffer, sets the integer pointed at by +offset to the offset within this buffer of the current +parse position, and sets the integer pointed at by size to +the size of the returned buffer.

+ +

This should only be called from within a handler during an active +parse and the returned buffer should only be referred to from within +the handler that made the call. This input buffer contains the +untranslated bytes of the input.

+ +

Only a limited amount of context is kept, so if the event +triggering a call spans over a very large amount of input, the actual +parse position may be before the beginning of the buffer.

+ +

If XML_CONTEXT_BYTES is not defined, this will always +return NULL.

+
+ +

Miscellaneous functions

+ +

The functions in this section either obtain state information from +the parser or can be used to dynamically set parser options.

+ +
+void XMLCALL
+XML_SetUserData(XML_Parser p,
+                void *userData);
+
+
+This sets the user data pointer that gets passed to handlers. It +overwrites any previous value for this pointer. Note that the +application is responsible for freeing the memory associated with +userData when it is finished with the parser. So if you +call this when there's already a pointer there, and you haven't freed +the memory associated with it, then you've probably just leaked +memory. +
+ +
+void * XMLCALL
+XML_GetUserData(XML_Parser p);
+
+
+This returns the user data pointer that gets passed to handlers. +It is actually implemented as a macro. +
+ +
+void XMLCALL
+XML_UseParserAsHandlerArg(XML_Parser p);
+
+
+After this is called, handlers receive the parser in their +userData arguments. The user data can still be obtained +using the XML_GetUserData function. +
+ +
+enum XML_Status XMLCALL
+XML_SetBase(XML_Parser p,
+            const XML_Char *base);
+
+
+Set the base to be used for resolving relative URIs in system +identifiers. The return value is XML_STATUS_ERROR if +there's no memory to store base, otherwise it's +XML_STATUS_OK. +
+ +
+const XML_Char * XMLCALL
+XML_GetBase(XML_Parser p);
+
+
+Return the base for resolving relative URIs. +
+ +
+int XMLCALL
+XML_GetSpecifiedAttributeCount(XML_Parser p);
+
+
+When attributes are reported to the start handler in the atts vector, +attributes that were explicitly set in the element occur before any +attributes that receive their value from default information in an +ATTLIST declaration. This function returns the number of attributes +that were explicitly set times two, thus giving the offset in the +atts array passed to the start tag handler of the first +attribute set due to defaults. It supplies information for the last +call to a start handler. If called inside a start handler, then that +means the current call. +
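The "times two" arithmetic can be sketched as follows. Both helpers are hypothetical and assume XML_Char is char; atts is the NULL-terminated name/value array passed to the start handler, and specified is the value returned by XML_GetSpecifiedAttributeCount for that same call.

```c
#include <stddef.h>

/* Number of attributes that received their value from ATTLIST defaults. */
static int
count_defaulted(const char **atts, int specified)
{
  int total = 0;

  /* atts holds name/value pairs followed by a NULL terminator. */
  while (atts[total] != NULL)
    total += 2;
  return (total - specified) / 2;
}

/* Name of the first defaulted attribute, or NULL if every attribute
   was explicitly set (atts[specified] is then the terminator). */
static const char *
first_defaulted_name(const char **atts, int specified)
{
  return atts[specified];
}
```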
+ +
+int XMLCALL
+XML_GetIdAttributeIndex(XML_Parser p);
+
+
+Returns the index of the ID attribute passed in the atts array in the +last call to XML_StartElementHandler, or -1 if there is no ID +attribute. If called inside a start handler, then that means the +current call. +
+ +
+const XML_AttrInfo * XMLCALL
+XML_GetAttributeInfo(XML_Parser parser);
+
+
+typedef struct {
+  XML_Index  nameStart;  /* Offset to beginning of the attribute name. */
+  XML_Index  nameEnd;    /* Offset after the attribute name's last byte. */
+  XML_Index  valueStart; /* Offset to beginning of the attribute value. */
+  XML_Index  valueEnd;   /* Offset after the attribute value's last byte. */
+} XML_AttrInfo;
+
+
+Returns an array of XML_AttrInfo structures for the +attribute/value pairs passed in the last call to the +XML_StartElementHandler that were specified +in the start-tag rather than defaulted. Each attribute/value pair counts +as 1; thus the number of entries in the array is +XML_GetSpecifiedAttributeCount(parser) / 2. +
+ +
+enum XML_Status XMLCALL
+XML_SetEncoding(XML_Parser p,
+                const XML_Char *encoding);
+
+
+Set the encoding to be used by the parser. It is equivalent to +passing a non-null encoding argument to the parser creation functions. +It must not be called after XML_Parse or XML_ParseBuffer has been called on the given parser. +Returns XML_STATUS_OK on success or +XML_STATUS_ERROR on error. +
+ +
+int XMLCALL
+XML_SetParamEntityParsing(XML_Parser p,
+                          enum XML_ParamEntityParsing code);
+
+
+This enables parsing of parameter entities, including the external +parameter entity that is the external DTD subset, according to +code. +The choices for code are: +
    +
  • XML_PARAM_ENTITY_PARSING_NEVER
  • +
  • XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE
  • +
  • XML_PARAM_ENTITY_PARSING_ALWAYS
  • +
+Note: If XML_SetParamEntityParsing is called after +XML_Parse or XML_ParseBuffer, then it has +no effect and will always return 0. +
+ +
+int XMLCALL
+XML_SetHashSalt(XML_Parser p,
+                unsigned long hash_salt);
+
+
+Sets the hash salt to use for internal hash calculations. +Helps in preventing DoS attacks based on predicting hash +function behavior. In order to have an effect this must be called +before parsing has started. Returns 1 if successful, 0 when called +after XML_Parse or XML_ParseBuffer. +

Note: This call is optional, as the parser will auto-generate a new +random salt value if no value has been set at the start of parsing.

+
+ +
+enum XML_Error XMLCALL
+XML_UseForeignDTD(XML_Parser parser, XML_Bool useDTD);
+
+
+

This function allows an application to provide an external subset +for the document type declaration for documents which do not specify +an external subset of their own. For documents which specify an +external subset in their DOCTYPE declaration, the application-provided +subset will be ignored. If the document does not contain a DOCTYPE +declaration at all and useDTD is true, the +application-provided subset will be parsed, but the +startDoctypeDeclHandler and +endDoctypeDeclHandler functions, if set, will not be +called. The setting of parameter entity parsing, controlled using +XML_SetParamEntityParsing, will be honored.

+ +

The application-provided external subset is read by calling the +external entity reference handler set via XML_SetExternalEntityRefHandler with both +publicId and systemId set to NULL.

+ +

If this function is called after parsing has begun, it returns +XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING and ignores +useDTD. If called when Expat has been compiled without +DTD support, it returns +XML_ERROR_FEATURE_REQUIRES_XML_DTD. Otherwise, it +returns XML_ERROR_NONE.

+ +

Note: For the purpose of checking WFC: Entity Declared, passing +useDTD == XML_TRUE will make the parser behave as if +the document had a DTD with an external subset. This holds true even if +the external entity reference handler returns without action.

+
+ +
+void XMLCALL
+XML_SetReturnNSTriplet(XML_Parser parser,
+                       int        do_nst);
+
+
+

+This function only has an effect when using a parser created with +XML_ParserCreateNS, +i.e. when namespace processing is in effect. The do_nst +argument sets whether or not prefixes are returned with names qualified with a +namespace prefix. If this function is called with do_nst +non-zero, then afterwards namespace-qualified names (that is, names qualified +with a prefix as opposed to belonging to a default namespace) are +returned as a triplet with the three parts separated by the namespace +separator specified when the parser was created. The order of +returned parts is URI, local name, and prefix.

If +do_nst is zero, then names are reported in the +default manner: URI then local name, separated by the namespace +separator.
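A handler that receives such names has to take them apart again. The sketch below shows one way to do that; ns_parts and split_ns_name are hypothetical names, XML_Char is assumed to be char, and the separator is assumed to be a character that cannot occur in the URI or in the names themselves.

```c
#include <stddef.h>
#include <string.h>

/* Pointers into the original name plus lengths; a part that is absent
   has a NULL pointer and a length of 0. */
struct ns_parts {
  const char *uri;    size_t uri_len;
  const char *local;  size_t local_len;
  const char *prefix; size_t prefix_len;
};

/* Split "URI<sep>local<sep>prefix", "URI<sep>local" or plain "local". */
static void
split_ns_name(const char *name, char sep, struct ns_parts *out)
{
  const char *first = strchr(name, sep);
  const char *second;

  out->uri = out->local = out->prefix = NULL;
  out->uri_len = out->local_len = out->prefix_len = 0;

  if (first == NULL) {              /* name is not in any namespace */
    out->local = name;
    out->local_len = strlen(name);
    return;
  }
  out->uri = name;
  out->uri_len = (size_t)(first - name);
  out->local = first + 1;

  second = strchr(first + 1, sep);
  if (second == NULL) {             /* default namespace, or do_nst zero */
    out->local_len = strlen(first + 1);
    return;
  }
  out->local_len = (size_t)(second - (first + 1));
  out->prefix = second + 1;
  out->prefix_len = strlen(second + 1);
}
```

The parts are reported as pointer and length pairs so that the name string handed to the handler is never modified.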

+
+ +
+void XMLCALL
+XML_DefaultCurrent(XML_Parser parser);
+
+
+This can be called within a handler for a start element, end element, +processing instruction or character data. It causes the corresponding +markup to be passed to the default handler set by XML_SetDefaultHandler or +XML_SetDefaultHandlerExpand. It does nothing if there is +not a default handler. +
+ +
+XML_LChar * XMLCALL
+XML_ExpatVersion();
+
+
+Return the library version as a string (e.g. "expat_1.95.1"). +
+ +
+struct XML_Expat_Version XMLCALL
+XML_ExpatVersionInfo();
+
+
+typedef struct {
+  int major;
+  int minor;
+  int micro;
+} XML_Expat_Version;
+
+
+Return the library version information as a structure. +Some macros are also defined that support compile-time tests of the +library version: +
    +
  • XML_MAJOR_VERSION
  • +
  • XML_MINOR_VERSION
  • +
  • XML_MICRO_VERSION
  • +
+Testing these constants is currently the best way to determine if +particular parts of the Expat API are available. +
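The same comparison can be made at run time against the structure returned by XML_ExpatVersionInfo. The helper below is a sketch with a hypothetical name, version_at_least; it takes plain integers so that it compiles without Expat, and the comment shows how the library values would be fed in.

```c
/* Returns 1 when version maj.min.mic is at least
   want_maj.want_min.want_mic, 0 otherwise.

   Run-time use with Expat:
     XML_Expat_Version v = XML_ExpatVersionInfo();
     if (version_at_least(v.major, v.minor, v.micro, 2, 1, 0)) ...

   Compile-time use with the macros:
     version_at_least(XML_MAJOR_VERSION, XML_MINOR_VERSION,
                      XML_MICRO_VERSION, 2, 1, 0)
*/
static int
version_at_least(int maj, int min, int mic,
                 int want_maj, int want_min, int want_mic)
{
  if (maj != want_maj)
    return maj > want_maj;
  if (min != want_min)
    return min > want_min;
  return mic >= want_mic;
}
```

The macros describe the headers the application was compiled against, while the structure describes the library actually loaded, so the two checks can disagree when a shared library is replaced.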
+ +
+const XML_Feature * XMLCALL
+XML_GetFeatureList();
+
+
+enum XML_FeatureEnum {
+  XML_FEATURE_END = 0,
+  XML_FEATURE_UNICODE,
+  XML_FEATURE_UNICODE_WCHAR_T,
+  XML_FEATURE_DTD,
+  XML_FEATURE_CONTEXT_BYTES,
+  XML_FEATURE_MIN_SIZE,
+  XML_FEATURE_SIZEOF_XML_CHAR,
+  XML_FEATURE_SIZEOF_XML_LCHAR,
+  XML_FEATURE_NS,
+  XML_FEATURE_LARGE_SIZE
+};
+
+typedef struct {
+  enum XML_FeatureEnum  feature;
+  XML_LChar            *name;
+  long int              value;
+} XML_Feature;
+
+
+

Returns a list of "feature" records, providing details on how +Expat was configured at compile time. Most applications should not +need to worry about this, but this information is otherwise not +available from Expat. This function allows code that does need to +check these features to do so at runtime.

+ +

The return value is an array of XML_Feature, +terminated by a record with a feature of +XML_FEATURE_END and name of NULL, +identifying the feature-test macros Expat was compiled with. Since an +application that requires this kind of information needs to determine +the type of character the name points to, records for the +XML_FEATURE_SIZEOF_XML_CHAR and +XML_FEATURE_SIZEOF_XML_LCHAR will be located at the +beginning of the list, followed by XML_FEATURE_UNICODE +and XML_FEATURE_UNICODE_WCHAR_T, if they are present at +all.

+ +

Some features have an associated value. If there isn't an +associated value, the value field is set to 0. At this +time, the following features have been defined to have values:

+ +
+
XML_FEATURE_SIZEOF_XML_CHAR
+
The number of bytes occupied by one XML_Char + character.
+
XML_FEATURE_SIZEOF_XML_LCHAR
+
The number of bytes occupied by one XML_LChar + character.
+
XML_FEATURE_CONTEXT_BYTES
+
The maximum number of characters of context which can be + reported by XML_GetInputContext.
+
+
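Walking the returned array might look like the sketch below. So that the fragment compiles on its own, it carries local stand-ins modeled on the declarations above (with XML_LChar assumed to be char); real code would include expat.h instead and pass the result of XML_GetFeatureList. The helper name find_feature is hypothetical.

```c
#include <stddef.h>

/* Local stand-ins for the declarations in expat.h (assumption:
   XML_LChar is char).  Drop these and #include <expat.h> in real code. */
typedef char XML_LChar;

enum XML_FeatureEnum {
  XML_FEATURE_END = 0,
  XML_FEATURE_UNICODE,
  XML_FEATURE_UNICODE_WCHAR_T,
  XML_FEATURE_DTD,
  XML_FEATURE_CONTEXT_BYTES,
  XML_FEATURE_MIN_SIZE,
  XML_FEATURE_SIZEOF_XML_CHAR,
  XML_FEATURE_SIZEOF_XML_LCHAR,
  XML_FEATURE_NS,
  XML_FEATURE_LARGE_SIZE
};

typedef struct {
  enum XML_FeatureEnum  feature;
  const XML_LChar      *name;
  long int              value;
} XML_Feature;

/* Scan the list up to the XML_FEATURE_END record.  Returns 1 and stores
   the associated value in *value (if non-NULL) when wanted is present,
   0 when the library was built without that feature. */
static int
find_feature(const XML_Feature *list, enum XML_FeatureEnum wanted,
             long *value)
{
  const XML_Feature *f;

  for (f = list; f->feature != XML_FEATURE_END; f++) {
    if (f->feature == wanted) {
      if (value != NULL)
        *value = f->value;
      return 1;
    }
  }
  return 0;
}
```

With the library, a call such as find_feature(XML_GetFeatureList(), XML_FEATURE_CONTEXT_BYTES, &n) would tell an application whether XML_GetInputContext can return anything at all.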
+ +
+void XMLCALL
+XML_FreeContentModel(XML_Parser parser, XML_Content *model);
+
+
+Function to deallocate the model argument passed to the +XML_ElementDeclHandler callback set using XML_SetElementDeclHandler. +This function should not be used for any other purpose. +
+ +

The following functions allow external code to share the memory +allocator an XML_Parser has been configured to use. This +is especially useful for third-party libraries that interact with a +parser object created by application code, or heavily layered +applications. This can be essential when using dynamically loaded +libraries which use different C standard libraries (this can happen on +Windows, at least).

+ +
+void * XMLCALL
+XML_MemMalloc(XML_Parser parser, size_t size);
+
+
+Allocate size bytes of memory using the allocator the +parser object has been configured to use. Returns a +pointer to the memory or NULL on failure. Memory allocated in this +way must be freed using XML_MemFree. +
+ +
+void * XMLCALL
+XML_MemRealloc(XML_Parser parser, void *ptr, size_t size);
+
+
+Allocate size bytes of memory using the allocator the +parser object has been configured to use. +ptr must point to a block of memory allocated by XML_MemMalloc or +XML_MemRealloc, or be NULL. This function tries to +expand the block pointed to by ptr if possible. Returns +a pointer to the memory or NULL on failure. On success, the original +block has either been expanded or freed. On failure, the original +block has not been freed; the caller is responsible for freeing the +original block. Memory allocated in this way must be freed using +XML_MemFree. +
+ +
+void XMLCALL
+XML_MemFree(XML_Parser parser, void *ptr);
+
+
+Free a block of memory pointed to by ptr. The block must +have been allocated by XML_MemMalloc or XML_MemRealloc, or be NULL. +
+ +
+

Valid XHTML 1.0!

+
+ + diff --git a/expat/doc/style.css b/expat/doc/style.css new file mode 100644 index 000000000..69df30bce --- /dev/null +++ b/expat/doc/style.css @@ -0,0 +1,101 @@ +body { + background-color: white; + border: 0px; + margin: 0px; + padding: 0px; +} + +.corner { + width: 200px; + height: 80px; + text-align: center; +} + +.banner { + background-color: rgb(110,139,61); + color: rgb(255,236,176); + padding-left: 2em; +} + +.banner h1 { + font-size: 200%; +} + +.content { + padding: 0em 2em 1em 2em; +} + +.releaseno { + background-color: rgb(110,139,61); + color: rgb(255,236,176); + padding-bottom: 0.3em; + padding-top: 0.5em; + text-align: center; + font-weight: bold; +} + +.noborder { + border-width: 0px; +} + +.eg { + padding-left: 1em; + padding-top: .5em; + padding-bottom: .5em; + border: solid thin; + margin: 1em 0; + background-color: tan; + margin-left: 2em; + margin-right: 10%; +} + +.pseudocode { + padding-left: 1em; + padding-top: .5em; + padding-bottom: .5em; + border: solid thin; + margin: 1em 0; + background-color: rgb(250,220,180); + margin-left: 2em; + margin-right: 10%; +} + +.handler { + width: 100%; + border-top-width: thin; + margin-bottom: 1em; +} + +.handler p { + margin-left: 2em; +} + +.setter { + font-weight: bold; +} + +.signature { + color: navy; +} + +.fcndec { + width: 100%; + border-top-width: thin; + font-weight: bold; +} + +.fcndef { + margin-left: 2em; + margin-bottom: 2em; +} + +dd { + margin-bottom: 2em; +} + +.cpp-symbols dt { + font-family: monospace; +} +.cpp-symbols dd { + margin-bottom: 1em; +} diff --git a/expat/doc/valid-xhtml10.png b/expat/doc/valid-xhtml10.png new file mode 100644 index 000000000..4c23f48fe Binary files /dev/null and b/expat/doc/valid-xhtml10.png differ diff --git a/expat/doc/xmlwf.1 b/expat/doc/xmlwf.1 new file mode 100644 index 000000000..174719a70 --- /dev/null +++ b/expat/doc/xmlwf.1 @@ -0,0 +1,251 @@ +.\" This manpage has been automatically generated by docbook2man +.\" from a DocBook document. 
This tool can be found at: +.\" +.\" Please send any bug reports, improvements, comments, patches, +.\" etc. to Steve Cheng . +.TH "XMLWF" "1" "24 January 2003" "" "" +.SH NAME +xmlwf \- Determines if an XML document is well-formed +.SH SYNOPSIS + +\fBxmlwf\fR [ \fB-s\fR] [ \fB-n\fR] [ \fB-p\fR] [ \fB-x\fR] [ \fB-e \fIencoding\fB\fR] [ \fB-w\fR] [ \fB-d \fIoutput-dir\fB\fR] [ \fB-c\fR] [ \fB-m\fR] [ \fB-r\fR] [ \fB-t\fR] [ \fB-v\fR] [ \fBfile ...\fR] + +.SH "DESCRIPTION" +.PP +\fBxmlwf\fR uses the Expat library to +determine if an XML document is well-formed. It is +non-validating. +.PP +If you do not specify any files on the command-line, and you +have a recent version of \fBxmlwf\fR, the +input file will be read from standard input. +.SH "WELL-FORMED DOCUMENTS" +.PP +A well-formed document must adhere to the +following rules: +.TP 0.2i +\(bu +The file begins with an XML declaration. For instance, +<?xml version="1.0" standalone="yes"?>. +\fBNOTE:\fR +\fBxmlwf\fR does not currently +check for a valid XML declaration. +.TP 0.2i +\(bu +Every start tag is either empty (<tag/>) +or has a corresponding end tag. +.TP 0.2i +\(bu +There is exactly one root element. This element must contain +all other elements in the document. Only comments, white +space, and processing instructions may come after the close +of the root element. +.TP 0.2i +\(bu +All elements nest properly. +.TP 0.2i +\(bu +All attribute values are enclosed in quotes (either single +or double). +.PP +If the document has a DTD, and it strictly complies with that +DTD, then the document is also considered \fBvalid\fR. +\fBxmlwf\fR is a non-validating parser -- +it does not check the DTD. However, it does support +external entities (see the \fB-x\fR option). +.SH "OPTIONS" +.PP +When an option includes an argument, you may specify the argument either +separately ("\fB-d\fR output") or concatenated with the +option ("\fB-d\fRoutput"). \fBxmlwf\fR +supports both. 
+.TP +\fB-c\fR +If the input file is well-formed and \fBxmlwf\fR +doesn't encounter any errors, the input file is simply copied to +the output directory unchanged. +This implies no namespaces (turns off \fB-n\fR) and +requires \fB-d\fR to specify an output file. +.TP +\fB-d output-dir\fR +Specifies a directory to contain transformed +representations of the input files. +By default, \fB-d\fR outputs a canonical representation +(described below). +You can select different output formats using \fB-c\fR +and \fB-m\fR. + +The output filenames will +be exactly the same as the input filenames or "STDIN" if the input is +coming from standard input. Therefore, you must be careful that the +output file does not go into the same directory as the input +file. Otherwise, \fBxmlwf\fR will delete the +input file before it generates the output file (just like running +cat < file > file in most shells). + +Two structurally equivalent XML documents have a byte-for-byte +identical canonical XML representation. +Note that ignorable white space is considered significant and +is treated equivalently to data. +More on canonical XML can be found at +http://www.jclark.com/xml/canonxml.html . +.TP +\fB-e encoding\fR +Specifies the character encoding for the document, overriding +any document encoding declaration. \fBxmlwf\fR +supports four built-in encodings: +US-ASCII, +UTF-8, +UTF-16, and +ISO-8859-1. +Also see the \fB-w\fR option. +.TP +\fB-m\fR +Outputs some strange sort of XML file that completely +describes the input file, including character positions. +Requires \fB-d\fR to specify an output file. +.TP +\fB-n\fR +Turns on namespace processing. (describe namespaces) +\fB-c\fR disables namespaces. +.TP +\fB-p\fR +Tells xmlwf to process external DTDs and parameter +entities. + +Normally \fBxmlwf\fR never parses parameter +entities. \fB-p\fR tells it to always parse them. +\fB-p\fR implies \fB-x\fR. 
+.TP +\fB-r\fR +Normally \fBxmlwf\fR memory-maps the XML file +before parsing; this can result in faster parsing on many +platforms. +\fB-r\fR turns off memory-mapping and uses normal file +IO calls instead. +Of course, memory-mapping is automatically turned off +when reading from standard input. + +Use of memory-mapping can cause some platforms to report +substantially higher memory usage for +\fBxmlwf\fR, but this appears to be a matter of +the operating system reporting memory in a strange way; there is +not a leak in \fBxmlwf\fR. +.TP +\fB-s\fR +Prints an error if the document is not standalone. +A document is standalone if it has no external subset and no +references to parameter entities. +.TP +\fB-t\fR +Turns on timings. This tells Expat to parse the entire file, +but not perform any processing. +This gives a fairly accurate idea of the raw speed of Expat itself +without client overhead. +\fB-t\fR turns off most of the output options +(\fB-d\fR, \fB-m\fR, \fB-c\fR, +\&...). +.TP +\fB-v\fR +Prints the version of the Expat library being used, including some +information on the compile-time configuration of the library, and +then exits. +.TP +\fB-w\fR +Enables support for Windows code pages. +Normally, \fBxmlwf\fR will throw an error if it +runs across an encoding that it is not equipped to handle itself. With +\fB-w\fR, xmlwf will try to use a Windows code +page. See also \fB-e\fR. +.TP +\fB-x\fR +Turns on parsing external entities. + +Non-validating parsers are not required to resolve external +entities, or even expand entities at all. +Expat always expands internal entities (?), +but external entity parsing must be enabled explicitly. + +External entities are simply entities that obtain their +data from outside the XML file currently being parsed. + +This is an example of an internal entity: + +.nf +<!ENTITY vers '1.0.2'> +.fi + +And here are some examples of external entities: + +.nf +<!ENTITY header SYSTEM "header-&vers;.xml"> (parsed) +<!ENTITY logo SYSTEM "logo.png" PNG> (unparsed) +.fi +.TP +\fB--\fR +(Two hyphens.) +Terminates the list of options. 
This is only needed if a filename +starts with a hyphen. For example: + +.nf +xmlwf -- -myfile.xml +.fi + +will run \fBxmlwf\fR on the file +\fI-myfile.xml\fR. +.PP +Older versions of \fBxmlwf\fR do not support +reading from standard input. +.SH "OUTPUT" +.PP +If an input file is not well-formed, +\fBxmlwf\fR prints a single line describing +the problem to standard output. If a file is well formed, +\fBxmlwf\fR outputs nothing. +Note that the result code is \fBnot\fR set. +.SH "BUGS" +.PP +According to the W3C standard, an XML file without a +declaration at the beginning is not considered well-formed. +However, \fBxmlwf\fR allows this to pass. +.PP +\fBxmlwf\fR returns a 0 - noerr result, +even if the file is not well-formed. There is no good way for +a program to use \fBxmlwf\fR to quickly +check a file -- it must parse \fBxmlwf\fR's +standard output. +.PP +The errors should go to standard error, not standard output. +.PP +There should be a way to get \fB-d\fR to send its +output to standard output rather than forcing the user to send +it to a file. +.PP +I have no idea why anyone would want to use the +\fB-d\fR, \fB-c\fR, and +\fB-m\fR options. If someone could explain it to +me, I'd like to add this information to this manpage. +.SH "ALTERNATIVES" +.PP +Here are some XML validators on the web: + +.nf +http://www.hcrc.ed.ac.uk/~richard/xml-check.html +http://www.stg.brown.edu/service/xmlvalid/ +http://www.scripting.com/frontier5/xml/code/xmlValidator.html +http://www.xml.com/pub/a/tools/ruwf/check.html +.fi +.SH "SEE ALSO" +.PP + +.nf +The Expat home page: http://www.libexpat.org/ +The W3 XML specification: http://www.w3.org/TR/REC-xml +.fi +.SH "AUTHOR" +.PP +This manual page was written by Scott Bronson for +the Debian GNU/Linux system (but may be used by others). Permission is +granted to copy, distribute and/or modify this document under +the terms of the GNU Free Documentation +License, Version 1.1. 
diff --git a/expat/doc/xmlwf.sgml b/expat/doc/xmlwf.sgml new file mode 100644 index 000000000..313cfbcb2 --- /dev/null +++ b/expat/doc/xmlwf.sgml @@ -0,0 +1,468 @@ + manpage.1'. You may view + the manual page with: `docbook-to-man manpage.sgml | nroff -man | + less'. A typical entry in a Makefile or Makefile.am is: + +manpage.1: manpage.sgml + docbook-to-man $< > $@ + --> + + + Scott"> + Bronson"> + + December 5, 2001"> + + 1"> + bronson@rinspin.com"> + + XMLWF"> + + + Debian GNU/Linux"> + GNU"> +]> + + + +
+ &dhemail; +
+ + &dhfirstname; + &dhsurname; + + + 2001 + &dhusername; + + &dhdate; +
+ + &dhucpackage; + + &dhsection; + + + &dhpackage; + + Determines if an XML document is well-formed + + + + &dhpackage; + + + + + + + + + + + + + + + + + + file ... + + + + + DESCRIPTION + + + &dhpackage; uses the Expat library to + determine if an XML document is well-formed. It is + non-validating. + + + + If you do not specify any files on the command-line, and you + have a recent version of &dhpackage;, the + input file will be read from standard input. + + + + + + WELL-FORMED DOCUMENTS + + + A well-formed document must adhere to the + following rules: + + + + + The file begins with an XML declaration. For instance, + <?xml version="1.0" standalone="yes"?>. + NOTE: + &dhpackage; does not currently + check for a valid XML declaration. + + + Every start tag is either empty (<tag/>) + or has a corresponding end tag. + + + There is exactly one root element. This element must contain + all other elements in the document. Only comments, white + space, and processing instructions may come after the close + of the root element. + + + All elements nest properly. + + + All attribute values are enclosed in quotes (either single + or double). + + + + + If the document has a DTD, and it strictly complies with that + DTD, then the document is also considered valid. + &dhpackage; is a non-validating parser -- + it does not check the DTD. However, it does support + external entities (see the option). + + + + + OPTIONS + + +When an option includes an argument, you may specify the argument either +separately (" output") or concatenated with the +option ("output"). &dhpackage; +supports both. + + + + + + + + + If the input file is well-formed and &dhpackage; + doesn't encounter any errors, the input file is simply copied to + the output directory unchanged. + This implies no namespaces (turns off ) and + requires to specify an output file. + + + + + + + + + Specifies a directory to contain transformed + representations of the input files. 
+ By default, outputs a canonical representation + (described below). + You can select different output formats using + and . + + + The output filenames will + be exactly the same as the input filenames or "STDIN" if the input is + coming from standard input. Therefore, you must be careful that the + output file does not go into the same directory as the input + file. Otherwise, &dhpackage; will delete the + input file before it generates the output file (just like running + cat < file > file in most shells). + + + Two structurally equivalent XML documents have a byte-for-byte + identical canonical XML representation. + Note that ignorable white space is considered significant and + is treated equivalently to data. + More on canonical XML can be found at + http://www.jclark.com/xml/canonxml.html . + + + + + + + + + Specifies the character encoding for the document, overriding + any document encoding declaration. &dhpackage; + supports four built-in encodings: + US-ASCII, + UTF-8, + UTF-16, and + ISO-8859-1. + Also see the option. + + + + + + + + + Outputs some strange sort of XML file that completely + describes the input file, including character positions. + Requires to specify an output file. + + + + + + + + + Turns on namespace processing. (describe namespaces) + disables namespaces. + + + + + + + + + Tells xmlwf to process external DTDs and parameter + entities. + + + Normally &dhpackage; never parses parameter + entities. tells it to always parse them. + implies . + + + + + + + + + Normally &dhpackage; memory-maps the XML file + before parsing; this can result in faster parsing on many + platforms. + turns off memory-mapping and uses normal file + IO calls instead. + Of course, memory-mapping is automatically turned off + when reading from standard input. 
+ + + Use of memory-mapping can cause some platforms to report + substantially higher memory usage for + &dhpackage;, but this appears to be a matter of + the operating system reporting memory in a strange way; there is + not a leak in &dhpackage;. + + + + + + + + + Prints an error if the document is not standalone. + A document is standalone if it has no external subset and no + references to parameter entities. + + + + + + + + + Turns on timings. This tells Expat to parse the entire file, + but not perform any processing. + This gives a fairly accurate idea of the raw speed of Expat itself + without client overhead. + turns off most of the output options + (, , , + ...). + + + + + + + + + Prints the version of the Expat library being used, including some + information on the compile-time configuration of the library, and + then exits. + + + + + + + + + Enables support for Windows code pages. + Normally, &dhpackage; will throw an error if it + runs across an encoding that it is not equipped to handle itself. With + , &dhpackage; will try to use a Windows code + page. See also . + + + + + + + + + Turns on parsing external entities. + + + Non-validating parsers are not required to resolve external + entities, or even expand entities at all. + Expat always expands internal entities (?), + but external entity parsing must be enabled explicitly. + + + External entities are simply entities that obtain their + data from outside the XML file currently being parsed. + + + This is an example of an internal entity: + +<!ENTITY vers '1.0.2'> + + + + And here are some examples of external entities: + + +<!ENTITY header SYSTEM "header-&vers;.xml"> (parsed) +<!ENTITY logo SYSTEM "logo.png" PNG> (unparsed) + + + + + + + + + + + (Two hyphens.) + Terminates the list of options. This is only needed if a filename + starts with a hyphen. For example: + + +&dhpackage; -- -myfile.xml + + + will run &dhpackage; on the file + -myfile.xml. 
+ + + + + + + Older versions of &dhpackage; do not support + reading from standard input. + + + + + OUTPUT + + If an input file is not well-formed, + &dhpackage; prints a single line describing + the problem to standard output. If a file is well formed, + &dhpackage; outputs nothing. + Note that the result code is not set. + + + + + BUGS + + &dhpackage; returns a 0 - noerr result, + even if the file is not well-formed. There is no good way for + a program to use &dhpackage; to quickly + check a file -- it must parse &dhpackage;'s + standard output. + + + The errors should go to standard error, not standard output. + + + There should be a way to get to send its + output to standard output rather than forcing the user to send + it to a file. + + + I have no idea why anyone would want to use the + , , and + options. If someone could explain it to + me, I'd like to add this information to this manpage. + + + + + ALTERNATIVES + + Here are some XML validators on the web: + + +http://www.hcrc.ed.ac.uk/~richard/xml-check.html +http://www.stg.brown.edu/service/xmlvalid/ +http://www.scripting.com/frontier5/xml/code/xmlValidator.html +http://www.xml.com/pub/a/tools/ruwf/check.html + + + + + + + SEE ALSO + + + +The Expat home page: http://www.libexpat.org/ +The W3 XML specification: http://www.w3.org/TR/REC-xml + + + + + + + AUTHOR + + This manual page was written by &dhusername; &dhemail; for + the &debian; system (but may be used by others). Permission is + granted to copy, distribute and/or modify this document under + the terms of the GNU Free Documentation + License, Version 1.1. + + +
+ +