diff options
Diffstat (limited to 'expat/doc/xmlwf.1')
-rw-r--r-- | expat/doc/xmlwf.1 | 251 |
1 files changed, 251 insertions, 0 deletions
diff --git a/expat/doc/xmlwf.1 b/expat/doc/xmlwf.1 new file mode 100644 index 000000000..174719a70 --- /dev/null +++ b/expat/doc/xmlwf.1 @@ -0,0 +1,251 @@ +.\" This manpage has been automatically generated by docbook2man +.\" from a DocBook document. This tool can be found at: +.\" <http://shell.ipoline.com/~elmert/comp/docbook2X/> +.\" Please send any bug reports, improvements, comments, patches, +.\" etc. to Steve Cheng <steve@ggi-project.org>. +.TH "XMLWF" "1" "24 January 2003" "" "" +.SH NAME +xmlwf \- Determines if an XML document is well-formed +.SH SYNOPSIS + +\fBxmlwf\fR [ \fB-s\fR] [ \fB-n\fR] [ \fB-p\fR] [ \fB-x\fR] [ \fB-e \fIencoding\fB\fR] [ \fB-w\fR] [ \fB-d \fIoutput-dir\fB\fR] [ \fB-c\fR] [ \fB-m\fR] [ \fB-r\fR] [ \fB-t\fR] [ \fB-v\fR] [ \fBfile ...\fR] + +.SH "DESCRIPTION" +.PP +\fBxmlwf\fR uses the Expat library to +determine if an XML document is well-formed. It is +non-validating. +.PP +If you do not specify any files on the command-line, and you +have a recent version of \fBxmlwf\fR, the +input file will be read from standard input. +.SH "WELL-FORMED DOCUMENTS" +.PP +A well-formed document must adhere to the +following rules: +.TP 0.2i +\(bu +The file begins with an XML declaration. For instance, +<?xml version="1.0" standalone="yes"?>. +\fBNOTE:\fR +\fBxmlwf\fR does not currently +check for a valid XML declaration. +.TP 0.2i +\(bu +Every start tag is either empty (<tag/>) +or has a corresponding end tag. +.TP 0.2i +\(bu +There is exactly one root element. This element must contain +all other elements in the document. Only comments, white +space, and processing instructions may come after the close +of the root element. +.TP 0.2i +\(bu +All elements nest properly. +.TP 0.2i +\(bu +All attribute values are enclosed in quotes (either single +or double). +.PP +If the document has a DTD, and it strictly complies with that +DTD, then the document is also considered \fBvalid\fR. +\fBxmlwf\fR is a non-validating parser -- +it does not check the DTD. However, it does support +external entities (see the \fB-x\fR option). +.SH "OPTIONS" +.PP +When an option includes an argument, you may specify the argument either +separately ("\fB-d\fR output") or concatenated with the +option ("\fB-d\fRoutput"). \fBxmlwf\fR +supports both. +.TP +\fB-c\fR +If the input file is well-formed and \fBxmlwf\fR +doesn't encounter any errors, the input file is simply copied to +the output directory unchanged. +This implies no namespaces (turns off \fB-n\fR) and +requires \fB-d\fR to specify an output file. +.TP +\fB-d output-dir\fR +Specifies a directory to contain transformed +representations of the input files. +By default, \fB-d\fR outputs a canonical representation +(described below). +You can select different output formats using \fB-c\fR +and \fB-m\fR. + +The output filenames will +be exactly the same as the input filenames or "STDIN" if the input is +coming from standard input. Therefore, you must be careful that the +output file does not go into the same directory as the input +file. Otherwise, \fBxmlwf\fR will delete the +input file before it generates the output file (just like running +cat < file > file in most shells). + +Two structurally equivalent XML documents have a byte-for-byte +identical canonical XML representation. +Note that ignorable white space is considered significant and +is treated equivalently to data. +More on canonical XML can be found at +http://www.jclark.com/xml/canonxml.html . +.TP +\fB-e encoding\fR +Specifies the character encoding for the document, overriding +any document encoding declaration. \fBxmlwf\fR +supports four built-in encodings: +US-ASCII, +UTF-8, +UTF-16, and +ISO-8859-1. +Also see the \fB-w\fR option. +.TP +\fB-m\fR +Outputs some strange sort of XML file that completely +describes the input file, including character positions. +Requires \fB-d\fR to specify an output file. +.TP +\fB-n\fR +Turns on namespace processing. (describe namespaces) +\fB-c\fR disables namespaces. +.TP +\fB-p\fR +Tells xmlwf to process external DTDs and parameter +entities. + +Normally \fBxmlwf\fR never parses parameter +entities. \fB-p\fR tells it to always parse them. +\fB-p\fR implies \fB-x\fR. +.TP +\fB-r\fR +Normally \fBxmlwf\fR memory-maps the XML file +before parsing; this can result in faster parsing on many +platforms. +\fB-r\fR turns off memory-mapping and uses normal file +IO calls instead. +Of course, memory-mapping is automatically turned off +when reading from standard input. + +Use of memory-mapping can cause some platforms to report +substantially higher memory usage for +\fBxmlwf\fR, but this appears to be a matter of +the operating system reporting memory in a strange way; there is +not a leak in \fBxmlwf\fR. +.TP +\fB-s\fR +Prints an error if the document is not standalone. +A document is standalone if it has no external subset and no +references to parameter entities. +.TP +\fB-t\fR +Turns on timings. This tells Expat to parse the entire file, +but not perform any processing. +This gives a fairly accurate idea of the raw speed of Expat itself +without client overhead. +\fB-t\fR turns off most of the output options +(\fB-d\fR, \fB-m\fR, \fB-c\fR, +\&...). +.TP +\fB-v\fR +Prints the version of the Expat library being used, including some +information on the compile-time configuration of the library, and +then exits. +.TP +\fB-w\fR +Enables support for Windows code pages. +Normally, \fBxmlwf\fR will throw an error if it +runs across an encoding that it is not equipped to handle itself. With +\fB-w\fR, xmlwf will try to use a Windows code +page. See also \fB-e\fR. +.TP +\fB-x\fR +Turns on parsing external entities. + +Non-validating parsers are not required to resolve external +entities, or even expand entities at all. +Expat always expands internal entities (?), +but external entity parsing must be enabled explicitly. + +External entities are simply entities that obtain their +data from outside the XML file currently being parsed. + +This is an example of an internal entity: + +.nf +<!ENTITY vers '1.0.2'> +.fi + +And here are some examples of external entities: + +.nf +<!ENTITY header SYSTEM "header-&vers;.xml"> (parsed) +<!ENTITY logo SYSTEM "logo.png" PNG> (unparsed) +.fi +.TP +\fB--\fR +(Two hyphens.) +Terminates the list of options. This is only needed if a filename +starts with a hyphen. For example: + +.nf +xmlwf -- -myfile.xml +.fi + +will run \fBxmlwf\fR on the file +\fI-myfile.xml\fR. +.PP +Older versions of \fBxmlwf\fR do not support +reading from standard input. +.SH "OUTPUT" +.PP +If an input file is not well-formed, +\fBxmlwf\fR prints a single line describing +the problem to standard output. If a file is well formed, +\fBxmlwf\fR outputs nothing. +Note that the result code is \fBnot\fR set. +.SH "BUGS" +.PP +According to the W3C standard, an XML file without a +declaration at the beginning is not considered well-formed. +However, \fBxmlwf\fR allows this to pass. +.PP +\fBxmlwf\fR returns a 0 - noerr result, +even if the file is not well-formed. There is no good way for +a program to use \fBxmlwf\fR to quickly +check a file -- it must parse \fBxmlwf\fR's +standard output. +.PP +The errors should go to standard error, not standard output. +.PP +There should be a way to get \fB-d\fR to send its +output to standard output rather than forcing the user to send +it to a file. +.PP +I have no idea why anyone would want to use the +\fB-d\fR, \fB-c\fR, and +\fB-m\fR options. If someone could explain it to +me, I'd like to add this information to this manpage. +.SH "ALTERNATIVES" +.PP +Here are some XML validators on the web: + +.nf +http://www.hcrc.ed.ac.uk/~richard/xml-check.html +http://www.stg.brown.edu/service/xmlvalid/ +http://www.scripting.com/frontier5/xml/code/xmlValidator.html +http://www.xml.com/pub/a/tools/ruwf/check.html +.fi +.SH "SEE ALSO" +.PP + +.nf +The Expat home page: http://www.libexpat.org/ +The W3 XML specification: http://www.w3.org/TR/REC-xml +.fi +.SH "AUTHOR" +.PP +This manual page was written by Scott Bronson <bronson@rinspin.com> for +the Debian GNU/Linux system (but may be used by others). Permission is +granted to copy, distribute and/or modify this document under +the terms of the GNU Free Documentation +License, Version 1.1. |