Diffstat (limited to 'libjava/classpath/gnu/xml/pipeline')
-rw-r--r-- | libjava/classpath/gnu/xml/pipeline/CallFilter.java | 257
-rw-r--r-- | libjava/classpath/gnu/xml/pipeline/DomConsumer.java | 967
-rw-r--r-- | libjava/classpath/gnu/xml/pipeline/EventConsumer.java | 95
-rw-r--r-- | libjava/classpath/gnu/xml/pipeline/EventFilter.java | 796
-rw-r--r-- | libjava/classpath/gnu/xml/pipeline/LinkFilter.java | 242
-rw-r--r-- | libjava/classpath/gnu/xml/pipeline/NSFilter.java | 341
-rw-r--r-- | libjava/classpath/gnu/xml/pipeline/PipelineFactory.java | 723
-rw-r--r-- | libjava/classpath/gnu/xml/pipeline/TeeConsumer.java | 417
-rw-r--r-- | libjava/classpath/gnu/xml/pipeline/TextConsumer.java | 117
-rw-r--r-- | libjava/classpath/gnu/xml/pipeline/ValidationConsumer.java | 1928
-rw-r--r-- | libjava/classpath/gnu/xml/pipeline/WellFormednessFilter.java | 363
-rw-r--r-- | libjava/classpath/gnu/xml/pipeline/XIncludeFilter.java | 579
-rw-r--r-- | libjava/classpath/gnu/xml/pipeline/XsltFilter.java | 130
-rw-r--r-- | libjava/classpath/gnu/xml/pipeline/package.html | 255
14 files changed, 7210 insertions, 0 deletions
diff --git a/libjava/classpath/gnu/xml/pipeline/CallFilter.java b/libjava/classpath/gnu/xml/pipeline/CallFilter.java new file mode 100644 index 000000000..2398b8685 --- /dev/null +++ b/libjava/classpath/gnu/xml/pipeline/CallFilter.java @@ -0,0 +1,257 @@ +/* CallFilter.java -- + Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING. If not, write to the +Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA +02110-1301 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library. Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module. An independent module is a module which is not derived from +or based on this library. If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so. 
If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import java.io.IOException; +import java.io.OutputStreamWriter; +import java.net.URL; +import java.net.URLConnection; +import java.io.Writer; + +import org.xml.sax.DTDHandler; +import org.xml.sax.ErrorHandler; +import org.xml.sax.InputSource; +import org.xml.sax.SAXException; +import org.xml.sax.SAXNotRecognizedException; +import org.xml.sax.XMLReader; +import org.xml.sax.helpers.XMLReaderFactory; + +import gnu.xml.util.Resolver; +import gnu.xml.util.XMLWriter; + + +/** + * Input is sent as an XML request to given URI, and the output of this + * filter is the parsed response to that request. + * A connection is opened to the remote URI when the startDocument call is + * issued through this filter, and the request is finished when the + * endDocument call is issued. Events should be written quickly enough to + * prevent the remote HTTP server from aborting the connection due to + * inactivity; you may want to buffer text in an earlier pipeline stage. + * If your application requires validity checking of such + * outputs, have the output pipeline include a validation stage. + * + * <p>In effect, this makes a remote procedure call to the URI, with the + * request and response document syntax as chosen by the application. + * <em>Note that all the input events must be seen, and sent to the URI, + * before the first output event can be seen. </em> Clients are delayed + * at least by waiting for the server to respond, constraining concurrency. + * Services can thus be used to synchronize concurrent activities, and + * even to prioritize service among different clients. + * + * <p> You are advised to avoid restricting yourself to an "RPC" model + * for distributed computation. With a World Wide Web, network latencies + * and failures (e.g. 
non-availability) + * are significant; adopting a "procedure" model, rather than a workflow + * model where bulk requests are sent and worked on asynchronously, is not + * generally an optimal system-wide architecture. When the messages may + * need authentication, such as with an OpenPGP signature, or when server + * loads don't argue in favor of immediate responses, non-RPC models can + * be advantageous. (So-called "peer to peer" computing models are one + * additional type of model, though too often that term is applied to + * systems that still have a centralized control structure.) + * + * <p> <em>Be strict in what you send, liberal in what you accept,</em> as + * the Internet tradition goes. Strictly conformant data should never cause + * problems to its receiver; make your request pipeline be very strict, and + * don't compromise on that. Make your response pipeline strict as well, + * but be ready to tolerate specific mild, temporary, and well-documented + * variations from specific communications peers. + * + * @see XmlServlet + * + * @author David Brownell + */ +final public class CallFilter implements EventConsumer +{ + private Requestor req; + private EventConsumer next; + private URL target; + private URLConnection conn; + private ErrorHandler errHandler; + + + /** + * Initializes a call filter so that its inputs are sent to the + * specified URI, and its outputs are sent to the next consumer + * provided. + * + * @exception IOException if the URI isn't accepted as a URL + */ + // constructor used by PipelineFactory + public CallFilter (String uri, EventConsumer next) + throws IOException + { + this.next = next; + req = new Requestor (); + setCallTarget (uri); + } + + /** + * Assigns the URI of the call target to be used. + * Does not affect calls currently being made. 
+ */ + final public void setCallTarget (String uri) + throws IOException + { + target = new URL (uri); + } + + /** + * Assigns the error handler to be used to present most fatal + * errors. + */ + public void setErrorHandler (ErrorHandler handler) + { + req.setErrorHandler (handler); + } + + + /** + * Returns the call target's URI. + */ + final public String getCallTarget () + { + return target.toString (); + } + + /** Returns the content handler currently in use. */ + final public org.xml.sax.ContentHandler getContentHandler () + { + return req; + } + + /** Returns the DTD handler currently in use. */ + final public DTDHandler getDTDHandler () + { + return req; + } + + + /** + * Returns the declaration or lexical handler currently in + * use, or throws an exception for other properties. + */ + final public Object getProperty (String id) + throws SAXNotRecognizedException + { + if (EventFilter.DECL_HANDLER.equals (id)) + return req; + if (EventFilter.LEXICAL_HANDLER.equals (id)) + return req; + throw new SAXNotRecognizedException (id); + } + + + // JDK 1.1 seems to need it to be done this way, sigh + ErrorHandler getErrorHandler () { return errHandler; } + + // + // Takes input and echoes to server as POST input. + // Then sends the POST reply to the next pipeline element. + // + final class Requestor extends XMLWriter + { + Requestor () + { + super ((Writer)null); + } + + public synchronized void startDocument () throws SAXException + { + // Connect to remote object and set up to send it XML text + try { + if (conn != null) + throw new IllegalStateException ("call is being made"); + + conn = target.openConnection (); + conn.setDoOutput (true); + conn.setRequestProperty ("Content-Type", + "application/xml;charset=UTF-8"); + + setWriter (new OutputStreamWriter ( + conn.getOutputStream (), + "UTF8"), "UTF-8"); + + } catch (IOException e) { + fatal ("can't write (POST) to URI: " + target, e); + } + + // NOW base class can safely write that text! 
+ super.startDocument (); + } + + public void endDocument () throws SAXException + { + // + // Finish writing the request (for HTTP, a POST); + // this closes the output stream. + // + super.endDocument (); + + // + // Receive the response. + // Produce events for the next stage. + // + InputSource source; + XMLReader producer; + String encoding; + + try { + + source = new InputSource (conn.getInputStream ()); + +// FIXME if status is anything but success, report it!! It'd be good to +// save the request data just in case we need to deal with a forward. + + encoding = Resolver.getEncoding (conn.getContentType ()); + if (encoding != null) + source.setEncoding (encoding); + + producer = XMLReaderFactory.createXMLReader (); + producer.setErrorHandler (getErrorHandler ()); + EventFilter.bind (producer, next); + producer.parse (source); + conn = null; + + } catch (IOException e) { + fatal ("I/O Exception reading response, " + e.getMessage (), e); + } + } + } +} diff --git a/libjava/classpath/gnu/xml/pipeline/DomConsumer.java b/libjava/classpath/gnu/xml/pipeline/DomConsumer.java new file mode 100644 index 000000000..141f36eca --- /dev/null +++ b/libjava/classpath/gnu/xml/pipeline/DomConsumer.java @@ -0,0 +1,967 @@ +/* DomConsumer.java -- + Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING. 
If not, write to the +Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA +02110-1301 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library. Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module. An independent module is a module which is not derived from +or based on this library. If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so. If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import gnu.xml.util.DomParser; + +import org.xml.sax.Attributes; +import org.xml.sax.ContentHandler; +import org.xml.sax.DTDHandler; +import org.xml.sax.ErrorHandler; +import org.xml.sax.Locator; +import org.xml.sax.SAXException; +import org.xml.sax.SAXNotRecognizedException; +import org.xml.sax.SAXParseException; +import org.xml.sax.ext.DeclHandler; +import org.xml.sax.ext.LexicalHandler; +import org.xml.sax.helpers.AttributesImpl; +import org.w3c.dom.Attr; +import org.w3c.dom.CDATASection; +import org.w3c.dom.CharacterData; +import org.w3c.dom.Document; +import org.w3c.dom.DOMImplementation; +import org.w3c.dom.Element; +import org.w3c.dom.EntityReference; +import org.w3c.dom.Node; +import org.w3c.dom.ProcessingInstruction; +import org.w3c.dom.Text; + +/** + * This consumer builds a DOM Document from its input, acting either as a + * pipeline terminus or as an intermediate buffer. 
When a document's worth + * of events has been delivered to this consumer, that document is read with + * a {@link DomParser} and sent to the next consumer. It is also available + * as a read-once property. + * + * <p>The DOM tree is constructed as faithfully as possible. There are some + * complications since a DOM should expose behaviors that can't be implemented + * without API backdoors into that DOM, and because some SAX parsers don't + * report all the information that DOM permits to be exposed. The general + * problem areas involve information from the Document Type Declaration (DTD). + * DOM only represents a limited subset, but has some behaviors that depend + * on much deeper knowledge of a document's DTD. You shouldn't have much to + * worry about unless you change handling of "noise" nodes from its default + * setting (which ignores them all); note if you use JAXP to populate your + * DOM trees, it wants to save "noise" nodes by default. (Such nodes include + * ignorable whitespace, comments, entity references and CDATA boundaries.) + * Otherwise, your + * main worry will be if you use a SAX parser that doesn't flag ignorable + * whitespace unless it's validating (few don't). + * + * <p> The SAX2 events used as input must contain XML Names for elements + * and attributes, with original prefixes. In SAX2, + * this is optional unless the "namespace-prefixes" parser feature is set. + * Moreover, many application components won't provide completely correct + * structures anyway. <em>Before you convert a DOM to an output document, + * you should plan to postprocess it to create or repair such namespace + * information.</em> The {@link NSFilter} pipeline stage does such work. + * + * <p> <em>Note: changes late in DOM L2 process made it impractical to + * attempt to create the DocumentType node in any implementation-neutral way, + * much less to populate it (L1 didn't support even creating such nodes). 
+ * To create and populate such a node, subclass the inner + * {@link DomConsumer.Handler} class and teach it about the backdoors into + * whatever DOM implementation you want. It's possible that some revised + * DOM API (L3?) will make this problem solvable again. </em> + * + * @see DomParser + * + * @author David Brownell + */ +public class DomConsumer implements EventConsumer +{ + private Class domImpl; + + private boolean hidingCDATA = true; + private boolean hidingComments = true; + private boolean hidingWhitespace = true; + private boolean hidingReferences = true; + + private Handler handler; + private ErrorHandler errHandler; + + private EventConsumer next; + + // FIXME: this can't be a generic pipeline stage just now, + // since its input became a Class not a String (to be turned + // into a class, using the right class loader) + + + /** + * Configures this pipeline terminus to use the specified implementation + * of DOM when constructing its result value. + * + * @param impl class implementing {@link org.w3c.dom.Document Document} + * which publicly exposes a default constructor + * + * @exception SAXException when there is a problem creating an + * empty DOM document using the specified implementation + */ + public DomConsumer (Class impl) + throws SAXException + { + domImpl = impl; + handler = new Handler (this); + } + + /** + * This is the hook through which a subclass provides a handler + * which knows how to access DOM extensions, specific to some + * implementation, to record additional data in a DOM. + * Treat this as part of construction; don't call it except + * before (or between) parses. 
+ */ + protected void setHandler (Handler h) + { + handler = h; + } + + + private Document emptyDocument () + throws SAXException + { + try { + return (Document) domImpl.newInstance (); + } catch (IllegalAccessException e) { + throw new SAXException ("can't access constructor: " + + e.getMessage ()); + } catch (InstantiationException e) { + throw new SAXException ("can't instantiate Document: " + + e.getMessage ()); + } + } + + + /** + * Configures this consumer as a buffer/filter, using the specified + * DOM implementation when constructing its result value. + * + * <p> This event consumer acts as a buffer and filter, in that it + * builds a DOM tree and then writes it out when <em>endDocument</em> + * is invoked. Because of the limitations of DOM, much information + * will as a rule not be seen in that replay. To get a full fidelity + * copy of the input event stream, use a {@link TeeConsumer}. + * + * @param impl class implementing {@link org.w3c.dom.Document Document} + * which publicly exposes a default constructor + * @param next receives a "replayed" sequence of parse events when + * the <em>endDocument</em> method is invoked. + * + * @exception SAXException when there is a problem creating an + * empty DOM document using the specified DOM implementation + */ + public DomConsumer (Class impl, EventConsumer n) + throws SAXException + { + this (impl); + next = n; + } + + + /** + * Returns the document constructed from the preceding + * sequence of events. This method should not be + * used again until another sequence of events has been + * given to this EventConsumer. + */ + final public Document getDocument () + { + return handler.clearDocument (); + } + + public void setErrorHandler (ErrorHandler handler) + { + errHandler = handler; + } + + + /** + * Returns true if the consumer is hiding entity references nodes + * (the default), and false if EntityReference nodes should + * instead be created. 
Such EntityReference nodes will normally be + * empty, unless an implementation arranges to populate them and then + * turn them back into readonly objects. + * + * @see #setHidingReferences + */ + final public boolean isHidingReferences () + { return hidingReferences; } + + /** + * Controls whether the consumer will hide entity expansions, + * or will instead mark them with entity reference nodes. + * + * @see #isHidingReferences + * @param flag False if entity reference nodes will appear + */ + final public void setHidingReferences (boolean flag) + { hidingReferences = flag; } + + + /** + * Returns true if the consumer is hiding comments (the default), + * and false if they should be placed into the output document. + * + * @see #setHidingComments + */ + public final boolean isHidingComments () + { return hidingComments; } + + /** + * Controls whether the consumer is hiding comments. + * + * @see #isHidingComments + */ + public final void setHidingComments (boolean flag) + { hidingComments = flag; } + + + /** + * Returns true if the consumer is hiding ignorable whitespace + * (the default), and false if such whitespace should be placed + * into the output document as children of element nodes. + * + * @see #setHidingWhitespace + */ + public final boolean isHidingWhitespace () + { return hidingWhitespace; } + + /** + * Controls whether the consumer hides ignorable whitespace + * + * @see #isHidingComments + */ + public final void setHidingWhitespace (boolean flag) + { hidingWhitespace = flag; } + + + /** + * Returns true if the consumer is saving CDATA boundaries, or + * false (the default) otherwise. + * + * @see #setHidingCDATA + */ + final public boolean isHidingCDATA () + { return hidingCDATA; } + + /** + * Controls whether the consumer will save CDATA boundaries. 
+ * + * @see #isHidingCDATA + * @param flag True to treat CDATA text differently from other + * text nodes + */ + final public void setHidingCDATA (boolean flag) + { hidingCDATA = flag; } + + + + /** Returns the document handler being used. */ + final public ContentHandler getContentHandler () + { return handler; } + + /** Returns the DTD handler being used. */ + final public DTDHandler getDTDHandler () + { return handler; } + + /** + * Returns the lexical handler being used. + * (DOM construction can't really use declaration handlers.) + */ + final public Object getProperty (String id) + throws SAXNotRecognizedException + { + if ("http://xml.org/sax/properties/lexical-handler".equals (id)) + return handler; + if ("http://xml.org/sax/properties/declaration-handler".equals (id)) + return handler; + throw new SAXNotRecognizedException (id); + } + + EventConsumer getNext () { return next; } + + ErrorHandler getErrorHandler () { return errHandler; } + + /** + * Class used to intercept various parsing events and use them to + * populate a DOM document. Subclasses would typically know and use + * backdoors into specific DOM implementations, used to implement + * DTD-related functionality. + * + * <p> Note that if this ever throws a DOMException (runtime exception) + * that will indicate a bug in the DOM (e.g. doesn't support something + * per specification) or the parser (e.g. emitted an illegal name, or + * accepted illegal input data). 
</p> + */ + public static class Handler + implements ContentHandler, LexicalHandler, + DTDHandler, DeclHandler + { + protected DomConsumer consumer; + + private DOMImplementation impl; + private Document document; + private boolean isL2; + + private Locator locator; + private Node top; + private boolean inCDATA; + private boolean mergeCDATA; + private boolean inDTD; + private String currentEntity; + + private boolean recreatedAttrs; + private AttributesImpl attributes = new AttributesImpl (); + + /** + * Subclasses may use SAX2 events to provide additional + * behaviors in the resulting DOM. + */ + protected Handler (DomConsumer consumer) + throws SAXException + { + this.consumer = consumer; + document = consumer.emptyDocument (); + impl = document.getImplementation (); + isL2 = impl.hasFeature ("XML", "2.0"); + } + + private void fatal (String message, Exception x) + throws SAXException + { + SAXParseException e; + ErrorHandler errHandler = consumer.getErrorHandler (); + + if (locator == null) + e = new SAXParseException (message, null, null, -1, -1, x); + else + e = new SAXParseException (message, locator, x); + if (errHandler != null) + errHandler.fatalError (e); + throw e; + } + + /** + * Returns and forgets the document produced. If the handler is + * reused, a new document may be created. + */ + Document clearDocument () + { + Document retval = document; + document = null; + locator = null; + return retval; + } + + /** + * Returns the document under construction. + */ + protected Document getDocument () + { return document; } + + /** + * Returns the current node being populated. This is usually + * an Element or Document, but it might be an EntityReference + * node if some implementation-specific code knows how to put + * those into the result tree and later mark them as readonly. 
+ */ + protected Node getTop () + { return top; } + + + // SAX1 + public void setDocumentLocator (Locator locator) + { + this.locator = locator; + } + + // SAX1 + public void startDocument () + throws SAXException + { + if (document == null) + try { + if (isL2) { + // couple to original implementation + document = impl.createDocument (null, "foo", null); + document.removeChild (document.getFirstChild ()); + } else { + document = consumer.emptyDocument (); + } + } catch (Exception e) { + fatal ("DOM create document", e); + } + top = document; + } + + // SAX1 + public void endDocument () + throws SAXException + { + try { + if (consumer.getNext () != null && document != null) { + DomParser parser = new DomParser (document); + + EventFilter.bind (parser, consumer.getNext ()); + parser.parse ("ignored"); + } + } finally { + top = null; + } + } + + // SAX1 + public void processingInstruction (String target, String data) + throws SAXException + { + // we can't create populated entity ref nodes using + // only public DOM APIs (they've got to be readonly) + if (currentEntity != null) + return; + + ProcessingInstruction pi; + + if (isL2 + // && consumer.isUsingNamespaces () + && target.indexOf (':') != -1) + namespaceError ( + "PI target name is namespace nonconformant: " + + target); + if (inDTD) + return; + pi = document.createProcessingInstruction (target, data); + top.appendChild (pi); + } + + /** + * Subclasses may overrride this method to provide a more efficient + * way to construct text nodes. + * Typically, copying the text into a single character array will + * be more efficient than doing that as well as allocating other + * needed for a String, including an internal StringBuffer. + * Those additional memory and CPU costs can be incurred later, + * if ever needed. + * Unfortunately the standard DOM factory APIs encourage those costs + * to be incurred early. 
+ */ + protected Text createText ( + boolean isCDATA, + char ch [], + int start, + int length + ) { + String value = new String (ch, start, length); + + if (isCDATA) + return document.createCDATASection (value); + else + return document.createTextNode (value); + } + + // SAX1 + public void characters (char ch [], int start, int length) + throws SAXException + { + // we can't create populated entity ref nodes using + // only public DOM APIs (they've got to be readonly + // at creation time) + if (currentEntity != null) + return; + + Node lastChild = top.getLastChild (); + + // merge consecutive text or CDATA nodes if appropriate. + if (lastChild instanceof Text) { + if (consumer.isHidingCDATA () + // consecutive Text content ... always merge + || (!inCDATA + && !(lastChild instanceof CDATASection)) + // consecutive CDATASection content ... don't + // merge between sections, only within them + || (inCDATA && mergeCDATA + && lastChild instanceof CDATASection) + ) { + CharacterData last = (CharacterData) lastChild; + String value = new String (ch, start, length); + + last.appendData (value); + return; + } + } + if (inCDATA && !consumer.isHidingCDATA ()) { + top.appendChild (createText (true, ch, start, length)); + mergeCDATA = true; + } else + top.appendChild (createText (false, ch, start, length)); + } + + // SAX2 + public void skippedEntity (String name) + throws SAXException + { + // this callback is useless except to report errors, since + // we can't know if the ref was in content, within an + // attribute, within a declaration ... only one of those + // cases supports more intelligent action than a panic. 
+ fatal ("skipped entity: " + name, null); + } + + // SAX2 + public void startPrefixMapping (String prefix, String uri) + throws SAXException + { + // reconstruct "xmlns" attributes deleted by all + // SAX2 parsers without "namespace-prefixes" = true + if ("".equals (prefix)) + attributes.addAttribute ("", "", "xmlns", + "CDATA", uri); + else + attributes.addAttribute ("", "", "xmlns:" + prefix, + "CDATA", uri); + recreatedAttrs = true; + } + + // SAX2 + public void endPrefixMapping (String prefix) + throws SAXException + { } + + // SAX2 + public void startElement ( + String uri, + String localName, + String qName, + Attributes atts + ) throws SAXException + { + // we can't create populated entity ref nodes using + // only public DOM APIs (they've got to be readonly) + if (currentEntity != null) + return; + + // parser discarded basic information; DOM tree isn't writable + // without massaging to assign prefixes to all nodes. + // the "NSFilter" class does that massaging. + if (qName.length () == 0) + qName = localName; + + + Element element; + int length = atts.getLength (); + + if (!isL2) { + element = document.createElement (qName); + + // first the explicit attributes ... + length = atts.getLength (); + for (int i = 0; i < length; i++) + element.setAttribute (atts.getQName (i), + atts.getValue (i)); + // ... then any recreated ones (DOM deletes duplicates) + if (recreatedAttrs) { + recreatedAttrs = false; + length = attributes.getLength (); + for (int i = 0; i < length; i++) + element.setAttribute (attributes.getQName (i), + attributes.getValue (i)); + attributes.clear (); + } + + top.appendChild (element); + top = element; + return; + } + + // For an L2 DOM when namespace use is enabled, use + // createElementNS/createAttributeNS except when + // (a) it's an element in the default namespace, or + // (b) it's an attribute with no prefix + String namespace; + + if (localName.length () != 0) + namespace = (uri.length () == 0) ? 
null : uri; + else + namespace = getNamespace (getPrefix (qName), atts); + + if (namespace == null) + element = document.createElement (qName); + else + element = document.createElementNS (namespace, qName); + + populateAttributes (element, atts); + if (recreatedAttrs) { + recreatedAttrs = false; + // ... DOM deletes any duplicates + populateAttributes (element, attributes); + attributes.clear (); + } + + top.appendChild (element); + top = element; + } + + final static String xmlnsURI = "http://www.w3.org/2000/xmlns/"; + + private void populateAttributes (Element element, Attributes attrs) + throws SAXParseException + { + int length = attrs.getLength (); + + for (int i = 0; i < length; i++) { + String type = attrs.getType (i); + String value = attrs.getValue (i); + String name = attrs.getQName (i); + String local = attrs.getLocalName (i); + String uri = attrs.getURI (i); + + // parser discarded basic information, DOM tree isn't writable + if (name.length () == 0) + name = local; + + // all attribute types other than these three may not + // contain scoped names... 
enumerated attributes get + // reported as NMTOKEN, except for NOTATION values + if (!("CDATA".equals (type) + || "NMTOKEN".equals (type) + || "NMTOKENS".equals (type))) { + if (value.indexOf (':') != -1) { + namespaceError ( + "namespace nonconformant attribute value: " + + "<" + element.getNodeName () + + " " + name + "='" + value + "' ...>"); + } + } + + // xmlns="" is legal (undoes default NS) + // xmlns:foo="" is illegal + String prefix = getPrefix (name); + String namespace; + + if ("xmlns".equals (prefix)) { + if ("".equals (value)) + namespaceError ("illegal null namespace decl, " + name); + namespace = xmlnsURI; + } else if ("xmlns".equals (name)) + namespace = xmlnsURI; + + else if (prefix == null) + namespace = null; + else if (!"".equals(uri) && uri.length () != 0) + namespace = uri; + else + namespace = getNamespace (prefix, attrs); + + if (namespace == null) + element.setAttribute (name, value); + else + element.setAttributeNS (namespace, name, value); + } + } + + private String getPrefix (String name) + { + int temp; + + if ((temp = name.indexOf (':')) > 0) + return name.substring (0, temp); + return null; + } + + // used with SAX1-level parser output + private String getNamespace (String prefix, Attributes attrs) + throws SAXParseException + { + String namespace; + String decl; + + // defaulting + if (prefix == null) { + decl = "xmlns"; + namespace = attrs.getValue (decl); + if ("".equals (namespace)) + return null; + else if (namespace != null) + return namespace; + + // "xmlns" is like a keyword + // ... according to the Namespace REC, but DOM L2 CR2+ + // and Infoset violate that by assigning a namespace. + // that conflict is resolved elsewhere. 
+ } else if ("xmlns".equals (prefix)) + return null; + + // "xml" prefix is fixed + else if ("xml".equals (prefix)) + return "http://www.w3.org/XML/1998/namespace"; + + // otherwise, expect a declaration + else { + decl = "xmlns:" + prefix; + namespace = attrs.getValue (decl); + } + + // if we found a local declaration, great + if (namespace != null) + return namespace; + + + // ELSE ... search up the tree we've been building + for (Node n = top; + n != null && n.getNodeType () != Node.DOCUMENT_NODE; + n = n.getParentNode ()) { + if (n.getNodeType () == Node.ENTITY_REFERENCE_NODE) + continue; + Element e = (Element) n; + Attr attr = e.getAttributeNode (decl); + if (attr != null) + return attr.getNodeValue (); + } + // see above re "xmlns" as keyword + if ("xmlns".equals (decl)) + return null; + + namespaceError ("Undeclared namespace prefix: " + prefix); + return null; + } + + // SAX2 + public void endElement (String uri, String localName, String qName) + throws SAXException + { + // we can't create populated entity ref nodes using + // only public DOM APIs (they've got to be readonly) + if (currentEntity != null) + return; + + top = top.getParentNode (); + } + + // SAX1 (mandatory reporting if validating) + public void ignorableWhitespace (char ch [], int start, int length) + throws SAXException + { + if (consumer.isHidingWhitespace ()) + return; + characters (ch, start, length); + } + + // SAX2 lexical event + public void startCDATA () + throws SAXException + { + inCDATA = true; + // true except for the first fragment of a cdata section + mergeCDATA = false; + } + + // SAX2 lexical event + public void endCDATA () + throws SAXException + { + inCDATA = false; + } + + // SAX2 lexical event + // + // this SAX2 callback merges two unrelated things: + // - Declaration of the root element type ... belongs with + // the other DTD declaration methods, NOT HERE. + // - IDs for the optional external subset ... belongs here + // with other lexical information. 
+ // + // ...and it doesn't include the internal DTD subset, desired + // both to support DOM L2 and to enable "pass through" processing + // + public void startDTD (String name, String publicId, String SystemId) + throws SAXException + { + // need to filter out comments and PIs within the DTD + inDTD = true; + } + + // SAX2 lexical event + public void endDTD () + throws SAXException + { + inDTD = false; + } + + // SAX2 lexical event + public void comment (char ch [], int start, int length) + throws SAXException + { + Node comment; + + // we can't create populated entity ref nodes using + // only public DOM APIs (they've got to be readonly) + if (consumer.isHidingComments () + || inDTD + || currentEntity != null) + return; + comment = document.createComment (new String (ch, start, length)); + top.appendChild (comment); + } + + /** + * May be overridden by subclasses to return true, indicating + * that entity reference nodes can be populated and then made + * read-only. + */ + public boolean canPopulateEntityRefs () + { return false; } + + // SAX2 lexical event + public void startEntity (String name) + throws SAXException + { + // are we ignoring what would be contents of an + // entity ref, since we can't populate it? + if (currentEntity != null) + return; + + // Are we hiding all entity boundaries? + if (consumer.isHidingReferences ()) + return; + + // SAX2 shows parameter entities; DOM hides them + if (name.charAt (0) == '%' || "[dtd]".equals (name)) + return; + + // Since we can't create a populated entity ref node in any + // standard way, we create an unpopulated one. + EntityReference ref = document.createEntityReference (name); + top.appendChild (ref); + top = ref; + + // ... 
allowing subclasses to populate them + if (!canPopulateEntityRefs ()) + currentEntity = name; + } + + // SAX2 lexical event + public void endEntity (String name) + throws SAXException + { + if (name.charAt (0) == '%' || "[dtd]".equals (name)) + return; + if (name.equals (currentEntity)) + currentEntity = null; + if (!consumer.isHidingReferences ()) + top = top.getParentNode (); + } + + + // SAX1 DTD event + public void notationDecl ( + String name, + String publicId, String SystemId + ) throws SAXException + { + /* IGNORE -- no public DOM API lets us store these + * into the doctype node + */ + } + + // SAX1 DTD event + public void unparsedEntityDecl ( + String name, + String publicId, String SystemId, + String notationName + ) throws SAXException + { + /* IGNORE -- no public DOM API lets us store these + * into the doctype node + */ + } + + // SAX2 declaration event + public void elementDecl (String name, String model) + throws SAXException + { + /* IGNORE -- no content model support in DOM L2 */ + } + + // SAX2 declaration event + public void attributeDecl ( + String eName, + String aName, + String type, + String mode, + String value + ) throws SAXException + { + /* IGNORE -- no attribute model support in DOM L2 */ + } + + // SAX2 declaration event + public void internalEntityDecl (String name, String value) + throws SAXException + { + /* IGNORE -- no public DOM API lets us store these + * into the doctype node + */ + } + + // SAX2 declaration event + public void externalEntityDecl ( + String name, + String publicId, + String SystemId + ) throws SAXException + { + /* IGNORE -- no public DOM API lets us store these + * into the doctype node + */ + } + + // + // These really should offer the option of nonfatal handling, + // like other validity errors, though that would cause major + // chaos in the DOM data structures. DOM is already spec'd + // to treat many of these as fatal, so this is consistent. 
+ // + private void namespaceError (String description) + throws SAXParseException + { + SAXParseException err; + + err = new SAXParseException (description, locator); + throw err; + } + } +} diff --git a/libjava/classpath/gnu/xml/pipeline/EventConsumer.java b/libjava/classpath/gnu/xml/pipeline/EventConsumer.java new file mode 100644 index 000000000..a0a8824f7 --- /dev/null +++ b/libjava/classpath/gnu/xml/pipeline/EventConsumer.java @@ -0,0 +1,95 @@ +/* EventConsumer.java -- + Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING. If not, write to the +Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA +02110-1301 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library. Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module. 
An independent module is a module which is not derived from +or based on this library. If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so. If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import org.xml.sax.*; + + +/** + * Collects the event consumption apparatus of a SAX pipeline stage. + * Consumers which permit some handlers or other characteristics to be + * configured will provide methods to support that configuration. + * + * <p> Two important categories of consumers include <em>filters</em>, which + * process events and pass them on to other consumers, and <em>terminus</em> + * (or <em>terminal</em>) stages, which don't pass events on. Filters are not + * necessarily derived from the {@link EventFilter} class, although that + * class can substantially simplify their construction by automating the + * most common activities. + * + * <p> Event consumers which follow certain conventions for the signatures + * of their constructors can be automatically assembled into pipelines + * by the {@link PipelineFactory} class. + * + * @author David Brownell + */ +public interface EventConsumer +{ + /** Most stages process these core SAX callbacks. */ + public ContentHandler getContentHandler (); + + /** Few stages will use unparsed entities. */ + public DTDHandler getDTDHandler (); + + /** + * This method works like the SAX2 XMLReader method of the same name, + * and is used to retrieve the optional lexical and declaration handlers + * in a pipeline. + * + * @param id This is a URI identifying the type of property desired. + * @return The value of that property, if it is defined. + * + * @exception SAXNotRecognizedException Thrown if the particular + * pipeline stage does not understand the specified identifier. 
+ */ + public Object getProperty (String id) + throws SAXNotRecognizedException; + + /** + * This method provides a filter stage with a handler that abstracts + * presentation of warnings and both recoverable and fatal errors. + * Most pipeline stages should share a single policy and mechanism + * for such reports, since application components require consistency + * in such activities. Accordingly, typical responses to this method + * invocation involve saving the handler for use; filters will pass + * it on to any other consumers they use. + * + * @param handler encapsulates error handling policy for this stage + */ + public void setErrorHandler (ErrorHandler handler); +} diff --git a/libjava/classpath/gnu/xml/pipeline/EventFilter.java b/libjava/classpath/gnu/xml/pipeline/EventFilter.java new file mode 100644 index 000000000..b3cc2d654 --- /dev/null +++ b/libjava/classpath/gnu/xml/pipeline/EventFilter.java @@ -0,0 +1,796 @@ +/* EventFilter.java -- + Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING. If not, write to the +Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA +02110-1301 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library. Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. 
+ +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module. An independent module is a module which is not derived from +or based on this library. If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so. If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import java.lang.reflect.InvocationTargetException; +import java.lang.reflect.Method; + +import org.xml.sax.*; +import org.xml.sax.ext.*; +import org.xml.sax.helpers.XMLFilterImpl; + +/** + * A customizable event consumer, used to assemble various kinds of filters + * using SAX handlers and an optional second consumer. It can be constructed + * in two ways: <ul> + * + * <li> To serve as a passthrough, sending all events to a second consumer. + * The second consumer may be identified through {@link #getNext}. + * + * <li> To serve as a dead end, with all handlers null; + * {@link #getNext} returns null. + * + * </ul> + * + * <p> Additionally, SAX handlers may be assigned, which completely replace + * the "upstream" view (through {@link EventConsumer}) of handlers, initially + * null or the "next" consumer provided to the constructor. To make + * it easier to build specialized filter classes, this class implements + * all the standard SAX consumer handlers, and those implementations + * delegate "downstream" to the consumer accessed by {@link #getNext}. + * + * <p> The simplest way to create a custom filter class is to create a + * subclass which overrides one or more handler interface methods. 
The + * constructor for that subclass then registers itself as a handler for + * those interfaces using a call such as <em>setContentHandler(this)</em>, + * so the "upstream" view of event delivery is modified from the state + * established in the base class constructor. That way, + * the overridden methods intercept those event callbacks + * as they go "downstream", and + * all other event callbacks will pass events to any next consumer. + * Overridden methods may invoke superclass methods (perhaps after modifying + * parameters) if they wish to delegate such calls. Such subclasses + * should use {@link #getErrorHandler} to report errors using the + * common error reporting mechanism. + * + * <p> Another important technique is to construct a filter consisting + * of only a few specific types of handler. For example, one could easily + * prune out lexical events or various declarations by providing handlers + * which don't pass those events downstream, or by providing null handlers. + * + * <hr /> + * + * <p> This may be viewed as the consumer oriented analogue of the SAX2 + * {@link org.xml.sax.helpers.XMLFilterImpl XMLFilterImpl} class. + * Key differences include: <ul> + * + * <li> This fully separates consumer and producer roles: it + * does not implement the producer side <em>XMLReader</em> or + * <em>EntityResolver</em> interfaces, so it can only be used + * in "push" mode (it has no <em>parse()</em> methods). + * + * <li> "Extension" handlers are fully supported, enabling a + * richer set of application requirements. + * And it implements {@link EventConsumer}, which groups related + * consumer methods together, rather than leaving them separated. + * + * <li> The chaining which is visible is "downstream" to the next + * consumer, not "upstream" to the preceding producer. + * It supports "fan-in", where + * a consumer can be fed by several producers. (For "fan-out", + * see the {@link TeeConsumer} class.) + * + * <li> Event chaining is set up differently. 
It is intended to + * work "upstream" from terminus towards producer, during filter + * construction, as described above. + * This is part of an early binding model: + * events don't need to pass through stages which ignore them. + * + * <li> ErrorHandler support is separated, on the grounds that + * pipeline stages need to share the same error handling policy. + * For the same reason, error handler setup goes "downstream": + * when error handlers get set, they are passed to subsequent + * consumers. + * + * </ul> + * + * <p> The {@link #chainTo chainTo()} convenience routine supports chaining to + * an XMLFilterImpl, in its role as a limited functionality event + * consumer. Its event producer role ({@link XMLFilter}) is ignored. + * + * <hr /> + * + * <p> The {@link #bind bind()} routine may be used to associate event pipelines + * with any kind of {@link XMLReader} that will produce the events. + * Such pipelines don't necessarily need to have any members which are + * implemented using this class. That routine has some intelligence + * which supports automatic changes to parser feature flags, letting + * event pipelines become largely independent of the particular feature + * sets of parsers. + * + * @author David Brownell + */ +public class EventFilter + implements EventConsumer, ContentHandler, DTDHandler, + LexicalHandler, DeclHandler +{ + // SAX handlers + private ContentHandler docHandler, docNext; + private DTDHandler dtdHandler, dtdNext; + private LexicalHandler lexHandler, lexNext; + private DeclHandler declHandler, declNext; + // and ideally, one more for the stuff SAX2 doesn't show + + private Locator locator; + private EventConsumer next; + private ErrorHandler errHandler; + + + /** SAX2 URI prefix for standard feature flags. */ + public static final String FEATURE_URI + = "http://xml.org/sax/features/"; + /** SAX2 URI prefix for standard properties (mostly for handlers). 
*/ + public static final String PROPERTY_URI + = "http://xml.org/sax/properties/"; + + /** SAX2 property identifier for {@link DeclHandler} events */ + public static final String DECL_HANDLER + = PROPERTY_URI + "declaration-handler"; + /** SAX2 property identifier for {@link LexicalHandler} events */ + public static final String LEXICAL_HANDLER + = PROPERTY_URI + "lexical-handler"; + + // + // These class objects will be null if the relevant class isn't linked. + // Small configurations (pJava and some kinds of embedded systems) need + // to facilitate smaller executables. So "instanceof" is undesirable + // when bind() sees if it can remove some stages. + // + // SECURITY NOTE: assuming all these classes are part of the same sealed + // package, there's no problem saving these in the instance of this class + // that's associated with "this" class loader. But that wouldn't be true + // for classes in another package. + // + private static boolean loaded; + private static Class nsClass; + private static Class validClass; + private static Class wfClass; + private static Class xincClass; + + static ClassLoader getClassLoader () + { + Method m = null; + + try { + m = Thread.class.getMethod("getContextClassLoader"); + } catch (NoSuchMethodException e) { + // Assume that we are running JDK 1.1, use the current ClassLoader + return EventFilter.class.getClassLoader(); + } + + try { + return (ClassLoader) m.invoke(Thread.currentThread()); + } catch (IllegalAccessException e) { + // assert(false) + throw new UnknownError(e.getMessage()); + } catch (InvocationTargetException e) { + // assert(e.getTargetException() instanceof SecurityException) + throw new UnknownError(e.getMessage()); + } + } + + static Class loadClass (ClassLoader classLoader, String className) + { + try { + if (classLoader == null) + return Class.forName(className); + else + return classLoader.loadClass(className); + } catch (Exception e) { + return null; + } + } + + static private void loadClasses () + { + 
ClassLoader loader = getClassLoader (); + + nsClass = loadClass (loader, "gnu.xml.pipeline.NSFilter"); + validClass = loadClass (loader, "gnu.xml.pipeline.ValidationConsumer"); + wfClass = loadClass (loader, "gnu.xml.pipeline.WellFormednessFilter"); + xincClass = loadClass (loader, "gnu.xml.pipeline.XIncludeFilter"); + loaded = true; + } + + + /** + * Binds the standard SAX2 handlers from the specified consumer + * pipeline to the specified producer. These handlers include the core + * {@link ContentHandler} and {@link DTDHandler}, plus the extension + * {@link DeclHandler} and {@link LexicalHandler}. Any additional + * application-specific handlers need to be bound separately. + * The {@link ErrorHandler} is handled differently: the producer's + * error handler is passed through to the consumer pipeline. + * The producer is told to include namespace prefix information if it + * can, since many pipeline stages need that Infoset information to + * work well. + * + * <p> At the head of the pipeline, certain standard event filters are + * recognized and handled specially. This facilitates construction + * of processing pipelines that work regardless of the capabilities + * of the XMLReader implementation in use; for example, it permits + * validating output of a {@link gnu.xml.util.DomParser}. <ul> + * + * <li> {@link NSFilter} will be removed if the producer can be + * told not to discard namespace data, using the "namespace-prefixes" + * feature flag. + * + * <li> {@link ValidationConsumer} will be removed if the producer + * can be told to validate, using the "validation" feature flag. + * + * <li> {@link WellFormednessFilter} is always removed, on the + * grounds that no XMLReader is permitted to produce malformed + * event streams and this would just be processing overhead. 
+ * + * <li> {@link XIncludeFilter} stops the special handling, except + * that it's told about the "namespace-prefixes" feature of the + * event producer so that the event stream is internally consistent. + * + * <li> The first consumer which is not one of those classes stops + * such special handling. This means that if you want to force + * one of those filters to be used, you could just precede it with + * an instance of {@link EventFilter} configured as a pass-through. + * You might need to do that if you are using an {@link NSFilter} + * subclass to fix names found in attributes or character data. + * + * </ul> + * + * <p> Other than that, this method works with any kind of event consumer, + * not just event filters. Note that in all cases, the standard handlers + * are assigned; any previous handler assignments for the handler will + * be overridden. + * + * @param producer will deliver events to the specified consumer + * @param consumer pipeline supplying event handlers to be associated + * with the producer (may not be null) + */ + public static void bind (XMLReader producer, EventConsumer consumer) + { + Class klass = null; + boolean prefixes; + + if (!loaded) + loadClasses (); + + // DOM building, printing, layered validation, and other + // things don't work well when prefix info is discarded. + // Include it by default, whenever possible. + try { + producer.setFeature (FEATURE_URI + "namespace-prefixes", + true); + prefixes = true; + } catch (SAXException e) { + prefixes = false; + } + + // NOTE: This loop doesn't use "instanceof", since that + // would prevent compiling/linking without those classes + // being present. + while (consumer != null) { + klass = consumer.getClass (); + + // we might have already changed this problematic SAX2 default. + if (nsClass != null && nsClass.isAssignableFrom (klass)) { + if (!prefixes) + break; + consumer = ((EventFilter)consumer).getNext (); + + // the parser _might_ do DTD validation by default ... 
+ // if not, maybe we can change this setting. + } else if (validClass != null + && validClass.isAssignableFrom (klass)) { + try { + producer.setFeature (FEATURE_URI + "validation", + true); + consumer = ((ValidationConsumer)consumer).getNext (); + } catch (SAXException e) { + break; + } + + // parsers are required not to have such bugs + } else if (wfClass != null && wfClass.isAssignableFrom (klass)) { + consumer = ((WellFormednessFilter)consumer).getNext (); + + // stop on the first pipeline stage we can't remove + } else + break; + + if (consumer == null) + klass = null; + } + + // the actual setting here doesn't matter as much + // as that producer and consumer agree + if (xincClass != null && klass != null + && xincClass.isAssignableFrom (klass)) + ((XIncludeFilter)consumer).setSavingPrefixes (prefixes); + + // Some SAX parsers can't handle null handlers -- bleech + DefaultHandler2 h = new DefaultHandler2 (); + + if (consumer != null && consumer.getContentHandler () != null) + producer.setContentHandler (consumer.getContentHandler ()); + else + producer.setContentHandler (h); + if (consumer != null && consumer.getDTDHandler () != null) + producer.setDTDHandler (consumer.getDTDHandler ()); + else + producer.setDTDHandler (h); + + try { + Object dh; + + if (consumer != null) + dh = consumer.getProperty (DECL_HANDLER); + else + dh = null; + if (dh == null) + dh = h; + producer.setProperty (DECL_HANDLER, dh); + } catch (Exception e) { /* ignore */ } + try { + Object lh; + + if (consumer != null) + lh = consumer.getProperty (LEXICAL_HANDLER); + else + lh = null; + if (lh == null) + lh = h; + producer.setProperty (LEXICAL_HANDLER, lh); + } catch (Exception e) { /* ignore */ } + + // this binding goes the other way around + if (producer.getErrorHandler () == null) + producer.setErrorHandler (h); + if (consumer != null) + consumer.setErrorHandler (producer.getErrorHandler ()); + } + + /** + * Initializes all handlers to null. 
+ */ + // constructor used by PipelineFactory + public EventFilter () { } + + + /** + * Handlers that are not otherwise set will default to those from + * the specified consumer, making it easy to pass events through. + * If the consumer is null, all handlers are initialized to null. + */ + // constructor used by PipelineFactory + public EventFilter (EventConsumer consumer) + { + if (consumer == null) + return; + + next = consumer; + + // We delegate through the "xxNext" handlers, and + // report the "xxHandler" ones on our input side. + + // Normally a subclass would both override handler + // methods and register itself as the "xxHandler". + + docHandler = docNext = consumer.getContentHandler (); + dtdHandler = dtdNext = consumer.getDTDHandler (); + try { + declHandler = declNext = (DeclHandler) + consumer.getProperty (DECL_HANDLER); + } catch (SAXException e) { /* leave value null */ } + try { + lexHandler = lexNext = (LexicalHandler) + consumer.getProperty (LEXICAL_HANDLER); + } catch (SAXException e) { /* leave value null */ } + } + + /** + * Treats the XMLFilterImpl as a limited functionality event consumer, + * by arranging to deliver events to it; this lets such classes be + * "wrapped" as pipeline stages. + * + * <p> <em>Upstream Event Setup:</em> + * If no handlers have been assigned to this EventFilter, then the + * handlers from the specified XMLFilterImpl are returned from this + * {@link EventConsumer}: the XMLFilterImpl is just "wrapped". + * Otherwise the specified handlers will be returned. + * + * <p> <em>Downstream Event Setup:</em> + * Subclasses may chain event delivery to the specified XMLFilterImpl + * by invoking the appropriate superclass methods, + * as if their constructor passed a "next" EventConsumer to the + * constructor for this class. + * If this EventFilter has an ErrorHandler, it is assigned as + * the error handler for the XMLFilterImpl, just as would be + * done for a next stage implementing {@link EventConsumer}. 
+ * + * @param next the next downstream component of the pipeline. + * @exception IllegalStateException if the "next" consumer has + * already been set through the constructor. + */ + public void chainTo (XMLFilterImpl next) + { + if (this.next != null) + throw new IllegalStateException (); + + docNext = next.getContentHandler (); + if (docHandler == null) + docHandler = docNext; + dtdNext = next.getDTDHandler (); + if (dtdHandler == null) + dtdHandler = dtdNext; + + try { + declNext = (DeclHandler) next.getProperty (DECL_HANDLER); + if (declHandler == null) + declHandler = declNext; + } catch (SAXException e) { /* leave value null */ } + try { + lexNext = (LexicalHandler) next.getProperty (LEXICAL_HANDLER); + if (lexHandler == null) + lexHandler = lexNext; + } catch (SAXException e) { /* leave value null */ } + + if (errHandler != null) + next.setErrorHandler (errHandler); + } + + /** + * Records the error handler that should be used by this stage, and + * passes it "downstream" to any subsequent stage. + */ + final public void setErrorHandler (ErrorHandler handler) + { + errHandler = handler; + if (next != null) + next.setErrorHandler (handler); + } + + /** + * Returns the error handler assigned this filter stage, or null + * if no such assignment has been made. + */ + final public ErrorHandler getErrorHandler () + { + return errHandler; + } + + + /** + * Returns the next event consumer in sequence; or null if there + * is no such handler. + */ + final public EventConsumer getNext () + { return next; } + + + /** + * Assigns the content handler to use; a null handler indicates + * that these events will not be forwarded. + * This overrides the previous setting for this handler, which was + * probably pointed to the next consumer by the base class constructor. + */ + final public void setContentHandler (ContentHandler h) + { + docHandler = h; + } + + /** Returns the content handler being used. 
*/ + final public ContentHandler getContentHandler () + { + return docHandler; + } + + /** + * Assigns the DTD handler to use; a null handler indicates + * that these events will not be forwarded. + * This overrides the previous setting for this handler, which was + * probably pointed to the next consumer by the base class constructor. + */ + final public void setDTDHandler (DTDHandler h) + { dtdHandler = h; } + + /** Returns the dtd handler being used. */ + final public DTDHandler getDTDHandler () + { + return dtdHandler; + } + + /** + * Stores the property, normally a handler; a null handler indicates + * that these events will not be forwarded. + * This overrides the previous handler setting, which was probably + * pointed to the next consumer by the base class constructor. + */ + final public void setProperty (String id, Object o) + throws SAXNotRecognizedException, SAXNotSupportedException + { + try { + Object value = getProperty (id); + + if (value == o) + return; + if (DECL_HANDLER.equals (id)) { + declHandler = (DeclHandler) o; + return; + } + if (LEXICAL_HANDLER.equals (id)) { + lexHandler = (LexicalHandler) o; + return; + } + throw new SAXNotSupportedException (id); + + } catch (ClassCastException e) { + throw new SAXNotSupportedException (id); + } + } + + /** Retrieves a property of unknown intent (usually a handler) */ + final public Object getProperty (String id) + throws SAXNotRecognizedException + { + if (DECL_HANDLER.equals (id)) + return declHandler; + if (LEXICAL_HANDLER.equals (id)) + return lexHandler; + + throw new SAXNotRecognizedException (id); + } + + /** + * Returns any locator provided to the next consumer, if this class + * (or a subclass) is handling {@link ContentHandler } events. 
+ */ + public Locator getDocumentLocator () + { return locator; } + + + // CONTENT HANDLER DELEGATIONS + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void setDocumentLocator (Locator locator) + { + this.locator = locator; + if (docNext != null) + docNext.setDocumentLocator (locator); + } + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void startDocument () throws SAXException + { + if (docNext != null) + docNext.startDocument (); + } + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void skippedEntity (String name) throws SAXException + { + if (docNext != null) + docNext.skippedEntity (name); + } + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void processingInstruction (String target, String data) + throws SAXException + { + if (docNext != null) + docNext.processingInstruction (target, data); + } + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void characters (char ch [], int start, int length) + throws SAXException + { + if (docNext != null) + docNext.characters (ch, start, length); + } + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void ignorableWhitespace (char ch [], int start, int length) + throws SAXException + { + if (docNext != null) + docNext.ignorableWhitespace (ch, start, length); + } + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void startPrefixMapping (String prefix, String uri) + throws SAXException + { + if (docNext != null) + docNext.startPrefixMapping (prefix, uri); + } + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void startElement ( + String uri, String localName, + String qName, Attributes atts + ) throws SAXException + { + if (docNext != null) + docNext.startElement (uri, localName, qName, atts); + } + + /** <b>SAX2:</b> passes this callback to the next consumer, 
if any */ + public void endElement (String uri, String localName, String qName) + throws SAXException + { + if (docNext != null) + docNext.endElement (uri, localName, qName); + } + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void endPrefixMapping (String prefix) throws SAXException + { + if (docNext != null) + docNext.endPrefixMapping (prefix); + } + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void endDocument () throws SAXException + { + if (docNext != null) + docNext.endDocument (); + locator = null; + } + + + // DTD HANDLER DELEGATIONS + + /** <b>SAX1:</b> passes this callback to the next consumer, if any */ + public void unparsedEntityDecl ( + String name, + String publicId, + String systemId, + String notationName + ) throws SAXException + { + if (dtdNext != null) + dtdNext.unparsedEntityDecl (name, publicId, systemId, notationName); + } + + /** <b>SAX1:</b> passes this callback to the next consumer, if any */ + public void notationDecl (String name, String publicId, String systemId) + throws SAXException + { + if (dtdNext != null) + dtdNext.notationDecl (name, publicId, systemId); + } + + + // LEXICAL HANDLER DELEGATIONS + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void startDTD (String name, String publicId, String systemId) + throws SAXException + { + if (lexNext != null) + lexNext.startDTD (name, publicId, systemId); + } + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void endDTD () + throws SAXException + { + if (lexNext != null) + lexNext.endDTD (); + } + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void comment (char ch [], int start, int length) + throws SAXException + { + if (lexNext != null) + lexNext.comment (ch, start, length); + } + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void startCDATA () + throws SAXException + { + if 
(lexNext != null) + lexNext.startCDATA (); + } + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void endCDATA () + throws SAXException + { + if (lexNext != null) + lexNext.endCDATA (); + } + + /** + * <b>SAX2:</b> passes this callback to the next consumer, if any. + */ + public void startEntity (String name) + throws SAXException + { + if (lexNext != null) + lexNext.startEntity (name); + } + + /** + * <b>SAX2:</b> passes this callback to the next consumer, if any. + */ + public void endEntity (String name) + throws SAXException + { + if (lexNext != null) + lexNext.endEntity (name); + } + + + // DECLARATION HANDLER DELEGATIONS + + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void elementDecl (String name, String model) + throws SAXException + { + if (declNext != null) + declNext.elementDecl (name, model); + } + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void attributeDecl (String eName, String aName, + String type, String mode, String value) + throws SAXException + { + if (declNext != null) + declNext.attributeDecl (eName, aName, type, mode, value); + } + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void externalEntityDecl (String name, + String publicId, String systemId) + throws SAXException + { + if (declNext != null) + declNext.externalEntityDecl (name, publicId, systemId); + } + + /** <b>SAX2:</b> passes this callback to the next consumer, if any */ + public void internalEntityDecl (String name, String value) + throws SAXException + { + if (declNext != null) + declNext.internalEntityDecl (name, value); + } +} diff --git a/libjava/classpath/gnu/xml/pipeline/LinkFilter.java b/libjava/classpath/gnu/xml/pipeline/LinkFilter.java new file mode 100644 index 000000000..e11a5eca6 --- /dev/null +++ b/libjava/classpath/gnu/xml/pipeline/LinkFilter.java @@ -0,0 +1,242 @@ +/* LinkFilter.java -- + Copyright (C) 1999,2000,2001 Free 
Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING. If not, write to the +Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA +02110-1301 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library. Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module. An independent module is a module which is not derived from +or based on this library. If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so. If you do not wish to do so, delete this +exception statement from your version. 
*/ + +package gnu.xml.pipeline; + +import java.io.IOException; +import java.net.URL; +import java.util.Enumeration; +import java.util.Vector; + +import org.xml.sax.Attributes; +import org.xml.sax.SAXException; + + +/** + * Pipeline filter to remember XHTML links found in a document, + * so they can later be crawled. Fragments are not counted, and duplicates + * are ignored. Callers are responsible for filtering out URLs they aren't + * interested in. Events are passed through unmodified. + * + * <p> Input MUST include a setDocumentLocator() call, as it's used to + * resolve relative links in the absence of a "base" element. Input MUST + * also include namespace identifiers, since it is the XHTML namespace + * identifier which is used to identify the relevant elements. + * + * <p><em>FIXME:</em> handle xml:base attribute ... in association with + * a stack of base URIs. Similarly, recognize/support XLink data. + * + * @author David Brownell + */ +public class LinkFilter extends EventFilter +{ + // for storing URIs + private Vector vector = new Vector (); + + // struct for "full" link record (tbd) + // these for troubleshooting original source: + // original uri + // uri as resolved (base, relative, etc) + // URI of originating doc + // line # + // original element + attrs (img src, desc, etc) + + // XLink model of the link ... for inter-site pairups ? + + private String baseURI; + + private boolean siteRestricted = false; + + // + // XXX leverage blacklist info (like robots.txt) + // + // XXX constructor w/param ... pipeline for sending link data + // probably XHTML --> XLink, providing info as sketched above + // + + + /** + * Constructs a new event filter, which collects links in private data + * structure for later enumeration. 
+ */ + // constructor used by PipelineFactory + public LinkFilter () + { + super.setContentHandler (this); + } + + + /** + * Constructs a new event filter, which collects links in private data + * structure for later enumeration and passes all events, unmodified, + * to the next consumer. + */ + // constructor used by PipelineFactory + public LinkFilter (EventConsumer next) + { + super (next); + super.setContentHandler (this); + } + + + /** + * Returns an enumeration of the links found since the filter + * was constructed, or since removeAllLinks() was called. + * + * @return enumeration of strings. + */ + public Enumeration getLinks () + { + return vector.elements (); + } + + /** + * Removes records about all links reported to the event + * stream, as if the filter were newly created. + */ + public void removeAllLinks () + { + vector = new Vector (); + } + + + /** + * Collects URIs for (X)HTML content from elements which hold them. + */ + public void startElement ( + String uri, + String localName, + String qName, + Attributes atts + ) throws SAXException + { + String link; + + // Recognize XHTML links. 
+ if ("http://www.w3.org/1999/xhtml".equals (uri)) { + + if ("a".equals (localName) || "base".equals (localName) + || "area".equals (localName)) + link = atts.getValue ("href"); + else if ("iframe".equals (localName) || "frame".equals (localName)) + link = atts.getValue ("src"); + else if ("blockquote".equals (localName) || "q".equals (localName) + || "ins".equals (localName) || "del".equals (localName)) + link = atts.getValue ("cite"); + else + link = null; + link = maybeAddLink (link); + + // "base" modifies designated baseURI + if ("base".equals (localName) && link != null) + baseURI = link; + + if ("iframe".equals (localName) || "img".equals (localName)) + maybeAddLink (atts.getValue ("longdesc")); + } + + super.startElement (uri, localName, qName, atts); + } + + private String maybeAddLink (String link) + { + int index; + + // ignore empty links and fragments inside docs + if (link == null) + return null; + if ((index = link.indexOf ("#")) >= 0) + link = link.substring (0, index); + if (link.equals ("")) + return null; + + try { + // get the real URI + URL base = new URL ((baseURI != null) + ? baseURI + : getDocumentLocator ().getSystemId ()); + URL url = new URL (base, link); + + link = url.toString (); + + // ignore duplicates + if (vector.contains (link)) + return link; + + // other than what "base" does, stick to original site: + if (siteRestricted) { + // don't switch protocols + if (!base.getProtocol ().equals (url.getProtocol ())) + return link; + // don't switch servers + if (base.getHost () != null + && !base.getHost ().equals (url.getHost ())) + return link; + } + + vector.addElement (link); + + return link; + + } catch (IOException e) { + // bad URLs we don't want + } + return null; + } + + /** + * Reports an error if no Locator has been made available. 
+ */ + public void startDocument () + throws SAXException + { + if (getDocumentLocator () == null) + throw new SAXException ("no Locator!"); + } + + /** + * Forgets about any base URI information that may be recorded. + * Applications will often want to call removeAllLinks(), likely + * after examining the links which were reported. + */ + public void endDocument () + throws SAXException + { + baseURI = null; + super.endDocument (); + } +} diff --git a/libjava/classpath/gnu/xml/pipeline/NSFilter.java b/libjava/classpath/gnu/xml/pipeline/NSFilter.java new file mode 100644 index 000000000..0fa4621d3 --- /dev/null +++ b/libjava/classpath/gnu/xml/pipeline/NSFilter.java @@ -0,0 +1,341 @@ +/* NSFilter.java -- + Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING. If not, write to the +Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA +02110-1301 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library. Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. 
+ +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module. An independent module is a module which is not derived from +or based on this library. If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so. If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import java.util.Enumeration; +import java.util.Stack; + +import org.xml.sax.Attributes; +import org.xml.sax.ErrorHandler; +import org.xml.sax.Locator; +import org.xml.sax.SAXException; +import org.xml.sax.SAXParseException; +import org.xml.sax.helpers.AttributesImpl; +import org.xml.sax.helpers.NamespaceSupport; + +/** + * This filter ensures that element and attribute names are properly prefixed, + * and that such prefixes are declared. Such data is critical for operations + * like writing XML text, and validating against DTDs: names or their prefixes + * may have been discarded, although they are essential to the exchange of + * information using XML. There are various common ways that such data + * gets discarded: <ul> + * + * <li> By default, SAX2 parsers must discard the "xmlns*" + * attributes, and may also choose not to report properly prefixed + * names for elements or attributes. (Some parsers may support + * changing the <em>namespace-prefixes</em> value from the default + * to <em>true</em>, effectively eliminating the need to use this + * filter on their output.) 
+ * + * <li> When event streams are generated from a DOM tree, they may + * never have had prefixes or declarations for namespaces; or + * the existing prefixes or declarations may have been invalidated + * by structural modifications to that DOM tree. + * + * <li> Other software writing SAX event streams won't necessarily + * be worrying about prefix management, and so they will need to + * have a transparent solution for managing them. + * + * </ul> + * + * <p> This filter uses a heuristic to choose the prefix to assign to any + * particular name which wasn't already correctly prefixed. The associated + * namespace will be correct, and the prefix will be declared. Original + * structures facilitating text editing, such as conventions about use of + * mnemonic prefix names or the scoping of prefixes, can't always be + * reconstructed after they are discarded, as strongly encouraged by the + * current SAX2 defaults. + * + * <p> Note that this can't possibly know whether values inside attribute + * value or document content involve prefixed names. If your application + * requires using prefixed names in such locations you'll need to add some + * appropriate logic (perhaps adding additional heuristics in a subclass). + * + * @author David Brownell + */ +public class NSFilter extends EventFilter +{ + private NamespaceSupport nsStack = new NamespaceSupport (); + private Stack elementStack = new Stack (); + + private boolean pushedContext; + private String nsTemp [] = new String [3]; + private AttributesImpl attributes = new AttributesImpl (); + private boolean usedDefault; + + // gensymmed prefixes use this root name + private static final String prefixRoot = "prefix-"; + + + /** + * Passes events through to the specified consumer, after first + * processing them. + * + * @param next the next event consumer to receive events. 
+ */ + // constructor used by PipelineFactory + public NSFilter (EventConsumer next) + { + super (next); + + setContentHandler (this); + } + + private void fatalError (String message) + throws SAXException + { + SAXParseException e; + ErrorHandler handler = getErrorHandler (); + Locator locator = getDocumentLocator (); + + if (locator == null) + e = new SAXParseException (message, null, null, -1, -1); + else + e = new SAXParseException (message, locator); + if (handler != null) + handler.fatalError (e); + throw e; + } + + + public void startDocument () throws SAXException + { + elementStack.removeAllElements (); + nsStack.reset (); + pushedContext = false; + super.startDocument (); + } + + /** + * This call is not passed to the next consumer in the chain. + * Prefix declarations and scopes are only exposed in the form + * of attributes; this callback just records a declaration that + * will be exposed as an attribute. + */ + public void startPrefixMapping (String prefix, String uri) + throws SAXException + { + if (pushedContext == false) { + nsStack.pushContext (); + pushedContext = true; + } + + // this check is awkward, but the paranoia prevents big trouble + for (Enumeration e = nsStack.getDeclaredPrefixes (); + e.hasMoreElements (); + /* NOP */ ) { + String declared = (String) e.nextElement (); + + if (!declared.equals (prefix)) + continue; + if (uri.equals (nsStack.getURI (prefix))) + return; + fatalError ("inconsistent binding for prefix '" + prefix + + "' ... " + uri + " (was " + nsStack.getURI (prefix) + ")"); + } + + if (!nsStack.declarePrefix (prefix, uri)) + fatalError ("illegal prefix declared: " + prefix); + } + + private String fixName (String ns, String l, String name, boolean isAttr) + throws SAXException + { + if ("".equals (name) || name == null) { + name = l; + if ("".equals (name) || name == null) + fatalError ("empty/null name"); + } + + // can we correctly process the name as-is? + // handles "element scope" attribute names here. 
+ if (nsStack.processName (name, nsTemp, isAttr) != null + && nsTemp [0].equals (ns) + ) { + return nsTemp [2]; + } + + // nope, gotta modify the name or declare a default mapping + int temp; + + // get rid of any current prefix + if ((temp = name.indexOf (':')) >= 0) { + name = name.substring (temp + 1); + + // ... maybe that's enough (use/prefer default namespace) ... + if (!isAttr && nsStack.processName (name, nsTemp, false) != null + && nsTemp [0].equals (ns) + ) { + return nsTemp [2]; + } + } + + // must we define and use the default/undefined prefix? + if ("".equals (ns)) { + if (isAttr) + fatalError ("processName bug"); + if (attributes.getIndex ("xmlns") != -1) + fatalError ("need to undefine default NS, but it's bound: " + + attributes.getValue ("xmlns")); + + nsStack.declarePrefix ("", ""); + attributes.addAttribute ("", "", "xmlns", "CDATA", ""); + return name; + } + + // is there at least one non-null prefix we can use? + for (Enumeration e = nsStack.getDeclaredPrefixes (); + e.hasMoreElements (); + /* NOP */) { + String prefix = (String) e.nextElement (); + String uri = nsStack.getURI (prefix); + + if (uri == null || !uri.equals (ns)) + continue; + return prefix + ":" + name; + } + + // no such luck. create a prefix name, declare it, use it. 
+ for (temp = 0; temp >= 0; temp++) { + String prefix = prefixRoot + temp; + + if (nsStack.getURI (prefix) == null) { + nsStack.declarePrefix (prefix, ns); + attributes.addAttribute ("", "", "xmlns:" + prefix, + "CDATA", ns); + return prefix + ":" + name; + } + } + fatalError ("too many prefixes genned"); + // NOTREACHED + return null; + } + + public void startElement ( + String uri, String localName, + String qName, Attributes atts + ) throws SAXException + { + if (!pushedContext) + nsStack.pushContext (); + pushedContext = false; + + // make sure we have all NS declarations handy before we start + int length = atts.getLength (); + + for (int i = 0; i < length; i++) { + String aName = atts.getQName (i); + + if (!aName.startsWith ("xmlns")) + continue; + + String prefix; + + if ("xmlns".equals (aName)) + prefix = ""; + else if (aName.indexOf (':') == 5) + prefix = aName.substring (6); + else // "xmlnsfoo" etc. + continue; + startPrefixMapping (prefix, atts.getValue (i)); + } + + // put namespace decls at the start of our regenned attlist + attributes.clear (); + for (Enumeration e = nsStack.getDeclaredPrefixes (); + e.hasMoreElements (); + /* NOP */) { + String prefix = (String) e.nextElement (); + + attributes.addAttribute ("", "", + ("".equals (prefix) + ? "xmlns" + : "xmlns:" + prefix), + "CDATA", + nsStack.getURI (prefix)); + } + + // name fixups: element, then attributes. + // fixName may declare a new prefix or, for the element, + // redeclare the default (if element name needs it). 
+ qName = fixName (uri, localName, qName, false); + + for (int i = 0; i < length; i++) { + String aName = atts.getQName (i); + String aNS = atts.getURI (i); + String aLocal = atts.getLocalName (i); + String aType = atts.getType (i); + String aValue = atts.getValue (i); + + if (aName.startsWith ("xmlns")) + continue; + aName = fixName (aNS, aLocal, aName, true); + attributes.addAttribute (aNS, aLocal, aName, aType, aValue); + } + + elementStack.push (qName); + + // pass event along, with cleaned-up names and decls. + super.startElement (uri, localName, qName, attributes); + } + + public void endElement (String uri, String localName, String qName) + throws SAXException + { + nsStack.popContext (); + qName = (String) elementStack.pop (); + super.endElement (uri, localName, qName); + } + + /** + * This call is not passed to the next consumer in the chain. + * Prefix declarations and scopes are only exposed in their + * attribute form. + */ + public void endPrefixMapping (String prefix) + throws SAXException + { } + + public void endDocument () throws SAXException + { + elementStack.removeAllElements (); + nsStack.reset (); + super.endDocument (); + } +} diff --git a/libjava/classpath/gnu/xml/pipeline/PipelineFactory.java b/libjava/classpath/gnu/xml/pipeline/PipelineFactory.java new file mode 100644 index 000000000..c2adab021 --- /dev/null +++ b/libjava/classpath/gnu/xml/pipeline/PipelineFactory.java @@ -0,0 +1,723 @@ +/* PipelineFactory.java -- + Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING. If not, write to the +Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA +02110-1301 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library. Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module. An independent module is a module which is not derived from +or based on this library. If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so. If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.io.OutputStream; +import java.io.OutputStreamWriter; +import java.lang.reflect.Constructor; +import java.util.StringTokenizer; + +import org.xml.sax.*; +import org.xml.sax.ext.*; + + +/** + * This provides static factory methods for creating simple event pipelines. + * These pipelines are specified by strings, suitable for passing on + * command lines or embedding in element attributes. 
For example, one way + * to write a pipeline that restores namespace syntax, validates (stopping + * the pipeline on validity errors) and then writes valid data to standard + * output is this: <pre> + * nsfix | validate | write ( stdout )</pre> + * + * <p> In this syntax, the tokens are always separated by whitespace, and each + * stage of the pipeline may optionally have a parameter (which can be a + * pipeline) in parentheses. Interior stages are called filters, and the + * rightmost end of a pipeline is called a terminus. + * + * <p> Stages are usually implemented by a single class, which may not be + * able to act as both a filter and a terminus; but any terminus can be + * automatically turned into a filter, through use of a {@link TeeConsumer}. + * The stage identifiers are either class names, or are one of the following + * short identifiers built into this class. (Most of these identifiers are + * no more than aliases for classes.) The built-in identifiers include:</p> + + <table border="1" cellpadding="3" cellspacing="0"> + <tr bgcolor="#ccccff" class="TableHeadingColor"> + <th align="center" width="5%">Stage</th> + <th align="center" width="9%">Parameter</th> + <th align="center" width="1%">Terminus</th> + <th align="center">Description</th> + </tr> + + <tr valign="top" align="center"> + <td><a href="../dom/Consumer.html">dom</a></td> + <td><em>none</em></td> + <td> yes </td> + <td align="left"> Applications code can access a DOM Document built + from the input event stream. When used as a filter, this buffers + data up to an <em>endDocument</em> call, and then uses a DOM parser + to report everything that has been recorded (which can easily be + less than what was reported to it). </td> + </tr> + <tr valign="top" align="center"> + <td><a href="NSFilter.html">nsfix</a></td> + <td><em>none</em></td> + <td>no</td> + <td align="left">This stage ensures that the XML element and attribute + names in its output use namespace prefixes and declarations correctly. 
+ That is, so that they match the "Namespace plus LocalName" naming data + with which each XML element and attribute is already associated. </td> + </tr> + <tr valign="top" align="center"> + <td><a href="EventFilter.html">null</a></td> + <td><em>none</em></td> + <td>yes</td> + <td align="left">This stage ignores all input event data.</td> + </tr> + <tr valign="top" align="center"> + <td><a href="CallFilter.html">server</a></td> + <td><em>required</em><br> server URL </td> + <td>no</td> + <td align="left">Sends its input as XML request to a remote server, + normally a web application server using the HTTP or HTTPS protocols. + The output of this stage is the parsed response from that server.</td> + </tr> + <tr valign="top" align="center"> + <td><a href="TeeConsumer.html">tee</a></td> + <td><em>required</em><br> first pipeline</td> + <td>no</td> + <td align="left">This sends its events down two paths; its parameter + is a pipeline descriptor for the first path, and the second path + is the output of this stage.</td> + </tr> + + <tr valign="top" align="center"> + <td><a href="ValidationConsumer.html">validate</a></td> + <td><em>none</em></td> + <td>yes</td> + <td align="left">This checks for validity errors, and reports them + through its error handler. The input must include declaration events + and some lexical events. </td> + </tr> + <tr valign="top" align="center"> + <td><a href="WellFormednessFilter.html">wf</a></td> + <td><em>none</em></td> + <td>yes</td> + <td align="left"> This class provides some basic "well formedness" + tests on the input event stream, and reports a fatal error if any + of them fail. One example: start/end calls for elements must match. 
+ No SAX parser is permitted to produce malformed output, but other + components can easily do so.</td> + </tr> + <tr valign="top" align="center"> + <td>write</td> + <td><em>required</em><br> "stdout", "stderr", or filename</td> + <td>yes</td> + <td align="left"> Writes its input to the specified output, as pretty + printed XML text encoded using UTF-8. Input events must be well + formed and "namespace fixed", else the output won't be XML (or possibly + namespace) conformant. The symbolic names represent + <em>System.out</em> and <em>System.err</em> respectively; names must + correspond to files which don't yet exist.</td> + </tr> + <tr valign="top" align="center"> + <td>xhtml</td> + <td><em>required</em><br> "stdout", "stderr", or filename</td> + <td>yes</td> + <td align="left"> Like <em>write</em> (above), except that XHTML rules + are followed. The XHTML 1.0 Transitional document type is declared, + and only ASCII characters are written (for interoperability). Other + characters are written as entity or character references; the text is + pretty printed.</td> + </tr> + <tr valign="top" align="center"> + <td><a href="XIncludeFilter.html">xinclude</a></td> + <td><em>none</em></td> + <td>no</td> + <td align="left">This stage handles XInclude processing. + This is like entity inclusion, except that the included content + is declared in-line rather than in the DTD at the beginning of + a document. + </td> + </tr> + <tr valign="top" align="center"> + <td><a href="XsltFilter.html">xslt</a></td> + <td><em>required</em><br> XSLT stylesheet URI</td> + <td>no</td> + <td align="left">This stage handles XSLT transformation + according to a stylesheet. + The implementation of the transformation may not actually + stream data, although if such an XSLT engine is in use + then that can happen. + </td> + </tr> + + </table> + + * <p> Note that {@link EventFilter#bind} can automatically eliminate + * some filters by setting SAX2 parser features appropriately. 
This means + * that you can routinely put filters like "nsfix", "validate", or "wf" at the + * front of a pipeline (for components that need inputs conditioned to match + * that level of correctness), and know that they won't actually be used unless + * they're absolutely necessary. + * + * @author David Brownell + */ +public class PipelineFactory +{ + /** + * Creates a simple pipeline according to the description string passed in. + */ + public static EventConsumer createPipeline (String description) + throws IOException + { + return createPipeline (description, null); + } + + /** + * Extends an existing pipeline by prepending the filter pipeline to the + * specified consumer. Some pipelines need more customization than can + * be done through this simplified syntax. When they are set up with + * direct API calls, use this method to merge more complex pipeline + * segments with easily configured ones. + */ + public static EventConsumer createPipeline ( + String description, + EventConsumer next + ) throws IOException + { + // tokens are (for now) what's separated by whitespace; + // very easy to parse, but IDs never have spaces. + + StringTokenizer tokenizer; + String tokens []; + + tokenizer = new StringTokenizer (description); + tokens = new String [tokenizer.countTokens ()]; + for (int i = 0; i < tokens.length; i++) + tokens [i] = tokenizer.nextToken (); + + PipelineFactory factory = new PipelineFactory (); + Pipeline pipeline = factory.parsePipeline (tokens, next); + + return pipeline.createPipeline (); + } + + + private PipelineFactory () { /* NYET */ } + + + /** + * Extends an existing pipeline by prepending a pre-tokenized filter + * pipeline to the specified consumer. Tokens are class names (or the + * predefined aliases), left and right parentheses, and the vertical bar. 
+ */ + public static EventConsumer createPipeline ( + String tokens [], + EventConsumer next + ) throws IOException + { + PipelineFactory factory = new PipelineFactory (); + Pipeline pipeline = factory.parsePipeline (tokens, next); + + return pipeline.createPipeline (); + } + + + private String tokens []; + private int index; + + private Pipeline parsePipeline (String toks [], EventConsumer next) + { + tokens = toks; + index = 0; + + Pipeline retval = parsePipeline (next); + + if (index != toks.length) + throw new ArrayIndexOutOfBoundsException ( + "extra token: " + tokens [index]); + return retval; + } + + // pipeline ::= stage | stage '|' pipeline + private Pipeline parsePipeline (EventConsumer next) + { + Pipeline retval = new Pipeline (parseStage ()); + + // minimal pipelines: "stage" and "... | id" + if (index > (tokens.length - 2) + || !"|".equals (tokens [index]) + ) { + retval.next = next; + return retval; + } + index++; + retval.rest = parsePipeline (next); + return retval; + } + + // stage ::= id | id '(' pipeline ')' + private Stage parseStage () + { + Stage retval = new Stage (tokens [index++]); + + // minimal stages: "id" and "id ( id )" + if (index > (tokens.length - 2) + || !"(".equals (tokens [index]) /*)*/ + ) + return retval; + + index++; + retval.param = parsePipeline (null); + if (index >= tokens.length) + throw new ArrayIndexOutOfBoundsException ( + "missing right paren"); + if (/*(*/ !")".equals (tokens [index++])) + throw new ArrayIndexOutOfBoundsException ( + "required right paren, not: " + tokens [index - 1]); + return retval; + } + + + // + // these classes obey the conventions for constructors, so they're + // only built in to this table of shortnames + // + // - filter (one or two types of arglist) + // * last constructor parameter is 'next' element + // * optional (first) string parameter + // + // - terminus (one or two types of arglist) + // * optional (only) string parameter + // + // terminus stages are transformed into filters if needed, by + // 
creating a "tee". filter stages aren't turned to terminus + // stages though; either eliminate such stages, or add some + // terminus explicitly. + // + private static final String builtinStages [][] = { + { "dom", "gnu.xml.dom.Consumer" }, + { "nsfix", "gnu.xml.pipeline.NSFilter" }, + { "null", "gnu.xml.pipeline.EventFilter" }, + { "server", "gnu.xml.pipeline.CallFilter" }, + { "tee", "gnu.xml.pipeline.TeeConsumer" }, + { "validate", "gnu.xml.pipeline.ValidationConsumer" }, + { "wf", "gnu.xml.pipeline.WellFormednessFilter" }, + { "xinclude", "gnu.xml.pipeline.XIncludeFilter" }, + { "xslt", "gnu.xml.pipeline.XsltFilter" }, + +// XXX want: option for validate, to preload external part of a DTD + + // xhtml, write ... nyet generic-ready + }; + + private static class Stage + { + String id; + Pipeline param; + + Stage (String name) + { id = name; } + + public String toString () + { + if (param == null) + return id; + return id + " ( " + param + " )"; + } + + private void fail (String message) + throws IOException + { + throw new IOException ("in '" + id + + "' stage of pipeline, " + message); + } + + EventConsumer createStage (EventConsumer next) + throws IOException + { + String name = id; + + // most builtins are just class aliases + for (int i = 0; i < builtinStages.length; i++) { + if (id.equals (builtinStages [i][0])) { + name = builtinStages [i][1]; + break; + } + } + + // Save output as XML or XHTML text + if ("write".equals (name) || "xhtml".equals (name)) { + String filename; + boolean isXhtml = "xhtml".equals (name); + OutputStream out = null; + TextConsumer consumer; + + if (param == null) + fail ("parameter is required"); + + filename = param.toString (); + if ("stdout".equals (filename)) + out = System.out; + else if ("stderr".equals (filename)) + out = System.err; + else { + File f = new File (filename); + +/* + if (!f.isAbsolute ()) + fail ("require absolute file paths"); + */ + if (f.exists ()) + fail ("file already exists: " + f.getName ()); + +// XXX 
this races against the existence test + out = new FileOutputStream (f); + } + + if (!isXhtml) + consumer = new TextConsumer (out); + else + consumer = new TextConsumer ( + new OutputStreamWriter (out, "8859_1"), + true); + + consumer.setPrettyPrinting (true); + if (next == null) + return consumer; + return new TeeConsumer (consumer, next); + + } else { + // + // Here go all the builtins that are just aliases for + // classes, and all stage IDs that started out as such + // class names. The following logic relies on several + // documented conventions for constructor invocation. + // + String msg = null; + + try { + Class klass = Class.forName (name); + Class argTypes [] = null; + Constructor constructor = null; + boolean filter = false; + Object params [] = null; + Object obj = null; + + // do we need a filter stage? + if (next != null) { + // "next" consumer is always passed, with + // or without the optional string param + if (param == null) { + argTypes = new Class [1]; + argTypes [0] = EventConsumer.class; + + params = new Object [1]; + params [0] = next; + + msg = "no-param filter"; + } else { + argTypes = new Class [2]; + argTypes [0] = String.class; + argTypes [1] = EventConsumer.class; + + params = new Object [2]; + params [0] = param.toString (); + params [1] = next; + + msg = "one-param filter"; + } + + + try { + constructor = klass.getConstructor (argTypes); + } catch (NoSuchMethodException e) { + // try creating a filter from a + // terminus and a tee + filter = true; + msg += " built from "; + } + } + + // build from a terminus stage, with or + // without the optional string param + if (constructor == null) { + String tmp; + + if (param == null) { + argTypes = new Class [0]; + params = new Object [0]; + + tmp = "no-param terminus"; + } else { + argTypes = new Class [1]; + argTypes [0] = String.class; + + params = new Object [1]; + params [0] = param.toString (); + + tmp = "one-param terminus"; + } + if (msg == null) + msg = tmp; + else + msg += tmp; + 
constructor = klass.getConstructor (argTypes); + // NOT creating terminus by dead-ending + // filters ... users should think about + // that one, something's likely wrong + } + + obj = constructor.newInstance (params); + + // return EventConsumers directly, perhaps after + // turning them into a filter + if (obj instanceof EventConsumer) { + if (filter) + return new TeeConsumer ((EventConsumer) obj, next); + return (EventConsumer) obj; + } + + // if it's not a handler, it's an error + // we can wrap handlers in a filter + EventFilter retval = new EventFilter (); + boolean updated = false; + + if (obj instanceof ContentHandler) { + retval.setContentHandler ((ContentHandler) obj); + updated = true; + } + if (obj instanceof DTDHandler) { + retval.setDTDHandler ((DTDHandler) obj); + updated = true; + } + if (obj instanceof LexicalHandler) { + retval.setProperty ( + EventFilter.PROPERTY_URI + "lexical-handler", + obj); + updated = true; + } + if (obj instanceof DeclHandler) { + retval.setProperty ( + EventFilter.PROPERTY_URI + "declaration-handler", + obj); + updated = true; + } + + if (!updated) + fail ("class is neither Consumer nor Handler"); + + if (filter) + return new TeeConsumer (retval, next); + return retval; + + } catch (IOException e) { + throw e; + + } catch (NoSuchMethodException e) { + fail (name + " constructor missing -- " + msg); + + } catch (ClassNotFoundException e) { + fail (name + " class not found"); + + } catch (Exception e) { + // e.printStackTrace (); + fail ("stage not available: " + e.getMessage ()); + } + } + // NOTREACHED + return null; + } + } + + private static class Pipeline + { + Stage stage; + + // rest may be null + Pipeline rest; + EventConsumer next; + + Pipeline (Stage s) + { stage = s; } + + public String toString () + { + if (rest == null && next == null) + return stage.toString (); + if (rest != null) + return stage + " | " + rest; + throw new IllegalArgumentException ("next"); + } + + EventConsumer createPipeline () + throws 
IOException + { + if (next == null) { + if (rest == null) + next = stage.createStage (null); + else + next = stage.createStage (rest.createPipeline ()); + } + return next; + } + } + +/* + public static void main (String argv []) + { + try { + // three basic terminus cases + createPipeline ("null"); + createPipeline ("validate"); + createPipeline ("write ( stdout )"); + + // four basic filters + createPipeline ("nsfix | write ( stderr )"); + createPipeline ("wf | null"); + createPipeline ("null | null"); + createPipeline ( +"call ( http://www.example.com/services/xml-1a ) | xhtml ( stdout )"); + + // tee junctions + createPipeline ("tee ( validate ) | write ( stdout )"); + createPipeline ("tee ( nsfix | write ( stdout ) ) | validate"); + + // longer pipeline + createPipeline ("nsfix | tee ( validate ) | write ( stdout )"); + createPipeline ( + "null | wf | nsfix | tee ( validate ) | write ( stdout )"); + + // try some parsing error cases + try { + createPipeline ("null ("); // extra token '(' + System.err.println ("** didn't report error"); + } catch (Exception e) { + System.err.println ("== err: " + e.getMessage ()); } + + try { + createPipeline ("nsfix |"); // extra token '|' + System.err.println ("** didn't report error"); + } catch (Exception e) { + System.err.println ("== err: " + e.getMessage ()); } + + try { + createPipeline ("xhtml ( foo"); // missing right paren + System.err.println ("** didn't report error"); + } catch (Exception e) { + System.err.println ("== err: " + e.getMessage ()); } + + try { + createPipeline ("xhtml ( foo bar"); // required right paren + System.err.println ("** didn't report error"); + } catch (Exception e) { + System.err.println ("== err: " + e.getMessage ()); } + + try { + createPipeline ("tee ( nsfix | validate");// missing right paren + System.err.println ("** didn't report error"); + } catch (Exception e) { + System.err.println ("== err: " + e.getMessage ()); } + + // try some construction error cases + + try { + createPipeline 
("call"); // missing param + System.err.println ("** didn't report error"); + } catch (Exception e) { + System.err.println ("== err: " + e.getMessage ()); } + try { + createPipeline ("call ( foobar )"); // broken param + System.err.println ("** didn't report error"); + } catch (Exception e) { + System.err.println ("== err: " + e.getMessage ()); } + try { + createPipeline ("nsfix ( foobar )"); // illegal param + System.err.println ("** didn't report error"); + } catch (Exception e) { + System.err.println ("== err: " + e.getMessage ()); } + try { + createPipeline ("null ( foobar )"); // illegal param + System.err.println ("** didn't report error"); + } catch (Exception e) { + System.err.println ("== err: " + e.getMessage ()); } + try { + createPipeline ("wf ( foobar )"); // illegal param + System.err.println ("** didn't report error"); + } catch (Exception e) { + System.err.println ("== err: " + e.getMessage ()); } + try { + createPipeline ("xhtml ( foobar.html )"); + new File ("foobar.html").delete (); + // now supported + } catch (Exception e) { + System.err.println ("** err: " + e.getMessage ()); } + try { + createPipeline ("xhtml"); // missing param + System.err.println ("** didn't report error"); + } catch (Exception e) { + System.err.println ("== err: " + e.getMessage ()); } + try { + createPipeline ("write ( stdout ) | null"); // nonterminal + System.err.println ("** didn't report error"); + } catch (Exception e) { + System.err.println ("== err: " + e.getMessage ()); } + try { + createPipeline ("validate | null"); + // now supported + } catch (Exception e) { + System.err.println ("** err: " + e.getMessage ()); } + try { + createPipeline ("validate ( foo )"); // illegal param + System.err.println ("** didn't report error"); + } catch (Exception e) { + System.err.println ("== err: " + e.getMessage ()); } + try { + createPipeline ("tee"); // missing param + System.err.println ("** didn't report error"); + } catch (Exception e) { + System.err.println ("== err: " + 
e.getMessage ()); } + try { + // only builtins so far + createPipeline ("com.example.xml.FilterClass"); + System.err.println ("** didn't report error"); + } catch (Exception e) { + System.err.println ("== err: " + e.getMessage ()); } + + } catch (Exception e) { + e.printStackTrace (); + } + } +/**/ + +} diff --git a/libjava/classpath/gnu/xml/pipeline/TeeConsumer.java b/libjava/classpath/gnu/xml/pipeline/TeeConsumer.java new file mode 100644 index 000000000..3ac860575 --- /dev/null +++ b/libjava/classpath/gnu/xml/pipeline/TeeConsumer.java @@ -0,0 +1,417 @@ +/* TeeConsumer.java -- + Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING. If not, write to the +Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA +02110-1301 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library. Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. 
+ +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module. An independent module is a module which is not derived from +or based on this library. If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so. If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import org.xml.sax.Attributes; +import org.xml.sax.ContentHandler; +import org.xml.sax.DTDHandler; +import org.xml.sax.ErrorHandler; +import org.xml.sax.Locator; +import org.xml.sax.SAXException; +import org.xml.sax.SAXNotRecognizedException; +import org.xml.sax.ext.DeclHandler; +import org.xml.sax.ext.LexicalHandler; + +/** + * Fans its events out to two other consumers, a "tee" filter stage in an + * event pipeline. Networks can be assembled with multiple output points. + * + * <p> Error handling should be simple if you remember that exceptions + * you throw will cancel later stages in that callback's pipeline, and + * generally the producer will stop if it sees such an exception. You + * may want to protect your pipeline against such backflows, making a + * kind of reverse filter (or valve?) so that certain exceptions thrown by + * your pipeline will be caught and handled before the producer sees them. + * Just use a "try/catch" block, remembering that really important + * cleanup tasks should be in "finally" clauses. 
+ * + * <p> That issue isn't unique to "tee" consumers, but tee consumers have + * the additional twist that exceptions thrown by the first consumer + * will cause the second consumer not to see the callback (except for + * the endDocument callback, which signals state cleanup). + * + * @author David Brownell + */ +final public class TeeConsumer + implements EventConsumer, + ContentHandler, DTDHandler, + LexicalHandler,DeclHandler +{ + private EventConsumer first, rest; + + // cached to minimize time overhead + private ContentHandler docFirst, docRest; + private DeclHandler declFirst, declRest; + private LexicalHandler lexFirst, lexRest; + + + /** + * Constructs a consumer which sends all its events to the first + * consumer, and then the second one. If the first consumer throws + * an exception, the second one will not see the event which + * caused that exception to be reported. + * + * @param car The first consumer to get the events + * @param cdr The second consumer to get the events + */ + public TeeConsumer (EventConsumer car, EventConsumer cdr) + { + if (car == null || cdr == null) + throw new NullPointerException (); + first = car; + rest = cdr; + + // + // Cache the handlers. 
+ // + docFirst = first.getContentHandler (); + docRest = rest.getContentHandler (); + // DTD handler isn't cached (rarely needed) + + try { + declFirst = null; + declFirst = (DeclHandler) first.getProperty ( + EventFilter.DECL_HANDLER); + } catch (SAXException e) {} + try { + declRest = null; + declRest = (DeclHandler) rest.getProperty ( + EventFilter.DECL_HANDLER); + } catch (SAXException e) {} + + try { + lexFirst = null; + lexFirst = (LexicalHandler) first.getProperty ( + EventFilter.LEXICAL_HANDLER); + } catch (SAXException e) {} + try { + lexRest = null; + lexRest = (LexicalHandler) rest.getProperty ( + EventFilter.LEXICAL_HANDLER); + } catch (SAXException e) {} + } + +/* FIXME + /** + * Constructs a pipeline, and is otherwise a shorthand for the + * two-consumer constructor for this class. + * + * @param first Description of the first pipeline to get events, + * which will be passed to {@link PipelineFactory#createPipeline} + * @param rest The second pipeline to get the events + * / + // constructor used by PipelineFactory + public TeeConsumer (String first, EventConsumer rest) + throws IOException + { + this (PipelineFactory.createPipeline (first), rest); + } +*/ + + /** Returns the first pipeline to get event calls. */ + public EventConsumer getFirst () + { return first; } + + /** Returns the second pipeline to get event calls. */ + public EventConsumer getRest () + { return rest; } + + /** Returns the content handler being used. */ + final public ContentHandler getContentHandler () + { + if (docRest == null) + return docFirst; + if (docFirst == null) + return docRest; + return this; + } + + /** Returns the dtd handler being used. */ + final public DTDHandler getDTDHandler () + { + // not cached (hardly used) + if (rest.getDTDHandler () == null) + return first.getDTDHandler (); + if (first.getDTDHandler () == null) + return rest.getDTDHandler (); + return this; + } + + /** Returns the declaration or lexical handler being used. 
*/ + final public Object getProperty (String id) + throws SAXNotRecognizedException + { + // + // in degenerate cases, we have no work to do. + // + Object firstProp = null, restProp = null; + + try { firstProp = first.getProperty (id); } + catch (SAXNotRecognizedException e) { /* ignore */ } + try { restProp = rest.getProperty (id); } + catch (SAXNotRecognizedException e) { /* ignore */ } + + if (restProp == null) + return firstProp; + if (firstProp == null) + return restProp; + + // + // we've got work to do; handle two builtin cases. + // + if (EventFilter.DECL_HANDLER.equals (id)) + return this; + if (EventFilter.LEXICAL_HANDLER.equals (id)) + return this; + + // + // non-degenerate, handled by both consumers, but we don't know + // how to handle this. + // + throw new SAXNotRecognizedException ("can't tee: " + id); + } + + /** + * Provides the error handler to both subsequent nodes of + * this filter stage. + */ + public void setErrorHandler (ErrorHandler handler) + { + first.setErrorHandler (handler); + rest.setErrorHandler (handler); + } + + + // + // ContentHandler + // + public void setDocumentLocator (Locator locator) + { + // this call is not made by all parsers + docFirst.setDocumentLocator (locator); + docRest.setDocumentLocator (locator); + } + + public void startDocument () + throws SAXException + { + docFirst.startDocument (); + docRest.startDocument (); + } + + public void endDocument () + throws SAXException + { + try { + docFirst.endDocument (); + } finally { + docRest.endDocument (); + } + } + + public void startPrefixMapping (String prefix, String uri) + throws SAXException + { + docFirst.startPrefixMapping (prefix, uri); + docRest.startPrefixMapping (prefix, uri); + } + + public void endPrefixMapping (String prefix) + throws SAXException + { + docFirst.endPrefixMapping (prefix); + docRest.endPrefixMapping (prefix); + } + + public void skippedEntity (String name) + throws SAXException + { + docFirst.skippedEntity (name); + docRest.skippedEntity 
(name); + } + + public void startElement (String uri, String localName, + String qName, Attributes atts) + throws SAXException + { + docFirst.startElement (uri, localName, qName, atts); + docRest.startElement (uri, localName, qName, atts); + } + + public void endElement (String uri, String localName, String qName) + throws SAXException + { + docFirst.endElement (uri, localName, qName); + docRest.endElement (uri, localName, qName); + } + + public void processingInstruction (String target, String data) + throws SAXException + { + docFirst.processingInstruction (target, data); + docRest.processingInstruction (target, data); + } + + public void characters (char ch [], int start, int length) + throws SAXException + { + docFirst.characters (ch, start, length); + docRest.characters (ch, start, length); + } + + public void ignorableWhitespace (char ch [], int start, int length) + throws SAXException + { + docFirst.ignorableWhitespace (ch, start, length); + docRest.ignorableWhitespace (ch, start, length); + } + + + // + // DTDHandler + // + public void notationDecl (String name, String publicId, String systemId) + throws SAXException + { + DTDHandler l1 = first.getDTDHandler (); + DTDHandler l2 = rest.getDTDHandler (); + + l1.notationDecl (name, publicId, systemId); + l2.notationDecl (name, publicId, systemId); + } + + public void unparsedEntityDecl (String name, + String publicId, String systemId, + String notationName + ) throws SAXException + { + DTDHandler l1 = first.getDTDHandler (); + DTDHandler l2 = rest.getDTDHandler (); + + l1.unparsedEntityDecl (name, publicId, systemId, notationName); + l2.unparsedEntityDecl (name, publicId, systemId, notationName); + } + + + // + // DeclHandler + // + public void attributeDecl (String eName, String aName, + String type, + String mode, String value) + throws SAXException + { + declFirst.attributeDecl (eName, aName, type, mode, value); + declRest.attributeDecl (eName, aName, type, mode, value); + } + + public void elementDecl 
(String name, String model) + throws SAXException + { + declFirst.elementDecl (name, model); + declRest.elementDecl (name, model); + } + + public void externalEntityDecl (String name, + String publicId, String systemId) + throws SAXException + { + declFirst.externalEntityDecl (name, publicId, systemId); + declRest.externalEntityDecl (name, publicId, systemId); + } + + public void internalEntityDecl (String name, String value) + throws SAXException + { + declFirst.internalEntityDecl (name, value); + declRest.internalEntityDecl (name, value); + } + + + // + // LexicalHandler + // + public void comment (char ch [], int start, int length) + throws SAXException + { + lexFirst.comment (ch, start, length); + lexRest.comment (ch, start, length); + } + + public void startCDATA () + throws SAXException + { + lexFirst.startCDATA (); + lexRest.startCDATA (); + } + + public void endCDATA () + throws SAXException + { + lexFirst.endCDATA (); + lexRest.endCDATA (); + } + + public void startEntity (String name) + throws SAXException + { + lexFirst.startEntity (name); + lexRest.startEntity (name); + } + + public void endEntity (String name) + throws SAXException + { + lexFirst.endEntity (name); + lexRest.endEntity (name); + } + + public void startDTD (String name, String publicId, String systemId) + throws SAXException + { + lexFirst.startDTD (name, publicId, systemId); + lexRest.startDTD (name, publicId, systemId); + } + + public void endDTD () + throws SAXException + { + lexFirst.endDTD (); + lexRest.endDTD (); + } +} diff --git a/libjava/classpath/gnu/xml/pipeline/TextConsumer.java b/libjava/classpath/gnu/xml/pipeline/TextConsumer.java new file mode 100644 index 000000000..13dcfa7f6 --- /dev/null +++ b/libjava/classpath/gnu/xml/pipeline/TextConsumer.java @@ -0,0 +1,117 @@ +/* TextConsumer.java -- + Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. 
+ +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING. If not, write to the +Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA +02110-1301 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library. Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module. An independent module is a module which is not derived from +or based on this library. If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so. If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import java.io.*; + +import org.xml.sax.*; + +import gnu.xml.util.XMLWriter; + + +/** + * Terminates a pipeline, consuming events to print them as well formed + * XML (or XHTML) text. + * + * <p> Input must be well formed, and must include XML names (e.g. 
the + * prefixes and prefix declarations must be present), or the output of + * this class is undefined. + * + * @see NSFilter + * @see WellFormednessFilter + * + * @author David Brownell + */ +public class TextConsumer extends XMLWriter implements EventConsumer +{ + /** + * Constructs an event consumer which echoes its input as text, + * optionally adhering to some basic XHTML formatting options + * which increase interoperability with old (v3) browsers. + * + * <p> For the best interoperability, when writing as XHTML only + * ASCII characters are emitted; other characters are turned to + * entity or character references as needed, and no XML declaration + * is provided in the document. + */ + public TextConsumer (Writer w, boolean isXhtml) + throws IOException + { + super (w, isXhtml ? "US-ASCII" : null); + setXhtml (isXhtml); + } + + /** + * Constructs a consumer that writes its input as XML text. + * XHTML rules are not followed. + */ + public TextConsumer (Writer w) + throws IOException + { + this (w, false); + } + + /** + * Constructs a consumer that writes its input as XML text, + * encoded in UTF-8. XHTML rules are not followed. + */ + public TextConsumer (OutputStream out) + throws IOException + { + this (new OutputStreamWriter (out, "UTF8"), false); + } + + /** <b>EventConsumer</b> Returns the document handler being used. */ + public ContentHandler getContentHandler () + { return this; } + + /** <b>EventConsumer</b> Returns the dtd handler being used. 
*/ + public DTDHandler getDTDHandler () + { return this; } + + /** <b>XMLReader</b>Retrieves a property (lexical and decl handlers) */ + public Object getProperty (String propertyId) + throws SAXNotRecognizedException + { + if (EventFilter.LEXICAL_HANDLER.equals (propertyId)) + return this; + if (EventFilter.DECL_HANDLER.equals (propertyId)) + return this; + throw new SAXNotRecognizedException (propertyId); + } +} diff --git a/libjava/classpath/gnu/xml/pipeline/ValidationConsumer.java b/libjava/classpath/gnu/xml/pipeline/ValidationConsumer.java new file mode 100644 index 000000000..0346984d3 --- /dev/null +++ b/libjava/classpath/gnu/xml/pipeline/ValidationConsumer.java @@ -0,0 +1,1928 @@ +/* ValidationConsumer.java -- + Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING. If not, write to the +Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA +02110-1301 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library. Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. 
+ +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module. An independent module is a module which is not derived from +or based on this library. If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so. If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import java.io.IOException; +import java.io.StringReader; +import java.io.StringWriter; +import java.util.EmptyStackException; +import java.util.Enumeration; +import java.util.Hashtable; +import java.util.Stack; +import java.util.StringTokenizer; +import java.util.Vector; + +import org.xml.sax.Attributes; +import org.xml.sax.EntityResolver; +import org.xml.sax.ErrorHandler; +import org.xml.sax.InputSource; +import org.xml.sax.Locator; +import org.xml.sax.SAXException; +import org.xml.sax.SAXParseException; +import org.xml.sax.XMLReader; +import org.xml.sax.helpers.XMLReaderFactory; + +/** + * This class checks SAX2 events to report validity errors; it works as + * both a filter and a terminus on an event pipeline. It relies on the + * producer of SAX events to: </p> <ol> + * + * <li> Conform to the specification of a non-validating XML parser that + * reads all external entities, reported using SAX2 events. </li> + * + * <li> Report ignorable whitespace as such (through the ContentHandler + * interface). This is, strictly speaking, optional for nonvalidating + * XML processors. 
</li> + * + * <li> Make SAX2 DeclHandler callbacks, with default + * attribute values already normalized (and without "<").</li> + * + * <li> Make SAX2 LexicalHandler startDTD() and endDTD () + * callbacks. </li> + * + * <li> Act as if the <em>(URI)/namespace-prefixes</em> property were + * set to true, by providing XML 1.0 names and all <code>xmlns*</code> + * attributes (rather than omitting either or both). </li> + * + * </ol> + * + * <p> At this writing, the major SAX2 parsers (such as Ælfred2, + * Crimson, and Xerces) meet these requirements, and this validation + * module is used by the optional Ælfred2 validation support. + * </p> + * + * <p> Note that because this is a layered validator, it has to duplicate some + * work that the parser is doing; there are also other cost to layering. + * However, <em>because of layering it doesn't need a parser</em> in order + * to work! You can use it with anything that generates SAX events, such + * as an application component that wants to detect invalid content in + * a changed area without validating an entire document, or which wants to + * ensure that it doesn't write invalid data to a communications partner.</p> + * + * <p> Also, note that because this is a layered validator, the line numbers + * reported for some errors may seem strange. For example, if an element does + * not permit character content, the validator + * will use the locator provided to it. + * That might reflect the last character of a <em>characters</em> event + * callback, rather than the first non-whitespace character. </p> + * + * <hr /> + * + * <!-- + * <p> Of interest is the fact that unlike most currently known XML validators, + * this one can report some cases of non-determinism in element content models. + * It is a compile-time option, enabled by default. This will only report + * such XML errors if they relate to content actually appearing in a document; + * content models aren't aggressively scanned for non-deterministic structure. 
+ * Documents which trigger such non-deterministic transitions may be handled + * differently by different validating parsers, without losing conformance + * to the XML specification. </p> + * --> + * + * <p> Current limitations of the validation performed are in roughly three + * categories. </p> + * + * <p> The first category represents constraints which demand violations + * of software layering: exposing lexical details, one of the first things + * that <em>application</em> programming interfaces (APIs) hide. These + * invariably relate to XML entity handling, and to historical oddities + * of the XML validation semantics. Curiously, + * recent (Autumn 1999) conformance testing showed that these constraints are + * among those handled worst by existing XML validating parsers. Arguments + * have been made that each of these VCs should be turned into WFCs (most + * of them) or discarded (popular for the standalone declaration); in short, + * that these are bugs in the XML specification (not all via SGML): </p><ul> + * + * <li> The <em>Proper Declaration/PE Nesting</em> and + * <em>Proper Group/PE Nesting</em> VCs can't be tested because they + * require access to particularly low level lexical level information. + * In essence, the reason XML isn't a simple thing to parse is that + * it's not a context free grammar, and these constraints elevate that + * SGML-derived context sensitivity to the level of a semantic rule. + * + * <li> The <em>Standalone Document Declaration</em> VC can't be + * tested. This is for two reasons. First, this flag isn't made + * available through SAX2. Second, it also requires breaking that + * lexical layering boundary. (If you ever wondered why classes + * in compiler construction or language design barely mention the + * existence of context-sensitive grammars, it's because of messy + * issues like these.) 
+ * + * <li> The <em>Entity Declared</em> VC can't be tested, because it + * also requires breaking that lexical layering boundary! There's also + * another issue: the VC wording (and seemingly intent) is ambiguous. + * (This is still true in the "Second edition" XML spec.) + * Since there is a WFC of the same name, everyone's life would be + * easier if references to undeclared parsed entities were always well + * formedness errors, regardless of whether they're parameter entities + * or not. (Note that nonvalidating parsers are not required + * to report all such well formedness errors if they don't read external + * parameter entities, although currently most XML parsers read them + * in an attempt to avoid problems from inconsistent parser behavior.) + * + * </ul> + * + * <p> The second category of limitations on this validation represents + * constraints associated with information that is not guaranteed to be + * available (or in one case, <em>is guaranteed not to be available</em>) + * through the SAX2 API: </p><ul> + * + * <li> The <em>Unique Element Type Declaration</em> VC may not be + * reportable, if the underlying parser happens not to expose + * multiple declarations. (Ælfred2 reports these validity + * errors directly.)</li> + * + * <li> Similarly, the <em>Unique Notation Name</em> VC, added in the + * 14-January-2000 XML spec errata to restrict typing models used by + * elements, may not be reportable. (Ælfred reports these + * validity errors directly.) </li> + * + * </ul> + * + * <p> A third category relates to ease of implementation. (Think of this + * as "bugs".) The most notable issue here is character handling. Rather + * than attempting to implement the voluminous character tables in the XML + * specification (Appendix B), Unicode rules are used directly from + * the java.lang.Character class. 
Recent JVMs have begun to diverge from + * the original specification for that class (Unicode 2.0), meaning that + * different JVMs may handle that aspect of conformance differently. + * </p> + * + * <p> Note that for some of the validity errors that SAX2 does not + * expose, a nonvalidating parser is permitted (by the XML specification) + * to report validity errors. When used with a parser that does so for + * the validity constraints mentioned above (or any other SAX2 event + * stream producer that does the same thing), overall conformance is + * substantially improved. + * + * @see gnu.xml.aelfred2.SAXDriver + * @see gnu.xml.aelfred2.XmlReader + * + * @author David Brownell + */ +public final class ValidationConsumer extends EventFilter +{ + // report error if we happen to notice a non-deterministic choice? + // we won't report buggy content models; just buggy instances + private static final boolean warnNonDeterministic = false; + + // for tracking active content models + private String rootName; + private Stack contentStack = new Stack (); + + // flags for "saved DTD" processing + private boolean disableDeclarations; + private boolean disableReset; + + // + // most VCs get tested when we see element start tags. the per-element + // info (including attributes) recorded here duplicates that found inside + // many nonvalidating parsers, hence dual lookups etc ... that's why a + // layered validator isn't going to be as fast as a non-layered one. + // + + // key = element name; value = ElementInfo + private Hashtable elements = new Hashtable (); + + // some VCs relate to ID/IDREF/IDREFS attributes + // key = id; value = boolean true (defd) or false (refd) + private Hashtable ids = new Hashtable (); + + // we just record declared notation and unparsed entity names. 
+ // the implementation here is simple/slow; these features + // are seldom used, one hopes they'll wither away soon + private Vector notations = new Vector (5, 5); + private Vector nDeferred = new Vector (5, 5); + private Vector unparsed = new Vector (5, 5); + private Vector uDeferred = new Vector (5, 5); + + // note: DocBk 3.1.7 XML defines over 2 dozen notations, + // used when defining unparsed entities for graphics + // (and maybe in other places) + + + + /** + * Creates a pipeline terminus which consumes all events passed to + * it; this will report validity errors as if they were fatal errors, + * unless an error handler is assigned. + * + * @see #setErrorHandler + */ + // constructor used by PipelineFactory + // ... and want one taking system ID of an external subset + public ValidationConsumer () + { + this (null); + } + + /** + * Creates a pipeline filter which reports validity errors and then + * passes events on to the next consumer if they were not fatal. + * + * @see #setErrorHandler + */ + // constructor used by PipelineFactory + // ... and want one taking system ID of an external subset + // (which won't send declaration events) + public ValidationConsumer (EventConsumer next) + { + super (next); + + setContentHandler (this); + setDTDHandler (this); + try { setProperty (DECL_HANDLER, this); } + catch (Exception e) { /* "can't happen" */ } + try { setProperty (LEXICAL_HANDLER, this); } + catch (Exception e) { /* "can't happen" */ } + } + + + private static final String fakeRootName + = ":Nobody:in:their_Right.Mind_would:use:this-name:1x:"; + + /** + * Creates a validation consumer which is preloaded with the DTD provided. + * It does this by constructing a document with that DTD, then parsing + * that document and recording its DTD declarations. Then it arranges + * not to modify that information. 
+ * + * <p> The resulting validation consumer will only validate against + * the specified DTD, regardless of whether some other DTD is found + * in a document being parsed. + * + * @param rootName The name of the required root element; if this is + * null, any root element name will be accepted. + * @param publicId If non-null and there is a non-null systemId, this + * identifier provides an alternate access identifier for the DTD's + * external subset. + * @param systemId If non-null, this is a URI (normally URL) that + * may be used to access the DTD's external subset. + * @param internalSubset If non-null, holds literal markup declarations + * comprising the DTD's internal subset. + * @param resolver If non-null, this will be provided to the parser for + * use when resolving parameter entities (including any external subset). + * @param minimalDocument If non-null, a minimal valid document. + * + * @exception SAXNotSupportedException If the default SAX parser does + * not support the standard lexical or declaration handlers. + * @exception SAXParseException If the specified DTD has either + * well-formedness or validity errors. + * @exception IOException If the specified DTD can't be read for + * some reason. + */ + public ValidationConsumer ( + String rootName, + String publicId, + String systemId, + String internalSubset, + EntityResolver resolver, + String minimalDocument + ) throws SAXException, IOException + { + this (null); + + disableReset = true; + if (rootName == null) + rootName = fakeRootName; + + // + // Synthesize document with that DTD; is it possible to do + // better for the declaration of the root element? + // + // NOTE: can't use SAX2 to write internal subsets. 
+ // + StringWriter writer = new StringWriter (); + + writer.write ("<!DOCTYPE "); + writer.write (rootName); + if (systemId != null) { + writer.write ("\n "); + if (publicId != null) { + writer.write ("PUBLIC '"); + writer.write (publicId); + writer.write ("'\n\t'"); + } else + writer.write ("SYSTEM '"); + writer.write (systemId); + writer.write ("'"); + } + writer.write (" [ "); + if (rootName == fakeRootName) { + writer.write ("\n<!ELEMENT "); + writer.write (rootName); + writer.write (" EMPTY>"); + } + if (internalSubset != null) + writer.write (internalSubset); + writer.write ("\n ]>"); + + if (minimalDocument != null) { + writer.write ("\n"); + writer.write (minimalDocument); + writer.write ("\n"); + } else { + writer.write (" <"); + writer.write (rootName); + writer.write ("/>\n"); + } + minimalDocument = writer.toString (); + + // + // OK, load it + // + XMLReader producer; + + producer = XMLReaderFactory.createXMLReader (); + bind (producer, this); + + if (resolver != null) + producer.setEntityResolver (resolver); + + InputSource in; + + in = new InputSource (new StringReader (minimalDocument)); + producer.parse (in); + + disableDeclarations = true; + if (rootName == fakeRootName) + this.rootName = null; + } + + private void resetState () + { + if (!disableReset) { + rootName = null; + contentStack.removeAllElements (); + elements.clear (); + ids.clear (); + + notations.removeAllElements (); + nDeferred.removeAllElements (); + unparsed.removeAllElements (); + uDeferred.removeAllElements (); + } + } + + + private void warning (String description) + throws SAXException + { + ErrorHandler errHandler = getErrorHandler (); + Locator locator = getDocumentLocator (); + SAXParseException err; + + if (errHandler == null) + return; + + if (locator == null) + err = new SAXParseException (description, null, null, -1, -1); + else + err = new SAXParseException (description, locator); + errHandler.warning (err); + } + + // package private (for ChildrenRecognizer) + 
private void error (String description) + throws SAXException + { + ErrorHandler errHandler = getErrorHandler (); + Locator locator = getDocumentLocator (); + SAXParseException err; + + if (locator == null) + err = new SAXParseException (description, null, null, -1, -1); + else + err = new SAXParseException (description, locator); + if (errHandler != null) + errHandler.error (err); + else // else we always treat it as fatal! + throw err; + } + + private void fatalError (String description) + throws SAXException + { + ErrorHandler errHandler = getErrorHandler (); + Locator locator = getDocumentLocator (); + SAXParseException err; + + if (locator != null) + err = new SAXParseException (description, locator); + else + err = new SAXParseException (description, null, null, -1, -1); + if (errHandler != null) + errHandler.fatalError (err); + // we always treat this as fatal, regardless of the handler + throw err; + } + + + private static boolean isExtender (char c) + { + // [88] Extender ::= ... 
+ return c == 0x00b7 || c == 0x02d0 || c == 0x02d1 || c == 0x0387 + || c == 0x0640 || c == 0x0e46 || c == 0x0ec6 || c == 0x3005 + || (c >= 0x3031 && c <= 0x3035) + || (c >= 0x309d && c <= 0x309e) + || (c >= 0x30fc && c <= 0x30fe); + } + + + // use augmented Unicode rules, not full XML rules + private boolean isName (String name, String context, String id) + throws SAXException + { + char buf [] = name.toCharArray (); + boolean pass = true; + + if (!Character.isUnicodeIdentifierStart (buf [0]) + && ":_".indexOf (buf [0]) == -1) + pass = false; + else { + int max = buf.length; + for (int i = 1; pass && i < max; i++) { + char c = buf [i]; + if (!Character.isUnicodeIdentifierPart (c) + && ":-_.".indexOf (c) == -1 + && !isExtender (c)) + pass = false; + } + } + + if (!pass) + error ("In " + context + " for " + id + + ", '" + name + "' is not a name"); + return pass; // true == OK + } + + // use augmented Unicode rules, not full XML rules + private boolean isNmtoken (String nmtoken, String context, String id) + throws SAXException + { + char buf [] = nmtoken.toCharArray (); + boolean pass = true; + int max = buf.length; + + // XXX make this share code with isName + + for (int i = 0; pass && i < max; i++) { + char c = buf [i]; + if (!Character.isUnicodeIdentifierPart (c) + && ":-_.".indexOf (c) == -1 + && !isExtender (c)) + pass = false; + } + + if (!pass) + error ("In " + context + " for " + id + + ", '" + nmtoken + "' is not a name token"); + return pass; // true == OK + } + + private void checkEnumeration (String value, String type, String name) + throws SAXException + { + if (!hasMatch (value, type)) + // VC: Enumeration + error ("Value '" + value + + "' for attribute '" + name + + "' is not permitted: " + type); + } + + // used to test enumerated attributes and mixed content models + // package private + static boolean hasMatch (String value, String orList) + { + int len = value.length (); + int max = orList.length () - len; + + for (int start = 0; + (start = 
orList.indexOf (value, start)) != -1; + start++) { + char c; + + if (start > max) + break; + c = orList.charAt (start - 1); + if (c != '|' && c != '('/*)*/) + continue; + c = orList.charAt (start + len); + if (c != '|' && /*(*/ c != ')') + continue; + return true; + } + return false; + } + + /** + * <b>LexicalHandler</b> Records the declaration of the root + * element, so it can be verified later. + * Passed to the next consumer, unless this one was + * preloaded with a particular DTD. + */ + public void startDTD (String name, String publicId, String systemId) + throws SAXException + { + if (disableDeclarations) + return; + + rootName = name; + super.startDTD (name, publicId, systemId); + } + + /** + * <b>LexicalHandler</b> Verifies that all referenced notations + * and unparsed entities have been declared. + * Passed to the next consumer, unless this one was + * preloaded with a particular DTD. + */ + public void endDTD () + throws SAXException + { + if (disableDeclarations) + return; + + // this is a convenient hook for end-of-dtd checks, but we + // could also trigger it in the first startElement call. + // locator info is more appropriate here though. 
+ + // VC: Notation Declared (NDATA can refer to them before decls, + // as can NOTATION attribute enumerations and defaults) + int length = nDeferred.size (); + for (int i = 0; i < length; i++) { + String notation = (String) nDeferred.elementAt (i); + if (!notations.contains (notation)) { + error ("A declaration referred to notation '" + notation + + "' which was never declared"); + } + } + nDeferred.removeAllElements (); + + // VC: Entity Name (attribute values can refer to them + // before they're declared); VC: Attribute Default Legal + length = uDeferred.size (); + for (int i = 0; i < length; i++) { + String entity = (String) uDeferred.elementAt (i); + if (!unparsed.contains (entity)) { + error ("An attribute default referred to entity '" + entity + + "' which was never declared"); + } + } + uDeferred.removeAllElements (); + super.endDTD (); + } + + + // These are interned, so we can rely on "==" to find the type of + // all attributes except enumerations ... + // "(this|or|that|...)" and "NOTATION (this|or|that|...)" + static final String types [] = { + "CDATA", + "ID", "IDREF", "IDREFS", + "NMTOKEN", "NMTOKENS", + "ENTITY", "ENTITIES" + }; + + + /** + * <b>DeclHandler</b> Records attribute declaration for later use + * in validating document content, and checks validity constraints + * that are applicable to attribute declarations. + * Passed to the next consumer, unless this one was + * preloaded with a particular DTD. 
+ */ + public void attributeDecl ( + String eName, + String aName, + String type, + String mode, + String value + ) throws SAXException + { + if (disableDeclarations) + return; + + ElementInfo info = (ElementInfo) elements.get (eName); + AttributeInfo ainfo = new AttributeInfo (); + boolean checkOne = false; + boolean interned = false; + + // cheap interning of type names and #FIXED, #REQUIRED + // for faster startElement (we can use "==") + for (int i = 0; i < types.length; i++) { + if (types [i].equals (type)) { + type = types [i]; + interned = true; + break; + } + } + if ("#FIXED".equals (mode)) + mode = "#FIXED"; + else if ("#REQUIRED".equals (mode)) + mode = "#REQUIRED"; + + ainfo.type = type; + ainfo.mode = mode; + ainfo.value = value; + + // we might not have seen the content model yet + if (info == null) { + info = new ElementInfo (eName); + elements.put (eName, info); + } + if ("ID" == type) { + checkOne = true; + if (!("#REQUIRED" == mode || "#IMPLIED".equals (mode))) { + // VC: ID Attribute Default + error ("ID attribute '" + aName + + "' must be #IMPLIED or #REQUIRED"); + } + + } else if (!interned && type.startsWith ("NOTATION ")) { + checkOne = true; + + // VC: Notation Attributes (notations must be declared) + StringTokenizer tokens = new StringTokenizer ( + type.substring (10, type.lastIndexOf (')')), + "|"); + while (tokens.hasMoreTokens ()) { + String token = tokens.nextToken (); + if (!notations.contains (token)) + nDeferred.addElement (token); + } + } + if (checkOne) { + for (Enumeration e = info.attributes.keys (); + e.hasMoreElements (); + /* NOP */) { + String name; + AttributeInfo ainfo2; + + name = (String) e.nextElement (); + ainfo2 = (AttributeInfo) info.attributes.get (name); + if (type == ainfo2.type || !interned /* NOTATION */) { + // VC: One ID per Element Type + // VC: One Notation per Element Type + error ("Element '" + eName + + "' already has an attribute of type " + + (interned ? 
"NOTATION" : type) + + " ('" + name + + "') so '" + aName + + "' is a validity error"); + } + } + } + + // VC: Attribute Default Legal + if (value != null) { + + if ("CDATA" == type) { + // event source rejected '<' + + } else if ("NMTOKEN" == type) { + // VC: Name Token (is a nmtoken) + isNmtoken (value, "attribute default", aName); + + } else if ("NMTOKENS" == type) { + // VC: Name Token (is a nmtoken; at least one value) + StringTokenizer tokens = new StringTokenizer (value); + if (!tokens.hasMoreTokens ()) + error ("Default for attribute '" + aName + + "' must have at least one name token."); + else do { + String token = tokens.nextToken (); + isNmtoken (token, "attribute default", aName); + } while (tokens.hasMoreTokens ()); + + } else if ("IDREF" == type || "ENTITY" == type) { + // VC: Entity Name (is a name) + // VC: IDREF (is a name) (is declared) + isName (value, "attribute default", aName); + if ("ENTITY" == type && !unparsed.contains (value)) + uDeferred.addElement (value); + + } else if ("IDREFS" == type || "ENTITIES" == type) { + // VC: Entity Name (is a name; at least one value) + // VC: IDREF (is a name; at least one value) + StringTokenizer names = new StringTokenizer (value); + if (!names.hasMoreTokens ()) + error ("Default for attribute '" + aName + + "' must have at least one name."); + else do { + String name = names.nextToken (); + isName (name, "attribute default", aName); + if ("ENTITIES" == type && !unparsed.contains (name)) + uDeferred.addElement (value); + } while (names.hasMoreTokens ()); + + } else if (type.charAt (0) == '(' /*)*/ ) { + // VC: Enumeration (must match) + checkEnumeration (value, type, aName); + + } else if (!interned && checkOne) { /* NOTATION */ + // VC: Notation attributes (must be names) + isName (value, "attribute default", aName); + + // VC: Notation attributes (must be declared) + if (!notations.contains (value)) + nDeferred.addElement (value); + + // VC: Enumeration (must match) + checkEnumeration (value, type, 
aName); + + } else if ("ID" != type) + throw new RuntimeException ("illegal attribute type: " + type); + } + + if (info.attributes.get (aName) == null) + info.attributes.put (aName, ainfo); + /* + else + warning ("Element '" + eName + + "' already has an attribute named '" + aName + "'"); + */ + + if ("xml:space".equals (aName)) { + if (!("(default|preserve)".equals (type) + || "(preserve|default)".equals (type) + // these next two are arguable; XHTML's DTD doesn't + // deserve errors. After all, it's not like any + // illegal _value_ could pass ... + || "(preserve)".equals (type) + || "(default)".equals (type) + )) + error ( + "xml:space attribute type must be like '(default|preserve)'" + + " not '" + type + "'" + ); + + } + super.attributeDecl (eName, aName, type, mode, value); + } + + /** + * <b>DeclHandler</b> Records the element declaration for later use + * when checking document content, and checks validity constraints that + * apply to element declarations. Passed to the next consumer, unless + * this one was preloaded with a particular DTD. + */ + public void elementDecl (String name, String model) + throws SAXException + { + if (disableDeclarations) + return; + + ElementInfo info = (ElementInfo) elements.get (name); + + // we might have seen an attribute decl already + if (info == null) { + info = new ElementInfo (name); + elements.put (name, info); + } + if (info.model != null) { + // NOTE: not all parsers can report such duplicates. + // VC: Unique Element Type Declaration + error ("Element type '" + name + + "' was already declared."); + } else { + info.model = model; + + // VC: No Duplicate Types (in mixed content models) + if (model.charAt (1) == '#') // (#PCDATA... 
+ info.getRecognizer (this); + } + super.elementDecl (name, model); + } + + /** + * <b>DeclHandler</b> Passed to the next consumer, unless this + * one was preloaded with a particular DTD. + */ + public void internalEntityDecl (String name, String value) + throws SAXException + { + if (!disableDeclarations) + super.internalEntityDecl (name, value); + } + + /** + * <b>DeclHandler</b> Passed to the next consumer, unless this + * one was preloaded with a particular DTD. + */ + public void externalEntityDecl (String name, + String publicId, String systemId) + throws SAXException + { + if (!disableDeclarations) + super.externalEntityDecl (name, publicId, systemId); + } + + + /** + * <b>DTDHandler</b> Records the notation name, for checking + * NOTATION attribute values and declarations of unparsed + * entities. Passed to the next consumer, unless this one was + * preloaded with a particular DTD. + */ + public void notationDecl (String name, String publicId, String systemId) + throws SAXException + { + if (disableDeclarations) + return; + + notations.addElement (name); + super.notationDecl (name, publicId, systemId); + } + + /** + * <b>DTDHandler</b> Records the entity name, for checking + * ENTITY and ENTITIES attribute values; records the notation + * name if it hasn't yet been declared. Passed to the next consumer, + * unless this one was preloaded with a particular DTD. + */ + public void unparsedEntityDecl ( + String name, + String publicId, + String systemId, + String notationName + ) throws SAXException + { + if (disableDeclarations) + return; + + unparsed.addElement (name); + if (!notations.contains (notationName)) + nDeferred.addElement (notationName); + super.unparsedEntityDecl (name, publicId, systemId, notationName); + } + + + /** + * <b>ContentHandler</b> Ensures that state from any previous parse + * has been deleted. + * Passed to the next consumer. 
+ */ + public void startDocument () + throws SAXException + { + resetState (); + super.startDocument (); + } + + + private static boolean isAsciiLetter (char c) + { + return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); + } + + + /** + * <b>ContentHandler</b> Reports a fatal exception. Validating + * XML processors may not skip any entities. + */ + public void skippedEntity (String name) + throws SAXException + { + fatalError ("may not skip entities"); + } + + /* + * SAX2 doesn't expand non-PE refs in attribute defaults... + */ + private String expandDefaultRefs (String s) + throws SAXException + { + if (s.indexOf ('&') < 0) + return s; + +// FIXME: handle &#nn; &#xnn; &name; + String message = "Can't expand refs in attribute default: " + s; + warning (message); + + return s; + } + + /** + * <b>ContentHandler</b> Performs validity checks against element + * (and document) content models, and attribute values. + * Passed to the next consumer. + */ + public void startElement ( + String uri, + String localName, + String qName, + Attributes atts + ) throws SAXException + { + // + // First check content model for the enclosing scope. + // + if (contentStack.isEmpty ()) { + // VC: Root Element Type + if (!qName.equals (rootName)) { + if (rootName == null) + warning ("This document has no DTD, can't be valid"); + else + error ("Root element type '" + qName + + "' was declared to be '" + rootName + "'"); + } + } else { + Recognizer state = (Recognizer) contentStack.peek (); + + if (state != null) { + Recognizer newstate = state.acceptElement (qName); + + if (newstate == null) + error ("Element type '" + qName + + "' in element '" + state.type.name + + "' violates content model " + state.type.model + ); + if (newstate != state) { + contentStack.pop (); + contentStack.push (newstate); + } + } + } + + // + // Then check that this element was declared, and push the + // object used to validate its content model onto our stack. 
+ // + // This is where the recognizer gets created, if needed; if + // it's a "children" (elements) content model, an NDFA is + // created. (One recognizer is used per content type, no + // matter how complex that recognizer is.) + // + ElementInfo info; + + info = (ElementInfo) elements.get (qName); + if (info == null || info.model == null) { + // VC: Element Valid (base clause) + error ("Element type '" + qName + "' was not declared"); + contentStack.push (null); + + // for less diagnostic noise, fake a declaration. + elementDecl (qName, "ANY"); + } else + contentStack.push (info.getRecognizer (this)); + + // + // Then check each attribute present + // + int len; + String aname; + AttributeInfo ainfo; + + if (atts != null) + len = atts.getLength (); + else + len = 0; + + for (int i = 0; i < len; i++) { + aname = atts.getQName (i); + + if (info == null + || (ainfo = (AttributeInfo) info.attributes.get (aname)) + == null) { + // VC: Attribute Value Type + error ("Attribute '" + aname + + "' was not declared for element type " + qName); + continue; + } + + String value = atts.getValue (i); + + // note that "==" for type names and "#FIXED" is correct + // (and fast) since we've interned those literals. + + if ("#FIXED" == ainfo.mode) { + String expanded = expandDefaultRefs (ainfo.value); + + // VC: Fixed Attribute Default + if (!value.equals (expanded)) { + error ("Attribute '" + aname + + "' must match " + expanded + ); + continue; + } + } + + if ("CDATA" == ainfo.type) + continue; + + // + // For all other attribute types, there are various + // rules to follow. 
+ // + + if ("ID" == ainfo.type) { + // VC: ID (must be a name) + if (isName (value, "ID attribute", aname)) { + if (Boolean.TRUE == ids.get (value)) + // VC: ID (appears once) + error ("ID attribute " + aname + + " uses an ID value '" + value + + "' which was already declared."); + else + // any forward refs are no longer problems + ids.put (value, Boolean.TRUE); + } + continue; + } + + if ("IDREF" == ainfo.type) { + // VC: IDREF (value must be a name) + if (isName (value, "IDREF attribute", aname)) { + // VC: IDREF (must match some ID attribute) + if (ids.get (value) == null) + // new -- assume it's a forward ref + ids.put (value, Boolean.FALSE); + } + continue; + } + + if ("IDREFS" == ainfo.type) { + StringTokenizer tokens = new StringTokenizer (value, " "); + + if (!tokens.hasMoreTokens ()) { + // VC: IDREF (one or more values) + error ("IDREFS attribute " + aname + + " must have at least one ID ref"); + } else do { + String id = tokens.nextToken (); + + // VC: IDREF (value must be a name) + if (isName (id, "IDREFS attribute", aname)) { + // VC: IDREF (must match some ID attribute) + if (ids.get (id) == null) + // new -- assume it's a forward ref + ids.put (id, Boolean.FALSE); + } + } while (tokens.hasMoreTokens ()); + continue; + } + + if ("NMTOKEN" == ainfo.type) { + // VC: Name Token (is a name token) + isNmtoken (value, "NMTOKEN attribute", aname); + continue; + } + + if ("NMTOKENS" == ainfo.type) { + StringTokenizer tokens = new StringTokenizer (value, " "); + + if (!tokens.hasMoreTokens ()) { + // VC: Name Token (one or more values) + error ("NMTOKENS attribute " + aname + + " must have at least one name token"); + } else do { + String token = tokens.nextToken (); + + // VC: Name Token (is a name token) + isNmtoken (token, "NMTOKENS attribute", aname); + } while (tokens.hasMoreTokens ()); + continue; + } + + if ("ENTITY" == ainfo.type) { + if (!unparsed.contains (value)) + // VC: Entity Name + error ("Value of attribute '" + aname + + "' refers to 
unparsed entity '" + value + + "' which was not declared."); + continue; + } + + if ("ENTITIES" == ainfo.type) { + StringTokenizer tokens = new StringTokenizer (value, " "); + + if (!tokens.hasMoreTokens ()) { + // VC: Entity Name (one or more values) + error ("ENTITIES attribute " + aname + + " must have at least one name token"); + } else do { + String entity = tokens.nextToken (); + + if (!unparsed.contains (entity)) + // VC: Entity Name + error ("Value of attribute '" + aname + + "' refers to unparsed entity '" + entity + + "' which was not declared."); + } while (tokens.hasMoreTokens ()); + continue; + } + + // + // check for enumerations last; more expensive + // + if (ainfo.type.charAt (0) == '(' /*)*/ + || ainfo.type.startsWith ("NOTATION ") + ) { + // VC: Enumeration (value must be defined) + checkEnumeration (value, ainfo.type, aname); + continue; + } + } + + // + // Last, check that all #REQUIRED attributes were provided + // + if (info != null) { + Hashtable table = info.attributes; + + if (table.size () != 0) { + Enumeration e = table.keys (); + + // XXX table.keys uses the heap, bleech -- slows things + + while (e.hasMoreElements ()) { + aname = (String) e.nextElement (); + ainfo = (AttributeInfo) table.get (aname); + + // "#REQUIRED" mode was interned in attributeDecl + if ("#REQUIRED" == ainfo.mode + && atts.getValue (aname) == null) { + // VC: Required Attribute + error ("Attribute '" + aname + "' must be specified " + + "for element type " + qName); + } + } + } + } + super.startElement (uri, localName, qName, atts); + } + + /** + * <b>ContentHandler</b> Reports a validity error if the element's content + * model does not permit character data. + * Passed to the next consumer. 
+ */ + public void characters (char ch [], int start, int length) + throws SAXException + { + Recognizer state; + + if (contentStack.empty ()) + state = null; + else + state = (Recognizer) contentStack.peek (); + + // NOTE: if this ever supports use with SAX parsers that don't + // report ignorable whitespace as such (only XP?), this class + // needs to morph it into ignorableWhitespace() as needed ... + + if (state != null && !state.acceptCharacters ()) + // VC: Element Valid (clauses three, four -- see Recognizer) + error ("Character content not allowed in element " + + state.type.name); + + super.characters (ch, start, length); + } + + + /** + * <b>ContentHandler</b> Reports a validity error if the element's content + * model does not permit end-of-element yet, or a well formedness error + * if there was no matching startElement call. + * Passed to the next consumer. + */ + public void endElement (String uri, String localName, String qName) + throws SAXException + { + try { + Recognizer state = (Recognizer) contentStack.pop (); + + if (state != null && !state.completed ()) + // VC: Element Valid (clauses two, three, four; see Recognizer) + error ("Premature end for element '" + + state.type.name + + "', content model " + + state.type.model); + + // could insist on match of start element, but that's + // something the input stream must guarantee. + + } catch (EmptyStackException e) { + fatalError ("endElement without startElement: " + qName + + ((uri == null) + ? "" + : ( " { '" + uri + "', " + localName + " }"))); + } + super.endElement (uri, localName, qName); + } + + /** + * <b>ContentHandler</b> Checks whether all ID values that were + * referenced have been declared, and releases all resources. + * Passed to the next consumer. 
+ * + * @see #setDocumentLocator + */ + public void endDocument () + throws SAXException + { + for (Enumeration idNames = ids.keys (); + idNames.hasMoreElements (); + /* NOP */) { + String id = (String) idNames.nextElement (); + + if (Boolean.FALSE == ids.get (id)) { + // VC: IDREF (must match ID) + error ("Undeclared ID value '" + id + + "' was referred to by an IDREF/IDREFS attribute"); + } + } + + resetState (); + super.endDocument (); + } + + + /** Holds per-element declarations */ + static private final class ElementInfo + { + String name; + String model; + + // key = attribute name; value = AttributeInfo + Hashtable attributes = new Hashtable (11); + + ElementInfo (String n) { name = n; } + + private Recognizer recognizer; + + // for validating content models: one per type, shared, + // and constructed only on demand ... so unused elements do + // not need to consume resources. + Recognizer getRecognizer (ValidationConsumer consumer) + throws SAXException + { + if (recognizer == null) { + if ("ANY".equals (model)) + recognizer = ANY; + else if ("EMPTY".equals (model)) + recognizer = new EmptyRecognizer (this); + else if ('#' == model.charAt (1)) + // n.b. 
this constructor does a validity check + recognizer = new MixedRecognizer (this, consumer); + else + recognizer = new ChildrenRecognizer (this, consumer); + } + return recognizer; + } + } + + /** Holds per-attribute declarations */ + static private final class AttributeInfo + { + String type; + String mode; // #REQUIRED, etc (or null) + String value; // or null + } + + + // + // Content model validation + // + + static private final Recognizer ANY = new Recognizer (null); + + + // Base class defines the calls used to validate content, + // and supports the "ANY" content model + static private class Recognizer + { + final ElementInfo type; + + Recognizer (ElementInfo t) { type = t; } + + // return true iff character data is legal here + boolean acceptCharacters () + throws SAXException + // VC: Element Valid (third and fourth clauses) + { return true; } + + // null return = failure + // otherwise, next state (like an FSM) + // prerequisite: tested that name was declared + Recognizer acceptElement (String name) + throws SAXException + // VC: Element Valid (fourth clause) + { return this; } + + // return true iff model is completed, can finish + boolean completed () + throws SAXException + // VC: Element Valid (fourth clause) + { return true; } + + public String toString () + // n.b. "children" is the interesting case! + { return (type == null) ? "ANY" : type.model; } + } + + // "EMPTY" content model -- no characters or elements + private static final class EmptyRecognizer extends Recognizer + { + public EmptyRecognizer (ElementInfo type) + { super (type); } + + // VC: Element Valid (first clause) + boolean acceptCharacters () + { return false; } + + // VC: Element Valid (first clause) + Recognizer acceptElement (String name) + { return null; } + } + + // "Mixed" content model -- ANY, but restricts elements + private static final class MixedRecognizer extends Recognizer + { + private String permitted []; + + // N.B. 
constructor tests for duplicated element names (VC) + public MixedRecognizer (ElementInfo t, ValidationConsumer v) + throws SAXException + { + super (t); + + // (#PCDATA...)* or (#PCDATA) ==> ... or empty + // with the "..." being "|elname|..." + StringTokenizer tokens = new StringTokenizer ( + t.model.substring (8, t.model.lastIndexOf (')')), + "|"); + Vector vec = new Vector (); + + while (tokens.hasMoreTokens ()) { + String token = tokens.nextToken (); + + if (vec.contains (token)) + v.error ("element " + token + + " is repeated in mixed content model: " + + t.model); + else + vec.addElement (token.intern ()); + } + permitted = new String [vec.size ()]; + for (int i = 0; i < permitted.length; i++) + permitted [i] = (String) vec.elementAt (i); + + // in one large machine-derived DTD sample, most of about + // 250 mixed content models were empty, and 25 had ten or + // more entries. 2 had over a hundred elements. Linear + // search isn't obviously wrong. + } + + // VC: Element Valid (third clause) + Recognizer acceptElement (String name) + { + int length = permitted.length; + + // first pass -- optimistic w.r.t. event source interning + // (and document validity) + for (int i = 0; i < length; i++) + if (permitted [i] == name) + return this; + // second pass -- pessimistic w.r.t. event source interning + for (int i = 0; i < length; i++) + if (permitted [i].equals (name)) + return this; + return null; + } + } + + + // recognizer loop flags, see later + private static final int F_LOOPHEAD = 0x01; + private static final int F_LOOPNEXT = 0x02; + + // for debugging -- used to label/count nodes in toString() + private static int nodeCount; + + /** + * "Children" content model -- these are nodes in NDFA state graphs. + * They work in fixed space. Note that these graphs commonly have + * cycles, handling features such as zero-or-more and one-or-more. + * + * <p>It's readonly, so only one copy is ever needed. 
The content model + * stack may have any number of pointers into each graph, when a model + * happens to be needed more than once due to element nesting. Since + * traversing the graph just moves to another node, and never changes + * it, traversals never interfere with each other. + * + * <p>There is an option to report non-deterministic models. These are + * always XML errors, but ones which are not often reported despite the + * fact that they can lead to different validating parsers giving + * different results for the same input. (The XML spec doesn't require + * them to be reported.) + * + * <p><b>FIXME</b> There's currently at least one known bug here, in that + * it's not actually detecting the non-determinism it tries to detect. + * (Of the "optional.xml" test, the once-or-twice-2* tests are all non-D; + * maybe some others.) This may relate to the issue flagged below as + * "should not" happen (but it was), which showed up when patching the + * graph to have one exit node (or more EMPTY nodes). + */ + private static final class ChildrenRecognizer extends Recognizer + implements Cloneable + { + // for reporting non-deterministic content models + // ... a waste of space if we're not reporting those! + // ... along with the 'model' member (in base class) + private ValidationConsumer consumer; + + // for CHOICE nodes -- each component is an arc that + // accepts a different NAME (or is EMPTY indicating + // NDFA termination). + private Recognizer components []; + + // for NAME/SEQUENCE nodes -- accepts that NAME and + // then goes to the next node (CHOICE, NAME, EMPTY). + private String name; + private Recognizer next; + + // loops always point back to a CHOICE node. we mark such choice + // nodes (F_LOOPHEAD) for diagnostics and faster deep cloning. + // We also mark nodes before back pointers (F_LOOPNEXT), to ensure + // termination when we patch sequences and loops. 
+ private int flags; + + + // prevent a needless indirection between 'this' and 'node' + private void copyIn (ChildrenRecognizer node) + { + // model & consumer are already set + components = node.components; + name = node.name; + next = node.next; + flags = node.flags; + } + + // used to construct top level "children" content models, + public ChildrenRecognizer (ElementInfo type, ValidationConsumer vc) + { + this (vc, type); + populate (type.model.toCharArray (), 0); + patchNext (new EmptyRecognizer (type), null); + } + + // used internally; populating is separate + private ChildrenRecognizer (ValidationConsumer vc, ElementInfo type) + { + super (type); + consumer = vc; + } + + + // + // When rewriting some graph nodes we need deep clones in one case; + // mostly shallow clones (what the JVM handles for us) are fine. + // + private ChildrenRecognizer shallowClone () + { + try { + return (ChildrenRecognizer) clone (); + } catch (CloneNotSupportedException e) { + throw new Error ("clone"); + } + } + + private ChildrenRecognizer deepClone () + { + return deepClone (new Hashtable (37)); + } + + private ChildrenRecognizer deepClone (Hashtable table) + { + ChildrenRecognizer retval; + + if ((flags & F_LOOPHEAD) != 0) { + retval = (ChildrenRecognizer) table.get (this); + if (retval != null) + return this; + + retval = shallowClone (); + table.put (this, retval); + } else + retval = shallowClone (); + + if (next != null) { + if (next instanceof ChildrenRecognizer) + retval.next = ((ChildrenRecognizer)next) + .deepClone (table); + else if (!(next instanceof EmptyRecognizer)) + throw new RuntimeException ("deepClone"); + } + + if (components != null) { + retval.components = new Recognizer [components.length]; + for (int i = 0; i < components.length; i++) { + Recognizer temp = components [i]; + + if (temp == null) + retval.components [i] = null; + else if (temp instanceof ChildrenRecognizer) + retval.components [i] = ((ChildrenRecognizer)temp) + .deepClone (table); + else if 
(!(temp instanceof EmptyRecognizer)) + throw new RuntimeException ("deepClone"); + } + } + + return retval; + } + + // connect subgraphs, first to next (sequencing) + private void patchNext (Recognizer theNext, Hashtable table) + { + // backpointers must not be repatched or followed + if ((flags & F_LOOPNEXT) != 0) + return; + + // XXX this table "shouldn't" be needed, right? + // but some choice nodes looped if it isn't there. + if (table != null && table.get (this) != null) + return; + if (table == null) + table = new Hashtable (); + + // NAME/SEQUENCE + if (name != null) { + if (next == null) + next = theNext; + else if (next instanceof ChildrenRecognizer) { + ((ChildrenRecognizer)next).patchNext (theNext, table); + } else if (!(next instanceof EmptyRecognizer)) + throw new RuntimeException ("patchNext"); + return; + } + + // CHOICE + for (int i = 0; i < components.length; i++) { + if (components [i] == null) + components [i] = theNext; + else if (components [i] instanceof ChildrenRecognizer) { + ((ChildrenRecognizer)components [i]) + .patchNext (theNext, table); + } else if (!(components [i] instanceof EmptyRecognizer)) + throw new RuntimeException ("patchNext"); + } + + if (table != null && (flags & F_LOOPHEAD) != 0) + table.put (this, this); + } + + /** + * Parses a 'children' spec (or recursively 'cp') and makes this + * become a regular graph node. + * + * @return index after this particle + */ + private int populate (char parseBuf [], int startPos) + { + int nextPos = startPos + 1; + char c; + + if (nextPos < 0 || nextPos >= parseBuf.length) + throw new IndexOutOfBoundsException (); + + // Grammar of the string is from the XML spec, but + // with whitespace removed by the SAX parser. + + // children ::= (choice | seq) ('?' | '*' | '+')? + // cp ::= (Name | choice | seq) ('?' | '*' | '+')? + // choice ::= '(' cp ('|' cp)+ ')' + // seq ::= '(' cp (',' cp)* ')' + + // interior nodes only + // cp ::= name ... 
+ if (parseBuf [startPos] != '('/*)*/) { + boolean done = false; + do { + switch (c = parseBuf [nextPos]) { + case '?': case '*': case '+': + case '|': case ',': + case /*(*/ ')': + done = true; + continue; + default: + nextPos++; + continue; + } + } while (!done); + name = new String (parseBuf, startPos, nextPos - startPos); + + // interior OR toplevel nodes + // cp ::= choice .. + // cp ::= seq .. + } else { + // collect everything as a separate list, and merge it + // into "this" later if we can (SEQUENCE or singleton) + ChildrenRecognizer first; + + first = new ChildrenRecognizer (consumer, type); + nextPos = first.populate (parseBuf, nextPos); + c = parseBuf [nextPos++]; + + if (c == ',' || c == '|') { + ChildrenRecognizer current = first; + char separator = c; + Vector v = null; + + if (separator == '|') { + v = new Vector (); + v.addElement (first); + } + + do { + ChildrenRecognizer link; + + link = new ChildrenRecognizer (consumer, type); + nextPos = link.populate (parseBuf, nextPos); + + if (separator == ',') { + current.patchNext (link, null); + current = link; + } else + v.addElement (link); + + c = parseBuf [nextPos++]; + } while (c == separator); + + // choice ... collect everything into one array. + if (separator == '|') { + // assert v.size() > 1 + components = new Recognizer [v.size ()]; + for (int i = 0; i < components.length; i++) { + components [i] = (Recognizer) + v.elementAt (i); + } + // assert flags == 0 + + // sequence ... merge into "this" to be smaller. + } else + copyIn (first); + + // treat singletons like one-node sequences. + } else + copyIn (first); + + if (c != /*(*/ ')') + throw new RuntimeException ("corrupt content model"); + } + + // + // Arity is optional, and the root of all fun. We keep the + // FSM state graph simple by only having NAME/SEQUENCE and + // CHOICE nodes (or EMPTY to terminate a model), easily + // evaluated. So we rewrite each node that has arity, using + // those primitives. We create loops here, if needed. 
+ // + if (nextPos < parseBuf.length) { + c = parseBuf [nextPos]; + if (c == '?' || c == '*' || c == '+') { + nextPos++; + + // Rewrite 'zero-or-one' "?" arity to a CHOICE: + // - SEQUENCE (clone, what's next) + // - or, what's next + // Size cost: N --> N + 1 + if (c == '?') { + Recognizer once = shallowClone (); + + components = new Recognizer [2]; + components [0] = once; + // components [1] initted to null + name = null; + next = null; + flags = 0; + + + // Rewrite 'zero-or-more' "*" arity to a CHOICE. + // - LOOP (clone, back to this CHOICE) + // - or, what's next + // Size cost: N --> N + 1 + } else if (c == '*') { + ChildrenRecognizer loop = shallowClone (); + + loop.patchNext (this, null); + loop.flags |= F_LOOPNEXT; + flags = F_LOOPHEAD; + + components = new Recognizer [2]; + components [0] = loop; + // components [1] initted to null + name = null; + next = null; + + + // Rewrite 'one-or-more' "+" arity to a SEQUENCE. + // Basically (a)+ --> ((a),(a)*). + // - this + // - CHOICE + // * LOOP (clone, back to the CHOICE) + // * or, whatever's next + // Size cost: N --> 2N + 1 + } else if (c == '+') { + ChildrenRecognizer loop = deepClone (); + ChildrenRecognizer choice; + + choice = new ChildrenRecognizer (consumer, type); + loop.patchNext (choice, null); + loop.flags |= F_LOOPNEXT; + choice.flags = F_LOOPHEAD; + + choice.components = new Recognizer [2]; + choice.components [0] = loop; + // choice.components [1] initted to null + // choice.name, choice.next initted to null + + patchNext (choice, null); + } + } + } + + return nextPos; + } + + // VC: Element Valid (second clause) + boolean acceptCharacters () + { return false; } + + // VC: Element Valid (second clause) + Recognizer acceptElement (String type) + throws SAXException + { + // NAME/SEQUENCE + if (name != null) { + if (name.equals (type)) + return next; + return null; + } + + // CHOICE ... optionally reporting nondeterminism we + // run across. 
we won't check out every transition + // for nondeterminism; only the ones we follow. + Recognizer retval = null; + + for (int i = 0; i < components.length; i++) { + Recognizer temp = components [i].acceptElement (type); + + if (temp == null) + continue; + else if (!warnNonDeterministic) + return temp; + else if (retval == null) + retval = temp; + else if (retval != temp) + consumer.error ("Content model " + this.type.model + + " is non-deterministic for " + type); + } + return retval; + } + + // VC: Element Valid (second clause) + boolean completed () + throws SAXException + { + // expecting a specific element + if (name != null) + return false; + + // choice, some sequences + for (int i = 0; i < components.length; i++) { + if (components [i].completed ()) + return true; + } + + return false; + } + +/** / + // FOR DEBUGGING ... flattens the graph for printing. + + public String toString () + { + StringBuffer buf = new StringBuffer (); + + // only one set of loop labels can be generated + // at a time... + synchronized (ANY) { + nodeCount = 0; + + toString (buf, new Hashtable ()); + return buf.toString (); + } + } + + private void toString (StringBuffer buf, Hashtable table) + { + // When we visit a node, label and count it. + // Nodes are never visited/counted more than once. + // For small models labels waste space, but if arity + // mappings were used the savings are substantial. + // (Plus, the output can be more readily understood.) 
+ String temp = (String) table.get (this); + + if (temp != null) { + buf.append ('{'); + buf.append (temp); + buf.append ('}'); + return; + } else { + StringBuffer scratch = new StringBuffer (15); + + if ((flags & F_LOOPHEAD) != 0) + scratch.append ("loop"); + else + scratch.append ("node"); + scratch.append ('-'); + scratch.append (++nodeCount); + temp = scratch.toString (); + + table.put (this, temp); + buf.append ('['); + buf.append (temp); + buf.append (']'); + buf.append (':'); + } + + // NAME/SEQUENCE + if (name != null) { + // n.b. some output encodings turn some name chars into '?' + // e.g. with Japanese names and ASCII output + buf.append (name); + if (components != null) // bug! + buf.append ('$'); + if (next == null) + buf.append (",*"); + else if (next instanceof EmptyRecognizer) // patch-to-next + buf.append (",{}"); + else if (next instanceof ChildrenRecognizer) { + buf.append (','); + ((ChildrenRecognizer)next).toString (buf, table); + } else // bug! + buf.append (",+"); + return; + } + + // CHOICE + buf.append ("<"); + for (int i = 0; i < components.length; i++) { + if (i != 0) + buf.append ("|"); + if (components [i] instanceof EmptyRecognizer) { + buf.append ("{}"); + } else if (components [i] == null) { // patch-to-next + buf.append ('*'); + } else { + ChildrenRecognizer r; + + r = (ChildrenRecognizer) components [i]; + r.toString (buf, table); + } + } + buf.append (">"); + } +/**/ + } +} diff --git a/libjava/classpath/gnu/xml/pipeline/WellFormednessFilter.java b/libjava/classpath/gnu/xml/pipeline/WellFormednessFilter.java new file mode 100644 index 000000000..7a3db6593 --- /dev/null +++ b/libjava/classpath/gnu/xml/pipeline/WellFormednessFilter.java @@ -0,0 +1,363 @@ +/* WellFormednessFilter.java -- + Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. 
+ +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING. If not, write to the +Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA +02110-1301 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library. Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module. An independent module is a module which is not derived from +or based on this library. If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so. If you do not wish to do so, delete this +exception statement from your version. 
*/ + +package gnu.xml.pipeline; + +import java.util.EmptyStackException; +import java.util.Stack; + +import org.xml.sax.Attributes; +import org.xml.sax.ErrorHandler; +import org.xml.sax.Locator; +import org.xml.sax.SAXException; +import org.xml.sax.SAXParseException; + +/** + * This filter reports fatal exceptions in the case of event streams that + * are not well formed. The rules currently tested include: <ul> + * + * <li>setDocumentLocator ... may be called only before startDocument + * + * <li>startDocument/endDocument ... must be paired, and all other + * calls (except setDocumentLocator) must be nested within these. + * + * <li>startElement/endElement ... must be correctly paired, and + * may never appear within CDATA sections. + * + * <li>comment ... can't contain "--" + * + * <li>character data ... can't contain "]]>" + * + * <li>whitespace ... can't contain CR + * + * <li>whitespace and character data must be within an element + * + * <li>processing instruction ... can't contain "?>" or CR + * + * <li>startCDATA/endCDATA ... must be correctly paired. + * + * </ul> + * + * <p> Other checks for event stream correctness may be provided in + * the future. For example, insisting that + * entity boundaries nest correctly, + * namespace scopes nest correctly, + * namespace values never contain relative URIs, + * attributes don't have "<" characters; + * and more. + * + * @author David Brownell + */ +public final class WellFormednessFilter extends EventFilter +{ + private boolean startedDoc; + private Stack elementStack = new Stack (); + private boolean startedCDATA; + private String dtdState = "before"; + + + /** + * Swallows all events after performing well formedness checks. + */ + // constructor used by PipelineFactory + public WellFormednessFilter () + { this (null); } + + + /** + * Passes events through to the specified consumer, after first + * processing them. 
+ */ + // constructor used by PipelineFactory + public WellFormednessFilter (EventConsumer consumer) + { + super (consumer); + + setContentHandler (this); + setDTDHandler (this); + + try { + setProperty (LEXICAL_HANDLER, this); + } catch (SAXException e) { /* can't happen */ } + } + + /** + * Resets state as if any preceding event stream was well formed. + * Particularly useful if it ended through some sort of error, + * and the endDocument call wasn't made. + */ + public void reset () + { + startedDoc = false; + startedCDATA = false; + elementStack.removeAllElements (); + } + + + private SAXParseException getException (String message) + { + SAXParseException e; + Locator locator = getDocumentLocator (); + + if (locator == null) + return new SAXParseException (message, null, null, -1, -1); + else + return new SAXParseException (message, locator); + } + + private void fatalError (String message) + throws SAXException + { + SAXParseException e = getException (message); + ErrorHandler handler = getErrorHandler (); + + if (handler != null) + handler.fatalError (e); + throw e; + } + + /** + * Throws an exception when called after startDocument. + * + * @param locator the locator, to be used in error reporting or relative + * URI resolution. 
+ * + * @exception IllegalStateException when called after the document + * has already been started + */ + public void setDocumentLocator (Locator locator) + { + if (startedDoc) + throw new IllegalStateException ( + "setDocumentLocator called after startDocument"); + super.setDocumentLocator (locator); + } + + public void startDocument () throws SAXException + { + if (startedDoc) + fatalError ("startDocument called more than once"); + startedDoc = true; + startedCDATA = false; + elementStack.removeAllElements (); + super.startDocument (); + } + + public void startElement ( + String uri, String localName, + String qName, Attributes atts + ) throws SAXException + { + if (!startedDoc) + fatalError ("callback outside of document?"); + if ("inside".equals (dtdState)) + fatalError ("element inside DTD?"); + else + dtdState = "after"; + if (startedCDATA) + fatalError ("element inside CDATA section"); + if (qName == null || "".equals (qName)) + fatalError ("startElement name missing"); + elementStack.push (qName); + super.startElement (uri, localName, qName, atts); + } + + public void endElement (String uri, String localName, String qName) + throws SAXException + { + if (!startedDoc) + fatalError ("callback outside of document?"); + if (startedCDATA) + fatalError ("element inside CDATA section"); + if (qName == null || "".equals (qName)) + fatalError ("endElement name missing"); + + try { + String top = (String) elementStack.pop (); + + if (!qName.equals (top)) + fatalError ("<" + top + " ...>...</" + qName + ">"); + // XXX could record/test namespace info + } catch (EmptyStackException e) { + fatalError ("endElement without startElement: </" + qName + ">"); + } + super.endElement (uri, localName, qName); + } + + public void endDocument () throws SAXException + { + if (!startedDoc) + fatalError ("callback outside of document?"); + dtdState = "before"; + startedDoc = false; + super.endDocument (); + } + + + public void startDTD (String root, String publicId, String 
systemId) + throws SAXException + { + if (!startedDoc) + fatalError ("callback outside of document?"); + if ("before" != dtdState) + fatalError ("two DTDs?"); + if (!elementStack.empty ()) + fatalError ("DTD must precede root element"); + dtdState = "inside"; + super.startDTD (root, publicId, systemId); + } + + public void notationDecl (String name, String publicId, String systemId) + throws SAXException + { +// FIXME: not all parsers will report startDTD() ... +// we'd rather insist we're "inside". + if ("after" == dtdState) + fatalError ("not inside DTD"); + super.notationDecl (name, publicId, systemId); + } + + public void unparsedEntityDecl (String name, + String publicId, String systemId, String notationName) + throws SAXException + { +// FIXME: not all parsers will report startDTD() ... +// we'd rather insist we're "inside". + if ("after" == dtdState) + fatalError ("not inside DTD"); + super.unparsedEntityDecl (name, publicId, systemId, notationName); + } + + // FIXME: add the four DeclHandler calls too + + public void endDTD () + throws SAXException + { + if (!startedDoc) + fatalError ("callback outside of document?"); + if ("inside" != dtdState) + fatalError ("DTD ends without start?"); + dtdState = "after"; + super.endDTD (); + } + + public void characters (char ch [], int start, int length) + throws SAXException + { + int here = start, end = start + length; + if (elementStack.empty ()) + fatalError ("characters must be in an element"); + while (here < end) { + if (ch [here++] != ']') + continue; + if (here == end) // potential problem ... + continue; + if (ch [here++] != ']') + continue; + if (here == end) // potential problem ... 
+ continue; + if (ch [here++] == '>') + fatalError ("character data can't contain \"]]>\""); + } + super.characters (ch, start, length); + } + + public void ignorableWhitespace (char ch [], int start, int length) + throws SAXException + { + int here = start, end = start + length; + if (elementStack.empty ()) + fatalError ("characters must be in an element"); + while (here < end) { + if (ch [here++] == '\r') + fatalError ("whitespace can't contain CR"); + } + super.ignorableWhitespace (ch, start, length); + } + + public void processingInstruction (String target, String data) + throws SAXException + { + if (data.indexOf ('\r') >= 0) + fatalError ("PIs can't contain CR"); + if (data.indexOf ("?>") >= 0) + fatalError ("PIs can't contain \"?>\""); + } + + public void comment (char ch [], int start, int length) + throws SAXException + { + if (!startedDoc) + fatalError ("callback outside of document?"); + if (startedCDATA) + fatalError ("comments can't nest in CDATA"); + int here = start, end = start + length; + while (here < end) { + if (ch [here] == '\r') + fatalError ("comments can't contain CR"); + if (ch [here++] != '-') + continue; + if (here == end) + fatalError ("comments can't end with \"--->\""); + if (ch [here++] == '-') + fatalError ("comments can't contain \"--\""); + } + super.comment (ch, start, length); + } + + public void startCDATA () + throws SAXException + { + if (!startedDoc) + fatalError ("callback outside of document?"); + if (startedCDATA) + fatalError ("CDATA starts can't nest"); + startedCDATA = true; + super.startCDATA (); + } + + public void endCDATA () + throws SAXException + { + if (!startedDoc) + fatalError ("callback outside of document?"); + if (!startedCDATA) + fatalError ("CDATA end without start?"); + startedCDATA = false; + super.endCDATA (); + } +} diff --git a/libjava/classpath/gnu/xml/pipeline/XIncludeFilter.java b/libjava/classpath/gnu/xml/pipeline/XIncludeFilter.java new file mode 100644 index 000000000..a1445fa0c --- /dev/null +++ 
b/libjava/classpath/gnu/xml/pipeline/XIncludeFilter.java @@ -0,0 +1,579 @@ +/* XIncludeFilter.java -- + Copyright (C) 2001,2002 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING. If not, write to the +Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA +02110-1301 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library. Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module. An independent module is a module which is not derived from +or based on this library. If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so. If you do not wish to do so, delete this +exception statement from your version. 
*/ + +package gnu.xml.pipeline; + +import java.io.IOException; +import java.io.InputStream; +import java.io.InputStreamReader; +import java.net.URL; +import java.net.URLConnection; +import java.util.Hashtable; +import java.util.Stack; +import java.util.Vector; + +import org.xml.sax.Attributes; +import org.xml.sax.ErrorHandler; +import org.xml.sax.InputSource; +import org.xml.sax.Locator; +import org.xml.sax.SAXException; +import org.xml.sax.SAXParseException; +import org.xml.sax.XMLReader; +import org.xml.sax.helpers.XMLReaderFactory; + +import gnu.xml.util.Resolver; + + + +/** + * Filter to process an XPointer-free subset of + * <a href="http://www.w3.org/TR/xinclude">XInclude</a>, supporting its + * use as a kind of replacement for parsed general entities. + * XInclude works much like the <code>#include</code> of C/C++ but + * works for XML documents as well as unparsed text files. + * Restrictions from the 17-Sept-2002 CR draft of XInclude are as follows: + * + * <ul> + * + * <li> URIs must not include fragment identifiers. + * The CR specifies support for XPointer <em>element()</em> fragment IDs, + * which is not currently implemented here. + * + * <li> <em>xi:fallback</em> handling of resource errors is not + * currently supported. + * + * <li> DTDs are not supported in included files, since the SAX DTD events + * must have completely preceded any included file. + * The CR explicitly allows the DTD related portions of the infoset to + * grow as an effect of including XML documents. + * + * <li> <em>xml:base</em> fixup isn't done. + * + * </ul> + * + * <p> XML documents that are included will normally be processed using + * the default SAX namespace rules, meaning that prefix information may + * be discarded. This may be changed with {@link #setSavingPrefixes + * setSavingPrefixes()}. 
<em>You are strongly advised to do this.</em> + * + * <p> Note that XInclude allows highly incompatible implementations, which + * are specialized to handle application-specific infoset extensions. Some + * such implementations can be implemented by subclassing this one, but + * they may only be substituted in applications at "user option". + * + * <p>TBD: "IURI" handling. + * + * @author David Brownell + */ +public class XIncludeFilter extends EventFilter implements Locator +{ + private Hashtable extEntities = new Hashtable (5, 5); + private int ignoreCount; + private Stack uris = new Stack (); + private Locator locator; + private Vector inclusions = new Vector (5, 5); + private boolean savingPrefixes; + + /** + */ + public XIncludeFilter (EventConsumer next) + throws SAXException + { + super (next); + setContentHandler (this); + // DTDHandler callbacks pass straight through + setProperty (DECL_HANDLER, this); + setProperty (LEXICAL_HANDLER, this); + } + + private void fatal (SAXParseException e) throws SAXException + { + ErrorHandler eh; + + eh = getErrorHandler (); + if (eh != null) + eh.fatalError (e); + throw e; + } + + /** + * Passes "this" down the filter chain as a proxy locator. + */ + public void setDocumentLocator (Locator locator) + { + this.locator = locator; + super.setDocumentLocator (this); + } + + /** Used for proxy locator; do not call directly. */ + public String getSystemId () + { return (locator == null) ? null : locator.getSystemId (); } + /** Used for proxy locator; do not call directly. */ + public String getPublicId () + { return (locator == null) ? null : locator.getPublicId (); } + /** Used for proxy locator; do not call directly. */ + public int getLineNumber () + { return (locator == null) ? -1 : locator.getLineNumber (); } + /** Used for proxy locator; do not call directly. */ + public int getColumnNumber () + { return (locator == null) ? 
-1 : locator.getColumnNumber (); } + + /** + * Assigns the flag controlling the setting of the SAX2 + * <em>namespace-prefixes</em> flag. + */ + public void setSavingPrefixes (boolean flag) + { savingPrefixes = flag; } + + /** + * Returns the flag controlling the setting of the SAX2 + * <em>namespace-prefixes</em> flag when parsing included documents. + * The default value is the SAX2 default (false), which discards + * information that can be useful. + */ + public boolean isSavingPrefixes () + { return savingPrefixes; } + + // + // Two mechanisms are interacting here. + // + // - XML Base implies a stack of base URIs, updated both by + // "real entity" boundaries and element boundaries. + // + // - Active "Real Entities" (for document and general entities, + // and by xincluded files) are tracked to prevent circular + // inclusions. + // + private String addMarker (String uri) + throws SAXException + { + if (locator != null && locator.getSystemId () != null) + uri = locator.getSystemId (); + + // guard against InputSource objects without system IDs + if (uri == null) + fatal (new SAXParseException ("Entity URI is unknown", locator)); + + try { + URL url = new URL (uri); + + uri = url.toString (); + if (inclusions.contains (uri)) + fatal (new SAXParseException ( + "XInclude, circular inclusion", locator)); + inclusions.addElement (uri); + uris.push (url); + } catch (IOException e) { + // guard against illegal relative URIs (Xerces) + fatal (new SAXParseException ("parser bug: relative URI", + locator, e)); + } + return uri; + } + + private void pop (String uri) + { + inclusions.removeElement (uri); + uris.pop (); + } + + // + // Document entity boundaries get both treatments. 
+ // + public void startDocument () throws SAXException + { + ignoreCount = 0; + addMarker (null); + super.startDocument (); + } + + public void endDocument () throws SAXException + { + inclusions.setSize (0); + extEntities.clear (); + uris.setSize (0); + super.endDocument (); + } + + // + // External general entity boundaries get both treatments. + // + public void externalEntityDecl (String name, + String publicId, String systemId) + throws SAXException + { + if (name.charAt (0) == '%') + return; + try { + URL url = new URL (locator.getSystemId ()); + systemId = new URL (url, systemId).toString (); + } catch (IOException e) { + // what could we do? + } + extEntities.put (name, systemId); + } + + public void startEntity (String name) + throws SAXException + { + if (ignoreCount != 0) { + ignoreCount++; + return; + } + + String uri = (String) extEntities.get (name); + if (uri != null) + addMarker (uri); + super.startEntity (name); + } + + public void endEntity (String name) + throws SAXException + { + if (ignoreCount != 0) { + if (--ignoreCount != 0) + return; + } + + String uri = (String) extEntities.get (name); + + if (uri != null) + pop (uri); + super.endEntity (name); + } + + // + // element boundaries only affect the base URI stack, + // unless they're XInclude elements. 
+ // + public void + startElement (String uri, String localName, String qName, Attributes atts) + throws SAXException + { + if (ignoreCount != 0) { + ignoreCount++; + return; + } + + URL baseURI = (URL) uris.peek (); + String base; + + base = atts.getValue ("http://www.w3.org/XML/1998/namespace", "base"); + if (base == null) + uris.push (baseURI); + else { + URL url; + + if (base.indexOf ('#') != -1) + fatal (new SAXParseException ( + "xml:base with fragment: " + base, + locator)); + + try { + baseURI = new URL (baseURI, base); + uris.push (baseURI); + } catch (Exception e) { + fatal (new SAXParseException ( + "xml:base with illegal uri: " + base, + locator, e)); + } + } + + if (!"http://www.w3.org/2001/XInclude".equals (uri)) { + super.startElement (uri, localName, qName, atts); + return; + } + + if ("include".equals (localName)) { + String href = atts.getValue ("href"); + String parse = atts.getValue ("parse"); + String encoding = atts.getValue ("encoding"); + URL url = (URL) uris.peek (); + SAXParseException x = null; + + if (href == null) + fatal (new SAXParseException ( + "XInclude missing href", + locator)); + if (href.indexOf ('#') != -1) + fatal (new SAXParseException ( + "XInclude with fragment: " + href, + locator)); + + if (parse == null || "xml".equals (parse)) + x = xinclude (url, href); + else if ("text".equals (parse)) + x = readText (url, href, encoding); + else + fatal (new SAXParseException ( + "unknown XInclude parsing mode: " + parse, + locator)); + if (x == null) { + // strip out all child content + ignoreCount++; + return; + } + + // FIXME the 17-Sept-2002 CR of XInclude says we "must" + // use xi:fallback elements to handle resource errors, + // if they exist. 
+ fatal (x); + + } else if ("fallback".equals (localName)) { + fatal (new SAXParseException ( + "illegal top level XInclude 'fallback' element", + locator)); + } else { + ErrorHandler eh = getErrorHandler (); + + // CR doesn't say this is an error + if (eh != null) + eh.warning (new SAXParseException ( + "unrecognized toplevel XInclude element: " + localName, + locator)); + super.startElement (uri, localName, qName, atts); + } + } + + public void endElement (String uri, String localName, String qName) + throws SAXException + { + if (ignoreCount != 0) { + if (--ignoreCount != 0) + return; + } + + uris.pop (); + if (!("http://www.w3.org/2001/XInclude".equals (uri) + && "include".equals (localName))) + super.endElement (uri, localName, qName); + } + + // + // ignore all content within non-empty xi:include elements + // + public void characters (char ch [], int start, int length) + throws SAXException + { + if (ignoreCount == 0) + super.characters (ch, start, length); + } + + public void processingInstruction (String target, String value) + throws SAXException + { + if (ignoreCount == 0) + super.processingInstruction (target, value); + } + + public void ignorableWhitespace (char ch [], int start, int length) + throws SAXException + { + if (ignoreCount == 0) + super.ignorableWhitespace (ch, start, length); + } + + public void comment (char ch [], int start, int length) + throws SAXException + { + if (ignoreCount == 0) + super.comment (ch, start, length); + } + + public void startCDATA () throws SAXException + { + if (ignoreCount == 0) + super.startCDATA (); + } + + public void endCDATA () throws SAXException + { + if (ignoreCount == 0) + super.endCDATA (); + } + + public void startPrefixMapping (String prefix, String uri) + throws SAXException + { + if (ignoreCount == 0) + super.startPrefixMapping (prefix, uri); + } + + public void endPrefixMapping (String prefix) throws SAXException + { + if (ignoreCount == 0) + super.endPrefixMapping (prefix); + } + + public void 
skippedEntity (String name) throws SAXException + { + if (ignoreCount == 0) + super.skippedEntity (name); + } + + // JDK 1.1 seems to need it to be done this way, sigh + void setLocator (Locator l) { locator = l; } + Locator getLocator () { return locator; } + + + // + // for XIncluded entities, manage the current locator and + // filter out events that would be incorrect to report + // + private class Scrubber extends EventFilter + { + Scrubber (EventFilter f) + throws SAXException + { + // delegation passes to next in chain + super (f); + + // process all content events + super.setContentHandler (this); + super.setProperty (LEXICAL_HANDLER, this); + + // drop all DTD events + super.setDTDHandler (null); + super.setProperty (DECL_HANDLER, null); + } + + // maintain proxy locator + // only one startDocument()/endDocument() pair per event stream + public void setDocumentLocator (Locator l) + { setLocator (l); } + public void startDocument () + { } + public void endDocument () + { } + + private void reject (String message) throws SAXException + { fatal (new SAXParseException (message, getLocator ())); } + + // only the DTD from the "base document" gets reported + public void startDTD (String root, String publicId, String systemId) + throws SAXException + { reject ("XIncluded DTD: " + systemId); } + public void endDTD () + throws SAXException + { reject ("XIncluded DTD"); } + // ... 
so this should never happen + public void skippedEntity (String name) throws SAXException + { reject ("XInclude skipped entity: " + name); } + + // since we rejected DTDs, only builtin entities can be reported + } + + // <xi:include parse='xml' ...> + // relative to the base URI passed + private SAXParseException xinclude (URL url, String href) + throws SAXException + { + XMLReader helper; + Scrubber scrubber; + Locator savedLocator = locator; + + // start with a parser acting just like our input + // modulo DTD-ish stuff (validation flag, entity resolver) + helper = XMLReaderFactory.createXMLReader (); + helper.setErrorHandler (getErrorHandler ()); + helper.setFeature (FEATURE_URI + "namespace-prefixes", true); + + // Set up the proxy locator and event filter. + scrubber = new Scrubber (this); + locator = null; + bind (helper, scrubber); + + // Merge the included document, except its DTD + try { + url = new URL (url, href); + href = url.toString (); + + if (inclusions.contains (href)) + fatal (new SAXParseException ( + "XInclude, circular inclusion", locator)); + + inclusions.addElement (href); + uris.push (url); + helper.parse (new InputSource (href)); + return null; + } catch (java.io.IOException e) { + return new SAXParseException (href, locator, e); + } finally { + pop (href); + locator = savedLocator; + } + } + + // <xi:include parse='text' ...> + // relative to the base URI passed + private SAXParseException readText (URL url, String href, String encoding) + throws SAXException + { + InputStream in = null; + + try { + URLConnection conn; + InputStreamReader reader; + char buf [] = new char [4096]; + int count; + + url = new URL (url, href); + conn = url.openConnection (); + in = conn.getInputStream (); + if (encoding == null) + encoding = Resolver.getEncoding (conn.getContentType ()); + if (encoding == null) { + ErrorHandler eh = getErrorHandler (); + if (eh != null) + eh.warning (new SAXParseException ( + "guessing text encoding for URL: " + url, + 
locator)); + reader = new InputStreamReader (in); + } else + reader = new InputStreamReader (in, encoding); + + while ((count = reader.read (buf, 0, buf.length)) != -1) + super.characters (buf, 0, count); + in.close (); + return null; + } catch (IOException e) { + return new SAXParseException ( + "can't XInclude text", + locator, e); + } + } +} diff --git a/libjava/classpath/gnu/xml/pipeline/XsltFilter.java b/libjava/classpath/gnu/xml/pipeline/XsltFilter.java new file mode 100644 index 000000000..86b6190c5 --- /dev/null +++ b/libjava/classpath/gnu/xml/pipeline/XsltFilter.java @@ -0,0 +1,130 @@ +/* XsltFilter.java -- + Copyright (C) 2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING. If not, write to the +Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA +02110-1301 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library. Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. 
+ +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module. An independent module is a module which is not derived from +or based on this library. If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so. If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import java.io.IOException; + +import javax.xml.transform.TransformerFactory; +import javax.xml.transform.TransformerConfigurationException; +import javax.xml.transform.sax.*; +import javax.xml.transform.stream.StreamSource; + +import org.xml.sax.SAXException; +import org.xml.sax.ext.LexicalHandler; + + +/** + * Packages an XSLT transform as a pipeline component. + * Note that all DTD events (callbacks to DeclHandler and DTDHandler + * interfaces) are discarded, although XSLT transforms may be set up to + * use the LexicalHandler to write DTDs with only an external subset. + * Not every XSLT engine will necessarily be usable with this filter, + * but current versions of + * <a href="http://saxon.sourceforge.net">SAXON</a> and + * <a href="http://xml.apache.org/xalan-j">Xalan</a> should work well. + * + * @see TransformerFactory + * + * @author David Brownell + */ +final public class XsltFilter extends EventFilter +{ + /** + * Creates a filter that performs the specified transform. + * Uses the JAXP 1.1 interfaces to access the default XSLT + * engine configured for in the current execution context, + * and parses the stylesheet without custom EntityResolver + * or ErrorHandler support. 
+ * + * @param stylesheet URI for the stylesheet specifying the + * XSLT transform + * @param next provides the ContentHandler and LexicalHandler + * to receive XSLT output. + * @exception SAXException if the stylesheet can't be parsed + * @exception IOException if there are difficulties + * bootstrapping the XSLT engine, such as it not supporting + * SAX well enough to use this way. + */ + public XsltFilter (String stylesheet, EventConsumer next) + throws SAXException, IOException + { + // First, get a transformer with the stylesheet preloaded + TransformerFactory tf = null; + TransformerHandler th; + + try { + SAXTransformerFactory stf; + + tf = TransformerFactory.newInstance (); + if (!tf.getFeature (SAXTransformerFactory.FEATURE) // sax inputs + || !tf.getFeature (SAXResult.FEATURE) // sax outputs + || !tf.getFeature (StreamSource.FEATURE) // stylesheet + ) + throw new IOException ("XSLT factory (" + + tf.getClass ().getName () + + ") does not support SAX"); + stf = (SAXTransformerFactory) tf; + th = stf.newTransformerHandler (new StreamSource (stylesheet)); + } catch (TransformerConfigurationException e) { + throw new IOException ("XSLT factory (" + + (tf == null + ? "none available" + : tf.getClass ().getName ()) + + ") configuration error, " + + e.getMessage () + ); + } + + // Hook its outputs up to the pipeline ... + SAXResult out = new SAXResult (); + + out.setHandler (next.getContentHandler ()); + try { + LexicalHandler lh; + lh = (LexicalHandler) next.getProperty (LEXICAL_HANDLER); + out.setLexicalHandler (lh); + } catch (Exception e) { + // ignore + } + th.setResult (out); + + // ... and make sure its inputs look like ours. 
+ setContentHandler (th); + setProperty (LEXICAL_HANDLER, th); + } +} diff --git a/libjava/classpath/gnu/xml/pipeline/package.html b/libjava/classpath/gnu/xml/pipeline/package.html new file mode 100644 index 000000000..352f4c87c --- /dev/null +++ b/libjava/classpath/gnu/xml/pipeline/package.html @@ -0,0 +1,255 @@ +<html><head><title> +blah +<!-- +/* + * Copyright (C) 1999-2001 The Free Software Foundation, Inc. + */ +--> +</title></head><body> + +<p>This package exposes a kind of XML processing pipeline, based on sending +SAX events, which can be used as components of application architectures. +Pipelines are used to convey streams of processing events from a producer +to one or more consumers, and to let each consumer control the data seen by +later consumers. + +<p> There is a <a href="PipelineFactory.html">PipelineFactory</a> class which +accepts a syntax describing how to construct some simple pipelines. Strings +describing such pipelines can be used in command line tools (see the +<a href="../util/DoParse.html">DoParse</a> class) +and in other places that it is +useful to let processing be easily reconfigured. Pipelines can of course +be constructed programmatically, providing access to options that the +factory won't. + +<p> Web applications are supported by making it easy for servlets (or +non-Java web application components) to be part of a pipeline. They can +originate XML (or XHTML) data through an <em>InputSource</em> or in +response to XML messages sent from clients using <em>CallFilter</em> +pipeline stages. Such facilities are available using the simple syntax +for pipeline construction. + + +<h2> Programming Models </h2> + +<p> Pipelines should be simple to understand. + +<ul> + <li> XML content, typically entire documents, + is pushed through consumers by producers. 
+ + <li> Pipelines are basically about consuming SAX2 callback events, + where the events encapsulate XML infoset-level data.<ul> + + <li> Pipelines are constructed by taking one or more consumer + stages and combining them to produce a composite consumer. + + <li> A pipeline is presumed to have pending tasks and state from + the beginning of its ContentHandler.startDocument() callback until + it has returned from its ContentHandler.endDocument() callback. + + <li> Pipelines may have multiple output stages ("fan-out") + or multiple input stages ("fan-in") when appropriate. + + <li> Pipelines may be long-lived, but need not be. + + </ul> + + <li> There is flexibility about event production. <ul> + + <li> SAX2 XMLReader objects are producers, which + provide a high level "pull" model: documents (text or DOM) are parsed, + and the parser pushes individual events through the pipeline. + + <li> Events can be pushed directly to event consumer components + by application modules, if they invoke SAX2 callbacks directly. + That is, application modules use the XML Infoset as exposed + through SAX2 event callbacks. + + </ul> + + <li> Multiple producer threads may concurrently access a pipeline, + if they coordinate appropriately. + + <li> Pipeline processing is not the only framework applications + will use. + + </ul> + + +<h3> Producers: XMLReader or Custom </h3> + +<p> Many producers will be SAX2 XMLReader objects, and +will read (pull) data which is then written (pushed) as events. +Typically these will parse XML text (acquired from +<code>org.xml.sax.helpers.XMLReaderFactory</code>) or a DOM tree +(using a <code><a href="../util/DomParser.html">DomParser</a></code>). +These may be bound to an event consumer using a convenience routine, +<em><a href="EventFilter.html">EventFilter</a>.bind()</em>. +Once bound, these producers may be given additional documents to +send through their pipelines. + +<p> In other cases, you will write producers yourself.
For example, some +data structures might know how to write themselves out using one or +more XML models, expressed as sequences of SAX2 event callbacks. +An application module might +itself be a producer, issuing startDocument and endDocument events +and then asking those data structures to write themselves out to a +given EventConsumer, or walking data structures (such as JDBC query +results) and applying its own conversion rules. WAP format XML +(WBXML) can be directly converted to producer output. + +<p> SAX2 introduced an "XMLFilter" interface, which is a kind of XMLReader. +It is most useful in conjunction with its XMLFilterImpl helper class; +see the <em><a href="EventFilter.html">EventFilter</a></em> javadoc +for information contrasting that XMLFilterImpl approach with the +relevant parts of this pipeline framework. Briefly, such XMLFilterImpl +children can be either producers or consumers, and are more limited in +configuration flexibility. In this framework, the focus of filters is +on the EventConsumer side; see the section on +<a href="#fitting">pipe fitting</a> below. + + +<h3> Consume to Standard or Custom Data Representations </h3> + +<p> Many consumers will be used to create standard representations of XML +data. The <a href="TextConsumer.html">TextConsumer</a> takes its events +and writes them as text for a single XML document, +using an internal <a href="../util/XMLWriter.html">XMLWriter</a>. +The <a href="DomConsumer.html">DomConsumer</a> takes its events and uses +them to create and populate a DOM Document. + +<p> In other cases, you will write consumers yourself. For example, +you might use a particular unmarshaling filter to produce objects +that fit your application's requirements, instead of using DOM. +Such consumers work at the level of XML data models, rather than with +specific representations such as XML text or a DOM tree. You could +convert your output directly to WAP format data (WBXML).
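As a rough illustration of a custom consumer, the sketch below turns SAX2 events into an application-level list instead of a DOM tree. It uses only the JDK's SAX classes rather than this package's EventConsumer API; the class name ItemCollector, the "item" element, and its "name" attribute are hypothetical names chosen for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical consumer: gathers the "name" attribute of every <item>
// element into a plain list, rather than building a DOM Document.
class ItemCollector extends DefaultHandler {
    private final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        if ("item".equals(localName)) {
            String name = atts.getValue("name");
            if (name != null)
                names.add(name);
        }
    }

    List<String> names() {
        return names;
    }

    // Parses XML text with the JDK's default SAX parser and returns the
    // collected names; the parser acts as the event producer.
    static List<String> collect(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so localName is populated
        XMLReader reader = factory.newSAXParser().getXMLReader();
        ItemCollector collector = new ItemCollector();
        reader.setContentHandler(collector);
        reader.parse(new InputSource(new StringReader(xml)));
        return collector.names();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            collect("<list><item name='a'/><item name='b'/></list>"));
    }
}
```

In a real pipeline such a handler would presumably be exposed through an EventConsumer so that filters can be placed in front of it; that wiring is omitted here.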
+ + +<h3><a name="fitting">Pipe Fitting</a></h3> + +<p> Pipelines are composite event consumers, with each stage having +the opportunity to transform the data before delivering it to any +subsequent stages. + +<p> The <a href="PipelineFactory.html">PipelineFactory</a> class +provides access to much of this functionality through a simple syntax. +See the table in that class's javadoc describing a number of standard +components. Direct API calls are still needed for many of the most +interesting pipeline configurations, including ones leveraging actual +or logical concurrency. + +<p> Four basic types of pipe fitting are directly supported. These may +be used to construct complex pipeline networks. <ul> + + <li> <a href="TeeConsumer.html">TeeConsumer</a> objects split event + flow so it goes to two different consumers, one before the other. + This is a basic form of event fan-out; you can use this class to + copy events to any number of output pipelines. + + <li> Clients can call remote components through HTTP or HTTPS using + the <a href="CallFilter.html">CallFilter</a> component, and Servlets + can implement such components by extending the + <a href="XmlServlet.html">XmlServlet</a> component. Java is not + required on either end, and transport protocols other than HTTP may + also be used. + + <li> <a href="EventFilter.html">EventFilter</a> objects selectively + provide handling for callbacks, and can pass unhandled ones to a + subsequent stage. They are often subclassed, since much of the + basic filtering machinery is already in place in the base class. + + <li> Applications can merge two event flows by just using the same + consumer in each one. If multiple threads are in use, synchronization + needs to be addressed by the appropriate application level policy.
+ + </ul> + +<p> Note that filters can be as complex as +<a href="XsltFilter.html">XSLT transforms</a> +(where available) on input data, or as simple as removing simple syntax data +such as ignorable whitespace, comments, and CDATA delimiters. +Some simple "built-in" filters are part of this package. + + +<h3> Coding Conventions: Filter and Terminus Stages</h3> + +<p> If you follow these coding conventions, your classes may be used +directly (give the full class name) in pipeline descriptions as understood +by the PipelineFactory. There are four constructors the factory may +try to use; in order of decreasing numbers of parameters, these are: <ul> + + <li> Filters that need a single String setup parameter should have + a public constructor with two parameters: that string, then the + EventConsumer holding the "next" consumer to get events. + + <li> Filters that don't need setup parameters should have a public + constructor that accepts a single EventConsumer holding the "next" + consumer to get events when they are done. + + <li> Terminus stages may have a public constructor taking a single + parameter: the string value of that parameter. + + <li> Terminus stages may have a public no-parameters constructor. + + </ul> + +<p> Of course, classes may support more than one such usage convention; +if they do, they can automatically be used in multiple modes. If you +try to use a terminus class as a filter, and that terminus has a constructor +with the appropriate number of arguments, it is automatically wrapped in +a "tee" filter. + + +<h2> Debugging Tip: "Tee" Joints can Snapshot Data</h2> + +<p> It can sometimes be hard to see what's happening, when something +goes wrong. Easily fixed: just snapshot the data. Then you can find +out where things start to go wrong. + +<p> If you're using pipeline descriptors so that they're easily +administered, just stick a <em>write ( filename )</em> +filter into the pipeline at an appropriate point.
+ +<p> Inside your programs, you can do the same thing directly: perhaps +by saving a Writer (perhaps a StringWriter) in a variable, using that +to create a TextConsumer, and making that the first part of a tee -- +splicing that into your pipeline at a convenient location. + +<p> You can also use a DomConsumer to buffer the data, but remember +that DOM doesn't save all the information that XML provides, so that DOM +snapshots are relatively low fidelity. They also are substantially more +expensive in terms of memory than a StringWriter holding similar data. + +<h2> Debugging Tip: Non-XML Producers</h2> + +<p> Producers in pipelines don't need to start from XML +data structures, such as text in XML syntax (likely coming +from some <em>XMLReader</em> that parses XML) or a +DOM representation (perhaps with a +<a href="../util/DomParser.html">DomParser</a>). + +<p> One common type of event producer will instead make +direct calls to SAX event handlers returned from an +<a href="EventConsumer.html">EventConsumer</a>. +For example, making <em>ContentHandler.startElement</em> +calls and matching <em>ContentHandler.endElement</em> calls. + +<p> Applications making such calls can catch certain +common "syntax errors" by using a +<a href="WellFormednessFilter.html">WellFormednessFilter</a>. +That filter will detect (and report) erroneous input data +such as mismatched document, element, or CDATA start/end calls. +Use such a filter near the head of the pipeline that your +producer feeds, at least while debugging, to help ensure that +you're providing legal XML Infoset data. + +<p> You can also arrange to validate data on the fly. +For DTD validation, you can configure a +<a href="ValidationConsumer.html">ValidationConsumer</a> +to work as a filter, using any DTD you choose. +Other validation schemes can be handled with other +validation filters. + +</body></html> |
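The "Non-XML Producers" tip above can be sketched with plain SAX2 interfaces. The example below is a minimal illustration, not this package's code: RowProducer pushes events for an array of strings straight into a ContentHandler, and NestingChecker is a hypothetical stand-in that performs only the element-nesting part of what a WellFormednessFilter is described as checking. Both class names and the "rows"/"row" vocabulary are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a well-formedness check: it verifies only
// that startElement/endElement calls are properly nested.
class NestingChecker extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        open.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (open.isEmpty() || !open.pop().equals(qName))
            throw new SAXException("mismatched end tag: " + qName);
    }

    @Override
    public void endDocument() throws SAXException {
        if (!open.isEmpty())
            throw new SAXException("unclosed element: " + open.peek());
    }
}

// A producer that starts from application data, not from XML text:
// it emits <rows><row>value</row>...</rows> as direct SAX callbacks.
class RowProducer {
    static void produce(String[] values, ContentHandler out)
            throws SAXException {
        Attributes none = new AttributesImpl();
        out.startDocument();
        out.startElement("", "rows", "rows", none);
        for (String v : values) {
            out.startElement("", "row", "row", none);
            out.characters(v.toCharArray(), 0, v.length());
            out.endElement("", "row", "row");
        }
        out.endElement("", "rows", "rows");
        out.endDocument();
    }

    public static void main(String[] args) throws SAXException {
        produce(new String[] { "a", "b" }, new NestingChecker());
        System.out.println("events were properly nested");
    }
}
```

Placing the checker first in the chain, as the text recommends for the real filter, means a producer bug surfaces as an exception at the offending call rather than as damaged output further downstream.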