Diffstat (limited to 'libjava/classpath/gnu/xml/dom/package.html')
-rw-r--r-- libjava/classpath/gnu/xml/dom/package.html | 273
1 files changed, 273 insertions, 0 deletions
diff --git a/libjava/classpath/gnu/xml/dom/package.html b/libjava/classpath/gnu/xml/dom/package.html
new file mode 100644
index 000000000..fbc864a4d
--- /dev/null
+++ b/libjava/classpath/gnu/xml/dom/package.html
@@ -0,0 +1,273 @@
+<html>
+<body>
+
+<p>
+This is a Free Software DOM Level 3 implementation, supporting these features:
+<ul>
+<li>"XML"</li>
+<li>"Events"</li>
+<li>"MutationEvents"</li>
+<li>"HTMLEvents" (won't generate them though)</li>
+<li>"UIEvents" (also won't generate them)</li>
+<li>"USER-Events" (a conformant extension)</li>
+<li>"Traversal" (optional)</li>
+<li>"XPath"</li>
+<li>"LS" and "LS-Async"</li>
+</ul>
+It is intended to be a reasonable base both for
+experimentation and supporting additional DOM modules as clean layers.
+</p>
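+<p>
+A caller can check which of these feature names an implementation claims
+through DOMImplementation.hasFeature. The sketch below is only an
+illustration: it assumes the default JAXP parser of the running JVM rather
+than this package's own classes, and FeatureProbe is a made-up class name.
+The answers printed depend on whichever DOM is actually installed.
+</p>

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;

// Hypothetical helper, not part of gnu.xml.dom.
class FeatureProbe {
    /** Returns the DOMImplementation behind the default JAXP parser (an assumption). */
    public static DOMImplementation defaultImplementation() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DOMImplementation impl = defaultImplementation();
        String[] features = {
            "XML", "Events", "MutationEvents", "HTMLEvents",
            "UIEvents", "Traversal", "XPath", "LS"
        };
        for (String feature : features) {
            // A null version asks whether any version of the feature is supported.
            System.out.println(feature + ": " + impl.hasFeature(feature, null));
        }
    }
}
```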
+
+<p>
+Note that while DOM does not specify its behavior in the
+face of concurrent access, this implementation does.
+Specifically:
+<ul>
+<li>If only one thread at a time accesses a Document,
+or if several threads cooperate for read-only access,
+then no concurrency conflicts will occur.</li>
+<li>If several threads mutate a given document
+(or send events using it) at the same time,
+there is currently no guarantee that
+they won't interfere with each other.</li>
+</ul>
+</p>
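+<p>
+Since concurrent mutation carries no guarantee, callers that must write
+from several threads need their own locking. A minimal sketch, assuming
+one application-level lock per document; GuardedMutation and its lock are
+illustrative names, and the document comes from the default JAXP parser
+rather than from this package.
+</p>

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of gnu.xml.dom.
class GuardedMutation {
    /** Appends threads * perThread elements to one document, one writer at a time. */
    public static int appendConcurrently(int threads, int perThread) {
        final Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        final Element root = doc.createElement("root");
        doc.appendChild(root);
        final Object lock = new Object();   // every mutation of doc takes this lock

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    synchronized (lock) {
                        root.appendChild(doc.createElement("item"));
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return root.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        System.out.println(appendConcurrently(4, 250));
    }
}
```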
+
+<h3>Design Goals</h3>
+
+<p>
+A number of DOM implementations are available in Java, including
+commercial ones from Sun, IBM, Oracle, and DataChannel as well as
+noncommercial ones from Docuverse, OpenXML, and Silfide. Why have
+another? Some of the goals of this version:
+</p>
+
+<ul>
+<li>Advanced DOM support. This was the first generally available
+implementation of DOM Level 2 in Java, and one of the first Level 3
+and XPath implementations.</li>
+
+<li> Free Software. This one is distributed under the GPL (with
+"library exception") so it can be used with a different class of
+application.</li>
+
+<li>Second implementation syndrome. I can do it simpler this time
+around ... and heck, writing it only takes a bit over a day once you
+know your way around.</li>
+
+<li>Sanity check the then-current Last Call DOM draft. Best to find
+bugs early, when they're relatively fixable. Yes, bugs were found.</li>
+
+<li>Modularity. Most of the implementations mentioned above are part
+of huge packages; take all (including bugs, of which some have far
+too many), or take nothing. I prefer a menu approach, when possible.
+This code is standalone, not beholden to any particular parser or XSL
+or XPath code.</li>
+
+<li>OK, I'm a hacker, I like to write code.</li>
+</ul>
+
+<p>
+This also works with the GNU Compiler for Java (GCJ). GCJ promises
+to be quite the environment for programming Java, both directly and from
+C++ using the new CNI interfaces (which really use C++, unlike JNI). </p>
+
+
+<h3>Open Issues</h3>
+
+<p>At this writing:</p>
+<ul>
+<li>See below for some restrictions on the mutation event
+support ... some events aren't reported (and likely won't be).</li>
+
+<li>More testing and conformance work is needed.</li>
+
+<li>We need an XML Schema validator (actually we need validation in the DOM
+full stop).</li>
+</ul>
+
+<p>
+I ran a profiler a few times and removed some of the performance hotspots,
+but it's not tuned. Reporting mutation events, in particular, is
+rather costly -- it started at about a 40% penalty for appendChild calls;
+I've got it down to around 12%, but it'll be hard to shrink it much further.
+The overall code size is relatively small, though you may want to be rid of
+many of the unused DOM interface classes (HTML, CSS, and so on).
+</p>
+
+
+<h2><a name="features">Features of this Package</a></h2>
+
+<p> Starting with DOM Level 2, you can really see that DOM is constructed
+as a bunch of optional modules around a core of either XML or HTML
+functionality. Different implementations will support different optional
+modules. This implementation provides a set of features that should be
+useful if you're not depending on the HTML functionality (lots of convenience
+functions that mostly don't buy much except API surface area) and user
+interface support. That is, browsers will want more -- but what they
+need should be cleanly layered over what's already here. </p>
+
+<h3> Core Feature Set: "XML" </h3>
+
+<p> This DOM implementation supports the "XML" feature set, which basically
+gets you four things over the bare core (which you're officially not supposed
+to implement except in conjunction with the "XML" or "HTML" feature). In
+order of decreasing utility, those four things are: </p> <ol>
+
+ <li> ProcessingInstruction nodes. These are probably the most
+ valuable thing. Handy little buggers, in part because all the APIs
+ you need to use them are provided, and they're designed to let you
+ escape XML document structure rules in controlled ways.</li>
+
+ <li> CDATASection nodes. These are of limited utility since CDATA
+ is just text that prints funny. Some sorts of applications find them
+ useful, though I encourage folk not to use them. </li>
+
+ <li> DocumentType nodes, and associated Notation and Entity nodes.
+ These appear to be useless. Briefly, these "Type" nodes expose no
+ typing information. They're only really usable to expose some lexical
+ structure that almost every application needs to ignore. (XML editors
+ might like to see them, but they need true typing information much more.)
+ I strongly encourage people not to use these. </li>
+
+ <li> EntityReference nodes can show up. These are actively annoying,
+ since they add an extra level of hierarchy, are the cause of most of
+ the complexity in attribute values, and their contents are immutable.
+ Avoid these.</li>
+
+ </ol>
+
+<h3> Optional Feature Sets: "Events", and friends </h3>
+
+<p> Events may be one of the more interesting new features in Level 2.
+This package provides the core feature set and exposes mutation events.
+No gooey events though; if you want that, write a layered implementation! </p>
+
+<p> Three mutation events aren't currently generated:</p> <ul>
+
+ <li> <em>DOMSubtreeModified</em> is poorly specified. Think of this
+ as generating one such event around the time of finalization, which
+ is a fully conformant implementation. This implementation is exactly
+ as useful as that one. </li>
+
+ <li> <em>DOMNodeRemovedFromDocument</em> and
+ <em>DOMNodeInsertedIntoDocument</em> are supposed to get sent to
+ every node in a subtree that gets removed or inserted (respectively).
+ This can be <em>extremely costly</em>, and the removal and insertion
+ processing is already significantly slower due to event reporting.
+ It's much easier, and more efficient, to have a listener higher in the
+ tree watch removal and insertion events through the bubbling or capture
+ mechanisms, than it is to watch for these two events.</li>
+
+ </ul>
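+<p>
+The listener-higher-in-the-tree approach recommended above can be sketched
+as follows. This is an illustration under assumptions: SubtreeWatcher is a
+made-up name, the document comes from the default JAXP parser, and the
+node classes of that parser are assumed to implement EventTarget.
+</p>

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.events.EventListener;
import org.w3c.dom.events.EventTarget;

// Hypothetical helper, not part of gnu.xml.dom.
class SubtreeWatcher {
    /** Registers one listener on the root and records what bubbles up to it. */
    public static List<String> watch() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        Element section = doc.createElement("section");
        Element para = doc.createElement("para");
        doc.appendChild(root);
        root.appendChild(section);

        List<String> log = new ArrayList<>();
        EventListener listener = evt ->
                log.add(evt.getType() + ":" + ((Node) evt.getTarget()).getNodeName());

        // useCapture=false: the listener runs during the bubbling phase.
        EventTarget target = (EventTarget) root;
        target.addEventListener("DOMNodeInserted", listener, false);
        target.addEventListener("DOMNodeRemoved", listener, false);

        // Both changes happen two levels below the listener.
        section.appendChild(para);
        section.removeChild(para);
        return log;
    }

    public static void main(String[] args) {
        for (String entry : watch()) {
            System.out.println(entry);
        }
    }
}
```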
+
+<p> In addition, certain kinds of attribute modification aren't reported.
+A fix is known, but it couldn't report the previous value of the attribute.
+More work could fix all of this (as well as reduce the generally high cost
+of childful attributes), but that's not been done yet. </p>
+
+<p> Also, note that it is a <em>Bad Thing&trade;</em> to have the listener
+for a mutation event change the ancestry for the target of that event.
+Or to prevent mutation events from bubbling to where they're needed.
+Just don't do those, OK? </p>
+
+<p> As an experimental feature (named "USER-Events"), you can provide
+your own "user" events. Just name them anything starting with "USER-"
+and you're set. Dispatch them through bubbling, capturing, or
+whatever takes your fancy. One important thing you can't currently do is
+pass any data (like an object) with those events. Maybe later there
+will be a "UserEvent" interface letting you get some substantial use
+out of this mechanism even if you're not "inside" of a DOM package.</p>
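+<p>
+A rough sketch of sending such a user event with the standard DOM Level 2
+Events calls (createEvent, initEvent, dispatchEvent). It assumes the
+document implements DocumentEvent and its nodes implement EventTarget;
+UserEventDemo and the event name "USER-refresh" are made up for the
+example, and the default JAXP parser stands in for this package.
+</p>

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.events.DocumentEvent;
import org.w3c.dom.events.Event;
import org.w3c.dom.events.EventTarget;

// Hypothetical helper, not part of gnu.xml.dom.
class UserEventDemo {
    /** Dispatches an event of the given type on a child; returns the type seen at the root. */
    public static String dispatchUserEvent(String type) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        Element child = doc.createElement("child");
        doc.appendChild(root);
        root.appendChild(child);

        final String[] seen = { null };
        // Listen on the root during the bubbling phase.
        ((EventTarget) root).addEventListener(type, evt -> seen[0] = evt.getType(), false);

        // "Events" requests a plain Event object with no extra payload.
        Event event = ((DocumentEvent) doc).createEvent("Events");
        event.initEvent(type, true, false);   // bubbles, not cancelable
        ((EventTarget) child).dispatchEvent(event);
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(dispatchUserEvent("USER-refresh"));
    }
}
```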
+
+<p> You can create and send HTML events. Ditto UIEvents. Since DOM
+doesn't require a UI, it's the UI's job to send them; perhaps that's
+part of your application. </p>
+
+<p><em>This package may be built without the ability to report mutation
+events, gaining a significant speedup in DOM construction time. However,
+if that is done then certain other features -- notably node iterators
+and getElementsByTagName -- will not be available.</em></p>
+
+
+<h3> Optional Feature: "Traversal" </h3>
+
+<p> Each DOM node has all you need to walk to everything connected
+to that node. Lightweight, efficient utilities are easily layered on
+top of just the core APIs. </p>
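+<p>
+As an illustration of such a utility, here is a depth-first walk that uses
+only getFirstChild and getNextSibling from the core Node interface. CoreWalk
+is a made-up name, and the demo document is built with the default JAXP
+parser rather than with this package.
+</p>

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of gnu.xml.dom.
class CoreWalk {
    /** Collects element names in document order using only core Node links. */
    public static void collectElementNames(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add(node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectElementNames(child, out);
        }
    }

    /** Builds a(b(c), d) and returns the names found by the walk. */
    public static List<String> demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element a = doc.createElement("a");
        Element b = doc.createElement("b");
        Element c = doc.createElement("c");
        Element d = doc.createElement("d");
        doc.appendChild(a);
        a.appendChild(b);
        b.appendChild(c);
        a.appendChild(d);

        List<String> names = new ArrayList<>();
        collectElementNames(doc, names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```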
+
+<p> Traversal APIs are an optional part of DOM Level 2, providing
+a not-so-lightweight way to walk over DOM trees, if your application
+didn't already have such utilities for use with data represented via
+DOM. Implementing this helped debug the (optional) event and mutation
+event subsystems, so it's provided here. </p>
+
+<p> At this writing, the "TreeWalker" interface isn't implemented. </p>
+
+
+
+<h2><a name='avoid'>DOM Functionality to Avoid</a></h2>
+
+<p> For what appears to be a combination of historical and "committee
+logic" reasons, DOM has a number of <em>features which I strongly advise
+you to avoid using</em> in your library and application code. These
+include the following types of DOM nodes; see the documentation for the
+implementation class for more information: <ul>
+
+ <li> CDATASection
+ (<a href='DomCDATA.html'>DomCDATA</a> class)
+ ... use normal Text nodes instead, so you don't have to make
+ every algorithm recognize multiple types of character data
+
+ <li> DocumentType
+ (<a href='DomDoctype.html'>DomDoctype</a> class)
+ ... if this held actual typing information, it might be useful
+
+ <li> Entity
+ (<a href='DomEntity.html'>DomEntity</a> class)
+ ... neither parsed nor unparsed entities work well in DOM; it
+ won't even tell you which attributes identify unparsed entities
+
+ <li> EntityReference
+ (<a href='DomEntityReference.html'>DomEntityReference</a> class)
+ ... permitted implementation variances are extreme, all children
+ are readonly, and these can interact poorly with namespaces
+
+ <li> Notation
+ (<a href='DomNotation.html'>DomNotation</a> class)
+ ... only really usable with unparsed entities (which aren't well
+ supported; see above) or perhaps with PIs after the DTD, not with
+ NOTATION attributes
+
+ </ul>
+
+<p> If you really need to use unparsed entities or notations, use SAX;
+it offers better support for all DTD-related functionality.
+It also exposes actual
+document typing information (such as element content models).</p>
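+<p>
+For comparison, SAX reports notation and unparsed entity declarations
+through its DTDHandler callbacks. A minimal sketch, assuming the default
+JAXP SAX parser; DtdDeclarations and the sample document are invented for
+the example.
+</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of gnu.xml.dom.
class DtdDeclarations {
    /** Parses xml and lists the notation and unparsed entity declarations reported. */
    public static List<String> declarations(String xml) {
        final List<String> found = new ArrayList<>();
        // DefaultHandler implements DTDHandler; override its two callbacks.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void notationDecl(String name, String publicId, String systemId) {
                found.add("notation " + name);
            }

            @Override
            public void unparsedEntityDecl(String name, String publicId,
                                           String systemId, String notationName) {
                found.add("unparsed-entity " + name + " -> " + notationName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return found;
    }

    /** Sample document with one notation and one unparsed entity in its internal subset. */
    public static String sample() {
        return "<!DOCTYPE doc [\n"
             + "  <!ELEMENT doc EMPTY>\n"
             + "  <!ATTLIST doc src ENTITY #IMPLIED>\n"
             + "  <!NOTATION gif SYSTEM \"image/gif\">\n"
             + "  <!ENTITY logo SYSTEM \"logo.gif\" NDATA gif>\n"
             + "]>\n"
             + "<doc src=\"logo\"/>";
    }

    public static void main(String[] args) {
        for (String line : declarations(sample())) {
            System.out.println(line);
        }
    }
}
```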
+
+<p> Also, when accessing attribute values, use methods that provide their
+values as single strings, rather than those which expose value substructure
+(Text and EntityReference nodes). (See the <a href='DomAttr.html'>DomAttr</a>
+documentation for more information.) </p>
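+<p>
+For instance, Element.getAttribute returns the whole value as one string,
+with any entity references in it already expanded, so the caller never
+touches the Text and EntityReference children. The sketch below assumes
+the default JAXP parser; AttrValues and the sample document are made up.
+</p>

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of gnu.xml.dom.
class AttrValues {
    /** Returns the named attribute of the document element as a single string. */
    public static String attributeAsString(String xml, String attrName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // One call, one string; no walk over the Attr node's children.
            return doc.getDocumentElement().getAttribute(attrName);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sample document whose attribute value contains an entity reference. */
    public static String sample() {
        return "<!DOCTYPE doc [<!ENTITY co \"Example Corp\">]>"
             + "<doc owner=\"by &co;\"/>";
    }

    public static void main(String[] args) {
        System.out.println(attributeAsString(sample(), "owner"));
    }
}
```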
+
+<p> Note that many of these features were provided as partial support for
+editor functionality (including the incomplete DTD access). Full editor
+functionality requires access to potentially malformed lexical structure,
+at the level of unparsed tokens and below. Access at such levels is so
+complex that using it in non-editor applications sacrifices all the
+benefits of XML; editor applications need extremely specialized APIs. </p>
+
+<p> (This isn't a slam against DTDs, note; only against the broken support
+for them in DOM. Even despite inclusion of some dubious SGML legacy features
+such as notations and unparsed entities,
+and the ongoing proliferation of alternative schema and validation tools,
+DTDs are still the most widely adopted tool
+to constrain XML document structure.
+Alternative schemes generally focus on data transfer style
+applications; open document architectures comparable to
+DocBook 4.0 don't yet exist in the schema world.
+Feel free to use DTDs; just don't expect DOM to help you.) </p>
+
+</body>
+</html>
+