Re: Editing XML



Hans-Peter Diettrich wrote:
Maarten Wiltink wrote:
Does here "@" refer to (all?) attributes, "*" to all child nodes, and
text() to the text? The members of the elements (or nodes?) are still
somewhat unclear to me :-(

* is short for all child elements, @* for all attributes,
text() for all child text nodes, node() for all child nodes. A
few other node types such as processing instructions and comments
are dutifully ignored in the above code.
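To make the four location steps concrete, here is a minimal sketch that emulates what each one selects on a DOM tree. It uses Python's `xml.dom.minidom` as a stand-in for the Delphi/MSXML classes discussed in this thread (an assumption; the helper names and the sample element are made up for illustration and are not XPath or Delphi APIs). It does not evaluate XPath. It only filters an element's children the way `*`, `@*`, `text()` and `node()` would on the child and attribute axes.

```python
from xml.dom import minidom
from xml.dom import Node

# Hypothetical sample: attributes, text, a child element, a comment and a PI.
SAMPLE = '<item id="1" lang="en">text <b>bold</b><!-- note --><?pi data?> tail</item>'

def child_elements(el):
    """Roughly what the XPath step '*' selects: element children only."""
    return [n for n in el.childNodes if n.nodeType == Node.ELEMENT_NODE]

def attributes(el):
    """Roughly what '@*' selects: the attributes of the context element."""
    return [el.attributes.item(i) for i in range(el.attributes.length)]

def text_children(el):
    """Roughly what 'text()' selects: text node children."""
    return [n for n in el.childNodes if n.nodeType == Node.TEXT_NODE]

def all_children(el):
    """Roughly what 'node()' selects: every child node (attributes are not children)."""
    return list(el.childNodes)

root = minidom.parseString(SAMPLE).documentElement
print([e.tagName for e in child_elements(root)])   # ['b']
print(sorted(a.name for a in attributes(root)))     # ['id', 'lang']
print([t.data for t in text_children(root)])        # ['text ', ' tail']
print(len(all_children(root)))                      # 5: text, b, comment, PI, text
```

Note that `node()` also picks up the comment and the processing instruction, while `*` and `text()` skip them.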

I wonder how text and child nodes are distinguished in the XML text. Or can an element have either text or child nodes, but not both at the same time?

A text node is just another kind of child node. The "*" location path selects element children, and the "text()" location path selects text children. The "@*" location path selects all attributes of the context node, which must be an element node.
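An element can hold text and element children at the same time ("mixed content"). The sketch below shows this with Python's `xml.dom.minidom` (assumed here as a stand-in for whatever DOM Delphi wraps). The paragraph markup is a made-up example. Walking the children shows text and element nodes interleaved in document order, distinguished only by their node type.

```python
from xml.dom import minidom
from xml.dom import Node

# Made-up mixed-content example: text, an element, then more text.
doc = minidom.parseString("<p>Hello, <b>big</b> world!</p>")
p = doc.documentElement

for child in p.childNodes:
    if child.nodeType == Node.TEXT_NODE:
        print("text:   ", repr(child.data))
    elif child.nodeType == Node.ELEMENT_NODE:
        print("element:", child.tagName)
# text:    'Hello, '
# element: b
# text:    ' world!'
```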

Consider this XML document:

<?xml version="1.0"?>
<tag>Hello, world!</tag>

There are three nodes there. The first is the XML declaration, which looks like a processing instruction and which you'll typically ignore in XSLT. The next is the element node named "tag", and the last is the text node. (There could really be more nodes -- you're allowed to have two consecutive text nodes -- but they usually get combined into a single node at some point.)
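One way to see what a given parser makes of this example is to dump the tree. This sketch uses Python's `xml.dom.minidom`, an assumption standing in for the Delphi/MSXML classes. Parsers differ here: minidom does not report the `<?xml?>` declaration as a child node at all, so it shows the document node, the element and the text node. MSXML may list the declaration as an extra node. The `outline` helper is hypothetical.

```python
from xml.dom import minidom
from xml.dom import Node

# Rob's example document.
SOURCE = '<?xml version="1.0"?><tag>Hello, world!</tag>'

KIND = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.COMMENT_NODE: "comment",
}

def outline(node, depth=0):
    """Return one indented line per node: its kind plus its name or text."""
    label = node.data if node.nodeType == Node.TEXT_NODE else node.nodeName
    lines = ["  " * depth + f"{KIND.get(node.nodeType, 'other')}: {label!r}"]
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(minidom.parseString(SOURCE))))
# document: '#document'
#   element: 'tag'
#     text: 'Hello, world!'
```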

You may want to read the XSLT and XPath specs from W3C. It's a long
read, but well worth it - at least I thought so. Be warned that I
also thought Christopher Tolkien's editions of his father's mislaid
writing worth it. There is a certain similarity; neither is to
everyone's taste.

Right, I just wasn't sure which exact syntax is used by which tool. I suspect that MS does not respect "foreign" standards much, as usual...

XSLT is XSLT. There's not much choice of another syntax. If Microsoft used anything else, it wouldn't be XSLT anymore.

In the XPath spec, I find section 2 and section 2.5 to be the most useful -- they're the ones I keep going back to for reference. If you're doing more advanced stuff, section 4 will also be useful.

The following link is also very helpful.

http://www.dpawson.co.uk/xsl/sect2/sect21.html

Also I didn't understand the difference between XMLDoc.DocumentElement,
.Node and .ChildNodes. Is .Node the very root node of the document,
whose children are the .ChildNodes, and .DocumentElement is one of these
children? There also seems to be a flat list of all elements or nodes
in a document?

No, .DocumentElement is the document's root node. .Node is probably
a more volatile 'current node' thing, more useful from other node types
than documents.

There must exist a difference. When I start walking through the children of .Node, I get 3 xml nodes, but only 2 when starting with .DocumentElement. The last of these nodes seems to be the /xml tag, so I have no idea yet what the Delphi classes or the MS interface understand as "node", "element" and so on :-(

I still don't understand why you're getting </xml> at all. It's not a node. It's the closing tag, which defines the end of the same element node that <xml> began.
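A quick way to confirm that a closing tag never becomes a node of its own is to walk every node in document order and list the names. The sketch below uses Python's `xml.dom.minidom` (assumed as a stand-in for the DOM used in Delphi); the sample document and the `walk` helper are hypothetical. Each element shows up exactly once, however many tags were needed to write it.

```python
from xml.dom import minidom

# Hypothetical test file with explicit opening and closing tags.
SOURCE = '<?xml version="1.0"?><root><item>one</item><item>two</item></root>'

def walk(node):
    """Yield every node in document order (depth-first, parent before children)."""
    yield node
    for child in node.childNodes:
        yield from walk(child)

names = [n.nodeName for n in walk(minidom.parseString(SOURCE))]
print(names)
# ['#document', 'root', 'item', '#text', 'item', '#text']
```

There is no entry for `</root>` or `</item>`: the closing tag only marks where the element node that the opening tag began ends.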

In my example above, the element named "tag" is the document element. It is also the root element. There is always exactly one root element in a valid XML document.
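The one-root-element rule is part of basic well-formedness, so a conforming parser rejects documents that break it even before any validation. This is a minimal sketch with Python's `xml.dom.minidom`, which raises `xml.parsers.expat.ExpatError` for malformed input; the `parses` helper and the test strings are made up for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def parses(source):
    """Return True if minidom accepts the text as a well-formed document."""
    try:
        minidom.parseString(source)
        return True
    except ExpatError:
        return False

print(parses("<tag>Hello, world!</tag>"))  # True: exactly one root element
print(parses("<a/><b/>"))                  # False: two root elements
print(parses("Hello, world!"))             # False: no root element at all
```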

There is also a document root, which in XPath is the parent of the root element. In the XML spec, I think it's the same as the "document entity." It serves as the parent for not only the root element but also any processing instructions and doctype declarations that appear outside the root element.
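The document root versus root element distinction can be checked directly in a DOM. In this sketch, Python's `xml.dom.minidom` stands in for the Delphi classes (an assumption). As far as I understand the Delphi documentation, `TXMLDocument.Node` corresponds roughly to the document node here and `DocumentElement` to `doc.documentElement`. The stylesheet name `style.xsl` and the comment are hypothetical. The document node is the parent of the root element and of any processing instructions or comments outside it. The `<?xml?>` declaration itself is not listed by minidom.

```python
from xml.dom import minidom
from xml.dom import Node

SOURCE = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    '<!-- a comment outside the root element -->'
    '<tag>Hello, world!</tag>'
)

doc = minidom.parseString(SOURCE)       # the document (root) node
root_element = doc.documentElement      # the single root element

print(doc.nodeType == Node.DOCUMENT_NODE)  # True
print(root_element.tagName)                # tag
print(root_element.parentNode is doc)      # True
print([n.nodeName for n in doc.childNodes])
# ['xml-stylesheet', '#comment', 'tag']
```

Walking the children of the document node therefore yields more than walking the children of the root element, which may explain different node counts between `.Node` and `.DocumentElement`.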

But note that the name "xml" is reserved. You're not supposed to use that as a name for an element or attribute unless some W3C standard defines it. No name is allowed to start with those characters. Try using a different tag name in your test file so there's no possible confusion between your tag and the <?xml?> processing instruction at the top of the file. (I suspect that's the third node you're getting when you start with Node instead of DocumentElement.)
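Since a typical parser will not warn about a reserved name, a tiny check can catch it in your own files. This is a minimal sketch; `is_reserved_name` is a hypothetical helper, not part of any library. It encodes the XML 1.0 rule that names beginning with "xml" in any mix of upper and lower case are reserved. The last lines show that Python's `xml.dom.minidom` happily accepts an element named `xml`, so nothing stops the kind of confusion described above.

```python
from xml.dom import minidom

def is_reserved_name(name):
    """True if the name starts with 'xml' in any mix of upper/lower case.

    XML 1.0 reserves such names for current and future W3C specifications.
    """
    return name[:3].lower() == "xml"

for candidate in ["xml", "XmlData", "xmlns", "config", "myxml"]:
    print(f"{candidate}: {'reserved' if is_reserved_name(candidate) else 'ok'}")

# The parser does not enforce the reservation, so nothing warns you:
doc = minidom.parseString('<?xml version="1.0"?><xml><tag/></xml>')
print(doc.documentElement.tagName)  # xml
```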

--
Rob