Re: How can I ensure that I always have a list?
- From: "Michael A. Cleverly" <michael@xxxxxxxxxxxx>
- Date: Sat, 25 Nov 2006 18:46:55 -0700
On Sat, 25 Nov 2006, comp.lang.tcl wrote:
> Gerald W. Lester wrote:
>> The following converts your XML to a list, I also posted it on another
>> thread. It took 5 minutes to write:
>>
>> ##
>> ## A node will be the following list of name value pairs:
>> ## NAME nodeName
>> ## TEXT text
>> ## ATTRIBUTES attributeNameValueList
>> ## CHILDREN childNodeList
>> ##
>> package require tdom
>
> You already lost me. "package require tdom" = HUH?
tdom is an XML parsing extension for Tcl. Its home is at
http://www.tdom.org and there are lots of pages on the Tcl'ers wiki that
use it when dealing with XML data.
Compared to TclXML it is much simpler to build and install (in my
experience). It is definitely a worthy investment time-wise to learn how
to use it.
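For a rough idea of what that looks like, here is a minimal sketch that uses tdom to collect the attributes of each element. The sample XML and the element name "entry" are made up for illustration, and it assumes tdom is already installed:

```tcl
package require tdom

# Hypothetical sample document, made up for this illustration
set xml {<trivia><entry id="1101" answer="Believer"/><entry id="1102" answer="Saviour Machine"/></trivia>}

# Parse the string into a DOM tree and get its root element
set doc [dom parse $xml]
set root [$doc documentElement]

# Visit every <entry> element and collect its attributes as a
# flat name/value list
set all_attributes [list]
foreach node [$root selectNodes //entry] {
    set attrs [list]
    foreach name [$node attributes] {
        lappend attrs $name [$node getAttribute $name]
    }
    lappend all_attributes $attrs
}
puts $all_attributes

# Free the DOM tree when done
$doc delete
```

The parser takes care of quoting, comments, CDATA, and entities, which is most of what hand-rolled code has to get right.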
However, I understand from other messages in this thread that you have ADD
and just want to get the job done as quickly as possible. I gather that
suggestions that involve using code outside of vanilla Tcl do not qualify
as sufficiently quick due to the time it would take to download and
install and understand these packages.
Here is some plain-vanilla Tcl code that should meet your parsing needs.
I've tried to comment it heavily, including the regular expressions it
uses (expanded regular expression syntax is our friend in this regard).
That said, I echo the advice that so many others in this thread have given
you: for dealing with XML data it behooves you to use a real XML parser.
But if this works to solve your immediate problems then great! Once you
have this off your plate perhaps you can come back and we can help you
understand XML parsing itself with more leisure...
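If expanded syntax is new to you, here is a small sketch of the idea before the real script. The two patterns and the sample string are made up for illustration; the point is only that (?x) lets whitespace and # comments sit inside a pattern without changing what it matches:

```tcl
# Compact form: terse and hard to annotate
set compact {^(\S+)\s*=\s*(.*)$}

# Expanded form: (?x) makes the regexp engine ignore whitespace and
# treat "#" through end-of-line as a comment
set expanded {(?x)^
    (\S+)      # name
    \s*=\s*    # equals sign, optionally surrounded by whitespace
    (.*)       # everything else
$}

# Hypothetical input, chosen only for this demonstration
set sample {id = "1101"}

regexp -- $compact  $sample => k1 v1
regexp -- $expanded $sample => k2 v2

# Both patterns capture the same pieces
puts [list $k1 $v1]
puts [list $k2 $v2]
```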
#!/bin/sh
#\
exec tclsh "$0" ${1+"$@"}
# Tcl 8.0 and earlier did not support expanded regexp syntax which we
# use in the code below
package require Tcl 8.1
proc get-all-xml-attributes {xml} {
    set all_attributes [list]

    # Match one tag. Note: this does not skip tags that are commented out,
    # nor literal text that looks like a tag within a <![CDATA[ ... ]]> section
    set RE(tag) {<([^<>]*)>}

    set RE(name-attribs) {(?x)^ # This is an expanded regexp w/comments
        (\S+)     # non-whitespace chars (i.e., the tag name)
        \s*       # maybe followed by some white-space
        (\S.*)?   # everything else (i.e., the attributes)
    $}

    set RE(next-attrib) {(?x)^ # This is another expanded regexp w/comments
        (\S+)       # attribute name
        \s*=\s*     # equals
        (["'"].+)   # everything else (attr val + other attr(s))
    $}

    # One version for each of the two possible quoting conventions--single
    # quotes (which could contain double quotes), or double quotes (which
    # could contain single quotes). In both cases the second set of
    # capturing parentheses will get the rest of the remaining attribute
    # data (if any remains)
    set RE(single-quote) {^'([^'']*)'(.*)$}
    set RE(double-quote) {^"([^""]*)"(.*)$}

    # Iterate over each tag in the XML provided
    foreach {whole_tag contents} [regexp -inline -all -- $RE(tag) $xml] {
        # Start with an empty list of attributes for this tag
        set attributes_this_tag [list]

        # Trim off any extraneous whitespace to make life easier
        set contents [string trim $contents]

        # Ignore a completely empty tag (which would be invalid xml
        # to begin with)
        if {[string length $contents] == 0} then continue

        # Ignore closing tags; they aren't supposed to have attributes
        if {[string index $contents 0] == "/"} then continue

        # Ignore processing instructions; they don't have attributes
        if {[string index $contents 0] == "?"} then continue

        # Ignore comments and CDATA tags; they don't have attributes
        if {[string index $contents 0] == "!"} then continue

        # Separate out the tag name and the data of the attribute(s)
        regexp -- $RE(name-attribs) $contents => tag_name data

        # If the string length of data is zero then there were no attributes
        if {[string length $data] == 0} then continue

        # Now we will get the name of an attribute (key), see what
        # type of quoting is used (single or double), then get the value
        # of the attribute (val), and save the rest of the data (additional
        # key/value pair(s)) for further processing the next time we go
        # through the while loop.
        #
        # Processing ends when we run out of key/value pairs (data is
        # exhausted and our regexp fails to match any more) or when
        # we encounter an attribute that is improperly quoted (i.e.,
        # no closing single or double quote) which is definitely invalid xml.
        while {[regexp -- $RE(next-attrib) $data => key data]} {
            # Which type of quoting was used, single or double?
            if {[string match '* $data]} then {
                set quote_type single-quote
            } else {
                set quote_type double-quote
            }

            # There should be a corresponding close $quote_type; between
            # the opening & closing quote will be the value of this attrib.
            # If there is no closing quote of the appropriate type then
            # this is invalid xml and we ignore any further processing
            # of attributes for this tag
            if {![regexp -- $RE($quote_type) $data => val data]} then break

            # We now know a key/val attribute pair; add it to the list
            # we are accumulating for this tag
            lappend attributes_this_tag $key $val

            # Trim off leading whitespace that separated this key/val
            # attribute pair from any that follow it
            set data [string trimleft $data]
        }

        # Did we find any attributes for this tag?
        if {[llength $attributes_this_tag]} then {
            # If so, append to the overall list-of-lists we're accumulating
            # for the entire XML document
            lappend all_attributes $attributes_this_tag
        }
    }

    # Return the list-of-lists of key/val attribute pairs that were found
    return $all_attributes
}
# The sample XML included in an earlier post in this thread
set xml { <?xml version="1.0" encoding="utf-8" ?><trivia><entry
id="1101"
triviaID="233" question="Who wrote &quot;Trilogy of Knowledge&quot;?"
answerID="1" correctAnswerID="1" answer="Believer"
expDate="1139634000"></entry><entry id="1102" triviaID="233"
question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="2"
correctAnswerID="1" answer="Saviour Machine"
expDate="1139634000"></entry><entry id="1103" triviaID="233"
question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="3"
correctAnswerID="1" answer="Seventh Avenue"
expDate="1139634000"></entry><entry id="1104" triviaID="233"
question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="4"
correctAnswerID="1" answer="Inevitable End"
expDate="1139634000"></entry><entry id="1105" triviaID="233"
question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="5"
correctAnswerID="1" answer="No such song existed"
expDate="1139634000"></entry></trivia> }
# Find the attributes and print them out
foreach set_of_attributes [get-all-xml-attributes $xml] {
    puts $set_of_attributes
    foreach {key val} $set_of_attributes {
        puts " $key = $val"
    }
}
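Because each element of the result is a flat key/value list, one way to look values up by name is to load an element into an array. A minimal sketch, with one element written out by hand for illustration (the array name entry is arbitrary):

```tcl
# One element of the list-of-lists, written out by hand for illustration
set set_of_attributes {id 1101 triviaID 233 answerID 1 answer Believer}

# Load the flat key/value list into an array for lookup by name
array set entry $set_of_attributes

# prints "entry 1101: Believer"
puts "entry $entry(id): $entry(answer)"
```

If you do this inside a loop, unset the array at the top of each pass so attributes from one tag do not leak into the next.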
Michael