Re: NEWB: reverse traversal of xml file



manstey wrote:
Hi,

I have an xml file of about 140Mb like this:

<book>
<record>
...
<wordpartWTS>1</wordpartWTS>
</record>
<record>
...
<wordpartWTS>2</wordpartWTS>
</record>
<record>
...
<wordpartWTS>1</wordpartWTS>
</record>
</book>

I want to traverse it from bottom to top and add another field to each
record <totalWordPart>1</totalWordPart>
which would give the highest value of wordpartWTS for each record for
each word

so if wordparts for the first ten records were 1 2 1 1 1 2 3 4 1 2
I want totalWordPart to be 2 2 1 1 4 4 4 4 2 2

I figure the easiest way to do this is to go thru the file backwards.

Any ideas how to do this with an xml data file?

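You don't need to go through the file backwards. Iterate from the beginning and group consecutive records into words with itertools.groupby; the last part of each group is that word's total: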

from itertools import groupby

def enumerate_words(parts):
    # Tag each part with a running word number; a new word starts
    # whenever the part number fails to increase.
    word_num = 0
    prev = 0
    for part in parts:
        if prev >= part:
            word_num += 1
        prev = part
        yield word_num, part


def get_word_num(item):
    return item[0]

parts = 1, 2, 1, 1, 1, 2, 3, 4, 1, 2
for word_num, word in groupby(enumerate_words(parts), get_word_num):
    parts_list = list(word)
    # Parts count up within a word, so the last one is the maximum.
    max_part = parts_list[-1][1]
    for _, part_num in parts_list:
        print(max_part, part_num)

prints:

2 1
2 2
1 1
1 1
4 1
4 2
4 3
4 4
2 1
2 2
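
The same forward pass can be applied to the XML itself. What follows is only a rough sketch using the standard library's xml.etree.ElementTree; the function names add_total_word_part and _flush are made up for illustration, and it assumes every <record> has a <wordpartWTS> child holding an integer and that parts count up from 1 within a word, as in your sample. It works on a tree that is already parsed in memory, so for a 140Mb file you may prefer an incremental parser such as ElementTree's iterparse.

```python
import xml.etree.ElementTree as ET

def _flush(group):
    # Hypothetical helper: stamp every record of one word with the
    # word's last part number, which is also its highest.
    if not group:
        return
    total = group[-1].findtext("wordpartWTS")
    for record in group:
        ET.SubElement(record, "totalWordPart").text = total

def add_total_word_part(root):
    # One forward pass: collect the records of the current word and
    # flush them as soon as the part number stops increasing.
    group = []
    prev = 0
    for record in root.iter("record"):
        part = int(record.findtext("wordpartWTS"))
        if prev >= part:
            _flush(group)
            group = []
        group.append(record)
        prev = part
    _flush(group)  # the final word has no following record to trigger it

# Small in-memory document built from the sample part numbers.
sample_parts = (1, 2, 1, 1, 1, 2, 3, 4, 1, 2)
xml_text = "<book>%s</book>" % "".join(
    "<record><wordpartWTS>%d</wordpartWTS></record>" % p for p in sample_parts
)
root = ET.fromstring(xml_text)
add_total_word_part(root)
totals = [int(r.findtext("totalWordPart")) for r in root.iter("record")]
print(totals)
```

Buffering only the records of the current word is what removes the need for a backward traversal: the total is known the moment the next word begins.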
