Testing for changes on a web page (was: how to find difference in number of characters)



harryos, 09.10.2010 14:24:
I am trying to determine if a wep page is updated by x number of
characters..Mozilla firefox plugin 'update scanner' has a similar
functionality ..A user can specify the x ..I think this would be done
by reading from the same url at two different times and finding the
change in body text.

"Number of characters" sounds like a rather useless measure here. I'd rather apply an XPath, CSS selector or PyQuery expression to the parsed page and check if the interesting subtree of it has changed at all or not, potentially disregarding any structural changes by stripping all tags and normalising the resulting text to ignore whitespace and case differences.

Stefan

.