Testing for changes on a web page (was: how to find difference in number of characters)



harryos, 09.10.2010 14:24:
I am trying to determine whether a web page has been updated by x number
of characters. The Mozilla Firefox plugin 'Update Scanner' has similar
functionality: a user can specify the x. I think this would be done
by reading from the same URL at two different times and finding the
change in the body text.
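A minimal sketch of the approach described in the question: fetch the same URL twice and count how many characters differ between the two snapshots. The function names `fetch`, `changed_chars` and `changed_by_at_least` are hypothetical. Measuring the difference with `difflib.SequenceMatcher` is an assumption on my part; nothing here says how Update Scanner itself does it. Note that this compares whatever strings you pass in, so raw HTML markup counts too unless you extract the text first.

```python
import difflib
import urllib.request


def fetch(url):
    """Download a page and decode it using the charset the server declares (UTF-8 fallback)."""
    with urllib.request.urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def changed_chars(old, new):
    """Rough count of characters inserted, deleted or replaced between two snapshots."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # For a replacement, count the longer of the two sides.
            total += max(i2 - i1, j2 - j1)
    return total


def changed_by_at_least(old, new, x):
    """True if the snapshots differ by at least x characters."""
    return changed_chars(old, new) >= x
```

In use you would store `fetch(url)` from one poll and compare it with the next one using `changed_by_at_least(old, new, x)`. `SequenceMatcher` can become slow on very large pages, which is one more reason to narrow the comparison to the part of the page you actually care about.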

"Number of characters" sounds like a rather useless measure here. I'd rather apply an XPath, CSS selector or PyQuery expression to the parsed page and check if the interesting subtree of it has changed at all or not, potentially disregarding any structural changes by stripping all tags and normalising the resulting text to ignore whitespace and case differences.
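This is a stdlib-only sketch of that idea, not Stefan's code. It stands in for an XPath, CSS selector or PyQuery expression with a much cruder selector: "the first element with a given `id`". It collects the text inside that element, drops the tags, collapses whitespace, lowercases the result and compares hashes. The names `SubtreeText`, `fingerprint` and `page_changed` are hypothetical. The sketch assumes reasonably well-formed markup with explicit closing tags, because the naive depth counting gets confused by implicitly closed elements such as an unclosed `<p>`. With lxml or PyQuery installed, a real selector would replace the parser class.

```python
import hashlib
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class SubtreeText(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while inside the target element
        self.done = False   # set once the target element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID or self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def fingerprint(html, target_id):
    """Hash of the subtree's text with tags stripped, whitespace collapsed, case folded."""
    parser = SubtreeText(target_id)
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.chunks).split()).lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def page_changed(old_html, new_html, target_id):
    """True if the text of the selected subtree differs between two snapshots."""
    return fingerprint(old_html, target_id) != fingerprint(new_html, target_id)
```

With this design, wrapping a word in `<b>`, changing tag case or reflowing whitespace does not count as a change, while editing the actual text does. Changes outside the selected element are ignored entirely.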

Stefan



