Re: clean up html document created by Word



jd wrote:
> I am looking for python code (working or sample code) that can take an
> html document created by Microsoft Word and clean it up (if you've
> never had to look at a Word-generated html document, consider yourself
> lucky ;-) Alternatively, if you know of a non-python solution, I'd
> like to hear about it.
>
> Thanks...
>
> -- jeff

There is a Microsoft add-on for Word, called 'HTML filter', which helps to reduce the mess. You can get it here:

http://www.microsoft.com/downloads/details.aspx?FamilyID=209ADBEE-3FBD-482C-83B0-96FB79B74DED&displaylang=EN

Run it first, and afterwards apply the other 'cleaning' methods suggested in this thread.
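For the 'cleaning' step, here is a minimal sketch that uses only Python 3's standard `html.parser` module. It is not what Microsoft's filter does internally. The names `WordHTMLCleaner` and `clean_word_html` are hypothetical, and the rules about what gets thrown away are assumptions about typical Word output, not a complete list. These rules cover `<style>`/`<xml>` blocks, namespaced tags such as `<o:p>`, the `class`/`style`/`lang` attributes, and comments (which include Word's `<!--[if gte mso 9]>` sections).

```python
from html import escape
from html.parser import HTMLParser

# Assumed cleanup rules for typical Word output; adjust them for your documents.
DROP_BLOCKS = {"style", "script", "xml"}  # drop the element and everything inside it
DROP_TAGS = {"meta", "link"}              # drop the tag itself only
DROP_ATTRS = {"class", "style", "lang"}   # Word's per-element formatting attributes
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class WordHTMLCleaner(HTMLParser):
    """Re-emit HTML while discarding common Word-specific markup (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0   # > 0 while inside a DROP_BLOCKS element
        self.span_stack = []  # one entry per open <span>: True if it was emitted

    @staticmethod
    def _keep_attr(name):
        # Also drop namespaced attributes such as xmlns:o or v:shapes.
        return name not in DROP_ATTRS and ":" not in name and not name.startswith("xmlns")

    def handle_starttag(self, tag, attrs):
        if tag in DROP_BLOCKS:
            self.skip_depth += 1
            return
        # Namespaced tags like <o:p> are dropped, but their text content is kept.
        if self.skip_depth or tag in DROP_TAGS or ":" in tag:
            return
        kept = "".join(
            f' {name}="{escape(value)}"' if value is not None else f" {name}"
            for name, value in attrs
            if self._keep_attr(name)
        )
        if tag == "span":
            # A <span> left with no attributes carries no meaning, so unwrap it.
            self.span_stack.append(bool(kept))
            if not kept:
                return
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_BLOCKS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in DROP_TAGS or tag in VOID_TAGS or ":" in tag:
            return
        if tag == "span" and self.span_stack and not self.span_stack.pop():
            return  # closing tag of an unwrapped span
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # keep <!DOCTYPE ...>

    # handle_comment is not overridden: HTMLParser's default ignores comments,
    # which removes Word's <!--[if gte mso 9]> ... <![endif]--> blocks.


def clean_word_html(markup):
    """Return a simplified copy of Word-generated HTML (hypothetical helper)."""
    cleaner = WordHTMLCleaner()
    cleaner.feed(markup)
    cleaner.close()
    return "".join(cleaner.out)


if __name__ == "__main__":
    sample = ('<p class=MsoNormal><span style="font-size:12.0pt" lang=EN-GB>'
              'Hello<o:p></o:p></span></p>')
    print(clean_word_html(sample))  # prints: <p>Hello</p>
```

The filter only ever removes markup. Meaningful attributes such as `href` or `src` survive, so links and images stay intact. Because the rule sets are plain module-level constants, you can tighten or loosen them without touching the parser logic. Check the output against a few of your own documents before trusting it.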

Claudio