Re: Word to text translation



On Fri, 11 Apr 2008 12:56:14 -0700 (PDT), kenoli wrote:

> Does anyone know a class or other script for translating the contents
> of an MS Word document into a text file with simple formatting (e.g.
> keeping paragraph breaks, not totally mangling lists, etc.) so it can
> be stored in a text field in a MySQL database?

I believe HTML Tidy can do good things with Word docs, although I have
never tried using it for that myself. (I have cleaned Word docs *by hand*,
however, and I can say that software that does the job for you is valuable.)

Go to http://www.w3.org/People/Raggett/tidy/#word2000 and nose around a
little. See if it does (or can be made to do) what you need.
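
For what it's worth, here is a rough Python sketch of how such a pipeline
might look. It rests on assumptions the question doesn't confirm: that the
document has first been saved from Word as HTML (Tidy works on HTML, not on
.doc files), and that the tidy command-line tool is installed and on the
PATH. The helper names (tidy_word_html, html_to_text) are made up for
illustration; word-2000 and bare are Tidy configuration options, but check
that your build supports them. The tag-to-text step is my own simplification
and not something Tidy does for you.

```python
import subprocess
from html.parser import HTMLParser

# Tags treated as paragraph-level breaks, and tags whose content is dropped.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "tr"}
SKIP_TAGS = {"style", "script", "head"}


def tidy_word_html(html: str) -> str:
    """Clean Word-exported HTML with the tidy CLI (assumed to be on PATH)."""
    result = subprocess.run(
        ["tidy", "-q", "-utf8", "--show-warnings", "no",
         "--word-2000", "yes", "--bare", "yes"],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    # tidy exits with 1 when it only had warnings and 2 on real errors.
    if result.returncode > 1:
        raise RuntimeError("tidy could not parse the document")
    return result.stdout.decode("utf-8")


class _TextExtractor(HTMLParser):
    """Collects text, turning block tags into blank lines and <li> into '- '."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "br":
            self.parts.append("\n")
        elif tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Reduce HTML to plain text with paragraph breaks and simple list items."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Normalise whitespace inside each line, then collapse runs of blank lines.
    lines = [" ".join(line.split()) for line in "".join(parser.parts).split("\n")]
    out = []
    for line in lines:
        if line or (out and out[-1]):
            out.append(line)
    return "\n".join(out).strip()
```

Usage would be something like html_to_text(tidy_word_html(raw_html)), with
the result stored in the text column. html_to_text also works on its own if
Tidy is not available, just with messier input.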

--
John