Re: Word to text translation



kenoli wrote:
Have you ever seen the gack that Word puts in its html files? They
are really xml files with all kinds of special definitions. I have
found a web site that will remove it all, one file at a time, which is
useful for cleaning up a file now and then. What I am trying to do is
find something that will let me batch upload files and let a php
script do the work. I have more material than I can handle one file
at a time.

Thanks,

--Kenoli

On Apr 11, 7:19 pm, Preventer of Work <not_t...@xxxxxxxxxx> wrote:
kenoli wrote:
Does anyone know a class or other script for translating the contents
of a MSWord document into a text file with simple formatting, e.g.
paragraph breaks, not totally mangling lists, etc. so it can be stored
in a text field in a mysql database.

The point of this is storing data from documents so that selections
can be cut and pasted into another database where it will be utilized
as text content in a database driven web site.

I realize that one way to do this is to simply link to the actual
MSWord file located in a directory. Putting it into a database field,
however, would be useful as I don't care about the formatting, aside
from keeping it readable. Having it in this form makes it possible to
easily copy and paste stuff from fields in the one database to fields
in the database driving the web site.

Thanks,
--Kenoli

Don't know of anything that does that directly.
You could export them from Word as html files - it is at least text, and
there are parsers for html.


The MS Visual Studio languages come with Word APIs. You can search, extract text, stuff like that (I've not used them, but I do know they exist). You could write a program that pulls the text out of as many files as you want in one run.

You can also do that with OpenOffice.org on any platform. You can have a program tell it to open and import Word files, then pull content out - same as VS/Word operations.
http://api.openoffice.org/
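
For what it's worth, here is a rough sketch of the batch idea in Python. It does not use the API from the link above; it assumes your install has a `soffice` binary that accepts the `--headless --convert-to` switches (LibreOffice does; check your OpenOffice version). The function names and folder layout are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir, soffice="soffice"):
    """Return the argument list for one headless Word-to-text conversion.

    Assumption: the `soffice` binary on this machine supports
    --headless and --convert-to with the "txt:Text" filter.
    """
    return [
        soffice,
        "--headless",
        "--convert-to", "txt:Text",
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def convert_folder(in_dir, out_dir, soffice="soffice"):
    """Convert every .doc file in in_dir and return the expected .txt paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for doc in sorted(Path(in_dir).glob("*.doc")):
        # check=True raises CalledProcessError if the office binary fails.
        subprocess.run(build_convert_command(doc, out_dir, soffice), check=True)
        written.append(out_dir / (doc.stem + ".txt"))
    return written
```

Something like `convert_folder("uploads", "converted")` would then leave one text file per Word file, ready to be read and stored in the database field.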

I know this isn't what you wanted, but maybe someone else will remember seeing something based on these. Such a tool should be handy to lots of people.
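
On the HTML-export route mentioned earlier in the thread: if the files are already saved from Word as HTML, a small parser can drop the style and header junk and keep paragraph breaks and list items. This is only a sketch using Python's standard `html.parser`; the class name, the tag lists, and the "- " list prefix are my own choices, and real Word output may need more cases.

```python
import re
from html.parser import HTMLParser


class WordHtmlToText(HTMLParser):
    """Collect readable text from Word-exported HTML (illustrative only)."""

    BLOCK_TAGS = {"p", "div", "tr", "ul", "ol", "h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP_TAGS = {"head", "style", "script", "xml"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a tag whose text we ignore

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            if tag == "li":
                self.parts.append("\n- ")
            elif tag == "br" or tag in self.BLOCK_TAGS:
                self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            if self.skip_depth:
                self.skip_depth -= 1
        elif self.skip_depth == 0 and tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Word wraps source lines mid-paragraph, so fold whitespace runs.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html):
    """Return plain text with one line per paragraph or list item."""
    parser = WordHtmlToText()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).split("\n"))
    return "\n".join(line for line in lines if line)
```

Word's conditional-comment blocks are HTML comments, which this parser ignores by default, so most of the "special definitions" fall away without extra code.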


