Re: Reading/Parse Word '97 documents with Tcl
- From: Christian Gollwitzer <Christian.Gollwitzer@xxxxxxxxxxxxxxx>
- Date: Wed, 14 Feb 2007 12:33:07 +0100
JohanBG.Johansson@xxxxxxxxx wrote:
The same thing has been done with Word 2.0 without extensions,
modules or components.
It can extract metadata from a Word 2.0 document with a heuristic
algorithm that parses the document with no real knowledge of the
format. I am working towards it also being able to do the same thing
with Word '97.
AFAIK the Word 2.0 format is the same as RTF, which is a pure text format similar to HTML/XML and can be read easily. .doc from Word 95 on is binary and very complicated; consider the problems that OpenOffice still has reading Word files.
There are some projects, however, that can extract text from Word .doc files:
Antiword http://www.winfield.demon.nl/
KOffice
OpenOffice
Abiword
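If you are not sure which kind of file you have, the first few bytes settle it: RTF starts with the text {\rtf, while the binary .doc files written by Word 97 are OLE2 compound files and start with the bytes D0 CF 11 E0 A1 B1 1A E1. A minimal sketch follows. The proc names docKind and wordText are made up for illustration, and wordText assumes the antiword program is installed and on your PATH.

```tcl
# Classify a file by its leading bytes: rtf, ole2 or unknown.
proc docKind {path} {
    set f [open $path r]
    fconfigure $f -translation binary
    set head [read $f 8]
    close $f
    # RTF files are plain text and begin with "{\rtf".
    if {[string range $head 0 4] eq "\{\\rtf"} {
        return rtf
    }
    # OLE2 compound files begin with a fixed 8-byte signature.
    if {[binary scan $head H16 hex] == 1 && $hex eq "d0cf11e0a1b11ae1"} {
        return ole2
    }
    return unknown
}

# Return the text of a binary Word file by running antiword on it.
# Assumption: antiword is installed; exec raises an error otherwise.
proc wordText {path} {
    if {[docKind $path] ne "ole2"} {
        error "$path: not a binary Word file"
    }
    return [exec antiword $path]
}
```

Something like "puts [wordText report.doc]" would then print the document text, where report.doc stands for whatever file you want to read.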
In principle you could write a parser in pure Tcl, but it would be slow and a lot of work. For pointers, look up the format descriptions at
http://www.wotsit.org/
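To give an idea of where a pure Tcl parser would start, here is a rough sketch that reads the 512-byte header of the OLE2 compound file that wraps a Word 97 document. The field offsets are taken from the compound file format description as I understand it, and the proc name readCfbHeader is made up for illustration. This is only the first step: you would still have to follow the sector allocation table and the directory to find the WordDocument stream, and then decode Word's own structures inside it.

```tcl
# Read a few fields from the compound-file header of a .doc file.
# Returns a key/value list suitable for "array set".
proc readCfbHeader {path} {
    set f [open $path r]
    fconfigure $f -translation binary
    set hdr [read $f 512]
    close $f
    if {[string length $hdr] < 512} {
        error "$path: too short to hold a compound-file header"
    }
    # Offset 0:  8-byte signature
    # Offset 26: major version, byte-order mark, sector shift (16 bit each)
    # Offset 44: number of FAT sectors, first directory sector (32 bit each)
    # All integers are little-endian.
    binary scan $hdr "H16 @26 s s s @44 i i" \
        sig major bom shift nFat dirStart
    if {$sig ne "d0cf11e0a1b11ae1"} {
        error "$path: not an OLE2 compound file"
    }
    # "s" yields a signed value, so mask before comparing.
    if {($bom & 0xffff) != 0xfffe} {
        error "$path: unexpected byte-order mark"
    }
    return [list major $major sectorSize [expr {1 << $shift}] \
                fatSectors $nFat dirStart $dirStart]
}
```

After "array set h [readCfbHeader report.doc]" the sector size is in $h(sectorSize) and the sector number of the first directory sector in $h(dirStart); report.doc is again just a placeholder name.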
Christian