Re: Is there any package to convert a Word-formatted document to XML?



Steve Ball wrote:

BTW, I (mostly) wrote and currently maintain the DocBook roundtripping
system. It does specifically target DocBook, but not for general-
purpose conversion of documents; just for roundtripping. However, it
can be adapted for general-purpose conversion.




Thanks!

I don't use roundtrip myself, though; I write directly
in DocBook. Still, using roundtrip to convert word-processor
text appears to be the best approach to get structured
documents: styles are the only reliable structural element
in a Word document.

What's needed is a kind of
style mapper inside Word itself (in VBA) that replaces the
current styles with those originating from DocBook.

The FO route is also interesting: it makes it possible
to design chunks of text in a word processor and reuse them WYSIWYG
in a Tk interface, for example for a sophisticated online help system.


-roger

The best thing to do is to use Word 2003 or Word 2007. Both of these
versions of Word can save their documents as XML, which can then be
transformed using XSLT (either TclXML or tDOM may provide the
transformation infrastructure). With Office 2007, the Word document is
actually a Zip file, which can be mounted as a virtual filesystem using
Tcl's VFS.
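
As a rough illustration of the Office 2007 route (not from the posts in this thread, and in Python rather than Tcl, purely so the sketch is self-contained): a .docx file can be opened as a Zip archive and the main body read from word/document.xml. The function name paragraph_styles is hypothetical. The sketch assumes the standard WordprocessingML namespace, and it ignores tables, footnotes, and the older Word 2003 XML format. It lists each paragraph's style name next to its text, which is the information a style-to-DocBook mapping would start from.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml (Word 2007 format).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def paragraph_styles(docx):
    """Return a list of (style_name, text) pairs, one per paragraph.

    `docx` may be a path or a binary file object. `style_name` is None
    when the paragraph carries no explicit style.
    """
    # A .docx file is a Zip archive; the main body lives in word/document.xml.
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    result = []
    for para in root.iter(f"{{{W}}}p"):
        # The paragraph style, if any, is recorded in <w:pPr><w:pStyle w:val="..."/>.
        style = para.find("w:pPr/w:pStyle", NS)
        name = style.get(f"{{{W}}}val") if style is not None else None
        # The visible text is split across <w:t> elements inside runs.
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        result.append((name, text))
    return result
```

Calling paragraph_styles("report.docx") (a hypothetical file name) would give pairs such as a heading style name with its heading text; those pairs could then be mapped to DocBook elements or fed to an XSLT step.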

HTH,
Steve Ball




Sounds like a lot of work.


On Nov 23, 10:07 pm, Arndt Roger Schneider <arndt.ro...@xxxxxx> wrote:


Anil A Kumar wrote:







Hi all,


My name is Anil Kumar. I am planning to develop a tool that converts
a Microsoft Word or PDF file to an XML file.


I know how to parse an XML file using the tDOM package. I just want to know how
to convert a Word or PDF document to an XML document
using Tcl. Is there any package available?


Thanks in advance!


Thanks to the entire comp.lang.tcl group, because you helped me when
I started learning tDOM.


Regards,
Anil A Kumar


Word is already XML.
Otherwise, see roundtrip inside docbook-xsl on SourceForge;
there is a Word template that allows you to import and export
Word documents from and to DocBook.

-roger




