Re: Convert MS-Word to plain text
- From: "Ben Bullock" <benkasminbullock@xxxxxxxxx>
- Date: Mon, 12 May 2008 22:14:38 +0900
"Jurgen Exner" <jurgenex@xxxxxxxxxxx> wrote in message news:n7t924pig28pcl1kgcpubip5m73chum672@xxxxxxxxxx
backpack <curtyoung@xxxxxxxxx> wrote:
Are there any perl modules that will allow you to convert MS-Word docs
to plain text?
Opposite to earlier formats the DOCX format is an open XML format and
information about it is available on the Microsoft website. I don't know
if someone already wrote a parser for it, but at least it should be
possible now.
The docx format is only the default from Word 2007 onwards; earlier versions save in binary formats. Microsoft have also made those binary formats public:
http://www.microsoft.com/interop/docs/OfficeBinaryFormats.mspx
But deciphering these is only necessary if you want to read the files on a non-Windows system or you don't have a working copy of Microsoft Word. If you have Microsoft Word available, the most sensible thing to do is to use Win32::OLE. Here is a quick program to save a Word file as text:
#! perl
use warnings;
use strict;
use Win32::OLE;
# Import Word's constants (wdFormatText etc.) from its type library.
use Win32::OLE::Const 'Microsoft.Word';
my $dir = "C:/Documents and Settings/bkb/My Documents/scripts/tests/";
my $doc = $dir."test2.doc";
my $txt = $dir."test.txt";
# Start Word; 'Quit' is called on it when $word is destroyed.
my $word = Win32::OLE->new('Word.Application', 'Quit')
    or die "Word problem: ",Win32::OLE->LastError();
my $document = $word->Documents->Open($doc)
    or die "Word problem: ",Win32::OLE->LastError();
# Save a copy as plain text, then close without saving further changes.
$document->SaveAs($txt, wdFormatText);
$document->Close(0);
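For the docx case on a machine without Word, something like the following might be enough for simple documents. It is only a rough sketch, not a real parser: it assumes the file is an ordinary zip archive with the body text in the member word/document.xml, and it pulls the text out with regular expressions, so tables, footnotes, headers and numeric character references are not handled. The function names docx_xml_to_text and docx_to_text are made up for this example.

```perl
#! perl
use warnings;
use strict;
use IO::Uncompress::Unzip qw($UnzipError);

# Turn the XML of word/document.xml into rough plain text.
# Regex-based sketch; it does not validate or fully parse the XML.
sub docx_xml_to_text {
    my ($xml) = @_;
    # Attribute-less <w:tab/> and <w:br/> are a tab and a line break in a run.
    $xml =~ s{<w:tab\s*/>}{\t}g;
    $xml =~ s{<w:br\s*/>}{\n}g;
    # The end of each paragraph becomes a newline.
    $xml =~ s{</w:p>}{\n}g;
    # Drop every remaining tag, including the XML declaration.
    $xml =~ s{<[^>]+>}{}g;
    # Decode the five predefined XML entities, &amp; last.
    $xml =~ s/&lt;/</g;
    $xml =~ s/&gt;/>/g;
    $xml =~ s/&quot;/"/g;
    $xml =~ s/&apos;/'/g;
    $xml =~ s/&amp;/&/g;
    return $xml;
}

# Read word/document.xml out of a .docx file and convert it.
# The result is returned as UTF-8 encoded bytes.
sub docx_to_text {
    my ($file) = @_;
    my $unzip = IO::Uncompress::Unzip->new($file, Name => 'word/document.xml')
        or die "Cannot read $file: $UnzipError";
    my $xml = do { local $/; <$unzip> };
    $unzip->close();
    return docx_xml_to_text($xml);
}

# Usage: perl docx2txt.pl file.docx > file.txt
if (@ARGV) {
    binmode STDOUT;
    print docx_to_text($ARGV[0]);
}
```

IO::Uncompress::Unzip ships with recent versions of Perl, so this needs nothing from CPAN. For anything beyond simple documents a proper XML parser would be the safer choice.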