Re: funny blobs




"slebetman@xxxxxxxxx" <slebetman@xxxxxxxxx> wrote in message
news:1185771811.718066.106070@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

You're already successful, since you can already get the "funny
symbols". Your problem is not how to read the file, since you can
already do that. Your problem is how to parse an MS Word file.

Now, this is no small matter since even Microsoft Word 6.0 will have
problems parsing a Microsoft Word 2001 file (yes, there is such an
application as Word 2001, hint: it's not for Windows). There are
plenty of efforts out there to reverse engineer the MS Word .doc
format, everything from OpenOffice's attempt at feature and bug
compatibility to Google Docs' (formerly Writely) good-enough approach.
Unfortunately I don't know of any pure-Tcl implementation of a
Word .doc parser.

Why are you using Word files in the first place? Can't you use
simpler-to-parse formats like plain .txt files or HTML (both of which
can be exported from Word documents)?


hi slebetman -
i kind of achieved what i wanted tonight. i know what you mean by parsing.

i wanted to use a word file because i knew it was binary. i wanted to know
whether i could stick a binary file (word) into a table in the sqlite db,
then extract it with sql and look at the file contents. for a while, when i
was fooling around, all i could see was a printout of the encoded or decoded
file, so i wondered: did i actually do it? i couldn't see the text originally
typed into the word file, just gibberish (to me, anyway). finally, when i
wrote the binary data out to a file correctly, i could see that it worked.
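
for anyone trying the same round trip, here is a minimal sketch of the idea. it is in Python with the standard sqlite3 module, not Tcl, and it is not the code used in this thread. the table name `files`, its columns, and the helper names are made up for illustration.

```python
import os
import sqlite3
import tempfile


def create_table(conn):
    """Create a hypothetical table with one BLOB column for file contents."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, data BLOB)"
    )


def store_blob(conn, name, path):
    """Read the file in binary mode and insert its raw bytes as a BLOB."""
    with open(path, "rb") as f:
        data = f.read()
    # Passing a bytes object as a bound parameter stores it as a BLOB.
    conn.execute("INSERT INTO files (name, data) VALUES (?, ?)", (name, data))
    conn.commit()


def extract_blob(conn, name, out_path):
    """Select the BLOB back and write it out in binary mode, unchanged."""
    row = conn.execute(
        "SELECT data FROM files WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    with open(out_path, "wb") as f:
        f.write(row[0])


if __name__ == "__main__":
    # Stand-in for a Word file: arbitrary bytes, including NUL bytes.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "sample.doc")
    dst = os.path.join(workdir, "sample_copy.doc")
    with open(src, "wb") as f:
        f.write(b"\xd0\xcf\x11\xe0\x00\x00binary payload\x00\xff")

    conn = sqlite3.connect(":memory:")
    create_table(conn)
    store_blob(conn, "sample.doc", src)
    extract_blob(conn, "sample.doc", dst)

    # Compare the bytes instead of printing them to the screen.
    with open(src, "rb") as a, open(dst, "rb") as b:
        print("round trip ok:", a.read() == b.read())
```

the check at the end compares the two files byte for byte, which is a more reliable test than eyeballing the printed data, since a binary file will look like gibberish on screen either way.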

you could kind of see that i had read it, but i couldn't.

i really don't need to parse it. that is really good advice, though, on
choosing the right format to parse.

thanks very much.
have a good week :-)
jim



