Re: [PHP] How to fetch .DOC or .DOCX file in php



On Thu, Dec 4, 2008 at 10:35 PM, Jim Lucas <lists@xxxxxxxxx> wrote:
Shawn McKenzie wrote:
Jim Lucas wrote:
Boyd, Todd M. wrote:
-----Original Message-----
From: Jagdeep Singh [mailto:jagsaini1982@xxxxxxxxx]
Sent: Thursday, December 04, 2008 8:39 AM
To: php-general@xxxxxxxxxxxxx
Subject: [PHP] How to fetch .DOC or .DOCX file in php
Importance: Low

Hi !

I want to fetch text from .doc / .docx files and save it into a database.
But when I tried to fetch the text with fopen/fgets etc., it gave me special
characters along with the text.

(With .txt files everything is fine.)
The only problem is with .doc/.docx files.
I don't know how to remove these "SPECIAL CHARACTERS" from the text...
A.) This has been handled on this list several times. Please search the
archives before posting a question.
B.) Did you even TRY to Google for this? In the first 5 matches for "php
open ms word" I found this:

http://www.developertutorials.com/blog/php/extracting-text-from-word-documents-via-php-and-com-81/

You will need an MS Windows machine for this solution to work. If you're
using *nix... well... good luck.


// Todd
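For the .docx half of the original question, Windows is not strictly needed: a .docx file is a ZIP archive whose main text lives in an XML part, conventionally named word/document.xml, with paragraphs in w:p elements and text in w:t elements. The following is a minimal sketch under those assumptions, using PHP's ZipArchive and DOM extensions. The function names are hypothetical, and it ignores headers, footnotes, text boxes and other parts of the file.

```php
<?php
// Minimal sketch: plain text from a .docx without COM or Windows.
// Assumes the zip and dom extensions are enabled and that the main part is the
// conventional word/document.xml. Function names are hypothetical.

const WORDML_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// Turn the XML of word/document.xml into plain text: paragraphs (w:p) become
// lines, w:t contributes its text, and w:tab / w:br / w:cr inside a run (w:r)
// become "\t" or "\n".
function docx_xml_to_text(string $xml): string
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException('document.xml is not well-formed XML');
    }

    $text = '';
    $seenParagraph = false;
    // Every element in the WordprocessingML namespace, in document order.
    foreach ($dom->getElementsByTagNameNS(WORDML_NS, '*') as $node) {
        $inRun = $node->parentNode->localName === 'r';
        switch ($node->localName) {
            case 'p':
                // Start of a new paragraph: separate it from the previous one.
                if ($seenParagraph) {
                    $text .= "\n";
                }
                $seenParagraph = true;
                break;
            case 't':
                $text .= $node->textContent;
                break;
            case 'tab':
                // Skip tab-stop definitions in w:pPr/w:tabs; only run-level tabs are text.
                if ($inRun) {
                    $text .= "\t";
                }
                break;
            case 'br':
            case 'cr':
                if ($inRun) {
                    $text .= "\n";
                }
                break;
        }
    }
    return $text;
}

// Open the .docx (a ZIP archive) and convert its main document part.
function docx_to_text(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a ZIP archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("$path has no word/document.xml part");
    }
    return docx_xml_to_text($xml);
}
```

Word often splits a single word across several runs, so concatenating every w:t in order is what puts the pieces back together. A call such as docx_to_text('./12345.docx') would then return text that can go into the database without the "special characters" that a raw fopen/fgets read produces.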

Ah, not true about the MS requirement. If all you want is the clear/clean
text (without any formatting), then it can be done with PHP on any platform.

If this is what is needed, here is the code to do it.

<?php

$filename = './12345.doc';

if ( file_exists($filename) && ($fh = fopen($filename, 'rb')) !== false ) {

  # Read the first 0xA00 bytes; this approach assumes the document
  # text starts right after them.
  $headers = fread($fh, 0xA00);

  if ( $headers !== false && strlen($headers) === 0xA00 ) {

    # Bytes 0x21C-0x21F hold a little-endian 32-bit value.
    # Subtracting 1 from the low byte and 8 from the second byte
    # (0x801 in total) gives what this approach treats as the text length.
    $n1 = ( ord($headers[0x21C]) - 1 );
    $n2 = ( ( ord($headers[0x21D]) - 8 ) * 256 );
    $n3 = ( ord($headers[0x21E]) * 256 * 256 );
    $n4 = ( ord($headers[0x21F]) * 256 * 256 * 256 );

    # Total length of text in the document
    $textLength = ($n1 + $n2 + $n3 + $n4);

    # fread() needs a positive length; files that don't match
    # this layout can produce zero or a negative number here.
    if ( $textLength > 0 ) {

      $extracted_plaintext = fread($fh, $textLength);

      # if you want the plain text with no formatting, do this
      echo $extracted_plaintext;

      # or, if you want to see your paragraphs in a web page, do this instead
      # echo nl2br($extracted_plaintext);
    }
  }

  fclose($fh);
}

?>

Hope this helps.
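The four n1..n4 lines in the snippet above add up the bytes at offsets 0x21C-0x21F as an unsigned little-endian 32-bit integer and then subtract 0x801 (1 from the low byte, 8 × 256 from the next one). A sketch of the same computation with PHP's unpack() and its 'V' format (unsigned 32-bit, little-endian) follows. The function names are hypothetical, and it keeps the original's fixed offsets, which are a heuristic rather than a general .doc parser: files with Unicode text, fast-saved documents, or a different internal layout may not match them.

```php
<?php
// Sketch: Jim's text-length arithmetic, written two ways.
// Assumes the same fixed layout as the original snippet: a 32-bit
// little-endian value at file offset 0x21C. Function names are hypothetical.

// Original arithmetic, byte by byte.
function doc_text_length_bytes(string $headers): int
{
    $n1 = ord($headers[0x21C]) - 1;
    $n2 = (ord($headers[0x21D]) - 8) * 256;
    $n3 = ord($headers[0x21E]) * 256 * 256;
    $n4 = ord($headers[0x21F]) * 256 * 256 * 256;
    return $n1 + $n2 + $n3 + $n4;
}

// Same value: read the 4 bytes as an unsigned 32-bit little-endian
// integer ('V', starting at offset 0x21C) and subtract 0x801 (= 1 + 8 * 256).
function doc_text_length_unpack(string $headers): int
{
    $value = unpack('V', $headers, 0x21C)[1];
    return $value - 0x801;
}
```

Both functions return the same number for any four bytes (on 64-bit PHP), so the choice is mostly readability. Either way, the result can be zero or negative for a file that does not follow the assumed layout, which is worth checking before passing it to fread().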

I am working on a set of php classes that will be able to read the text with the formatting included and convert it to a standard document format.
The standard format that it will end up in has yet

"has yet"... what?

Are you O.K. Jim? Did you die while writing this?


Sorry, still kickin'

I was going to say that I haven't yet decided what the final output format is going to be. Probably either RTF or OpenXML.

How about I ask for suggestions on the best format in which to store the final copy?

I figured that this tool would mainly be used for .doc-to-web conversion, but I guess it could also be used to convert to other document formats.

But I would like to at least be able to store the formatting inline with the text, so some form of XML: be it (X)HTML, plain XML,
or even OpenXML.

A question to all, then: how would you like to see the text, with formatting, stored?

All suggestions welcome!

--
Jim Lucas

"Some men are born to greatness, some achieve greatness,
and some have greatness thrust upon them."

Twelfth Night, Act II, Scene V
by William Shakespeare

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Is there a way to make it so that additional output renderers could be
created? I'd lean towards XML, though, since that can be parsed fairly
easily.
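The idea of pluggable output renderers could be sketched as a small PHP interface: the parser produces a neutral list of paragraphs made of formatted text runs, and each output format implements a single method. Everything here (Run, Renderer, XmlRenderer, HtmlRenderer, and the document/p/run element names) is hypothetical and not taken from Jim's classes; it assumes PHP 8 and the dom extension.

```php
<?php
// Hypothetical sketch of pluggable output renderers (PHP 8).
// None of these names come from Jim's classes; they only illustrate the design.

// One run of text plus the formatting that applies to it.
final class Run
{
    public function __construct(
        public string $text,
        public bool $bold = false,
        public bool $italic = false
    ) {}
}

// Every output format implements this single method.
interface Renderer
{
    /** @param Run[][] $paragraphs list of paragraphs, each a list of runs */
    public function render(array $paragraphs): string;
}

// Plain XML with the formatting stored inline as attributes on each run.
final class XmlRenderer implements Renderer
{
    public function render(array $paragraphs): string
    {
        $dom  = new DOMDocument('1.0', 'UTF-8');
        $root = $dom->createElement('document');
        $dom->appendChild($root);
        foreach ($paragraphs as $runs) {
            $p = $dom->createElement('p');
            $root->appendChild($p);
            foreach ($runs as $run) {
                $r = $dom->createElement('run');
                $r->appendChild($dom->createTextNode($run->text));
                if ($run->bold)   { $r->setAttribute('bold', 'true'); }
                if ($run->italic) { $r->setAttribute('italic', 'true'); }
                $p->appendChild($r);
            }
        }
        // Serialize the root element only (no XML declaration).
        return $dom->saveXML($dom->documentElement);
    }
}

// (X)HTML: the same runs mapped to <p>, <strong> and <em>.
final class HtmlRenderer implements Renderer
{
    public function render(array $paragraphs): string
    {
        $html = '';
        foreach ($paragraphs as $runs) {
            $html .= '<p>';
            foreach ($runs as $run) {
                $text = htmlspecialchars($run->text, ENT_QUOTES, 'UTF-8');
                if ($run->italic) { $text = "<em>$text</em>"; }
                if ($run->bold)   { $text = "<strong>$text</strong>"; }
                $html .= $text;
            }
            $html .= "</p>\n";
        }
        return $html;
    }
}
```

With this split, adding RTF or OpenXML output would mean writing one more class that implements Renderer, without touching the .doc parsing code; callers just pick `new HtmlRenderer()` or `new XmlRenderer()` and call render() on the parsed paragraphs.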