Extracting text from pdf

From: JustinCase (no_at_spam)
Date: 25 Oct 2004 16:09:36 GMT

Hi,

I have to index the text of a PDF document.

Does anyone know of a PHP script/extension or a binary that is able
to extract the text?

The PDF extension mentioned in the php.net docs seems to indicate that
it's for _creation_ of documents only. Is that so? The same goes for all
the PHP classes I have found.

Regards,
Johnny

-- 
Never express yourself more clearly than you are able to think. 
- Niels Bohr

