Re: pdf2txt

From: Aurelio Martin (amartin_at_wpsnetwork.com)
Date: 05/28/04


Date: Fri, 28 May 2004 09:21:07 +0200


B P wrote:
> Is there a way via Python or even Perl to capture records from a pdf and
> output a delimited text file? My work has a situation with a truck
> load of data forms that were scanned as pdfs.
>
> The data needs to be taken from the forms and moved into a database, so
> I figure that comma-delimited format will work fine. The amount of
> man-hours it would take to manually do this is very cost-prohibitive for
> what we have to work with.
>
> I know that a txt2pdf exists, was checking to see if the opposite would
> as well.
>
> BP

You may try XPDF

http://www.foolabs.com/xpdf/

They include source code and some utilities like pdfimages or pdftotext.
Maybe you can call these from Python, or link via a C extension.
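
As a rough sketch of the "call it from Python" route, something like the
following might work. It assumes the pdftotext binary from XPDF is on your
PATH, and the helper names (build_command, pdf_to_text, text_to_csv) are
made up for illustration. Splitting columns on runs of two or more spaces
is only a guess at how the forms lay out; real forms will probably need
their own parsing. Also note that pdftotext only extracts text that is
already stored in the PDF. If the scans are pure images, you would need
an OCR step first.

```python
import csv
import io
import re
import subprocess


def build_command(pdf_path, layout=True):
    """Build the pdftotext command line; "-" sends the text to stdout."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if layout:
        # -layout tries to keep the physical column layout of the page
        cmd.append("-layout")
    cmd += [pdf_path, "-"]
    return cmd


def pdf_to_text(pdf_path, layout=True):
    """Run pdftotext (assumed to be installed) and return its output."""
    result = subprocess.run(
        build_command(pdf_path, layout), capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")


def text_to_csv(text):
    """Turn each non-blank line into a CSV row.

    Assumption: fields are separated by two or more whitespace
    characters, which is typical of -layout output but not guaranteed.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        line = line.strip()
        if line:
            writer.writerow(re.split(r"\s{2,}", line))
    return out.getvalue()
```

Usage would be along the lines of text_to_csv(pdf_to_text("form.pdf")),
where "form.pdf" is a placeholder file name. The csv module takes care of
quoting any field that itself contains a comma.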

Hope this helps

Aurelio


