Re: pdf2txt
From: Benjamin Niemann (b.niemann_at_betternet.de)
Date: 05/28/04
- Next message: David Fraser: "Re: ANN : ConfigObj 3.0.0 - Simple config file parsing"
- Previous message: Jon Perez: "Re: partial / wildcard string match in 'in' and 'list.index()'"
- In reply to: B P: "pdf2txt"
- Next in thread: Marco Aschwanden: "Re: pdf2txt"
Date: Fri, 28 May 2004 09:43:41 +0200
B P wrote:
> Is there a way via Python or even Perl to capture records from a pdf and
> output a delimited text file? My work has a situation with a trunk
> load of data forms that were scanned as pdfs.
>
> The data needs to be taken from the forms and moved into a database, so
> I figure that comma-delimited format will work fine. The amount of
> man-hours it would take to manually do this is very cost-prohibitive for
> what we have to work with.
>
> I know that a txt2pdf exists, was checking to see if the opposite would
> as well.
>
> BP
Have a look at pdftotext, part of xpdf
(http://www.foolabs.com/xpdf/home.html). It converts the PDF into
plain text. You will probably have to parse that text to
turn it into something useful.
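
A rough Python sketch of that two-step approach (extract, then parse) might look like the following. It assumes the pdftotext binary is on your PATH and that each form comes out as "Label: value" lines; that layout, and the helper names pdf_to_text, parse_form and records_to_csv, are made up for illustration, so the parsing step will need adapting to the real forms. Note that pdftotext only reads an existing text layer: if the scans are pure images, an OCR pass would be needed first.

```python
import csv
import io
import subprocess


def pdf_to_text(pdf_path):
    """Run pdftotext and return the extracted text.

    Assumes the pdftotext binary (from xpdf) is on PATH.
    '-layout' keeps the physical layout; '-' sends output to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_form(text):
    """Collect 'Label: value' lines into a dict (hypothetical form layout)."""
    record = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip():
            record[label.strip()] = value.strip()
    return record


def records_to_csv(records, fieldnames):
    """Render a list of dicts as comma-delimited text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

Typical use would be records_to_csv([parse_form(pdf_to_text(p)) for p in paths], ["Name", "City"]), with the field names replaced by whatever the forms actually contain. The csv module takes care of quoting values that themselves contain commas.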