Re: Colored Text extraction from PDF



On Wed, 3 Jun 2009 07:49:18 -0700 (PDT), Azodious
<nehilparashar@xxxxxxxxx> wrote, quoted or indirectly quoted someone
who said :

Hi All,
Is it possible to extract the colored text from a PDF?

For example:
There are 3 text colors in a PDF: RED, GREEN and BLACK.
Is it possible to extract only the text that is red or green?

There are all kinds of tools for manipulating PDF files.
Unfortunately, I don't have personal experience with them.
See http://mindprod.com/jgloss/pdf.html

PostScript is similar to using Java's drawString and brothers in
paintComponent. In PostScript, you use setrgbcolor or sethsbcolor to
load up your paintbrush with a colour. The problem is similar to
trying to extract text painted in a particular colour from Java source
that uses drawString. You would most likely do it by substituting the
paint methods and capturing the parameters when you run the code.
Trying to do it statically would be extremely difficult.
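
To make the "substitute the paint methods" idea concrete, here is a
minimal sketch in Java. It is only an illustration of the analogy, not
a PDF tool: the Painter interface, RecordingPainter and paintPage are
all hypothetical names invented for this example, standing in for the
real drawing surface and the code that paints the page. The recording
painter draws nothing; it just remembers each string together with the
colour that was current when the string was drawn.

```java
import java.awt.Color;
import java.util.ArrayList;
import java.util.List;

public class ColorTextCapture {

    // Hypothetical, cut-down stand-in for the drawing calls a page makes.
    interface Painter {
        void setColor(Color color);
        void drawString(String text, int x, int y);
    }

    // Substitute painter: draws nothing, only records text and current colour.
    static class RecordingPainter implements Painter {
        private Color current = Color.BLACK;
        private final List<String> texts = new ArrayList<>();
        private final List<Color> colors = new ArrayList<>();

        @Override
        public void setColor(Color color) {
            current = color;
        }

        @Override
        public void drawString(String text, int x, int y) {
            texts.add(text);
            colors.add(current);
        }

        // Returns the captured strings drawn in the given colour, in drawing order.
        List<String> textIn(Color wanted) {
            List<String> out = new ArrayList<>();
            for (int i = 0; i < texts.size(); i++) {
                if (colors.get(i).equals(wanted)) {
                    out.add(texts.get(i));
                }
            }
            return out;
        }
    }

    // Hypothetical stand-in for the code that paints the page.
    static void paintPage(Painter p) {
        p.setColor(Color.RED);
        p.drawString("stop", 10, 20);
        p.setColor(Color.GREEN);
        p.drawString("go", 10, 40);
        p.setColor(Color.BLACK);
        p.drawString("body text", 10, 60);
    }

    public static void main(String[] args) {
        RecordingPainter rec = new RecordingPainter();
        paintPage(rec);                             // run the painting code
        System.out.println(rec.textIn(Color.RED));   // [stop]
        System.out.println(rec.textIn(Color.GREEN)); // [go]
    }
}
```

The point of the design is that the colour is state set by an earlier
call, so it can only be paired with the text by following the calls in
the order they execute.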

--
Roedy Green Canadian Mind Products
http://mindprod.com

Never discourage anyone... who continually makes progress, no matter how slow.
~ Plato 428 BC died: 348 BC at age: 80