Iterating over IMG in HTML file

From: Mike Mimic (ppagee_at_yahoo.com)
Date: Thu, 29 Apr 2004 19:26:18 +0200

Hi!

I would like my program to print the path of every image in an
HTML file (that is, the SRC attribute of each IMG tag).

Here is what I have so far (it should print the names of all
attributes in all IMG tags):

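import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Enumeration;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Load the file into an HTMLDocument created by the same kit.
HTMLEditorKit kit = new HTMLEditorKit();
HTMLDocument html = (HTMLDocument) kit.createDefaultDocument();
kit.read(new BufferedReader(new FileReader(file)), html, 0);

// Walk every occurrence of the tag and print its attribute names.
HTMLDocument.Iterator it = html.getIterator(HTML.Tag.IMG);
while (it.isValid()) {
        // No cast needed: getAttributes() already returns an AttributeSet.
        AttributeSet attrs = it.getAttributes();
        if (attrs != null) {
                for (Enumeration<?> e = attrs.getAttributeNames();
                        e.hasMoreElements();) {
                        System.out.println(e.nextElement());
                }
        }
        it.next();  // advance to the next occurrence, or the loop never ends
}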

It does not work. However, if I change HTML.Tag.IMG to HTML.Tag.A,
it works as it should (for links).

The HTML file does contain IMG tags (as well as A tags).

Mike
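
A possible workaround, sketched here under assumptions and not taken
from the original post: skip HTMLDocument.getIterator altogether and
let the Swing HTML parser report each tag through an
HTMLEditorKit.ParserCallback. IMG is an empty tag, so it is reported
through handleSimpleTag, and its SRC value can be read from the
attribute set passed in. The class name ImgSrcLister, the method
listImageSources and the sample markup are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of the original post.
public class ImgSrcLister {

    // Returns the SRC value of every IMG tag found in the HTML read from reader.
    static List<String> listImageSources(Reader reader) throws IOException {
        final List<String> sources = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Empty tags such as IMG arrive here, not in handleStartTag.
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        sources.add(src.toString());
                    }
                }
            }
        };
        // true = ignore any charset declared inside the document.
        new ParserDelegator().parse(reader, callback, true);
        return sources;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample markup; replace the StringReader with a FileReader for a real file.
        String sample = "<html><body><a href=\"x.html\">x</a>"
                + "<img src=\"a.png\" alt=\"A\">"
                + "<p><img src=\"img/b.jpg\"></p></body></html>";
        for (String src : listImageSources(new StringReader(sample))) {
            System.out.println(src);
        }
    }
}
```

To read from a file as in the post, pass
new BufferedReader(new FileReader(file)) to listImageSources. This
approach never builds an HTMLDocument, so it only fits when the tag
attributes are all that is needed.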


