Re: crawling the net...

From: JKop (NULL_at_NULL.NULL)
Date: 04/29/04


Date: Thu, 29 Apr 2004 10:38:17 GMT

ask josephsen posted:

> Hi NG
>
> I'm making a program to crawl the internet. It works by retrieving all
> links in a page, downloading the page of each link and again retrieving
> all the links. (If there are better ways, I'd like to hear them.)
>
> My problem is relative links (like "../../wohoo.asp"). What is the
> smartest way to get the full URL (http://www.xyz.com/wohoo.asp)? Do I
> have to parse the relative link in relation to the URL where the
> relative link was found and then concatenate it? Does anyone know how
> other search engines/crawlers walk the net?
>
>
> Thanks :)
>
> ./ask

You should have posted this on:

alt.sports.gymnastics

It would've been more on-topic _there_.

-JKop
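
On the relative-link question quoted above: the usual approach is not manual string concatenation but reference resolution against the URL of the page where the link was found, as described in the URI standard (RFC 3986, section 5). The following is only a minimal sketch in Python using the standard library's urllib.parse; the helper name resolve_link and the base page URL are hypothetical, chosen so that "../../wohoo.asp" resolves to the address given in the question.

```python
from urllib.parse import urljoin, urldefrag


def resolve_link(page_url: str, href: str) -> str:
    """Resolve an href found on page_url into an absolute URL.

    resolve_link is a hypothetical helper name, not part of any library.
    """
    # urljoin resolves "..", "." and absolute hrefs against the base URL.
    absolute = urljoin(page_url, href)
    # Drop any "#fragment" so the same page is not queued twice.
    absolute, _fragment = urldefrag(absolute)
    return absolute


# Hypothetical page on which the relative link was found.
page_url = "http://www.xyz.com/a/b/page.asp"

print(resolve_link(page_url, "../../wohoo.asp"))
# prints http://www.xyz.com/wohoo.asp
```

A crawler would typically also keep a set of already-visited absolute URLs and skip any resolved link that is already in it, so that pages linking to each other do not cause an endless loop.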


