Re: Non-blocking connect BLOCKS

From: Dave Brueck (dave_at_pythonapocrypha.com)
Date: 04/27/04


Date: Tue, 27 Apr 2004 11:20:52 -0600
To: "python-list" <python-list@python.org>


> I'm using asyncore to download a large list of web pages, and I've
> noticed dispatcher.connect blocks for some hosts. I was under the
> impression that non-blocking sockets do not block on connects, in
> addition to reads and writes. My connect code is essentially the same
> as the asyncore example:
>
> http://docs.python.org/lib/asyncore-example.html
>
> It seems unlikely that I am the first to encounter this problem, can
> someone explain what's wrong and suggest a remedy?

Most likely the connect call is doing a DNS lookup, which means your execution
pauses while some other (non-Python) code goes and talks to the DNS server. For
many hosts the lookup will be fast (or even already cached locally, depending
on how your OS is configured), but for others the lookup may require checking
with an upstream DNS server (and in the worst case it'll involve several
upstream queries for a lookup that ultimately fails).
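
One way to check whether the lookup really is where the time goes is to time the resolution step on its own, outside of asyncore. This is only a diagnostic sketch; timed_lookup is a made-up helper name, not something from asyncore or the original poster's code.

```python
import socket
import time


def timed_lookup(host):
    """Resolve host with the blocking resolver; return (ip, seconds_taken)."""
    start = time.perf_counter()
    ip = socket.gethostbyname(host)  # blocks until the resolver answers or gives up
    return ip, time.perf_counter() - start
```

Running this over the list of hostnames and logging the slow ones should show whether the stalls line up with particular hosts. A failed lookup raises socket.gaierror, so a real run would need to catch that.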

You can eliminate the delay by only passing in IP addresses to connect (it'll
notice that they are IP addresses rather than hostnames, and skip the DNS
lookup). The problem of course is that you then need to resolve the hostnames to
IP addresses yourself. Maintaining a cache of resolved hostnames is a quick hack
to reduce the number of lookups, but it doesn't eliminate them. The only
alternative is to talk to the DNS server yourself - using asyncore of course so
that other connections don't block. IIRC there is some Python code for
creating/unpacking DNS packets and at one time it was even included in the
Python install (like in the Demo folder or something).
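
To give an idea of what "talking to the DNS server yourself" involves, here is a minimal sketch of building an A-record query and pulling the addresses and TTLs out of a reply, following the RFC 1035 message layout. It is not the Demo code mentioned above; build_query and parse_response are hypothetical names, and it ignores truncated replies, CNAME chains, retries and timeouts.

```python
import random
import socket
import struct

QTYPE_A = 1    # host address record
QCLASS_IN = 1  # Internet class


def build_query(hostname, query_id=None):
    """Return (query_id, packet) for a recursive A-record query."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: id, flags (0x0100 = recursion desired), 1 question, no records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length; a zero byte ends it.
    labels = hostname.rstrip(".").encode("ascii").split(b".")
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    return query_id, header + qname + struct.pack("!HH", QTYPE_A, QCLASS_IN)


def _skip_name(packet, pos):
    """Return the offset just past the (possibly compressed) name at pos."""
    while True:
        length = packet[pos]
        if length & 0xC0 == 0xC0:  # a two-byte compression pointer ends the name
            return pos + 2
        if length == 0:            # the root label ends the name
            return pos + 1
        pos += 1 + length


def parse_response(packet):
    """Return (query_id, [(ip, ttl), ...]) for the A records in a reply."""
    query_id, flags, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", packet[:12])
    if flags & 0x000F:  # non-zero response code, e.g. 3 = no such name
        return query_id, []
    pos = 12
    for _ in range(qdcount):
        pos = _skip_name(packet, pos) + 4  # also skip QTYPE and QCLASS
    answers = []
    for _ in range(ancount):
        pos = _skip_name(packet, pos)
        rtype, rclass, ttl, rdlength = struct.unpack("!HHIH", packet[pos:pos + 10])
        pos += 10
        if rtype == QTYPE_A and rclass == QCLASS_IN and rdlength == 4:
            answers.append((socket.inet_ntoa(packet[pos:pos + 4]), ttl))
        pos += rdlength
    return query_id, answers
```

The packet would go out over a non-blocking UDP socket to port 53 of the DNS server, with that socket registered in the same event loop as the HTTP connections. The query id is what lets you match a reply to the hostname you asked about.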

If you can find a third-party asynchronous DNS lookup library then that might
be the way to go - the above approach can get really messy (lots of details to
manage), but it also works and completely solves the problem, so basically you
have to decide how badly this problem hurts you. If you do go this route,
here are a few hints:

- on Windows you can semi-reliably detect the DNS servers by parsing the output
of 'ipconfig /all' and on Linux you can usually parse /etc/resolv.conf.

- you might also want to parse and honor the values in the /etc/hosts file
(%SystemRoot%\system32\drivers\etc\hosts on Windows)

- you can of course skip the lookup of the hostname 'localhost'

- it might be helpful to cache both the queries that succeed and the ones that
fail, depending on your application, as failed lookups can be really slow.

- I use a simple class to track cached entries:

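import time

class AgingMap:
    """Maps key -> value, where each entry may expire after a time-to-live."""

    def __init__(self):
        self.dict = {}  # key -> (expiration time or None, value)

    def Get(self, key):
        """Return the cached value, or None if it is missing or expired."""
        try:
            expTime, val = self.dict[key]
            if expTime is not None and time.time() >= expTime:
                val = None
        except KeyError:
            # Not found
            val = None
        return val

    def Set(self, key, value, ttlSec):
        """Store value for ttlSec seconds; a ttlSec of None never expires."""
        expires = ttlSec
        if ttlSec is not None:
            expires = ttlSec + time.time()
        self.dict[key] = (expires, value)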

This of course grows without bounds but most of the time I don't really care.
You could add a cleanup() method or something that gets called every once in
a while from your main event loop. A ttlSec value of None indicates a
non-expiring entry, e.g. due to an entry from the hosts file.

- IIRC you actually get a time-to-live (TTL) value for each IP returned by the
DNS, but to simplify things I usually store them all using the minimum TTL
value from the whole set (makes the cache simpler).
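
Several of the hints above can be sketched together: reading nameservers from resolv.conf-style text, loading hosts-file entries as non-expiring cache entries, pruning expired entries, and storing a whole answer set under its minimum TTL. Treat this as a rough sketch under assumptions: parse_nameservers, parse_hosts, Cleanup and cache_answers are hypothetical names, the class is a compact restatement of the one above, and the parsers take the file contents as a string so they make no claim about where the files live.

```python
import time


def parse_nameservers(resolv_conf_text):
    """Return the addresses on 'nameserver' lines of resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split(";", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def parse_hosts(hosts_text):
    """Return {lowercased name: ip} from hosts-file-style text."""
    table = {}
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        for name in fields[1:]:
            table.setdefault(name.lower(), fields[0])  # first entry wins
    return table


class AgingMap:
    def __init__(self):
        self.dict = {}  # key -> (expiration time or None, value)

    def Get(self, key):
        expTime, val = self.dict.get(key, (None, None))
        if expTime is not None and time.time() >= expTime:
            return None
        return val

    def Set(self, key, value, ttlSec):
        expires = None if ttlSec is None else ttlSec + time.time()
        self.dict[key] = (expires, value)

    def Cleanup(self, now=None):
        """Delete expired entries and return how many were removed."""
        if now is None:
            now = time.time()
        expired = [key for key, (expTime, _) in self.dict.items()
                   if expTime is not None and now >= expTime]
        for key in expired:
            del self.dict[key]
        return len(expired)


def cache_answers(cache, hostname, answers):
    """Store every IP in answers [(ip, ttl), ...] under the smallest TTL."""
    if answers:
        ips = [ip for ip, _ in answers]
        cache.Set(hostname, ips, min(ttl for _, ttl in answers))
```

Hosts-file entries would be loaded once at startup with cache.Set(name, [ip], None), and Cleanup() called every so often from the main loop. Using the minimum TTL means some addresses get looked up again a little earlier than strictly necessary, which is the price of keeping one expiry time per hostname.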

HTH,
-Dave


