Re: What is best searching algorithm for URL



sandeep wrote:
Our team is developing proxy server(in VC++)which can handle 5000
clients. I have to implement cache part so when ever a new request com
from client I have to check the request URL content is in cache of
proxy and send to client if it is cache, if it is not there then it
have to get data from web server and store in proxy server cache.

so i am thinking to use binary tree search(or AVL tree) to search
request URL content in cache if it is not there to insert in it
is it a good idea so that insertion and searching is faster

I also used hash table and key I has chosen according to first
character in URL

I would think you'd get a lot of hash collisions that way which
cosidering string compares start with the first letter and could be
expensive.


So now in that bucket it contain double linked list now I have search
in it, for that I am thinking to use binary tree


Well, unless you have unlimited storage, you might want to keep that
around to determine least recently used (LRU) URL's to purge when
storage becomes constrained. Also you'll need an ordered list for
URL's expiring from cache.

You also need to worry about locking if you're going to be multi-threaded.
You'll need a global mutex or rwlock which could have contention problems.
A lot of implementations don't bother and trade off stability and integrity
for speed.

You could use lock-free data structures to get around that problem since
lock-free is a natural fit for this kind of problem. It's an advanced
topic and if you could do that, you wouldn't have posted here in the first
place. Linked lists and hash tables are fairly easy to make lock-free.
Balanced trees can be made lock-free but I haven't worked out the balancing
heuristics to determine how efficient it is. Unmodified AVL and red/black
trees algorithms aren't usable for lock-free.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software. .



Relevant Pages

  • [PATCH 12/12] FS-Cache: CacheFS: Add Documentation
    ... The attached patch adds documentation for CacheFS. ... +CacheFS is a backend for the general filesystem cache facility. ... +CacheFS is based on a wandering tree approach. ...
    (Linux-Kernel)
  • Re: If not readdir() then what?
    ... You are storing an internal tree representation of part of the ... reorganized when the directory's hash tree gets reorg'ed. ... a very mature cache seems to make a lot of sense. ... collision chain". ...
    (Linux-Kernel)
  • Re: [QUICKLIST 0/4] Arch independent quicklists V2
    ... layer of the tree into cache to find all the pages in the address ... every time you set a PTE, set the corresponding bit in the mask ... Thus, you avoid visiting most of a PMD page in the sparse case, ... should be in cache with the appropriate bits of the mm already locked, ...
    (Linux-Kernel)
  • [Fwd: Re: + radixtree-look-aside-cache.patch added to -mm tree]
    ... + radixtree-look-aside-cache.patch added to -mm tree ... Subject: radixtree: look-aside cache ... Introduce a set of lookup functions to radix tree for the read-ahead logic. ...
    (Linux-Kernel)
  • Re: What is best searching algorithm for URL
    ... I have to implement cache part so when ever a new request com ... proxy and send to client if it is cache, if it is not there then it ... request URL content in cache if it is not there to insert in it ... Just use a hash table with the full url as the key. ...
    (comp.programming)