Finding approximate list of nearest neighbors for high-dimensional vectors
From: Jim Meyer (jimmeyer_at_email.dk)
Date: 07/29/04
- Next message: Robert C. Martin: "Re: Rework [Was: Static vs. Dynamic typing...]"
- Previous message: marco: "Re: directly uploading bookmarks from browser"
- Next in thread: rossum: "Re: Finding approximate list of nearest neighbors for high-dimensional vectors"
- Reply: rossum: "Re: Finding approximate list of nearest neighbors for high-dimensional vectors"
- Reply: Dr Chaos: "Re: Finding approximate list of nearest neighbors for high-dimensional vectors"
- Reply: Paul E. Black: "Re: Finding approximate list of nearest neighbors for high-dimensional vectors"
- Reply: Thad Smith: "Re: Finding approximate list of nearest neighbors for high-dimensional vectors"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Thu, 29 Jul 2004 20:04:23 +0200
Hi, I guess this one is for those with a firm knowledge of data structures.
Basically the problem is that of finding an approximate list of nearest
neighbors for high-dimensional vectors (1.000 plus dimensions, 10.000 plus
vectors).
A simple brute-force approach takes O(d * n^2) for n vectors with d
dimensions, in order to determine the k nearest neighbors for each vector.
Instead, I'm looking for a smarter and faster approach, where I don't have
to examine each unique pair of vectors, in order to get an approximate list
of nearest neighbors. The list does not have to be the exact nearest
neighbors, but should of course contains those that are good candidates.
Currently, each vector is stored in its own hash table, which has an entry
for every non-zero dimension. Also, I have hash table with an inverted
index, which allows me to lookup the vectors that have a non-zero value for
a certain dimension. This information could be used to create a large
matrix, but since the vectors seldom have values for even 1/10 the
dimensions, the matrix is very sparse and has a lot of zero-values.
I know of spatial access trees, but have been unable to find any that can
handle this many dimensions effectively.
Does anyone know any smart algorithms, data structures or heuristics that
can be used?
I would appreciate any help I can get.
Regards, Jim.
- Next message: Robert C. Martin: "Re: Rework [Was: Static vs. Dynamic typing...]"
- Previous message: marco: "Re: directly uploading bookmarks from browser"
- Next in thread: rossum: "Re: Finding approximate list of nearest neighbors for high-dimensional vectors"
- Reply: rossum: "Re: Finding approximate list of nearest neighbors for high-dimensional vectors"
- Reply: Dr Chaos: "Re: Finding approximate list of nearest neighbors for high-dimensional vectors"
- Reply: Paul E. Black: "Re: Finding approximate list of nearest neighbors for high-dimensional vectors"
- Reply: Thad Smith: "Re: Finding approximate list of nearest neighbors for high-dimensional vectors"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|