Iterating & Comparing String Data Sets Efficiently?
- From: "dpapathanasiou" <denis.papathanasiou@xxxxxxxxx>
- Date: 30 Nov 2006 17:30:39 -0800
I have data consisting of lists of strings, which I need to compare
against each other to find which strings any two lists have in common.
As an example, say I have this set, called my-data, composed of 5 lists
of strings:
my-data
(("fancy" "ourselves" "creative" "DIY" "types" "mention" "cheap" "furniture"
  "pile" "pillows" "roof" "tarp")
 ("shelling" "grade" "stuff" "folks" "dates" "breakfast" "sponsoring" "tasty"
  "consisting" "shots")
 ("photographers" "saving" "pennies" "roof" "burns" "knees" "starting"
  "awfully")
 ("painful" "Tiny" "Dispatch" "Couches" "Valley" "Italian" "Chair"
  "Pillows")
 ("Ladies" "breakfast" "posh" "lunches" "breed" "female" "charity"
  "stuff"))
What I want to do is compare the first set -- (nth 0 my-data) -- versus
each of the other four sets and find out which string tokens they have
in common.
In this example, the answer for the first set -- (nth 0 my-data) -- is:
(NIL         ; ignore itself
 NIL
 ("roof")    ; "roof" is common to both (nth 0) and (nth 2)
 ("Pillows") ; "pillows" is common to both (nth 0) and (nth 3)
 NIL)
And so on for (nth 1 my-data), (nth 2 my-data), etc.
I've been using the (intersection) function wrapped inside a mapcar
iterator for this -- the real data, though different in content from
the example above, is structured exactly the same way: a list of lists
of strings.
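For reference, a minimal sketch of what that approach might look like.
This is not the original code: the name common-tokens is hypothetical,
and the :test #'string-equal argument assumes that comparison should
ignore case, as the "Pillows" match in the example suggests.

```lisp
;; Hypothetical helper: compare the list at INDEX against every list in
;; DATA, using INTERSECTION inside MAPCAR.
(defun common-tokens (index data)
  (let ((base (nth index data)))
    (mapcar (lambda (other)
              (if (eq other base)
                  nil                 ; ignore itself
                  ;; STRING-EQUAL compares strings case-insensitively.
                  (intersection base other :test #'string-equal)))
            data)))
```

Called as (common-tokens 0 my-data). On plain lists, INTERSECTION is
typically implemented as a nested scan, so each pairwise comparison costs
roughly the product of the two list lengths, which would account for the
slowness on large data.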
Although it's logically correct, it's slow (the real data is quite
large), and so I've been thinking about ways of improving its
efficiency.
For example, instead of lists of strings, I could use hash tables, i.e.
iterate over each table and look for matching keys (strings), since
hash lookups are fast.
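A rough sketch of the hash table idea, under stated assumptions: the
function names tokens->table and common-tokens-hashed are hypothetical,
and an EQUALP table is used so that keys match regardless of case.

```lisp
;; Build a hash table whose keys are the strings of one list.
;; An EQUALP table treats "Pillows" and "pillows" as the same key.
(defun tokens->table (tokens)
  (let ((table (make-hash-table :test #'equalp)))
    (dolist (token tokens table)
      (setf (gethash token table) t))))

;; Compare the list at INDEX against every list in DATA. Each token of
;; the other list costs one hash lookup instead of a scan of the base list.
(defun common-tokens-hashed (index data)
  (let ((table (tokens->table (nth index data))))
    (loop for other in data
          for i from 0
          collect (unless (= i index)   ; ignore itself
                    (remove-if-not (lambda (token) (gethash token table))
                                   other)))))
```

With the example data, (common-tokens-hashed 0 my-data) returns
(NIL NIL ("roof") ("Pillows") NIL). The matches are taken from the other
list, so a token repeated there would appear more than once. When every
list is compared against every other, building each table once and
reusing it avoids repeated setup.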
Also, I was wondering about more effective ways of storing large
amounts of string data in memory in the first place, e.g. perhaps as
lower-level arrays of chars or bits, but I'm not quite sure how to go
about that efficiently.
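One possible reading of the "arrays of bits" idea, sketched under
assumptions: give each distinct token a small integer id, store each
list as a bit vector indexed by id, and intersect with BIT-AND. The
names build-vocabulary, tokens->bits and shared-count are hypothetical,
and case is again ignored via an EQUALP table.

```lisp
;; Map each distinct token in DATA to an integer id (0, 1, 2, ...).
(defun build-vocabulary (data)
  (let ((ids (make-hash-table :test #'equalp))
        (next-id 0))
    (dolist (tokens data ids)
      (dolist (token tokens)
        (unless (gethash token ids)     ; id 0 is still true in Lisp
          (setf (gethash token ids) next-id)
          (incf next-id))))))

;; Represent one list as a bit vector with a 1 at each token's id.
(defun tokens->bits (tokens ids)
  (let ((bits (make-array (hash-table-count ids)
                          :element-type 'bit :initial-element 0)))
    (dolist (token tokens bits)
      (setf (sbit bits (gethash token ids)) 1))))

;; Number of tokens two lists share: AND the vectors, count the 1 bits.
(defun shared-count (bits-a bits-b)
  (count 1 (bit-and bits-a bits-b)))
```

Each list then costs one bit per vocabulary entry, so this is likely to
pay off only when the vocabulary is not much larger than the lists
themselves. Recovering the shared strings, rather than just counting
them, would also need a reverse table from id to string.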
Any suggestions?