Re: How to check for duplicates?
From: aniesen (axeln_at_interform.cc)
Date: 11/16/03
- Next message: Charles Urbina: "Re: Delphi VCL Internet Fax Server"
- Previous message: Denis Hancock: "Tape Drives"
- In reply to: Zoran: "Re: How to check for duplicates?"
- Next in thread: Zoran: "Re: How to check for duplicates?"
- Reply: Zoran: "Re: How to check for duplicates?"
Date: Sat, 15 Nov 2003 17:52:29 -0800
Zoran:
I don't want to spoil your optimism. Your problem still remains that some of
the fields, address in particular, may be spelled differently even though
they are one and the same. For example, '240 S. Brentwood St.' is the same as
'240 South Brentwood' or ' 240 S. Brentwood Street' and so on. These are all
valid postal addresses. So, I guess your success heavily depends on how
uniformly your data is formatted.
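
To illustrate the point, here is a minimal Python sketch (not Delphi, and not
anything NexusDB provides). It assumes the key is built by lowercasing and
keeping only a-z and 0-9, as described in the quoted message. The
ABBREVIATIONS and DROPPED tables and the canonical_key function are
hypothetical names made up for this example; a real table would have to be
far larger and would still miss cases.

```python
import re

# Hypothetical abbreviation table, for illustration only.
# Note that "st" can also mean "saint", which this sketch ignores.
ABBREVIATIONS = {"s": "south", "n": "north", "e": "east", "w": "west",
                 "st": "street", "ave": "avenue", "rd": "road"}
# Hypothetical list of suffix words dropped so that a missing suffix still matches.
DROPPED = {"street", "avenue", "road"}

def simple_key(address):
    """Lowercase and keep only a-z and 0-9."""
    return re.sub(r"[^a-z0-9]", "", address.lower())

def canonical_key(address):
    """Expand known abbreviations and drop street suffixes before joining."""
    words = re.findall(r"[a-z0-9]+", address.lower())
    expanded = [ABBREVIATIONS.get(w, w) for w in words]
    return "".join(w for w in expanded if w not in DROPPED)

variants = ["240 S. Brentwood St.", "240 South Brentwood", " 240 S. Brentwood Street"]

# Stripping punctuation alone leaves three different keys.
simple_keys = {simple_key(v) for v in variants}
# With the abbreviation table, all three variants collapse to one key.
canonical_keys = {canonical_key(v) for v in variants}
```

With simple_key the three spellings stay distinct, so hashing them would give
three different hashes. Only the extra canonicalization step brings them
together, and only for the abbreviations the table happens to know about.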
--
Axel Niesen axeln@interform.cc
interFORM Consulting Corp. http://www.interform.cc
(866) 503-6005

Don't always say what you know,
but always know what you say!
Matthias Claudius
1740-1815

<Zoran> wrote in message news:3fb63c1f@newsgroups.borland.com...
> Hi Axel;
>
> Unfortunately NexusDB does not support stored procedures yet. I think I've
> solved the problem.
>
> This is what I did (if you are interested): on the existing table I
> concatenated strings for name, street and zip. Then I lowercased it, and
> took out all characters except a-z and 0-9. Then I made a hash string (32
> bytes long) using MD5 hash (what Ignacio recommended). I created an
> additional key on that field. On a 3 million record table I am not sure if I
> have some duplicates or not. I don't know much about hash algorithms, but it
> looks like unique strings to me. When inserting I create the same key out of
> input fields and check against the table. If the key exists, then I go into
> a loop and check input fields (name, address, zip) against the same fields in
> the existing table.
>
> No big deal, but it looks like it works for me.
>
> If the hash string is guaranteed to be unique, then this looping makes no
> sense. I have to learn more about hash procedures. Do you know some web site
> where I can find some information about hashing?
>
> Thanks for your time.
>
> Zoran.
>
> "aniesen" <axeln@interform.cc> wrote in message
> news:3fb56841$1@newsgroups.borland.com...
> > Zoran:
> >
> > I don't know anything about NexusDB but if it supports stored procedures you
> > can speed up the verification process immensely. Just use a nested loop for
> > comparison and leave the nested loop if there is no match, returning FALSE.
> > Return TRUE if it finishes the loop.
> >
> > --
> > Axel Niesen axeln@interform.cc
> > interFORM Consulting Corp. http://www.interform.cc
> > (866) 503-6005
> >
> > Don't always say what you know,
> > but always know what you say!
> > Matthias Claudius
> > 1740-1815
> >
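
The insert procedure Zoran describes in the quoted message could look roughly
like the Python sketch below. It is an in-memory stand-in, not NexusDB code:
the Table class, the clean and dedupe_key functions, and the sample records
are all made up for illustration. A hash is not guaranteed to be unique,
since many different inputs map onto the same fixed-length value, so the
sketch keeps the field-by-field check after a key match.

```python
import hashlib
import re

def clean(text):
    """Lowercase and keep only a-z and 0-9."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def dedupe_key(name, street, zip_code):
    """MD5 of the cleaned concatenation, as a 32-character hex string."""
    cleaned = clean(name + street + zip_code)
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()

class Table:
    """Hypothetical stand-in for a table with an extra index on the hash field."""

    def __init__(self):
        self.by_key = {}  # hash key -> list of (name, street, zip) tuples

    def insert(self, name, street, zip_code):
        """Insert the record unless an equal one exists; return True if inserted."""
        key = dedupe_key(name, street, zip_code)
        fields = (clean(name), clean(street), clean(zip_code))
        # A matching key is only a candidate: compare the fields themselves,
        # because different records can share a key.
        for existing in self.by_key.get(key, []):
            if existing == fields:
                return False
        self.by_key.setdefault(key, []).append(fields)
        return True

table = Table()
first = table.insert("John Smith", "240 S. Brentwood St.", "63105")
# Same record with different case and punctuation: rejected as a duplicate.
second = table.insert("JOHN SMITH", "240 S Brentwood St", "63105")
# Same address spelled differently: the key differs, so it is NOT caught.
third = table.insert("John Smith", "240 South Brentwood", "63105")
```

The third insert succeeds even though it is the same address, which is the
limitation the reply above points out. The field comparison also separates
records whose fields concatenate to the same string, for example a name of
"ab" with street "c" versus a name of "a" with street "bc".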