Re: Comparing diacritic chars
- From: "Maarten Wiltink" <maarten@xxxxxxxxxxxxxxxxxx>
- Date: Mon, 5 Nov 2007 00:37:41 +0100
"Tom de Neef" <tdeneef@xxxxxxxx> wrote in message
news:472de835$0$237$e4fe514c@xxxxxxxxxxxxxxxxx
> I have to compare names (in family databases). Spelling is often not
> uniform, so I use an algorithm to bring a name back to a basic form.
> Eg: in double equal consonants the second one will be dropped; F and
> V are the same; all vowels are treated as one and the same.
> This last rule causes me some difficulties. (for example: recognize
> that Étienne starts with a variation of E.)
> Is there an easy way to recognize all vowels with diacritics (like
> èéêë, ÒÓÔÕÖ, etc) or do I have to test for them explicitly?
That sounds like a Unicode question. The description for U+00C0 ('À')
includes the fact that it's very similar to U+0041 ('A') plus U+0300
(combining '`'), but short of OCR'ing PDFs, I know of no way to
extract that information in any automated manner. However, some
richer database may have it.
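For what it is worth, the decomposition printed on the code charts is also
published in machine-readable form in the Unicode Character Database, and
some language runtimes expose it. A minimal sketch in Python (chosen purely
for illustration, not the language of this thread), assuming the standard
unicodedata module; the helper name base_letter is made up here:

```python
import unicodedata

def base_letter(ch: str) -> str:
    """Return ch with any combining marks stripped (hypothetical helper)."""
    # NFD splits a precomposed letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", ch)
    # combining() is non-zero for combining marks such as U+0300.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Fall back to the input if nothing but marks was left.
    return stripped or ch

# U+00C0 decomposes to U+0041 followed by U+0300.
print(unicodedata.decomposition("\u00c0"))
print(base_letter("\u00c9"), base_letter("\u00d6"))
```

Letters without a canonical decomposition (for example U+00F8, 'ø') pass
through unchanged, so a few explicit special cases may still be needed.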
Still, considering the limited numbers of both vowels and diacritics,
perhaps simply printing a few code charts and hardcoding the lot is
best. While U+FE8F could be transliterated 'b', I doubt you want to
go that far. (Have I mentioned lately that my favourite code point is
U+FDFB? Transliterate *that*, as a single letter.)
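The hardcoding route, combined with the rules from the original question,
could look roughly like the sketch below. The table contents, the choice of
'A' as the single stand-in vowel, the mapping of V to F, and the function
name basic_form are all assumptions for illustration; the thread specifies
none of them.

```python
# Hand-built table: accented vowel -> plain vowel (illustrative, not exhaustive).
ACCENTED = {}
for plain, variants in {
    "A": "ÀÁÂÃÄÅ",
    "E": "ÈÉÊË",
    "I": "ÌÍÎÏ",
    "O": "ÒÓÔÕÖ",
    "U": "ÙÚÛÜ",
}.items():
    for v in variants:
        ACCENTED[v] = plain

VOWELS = set("AEIOU")

def basic_form(name: str) -> str:
    """Reduce a name to a crude comparison key (hypothetical helper)."""
    out = []
    for ch in name.upper():
        ch = ACCENTED.get(ch, ch)      # strip the diacritic via the table
        if ch in VOWELS:
            ch = "A"                   # all vowels treated as one (assumed stand-in)
        elif ch == "V":
            ch = "F"                   # F and V are the same
        # Drop the second of two equal consonants; vowels are left alone.
        if out and out[-1] == ch and ch != "A" and ch.isalpha():
            continue
        out.append(ch)
    return "".join(out)

print(basic_form("Étienne") == basic_form("Etiene"))
```

The table only needs one entry per accented vowel that actually occurs in
the data, so it can start small and grow as new spellings turn up.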
Groetjes,
Maarten Wiltink