Re: Converting codepages to UTF8
- From: "Dr.Ruud" <rvtol+news@xxxxxxxxxxxx>
- Date: Thu, 30 Mar 2006 21:16:03 +0200
P wrote:
> I have one file, which is in UTF8, which contains a set of strings.
> I want to determine whether any of the strings matches any file name
> in a specified directory.
> Since there can be special characters in the file names (and in the
> strings in the UTF8 file), sometimes I'll get false negatives,
> because a simple eq on the strings in the UTF8 file and on the file
> names in the directory won't match (due to the different encodings).
> So I want to normalise the directory listing first (and this should
> be dependent on the code page, because different users might be
> using different code pages) and compare the resulting list to the
> list in the UTF8 file. Does that make sense? :)
Yes, that is much clearer. I'll assume that you have Windows and maybe
Cygwin.
Have you read perllocale, perluniintro, perlunicode, perlebcdic?
Use the command:
for /f "tokens=4" %w in ('chcp') do dir >text.%w
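to create a file called "text.437" (if your chcp is 437)
with the dir output for the current directory.
(Inside a batch file, write %%w instead of %w. The token position
assumes English chcp output such as "Active code page: 437".)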
Under Cygwin, you can use the command:
iconv -f CP437 -t UTF-8 text.437 > text.utf8
to convert the file from cp437 to utf8.
But that second step can also be done with Perl.
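A minimal sketch of that Perl route, using the core Encode module. The
helper names decode_codepage and matching_names are made up for
illustration, and the sketch assumes the file names arrive as raw
code-page bytes and the strings as raw UTF-8 bytes. Both sides are
decoded to Perl character strings before comparing, so a plain eq (here
a hash lookup) works:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode code-page bytes (e.g. taken from "dir"
# output) into a Perl character string. $codepage is the number that
# chcp reports, e.g. 437.
sub decode_codepage {
    my ($bytes, $codepage) = @_;
    return decode("cp$codepage", $bytes);
}

# Hypothetical helper: return those entries of @$names_bytes (code-page
# bytes) whose decoded form equals one of the strings in @$wanted_utf8
# (UTF-8 bytes).
sub matching_names {
    my ($names_bytes, $codepage, $wanted_utf8) = @_;
    # Decode the UTF-8 strings once and use them as hash keys.
    my %wanted = map { ( decode('UTF-8', $_) => 1 ) } @$wanted_utf8;
    return grep { $wanted{ decode_codepage($_, $codepage) } } @$names_bytes;
}
```

For example, matching_names(\@dir_names, 437, \@strings) would return
the directory entries that also occur in the UTF-8 list. This does not
address Unicode normalisation (composed versus decomposed characters);
it only removes the code-page mismatch.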
(Almost) platform-independent way to see all available encodings:
perl -MEncode -e "print join $/, Encode->encodings(':all')" |more
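Inside a script, a rough way to check whether the code page reported by
chcp is known to your Perl is Encode's find_encoding. The helper name
encoding_for_codepage is hypothetical, and the "cp" prefix is an
assumption that holds for code pages such as 437, 850 and 1252:

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: return the canonical Encode name for a Windows
# code page number, or undef if this Perl has no such encoding.
sub encoding_for_codepage {
    my ($codepage) = @_;
    my $enc = find_encoding("cp$codepage");
    return $enc ? $enc->name : undef;
}
```

If it returns undef, the script can fall back to asking the user for an
encoding name instead of guessing.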
Now it is your turn to create some code and try to make it work.
--
Affijn, Ruud
"Gewoon is een tijger."