Re: Reading a Unicode text file



On 2/16/06, Baskaran Sankaran <baskaran@xxxxxxxxxxxxx> wrote:

> Here is the new code:
>
> open(F1, "<:utf8", $file1) || die("Can not find file $file1\n");
> open(F2, "<:utf8", $file2) || die("Can not find file $file2\n");
> open(O, ">:utf8", $ARGV[2]);

Why no die on the third open? I believe that you can use $!, even on
Win32, to get some clue as to why the open failed. (But it didn't
fail, since you've got some output.)
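
As a minimal sketch of that check, assuming a lexical filehandle and a helper named open_utf8_out (both are illustrative and not part of the original script), the third open could report its own failure like this:

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original code): open a file for
# UTF-8 output, or die with the operating system's reason in $!.
sub open_utf8_out {
    my ($path) = @_;
    open(my $fh, '>:utf8', $path)
        or die "Cannot open '$path' for writing: $!\n";
    return $fh;
}
```

It could be called as open_utf8_out($ARGV[2]). The same "or die ... $!" pattern applies to the two input files, where $! says more than "Can not find file" when the real problem is, say, a permissions error.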

> binmode(O, ":utf8");

This probably shouldn't happen each time through the loop; binmode()
should generally be called soon after open(), before any actual I/O is
done. Does it fix anything to call binmode() just once?
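
A sketch of that ordering follows. The loop in the original script is not quoted here, so the loop body, the sub name merge_lines, and the choice to write each pair of lines one after the other are all assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical sketch: the loop body is assumed, not taken from the
# original script. Writes line N of file 1, then line N of file 2.
sub merge_lines {
    my ($file1, $file2, $outfile) = @_;

    open(my $in1, '<', $file1)   or die "Cannot open '$file1': $!\n";
    open(my $in2, '<', $file2)   or die "Cannot open '$file2': $!\n";
    open(my $out, '>', $outfile) or die "Cannot open '$outfile': $!\n";

    # Set the layer once per handle, right after open() and before
    # any reading or writing is done.
    binmode($in1, ':utf8');
    binmode($in2, ':utf8');
    binmode($out, ':utf8');

    while (defined(my $line1 = <$in1>)) {
        my $line2 = <$in2>;
        last unless defined $line2;    # stop if file 2 is shorter
        print {$out} $line1, $line2;   # no binmode() inside the loop
    }

    close($in1);
    close($in2);
    close($out) or die "Cannot close '$outfile': $!\n";
    return;
}
```

When the layer is already given in the open() mode, as in ">:utf8", the extra binmode() call is redundant and can simply be dropped.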

> And, here I am trying to read two txt files (having equal no. of lines)
> and print the contents in a single file given in the command prompt. The
> files are in Unicode. I am using ActivePerl v5.8.7 in windows.

Even though the files are "Unicode", it's possible that one or both
contain non-printing control characters (or something even more
esoteric) that are causing trouble. If nothing else works, you may need
to examine the file contents (maybe with the help of a Unicode table)
to find out what's going on.
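
One way to do that examination is to list the code points of anything that looks out of place. The sketch below is only a starting point: the sub name suspicious_chars is made up, and the rule it applies (flag Unicode control and format characters other than tab, newline, and carriage return) is an assumption about what counts as suspicious. Some format characters, such as zero-width joiners, are legitimate in certain scripts.

```perl
use strict;
use warnings;

# Hypothetical helper: given an already-decoded string, report each
# control (Cc) or format (Cf) character, skipping tab, LF, and CR.
sub suspicious_chars {
    my ($text) = @_;
    my @found;
    while ($text =~ /([\p{Cc}\p{Cf}])/g) {
        my $char = $1;
        next if $char =~ /[\t\n\r]/;    # ordinary whitespace is fine
        push @found,
            sprintf("U+%04X at offset %d", ord($char), pos($text) - 1);
    }
    return @found;
}
```

Called on each line read through a "<:utf8" handle, it returns strings such as "U+FEFF at offset 0" (a byte-order mark at the start of the line), which can then be looked up in a Unicode table.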

This is where it would help to have some files to work with, because I
can't reproduce a bug on my machine; which is to say: Your code works
for me (with different hardware, different OS, different input
files...). If you can't share the actual file contents (proprietary?),
perhaps you could share some mock files that exhibit the bug.
Cut-and-paste some small pieces from the real files, redact or alter
as needed, and test them to be sure they still show the bug.

Your code is pretty clean. Other than the things I've mentioned, and
things that perl should warn you about, I don't see any reason it
should be giving you trouble. Have you tried using the debugger? It
shouldn't take long to single-step until you see where something goes
wrong.

Good luck with it!

--Tom Phoenix
Stonehenge Perl Training