Reading UTF-8 string from file with read() function.
From: Sergei (sergeisn-tma_at_yahoo.com)
Date: 08/31/04
Date: 31 Aug 2004 09:38:54 -0700
Hi,
I need to read a string from a UTF-8 encoded text file.
I know at which byte position the string starts and its length (also
in byte units).
The problem is that the read(FILEHANDLE, SCALAR, LENGTH) function takes
LENGTH in character units, not in bytes, when the file is opened as UTF-8.
I've tried to open the file in binary mode instead of UTF-8, so I can
read the correct length, but then I can't process the string with
regular expressions correctly as Perl thinks it's in binary encoding,
not UTF-8.
Also, I've tried to read the string using the getc() function, but it is
unacceptably slow.
Is there any solution?
Thanks a lot,
--Sergei
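
One possible approach to the binary-mode attempt described above, as a
minimal sketch: read the bytes from a raw handle, then decode them into a
character string before applying any regular expressions. The helper name
read_utf8_span and its arguments are hypothetical and do not come from the
original post. It assumes the byte offset and length fall on character
boundaries and that the Encode module (bundled with Perl 5.8 and later) is
available.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: read $len bytes starting at byte $offset in $path
# and return them decoded as a Perl character string.
sub read_utf8_span {
    my ($path, $offset, $len) = @_;

    # :raw keeps the handle byte-oriented, so seek() and read() count bytes.
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    seek($fh, $offset, 0) or die "Cannot seek in $path: $!";

    my $got = read($fh, my $bytes, $len);
    die "Cannot read from $path: $!" unless defined $got;
    die "Short read: wanted $len bytes, got $got" unless $got == $len;
    close($fh);

    # Decode only after reading. FB_CROAK makes malformed UTF-8 fatal
    # (for example, a span that cuts a multi-byte character in half).
    return decode('UTF-8', $bytes, Encode::FB_CROAK);
}
```

The returned value is a character string, so length() and regular
expressions should then operate on characters rather than bytes, e.g.
my $s = read_utf8_span($file, $start, $nbytes); where $file, $start and
$nbytes are placeholders for the known path, byte position and byte length.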