Correct use of Unicode in RegExp
From: mike blamires (mike_at_mysurname.co.uk)
Date: 04/22/04
Date: Thu, 22 Apr 2004 22:36:44 +0100
I am having great difficulty using Unicode characters in a regular
expression. I am trying to match extended Unicode characters.
I want to split a large dump file (containing only JPEGs). I have used
a hex editor to manually extract one file, just to show it can be done, so I
know the input is intact.
Each JPEG starts with the Unicode characters \u00FF \u00D8 \u00FF \u00E1,
and there are plenty of these to be found within the file.
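open(DUMPFILE, "/pathtodumpfile") or die "Cannot open dump file: $!";
my $line;
# Read the whole dump into one string, record by record.
while (<DUMPFILE>) {
    $line .= $_;
}
close(DUMPFILE);
# Split wherever the four-character JPEG start sequence occurs.
my @files = split(/\x{00FF}\x{00D8}\x{00FF}\x{00E1}/, $line);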
(As you may see from the above style, I am relatively inexperienced with the
Perl side of programming ;)
I have tried inserting the Unicode characters in various ways: \xFF, \x{FF},
etc. It just doesn't seem to find the pattern. I am at a bit of a loss as
to whether it is my regexp that is wrong, my use of Unicode characters,
or my use of extended Unicode characters.
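One possible explanation, offered as a sketch rather than a confirmed
diagnosis: FF D8 FF E1 are raw bytes of the JPEG header rather than Unicode
characters, and the filehandle above is read without binmode, so text-mode
translation or an encoding layer may alter or truncate the data before the
regexp ever sees it. The example below assumes that is the cause. The helper
names slurp_raw and split_jpegs are hypothetical, chosen only for
illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file as raw bytes.
sub slurp_raw {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);    # raw bytes: no CRLF translation, no encoding layer
    local $/;        # undefined record separator: read everything at once
    my $data = <$fh>;
    close($fh);
    return $data;
}

# Hypothetical helper: cut a byte string into chunks that each begin
# with the marker bytes FF D8 FF E1.
sub split_jpegs {
    my ($data) = @_;
    # The zero-width lookahead splits *before* each marker, so the marker
    # stays at the front of its chunk instead of being thrown away.
    my @chunks = split /(?=\xFF\xD8\xFF\xE1)/, $data;
    # Discard any bytes that precede the first marker.
    return grep { /\A\xFF\xD8\xFF\xE1/ } @chunks;
}
```

Usage would be along the lines of
my @files = split_jpegs(slurp_raw("/pathtodumpfile"));
Splitting on the bare marker, as in the original code, removes the marker
from every piece, so each extracted file would lack its header. The lookahead
keeps it. This sketch also assumes the marker never occurs inside image data,
for example in an embedded thumbnail, which may not hold for every file.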
many thanks for your help.
cheers
Mike