stripping out ASCII chars using regexp?

From: Greg (djbitchpimp_at_snowboard.com)
Date: 03/18/04


Date: 17 Mar 2004 15:23:48 -0800

I am trying to get rid of the non-ASCII characters at the end of a string
that I download from a web page using LWP::Simple. The script downloads
the HTML from a web page and then uses HTML::TableExtract to extract
the information from specific tables on the page.

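This basically gives me a string like this (each \240 stands for one
byte with value 160, which displayed as a garbled character):

"\240\240\240,Temper Tantrum\240,Take Care Comb Your Hair\240,,,CD\240,23.56\240\240,\240,0 Days Ago\240,Scotla\240"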

which I then split into an array using:

my @line = split (',', $line) ;
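A minimal sketch of that split, assuming each garbled character in the sample line is a single byte with value 160 (\240, a non-breaking space). It rebuilds the line by hand instead of downloading it, then prints the ordinals of field 6 so the hidden bytes become visible:

```perl
use strict;
use warnings;

# Sample line rebuilt from the post; each \240 is assumed to be one
# byte with value 160 (a non-breaking space).
my $line = "\240\240\240,Temper Tantrum\240,Take Care Comb Your Hair\240,"
         . ",,CD\240,23.56\240\240,\240,0 Days Ago\240,Scotla\240";

# split takes a regex; /,/ is the idiomatic spelling of ','.
my @line = split /,/, $line;

# Print the ordinal of every character in field 6.
printf "field 6: %vd\n", $line[6];   # field 6: 50.51.46.53.54.160.160
```

Field 6 is the price, and both trailing bytes are still attached after the split.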

I then do a comparison on $line[6]:

if ($line[6] >= 75) {
    # ... do something
}

When I run this with -w, I get the following warning:

Argument "15.99M- M- " isn't numeric in numeric ge (>=) at
./parse_wants.pl line 49.
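Perl still uses the leading 15.99 as the value, but because the trailing bytes are not whitespace to its number parser, the field as a whole is not numeric, hence the warning. One way to catch such fields before comparing is looks_like_number from the core Scalar::Util module; price_at_least is a hypothetical helper, not part of the original script:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: returns 1 or 0 for a clean numeric field,
# and undef when the field is not a number at all.
sub price_at_least {
    my ($field, $min) = @_;
    return undef unless looks_like_number($field);   # e.g. "15.99\240\240"
    return $field >= $min ? 1 : 0;
}
```

Returning undef for dirty fields lets the caller tell "below the threshold" apart from "needs cleaning first".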

This is because some extended (non-ASCII) characters somehow ended up at
the end of the string. If I do:

my @chars = split ('', $line[6]) ;
foreach my $char (@chars) {
    print "$char ";
    print ord ($char) ;
    print "\n" ;
}
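A more compact way to get the same dump, assuming the field holds the bytes described in the post, is sprintf's %vd vector flag or unpack with the C* template:

```perl
use strict;
use warnings;

my $field = "15.99\240\240";   # assumed contents of $line[6]

# %vd formats the ordinal of every character, joined with dots.
my $ords = sprintf "%vd", $field;
print "$ords\n";   # 49.53.46.57.57.160.160

# unpack "C*" returns the same values as a list of byte values.
my @ords = unpack "C*", $field;
print "@ords\n";   # 49 53 46 57 57 160 160
```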

It gives me

1 49
5 53
. 46
9 57
9 57
� 160
� 160

I have tried stripping off these trailing chr(160) characters in a number
of ways:

s/\240//g
s/[\200-\377]//g
tr/\177-\377//d
s/\&#65533//g
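A likely explanation (an assumption, since the full statements are not shown) is that these ran as bare statements: a bare s/// or tr/// works on $_, not on $line[6]. Bound to the field with =~, both the \240 substitution and the tr/// range delete the bytes. The s/\&#65533//g attempt looks for the literal text &#65533, the HTML entity for the Unicode replacement character, which is not what the string actually contains.

```perl
use strict;
use warnings;

my @line;
$line[6] = "15.99\240\240";   # assumed contents, as in the post

# Assumption: the attempts were written as bare statements such as
#   s/\240//g;
# which edit $_. Binding with =~ makes the substitution act on the field.
$line[6] =~ s/\240//g;

# The tr/// attempt works the same way once it is bound.
my $other = "23.56\240\240";
$other =~ tr/\177-\377//d;   # delete every byte from 127 to 255
```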

but the only way I could get rid of them was using:

chop $line[6];
chop $line[6];
 
Can anyone figure out a way to get rid of these trailing non-ASCII
characters using a regular expression?
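A regex-only sketch: strip the non-breaking space together with ordinary whitespace from both ends of every field right after the split. \240 is listed explicitly because, unless the string is handled with Unicode rules (for example under use feature 'unicode_strings'), \s does not match it. clean_field is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: trim ASCII whitespace and \240 from both ends.
# (For only the trailing bytes, the regex equivalent of the two chops
# is: $field =~ s/\240+\z//;)
sub clean_field {
    my ($s) = @_;
    $s =~ s/^[\s\240]+//;
    $s =~ s/[\s\240]+\z//;
    return $s;
}

my $line = "\240\240\240,Temper Tantrum\240,Take Care Comb Your Hair\240,"
         . ",,CD\240,23.56\240\240,\240,0 Days Ago\240,Scotla\240";
my @line = map { clean_field($_) } split /,/, $line;

# $line[6] is now "23.56", so a numeric comparison no longer warns.
print "over 75\n" if $line[6] >= 75;
```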

Thanks

Greg


