Re: HTTP::Request::Common::POST and UTF-8



On Tue, 27 Sep 2005, Stephen Collyer wrote:

> I've investigated to the point that I can see that the problem
> seems to occur at line 53 of URI::_query::query_form:
>
> 53:b $self->query(@query ? join('&', @query) : undef);

Please understand that I'm thinking aloud here: I don't have the
answer, but, as no-one else has stepped in, I thought my ponderings
might just be helpful.

Hmmm, the version of _query.pm that I'm looking at here (which
might be old) does its escaping by looking up $URI::Escape::escapes{$1}.

Looking at http://search.cpan.org/~gaas/URI-1.35/URI/Escape.pm
it appears there are two different functions, for escaping in an
8-bit context and for escaping in a utf8 context. As it says, they
produce different results, even for the characters from 128-255.
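
To make that difference concrete, here is a minimal sketch, assuming a
URI::Escape recent enough to export both uri_escape and uri_escape_utf8
(the two functions that page documents). The sample character and the
variable names are my own choice for illustration.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape uri_escape_utf8);

# U+00E9 (e with acute accent): a character in the 128-255 range.
my $char = "\x{E9}";

# 8-bit context: the character is escaped as the single byte 0xE9.
my $eight_bit = uri_escape($char);

# utf8 context: the character is first encoded as UTF-8 (0xC3 0xA9),
# then each of those bytes is escaped.
my $utf8 = uri_escape_utf8($char);

print "uri_escape:      $eight_bit\n";   # %E9
print "uri_escape_utf8: $utf8\n";        # %C3%A9
```

Which of the two is right depends on the encoding the receiving server
expects, so neither is a drop-in replacement for the other.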

However, if I look at the URI/Escape.pm that's installed hereabouts,
it describes itself as Revision 3.21, and shows no sign of being
capable of escaping any character above 255.

> This routine escapes the data in the POST content array, and
> all seems well up to line 53 where it sets the content of the
> query.

Seems to me that one needs to take a look at whether there's any
machinery, in the version that you're using, for invoking the
utf8-context escapes, and, if so, how to trigger it. To be honest,
I'm not by any means certain that the mere utf8-ness of a string
would be the right lever to trigger this.

> When I look at $self->query(), all UTF-8 chars seem to have
> been converted to +. This looks bizarre as it's only doing a join.

My hunch is that they've been offered to a routine that can only
escape the characters 0-255.
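
To illustrate what I mean, here is a rough sketch using only the core
Encode module. The lookup table and the escape_with_table helper are
hypothetical stand-ins that I've made up for the purpose; they are not
the code from URI::_query, and they don't reproduce the exact "+"
symptom. They only show that a table covering 0-255 has nothing to
offer for a character above 255, whereas a string that has first been
encoded to UTF-8 octets contains nothing outside that range.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Stand-in lookup table covering only characters 0-255 (hypothetical,
# built here for illustration).
my %table;
$table{ chr $_ } = sprintf('%%%02X', $_) for 0 .. 255;

# Hypothetical helper: escape everything outside a small safe set via
# the table. Characters with no table entry are dropped and reported.
sub escape_with_table {
    my ($text) = @_;
    my @missing;
    $text =~ s{([^A-Za-z0-9_.~-])}{
        exists $table{$1}
            ? $table{$1}
            : do { push @missing, sprintf("U+%04X", ord $1); '' }
    }ge;
    return ($text, @missing);
}

my $chars = "caf\x{E9} \x{263A}";     # character string, one char above 255
my $bytes = encode('UTF-8', $chars);  # UTF-8 octets, every element is 0-255

my ($from_chars, @lost) = escape_with_table($chars);
my ($from_bytes, @none) = escape_with_table($bytes);

print "characters: $from_chars (no entry for: @lost)\n";
print "octets:     $from_bytes\n";
```

If the hunch is right, encoding the form values to octets before they
reach the escaping code might sidestep the problem, but I haven't
checked that against the version you're running.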

Hope this is vaguely useful, at least.