Re: HTTP::Request::Common::POST and UTF-8



Alan J. Flavell wrote:

I don't know the answer to your question, but, in principle the web specifications say that application/x-www-form-urlencoded is only guaranteed to support us-ascii. You and I know, in practical terms, that it may not be as bad as that, and when executing a GET we'd have no other choice; but if you're using POST rather than GET, then you might be advised to use multipart/form-data instead.

1. Yes, I've read your nice web page on the matter, so AFAICS it should be possible.

2. I'm currently constrained to application/x-www-form-urlencoded
but, yes, it may make more sense to use multipart/form-data.
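
For reference, a minimal sketch of what the multipart/form-data variant might look like with HTTP::Request::Common. The URL and the field name "comment" are made up for illustration, and encoding the value to UTF-8 bytes first is an assumption about what the receiving end expects.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);
use Encode qw(encode);

# Made-up sample text containing a character above U+00FF.
my $text  = "smile \x{263A}";
my $bytes = encode('UTF-8', $text);

# Hypothetical URL and field name. Content_Type => 'form-data' asks
# HTTP::Request::Common to build a multipart/form-data body.
my $req = POST 'http://example.com/form',
    Content_Type => 'form-data',
    Content      => [ comment => $bytes ];

print $req->content_type, "\n";
```

The request is only built here, not sent; passing $req to an LWP::UserAgent would be the next step.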

Have you proved that the transaction that you're trying to carry out can be successfully initiated "by hand" or from a web browser, before you try to implement it from LWP? Just to be sure you're looking in the right place for the problem, I mean.

Yes, this is working code that I'm reworking for UTF-8 support. So I know precisely where the problem is.

If the test didn't reveal the problem to me, I'd consider posting the complete code here.

I've investigated to the point that I can see that the problem seems to occur at line 53 of URI::_query::query_form:

53:b            $self->query(@query ? join('&', @query) : undef);

This routine escapes the data in the POST content array, and
all seems well up to line 53 where it sets the content of the
query. When I look at $self->query(), all UTF-8 chars seem to have
been converted to +. This looks bizarre as it's only doing a join.

I need to investigate this further - it should be easy enough to
cook up a small example to reproduce if it is indeed a bug.
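
A reproduction along these lines might be a starting point. It is only a sketch: the URL and the field name "q" are made up, and what it prints for the query depends on the installed version of URI.

```perl
use strict;
use warnings;
use URI;
use Encode qw(is_utf8);

# A string holding one character above U+00FF, so Perl stores it
# with the UTF-8 flag set.
my $text = "smile \x{263A}";

# Hypothetical URL and field name, for illustration only.
my $uri = URI->new('http://example.com/search');
$uri->query_form(q => $text);

# Show whether the input was flagged, and what ended up in the query.
printf "flagged as UTF-8: %s\n", is_utf8($text) ? 'yes' : 'no';
printf "query: %s\n", $uri->query;
```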

This is using perl, v5.8.3


Does Perl know that this is a UTF-8 text string, i.e. in the sense of the Unicode support that is in Perl 5.8+ versions? Or are you handing it around as binary, or what?

Yes, these are marked as UTF-8 according to Encode::is_utf8.
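
For comparison, a sketch of the escaped form one would expect if a flagged string is encoded to UTF-8 bytes before escaping. It uses only Encode and a hand-written escape, not the URI code path, and the sample string is made up.

```perl
use strict;
use warnings;
use Encode qw(encode is_utf8);

# Made-up sample; the character above U+00FF makes Perl flag it as UTF-8.
my $text = "na\x{ef}ve \x{263A}";

# Turn the character string into UTF-8 encoded bytes (flag off).
my $bytes = encode('UTF-8', $text);

# Percent-escape every byte outside the unreserved set, then use '+'
# for spaces as application/x-www-form-urlencoded does.
(my $escaped = $bytes) =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
$escaped =~ s/%20/+/g;

print is_utf8($text)  ? "text: flagged\n"  : "text: not flagged\n";
print is_utf8($bytes) ? "bytes: flagged\n" : "bytes: not flagged\n";
print "$escaped\n";   # na%C3%AFve+%E2%98%BA
```

If the real query comes out differently from this, that would point at the escaping step rather than at the input strings.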

Steve Collyer