Re: HTTP::Request::Common::POST and UTF-8
Alan J. Flavell wrote:
> I don't know the answer to your question, but in principle the web
> specifications say that application/x-www-form-urlencoded is only
> guaranteed to support us-ascii. You and I know, in practical terms,
> that it may not be as bad as that, and when executing a GET we'd have
> no other choice; but if you're using POST rather than GET, then you
> might be advised to use multipart/form-data instead.
1. Yes, I've read your nice web page on the matter, so AFAICS it
should be possible.
2. I'm currently constrained to application/x-www-form-urlencoded
but, yes, it may make more sense to use multipart/form-data.
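For comparison, switching the same request to multipart/form-data with HTTP::Request::Common only needs a Content_Type => 'form-data' argument to POST. The sketch below assumes LWP is installed; the URL, the field name "comment" and the sample text are hypothetical. It still encodes the value to UTF-8 octets with Encode::encode_utf8 first, since the body is made of bytes either way. Nothing is sent; the request is only built and printed.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use HTTP::Request::Common qw(POST);

# Hypothetical endpoint and field name, for illustration only.
my $url  = 'http://www.example.com/cgi-bin/form';
my $text = "caf\x{e9} \x{263A}";    # a Perl character string

# Content_Type => 'form-data' asks POST for a multipart/form-data body
# rather than application/x-www-form-urlencoded. The value is encoded to
# UTF-8 octets first, so the request body holds bytes, not characters.
my $req = POST $url,
    Content_Type => 'form-data',
    Content      => [ comment => encode_utf8($text) ];

# Inspect the generated request instead of sending it.
print $req->as_string;
```

In a multipart body each value travels as raw bytes inside its own part, so there is no percent-escaping step for wide characters to go wrong in.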
> Have you proved that the transaction that you're trying to carry out
> can be successfully initiated "by hand" or from a web browser, before
> you try to implement it from LWP? Just to be sure you're looking in
> the right place for the problem, I mean.
Yes, this is working code that I'm reworking for UTF-8 support, so I
know precisely where the problem is. If the test didn't reveal the
problem to me, I'd consider posting the complete code here.
I've investigated far enough to see that the problem seems to occur
at line 53 of URI::_query::query_form:
53:b $self->query(@query ? join('&', @query) : undef);
This routine escapes the data in the POST content array, and all
seems well up to line 53, where it sets the query. When I look at
$self->query(), all the non-ASCII UTF-8 characters seem to have been
converted to +. This looks bizarre, as the line itself only does a
join before handing the result to $self->query().
I need to investigate further; it should be easy enough to cook up a
small example that reproduces it, if it is indeed a bug.
This is using perl, v5.8.3.
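A small reproduction along those lines might look like the sketch below. As far as I can tell from its source, HTTP::Request::Common builds an urlencoded body by calling query_form on a throwaway URI->new('http:') object, so the sketch does the same directly; the field name and sample text are hypothetical. It compares passing the character string as-is with passing the UTF-8 octets from Encode::encode_utf8. How wide characters are handled in the first case has varied between URI releases, so that result is only printed, not asserted.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use URI;

# A character string; the \x{263A} literal turns on Perl's UTF-8 flag.
my $text = "caf\x{e9} \x{263A}";

# 1. Hand the character string straight to query_form. Results for wide
#    characters have varied between URI releases, so just print them.
my $chars = URI->new('http:');
$chars->query_form(comment => $text);
print "characters: ", $chars->query, "\n";

# 2. Encode to UTF-8 octets first; each byte >= 0x80 is then %-escaped
#    on its own, so the result no longer depends on the UTF-8 flag.
my $octets = URI->new('http:');
$octets->query_form(comment => encode_utf8($text));
print "octets:     ", $octets->query, "\n";    # comment=caf%C3%A9+%E2%98%BA
```

If the first line shows the + characters and the second does not, encoding the values before passing them to POST would be a workaround, whether or not the first case turns out to be a bug.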
> Does Perl know that this is a utf-8 text string, i.e. in the sense of
> the Unicode support that is in Perl 5.8+ versions? Or are you handing
> it around as binary, or what?
Yes, these are marked as UTF-8 according to Encode::is_utf8.
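Encode::is_utf8 reports Perl's internal storage flag: a flagged string is a sequence of characters, and something still has to turn it into UTF-8 bytes before it goes into a request body. The core-only sketch below (the sample string is made up for illustration) shows the flag and the length changing once the string is encoded to octets.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 is_utf8);

my $text   = "caf\x{e9} \x{263A}";  # 6 characters, UTF-8 flag on
my $octets = encode_utf8($text);     # 9 bytes, UTF-8 flag off

# Report the internal flag and the length (characters vs bytes) of each.
for ([characters => $text], [octets => $octets]) {
    my ($label, $s) = @$_;
    printf "%-10s is_utf8=%s length=%d\n",
        $label, (is_utf8($s) ? 'yes' : 'no'), length $s;
}
```

So a true is_utf8 confirms Perl is treating the data as text, which is exactly the case where the escaping code has to decide how to turn characters above 0xFF into bytes.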
Steve Collyer