Re: Is it ok to change $ENV{'QUERY_STRING'} before "use CGI;" is called..?
- From: "Raymundo" <gypark@xxxxxxxxx>
- Date: 6 Mar 2007 06:30:56 -0800
Oops. I wrote a reply, which took about 3 hours (writing in English is
very difficult for me). I posted it an hour ago but I still can't see
it. I'm afraid it's lost :'(
I'll rewrite my last reply...
First of all, thank you, Ben, for your kind advice.
In fact, the Perl script that I'm modifying is not my own code. It is
UseModWiki (http://www.usemod.com/cgi-bin/wiki.pl) and I've been
modifying it for use on my personal homepage. (But I'm just a novice
in Perl, so it's not easy :-)
On a wiki site, the URL of each page consists of the script URL and
the title of that page, like ".../wiki.pl?Perl". I'm Korean and my
wiki has many pages whose names are in Korean.
> Well... a not-url-encoded URL is invalid. At least Firefox appears to
> automatically translate (say) a URL typed into the address bar into its
> correct URL-escaped form before submitting it to the server; I don't
> know what IE or Konq/Safari or Opera do.
As you said, multi-byte characters in a URL are invalid. I know that
:'( So a URL-encoded URL is the answer. However, look at the following
URLs:
1: .../wiki.pl?Linux <- Anyone can tell it is the page about "Linux"
2: .../wiki.pl?%EB%A6%AC%EB%88%85%EC%8A%A4 <- Can anyone guess what
the title of this page is?? :-/ It's "Linux" in Korean
3: .../wiki.pl?리눅스 <- (If you can't see the Korean characters, please
see http://gypark.pe.kr/upload/linux_in_korean.gif ) Anyone who can
read Korean can tell it is the page about Linux. (I'll type
"LINUX(ko)" for this word from now on)
URL 2 is valid, but it looks so.... :-/ And with it I must give up the
big advantage of a wiki, "the URL represents the content".
URL 3 is said to be invalid, but I still want to support it. That is,
when someone types that URL into the address bar of a browser, or
clicks a link to URL 3 on another site, I want my wiki.pl script to
show the proper page, "LINUX(ko)".
Fortunately, web browsers like FF, IE, and Safari convert the URL into
%-encoded form before they submit it, as you said. Therefore, I think
the multi-byte characters in the URL are not the main issue, because
the server will receive a %-encoded request. The problem is that, as I
said in my first article, the %-encoded form of "LINUX(ko)" is not
unique. It can be "%EB%A6%AC%EB%88%85%EC%8A%A4" (the UTF-8 sequence)
or "%B8%AE%B4%AA%BD%BA" (EUC-KR, used in Korea). The browser chooses
which encoding to use according to its own settings (for FF,
"network.standard-url.encode-utf8" in "about:config"). The server
can't choose the encoding and isn't even told explicitly which one was
chosen, which is the reason that wiki.pl has to "guess".
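
To make the "guess" concrete, here is a minimal sketch of how the two forms above can be told apart. The helper names percent_decode() and guess_encoding() are made up for this illustration and are not part of UseModWiki. It assumes only two candidate encodings: bytes that decode as strictly valid UTF-8 are treated as UTF-8, and anything else is assumed to be EUC-KR.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Turn "%EB%A6%AC"-style escapes back into raw bytes.
sub percent_decode {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Hypothetical helper: name the encoding the raw bytes are most likely in.
# Strictly valid UTF-8 wins; everything else is ASSUMED to be EUC-KR.
sub guess_encoding {
    my ($bytes) = @_;
    my $ok = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 'UTF-8' : 'EUC-KR';
}

# The two %-encoded forms of "LINUX(ko)" quoted above.
for my $qs ('%EB%A6%AC%EB%88%85%EC%8A%A4', '%B8%AE%B4%AA%BD%BA') {
    my $enc = guess_encoding(percent_decode($qs));
    print "$qs looks like $enc\n";
}
```

The guess works here because the EUC-KR bytes start with 0xB8, which can never begin a UTF-8 sequence. It is still only a heuristic: a short EUC-KR string could happen to be valid UTF-8 as well.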
Returning to my first post in this thread... Is it such a bad idea to
change the environment variable QUERY_STRING? It solves every problem
here, and it requires only one additional line of code. I think the
change can affect only the script and its child processes, and the
script doesn't fork any child process.
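
A small sketch of that scoping, under the usual process model: an assignment to %ENV changes the environment of the running Perl process, a child started afterwards inherits the changed value, and the parent (here, the web server) keeps its own copy. The value 'changed-by-script' is just a placeholder for this illustration.

```perl
use strict;
use warnings;

# Change the variable inside this process only.
$ENV{QUERY_STRING} = 'changed-by-script';

# A child started from here inherits the changed value.
open(my $child, '-|', $^X, '-e', 'print $ENV{QUERY_STRING}')
    or die "cannot start child: $!";
my $seen = <$child>;
close $child;

print "child saw: $seen\n";
```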
> If you're using CGI.pm to process QUERY_STRING, then you should stick to
> that. Messing about is just asking for trouble. What is the problem with
> decoding the submitted values afterwards? (It can still be one line or
> so of code, if you do it right. See Anno's example.)
"The problem with decoding the submitted values afterward" is...
(The following comes from my test results. These problems may be
fixable, but I'm not much of an expert in Perl.)
1) There are hundreds of lines that call "->param()". I don't think
it's a good idea to insert that many "guess_and_convert()" calls after
those lines.
1-1) In fact, those lines actually call the "GetParam()" subroutine,
and GetParam() calls ->param inside it. So one solution would be to
insert guess_and_convert() into GetParam(). However, GetParam() fetches
the value of a parameter not only from GET requests but also from POST
requests and even from saved files. For now, I'm not sure it's OK to
modify GetParam(). In addition, it seems inefficient to call the
conversion routine every time a single parameter is fetched.
2) Concerning Anno's example, it looks good because it calls the
conversion routine only once. However, it causes some problems when
processing POST requests, such as file uploads and incoming trackbacks.
I tried to debug this but failed to find out why. I think the second
best way is to apply that code with an additional if-clause:
if ($q->request_method() eq "GET")
3) In the original code, there are some lines that access
$ENV{QUERY_STRING} directly, without calling CGI functions. I would
need to apply "guess_and_convert" to those lines as well.
So I cling to Q_S like this. :-) As far as I know: (please correct me
if I am wrong)
1) Q_S is relevant only to GET requests. (All the forms in wiki.pl
submit to "wiki.pl" without any query string appended to the URL.)
2) Q_S may be in the form of "keywords" or
"param1=value1&param2=value2...". guess_and_convert() will not change
the significant characters like "&", "=", "+". It will not change any
other ASCII characters either. It will change only the multi-byte
characters. Because those characters have already been encoded by the
browser, the change is just a change in the number and sequence of the
"%HH" runs. So I think there is no problem when the CGI object is
created and initialized from Q_S.
3) Changing Q_S affects only the running script and its child
processes.
4) Since I began testing my approach, no problem has shown up so far.
(Of course, this is no proof that it will never cause a problem. That's
why I'm asking for your advice on Usenet :-)
5) Most of all, I expect that I won't need to care about it when the
rest of the code is updated (at least until the behavior of browsers
or of the CGI module changes dramatically).
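
For reference, point 2 can be sketched as follows. This is only an illustration of the idea, not the code actually used in my wiki.pl: the body of guess_and_convert() and the helper recode_run() are written here as assumptions, and the only fallback encoding considered is EUC-KR. The query string "action=browse&id=..." is just a sample.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Sketch of guess_and_convert(): rewrite only runs of %HH escapes whose
# bytes are >= 0x80. "&", "=", "+" and all other ASCII stay untouched.
sub guess_and_convert {
    my ($qs) = @_;
    $qs =~ s{((?:%[89A-Fa-f][0-9A-Fa-f])+)}{ recode_run($1) }ge;
    return $qs;
}

# Hypothetical helper: convert one %HH run to UTF-8 if it is not UTF-8 yet.
sub recode_run {
    my ($run) = @_;
    (my $bytes = $run) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

    # Already valid UTF-8: keep the run exactly as the browser sent it.
    my $is_utf8 = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $run if $is_utf8;

    # Otherwise ASSUME EUC-KR, convert to UTF-8 and escape every byte again.
    my $utf8 = encode('UTF-8', decode('euc-kr', $bytes));
    return join '', map { sprintf '%%%02X', ord($_) } split //, $utf8;
}

# In a CGI script this would run before the CGI object is created.
$ENV{QUERY_STRING} = guess_and_convert('action=browse&id=%B8%AE%B4%AA%BD%BA');
print "$ENV{QUERY_STRING}\n";
```

Because only the "%HH" runs with high bytes are rewritten, the parameter names, the delimiters, and any ASCII values reach the CGI object unchanged.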
If anyone can give me a concrete example of a problem that may appear
when I convert the encoding of Q_S, I'll give up my approach
immediately...
Raymundo