Re: Is it ok to change $ENV{'QUERY_STRING'} before "use CGI;" is called..?



3: .../wiki.pl? <- (If you can't see the Korean characters, please see
http://gypark.pe.kr/upload/linux_in_korean.gif.) Anyone who can read
Korean will recognize it as the page about Linux. (I'll write
"LINUX(ko)" for this word from now on.)

URL 2 is valid, but it looks so ugly :-/ and I would have to give up
the big advantage of a wiki: "the URL represents the content".

URL 3 is said to be invalid, but I still want to support it. That is,
when someone types that URL in the address bar of a browser, or
someone clicks a link to URL 3 on another site,

Is it common practice for people to write links to URLs with multibyte
chars in them? Since the actual link itself is not user-visible (the
text of the link is, but that's quite different) there's no reason not
to encode it correctly, is there? Of course, if it *is* common practice,
you may well want to handle it (if you can), regardless of its
incorrectness.

Do you mean this case?
[a href="actual link itself"] text of the link [/a]
(I replaced the "less than" and "greater than" signs with brackets, so
that a smart(?) newsreader doesn't process it as a real link.)

Yes, you're right. In that case the URL is hidden from the user, so it
doesn't matter that the URL is "...%EB%A6". And this is very typical in
plain HTML documents.

However, many recent CGI tools, such as blogs (MovableType, TatterTools,
etc.) and (as far as I know) almost all wikis, provide a feature that
could be called "auto-linking". Someone posts an article in plain text
to his/her blog, then the blog tool looks for URL patterns in the text,
converts them to "a href" links, and prints them in its HTML output. In
this case, the "text of the link" is equal to the "actual link".

Another example: wikis provide the concept of "interwiki" links for
convenient linking. That is, when I submit the text:
UseMod:UseModWiki
Google:UseModWiki (even though Google is not a wiki...)
they are converted automatically in the HTML output to the following
links, respectively:
[a href="http://www.usemod.com/cgi-bin/wiki.pl?UseModWiki"]UseMod:UseModWiki[/a]
[a href="http://www.google.com/search?q=UseModWiki"]Google:UseModWiki[/a]
(The mapping table between an interwiki name like "Google:" and the
real URL like "http://www.google.com/search?q=" is stored in a file
on the server.)

In this case, someone may want to put a link to my page in his wiki.
Then "Raymundo:LINUX(ko)" is much (x 100) easier for him and more
understandable to other visitors than
"Raymundo:%EB%A6%AC%EB%88%85%EC%8A%A4".

I've already modified my wiki so that it encodes the actual link when
it processes an interwiki link. But it's impossible to force this on
every developer of every wiki in the world. :-)
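Encoding the actual link during interwiki processing could look roughly like this. It is only a sketch: the two table entries are the ones quoted above, the table is an in-memory hash rather than the file a real wiki reads, the function name interwiki_link is made up, and HTML-escaping of the link text is left out for brevity.

```perl
use strict;
use warnings;

# Interwiki table; a real wiki would load this from a file on the server.
my %InterWiki = (
    UseMod => 'http://www.usemod.com/cgi-bin/wiki.pl?',
    Google => 'http://www.google.com/search?q=',
);

# Expand "Site:Page" into a link. The page name is percent-encoded
# byte by byte in the href, but shown unchanged as the link text.
sub interwiki_link {
    my ($text) = @_;
    my ($site, $page) = $text =~ /^([A-Za-z]+):([^\x00-\x20]+)$/
        or return;
    my $base = $InterWiki{$site} or return;    # unknown interwiki name
    (my $encoded = $page) =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return qq{<a href="$base$encoded">$text</a>};
}

print interwiki_link('Google:UseModWiki'), "\n";
```

A writer can then keep typing the readable form, and only the generated href contains the percent-encoded bytes.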

Anyway, in my opinion this type of link may well be common practice
nowadays.



I want my wiki.pl script to show the proper page, "LINUX(ko)".

Firstly, let me say that I entirely sympathise with this desire :). It
is a major failing in the design of URLs that they are so unfriendly to
people whose native language is not English.

That said, I do not think you can win here :). At least my copy of FF
will convert .../wiki.pl?KOREAN_CHARS into %-encodings *in the address
bar* before it submits the URL. IE6 appears to do the opposite: that is,
AFAICT it both displays the URL as typed in the address bar and actually
submits a multi-byte URL to the server. Your Q_S munging will need to be
quite subtle, to handle cases like .../wiki.pl?foo%3bbar, and correctly
distinguish them from .../wiki.pl?foo;bar, which presumably means
something quite different.


I agree that IE6 behaves differently (and strangely). This is the
access_log of the Apache server when the request URL includes
"wiki/LINUX(ko)":

"GET /wiki/\xb8\xae\xb4\xaa\xbd\xba" <- IE, EUC-KR
"GET /wiki/%B8%AE%B4%AA%BD%BA" <- FF, EUC-KR
"GET /wiki/%EB%A6%AC%EB%88%85%EC%8A%A4" <- IE and FF, UTF-8

I don't know why IE's requests take different forms depending on the
encoding. It URL-encodes the request if its option to send UTF-8
requests is set, but it doesn't if the option is unchecked. But as far
as I have tested, my wiki.pl showed no difference between requests
coming from FF and from IE.

I'll think about your example with ";" and "%3b" and test
more.
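To make the first two log forms look alike before any guessing is done, the raw bytes that IE sends could be percent-encoded first. A small sketch with a hypothetical helper name (escape_high_bytes); it touches only bytes 0x80-0xFF, so "foo;bar" and "foo%3bbar" both pass through unchanged and stay distinguishable:

```perl
use strict;
use warnings;

# Percent-encode raw 8-bit bytes so that the IE form and the FF form
# of the same request become byte-for-byte identical.
sub escape_high_bytes {
    my ($s) = @_;
    $s =~ s/([\x80-\xFF])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

print escape_high_bytes("/wiki/\xb8\xae\xb4\xaa\xbd\xba"), "\n";
# prints /wiki/%B8%AE%B4%AA%BD%BA
```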



2) Concerning Anno's example, it looks good because it calls the convert
routine only once. However, it causes problems when processing POST
requests, such as file uploads and incoming trackbacks. I tried to
debug it but could not find out why. I think the second-best option is
to apply that code inside an additional if-clause:
if ($q->request_method() eq "GET")

What sort of problems? If your guessing routine is guessing incorrectly
for some of your real data, this indicates it's not safe to use it
anyway.

I agree, and I tried to find the exact problem and its cause.


Here is what I have found so far:

At first, Anno's code changed the values of the CGI->Vars hash:

$q = new CGI;
# convert
my $param = $q->Vars;
$_ = check_and_convert($_) for values %$param;


File uploading and trackback are not part of the original script. I
added them myself about two years ago, using code from examples on the
web.

For file uploading, wiki.pl prints a form that includes:

print $q->start_form('post', "$ScriptName", 'multipart/form-data') . "\n"
    . "<input type='hidden' name='action' value='upload'>"
    . "<input type='hidden' name='upload' value='1'>" . "\n"
    . $q->filefield("upload_file", "", 60, 80) . "\n"    # file selection field
    . "&nbsp;&nbsp;" . "\n"
    . $q->submit('Upload') . "\n"
    . $q->end_form;


The user is supposed to click the "open" button, choose a file in the
file selection window, and click the "Upload" button to submit.

To save the file on the server, the following code is used:

$file = $q->upload('upload_file');
open(FILE, ">file_in_local_disk_of_server") or die "cannot open: $!";
binmode FILE;
while (<$file>) {
    print FILE $_;    # read from the client's file, write to the server's disk
}
close(FILE);


I put in a "die;" as a check:

$file = $q->upload('upload_file');
die "[$file]"; # here
open(FILE, ">file_in_local_disk_of_server");

If I don't convert Vars, the script dies printing
"[D:\download\text.txt]". But when Vars is converted, the script dies
printing "[]". That means $file has lost the information that it is a
file handle.

How can I keep it as a valid file handle? Even without converting, I
found that any write access to the Vars entry causes the same problem:

my $param = $q->Vars;
$$param{'upload_file'} .= "";    # nothing is appended, but the file handle is lost
or even
$$param{'upload_file'} = $$param{'upload_file'};    # this also loses the file handle!!! :-O


So there is nothing that check_and_convert() can do. Modifying
"->Vars" itself causes the problem. If I have to choose this approach
anyway, I can do it like this:
my $param = $q->Vars;
foreach (keys %$param) {
    # never assign to $$param{'upload_file'}
    $$param{$_} = guess_and_convert($$param{$_}) if ($_ ne "upload_file");
}

But there is no guarantee that all the other parameters are ordinary
strings.
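One possible explanation, as far as I can tell from how CGI.pm implements the tied hash behind Vars: fetching a value returns a plain string, and storing it writes that string back through param(), which would replace the upload handle. If that is right, a way to avoid hard-coding the field name is to ask the object which parameters are uploads. The following is only a sketch: convert_text_params is a made-up name, $convert stands for a routine such as guess_and_convert (not shown), and the only methods it relies on are param(), upload() and, where available, multi_param().

```perl
use strict;
use warnings;

# Convert every ordinary parameter of a CGI-like object in place and
# skip any field for which upload() returns a file handle.
sub convert_text_params {
    my ($q, $convert) = @_;
    for my $name ($q->param) {
        next if defined $q->upload($name);    # leave upload fields alone
        # Newer CGI.pm versions prefer multi_param() for list context.
        my @values = $q->can('multi_param')
            ? $q->multi_param($name)
            : $q->param($name);
        @values = map { $convert->($_) } @values;
        $q->param(-name => $name, -value => \@values);
    }
    return $q;
}

# Usage (assuming a CGI object $q and a routine guess_and_convert):
#   convert_text_params($q, \&guess_and_convert);
```

This still assumes that every non-upload parameter is an ordinary string, which is exactly the doubt raised above.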




So I cling to Q_S like this. :-) As far as I know (please correct me
if I am wrong):
1) Q_S relates only to GET requests. (All the forms in wiki.pl submit
to "wiki.pl" without any query string appended to the URL.)

You may be correct in this case that your wiki.pl only uses a query
string for GET requests. It is certainly possible to POST to a URL with
a query string.

Yes, I'll have to consider that in the future. I still believe it
doesn't matter, though, because the query string in a URL is just a
string, which can't carry any invisible information (like $file above).


2) Q_S may be in the form of "keywords" or
"param1=value1&param2=value2...". guess_and_convert() will not change
the significant characters like "&", "=", "+", nor any other ASCII
characters. It will change only the multi-byte characters. Because
those characters have already been encoded by the browser, the change
only affects the number and the sequence of the "%HH" runs. I think
there is no problem when the CGI object is then created and
initialized from Q_S.

Err... OK. You must make sure you alter Q_S *before* any CGI.pm calls
are made, though.

I agree.
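To make point 2 concrete, here is a rough sketch of such a conversion. It assumes the only two encodings in play are UTF-8 and EUC-KR, uses the core Encode module, and the names convert_query_string and percent_encode_bytes are invented for this example. Only runs of "%HH" escapes with the high bit set are examined, so "&", "=", "+" and escapes like "%3b" are never touched. The guess is a heuristic: a run that happens to be valid UTF-8 is left alone, and invalid EUC-KR input ends up as replacement characters.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Turn a byte string into a sequence of upper-case %HH escapes.
sub percent_encode_bytes {
    my ($bytes) = @_;
    return join '', map { sprintf '%%%02X', ord } split //, $bytes;
}

# Rewrite each run of high-bit %HH escapes: keep it if the bytes are
# already valid UTF-8, otherwise treat them as EUC-KR and re-encode.
sub convert_query_string {
    my ($qs) = @_;
    $qs =~ s{((?:%[89A-Fa-f][0-9A-Fa-f])+)}{
        my $run = $1;
        (my $bytes = $run) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
        my $probe = $bytes;            # utf8::decode works in place
        utf8::decode($probe)
            ? $run
            : percent_encode_bytes(encode('UTF-8', decode('euc-kr', $bytes)));
    }ge;
    return $qs;
}

# Alter Q_S first, and only then create the CGI object.
$ENV{QUERY_STRING} = convert_query_string($ENV{QUERY_STRING})
    if defined $ENV{QUERY_STRING};
```

As far as I know, CGI.pm reads the query string when the object is created (or on the first function-style call) rather than at "use CGI;", so doing the assignment anywhere before that point should be enough; treat that as something to verify against your CGI.pm version.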



3) Changing Q_S affects only the running script and its child
processes.

I don't know what happens under mod_perl, if you ever move your script
to that environment. Under standard CGI, this is certainly true.


That's the kind of answer I wanted! I had never thought about mod_perl
or anything like it. (Actually I have no idea what it is.)


It seems to me that you are trying to take a piece of rather
badly-written code you don't really understand, and alter it to do
something that isn't really possible anyway. Given that you're in that
much of a mess, a simple edit of $ENV{QUERY_STRING} may well be the best
way out :).

Ben


I plan to check and test more things and choose what to do.

I thank you for your constant help. Have a nice day!

Raymundo at South Korea.
