Re: File Creation From Form

From: Alan J. Flavell (flavell_at_ph.gla.ac.uk)
Date: 03/13/04


Date: Sat, 13 Mar 2004 14:20:25 +0000

On Sat, 13 Mar 2004, blnukem wrote:

> I have a form on a Linux system that takes a file name from the user and
> creates a file according to the name. The problem is if the user types in
> something like Giclée it creates a file named gicl(-)es.

Handling anything more than us-ascii characters on HTML forms input is
a complex topic, and not (in itself) really a matter for
comp.lang.perl.misc - probably better dealt with in the CGI authoring
group, if you really, _really_ need it. [1]

> If I test the code locally with a defined variable of: my
> $NewPageName = "Giclée"; it works fine.

It would, that doesn't surprise me. So the Perl part is working ;-)

> Any ideas?

Do you actually _require_ to create such file names? I dare to
suggest you'd be better off, and surely safer, if you could factor
them out of the problem.

> Code:
> #!/usr/bin/perl -w
>
> use strict;
> use CGI qw(:standard);
>
> print "Content-type: text/html\n\n";
>
> my $NewPageName = param('NewPageName');
> $NewPageName = lc($NewPageName);
> $NewPageName =~ tr/ \n\r\t//d;
>
> open (TEXT, ">$NewPageName.txt") or die "Could not print the data: $!";

You've got most of this right, but the frightening thing is the one
vital bit that you've missed out.

You desperately, but *desperately* need to know how to handle tainted
data. And to help you do that, it's *strongly* recommended to enable
taint checking (-T on the shebang line).

Don't put another CGI script anywhere near the public WWW until you've
grasped these principles, and implemented them in your scripts. Read
Stein's CGI security FAQ, as well as the perlsec pod (perldoc
perlsec).

Once that's out of the way, you'll recognise that you need a regex
which untaints the data that's coming from outside, i.e. one that
lets through the allowed characters and keeps out the dangerous
ones, before you attempt to use the data for any system-critical
purposes.
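
As a rough illustration of that kind of regex, here is a minimal
sketch. The helper name untaint_page_name and the particular allowed
set (ASCII letters, digits, underscore and hyphen, at most 64
characters) are my own assumptions for the example, not anything your
application dictates; choose the set that suits you. It is meant to
run with taint checking enabled, where a successful capture is what
yields an untainted value.

```perl
use strict;
use warnings;

# Hypothetical helper (the name and the allowed set are assumptions).
# Returns the name if it consists only of allowed characters,
# otherwise returns nothing, so the caller can reject the request.
sub untaint_page_name {
    my ($raw) = @_;
    return unless defined $raw;

    # Accept only what is known to be safe: an ASCII letter or digit,
    # then up to 63 more letters, digits, underscores or hyphens.
    # \A and \z anchor the whole string, so a trailing newline fails.
    if ($raw =~ /\A([A-Za-z0-9][A-Za-z0-9_-]{0,63})\z/) {
        return $1;    # under taint mode, the captured text is untainted
    }
    return;
}

for my $candidate ('gallery01', '../etc/passwd', "page\n") {
    my $name = untaint_page_name($candidate);
    print defined $name ? "accept: $name\n" : "reject\n";
}
```

Only the first candidate is accepted; the other two are rejected
outright rather than being "cleaned up".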

Usually the best strategy is to entirely reject values which contain
anything that you aren't willing to accept, and ask the user to try
again, rather than stealthily filtering out stuff that you don't
care for and just using what's left. However, there might be an
argument for silently turning accented letters into unaccented letters
and using the result, as long as that makes sense in the particular
context.
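
If you did go the accent-folding route, one possible sketch uses the
Unicode::Normalize module: decompose the string, then drop the
combining marks. The helper name fold_accents is hypothetical, and the
sketch assumes the input has already been decoded into a Perl
character string, which is exactly the part that depends on getting
the form's charset right.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper; expects an already-decoded character string.
sub fold_accents {
    my ($text) = @_;
    my $decomposed = NFD($text);     # e-acute becomes "e" + combining acute
    $decomposed =~ s/\p{Mn}+//g;     # remove the combining marks
    return $decomposed;
}

my $folded = fold_accents("Gicl\x{e9}e");    # \x{e9} is e-acute
print "$folded\n";                           # prints "Giclee"
```

This only handles letters that have a canonical decomposition.
Characters such as o-slash or sharp-s pass through unchanged, so the
result still has to be checked against your allowed set before it goes
anywhere near a file name.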

Remember, the key principle of security is only to accept what you
know to be safe (don't just try to filter out what you believe to be
dangerous).

Finally, if you're using CGI.pm, as you are, then I'd recommend using
its features. Not hand-knitted stuff like
  print "Content-type: text/html\n\n";

There's a specific reason for that. In the absence of an explicit
character encoding specification (charset=) for the HTML document,
forms submission of non-ascii characters is even more uncertain than
when it's specified. So, tell CGI.pm what charset you intend to use
(be it utf-8 or be it iso-8859-1 or whatever) and use _it_ to create
the appropriate headers for you, including the "charset=" attribute.
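
A minimal sketch of what I mean, using CGI.pm's header() function
with its -charset argument. The choice of utf-8 here is only an
assumption for the example; use whichever encoding your pages really
are in.

```perl
use strict;
use warnings;
use CGI qw(:standard);

# Let CGI.pm build the Content-Type header, charset included,
# instead of printing a hand-written one.
my $http_header = header(-type => 'text/html', -charset => 'utf-8');
print $http_header;
```

The page carrying the form then has a declared encoding, which gives
the browser something definite to go on when it submits the form data.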

take care

[1] there's a very incomplete survey on my page at
http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html


