Re: Automated Form Validation?

From: Matt Mitchell (m_a_t_t_remove_the_underscores_at_metalsponge.net)
Date: 02/25/05


Date: Fri, 25 Feb 2005 01:45:06 GMT


"Chung Leong" <chernyshevsky@hotmail.com> wrote in message
news:wbydnRkpWqVdy4PfRVn-gg@comcast.com...
: "Matt Mitchell" <m_a_t_t_remove_the_underscores@metalsponge.net> wrote in
: message news:ucoTd.174668$B8.79016@fe3.news.blueyonder.co.uk...
: I don't know what kind of application would "often" ask the user for an
: e-mail address. But anyway...

It was intended as an example - to reduce the level of abstraction in the
discussion (which was about abstraction in code!...)

:
: Taking your example, say initially the validation function will only return
: true if an address is of the form xxxx@xxxx.xxx. Throughout the application
: the assumption is made that an e-mail address is of that format and that
: format only. Since it cannot contain single quotes, when you insert an
: address into a SQL statement, you decide that it's not necessary to escape
: them. Likewise, since it cannot contain angle brackets, you decide not to
: pass it through htmlspecialchars() when you echo it.

I would reject this "sane programming scenario" right at the point where you
decide that user-supplied data is fine to insert into a database without
escaping. On which particular planet is this a good idea? If you are
taking even basic precautions against attacks, then you escape ALL data
before putting it into the database - even down to things like making sure
that numeric fields contain numeric data, etc.
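
A rough sketch of that rule, in Python with the standard sqlite3 module
rather than PHP. The table and the function name save_subscriber are made up
for illustration, and it uses bound parameters instead of hand-escaping,
which gets the same result by keeping the data out of the SQL text:

```python
import sqlite3


def save_subscriber(conn, email, age_text):
    """Store one row, treating both values as untrusted input."""
    # Numeric fields must really be numeric before they go near the query.
    if not (age_text.isascii() and age_text.isdigit()):
        raise ValueError("age must be a whole number")
    # Placeholders pass the values separately from the statement, so a
    # quote in the address cannot change the SQL.
    conn.execute(
        "INSERT INTO subscribers (email, age) VALUES (?, ?)",
        (email, int(age_text)),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscribers (email TEXT, age INTEGER)")
save_subscriber(conn, "o'brien@example.com", "42")
rows = conn.execute("SELECT email, age FROM subscribers").fetchall()
```

The address with the single quote goes in untouched, whatever the validation
rule for addresses happens to be this week.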

: Now some time later, you--or perhaps someone else--decide to relax the
: validation rule to accommodate full RFC 2822 syntax, that is, e-mail
: addresses with display names. With a change to that one line of code, you
: suddenly introduce God knows how many SQL injection and cross-site scripting
: vulnerabilities into your application. At the least, your application
: wouldn't work correctly. And you wouldn't even know that has happened, since
: your argument is predicated on you not retesting features that have been
: affected by the change.

Or you don't introduce any vulnerabilities at all, if you follow proper
programming practices. Data is escaped before putting it into a database,
entity-escaped before putting it in an HTML page, and URL-encoded before
putting it in a URL.
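
To make the per-context escaping concrete, here is a small sketch in Python
(standard library only; html.escape and urllib.parse.quote stand in for the
PHP calls, and the sample address and the /profile URL are invented). The
address is in the relaxed display-name form, angle brackets and all:

```python
from html import escape
from urllib.parse import quote

# An RFC 2822 style address with a display name, as untrusted input.
address = '"Bob <b>" <bob@example.com>'

# Entity-escape for an HTML page: <, >, & and quotes become entities.
html_fragment = "<p>" + escape(address) + "</p>"

# Percent-encode for a URL query value: nothing is left as a "safe" character.
link = "/profile?email=" + quote(address, safe="")
```

Neither line cares what the validation rule was, so relaxing the rule does
not open a hole in either place.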

If changing the validation rules on a field can cause this kind of problem,
then you don't have a security problem with the field values - you have a
security problem with a programmer who doesn't check data before using it.
Period.

: > The point is, if there are repeating aspects of your code, you place them
: > in the same place, so that you only have to write, check and change the
: > code ONCE. I really do fail to see the sense in what you're saying here,
: > sorry.

OK, I'll quote your earlier post:

[quote]
"Matt Mitchell" <m_a_t_t_remove_the_underscores@metalsponge.net> wrote in
message news:bVkTd.138693$68.114857@fe1.news.blueyonder.co.uk...
> It's not hard, but it's pointless doing it 80-200 times in a single
> application, when you can do it once and it will all just *work*. Less code
> to debug, for starters.

It sounds good in theory, but we all know that it rarely is the case that
you write it once and it all just works. Validation is open-ended. You will
likely have to tinker with the code over time. When you do, good QA practice
tells you to retest all parts of the application that could be affected. And
obviously, it's no fun having to retest 80-200 features just because one new
input field requires special handling.
[/quote]

(To set it in context, this was a response to my comment that it would make
more sense to code regex email validation in a write-once function, rather
than coding it again for each use.)

This posting would seem to be arguing that modularization is *not* a good
idea, that it's better to write the code each time you need that
functionality.
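
For the sake of argument, the write-once version might look something like
this in Python. The pattern is a deliberately crude "something@something.tld"
check, not an RFC 2822 parser, and the function and form-field names are
hypothetical:

```python
import re

# Crude on purpose: local part, "@", domain, dot, at least two letters.
_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")


def is_valid_email(value):
    """The single place where the e-mail rule lives."""
    return _EMAIL_RE.fullmatch(value) is not None


def validate_signup(form):
    # One of many forms that reuse the shared rule.
    return is_valid_email(form.get("email", ""))


def validate_contact(form):
    # A second form with a different field name, same rule.
    return is_valid_email(form.get("reply_to", ""))
```

If the rule has to change, it changes in _EMAIL_RE and nowhere else.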

:
: No one is arguing against modularization here. Data validation is just not
: well disposed to be modularized and centralized. I mean, think about it,
: what constitutes valid data? That term has little meaning without a context.

OK, so here are a few examples of classes of data that can be validated to
some kind of rule:

UK phone numbers/postcodes
US phone numbers/zipcodes
European postcodes

Most countries' car registration numbers, except for "vanity" plates
Social security numbers for most countries
Credit card numbers
Bank account numbers
Passport numbers
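
As a sketch of how a couple of these could sit in one shared module (Python,
standard library only): the Luhn checksum is the usual check for card
numbers, and the ZIP rule is just five digits with an optional four-digit
suffix. The VALIDATORS table and the function names are invented for the
example:

```python
import re

_US_ZIP_RE = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def is_us_zip(value):
    """Five digits, optionally followed by a hyphen and four more."""
    return _US_ZIP_RE.fullmatch(value) is not None


def luhn_ok(number):
    """Luhn checksum; spaces and hyphens between digit groups are allowed."""
    cleaned = number.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, ch in enumerate(reversed(cleaned)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# One rule per class of data, looked up by name.
VALIDATORS = {"us_zip": is_us_zip, "card_number": luhn_ok}


def validate(kind, value):
    return VALIDATORS[kind](value)
```

A form then only needs to say which kind of data each field holds, for
example validate("us_zip", "90210").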

: We say that a piece of data is valid if it conforms to the expectation of
: the code that makes use of the data. The most obvious example is date
: handling. If the date is going to be converted to a Unix timestamp, it
: cannot be earlier than 1970. On the other hand, if it's going to be stored
: as a string, then "N/A" could be valid. How the validation should be done

But if a user enters "N/A" when you want to make sure they are entering a
date, then "N/A" is NOT valid - maybe it would be a better idea to indicate
somewhere else that there is no valid date in an input field/database field.
Remember the problems that came up for "09-09-99" being used to indicate "no
date"?
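
One way to keep "no date" out of the date itself, sketched in Python. The
function name and the YYYY-MM-DD input format are assumptions for the
example; a blank field becomes None, and anything that is not a date is
rejected rather than stored:

```python
from datetime import date, datetime


def parse_optional_date(text):
    """Return a date, or None when the field was left blank.

    Text such as "N/A" is not a date and raises ValueError.
    """
    text = text.strip()
    if text == "":
        return None  # "no date" is recorded as absence, not as a magic value
    return datetime.strptime(text, "%Y-%m-%d").date()
```

With this, 9 September 1999 is just an ordinary date again, and a missing
date is never mistaken for a real one.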

: depends on how the data will be used. Instead of trying to communicate this
: context information to some independent validation module, it is often
: easier to just do the validation right there.

But in the vast majority of cases, the validation IS generic. Most computer
software, most people, and most businesses handle the same type of data
repeatedly; computers are useful because they are good at doing the same
task over and over and over again, exactly the same each time. People are
very bad at doing this, and that's why it's better to get something right,
and then let a computer handle getting it done right the next time.


