Re: Evaluate my first python script, please

Jean-Michel Pichavant <jeanmichel@xxxxxxxxxxx> wrote:

And tell me how not using regexp will ensure the /etc/hosts processing
is correct? The non-regexp solutions provided in this thread did not
handle what you rightfully pointed out about host list and commented

It won't make it automatically correct, but I'd guess that code written
without being so dependent on regexes might have made someone point out those
deficiencies sooner. The point being that casual readers of the code won't
take the time to decode the regex, they'll glance over it and assume it
does something or other sensible.

If I was writing that code, I'd read each line, strip off comments and
leading whitespace (so you can use re.match instead of, split on
whitespace and take all but the first field. I might check that the field
I'm ignoring is something like a numeric IP address, but if I did want to
do that then I'd include range checking for valid octets, so still no regex.

The whole of that I'd wrap in a generator so what you get back is a
sequence of host names.
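
A minimal sketch of that approach, not tested against a real /etc/hosts.
The names hostnames and is_ipv4 are made up for illustration, and the
address check is kept as a separate optional helper because a hosts file
may also contain IPv6 lines:

```python
def hostnames(lines):
    """Yield every host name and alias found in hosts-file style lines."""
    for line in lines:
        # Drop anything after '#', then split the rest on whitespace.
        fields = line.split('#', 1)[0].split()
        # fields[0] is the address; the remaining fields are names/aliases.
        # Blank and comment-only lines give an empty list and yield nothing.
        for name in fields[1:]:
            yield name


def is_ipv4(text):
    """Optional check: four dot-separated decimal octets, each 0..255."""
    parts = text.split('.')
    if len(parts) != 4:
        return False
    return all(p.isascii() and p.isdigit() and int(p) <= 255 for p in parts)
```

Typical use would be something like list(hostnames(open('/etc/hosts'))).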

However that's just me. I'm not averse to regular expressions, I've written
some real mammoths from time to time, but I do avoid them when there are
simpler clearer alternatives.

And FYI, the OP pattern does match ' (foo123)'
Ok that's totally unfair :D You're right I made a mistake. Still the
comment is absolutely required (provided it's correct).

Yes, the comment would have been good had it been correct. I'd also go for
a named group as that provides additional context within the regex.

Also if there are several similar regular expressions in the code, or if
they get too complex I'd build them up in parts. e.g.

OCTET = r'(?:\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])'
ADDRESS = (OCTET + r'\.') * 3 + OCTET
HOSTNAME = r'[-a-zA-Z0-9]+(?:\.[-a-zA-Z0-9]+)*'
# could use \S+ but my Linux manual says
# alphanumeric, dash and dots only
.... and so on ...

which provides another way of documenting the intentions of the regex.

BTW, I'm not advocating that here, the above patterns would be overkill,
but in more complex situations that's what I'd do.
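
For illustration only, here is one way such parts might be combined into
a full line pattern with named groups. HOSTS_LINE and parse_line are
hypothetical names, and the OCTET alternatives are listed longest-first
here (an assumption of this sketch) so that an octet never stops matching
after its first digit:

```python
import re

OCTET = r'(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)'
ADDRESS = (OCTET + r'\.') * 3 + OCTET
HOSTNAME = r'[-a-zA-Z0-9]+(?:\.[-a-zA-Z0-9]+)*'

# address, then one or more whitespace-separated names, then an
# optional trailing comment
HOSTS_LINE = re.compile(
    r'\s*(?P<address>' + ADDRESS + r')'
    + r'(?P<names>(?:[ \t]+' + HOSTNAME + r')+)'
    + r'\s*(?:#.*)?$'
)


def parse_line(line):
    """Return (address, [names]) for an IPv4 hosts line, else None."""
    m = HOSTS_LINE.match(line)
    if m is None:
        return None
    return m.group('address'), m.group('names').split()
```

The named groups mean the calling code reads m.group('address') rather
than m.group(1), which documents the intent at the point of use as well.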

Duncan Booth
