Regexp: unexpected splitting of string into several groups

From: Piet (pit.grinja_at_gmx.de)
Date: 05/31/04


Date: 31 May 2004 04:41:11 -0700

Hello,
I have a very strange problem with regular expressions. The problem
consists of analyzing the properties of columns of a MySQL database.
When I request the column type, I get back a string with the following
composition:
vartype(width[,decimals]|list) further variable attributes.
vartype is a simple string (varchar, tinyint, ...) which might be
followed by a string in parentheses. This bracketed string is
either composed of a single number, two numbers separated by a comma,
or a list of strings separated by a comma. After the bracketed string,
there might be a list of further strings (separated by blanks)
describing some more properties of the column.
Typical examples are:
char(30) binary
int(10) zerofill
float(3,2)...
I would like to extract the vartype, the bracketed string and the
further properties separately and thus defined the following regular
expression:
#snip
import re
vartypePattern = re.compile(r"([a-zA-Z]+)(\(.*\))*([^(].*[^)])")
vartypeSplit = vartypePattern.match("float(3,2) not null")
#snip
That works for some inputs that contain a bracketed part. E.g. the
above call gives back:
vartypeSplit.groups() = ('float', '(3,2)', ' not null').
However, simple one-word inputs like
vartypeSplit = vartypePattern.match("float")
are always split into two strings. The result is:
vartypeSplit.groups() = ('flo', None, 'at').
I would have expected either ('float', None, None) or ('float', '', '').
For other strings, the last two characters likewise end up in a
separate group.
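The split can be reproduced with a reduced pattern, which suggests where
it comes from: the third group, ([^(].*[^)]), is not optional and needs
at least two characters, so the first group has to give two back. A
minimal sketch (the names tail_only and full are made up for this
illustration; fullmatch needs Python 3.4 or later):

```python
import re

# The third group on its own: one character that is not "(",
# then anything, then one character that is not ")".
tail_only = re.compile(r"([^(].*[^)])")

# A single character cannot satisfy it; two characters can.
print(tail_only.fullmatch("t"))             # None
print(tail_only.fullmatch("at").group(1))   # at

# In the full pattern, [a-zA-Z]+ first takes all of "float". The third
# group then has nothing left to match, so the engine backtracks and
# takes characters back from the first group until the match succeeds.
full = re.compile(r"([a-zA-Z]+)(\(.*\))*([^(].*[^)])")
print(full.match("float").groups())         # ('flo', None, 'at')
```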
Is this a bug or a feature? ;-)
Can anybody point me in the right direction to solve the problem?
Many thanks
Piet
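
One possible direction, offered only as a sketch: make the bracketed
part and the trailing attributes optional, so that a bare type name
leaves nothing to take back from the first group. It assumes that the
type name consists of letters only and that the bracketed part contains
no ")" character. The names column_type and split_column_type are
hypothetical, and unlike the original pattern the second group is
returned without its parentheses.

```python
import re

# Assumed shape: letters, an optional "(...)" part, optional attributes.
# The bracketed part is wrapped in an optional non-capturing group, and
# the attribute group may match the empty string.
column_type = re.compile(r"([a-zA-Z]+)(?:\(([^)]*)\))?\s*(.*)")

def split_column_type(text):
    """Return (vartype, bracket contents or None, remaining attributes)."""
    match = column_type.fullmatch(text)
    if match is None:
        raise ValueError("unrecognised column type: %r" % text)
    return match.groups()

print(split_column_type("float"))                # ('float', None, '')
print(split_column_type("float(3,2) not null"))  # ('float', '3,2', 'not null')
print(split_column_type("char(30) binary"))      # ('char', '30', 'binary')
```

The remaining attributes come back as one string; calling .split() on it
would give the individual words.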


