Re: for a laught (???)



Yes, indeed Rick, you have correctly stated the OC position.
Moreover, whenever possible, OC uses POSIX C functions
to enable porting to a wide variety of platforms.
Here are some comments.
Snip from POSIX regex -
------------
NAME
regcomp, regexec, regerror, regfree - POSIX regex functions

SYNOPSIS
#include <sys/types.h>
#include <regex.h>

int regcomp(regex_t *preg, const char *regex, int cflags);
int regexec(const regex_t *preg, const char *string, size_t nmatch,
            regmatch_t pmatch[], int eflags);
size_t regerror(int errcode, const regex_t *preg, char *errbuf,
                size_t errbuf_size);
void regfree(regex_t *preg);

POSIX REGEX COMPILING
regcomp is used to compile a regular expression into a form that is
suitable for subsequent regexec searches.

regcomp is supplied with preg, a pointer to a pattern buffer storage
area; regex, a pointer to the null-terminated string and cflags,
-------------

Note that regex is a null-terminated string, so it is NOT possible to
use a null byte within it, not even an escaped one.
Further -
Snip -

---------
POSIX REGEX MATCHING
regexec is used to match a null-terminated string against the
precompiled pattern buffer, preg.
---------
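
To make the limitation concrete, here is a minimal sketch using only the
functions from the man page above. The helper name posix_match is made up
for this illustration and is not part of OC. Because regexec takes an
ordinary C string, anything after the first null byte in the subject is
never examined.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
   Returns 1 if `pattern` matches somewhere in `subject`, 0 if it does not,
   and -1 if the pattern fails to compile. Both arguments are ordinary C
   strings, so regcomp and regexec stop reading at the first null byte. */
int posix_match(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: we only want a yes/no answer, not match offsets. */
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With the seven-byte field "AAA\0BBB", a search for "BBB" fails even though
those bytes are present, since regexec only sees "AAA". It succeeds only
when the search is restarted past the null byte.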

Even disregarding the null-byte problem, we were jumping through hoops
using regex. For example, the Cobol delimiter(s) can be any character, including
those that are RE metacharacters - ^.[$()|*+?{\ (basic set). These, of course,
have to be tested for and suitably escaped. Note also that any particular regex
implementation may define more metacharacters (extra set).
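
As a rough sketch of the kind of hoop involved (not the code OC actually
used), a delimiter would have to be passed through something like the
following before being handed to regcomp. The function name
escape_delimiter is invented for this example, and it assumes the pattern
is compiled with REG_EXTENDED, where a backslash in front of any character
of the basic set makes it literal.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
   Copies `delim` into `out`, putting a backslash in front of each character
   of the basic set ^.[$()|*+?{\ . Returns the escaped length, or (size_t)-1
   if `out` (of size `outsz`) is too small. Assumes REG_EXTENDED syntax.
   Any implementation-specific "extra set" is NOT handled here. */
size_t escape_delimiter(const char *delim, char *out, size_t outsz)
{
    static const char special[] = "^.[$()|*+?{\\";
    size_t n = 0;

    for (; *delim != '\0'; delim++) {
        if (strchr(special, *delim) != NULL) {
            if (n + 1 >= outsz)
                return (size_t)-1;
            out[n++] = '\\';
        }
        if (n + 1 >= outsz)
            return (size_t)-1;
        out[n++] = *delim;
    }
    if (n >= outsz)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

Note that the input is itself a C string, so this approach still cannot
express a LOW-VALUE delimiter at all.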

Bottom line: for OC we rewrote this using standard C code, which turned
out to be simpler and an order of magnitude faster at runtime.
(For OC 0.33, the code is in libcob/strings.c)
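
For readers who do not want to dig through the source, the general idea
can be sketched as below. This is NOT the libcob code, just an
illustration under the assumption that both the field and the delimiter
carry explicit lengths. The name find_delimiter is invented here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
   Finds the first occurrence of `delim` (dlen bytes) in `src` (slen bytes).
   Returns its offset, or slen if it does not occur. Lengths are explicit,
   so any byte value, including 0x00 (LOW-VALUE), may be a delimiter, and
   no character needs escaping. */
size_t find_delimiter(const unsigned char *src, size_t slen,
                      const unsigned char *delim, size_t dlen)
{
    size_t i;

    if (dlen == 0 || dlen > slen)
        return slen;
    for (i = 0; i + dlen <= slen; i++) {
        if (memcmp(src + i, delim, dlen) == 0)
            return i;
    }
    return slen;
}
```

Searching the seven-byte field "AAA\0BBB" for a one-byte 0x00 delimiter
returns offset 3, and a "*" delimiter is treated as just another byte.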

Roger



"Rick Smith" <ricksmith@xxxxxxx> wrote in message
news:137ji6im2o5qs93@xxxxxxxxxxxxxxxxxxxxx

"Pete Dashwood" <dashwood@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:5dsnsfF34noa5U1@xxxxxxxxxxxxxxxxxxxxx

"Rick Smith" <ricksmith@xxxxxxx> wrote in message
news:137h6ukjstdb84d@xxxxxxxxxxxxxxxxxxxxx

"Pete Dashwood" <dashwood@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:5dr8c2F367vg3U1@xxxxxxxxxxxxxxxxxxxxx

"Rick Smith" <ricksmith@xxxxxxx> wrote in message
news:137g387h00alt56@xxxxxxxxxxxxxxxxxxxxx

"Pete Dashwood" <dashwood@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:5dpuatF34gd3gU1@xxxxxxxxxxxxxxxxxxxxx

"Roger While" <simrw@xxxxxxxxxxxx> wrote in message
news:f5802v$8co$00$1@xxxxxxxxxxxxxxxxxxxx
Be aware of the limitations of regex.

I believe I am so aware :-)

I'll let you into a little secret.
We used to use regex in OC for the runtime component
of UNSTRING.
UNTIL we came across -
UNSTRING ... DELIMITED BY LOW-VALUE ...
:-)

Regex doesn't work too well with a null byte delimiter :-)

I think one of us is missing something here.

Yes, Mr Dashwood, you are! <g>

"regex" is the name of the header file used to include
the API (?) for a particular form of regular expressions
into C programs.

"RegEx" is, apparently, the function (or method) name
for processing a particular form of regular expressions
in C#.

After further investigation, I found that "RegEx" (spelled
Regex in a C# function a couple of days ago) is a class, or
whatever it is called in C#, and not a function name as I
stated above.

It is indeed a Class (one of several concerned with Regular Expressions) in
the System.Text.RegularExpressions namespace of the DotNET Framework Class
Library (FCL) but, not being of a pedantic nature, and realising what you
meant, I allowed your loose use of "function" as being near enough, didn't
correct you on it, and even used it myself for the sake of our
conversation....(I sometimes wish that people here would cut me the same
slack I cut them :-)). It doesn't matter, when what we were actually
discussing is the use of Regular Expressions with null terminated strings.

"regex" is not the same as "RegEx".

This assertion remains unchanged, however.

OK, point taken. But I was pretty clear about the fact that I was talking
about the Microsoft implementation.

You may be able to keep this clear in your mind if you
remember to prefix your knowledge with MVO
(Microsoft's Version Of); thus, rephrasing your following
statement, "[(MVO) regular expressions] works fine with
null delimited, ..."

I qualified that below the statement, rather than at the start, with the
phrase: "...However, my experience is with the MS RegEx engine (which some
consider to be a perversion :-)) and the engine you were using may have
been more limited...:-)"

Does any part of that NOT limit my discussion to MS?

Regardless of your experience, to follow Mr While's
"Regex doesn't work ..." with your "RegEx works fine ..."
is a non sequitur that arises from "one of us is missing
something here." That is what I attempted to address.

No, I disagree. It is incorrect to state, without any qualification, that
Regex doesn't work with null terminated strings. I refuted it with a
specific example which DOES work with null-terminated strings. I then gave
examples of how you could get it to work with nulls in general. There is no
non-sequitur in that.

The problem was not null terminated strings. The problem
was embedded null characters ("null byte delimiter"). You
recognized this earlier (later in this post) when you wrote
"RegEx works fine with null delimited, or even null
embedded strings ...". The qualification was implicit; that is,
it is public knowledge that Open Cobol is written in C
(the source code is available for download) and that regex
was used in Open Cobol, but is no longer in use; therefore,
the statement concerning regex was limited to the C language.
What you claim to be a refutation was what may be done
with C#.

It does not follow that a problem doing something in C
may be overcome by using techniques available in C#
and not available in C.

[snip]
RegEx works fine with null delimited, or even null embedded strings,
PROVIDED you cater for nulls in the RegEx expression. (You may need to
include an escape...\0x00 if embedded, or $ if terminated by null. $
actually represents the "null string at the end of the string"; if you want
just the end of the string itself, specify \z). Some implementations of
RegEx allow you to specify control parameters to the RegEx engine, and one
of these parameters is whether or not strings for matching are to be null
terminated.




