Detecting potential regular expression matches?



I have a server-type Tcl script that needs to accept socket
connections from several different beasts.

Most of them identify themselves pretty much straight off with a hello
keyword, one waits until the end of the line to throw in its magical
keyword, and now I need to add support for two binary streams. Both
binary streams are guaranteed to include a regexp'able pattern within
the first couple of KB of data, and look very un-textish within the
first hundred bytes or so.

What I would like to be able to do is determine not only whether a
particular RE matches the incoming data stream, but also whether it
MIGHT match if we receive some more data.


What I do now is have a general wrapper on the readable file events
for the socket connection. The wrapper reads in whatever data is
available and hands it to the proper handler function, which was passed
to it as an argument. The default ("new connection") handler, in turn,
adds that data to a buffer and then looks over it with a bunch of
regexps. If one matches, it reconfigures the readable file event to
use that handler instead, calls an initialisation procedure to let the
handler configure the connection to its liking, and then calls the
handler with the contents of the buffer (which it then destroys) to get
things rolling.
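
The arrangement described above might look roughly like the sketch
below. This is only an illustration, not the actual script: the proc
names (readWrapper, newConnection, identify), the "Init" suffix for the
initialisation procedure, and both patterns are made up for the example.

```tcl
# Hypothetical table of {pattern handlerName} pairs; both patterns are
# invented for this example.
set handlers {
    {^HELLO[ \r\n]}        greetingHandler
    {^[^\n]*MAGIC\r?\n}    trailerHandler
}

# Return the handler whose pattern matches the buffer, or "" if none does.
proc identify {buffer handlers} {
    foreach {pattern handler} $handlers {
        if {[regexp -- $pattern $buffer]} {
            return $handler
        }
    }
    return ""
}

# General wrapper on the readable event: read whatever is available and
# hand it to the handler that was passed in as an argument.
proc readWrapper {sock handler} {
    global buffers
    set data [read $sock]
    if {$data eq "" && [eof $sock]} {
        unset -nocomplain buffers($sock)
        close $sock
        return
    }
    $handler $sock $data
}

# Default ("new connection") handler: buffer the data, try every pattern,
# and switch the readable event over to the first handler that matches.
proc newConnection {sock data} {
    global buffers handlers
    append buffers($sock) $data
    set handler [identify $buffers($sock) $handlers]
    if {$handler eq ""} {
        return
    }
    set pending $buffers($sock)
    unset buffers($sock)
    fileevent $sock readable [list readWrapper $sock $handler]
    ${handler}Init $sock    ;# assumed naming scheme for the init proc
    $handler $sock $pending
}
```

The sketch assumes the accept callback has already put the socket into
non-blocking, binary mode (fconfigure $sock -blocking 0 -translation
binary) and installed the readable event as
[list readWrapper $sock newConnection].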

The problem is what happens when I don't recognise the incoming data.
At the moment, the new connection handler checks the size of the
buffer, and dumps the connection if the buffer exceeds a certain size.
It also has a timeout running to dump the connection if it isn't
recognised within a certain time frame. What I would like to be able to
do is start off with a list of all the known handlers, and knock them
off the list as they get ruled out. Then, instead of having to fill a
buffer to a certain point or sit there waiting for a timeout
(which is what happens when someone telnets to the server and messes up
their entry), I can dump the connection as soon as it runs out of
potential matches. The binary streams look like binary streams very
early on, so all non-matching text connections can be dumped at the
first end-of-line.
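
To make the idea concrete, here is one way the knock-them-off-the-list
approach could be approximated with plain regexp. As far as I know the
regexp command has no "might still match" mode, so this sketch assumes
each candidate carries a second, hand-written pattern that accepts every
buffer which could still grow into a match. The proc name narrow, the
candidate names, and all four patterns are hypothetical; the binary
streams would need their own pair of patterns.

```tcl
# Hypothetical candidates: name, match pattern, "still viable" pattern.
# The viable pattern must accept any buffer that could still become a match.
set candidates {
    greeting {^HELLO[ \r\n]}        {^(H(E(L(LO?)?)?)?)?$}
    trailer  {^[^\n]*MAGIC\r?\n}    {^[^\n]*\Z}
}

# Check the buffer against the remaining candidates. Returns one of:
#   {match name}   a candidate matched
#   {wait  list}   nothing matched yet; list holds the still-viable candidates
#   {dump  {}}     every candidate has been ruled out
proc narrow {buffer candidates} {
    set remaining {}
    foreach {name matchRe viableRe} $candidates {
        if {[regexp -- $matchRe $buffer]} {
            return [list match $name]
        }
        if {[regexp -- $viableRe $buffer]} {
            lappend remaining $name $matchRe $viableRe
        }
    }
    if {[llength $remaining] == 0} {
        return [list dump {}]
    }
    return [list wait $remaining]
}
```

The new connection handler would keep the list from a "wait" result and
pass it back in on the next read, so ruled-out candidates are never
tested again, and it would close the socket as soon as it sees "dump".
The \Z in the trailer's viable pattern matches only at the very end of
the buffer, so that candidate is dropped at the first newline that does
not complete a match.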

I can still do it, for example, by checking the buffer for either an
end-of-line or binary-looking data and applying different constraints
as appropriate, but I think it's going to be an awful lot more fiddly,
and slightly less reliable, that way.
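
For comparison, the fiddly fallback could be sketched like this. It
assumes "binary-looking" means any byte outside printable ASCII, tab, CR
and LF, and that it only runs after all the regexps have failed to
match. The proc names and the result words are made up for the example.

```tcl
# True if the buffer contains a byte outside printable ASCII, tab, CR, LF.
# This definition of "binary-looking" is an assumption.
proc looksBinary {buffer} {
    return [regexp -- {[^\x20-\x7e\t\r\n]} $buffer]
}

# Decide what to do with a buffer that no handler has recognised yet.
# maxBinary is the number of bytes a binary stream is allowed before its
# pattern must have shown up.
proc classify {buffer maxBinary} {
    if {[looksBinary $buffer]} {
        if {[string length $buffer] > $maxBinary} {
            return dump
        }
        return wait-binary
    }
    if {[string first "\n" $buffer] >= 0} {
        # A complete text line arrived and nothing recognised it.
        return dump
    }
    return wait-text
}
```

With something like classify $buffer 2048, an unrecognised text
connection is dropped at its first end-of-line, while a binary-looking
one gets the full couple of KB before being dropped.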


Fredderic