Re: String parsing question

From: Dan Pop (Dan.Pop_at_cern.ch)
Date: 10/14/03


Date: 14 Oct 2003 15:29:34 GMT

In <bmh0cj$t31$1@chessie.cirr.com> Christopher Benson-Manica <ataru@nospam.cyberspace.org> writes:

>I'm wondering about the best way to do the following:
>
>I have a string delimited by semicolons. The items delimited may be in any of
>the following formats:
>1) 14 alphanum characters
>2) 5 alphanums space 8 alphanums
>3) 6 alphanums colon 8 alphanums
>4) 5 alphanums colon 8 alphanums
>
>My task is to convert items in the third format to the first format, and items
>in the fourth format to the second. Also, I need to count the number of items
>in the string, which may or may not have a trailing semicolon.
>
>My plan (which I feel is sub-optimal - hence this post), is to step through
>the initial string one character at a time to accomplish these things in one
>pass. While I could count semicolons easily with strchr(), deleting the
>colons properly means stepping through the whole string anyway (right?) and so
>I may as well count semicolons simultaneously. I'd also like to validate the
>data format (i.e., 15-character items are not allowed).
>
>int myfunc( const char *list )
>{
>    int items=0;
>    char *cp=strdup( list ); /* nonstandard */
>    char *newstr=cp;
>    int shifts=0;
>    int chars=0;
>
>    for( ; *cp ; cp++ ) {
>        if( *cp == ':' ) {
>            if( chars == 6 ) {
>                shifts++;
>                continue;
>            }
>            if( chars == 5 ) {
>                *(cp-shifts)=' ';
>                chars++;
>                continue;
>            }
>            return( -1 ); /* error */
>        }
>        if( *cp == ';' ) {
>            items++;
>            if( chars != 14 ) {
>                return( -1 ); /* error */
>            }
>            chars=0;
>        }
>        else if( ++chars > 14 ) {
>            return( -1 ); /* error */
>        }
>        *(cp-shifts)=*cp;
>    }
>    *(cp-shifts)='\0';
>    if( chars == 14 ) {
>        items++;
>    }
>    if( !items || (chars && chars != 14) ) {
>        return( -1 ); /* error */
>    }
>    printf( "The string '%s' has %d items.", newstr, items );
>    free( newstr );
>    return( 0 ); /* success */
>}
>
>Is there a better way?

1. Such code is a maintenance nightmare (imagine having to make some
   changes 5 years from now).

2. I may be missing something, but I can't find any attempt to test that
   your characters really are alphanums; you're merely looking for your
   separators.

I would implement this function using sscanf calls. The result would be
slower, but a lot more readable. The conversion specifier for
alphanumerics can use the following macro:

    #define ALNUM "[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789]"
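
A minimal sketch of what the sscanf approach might look like, using the
ALNUM macro above. The names convert_item and convert_list are made up
for illustration, and the exact handling is an assumption: each item is
matched with a single sscanf call, %n records how many characters each
field consumed, and %c picks up the separator without skipping white
space. A trailing semicolon is accepted on input but not reproduced in
the output.

```c
#include <stdio.h>
#include <string.h>

#define ALNUM "[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789]"

/* Hypothetical helper: normalize one item (no semicolons) into out.
   Returns 0 on success, -1 if the item matches none of the formats. */
int convert_item(const char *item, char out[15])
{
    char a[15], b[9], sep;
    int n1 = 0, n2 = 0;
    int r = sscanf(item, "%14" ALNUM "%n%c%8" ALNUM "%n",
                   a, &n1, &sep, b, &n2);

    /* Format 1: exactly 14 alphanumerics and nothing else. */
    if (r == 1 && n1 == 14 && item[14] == '\0') {
        strcpy(out, a);
        return 0;
    }
    /* All other formats: prefix, separator, exactly 8 alphanumerics. */
    if (r != 3 || n2 - n1 - 1 != 8 || item[n2] != '\0')
        return -1;
    if (n1 == 5 && (sep == ' ' || sep == ':'))   /* formats 2 and 4 */
        sprintf(out, "%s %s", a, b);
    else if (n1 == 6 && sep == ':')              /* format 3 */
        sprintf(out, "%s%s", a, b);
    else
        return -1;
    return 0;
}

/* Hypothetical driver: writes the normalized list to out, which must
   hold at least strlen(list) + 1 bytes.  Returns the number of items,
   or -1 on any format error (including an empty list). */
int convert_list(const char *list, char *out)
{
    int items = 0;
    char *p = out;

    *p = '\0';
    while (*list != '\0') {
        char item[16], fixed[15];
        size_t len = strcspn(list, ";");   /* length up to next ';' */

        if (len >= sizeof item)            /* longest valid item is 15 */
            return -1;
        memcpy(item, list, len);
        item[len] = '\0';
        if (convert_item(item, fixed) != 0)
            return -1;
        if (items > 0)
            *p++ = ';';
        memcpy(p, fixed, 14);              /* every normalized item is 14 chars */
        p += 14;
        *p = '\0';
        items++;
        list += len;
        if (*list == ';')
            list++;                        /* skip the delimiter */
    }
    return items > 0 ? items : -1;
}
```

Called as convert_list("ABCDE:12345678;ABCDEF:12345678;", buf), this
should return 2 and leave "ABCDE 12345678;ABCDEF12345678" in buf. Since
the scanset only accepts alphanumerics, an item such as "ABCD_:12345678"
is rejected rather than silently passed through.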

Dan

-- 
Dan Pop
DESY Zeuthen, RZ group
Email: Dan.Pop@ifh.de

