Re: atoi return



Richard Heathfield <rjh@xxxxxxxxxxxxxxx> writes:

> CBFalconer said:
>
>> Richard Heathfield wrote:
>>> CBFalconer said:
>>>
>>> <snip>
>>>
>>>> atoi is always safe if you limit the input string to
>>>> length 4,
>>>
>>> Rubbish. Even a length of 1 isn't guaranteed to be safe. The
>>> behaviour of atoi("X") is undefined.
>>
>> You are slow today. The "X" contains no digits, so atoi accurately
>> returns zero.
>
> Wrong. It would be accurate to return 0 for "0", and indeed for "0X", since
> the "initial portion of the string" (as the Standard has it) can be
> represented as an int. But for "X", the 'X' marks a non-convertible
> character, so any "initial portion of the string" must precede it, but
> there isn't any string portion preceding it. Since the behaviour is
> undefined, atoi *may* return 0, but it is not required to do that.

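7.20.1.2 p 2 specifies that, except for the behavior on error, atoi(nptr)
is equivalent to (int)strtol(nptr, (char **)NULL, 10). This issue can
therefore be resolved by a reading of 7.20.1.4, viz.,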

7.20.1.4 p 2 -

The strtol, strtoll, strtoul, and strtoull functions convert
the initial portion of the string pointed to by nptr to long
int, long long int, unsigned long int, and unsigned long long
int representation, respectively. First, they decompose the
input string into three parts: an initial, possibly empty,
sequence of white-space characters (as specified by the isspace
function), a subject sequence resembling an integer represented
in some radix determined by the value of base, and a final
string of one or more unrecognized characters, including the
terminating null character of the input string. Then, they
attempt to convert the subject sequence to an integer, and
return the result.
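
To make the three-part decomposition of p 2 concrete, here is a rough sketch for base 10. The names decompose10 and struct parts are hypothetical, invented for this illustration; the sketch relies only on strtol's endptr and on isspace, and assumes the "C" locale.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical result type: the three parts of 7.20.1.4 p 2. */
struct parts {
    size_t ws_len;       /* leading white space, possibly empty */
    size_t subject_len;  /* subject sequence, possibly empty */
    const char *final;   /* unrecognized remainder, ends at the '\0' */
};

/* Hypothetical helper: measures the parts of s for base 10, using
   strtol's endptr to find where the subject sequence stops. */
static struct parts decompose10(const char *s)
{
    struct parts r;
    const char *p = s;
    char *end;

    while (isspace((unsigned char)*p))  /* part 1: white space */
        p++;
    r.ws_len = (size_t)(p - s);

    (void)strtol(s, &end, 10);
    if (end == s) {
        /* Empty subject sequence: strtol stores nptr itself in *endptr
           (p 7), so the final string starts right after the white space. */
        r.subject_len = 0;
        r.final = p;
    } else {
        r.subject_len = (size_t)(end - p);
        r.final = end;
    }
    return r;
}
```

For "  42abc" this reports two white-space characters, a two-character subject sequence "42" and the final string "abc"; for "X" both the white space and the subject sequence are empty, and the final string is the whole input.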

7.20.1.4 p 4 -

The subject sequence is defined as the longest initial subsequence
of the input string, starting with the first non-white-space
character, that is of the expected form. The subject sequence
contains no characters if the input string is empty or consists
entirely of white space, or if the first non-white-space character
is other than a sign or permissible letter or digit.
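
The emptiness rule of p 4 can also be written out directly. The hypothetical helper subject_len10 below hand-codes the base-10 expected form (an optional sign followed by at least one decimal digit) without calling strtol; it is only a sketch and ignores the additional forms p 6 permits outside the "C" locale. Note that "+X" also gives an empty subject sequence, since "+" alone does not represent an integer.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the base-10 subject sequence of s,
   counted after any leading white space. Returns 0 when the subject
   sequence is empty. */
static size_t subject_len10(const char *s)
{
    const char *p = s;
    const char *q;

    while (isspace((unsigned char)*p))  /* skip leading white space */
        p++;
    q = p;
    if (*q == '+' || *q == '-')         /* optional sign */
        q++;
    if (!isdigit((unsigned char)*q))    /* no digits: not the expected form */
        return 0;
    while (isdigit((unsigned char)*q))
        q++;
    return (size_t)(q - p);
}
```

It returns 0 for "", "   ", "X" and "+X", but 1 for "0X", which is exactly the "0X" versus "X" distinction drawn in the quoted exchange.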

7.20.1.4 p 7 -

If the subject sequence is empty or does not have the expected form,
no conversion is performed; the value of nptr is stored in the object
pointed to by endptr, provided that endptr is not a null pointer.

7.20.1.4 p 8 -

The strtol, strtoll, strtoul, and strtoull functions return the
converted value, if any. If no conversion could be performed,
zero is returned. If the correct value is outside the range of
representable values, LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX,
ULONG_MAX, or ULLONG_MAX is returned (according to the return
type and sign of the value, if any), and the value of the macro
ERANGE is stored in errno.
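
The range check in p 8 is only detectable if errno is cleared before the call, because no library function ever sets errno to zero (7.5). A minimal sketch, with long_out_of_range as a hypothetical helper name and base 10 assumed:

```c
#include <errno.h>
#include <limits.h>  /* LONG_MIN, LONG_MAX for callers checking *out */
#include <stdlib.h>

/* Hypothetical helper: stores strtol's result in *out and returns
   nonzero if the correct value was outside the range of long. */
static int long_out_of_range(const char *s, long *out)
{
    errno = 0;                 /* strtol never resets errno itself */
    *out = strtol(s, NULL, 10);
    return errno == ERANGE;
}
```

On common implementations "99999999999999999999999" exceeds LONG_MAX, so the helper reports an overflow and *out is LONG_MAX; for "X" nothing is converted, *out is 0 and ERANGE is not reported.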


7.20.1.4 p 2 and p 4 make clear that "the initial portion" may be of
zero length.

By 7.20.1.4 p 4, the subject sequence is empty.

By 7.20.1.4 p 7, no conversion is performed.

By 7.20.1.4 p 8, the return value is zero.
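
Because atoi itself gives no way to tell "0" from "X", or to detect overflow, code that must distinguish those cases usually calls strtol directly. The following is a minimal sketch of such a wrapper; int_from_string and enum conv_status are hypothetical names, and it assumes base 10 and that trailing characters are acceptable, as they are for atoi.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical status codes for int_from_string. */
enum conv_status { CONV_OK, CONV_NONE, CONV_RANGE };

/* Converts like (int)strtol(s, NULL, 10), but reports what atoi cannot:
   an empty subject sequence (7.20.1.4 p 7) and values that fit in
   neither long nor int. On CONV_NONE, *out is 0, as strtol returns. */
static enum conv_status int_from_string(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s) {                     /* no conversion performed */
        *out = 0;
        return CONV_NONE;
    }
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX) {
        *out = v < 0 ? INT_MIN : INT_MAX;  /* clamp, as strtol does */
        return CONV_RANGE;
    }
    *out = (int)v;
    return CONV_OK;
}
```

Under this sketch the four-character limit mentioned upthread only affects the range check (any string of at most four decimal digits fits in an int, since INT_MAX is at least 32767); it does nothing to separate "X" from "0".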
