Re: set character at string position



I'm still confused.

binary format returns a string too, not a list:

set data [binary format c* {0 255 200}]

(I use c and S format specifiers with binary format/scan)

[string length $data] always seems to return the number of bytes, no matter
whether it contains byte values 0 or larger than 127.

So then what's wrong with inverting the second byte this way:

set data [string replace $data 1 1 [expr [string index $data 1]^0xff]]

So will I have problems assuming string index will always return a single
byte? How does [string] know whether my string contains bytes or UTF-8
characters?

Lisa
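
For reference, here is a minimal sketch of one way to invert the second byte by working on its numeric value instead of on the character that [string index] returns. It assumes Tcl 8.5 or later (for the "u" unsigned modifier of [binary scan]) and that $data holds only byte values, as it does when it comes from [binary format]. The variable names "byte" and "inverted" are made up for the illustration.

```tcl
# Build a 3-byte binary string holding the byte values 0, 255 and 200.
set data [binary format c* {0 255 200}]

# Jump to offset 1 and read that single byte as an unsigned number.
# (Assumption: Tcl 8.5+, where the "u" modifier is available.)
binary scan $data "@1 cu" byte

# Invert the number, then turn the result back into a one-byte string.
set inverted [binary format c [expr {$byte ^ 0xff}]]

# Splice the new byte into position 1.
set data [string replace $data 1 1 $inverted]

# Show the result as a list of unsigned byte values.
binary scan $data cu* bytes
puts $bytes
puts [string length $data]
```

The point of the sketch is the round trip: [binary scan] turns the byte into a number, [expr] does the arithmetic, and [binary format] turns the number back into a byte.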


"Donald Arseneau" <asnd@xxxxxxxxx> wrote in message
news:yfid5j4spdr.fsf@xxxxxxxxxxxx
> "Lisa Pearlson" <no@xxxxxxxx> writes:
>
>> Hmm, strings in Tcl should be binary safe. I know that to get the byte at
>> position 3, [string index $data 3] may theoretically return more than 1
>> byte if it's a UTF-8 character that cannot be represented with a single
>> byte.
>> However, in which situations would that occur?
>
> Whenever the incoming byte is greater than 127, or when it is 0.
>
>> I receive binary data from socket/stdin, I append each byte received
>> with:
>> I use [binary format] and [binary scan] extensively.
>
> OK, you are far ahead of where your original code suggested.
> In particular, your code: expr [string index $s $i] ^ 0xff
> is nonsense, because [string index] returns a character, not
> a number. It is a syntax error unless the character is a
> digit, and then the corresponding number isn't what you intended.
>
> The confusion stems from C's char data type, which is actually
> a (1-byte) number, not a character at all. Those numbers *may*
> correlate to characters through an obsolete coding scheme like
> ASCII, but not under multi-byte coding as used today.
>
> Anyway, you are doing numeric operations on byte values, and your
> use of [binary scan] indicates you are doing the right thing.
>
> Uwe Klein had a good sample using [binary scan] to generate the
> list of numbers, and [foreach] to process each one.
>
> Such bulk numeric processing is indeed better done in C, and it
> would be good to introduce an [invert] command to your Tcl.
>
>
> --
> Donald Arseneau asnd@xxxxxxxxx
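
The [binary scan] plus [foreach] approach mentioned above could look roughly like the following. This is not Uwe Klein's code, only a sketch of the same idea; the proc name "invertBytes" is hypothetical, and the "u" modifier again assumes Tcl 8.5 or later.

```tcl
# Hypothetical helper: return a copy of a binary string with every byte inverted.
proc invertBytes {data} {
    # Convert the whole string into a list of unsigned byte values.
    binary scan $data cu* bytes

    # Invert each value numerically.
    set out {}
    foreach b $bytes {
        lappend out [expr {$b ^ 0xff}]
    }

    # Pack the list of numbers back into a binary string.
    return [binary format c* $out]
}

set data [binary format c* {0 255 200}]
binary scan [invertBytes $data] cu* result
puts $result
```

For large buffers this loop runs at script speed, which is why a C-level command was suggested for bulk processing.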

