Re: Transmitting strings via TCP from a Windows C++ client to a Java server



Very interesting input, Chris. It does seem
that UTF-8 is the right way for us...


1. Our data will mainly consist of ASCII text

2. It turns out Windows does have an API for converting to and
from UTF-8: see WideCharToMultiByte and MultiByteToWideChar
(the code page should be set to CP_UTF8).

3. Our system does not use DataInputStream, but rather
CharsetEncoder/CharsetDecoder.

4. Each of our messages is indeed preceded by a length field
(a fixed-size text field). The length is measured in Java
characters and doubled to obtain the size in bytes.

5. The BOM issue is, frankly, news to me. If I limit myself to
UTF-8 strings only, and stick to the standard Windows/Java APIs
at both the client and server ends, do I need to worry about
the BOM?
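
To make points 3 to 5 concrete, here is a rough Java sketch of how the
server side might look after a switch to UTF-8. It is only an
illustration, not our actual code: the class name Utf8Framing, the
8-digit zero-padded decimal length field and the method names are all
assumptions made up for this example. Two things it tries to show:
with UTF-8 the length can no longer be derived as "characters times 2",
so the length field carries the encoded byte count; and since Java's
UTF-8 decoder hands a leading BOM back as the character U+FEFF, the
receiver drops it defensively if it ever shows up.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper: frames a string as <fixed-size text length><UTF-8 bytes>.
final class Utf8Framing {
    // Assumed width of the text length field (zero-padded decimal digits).
    static final int LENGTH_FIELD_WIDTH = 8;

    // Encode the text as UTF-8 and prepend the payload size in BYTES.
    static byte[] frame(String text) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer payload;
        try {
            payload = encoder.encode(CharBuffer.wrap(text));
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("text cannot be encoded as UTF-8", e);
        }
        int payloadLength = payload.remaining();
        byte[] header = String
                .format(Locale.ROOT, "%0" + LENGTH_FIELD_WIDTH + "d", payloadLength)
                .getBytes(StandardCharsets.US_ASCII);
        byte[] message = new byte[header.length + payloadLength];
        System.arraycopy(header, 0, message, 0, header.length);
        payload.get(message, header.length, payloadLength);
        return message;
    }

    // Read the length field, decode that many bytes, drop a leading BOM if any.
    static String unframe(byte[] message) {
        String lengthText =
                new String(message, 0, LENGTH_FIELD_WIDTH, StandardCharsets.US_ASCII);
        int payloadLength = Integer.parseInt(lengthText);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        String text;
        try {
            text = decoder
                    .decode(ByteBuffer.wrap(message, LENGTH_FIELD_WIDTH, payloadLength))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("payload is not valid UTF-8", e);
        }
        // Java's UTF-8 decoder keeps a BOM as U+FEFF, so strip it here.
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            text = text.substring(1);
        }
        return text;
    }
}
```

Calling Utf8Framing.unframe(Utf8Framing.frame(s)) should give back s
for any well-formed string. On the wire the receiver would first read
the 8 header bytes, then exactly as many payload bytes as the header
announces. The C++ client would have to write the same header format
and send the bytes produced by WideCharToMultiByte with CP_UTF8.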


Thanks in advance,


Gilad
