Re: Transmitting strings via tcp from a windows c++ client to a Java server
- From: "Chris Uppal" <chris.uppal@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Mon, 20 Feb 2006 12:10:49 -0000
qqq111 wrote:
We have a C++ client which runs on Windows and that needs to transmit
char* / wchar* strings to and from a Java server.
The client should correctly handle both 'standard' languages & east Asian
languages (i.e. using wchar).
The obvious options are:
Use UTF-8.
Advantages: Compact /if/ you send mostly ASCII text. Easily readable (for
debugging) /if/ you send mostly ASCII text. No byte-order issues.
Disadvantages: Consumes more bandwidth if you send mostly non-ASCII. Requires
explicit en/de-coding on the Windows box (perfectly possible, but you have to
write the code for it).
Use UTF-16LE.
Advantages: Compact in the cases where UTF-8 is not. Requires no special
handling in the Windows code, since that's the native format for a wstring.
On the Java side you always have to specify an encoding anyway, so it makes
no difference which encoding you use from the Java point of view.
Disadvantages: Consumes more bandwidth if you send mostly ASCII text.
Without knowing your requirements, I can't guess which option would be best
for you, but I don't think any other options make sense.
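To make the bandwidth trade-off concrete, here is a minimal Java sketch that measures the encoded size of the same string in both encodings. The class and method names are made up for this illustration; only the standard `String.getBytes(Charset)` call is relied on.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Bytes needed to send the string as UTF-8 (no BOM).
    static int utf8Size(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to send the string as UTF-16LE (no BOM).
    static int utf16leSize(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String cjk = "\u65E5\u672C\u8A9E"; // three CJK characters
        System.out.println("ASCII: " + utf8Size(ascii) + " vs " + utf16leSize(ascii));
        System.out.println("CJK:   " + utf8Size(cjk) + " vs " + utf16leSize(cjk));
    }
}
```

For the five ASCII letters UTF-8 needs 5 bytes against 10 for UTF-16LE; for the three CJK characters it is 9 bytes against 6.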
Some other points to consider.
If you choose UTF-8 then don't use java.io.DataInputStream.readUTF() or the
corresponding write method. They don't do what the method names suggest: they
use a length-prefixed "modified UTF-8", not standard UTF-8.
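A short Java sketch of the mismatch, assuming nothing beyond the standard library (the class and helper names are invented for the example). `writeUTF` emits a two-byte length followed by "modified UTF-8", in which U+0000 becomes two bytes and each supplementary character becomes six, so a C++ peer that speaks standard UTF-8 would not interoperate.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Demo {
    // Bytes produced by DataOutputStream.writeUTF:
    // 2-byte big-endian length, then modified UTF-8.
    static byte[] viaWriteUTF(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Bytes produced by a standard UTF-8 encoder.
    static byte[] viaStandardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String nul = "\u0000";
        String supplementary = "\uD83D\uDE00"; // U+1F600, outside the BMP
        System.out.println(viaWriteUTF(nul).length + " vs " + viaStandardUtf8(nul).length);
        System.out.println(viaWriteUTF(supplementary).length + " vs "
                + viaStandardUtf8(supplementary).length);
    }
}
```

The safer route on the Java side is to read the raw bytes yourself and decode them with an explicitly named charset.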
If you choose UTF-16LE then you should consider whether a BOM (byte order mark)
is forbidden, tolerated, or required by your protocol. Alternatively you could
mandate merely UTF-16 (either byte order) and /require/ a BOM -- that would give
you flexibility if you anticipate creating non-Windows clients (which I doubt).
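How the Java end behaves depends on which charset name you pick, so here is a hedged sketch of the two policies. The class and method names are hypothetical. Java's "UTF-16" decoder consumes a leading BOM and uses it to choose the byte order (falling back to big-endian if there is none), whereas its "UTF-16LE" decoder leaves a BOM in the text as the character U+FEFF.

```java
import java.nio.charset.StandardCharsets;

public class Utf16BomDemo {
    // True if the payload starts with a UTF-16 BOM in either byte order.
    static boolean hasBom(byte[] wire) {
        if (wire.length < 2) return false;
        int b0 = wire[0] & 0xFF, b1 = wire[1] & 0xFF;
        return (b0 == 0xFF && b1 == 0xFE) || (b0 == 0xFE && b1 == 0xFF);
    }

    // Policy "UTF-16, BOM required": reject payloads without a BOM, then
    // let the UTF-16 decoder take the byte order from it and consume it.
    static String decodeRequiringBom(byte[] wire) {
        if (!hasBom(wire)) throw new IllegalArgumentException("missing BOM");
        return new String(wire, StandardCharsets.UTF_16);
    }

    // Policy "UTF-16LE": the decoder does not consume a BOM, so a leading
    // one shows up as U+FEFF at the start of the decoded string.
    static String decodeLittleEndian(byte[] wire) {
        return new String(wire, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] le = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00}; // BOM + 'A', little-endian
        System.out.println(decodeRequiringBom(le).length()); // BOM consumed
        System.out.println(decodeLittleEndian(le).length()); // BOM kept as a char
    }
}
```

If the protocol forbids or merely tolerates a BOM under UTF-16LE, the receiver has to check for a leading U+FEFF itself and either reject or drop it.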
If you choose UTF-8 then you should consider whether a BOM is forbidden or
tolerated by your protocol.
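The question matters in practice because Java's UTF-8 decoder does not strip a BOM: the bytes EF BB BF come out as a leading U+FEFF character. A minimal sketch of a "tolerate" policy, with invented class and method names, might look like this:

```java
import java.nio.charset.StandardCharsets;

public class Utf8BomDemo {
    // Plain decode: a UTF-8 BOM, if present, survives as a leading U+FEFF.
    static String decodeRaw(byte[] wire) {
        return new String(wire, StandardCharsets.UTF_8);
    }

    // "Tolerate" policy: decode, then drop one leading U+FEFF if present.
    // Note this also drops a U+FEFF the sender meant as real text.
    static String decodeToleratingBom(byte[] wire) {
        String s = decodeRaw(wire);
        return s.startsWith("\uFEFF") ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(decodeRaw(withBom).length());
        System.out.println(decodeToleratingBom(withBom));
    }
}
```

A "forbid" policy would instead reject any payload whose decoded text starts with U+FEFF.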
If your choice between UTF-8 and -16 is significantly swayed by bandwidth
considerations, then it might be worthwhile considering using zlib compression.
Java already understands that, and it's easy to use the ZLIB1.DLL from Windows
code.
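On the Java side the zlib format is handled by java.util.zip. The following is a rough sketch rather than a finished implementation; the class and method names are made up, and it assumes each message is compressed as one complete zlib stream (the kind that zlib's own compress() and uncompress() functions produce and consume).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ZlibDemo {
    // Compress a whole message into a single zlib stream.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(); // default level, zlib header and checksum
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress a single complete zlib stream.
    static byte[] decompress(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                // No progress and not finished: the stream is cut short
                // or needs a preset dictionary we do not have.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported zlib data");
                }
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] text = "hello hello hello hello hello hello"
                .getBytes(StandardCharsets.UTF_16LE);
        byte[] packed = compress(text);
        System.out.println(text.length + " bytes raw, " + packed.length + " bytes compressed");
    }
}
```

Compression only pays off on reasonably long or repetitive messages; very short strings can come out larger than they went in because of the zlib header and checksum.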
If your protocol is of the form:
<character count><character data>
then you should be very clear about what you mean by a "character", especially
if you use UTF-16 (where there may be more 16-bit wchars / Java chars than
actual Unicode characters). Is the BOM (if any) included in the count?
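To see how far the candidate meanings of "character count" can diverge, here is a small Java sketch (class and method names are invented for the example) that counts the same string three ways.

```java
import java.nio.charset.StandardCharsets;

public class CountDemo {
    // Number of 16-bit units (Java chars / Windows wchars).
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes on the wire if the string is sent as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00"; // 'a' followed by U+1F600
        System.out.println(utf16Units(s) + " units, " + codePoints(s)
                + " code points, " + utf8Bytes(s) + " UTF-8 bytes");
    }
}
```

For this two-character string the three counts are 3, 2 and 5, and a leading BOM would add one more unit and one more code point. Counting bytes is often the least ambiguous choice, since the receiver then knows exactly how much to read before decoding.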
-- chris