Re: Delphi Quiz: SetLength( WideString, 10 );
From: Skybuck Flying (nospam_at_hotmail.com)
Date: 01/17/05
- In reply to: Rob Kennedy: "Re: Delphi Quiz: SetLength( WideString, 10 );"
Date: Mon, 17 Jan 2005 11:52:51 +0100
"Rob Kennedy" <me3@privacy.net> wrote in message
news:3502f5F4gri53U1@individual.net...
> Skybuck Flying wrote:
> > I call a function and the function returns a buffer of bytes.
> >
> > This buffer contains a unicode string. (It's probably a 16 bit unicode
> > windows string)
>
> "Probably"? Find out for sure before you proceed.
I am working on that ;)
Let's assume it's a 16-bit Unicode string, since Windows likes to work with
16-bit Unicode strings.
Delphi's help on wide characters:
"
One approach to working with ideographic character sets is to convert all
characters to a wide character encoding scheme such as Unicode. Unicode
characters and strings are also called wide characters and wide character
strings. In the Unicode character set, each character is represented by two
bytes. Thus a Unicode string is a sequence not of individual bytes but of
two-byte words.
The first 256 Unicode characters map to the ANSI character set. The Windows
operating system supports Unicode (UCS-2). The Linux operating system
supports UCS-4, a superset of UCS-2. Delphi supports UCS-2 on both
platforms. Because wide characters are two bytes instead of one, the
character set can represent many more different characters.
Using a wide character encoding scheme has the advantage that you can make
many of the usual assumptions about strings that do not work for MBCS
systems. There is a direct relationship between the number of bytes in the
string and the number of characters in the string. You do not need to worry
about cutting characters in half or mistaking the second half of a character
for the start of a different character.
The biggest disadvantage of working with wide characters is that Windows
supports a few wide character API function calls. Because of this, the VCL
components represent all string values as single byte or MBCS strings.
Translating between the wide character system and the MBCS system every time
you set a string property or read its value would require additional code
and slow your application down. However, you may want to translate into wide
characters for some special string processing algorithms that need to take
advantage of the 1:1 mapping between characters and WideChars.
"
So apparently:
WideString = UCS-2 = 16-bit Unicode strings
WideChar = UCS-2 = 16-bit Unicode chars
At least in current Delphi versions ;)
The lengthy documentation above says, in short:
WideStrings / UCS-2 / WideChars are guaranteed to always contain 16-bit
characters.
Which makes them "easy" (relative to one's experience/IQ ;)) to work with,
since the characters are fixed in size ;)
However, I do have a question about this:
What about string conversions?
For example:
1. 32-bit Unicode to 16-bit Unicode
or
2. UTF-8 encoding (multi-byte characters) to 16-bit Unicode
Suppose that we have a 32-bit Unicode string or a UTF-8 encoded string which
contains 100,000 different/unique Unicode characters.
Only 65,536 different characters fit into a 16-bit Unicode character
(WideChar), right?
So what will happen to the remaining 34,464 characters?
Possible reasonable answers:
1. They are thrown away?
2. WideChars work a bit differently and reserve some bits, specially meant
for this case, to indicate this scenario?
Funny crazy answer:
3. WideChar would suddenly grow in size :)
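For reference, the standard Unicode answer is close to option 2. UTF-16, the extension of UCS-2, reserves the code point ranges U+D800 to U+DBFF and U+DC00 to U+DFFF as "surrogates". Any code point above U+FFFF, up to U+10FFFF, is stored as a pair of 16-bit units. Nothing is thrown away, but such a character occupies two WideChars, so the 1:1 character-to-WideChar mapping described in the help text no longer holds for it. Whether a given Delphi conversion routine actually emits surrogate pairs is not established here. The C sketch below only shows the encoding rule. `encode_utf16` is a hypothetical helper name, not a library function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Writes 1 or 2 code units to out and returns how many were written;
   returns 0 for values that are not valid Unicode scalar values. */
size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                                /* out of range, or a surrogate value */
    if (cp <= 0xFFFF) {
        out[0] = (uint16_t)cp;                   /* fits in one 16-bit unit */
        return 1;
    }
    cp -= 0x10000;                               /* 20 significant bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, 'A' (U+0041) encodes as the single unit 0x0041, while U+1F600 encodes as the pair 0xD83D 0xDE00. Because Unicode stops at U+10FFFF, a string with 100,000 distinct characters can always be converted without loss. It just needs more WideChars than characters.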
>
> > I would like to copy these bytes to a widestring.
>
> Use the SetString procedure.
>
> var
> w: WideString;
> pw: PWideChar;
> len: Integer;
>
> SetString(w, pw, len);
>
> The first parameter is the WideString variable to be filled. The second
> parameter is the pointer to the first character in the buffer. The third
> parameter is the length of the buffer, in characters. The string will be
> allocated automatically.
How dare you abuse that procedure like that! =D
Have you actually looked at the prototype?
procedure SetString(var s: string; buffer: PChar; len: Integer);
That procedure is meant to initialize "normal" Delphi strings,
which I happen to know are AnsiStrings with default settings.
AnsiStrings are not WideStrings!!!
Also,
PChars are not PWideChars!!!
Furthermore, your code is doubtful ;) :D
Delphi 7's documentation:
"
procedure SetString(var s: string; buffer: PChar; len: Integer);
Description
In Delphi code, SetString sets the contents and length of the given string
variable to the block of characters given by the Buffer and Len parameters.
For a short string variable, SetString sets the length indicator character
(the character at S[0]) to the value given by Len and then, if the Buffer
parameter is not nil, copies Len characters from Buffer into the string
starting at S[1]. For a short string variable, the Len parameter must be a
value between 0 and 255.
For a long string variable, SetString sets S to reference a newly allocated
string of the given length. If the Buffer parameter is not nil, SetString
then copies Len characters from Buffer into the string; otherwise, the
contents of the new string is left uninitialized. If there is not enough
memory available to create the string, an EOutOfMemory exception is raised.
Following a call to SetString, S is guaranteed to reference a unique string,
that is a string with a reference count of one.
"
The Len parameter is in characters (not in bytes).
Delphi Pop Quiz Question:
Assume pw points to a buffer containing 20 bytes (ten 16-bit Unicode chars).
Let's look at your abuse case =D
> var
> w: WideString;
> pw: PWideChar;
> len: Integer;
>
> SetString(w, pw, len);
What should the value of len be in this abuse case? =D
I suspect it would have to be 20 and not 10.
However, 10 should be the correct answer according to the Delphi
documentation.
I have not tried your abuse case yet ;)
I would be surprised if
SetString(w, pw, 10 );
would allocate 20 bytes ;)
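The arithmetic behind this quiz can be checked with a C analogue of SetString. The length argument counts 16-bit characters, and the number of bytes allocated and copied is that count times the element size. So 10 characters means 20 bytes of character data, ignoring any string header and terminator. `wide_from_units` is a hypothetical name, not a Delphi or Windows API. It models only the length arithmetic, not Delphi's actual WideString implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C analogue of SetString(w, pw, len) for 16-bit strings.
   len counts characters (16-bit code units), not bytes; src must point to
   at least len units. The number of bytes used for the characters
   (terminator excluded) is stored in *bytes. The caller frees the result. */
uint16_t *wide_from_units(const uint16_t *src, size_t len, size_t *bytes)
{
    size_t nbytes = len * sizeof *src;             /* 10 characters -> 20 bytes */
    uint16_t *dst = malloc(nbytes + sizeof *src);  /* one extra unit for a 0 terminator */
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, nbytes);                      /* copy len characters = nbytes bytes */
    dst[len] = 0;
    *bytes = nbytes;
    return dst;
}
```

Passing `len = 10` for a 20-byte buffer copies all 20 bytes. Passing 20 would read 40 bytes, running past the end of the buffer.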
>
> If you don't know the length of the buffer but the buffer is
> null-terminated, then you can just assign the one to the other, just
> like with AnsiStrings.
>
> w := pw;
I don't like copying a buffer to a string like that.
If the buffer contains embedded null chars, this line of code will stop at
the first one.
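A small C illustration of the point: a length found by scanning for the terminator stops at the first embedded zero, while a length supplied by the caller does not. `utf16_len_until_null` is a hypothetical helper. On platforms where `wchar_t` is 16 bits, the standard `wcslen` behaves the same way.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: length of a zero-terminated 16-bit string, found the
   same way a "w := pw" style assignment must find it, by stopping at the
   first 0 unit. Anything after an embedded 0 is not counted. */
size_t utf16_len_until_null(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

For a buffer holding `'a','b',0,'c','d'` whose real length is known to be 5, the scan reports 2, so `'c','d'` would be silently lost.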
For my current case the length of the returned buffer is given, so I can use
the Move procedure to copy exactly that amount of bytes into my newly
allocated WideString with SetLength( widestring, buffer_length div 2 ) :)
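The same plan, sketched in C: allocate `bytes / 2` 16-bit units, then copy exactly the given number of bytes, embedded zeros included. `units_from_bytes` is a hypothetical name. The sketch assumes the buffer already holds 16-bit code units in the machine's native byte order (little-endian on x86 Windows). It rejects an odd byte count, since that cannot be a whole number of 16-bit characters.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mirroring SetLength(w, nbytes div 2) followed by a
   Move of nbytes bytes. Assumes native-byte-order 16-bit code units.
   Returns NULL for an odd byte count or if allocation fails.
   The caller frees the result. */
uint16_t *units_from_bytes(const unsigned char *buf, size_t nbytes, size_t *units)
{
    if (nbytes % 2 != 0)
        return NULL;                          /* half a character: reject */
    size_t n = nbytes / 2;                    /* characters = bytes div 2 */
    uint16_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;
    memcpy(w, buf, nbytes);                   /* copy exactly nbytes, zeros included */
    w[n] = 0;                                 /* terminator, not counted in *units */
    *units = n;
    return w;
}
```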
>
> > However the SetLength method is not that well documented ;)
>
> "Sets the length of a string or dynamic-array variable."
>
> Seems pretty clear to me.
I meant the NewLength parameter.
This short line is clear:
"NewLength is the new number of characters or elements in S."
As long as one knows that characters are not the same as bytes!
That's where the confusion could start. I have programmed in C,
and C has no Byte or Word types.
C talks about signed chars and unsigned chars (which are Delphi's ShortInt and Byte).
So when C documentation says char, it means byte.
So one needs to remember that a character is not a char! :)
Also, the rest of the text is a bit of a drag to read, because I have to
read all of it to be sure that there are no exceptions. Borland just loves
putting a little note at the end stating: "Note: in this and this case it's
in bytes." lol =D
>
> > Let's make it a funny delphi quiz =D
> >
> > Delphi Quiz Question 1:
> >
> > Var
> > S : WideString;
> >
> > begin
> > SetLength( S, 10 );
> > end;
> >
> > How many bytes are allocated ?
>
> However many are needed to hold 10 characters. 10 * SizeOf(s[1]); 20
I don't like that code... calling SizeOf on a variable.
SizeOf should be used on types only!? ;)
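For comparison, C's `sizeof` operator accepts either a type name or an expression. With an expression, only its type matters and the expression itself is not evaluated, so both forms below give the same element size. The function names are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for len 16-bit characters, using the type name. */
size_t bytes_by_type(size_t len)
{
    return len * sizeof(uint16_t);
}

/* The same size from an expression: sizeof looks only at the type of
   s[0] and never evaluates it, so s may even be NULL here. */
size_t bytes_by_expr(const uint16_t *s, size_t len)
{
    return len * sizeof s[0];
}
```

Both give 20 for 10 characters. The expression form keeps working if the element type is later changed.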
>
> > Delphi Quiz Question 2:
> >
> > Var
> > S : WideString;
> > begin
> > S := '1234567890';
> > end;
> >
> > How many bytes are allocated ?
>
> Same as above.
Cheater... lol... you have to provide the answer yourself in hard numbers
;)... not as code =D or via code ;)
Oh, I see you did provide a number... 20.
Right, you probably cheated and executed that code, right? ;) :P CHEATER!
=D
> > (Memory Manager Overhead can be ignored ;))
> >
> > From this little experiment below I would say the answer is probably 20
> > bytes.
> >
> > So to allocate a widestring which is big enough to store all bytes in I
> > would probably have to call
> >
> > SetLength( S, BufferBytes div 2 );
>
> If BufferBytes holds the number of bytes in the buffer, then yes, that
> would be the right answer.
>
> > program Project1;
> >
> > {$APPTYPE CONSOLE}
> >
> > uses
> > SysUtils;
> >
> > var
> > Buffer : packed array[0..9] of byte;
> > S : WideString;
> > I : integer;
> > begin
> >
> > // SetLength( S, 10 );
> > // writeln('length: ', Length(S) );
> >
> > S := '1234567890';
> >
> > move( s[1], Buffer, 10 );
>
> OK, good. You've copied half of the string into the array. Remember, you
> already determined above that a Unicode string of 10 characters occupies
> 20 bytes, not 10.
>
> Now copy the other half. Although you started out claiming you wanted to
> copy the buffer into the string, not vice versa.
Assume you don't know how SetLength works.
Assume you don't know how many bytes S contains.
How would you find out?
I did it with this code, since I do know how Move works:
move( s[1], Buffer, 10 );
This will copy exactly 10 bytes.
Then I look at/print the buffer byte by byte.
If it prints 1234567890, then the string must be 10 bytes.
If it prints 1 2 3 4 5 (with a zero byte after each digit), then the string must be 20 bytes ;)
That's how I figured it out :D
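The experiment can be re-created in C by copying the first 10 raw bytes of a 16-bit string into a byte buffer and counting the zero bytes. For ASCII digits stored as 16-bit units, every character contributes one zero byte, so 10 bytes hold only 5 characters. On little-endian machines such as x86 the bytes come out as '1', 0, '2', 0, and so on. `zero_bytes_in_prefix` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical re-creation of Move(s[1], Buffer, n): copy the first n raw
   bytes of a 16-bit string into out (n must not exceed the string's size
   in bytes) and return how many of those bytes are zero. */
size_t zero_bytes_in_prefix(const uint16_t *s, unsigned char *out, size_t n)
{
    size_t zeros = 0;
    memcpy(out, s, n);
    for (size_t i = 0; i < n; i++)
        if (out[i] == 0)
            zeros++;
    return zeros;
}
```

Five zeros out of ten bytes means each character takes two bytes, so the 10-character string occupies 20 bytes.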
Bye,
Skybuck.