Re: (Simple?) Unicode Question



On Thu, 2009-08-27 at 22:09 +0530, Shashank Singh wrote:
Hi All!

I have a very simple (and probably stupid) question that is eluding me.
When exactly is the char-set information needed?

To make my question clear consider reading a file.
While reading a file, all I get is basically an array of bytes.

Now suppose a file has 10 bytes in it (all is data, no metadata,
forget the BOM and stuff for a little while). I read it into an array
of 10 bytes, replace, say, the 2nd byte and write all the bytes back
to a new file.

Do I need the character encoding mumbo jumbo anywhere in this?

Further, does anything except a printing device need to know the
encoding of a piece of "text"? I mean, as long as we are not trying
to get a symbolic representation of a "text" or get the "i"th character
of it, all we need to do is carry the intended encoding as
auxiliary information alongside the data stored as a byte array.
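The idea of carrying the encoding as a label next to the raw bytes can be sketched in Python 3 roughly as follows. `EncodedText`, `replace_byte` and `char_at` are hypothetical names used only for illustration. The point of the sketch is that byte edits never consult the label. The encoding is used only when someone asks for the i-th *character*.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedText:
    """Raw bytes plus the name of the encoding they are meant to be read with."""
    data: bytes
    encoding: str

    def replace_byte(self, index: int, value: int) -> "EncodedText":
        # Pure byte manipulation: the encoding label is carried along
        # unchanged and never consulted.
        buf = bytearray(self.data)
        buf[index] = value
        return EncodedText(bytes(buf), self.encoding)

    def char_at(self, i: int) -> str:
        # Only when we ask for the i-th *character* does the encoding matter:
        # decode first, then index.
        return self.data.decode(self.encoding)[i]


text = EncodedText("héllo".encode("utf-8"), "utf-8")
edited = text.replace_byte(0, ord("j"))  # first byte b"h" -> b"j", label untouched
print(len(edited.data))    # byte count, no decoding involved
print(edited.char_at(1))   # requires decoding with the stored encoding
```

One caveat: the label only says how the bytes are *meant* to be read. It does not guarantee they still decode. Overwriting a byte in the middle of a multi-byte UTF-8 sequence would leave `char_at` raising `UnicodeDecodeError`.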

If you are just reading and writing bytes, then you are just reading and
writing bytes. You need to worry about Unicode, etc. only when you
start treating a series of bytes as TEXT (e.g. how many *characters* are
in this byte array).*
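A short Python 3 illustration of where the encoding starts to matter follows. The byte count of a byte string is fixed. The "number of characters" in it depends entirely on which encoding you assume when decoding. The sample string is arbitrary.

```python
data = "naïve café".encode("utf-8")

# As raw bytes: the length is simply the number of bytes, no encoding needed.
byte_count = len(data)

# As TEXT: the number of characters depends on the encoding you assume.
chars_if_utf8 = len(data.decode("utf-8"))      # each multi-byte sequence becomes one character
chars_if_latin1 = len(data.decode("latin-1"))  # every byte maps to one (possibly wrong) character

print(byte_count, chars_if_utf8, chars_if_latin1)  # 12 10 12
```

The same 12 bytes are 10 characters under UTF-8 and 12 under Latin-1. No question of "how many characters" can be answered from the bytes alone.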

This is no different, IMO, from the distinction between a byte stream
and an image file. You don't need to worry about resolution, palette,
bit-depth, etc. if you are only treating it as a stream of bytes. The
only difference between the two is that in Python "unicode" is a
built-in type and "image" isn't ;)

* Just make sure that if you are manipulating byte streams independently
of their textual representation, you open files in binary mode.
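A Python 3 sketch of the original 10-byte scenario follows, with the files opened in binary mode. The file names and byte values are made up for illustration. Reading with `"rb"` and writing with `"wb"` never involves an encoding. Opening the same file in text mode makes Python try to decode it, which fails here because `0xff` is not valid UTF-8. Text mode would also translate the CRLF pair.

```python
import os
import tempfile

# Ten arbitrary bytes; 0xff and 0xfe are not valid UTF-8, and 0x0d 0x0a is CRLF.
original = bytes([0x48, 0xFF, 0x0D, 0x0A, 0xFE, 0x00, 0x41, 0x42, 0x43, 0x44])

text_mode_error = None
with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file names, for illustration only.
    src = os.path.join(tmpdir, "input.bin")
    dst = os.path.join(tmpdir, "output.bin")

    with open(src, "wb") as f:
        f.write(original)

    # Binary mode: no decoding and no newline translation, just bytes.
    with open(src, "rb") as f:
        buf = bytearray(f.read())

    buf[1] = 0x69  # replace the 2nd byte (index 1) with ASCII "i"

    with open(dst, "wb") as f:
        f.write(buf)

    with open(dst, "rb") as f:
        result = f.read()

    # Text mode tries to decode the bytes, which fails on 0xff under UTF-8.
    try:
        with open(src, "r", encoding="utf-8") as f:
            f.read()
    except UnicodeDecodeError as exc:
        text_mode_error = exc

print(result)
print(type(text_mode_error).__name__)
```

Only the replaced byte changes. The other nine bytes, including the non-UTF-8 ones and the CRLF pair, round-trip untouched.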

-a

