Re: unicode and hashlib



On Nov 29, 12:23 pm, Scott David Daniels <Scott.Dani...@xxxxxxx>
wrote:
Scott David Daniels wrote:

...

If you now, and for all time, decide that the only source you will take
is cp1252, perhaps you should decode to cp1252 before hashing.

Of course my dyslexia sticks out here as I get encode and decode exactly
backwards -- Marc 'BlackJack' Rintsch has it right.

Characters (a concept) are "encoded" to a byte format (representation).
Bytes (a precise representation) are "decoded" to characters (a format
with semantics).

--Scott David Daniels
Scott.Dani...@xxxxxxx

Ok, so the fog lifts, thanks to Scott and Marc, and I begin to realize
that hashlib was trying to encode (not decode) my unicode object
as 'ascii' (my default encoding), and since the string contained
characters above 127: shhh'boom. So once I have character strings
transformed internally to unicode objects, I should encode them
explicitly as 'utf-8' before handing them to anything that would
otherwise guess at the proper way to encode them (e.g. hashlib).

>>> import hashlib
>>> a='André'
>>> b=unicode(a,'cp1252')
>>> b
u'Andr\xc3\xa9'
>>> hashlib.md5(b.encode('utf-8')).hexdigest()
'b4e5418a36bc4badfc47deb657a2b50c'
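
A hedged aside on the session above: the repr u'Andr\xc3\xa9' shows two
characters after 'Andr' where a single 'é' was expected, which suggests the
terminal supplied UTF-8 bytes and decoding them as cp1252 produced mojibake.
The sketch below uses Python 3 syntax (an assumption; the session above is
Python 2), where str is always unicode and hashlib accepts only bytes. The
helper md5_of_text is a hypothetical name used for illustration.

```python
import hashlib


def md5_of_text(text, encoding="utf-8"):
    """Hypothetical helper: encode text explicitly, then hash the bytes."""
    return hashlib.md5(text.encode(encoding)).hexdigest()


name = "Andr\u00e9"  # 'André'

# The same name as raw bytes in two different source encodings.
raw_utf8 = name.encode("utf-8")      # b'Andr\xc3\xa9'
raw_cp1252 = name.encode("cp1252")   # b'Andr\xe9'

# Decoding with the encoding the bytes really use recovers the text.
correct = raw_cp1252.decode("cp1252")

# Decoding UTF-8 bytes as cp1252 "succeeds" but yields two wrong characters.
mojibake = raw_utf8.decode("cp1252")

# Python 3 refuses to guess an encoding: passing str to hashlib is a TypeError.
try:
    hashlib.md5(correct)
    refused = False
except TypeError:
    refused = True

digest_correct = md5_of_text(correct)
digest_mojibake = md5_of_text(mojibake)
```

The two digests differ because the hashed bytes differ, so a hash is only
reproducible if the decode step used the right source encoding.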

Scott then points out that utf-8 is probably superior (for use within
the code I control) to utf-16 and utf-32, which each come in two
byte-order variants (big-endian and little-endian); which one you get
can depend on the installed software and/or processor. utf-8, unlike
-16/-32, produces the same bytes irrespective of software or hardware,
so the hash stays reproducible.
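
To illustrate the byte-order point, here is a minimal sketch in Python 3
syntax (an assumption; the thread itself uses Python 2). The codec names are
standard Python codecs; md5_hex is a hypothetical helper.

```python
import hashlib


def md5_hex(data):
    """Hypothetical helper: hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


text = "Andr\u00e9"  # 'André'

utf8 = text.encode("utf-8")           # one possible byte sequence
utf16_le = text.encode("utf-16-le")   # little-endian, no byte order mark
utf16_be = text.encode("utf-16-be")   # big-endian, no byte order mark

# The plain "utf-16" codec prepends a 2-byte byte order mark and then
# uses the byte order of the machine running the code.
utf16_native = text.encode("utf-16")

same_hash = md5_hex(utf16_le) == md5_hex(utf16_be)
```

Here same_hash is False: the same characters hash differently under the two
utf-16 variants, which is the reproducibility problem utf-8 avoids.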

decode vs encode:
You decode from one character set (bytes) to a unicode object.
You encode from a unicode object to a specified character set (bytes).
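
A small sketch of the two directions, again in Python 3 syntax (an
assumption; Python 2's unicode/str pair corresponds to Python 3's str/bytes).
The variable names are illustrative only.

```python
# Bytes as they might arrive from a cp1252 source: 0xE9 is 'é' in cp1252.
raw = b"Andr\xe9"

# decode: bytes -> characters
text = raw.decode("cp1252")

# encode: characters -> bytes, in whatever encoding you choose
as_utf8 = text.encode("utf-8")

# Encoding to 'ascii' fails for any character above 127, which is the
# kind of error an implicit default-encoding conversion runs into.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```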

Please correct me if you see something wrong and thank you for your
advice and direction.

u'unicordial-ly yours. ;)'
Jeff