Re: unicode and hashlib
- From: Jeff H <dundeemt@xxxxxxxxx>
- Date: Sat, 29 Nov 2008 18:54:10 -0800 (PST)
On Nov 29, 12:23 pm, Scott David Daniels <Scott.Dani...@xxxxxxx>
wrote:
Scott David Daniels wrote:
...
If you now, and for all time, decide that the only source you will take
is cp1252, perhaps you should decode to cp1252 before hashing.
Of course my dyslexia sticks out here as I get encode and decode exactly
backwards -- Marc 'BlackJack' Rintsch has it right.
Characters (a concept) are "encoded" to a byte format (representation).
Bytes (a precise representation) are "decoded" to characters (a format
with semantics).
--Scott David Daniels
Scott.Dani...@xxxxxxx
Ok, so the fog lifts, thanks to Scott and Marc, and I begin to realize
that hashlib was trying to encode (not decode) my unicode object
as 'ascii' (my default encoding), and since my string contained characters
above 127 - shhh'boom. So once I have character strings transformed
internally to unicode objects, I should encode them in 'utf-8' myself
before handing them to anything that would otherwise guess at the proper
way to encode them for further processing (e.g. hashlib).
>>> import hashlib
>>> a='André'
>>> b=unicode(a,'cp1252')
>>> b
u'Andr\xc3\xa9'
>>> hashlib.md5(b.encode('utf-8')).hexdigest()
'b4e5418a36bc4badfc47deb657a2b50c'
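For anyone reading this on Python 3, here is a rough sketch of the same idea. It assumes Python 3, where str plays the role of the unicode object and hashlib refuses text outright instead of guessing an encoding. The helper name md5_of_text is made up for illustration and is not part of hashlib.

```python
import hashlib


def md5_of_text(text, encoding="utf-8"):
    """Hypothetical helper: encode text explicitly, then hash the bytes."""
    return hashlib.md5(text.encode(encoding)).hexdigest()


name = "Andr\u00e9"  # text containing one non-ASCII character

# Passing text straight to hashlib fails: it only accepts bytes.
try:
    hashlib.md5(name)
except TypeError:
    print("hashlib wants bytes, not text")

# Encoding first makes the choice of byte representation explicit.
print(md5_of_text(name))
```

The point is only that the encoding is chosen by the caller, so the same text always yields the same digest.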
Scott then points out that utf-8 is probably superior (for use within
the code I control) to utf-16 and utf-32, which each come in two
byte-order variants (big-endian and little-endian); which one you get
can depend on installed software and/or processors. utf-8, unlike
-16/-32, stays reliable and reproducible irrespective of software or
hardware.
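A small sketch of what the two variants look like, assuming Python 3 and its standard codec names ('utf-16-le', 'utf-16-be', 'utf-8'). The sample string is just an illustration.

```python
text = "Andr\u00e9"

# The same text gives different bytes depending on byte order.
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")

# utf-8 has no byte-order variants, so there is only one result.
utf8 = text.encode("utf-8")

print(utf16_le)  # b'A\x00n\x00d\x00r\x00\xe9\x00'
print(utf16_be)  # b'\x00A\x00n\x00d\x00r\x00\xe9'
print(utf8)      # b'Andr\xc3\xa9'
```

Hashing the little-endian and big-endian bytes would give two different digests for the same text, which is the reproducibility problem utf-8 avoids.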
decode vs encode
You decode from one character set to a unicode object
You encode from a unicode object to a specified character set
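The two directions can be sketched as a round trip. This assumes Python 3, where bytes and str stand in for the byte string and the unicode object, and the cp1252 input bytes are an illustrative assumption.

```python
raw = b"Andr\xe9"  # bytes as they might arrive from a cp1252 source

# decode: bytes in a known character set -> text
text = raw.decode("cp1252")

# encode: text -> bytes in the character set you choose
utf8_bytes = text.encode("utf-8")

print(repr(text))  # 'André'
print(utf8_bytes)  # b'Andr\xc3\xa9'

# Encoding back to the original character set recovers the input bytes.
assert text.encode("cp1252") == raw
```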
Please correct me if you see something wrong and thank you for your
advice and direction.
u'unicordial-ly yours. ;)'
Jeff