A neat trick to serialize arrays and hashes

From: J. Romano (jl_post_at_hotmail.com)
Date: 06/18/04


Date: 17 Jun 2004 20:32:16 -0700

Dear Perl community,

   Today I invented a neat new trick that I thought I'd share with
everyone here.

   But before I continue, I'd like to point out to anyone out there
who thinks that my trick is "obvious to everyone but inexperienced
programmers" or that "it's not worth knowing because better approaches
exist" that some people enjoy learning a new simple trick, even if
they never get a chance to apply it. Besides, sharing a trick that
was just discovered (even if most programmers already know about it)
has the benefit of educating any programmer who, for some reason or
another, happens to not be aware of that particular technique. So if
you really must reply saying that you already knew this trick, instead
of saying how it didn't help you at all, how about sharing something
else that might be useful to someone in the Perl community? That
would be much appreciated.

   Anyway, now that I'm off my soap box, here is what I discovered
this morning:

   The pack string "(w/a*)*" is useful for serializing arrays and
hashes -- that is, it can pack and unpack arrays and hashes to and
from a string. Let me explain in more detail:

   I have an array, which holds the names of some animals:

      @a = ("dog", "cat", "bird", "camel", "giraffe");

I might want to serialize @a into a string so that I can store it in a
file and retrieve it later. Well, I could use the Data::Dumper module
to create a string (and later use eval on that string to get back a
reference whose contents I can then copy into an array), but that can
get complicated if I don't have much experience using the
Data::Dumper module.

   Well, using the pack string "(w/a*)*" I can easily serialize the
array into a string like so:

      $string = pack("(w/a*)*", @a);

Now $string contains all the encoded information needed to reconstruct
the @a array. So if I want to use $string to create a @b array that
is identical to the @a array, I can use unpack() with the same pack
string:

      @b = unpack("(w/a*)*", $string);
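   Here is the same round trip as a small self-contained script, in
case anyone wants to see what actually ends up in the string. This is
only a sketch: it assumes a Perl new enough to support "(...)" groups
in pack templates (5.8 or later), and the printf line is just my way of
peeking at the first few bytes.

```perl
use strict;
use warnings;

my @a = ("dog", "cat", "bird", "camel", "giraffe");

# "w" writes the length of the next item as a BER compressed integer,
# "a*" writes the item's bytes, and the outer "(...)*" repeats that
# pair for every element in the list.
my $string = pack("(w/a*)*", @a);

# The same template reads the length/data pairs back out.
my @b = unpack("(w/a*)*", $string);

print join(",", @b), "\n";               # dog,cat,bird,camel,giraffe

# First four bytes as decimal values: the length 3, then "d", "o", "g".
printf "%vd\n", substr($string, 0, 4);   # 3.100.111.103
```

Each short element costs exactly one extra byte for its length, so the
five animals (22 bytes of text) pack into a 27-byte string.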

   Neat, doncha think? This same technique also works with hashes:

      $string = pack("(w/a*)*", %ENV);
      %wow = unpack("(w/a*)*", $string);
      # The %wow hash is now an exact copy of %ENV

   Now that we have a string representation of an array or hash, we
can save the string to a file, send it over a socket, or even encrypt
it using some encryption algorithm.
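   As an illustration of the "save it to a file" case, here is a
minimal sketch that packs a hash, writes it out, reads it back, and
unpacks it again. The hash contents are made up for the example, and
File::Temp is used only so the script cleans up after itself; a real
program would pick its own filename. Both filehandles are put into
binmode because the packed string is binary data.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data; one value deliberately contains a newline.
my %h = (name => "camel", legs => 4, note => "line1\nline2");

my $string = pack("(w/a*)*", %h);

# Write the packed string to a temporary file in binary mode.
my ($fh, $filename) = tempfile();
binmode($fh);
print {$fh} $string;
close($fh) or die "close failed: $!";

# Read the whole file back, again in binary mode.
open(my $in, '<', $filename) or die "cannot open $filename: $!";
binmode($in);
my $data = do { local $/; <$in> };
close($in);
unlink($filename);

# Rebuild the hash from the flat key/value list.
my %copy = unpack("(w/a*)*", $data);

print "$_ => $copy{$_}\n" for sort keys %copy;
```

The key order inside the packed string follows whatever order Perl
happens to return the hash in, but since keys and values stay paired,
the rebuilt hash has the same contents.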

   This approach can even handle arrays (and hashes) whose scalars
contain newlines, null bytes, and other unprintable characters!
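   A quick way to convince yourself of this is to round-trip some
deliberately awkward strings. The test values below are my own picks
(an embedded newline, a null byte, an empty string, some high bytes,
and leading/trailing spaces); this is a sketch of a sanity check, not
an exhaustive test.

```perl
use strict;
use warnings;

my @tricky = ("two\nlines", "nul\0byte", "", "\x01\x02\xff", " padded ");

my $string = pack("(w/a*)*", @tricky);
my @back   = unpack("(w/a*)*", $string);

# Compare element counts first, then each element byte for byte.
my $same = @back == @tricky;
for my $i (0 .. $#tricky) {
    $same &&= $back[$i] eq $tricky[$i];
}

print $same ? "identical\n" : "different\n";
```

Because every element carries its own length, no delimiter is needed,
so no byte value is off limits inside the data. The "a" format also
returns the data verbatim on unpack, so spaces and null bytes are not
stripped.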

   There are a few important items to point out:

1. The serialized string will most likely contain
    non-printable characters, which may include some
    newline characters, even if no scalar in the
    original array/hash contains a "\n" character.
    Because of this, you should use the binmode()
    function on any filehandle you plan to print the
    string out to.

2. If the array or hash contains any numbers, they
    will be converted to their string representation.

3. This technique only handles simple arrays and hashes.
    In other words, multi-dimensional arrays and hashes,
    lists of lists, and references are not handled
    correctly. If you really want to serialize a
    complex structure such as one of these, I recommend
    using another approach, like taking advantage of
    the Data::Dumper module. You CAN, however, create
    an array of these serialized arrays, and serialize
    that array!

4. The "w" in the pack string "(w/a*)*" allows for the
    encoding of any arbitrary-length string, even if it
    is longer than 0xffffffff bytes (4,294,967,295
    bytes). But since "w" is only used for encoding
    non-negative integers, the "(w/a*)*" pack string
    cannot be used to encode arrays or hashes
    containing negative-length strings. Fortunately,
    that's never been a problem for me. :)

5. I do not know if this trick can handle arrays
    and hashes containing Unicode strings. My guess
    is that it can, but I haven't tested it so I can't
    say for sure.
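
   To make items 2 and 3 a bit more concrete, here is a sketch of the
"array of serialized arrays" idea, plus a one-liner showing numbers
coming back as strings. The @rows data and the variable names are
invented for the example; the only thing taken from the technique
itself is the "(w/a*)*" template, applied once per inner array and
once more to the list of resulting strings.

```perl
use strict;
use warnings;

my $fmt = "(w/a*)*";

# Hypothetical two-level data: an array of array references.
my @rows = (
    ["dog", "cat"],
    ["bird", "camel", "giraffe"],
    [],
);

# Serialize each inner array to a string, then serialize those strings.
my $string = pack($fmt, map { pack($fmt, @$_) } @rows);

# Reverse the process: unpack the outer list, then each inner string.
my @copy = map { [ unpack($fmt, $_) ] } unpack($fmt, $string);

print scalar(@copy), " rows\n";        # 3 rows
print join(",", @{ $copy[1] }), "\n";  # bird,camel,giraffe

# Item 2: numbers come back as their string representations.
my @n = unpack($fmt, pack($fmt, 42, 1.50));
print join(",", @n), "\n";             # 42,1.5
```

Note that the reader has to know how many levels deep the data goes,
since nothing in the string says whether an element is plain text or
another packed list.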

   Anyway, that's my trick that I thought I would share with the rest
of you. Have fun with it!

   -- Jean-Luc Romano


