Re: reduced size symbols/keywords



"John Thingstad" <jpthing@xxxxxxxxx> writes:

On Fri, 29 Aug 2008 07:39:14 +0200, verec <verec@xxxxxxx> wrote:

Toying, toying ... rather than doing useful work ... (Kenny must be
right after all :-)


As far as I know there is no way to make symbols any smaller.
A more serious problem than the size is that it makes a package a
leaky abstraction.
When you export a function, you automatically export the variable with
the same name, as well as the plist, class, etc.
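
To illustrate the leak described above, here is a minimal sketch. The package name SHAPES, the symbol AREA and the :NOTE indicator are made up for the example; only standard Common Lisp operators are used.

```lisp
;; SHAPES and AREA are hypothetical names used only for illustration.
(defpackage :shapes
  (:use :common-lisp)
  (:export #:area))

;; The author only meant to publish the function ...
(defun shapes:area (width height)
  (* width height))

;; ... but the same exported symbol also names this variable,
(defparameter shapes:area 42)

;; and its property list is reachable from any other package too.
(setf (get 'shapes:area :note) "anyone can write here")
```

From any other package, (shapes:area 2 3) calls the function, a bare shapes:area reads the variable, and (get 'shapes:area :note) reads the plist, all through the one exported symbol.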


This question is not (entirely) rhetorical, as I'm thinking of some
application reading millions of words out of text files and turning them
into symbols/keywords for processing. Cutting down the memory
footprint by half or more would be extremely significant, while still
preserving "most" of the properties of symbols, while sacrificing a few,
i.e., while


But not 1,000,000 DIFFERENT words, I trust.

The great languages have more than three million words. Most of them
are technical and jargon, but nonetheless you can read one million
different words.

Of course it depends on what you mean by "word": whether you mean the
roots, or the various forms a word can take. But when
reading words, I guess that the various forms are what is read.


With that volume, wouldn't a hash table be a better choice anyhow?
I find that plists are best for 100 elements or fewer.


The standard way of dealing with this is to Goedelize it yourself.
Your program reads the words from the file as strings.
Each distinct string is stored in a hash table and assigned a number.
Each time you see a string, look its number up in the hash table. If it
is not there, generate a new number and store the (string, number) pair
there.
This is more compact than interning a symbol for every word.
--------------
John Thingstad

--
__Pascal Bourguignon__