MemoryError on reading mbox file



Hello everybody,

I have to import a huge mbox file (~1.5 GB) into a MySQL database.

I tried with the following simple code:

import mailbox
import md5
import MySQLdb

# fileName and dbcurs (a MySQLdb cursor) are set up earlier in the script
for m in mailbox.mbox(fileName):

    msg = m.as_string(True)  # True keeps the leading "From " line
    hash = md5.new(msg).hexdigest()

    try:
        dbcurs.execute("""INSERT INTO archive (hash, msg) VALUES (%s, %s)""", (hash, msg))
    except MySQLdb.OperationalError, err:
        print "%s Error (%d): %s" % (fileName, err[0], err[1])
    else:
        print "%s: Message successfully added to database" % hash

The problem seems to be the size of the file: every time I run the
script, the following error occurs after about 20000 messages:

Traceback (most recent call last):
  File "email_to_mysql_mbox.py", line 21, in <module>
    for m in mailbox.mbox(fileName):
  File "/usr/lib/python2.5/mailbox.py", line 98, in itervalues
    value = self[key]
  File "/usr/lib/python2.5/mailbox.py", line 70, in __getitem__
    return self.get_message(key)
  File "/usr/lib/python2.5/mailbox.py", line 633, in get_message
    string = self._file.read(stop - self._file.tell())
MemoryError

My system has 512 MB of RAM and 768 MB of swap, and both seem to run
out early in the run. Is there a way to free the memory used by
messages that have already been processed?
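
One workaround I am considering is to read the file by hand, so that only one message is held in memory at a time. The following is only a minimal sketch, not something I have confirmed fixes the MemoryError. The names iter_raw_messages and iter_hashed_messages are made up for illustration. It assumes every message starts with a line beginning with "From " and that such lines inside message bodies are quoted as ">From ". It also hashes the raw bytes from the file rather than the output of as_string(True), so the hashes may differ from the ones my loop above produces.

```python
import hashlib

# mbox separator lines start with "From " (built via encode() so the
# same source also runs on old Python 2 releases without b"" literals)
FROM_PREFIX = "From ".encode("ascii")
EMPTY = "".encode("ascii")


def iter_raw_messages(path):
    """Yield each message of an mbox file as one raw byte string.

    Only the lines of the message currently being read are kept in
    memory. The "From " separator line is kept at the start of each
    message.
    """
    f = open(path, "rb")
    try:
        lines = []
        for line in f:
            # A new separator line closes the message collected so far.
            if line.startswith(FROM_PREFIX) and lines:
                yield EMPTY.join(lines)
                lines = []
            lines.append(line)
        if lines:
            # Last message in the file has no separator after it.
            yield EMPTY.join(lines)
    finally:
        f.close()


def iter_hashed_messages(path):
    """Yield (md5 hex digest, raw message) pairs, one message at a time."""
    for raw in iter_raw_messages(path):
        yield hashlib.md5(raw).hexdigest(), raw
```

Each (hash, raw) pair could then be passed to dbcurs.execute() exactly as in my loop above, in place of the values taken from mailbox.mbox.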

Thanks and regards,
Christoph
