Re: Almost Done: Need some Help in Generating FEATURE VECTORS

From: dont bother (dontbotherworld_at_yahoo.com)
Date: 03/06/04


Date: Fri, 5 Mar 2004 19:11:39 -0800 (PST)
To: Josiah Carlson <jcarlson@nospam.uci.edu>, python-list@python.org

Hi Josiah and others,
Thanks a ton.
With your help I was able to get my code working.
However, I am now stuck on two small things:

Problem 1 (dictionary): I want to associate each word in
the dictionary with an index. Right now I have:

viagra
play

etc.
I want to associate each word with an index like this:

1 viagra
2 play

I tried enumerate, but it did not work.
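
One way to get the numbering described above is enumerate() with a start value of 1. This is only a sketch: it is written for Python 3 (the program in this post is Python 2, and the start argument to enumerate needs Python 2.6 or later), and the names index_words and index_of are made up for illustration.

```python
def index_words(word_list):
    """Return {word: index} with indices starting at 1.

    A word that appears more than once keeps the index of its
    first occurrence, so duplicates leave gaps in the numbering.
    """
    index_of = {}
    for number, word in enumerate(word_list, 1):
        index_of.setdefault(word, number)
    return index_of


# Stand-in for open('dictionary', 'r').read().split()
words = ["viagra", "play"]
index_of = index_words(words)
for word in words:
    print(index_of[word], word)
```

Keeping the mapping in a dict means the index for any word can be looked up later with index_of[word], which is what the index:value pairs in Problem 2 need.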

Problem 2: I want to create feature vectors of the
form

[1 1:<value> 10:<value> ... 18:<value>]

I am able to compute the right value, but I have to
associate it with the word's index in the dictionary.

I would like some help building this feature vector:
specifically, adding the '[' and ']', inserting a leading
1 or 0 that should come from user input, and then the
index:value pairs.
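
A possible way to assemble that line, assuming the values are already held in a dict keyed by dictionary index. The function name format_feature_vector is hypothetical, the label is passed in as a plain argument (it could come from sys.argv or a prompt), and the "%g" format, which keeps about six significant digits, is an assumption about how the values should be printed. The sketch is Python 3.

```python
def format_feature_vector(label, value_by_index):
    """Render '[label i:v j:v ...]' with indices in ascending order."""
    pairs = ["%d:%g" % (index, value)
             for index, value in sorted(value_by_index.items())]
    # The label goes first, then the index:value pairs, all space-separated.
    return "[" + " ".join([str(label)] + pairs) + "]"


print(format_feature_vector(1, {10: 0.5, 1: 0.25, 18: 0.125}))
# prints: [1 1:0.25 10:0.5 18:0.125]
```

Sorting the items keeps the indices in ascending order regardless of the order in which the words were seen.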

The body of the program, including the loop that checks
each word against the dictionary, is here:

import os
import sys
import re
import mailbox
import email.Parser
import email.Message
import getopt

#load up external dictionary:
words = open('dictionary', 'r').read().split()
dct = {}
for i in xrange(len(words)):
     dct[words[i]] = i

#make vector:
vector = {}

fp=open(sys.argv[1], 'r')

msg=email.message_from_file(fp)

msg=msg.get_payload()

#a = float(len(fp))

#a = float(len(words_in_body))

#get rid of anything that isn't a letter, and make it all lowercase:
lower = ''.join(map(chr, range(97, 123)))
fixed_body = msg.translate(65*' '+lower+6*' '+lower+133*' ')

#words_in_body = fixed_body.split()

msg = fixed_body.split()

a = float(len(msg))
print a

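#count each message word that appears in the dictionary:
for i in msg:
    if i in dct:
        try:
            vector[i] += 1
        except KeyError:
            vector[i] = 1

#turn the counts into relative frequencies:
for i in vector:
    vector[i] /= a
    print i, vector[i]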
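
The loop above keys vector by word, whereas the index:value output needs the dictionary index as the key. Below is a rough Python 3 sketch of the same counting step keyed by index instead. The name word_frequencies and the index_of argument (a word-to-index dict) are made up for illustration, and a regular expression stands in for the translate() table, since that 256-character table is a Python 2 idiom.

```python
import re


def word_frequencies(text, index_of):
    """Return {dictionary index: count / total words} for words in text."""
    # Treat every run of ASCII letters as a word and lowercase it,
    # which matches the translate()-then-split() step for ASCII input.
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", text)]
    total = float(len(tokens))
    counts = {}
    for token in tokens:
        if token in index_of:
            index = index_of[token]
            counts[index] = counts.get(index, 0) + 1
    # counts is empty when there are no tokens, so no division by zero.
    return {index: count / total for index, count in counts.items()}


freqs = word_frequencies("Play VIAGRA play now", {"viagra": 1, "play": 2})
print(freqs[1], freqs[2])
# prints: 0.25 0.5
```

As in the original loop, the denominator is the total number of words in the message, not just the words found in the dictionary.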

--- Josiah Carlson <jcarlson@nospam.uci.edu> wrote:
> #First, normalize the line breaks:
> email_source = email_source.replace('\r\n', '\n').replace('\r', '\n')
>
> #toss the headers:
> pos = email_source.find('\n\n')
> if pos != -1:
>     email_body = email_source[pos:]
> else:
>     email_body = email_source
>
> #clean out html:
> (use the method given
> http://flangy.com/dev/python/striphtml.html )
>
> #get rid of anything that isn't a letter, and make it all lowercase:
> lower = ''.join(map(chr, range(97, 123)))
> fixed_body = email_body.translate(65*' '+lower+6*' '+lower+133*' ')
>
> words_in_body = fixed_body.split()
>
> #load up external dictionary:
> words = open('dictionary', 'r').read().split()
> dct = {}
> for i in xrange(len(words)):
>     dct[words[i]] = i
>
> #make vector:
> vector = {}
> a = float(len(words_in_body))
> for i in words_in_body:
>     if i in dct:
>         try:
>             vector[i] += 1
>         except:
>             vector[i] = 1
>
> for i in vector:
>     vector[i] /= a
>
>
>
> I know the above doesn't fit with what you have, but
> you should be able to adapt it.
>
> - Josiah
