Newbie here... getting a count of repeated instances in a list.

From: Amy G (amy-g-art_at_cox.net)
Date: 11/22/03


Date: Fri, 21 Nov 2003 16:38:43 -0800

I started trying to learn Python today. The program I am trying to write
will open a text file containing email addresses and store them in a list.
Then it will go through them, saving only the domain portion of each email.
After that it will count the number of times each domain occurs, and if the
count is above a certain threshold, it will add that domain to a list or
text file, or whatever. For now I just have it printing to the screen.

This is my code, and it works and does what I want. But I want to do
something with a hash object to make this go a whole lot faster. Any
suggestions are appreciated a great deal.

Thanks,
Amy

ps. Sorry about the long post. Just really need some help here.

CODE
************************
import sys

mail_file = open(sys.argv[1], 'r')   # Opens up file containing emails
mail_list = mail_file.readlines()    # and sets the contents into a list
mail_file.close()

def get_domains(email_list):
    # This function takes a list of emails and returns the domains only
    domain_list = []
    for email in email_list:
        email = email.strip()        # drop the trailing newline
        if '@' in email:             # skip blank or malformed lines
            domain_list.append(email.split('@', 1)[1])
    return domain_list

def count_domains(domain_list):
    # Takes a list of domains and returns a list of the domains that
    # occur more than <threshold> number of times
    counted_domains = []
    remaining = list(domain_list)    # work on a copy, leave the input alone
    threshold = 10
    while remaining:
        d = remaining[0]
        domain_count = remaining.count(d)
        if domain_count > threshold:
            counted_domains.append(d)
        # Remove all instances of a domain once counted
        remaining = [x for x in remaining if x != d]
    return counted_domains

domains = get_domains(mail_list)
counted = count_domains(domains)
print(counted)

********************************************
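
For reference, the "hash object" idea asked about above would most likely mean
a Python dictionary. The following is only a minimal sketch of that approach,
not part of the original program: the function name count_domains_dict and
its threshold parameter are made up for illustration. It makes one pass over
the list and keeps a running count per domain, instead of calling
list.count() once for every entry.

```python
def count_domains_dict(emails, threshold=10):
    """Return the domains that occur more than `threshold` times.

    Hypothetical helper for illustration; not from the original post.
    """
    counts = {}                                  # domain -> occurrences so far
    for line in emails:
        line = line.strip()
        if '@' not in line:                      # skip blank or malformed lines
            continue
        domain = line.split('@', 1)[1]
        counts[domain] = counts.get(domain, 0) + 1
    # Keep only the domains whose count is above the threshold
    return [d for d, n in counts.items() if n > threshold]


# Made-up sample data: 11 addresses at x.com, 3 at y.org
sample = ['a@x.com\n'] * 11 + ['b@y.org\n'] * 3
print(count_domains_dict(sample))                # ['x.com']
```

Each dictionary lookup takes roughly constant time, so the total work should
grow with the number of addresses rather than with its square.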


