Newbie here... getting a count of repeated instances in a list.
From: Amy G (amy-g-art_at_cox.net)
Date: 11/22/03
- Next message: Douglas Alan: "Tkinter widget that functions like Explorer "Details" mode?"
- Previous message: Peter Otten: "Re: Help Pliz ! stuck on page 4 of tutorial"
- Next in thread: Robert Brewer: "RE: Newbie here... getting a count of repeated instances in a list."
- Maybe reply: Robert Brewer: "RE: Newbie here... getting a count of repeated instances in a list."
- Reply: Peter Otten: "Re: Newbie here... getting a count of repeated instances in a list."
- Reply: Amy G: "Re: Newbie here... getting a count of repeated instances in a list."
- Reply: Amy G: "Re: Newbie here... getting a count of repeated instances in a list."
Date: Fri, 21 Nov 2003 16:38:43 -0800
I started trying to learn Python today. The program I am trying to write
will open a text file containing email addresses and store them in a list.
Then it will go through them, saving only the domain portion of each address.
After that it will count the number of times each domain occurs, and if the
count is above a certain threshold, it will add that domain to a list or text
file, or whatever. For now I just have it printing to the screen.
This is my code, and it works and does what I want. But I want to do
something with a hash object to make this go a whole lot faster. Any
suggestions are appreciated a great deal.
Thanks,
Amy
ps. Sorry about the long post. Just really need some help here.
CODE
************************
import sys

file = open(sys.argv[1], 'r')    # Opens up file containing emails
mail_list = file.readlines()     # and reads its lines into a list
file.close()

def get_domains(email_list):
    # Takes a list of emails and returns the domains only
    domain_list = []
    line_count = 0
    while line_count < len(email_list):
        # Keep everything after the first '@', minus the trailing newline
        domain = email_list[line_count].split('@', 1)[1]
        domain_list.append(domain.strip())
        line_count = line_count + 1
    return domain_list

def count_domains(domain_list):
    # Takes a list of domains and returns a list of the domains that
    # occur more than <threshold> number of times
    counted_domains = []
    line_count = 0
    domain_count = 0
    threshold = 10
    while line_count < len(domain_list):
        d = domain_list[line_count]
        domain_count = domain_list.count(d)
        if domain_count > threshold:
            r = 0
            counted_domains.append(d)
            while r < (domain_count - 1):
                # Remove all other instances of a domain once counted
                domain_list.remove(d)
                r = r + 1
        line_count = line_count + 1
    return counted_domains

domains = get_domains(mail_list)
counted = count_domains(domains)
print(counted)
********************************************
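For comparison, here is a minimal sketch of the "hash object" idea asked about above, using a plain Python dictionary keyed by domain. The function name count_domains_with_dict and the choice to skip lines without an '@' are assumptions made for illustration, not part of the original program. The code above calls list.count() once per entry, which rescans the whole list each time; a dictionary lets each address be tallied in a single pass.

```python
def count_domains_with_dict(email_list, threshold=10):
    """Return the domains that occur more than `threshold` times.

    Hypothetical helper: one pass over the addresses, with a dict
    mapping each domain to the number of times it has been seen.
    """
    counts = {}
    for line in email_list:
        line = line.strip()
        if '@' not in line:
            continue  # assumption: ignore blank or malformed lines
        domain = line.split('@', 1)[1]
        # dict.get returns 0 the first time a domain is seen
        counts[domain] = counts.get(domain, 0) + 1
    return [domain for domain, n in counts.items() if n > threshold]


sample = ["a@x.com\n", "b@x.com\n", "c@y.org\n", "\n"]
print(count_domains_with_dict(sample, threshold=1))  # prints ['x.com']
```

Unlike the version above, this sketch leaves the input list unmodified, since nothing has to be removed to avoid counting a domain twice.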