Re: OT: writing resumes with VT100 for a Lisp job

From: rem642b_at_Yahoo.Com (RobertMaas_at_YahooGroups.Com)
Date: 08/12/04


Date: Wed, 11 Aug 2004 17:31:10 -0700

Part 2 of very long reply I haven't yet finished composing:

--> http://www.paulgraham.com/sofar.html
(regarding sharing a blacklist of domains of spamvertised WebSites)
   To take advantage of this kind of information, we should ideally delay
   filtering as long as possible. I.e. filter when the user checks his
   mail, not when it arrives at the server. By the time you check your
   mail, odds are that any spam that made it into your inbox has
   already been seen by thousands of people.
So you log in, see that you have 200 new messages, and open your MUA.
It sits there for ten minutes checking all those messages. You are
running out of patience, because you had only two minutes before you
needed to get offline to free your phone line for an important call.
Finally your MUA says you have no new messages: all 200 were spam.
Further, all 200 of those spam are filling up your disk space, instead
of being rejected by the SMTP server in the first place. Further, one
of those 200 is an important message that was mistakenly classified as
spam, which you won't see until it's too late, and the sender won't
have any idea why you haven't responded yet.

--> http://www.paulgraham.com/ffb.html
(Regarding auto-fetching WebPage for any URL in suspected spam:)
   Auto-retrieving spam filters would drive
   the spammer's costs up, and his sales down: his bandwidth usage would
   go through the roof, and his servers would grind to a halt under the
   load, which would make them unavailable to the people who would have
   responded to the spam.
I like the idea because the counter-attack scales with the actual
amount of spam that goes out. Somebody innocently sending spam to 20
friends would hardly be attacked at all, but somebody sending spam
to 50 million addresses would be bitten back severely if a sizeable
fraction of those victims used auto-fetching filters.

   We would want to ensure that this is only done to suspected spams. As
   a rule, any url sent to millions of people is likely to be a spam url,
   so submitting every http request in every email would work fine nearly
   all the time. But there are a few cases where this isn't true: the
   urls at the bottom of mails sent from free email services like Yahoo
   Mail and Hotmail, for example.
   To protect such sites, and to prevent abuse, auto-retrieval should be
   combined with blacklists of spamvertised sites. Only sites on a
   blacklist would get crawled, and sites would be blacklisted only after
   being inspected by humans.
I disagree. The auto-fetching should occur immediately upon receipt of
each individual suspected spam, so that the spammer sees the hits on
the server start to happen almost immediately after starting the spam
run, before anyone actually responds to the ad. If the spammer realizes
he's being counterattacked, he may abort the spam run. If the admin of
the spamvertised site sees the sudden influx of hits, the admin may
have time to track down the spammer while he's still online.

If Yahoo! Mail is used for sending spam, let their standard servers
advertised at the end of each regular e-mail be hit too. If there are
enough false positives that Yahoo's servers are hit badly because of
legitimate e-mail that has these bottom-of-message URLs, there can be a
white-list for such specific URLs to prevent them from being hit after
they've been in use long enough to get incorporated into the
white-list. But actually I consider those advertisements on outgoing
e-mail to be abuse. I accept that if I use Yahoo! Mail, I'll have to
look at ads for Yahoo services occasionally, because that's how my free
Yahoo! Mail account is paid for, but I don't think anyone I e-mail
should likewise have to see those ads, because Yahoo isn't providing
any of them with any service. I feel that ads attached to the bottom of
e-mail constitute unsolicited advertising, and punishing Yahoo for such
ads might actually be a good thing.

Question: Does anybody have a small program that scans a message
looking for any full URL (including the leading http:// or whatever) or
semi-URL (missing that part), and then presents each one in KWIC
(KeyWord In Context) format? I understand Perl has built-in
regular-expression utilities, so perhaps a Perl program would be best
for this purpose? For example, here is the latest spam sent to my
secret ISP address, manually converted to KWIC format (my e-mail
address replaced by asterisks):
<a href=3D" http://limestone.mnbasdn.info/?O1QTQ3i9zmVb4iOdope ">Windows X
r> <a href=3D" http://to.mnbasdn.info/?O1QTQ3i9zmVb4iOcirce ">Adobe - Phot
 href=3D" http://armonk.mnbasdn.info/?O1QTQ3i9zmVb4iOpenitential ">Macrome
ref=3D" http://protactinium.mnbasdn.info/?O1QTQ3i9zmVb4iOjefferson ">Enter
 http://sprain.mnbasdn.info/>?HqdMdYb2YfOAZHHtask|recruit=*@*****.com
Note I've put two spaces before and after each URL to separate it from
context, to make it easy to eyeball the printout. So how hard would it
be to scan my e-mail automatically to generate such a report? For
comparison, here's the same format manually-generated report for the
last five URLs in legitimate e-mail (to my secret ISP address, again):
ef=" mailto:RobertMaas-unsubscribe@yahoogroups.com?subject=Unsubscribe ">R
ect to the <a href=" http://docs.yahoo.com/info/terms/ ">Yahoo! Terms of S
n your mobile phone. http://mobile.yahoo.com/maildemo --0-1419467261-10875
ecurity. So do we. http://promotions.yahoo.com/new_mail --0-1315019205-108
n your mobile phone. http://mobile.yahoo.com/maildemo --0-965139806-108760
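
A rough answer to the question above, sketched in Python rather than
Perl (the regular-expression approach is the same in either). Everything
named here is an assumption for illustration: the function name
kwic_urls, the 20-character context width, and especially the URL
pattern, which treats anything starting with http://, https://, ftp://
or mailto: as a full URL and anything starting with "www." as a
semi-URL. Bare host names without "www." are not caught, and
quoted-printable encoding (the "=3D" in the spam above) is not decoded.

```python
import re

# Assumed pattern: a scheme (or a bare "www.") followed by everything up to
# the next whitespace, quote, or angle bracket.
URL_RE = re.compile(r"(?:(?:https?|ftp)://|mailto:|www\.)[^\s\"'<>]+",
                    re.IGNORECASE)


def kwic_urls(text, width=20):
    """Return one KWIC line per URL: left context, two spaces, URL,
    two spaces, right context."""
    # Join wrapped lines and collapse runs of whitespace so that context
    # is not cut short by a line break in the message.
    flat = " ".join(text.split())
    report = []
    for m in URL_RE.finditer(flat):
        # Trailing sentence punctuation is almost never part of the URL.
        url = m.group().rstrip(".,;:!?)")
        end = m.start() + len(url)
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[end:end + width]
        # Right-justify the left context so the URLs line up in a column.
        report.append(f"{left:>{width}}  {url}  {right}")
    return report


if __name__ == "__main__":
    message = ('Subject to the <a href="http://docs.yahoo.com/info/terms/">'
               "Yahoo! Terms of Service</a>. Read mail on your mobile phone. "
               "http://mobile.yahoo.com/maildemo")
    for line in kwic_urls(message):
        print(line)
```

Feeding it a whole mailbox file read as text should give a report in
the same spirit as the hand-made ones above, with each URL set off by
two spaces on either side.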

--> http://www.paulgraham.com/wfks.html
(Re spammers trying to get past Bayesian filters:)
   they have to use fewer bad words. They can't use
   weird spellings (e.g. "Freee" instead of "Free") because filters
   quickly learn those.
Apparently Paul Graham has never heard of inserting random numbers in
and around "bad" words to make each instance a different word as seen
by the filter?
43fRe43e3 7FR579EE4367
Of course the filter can simply remove all numbers from words to fix
that trick, right? But what about randomly repeating the letters?
ffreee frrre frreeee fffree ffffrrrre
Or intermixing repeated letters?
frrereee efreee efffreeer
And inserting noise letters?
fraee xfrre frzeee ffrea
How can the Bayesian filter possibly recognize all those as mutations
of the same word, so that each isn't regarded as a brand-new non-bad word?
C1ak hear 4 f7xrre s4m9lle of v$46ra - p3n1s.comREMOVE
Hmm, I did a DNS lookup of that domain name using DIG, and there's
no such domain. It's such an obvious name that I'm surprised it hasn't
been taken yet. Well, I bet within a week somebody scanning this
newsgroup will see it and register it! However, a Google search for that
domain name string returned:
http://weblog.siliconvalley.com/column/dangillmor/archives/010475.shtml
   Oh, great: we finally get the capability to track down bin Laden by
   getting access to the tracebacks from windozeupdate.com, but, when we
   launch a cruise missile to take him out, instead of blowing up it
   scatters leaflets urging people to visit www.b1gg3r-p3n1s.com...
   Posted by: Ran Talbott on June 11, 2004 07:46 AM
I did a DIG on that full domain name too; nothing is registered yet.
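
To make the word-mutation point concrete, here is a minimal sketch (in
Python, my choice for illustration) of the two easy fixes a filter
could apply before looking a token up: removing digits and other
non-letters, and collapsing runs of a repeated letter. The function
name normalize_token is hypothetical, and this is not a description of
what any actual Bayesian filter does.

```python
import re


def normalize_token(token):
    """Reduce a token to lowercase letters with repeated letters collapsed."""
    t = token.lower()
    t = re.sub(r"[^a-z]", "", t)      # drop digits and punctuation
    t = re.sub(r"(.)\1+", r"\1", t)   # collapse runs of the same letter
    return t


if __name__ == "__main__":
    samples = ["Free", "43fRe43e3", "7FR579EE4367",    # inserted numbers
               "ffreee", "frrre", "ffffrrrre",         # repeated letters
               "frrereee", "efreee", "efffreeer",      # intermixed repeats
               "fraee", "xfrre", "frzeee", "ffrea"]    # noise letters
    for s in samples:
        print(f"{s:>14} -> {normalize_token(s)}")
```

With this, "Free" and the number-stuffed and letter-repeating variants
all reduce to the same token ("fre", since the collapse also squeezes
the legitimate double "e"). The intermixed and noise-letter variants do
not: "frrereee" becomes "frere" and "xfrre" becomes "xfre", so each
still looks like a brand-new word to the filter.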


