googling for fun (and profit...? naah!-)

From: Alex Martelli (aleax_at_aleax.it)
Date: Fri, 31 Oct 2003 11:36:34 GMT

(Note: you need to download & install Mark Pilgrim's pygoogle, see
http://diveintomark.org/projects/pygoogle/ , get a personal license to the
google api, see http://www.google.com/apis/ , save it in a file such as
"googlekey.txt" in your home directory [pygoogle looks in several places,
see http://diveintomark.org/projects/pygoogle/readme.txt for the list]).

So, a little script such as...:

#! /usr/local/bin/python2.3
# programming languages popularity web-survey

import google
import time

def quoter(xs): return ['"%s"'%x for x in xs]
langs = '''
    python ruby perl caml java haskell lisp eiffel sml scheme
    fortran ada forth apl javascript ecmascript vbscript vba sql
    bash awk tcsh csh zsh ksh autolisp elisp occam intercal basic
    abc algol applescript assembly befunge beta chill cobol dylan
    erlang pascal delphi idl limbo smalltalk squeak m4 matlab logo
    foxpro turing tcl snobol simula setl self rexx rebol postscript
    php oz modula ml miranda mercury mumps oberon sather stackless
    functional procedural parallel hpf agile extreme database
    relational rpg
    '''.split() + quoter([
      'visual basic', 'object pascal', 'objective c', 'c++', 'c#', 'c',
      'stackless python', 'object oriented',
    ])

# ensure all duplications are removed
langs = dict.fromkeys(langs).keys()

print 'examining %d terms' % len(langs)
results = []
for i, lang in enumerate(langs):
    while True:
        print '%2d: %20s' % (i, lang.strip('"'), ),
        try: data = google.doGoogleSearch(lang + ' programming')
        except Exception:
            print "... likely internal server error, we wait & retry... "
            time.sleep(0.5)
        else:
            results.append((data.meta.estimatedTotalResultsCount, lang))
            print '%9d' % data.meta.estimatedTotalResultsCount
            break
results.sort()
results.reverse()
print
print
print '%20s %9s' % ("Language", "# of hits")
print

for numb, lang in results:
    print '%20s %9d' % (lang.strip('"'), numb)

Gives me the following results:

            Language # of hits

                   c 4980000
            database 3750000
               basic 3750000
                java 3320000
                self 2000000
                 php 1880000
                 c++ 1860000
                perl 1640000
                 sql 1150000
                logo 1070000
            parallel 1030000
          javascript 1030000
          functional 997000
     object oriented 944000
        visual basic 847000
                beta 745000
              python 729000
              scheme 693000
            assembly 687000
               forth 591000
             extreme 572000
                  c# 506000
          relational 377000
              delphi 354000
             fortran 344000
              pascal 329000
          postscript 297000
                 tcl 277000
                 abc 259000
                lisp 220000
          procedural 204000
                  ml 201000
                 ada 196000
            vbscript 181000
               cobol 171000
              foxpro 137000
                 vba 123000
              matlab 111000
           smalltalk 101000
                ruby 97900
                bash 87400
             mercury 86800
                 rpg 81600
                  oz 78500
              turing 72200
                rexx 66100
               agile 62700
              eiffel 58300
                 idl 58100
             haskell 55100
                 awk 53100
               mumps 49800
               chill 47600
         objective c 44900
              modula 39000
                 apl 38800
                 csh 31700
               dylan 31500
              simula 30600
              erlang 29900
                  m4 28000
              squeak 24400
             miranda 24300
         applescript 24000
       object pascal 23900
               algol 21000
                 ksh 17900
                tcsh 17600
                 sml 16000
              oberon 15400
                caml 15300
                 hpf 11900
               limbo 11400
               rebol 10800
               occam 10300
               elisp 8780
          ecmascript 7080
                 zsh 5640
            autolisp 5430
              sather 4260
              snobol 3900
            intercal 2700
                setl 2010
           stackless 1040
             befunge 951
    stackless python 431

Of course there are quite a few anomalies here -- e.g. I think there is
no automatic way to "clean" the C hit count of the hits for objective c,
c++, c# -- or the basic count of the hits for visual basic -- and so on.
But then, this is for fun, not a scientific query, which is why I've mixed
other catchwords in with the programming languages as I thought of them.

Doing some "eyeball cleanup" we can see that c, net of c++, c# etc, must
be a little below Java; basic, net of visual basic, ditto. Alas, 'self'
is most unlikely to refer to that little-known though interesting
language:-). Similarly for 'logo', 'beta', ... -- and 'sql' is likely
to be mixed up with many other languages too.
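
For anyone who wants the "eyeball cleanup" as arithmetic, here is a minimal
sketch. It assumes (only roughly true!) that every page counted for, say,
'c++' is also counted for 'c', so the counts can simply be subtracted. The
helper name net_hits is made up for this illustration; the numbers are
copied from the table above.

```python
# Hit counts copied from the results table above.
hits = {
    'c': 4980000, 'c++': 1860000, 'c#': 506000, 'objective c': 44900,
    'basic': 3750000, 'visual basic': 847000, 'java': 3320000,
}

def net_hits(term, overlapping):
    """Subtract the counts of terms whose hits presumably overlap `term`.

    Assumption: every page counted for an overlapping term is also
    counted for `term`. That is a rough guess, not something measured.
    """
    return hits[term] - sum([hits[t] for t in overlapping])

net_c = net_hits('c', ['c++', 'c#', 'objective c'])
net_basic = net_hits('basic', ['visual basic'])

# Print the net counts next to java's raw count for comparison.
for name, count in [('java', hits['java']),
                    ('c (net)', net_c),
                    ('basic (net)', net_basic)]:
    print('%20s %9d' % (name, count))
```

Under that assumption both net counts do come out below java's.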

So, I think the top ten places, in order, for actual languages, are really:
        java
           c (not objective/c++/c#)
       basic (not visual)
         php
         c++
        perl
  javascript
visual basic
      python
      scheme

Not too surprising, I guess. One could explore a bit more of course
(e.g. specifically look for 'basic -visual' etc etc) but I'm running
a bit short of my daily 1000 searches so I'm gonna leave that fun to
you, o readers. Points to ponder: the preponderance of visual basic
over python, and of python over scheme, is really small; the latter
may perhaps be explained by some occurrences of 'scheme' as an ordinary
word rather than the language name, and the former by the fact that the
typical web usage of many visual basic programmers is unlikely to include
writing websites about VB, compared to the web usage of Pythonistas.
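
A possible starting point for the 'basic -visual' kind of exploration: a
small sketch that only builds the query strings, in the same
term-plus-' programming' shape the script uses. The helper name
refined_query is hypothetical, and I have not checked how the API treats
the leading minus sign, so take the exclusion syntax as an assumption.
Remember that every refined query also counts against the daily search
limit.

```python
def refined_query(term, exclude=(), suffix='programming'):
    """Build a search string such as 'basic -visual programming'.

    Each word in `exclude` gets a leading '-' (assumed to mean
    "pages without this word" to the search engine).
    """
    parts = [term] + ['-' + word for word in exclude] + [suffix]
    return ' '.join(parts)

# Build (but do not send) a couple of refined queries.
print(refined_query('basic', ['visual']))
print(refined_query('c', ['objective', 'c++', 'c#']))
```

Each resulting string could then be passed to google.doGoogleSearch in
place of lang + ' programming'.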

If scheme's apparent popularity does turn out to be an artefact, then
forth (or is it an artefact from "go forth" etc...?-), assembly (but IS
that used in the programming sense...?), and C# are the other possible
contenders for the coveted tenth place. After the contenders for the
top places we have a (to me!) somewhat surprising bunch -- delphi,
fortran, pascal, postscript (!), tcl, abc (!?), lisp, ml, ada, and
vbscript in this order. Wow -- how are the mighty fallen! -- cobol
is BELOW this second bunch...!

Coming to buzzwords that aren't programming languages, other
surprises await: "functional" edges out "object oriented", "extreme"
is WAY more popular than "procedural" (yeah right:-), "agile"
programming isn't as popular a term as I'd have thought (but still,
more than eiffel...:-).

Plenty of other food for flamewars here -- can mercury AND oz
really be THAT much more popular than haskell, erlang, caml -- the
latter badly outscored even by OLD miranda -- and ML so WAY more
popular than ALL other pure functional languages & dialects (and
indeed even more than ada, vbscript, cobol, foxpro, vba, matlab,
smalltalk, ruby, bash...)...?!

googling sure IS plenty of fun!!!-)

Alex


