Re: Neural net - guessing literature kind

From: Will Dwinnell (predictr_at_bellatlantic.net)
Date: 11/29/03


Date: 29 Nov 2003 12:03:10 -0800

Calum <calum.bulk@ntlworld.com> wrote:
"I think the problem is that a neural network would not know how to
structure its input."

Structuring the input is the job of the analyst who builds the neural
network, or whatever model is being used.

Calum <calum.bulk@ntlworld.com> continues:
"If you just put a book in front of a child, it will not learn to
read. Similarly, an artificial neural net could hardly build up a
lexicon or grammar, this is way beyond the capabilities of an
artificial neural network."

Since we're arguing by analogy, let's also note that 'If you just put
a disc in front of a computer, it will not read it'. Again, people
still have some niggling responsibilities in the information age.

Calum <calum.bulk@ntlworld.com> continues:
"So what exactly would it discriminate against???"

Assuming that this is at all possible, such a model would require
derived features, as you describe below.

Calum <calum.bulk@ntlworld.com> continues:
"On the other hand, some pre-processed metrics could be classified. I
mean, suppose there was a text processing algorithm that could build
up a "signature" of an author. Perhaps ten numbers or so. These
could then be classified using a neural network. But I've never heard
of that being done before."

Consider the following papers, which discriminate among authors (of
text and computer programs), genders of authors, etc.:

  http://www.cavi.univ-paris3.fr/lexicometrica/jadt/jadt2002/PDF-2002/baayen_vanhalteren_neijt_tweedie.pdf

  http://webster.cs.uga.edu/~khaled/MLcourse/Abstract1.pdf

  http://clue.eng.iastate.edu/~guan/course/CprE-592-YG-Fall-2002/paper/Olivier_DeVel.pdf

  http://ftp.cerias.purdue.edu/pub/papers/ivan-krsul/krsul-spaf-authorship-analysis.pdf
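
To make the idea of a pre-processed "signature" concrete, here is a
minimal sketch in Python. It is not taken from any of the papers above.
The function name, the choice of features, and the short function-word
list are all assumptions made for illustration; a real analyst would
pick features suited to the problem.

```python
import re

# Hypothetical list of common function words; any fixed list would do.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in")


def signature(text):
    """Return a small numeric feature vector describing a text's style."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")

    n = len(words)
    features = [
        sum(len(w) for w in words) / n,  # average word length
        n / len(sentences),              # average sentence length, in words
        len(set(words)) / n,             # type-token ratio (vocabulary richness)
    ]
    # Relative frequency of each function word in the text.
    features.extend(words.count(fw) / n for fw in FUNCTION_WORDS)
    return features
```

Each text then becomes a fixed-length list of numbers (nine here), which
is the kind of input a neural network, or any other classifier, can
accept.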

Calum <calum.bulk@ntlworld.com> continues:
"Doing something like "average word length" could perhaps discriminate
between a tabloid and a broadsheet. But I don't think this would work
reliably in general. Cosine correlation is one method that does work,
I'd rather see a project that works, than one that is k001 because it
uses a neural net."

How well this works (keep in mind there is a spectrum of performance)
will depend on the particulars of the problem. So far, I haven't seen
enough information about the original poster's problem to even hazard
a guess as to whether a solution is feasible.
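
For reference, the "cosine correlation" mentioned in the quote above can
be sketched as follows, assuming it means the cosine similarity between
word-count vectors of two texts. The function names and the tokenizing
rule are illustrative assumptions, not a description of any particular
system.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count lower-cased word occurrences in a text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

A text of unknown kind could be compared against one reference vector
per category and assigned to the closest, though how well that separates
the categories would still depend on the data.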

-Will Dwinnell
http://will.dwinnell.com