Re: Text fingerprinting
- From: Thad Smith <ThadSmith@xxxxxxx>
- Date: Sat, 26 Nov 2005 11:43:41 -0700
Sumedh wrote:
> The problem that I have is as follows:
> We would like to find similarity in text that has been copied from a
> source. However, comparing the whole text would not be feasible (so
> string matching algorithms are not useful) and so we would like to
> generate some kind of a fingerprint for the texts which can be compared
> against the stored corpus of fingerprints to detect copying.
You need to define a theory for what you are trying to do. I presume
that the fingerprint is designed to be smaller than the original
text. What aspect of the original do you expect to be preserved in
the fingerprint? When people classify text, they might list subject,
author, fiction/non-fiction, length, creation date, etc. Are you
intending to identify a writing style -- level of formality,
regionalisms, etc.?
I have heard of software comparing text for high correlation (i.e.
copying text with minor modifications), but that requires the complete
text. There is also style analysis, which helps to identify an
author. If the style of a written piece didn't match that of the
purported author, you have reason to suspect either copying or
ghost-writing, but it doesn't identify the source (unless it matches
the stored style for the source author).
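
To make the question concrete, here is a minimal sketch in Python of one possible answer, assuming the aspect you want preserved is word-for-word overlap. It hashes every run of k consecutive words and keeps only the smallest few hashes as the fingerprint, so two fingerprints can be compared without the complete text. The function names, k, and size are illustrative choices of mine, not taken from any particular product, and I have not tuned them.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word sequences found in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def fingerprint(text, k=3, size=64):
    """Keep the `size` smallest 64-bit shingle hashes as the fingerprint."""
    hashes = {
        int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:8], "big")
        for s in shingles(text, k)
    }
    return set(sorted(hashes)[:size])


def resemblance(fp_a, fp_b, size=64):
    """Estimate shingle overlap (0.0 to 1.0) from two fingerprints.

    Looks at the `size` smallest hashes of the combined fingerprints
    and counts the fraction that appear in both.
    """
    smallest = set(sorted(fp_a | fp_b)[:size])
    if not smallest:
        return 0.0
    return len(smallest & fp_a & fp_b) / len(smallest)
```

A copy with minor modifications should score well above an unrelated text, and an exact copy scores 1.0. Note that this only detects reuse of wording; it says nothing about subject or style, which is why the choice of what to preserve has to come first.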
--
Thad