Re: count of each word occurred

From: Karl Heinz Buchegger (kbuchegg_at_gascad.at)
Date: 06/18/04


Date: Fri, 18 Jun 2004 13:09:05 +0200

Edo wrote:

Sorry, I accidentally hit "send".
Please continue reading where I left off in
the last post.

>
>
> thanks, that is a great help, I did the logic part
> but need help with
> C++ part
>
> // take each work and compare it with
> // each word in the list
> int count = 1;
> int idx = 0;
> for (int i=0; i < words.size(); ++i) {
> for (int j=i+1; j < words.size(); ++j) {
> if (words[i] == words[j]){
> count++;
> }
> }
> word_count[idx].word.push_back(words[i]);
> word_count[idx].count.push_back(count);
> count = 1;
> idx++;
> }
>

As to your errors.

> 4_5.cpp:28: warning: comparison between signed and unsigned integer
> expressions

What is the return value of size()?
What are you comparing it to? What is its type?
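
To spell it out: size() on a std::vector returns an unsigned type
(std::vector<T>::size_type), while a plain int loop index is signed.
Here is a minimal sketch of a loop whose indices match that type. It
assumes words is a std::vector<std::string>; the function name
count_equal_pairs is made up purely for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts the pairs (i, j) with i < j whose words are equal.
// The indices use the same unsigned type that size() returns, so the
// comparison in the loop condition no longer mixes signed and unsigned.
std::size_t count_equal_pairs(const std::vector<std::string>& words)
{
    std::size_t pairs = 0;
    for (std::vector<std::string>::size_type i = 0; i < words.size(); ++i) {
        for (std::vector<std::string>::size_type j = i + 1; j < words.size(); ++j) {
            if (words[i] == words[j])
                ++pairs;
        }
    }
    return pairs;
}
```

Whether you spell the index type as size_type or std::size_t is a matter
of taste; the point is only that it should not be a signed int.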

> 4_5.cpp:29: warning: comparison between signed and unsigned integer
> expressions
> 4_5.cpp:34: error: syntax error before `[' token

It seems that you have applied [ ] to something that is not an array
or vector.
But without seeing the definitions of word_count or words it is
impossible to tell exactly.

If word_count is the same as in your original post then:

word_count is a *data type*

struct word_count {
  vector<string> word;
  vector<int> count;
};

defines what a structure looks like, but it is not a variable. You need
to create a variable to work with it. In the same way that you cannot write

   int[5] = 8;

but need to define a variable for that

  int a[10];

  a[5] = 8;

you cannot write

  word_count[5].word.....

You need a variable for that

  word_count TheWords;

  TheWords.word.push_back( .... );

BTW: Your design seems to be flawed (which brings me back to:
do the logic part first).

What you want is a structure which bundles a *single* word with
how often that word occurred:

   struct WordEntry {
     string word;
     int count;
   };

You then use this new data type to build a vector from it:

   vector< WordEntry > TheWords;

and use it

   WordEntry NewWord;
   NewWord.word = "In";
   NewWord.count = 1;

   TheWords.push_back( NewWord );
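
Put together as something a compiler will accept, the pieces above might
look like the sketch below. The helper name add_new_word is my own
invention for illustration, and the std:: prefixes assume you have no
using declarations in effect.

```cpp
#include <string>
#include <vector>

// One row of the table: a single word and its counter.
struct WordEntry {
    std::string word;
    int count;
};

// Hypothetical helper: appends a fresh entry for 'word' with a count of 1.
void add_new_word(std::vector<WordEntry>& table, const std::string& word)
{
    WordEntry entry;
    entry.word = word;
    entry.count = 1;
    table.push_back(entry);
}
```

After calling add_new_word( TheWords, "In" ) on an empty vector,
TheWords[0].word holds "In" and TheWords[0].count holds 1.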

How did I come up with this?
Well, if you are anything like me, you would do the whole thing
with paper and pencil as follows:

Have 2 tables. One contains the original words, the other contains
the unique words, each paired with a count of how often it has occurred:

table 1       table 2
*********     ***********
In
the
beginning
the
earth
was
void
and
dark

Now start at the first word. It is "In". I would then try to look it
up in table 2. Hmm, it's not there. Thus I add it to that table and give
it a count of 1.

table 1       table 2
*********     ***********
In            In          1
the
beginning
the
earth
was
void
and
dark

Next word: "the".
Looking up table 2 shows that it is not there. Thus I add another entry for
"the" and again give it a count of 1

table 1       table 2
*********     ***********
In            In          1
the           the         1
beginning
the
earth
was
void
and
dark

Next word: "beginning"
Same thing: not in table 2, thus add it

table 1       table 2
*********     ***********
In            In          1
the           the         1
beginning     beginning   1
the
earth
was
void
and
dark

Next word: "the"
Looking it up in table 2 reveals that it is already there, thus I
simply increment its count by 1

table 1       table 2
*********     ***********
In            In          1
the           the         2
beginning     beginning   1
the
earth
was
void
and
dark

Next word: "earth"
....
and so on and so on.

Stepping back and analyzing what I have done:

  for all words in table 1 {

    find word in table 2

    if not found then
      create new entry and give it a count of 1
    else
      increment counter at found position
  }

Refining this brings us closer to a C++ program, but first you will
need to give some thought to how table 2 should be organized:
The important thing in table 2 is the connection between the word
and the counter. Those 2 things belong together as far as table 2
is concerned. When doing the paper/pencil test, this relationship
was emphasized by the fact that I wrote both items on the same line,
while 2 different lines really have nothing in common; there is no
relationship between them other than that both happen to be in the
same table.

That's why I built the structure to group those 2 items: a single word
and a single counter.
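
For reference, the paper/pencil loop from above combined with that
structure could be written roughly as follows. Treat it as one possible
sketch, not the only way: the function name count_words and the plain
linear search are my own choices for illustration, and table 1 is assumed
to be a std::vector<std::string>.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One line of table 2: a single word and a single counter.
struct WordEntry {
    std::string word;
    int count;
};

// 'words' plays the role of table 1; the returned vector is table 2.
std::vector<WordEntry> count_words(const std::vector<std::string>& words)
{
    std::vector<WordEntry> table;

    // for all words in table 1
    for (std::size_t i = 0; i < words.size(); ++i) {
        // find word in table 2 (simple linear search)
        std::size_t pos = 0;
        while (pos < table.size() && table[pos].word != words[i])
            ++pos;

        if (pos == table.size()) {
            // not found: create new entry and give it a count of 1
            WordEntry entry;
            entry.word = words[i];
            entry.count = 1;
            table.push_back(entry);
        } else {
            // found: increment counter at found position
            ++table[pos].count;
        }
    }
    return table;
}
```

Fed the first five words of the example ("In", "the", "beginning", "the",
"earth"), this produces four entries, with "the" carrying a count of 2,
just like the tables above.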

-- 
Karl Heinz Buchegger
kbuchegg@gascad.at

