Re: count of each word occurred

From: Edo (edwardoJE_at_aking.com)
Date: 06/22/04

  • Next message: Edo: "Re: do thing with the inputs"
    Date: Mon, 21 Jun 2004 15:32:05 -0700
    
    

    B. v Ingen Schenau wrote:
    > After you have sorted the list, why would you compare the current word with
    > *all* the rest of them?

    Ok, I see the problem, even thought I fixed the problem in this
    following algorithm, I think there would be yet better ways.

            struct WordEntry {
                    string word;
                    int count;
            };
            vector<WordEntry> WordCount;
            WordEntry wordcount;

    int main()
    {
            vector<string> words; //to store list of words from input
            typedef vector<WordEntry>::size_type vec_size;
            
            sort(words.begin(), words.end());

            int count = 1;
            vec_size j;
            vec_size i = 0;
            for (j=i+1; j < words.size(); ++j) {
                    if (words[i] == words[j]){
                            count++; //
                            ++i; // jumps to the next unique word
                    } else {
                            wordcount.word = words[i++];
                            wordcount.count = count;
                            WordCount.push_back( wordcount );
                            count = 1;
                    }
            }
            wordcount.word = words[i++];
            wordcount.count = count;
            WordCount.push_back( wordcount );
            count = 1;
            
    }
    the above works with no looping waste. but this repeat does not look
    good, maybe I will make a function call to look nicer.

            wordcount.word = words[i++];
            wordcount.count = count;
            WordCount.push_back( wordcount );

    >
    > words: +---------+ Wordcount .word: +---------+ .count: +---+
    > (size=9) |and | (size=5) |and | (size=5) | 1 |
    > +---------+ +---------+ +---+
    > |beginning| |beginning| | 1 |
    > +---------+ +---------+ +---+
    > |dark | |dark | | 1 |
    > +---------+ +---------+ +---+
    > |earth | |earth | | 1 |
    > +---------+ +---------+ +---+
    > |In | |In | | 1 |
    > +---------+ +---------+ +---+
    > |the |
    > +---------+
    > |the |
    > +---------+
    > |void |
    > +---------+
    > |was |
    > +---------+
    >
    > count: +---+ i: +---+ j: +---+
    > | 1 | | 5 | | 6 |
    > +---+ +---+ +---+
    >

    BTW; what tool did you use to explain the problem in your last post so
    elegantly?
    it would have been very helpful if I could do that on my own to avoid
    future code with unnecessary looping.

    I think Kerl’s way will require 2 tables to loop through, where as the
    above loop requires only one pass through the data “after sorting-not
    sure if it is expensive”.


  • Next message: Edo: "Re: do thing with the inputs"

    Relevant Pages

    • Re: Anyone know how F# discriminated unions are implemented?
      ... seems to take around 300 nS per compare by my ... You must remember that F# was created by Haskell programmers in the aftermath of the failed Haskell.NET project at MSR. ... He doesn't like type classes either because of the abuse they suffer in Haskell due to the lack of more appropriate language features. ... idiomatic FP equivalant of a loop, it would seem that I cannot exploit ...
      (comp.lang.functional)
    • Re: count of each word occurred
      ... > take each word in order and compare it with the rest of the list ... Another version of this algorithm takes advantage of the fact that it does ... loop over all words in the list ... a.c.l.l.c-c++ FAQ: http://www.comeaucomputing.com/learn/faq ...
      (alt.comp.lang.learn.c-cpp)
    • Re: Performance benefits of rep/loop?
      ... >> The loop instruction has always been slow on Intel processors. ... > Then compare the buffered a to the strings one at a time. ... > make up for the cost in setting up a. ... > that the instruction cache is a bit more reliable than the data cache. ...
      (comp.lang.asm.x86)
    • Re: Getting 290e -T/-T2 tuner working with Xubuntu
      ... Is there likely to be a problem with 'missing packets' if I make the loop ... jitter centered on 1000 rather than all being after 1000. ... or compare 'England' with 'Scotland'. ... That in the presence of 1/f variations the random variations of the ...
      (uk.comp.os.linux)
    • Re: Need an identity operator because lambda is too slow
      ... func = some_transform_function ... but does a function look-up every loop. ... identity = lambda x: x ... Now use timeit to compare doing the work on its own with doing the work ...
      (comp.lang.python)