Re: Comparing files (Fast)

From: The One (egast_at_hotmail.com)
Date: 03/23/04


Date: Tue, 23 Mar 2004 17:43:19 +0100

Hi,

Thanks for replying.

The reason I used fseek() is that without it, fgetc returned -1 after a few
comparisons.
I didn't know what it was, but using fseek solved the problem. Does anyone
know why fgetc returned -1 after a few comparisons?
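
For reference: fgetc returns EOF (commonly -1) both at end of file and on a read error, and feof/ferror tell the two apart. A possible cause on Windows, not confirmed anywhere in this thread, is that a file opened in text mode ("r") is treated as ended at a Ctrl-Z byte (0x1A); opening with "rb" avoids that. Below is a minimal sketch of the check. The helper name describe_read is made up for illustration and is not part of the original code.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not from the thread): read one character and
// report why fgetc returned EOF, if it did.
std::string describe_read(std::FILE* fp)
{
    int ch = std::fgetc(fp);      // must be int so EOF is distinguishable
    if (ch != EOF)
        return "char";            // a real character was read
    if (std::feof(fp))
        return "end-of-file";     // the stream hit its end
    if (std::ferror(fp))
        return "read error";      // an I/O error occurred
    return "unknown";
}
```

Calling it in the comparison loop whenever ch1 or ch2 comes back as -1 would show whether the stream really ended or whether something else went wrong.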

Thanks.

The One.

"Leor Zolman" <leor@bdsoft.com> wrote in message
news:ibpu50tet9h5uojkgqeq4opiafh6b9dqqh@4ax.com...
> On Mon, 22 Mar 2004 22:25:19 +0100, "The One" <egast@hotmail.com> wrote:
>
> >Hi,
> >
> >Of course, here is the code:
> >int ch1, ch2;
> >
> >int Filesize = filesize(File1); // gets the size of the file :-)
> >
> >for (i = 0; i <= Filesize; i++)
> >{
> >    fseek(File1, i, 0);
> >    fseek(File2, i, 0);
> >
> >    ch1 = fgetc(File1);
> >    ch2 = fgetc(File2);
> >
> >    if (ch1 != ch2)
> >    {
> >        cout << hex << ch1 << " changed to " << ch2;
> >        cout << " at offset " << i << endl;
> >    }
> >}
> >
> >I know it must be inefficient, but I don't know another way to do it. If
> >you know another way, please tell me.
> >
> >Thanks.
>
> I think the fseeks are killing you big time. Let's measure.
> I created two text files, file1.txt and file2.txt, having length 471,288
> (nothing special, just how it turned out by copying and pasting in my text
> editor). I changed the last character of the last line of file2.txt to be
> different than file1, but otherwise they're identical.
>
> To time the programs, I'm using a timer utility Scott Meyers wrote for use
> in his courses (if anyone's interested, the entire utility header,
> including function templates for displaying values of STL containers, is
> included in my InitUtil distributions).
>
> Here's the first version of the program, using fseek as you did:
>
>
> //
> // fcomp.cpp: uses fseek
> //
>
> #include <iostream>
> #include <cstdio>
> #include "ESTLUtil.h"
>
> using namespace std;
> using namespace ESTLUtils;
>
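> size_t filesize(FILE *fp)
> {
>     fseek(fp, 0, SEEK_END);   // jump to the end of the file
>     size_t s = ftell(fp);     // the position there is the size
>     fseek(fp, 0, SEEK_SET);   // rewind to the start
>     return s;
> }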
>
> int main()
> {
>     FILE *File1 = fopen("file1.txt", "r");
>     FILE *File2 = fopen("file2.txt", "r");
>
>     int ch1, ch2;
>
>     int Filesize = filesize(File1); // gets the size of the file :-)
>
>     // Note: this streams the function name `filesize`, not the variable
>     // `Filesize`, which is why the run below shows an address-like value.
>     cout << "File size: " << filesize << endl;
>
>     cout << "Beginning timing:" << endl;
>     Timer t;
>
>     for (int i = 0; i <= Filesize; i++)
>     {
>         fseek(File1, i, SEEK_SET);
>         fseek(File2, i, SEEK_SET);
>
>         ch1 = fgetc(File1);
>         ch2 = fgetc(File2);
>
>         if (ch1 != ch2)
>         {
>             cout << hex << ch1 << " changed to " << ch2;
>             cout << " at offset " << i << endl;
>         }
>     }
>
>     cout << t;
>
>     return 0;
> }
>
>
> (Of course my filesize isn't "portable" but it works well enough on XP
> where I'm testing). Results:
>
> d:\src\learn>fcomp
> File size: 004010D0
> Beginning timing:
> 66 changed to 67 at offset 730f5
> 6.259
>
>
> Now version 2, not using fseek:
>
> //
> // fcomp2.cpp: not using fseek
> //
>
> #include <iostream>
> #include <cstdio>
> #include "ESTLUtil.h"
>
> using namespace std;
> using namespace ESTLUtils;
>
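> size_t filesize(FILE *fp)
> {
>     fseek(fp, 0, SEEK_END);   // jump to the end of the file
>     size_t s = ftell(fp);     // the position there is the size
>     fseek(fp, 0, SEEK_SET);   // rewind to the start
>     return s;
> }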
>
> int main()
> {
>     FILE *File1 = fopen("file1.txt", "r");
>     FILE *File2 = fopen("file2.txt", "r");
>
>     int ch1, ch2;
>
>     int Filesize = filesize(File1); // gets the size of the file :-)
>
>     // Note: this streams the function name `filesize`, not the variable
>     // `Filesize`, which is why the run below shows an address-like value.
>     cout << "File size: " << filesize << endl;
>
>     cout << "Beginning timing:" << endl;
>     Timer t;
>
>     for (int i = 0; i <= Filesize; i++)
>     {
>         // No fseek: each fgetc continues where the previous one stopped.
>         ch1 = fgetc(File1);
>         ch2 = fgetc(File2);
>
>         if (ch1 != ch2)
>         {
>             cout << hex << ch1 << " changed to " << ch2;
>             cout << " at offset " << i << endl;
>         }
>     }
>
>     cout << t;
>
>     return 0;
> }
>
>
> Results:
>
> File size: 004010D0
> Beginning timing:
> 66 changed to 67 at offset 71d6e
> 0.03
>
> Quite a difference, 6+ seconds vs. .03 seconds.
>
> I expected it to be faster without the seeking, but I didn't expect it to
> be /that/ much faster. Learn something every day....
> -leor
>
>
> --
> Leor Zolman --- BD Software --- www.bdsoft.com
> On-Site Training in C/C++, Java, Perl and Unix
> C++ users: Download BD Software's free STL Error Message Decryptor at:
> www.bdsoft.com/tools/stlfilt.html
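
A possible next step beyond the sequential fgetc version quoted above is to open both files in binary mode and compare them a block at a time with fread. The two quoted runs report different offsets (730f5 vs 71d6e), which would be consistent with text-mode newline translation, though that is my guess and is not stated in the thread; binary mode sidesteps the question. The sketch below is only an illustration: the function name diff_offsets and the 4096-byte block size are assumptions, not anything from the original programs.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper (not from the thread): returns the byte offsets at
// which two files differ, reading both in binary mode in 4096-byte blocks.
// If one file is shorter, the offset where it ends is recorded as a
// difference and comparison stops. An empty result is also returned when
// either file cannot be opened, so real code should report that separately.
std::vector<long> diff_offsets(const char* path1, const char* path2)
{
    std::vector<long> diffs;
    std::FILE* f1 = std::fopen(path1, "rb");
    std::FILE* f2 = std::fopen(path2, "rb");
    if (f1 && f2) {
        unsigned char buf1[4096], buf2[4096];
        long base = 0;                       // offset of the current block
        for (;;) {
            std::size_t n1 = std::fread(buf1, 1, sizeof buf1, f1);
            std::size_t n2 = std::fread(buf2, 1, sizeof buf2, f2);
            std::size_t n = n1 < n2 ? n1 : n2;
            for (std::size_t i = 0; i < n; ++i)
                if (buf1[i] != buf2[i])
                    diffs.push_back(base + static_cast<long>(i));
            if (n1 != n2) {                  // one file ended before the other
                diffs.push_back(base + static_cast<long>(n));
                break;
            }
            if (n == 0)
                break;                       // both files are exhausted
            base += static_cast<long>(n);
        }
    }
    if (f1) std::fclose(f1);
    if (f2) std::fclose(f2);
    return diffs;
}
```

A caller could loop over the returned vector and print each offset in hex, as the original programs do. I have not timed this against the versions above, so no speed claim is made here.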


