Re: WILDCARD: output all a* by searching a text file
- From: Pascal Bourguignon <pjb@xxxxxxxxxxxxxxxxx>
- Date: Fri, 18 May 2007 10:35:41 +0200
Umesh <fraternitydisposal@xxxxxxxxx> writes:
/* Program to search for a* in a text file and write the output to a file,
where * indicates any character. It is working, but how can I generalise
it for a long string like umesh***? Or suppose I want to find all words
in a text file which start with 'a' and end with 'z', i.e. a*z, where *
denotes a string of characters. How can I do it? */
First, you should use an editor that indents your code properly (for
example, emacs), so your program becomes a little more readable, like this:
    #include<stdio.h>
    #include<stdlib.h>
    int main(void)
    {
        FILE *f,*fp;
        f=fopen("c:/1.txt","r");
        if(f==NULL) { puts("Error opening file");exit(0);}
        fp=fopen("c:/2.txt","w");
        char c,ch;
        while((c=getc(f))!=EOF && (ch=getc(f))!=EOF )
        {
            if(c=='a'&& ch!=' ')
                fprintf(fp,"%c%c\n",c,ch);
        }
        fclose(f);
        fclose(fp);
        return 0;
    }
Now, the problem with your program is that there is no abstraction.
When you want to write a program that searches for a pattern such as /a./
(using the regex(7) syntax for regular expressions), you should not
hardcode the test for an 'a' followed by any character in the middle of
the code that reads the file.
One early question to answer is whether you're considering the file as
a sequence of characters, or as a sequence of lines, searching for the
pattern on each line. Assuming it is the latter, you will have two
layers, two different problems:
- read the file line by line,
- search for a pattern on a line.
You can then easily write the program so that it works both for the
simple pattern and for a more complex one, in terms of simple
functions such as:
    int main(void){
        pattern_t* pattern=make_pattern("a.");
        copy_lines_matching_pattern(stdin,stdout,pattern);
        return(0);
    }
So when you want a more complex pattern, you just write:
    int main(void){
        pattern_t* pattern=make_pattern("umesh*");
        copy_lines_matching_pattern(stdin,stdout,pattern);
        return(0);
    }
Or you could take the pattern from the command line:
    int main(int argc,char** argv){
        if(2!=argc){
            /* usage errors go to stderr, so they don't mix with the output */
            fprintf(stderr,"Usage: %s pattern\n",argv[0]);
            return(1);
        }else{
            pattern_t* pattern=make_pattern(argv[1]);
            copy_lines_matching_pattern(stdin,stdout,pattern);
            return(0);
        }
    }
See? It's very easy! That's the power of abstraction.
Now, of course, for this program to work, you have to implement
make_pattern and copy_lines_matching_pattern, but this is something
you could subcontract, or even do yourself, since it's much easier
than the original task.
For example, copy_lines_matching_pattern would simply be:
    void copy_lines_matching_pattern(FILE* input,FILE* output,pattern_t* pattern){
        line_t* line;
        while(0!=(line=next_line_matching_pattern(input,pattern))){
            write_line(output,line);
            free_line(line); /* each line is allocated by next_line */
        }
    }
See? Here again, we used abstraction, using functions such as
next_line_matching_pattern and write_line. They aren't in stdlib? No
problem! They're simpler still, and more general, so they can
easily be found in some library, or subcontracted, or written yourself...
For example:
    void write_line(FILE* file,line_t* line){
        size_t written=fwrite(line->chars,1,line->length,file);
        if(written!=line->length){
            fprintf(stderr,"fwrite could not write a whole line\n");
            exit(1);
        }
    }
So we see that line_t must be a structure containing at least:
    typedef struct {
        char* chars;
        size_t length;
    } line_t;
Now, next_line_matching_pattern can easily (and abstractly) be written as:
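    line_t* next_line_matching_pattern(FILE* file,pattern_t* pattern){
        line_t* line;
        /* stop at end of file (next_line returns 0) or at the first match */
        while((0!=(line=next_line(file)))&&!pattern_match_line(pattern,line)){
            free_line(line); /* discard lines that do not match */
        }
        return(line);
    }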
Of course, next_line can easily be written as:
    line_t* next_line(FILE* file){
        line_t* line=make_line(4096);
        if(fgets(line->chars,line->allocated,file)){
            line->length=strlen(line->chars);
            return(line);
        }else{
            free_line(line);
            return(0);
        }
    }
Where we see that line_t needs one more field:
    typedef struct {
        char* chars;
        size_t length;
        size_t allocated;
    } line_t;
So we can write make_line:
    line_t* make_line(size_t allocated){
        line_t* line=malloc(sizeof(*line));
        if(0==line){ out_of_memory(); }
        line->chars=malloc(allocated);
        if(0==line->chars){ out_of_memory(); }
        line->allocated=allocated;
        line->length=0;
        return(line);
    }
So now we must write pattern_match_line (or else, just contact an
off-shore software development company to do it...). One easy way to
do it is to use the regex library:
    /* needs <regex.h> and <stdbool.h> */
    bool pattern_match_line(pattern_t* pattern,line_t* line){
        /* regexec takes a pointer to the compiled regex_t */
        return(0==regexec(&(pattern->regexp),line->chars,0,0,0));
    }
Therefore, make_pattern would be:
    pattern_t* make_pattern(const char* regexp){
        pattern_t* pattern=malloc(sizeof(*pattern));
        if(0==pattern){ out_of_memory(); }
        if(0!=regcomp(&(pattern->regexp),regexp,0)){
            fprintf(stderr,"Error in regular expression /%s/\n",regexp);
            exit(1);
        }
        return(pattern);
    }
and therefore, pattern_t must be:
    typedef struct {
        regex_t regexp;
    } pattern_t;
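To get a feel for what regcomp and regexec do with the patterns used
above, here is a minimal sketch. The helper name line_matches is made up
for this illustration; it compiles with REG_NOSUB only, so the pattern
is read as a basic regular expression, as with the 0 flags in
make_pattern.

```c
#include <regex.h>

/* Hypothetical helper: returns 1 if text contains a match for regexp,
   0 if it does not, and -1 if regexp does not compile. */
int line_matches(const char* regexp,const char* text){
    regex_t compiled;
    int status;
    if(0!=regcomp(&compiled,regexp,REG_NOSUB)){
        return(-1);
    }
    status=regexec(&compiled,text,0,0,0);
    regfree(&compiled); /* release what regcomp allocated */
    return((0==status)?1:0);
}
```

Note that line_matches("umesh*","umes") returns 1: in regex(7), the *
applies to the preceding h (zero or more of them); it does not mean
"any string of characters" as it does in a wildcard.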
And so on, for the missing functions...
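For instance, two of the missing functions, out_of_memory and free_line,
could be sketched as follows. Only their names come from the code above;
the bodies are my assumption of the simplest thing that would work, and
line_t is repeated so that the fragment compiles alone.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    char*  chars;
    size_t length;
    size_t allocated;
} line_t;

/* Assumed behaviour: report the allocation failure and stop the program. */
void out_of_memory(void){
    fprintf(stderr,"Out of memory\n");
    exit(1);
}

/* Assumed behaviour: release a line built by make_line.
   A null pointer is accepted and ignored. */
void free_line(line_t* line){
    if(0!=line){
        free(line->chars);
        free(line);
    }
}
```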
Now of course, if you must implement a different syntax for the
pattern than the one implemented by regex(7), you would have to
implement make_pattern and pattern_match_line differently. But the
rest of the program would be unchanged.
One way would be to translate the syntax of your patterns to that of
regex(7).
Assuming the syntax:
    pattern ::= term | term pattern ;
    term ::= literal | '*' ;
with * meaning zero or more characters, you could easily translate
patterns to regex(7):
- literal characters are translated to themselves, except for the special
  characters of regex(7), ^.[$()|*+?{\ which must be escaped with \.
  (These are the special characters of the extended syntax, so the
  translated pattern should be compiled with REG_EXTENDED.)
- a * (zero or more characters) is translated to the regular expression ".*".
You could write something like:
    string_t* convert_pattern_to_regexp(const char* pattern){
        /* Each character translates to 1 or 2 characters, so the result is
           at most twice the length, plus one for the terminating NUL. */
        string_t* regexp=make_string(2*strlen(pattern)+1);
        size_t i;
        char c;
        for(i=0;0!=(c=pattern[i]);i++){
            if(0!=strchr("^.[$()|+?{\\",c)){
                /* special in regex(7): must escape */
                string_append_char(regexp,'\\');
                string_append_char(regexp,c);
            }else if('*'==c){
                /* must translate to .* */
                string_append_char(regexp,'.');
                string_append_char(regexp,'*');
            }else{
                /* a single literal character */
                string_append_char(regexp,c);
            }
        }
        return(regexp);
    }
and:
    string_t* string_append_char(string_t* string,char ch){
        if(string->length+1>=string->allocated){
            string_reallocate(string,2*string->allocated);
        }
        assert(string->length+1<string->allocated);
        string->chars[string->length]=ch;
        string->length++;
        /* keep the string NUL-terminated, so it can be given to regcomp */
        string->chars[string->length]='\0';
        return(string);
    }
etc...
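The remaining string_t helpers might look like the sketch below. The
field names are assumed to mirror line_t, and free_string is a name I
made up; make_string and string_reallocate are only named, not defined,
in the code above, so take the bodies as one possible implementation.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed layout, mirroring line_t. */
typedef struct {
    char*  chars;
    size_t length;
    size_t allocated;
} string_t;

/* Creates an empty string with room for allocated bytes (at least 1,
   so that the terminating NUL always fits). */
string_t* make_string(size_t allocated){
    string_t* string=malloc(sizeof(*string));
    if(0==allocated){ allocated=1; }
    if(0!=string){ string->chars=malloc(allocated); }
    if((0==string)||(0==string->chars)){
        fprintf(stderr,"Out of memory\n");
        exit(1);
    }
    string->chars[0]='\0';
    string->allocated=allocated;
    string->length=0;
    return(string);
}

/* Grows (or shrinks) the buffer, keeping the current contents and always
   leaving room for one more character plus the terminating NUL. */
void string_reallocate(string_t* string,size_t allocated){
    char* chars;
    if(allocated<string->length+2){ allocated=string->length+2; }
    chars=realloc(string->chars,allocated);
    if(0==chars){
        fprintf(stderr,"Out of memory\n");
        exit(1);
    }
    string->chars=chars;
    string->allocated=allocated;
}

/* Hypothetical counterpart of free_line for strings. */
void free_string(string_t* string){
    if(0!=string){
        free(string->chars);
        free(string);
    }
}
```

With these in place, convert_pattern_to_regexp("a*z") would build the
string "a.*z", whose chars field can be passed to make_pattern.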
--
__Pascal Bourguignon__ http://www.informatimago.com/
NOTE: The most fundamental particles in this product are held
together by a "gluing" force about which little is currently known
and whose adhesive power can therefore not be permanently
guaranteed.