Re: programming language
- From: Duncan Domingue <duncanbojangles@xxxxxxxxxxxxxxxx>
- Date: Sat, 04 Jun 2005 02:07:26 +0000
aegis wrote: *paraphrase snip* > brainf*ck interpreter > how do I handle loops
Below, you will find the source code to my bf interpreter. I took the time to document it in the code, so everything you need to know should be in there.
Before you read the code with comments, check this website out. It describes the Esoteric Languages API which I implemented in my interpreter. It only adds about 20 lines to the whole thing but it could be useful for interfacing with other programs or the host operating system. http://lilly.csoft.net/~jeffryj/programs/esoapi/esoapi.html
This source compiles fine on linux using g++, and it doesn't do anything nonstandard so I assume it should compile fine on most POSIX systems.
I've seen some other interpreters, but they were difficult to understand, or decipher in some cases since most people writing bf interpreters seem to think that they should be a.) tiny, b.) super fast with a tiny memory footprint, and c.) CRYPTIC. My interpreter is pretty long, but has many features (I hope!), is bug free as far as I can tell, doesn't use up too much memory and gets frees unnecessary allocated space, and is very readable (I hope!). It doesn't try to speed up your bf program, and is a little slow when handling loops, but it doesn exactly what it needs to. I've had this for a while now, I just didn't know who would want it so it's been my little pride and joy kept away on a server for some time.
Here's the source, commented. I don't have any sort of copyright on the code but I just wanted to ask you that if you use this code for anything, please put my name somewhere within your program or notes. I tried to think of what copyright would be appropriate, but the program is really simple and only a few people will actually use it, so I just left it alone.
\/\/\/___bf_duncan.cpp___\/\/\/
/*This program was written by Duncan Domingue (duncanbojangles@xxxxxxxxxxxxxxxx). There is no copyright on this code, however, if you use any portion of this code in any of your own programs, please make a reference to my name and email address as the creator of said code.
*/
#include <stdlib.h> #include <stdio.h> #include <unistd.h> #include <string.h>
#define MAX 30000 /*original limit set by Urban Müller*/
/*
This interpreter implements the Esoteric Languages API, written by Jeffry Johnston. For more information on the EsoAPI, go here: http://lilly.csoft.net/~jeffryj/programs/esoapi/esoapi.html
*/
/*
bf_code
this is the data structure that holds all of the instructions that make up a bf program. *instructions is a C style string that holds the instructions (+-<>[].,) and length specifies the length of the string.
*/
struct bf_code
{
char *instructions;
unsigned int length;
};
/*
bf_vm
this is the data structure that holds all of the information about the program being executed. array[MAX] is the array that the bf program operates on. code is the structure holding all of the bf instructions. instruction_pointer is the index of the instruction currently being executed in the instruction array. pointer is the index of the current location in memory that the bf program is acting upon. eso_api describes whether or not we are in the EsoAPI execution mode. For more information on the EsoAPI, go here: http://lilly.csoft.net/~jeffryj/programs/esoapi/esoapi.html .
*/
struct bf_vm
{
int array[MAX];
bf_code code;
unsigned int instruction_pointer, pointer;
bool eso_api;
};
/*
execute
execute() is where the action happens. execute() gets a pointer to a bf_vm, where it executes one instruction, increments the instruction pointer of the bf_vm so that it points to the next instruction (or does a loop), and returns. I wanted it to execute only one instruction at a time so that it would be a simple deal to add a debugger to this little program of mine.
*/
void execute(bf_vm *vm);
/*
load
load() takes the filename, opens up the file and dynamically allocates a string large enough to fit all of the file's contents. I did this just because it's nice not having a limit on the file size. It places the string and length of string into a bf_code structure and returns it.
*/
bf_code load(char *filename);
/*
parse
Ahh, parse(). While execute() can ignore characters that aren't legal bf code, I wanted to strip that info out ahead of time. parse() also checks to make sure that your loops are balanced. If your loops aren't balanced, parse() exits the whole program with an error on STDERR and prints out the error. You really don't absolutely need this function, but it's nice to have.
*/
bf_code parse(bf_code *raw_input);
/*
main
main() does all of the dirty work. main() accepts the command line parameters, orchestrates load(), parse(), and execute(). main() can accept bf programs from files or the command line. I added the command line feature cause it was good for quick little programs rather than opening up an editor, typing, saving, then running.
*/
int main(int argc, char *argv[])
{
bool is_file, is_input; /*these say whether or not we specified file input or command line input.*/
char *filename = NULL;
bf_code input; /*this is used only if we specify bf code to be read from the command line.*/
bf_vm vm; /*this is the big one. This is _the_ vm that is used. Everything eventually goes in here.*/
int dummy; /*you'll see*/
/*
this little section of code sets up the virtual machine. It initializes all of the variables and clears out the array.
*/
vm.code.length = 0;
vm.code.instructions = NULL;
vm.instruction_pointer = 0;
vm.pointer = 0;
vm.eso_api = false;
memset(vm.array, 0, 30000 * sizeof(int));
/*just getting 'em prepared*/
is_file = false;
is_input = false;
/*if no options were entered, the program won't do anything. So, it exits.*/
if(argc <= 1)
{
printf("No commands entered.\n");
exit(0);
}
/*this is where we read the command line arguments. The printouts should tell you everything you need to know about what each option does. Oh yeah, here's dummy. :)*/
while((dummy = getopt(argc, argv, "f: i: h v")) != -1)
{
switch(dummy)
{
case 'f':
{
/*read bf code from a file*/
is_file = true;
filename = optarg;
break;
}
case 'i':
{
/*read bf code from the command line*/
is_input = true;
input.instructions = optarg;
break;
}
case 'h':
{
/*help screen. You gotta have 'em.*/
printf("\nbf_duncan 1.2\n A simple Brain*** interpreter written by Duncan Domingue\n");
printf("Usage: bf_duncan [OPTION] [FILE, INPUT]\n\n");
printf("Options:\n-f [FILE] Loads the BF program from FILE\n");
printf("-i [INPUT] Loads the BF program from the INPUT on the terminal\n");
printf("-h Prints out this help screen.\n-v Prints out the name and version of this program.\n\n");
exit(EXIT_SUCCESS);
break;
}
case 'v':
{
/*version. I was so proud of the fact that I actually spent time fixing up my ugly little hack so that it would be usable and even readable by other people that I started numbering the versions. I started at 0.9 because by the time I decided to number 'em, it was already working pretty well.*/
printf("bf_duncan 1.2\n A simple Brain*** interpreter written by Duncan Domingue\n");
exit(EXIT_SUCCESS);
break;
}
case '?':
{
/*I don't really remember why this is here. You should probably check the getopt() manpage to find out.*/
break;
}
default:
{
break;
}
}
}
if(is_file == true && is_input == true)
{
/*pretty self explanatory. You can't have code from a file and from the command line at the same time. Sorry.*/
printf("Hey, choose ONE, not both. Try again.\n");
fprintf(stderr, "Option -i and -f specified\n");
exit(EXIT_FAILURE);
}
if(is_input == true)
{
/*handle the command line bf code. Notice that it also parses the code.*/
input.length = strlen(input.instructions);
vm.code = parse(&input);
}
if(is_file == true)
{
/*handle the file bf code. load() also gets the code parsed, so don't worry.*/
printf("Using file %s\n", filename);
vm.code = load(filename);
}
/*ahh, now we have code in our virtual machine. Find out how long it is.*/
vm.code.length = strlen(vm.code.instructions);
/*this just runs the virtual machine. When the bf program gets to the last instruction and the instruction pointer increments, it'll be larger than the code is long. This is how it knows it reached the end of the bf program. Pretty simple, huh?*/
while(vm.instruction_pointer < vm.code.length)
{
execute(&vm);
}
return 0;
}
/*
load
load() takes the filename, opens up the file and dynamically allocates a string large enough to fit all of the file's contents. I did this just because it's nice not having a limit on the file size. It places the string and length of string into a bf_code structure and returns it.
*/
bf_code load(char *filename)
{
char c; /*this temporarily holds the character being examined to see if it actual bf code.*/
bf_code raw; /*this holds the "raw" code, that is, code that hasn't been parsed yet.*/
raw.instructions = NULL;
raw.length = 0;
FILE *in_file; /*file handle, whoo.*/
/*make sure the file actually is a file that is openable.*/
if(!(in_file = fopen(filename, "r")))
{
printf("File could not be loaded. Please try again.\n");
fprintf(stderr, "Error opening file.\n");
exit(EXIT_FAILURE);
}
/*I like this little bit of code, just because I like realloc(). I really like allocating and free'ing up memory by hand. :) Basically, the length of the string holding the instructions in raw is 0, so each time a character is read in, the string is lengthened by one.*/
while((c = getc(in_file)) != EOF)
{
raw.length++;
(void *)raw.instructions = realloc(raw.instructions, raw.length);/*make the string as long as raw.length says it is.*/
raw.instructions[raw.length - 1] = c;
}
fclose(in_file);
return(parse(&raw));/*notice, it parses the code, too.*/
}
/*
parse
Ahh, parse(). While execute() can ignore characters that aren't legal bf code, I wanted to strip that info out ahead of time. parse() also checks to make sure that your loops are balanced. If your loops aren't balanced, parse() exits the whole program with an error on STDERR and prints out the error. You really don't absolutely need this function, but it's nice to have.
*/
bf_code parse(bf_code *raw)
{
int temp = 0; /*I should have given this a better name, but I can't come up with one. It is used in the loop balance checking code.*/
bf_code parsed; /*this holds the parsed code.*/
parsed.instructions = NULL;
parsed.length = 0;
char c;
/*here's my simple balanced loop checking algorithm. Subtract the number of ]'s from the number of ['s in the bf code. If it's anything other than zero, you have too many or too few looping constructs. Simple, huh?*/
for(int counter = 0; counter < raw->length; counter++)
{
c = raw->instructions[counter];
if(c == '<' || c == '>' || c == '+' || c == '-' || c == '[' || c == ']' || c == '.' || c == ',')
{
parsed.length++;
(void *)parsed.instructions = realloc(parsed.instructions, parsed.length);/*ther's realloc() again. It does the same thing this time, too.*/
parsed.instructions[parsed.length - 1] = c;
//performs loop checking
if(c == '[')
{
temp++;
}
if(c == ']')
{
temp--;
}
}
}
if(temp > 0)
{
printf("Error: Unbalanced loop. Too many ['s.\n");
exit(EXIT_FAILURE);
}
if(temp < 0)
{
printf("Error: Unbalanced loop. Too many ]'s.\n");
exit(EXIT_FAILURE);
}
return(parsed);
}
/*
execute
execute() is where the action happens. execute() gets a pointer to a bf_vm, where it executes one instruction, increments the instruction pointer of the bf_vm so that it points to the next instruction (or does a loop instruction pointer move thing), and returns. I wanted it to execute only one instruction at a time so that it would be a simple deal to add a debugger to this little program of mine.
*/
void execute(bf_vm *vm)
{
switch(vm->code.instructions[vm->instruction_pointer])
{
/*simple*/
case '>':
{
if(vm->pointer == (MAX - 1))
{
printf("Pointer moved past beginning of array.\n");
fprintf(stderr, "Pointer outside array bounds.\n");
exit(EXIT_FAILURE);
}
else
{
vm->pointer++;
vm->instruction_pointer++;
break;
}
}
/*simple*/
case '<':
{
if(vm->pointer == 0)
{
printf("Pointer moved past beginning of array.\n");
fprintf(stderr, "Pointer outside array bounds.\n");
exit(EXIT_FAILURE);
}
else
{
vm->pointer--;
vm->instruction_pointer++;
break;
}
}/*simple, but watch out for type limits. since the array is just an int, the value can't go past 2^31 and -(2^31).*/
case '+':
{
vm->array[vm->pointer]++;
vm->instruction_pointer++;
break;
}
/*simple, but watch out for type limits. since the array is just an int, the value can't go past 2^31 and -(2^31).*/
case '-':
{
vm->array[vm->pointer]--;
vm->instruction_pointer++;
break;
}
/*this one's pretty simple too, except it also watches to see if we are entering EsoAPI mode. Read the EsoAPI first to understand what's going on here.*/
case '.':
{
if(vm->eso_api == false)
{
if(vm->array[vm->pointer] != 0) /*not going into EsoAPI mode.*/
{
printf("%c", vm->array[vm->pointer]);
vm->instruction_pointer++;
}
else /*calling EsoAPI mode*/
{
vm->eso_api = true;
vm->instruction_pointer++;
printf("\nEntering Eso_API mode.\n");
}
}
else /*we are in EsoAPI mode and doing EsoAPI stuff.*/
{
/*I don't have any EsoAPI functions, so I just print out what I would have called if I had any.*/
printf("\nCalled kernel function %i\n", vm->array[vm->pointer]);
vm->eso_api = false; /*go out of EsoAPI mode.*/
vm->instruction_pointer++;
printf("\nExiting Eso_API mode\n");
}
break;
}
/*simple*/
case ',':
{
vm->array[vm->pointer] = getchar();
vm->instruction_pointer++;
break;
}/*here it is, the loop constructs. here's how the algorithm breaks down: when you get to a [, you check if the value at the current memory cell is zero. If it is, move the instruction pointer forward until you get to the matching ], and go one past it. Notice, I said the matching ]. You've also got to keep track of all of the ['s and ]'s in between. That's what tmp is for. If the value at the current memory cell isn't zero, just increment the instruction pinter and do th stuff inside the loop. When you get to a ], you know that you got here because the loop was unfinished. You'll never get to a ] if the loop was finished. So, when you're at a ], just backtrack to the matching [. Again, I said matching [, so you've got to keep track of all of the ['s and ]'s in between. And that's it!*/
case '[':
{
if(vm->array[vm->pointer] == 0)/*if the loop is satisfied*/
{
int tmp = 1;
while(tmp > 0)
{
vm->instruction_pointer++;
if(vm->code.instructions[vm->instruction_pointer] == '[')
{
tmp++;
}
else if(vm->code.instructions[vm->instruction_pointer] == ']')
{
tmp--;
}
}
vm->instruction_pointer++;//brings you past last ']'
}
else //loop is not satisfied
{
vm->instruction_pointer++;
}
break;
}
/*explanation above.*/
case ']':
{
int tmp = 1;
while(tmp > 0)
{
vm->instruction_pointer--;//go backwards through the instructions
if(vm->code.instructions[vm->instruction_pointer] == '[')
{
tmp--;
}
else if(vm->code.instructions[vm->instruction_pointer] == ']')
{
tmp++;
}
}
break;
}
/*see, it handles non bf code characters. I still like parse, because you end up with a string only big enough to fit the actual code, so you could load in a whole book, but have only a twenty line bf program stored internally.*/
default:
{
vm->instruction_pointer++;
break;
}
}
/*massive debugging statement used during the creation of this program.*/
/*printf("\nInstruction is %c\nInstruction Pointer is %i\nPointer is %i\nArray[Pointer] is %i\n", instruction_array[instruction_pointer], instruction_pointer, pointer, array[pointer]); */
}
/*bf_duncan version 1.3, Duncan Domingue (duncanbojangles@xxxxxxxxxxxxxxxx)*/
/\/\/\___bf_duncan.cpp___/\/\/\
.
- References:
- programming language
- From: aegis
- programming language
- Prev by Date: Re: programming language
- Next by Date: Re: Simplify formula for iterative programming
- Previous by thread: Re: programming language
- Index(es):