Re: Virtual machine: Calling a separate script

From: Arthur J. O'Dwyer (ajo_at_nospam.andrew.cmu.edu)
Date: 01/11/05


Date: Mon, 10 Jan 2005 19:43:26 -0500 (EST)


On Mon, 10 Jan 2005, Morten Aune Lyrstad wrote:
> [re: writing a VM, plus a compiled language for it, plus
> a compiler for that language]

   You might find it useful or interesting to look at the C code here:
http://www.geocities.com/arthur_odwyer/day9/
(One of these days, I'm going to merge that site into my current site
at Carnegie Mellon, but for the moment it's still on Geocities.)
   That's a complete compiler targeted at a RISC-y subset of 8086
assembly language, using the same kind of naive stack-based discipline
you started out by mentioning. The source language is very similar to
C, so /it's/ not stack-based at all, but the underlying assembly code is
very stack-based; there's no register allocation taking place.

> If we look at the for-loop again: [whitespace compressed -ajo]
>
> PROGRAM forLoop
> push 0, push 0
> loopStart:
> get 0, push 10
> lessi, jmpfalse loopEnd:
> ;get 1
> ;get 1
> ;addi
> ;put 2
> push 1, addi
> jmp loopStart:
> loopEnd:
> pop, pop
> END
>
> PROGRAM addInts
> addi
> END
>
> Instead of the commented-out operations, I would like to be able to call
> addInts. What I'm thinking is to push the values onto the stack of that
> script, execute, and then somehow get the return value from it. I'm just
> not sure how to do it.

   I would write something like [UNTESTED CODE]

     PROGRAM forLoop =
       initialSum, initialI
     loopStart:
       get 0, push 10
       lessi, jmpfalse loopEnd
       addInts 0 1 1
       addi 1
       jmp loopStart:
     loopEnd:
       pop, pop
     END
     PROGRAM initialI = push 0 END
     PROGRAM initialSum = push 0 END
     PROGRAM addInts =
       get addi 3
       get 1, get addi 4
       addi
       put addi 4 get 2, pop, pop
     END

You don't understand the notation I'm using. Okay. Basically,
I'm taking a mathematical approach to the problem; you can actually
read more about this kind of stuff in a recent comp.lang.misc
thread.
   'addi 1' is a convenient shorthand for 'push 1, addi'. Surely
you see the benefit of this notation?
   Now we generalize the notation. Let 'foo x y z' be a shorthand
notation for 'push z, push y, push x, foo' whenever x,y,z,... are
constants; and let 'foo x bar y' be shorthand for 'push y, bar,
push x, foo' whenever x,y,... are constants and foo,bar,... are
program names ("functions").

   Side note: We could define a function
     PROGRAM nop = END
which wouldn't do anything to the stack. Then 'nop x' would be
exactly equivalent to 'push x' in all ways. So we don't even really
need a keyword 'push' at all! But I'm going to suggest that you
keep the explicit 'push' keyword, because it is more readable than
the alternative.

   Now, whenever we define a new program ("function"), such as
'initialSum' or 'addInts', we can immediately start using it
without any special changes. We just write 'foo' to call program
'foo' with the current stack contents. If we write 'foo x y z',
then x,y,z are pushed on the stack before calling 'foo', so it's
essentially as if we were passing x,y,z as arguments ("by value")
to the function 'foo'. The definition of 'addInts' uses this
principle.

   Okay, I think by now you ought to be able to figure out whether
I made any mistakes in my program above. I'm going to give you
another version of the same program (which may have new and more
interesting bugs), and I'm not going to explain it unless you ask
me to. ;)

     PROGRAM functionalVersion = sumi 0 10
     PROGRAM sumi =
       get 1, get 1, morei, jmpfalse done
       get 1, get 1, addi 1, sumi
       addi, put 1, pop
     done: END

   I'm still not clear why you want to make a stack-based VM language.
If you want speed, IMVHO, you're going to want a register architecture;
say, in addition to a (rarely used) stack or two, you'd have an array
of a few hundred random-access variables that you could use as "rough
paper" for calculations. Then you'd just write something like

       reg0 := 0;
       reg1 := 10;
       reg2 := 0;
     loop:
       reg3 := reg0 < reg1;
       jmpfalse done;
       reg2 := reg2 + reg0;
       reg0 := reg0 + 1;
       jmp loop;
     done:
       <reg2 holds the answer>

Maybe you'd want to have a separate set of 100 registers for each "level"
of function invocation. You can do that, in a VM!

-Arthur



Relevant Pages

  • Re: Virtual machine: Calling a separate script
    ... _foo proc near ... push ebp; save base pointer ... set new base pointer from stack pointer ...
    (comp.programming)
  • Re: Free word order
    ... > Frontwardness of language: Going S O V. ... I agree that SVO is a bit mixed-up for me as a native speaker of a SOV ... In a SOV language by analogy, the arguments are pushed to the stack ... no idea so push it ...
    (sci.lang)
  • Re: Forth tutorial like Ketmans
    ...   BAR BAZ FOO ... using the stack is one of Forth's biggest drawbacks to those who ... and not as a required part of the language. ...
    (comp.lang.forth)
  • Re: Doubt in memcpy() and memset()
    ... the call to foo() can push the single byte of c (if the system ... allows one to push a byte), or load the byte into a register and then ... pushing it on the stack. ...
    (comp.lang.c)
  • Re: Doubt in memcpy() and memset()
    ... the call to foo() can push the single byte of c (if the system ... allows one to push a byte), or load the byte into a register and then ... pushing it on the stack. ...
    (comp.lang.c)