I've seen the future...and it works! (was: my view on this assembler is blah)
From: Beth (BethStone21_at_hotmail.NOSPICEDHAM.com)
Date: 08/27/04
Date: Fri, 27 Aug 2004 09:58:58 GMT
[ Split off from the original thread because, really, it's a totally new
and different topic and it's one that deserves a thread of its own to
discuss ;) ]
gpcea wrote:
> Betov wrote:
> > The true thing is that, in 64 Bits Mode, they will no
> > more mean anything. And you know why? Because the
> > Processors manufacturers are in the opinion that nobody
> > programs in Assembly. Naming the Regs from 0 to X,
> > is nothing but saying that Assembly does not exist.
> > Implementing more regs is also considering that
> > Assembly does not exist, and so on. The Companies
> > have no interest in having guys programming in
> > Assembly: No money to make.
>
> You're right. Keeping track of similarly named, tens
> of registers, in assembly, for processors like the
> Itanium is not very fun. I much prefer IA-32 even
> though the registers are few.
> It is a snap for the compilers (and easier too), they
> can keep up with any no. of regs, but hand-written
> assembly is becoming harder. And the 64-bit CPU designers
> don't seem to care :(
Geez Louise, people...what do you think the "comment" stuff is for? Keep track
with a few well-placed comments what you've got in your registers...
In fact, ironically, you know what made me think of mentioning that? It's
what my Borland _C++ compiler_ does when you ask for assembly output...it
generates "; ebx = this, edx = MyVariable" comments that state what all the
registers represent from the C++ code...well, if a compiler can do that,
why can't we?
If I ever get an Itanium and it becomes commonplace and such, then I ain't
quitting assembly language...and give me time to learn it all and I _will_
kick that compiler's arse, just as before...I would have thought Rene
"assembly rebirth" Tournois would have more stamina and determination than
to immediately give up and throw his hands into the air in surrender
because, ooh, there's some extra registers...
Heck, I've always _WANTED_ more registers on the IA-32 (but, as usual,
Intel screwed that up - in typical fashion - by not leaving enough bits in
the encodings for any more registers...D'oh! But to be fully expected from
the inventors of "real mode addressing" and "Intel syntax", I suppose...can
they do anything right? ;)...
Even the Motorola 68K has 16 registers (8 data, 8 address) and that's been
around almost as long as the 8086 (and, yes, they are called "d0", "d1",
"d2", "d3", etc....really, what's the real difference between "0, 1, 2, 3,
4, 5, 6, 7" and "A, B, C, D, E, F, G, H"? ;)...doubling or quadrupling that
(up to 64) is a _Good Thing_...rather than trying to squeeze a Bresenham's
line algorithm into the IA-32's registers to keep it all on-chip to be
fast - and it only just fits - you can be easily running _two_ of these,
one for the left side of a polygon, the other for the right _AND_ handling
texture co-ordinates _AND_ having registers for "subpixel accuracy"
(anti-aliasing on-the-fly) _AND_ have registers just to keep track of
lighting information _AND_ probably still have registers to spare (_AND_
because it's an inherently parallel chip, half of this is going on at the
same time and you can improve it by clever use of prefetching and so forth
;)...all of this never once stepping off the CPU or its registers at all,
so the whole thing is _guaranteed_ to be running at the full clock speed
(no worries about a "cache miss" half-way through)...
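To put a number on "it only just fits", here's a minimal Python sketch of the general integer Bresenham line algorithm (all octants; `bresenham_line` is a made-up name, and appending to a list stands in for writing a pixel). Count what the inner loop keeps live: `x`, `y`, `x1`, `y1`, `dx`, `dy`, `sx`, `sy`, `err` and the temporary `e2`. That's ten values before you even add a framebuffer pointer, against the six or seven IA-32 registers you can really use, which is roughly why hand-written IA-32 versions tend to split into per-octant special cases to shed some of them. With 16 or more registers, the whole thing (and a second copy for the other polygon edge) stays on-chip.

```python
def bresenham_line(x0, y0, x1, y1):
    """Return the integer points on the line from (x0, y0) to (x1, y1).

    All-octant integer Bresenham: the loop uses only adds, compares and a
    doubling (a shift, in assembly), with no multiply or divide.
    """
    dx = abs(x1 - x0)            # loop-invariant
    dy = -abs(y1 - y0)           # loop-invariant, stored negated
    sx = 1 if x0 < x1 else -1    # x step direction
    sy = 1 if y0 < y1 else -1    # y step direction
    err = dx + dy                # running error term
    x, y = x0, y0
    points = []
    while True:
        points.append((x, y))    # "plot": a store through a framebuffer pointer
        if x == x1 and y == y1:
            break
        e2 = 2 * err
        if e2 >= dy:             # error says: step in x
            err += dy
            x += sx
        if e2 <= dx:             # error says: step in y
            err += dx
            y += sy
    return points


print(bresenham_line(0, 0, 4, 2))  # prints [(0, 0), (1, 1), (2, 1), (3, 2), (4, 2)]
```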
This is a _Good Thing_...and, contrary to popular belief, this is _better_
for human assembly language coders than HLL compilers...really, don't
believe the hype...they are just trying to take advantage of a new
architecture to scare you into believing all that "only a compiler can do
the job!" nonsense (so, please deposit your money here to purchase our
compilers, thank you in advance ;) that they are, let's remember, _ALREADY
SAYING_ about the IA-32 and it's one big fat lie there, isn't it?
It's the _same myth_ and you're going to fall for it just because the
number of registers is going up? That's what we _want_ to happen...because
the architecture is becoming more inherently parallel? That's what we
_want_ to happen...
Let me prove to you that it's not really a big deal...you probably started
out programming with something like BASIC or Pascal, right? How many
registers did that have? Ah, exactly, it had none at all...HLLs don't expose
registers...so, you started with zero registers...
Then, let's say you've been around a while, and the first chip you tried
assembly with was the 6502 or something...right, this had a grand total of
6 registers...none were totally "general purpose" but, of that 6, only
three were actually used for calculation type stuff (the Accumulator and
the X and Y registers...the others were the program counter, the stack
pointer, the processor flags...none of which could really be used for
anything else but their intended purpose of the CPU being able to look
after its own current status)...all except the program counter were 8 bit
registers...so, you moved up to learning how to cope with three 8-bit
registers...
Then you tried out the x86 assembly language - give this "16-bit" thing a
twirl (you're starting with DOS coding, the 32-bit stuff can come later ;)-
and you meet what we know today...eight 16-bit "general purpose"
registers...which turns out to be a lie because one register is the stack
pointer and is always tied up with that (you can "borrow" it but only
temporarily and only by not allowing interrupts to happen
and...and...basically, you can't really use it...it's more trouble than
it's worth ;)...
Oh, and despite being referred to as "general purpose", AX is the
accumulator, BX is the base register, CX is the count register, DX is the
data register, SI is the source index, DI is the destination index and BP
is the base pointer...and all the instructions are hard-coded to use
particular registers ("loop" automatically uses CX, SI and DI are
automatically used by the "string" instructions, "mul" and such dump their
results into DX:AX or something similar ;), the 16-bit memory addressing
only having possibilities for the "approved" registers of BX, SI, DI and so
forth...BP is also gone if you're having to deal with HLL calling
conventions and stack frames and so forth...basically, they aren't really
as "general purpose" as you're first led to believe by the name...but,
hey, it's more "general purpose" than those _3_ registers you had to play
with on the 6502...so, you've moved up to learning how to cope with
about...let's say six 16-bit registers...
Although, really, you're cleverer than you think because you've got another
two segment registers (DS and ES - CS can be ignored because it's part of
the instruction pointer...the CPU maintains it and you _know_ where it's
pointing: To your code as it's being run! - not "general purpose" but you
are dealing with them and having to keep track of what's inside them
;)...FLAGS isn't at your disposal for using BUT you are having to keep
track of it, as an "implied context" that moves through your program code
(because the FLAGS do affect how your instructions proceed ;)...
And, hmmm, you've noticed that your '286 has a "machine status word" there
too...by the '386, that's renamed and you've got loads of the
things..."control registers"...and your "debug registers" and your "task
register" and your GDT, LDT, IDT...originally, there was also an army of
"test registers" (scrapped later on)...if you're dealing with protected
mode, then there's a bloody "register" for everything...if you opened the
Intel manual and it said there was a "Request a cup of coffee register",
then it wouldn't surprise me...hey, why not? There's a register for everything
else...look: "The Kitchen Sink register"...excellent!! ;)
Want to escape to Motorola? Okay, let's pop over to one of their 32-bit
68K-based thingiemajigs...right, now you have 8 "data" registers and 8
"address" registers...only one "address" register has been stolen for the
"stack pointer" (for HLL calling conventions, another one might go for a
"stack frame" or something...these are a bit like your SP and BP you'd have
lost above, anyway)...otherwise, more or less yours to do with as you
please...numbered rather than lettered (like Herbert's "r0", "r1" stuff but
"d0", "d1" or "a0", "a1" ;) but that's only a "cosmetic" thing...it won't
take long before you're thinking "register d3" rather than "register DX"...
So, all in all, you started out with no registers at all...then you went to
three registers (and had to learn all about the "assembly method" and what
on Earth a "register" was in the first place, anyway)...then we had
effectively 6 or 7...then, you've got 14 or 15...well, that's not quite
true because you do have to "keep track" of the "processor status" or
"flags" register, even though you're not storing any variables there...
This roughly "doubling" every time wasn't too complicated...in fact, it was
kind of welcomed...is it really that much of a big deal to double it again?
Or even quadruple it?
I mean, the reality is if someone had told you about around 16 registers
back when you were using none (with BASIC) then you also would have said
"registers? What the hell are registers? You mean I have to keep track of
16 different things simultaneously? Oh dear...I think I prefer my BASIC
interpreter to look after such complicated things...I like my variables to
have 'names' I can look at ;)"...I mean, true enough, when you're using
BASIC then the prospect of 16 registers sounds like a total nightmare...but
find me a 68K assembly programmer (if you can these days...nah, to be fair,
those chips are used in some of the PDAs and things...and many people
"retroprogram" older machines with their emulators ;) - who's "seasoned"
(not a newbie just starting with it...everything is, of course,
"complicated" when you first start ;) - that walks around moaning "16
registers! Far too many! My brain hurts! It's impossible!! Help me Obi-wan
Compiler, you're my only Hope...help me Obi-wan Compiler, you're my only
Hope"...
It's crap...it's the same old "myth" spun again and again...you know, they
_ALMOST_ succeeded with the x86 you're using right now...oh yes...surely
Rene "assembly rebirth" Tournois remembers because he trumpets the
"assembly pioneers" on Windows all the time...phone up Microsoft and what
did they say: "Assembly language with Windows?!? IMPOSSIBLE!!"...but what
are _YOU_ telling us today, Rene? Are you saying "impossible!"? No, if I've
got the gist of what you're saying correctly, you're saying the complete
polar opposite that "it's so _possible_ that everyone should burn their HLL
compilers and use RosAsm for an 'assembly rebirth'!!!"...
I'm seriously surprised to see you so easily taken in by a "myth", Rene,
just because they've added on lots more registers...isn't that exactly what
we've always wished the x86 had: a few more registers? I've heard
the "D'oh! How dumb were the Intel engineers to only use 3 bits? Now, we're
stuck with so _few_ registers" line said many times by x86 coders...they
grant the wish and now it's time to "give up" because "only a compiler can
deal with it!!"...
Crap. Absolute crap...I don't believe a word of it...
And neither should anyone else be duped into this: "Can you use assembly
language? No! It's totally IMPOSSIBLE! You can only use Microsoft Visual
C++ compiler (Service Pack 7) to code with our operating system! You should
go out and immediately give us money to buy MSVC++ (SP7) without delay...in
fact, don't just buy one, buy two!! Leave us a 'tip', if you like, as
well...actually, don't buy a PC! Buy a 'thin client'...the _machine_ is
much cheaper...yes, the _machine_ will be much cheaper than a full blown
PC...and all your software will automatically be up-to-date all the
time...how does it work? Ah, well, you buy a 'dumb terminal' and then
connect via the Internet to our large server and then you 'rent' your
software from us...'sounds like you're trying to force everyone onto a
global Microsoft mainframe, just so you can control everything and make
money'? No, no, no...we're Microsoft...we're everyone's friend...we're in
this business to _help people_, not to make money...why ever did you think
otherwise? The horns poking out the top of my head, the pointy tail, the
red skin and the nasty looking fork I've got in my hand? Ummm, no, no...I'm
not evil...it's, ummm, a skin complaint...yes, a medical condition...makes
you look like the spawn of satan...don't pay any attention to it...now,
here's the contract...just sign at the bottom in your own blood...ah, don't
worry, it's, ummm, just a 'legal thing'...would I lie to you?" ;)...
They _tried_ to say this about Windows already...even the Linux people are
caught saying it now and again (though, it's kind of strange because it
never bothered Linus himself in that first post he made to USENET about
Linux)...yes, "everyone" knows you can't beat the "optimising
compiler"...yes, yes, "please deposit all your worldly possessions into the
box provided"...
Just because we've got a new chip on the horizon (well, it's already here
but moving over to it is still "on the horizon" for most people ;)...this
doesn't mean they've given up on this "myth" nonsense...you're seeing the
same old "myth" just propagated onto all the newer processors...
Let's not make the same mistake twice...they said that assembly language on
Windows (or Linux, even) was "impossible"...and, for a while, most people
swallowed this "myth"...until, indeed, the "pioneers" Rene talks about in
such glowing terms, sat down and thought: "wait a minute, that makes no
sense...it's got to be _possible_ because all the HLLs just compile down to
machine code, anyway...let's actually find out how it works and try it
out"...
They did and it turns out that it was actually not in the slightest bit
"impossible"...in fact - which is perhaps a bad point more generally - the
whole thing is only fractionally more complicated than the situation with
the C compiler they were trying to peddle..._everything_ with the OS
happens through procedure calls...that's the "great leveller" and it turns
out Windows (and Linux) assembly language is arguably _EASIER_ than trying
to do it with DOS (which no-one ever disputed was "possible" because
everyone started out coding assembly language there - DOS itself was coded
that way - so there was no chance for any such "myth" to take hold)...no
"real mode addressing", no segment registers, no "direct hardware
access"...it's actually somewhat disappointing that it couldn't be more
_low-level_, as this is only marginally better than C programming...but
then, you are basically really just calling into a C dynamic library all
the time (literally)...it's the best that'll be around while the OSes
themselves are written in C...there's, ummm, always Menuet (32-bit
protected mode GUI OS that fits on a floppy disk...Microsoft - as their GUI
is now so monstrous that they won't be able to fit it on anything less
than a DVD soon - would, no doubt, also say that Menuet was "impossible"
too ;)...
Right, let's review things...let's take a slightly different perspective on
the matter to make a small little point...Microsoft has hired _YOU_...your
job is to create an "optimising compiler" for the CPUs mentioned thus
far...you've got to code the compiler to do its utmost to produce the
most optimal code it can from some C++ source code that the user
supplies...you're _writing_ MSVC++ for Microsoft...but, just to be
hypothetical, Microsoft want a compiler for all of the architectures
mentioned...
So, you have to write one for the 6502...only three registers, can only be
used in certain ways, small, simple programs...working towards "optimal"
code generation isn't going to be greatly difficult, right? If I was going
to be a touch flippant then I'd feel the urge to make the joke that you
could almost do it all with just a "look-up table"...not really true, but
you know what I mean...
Then writing an x86 or 68K compiler...ah, now, this ain't quite so easy to
do, is it?
[ In fact, as the CPU itself is now starting to use "out of order
execution" (which automatically re-orders instructions, so "instruction
scheduling" doesn't count for as much as it used to) then, ummm, the
compilers are actually _LOSING_ an advantage...because for all the "fanfare" made
over the "optimising" in "optimising compiler", this really is only a case
of going back over the generated code and then "tweaking" it here and
there...you know, substitute one set of instructions for another set that's
slightly smaller / faster...move code out of loops...shuffle the code
around a bit... ]
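For anyone who hasn't met it, "move code out of loops" is the classic optimisation usually called loop-invariant code motion. Here's a hand-written before/after in Python (the function names are made up for the example). A compiler does this on its own internal representation rather than on source, so treat this as a picture of the idea only: the rule moves a calculation that doesn't change between iterations, and it is required to leave the result exactly as it was. Note that it's a mechanical rewrite of the code as written; it never picks a different algorithm.

```python
import math


def scale_all_naive(values, angle):
    """As written: the invariant expression is recomputed on every pass."""
    out = []
    for v in values:
        factor = math.cos(angle) * 2.0   # does not depend on v at all
        out.append(v * factor)
    return out


def scale_all_hoisted(values, angle):
    """After "moving code out of the loop": computed once, same result."""
    factor = math.cos(angle) * 2.0       # hoisted: loop-invariant
    out = []
    for v in values:
        out.append(v * factor)
    return out
```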
Then you're tasked to work on the Itanium compiler...tens of registers,
inherently parallel, etc., etc...._is_ it really all that easy to write a
compiler that _properly_ and _fully_ exploits the processor?
Contrary to popular mythology, compilers are having a more and more
difficult task catching up with _humans_ (they've always been behind and
always will be ;)...oh, yes...I dare to say it: It'll be _easier_ to run
rings around a compiler with the Itanium, not harder at all...
Why do I say that? It's that "parallelism" more than anything (there's more
than that but this is the "biggie")...ever looked at some C source code?
Kind of noticed the _sequential_ nature of it? Kind of noticed that there's
no natural sense of "concurrency" in it at all? Oh dear...the compiler is
NOT going to be getting too much help from the programmer (Java has some
"concurrency" concepts in it but because of the "portability", how many
Java programmers even consider "optimisation" (let alone know how to do it
properly...as with "optimisation", you can potentially make things worse
if you don't really know what you're doing and don't check your results ;)? Yes,
an amusing contradiction...Java's got the structures but because of how
Java works, next to none of its programmers probably even understand the
concurrency stuff, let alone use it ;)...
And, with concurrency, you've got to _code to exploit it_...if you keep
calling _synchronous_ API functions in a purely _sequential_ program, then
there's really not much besides some minor "instruction scheduling" that
you can do...because as the logic is _synchronous_ (stops and waits all the
time), that puts a big brick wall in what you can do with the
parallelism...remember that compilers _MUST_ guarantee that the _logic_ of
a program cannot be altered...you can't have it that you put in code that
says to do it one way and then the compiler produces output that works to a
completely different algorithm...
How do you avoid such "brick walls"? To free up the code so that more
things can be run in parallel? Ah, you _structure_ it that way...you write
_concurrent_ code from the off...you choose algorithms where you can do as
much simultaneously as possible...
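A minimal sketch of "structuring it that way", using Python's `asyncio` purely because it's compact (the `fetch` helper and the 0.1 second delays are invented stand-ins for slow operations like disk or network). Both versions perform exactly the same two waits; the only difference is that the second one was *written* so the waits can overlap. That's a structural decision made by the programmer, not something a compiler is allowed to infer from the first version.

```python
import asyncio
import time


async def fetch(name, delay):
    """Hypothetical stand-in for a slow operation: it just waits."""
    await asyncio.sleep(delay)
    return name


async def sequential():
    # Each wait must finish before the next starts: total ~ sum of the delays.
    a = await fetch("a", 0.1)
    b = await fetch("b", 0.1)
    return [a, b]


async def concurrent():
    # Same two waits, structured to overlap: total ~ the longest single delay.
    return await asyncio.gather(fetch("a", 0.1), fetch("b", 0.1))


def timed(coro_fn):
    """Run a coroutine function to completion and measure wall-clock time."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


if __name__ == "__main__":
    print(timed(sequential))   # roughly 0.2 s
    print(timed(concurrent))   # roughly 0.1 s
```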
But, wait a minute, I just used the words "algorithm" and
"structure"...programmers can change these things, sure...but a compiler
can't...it's not allowed to do that...it's _NOT SMART ENOUGH_ to be able to
do that (let me stress again until people realise: This is NOT a person we're
talking about...this is a _dumb_, _blind_ process...a piece of _software_
like all of us write (indeed, we might be the ones writing the compilers
;)...there's no "smarts" involved at all...it's all just following "rules"
from one state to another :)...
The compiler - especially compiling typical sequential, synchronous C / C++
code where the programmer has done _nothing_ to make it "concurrent aware"
and the programming language has absolutely no support for the notion at
all, anyway - is quite literally going to be constantly banging its head
against these "brick walls"...
Now, I've heard this second-hand so I can't confirm it directly but it's
already happened...people _ARE_ coding assembly and running rings around
the compilers...this might startle some people but not me...it makes
perfect logical sense...
Because, you see, it's not just assembly language that has an "anything you
can do, I can do better...anything you can do, I can do as well"
relationship with HLLs...humans have the same relationship with
compilers...whatever "rules" the compiler follows, I can follow them
too...but I can completely redesign code, change algorithms, change the
"precision" of a calculation, write two independent programs that work
together (think "concurrency", realise that _concurrency is where it's
at_ with these future architectures, and you'll see that this is a
_major_ advantage to humans that a compiler processing source files one by
one can't follow at all), etc., etc....
Oh...and, of course, what makes the biggest difference to a program's
speed? Is it shuffling instructions about, which is mostly all the compiler
is actually capable of doing in its "optimising" phase? Or is it
structural and algorithmic improvements, that _ONLY_ an entity that
possesses _intelligence_ and _understanding_ (in short, humans...because
dolphins and rats don't program - and don't beat humans on this, anyway -
and we've not met any aliens yet ;) is capable of doing? Yes, _ONLY_ humans
can do that part...and AI experts broadly agree: Human-level AI is so far away
that they've given up even bothering with it for now...managing
"insect-level" AI is considered "good stuff" at the moment...this advantage
is going to last for a long time...and, note, that even if human-level AI
was possible, then, by definition, it would only be _as good_ as you, not
better...to beat humans, it has to _exceed_ human intelligence...that's
even further off still and there are "philosophical" issues too: Will
humans tolerate no longer being top of the tree? How do you break the
"barrier" of something - human or machine - creating something smarter than
itself?
I refer you to that simple example from before...a poster wanted a fast
"look-up" for their command prompt...everyone "followed the rules"...they
flicked through their "textbook" (metaphorically) and looked for the
fastest algorithm...they "obeyed" the "rules" of the "textbook"..."IF
(problem == command look-up) THEN Suggest("Use hashing");"...BUT - and this
is, of course, also why Intel engineers are chasing after all this
"parallelism" - with a small re-think of the problem in "concurrent" terms,
you can get effective "instant look-up"...the commands are being
typed...people don't type particularly quickly and it's like some kind of
"Matrix slo-mo" thing for the computer to watch humans type...the computer
sits there waiting for keystrokes...
------------------------------------
repeat
{
    Wait for user keystroke;
    Get user keystroke;
    Store in buffer;
} until user presses ENTER
Look-up command in buffer
------------------------------------
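Here's that first loop as a small runnable Python sketch. To make it run without a terminal, the keyboard is replaced by a string of characters with `"\n"` playing the part of ENTER; the command table and its entries are invented for the example. A `dict` is the "use hashing" textbook answer: the look-up itself is fast, but it can't even start until the whole name is sitting in the buffer.

```python
COMMANDS = {                         # hypothetical command table
    "cls": "clear the screen",
    "copy": "copy files",
    "dir": "list the directory",
    "format": "format a disk",
}


def read_then_look_up(keystrokes, commands=COMMANDS):
    """Collect the whole line first, then do one hashed look-up at the end."""
    buffer = []
    for key in keystrokes:           # stands in for "wait for / get keystroke"
        if key == "\n":              # ENTER ends the loop
            break
        buffer.append(key)           # store in buffer
    return commands.get("".join(buffer))   # the look-up only starts after ENTER
```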
The magic keyword in the above is "wait"...concurrent programs don't wait
(or do their best to wait as little as possible ;)...that "repeat...until"
loop above will take as long as the user needs to type out the whole
command...this is many seconds...for most of which, the computer is
sitting idle waiting...the look-up does not proceed until the whole command
is typed and then it's the user's turn to wait (not particularly long with
a good algorithm and speedy machine, hopefully ;) for the computer to do
its thing...
This is a bit like "call and response" in music...the user goes first, the
computer waits...then they swap over and the computer proceeds, while the
user waits...the overall time is how long the user takes plus how long the
computer takes...
But re-interpret the algorithm in the most simple way - get as much done as
possible before any kind of "waiting" - and then the difference is
significant to the user...
------------------------------------
repeat
{
    Wait for user keystroke;
    Get user keystroke;
    CommandPointer = Look up user's latest character in command tree;
} until user presses ENTER
Execute [ CommandPointer ];
------------------------------------
Yup, just moved the "look-up" into the loop (also, the "buffer" is no
longer strictly needed, notice - not unless you've got some other
requirement that you need to store it for - so it saves RAM as well as time
;)...as each character is typed, then the computer looks up that character
in the command "tree" (each time a character is typed then it narrows down
the search...if the first character typed is "c" then it could only be one
of the commands that start with "c" ;)...looking up a _single_ character in
a "command tree" is not a long or complex thing - a bunch of instructions
to code it - so the "user response" is not going to be impacted at all...
The time this takes to look things up is completely absorbed...the user
presses ENTER only to say "okay, now you can run the command"...it already
knows where that command is...no "look-up" happens here because it was done
during the loop itself...the user does not wait for the computer at
all...the computer only waits for the input from necessity (the program's a
"command prompt", the whole point is that the user has to type commands
;)...
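A minimal sketch of the second loop, assuming the "command tree" is a trie built from nested Python dicts (the helper names and the `END` marker are illustrative, not from the post). Each keystroke moves the pointer down one level, which is a single dictionary probe, and there's no line buffer at all; by the time ENTER arrives there's nothing left to search.

```python
END = ""   # key marking "a complete command ends here"; never a typed character


def build_tree(commands):
    """Build the command tree as a trie: one nested dict level per character."""
    root = {}
    for name in commands:
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
        node[END] = name
    return root


def type_and_look_up(keystrokes, tree):
    """Descend one level per keystroke, so ENTER has nothing left to search.

    `node` plays the part of CommandPointer; None means "no such command".
    """
    node = tree
    for key in keystrokes:
        if key == "\n":                  # ENTER
            break
        if node is not None:
            node = node.get(key)         # one probe per character typed
    return node.get(END) if node is not None else None
```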
And, automatically, we've got additional benefits...the user really only
needs to type in as many "significant characters" as it takes to
distinguish one command from another...if the user presses "z" and there's
only one command that starts with "z" then we know it's got to be that
command...really, we're _only_ waiting for the ENTER key in order to give
the user the chance to edit things around...you know, pressing "z" might
have been a typo and they'll want to press backspace next...the ENTER is
being waited for only for the practical reason of needing an "okay, yes,
that's _really_ the command I want" instruction from the user...you know,
they could make a typo or change their mind half-way through and want to
delete the whole thing...so we wait for ENTER before executing the command
simply to accommodate that...if you reckon you'll never make a typo, then
you could re-write it again to immediately execute the command upon getting
enough significant letters to know for sure which command is wanted...but
that's not particularly "user-friendly", as it doesn't tolerate user
mistakes...you know, the user reaches for "d" to type "dir" but hits "f" by
accident and, oops, the "format" command starts with "f" and the entire
hard drive is wiped out...NOT a good design...
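A sketch of the "significant characters" idea together with the wait-for-ENTER safety net, under a few assumptions of my own: a tiny hypothetical command set, a plain linear scan for matching (fine for a handful of names), and `"\b"` standing in for the backspace key. The command is known as soon as it's unambiguous, but nothing is acted on until ENTER.

```python
COMMANDS = ["cls", "copy", "dir", "format"]   # hypothetical command set


def unique_match(prefix, commands=COMMANDS):
    """The one command starting with `prefix`, or None if zero or several match."""
    matches = [name for name in commands if name.startswith(prefix)]
    return matches[0] if len(matches) == 1 else None


def significant_chars(name, commands=COMMANDS):
    """How many leading characters of `name` are enough to identify it."""
    for n in range(1, len(name) + 1):
        if unique_match(name[:n], commands) == name:
            return n
    return None   # `name` is a prefix of another command: never unique


def confirm_on_enter(keystrokes, commands=COMMANDS):
    """Know the command early, but only act once ENTER confirms it.

    A backspace key ("\\b") removes the last character, so a slip of the
    finger (an "f" instead of a "d") can be taken back before anything runs.
    """
    typed = ""
    for key in keystrokes:
        if key == "\n":
            return unique_match(typed, commands) if typed else None
        typed = typed[:-1] if key == "\b" else typed + key
    return None   # no ENTER yet: nothing gets executed
```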
If you want "autocomplete" then this algorithm is already ready for
that...when it's looking up the command in the "tree", just pop up the
"current best guess" as to what's being typed onto the screen...and then
include a TAB key which takes that "current best guess" and jumps straight
to executing the command (or at least completes the command so that only ENTER
need be struck to execute it ;)...
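One way to decide what TAB fills in, sketched in Python: complete as far as every matching command agrees (the shell-style choice; the post's "current best guess" could equally be, say, the first match, so treat this as one option). The command list is hypothetical. `os.path.commonprefix` works character by character on any list of strings, despite living in `os.path`.

```python
from os.path import commonprefix

COMMANDS = ["cls", "copy", "del", "dir", "format"]   # hypothetical command set


def tab_complete(typed, commands=COMMANDS):
    """What TAB fills in: the longest text every matching command shares.

    One match gives the whole command name; several stop where they diverge;
    none leaves the input unchanged.
    """
    matches = [name for name in commands if name.startswith(typed)]
    if not matches:
        return typed
    return commonprefix(matches)
```

So "f" then TAB gives "format", and ENTER runs it: three keystrokes instead of seven.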
Put this in and you've not just eradicated the "look-up time" completely
(absorbed it into the loop)...you're also helping to speed up the user
typing things too...they'll get used to just needing to type "F" and then
hit TAB to get "FORMAT" and hit ENTER to execute...saving four
keystrokes...in addition, you get a "command look-up" for the user in the
bargain too...that is, if the user's thinking "I wonder what commands there
are?" then they can type "A" and hit TAB...then "AB" and hit TAB...then
"AC" and hit TAB...or hit "B" and TAB...then look to see what commands
appear...if you also include a "HELP command" command then the command
prompt can be used to learn how to use the prompt (documentation would be
better, of course...but this is a nice "addition" that comes for free
:)...the Quake / Half-life console works in this way...it's the way to
discover those "undocumented commands" to switch on "god mode" or "no
clipping" or "wireframe output" and that kind of thing...also good if you
don't know how to spell a word, so long as you know the first few letters
;)...
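For the "type A and hit TAB to see what's there" discovery trick, the prompt needs every command that starts with what's been typed. A minimal sketch, assuming the commands are kept in a sorted list (the names are hypothetical): in sorted order all matches sit in one contiguous run, so a binary search with Python's `bisect` module finds where the run starts and we walk forward until names stop matching.

```python
from bisect import bisect_left

COMMANDS = sorted(["cls", "copy", "del", "dir", "format", "help"])  # hypothetical


def commands_starting_with(prefix, commands=COMMANDS):
    """All commands beginning with `prefix`; `commands` must be sorted."""
    i = bisect_left(commands, prefix)     # first name >= prefix
    found = []
    while i < len(commands) and commands[i].startswith(prefix):
        found.append(commands[i])
        i += 1
    return found
```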
Now, a person's doing well to manage a 50wpm typing rate...and, assuming a
"word" is four characters (plus a space between each word), that's roughly
250 characters a minute...that's about 4 characters a second...you've 0.25
seconds to look up just a _single_ character (probably on a 2GHz machine or
something...giving you around 500 million clock cycles to do "yo thang",
just to stress how not-difficult-to-keep-up we're talking about here
;)...doesn't sound particularly difficult to achieve, does it? So, you
could use a pretty crap look-up algorithm and still make it in plenty of
time (the user detects no slow-down in the response at all, so long as you
can keep up with this rate ;)...it'll still have that "instant look-up"
execution speed...
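The arithmetic, spelled out with the post's own figures (50 wpm, five characters per word counting the space, a 2GHz clock). The text rounds to 4 keys a second, which gives 0.25 seconds and 500 million cycles; unrounded it's about 480 million cycles per keystroke, which doesn't change the conclusion one bit.

```python
WPM = 50              # words per minute (the post's figure)
CHARS_PER_WORD = 5    # four letters plus a space
CLOCK_HZ = 2e9        # a 2GHz machine

chars_per_second = WPM * CHARS_PER_WORD / 60   # 250 characters a minute
seconds_per_key = 1 / chars_per_second
cycles_per_key = CLOCK_HZ * seconds_per_key

print(f"{chars_per_second:.2f} keys/s, {seconds_per_key:.2f} s/key, "
      f"{cycles_per_key / 1e6:.0f} million cycles/key")
# prints: 4.17 keys/s, 0.24 s/key, 480 million cycles/key
```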
Or - let's NOT get complacent - rather, we could use all this "released"
processing time to add in something else...how about - perhaps "overkill"
for just a command prompt but it's an "illustrative example" - throwing in
a bit of simple compression / decompression of your command tables? Then,
it's not just faster but you can also _simultaneously_ have it all
compressed so that some massive "command table" is squeezed down into
next-to-no-space-at-all (in this case, the commands aren't likely to be
very long that it would be worth the bother but I'm just illustrating how
the "re-think" can now "release" a whole bunch of brilliant features, as
well as run faster than any "textbook" algorithm you care to mention in a
_sequential, synchronous_ manner)...or, spend a few clock cycles of that
500 million on the look-up and then you can also run a "background task"
comfortably in the background to do something else...
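The post doesn't say which compression scheme; as one illustrative pick, front coding suits a sorted command table well, because neighbouring names share prefixes. Each name is stored as "how many characters to reuse from the previous name" plus the new tail, and decoding is just as cheap, easily done in the spare cycles between keystrokes. In a real assembly version the table would be packed bytes; the Python tuples here only show the shape. For the sample names, 14 tail characters are stored instead of 18.

```python
from os.path import commonprefix


def front_encode(sorted_names):
    """Front coding: store each name as (chars shared with previous, new tail)."""
    encoded, previous = [], ""
    for name in sorted_names:
        shared = len(commonprefix([previous, name]))
        encoded.append((shared, name[shared:]))
        previous = name
    return encoded


def front_decode(encoded):
    """Rebuild each name by reusing the prefix of the previously decoded one."""
    names, previous = [], ""
    for shared, tail in encoded:
        name = previous[:shared] + tail
        names.append(name)
        previous = name
    return names
```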
Want to get _really_ extreme? How about even _starting to execute_ the
command while the user is still typing? Woah! Once it's got the command
(enough "significant characters" have been typed that there's only one
possible command they could currently be typing ;) it _does_ start to
execute it immediately...but does so to some "working copy" (if the command
changes anything..."read only" commands - "dir" - don't have that problem
to consider ;) rather than the real thing...so, the user hits "d" and the
machine immediately assumes "dir" and then spins up the hard drive and
reads the directory...BUT it does not report back the "dir" until the user
actually hits ENTER (to confirm that this really is the command they
want...if you were using a "working copy", then the ENTER key acts as a
"commit the operation" confirmation ;)...you start up these commands as
their own separate "processes" (lower "priority" than the input loop so that
it doesn't ever get in the way of the user typing or affect their
response...higher priority tasks are always scheduled first...and we're
_blocking_ on the keypress rather than polling so the "wait" in the user
input loop is taking _no_ CPU time whatsoever ;)...so, if the user hits
backspace instead and types a different letter then, okay, they don't want
"dir"...send a "kill" signal to the "dir" command already running...we've
got enough cycles to spare from the method to be a touch "flippant" and,
yeah, sure, start running commands even before we know whether the user
really wants them or not...it's a "low priority" thread so doesn't slow
down the user input at all...indeed - for those that actually understood
the "multi-tasking" scheduling stuff before - these tasks are running in
the "system idle time"...when the machine wouldn't be doing anything else
useful, anyway (literally just running a "HLT" to halt the CPU)...so, we're
not using up any "precious" CPU cycles...we're picking up only the
cycles that would otherwise be wasted...ah, so who cares
if the user hits backspace and all that work it was doing just gets
"cancelled", anyway? We've bought ourselves the time to be this flippant
about things...and if the command takes some time to complete then, hey,
we're starting early...it'll _seem_ faster to the user because they were
typing away while it was running (sshhh! Don't tell the user...we're just
"keeping them occupied" and that extra typing is a bit redundant! But,
well, you know the saying: "A watched pot never boils" or, if you prefer,
"time flies while you're having fun" ;)...
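To make the "start on the unique prefix, report on ENTER, kill on backspace" idea concrete, here's a minimal sketch in Python (just for brevity). The command table, the `run(cmd, cancel)` worker and the class name are all hypothetical. Python threads have no portable priority control, so the "low priority" part isn't modelled (an OS-level version would lower the worker's priority with something like `nice` or `SetThreadPriority`), and the "kill" is cooperative, via a `threading.Event` the worker is expected to check, rather than a real signal.

```python
import threading

def unique_match(prefix, commands):
    """Return the only command starting with prefix, or None if zero or several match."""
    matches = [c for c in commands if c.startswith(prefix)]
    return matches[0] if prefix and len(matches) == 1 else None

class SpeculativePrompt:
    """Starts a command as soon as the typed prefix is unambiguous; reports only on ENTER."""

    def __init__(self, commands, run):
        self.commands = commands
        self.run = run        # hypothetical worker: run(cmd, cancel_event) -> result
        self.buffer = ""
        self.guess = None     # command currently being run speculatively
        self.cancel = None    # Event asking the speculative job to stop
        self.thread = None
        self.holder = None    # per-job dict the worker writes its result into

    def _launch(self, cmd):
        if self.cancel is not None:
            self.cancel.set()             # "kill" the previous guess (cooperatively)
        self.guess, self.cancel, self.thread, self.holder = cmd, None, None, None
        if cmd is None:
            return
        cancel, holder = threading.Event(), {}

        def worker():
            holder["result"] = self.run(cmd, cancel)

        self.cancel, self.holder = cancel, holder
        self.thread = threading.Thread(target=worker, daemon=True)
        self.thread.start()

    def key(self, ch):
        """Handle one keystroke; a backspace character removes the last one typed."""
        self.buffer = self.buffer[:-1] if ch == "\b" else self.buffer + ch
        guess = unique_match(self.buffer, self.commands)
        if guess != self.guess:
            self._launch(guess)

    def enter(self):
        """Commit: wait for the speculative job (if any) and only now report its result."""
        thread, holder = self.thread, self.holder
        self.buffer, self.guess, self.cancel = "", None, None
        self.thread = self.holder = None
        if thread is None:
            return None
        thread.join()
        return holder.get("result")
```

With a table of `copy`, `dir`, `echo` and `type`, typing `d` starts `dir` straight away and ENTER merely collects what it already did; typing `c` then backspace cancels the speculative `copy`.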
Anyway, this is a touch "overkill" most probably for a simple command
prompt...but, basically, my point here is to be illustrative of the kind of
"magic" that a touch of "concurrent thinking" can magic up seemingly from
nowhere...this is why Intel are chasing "parallelism" in all directions
(check out what they've added to the processor since the Pentium...the dual
pipeline produced such a "jump" from the '486, they thought "hang on,
there's something to this 'parallel' lark!"...and then MMX shows up and SSE
and SSE2 and "out of order execution" and "hyperthreading"...they've got a
touch "obsessed" with parallelism, in fact...that's about _all_ they do to
their designs these days...trying to "un-serialise" the x86 chip...trying
to "undo" their non-parallel thinking from earlier ;)...
And this kind of change of algorithm is something a compiler CANNOT ever
hope to achieve...a compiler lacks the _knowledge_ and _understanding_ of
what the program is trying to achieve - at the higher-level that
"algorithms" and "structure" operate - to make the correct
alterations...it's information _BEYOND_ what's in the source code...it can
"instruction schedule" and "code substitute" all it likes to "optimise" the
serial solution...but it cannot make the _algorithmic_ - the
_intuitive_ - leap I've just shown above...
Indeed, many programmers themselves have become too "programmed" themselves
in the old "serial" ways to make such a "leap" (I mention these ideas and,
to some, it's like I'm selling "magic beans" or something: "that's
IMPOSSIBLE!"...but, hey, I've given enough details that you can try it out
yourself - completely independently of me, so that I can't "cheat" or "rig" the
results and you know it's really happening and really working - you can
look at those details and see I've not referred to "magic" once but given
you a simple algorithm style to follow ;)...
Though those who have the problem with the "intuitive leap" nicely
demonstrate my point...if humans with intelligence and _understanding_ and
_knowledge_ and good programming experience don't always think up these
things, then what chance the compiler in comparison? None...absolutely
none...
We may now have "parallelism" and we may now have "tens of registers" but
the two mantras have not altered and they _still_ apply: "The best
optimiser is between your ears" and "the fastest code is the code that
never runs"...
The second one is timeless and will _always_ be true (it can be treated as
a "timeless principle" ;)...the first one will remain true until computer
AI _exceeds_ human intelligence (oh...and, on that point, you've probably
worked out - if you've spent two seconds to think it through - that the
compiler is the least of your worries...if the machines really are as smart
or smarter than you - the human programmer - then, hello?!? Do you think
you'll even have a job as a programmer at all? I mean, humans require
"inconvenient" things like wages, sleep, vacations, etc....the machine's
_already_ been given your job, sunshine...hence, the point about
"compilers" is completely moot...you'll be out of a job by that
time...though, on an "academic" level, it'll be interesting to ask the
machine who's stolen your job whether these compilers are great or not
;)...
Furthermore, there are certain things that machines are quite good
at...like remembering what has been put in all those registers...yeah, they
do an excellent job there (because it's one of those "dumb, blind,
repetitive actions" that they do so well ;)...in which case, what's
stopping you asking the machine to help you out a little there? "Foolish
pride" or something? Especially from Rene who already "re-pioneers" the
integrated editor / assembler with RosAsm...you're half-way there
already...
Add on a little extension to your editor / assembler that displays a
"register window" to the right of the source code...and it follows along
recording what variables are in what registers at any particular time...let
the _machine_ do the remembering for you, if, indeed, it's so damn good at
it...it won't complain about being "exploited" like that, machines don't
care...they just do what they are told...
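A minimal sketch of that "register window" bookkeeping, assuming a tiny Intel-syntax subset: `mov reg, [var]`, `mov reg, reg/imm`, and other instructions whose first operand is the destination. The `track_registers` helper and the "?" marker (meaning "now a computed value, not a plain variable") are hypothetical, not an existing RosAsm feature; a real tool would also need implicit operands (`mul`, `div`, `cdq`...) and control flow.

```python
import re

REGS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp"}
NO_WRITE = {"cmp", "test", "push"}   # first operand is only read, not written
MOV_LOAD = re.compile(r"^\s*mov\s+(\w+)\s*,\s*\[(\w+)\]\s*$", re.I)
MOV_COPY = re.compile(r"^\s*mov\s+(\w+)\s*,\s*(\w+)\s*$", re.I)
DEST = re.compile(r"^\s*(\w+)\s+(\w+)\s*(?:,|$)", re.I)

def track_registers(lines):
    """Return, for each source line, a snapshot of what each register holds after it."""
    state, snapshots = {}, []
    for line in lines:
        if (m := MOV_LOAD.match(line)) and m.group(1).lower() in REGS:
            state[m.group(1).lower()] = m.group(2)        # register now holds the variable
        elif (m := MOV_COPY.match(line)) and m.group(1).lower() in REGS:
            src = m.group(2).lower()
            # copying a register copies its annotation; an immediate is shown as-is
            state[m.group(1).lower()] = state.get(src, "?") if src in REGS else m.group(2)
        elif ((m := DEST.match(line)) and m.group(2).lower() in REGS
              and m.group(1).lower() not in NO_WRITE):
            state[m.group(2).lower()] = "?"               # overwritten by a computation
        snapshots.append(dict(state))
    return snapshots
```

An editor would show each snapshot in a column beside the matching source line.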
You could also, if you like, add in an "optimiser" too...assemblers can
have them too, you know...it's just a case that no-one's written any, not
that they aren't possible...transplant all that "optimising" that the
compiler has into the assembler...and then it can suggest "hints"...take
'em or leave 'em...up to you...
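As a taste of what "hints, take 'em or leave 'em" could look like, here's a hypothetical peephole checker with two well-known x86 rules. It only suggests and never rewrites, because the programmer knows things it doesn't (for example, whether the flags are still needed). The rule table and `suggest_hints` are illustrative, not any existing assembler's optimiser.

```python
import re

# Hypothetical rules: (pattern, suggested replacement, caveat for the programmer)
RULES = [
    (re.compile(r"^\s*mov\s+(e[a-d]x|e[sd]i)\s*,\s*0\s*$", re.I),
     "xor {0}, {0}", "shorter encoding, but it also overwrites the flags"),
    (re.compile(r"^\s*cmp\s+(e[a-d]x|e[sd]i)\s*,\s*0\s*$", re.I),
     "test {0}, {0}", "shorter encoding, sets ZF, SF, CF and OF the same way"),
]

def suggest_hints(lines):
    """Yield (line_number, original, suggestion, caveat) for each rule that matches."""
    for n, line in enumerate(lines, 1):
        for pattern, template, caveat in RULES:
            m = pattern.match(line)
            if m:
                yield n, line.strip(), template.format(m.group(1).lower()), caveat
```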
But, to be honest, I don't think you'd really need to be that extreme for
the Itanium...
Actually, just looking up a website that talks about the Itanium's
architecture, all those extra registers really are playing into _our_
hands, not HLL compilers...
Consider a HLL procedure call...right, we push parameters one by one onto
the stack, then we call the procedure, it sets up its stack frame,
preserves all the registers, it does its work, cleans up the stack frame,
restores all the registers, pops all those parameters off the stack...
This is what's "traditional" on the x86, yeah? But consider "preserves all
the registers"...there's 128 of the integer kind alone...you're going to
PUSH and then POP 128 integer registers for each and every procedure call?
And, ummm, isn't it a touch ludicrous when you have 128 integer registers
sitting there, waiting to be used, that, instead, you're going to push all
your parameters onto the, ummm, stack?
But, of course, us assembly programmers never bought that crap in the first
place...nope, load up to 128 parameters (actually, GR0 is hard-wired,
apparently and the first 32 are fixed...so maybe you're not going to use
all 128...but, ooh, big deal...what inefficient procedure are you calling
that _needs_ 128 parameters, anyway? I don't recall seeing any of those in
the C standard library last time I looked ;) into those registers and just
CALL...my concept of reversing the traditional "default" - that caller not
callee is responsible for preserving any registers - now seems like a
psychic insight or something...
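To put rough numbers on the "save everything" complaint, here's a back-of-the-envelope count. It assumes 8-byte (64-bit) integer registers and one store plus one load per pushed parameter and per preserved register. Caches, the Itanium's register stack and real calling conventions (which preserve far fewer registers) are all ignored, so these figures are illustrative, not measured.

```python
def stack_call_traffic(n_params, n_saved, reg_bytes=8):
    """Memory ops and bytes: push params, save n_saved registers, then undo it all."""
    ops = 2 * n_params + 2 * n_saved      # one store and one load for each
    return ops, ops * reg_bytes

def register_call_traffic(n_live_across_call, reg_bytes=8):
    """Params travel in registers; the caller saves only what it still needs afterwards."""
    ops = 2 * n_live_across_call
    return ops, ops * reg_bytes

print(stack_call_traffic(4, 128))   # (264, 2112)
print(register_call_traffic(2))     # (4, 32)
```

So a four-parameter call that preserves all 128 integer registers moves just over 2 KB through memory, where passing in registers with caller-saves might move 32 bytes.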
It has "register windows" to avoid this problem...you create "frames" of
registers and only those inside the "window" apply...the first 32 registers
are fixed and "global" (they are "windowless" ;)...
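A toy model of those windows, assuming the usual IA-64 description. `alloc` declares how many input, local and output registers the current procedure wants, and a call slides the window so that the callee's r32 upward is the caller's output area. The class and method names are hypothetical. The register stack is treated as unbounded here; running out of physical registers is a separate problem.

```python
STATIC = 32   # r0..r31: global, never renamed (r0 always reads as zero)

class RegisterFrames:
    """Hypothetical model of Itanium-style register windows (alloc / call / return)."""

    def __init__(self):
        self.base = 0     # where this frame's r32 sits in the register stack
        self.sol = 0      # size of locals: inputs + locals
        self.sof = 0      # size of frame: inputs + locals + outputs
        self.saved = []   # caller frames, restored on return

    def alloc(self, n_in, n_local, n_out):
        self.sol = n_in + n_local
        self.sof = n_in + n_local + n_out

    def call(self):
        # The callee's window starts at the caller's first output register,
        # so parameters are "passed" without copying anything.
        self.saved.append((self.base, self.sol, self.sof))
        self.base += self.sol
        self.sof -= self.sol
        self.sol = 0

    def ret(self):
        self.base, self.sol, self.sof = self.saved.pop()

    def physical(self, r):
        """Map register name rN to its position in the register stack."""
        if r < STATIC:
            return r
        i = r - STATIC
        if i >= self.sof:
            raise ValueError(f"r{r} is outside the current frame")
        return STATIC + self.base + i
```

A caller that allocates 2 inputs, 3 locals and 2 outputs writes a parameter into r37; after the call, the callee finds it in its own r32, because both names map to the same position.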
Now, this line from the website I'm reading says it all so I'll just quote
it: "Register windows have their problems, too, and it's no coincidence
that the only major RISC architecture to use register windows is also the
slowest major RISC architecture still in production"...
Oh dear, poor old HLLs...because of all that "must use the stack" nonsense
that they insist upon and there's far too many registers to go storing /
restoring them all every procedure call, they've got to use these "register
windows" supplied for them by Intel to remedy this problem...but using
these "windows" isn't a particularly useful thing, in fact...
Poor old HLLs...oh well, it's their own fault...they came up with these
strange "calling conventions"...they are the ones who are _bound_ to insist
on "backwards compatibility" and _still_ follow these same conventions on an
Itanium (you know Intel and HP reckon this is very, very likely because
they went to all the effort of adding in these "register windows" to try to
minimise the difficulties that these conventions are going to cause on
their completely new designs ;)...we have a "ramming the square peg in the
round hole" problem: HLLs came up with all this "serial" thinking and
"calling conventions" back on 8-bit machines with few registers and no
parallelism...and with their insistence on "portability" - so they've got
to follow methods that work for 8-bit, 16-bit, 32-bit or 64-bit machines
equally - and because of their insistence on "backwards compatibility" then
they are _STUCK_ with their previous designs...they daren't give up their
"square peg", even though the Itanium represents a "round hole"...nope,
they are just going to try to ram it in there regardless...
Should we assembly programmers worry? Nah, of course not...we're
"non-portable"...we code to the _machine_, not to some weird arbitrary bit
of "theory" hanging around since prehistoric times...if the machine has 128
registers then, cool, let's go ahead and start filling them up (ooh, how
many algorithms can now be made completely "on-chip" with that many
registers at your disposal? Ah, all that big fuss I've had many times
before, trying to ram an "on-chip" Bresenham algorithm or something onto an
x86...throwing out the "stack frame" to get that extra BP so that the
entire thing is on registers...thing of the past completely with the
Itanium ;)...if the machine is "inherently parallel" then, cool, let's get
all "inherently parallel" about things...if the machine requires that
people wear red hats, then let's don our red hats and get coding...
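For reference, here's the integer-only Bresenham line algorithm in the common all-octant error-term form, written in Python for clarity rather than as anyone's actual assembly routine. Its whole loop state (x0, y0, x1, y1, dx, dy, sx, sy, err, plus the temporary e2) is a handful of integers. That's a squeeze on the x86 even with EBP freed, but trivially "on-chip" with 128 registers.

```python
def bresenham(x0, y0, x1, y1):
    """Integer-only line rasterisation; returns the (x, y) pixels from start to end."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy                 # combined error term for both axes
    points = []
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            return points
        e2 = 2 * err
        if e2 >= dy:              # step in x
            err += dy
            x0 += sx
        if e2 <= dx:              # step in y
            err += dx
            y0 += sy

print(bresenham(0, 0, 4, 2))      # [(0, 0), (1, 1), (2, 1), (3, 2), (4, 2)]
```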
We obey no "conventions" (but those that, like, make sense to follow
;)...the "bad fit" problem that HLLs are bound to suffer being squeezed
onto the Itanium is something we will not encounter (until we've got to
work with some C library or whatever ;)...this is not ruling assembly
coders out of the picture, it's playing into our hands...
And, you know, I _thought_ that might be the case because one thing I've
noticed about those Intel engineers (whatever complaints I'll make about
"real mode addressing" and their syntax but, to be honest, I wouldn't bet
against the Intel engineers having an equal opinion of both these days as
"oops, we cocked up a little there, didn't we?" too ;) is that they don't
make HLL chips...they make assembly chips and then _tack on_ "HLL
helpers"...for instance - and this also plays into my "parallel works for
us, not against us" point - the MMX stuff is not used by compilers
naturally...you either have to "go assembly" to get at things like MMX, SSE
or SSE2...or you use those "Intel compiler intrinsics" things to access
them via a C / C++ compiler...but, of course, in doing so, you're now
"non-portable" - it's Intel specific - even if it's coded in C /
C++...hmmm, you might as well have coded it in assembly language
directly...that "portability" has gone out of the window because you've got
to code _specifically to_ these new SIMD instructions...the HLLs and the
compilers are _losing_ their advantage with Intel...similarly, think about
"instruction scheduling"...yes, compilers are great because they can do
this automatically...except, oops, Intel have put this stuff - "out of
order execution" - into the CPU itself...the burden removed, to a degree,
from both human and compiler...except, wasn't the fact that the compiler
did this stuff for us its supposedly fantastic advantage? Well, that's been
stolen away from the compilers...instead, the CPUs are now more "tolerant"
of a _human_ programmer being a bit relaxed about "instruction scheduling"
without that actually being at all costly to performance...
Look _real_ close at what Intel are doing and you'll see the pattern...the
"ENTER" and "LEAVE" x86 instructions have not been substantially improved
by Intel...in fact, they've made it so that the RISC-y style performs as well
_or better_ than the CISC-y "ENTER", "LEAVE", "LOOP" and so forth...and
"ENTER" and "LEAVE" are there to support the HLL style "stack frames" (yes,
they can also be used in assembly but we also have "other means" that HLLs
don't have to do this kind of thing in a more optimised way ;)...
And looking at the simple description of the Itanium here, Intel haven't
changed their minds...those HLLs are going to have to call "ALLOC"
instructions that change the "register window mapping" to select which of
the 128 registers they are interested in...and then call it again when
another procedure is needed and call it again to "undo" what they just
did...yet more _redundant_ HLL operations that we can optimise away in
assembly language...
It has good old "register renaming"...but, hey, this is a RISC...it's
_exposed_...you ask for it with the "register rotation" commands...now,
there's a lot of "tricks" hiding in the ability to be able to instantly
apparently shift all the registers around waiting to be exploited there...I
mean, we're assembly coders...since when did we use the instructions for
their original intended purpose? We've got two levels of virtual mapping
onto registers...of course, you can ignore that bullcrap and just use them
as 128 registers straight...BUT there's a whole bunch of "tricks" possible
with something like that...it's like having "paging" on registers, as well
as memory...
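Here's a small model of what rotation does to register names. It assumes the IA-64 scheme as we understand it: a rotating region of `size` registers starting at r32, a register rename base `rrb`, and logical r(32+i) mapped to physical (i + rrb) mod size within the region. Each rotation, triggered by the special loop-closing branches such as `br.ctop`, decrements rrb. So a value written to r32 in one iteration is read as r33 in the next, which is the "paging on registers" effect. The class and method names are hypothetical.

```python
class RotatingRegisters:
    """Hypothetical model of a rotating register region beginning at r32."""

    def __init__(self, size):
        self.size = size             # number of rotating registers
        self.rrb = 0                 # register rename base
        self.phys = [None] * size    # the physical registers in the region

    def _index(self, r):
        i = r - 32
        if not 0 <= i < self.size:
            raise ValueError(f"r{r} is not in the rotating region")
        return (i + self.rrb) % self.size

    def write(self, r, value):
        self.phys[self._index(r)] = value

    def read(self, r):
        return self.phys[self._index(r)]

    def rotate(self):
        # What a loop-closing branch does: every logical name now points one "older"
        self.rrb = (self.rrb - 1) % self.size

rf = RotatingRegisters(8)
for v in ["a", "b", "c"]:            # each iteration writes r32, then the loop rotates
    rf.write(32, v)
    rf.rotate()
print([rf.read(r) for r in (33, 34, 35)])   # ['c', 'b', 'a']
```

Every iteration's result is still there, one register further along, with no moves executed.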
Oh, goodie...here's another one they mention...the Itanium has an RSE unit
(a "Register Stack Engine") built-in...that is, when the registers are all
used up by all these "windows" everywhere, the CPU actually has a built-in
circuit that automates the pushing / popping of the registers to / from the
stack...
Huh?!? Why "goodie"? Surely this is a "HLL-like" feature? Surely the
automated algorithm won't do as well as a carefully planned and explicit
store / restore? Why get excited about that?
Exactly because we don't need it but the HLLs will be using it all the time
blindly...score one more point to our side...the HLLs will just blindly use
"windows" (with all that redundant "ALLOC" all the time ;) and then they'll
"overflow" the 128 registers...in kicks this RSE engine thing, which then
automatically saves the registers to the stack (and presumably grabs them
back off the stack when the "register window" is needed again ;)...and the
HLLs will be - as they always do because it's their "convention" to do so -
pushing and popping parameters to the stack, as they call functions which
call functions which call functions (just to add two strings together
probably too...99% "overhead", 1% "work"...all of which was probably
possible with a handful of machine instructions ;)...
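Here's a rough model of when the RSE has to touch memory, under stated assumptions. The 96 architectural stacked registers are taken to be all the physical storage there is, each nested call adds a frame of the given size (the output/input overlap is ignored), the oldest registers spill first, and a caller's spilled registers are filled back when control returns to it. The real engine also works lazily in the background and saves NaT bits, which is ignored here. `rse_traffic` is a hypothetical helper.

```python
def rse_traffic(frame_sizes, physical=96):
    """Call down through frames of the given sizes, return all the way out.

    Returns (registers stored to memory, registers loaded back from memory).
    Assumes no single frame is larger than the physical register file.
    """
    frames = []                      # [size, resident] per live frame, oldest first
    resident = stores = loads = 0
    for size in frame_sizes:         # calling deeper and deeper
        frames.append([size, size])
        resident += size
        for f in frames:             # spill the oldest registers first
            if resident <= physical:
                break
            out = min(f[1], resident - physical)
            f[1] -= out
            resident -= out
            stores += out
    while frames:                    # returning: the caller must be fully resident again
        _, res = frames.pop()
        resident -= res
        if frames:
            missing = frames[-1][0] - frames[-1][1]
            frames[-1][1] += missing
            resident += missing
            loads += missing
    return stores, loads

print(rse_traffic([8] * 10))    # (0, 0): small frames never overflow
print(rse_traffic([24] * 6))    # (48, 48)
```

The disciplined case never wakes the engine up; the "blind" case pays a store and a load for every register that didn't fit.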
The HLLs and "bloatware" programmers are going to - as they always do - use
this stuff blindly...they aren't going to exercise any discipline...they
aren't going to consider the consequences of their choices...they're going
to "overflow" those registers, for sure...give a "bloatware" programmer
twice as much memory and they just make the program twice as bloated...
This all spells, to me, an _increased_ advantage to assembly language
coders...we can just use the registers "as is" to send parameters directly
to procedures and just not bother with "preservation" at all (as far as
possible) and design it so that the RSE is never called into action at all
(because what it does is go around copying things to / from memory from /
to registers...an awful _lot_ of registers...with memory...with memory that
is over a bus...a bus that'll never be able to clock the same as the
CPU...a CPU that we can remain completely "on-chip" with almost everything
we want to do because of an abundance of registers...putting 32 and 32
together yet to get 64? This plays to our _advantage_, not disadvantage
;)...
Now, normal boring coders might not like the sound of RISC or parallel
computing but those who like _optimising_ their code are going to be over
the Moon...it brings in so many possibilities for "cutting corners", doing
things in "alternative" ways...a whole universe of those "medium level
optimisations" and "low level optimisations" that, to be honest, are too
few on the x86 architecture (you really could play more "tricks" with other
CPUs I've met ;)...
Sorry, I don't see this as anywhere near the "end of assembly
language"...this is the dawn of a whole new beginning...
Let me draw you back to Windows assembly language again...they also said
that it was "IMPOSSIBLE"...that was simply wrong...moreover, people still
say "it doesn't matter" / "Buy more RAM!" and such...well, let
them...they'll do it even more on an architecture like this, which is even
more "tolerant" of their "bloatware" attitudes...and then we'll sit back
and watch Farbrausch produce another 96KB first-person shooter
game...that's "impossible", apparently, by the way, even though it
exists...most of what this team does repeatedly - and let's attack another
"myth" at the same time: They produce these demos as regular as clockwork
without taking ten years to do it (because they use the "Werkzeug" and have
created themselves the _tool set_ and _re-use_ attitude that is the _REAL
THING_ that improves productivity - actually learning to be more productive
as a programmer itself - rather than what programming language they use to
do it ;) - is supposedly "impossible"...and, apparently, they also
shouldn't have done it yet either because everyone knows it takes 7 billion
years to write assembly code or something equally stupid...
Myths and urban legend, folks...that's all it is..."bloatware" coders
inventing these things so that they can _justify_ that they don't do their
jobs properly, not actual truths...your "productivity" is the result of
your methodology and discipline, NOT what programming language you use
(with "code re-use" and "modularity" and so forth - and a "code library" -
you too can be coding in the blink of an eye...after all, you all insist
HLA must be a "HLL" because it has such a library...but let the myth be
exposed, "piecemeal" or "jigsaw" programming is possible in _ANY_ language
and it's just as instant ;)...
Let's not make the same mistake twice...people believed Microsoft for a
while when they said "assembly language is impossible with Windows" (when
they really meant: "please don't write non-bloated software that'll make
our joke-shop toy we call an OS look as terrible as it really is"...the
"bloatware" gets away with it when there is no "un-bloatware" to compare
to, that proves just how bad the "bloat" really is ;)...assembly language
temporarily kind of "died" for that period while the "myth" was believed
until those "pioneers", as Rene calls them, broke down the door and proved
that, in fact, it's not really that much more complicated than C
programming in Windows...the biggest problem is really just getting your
hands on "header files" - or, yes, an "equates list", if that's how your tool
works - that's got all the constants and structures and API definitions
inside it...
_THAT_ is the real problem with Windows assembly coding and when it was
solved, the supposed "impossibility" now looks rather silly, doesn't it?
Yeah, RosAsm is "impossible"...HLA is "impossible"...MASM32 is
"impossible"...GoAsm is "impossible"...FASM is "impossible"...NASM is
"impossible"...all those 4KB and 64KB demos that perform mini-miracles,
they are "impossible"...
And yet the "impossible" exists...and the "impossible" will exist again, I
assure you...
The Itanium changes nothing as regards this basic underlying point and all the
"myths" (indeed, I've a feeling it'll exaggerate what's already
true)...and, of course, this post will be on public archive...it'll remain
there...people will perhaps look back on it...and the bet I'm making is
that they won't look and think "what was she on?" but "was Beth the only
person who could see it back then?" ;)...
"Mother, mother tell your children
That their time has just begun
I have suffered for my anger
There are wars that can't be won
Father, father please believe me
I am laying down my guns
I am broken like an arrow
Forgive me, forgive your wayward son
Everybody needs somebody to Love
(Mother, mother)
Everybody needs somebody to hate
(Please believe me)
Everybody's bitching
'Cause they can't get enough
And it's hard to hold on
When there's no-one to lean on
Faith: You know you're gonna live thru the rain
Lord, you got to keep the Faith
Faith: don't let your Love turn to hate
Right now we got to:
Keep the Faith
Keep the Faith
Keep the Faith
Lord, we got to keep the Faith
Tell me baby, when I hurt you
Do you keep it all inside?
Do you tell me all's forgiven?
And just hide behind your pride
Everybody needs somebody to Love
(Mother, father)
Everybody needs somebody to hate
(Please don't leave me)
Everybody's bleeding
'Cause the times are tough
Well, it's hard to be strong
When there's no-one to dream on
Faith: You know you're gonna live thru the rain
Lord, you got to keep the Faith
Now you know is not too late
Oh, you got to keep the Faith
Faith: don't let your Love turn to hate
Right now we got to:
Keep the Faith
Keep the Faith
Keep the Faith
Lord, we got to keep the Faith
Walking in the footsteps
Of society's lies
I don't like what I see no more
Sometimes I wish that I was blind
Sometimes I wait forever
To stand out in the rain
So no one sees me cryin'
Trying to wash away the pain
Mother, father
There's things I've done I can't erase
Every night we fall from grace
It's hard with the world in your face
Trying to hold on, trying to hold on...
Faith: You know you're gonna live thru the rain
Lord, you got to keep the Faith
Faith: Don't let your Love turn to hate
Right now we got to keep the Faith
Faith: Now it's not too late
Try to hold on, trying to hold on
Keep the Faith"
[ "Keep the Faith", Bon Jovi ;) ]
Beth :)