Re: Can this loop be made faster ?
From: Beth (BethStone21_at_hotmail.NOSPICEDHAM.com)
Date: 02/28/05
- Next message: '\\o//'annabee: "Re: When to use Rosasm, when to use Masm?"
- Previous message: FreeZ: "Re: When to use Rosasm, when to use Masm?"
- In reply to: '\\o//'annabee: "Re: Can this loop be made faster ?"
- Next in thread: '\\o//'annabee: "Re: Can this loop be made faster ?"
- Reply: '\\o//'annabee: "Re: Can this loop be made faster ?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Mon, 28 Feb 2005 06:58:26 GMT
'\\o//'annabee wrote:
> wolfgang kern wrote:
> > "The Half" skrev:
> > | > If just all this new hardware would come with detailed documentation
> > | > instead of HLL-created windoze-drivers...
> > | Yes... but doesn't Linux have such drivers? If it does, could the info
> > | not be ported?
> >
> > You know my opinion about Linux:
> > an "open C-source" variant of "WinDuNix" ;)
> > with the INT 80h API (~200 wasted cycles for every function call).
> > [KESYS also has an INT 7Fh API, but it's only used for calls from RM or PM16]
> >
> > Last time I checked, they also didn't have any NVIDIA details available.
>
> As far as I heard from one Linux enthusiast, they have had NVIDIA drivers for at
> least 2-3 years. He told me they could play a few 3D games that had been
> ported over there. Maybe Percival could give some input here, as he seems
> to be a Linux enthusiast. 200 wasted cycles per call? Is that not counting
> the function's own work? That sounds really terrible. That would mean a call
> to a CopyRect function in the API could take longer in overhead alone than
> Windows' CopyRect does in total? Can this be correct? That sounds
> terrible.
Ah, right...basically, nVidia themselves have a "policy" that they no
longer give out public "specifications" for their video cards...doesn't
matter what OS that is, nVidia themselves no longer do it...
BUT, as "compensation", nVidia do make the effort to support Linux by
providing "binary only" drivers...the drivers are "closed source",
unfortunately...and there's a rather annoying thing that an nVidia "logo"
appears when X is booting up (it's only there for a few seconds and isn't
quite so bad...but, well, I don't like "forceful advertising", as you
probably know)...but the drivers work just great...all the OpenGL
screen-savers (about the only things on my Linux box at the moment that
need the 3D stuff :) run smooth as silk and look cool...
So, really, this situation is not Linux's fault...it's nVidia's
"policy"...it's exactly the same on Windows and other OSes...they just
don't give out "details" anymore...but the cards do have "binary only"
drivers available that nVidia themselves have created for Linux...
Mind you, with Windows, _ALL_ drivers tend to be "closed source / binary
only" drivers...the only real difference is that nVidia insist that this
stays the same on Linux too...annoying because that makes it NOT "open
source", while pretty much everything else is "open source"...but, you
know, the problem is not "lack of support" on Linux - the drivers are there
(and have been for a few years, as your "Linux enthusiast" reports :) - but
nVidia's "closed source" policy is rather annoying...especially because, on
Linux, you _can_, if you really wanted to, get "direct access" to the
hardware (Linux has "drivers" but it also has system calls to "ask" for I/O
port permissions to be "lifted" to let programs send data with "in" and
"out" directly too :)...
Also, the 200 cycles is about INT 80h _system calls_...note, Linux's design
is NOT like Windows' design...there is no "CopyRects" API...the system
calls are the _base_ stuff...open file, read file, write file, seek file,
change "user ID", change "group ID", ask for I/O permissions and that kind
of thing (stuff where you often really _do_ need the OS to do it...for
example, the system call to "change user ID" _should_ go through the OS in
order to make sure "security" is properly kept...the thing to realise is
that Linux _does_ have "security" - for the user's benefit (you know, stops
your 9 year old nephew wiping out all your files or installing games that
you've "banned" him from playing and, in a business environment, to stop
"crackers" coming in and stealing all your "secret plans" for a new product
and so forth ;) - but this is distinct from Windows' "protections"...you
know, if you have the right "privileges" - "root" can do anything - then
Linux does allow you to "by-pass" drivers and do "direct access" and run
programs as other users and that kind of thing...it _does_ take steps to
stop people doing things that the "security" settings say they shouldn't be
doing but it's not "nanny", it's just "security"...note: If you run
absolutely everything as "root", then "root" has privileges to do
_everything_ so it will then run as if no "security" was there at all,
prohibiting nothing you ask for :)...everything else is done with _user
code_...for example, the XFree86 code actually _IS_ using "direct access"
for the video and such (well, it is possible to "recompile" with "DirectFB"
drivers...but unless you re-compile it or download a pre-compiled
"DirectFB" binary, the "default" is the "direct access" stuff)...
Plus, the Linux kernel has been "improved" so that, on machines with the
"SYSENTER" instruction (for "fast system calls"), you can use the
alternative "SYSENTER" method of making the system calls...I have yet to
time those, though, so I don't know how much of an improvement that
actually is...
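For comparison with the "SYSENTER" route, here is the classic INT 80h system call in its rawest form. This is a minimal sketch, assuming Linux and GCC-style inline assembly, that issues write() without going through the C library wrapper. On 32-bit x86 it uses INT 80h (EAX = call number, EBX/ECX/EDX = arguments, result in EAX, negative values meaning -errno); on any other target it falls back to glibc's generic syscall() function, which uses that platform's native entry instruction. The name raw_write is made up for the example.

```c
#define _GNU_SOURCE            /* for syscall() */
#include <string.h>
#include <sys/syscall.h>       /* SYS_write */
#include <unistd.h>

/* write(fd, buf, len) issued as a raw system call, bypassing libc's write(). */
static long raw_write(int fd, const void *buf, size_t len)
{
#if defined(__i386__)
    long ret;
    /* INT 80h: EAX = system call number, EBX/ECX/EDX = arguments. */
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "a"(SYS_write), "b"(fd), "c"(buf), "d"(len)
                      : "memory");
    return ret;                /* >= 0: bytes written, < 0: -errno */
#else
    /* Native entry instruction; returns -1 and sets errno on error. */
    return syscall(SYS_write, fd, buf, len);
#endif
}
```

For example, `raw_write(1, "hello\n", 6)` prints a line to standard output with the kernel as the only thing between the program and the terminal.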
Not that Linux is perfect...I agree with wolfgang that "INT" is not the
best choice (but, as noted, it's now not the only choice...the "SYSENTER"
instruction is supported for machines that have the instruction - all new
machines do (it was an Intel addition, from the Pentium II onwards; AMD's
own take on the idea is "SYSCALL", though Athlons also support "SYSENTER"
under a 32-bit OS :) - and
this is direct hardware support for "fast system calls"...a simple but
effective idea...the OS sets up special machine specific registers with an
"entry-point" and a "stack"...the OS is just "trusted" that this is correct
and when "SYSENTER" happens, it loads CS:EIP and SS:ESP from the special
registers and starts executing at ring 0...basically, it doesn't bother
with "checks" and doesn't bother with "saving" anything (not even the
return address)...just directly loads the values and "trusts" the OS (only
"ring 0" can set the special registers :) that all is correct and can,
thus, "by-pass" making "protection checks"...as all the "overhead" is to do
with making "checks" and "saving registers" and that kind of thing when
going from user -> kernel (ring 3 -> ring 0)...the Intel guys saw this
"problem" and invented "SYSENTER" to attempt to deal with it (and the AMD
guys did the same with "SYSCALL")...note, also, the way "SYSENTER" works
is just right for Linux's system call design - the call number and
arguments go in the same registers as for "INT 80h"; the one catch is that,
because "SYSENTER" saves no return address, Linux 2.6 has programs go
through a small kernel-supplied entry-point rather than using the
instruction directly - but it
can't help out a great deal with Windows' function calls because those all
have _different_ entry-point addresses...anyway, with Windows, you're not
calling the kernel direct _ever_...it's all being piped through
"kernel32.dll", which passes to "ntdll.dll" on any NT-based Windows and
then, somewhere in there, it presumably makes the jump to the kernel
proper...BUT, being "closed source", we can't exactly know for sure...well,
not unless you want to reverse engineer Windows or something...but *ahem*
that wouldn't come under the European "exemption" on "reverse engineering"
so you legally shouldn't be doing that and it would be illegal (plus,
Microsoft have an explicit clause against it in the EULA, anyway)...
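Whether a particular CPU offers these instructions can be checked with CPUID. A sketch, assuming GCC or Clang on x86 (their <cpuid.h> supplies __get_cpuid()): leaf 1 reports SYSENTER/SYSEXIT as the SEP flag in EDX bit 11, and leaf 0x80000001 reports SYSCALL/SYSRET in the same bit position. Intel's manual warns that the original Pentium Pro sets SEP without really supporting the instructions, hence the family/model/stepping check. The function names are made up for the example.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>                 /* __get_cpuid(): GCC/Clang, x86 only */
#endif

#define EDX_BIT11 (1u << 11)       /* SEP in leaf 1, SYSCALL in leaf 0x80000001 */

/* 1 if the CPU reports usable SYSENTER/SYSEXIT (32-bit protected mode),
   0 if not, -1 if this isn't an x86 build. */
static int cpu_has_sysenter(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(edx & EDX_BIT11))
        return 0;
    unsigned int family   = (eax >> 8) & 0xF;
    unsigned int model    = (eax >> 4) & 0xF;
    unsigned int stepping = eax & 0xF;
    if (family == 6 || family == 0xF)
        model |= ((eax >> 16) & 0xF) << 4;   /* add the extended model */
    /* Original Pentium Pro: SEP is set but the instructions aren't there. */
    if (family == 6 && model < 3 && stepping < 3)
        return 0;
    return 1;
#else
    return -1;
#endif
}

/* 1 if the extended feature leaf reports SYSCALL/SYSRET, 0 if not,
   -1 if this isn't an x86 build. (On Intel CPUs the bit refers to
   64-bit mode.) */
static int cpu_has_syscall(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx & EDX_BIT11) ? 1 : 0;
#else
    return -1;
#endif
}
```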
But, really, it's about the best you're going to find outside of
"KESYS"...remember that wolfgang is the proverbial "uber-coder"...so, sure,
his OS is "extreme!"...but, note, there is also the point that Linux does
have "security" and "protection" because it separates ring 3 and ring
0...in wolfgang's OS, everything runs at ring 0, if I understand
correctly...of course, wolfgang's point is that if you don't care about
"security" and "protections" then, yeah, his OS is "optimised" not to have
the "overhead" of that...but for those that _do_ have the "security /
protections" features, Linux does it about as "economically" as you're ever
likely to find (certainly from any "non-hobby" OSes :)...
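The ring separation is visible from any program: the low two bits of the CS register hold the current privilege level. A tiny sketch, assuming x86 and GCC/Clang inline assembly (current_ring is a made-up name), which should report 3 for an ordinary Linux (or Windows) process; under an all-ring-0 design like KESYS the same instruction would presumably see 0.

```c
/* Current privilege level from the low two bits of CS:
   0 = kernel ring, 3 = user ring; -1 on non-x86 builds. */
static int current_ring(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned short cs;
    __asm__ volatile ("mov %%cs, %0" : "=r"(cs));
    return cs & 3;
#else
    return -1;
#endif
}
```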
You would NOT be "wasting" any cycles in that particular case because,
unlike Windows, you don't always call "system calls" to do everything,
including tying your own shoe laces...
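That's also how the "ask for I/O permissions" route mentioned earlier saves cycles: you pay for one system call up front, then plain IN/OUT instructions run from ring 3 with no further kernel involvement. A minimal sketch, assuming x86 Linux with glibc's <sys/io.h> (ioperm(), inb() and outb() exist there, but only on x86). Ports 0x70/0x71 (the CMOS clock) are just an illustrative choice, and turning access on needs root (CAP_SYS_RAWIO), so the code reports a refusal instead of assuming success. The helper names are made up for the example.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#if defined(__i386__) || defined(__x86_64__)
#include <sys/io.h>   /* ioperm(), inb(), outb(): glibc on x86 only */
#endif

/* Ask the kernel to open (on != 0) or close (on == 0) `count` I/O ports
   starting at `base` for this thread. Returns 0 on success, or -1 with
   errno set (typically EPERM when not running as root). */
static int port_access(unsigned long base, unsigned long count, int on)
{
#if defined(__i386__) || defined(__x86_64__)
    return ioperm(base, count, on);
#else
    (void)base; (void)count; (void)on;
    errno = ENOSYS;   /* no x86-style port I/O on this architecture */
    return -1;
#endif
}

/* Illustration only: read the CMOS clock's "seconds" register (index 0)
   with plain OUT/IN instructions from user space. Returns the raw byte
   (BCD on most PCs) or -1 if the kernel refused the ports. */
static int read_rtc_seconds_raw(void)
{
#if defined(__i386__) || defined(__x86_64__)
    if (port_access(0x70, 2, 1) != 0) {
        fprintf(stderr, "ioperm: %s\n", strerror(errno));
        return -1;
    }
    outb(0x00, 0x70);          /* glibc order: value first, then port */
    int value = inb(0x71);     /* no system call involved here */
    port_access(0x70, 2, 0);   /* hand the ports back */
    return value;
#else
    return -1;
#endif
}
```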
If you know the "details" of the hardware then you can "deal direct"...note
that XFree86 is doing exactly this itself...via the "driver" route, there's
OpenGL (okay, not the best standard in many ways, but it's available, "open
standard" and uses "3D accelerations")...outside of X and OpenGL, the story
is a little confused...but this is "historical", really: Linus didn't have
anything but "text mode" originally...when X was added, it used "direct
access"...which is great for things running under X but not useful for
programs outside...so, there was "SVGAlib" at first which gave SVGA
access...now, though, "DirectFB" is making its way onto the scene
(possibly, if it "establishes" itself well, there are versions of X that
might even use "DirectFB" as the "driver": Properly "unifying" it all to
end the "confusion"...Linus really should have "left room" for it, so to
speak, but he only did the "text mode"...and, thus, the graphics was
originally a bit "chaotic" and undersupported but this is being
rectified..."historical reasons", so to speak :)...this is nice in that it
sits on top of the kernel's frame buffer device (which makes the frame
buffer a "/dev" file) and accelerations are part of it too,
apparently...still "fledgling", though, really...but this will basically
work towards a DirectX-like graphics drivers scheme...there are probably
other options too but I myself would need to look around for that...
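The "frame buffer as a /dev file" part can be tried directly. A minimal sketch, assuming a Linux kernel with frame buffer support and the <linux/fb.h> header: open /dev/fb0 and ask the driver for the current video mode with the FBIOGET_VSCREENINFO ioctl. The pixels themselves could then be mmap()ed from the same file descriptor (not shown); on a machine without a frame buffer device, or without permission to open it, the query just fails. The helper names are made up for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fb.h>     /* struct fb_var_screeninfo, FBIOGET_VSCREENINFO */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Fill *info with the current mode of a frame buffer device such as
   "/dev/fb0". Returns 0 on success, -1 (errno set) if it can't be
   opened or queried. */
static int query_framebuffer(const char *path, struct fb_var_screeninfo *info)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = ioctl(fd, FBIOGET_VSCREENINFO, info);
    int saved_errno = errno;      /* keep the ioctl's errno across close() */
    close(fd);
    errno = saved_errno;
    return rc == 0 ? 0 : -1;
}

static void print_framebuffer_mode(const char *path)
{
    struct fb_var_screeninfo v;
    if (query_framebuffer(path, &v) == 0)
        printf("%s: %ux%u, %u bits per pixel\n",
               path, v.xres, v.yres, v.bits_per_pixel);
    else
        perror(path);
}
```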
And let's just be clear on where the majority of that "overhead" is coming
from: The _CPU_...that is, the "cost" is the "user -> kernel"
transition...the CPU (when not using "SYSENTER") makes lots of "checks" and
"protections" stuff...this "overhead" _WILL_ be equally present on any
"user -> kernel transitions" Windows makes too (any OS that has the "kernel
ring 0 / user ring 3" protection rings design)...wolfgang's KESYS runs
everything at ring 0, which reduces the "protection checks" that the CPU
does...note that this has "security" and "stability" considerations,
though...as all applications run at ring 0 in KESYS, any application can
access "privileged" instructions and CPU tables...hence, no real "security"
and, should a program have a "bug", it _could_ potentially bring the entire
system down...wolfgang, though, simply has a "don't run any such programs"
policy...which can work with a small OS like his...not really an option,
though, for something like Linux: Caters for a different wider
audience...for example, business customers would highly prize "security"
and "stability"...
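To put a figure on the user -> kernel transition on a particular machine, here is a rough sketch, assuming Linux and POSIX clock_gettime(): it times a trivial system call (getppid, issued through syscall() so it really enters the kernel each time) against an ordinary function call. The numbers depend heavily on the CPU, kernel and entry method, so no particular result is implied, and this gives average nanoseconds per call rather than exact cycle counts. The helper names are made up for the example.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

static long plain_function(void) { return 42; }

/* Called through a volatile pointer so the compiler can't inline the
   call or delete the timing loop. */
static long (*volatile call_target)(void) = plain_function;

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e9 + ts.tv_nsec;
}

/* Average ns for a ring 3 -> ring 0 -> ring 3 round trip. */
static double ns_per_syscall(long iterations)
{
    double start = now_ns();
    for (long i = 0; i < iterations; i++)
        syscall(SYS_getppid);
    return (now_ns() - start) / iterations;
}

/* Average ns for an ordinary call that stays in ring 3. */
static double ns_per_call(long iterations)
{
    double start = now_ns();
    for (long i = 0; i < iterations; i++)
        call_target();
    return (now_ns() - start) / iterations;
}

static void print_comparison(long iterations)
{
    printf("system call: %.1f ns/call, plain call: %.1f ns/call\n",
           ns_per_syscall(iterations), ns_per_call(iterations));
}
```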
[ Personally, as I've thought over this myself, I can only think of one
better possibility (mind you, never tested it...just a "thought experiment"
at the moment :)...a "peer-to-peer nano-kernel", to invent a term for
it..."nano-kernel" implying "extreme micro-kernel"...that is, the reverse
of wolfgang: Run absolutely everything at ring 3...even the kernel...being
"same privilege level", it also escapes the extra overhead (changing rings
is where the transition is "nastiest" about "overhead")...there is one
very, very tiny part that runs "ring 0" and its sole job is to provide
"access" to other parts...so, everything runs ring 3 but it can make a
"request" to this small "security" module (the only thing that runs ring 0)
to "open up access" to port I/O, physical memory addresses and CPU tables
and so forth...these "requests" are "vetted" for "permissions" but then
once granted, even though these parts run in ring 3, their "I/O bitmap"
will be open in the right places for "direct access" and the memory will be
"mapped" for memory-mapped devices and access to "page tables" and so forth
(yes, even the memory manager runs "ring 3"...as I say, this is
"nano-kernel" and goes to the opposite extreme)...so, as we can see, we're
not giving up "security" and "stability" BUT applications can "request"
permission and then "direct access" whatever they like (so long as they
have "permission" to do so :)...yes, even parts of the OS itself abide by
this, like the memory manager and such...the "peer-to-peer" aspect is that
of NOT using the "layered architecture" (e.g. application -> OS -> drivers)
but to have each module as a "peer" and allowing direct communication from
any "peer" to any other via an OS call (e.g. application -> request to OS
for driver, OS checks "security" and it's okay -> returns direct "channel"
to driver, application -> driver thereafter, cutting out the OS
"middle-man" wedged in between)...again, the idea being to try to improve
it all without compromising the traditional stuff of "drivers" (these can
be a "convenience" for "portability" when that's needed..."direct access"
would NOT be prohibited, though, if you needed it), "security", "stability"
and such...basically, "all ring 0" is a security problem, "ring 3 -> ring
0" is costly...so, instead, everything is "ring 3" (save one "module" at
ring 0, called infrequently, simply to "dole out permissions" to other
modules to allow them to do things from ring 3 so that a "transition" to ring 0
isn't needed)...and, for "drivers" and such, go "peer to peer" rather than
"client / server", then drivers can serve applications as directly as they
do the OS itself...the OS is there to "regulate" but it does "one-time
only" checks and then, if all is well, links up the "modules" to talk
directly to each other..."standards" for the driver interfaces replace "OS
nanny dominance" instead, in the old BIOS style :)... ]
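The "peer-to-peer nano-kernel" is only a thought experiment, but its central trick, checking permissions once and then letting modules work directly, can be modelled in a few lines. In this hypothetical sketch (all names invented for illustration, nothing here is a real OS API), a single broker stands in for the small ring 0 module: it vets a module's request for a port range against a policy table and, if allowed, sets bits in that module's own I/O bitmap. From then on every access is a local bitmap lookup with no broker involved. Unlike the real x86 I/O permission bitmap, where a clear bit means "allowed", a set bit means "allowed" here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_PORTS 1024u   /* hypothetical: only ports 0..1023 are modelled */

typedef struct {
    const char *name;
    uint8_t io_bitmap[NUM_PORTS / 8];   /* bit set = module may use the port */
} Module;

typedef struct {
    const char *module_name;
    unsigned first, last;               /* inclusive port range */
} Policy;

/* The one "privileged" step: vet the request once against the policy
   and, if it's allowed, open the ports in the module's bitmap. */
static bool broker_request(Module *m, unsigned first, unsigned count,
                           const Policy *policy, size_t npolicy)
{
    if (count == 0 || first >= NUM_PORTS || count > NUM_PORTS - first)
        return false;
    unsigned last = first + count - 1;
    for (size_t i = 0; i < npolicy; i++) {
        if (strcmp(policy[i].module_name, m->name) == 0 &&
            first >= policy[i].first && last <= policy[i].last) {
            for (unsigned p = first; p <= last; p++)
                m->io_bitmap[p / 8] |= (uint8_t)(1u << (p % 8));
            return true;
        }
    }
    return false;
}

/* The fast path: no broker, no transition, just a local lookup. */
static bool module_may_access(const Module *m, unsigned port)
{
    return port < NUM_PORTS && ((m->io_bitmap[port / 8] >> (port % 8)) & 1u);
}
```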
> > | Ok. If you haven't tried it, you _should_ at least take a look at HL2.
>
> > Ok, next time on Google I'll try to "find" (I do hate search machines).
> > Sure, the future of games seems to be interactive movies, I don't like
> > it.
>
> I'll try to find the link for you.
>
> Here it is. The download link is in the right corner.
> < http://www.ati.com/halflife2/index.html >
>
> I think this is a Flash link maybe... :-( I am sorry. The world has gone
> amok.
> Try this to avoid the Flash page:
> < http://www2.ati.com/hl2/demo/steaminstall_hl2demo.exe >
>
> You should use your highspeed connection for this.
> You can forget downloading it via modem. It's 400+ megs or something. You
> also need hardware able to do pixel shading. I just bought one of the earlier
> pixel shader cards, it was "cheap". I can play it --ok-- in 640*480 mode,
> on a 1000 MHz Athlon, using an FX5200 T128 NVIDIA card, and 256 megs of
> memory. But this is really barely able to do it. :-( As long as I don't move
> too fast. The game is made so that the graphics settle and become smoother
> if you don't move around too fast. And for the single player demo, it is
> actually playable then. More than enough to get an impression of what they
> are capable of.
Actually, I tried running DOOM 3 on a machine "below spec"...good enough
video card for the "bump mapping" but crap low amount of RAM...it's the
mark of John Carmack's programming to note that it _does_ "degrade
gracefully"...in fact, it worked perfectly well, except for when you move
from room to room, there is a "pause"...basically, Mr.Carmack seems to have
it load "room by room" to keep the RAM usage clean (might explain why it's
all mostly "indoors" action too...in setting it that way, the "room by
room" scheme can keep the RAM usage down :)...hence, when you run with less
than the minimum RAM, what simply happens is not that it can't run the game
but that it simply can't "cache" up the next "room" (not enough RAM for
it)...making it "delay" noticeably when changing from "room to room"...BUT,
once the door shuts on this new room (flushes the old room out of RAM?), it
also "settles" nicely and inside that room: No delays at all...
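The "room by room" behaviour described above can be modelled with a tiny counter. This is a hypothetical sketch to illustrate the idea, not id Software's actual scheme: with enough RAM for the current room plus the next one, the next room can be streamed in while you're still in the current one, so doorways cost nothing; with room for only one, every doorway becomes a load (a "pause"). It only models doorways, since once a room is resident there are no further delays inside it.

```c
/* Hypothetical model: walk through rooms 0..n_rooms-1 in order and count
   how many doorways cause a visible load "pause". `capacity` is how many
   rooms fit in RAM at once. With 2 or more, the next room is prefetched
   while the player is still in the current one; with only 1, it can't be. */
static int count_stalls(int n_rooms, int capacity)
{
    int prefetched = -1;            /* room loaded ahead of time, if any */
    int stalls = 0;
    for (int room = 0; room < n_rooms; room++) {
        if (room != prefetched)
            stalls++;               /* not in RAM yet: load now, game pauses */
        /* Door shuts: older rooms are flushed; prefetch the next if it fits. */
        prefetched = (capacity >= 2 && room + 1 < n_rooms) ? room + 1 : -1;
    }
    return stalls;                  /* includes the initial load of room 0 */
}
```

So `count_stalls(5, 2)` gives just the one initial load, while `count_stalls(5, 1)` pauses at every one of the five rooms.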
Sounds like Valve are similarly minded in their coding from what you say
there...they try to "degrade gracefully" as best as possible to do the best
with what hardware is there, even when it's really "below
minimum"...although, obviously, the "minimum" of "can do pixel shading"
can't really be avoided - that's a _true_ "minimum specification" for
once - because the games use that effect and it's too "CPU intensive" to
"software emulate"..."pixel shading" is based on a new "programmable"
technology in the card so does represent, for once, an _actual_ "minimum"
barrier...
Half-Life 2, though, doesn't use too many "pixel shading" effects,
from what I can see...the people's faces (woah! How spooky is that "fascist
police state" beginning and the shocked looks on the other people's faces?
Very Orwellian and spooky! :) have a touch of "bump mapping" (makes it look
almost too realistic that their shocked looks at the start are really very,
very spooky indeed :)...but not much else seems to use it...in contrast to
DOOM 3, where _every single surface_ is "bump mapped"...though, *ahem*,
where did that "self-shadowing" disappear to, Mr.Carmack? Probably why it
was delayed...they started doing that and then realised it would just be
asking too much...though, there is a "mod", apparently, that can put the
"self-shadowing" back in...and there's even a "parallax mapping" mod
too...now, "parallax mapping" is where it's going to be at next...this also
uses the "pixel shading" even further to add the very subtle "parallax" of
rough surfaces...with "parallax mapping" (plus "bump mapping"), polygons
will finally completely lose their "flat" look altogether...
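For the "bump mapping" part, the per-pixel lighting boils down to one formula. Assuming a height map h(u, v), a strength factor s (an illustrative tuning constant) and a unit light direction l expressed in the surface's tangent space, the pixel shader tilts the normal according to the height map's slope and applies the usual diffuse (Lambert) term; the polygon itself stays flat, only n changes from pixel to pixel:

```latex
\mathbf{n}(u,v) = \frac{\bigl(-s\,\partial h/\partial u,\; -s\,\partial h/\partial v,\; 1\bigr)}
                       {\bigl\lVert \bigl(-s\,\partial h/\partial u,\; -s\,\partial h/\partial v,\; 1\bigr) \bigr\rVert},
\qquad
I(u,v) = \max\bigl(0,\; \mathbf{n}(u,v)\cdot\mathbf{l}\bigr)
```

On a flat patch both slopes are zero, so n = (0, 0, 1) and the pixel is lit exactly as the plain polygon would be. Games usually precompute n into a "normal map" texture rather than differentiating the height map every frame.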
The "parallax mapping" extends the "bump mapping" stuff by actually using
the "pixel shader" to subtly shift the pixels according to the "bump map"
to account for perspective...it's a subtle effect...basically, in DOOM 3,
if you move "sideways on" to a wall or something (so that you're up against
it and looking down its length), then the _lighting_ suggests it's
"sticking out" - bumpy - but it's still "flat"...with "parallax mapping" it
shifts the pixels around as another step on top of the "bump mapping" on a
"per pixel" basis and, thus, the bumps really are "sticking out", so to
speak...and they'll move properly in "perspective"...it gives the
appearance that some "wall" is rendered with tens of thousands of polygons
but, nope, just two polygons and a "pixel shader" routine that's working
overtime to both "light" and "distort" the pixels according to the "bump
map"...
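In its simplest form, that per-pixel shift is only a few arithmetic operations. A sketch in C of what such a pixel shader computes, assuming a tangent-space unit vector from the surface towards the eye and a height sampled from the height map in [0, 1]; `scale` and `bias` are illustrative tuning constants, and the "offset limiting" flag shows a common variant that drops the divide by the view vector's z. The types and function name are made up for the example.

```c
typedef struct { float u, v; } TexCoord;
typedef struct { float x, y, z; } Vec3;

/* Parallax mapping for one pixel: move the texture lookup along the view
   direction in proportion to the height under this pixel.
     uv      - original texture coordinate
     height  - height-map sample, 0..1
     to_eye  - unit vector from the surface to the eye, in tangent space
               (z along the surface normal, so z > 0 when facing the camera)
     scale, bias - illustrative tuning constants
   With offset_limiting set, the divide by to_eye.z is skipped so the shift
   can't blow up at grazing angles. */
static TexCoord parallax_shift(TexCoord uv, float height, Vec3 to_eye,
                               float scale, float bias, int offset_limiting)
{
    float h = height * scale + bias;
    float k = offset_limiting ? h : h / to_eye.z;
    TexCoord out = { uv.u + k * to_eye.x, uv.v + k * to_eye.y };
    return out;
}
```

Looking straight at the surface (to_eye = (0, 0, 1)) gives no shift at all, which is why the effect only shows up as you move "sideways on" to a wall.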
> I already had Steam installed when I ran the installer. I don't know if
> this is needed. But if it is, here is the link.
> http://www.steampowered.com/download/SteamInstall.exe (This is a direct
> link.)
Funny, really...that "Steam" thing is comparatively a bit "low tech"
compared to the game itself...but, well, the truth of that "Steam" is that
it's a sneaky way to deal with "piracy"...basically, when you install the
game, you've got to have a "steam" account and log in to the "server"...this
links up that copy to that specific "steam" account...I was wondering when
other software would also start with similar "activation" stuff to
Windows...
Beth :)