Re: Windows Assembly



On Wed, 14 Sep 2005 04:59:35 -0400, f0dder <f0dder_nospam@xxxxxxxxxxxxxxxx> wrote:

If you put too much in user space, you risk ending up calling kernel functions all the time, causing an insane amount of
context switches - bad.

I don't see how it could get worse than twice as many. Instead of just calling the kernel, and making one context switch, you call the kernel (for some IPC thing) and it sends it to the other process, which is two context switches. Just double the size of your buffers and you're back to the same number of context switches.


IMHO video drivers belong in the kernel, so that the drivers can take
advantage of whatever acceleration the hardware offers, and offer a nice
standard API and reduce programmers' headaches and wheel-reinventing.

Personally, if I wrote an OS I'd do everything in userspace, but so long as Linux is the way it is, then the video should be in the kernel just like everything else is.


Sockets is probably too high overhead for something as speed-sensitive as
graphics;

Sockets are actually not that bad. (Or at least anonymous pipes, which I've been using.) All the kernel has to do is copy the memory from one process's address space to the other's, which isn't really any different than what has to be done when you send data to the kernel.
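
To make the pipe case concrete, here's a rough sketch (POSIX, not code from any real driver) of a child process pushing a block of bytes to its parent through an anonymous pipe. The function name pipe_transfer is made up for the example, and it ignores details like EINTR.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: send n bytes from a child process to the parent
 * through an anonymous pipe. Returns the number of bytes the parent
 * received, or -1 if the pipe or the fork could not be created. */
ssize_t pipe_transfer(const void *src, void *dst, size_t n)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                 /* child: the writer */
        close(fds[0]);
        const char *p = src;
        size_t left = n;
        while (left > 0) {
            ssize_t w = write(fds[1], p, left);
            if (w <= 0)
                _exit(1);
            p += w;
            left -= (size_t)w;
        }
        _exit(0);
    }

    close(fds[1]);                  /* parent: the reader */
    char *q = dst;
    size_t got = 0;
    while (got < n) {
        ssize_t r = read(fds[0], q + got, n - got);
        if (r <= 0)                 /* EOF or error: stop early */
            break;
        got += (size_t)r;
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)got;
}
```

Each write() and read() is one trip into the kernel, so the bigger the chunks, the fewer trips per byte moved.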


Then to get your virtual consoles, you start up about six VT-100
emulators which connect to this video driver, and the video driver
lets you switch between them with Alt-Fn, and there's your virtual
consoles.

Well, the VT-100 emulators would use the console subsystem which in turn would communicate with the video driver - but yes.

I just don't see why the VT-100 emulator needs to be in the kernel. In particular, if it were a user space application, then it could be the same thing that runs under X, and we wouldn't have this problem where programs that run great in a console don't work correctly in xterm.


Of course, half of those problems would disappear if the kernel didn't send delete when you press backspace.

implemented via hooks and whatnot to make it generic enough for
not just X to use it.

In the kernel source there's a comment that says something like "I'm not sure what this is supposed to do, but X seems to want it to do this, so that's what I made it do." The kernel shouldn't be designed for X.


Somebody needs to give the Linux kernel team a good beating and drag them
out of the '70s. Or perhaps we should just sit back and wait until Mac OS X is fully ported and publicly available for x86 hardware. IMHO, it's the only real alternative to Windows.

I read a web page once that explained why everything sucks. Basically, things that suck take less time to develop than things that are good, so the things that suck are the first ones to appear. So when they appear, everyone starts using them, and they become the standard, and the good things that come later don't have much of a chance.


True. But unless I need blending effects (or other things that require
read-back from video memory), I think I'd prefer to draw directly to the
video card backbuffer instead of using an intermediary system memory buffer.

I always use a buffer. The reason is that it's easier to write a program that always uses one memory configuration.


For example, if you're writing a 256 color application, just make a linear buffer that's one byte per pixel and make everything draw into that, then when you're done, have a function that converts that to whatever the video card actually uses, whether it be 8/15/16/24/32 bit color, or even something as messed up as VGA's four plane mode that you have to use with the 320x240x256 mode. I think that's easier than writing one line drawing algorithm or texture mapping function that works on 8 bit modes, another that works on 16 bit modes, another that works on 24 bit modes, and another that works on that four plane memory configuration.
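
As an illustration of that last conversion step, here's a minimal sketch of splitting a linear one-byte-per-pixel buffer into four planes. It assumes the usual unchained VGA layout, where pixel x of a row lands in plane (x & 3) at byte offset (x >> 2) within that plane's row. The function name is made up, and the planes are modelled as four ordinary arrays; on real hardware you'd select each plane through the sequencer's map mask register and write through the same memory window.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split a linear 8-bit buffer into four planes.
 * Pixel x of each row goes to plane (x & 3), byte (x >> 2) of that
 * plane's row. width must be a multiple of 4, and each planes[p] must
 * hold (width / 4) * height bytes. */
void linear_to_planar4(const uint8_t *linear, uint8_t *planes[4],
                       size_t width, size_t height)
{
    size_t plane_stride = width / 4;   /* bytes per row in one plane */

    for (size_t y = 0; y < height; y++) {
        const uint8_t *row = linear + y * width;
        for (size_t x = 0; x < width; x++)
            planes[x & 3][y * plane_stride + (x >> 2)] = row[x];
    }
}
```

The drawing code never sees any of this; it only ever writes to the linear buffer.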

Also, you can just write your color data in one format that way. Between 256 color, 5/5/5 color, 5/6/5 color, 24-bit color, 32-bit color, all with either RGB order or BGR order, there are 9 different color formats to support. You can either write in just one format to a buffer and then have some function do the conversion at the end when writing to video ram, or you can make your line drawing or texture mapping functions have to deal with all of that.
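
Here's a sketch of what the "conversion at the end" could look like for the 15/16-bit cases, assuming the application draws 8-bit palette indices. The rgb8 and pixfmt16 types and both function names are invented for the example; a real program would fill in the format from whatever the video card reports.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb8;  /* palette entry, 8 bits per channel */

/* Hypothetical description of a packed 15/16-bit target format. */
typedef struct {
    int r_bits, g_bits, b_bits;  /* 5,5,5 or 5,6,5 */
    int bgr;                     /* 0: red in the high bits, 1: blue in the high bits */
} pixfmt16;

/* Pack one 8-bit-per-channel color into the target format. */
static uint16_t pack16(rgb8 c, pixfmt16 f)
{
    unsigned r = c.r >> (8 - f.r_bits);
    unsigned g = c.g >> (8 - f.g_bits);
    unsigned b = c.b >> (8 - f.b_bits);
    unsigned hi = f.bgr ? b : r;
    unsigned lo = f.bgr ? r : b;
    int lo_bits = f.bgr ? f.r_bits : f.b_bits;

    return (uint16_t)((hi << (f.g_bits + lo_bits)) | (g << lo_bits) | lo);
}

/* Convert n palette-indexed pixels to the target format. The palette is
 * packed once into a lookup table, so the inner loop is one table lookup
 * per pixel no matter which format the card wants. */
void convert_8_to_16(const uint8_t *src, uint16_t *dst, size_t n,
                     const rgb8 palette[256], pixfmt16 f)
{
    uint16_t lut[256];

    for (int i = 0; i < 256; i++)
        lut[i] = pack16(palette[i], f);
    for (size_t i = 0; i < n; i++)
        dst[i] = lut[src[i]];
}
```

All the format knowledge lives in pack16, so the line drawing and texture mapping code stays the same for every mode.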

Of course, you could just require a 24-bit RGB linear framebuffer and bomb out if the video card says it can only do 24-bit BGR linear or 24-bit RGB non-linear, but that's not very user friendly.

System buffers become a bit slow in high-res modes unless you have some more intelligent blitting code than simply copying everything over.

It's been my experience that a ram to ram copy is about 20 times faster than a ram to video ram copy. I first noticed this under Linux and thought it might be a side effect of having to mmap /dev/mem for memory access. I thought maybe the kernel was being stupid, and rather than map the memory directly to my address space, was taking the page faults to that address range and sending them to the code that pretends that the memory is a file as if they were writes to the /dev/mem file. So I wrote a test program under DOS and sure enough it was every bit as slow there. Thinking about it, I remembered a DOS screensaver I wrote that ran at 70 frames a second writing to video memory (only coincidentally the same as the refresh rate), but ran at 1250 frames a second if it simply wrote to RAM instead. That's a difference of a factor of about 18, which is the same as what I got from my tests under DOS and under Linux.


So the cost of using a buffer isn't that great. With video ram 18 times slower than normal ram, drawing into a buffer first costs 1 unit of time on top of the 18 for the copy to video ram, so the buffer accounts for 1/19 of the total, or about 5%. However, in using the buffer, not only do you get to easily support nonlinear modes, but you can probably make up the 5% with optimizations you wouldn't otherwise be able to make. Certainly a texture mapper that only has to deal with one video format and one color format is going to have a faster inner loop than one that has to deal with many.

And as long as you're updating the entire screen each frame (like in a 3D game) then you can do the ram to video ram copy faster if it's done as dwords in linear memory order as opposed to whatever your drawing routines happen to want to do. I'm not sure why it's that way; maybe writing a byte to video ram incurs the same I/O delay as writing a dword does. (That would make a lot of sense.) This alone will probably give you back your 5% plus an extra 50%.
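
For what it's worth, the full-frame copy described here is about as simple as code gets. This is only a sketch with a made-up function name; vram stands for wherever the framebuffer happens to be mapped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy a whole frame to the (slow) destination one
 * 32-bit word at a time, in ascending address order. Both pointers must
 * be 4-byte aligned. */
void blit_dwords(volatile uint32_t *vram, const uint32_t *buf, size_t n_dwords)
{
    for (size_t i = 0; i < n_dwords; i++)
        vram[i] = buf[i];
}
```

The point is that the drawing routines can scatter bytes all over the system ram buffer, while video ram only ever sees one sequential pass of full-width writes.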

It may even be beneficial to use the rep movsd instruction for the ram to video ram copy, since I know that the processor does some optimizations for those moves, like since it knows that it's doing a large block, it reads and writes 16 bytes at a time or something like that. I haven't tried this myself though, since Softer has to convert to funny VGA formats along the way and so it just reads and writes dwords at a time.

Now if you're not updating the entire screen each frame, then copying the rest of that buffer is a waste of time. Softer overcomes this by keeping another buffer of what's in video ram, and comparing each dword that it's about to write to video ram to what is in that buffer, and if it's the same, then it doesn't write that dword to video ram. That sounds like complete nonsense, and would be if writing to video ram wasn't such a slow operation, but it actually sped it up a lot. Before I made that change it got 155 frames a second; after I made that change it got 146 frames a second if the entire screen changed, but when only a small part of the screen had changed, it got 288 frames a second. I'm sure the effect would have been even greater if it didn't have to do the single plane to four plane conversion before deciding whether or not to write the dword.
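
The compare-before-write idea can be sketched like this. This is not Softer's actual code, just an illustration with made-up names, and it leaves out the plane conversion. It assumes the shadow buffer starts out matching what's really in video ram.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write only the dwords that differ from a shadow
 * copy of video ram kept in system ram. Returns how many dwords were
 * actually written to vram. */
size_t blit_changed_dwords(volatile uint32_t *vram, uint32_t *shadow,
                           const uint32_t *frame, size_t n_dwords)
{
    size_t written = 0;

    for (size_t i = 0; i < n_dwords; i++) {
        uint32_t v = frame[i];
        if (shadow[i] != v) {   /* cheap ram read instead of a slow vram write */
            vram[i] = v;
            shadow[i] = v;      /* keep the shadow in sync with vram */
            written++;
        }
    }
    return written;
}
```

When the whole screen changes, every compare fails and the extra ram read is pure overhead, which fits the small drop from 155 to 146 frames a second.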

There must really be some kernel mode support, since doing v86 by
hand is quite some work.

There is the vm86 system call. The problem is that the man page for it only mentions that it exists and that it takes a couple of structures as parameters; it doesn't detail how to use it. I looked at the structures; as usual, the individual fields in the structures are named with mystery TLAs and there's no indication anywhere in the header file what sort of things I might need to put into those fields.


In all likelihood the call was only added to the kernel because someone working on dosemu needed it, and so no one ever bothered to document it since the dosemu guys already knew how to use it since they designed it.