How to create a library that detects the processor and uses SIMD instructions as necessary



I'm wondering about this largely hypothetical question.

Suppose I were to create a standard C library that detects which SIMD
instructions are available and uses them (suppose that using something
like MMX or SSE could speed up certain functions). How would I create
one library that, when first called, detects the CPU and selects the
appropriate functions, without taking penalties? By penalties I mean
either requiring a branch in every affected function so that a CPU
detect function is run once and the most optimized routines are
selected, or modifying the libc behaviour, such as asking the user to
run a CPU detect function.

Currently, I figure that a static global boolean could be added to
the library. Every affected function would have a branch that calls a
CPU detect function, and once the CPU is detected, the boolean is set
so that the detect function won't be called again. But even if this
scheme works, the CPU detect function is still called once per process.
Also, say functions f and g are affected, i.e. they both have an if
statement that calls the detect function if it hasn't been called yet.
The detect function may have been called inside f, after which the CPU
can predict the branch in f properly, but the equivalent branch in g
may not be predicted, and g suffers. (Yes, I realize that saying this
means that I'm _really_ counting cycles, but then I'm crazy enough to
think of this harebrained scheme.)
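
As a minimal sketch of the flag scheme described above, it might look
something like this in C. All names here (detect_cpu, my_copy,
my_length, and so on) are hypothetical, the detection is a stub rather
than a real CPUID query, and the "SSE" routine is only a placeholder:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; none of this comes from a real libc. */
enum cpu_kind { CPU_GENERIC, CPU_SSE };

static int cpu_detected = 0;            /* the static global boolean */
static enum cpu_kind cpu = CPU_GENERIC; /* result of the detection */
static int detect_calls = 0;            /* counts detection runs, for illustration */

/* Stand-in for real detection (e.g. CPUID on x86).
   Assumption: it always reports a generic CPU here. */
static void detect_cpu(void)
{
    detect_calls++;
    cpu = CPU_GENERIC;
    cpu_detected = 1;
}

static void *copy_generic(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Placeholder for a SIMD version; a real one would use SSE instructions. */
static void *copy_sse(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* "f": every affected function carries the same flag test. */
void *my_copy(void *dst, const void *src, size_t n)
{
    if (!cpu_detected)
        detect_cpu();
    return cpu == CPU_SSE ? copy_sse(dst, src, n)
                          : copy_generic(dst, src, n);
}

/* "g": a second affected function with its own copy of the branch. */
size_t my_length(const char *s)
{
    if (!cpu_detected)
        detect_cpu();
    return strlen(s);
}
```

Whichever of my_copy or my_length is called first runs detect_cpu;
every later call still pays for the flag test, and each function has
its own copy of that branch for the CPU to predict.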

The reason for this esoteric question is that I was wondering,
assuming SIMD instructions could give certain libc functions a boost,
what the cleanest way is to implement a modified version without a
noticeable penalty or modifying how existing programs work.

Or is it better to create different versions of libc, such as generic,
MMX, SSE, AltiVec, etc., and then simply allow the user to load the
one matching the right processor?
