One way is to use the Function Multiversioning feature in GCC: write a test program and see which version of the function (dependent on your CPU arch) it picks.
The foo function from the program below will create multiple symbols in the binary, and the "best" version will be picked at runtime:
$ nm a.out | grep foo
0000000000402236 T _Z3foov
000000000040224c T _Z3foov.arch_x86_64
0000000000402257 T _Z3foov.arch_x86_64_v2
0000000000402262 T _Z3foov.arch_x86_64_v3
000000000040226d T _Z3foov.arch_x86_64_v4
0000000000402290 W _Z3foov.resolver
0000000000402241 T _Z3foov.sse4.2
0000000000402290 i _Z7_Z3foovv
// multiversioning.c
#include <stdio.h>
__attribute__ ((target ("default")))
const char* foo () { return "default"; }
__attribute__ ((target ("sse4.2")))
const char* foo () { return "sse4.2"; }
__attribute__ ((target ("arch=x86-64")))
const char* foo () { return "x86-64-v1"; }
__attribute__ ((target ("arch=x86-64-v2")))
const char* foo () { return "x86-64-v2"; }
__attribute__ ((target ("arch=x86-64-v3")))
const char* foo () { return "x86-64-v3"; }
__attribute__ ((target ("arch=x86-64-v4")))
const char* foo () { return "x86-64-v4"; }
int main ()
{
    printf("%s\n", foo());
    return 0;
}
On my laptop, compiling and running this prints:
$ g++ multiversioning.c
$ ./a.out
x86-64-v3
Note that the use of g++ is intentional here. If I used gcc to compile, it would fail with error: redefinition of ‘foo’, because GCC supports this target-attribute form of multiversioning only for C++.
[...] /sse3/.) The de-facto standard is that runtime CPU dispatching only needs to check the highest SSE feature flag it depends on. – Peter Cordes Jan 27 '21 at 19:02
[...] popcnt, but that's good to check explicitly. And other non-SIMD extensions like BMI1 are fully independent of SIMD (although since some BMI1/2 instructions use VEX encoding, they're normally only found on CPUs that support AVX. And unfortunately Intel even disables BMI1/2 on their Pentium/Celeron CPUs, perhaps as a way of fully disabling AVX.). – Peter Cordes Jan 27 '21 at 19:08
[...] -march=skylake-avx512. – Peter Cordes Jan 27 '21 at 19:12
[...] /lm/ will match anything containing those characters). I followed the exhaustive level definitions as used in the first answer (that’s where /ssse3/ without /sse3/ came from), even though as you say many of them are redundant. (I’ve been following the discussions leading up to the definition of these levels.) – Stephen Kitt Jan 27 '21 at 19:29
lm is long mode; checking for level 1 is basically just a sanity check of CPUID flags if you're already running a 64-bit kernel because those are all baseline for x86-64. (Also, my comments aren't fully directed at your answer, some of it I just wanted to put somewhere on this page for future readers. Also: Are older SIMD-versions available when using newer ones? / Do the MMX registers always exist in modern processors? / Does a processor that supports SSE4 support SSSE3 instructions?) – Peter Cordes Jan 27 '21 at 19:37