I've read somewhere that recompiling libc with the -march=native and -mtune=native flags will provide the maximum benefit for programs, where shared libraries are used instead of static libraries. Is this true, and might there be any additional benefit by recompiling other programs?

For x86 architectures, doing this optimisation for 32-bit code will have more effect than for 64-bit (at this time), as there is more diversity among 32-bit processors. You may have noticed that some distros come in different flavours (sub-architectures): 386, 486, 586, 686, x86_64. Choosing the highest of these that will work may be good enough. – ctrl-alt-delor Jul 13 '15 at 18:37
3 Answers
The -march=native and -mtune=native options let the compiler use every instruction set extension of the build machine's processor and schedule code for it. Any gain in performance depends on how much of the application code can actually be optimized using those additional features (YMMV). Optimized libraries and binaries should run faster than generic binaries, but how much is difficult to quantify without testing. So, the short answer is yes, there might be a performance gain from recompiling your applications with CPU optimizations; however, maintaining your own optimized builds and keeping up with security updates, etc. will likely be a nightmare.
More information about GCC 4.4.4 i386 and amd64 architecture options here.
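To see what "native" actually resolves to on a given machine, GCC can be asked to print its target settings. The following is a minimal sketch, assuming gcc is installed; the helper name target_option is made up for illustration, and the exact layout of the output can differ between GCC versions.

```shell
# Extract the value of one option (e.g. "-march=") from the output of
# `gcc -Q --help=target`, which prints lines such as "  -march=   skylake".
# target_option is a hypothetical helper name, not part of GCC.
target_option() {
    awk -v opt="$1" '$1 == opt { print $2; exit }'
}

# Only query the compiler if it is installed.
if command -v gcc >/dev/null 2>&1; then
    for opt in -march= -mtune=; do
        value=$(gcc -march=native -Q --help=target 2>/dev/null | target_option "$opt")
        printf '%s %s\n' "$opt" "$value"
    done
fi
```

Comparing this with the output for a generic target (for example, replacing -march=native with -march=x86-64) gives a rough idea of which features a generic distribution build leaves unused.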

Just a small note: -march implies -mtune. -mtune retains compatibility with other processors within the architecture but favors the set type; -march drops the compatibility. – WhimsicalWombat Jul 09 '15 at 03:04
There is no short and easy answer.
1.
There are a lot of parameters, like code cache and pipeline size, the difference between cache speed and main memory speed, code size with "-Os" vs "-O2" or "-O3", and code size using some generic "-march=X/-mtune=Y" settings vs "=native".
When more code fits into the cache, this performance gain might outweigh some other optimisations. Some optimisations increase the code size...
If each task's code is smaller, code from several tasks running in parallel fits in the cache at the same time, which may be desirable too...
It will take a lot of research to provide an exhaustive answer.
2.
Using different compiler flags and options may trigger different bugs and misbehaviour.
So recompiling a central part like libc, or even the whole distribution, will make your bug reports unusable for others: they simply will not be able to reproduce your problems easily. Your setup turns into a lonely island...
3.
The social aspect: If you don't optimize parts of your distribution, bug reports from your installations are reproducible by the maintainers, and sending bug reports will help the distribution evolve.
4.
And the gain in speed is probably not worth the weeks of recompiling (if you optimise more than just libc) and cutting yourself off from the mainstream.
...
If you have speed problems to solve, a faster system is probably the more efficient solution.
There are performance benefits, but they are small enough that you won't notice them unless you benchmark the builds against each other. And like yeti wrote, there are a lot more variables that affect the speed. In general, it's not worth building custom versions of single libraries if you are on a binary distribution, because the onus of keeping that one library up to date will fall on you and it's easy to forget to update it.
Some programs may benefit more than others, especially math-heavy programs like folding@home or similar, cryptocoin mining, encryption, and media encoding. It'll aid in media decoding too, but the most important stuff like MMX, AVX and similar will be compiled in regardless of your -march, so it's likely that you won't notice the difference watching movies. Real-time audio (like JACK), on the other hand, may benefit, since even the smallest delays affect the sound quality. These are also less critical to upgrade promptly if a vulnerability is detected, compared to base libraries like libc, since you can just not use them until you have upgraded them.
If you're interested, try a source-based distribution, where everything is compiled with flags of your choice. Code compiles very fast on contemporary processors, so it's not as painful as it once was. Gentoo is the most widely used of them.
Other than that, you can play around with a lot of parameters via the /sys filesystem that likely affect performance more than compiling with -march. For example, /sys/block/sd?/queue/ houses scheduler settings which may affect the overall performance a lot. I switched from CFQ to deadline and it improved the interactive performance noticeably on my particular workload. It should be said that CFQ has a whole bunch of settings that I could've tweaked to my liking as well.
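As a rough sketch of how to inspect the scheduler settings mentioned above: each queue/scheduler file lists the available schedulers and marks the active one with square brackets (for example "noop deadline [cfq]"). The helper name active_scheduler is made up for illustration, and which scheduler names appear depends on the kernel version.

```shell
# Print the active I/O scheduler given the contents of a queue/scheduler
# file on stdin. The kernel marks the active entry with square brackets,
# e.g. "noop deadline [cfq]". active_scheduler is a hypothetical helper.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Show the active scheduler for every block device that exposes one.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    printf '%s: %s\n' "$f" "$(active_scheduler < "$f")"
done
```

Switching is done by writing one of the listed names into the same file as root, for example `echo deadline > /sys/block/sda/queue/scheduler`, where sda is only an example device and the change does not survive a reboot.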
Another 'treasure trove' is /proc/sys/. For example, adjust /proc/sys/vm/swappiness to change how aggressively memory is freed by moving old stuff into swap. Red Hat has a nice primer for the parameters.
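A minimal sketch for looking at the swappiness setting before changing it. The helper valid_swappiness is made up for illustration, and the upper bound is an assumption that depends on the kernel: older kernels document a range of 0 to 100, more recent ones accept up to 200.

```shell
# Report the current swappiness value if the kernel exposes it.
swappiness_file=/proc/sys/vm/swappiness
if [ -r "$swappiness_file" ]; then
    printf 'vm.swappiness = %s\n' "$(cat "$swappiness_file")"
fi

# Check that a candidate value is a plain integer in the accepted range
# before writing it. valid_swappiness is a hypothetical helper; the upper
# bound of 200 is an assumption (older kernels stop at 100).
valid_swappiness() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 0 ] && [ "$1" -le 200 ]
}
```

A new value can then be written as root, for example `sysctl vm.swappiness=10` or `echo 10 > /proc/sys/vm/swappiness`; the value 10 is only an example, and the setting is lost at reboot unless it is also put into the sysctl configuration.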
EDIT: Added a couple of examples of programs more likely to benefit from -march.
