I got an AMD Ryzen Threadripper 2970WX 24-core processor (on an X399-AORUS-XTREME-rev-10 mainboard) for my workstation to compile a C++ project, but despite all its threads it performs worse than my 5-year-old i7.

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       43 bits physical, 48 bits virtual
CPU(s):              48
On-line CPU(s) list: 0-47
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           1
NUMA node(s):        4
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               8
Model name:          AMD Ryzen Threadripper 2970WX 24-Core Processor
Stepping:            2
CPU MHz:             548.955 <-??
CPU max MHz:         4200.0000
CPU min MHz:         2200.0000
BogoMIPS:            8384.14
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            8192K
NUMA node0 CPU(s):   0-11
NUMA node1 CPU(s):   24-35
NUMA node2 CPU(s):   12-23
NUMA node3 CPU(s):   36-47

Now, even when running a compilation, which simply grabs 100% of all available cores, no core gets above 550 MHz.

$ watch -n0.2 'cat /proc/cpuinfo | grep MHz'
cpu MHz     : 548.904
cpu MHz     : 548.598
.... many more like that ...
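
With 48 lines scrolling past, a one-line summary can be easier to watch. The following is only a minimal sketch: the function name `summarize_mhz` is made up for illustration, and it assumes the kernel reports `cpu MHz` lines in `/proc/cpuinfo` as shown above.

```shell
#!/bin/sh
# Summarize the per-core clock speeds found in /proc/cpuinfo-style input.
# The optional file argument stands in for /proc/cpuinfo (handy for testing).
summarize_mhz() {
    awk -F: '/^cpu MHz/ {
        v = $2 + 0
        if (n == 0 || v < min) min = v
        if (n == 0 || v > max) max = v
        sum += v
        n++
    }
    END {
        if (n > 0)
            printf "cores=%d min=%.0f max=%.0f avg=%.0f\n", n, min, max, sum / n
    }' "${1:-/proc/cpuinfo}"
}

# Print one summary line for the running system, if /proc/cpuinfo is readable.
if [ -r /proc/cpuinfo ]; then
    summarize_mhz
fi
```

Saved under a hypothetical name such as `mhz.sh`, it can be run under `watch -n0.2 sh mhz.sh` in the same way as the command above.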

The OS (Ubuntu 19.04) itself feels sluggish, like something from the past.

Additional info: OS storage is on two M.2 SSDs under ZFS. 32GB of DDR4 RAM at 3200 MT/s. The mainboard is on the latest BIOS, F5j.

Any idea what's wrong?

==================================================================

Tried: for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "performance" > "$file"; done

-> Problem persists
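
To check whether the governor change actually took effect, and whether the frequency limits themselves are capped, the standard cpufreq sysfs files can be read back per CPU. This is a rough sketch rather than a definitive diagnostic; the function name `show_cpufreq` is hypothetical, and it assumes the usual `scaling_governor`, `scaling_cur_freq`, `scaling_min_freq` and `scaling_max_freq` files (values in kHz) are exposed.

```shell
#!/bin/sh
# Print governor plus current/min/max scaling frequency (kHz) for every CPU.
# The optional argument stands in for /sys/devices/system/cpu (for testing).
show_cpufreq() {
    root="${1:-/sys/devices/system/cpu}"
    found=0
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -d "$dir" ] || continue
        found=1
        cpu=${dir%/cpufreq}
        printf '%s governor=%s cur=%s min=%s max=%s\n' \
            "${cpu##*/}" \
            "$(cat "$dir/scaling_governor" 2>/dev/null)" \
            "$(cat "$dir/scaling_cur_freq" 2>/dev/null)" \
            "$(cat "$dir/scaling_min_freq" 2>/dev/null)" \
            "$(cat "$dir/scaling_max_freq" 2>/dev/null)"
    done
    if [ "$found" -eq 0 ]; then
        echo "no cpufreq directories under $root"
    fi
}

show_cpufreq "$@"
```

Reading these files needs no special privileges. Writing the governor does need root, and with `sudo echo ... > file` the redirect still runs as the calling user, so piping into `sudo tee "$file"` is the usual workaround.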

==================================================================

Intermezzo:

After several restarts and random BIOS changes, I ended up in a state where /sys/devices/system/cpu/cpu*/cpufreq no longer exists. Researching this further, I found the following link, where a similar problem was solved by updating the BIOS to the latest version.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/351159

Checking the BIOS version, it had most probably been downgraded back to F4. So I did another update to the latest F5h and the options reappeared.

-> The original problem still persists

==================================================================

blech
  • Your BIOS and settings are up to date? Your kernel version might be interesting too. – Freddy Oct 17 '19 at 04:58
  • Thanks I am on "5.0.0-32-generic GNU/Linux" + did BIOS upgrade ~1 month ago to the latest. – blech Oct 17 '19 at 05:01
  • Do you have cpufrequtils or cpufreqd installed? the former can be used to set the governor manually (or on boot). and the latter monitors the system and selects "the most appropriate" governor. I don't bother with cpufreqd - I just use cpufrequtils to set the governor for all cores to either on-demand or performance, depending on what I'm doing. BTW, on my threadripper 1950x i'm getting ~ 3700 Mhz on all cores/threads with "performance". – cas Oct 17 '19 at 08:49
  • BTW, what do you mean by "RAID over ZFS"? have you got the 2 SSDs configured in HW or SW raid and then use ZFS on that? If so, that's a bad idea. Give ZFS both drives (or partitions on the drives if you need a non-ZFS /EFI or /boot or swap partition) and let ZFS handle the mirroring. – cas Oct 17 '19 at 08:52
  • Thanks, yes - I have both cpufrequtils and cpufreqd installed and set to govern on boot (added through systemctl) to go with the "performance" setting. I can also see in journalctl that it is triggered. But that doesn't have any impact on performance - still running at 500-ish MHz.

    On top of that I've tried setting it manually - but still no change. Now I am at the point when I went with setting BIOS to its defaults and /sys/devices/system/cpu/cpu*/cpufreq folder completely disappeared. I've contacted AMD through the official support. However still pretty lost.

    – blech Oct 17 '19 at 12:12
  • I've removed that note on RAID - as I was wrong. I am not ZFS expert and that was my suggestion to go like that originally, but I was corrected by our expert that it is all set like you said - ZFS has both drives and handles mirroring on its own. (It is damn fast) – blech Oct 17 '19 at 12:18
  • maybe try uninstalling or just stopping cpufreqd, or tell it to use your performance profile - it may be resetting the governor to powersave or something. cpufreqd's purpose is to automatically set the governor and min/max freqs, over-riding any manual changes. – cas Oct 18 '19 at 02:41
  • re: zfs - yeah, my 1950x has a pair of NVME drives partitioned identically. most of the space is in partition 5, for a mirrored rootfs zpool. The other partitions are for swap and mdadm raid-1 /boot and /EFI, plus a 2GB and a 32GB partition reserved to use for ZIL and L2ARC if/when I add some spinning rust drives to the system. It's fast. and ZOL 0.8 just added TRIM support, which is nice. – cas Oct 18 '19 at 02:47
  • Something doesn't feel right, maybe cooling isn't working? If you install windows does it work at higher GHz? (you can try by installing windows free from an ISO). Also maybe try 19.10, or maybe another distro, also check dmesg (ref: https://unix.stackexchange.com/q/260328/8337), GL! – rogerdpack Oct 19 '19 at 01:50
  • Thanks guys for your notes, I need to go on a business trip and I'll be back in a week's time. "Something doesn't feel right" is precisely what I am thinking. Current status is that after another random restart everything is suddenly working. That randomness is so suspicious that I am thinking about some "cooling" or "hw" problem in general. AMD came back and accepted an official ticket for this; however, I am thinking (as per my research) that I should do the same with Aorus and run those in parallel. (I'm back in a week) – blech Oct 20 '19 at 11:10
  • Thanks for your patience. I think I've been able to identify the problem as most likely PMAC. When setting up the Threadripper I went through several articles, one of which stated that it had been tested stable at 4.2GHz, and with a BIG Noctua fan (plus several more in the case) I hadn't even considered potential cooling problems. Then, when attempting to run at 4.2GHz, the generated heat didn't dissipate fast enough (spring in Queensland, Australia), so on a subsequent run at 3GHz the CPU didn't have time to cool down and fell back to a safe 500MHz. Now running at 3.5GHz and all seems to be stable. – blech Nov 11 '19 at 22:36
  • I am not 100% sure if this is the real cause, but the CPU now starts at 66% on 3.5GHz and all works well even when running for a long time at 100% load (Seti@home over the weekend). Thanks all for your time & guidance. – blech Nov 11 '19 at 22:38
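
The cooling theory from the last comments could be checked by watching temperature and clock speed side by side during a build. The sketch below is one possible way to do that, not something taken from the thread: the function name `max_temp_c` is hypothetical, and it assumes the sensor driver exposes `temp*_input` files (millidegrees Celsius) under `/sys/class/hwmon`. Sensor naming and any reporting offset vary by driver, so treat the number as relative rather than absolute.

```shell
#!/bin/sh
# Print the hottest hwmon temperature in whole degrees Celsius, or "n/a".
# The optional argument stands in for /sys/class/hwmon (for testing).
max_temp_c() {
    root="${1:-/sys/class/hwmon}"
    max=""
    for f in "$root"/hwmon*/temp*_input; do
        [ -r "$f" ] || continue
        t=$(cat "$f" 2>/dev/null) || continue
        # Ignore empty or non-numeric readings.
        case "$t" in
            ''|*[!0-9]*) continue ;;
        esac
        if [ -z "$max" ] || [ "$t" -gt "$max" ]; then
            max=$t
        fi
    done
    if [ -n "$max" ]; then
        echo $((max / 1000))
    else
        echo "n/a"
    fi
}

# Average clock speed across all cores, taken from /proc/cpuinfo.
avg_mhz=$(awk -F: '/^cpu MHz/ { s += $2; n++ }
    END { if (n) printf "%.0f", s / n }' /proc/cpuinfo 2>/dev/null) || true

# One sample per invocation: hottest sensor and average frequency.
echo "temp_c=$(max_temp_c) avg_mhz=${avg_mhz:-n/a}"
```

Running it once per second (for example under `watch -n1`) while compiling would show whether the drop to roughly 500MHz coincides with the temperature climbing.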

0 Answers