
I have a Dell 5820 running Ubuntu Server 18.04 with SGE installed. I have a queue set up with 20 slots, which I fill with 20 jobs running at the same time. CPU usage goes up to about 70%, and the machine has 128 GB of RAM, so memory is not a problem.

I am worried about the CPU overheating. Monitoring it with watch sensors, I see:

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +82.0°C  (high = +81.0°C, crit = +91.0°C)
Core 0:        +80.0°C  (high = +81.0°C, crit = +91.0°C)
Core 1:        +77.0°C  (high = +81.0°C, crit = +91.0°C)
Core 2:        +78.0°C  (high = +81.0°C, crit = +91.0°C)
Core 3:        +78.0°C  (high = +81.0°C, crit = +91.0°C)
Core 4:        +77.0°C  (high = +81.0°C, crit = +91.0°C)
Core 5:        +75.0°C  (high = +81.0°C, crit = +91.0°C)
Core 6:        +82.0°C  (high = +81.0°C, crit = +91.0°C)
Core 8:        +76.0°C  (high = +81.0°C, crit = +91.0°C)
Core 9:        +79.0°C  (high = +81.0°C, crit = +91.0°C)
Core 10:       +82.0°C  (high = +81.0°C, crit = +91.0°C)
Core 11:       +82.0°C  (high = +81.0°C, crit = +91.0°C)
Core 12:       +81.0°C  (high = +81.0°C, crit = +91.0°C)
Core 13:       +78.0°C  (high = +81.0°C, crit = +91.0°C)
Core 14:       +80.0°C  (high = +81.0°C, crit = +91.0°C)
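To check a snapshot like this without eyeballing every line, here is a minimal sketch that assumes the sensors output has been saved as plain text (for example with sensors > temps.txt). It lists the readings that are above their "high" threshold. The regular expression is an assumption based on the coretemp format shown above, not a documented lm-sensors output format.

```python
import re

# Matches coretemp lines such as:
# "Core 6:        +82.0°C  (high = +81.0°C, crit = +91.0°C)"
# The pattern is an assumption based on the output shown above.
LINE_RE = re.compile(
    r"^(?P<label>Package id \d+|Core \d+):\s+\+(?P<temp>[\d.]+)°C\s+"
    r"\(high = \+(?P<high>[\d.]+)°C, crit = \+(?P<crit>[\d.]+)°C\)"
)


def parse_coretemp(text):
    """Return a list of (label, temp, high, crit) tuples from sensors output."""
    readings = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            readings.append((
                m.group("label"),
                float(m.group("temp")),
                float(m.group("high")),
                float(m.group("crit")),
            ))
    return readings


def over_high(readings):
    """Labels whose current temperature is strictly above 'high'."""
    return [label for label, temp, high, _ in readings if temp > high]


if __name__ == "__main__":
    sample = """\
Package id 0:  +82.0°C  (high = +81.0°C, crit = +91.0°C)
Core 0:        +80.0°C  (high = +81.0°C, crit = +91.0°C)
Core 6:        +82.0°C  (high = +81.0°C, crit = +91.0°C)
Core 12:       +81.0°C  (high = +81.0°C, crit = +91.0°C)
"""
    print(over_high(parse_coretemp(sample)))  # ['Package id 0', 'Core 6']
```

Run against the full output above, this would flag the package plus cores 6, 10 and 11, all at 82 °C, one degree over "high".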

So the CPU runs very hot (up to 82 °C, just above the 81 °C "high" threshold).

What confuses me is why the fan RPM stays so low, even after waiting a few minutes:

dell_smm-virtual-0
Adapter: Virtual device
fan1:        1573 RPM
fan2:         722 RPM
fan3:         684 RPM

I know the fan works because it spins up to over 2000 RPM in the BIOS diagnostics. Moreover, when I run a simple stress test in Ubuntu using stress --cpu 8, the fans spin up after a few seconds:

dell_smm-virtual-0
Adapter: Virtual device
fan1:        3567 RPM
fan2:         716 RPM
fan3:        3384 RPM
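Comparing the two fan snapshots side by side makes the difference explicit. The following is a minimal sketch that parses the dell_smm fan lines and reports the RPM change per fan. The line format is assumed from the output above, and the values in the example are the ones posted in this question.

```python
import re

# Matches dell_smm lines such as "fan1:        1573 RPM"
# (format assumed from the sensors output above).
FAN_RE = re.compile(r"^(fan\d+):\s+(\d+) RPM$")


def parse_fans(text):
    """Return {fan_label: rpm} parsed from sensors output."""
    fans = {}
    for line in text.splitlines():
        m = FAN_RE.match(line.strip())
        if m:
            fans[m.group(1)] = int(m.group(2))
    return fans


def rpm_delta(before, after):
    """Per-fan RPM change between two snapshots (fans present in both)."""
    return {fan: after[fan] - before[fan] for fan in before if fan in after}


if __name__ == "__main__":
    sge_load = parse_fans("fan1: 1573 RPM\nfan2: 722 RPM\nfan3: 684 RPM")
    stress_load = parse_fans("fan1: 3567 RPM\nfan2: 716 RPM\nfan3: 3384 RPM")
    print(rpm_delta(sge_load, stress_load))
    # {'fan1': 1994, 'fan2': -6, 'fan3': 2700}
```

Under stress, fan1 and fan3 rise by roughly 2000 to 2700 RPM while fan2 stays flat. This suggests fan2 is not tied to CPU temperature in the same way.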

Is this normal? Can someone explain it? Why don't the fans ramp up under the SGE workload? Obviously I am worried about frying the CPU at such high temperatures.

1 Answer


If PROCHOT is 91 °C (which feels a bit low these days) and "high" is 81 °C, then the fans won't really ramp up until you're in that zone, and the CPU is happy to run there all day.

Your figures show it is succeeding at staying at, or just around, the "high" threshold, so I don't see any issue.
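The reasoning above amounts to checking each reading against the two thresholds that sensors reports. Here is a minimal sketch of that check. The zone names are illustrative labels chosen here, not anything the hardware or lm-sensors defines.

```python
def thermal_zone(temp_c, high_c, crit_c):
    """Classify a reading against the sensors 'high' and 'crit' thresholds.

    The zone names are illustrative; they are not reported by the
    hardware or by lm-sensors.
    """
    if temp_c >= crit_c:
        return "critical"    # at or past the 'crit' threshold
    if temp_c > high_c:
        return "above high"  # the band where, per this answer, fans ramp up
    return "normal"          # at or below 'high'


if __name__ == "__main__":
    # Thresholds and sample readings taken from the question's sensors output.
    for temp in (75.0, 81.0, 82.0, 91.0):
        print(temp, thermal_zone(temp, high_c=81.0, crit_c=91.0))
```

With the question's thresholds, the hottest readings (82 °C) land only just inside the "above high" band. They remain 9 °C short of "crit".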

Tetsujin