AMD's Ryzen 7 has been generally well-received by the enthusiast community, but there's been one low-level problem that we've been watching but haven't previously reported on. In early June, Ryzen users running Linux began reporting segmentation faults when running multiple concurrent compilation workloads using multiple different versions of GCC. LLVM/Clang was not affected, and the issue appears confined to Linux. Moreover, it apparently wasn't common, even among Linux users: Michael Larabel, of Phoronix.com, reported that his own test rigs had been absolutely solid, even under heavy workloads.

Like the Pentium FDIV bug of yesteryear, this was a real issue, but one that realistically only impacted a fraction of a fraction of buyers. AMD had previously said it was investigating the problem (which isn't present on any Epyc or Threadripper CPUs) and it has now announced a solution: CPU replacement.

Phoronix reports AMD provided them with a new Ryzen 7 1800X CPU and that this chip has refused to crash, even when running a "kill Ryzen" script that would previously deliberately create a compiler segmentation fault. While some users thought the problem was a RAM, motherboard, or BIOS-related issue, Phoronix's testing proves otherwise. Swap the new Ryzen 7 1800X for an older part, and the problem reappears. Switch back to the new chip, and it vanishes. Larabel has tentatively concluded that the issue appears confined to Ryzen CPUs manufactured before Week 25 of this year (the new chip was built in Week 30), but no other details on what caused it are available.

The good news is, AMD is replacing the CPUs of anyone who has this issue. Again, while the issue is real, it appears to trigger only in an extremely small number of cases when running a Linux workload under specific and particular circumstances.

CPU Errata Are the Rule, Not the Exception

We tend to think of CPU errata as being show-stopping phenomena that occur only occasionally, but the opposite is true. The summary table of errata within Intel's sixth-generation Core family is eight pages long. Most of these bugs are minor issues or relate to corner cases, but larger issues can break through. Intel's original Atom architecture had a major FPU bug in which trying to perform two back-to-back x87 operations would double the execution time. CPU analyst Agner Fog writes (Page 162 / 233):

Whenever there are two consecutive x87 instructions, the two instructions fail to pair and instead cause an extra delay of one clock cycle due to problems in the decoders. This gives a throughput of only one instruction every two clock cycles, while a similar code using XMM registers would have a maximum throughput of two instructions per clock cycle.

This applies to all x87 instructions (names beginning with F), even the FNOP. For example, a sequence of 100 consecutive FNOP instructions takes 200 clock cycles to execute in my tests. If the 100 FNOPs are interspersed by 100 NOPs then the sequence takes only 100 clock cycles. It is therefore important to avoid consecutive x87 instructions.
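
To make the arithmetic in Fog's description concrete, here is a rough sketch of a toy cost model in Python. It is not a simulation of Atom's pipeline. The function name `estimate_cycles` and the pairing rules are our own simplifying assumptions, chosen only so that the two figures quoted above (200 cycles and 100 cycles) fall out of the model.

```python
def estimate_cycles(instructions):
    """Toy cycle estimate for a two-wide, in-order decoder.

    Assumptions (ours, not Intel's or Agner Fog's):
      * any two adjacent instructions can normally pair and issue in 1 cycle;
      * an x87 instruction with an x87 neighbour cannot pair and pays one
        extra cycle of delay, so it costs 2 cycles on its own.
    """
    def is_x87(name):
        # Per the quoted text, x87 mnemonics begin with "F".
        return name.upper().startswith("F")

    cycles = 0
    i = 0
    n = len(instructions)
    while i < n:
        prev_x87 = i > 0 and is_x87(instructions[i - 1])
        next_x87 = i + 1 < n and is_x87(instructions[i + 1])
        if is_x87(instructions[i]) and (prev_x87 or next_x87):
            cycles += 2   # issue cycle plus one cycle of decoder delay
            i += 1
        elif i + 1 < n:
            cycles += 1   # this instruction pairs with the next one
            i += 2
        else:
            cycles += 1   # trailing instruction issues alone
            i += 1
    return cycles


back_to_back = ["FNOP"] * 100          # 100 consecutive x87 instructions
interleaved = ["FNOP", "NOP"] * 100    # 100 FNOPs interspersed with 100 NOPs

print(estimate_cycles(back_to_back))
print(estimate_cycles(interleaved))
```

Under these assumptions, the back-to-back sequence comes out at twice the cost of the interleaved one, even though the interleaved sequence contains twice as many instructions.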

As CPU designs have become more complex and node sizes have shrunk, the chance of bugs and errata has risen significantly.

The Skylake Hyper-Threading bug that froze systems when executing certain workloads is included in the sixth-generation list described above. AMD, of course, has had other problems of its own, including Piledriver's poor handling of 256-bit AVX instructions (the penalty for using these was severe), and the infamous TLB issues that limited the scaling and performance of the original Phenom / Barcelona processors.

Unless you're absolutely certain that you're having a problem related to this bug, you probably aren't. But we're glad to see AMD offering replacement CPUs for those affected by the issue. CPU errata may be nothing new, but how companies respond to them still impacts how the issue is perceived by the IT community.