For applications where the performance is determined by array operations, which can leverage AVX-512 instructions, an AMD Zen 5 core has better performance per area and per power than any ARM-based core, with the possible exception of the Fujitsu custom cores.
The Apple cores themselves do not have great performance for array operations, but when considering the CPU cores together with the shared SME/AMX accelerator, the aggregate might have good performance per area and per power consumption, though that cannot be known with certainty because Apple does not provide information usable for comparison purposes.
The comparison is easy only with the cores designed by Arm Holdings. For array operations, the best performance among the Arm-designed cores is obtained by Cortex-X4, a.k.a. Neoverse V3. Cortex-A720 and Cortex-A725 have half the number of SIMD pipelines but more than half the area, while Cortex-X925 has only 50% more SIMD pipelines but double the area. Intel's Skymont, a.k.a. Darkmont, has the same area and the same number of SIMD pipelines as Cortex-X4, so like Cortex-X4 it is also more efficient than the much bigger Lion Cove core, which is faster on average for non-optimized programs but has the same maximum throughput for optimized programs.
When compared with Cortex-X4/Neoverse V3, a Zen 5 compact core has a throughput for array operations that can be up to double, while its area is less than double the area of an Arm Cortex-X4. A high-clock-frequency Zen 5 core has more than double the area of a Cortex-X4, but due to the high clock frequency it still has better performance per area, even if, unlike the Zen 5 compact cores, it no longer also has better performance per power consumption.
So the ISA advantage of AArch64, which results in a simpler and smaller CPU core frontend, is not enough to ensure better performance per area and per power consumption when the backend, i.e. the execution units, does not itself have good enough performance per area and per power consumption.
The area of the Arm Cortex-X4 and of the very similar Intel Skymont core is about 1.7 square mm in a "3 nm" TSMC process (both including 1 MB of L2 cache memory). The area of a Zen 5 compact core in a "4 nm" TSMC process (with 1 MB of L2) is about 3 square mm (in Strix Point). The area of a Zen 5 compact core with full SIMD pipelines must be greater, but not by much, perhaps by 10%, and if it were made in the same "3 nm" process as Cortex-X4 and Skymont, the area would shrink, perhaps by 20% to 25% (depending on the fraction of the area occupied by SRAM). In any case, there is little doubt that the area, in the same fabrication process, of a Zen 5 compact core with full 512-bit SIMD pipelines would be less than 3.4 square mm (i.e. double a Cortex-X4), leading to better performance per area and per power consumption than either Cortex-X4 or Skymont (this considers only the maximum throughput for optimized programs; for non-optimized programs the advantage could be even greater for Zen 5, which has a higher IPC on average).
Cores like Arm Cortex-X4/Neoverse V3 (also Intel Skymont/Darkmont) are optimal from the POV of performance per area and power consumption only for applications that are dominated by irregular integer and pointer operations, which cannot be accelerated using array operations (e.g. for the compilation of software projects). Until now, with the exception of the Fujitsu custom cores, which are inaccessible for most computer users, no Arm-based CPU core has been suitable for scientific/technical computing, because none has had enough performance per area and per power consumption, when performing array operations. For a given socket, both the total die area inside the package and the total power consumption are limited, so the performance per area and per power consumption of a CPU core determines the performance per socket that can be achieved.
The Lunar Lake Xe (i.e. the generation before the current one) is not rock solid on Linux: I can get it to crash the GPU consistently just by loading enough things that use GL.
Not like 100, like 5.
If I start Chrome and Signal and something else, it often crashes the GPU after a few minutes.
I've tried the latest kernel and firmware and Mesa and ....
I will say it's nice that the GPU crash only hangs the app I'm in rather than crashing the whole system like a lot of other GPUs do.
They do seem to have figured out how to properly isolate and recover their GPU.
But it also feels like that's because it crashed so much it was bothering their engineers, so they made the recovery robust.
Good enough? Maybe better today, but they have been god-awful compared to AMD and absolute garbage compared to something like the M1 iGPU. They are responsible for more than half of the pain inflicted on users in the Vista days.
Ironically, they have lost the driver advantage in Linux with their latest Arc stuff.
I trust they could have done a lot better, a lot earlier, if they had cared to invest in the iGPU. It feels deliberately neglected.
Where yuh bin livin all yore laafe, pilgrim? Under a Boulder in Colorado, mebbe? Dontcha know dat contracts can be gamed, and hev bin fer yeers, if not deecades? Dis here ain't Aahffel, ya know?
Come eat some chili widdus.
Id'll shore put some hair on yore chest, and grey cells in yore coconut.
# sorry, in a punny mood and too many spaghetti western movies
Oh, it gets even better. US taxpayers are giving them billions for "national security" reasons.
Nothing like giving piles of cash to a grossly incompetent company (the Pentium math bug, Puma cablemodem issues, their shitty 4G cellular radios, extensive issues with gigabit and 2.5G network interfaces, and now the whole 13th/14th gen processor self-destruction mess.)
Author here if anyone has Pentium questions :-)
My Mastodon thread about the bug was on HN a few weeks ago, so this might seem familiar, but now I've finished a detailed blog post. The previous HN post has a bunch of comments: https://news.ycombinator.com/item?id=42391079
In my view, this $475M was perhaps the best marketing spend for Intel. Because of the bug and recall, everyone, including people outside tech, knew about Intel. Coming from the 486, when people were expecting a 586 or 686 but suddenly got "Pentium" instead, the bug and recall built a reputation and goodwill that carried on later with the Pentium MMX.
Nah, Intel already did a big Pentium marketing blitz with the bunny people before this bug.
Bunny people were part of the MMX and PII marketing.
Great article and analysis as always, thanks! Somewhat crazy to remember that a (as you argue) minor CPU erratum made worldwide headlines. So many worse ones out there, from Intel (as you mention) but from others as well, that are completely forgotten.
For the Pentium, I'm curious about the FPU value stack (or whatever the correct term is) rework they did. It's been a long time, but didn't they do some kind of early "register renaming" thing that you had to manually manage with careful FXCHs?
Yes, internally fxch is a register rename—_and_ fxch can go in the V-pipe and takes only one cycle (Pentium has two pipes, U and V).
IIRC fadd and fmul were both 3/1 (three cycles latency, one cycle throughput), so you'd start an operation, use the free fxch to get something else to the top, and then do two other operations while you were waiting for the operation to finish. That way, you could get long strings of FPU operations at effectively 1 op/cycle if you planned things well.
IIRC, MSVC did a pretty good job of it, too. GCC didn't, really (and thus Pentium GCC was born).
FMUL could only be issued every other cycle, which made scheduling even more annoying. Doing something like a matrix-vector multiplication was a messy game of FADD/FMUL/FXCH hot potato since for every operation one of the arguments had to be the top of the stack, so the TOS was constantly being replaced.
Compilers got pretty good at optimizing straight-line math but were not as good at cases where variables needed to be kept on the stack during a loop, like a running sum. You had to get the order of exchanges just right to preserve stack order across loop iterations. The compilers at the time often had to spill to memory or use multiple FXCHs at the end of the loop.
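(Side note for anyone who never wrote for these: the underlying trick, keeping enough independent chains in flight to cover the FP latency, is still how you write fast FP loops today, just without the stack juggling. A rough C sketch of the idea follows; nothing in it is P5-specific, the three accumulators are only illustrative, and reassociating the sum can change the last bits of the result.)

    #include <stdio.h>
    #include <stddef.h>

    /* One chain: every add waits out the full FP add latency. */
    double dot_naive(const double *a, const double *b, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i] * b[i];
        return s;
    }

    /* Three independent chains hide the ~3-cycle latency, the same way
       fxch-juggled x87 code kept several operations in flight at once. */
    double dot_pipelined(const double *a, const double *b, size_t n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0;
        size_t i = 0;
        for (; i + 3 <= n; i += 3) {
            s0 += a[i]     * b[i];
            s1 += a[i + 1] * b[i + 1];
            s2 += a[i + 2] * b[i + 2];
        }
        for (; i < n; i++)   /* leftover elements */
            s0 += a[i] * b[i];
        return s0 + s1 + s2;
    }

    int main(void) {
        double a[7] = {1, 2, 3, 4, 5, 6, 7};
        double b[7] = {7, 6, 5, 4, 3, 2, 1};
        printf("%g %g\n", dot_naive(a, b, 7), dot_pipelined(a, b, 7));
        return 0;
    }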
> FMUL could only be issued every other cycle, which made scheduling even more annoying.
Huh, are you sure? Do you have any documentation that clarifies the rules for this? I was under the impression that something like `FMUL st, st(2) ; FXCH st(1) ; FMUL st, st(2)` would kick off two muls in two cycles, with no stall.
Agner Fog's manuals are clear on this. Only the last of FMUL's 3 cycles can overlap with another FMUL.
You can immediately overlap with a FADD.
AFAIK, the FPU was a stack calculator. So you pushed things on and ran calculations on the stack. https://en.wikibooks.org/wiki/X86_Assembly/Floating_Point
It's only a stack machine in front, really. Behind-the-scenes, it's probably just eight registers (the stack is a fixed size, it doesn't spill to memory or anything).
Definitely was 8 regs: https://intranetssn.github.io/www.ssn.net/twiki/pub/CseIntra... also where you'd see 'long double'
> The bug is presumably in the Pentium's voluminous microcode. The microcode is too complex for me to analyze, so don't expect a detailed blog post on this subject.
How hard is it to "dump" the microcode into a bitstream? Could it be done programmatically from high-resolution die photographs? Of course, I appreciate that's probably the easy part in comparison to reverse engineering what the bitstream means.
> By carefully examining the PLA under a microscope
Do you do this stuff at home? What kind of equipment do you have in your lab? How did you develop the skills to do all this?
Dumping the microcode into a bitstream can be done in an automated way if you have clear, high-resolution die photos. There are programs to generate ROM bitstreams from photos. Part of the problem is removing all the layers of metal to expose the transistors. My process isn't great, so the pictures aren't as clear as I'd like. But yes, the hard part is figuring out what the microcode bitstream means. Intel's patents explained a lot about the 8086 microcode structure, but Intel revealed much less about later processors.
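For anyone curious what the automated part looks like: once a photo of a ROM or PLA array is cleaned up and deskewed, it is conceptually just sampling a grid and thresholding. Here's a minimal sketch in C; the grid origin, pitch, window size, and threshold are made-up parameters you would tune per chip, and real tooling also has to cope with alignment drift, image stitching, and uneven lighting.

    #include <stdio.h>

    /* Toy ROM-bit extractor: given a deskewed grayscale image in memory,
       average a small window around each cell centre and threshold it.
       Whether "bright" means 1 or 0 depends on the chip and the layer. */
    void extract_bits(const unsigned char *img, int img_w, int img_h,
                      double x0, double y0,           /* centre of bit (0,0) */
                      double pitch_x, double pitch_y, /* cell spacing        */
                      int rows, int cols, int threshold) {
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int cx = (int)(x0 + c * pitch_x);
                int cy = (int)(y0 + r * pitch_y);
                long sum = 0;
                int count = 0;
                for (int dy = -2; dy <= 2; dy++) {      /* 5x5 sample window */
                    for (int dx = -2; dx <= 2; dx++) {
                        int x = cx + dx, y = cy + dy;
                        if (x >= 0 && x < img_w && y >= 0 && y < img_h) {
                            sum += img[y * img_w + x];
                            count++;
                        }
                    }
                }
                putchar(count > 0 && sum / count > threshold ? '1' : '0');
            }
            putchar('\n');
        }
    }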
I do this stuff at home. I have an AmScope metallurgical microscope; a metallurgical microscope shines light down through the lens, rather than shining the light from underneath like a biological microscope. Thus, the metallurgical microscope works for opaque chips. The Pentium is reaching the limits of my microscope, since the feature size is about the wavelength of light. I don't have any training in this; I learned through reading and experimentation.
One tidbit to add about scopes: some biological scopes do use "epi" illumination like metallurgical scopes. It's commonly used on high end scopes, in combination with laser illumination and fluorescence. They are much more complicated and require much better alignment than a regular trans illumination scope.
I suppose you might be able to get slightly better resolution using a shorter wavelength, but at that point it requires a lot of technical skill, the right environmental conditions, time, and money. Just getting to the point you've reached (and knowing what the limitations are) can be satisfying in itself.
I was about to ask if the explanation of floating point numbers was using Avogadro's number on purpose, but then I realized the other number was Planck's constant.
Yes, I wanted to use meaningful floating point examples instead of random numbers. You get a gold star for noticing :-)
Thank you very much for this detailed article.
I never realised this is how floating point division can be implemented. Actually funny how I didn't realise that multiple integer division steps are required to implement floating point division :-)
In hindsight one could wonder why the unused parts of the lookup table were not filled with 2 and -2 in the first place.
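If anyone wants to play with the idea, below is a toy radix-4 digit-recurrence divider in C. It cheats by computing each quotient digit exactly instead of reading it from a small lookup table indexed by truncated bits of the partial remainder and divisor (which is where the Pentium's mis-filled entries lived), so it cannot reproduce the bug, but it shows the "division by repeated small steps" structure the article describes.

    #include <math.h>
    #include <stdio.h>

    /* Toy radix-4 SRT-style division: pick a quotient digit in {-2..2},
       subtract digit*divisor, shift the remainder left by the radix, repeat.
       The real hardware picks the digit from a PLA fed by truncated
       remainder/divisor bits instead of doing an exact comparison.
       Assumes 1 <= x, d < 2 (i.e. already-normalized significands). */
    double div_radix4(double x, double d, int ndigits) {
        double w = x;            /* partial remainder */
        double q = 0.0;          /* quotient built up digit by digit */
        double weight = 1.0;     /* 4^-i */
        for (int i = 0; i < ndigits; i++) {
            int qi = (int)lround(w / d);   /* exact selection, stays in -2..2 */
            w = 4.0 * (w - qi * d);        /* remove the digit, shift left */
            q += qi * weight;
            weight /= 4.0;
        }
        return q;
    }

    int main(void) {
        double x = 1.3333, d = 1.7777;
        printf("%.15f vs %.15f\n", div_radix4(x, d, 30), x / d);
        return 0;
    }

With exact selection the digit always lands in [-2, 2]; guaranteeing the same thing from only a truncated view of the remainder is exactly what the lookup table is for.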
Tour de force, truly. Amazing work!
The bug is super fun, but I also find the Intel response to be fascinating on its own. They apparently didn't replace the processor of everyone who wanted a non-faulty version, resulting in a ton of bad press.
To contrast, I've been thinking a lot about the Amazon Colorsoft launch, which had a yellow-band graphics issue on some devices (mine included). Amazon waited a bit before acknowledging it (maybe a day or two, presumably to get the facts right). Then they simply, quietly replaced all of them. No recall. They just send you a new one if you ask for it (my replacement comes Friday; hopefully it will fix it). My takeaway is that it's pretty clear that having an incredibly robust return/support apparatus has a lot of benefits when launches don't go quite right. Certainly more than you'd expect from analysis alone.
Similarly, I haven't seen too many recent reports about the AirPods Pro crackle issue from a couple of years ago (my AirPods had to be replaced twice), but Apple also just quietly replaced them, and that support competence really seems like something powerful that isn't always noticed.
Colorsoft: https://www.tomsguide.com/tablets/e-readers/amazon-kindle-co...
AirPods Pro: https://support.apple.com/airpods-pro-service-program-sound-...
The Kindle and AirPod cases are not really comparable since those are relatively minor products for the respective companies.
On the Apple side, the iPhone 4 antennagate is a better comparison, since the equivalent fix there would have involved free replacements for a flagship and revenue-critical product, which Apple did not offer.
Intel on the other hand did eventually offer free replacements for anybody who asked and took a major financial hit.
Antennagate didn’t affect everyone though, only those 90s businessman nokia-in-fist style holders.
Anecdata ofc, but everyone I know already held phones in fingers back then, rather than hugging it as a brick.
Maybe, but by that argument 99% of the affected Pentium users could have happily used their computers until they became obsolete. The bug went completely unnoticed for over a year with millions of units in use.
The media coverage and the fact that "computer can't divide" is something that the public could wrap their heads around is what made the recall unavoidable.
Intel's own marketing hype around the Pentium played into it too. It would have been a smaller deal during the 486 era.
There were even (bad) jokes about it in the newspapers at the time.
https://www.latimes.com/archives/la-xpm-1994-12-14-ls-8729-s...
> Why didn't Intel call the Pentium the 586? Because they added 486 and 100 on the first Pentium and got 585.999983605.
And Apple sold the same GSM iPhone 4 without making any changes to it for 3 years and the uproar died down.
Before anyone well actually’s me, yes they did come out with a separate CDMA iPhone 4 for Verizon where they changed the antenna design
I had the first gen white MacBook with the magnetic closure that resulted in chipped, discoloured topcases. I had it replaced for free like three or four times over the lifespan of that computer, including past the three year AppleCare expiry.
I really respected Apple’s commitment to standing behind their product in that way.
I thought I remembered that at least some of those replacements were the result of class-action settlements and not Apple's goodwill.
I thought Intel's response was to invest a lot in correctness for a while, and then to decide that AMD wasn't being punished for its higher defect rate, and so, more recently, to invest in other things to compete with AMD on metrics other than how buggy the CPU is.
I read a claim that they had gutted their verification team several years ago in response to Zen, since they claimed they needed to develop faster and verification was slowing them down. Then, not that long ago, we started hearing about the Raptor Lake issues.
I work adjacent to CPU verification, and let me tell you, those verification guys do file a LOT of bugs (and thus create a lot of work debugging and fixing issues). Some days I do wish we could just get rid of them; I'm surprised that Intel really went ahead and did it.
This article? https://news.ycombinator.com/item?id=16058920
For the most part, this wasn't an individual problem. Corporations purchased these pretty expensive Pentium computers through a distributor, and just got them replaced by the vendor, per their support contract.
I've been in some consumer Apple "shadow warranty" situations, so I know what you are talking about, but IMO it's very different from the "IT crisis" that Intel was facing. "IBM said so" carried a ton of IT weight back then.
That is default Amazon - you can return stuff no hassle for almost any reason.
Only up to a point. If one abuses it, expect to get locked out. I buy enough stuff from Amazon that they don't mind me returning something once in a while.
> Intel's whitepaper claimed that a typical user would encounter a problem once every 27,000 years, insignificant compared to other sources of error such as DRAM bit flips.
> However, IBM performed their own analysis,29 suggesting that the problem could hit customers every few days.
I bet these aren’t as far off as they seem. Intel seems to be considering a single user, while I suspect IBM is thinking in terms of support calls.
This is a problem I've had at work. When you process 100 million requests a day, the one-in-a-billion problem is hitting you a few times a month. If it's something a customer or, worse, a manager notices, they ignore the denominator and suspect you all of incompetence. Four times a month can translate into "all the time" in the way humans bias their experiences. If you get two statistical clusters of three in a week, someone will lose their shit.
No, IBM's estimate is for a single user. IBM figures that a typical spreadsheet user does 5000 divides per second when recalculating and does 15 minutes of recalculating a day. IBM also figures that the numbers people use are 90 times as likely to cause an error as Intel's uniformly-distributed numbers. The result is one user will have an error every 24 days.
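Spelling that arithmetic out (with rounded inputs, so it lands near, not exactly on, the published 24-day and 27,000-year figures):

    #include <stdio.h>

    /* Back-of-the-envelope reproduction of the two estimates.
       All inputs are approximations of the publicly described assumptions. */
    int main(void) {
        double p_bad     = 1.0 / 9e9;    /* chance a random divide hits a bad table entry */
        double bias      = 90.0;         /* IBM: real-world operands ~90x more likely to hit it */
        double div_per_s = 5000.0;       /* IBM: divides per second while recalculating */
        double secs_day  = 15.0 * 60.0;  /* IBM: 15 minutes of recalculation per day */

        double ibm_errs_per_day = p_bad * bias * div_per_s * secs_day;
        printf("IBM-style estimate:   one error every %.0f days\n", 1.0 / ibm_errs_per_day);

        /* Intel's white paper assumed on the order of 1000 divides/day and no operand bias. */
        double intel_errs_per_day = p_bad * 1000.0;
        printf("Intel-style estimate: one error every %.0f years\n",
               1.0 / intel_errs_per_day / 365.0);
        return 0;
    }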
That's also a clearly flawed analysis, because the numbers mostly don't change between re-computations of the spreadsheet cell values!
E.g.: Adding a row doesn't invalidate calculations for previous rows in typical spreadsheet usage. The bug is deterministic, so repeating successful calculations over and over with the same numbers won't ever trigger the bug.
Yes, the book "Inside Intel" makes the same argument about spreadsheets (p364). My opinion is that Intel's analysis is mostly objective, while IBM's analysis is kind of a scam.
IBM's result is correct if we interpret "one user experiences the problem every few days" as "one in a million users will experience the problem 5000 times a second, for 15 minutes every day they use the spreadsheet with certain values". It's an average that makes no sense.
Spreadsheets Georg....
Ah.
The other failure mode that occurred to me is that if a spreadsheet is involved, you could keep running the same calculation on a bad input for months or even years when aggregating intermediate values over units of time. A problem that happens every time you run a calculation is very different from one that happens at random. Better in some ways and worse in others.
> It appears that only one person (Professor Nicely) noticed the bug in actual use.
I recall a study done years ago where students were supplied calculators for their math class. The calculators had been doctored to produce incorrect results. The researchers wanted to know how wrong the calculators had to be before the students noticed something was amiss.
It was a factor of 2.
Noticing the error, and being affected by the error, are two entirely different things.
I.e. how many people check to see if the computer's output is correct? I'd say very, very, very few. Not me, either, except in one case - when I was doing engineering computations at Boeing, I'd run the equations backwards to verify the outputs matched the inputs.
I used to tutor physics in college. My students would show a problem they worked and ask for feedback, and I’d tell them that they definitely went wrong somewhere since they calculated that the rollercoaster was 23,000 miles tall.
Which is to say, it will depend a lot on the context and the understanding of the person doing the calculation.
It is institute policy at Caltech (at least when I attended) that obviously wrong answers would get you zero credit, even if the result came from a minor error. However, if you concluded after solving the problem that the answer was absurd, but you didn't know where the calculation went wrong, you'd get partial credit.
> Noticing the error, and being affected by the error, are two entirely different things.
Only somewhat true. Take any consumer usage here for example. If you're playing a game and it hits this incorrect output but you don't notice anything as a result, were you actually affected?
How much usage of FDIV on a Pentium was for numerically significant output instead of just multimedia?
If your game has some artifacts in the display, nobody cares.
But if you're doing financial work, scientific work, or engineering work, the results matter. An awful lot of people used Excel.
BTW, telling a customer that a bug doesn't matter doesn't work out very well.
Unless your multiplayer game desyncs due to a different division result on another computer.
I remember that bug. Because I could not control what CPU my customers were running on, I had to add special code in the library to detect the bad FPU and execute workaround code (this code was supplied by Intel).
I.e. Intel's problem became my problem, grrrr
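For reference, the detection half of that was tiny; the widely circulated check used Nicely's operands and looked roughly like the sketch below. Intel's actual workaround library did more than this, but the detection idea is just "do a divide known to go wrong and see whether it did."

    #include <stdio.h>
    #include <math.h>

    /* Classic FDIV-bug check (sketch): x - (x/y)*y should be exactly 0.
       On a flawed Pentium the quotient is wrong around the fifth significant
       digit, so the expression comes out far from zero (about 256 for these
       operands). volatile keeps a modern compiler from constant-folding the
       whole test away at build time. */
    static int has_fdiv_bug(void) {
        volatile double x = 4195835.0;
        volatile double y = 3145727.0;
        double r = x - (x / y) * y;
        return fabs(r) > 1.0;
    }

    int main(void) {
        if (has_fdiv_bug())
            printf("Flawed FDIV detected: take the software workaround path\n");
        else
            printf("FPU divides correctly\n");
        return 0;
    }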
Reminds me of a joke floating around at the time that captures a couple different 90s themes:
pretty sure that was in my tagline generator...
Another great article from Ken. I remember this particularly because the first PC that I bought with my own money had an affected CPU. Prior to this era I hadn't been much interested in PCs because they couldn't run "real" software. But Windows NT changed that (thank you, Mr. Cutler), and Taiwanese-sourced low-cost motherboards made it practical to build your own machine, as many people still do today. Ken touched on the fact that it was easy for users to check if their CPU was affected. I remember that this was as easy as typing a division expression with the magic numbers into Excel. If MS had released a version of Excel that worked around the bug, I suspect fewer users would have claimed their replacement!
Couldn’t these PCs run 386BSD?
Yeah, there was BSD, Coherent, SCO, Xenix, etc. Arguably OS/2 was also a "real" operating system.
What an interesting and utterly dedicated analysis. Thank you so much for all your work analysing the silicon and sharing your findings. I particularly like how you're able to call out Intel on the actual root cause, which their PR made sound like something analogous to a trivial omission, but which was in fact less forgivable and more blameworthy, i.e. they stuffed up their table-generation algorithm.
>Since only one in 9 billion values caused the problem, Intel's view was that the problem was trivial: "This doesn't even qualify as an errata."
This sounds utterly insane. You are making a CPU; if any calculation is wrong, it needs to be fixed?? I suppose this only came to light very late in testing, and it was very impractical to bin every CPU, so they rolled the dice.
>Smith posted the email on a Compuserve forum, a 1990s version of social media.
I hate how this sentence makes me feel.
I like to use the 1900s instead of the 1990s.
Does it help, or make it worse, if you say it as ‘late 1900s’?
He sent it via his Personal Computer, a precursor to the smartphone.
My initial feeling is: that data is probably mostly unmined and lost. Lucky bastards!
Given that the fixed table is a much simpler one (by letting out-of-bounds just return 2, rather than adding circuitry to make it return 0), I wonder why they didn't just do it that way in the first place?
It feels like the kind of optimization that gets missed because the task was split between multiple people, and nobody had complete knowledge of the problem.
The person generating the table didn't realize filling the out-of-bounds with two would make for a simpler PLA. And the person squishing the table into the PLA didn't realize the zeros were "don't care" and assumed they needed to be preserved.
It's also possible they simply stopped optimizing as soon as they felt the PLA was small enough for their needs. If they had already done the floorplanning, making the PLA even smaller wasn't going to make the chip any smaller, and their engineering time would be better spent elsewhere.
It's hard to believe that people collaborating on something this important to the company aren't, like, in a meeting at least weekly talking about implementation details like this.
The other thing that's hard for me to believe is there wasn't an extensive and mostly automated QA process that would test absolutely every little feature of this CPU.
"Make it work first before you make it work fast". Fundamentally this is a software problem solved with software techniques. And like most software there's some optimization left on the table just because no one thought of it in time. And you can't patch a CPU of this era.
Returning 0 for undefined table entries is the obvious thing to do. Setting these entries to 2 is a bit of a conceptual leap, even though it would have prevented the FDIV error and it makes the PLA simpler. So I can't fault Intel for this.
It's not really a conceptual leap if you've ever had to work with "don't care" cases before...
It's a NULL / "don't care" issue. 0 isn't a reserved out-of-band value; it's payload data, and anything beyond the bounds should have been DNC.
It's possible that some other result, likely aligned to an easy binary multiple, would still produce a square block of 2s, and that allowing the far edges to float to some other value could yield a slightly more compact logic array. Back-filling the entire side with the clamped upper value doesn't cost that much more, though, and is known to solve the issue. As pointed out elsewhere, that sort of solution would also be faster in engineering time, fit within the planned space budget, and, best of all, reduce cognitive load. It's obviously correct when looking at the bug.
I would have expected that instead of manually picking a value they would be specified as "don't care". I guess optimizer software like Espresso should allow for that?
That must have been such a satisfying fix for the engineers though!
More engineering time resulted in a more efficient solution.
> Curiously, the adder is an 8-bit adder but only 7 bits are used; perhaps the 8-bit adder was a standard logic block at Intel.
I believe this is because for any adder you always want 1 bit extra to detect overflow! This is why 9 bit adders are a common component in MCUs
The weird thing is that I traced out the circuitry and the bottom bit of the adder is discarded, not the top bit where overflow would happen. (Note that you won't get overflow for this addition because the partial remainder is in range, just split into the sum and carry parts.)
I'm surprised they took the risk of extending the lookup table to have all 2's in the undefined region. A safer route would have been to just fix the 5 entries. Someone was pretty confident!
It actually seems like it becomes much easier to reason about because you remove a ton of (literal in the diagram) edge cases.
How did IDIV work on the Pentium? Was it also optimized, or somehow connected to FDIV, or just the old slow algorithm?
At the 2012 Turing Award conference in San Francisco, Prof. William Kahan mentioned that he had a newer test suite available in 1993 that would have caught Intel's bug; still, Intel did not run it. Prof. Kahan was actively involved in the bug's analysis and further testing. (I'm stating this just from memory.)
> The explanation is that Intel didn't just fill in the five missing table entries with the correct value of 2. Instead, Intel filled all the unused table entries with 2.
I wonder why they didn't do this in the first place.
Implementation detail. Someone overspecified it and didn't realise that it didn't matter.
Looking at it again later, someone asks why they didn't just fill everything in instead, and everyone feels a bit silly XD.
From someone who had to mentally let go once you started talking about planes crossing each other, thank you for such an amazingly detailed writeup. It's not every day that you learn a new cool way to divide numbers!
Intel $475B error: not building a decent GPU
Lack of clairvoyance? Missing out on mobile was more obvious tho.
More explicitly. In 2006, Apple asked Intel to make a SoC for their upcoming product... the iPhone.
At the time, Intel was one of the leading ARM SoC providers, their custom XScale ARM cores were faster than anything from ARM Inc themselves. It was the perfect line of chips for smartphones.
The MBA types at Intel ran some sales projections and decided that such a chip wasn't likely to be profitable. There was apparently debate within Intel: the engineering types wanted to develop the product line anyway, and others wanted to win goodwill from Apple. But the MBA types won. Not only did they reject Apple's request for an iPhone SoC, but they immediately sold off their entire XScale division to Marvell (who did nothing with it), so they wouldn't even be able to change their minds later even if they wanted to.
With hindsight, I think we can safely say Intel's projections for iPhone sales were very wrong. They would have easily made their money back on just the sales from the first-gen iPhone, and Apple would probably have gone back to Intel for at least a few generations. Even if Apple dumped them, Intel would have had a great product to sell into the rapidly growing market of Android smartphones in the early 2010s.
-----------
But I think it's actually far worse than just Intel missing out on the mobile market.
In 2008, Apple acquired P.A. Semi and started work on their own custom ARM processors (and ARM SoCs): the ARM processors that Apple eventually used to replace Intel as supplier in laptops and desktops too.
Maybe Apple would have gone down that path anyway, but I really suspect Intel's reluctance to work with Apple to produce the chips Apple wanted (especially the iPhone chip) was a huge motivating factor that drove Apple down the path of developing their own CPUs.
Remember, this is 2006. Apple had only just switched to Intel in January because IBM had continually failed to deliver the laptop-class PowerPC chips Apple needed [1]. And while at that time Intel had a good roadmap for laptop-class chips, it would have looked to Apple as if history was at risk of repeating itself, especially as they moved into the mobile market, where low power consumption was even more important.
[1] TBH, IBM were failing to provide desktop-class CPUs too, but the laptop CPUs were the more pressing issue. Fun fact: IBM actually tried to sell the PowerPC core they were developing for the Xbox 360 and PS3 to Apple as a low-power laptop core. It was sold to Microsoft/Sony as a low-power core too, but if you look at the launch versions of both consoles, they run extremely hot, even when paired with comically large (for the era) cooling solutions.
> More explicitly. In 2006, Apple asked Intel to make a SoC for their upcoming product... the iPhone.
This isn't strictly true. Tony Fadell - one of the creators of the iPod and considered a co-creator of the iPhone - said in an interview with Ben Thompson (Stratechery) that Intel was never seriously in the running for iPhone chips.
Jobs wanted it. But the technical people at Apple pushed back.
Besides, by 2006, less than a year before the iPhone was introduced, the chip decisions had already been made.
Was it really? x86 is performance-oriented, not efficiency-oriented. Its variable-length instruction encoding makes it really hard to build a low-power CPU that isn't too slow.
I think the impact of ISA is way overblown. The instruction decode pipeline is worse, but it doesn't consume that many transistors in the end relative to the total size of the system. I think it has much more to do with Intel's attitude of defining the x86 market as desktops and servers and not focusing on super low power parts, plus their monopoly, which led to a long stagnation because they didn't have to innovate as much.
You can see it today with modern Ryzen laptop chips, which aren't that much worse on perf/watt than Arm chips fabbed on the same node.
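To make the decode argument concrete, here is a small illustration. The byte sequences are standard, easy-to-verify encodings; the arrays and the little main() are just for the example.

```c
/* x86 instruction lengths vary (1-15 bytes), so a wide decoder must locate
 * instruction boundaries before it can decode several instructions in
 * parallel, while every AArch64 instruction is exactly 4 bytes. */
#include <stdio.h>

/* x86-64: three instructions of 1, 5, and 8 bytes */
static const unsigned char x86[] = {
    0xC3,                                            /* ret                 */
    0xB8, 0x2A, 0x00, 0x00, 0x00,                    /* mov eax, 42         */
    0x48, 0x8B, 0x84, 0x24, 0x80, 0x00, 0x00, 0x00,  /* mov rax, [rsp+0x80] */
};

/* AArch64: instruction N always starts at byte offset 4*N */
static const unsigned char a64[] = {
    0xC0, 0x03, 0x5F, 0xD6,                          /* ret                 */
    0x40, 0x05, 0x80, 0x52,                          /* mov w0, #42         */
};

int main(void)
{
    printf("x86-64:  3 instructions in %zu bytes (variable length)\n", sizeof x86);
    printf("AArch64: 2 instructions in %zu bytes (fixed 4-byte length)\n", sizeof a64);
    return 0;
}
```

The extra work is real, but it's length pre-decode logic, not a huge slice of a modern core's area or power budget, which is the point being made above.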
Innovate on what though? There was no market for performant very low power chips before the iPhone and then Android took off.
I am sure if IBM had more of a market than the minuscule Mac market for laptop class PPC chips back in 2005, they could have poured money into making that work.
Even today, I doubt it would be worth Apple's money to design and manufacture its own M-class desktop chips for only around 25 million Macs and iPads if they weren't reusing a lot of the R&D.
In the 2010s, Intel pretty much sold the same Haswell design for more than half a decade and lipsticked the pig. It is not just low power that they missed. They had time to improve performance/watt for server use, add core counts, do big-little hybrid designs, improve the iGPU, etc.
They just sat on it, their marketing dept made fancy boxes for high end CPUs and their HR department innovated DEI strategies.
Yes, I'm sure Intel fell behind because a for-profit company was more concerned with hiring minorities than with hiring the best employees it could find.
It's amazing that the "take responsibility" and "pull yourself up by your bootstraps" crowd has now become the "we can't get ahead because of minorities" crowd.
Huh, it's not clear what you are suggesting. Who's "we" and who's not taking responsibility?
The best people were clearly not staying at Intel and they have been winning hard at AMD, Tesla, NVIDIA, Apple, Qualcomm, and TSMC, in case you have not been paying attention. They could not stop winning and getting ahead in the past 5-10 years, in fact. So much semiconductor innovation happened.
Yes, if you start promoting the wrong people, very quickly the best ones leave. No one likes to report to their stupid peer who just got promoted or the idiot they hire from the outside when there are more qualified people they could promote from within.
--
And re marketing boxes, just check out where Intel chose to innovate:
https://www.reddit.com/r/intel/comments/15dx55m/which_i9_box...
The problem with Intel wasn't the technical people. It started with the board: laying off people, borrowing money to pay dividends to investors, bad strategy, not building relationships with customers who didn't want to work with them for fabs, etc., and then firing the CEO who had a strategy that they knew was going to take years to implement.
It wasn't because of "DE&I" initiatives and a refusal to hire white people.
> borrowing money to pay dividends to investors
That's a scam. If you fail to profit, you should admit it, not fake it.
Apple did that too for a while, just because it was cheaper for them to borrow money than to repatriate their foreign income and pay taxes.
The issue with Intel though is that they needed the money to invest in R&D.
Cool. And the bad decisions were made by who exactly? Intel executives & employees.
Intel didn't hire the board. The board did hire the CEO. The bad decisions either of them made weren't the result of "DE&I" initiatives.
For applications where the performance is determined by array operations, which can leverage AVX-512 instructions, an AMD Zen 5 core has better performance per area and per power than any ARM-based core, with the possible exception of the Fujitsu custom cores.
The Apple cores themselves do not have great performance for array operations, but when considering the CPU cores together with the shared SME/AMX accelerator, the aggregate might have a good performance per area and per power consumption, but that cannot be known with certainty, because Apple does not provide information usable for comparison purposes.
The comparison is easy only with the cores designed by Arm Holdings. For array operations, the best performance among the Arm-designed cores is obtained by Cortex-X4, a.k.a. Neoverse V3. Cortex-A720 and Cortex-A725 have half the number of SIMD pipelines but more than half of the area, while Cortex-X925 has only 50% more SIMD pipelines but double the area. Intel's Skymont (a.k.a. Darkmont) has the same area and the same number of SIMD pipelines as Cortex-X4, so, like Cortex-X4, it is also more efficient than the much bigger Lion Cove core, which is faster on average for non-optimized programs but has the same maximum throughput for optimized programs.
When compared with Cortex-X4/Neoverse V3, a Zen 5 compact core has a throughput for array operations that can be up to double, while its area is less than double the area of a Cortex-X4. A high-clock-frequency Zen 5 core has more than double the area of a Cortex-X4, but due to the high clock frequency it still has better performance per area, even if, unlike the Zen 5 compact cores, it no longer also has better performance per power consumption.
So the ISA advantage of AArch64, which results in a simpler and smaller CPU core frontend, is not enough to ensure better performance per area and per power consumption when the backend, i.e. the execution units, does not itself have good enough performance per area and per power consumption.
The area of Arm Cortex-X4 and of the very similar Intel Skymont core is about 1.7 square mm in a "3 nm" TSMC process (both including 1 MB of L2 cache memory). The area of a Zen 5 compact core in a "4 nm" TSMC process (with 1 MB of L2) is about 3 square mm (in Strix Point). The area of a Zen 5 compact core with full SIMD pipelines must be greater, but not by much, perhaps by 10%, and if it were done in the same "3 nm" process as Cortex-X4 and Skymont, the area would shrink, perhaps by 20% to 25% (depending on the fraction of the area occupied by SRAM). In any case there is little doubt that the area in the same fabrication process of a Zen 5 compact core with full 512-bit SIMD pipelines would be less than 3.4 square mm (= double Cortex-X4), leading to better performance per area and per power consumption than for either Cortex-X4 or Skymont (this considers only the maximum throughput for optimized programs, but for non-optimized programs the advantage could be even greater for Zen 5, which has a higher IPC on average).
Cores like Arm Cortex-X4/Neoverse V3 (also Intel Skymont/Darkmont) are optimal from the POV of performance per area and power consumption only for applications that are dominated by irregular integer and pointer operations, which cannot be accelerated using array operations (e.g. for the compilation of software projects). Until now, with the exception of the Fujitsu custom cores, which are inaccessible for most computer users, no Arm-based CPU core has been suitable for scientific/technical computing, because none has had enough performance per area and per power consumption, when performing array operations. For a given socket, both the total die area inside the package and the total power consumption are limited, so the performance per area and per power consumption of a CPU core determines the performance per socket that can be achieved.
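To make "array operations" concrete: the workloads being discussed are kernels like the AXPY loop below, where throughput scales almost directly with per-core SIMD width. This is an illustrative sketch; the function name and the flags mentioned in the comment are my own choices, not tied to any particular benchmark.

```c
/* Sketch of the kind of loop being discussed: y[i] += a * x[i].
 * Built with -O3 and an AVX-512 target (e.g. -mavx512f), a compiler can
 * turn this into 512-bit FMAs that process 16 floats per instruction;
 * a 128-bit NEON core handles 4 floats per instruction, which is why
 * SIMD width per core dominates throughput for this class of code. */
#include <stddef.h>

void axpy(float a, const float *restrict x, float *restrict y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

For the irregular integer/pointer workloads mentioned above (e.g. compiling software), almost none of the time is spent in loops like this, which is why the narrower cores come out ahead there on performance per area.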
Intel had a leading line of ARM SoCs from 2002-2006. Some of the best on the market for PDAs and smartphones. Their XScale SoCs were very popular.
But Intel gave up and sold it off, right as smartphones were reaching mainstream.
They sold XScale to Marvell which ironically has a higher market cap than Intel.
Their iGPUs are good enough for day-to-day (non gaming) computer use and rock-solid in Linux.
???
The Lunar Lake Xe (i.e. the generation before the current one) is not rock solid on Linux - I can get it to crash the GPU consistently just by loading enough things that use GL. Not like 100 things, like 5.
If I start Chrome and Signal and something else, it often crashes the GPU after a few minutes.
I've tried the latest kernel and firmware and Mesa and ....
The GPU should not crash, period.
Same here. But the older versions are rock solid and have very low power draw for office/browsing, which is really nice.
I will say it's nice that the GPU crash only hangs the app I'm in and doesn't take down the whole system like a lot of other GPUs do. They do seem to have figured out how to properly isolate and recover their GPU.
But it also feels like that's because it crashed so much it was bothering their engineers, so they made the recovery robust.
I can reliably crash my GPU firmware/driver such that a reboot is needed to fix it, so not everything gets recovered or is recoverable (on Linux, at least).
Good enough? Maybe better today, but they have been god-awful compared to AMD and absolute garbage compared to something like the M1 iGPU. They are responsible for more than half the pain inflicted on users in the Vista days.
Ironically, they have lost the driver advantage in Linux with their latest Arc stuff.
I trust they could have done a lot better, a lot earlier, if they had cared to invest in iGPUs. It feels deliberately neglected.
The same way missing mobile feels so nuts that it's gotta be deliberate.
Didn't Intel have floating point division issues more recently as well?
There's an FSIN trig inaccuracy, but I don't know of other division issues: https://randomascii.wordpress.com/2014/10/09/intel-underesti...
Inflation-adjusted, that's over $1 billion today. And they do more mitigations with microcode these days.
Some irony if internal financial damage estimates were under- or over-estimated because they were calculated on a defective chip.
Then that immediately becomes a life or death situation.
yes.
their ”1 in a billion” (excuse) became $1 billion (cost to them).
of course, the CEOs not only go scot-free, but get to bail out with their golden parachutes, while the shareholders and public take the hit.
https://en.m.wikipedia.org/wiki/Golden_parachute
Those evil CEOs pulling the wool over the eyes of the poor shareholders!
This is the employment contract that was negotiated and agreed to by the board / shareholders.
Where yuh bin livin all yore laafe, pilgrim? Under a Boulder in Colorado, mebbe? Dontcha know dat contracts can be gamed, and hev bin fer yeers, if not deecades? Dis here ain't Aahffel, ya know?
Come eat some chili widdus.
Id'll shore put some hair on yore chest, and grey cells in yore coconut.
# sorry, in a punny mood and too many spaghetti western movies
Oh, it gets even better. US taxpayers are giving them billions for "national security" reasons.
Nothing like giving piles of cash to a grossly incompetent company (the Pentium math bug, Puma cablemodem issues, their shitty 4G cellular radios, extensive issues with gigabit and 2.5G network interfaces, and now the whole 13th/14th gen processor self-destruction mess.)
> He called Intel tech support but was brushed off
I laughed when I read this. It’s hard enough to get support for basic issues, good luck explaining a hardware bug.
Reminds me of part 2 of day 24. Some wrong wirings. ;-)
https://adventofcode.com/2024/day/24
"At Intel, Quality is job 0.9999999999999999762"