How AMD's "Zen", now called Ryzen, aims to shake up the CPU world, which is just what we need.
AMD's new Zen microprocessor has been a long time coming. We have had some glimpses of what to expect over the last six months, but at the recent AMD Tech Summit, AMD finally revealed the official name for the new processor: Ryzen. It is a nod to the Zen code name, and it will probably be a lot easier to trademark than Zen. To be clear, Zen is still the name of the architecture, so I will use that when referring to the whole family, while Ryzen refers specifically to the consumer models of the Zen family.
But that's just the name of the chip. We still do not have official word on the various models, or precise details of when Ryzen will launch, apart from Q1'17. I've already talked about what we know and expect from Zen, but with new information and additional demonstrations (including AMD's New Horizon livestream event, which showed Ryzen and even a Vega demo), some things are clearer than before. And fortunately, performance appears to remain basically on target.
I have updated and revised our Zen need-to-know article, which has now become the Ryzen article you see here. Here is a brief summary of the latest updates:
- The new consumer name for Zen is Ryzen
- 8-core / 16-thread parts clocked at 3.4 GHz or higher
- TDP for 8C / 16T parts is 95W
- Ryzen equals or beats the i7-6900K in multiple tests
- SenseMI and automatic overclocking with improved cooling
- Zen should be much more scalable (e.g., 32-core Naples)
- AMD has Vega GPUs running on a Ryzen system
A brief history of AMD CPU architectures
Sometimes you have to break the mold and do something that has not been done before. But truly innovative products are really hard to make. If it were easy, everyone would be doing it, right? The more established a market becomes, the harder it is to innovate, but that does not mean companies cannot go back to the drawing board and start over. For AMD, that's exactly what they're doing with the upcoming Zen family of processors.
To be clear, AMD is not starting from scratch. AMD has decades of experience building x86 processors, and everything they've done and learned in the past influences future designs. But when AMD launched the Bulldozer processor family in 2011, it was the company's first major architectural overhaul since the K8 came out in 2003. Six years later, AMD has again redesigned the architecture in a major way.
That's good news, because if you're using a midrange ($1,000) or higher PC for gaming, you're probably using an Intel Core i5 / i7 processor of some sort. The CPU scene has been pretty stagnant for several years, with Intel's regular tick-tock cadence (now called process-architecture-optimization) giving incremental improvements every generation. AMD has tried to keep pace, but Bulldozer started with a performance deficit, and if you want the fastest and most efficient CPU, Intel wins, period.
AMD has had the unenviable task of being the last x86 alternative to Intel, but its manufacturing process fell far behind, with GlobalFoundries being spun off from AMD in 2009. Relegated to competing primarily on price, AMD has done pretty well in retail markets where people just want a low-cost PC (think Walmart). But once you step out of the budget sector and into the more lucrative (and business) segments, Intel dominates sales. And with good reason: take any Core i5 processor and it will generally outperform any AMD CPU of the same generation in most applications and workloads.
This is bad for innovation, bad for the industry, and bad for both AMD and Intel users. We need competition, and it's not too surprising that some of the biggest advances in Intel's processor technology (the Core architecture) came only after AMD took the performance crown away from them in the Athlon 64 era. The good news is that Zen should shake things up again, with AMD taking another shot at the performance CPU market. Even if AMD cannot claim the outright performance crown, having a real performance alternative to Core i5 will be great, especially if it still comes at a lower price.
Designed to scale from mobile products across desktops, workstations, and servers, Zen looks promising, but will it be enough? Modern processors are complex beasts, with many opportunities for things to go wrong. This is what we know about Zen, what we expect it to do, and additional thoughts about what we really want it to do.
What We Know: Zen Architecture
We have had variations of AMD's Bulldozer (Piledriver, Steamroller, and Excavator) for five years, all built around the same building block: a CMT (clustered multi-threading) module consisting of two integer cores with a shared floating-point unit. (It's technically two 128-bit FMAC units that can also function as a single 256-bit FP unit.) All of that changes with Zen.
From a high level, Zen closely resembles Intel's Core architecture. The CMT module is gone, and instead AMD is using a 4-core / 8-thread SMT (simultaneous multi-threading) building block. Indications are that AMD will have the ability to disable SMT support, so the "minimum" Ryzen CPUs could be 4-core / 4-thread parts. At launch, the "full" set of Ryzen variants will consist of only 8C / 16T, 8C / 8T, and 4C / 8T CPUs.
Current rumors are that the 4C / 8T parts will be sold under the SR3 brand, with 8C / 16T selling as SR7, and a mid-level SR5 offering 8C / 8T. These may be code names or they may be retail names (similar to Intel's i3 / i5 / i7 nomenclature). Either way, it would mean AMD is doubling the number of cores/threads offered in each market segment.
Along with SMT support, the pipeline and various other elements of the architecture have also been reworked. The L1 cache is now a faster write-back design, and the L2 cache also has twice the bandwidth. Meanwhile, the L3 cache will deliver up to five times the bandwidth. There is a new micro-op cache, and each core can issue up to six micro-ops (or four FP ops) per cycle, similar to Skylake's 6-wide design and 50 percent wider than the 4-wide design of the Bulldozer-family "heavy equipment" CPUs.
Zen has an improved "perceptron" branch prediction algorithm, now decoupled from the fetch stage, which again helps performance. We do not actually know the pipeline length for Zen (Bulldozer is estimated to have a 20-stage pipeline), but better branch prediction can help mitigate the cost of having more stages. Note for example that Intel's NetBurst pipeline was nominally a 20-stage design, which was "too long" back in the day, and yet all Intel designs dating back at least to Sandy Bridge are around the same length. And not to minimize these aspects, but Zen also has larger load, store, and retire buffers, along with improved clock gating.
Then there is the platform. Ryzen will use a new AM4 socket, paired with one of several chipsets: A320, B350, or X370. Regardless of the chipset, the platform will remain a dual-channel DDR4 configuration, and the CPU socket has 1331 pins. Sticking with dual-channel memory makes sense, as it keeps motherboard costs under control while still allowing up to 64 GB of memory. As for the socket, 1331 is a good number of pins because it is more than Intel's LGA1151, and it provides enough pins for the rumored 36 PCIe Gen3 lanes on the CPU, which would be 32 lanes for graphics cards, with the other four lanes probably connected to the chipset. However, some previously leaked information indicates that X370 will be required for SLI / CF configurations, so we could end up with more PCIe lanes attached to the chipset, which in turn would connect to the PCIe slots.
Along with DDR4 support, perhaps equally important is the inclusion of USB 3.1 Gen2 (10Gbps) and NVMe M.2 support. (SATA Express, on the other hand, seems to be dying fast, so its inclusion does not really matter to me.) Obviously, M.2 NVMe drives remain a high-end option for most PC builds, and for games in particular, there is little benefit compared to a good SATA drive. But then, Ryzen clearly is not aiming at budget builds as the only option, so being able to use a modern NVMe M.2 drive is a must.
I cannot emphasize enough how big of a fundamental change this all is, and it means that everything we know about AMD CPU performance from the past can no longer be applied. AMD has stated a performance target of 40 percent better IPC (instructions per clock) with Zen versus Excavator, and these architectural changes should provide the performance-per-clock improvements to get there. Based on AMD's claims, a 3.0GHz Zen core should be 40 percent faster than a 3.0GHz Excavator core (though we never saw Excavator outside of APUs). But there is a problem: we do not know the final clock speeds of Zen / Ryzen. I'll get back to that in a moment, but first, let's talk about some other aspects of the architecture as well as process technology.
SenseMI technology
One of the new technologies and names coming out of the AMD Tech Summit is SenseMI (pronounced "Sense Em-Eye" and not "Sense Me", although I like the sound of the latter). Much of this seems to be rebranding and grouping features that are often found in existing processor designs, but there are some new twists.
AMD's slides provide a brief overview, but several of them do not tell us much. Neural Net Prediction and Smart Prefetch, in particular, do not appear to be anything new. We have had "smart" branch prediction that "learns" from previous code execution going back to at least the Pentium P5 era. Everything since has been about improving how branch prediction tracks state and predicts the next iteration. It is impossible to say at this point whether Zen's branch prediction is better, worse, or similar to Intel's current design, but it is almost certain to be better than Piledriver / Excavator / Jaguar / Puma. The same goes for smart prefetch: Intel has used that term going back at least to the early Core processors. I will assume AMD's branch prediction and prefetching are up to snuff here and move on to the other elements.
Pure Power closely resembles an evolution of AMD's PowerNow (CPU) and PowerTune (GPU) technologies, and AMD discussed similar ideas with its Polaris 10 architecture last year. The main idea is that Pure Power will optimize behavior to allow operation at lower power consumption with the same level of performance, and it ties into the second SenseMI technology, Precision Boost.
Precision Boost is a little more interesting. We've had Turbo Core (AMD) and Turbo Boost (Intel) for a while now, allowing CPUs to dynamically change clocks and potentially exceed the guaranteed minimum clock speed. This has allowed quad-core and even 6-core / 8-core / 10-core CPUs to offer single-threaded performance relatively similar to dual-core designs, while still running multi-threaded workloads without exceeding the power budget. Precision Boost refines the old approach, providing a finer 25MHz granularity for clock speed adjustments.
I'm not sure how important this will really be, since the difference in performance between 3.5 GHz and 3.525 GHz, for example, is going to be so small that even benchmarks will probably not be able to show a consistent improvement. However, it gives AMD the potential to extract every last bit of performance from a CPU. And there is one final part of SenseMI where this could be useful.
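To put the 25MHz granularity in perspective, here is a minimal sketch of the arithmetic. The function names are hypothetical and the 100MHz of headroom is purely an illustrative assumption; nothing here reflects how AMD actually implements Precision Boost.

```python
def relative_gain(base_mhz, step_mhz=25.0):
    """Fractional clock increase from a single boost step above base_mhz."""
    return step_mhz / base_mhz


def boost_steps(base_mhz, headroom_mhz, step_mhz=25.0):
    """Clock speeds reachable in step_mhz increments within the given headroom."""
    count = int(headroom_mhz // step_mhz)
    return [base_mhz + i * step_mhz for i in range(count + 1)]


# One 25MHz step on top of 3.5GHz, the example used in the text.
print(f"{relative_gain(3500):.2%}")   # well under one percent

# Illustrative assumption: 100MHz of headroom above a 3.5GHz clock.
print(boost_steps(3500, 100))
```

A single step is worth less than one percent, which is why benchmarks are unlikely to resolve it; the benefit, if any, would come from stacking several such steps.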
Extended Frequency Range (XFR) is designed to reward enthusiasts with high-end cooling. If you have been using a stock CPU cooler in your system, you will know that it normally does the job well enough. It may not be the quietest choice, and CPU temperatures may get a little warm at times, but overall it is enough to meet the minimum performance levels of AMD or Intel CPUs. If you upgrade to a better air cooler or a closed-loop AIO solution and do not engage in any kind of overclocking, you basically get very little benefit, maybe slightly lower noise levels and/or temperatures.
XFR shakes things up by providing some level of autonomous overclocking. How much is not something that AMD specifically wanted to comment on, but they did note that clock speeds scale with better air, liquid, and even LN2 (liquid nitrogen) cooling. Now, I'm not one to play with LN2, but if XFR means that someone willing to install a better cooler can suddenly gain 300MHz in clock speed (5-10 percent), that would be pretty impressive. It would also make comparisons between benchmarks more difficult ("What CPU cooler did PC Gamer use when they showed Ryzen getting XYZ in Cinebench?"), but I'll deal with that when I need to.
Scalability and Infinity Fabric
Something you'll see mentioned on the SenseMI slides is the Infinity Fabric. This is an area that AMD discussed in greater detail at the Tech Summit, as it is very important. This broad term refers to both the internal and external communications of Zen products, and it replaces a whole series of overlapping technologies from earlier architectures. It would be great if I could simply say that the Infinity Fabric represents the internal topology of Zen, but that is not entirely true. It covers that, but it also includes several signaling and control protocols (see SenseMI above). I have to apologize for the lack of clear details, since AMD did not provide us with slides specifically on the Infinity Fabric, but let me go over the high-level overview.
In previous architectures, AMD had a large number of protocols and communication channels, including HyperTransport, along with other standards such as CCIX (Cache Coherent Interconnect for Accelerators). Infinity Fabric is designed to encompass all of these, with a common set of protocols that, over the long term, should allow significantly better scalability and efficiency. HyperTransport can still be used, or a mesh fabric, or CCIX, or a direct connection, or whatever. The key is that all of these are compatible with Infinity Fabric, and different chips can use what works best. But instead of trying to support dozens of protocols (think of each as a slightly different language or dialect), everything can speak Infinity Fabric.
What this means in practice is that designs based on Zen, among others, should scale better. With Bulldozer, if AMD wanted to go beyond the basic 2-core module, it needed a lot of custom logic to handle things. The same goes for "cat cores" like Jaguar and Puma, where 2-core and 4-core designs were not too bad, but something like the 8-core Jaguar CPUs at the heart of the PS4 / XB1 required more work. From what I understand, the automated design tools used for processor layout will gain a lot of flexibility, potentially including power savings and performance gains, by using the Infinity Fabric.
It may be easier to give some concrete examples of the "old way" and the "new way" of doing things. Piledriver was the last major AMD CPU architecture update, with Steamroller and Excavator confined to use in APUs. For consumer CPUs, Piledriver found its way into the Vishera core, with the high-end design being the FX-8350 and similar chips. It has 1.2B (billion) transistors, 8MB L2 + 8MB L3 cache, and measures 315 mm^2. For server parts, the 16-core Opteron basically took two Vishera chips and put them in a multi-chip package, with some additional custom logic to facilitate communication between the two dies. What if AMD wanted to make a 24-core Piledriver part? They would have had to rework things even more, which would have taken a long time and might not even have scaled all that well.
Now, compare that to how Zen is designed. Ryzen will be an 8C / 16T consumer part at launch, and we expect a cut-down 4C / 8T part to follow, along with APU variants at a later date. AMD has not demonstrated any of the 4C parts yet, but scalability can also go in the other direction. For Zen, AMD has publicly demonstrated a Naples server chip with 32 cores / 64 threads, and it had several dual-socket servers (with several GPUs) at the Tech Summit. Given the proper demand, AMD should be able to create 12C / 24T, 16C / 32T, 20C / 40T, 24C / 48T, and 28C / 56T Zen chips. Some of them would use a larger die with parts disabled, but instead of only two options, AMD should be able to create several die variants.
Something to note here is that there is no indication that the server parts will use the Ryzen name. AMD may stick with Opteron, or it may have a separate name for the new family of server CPUs. As expected, extending an 8C / 16T design to a 32C / 64T package results in an absolutely massive chip. AMD has not provided transistor counts or cache sizes for Naples, but I expect it to be as large as the largest Intel chips.
By way of comparison, the 24-core Broadwell-EP weighs in at about 7.2 billion transistors, with 60 MB of L3 cache and a die size of 456 mm^2, which makes the i7-6950X appear rather puny with only 25 MB of L3 cache, a transistor count in the billions, and 246 mm^2. In summary, compared to its previous designs, AMD's Zen should prove much more scalable, thanks in large part to the Infinity Fabric.
Process technology
In the Bulldozer era and earlier, AMD was similar to Intel in that it did all the processor design and chip manufacturing for its CPUs in-house. Running a chip foundry and keeping it up to date is an expensive proposition, however, and if the facility is not fully utilized it can be a huge money sink. AMD spun off its manufacturing facilities in 2009 and GlobalFoundries (GF) was born, but the separation of the two companies took a few years to be fully realized.
Today, GlobalFoundries is taking orders from other large companies beyond AMD, and AMD has fully divested the last of its GF shares (in 2012), although the two still have wafer supply agreements in place. AMD no longer depends entirely on GF and can pursue manufacturing agreements with other foundries, focusing on chip design rather than the foundry business. GF is also free to take orders for all of its available manufacturing capacity. GF has also licensed Samsung's 14nm FinFET production process and has plans to move to 7nm FinFET as its next main process node.
AMD is using the GF 14nm FinFET node for Zen, and moving to a competitive process is a huge leap from Vishera's 32nm SOI process. Recent AMD APUs have been using 28nm SOI, so again this will be a big change in feature size. It's similar to the jump that graphics chips saw last year, which ultimately helped double GPU efficiency. Combined with all the architectural improvements, switching to a significantly smaller process node should be hugely beneficial to AMD and Zen.
What we expect: Performance
The combination of a new high-performance architecture with an upgraded manufacturing process basically means that anything can happen when it comes to the end product. Is GF's 14nm FinFET fully ready, or are the yields still a bit dubious? We really do not know. Does Ryzen's architecture live up to the hype? Again, we do not know for sure, although the first signs are promising. And what will be the final clock speeds on the retail parts? Previously, there were reports floating around of engineering samples clocked at up to 3.2GHz, but early ES chips do not tell us much about final clocks.
Fortunately, we have more details coming directly from AMD now. While they would not commit to maximum turbo clocks, AMD has now gone on record as saying that Ryzen 8C / 16T parts will ship at 3.4GHz or higher for the base clock. That may seem low, but keep in mind that Intel's highest-clocked 8C / 16T part is the i7-6900K, which comes with a 3.2GHz base clock and a maximum Turbo Boost of 3.7GHz. So at least on base clocks, Ryzen is looking very competitive.
As noted earlier, AMD has stated that IPC is 40 percent higher than Excavator, which is a bit difficult to verify given that Excavator was not used outside of APUs. We know that Piledriver was slightly better in IPC and efficiency than Bulldozer, and Steamroller (APU-only) and Excavator improved further. The latest pure CPU design from AMD was Vishera, the FX-8300/9000 series, with a maximum of 4.3GHz on the FX-8370.
Even if Zen's maximum clock speed drops from 4.3GHz to 3.8GHz, Zen / Ryzen should still end up being a significantly faster processor. 40 percent better IPC than Excavator cores works out to somewhere around 50 percent better IPC compared to Vishera. With a base clock of at least 3.4 GHz and turbo speeds that are likely to be at least 300MHz higher (possibly 600MHz or more), Ryzen should strike some fear into the heart of Intel.
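The reasoning above can be sketched with a simple first-order model in which single-threaded performance scales as IPC times clock speed. That linear model is an assumption (real workloads rarely scale perfectly), the function names are hypothetical, and the 1.5x IPC ratio is the rough estimate from the text rather than a measured figure.

```python
def relative_performance(ipc_ratio, new_clock_ghz, old_clock_ghz):
    """First-order estimate: performance scales with IPC times clock speed."""
    return ipc_ratio * new_clock_ghz / old_clock_ghz


def break_even_clock(ipc_ratio, old_clock_ghz):
    """Clock at which the new core merely ties the old one under this model."""
    return old_clock_ghz / ipc_ratio


VISHERA_MAX_GHZ = 4.3   # FX-8370 maximum clock, from the text
IPC_VS_VISHERA = 1.5    # rough estimate from the text, not a measurement

# Zen at a 3.8GHz maximum versus Vishera at 4.3GHz.
print(round(relative_performance(IPC_VS_VISHERA, 3.8, VISHERA_MAX_GHZ), 2))

# How low Zen's clocks could fall before it only matches Vishera.
print(round(break_even_clock(IPC_VS_VISHERA, VISHERA_MAX_GHZ), 2))
```

Under these assumptions, a 3.8GHz Zen core would still be roughly a third faster than a 4.3GHz Vishera core, and Zen's clocks would have to fall below about 2.9GHz before the IPC advantage was wiped out.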
And AMD has now offered not one but two public benchmarks showing Ryzen taking on the i7-6900K. Earlier this year, they showed both the i7-6900K and Zen running at 3.0GHz in a Blender test, with AMD coming out just ahead. Now, at the Tech Summit, they showed a Zen part clocked at 3.4GHz (no turbo) running the same benchmark against a fully enabled (Turbo Boost Max 3.0) i7-6900K. This time, Zen basically matched Broadwell-E, which is great to see. AMD followed this up with a second benchmark, Handbrake H.264 video transcoding, another test that leverages multi-threading to a high degree, and Zen at 3.4 GHz emerged victorious: 54 seconds compared to 59 seconds.
Perhaps more importantly, AMD showed real-time power monitoring of both Zen and the i7-6900K during the Handbrake workload. Both systems showed power draw (for the CPU only, if I am not mistaken) in the 90-100W range, but with Zen consistently drawing about 5W less. That's just two tests, and notably absent are single-threaded tests with turbo modes enabled, but we're still a couple of months away from public availability.
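As a rough illustration of what those demo numbers imply, the sketch below computes the Handbrake speedup from the two quoted times and an energy estimate for the run. The timings come from the demo as described above, but the specific wattages (95W and 100W) are assumptions picked from within the reported 90-100W range purely for illustration, and the function names are hypothetical.

```python
def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate finished than the baseline."""
    return baseline_seconds / candidate_seconds


def energy_joules(average_power_watts, duration_seconds):
    """Energy used for a run: average power multiplied by time."""
    return average_power_watts * duration_seconds


# Handbrake times quoted in the text.
ZEN_SECONDS, INTEL_SECONDS = 54, 59
print(f"Zen speedup: {speedup(INTEL_SECONDS, ZEN_SECONDS):.3f}x")

# Assumed wattages, chosen only to illustrate a 5W gap in the 90-100W range.
zen_energy = energy_joules(95, ZEN_SECONDS)
intel_energy = energy_joules(100, INTEL_SECONDS)
print(f"Zen energy relative to Intel: {zen_energy / intel_energy:.2f}")
```

The point of the second calculation is that finishing sooner and drawing slightly less power compound: under these assumed wattages, the energy used for the transcode would differ by more than either the time or the power gap alone suggests.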
To put things into perspective, I've run the numbers on current AMD and Intel APUs / CPUs (including gaming performance). A single Steamroller core (A10-7890K) runs at 4.3GHz and nets 97 points in the Cinebench 15 single-thread test, while a Piledriver core (FX-8370) running at up to 4.3GHz nets 99 in the same test. (Note that the difference in platform has probably negated any Steamroller IPC gain over Piledriver.) The Broadwell-E i7-6900K (3.7GHz) gets 153, so using the earlier estimate of a 50 percent IPC improvement over Piledriver, AMD's Ryzen should be fairly competitive with the i7-6900K, probably in the 140-150 range.
However, the Haswell / Devil's Canyon Core i7-4790K (4.4GHz) gets 173 and a Skylake i7-6700K (4.2GHz) gets 182 in the same test, thanks to higher clocks and, in Skylake's case, improved IPC. Both Broadwell-E and Zen come up short on single-threaded performance compared to the i7-6700K, and Ryzen would have to hit turbo clocks well above 4.0 GHz to close the gap. But remember that AMD has only shown the 8C / 16T part; a 4C / 8T part could show much better clock scaling. We can hope, at least.
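The Cinebench estimate can be reproduced with a back-of-the-envelope model. It assumes the score scales linearly with both clock speed and IPC, which is a simplification, and it uses the rough 50 percent IPC figure from earlier. The function and constant names are hypothetical; the scores and clocks are the ones quoted above.

```python
# Figures quoted in the text (Cinebench 15 single-thread).
PILEDRIVER_SCORE = 99        # FX-8370 at up to 4.3GHz
PILEDRIVER_CLOCK_GHZ = 4.3
IPC_GAIN = 1.5               # rough Zen-over-Piledriver estimate, not measured

# Estimated Zen points per GHz under the linear-scaling assumption.
ZEN_POINTS_PER_GHZ = PILEDRIVER_SCORE / PILEDRIVER_CLOCK_GHZ * IPC_GAIN


def estimated_score(clock_ghz):
    """Estimated Zen single-thread score at a given clock speed."""
    return ZEN_POINTS_PER_GHZ * clock_ghz


def clock_needed(target_score):
    """Clock speed Zen would need to reach a target score under this model."""
    return target_score / ZEN_POINTS_PER_GHZ


for clock in (3.7, 4.0, 4.3):
    print(f"{clock:.1f}GHz -> about {estimated_score(clock):.0f} points")

for name, score in (("i7-6900K", 153), ("i7-4790K", 173), ("i7-6700K", 182)):
    print(f"match {name} ({score}): about {clock_needed(score):.2f}GHz")
```

Under this simple model, the 140-150 range corresponds to turbo clocks from a bit above 4.0GHz up to about 4.3GHz, and matching the i7-6700K would take clocks well beyond that, which is consistent with the caveat about single-threaded performance.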
What about games, which do not usually scale well with CPU core count? Games are a wild card for Zen, because there are times when current AMD CPUs are a noticeable bottleneck, at least with a fast GPU like a GTX 1080 running at 1080p. It is also possible that Ryzen could see more than 40 percent performance gains over Vishera in games, but without testing, it is impossible to say. Regardless, I expect any gamer on a 1440p display should find that a quad-core Ryzen closes most of the gap between the FX-8370 and the i5-6600K.
And there's still the multi-core factor. DX12, in theory, allows games to scale performance with more CPU cores, but so far we have seen few signs of scaling beyond quad-core. Ashes of the Singularity is really the only game that pushes past a 4C / 8T chip, making use of 6-core and even 8-core processors. Other DX12 games have not hit the CPU nearly as hard, and unless they're using tons of units like AotS, I do not expect this to change that much. A straight quad-core chip will prove to be a bit of a handicap in the future, but quad-core plus SMT is not close to reaching the end of the road.
Ryzen: What we want
This is the first time in almost a decade that a new AMD chip has the opportunity to go head to head with Intel's latest. I do not really expect them to win against all Core i7 parts, at least not in overall performance across a series of tests, but I would love to be wrong here. AMD is still "tuning" Ryzen's performance (note the lack of turbo in the public demo), so expect at least another 10 percent improvement from turbo alone, and possibly more.
(As a side note, when I first tested Intel's i7-6700K and Z170, my motherboard BIOS was not properly tuned. An updated BIOS a couple of weeks after the release gave me a 10-15 percent increase in performance, so it's possible AMD might still surprise us.)
Assuming that everything goes as planned, clock speeds will determine how Ryzen is positioned against Intel. AMD's 8-core Ryzen chip looks like it can match a Broadwell-E i7-6900K, with both running at similar clocks (~3.4GHz), in both Blender and Handbrake. We cannot simply take those results and apply them universally to all benchmarks, but it's a start, and AMD says they will provide the video and Blender files so others can perform similar tests on other hardware, which is great to hear.
However, the i7-6900K can also hit 4.5GHz with good cooling, and we do not know how much higher the Ryzen chips can actually clock. Judging by the clock speeds of the RX 480, at least so far the Samsung / GF 14nm FinFET process does not seem to clock as high as other processes (TSMC 16nm FinFET and Intel 14nm FinFET), but that could be due to the various chip designs more than the manufacturing process.
Ironically, since I use an i7-5930K almost daily, I do not see Ryzen 8C / 16T processors as "mainstream" products (just as Intel's X99 platform is not "mainstream"), in part because the drop in efficiency and clock speed is not usually worth it for the additional cores. But if you are running programs that can use the cores, it is impressive, and efficiency becomes less important in high-end workstation-class processors. Still, Ryzen impresses here because even in an 8C / 16T configuration, AMD is aiming for a TDP of 95W.
8C / 16T is great, but what I really want to see is Ryzen 4C / 8T parts that come out of the gate with 4.0GHz clocks at a minimum, and I want to be able to slap on a good liquid cooling solution and get at least another 10-15 percent. If the rumor of a 65W TDP for 4C / 8T is accurate, I would be happy to slap on liquid cooling and push up to a 130W TDP if it means clocks closer to 5.0GHz with overclocking. Give me that and we could end up with a good old-fashioned CPU battle royale.
Moreover, while matching the i7-6900K is great news for Ryzen, even better would be a quad-core APU with performance that rivals the i5-6600K, paired with a GPU equal to the RX 460. I would still want discrete graphics, but having a convincing alternative to Intel would be great. Well, to be honest, I really want to see an APU with performance closer to the RX 480, but I doubt that is happening anytime soon, since the 480 has up to 8GB of GDDR5 with 256GB/s of bandwidth. Even the RX 460 has 112GB/s of bandwidth, which a Ryzen APU cannot hope to match with a dual-channel memory configuration at 38.4GB/s (DDR4-2400).
In this regard, we expect that DDR4-2400 support simply marks the baseline and that, the same as Intel's X99 and Z170 platforms, AM4 will support memory overclocking for things like DDR4-3200 (51.2GB/s). High-performance memory does not really matter much for a typical desktop system, but for an APU it can help graphics performance a lot. (Too bad Ryzen and AM4 do not support quad-channel memory, since it could actually be a real benefit to APUs. Well, not that it would really be worth the cost.)
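The bandwidth figures quoted above follow from the standard peak-bandwidth arithmetic for DDR memory: transfers per second, times 8 bytes per 64-bit channel, times the number of channels. A minimal sketch, with hypothetical function and constant names, and with the quad-channel case included only as a what-if:

```python
def ddr_bandwidth_gbs(megatransfers_per_s, channels=2, bytes_per_transfer=8):
    """Peak theoretical bandwidth in GB/s for 64-bit DDR memory channels."""
    return megatransfers_per_s * bytes_per_transfer * channels / 1000


# GPU memory bandwidth figures quoted in the text, in GB/s.
RX_460_GBS = 112
RX_480_GBS = 256

for speed in (2400, 3200):
    dual = ddr_bandwidth_gbs(speed)
    print(f"DDR4-{speed} dual-channel: {dual:.1f}GB/s "
          f"({dual / RX_460_GBS:.0%} of RX 460, {dual / RX_480_GBS:.0%} of RX 480)")

# What-if only: AM4 is described in the text as a dual-channel platform.
print(f"DDR4-2400 quad-channel: {ddr_bandwidth_gbs(2400, channels=4):.1f}GB/s")
```

Even with DDR4-3200, a dual-channel APU would have less than half the bandwidth of an RX 460, which is why memory speed matters far more for integrated graphics than for a typical desktop system with a discrete card.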
Finally, there is the all-important question of price. This is where AMD has been able to beat Intel for years, because Intel generally does not sell its higher-end processors for under about $175, which has been the approximate entry price for the least expensive quad-core i5 chips going back to the first-generation Core i5. With Ryzen, AMD has some options, depending on where the final performance falls.
If AMD can clearly match or exceed Intel's Skylake / Kaby Lake Core i3 parts with its Ryzen 4C / 8T chips, we are likely to see prices in the $150 range, with 8C / 8T Ryzen going for $200-$250. The 8C / 16T parts in all likelihood will not flirt with Intel's obscene $1,000-$1,750 prices at the top, and depending on clocks and performance, the current estimate is $300-$500. Assuming AMD can approach (or surpass) Intel's Broadwell-E performance, we could end up with better prices for everyone, and that would be great to see.
There are a lot of "ifs" and other qualifiers right now. What will Ryzen really deliver? Everything looks promising, and AMD seemed confident in its New Horizon livestream. Ryzen is scheduled to launch in Q1'17, with indications pointing to early March. Even if AMD cannot beat Intel in every benchmark, it is bringing proper competition back to the CPU market, and that's great news.