Thread: Anandtech News
-
11-08-21, 02:54 PM #11341
Anandtech: AMD Confirms Milan-X with 768 MB L3 Cache: Coming in Q1 2022
As an industry, we are slowly moving into an era where how we package small pieces of silicon together is just as important as the silicon itself. New ways to connect all that silicon include placing dies side by side, stacking them on top of each other, and all sorts of fancy interconnects that preserve the benefits of chiplet designs while mitigating their drawbacks. Today, AMD is showcasing its next packaging uplift: stacked L3 cache on its Zen 3 chiplets, bumping each chiplet from 32 MiB to 96 MiB. This announcement, however, targets its large EPYC enterprise processors.
More...
-
11-09-21, 04:52 AM #11342
Anandtech: NVIDIA Announces Jetson AGX Orin: Modules and Dev Kits Coming In Q1’22
Today as part of NVIDIA’s fall GTC event, the company has announced that the Jetson embedded system kits will be getting a refresh with NVIDIA’s forthcoming Orin SoC. Due early next year, Orin is slated to become NVIDIA’s flagship SoC for automotive and edge computing applications. And, as has become customary for NVIDIA, the company is also making Orin available to non-automotive customers through its Jetson embedded computing program, which offers the SoC in a self-contained modular package.
Always a bit of a side project for NVIDIA, the Jetson single-board computers have nonetheless become an important tool for NVIDIA, serving both as an entry point for bootstrapping developers into the NVIDIA ecosystem, and as an embedded computing product in and of itself. Jetson boards are sold as complete single-board systems with an SoC, memory, storage, and the necessary I/O in pin form, allowing them to serve as commercial off-the-shelf (COTS) systems for use in finished products. Jetson modules are also used as the basis of NVIDIA’s Jetson developer kits, which throw in a breakout board, power supply, and other bits needed to fully interact with Jetson modules.
With NVIDIA’s Orin SoC set to arrive early in 2022, NVIDIA is using this opportunity to announce the next generation of Jetson AGX products. Joining the Jetson AGX Xavier will be the aptly named Jetson AGX Orin, which integrates the Orin SoC.

NVIDIA Jetson Module Specifications

                 AGX Orin                      AGX Xavier                  Jetson Nano
CPU              12x Cortex-A78AE @ 2.0GHz     8x Carmel @ 2.26GHz         4x Cortex-A57 @ 1.43GHz
GPU              Ampere, 2048 cores @ 1000MHz  Volta, 512 cores @ 1377MHz  Maxwell, 128 cores @ 920MHz
Accelerators     2x NVDLA v2.0                 2x NVDLA                    N/A
Memory           32GB LPDDR5, 256-bit bus      16GB LPDDR4X, 256-bit bus   4GB LPDDR4, 64-bit bus
                 (204 GB/sec)                  (137 GB/sec)                (25.6 GB/sec)
Storage          64GB eMMC 5.1                 32GB eMMC                   16GB eMMC
AI Perf. (INT8)  200 TOPS                      32 TOPS                     N/A
Dimensions       100mm x 87mm                  100mm x 87mm                45mm x 70mm
TDP              15W-50W                       30W                         10W
Price            ?                             $999                        $129
Orin features 12 Arm Cortex-A78AE “Hercules” CPU cores and an integrated Ampere architecture GPU with 2048 CUDA cores, adding up to 17 billion transistors. Given Orin's mobile-first design, NVIDIA is being fairly conservative with the clockspeeds here: the CPU cores for Jetson AGX Orin top out at 2GHz, while the GPU tops out at 1GHz. Otherwise, the SoC also contains a pair of NVIDIA’s latest generation dedicated Deep Learning Accelerators (DLA), as well as a vision accelerator to further speed up and efficiently process those tasks.
Rounding out the Jetson AGX Orin package, the Orin SoC is being paired with 32GB of LPDDR5 RAM, which is attached to a 256-bit memory bus, allowing for 204GB/second of memory bandwidth. Meanwhile storage is provided by a 64GB eMMC 5.1 storage device, which is twice the capacity of the previous generation Jetson AGX.
All told, NVIDIA is promising 200 TOPS of performance in INT8 machine learning workloads, a 6x improvement over Jetson AGX Xavier. Presumably that figure is for the module’s full 50W TDP, with performance proportionally lower as you move toward the module’s minimum TDP of 15W.
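To make the TDP tradeoff above concrete, here is a minimal sketch of what throughput might look like at reduced power levels, assuming roughly linear scaling between TDP and performance (our assumption for illustration, not a figure NVIDIA has published):

```python
# Hypothetical sketch: estimate Jetson AGX Orin INT8 throughput at reduced
# TDPs, assuming (our assumption, not NVIDIA's claim) linear scaling from
# the 200 TOPS quoted at the module's full 50W TDP.

def estimated_tops(tdp_watts, peak_tops=200.0, peak_tdp=50.0, min_tdp=15.0):
    """Linearly scale peak throughput by TDP within the module's range."""
    if not min_tdp <= tdp_watts <= peak_tdp:
        raise ValueError("TDP outside the module's 15W-50W range")
    return peak_tops * tdp_watts / peak_tdp

print(estimated_tops(50))  # 200.0 TOPS at full power
print(estimated_tops(15))  # 60.0 TOPS at the 15W floor, under this assumption
```

In practice the real scaling curve is unlikely to be perfectly linear, since fixed overheads (memory, I/O) do not scale down with compute clocks.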
Meanwhile, for this generation NVIDIA will be maintaining pin and form-factor compatibility with Jetson AGX Xavier. So Jetson AGX Orin modules will be the same 100mm x 87mm in size, and use the same edge connector, making Orin modules drop-in compatible with Xavier.
Jetson AGX Orin modules and dev kits are slated to become available in Q1 of 2022. NVIDIA has not announced any pricing information at this time.
More...
-
11-09-21, 04:52 AM #11343
Anandtech: Samsung Announces First LPDDR5X at 8.5Gbps
After the publication of the LPDDR5X memory standard earlier this summer, Samsung has now been the first vendor to announce new modules based on the new technology.
The LPDDR5X standard will start out at speeds of 8533Mbps, a 33% increase over current-generation LPDDR5 products running at 6400Mbps.
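The quoted 33% uplift follows directly from the per-pin data rates, as a quick check shows:

```python
# Verify the generational uplift from the per-pin data rates quoted above:
# LPDDR5X launches at 8533 Mbps versus 6400 Mbps for current LPDDR5 parts.
lpddr5x_mbps = 8533
lpddr5_mbps = 6400

uplift_pct = (lpddr5x_mbps / lpddr5_mbps - 1) * 100
print(f"{uplift_pct:.1f}% faster per pin")  # 33.3% faster per pin
```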
More...
-
11-09-21, 10:00 AM #11344
Anandtech: The Intel Z690 Motherboard Overview (DDR5): Over 50+ New Models
To support the launch of Intel's latest 12th generation 'Alder Lake' processors, Intel has also pulled the trigger on its latest Z690 motherboard chipset. Using a new LGA1700 socket, some of the most significant advancements with Alder Lake and Z690 include PCIe 5.0 support from the processor, as well as a PCIe 4.0 x8 link from the processor to the chipset. In this article, we're taking a closer look at more than 50 different DDR5-enabled motherboards designed not only to use the processing power of Alder Lake but also to offer users a myriad of high-class and premium features.
More...
-
11-09-21, 10:00 AM #11345
Anandtech: NVIDIA Launches A2 Accelerator: Entry-Level Ampere For Edge Inference
Alongside a slew of software-related announcements this morning from NVIDIA as part of their fall GTC, the company has also quietly announced a new server GPU product for the accelerator market: the NVIDIA A2. The new low-end member of the Ampere-based A-series accelerator family is designed for entry-level inference tasks, and thanks to its relatively small size and low power consumption, it is also being aimed at edge computing scenarios.
Along with serving as the low-end entry point into NVIDIA’s GPU accelerator product stack, the A2 seems intended to largely replace the last remaining member of NVIDIA’s previous generation of cards, the T4. Though a somewhat higher-end card, the T4 was designed for many of the same inference workloads, and came in the same HHHL single-slot form factor. So the release of the A2 finishes the Ampere-ification of NVIDIA’s accelerator lineup, giving the company’s server customers a fresh entry-level card.
Going by NVIDIA’s official specifications, the A2 appears to be using a heavily cut-down version of their low-end GA107 GPU. With only 1280 CUDA cores (and 40 tensor cores), the A2 is only using about half of GA107’s capacity. But this is consistent with the size- and power-optimized goal of the card: the A2 draws just 60W out of the box, and can be configured to drop down even further, to 40W.

NVIDIA ML Accelerator Specification Comparison

                    A100                 A30                  A2
FP32 CUDA Cores     6912                 3584                 1280
Tensor Cores        432                  224                  40
Boost Clock         1.41GHz              1.44GHz              1.77GHz
Memory Clock        3.2Gbps HBM2e        2.4Gbps HBM2         12.5Gbps GDDR6
Memory Bus Width    5120-bit             3072-bit             128-bit
Memory Bandwidth    2.0TB/sec            933GB/sec            200GB/sec
VRAM                80GB                 24GB                 16GB
Single Precision    19.5 TFLOPS          10.3 TFLOPS          4.5 TFLOPS
Double Precision    9.7 TFLOPS           5.2 TFLOPS           0.14 TFLOPS
INT8 Tensor         624 TOPS             330 TOPS             36 TOPS
FP16 Tensor         312 TFLOPS           165 TFLOPS           18 TFLOPS
TF32 Tensor         156 TFLOPS           82 TFLOPS            9 TFLOPS
Interconnect        NVLink 3 (12 Links)  PCIe 4.0 x16 +       PCIe 4.0 x8
                                         NVLink 3 (4 Links)
GPU                 GA100                GA100                GA107
Transistor Count    54.2B                54.2B                ?
TDP                 400W                 165W                 40W-60W
Manufacturing Proc. TSMC 7N              TSMC 7N              Samsung 8nm
Form Factor         SXM4                 SXM4                 HHHL-SS PCIe
Architecture        Ampere               Ampere               Ampere
In contrast to the cut-down compute cores, NVIDIA is keeping GA107’s full memory bus for the A2 card. The 128-bit memory bus is paired with 16GB of GDDR6, which is clocked at a slightly unusual 12.5Gbps. This works out to a flat 200GB/second of memory bandwidth, so it would seem someone really wanted to have a nice, round number there.
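That flat 200GB/second figure falls straight out of the usual bandwidth formula, as a quick check shows:

```python
# The A2's memory bandwidth follows from bus width and per-pin data rate:
#   bandwidth = (bus width in bits / 8 bits-per-byte) * per-pin rate
bus_width_bits = 128      # GA107's full memory bus
data_rate_gbps = 12.5     # GDDR6 per-pin data rate on the A2

bandwidth_gb_s = bus_width_bits / 8 * data_rate_gbps
print(bandwidth_gb_s)  # 200.0 GB/s -- the nice round number
```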
Otherwise, as previously mentioned, this is a PCIe card in a half height, half-length, single-slot (HHHL-SS) form factor. And like all of NVIDIA’s server cards, A2 is passively cooled, relying on airflow from the host chassis. Speaking of the host, GA107 only offers 8 PCIe lanes, so the card gets a PCIe 4.0 x8 connection back to its host CPU.
Wrapping things up, according to NVIDIA the A2 is available immediately. NVIDIA does not provide public pricing for its server cards, but the new accelerator should be available through NVIDIA’s regular OEM partners.
More...
-
11-10-21, 07:27 AM #11347
Anandtech: Kingston XS2000 Portable SSDs Review: USB 3.2 Gen 2x2 Goes Mainstream
Flash-based portable drives have become popular fast storage options both for content creators and for consumers seeking speedy backups. The advent of high-speed interfaces such as USB 3.2 Gen 2 (10 Gbps) and USB 3.2 Gen 2x2 (20 Gbps), along with Thunderbolt 3 (up to 40 Gbps), has enabled rapid improvements in the performance of such portable SSDs over the last few years. While the higher-speed variants have traditionally been premium devices, a push towards lower-priced drives was kickstarted by the introduction of native USB flash drive (UFD) controllers. Today, we are taking a look at the performance and value proposition of the complete Kingston XS2000 portable SSD lineup based on the Silicon Motion SM2320 controller.
More...
-
11-10-21, 10:53 AM #11348
Anandtech: Cerebras Completes Series F Funding, Another $250M for $4B Valuation
Every once in a while, a startup comes along with something out of left field. In the current generation of AI hardware, Cerebras holds that title with its Wafer Scale Engine. The second generation product, built on TSMC 7nm, is a full wafer packed to the brim with cores, memory, and performance. By using patented manufacturing and packaging techniques, a Cerebras CS-2 features a single chip, bigger than your head, with 2.6 trillion transistors. The cost for a CS-2, with appropriate cooling, power, and connectivity, is ‘a few million’, we are told, and Cerebras has customers that include research, oil and gas, pharmaceuticals, and defense – all after the unique proposition that a wafer-scale AI engine provides. Today’s news is that Cerebras is still in full startup mode, finishing a Series F funding round.
More...
-
11-15-21, 09:20 AM #11349
Anandtech: Kioxia Updates M.2 2230 SSD Lineup With BG5 Series: Adding PCIe 4.0 and BiCS5
Kioxia this morning is updating their BG series of M.2 2230 SSDs for OEMs with the addition of the new BG5 family of drives. The latest in the company’s lineup of postage stamp-sized SSDs, the BG5 series sees Kioxia reworking both the NAND and the underlying controller to use newer technologies. As a result, the latest iteration of the drive is gaining overall higher performance thanks to the combination of PCIe 4.0 support as well as the switch to Kioxia’s latest BiCS5 NAND. However, in an unexpected twist, the BG series is no longer a single-chip design; instead, the NAND and controller on the BG5 are now separate packages.
Long a fixture of pre-built systems, Kioxia’s BG series of SSDs has been a favorite of OEMs for the last several years due to its small size – typically M.2 2230 or smaller – as well as its low cost. In particular, the DRAMless design of the drive keeps the overall component costs down, and it has allowed Kioxia to simply stack the NAND dies on top of the controller, giving the SSDs their small footprint. As well, the simple design and tight thermal tolerances of such a stacked design mean that power consumption has been kept quite low, too. The resulting performance of the drives is very much entry-level, and thus rarely noteworthy, but for a drive not much bigger than a postage stamp, it fills a small role well.
Coming a bit over two years since the BG4 was introduced, the headlining update to BG5 is the addition of PCIe 4.0 support. Whereas BG4 was a PCIe 3.0 x4 drive, BG5 is PCIe 4.0 x4, which at this point gives the drive more bus bandwidth than it could ever possibly hope to use. Truth be told, I was a bit surprised to see that the BG5 went PCIe 4.0 given the limited performance impact on an entry-level drive and the tight power limits, though there are some second-order benefits from PCIe 4.0. In particular, any OEM who ends up only allocating two lanes to the drive (something that happens now and then) will still get the equivalent of PCIe 3.0 x4 speeds out of the drive, which in turn is still high enough to run the drive at almost full performance. This underscores one of the big improvements offered by higher PCIe speeds: for components that don’t need more bandwidth, integrators can instead cut down on the number of lanes.
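The lane-count tradeoff described above can be made concrete with approximate per-lane throughput figures (roughly 1 GB/s per PCIe 3.0 lane and 2 GB/s per PCIe 4.0 lane; these round numbers ignore encoding and protocol overhead):

```python
# Approximate usable throughput per lane in GB/s, ignoring protocol overhead.
PER_LANE_GB_S = {"PCIe 3.0": 1.0, "PCIe 4.0": 2.0}

def link_bandwidth(gen, lanes):
    """Total approximate link bandwidth for a given generation and lane count."""
    return PER_LANE_GB_S[gen] * lanes

# A two-lane Gen4 link matches a four-lane Gen3 link, which is why an OEM
# allocating only x2 to the BG5 still gets Gen3 x4-class bandwidth.
print(link_bandwidth("PCIe 4.0", 2))  # 4.0 GB/s
print(link_bandwidth("PCIe 3.0", 4))  # 4.0 GB/s
```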
Speaking of performance, the BG5 drives are rated for higher throughput than their predecessor. Kioxia’s official press release only offers a single set of figures, so these are almost certainly for the 1TB configuration, but for that drive they are rating it at 2900MB/sec writes and 3500MB/sec reads – the latter just crossing the limits of PCIe 3.0 x4. Random writes and reads are rated at 450K IOPS and 500K IOPS respectively. As always, these figures are against writing to the drive’s SLC cache, so sustained write throughput does eventually taper off.

Kioxia BG5 SSD Specifications

Capacity          256 GB / 512 GB / 1 TB
Form Factor       M.2 2230 or M.2 2280
Interface         PCIe Gen4 x4, NVMe 1.4
NAND Flash        112L BiCS5 3D TLC
Sequential Read   3500 MB/s
Sequential Write  2900 MB/s
Random Read       500K IOPS
Random Write      450K IOPS
As this is a DRAMless drive, there is no significant on-package caching/buffer layer to talk about. Instead, like its predecessor, Kioxia is relying on Host Memory Buffer (HMB) tech to improve the performance of their drive. HMB isn’t used to cache user data, but instead is used to cache mapping information about the drive’s contents in order to speed up access. Along with the latest generation of this tech, Kioxia has also updated their controller to support NVMe 1.4.
Backing the new PCIe 4.0 controller is Kioxia’s BiCS5 generation of TLC NAND, which is a 112L design. BiCS5 has been shipping for a while now, so it’s very much a known quantity, but the time has finally come for it to trickle down into the BG series of drives. BiCS5 was a relatively modest increase in density over BiCS4, so it’s not too surprising here that Kioxia is keeping the largest BG5 configuration at 1TB, which would mean stacking 8 of the 1Tbit dies.
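The die-stacking arithmetic behind the 1TB cap is straightforward, as a quick check shows:

```python
# Capacity check for the largest BG5 configuration: stacking eight 1 Tbit
# BiCS5 dies in one package yields the 1 TB drive (8 bits per byte).
dies_per_package = 8
die_capacity_tbit = 1

capacity_tb = dies_per_package * die_capacity_tbit / 8
print(capacity_tb)  # 1.0 TB
```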
But perhaps the biggest change with the BG5 isn’t the specifications of the controller or the NAND on their own, but rather the fact that the two parts are now separate to begin with. A staple of the BG series design has been the small footprint enabled by stacking the memory and controller together into a single package. But from Kioxia’s supplied product photo, we can clearly see that the NAND and the controller are separate packages. Kioxia made no mention of this change, so we can only speculate about whether it’s for simplicity in construction (no TSVs to the controller) or perhaps the heat put off by a PCIe 4.0 controller. But one way or another, it’s a big change in how the small drive is assembled.
As a result of this change, the BGA M.2 1620 form factor – which offered the single-chip design as a solder-down package – has gone away. Instead, the smallest form factor is now the removable M.2 2230 version. The postage stamp-sized M.2 2230 form factor has long been the staple of the lineup, as it’s what we’ve seen in Microsoft’s Surface products and other thin-and-light designs over the years. Since the form factor here isn’t changing, the use of multiple packages shouldn’t alter things much for most OEMs. And for OEMs that need physically larger drives for compatibility reasons, Kioxia is also formally offering a 2280 design as well. A simple two-chip solution on such a large PCB is unremarkable, but it allows the BG5 to be easily inserted into systems that are designed to take (and typically use) 2280 drives.
As these are OEM drives, no pricing information is available. The drives are currently sampling to Kioxia’s customers, so expect to see them land in commercial products in 2022.
More...
-
11-15-21, 09:20 AM #11350
Anandtech: Intel: Sapphire Rapids With 64 GB of HBM2e, Ponte Vecchio with 408 MB L2 Cache
This week we have the annual Supercomputing event where all the major High Performance Computing players are putting their cards on the table when it comes to hardware, installations, and design wins. As part of the event Intel is having a presentation on its hardware offerings, which discloses additional details about the next generation hardware going into the Aurora Exascale supercomputer.
More...