In a statement, Positron AI claims that its Atlas accelerator outperforms Nvidia's H200 during inference, consuming only 33% of the power while processing 280 tokens per second per user with Llama 3.1 8B within a 2000W power envelope.
In a significant development for the AI industry, Positron AI, a U.S.-based company founded in 2023, is making waves with its Atlas accelerator. The device, designed specifically for inference tasks, is reported to outperform Nvidia's DGX H200 system in terms of power efficiency and inference performance.
According to Positron AI, the Atlas accelerator delivers approximately 280 tokens per second per user on Llama 3.1 8B with BF16 precision while consuming around 2000W of power. By contrast, Nvidia's 8-way DGX H200 system manages roughly 180 tokens per second at 5900W. In other words, Atlas draws roughly one-third of the power while delivering about 1.5 times the token throughput — by those figures, roughly 4.5 times the performance per watt — and Positron claims a roughly threefold advantage in performance per dollar.
The H200 GPU at the heart of the DGX system is a high-end, infrastructure-grade accelerator that consumes up to 700W per GPU and ships with advanced power management features. An 8-way DGX H200 server scales power usage accordingly, to around 5900W, delivering cutting-edge performance at large scale but with a significantly higher power draw.
While Nvidia's hardware emphasizes robustness, redundancy, and general-purpose AI acceleration, Positron AI's Atlas is built from the ground up solely for inference workloads, optimizing for power and cost without carrying the overhead of training or other HPC tasks.
In summary:
| Aspect | Positron AI Atlas | Nvidia DGX H200 (8-way) |
|---|---|---|
| Power consumption | ~2000W total | ~5900W total |
| Inference performance (Llama 3.1 8B, BF16) | ~280 tokens/sec/user | ~180 tokens/sec/user |
| Performance per watt | ~3–4.5x that of DGX H200 | Baseline |
| Design focus | Specialized inference accelerator | General-purpose AI GPU accelerator |
| Power management | Lower total power, purpose-built efficiency | Advanced power management with redundancy and telemetry |
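The efficiency comparison above can be checked with quick back-of-the-envelope arithmetic on the cited figures (a rough sketch only — real-world efficiency depends on batch size, precision, and workload, and the dictionary names here are illustrative):

```python
# Performance-per-watt comparison using the throughput and power
# figures reported in this article.
systems = {
    "Positron Atlas": {"tokens_per_sec": 280, "watts": 2000},
    "Nvidia DGX H200 (8-way)": {"tokens_per_sec": 180, "watts": 5900},
}

# Tokens per second per watt for each system.
eff = {name: s["tokens_per_sec"] / s["watts"] for name, s in systems.items()}
ratio = eff["Positron Atlas"] / eff["Nvidia DGX H200 (8-way)"]

for name, e in eff.items():
    print(f"{name}: {e:.3f} tokens/sec per watt")
print(f"Atlas advantage: {ratio:.1f}x")
```

By these raw numbers the advantage works out to about 4.6x tokens per second per watt; the lower ~3x figure in the table reflects Positron's own performance-per-dollar claim rather than the pure power arithmetic.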
Cloud provider Cloudflare is among the early adopters currently testing Positron AI's Atlas for AI inference — a notable endorsement at a time when the power consumption of inference-focused AI data centers is climbing rapidly with growing AI use.
Positron AI recently raised a $51.6 million funding round led by Valor Equity Partners, Atreides Management, and DFJ Growth. The company is also developing its second-generation AI inference accelerator, Asimov, expected in 2026. Asimov will carry 2 TB of memory per ASIC without using HBM, and will offer 16 Tb/s of external network bandwidth for efficient operation in rack-scale systems.
The AI industry's power demands are raising concerns, as some massive clusters used for AI model training consume as much power as entire cities. Positron AI's Atlas, with its focus on power efficiency, could play a significant role in addressing this issue. The company manufactures its ASIC hardware at TSMC's Fab 21 in Arizona and assembles the cards in the U.S., making them an almost entirely American product.