A Spark to beat M5 Ultra and a MegaSpark to beat 2x Rubin PRO 6000!

June 11, 2026, 4:50am 1

I do a lot of model optimization work and a lot of analysis of inference economics. Nvidia’s announcement that they’ll continue the DGX Spark as a new platform release each generation was great news to me, after enjoying the Spark so much.

I feel that the single Spark is woefully underpowered though and not in the right market slot. I greatly hope that Nvidia course-corrects towards a base DGX Spark being competitive at least with the M6 Max and hopefully also the M5 Ultra that it will be competing with.

This would likely mean a Vera Rubin DGX Spark starts around $7,000. With the way local inference is going, I think that will be easy to justify though.

The last 90 days of AI research (Nemotron 3 Ultra, DiffusionGemma, DFlash, Speculative Speculative Decoding) are one result reported four ways, and a direct instruction to whoever designs desktop silicon.

I turned it into the memo NVIDIA should read before taping out the next DGX Spark. I have six predictions for early 2028:

  1. Lossless speculative decoding is an architecture decision: ×1.8 shipping today at large MoE scale, ~×6 demonstrated, drafting now off the critical path entirely
  2. The 2028 desktop flagship model shape will be: ~500B-A50B sparse, 2–4M token context
  3. The battlefield is the $5–12K desktop… and a $6,999 / 432 GB Spark takes the slot between M6 Max and M5 Ultra
  4. Lose the single-unit inference crown, lose the desk. The desk decides the defaults the datacenter inherits
  5. 100+ tok/s becomes the baseline people simply expect.. including locally, on the heaviest model the box claims to run
  6. White-collar work runs on orchestrated agent swarms… and a $25K box beats a $2K/mo API bill by month 14
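
The break-even arithmetic behind prediction 6 can be sketched as follows, reading the figures as a $25K box against a $2K/mo API bill. This is only a rough sketch: the function name and the running-cost parameter are hypothetical, and the $200/mo running cost is an illustrative assumption, not a number from the memo. Pure division lands on month 13, so a month-14 figure presumably folds in some cost not stated in this thread.

```python
import math


def break_even_month(box_price: float,
                     api_bill_per_month: float,
                     box_running_cost_per_month: float = 0.0) -> int:
    """Return the first month in which the box has paid for itself.

    box_running_cost_per_month is a hypothetical knob (power, upkeep);
    the thread does not state any such figure.
    """
    monthly_saving = api_bill_per_month - box_running_cost_per_month
    if monthly_saving <= 0:
        raise ValueError("the box never pays for itself at these rates")
    return math.ceil(box_price / monthly_saving)


# Figures from the post: $25K box, $2K/mo API bill.
print(break_even_month(25_000, 2_000))       # pure division: month 13
# Same figures with an assumed (illustrative) $200/mo running cost.
print(break_even_month(25_000, 2_000, 200))  # month 14
```

The result is sensitive to the monthly saving, so any real comparison would need measured power and maintenance costs for the box.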

These predictions point to the hardware that needs to be designed today: a MegaSpark with 288 GB of GDDR7 at ~5 TB/s over 576 GB of LPDDR6. 864 GB, one address space, one outlet, ≤25 dBA (whisper-quiet right on your desk). It replaces a dual-RTX-PRO-6000 tower at parity money… and its slow memory tier outruns an M5 Ultra.

Every projection in the doc traces to a measured anchor: what we’re all actually running on our GB10s, the emerging diffusion acceleration narrative, and one identity: tok/s = GB/s ÷ GB/step × yield.
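
As a rough illustration of that identity, here is a minimal sketch. Several things in it are my assumptions rather than the memo's: "yield" is read as tokens produced per weight-streaming step (1.0 without speculation, higher with speculative decoding), the ~500B-A50B model is assumed to stream about 25 GB per step (50B active parameters at 4 bits per weight), and KV-cache traffic and compute limits are ignored, so the results are bandwidth-bound ceilings. The function name is hypothetical.

```python
def decode_tok_per_s(bandwidth_gb_s: float,
                     gb_per_step: float,
                     yield_tokens: float = 1.0) -> float:
    """tok/s = GB/s / GB/step * yield (bandwidth-bound upper estimate)."""
    if bandwidth_gb_s <= 0 or gb_per_step <= 0 or yield_tokens <= 0:
        raise ValueError("all inputs must be positive")
    return bandwidth_gb_s / gb_per_step * yield_tokens


# Assumption: 50B active parameters at 0.5 bytes each -> 25 GB read per step.
ACTIVE_GB_PER_STEP = 50e9 * 0.5 / 1e9

# ~5 TB/s fast tier with the x1.8 speculative-decoding yield from prediction 1.
fast_tier = decode_tok_per_s(5000, ACTIVE_GB_PER_STEP, 1.8)
# A 1 TB/s machine with no speculation, for comparison.
one_tb_s = decode_tok_per_s(1000, ACTIVE_GB_PER_STEP)

print(f"fast tier: ~{fast_tier:.0f} tok/s, 1 TB/s box: ~{one_tb_s:.0f} tok/s")
```

Under those assumptions the fast tier comes out around 360 tok/s and the 1 TB/s machine around 40 tok/s, which is the kind of gap the 100+ tok/s prediction turns on.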

The CFO doesn’t need to believe in the future to sign it. Only to divide.

Full details in the article here. (share it if you dream of having one of these on your desk in 2028!)

0rand June 11, 2026, 6:18am 2

The problem with the Spark is that it is misunderstood by the buying public, and the lack of a clear message from NVIDIA does not help. Short points:

So for those who say Sparks are bad: find a better option for 8k or less.

Sparks were never meant to be used solo; otherwise a very expensive 200Gb ConnectX-7 dual NIC would never be a standard option.

entrpi June 11, 2026, 6:50am 3

Yeah, nothing in my argument is for DGX Spark to be used solo. Speculative Speculative Decoding is actually a perfect way for someone to pair a current generation GB10 (to use as the SSD box) with a Vera Rubin next-gen DGX Spark (for main model work).

The core difference between DGX Spark and RTX Spark is whether there’s the ConnectX link, which allows scaling up. There’s a whole section in there about how the size of the new Spark I’m arguing for would be perfect for scaling up at tp=2 or tp=4 to cover all model classes.

They will in the M6 Max if NVIDIA doesn’t scale appropriately for a late-2027 release. M6 Max will be a 256GB+ system with 800GB/s+ and M5 Ultra will be 512GB+ with 1TB/s+. DGX Spark 2 needs to sit between those specs or beat them, given where people’s expectations of dedicated local AI desktops are going.

mashie June 11, 2026, 7:16am 4

Considering where things are going with RAM availability, I wouldn’t bet on Apple offering anything over 128GB in 2027.

0rand June 11, 2026, 7:20am 5

There is no point comparing anything without considering costs. If cost does not matter buy 8 H200 with NVLink fabric and don’t bother :)

I see no reason for prices to go lower. MacBook M5 Max with 128GB and 4TB is close to 7K cost.

That’s clear. But my question is: why not a cheaper version meant to run solo?
Sure, 8k for 2x Spark is a good deal, and it could scale further.

But since you pay 2x 1k for the ConnectX-7 200G network cards, why not create a single Spark meant for local inference, with 2x Blackwell and 256GB RAM, without all the pricey networking, and offer it at about 6k?

Right now I’m running a Thor instead of a Spark, and at around 3k it is actually a pretty good machine for local inference. Still Blackwell, still 128GB, but no pricey networking. It lacks some features that would have made it a killer for this use from a price/output POV, but still, it doesn’t really make me feel the need to upgrade to a Spark for now.

0rand June 11, 2026, 7:31am 7

Separate model, separate production, separate support. RTX Spark will do just that.

entrpi June 11, 2026, 7:32am 8

There is. That’s what the upcoming RTX Spark lineup is.

yep, but that’ll be limited to 128GB.

TBH, I’m looking forward to RTX spark. Could be a pretty nice local development machine for lots of stuff.
I’m in robotics and the Thor is lacking RT Cores to run Isaac Sim, but the DGX Spark has them and I suppose RTX Spark will have them too. With that platform I could run literally the whole NVIDIA stack on a laptop format. Pretty tempting.

Still, the small desktop format has its perks, and a compact DGX/RTX lineup with 256GB+ of ram (maybe a faster one too) could be very interesting in general.

0rand June 11, 2026, 7:43am 10

The thing is, guys, NVIDIA is a corporate/industrial supplier. They are clearly stepping away from lower-margin consumer products: fewer new cards, lower capabilities (no NVLink since the 3090). Datacenters pay way more. The only reason they released DGX Spark and will release RTX Spark is to bootstrap and build a community of CUDA developers to support their industrial stack. If you need a fast 256GB box, they want you to buy a big server instead :)

Cost-wise 2x or 4x spark are still incredible value.

Maybe so, but local model serving is not the only business right now: AI in robotics (sorry, Physical AI) is blooming and needs low power, reliability and capability. For robotics the target deployment is local machines, not data centers.
DGX/RTX Spark lets you develop for the whole NVIDIA stack; that’s the whole point.
For many it may be a tinkering platform for server applications; for me it’d be a development platform for robotics applications that span from simulation to VLA serving simultaneously. And once done developing, I can serve on the same architecture.
Right now I’m working on a robotics application that will need Isaac Sim/Lab, Isaac ROS with foundation models, and vLLM/llama.cpp with a decently capable model, all running at the same time. I need 2/3 machines to develop on this locally, and I can offload to the cloud only for model training, as the running application needs low latency for ROS streaming.

The problem with 2/3 machines is portability and ease of development, not capability nor price.
If I had a laptop that has the capabilities of all three machines combined in the same architecture it would be amazing!

0rand June 11, 2026, 7:59am 12

I think there is a certain reason for it. Look at their industrial offering: there is no single massive-RAM server, even for datacenters. Their whole philosophy is Lego-style cluster building. NVLINK/Fabric at 1.3 Tb/s does just that. Plus this way you can sell more gear and make more money. And it’s very flexible too.

The only option is the brand new B200 SXM6, which costs more than the average house.

entrpi June 11, 2026, 8:05am 13

They’re betting on local inference growing as a market and not wanting Apple’s competing platform with a competing ML stack to encroach on their mindshare.

Yes, but these are meant to be stacked up, right? And they need to be correctly proportioned so there is no clear bottleneck on either compute or bandwidth. It makes a lot of sense in that context. These are for deployment, not for development.
Sparks are not deployment machines: I prefer to have 128/256 GB RAM with limited speed, but at a price that won’t require me to sell my house, than not having anything at all. The amount of possibilities this unlocks is huge.
And once the application is done, it scales.

Yep, as a machine for local inference for just a personal assistant it is limited, but in general the range of possibilities that the Spark and Jetson platforms open up is incredible. And, once all is said and done, it ALSO works pretty decently as a personal AI assistant right now.

This too: if another architecture gets all the attention, it also gets all the developers and its ecosystem grows exponentially.

0rand June 11, 2026, 8:08am 15

That is your opinion and there are no facts that I can see to support it. They are financing OpenAI to buy more datacenter GPUs from NVIDIA. That’s the fact. They are in datacenter business up their tits :)

PS Why I believe 256GB LPDDR RAM with 2x GPU will never happen: look at the Spark form factor. People are already complaining that Sparks overheat under heavy use. People are building cooling cages. Now imagine you stick in 2x GPU and 2x RAM. It will melt immediately. The PCI bus won’t be able to handle it, so the current arch can’t support it. Apple has a totally different bus structure. So two boxes in the current form factor are better than 1 that overheats. Or the superspark will become a rack server or a midi-tower at best.

0rand June 11, 2026, 8:09am 16

Exactly right, so 2x Sparks are your (and my) answer for it. Need more? Add two more. 512GB and 4 GPUs for 18K USD is unbeatable. Why are you mad, people? Show me a better deal that exists in reality, not in fantasy.

Oh, not mad, just daydreaming of an even better option! :D

0rand June 11, 2026, 8:13am 19

And the funniest part: the SuperSpark already exists, and in exactly the form factor I just mentioned.

The Ultimate Deskside AI Supercomputer

NVIDIA DGX Station™ is the ultimate deskside AI supercomputer for building and running AI. Powered by the NVIDIA GB300 Grace™ Blackwell Ultra Desktop Superchip, it features a massive 748 GB of coherent memory and up to 20 petaFLOPS of AI compute performance, enabling seamless development and execution of massive AI workloads, supporting models up to 1 trillion parameters. Preconfigured with an NVIDIA AI software stack, developers, researchers, and data scientists can rapidly develop, fine-tune, and inference with large AI models and long-running AI agents locally and seamlessly deploy to the data center or cloud.

entrpi June 11, 2026, 8:15am 20

DGX Station is a $100k+ deskside GB300 system (datacenter class GPU in a tower). Totally different ballpark.

0rand June 11, 2026, 8:16am 21

exactly. that’s the whole idea.