On Thursday, OpenAI introduced a new coding model, GPT‑5.3‑Codex‑Spark, which runs on Cerebras wafer‑scale chips rather than Nvidia GPUs. The tool, a smaller, speed‑optimized variant of GPT‑5.3‑Codex focused on text‑only coding tasks, is designed to support real‑time software development thanks to its very low latency.
Codex‑Spark runs on Cerebras’ Wafer‑Scale Engine 3, a dinner‑plate‑sized processor that packs hundreds of thousands of AI‑optimized cores and tens of gigabytes of on‑chip memory onto a single silicon wafer. OpenAI and Cerebras say this hardware change lets the model generate more than 1,000 tokens per second, roughly 15 times faster than the base GPT‑5.3‑Codex.
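To put those figures in perspective, here is a back‑of‑the‑envelope sketch in Python. Only the 1,000 tokens per second and the 15x speedup come from the announcement; the base throughput and per‑token latencies below are simple arithmetic derived from them, not separately reported numbers.

```python
# Derived arithmetic from the two reported figures.
spark_tps = 1000   # tokens per second for Codex-Spark (reported)
speedup = 15       # speedup over base GPT-5.3-Codex (reported)

base_tps = spark_tps / speedup           # implied base-model throughput
spark_ms_per_token = 1000 / spark_tps    # milliseconds per generated token
base_ms_per_token = 1000 / base_tps

print(f"Implied base throughput: {base_tps:.0f} tokens/s")
print(f"Per-token latency: {spark_ms_per_token:.1f} ms vs {base_ms_per_token:.1f} ms")
```

In other words, the claim works out to the base model streaming on the order of 67 tokens per second, with Codex‑Spark spending about one millisecond per token instead of fifteen.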
According to OpenAI, early third‑party tests report significant reductions in both time‑to‑first‑token and per‑token overhead, and testers describe the interactions as feeling nearly instant for common code edits and completions.
OpenAI presents Codex‑Spark as a lighter option that works alongside the more capable Codex models. Early user reports say it delivers precise edits and quick iteration on tasks like UI tweaks and syntax fixes, while larger design or architectural changes still fare better on the bigger, slower models.
This launch also marks the first time OpenAI has put a GPT‑class model into production on non‑Nvidia silicon, using a Cerebras‑backed “latency‑first” serving path that runs alongside its existing GPU infrastructure. Last month, the company signed a multi‑year deal with Cerebras for up to 750 MW of inference capacity, and it continues to add AMD GPUs and other accelerators as it diversifies the hardware behind its AI services.