artificial intelligence - Faz Business

The only AI glossary you’ll need this year

Posted on July 4, 2026 by faz_business

Artificial intelligence is rewriting the world, and simultaneously inventing a whole new language to describe how it’s doing it. Sit in on any product meeting, pitch, or panel these days, and you’ll hear people toss around LLMs, RAG, RLHF, and a dozen other terms that can make even very smart people in the tech world feel a little insecure. This glossary is our attempt to fix that: plain-English definitions of the AI terms you’re most likely to actually run into, whether you’re building with this stuff, investing in it, or just trying to keep up by reading TechCrunch or listening to related podcasts. We update it regularly as the field evolves, so consider it a living document, much like the AI systems it describes.

Artificial general intelligence, or AGI, is a nebulous term. But it generally refers to AI that’s more capable than the average human at many, if not most, tasks. OpenAI CEO Sam Altman once described AGI as the “equivalent of a median human that you could hire as a co-worker.” Meanwhile, OpenAI’s charter defines AGI as “highly autonomous systems that outperform humans at most economically valuable work.” Google DeepMind’s understanding differs slightly from these two definitions; the lab views AGI as “AI that’s at least as capable as humans at most cognitive tasks.” Confused? Not to worry — so are experts at the forefront of AI research.

An AI agent refers to a tool that uses AI technologies to perform a series of tasks on your behalf — beyond what a more basic AI chatbot could do — such as filing expenses, booking tickets or a table at a restaurant, or even writing and maintaining code. However, as we’ve explained before, there are lots of moving pieces in this emergent space, so “AI agent” might mean different things to different people. Infrastructure is also still being built out to deliver on its envisaged capabilities. But the basic concept implies an autonomous system that may draw on multiple AI systems to carry out multistep tasks.

Think of API endpoints as “buttons” on the back of a piece of software that other programs can press to make it do things. Developers use these interfaces to build integrations — for example, allowing one application to pull data from another, or enabling an AI agent to control third-party services directly without a human manually operating each interface. Most smart home devices and connected platforms have these hidden buttons available, even if ordinary users never see or interact with them. As AI agents grow more capable, they are increasingly able to find and use these endpoints on their own, opening up powerful — and sometimes unexpected — possibilities for automation.

Given a simple question, a human brain can answer without even thinking too much about it — things like “which animal is taller, a giraffe or a cat?” But in many cases, you often need a pen and paper to come up with the right answer because there are intermediary steps. For instance, if a farmer has chickens and cows, and together they have 40 heads and 120 legs, you might need to write down a simple equation to come up with the answer (20 chickens and 20 cows).
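The pen-and-paper step can be written out explicitly. Here is a minimal sketch in Python, assuming every chicken has two legs and every cow has four; the function name is ours, chosen for illustration.

```python
def solve_heads_and_legs(heads: int, legs: int) -> tuple[int, int]:
    """Return (chickens, cows) for the farmer puzzle.

    The two equations are:
        chickens + cows         = heads
        2 * chickens + 4 * cows = legs
    """
    # Subtract twice the first equation from the second: 2 * cows = legs - 2 * heads
    cows = (legs - 2 * heads) // 2
    chickens = heads - cows
    return chickens, cows


chickens, cows = solve_heads_and_legs(heads=40, legs=120)
print(chickens, cows)  # 20 20
```

Each intermediate line is the kind of step a chain-of-thought approach would spell out rather than skip.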

In an AI context, chain-of-thought reasoning for large language models means breaking down a problem into smaller, intermediate steps to improve the quality of the end result. It usually takes longer to get an answer, but the answer is more likely to be correct, especially in a logic or coding context. Reasoning models are developed from traditional large language models and optimized for chain-of-thought thinking thanks to reinforcement learning.

(See: Large language model)

This is a more specific concept than an “AI agent,” which means a program that can take actions on its own, step by step, to complete a goal. A coding agent is a specialized version applied to software development. Rather than simply suggesting code for a human to review and paste in, a coding agent can write, test, and debug code autonomously, handling the kind of iterative, trial-and-error work that typically consumes a developer’s day. These agents can operate across entire codebases, spotting bugs, running tests, and pushing fixes with minimal human oversight. Think of it like hiring a very fast intern who never sleeps and never loses focus; as with any intern, though, a human still needs to review the work.

Although somewhat of a multivalent term, compute generally refers to the vital computational power that allows AI models to operate. This type of processing fuels the AI industry, giving it the ability to train and deploy its powerful models. The term is often a shorthand for the kinds of hardware that provide the computational power: things like GPUs, CPUs, TPUs, and other forms of infrastructure that form the bedrock of the modern AI industry.

A subset of self-improving machine learning in which AI algorithms are designed with a multi-layered, artificial neural network (ANN) structure. This allows them to make more complex correlations compared to simpler machine learning-based systems, such as linear models or decision trees. The structure of deep learning algorithms draws inspiration from the interconnected pathways of neurons in the human brain.

Deep learning AI models are able to identify important characteristics in data themselves, rather than requiring human engineers to define these features. The structure also supports algorithms that can learn from errors and, through a process of repetition and adjustment, improve their own outputs. However, deep learning systems require a lot of data points to yield good results (millions or more). They also typically take longer to train compared to simpler machine learning algorithms — so development costs tend to be higher.

(See: Neural network)

Diffusion is the tech at the heart of many art-, music-, and text-generating AI models. Inspired by physics, diffusion systems slowly “destroy” the structure of data — for example, photos, songs, and so on — by adding noise until there’s nothing left. In physics, diffusion is spontaneous and irreversible — sugar diffused in coffee can’t be restored to cube form. But diffusion systems in AI aim to learn a sort of “reverse diffusion” process to restore the destroyed data, gaining the ability to recover the data from noise.
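To make the “destroy” half concrete, here is a rough sketch of the noising process on a tiny list of numbers. It assumes a fixed noise level per step (`beta`), which is our simplification; real systems typically vary it, and the learned reverse process that recovers the data is not shown.

```python
import math
import random


def forward_diffusion(signal, num_steps=200, beta=0.05, seed=0):
    """Repeatedly blend a signal with Gaussian noise.

    Each step keeps sqrt(1 - beta) of the current value and adds
    sqrt(beta) of fresh noise, so the original structure fades out.
    Returns the noised signal and the fraction of the original still present.
    """
    rng = random.Random(seed)
    x = list(signal)
    remaining = 1.0  # scale factor applied to the original signal so far
    for _ in range(num_steps):
        x = [math.sqrt(1 - beta) * v + math.sqrt(beta) * rng.gauss(0, 1) for v in x]
        remaining *= math.sqrt(1 - beta)
    return x, remaining


clean = [1.0, -1.0, 1.0, -1.0]
noisy, remaining = forward_diffusion(clean)
print(f"fraction of original signal left: {remaining:.4f}")
```

After enough steps the output is almost pure noise, which is the starting point a trained model would work backward from.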

Distillation is a technique used to extract knowledge from a large AI model with a ‘teacher-student’ model. Developers send requests to a teacher model and record the outputs. Answers are sometimes compared with a dataset to see how accurate they are. These outputs are then used to train the student model, which is trained to approximate the teacher’s behavior.
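The record-then-train loop described above can be sketched with a toy. Everything here is a stand-in: the “teacher” is a fixed function rather than a large model, and the “student” is a straight line fitted to the teacher’s recorded answers.

```python
def teacher(x: float) -> float:
    """Stand-in for a large model: a fixed function we can query."""
    return 3.0 * x + 2.0


def distill(teacher_fn, prompts):
    """Fit a tiny linear student y = a * x + b to the teacher's outputs."""
    # Step 1: send requests to the teacher and record what it returns.
    outputs = [teacher_fn(x) for x in prompts]
    # Step 2: train the student on (prompt, teacher output) pairs.
    # With one input feature, least squares has a closed-form solution.
    n = len(prompts)
    mean_x = sum(prompts) / n
    mean_y = sum(outputs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(prompts, outputs))
    variance = sum((x - mean_x) ** 2 for x in prompts)
    a = covariance / variance
    b = mean_y - a * mean_x
    return lambda x: a * x + b


student = distill(teacher, prompts=[0.0, 1.0, 2.0, 3.0, 4.0])
print(student(10.0), teacher(10.0))
```

Because this toy teacher happens to be linear, the student matches it exactly. With a real model the student only approximates the teacher, and the gap between them is what a distillation loss measures.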

Distillation can be used to create a smaller, more efficient model based on a larger model with a minimal distillation loss. This is likely how OpenAI developed GPT-4 Turbo, a faster version of GPT-4.

While all AI companies use distillation internally, it may have also been used by some AI companies to catch up with frontier models. Distillation from a competitor usually violates the terms of service of AI APIs and chat assistants.

This refers to the further training of an AI model to optimize performance for a more specific task or area than was previously a focal point of its training — typically by feeding in new, specialized (i.e., task-oriented) data.

Many AI startups are taking large language models as a starting point to build a commercial product but are vying to amp up utility for a target sector or task by supplementing earlier training cycles with fine-tuning based on their own domain-specific knowledge and expertise.

(See: Large language model [LLM])

A GAN, or Generative Adversarial Network, is a type of machine learning framework that underpins some important developments in generative AI when it comes to producing realistic data — including (but not only) deepfake tools. GANs involve the use of a pair of neural networks, one of which draws on its training data to generate an output that is passed to the other model to evaluate.

The two models are essentially programmed to try to outdo each other. The generator is trying to get its output past the discriminator, while the discriminator is working to spot artificially generated data. This structured contest can optimize AI outputs to be more realistic without the need for additional human intervention. That said, GANs work best for narrower applications (such as producing realistic photos or videos) rather than general-purpose AI.

Hallucination is the AI industry’s preferred term for AI models making stuff up — literally generating information that is incorrect. Obviously, it’s a huge problem for AI quality.

Hallucinations produce GenAI outputs that can be misleading and could even lead to real-life risks — with potentially dangerous consequences (think of a health query that returns harmful medical advice).

The problem of AIs fabricating information is thought to arise as a consequence of gaps in training data. Hallucinations are contributing to a push toward increasingly specialized and/or vertical AI models — i.e. domain-specific AIs that require narrower expertise — as a way to reduce the likelihood of knowledge gaps and shrink disinformation risks.

Inference is the process of running an AI model. It’s setting a model loose to make predictions or draw conclusions from previously seen data. To be clear, inference can’t happen without training; a model must learn patterns in a set of data before it can effectively extrapolate from this training data.

Many types of hardware can perform inference, ranging from smartphone processors to beefy GPUs to custom-designed AI accelerators. But not all of them can run models equally well. Very large models would take ages to make predictions on, say, a laptop versus a cloud server with high-end AI chips.

(See: Training)

Large language models, or LLMs, are the AI models used by popular AI assistants, such as ChatGPT, Claude, Google’s Gemini, Meta’s AI Llama, Microsoft Copilot, or Mistral’s Le Chat. When you chat with an AI assistant, you interact with a large language model that processes your request directly or with the help of different available tools, such as web browsing or code interpreters.

LLMs are deep neural networks made of billions of numerical parameters (or weights, see below) that learn the relationships between words and phrases and create a representation of language, a sort of multidimensional map of words.

These models are created from encoding the patterns they find in billions of books, articles, and transcripts. When you prompt an LLM, the model generates the most likely pattern that fits the prompt.
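A drastically simplified illustration of “generating the most likely pattern” is a model that only counts which word tends to follow which. This is a sketch of the idea, not how an LLM is built: real models use neural networks over tokens rather than word counts, and the tiny corpus below is made up.

```python
from collections import Counter, defaultdict


def train_bigram_model(text: str):
    """Count which word follows which in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def most_likely_next(model, word: str):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = "the cat sat on the mat . the cat ate the fish . the cat slept"
model = train_bigram_model(corpus)
print(most_likely_next(model, "the"))  # cat
```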

(See: Neural network)

Memory cache refers to an important process that boosts inference (which is the process by which AI works to generate a response to a user’s query). In essence, caching is an optimization technique, designed to make inference more efficient. AI is obviously driven by high-octane mathematical calculations and every time those calculations are made, they use up more power. Caching is designed to cut down on the number of calculations a model might have to run by saving particular calculations for future user queries and operations. There are different kinds of memory caching, although one of the more well-known is KV (or key value) caching. KV caching works in transformer-based models, and increases efficiency, driving faster results by reducing the amount of time (and algorithmic labor) it takes to generate answers to user questions.
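The saving from KV caching can be shown with a toy attention step. This is a minimal sketch under loud assumptions: the 2x2 projection matrices are invented, each token doubles as its own query, and there is a single attention head. It compares recomputing every key and value at each step against computing each one once and keeping it.

```python
import math

# Hypothetical 2x2 projection matrices for keys and values.
W_K = [[1.0, 0.0], [0.5, 1.0]]
W_V = [[0.0, 1.0], [1.0, 0.5]]


def project(matrix, vector):
    """Multiply a small matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def attend(query, keys, values):
    """Softmax-weighted average of values, scored by query-key dot products."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    width = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(width)]


def run_without_cache(tokens):
    """Recompute every key and value from scratch at each step."""
    outputs, projections = [], 0
    for step in range(1, len(tokens) + 1):
        seen = tokens[:step]
        keys = [project(W_K, t) for t in seen]
        values = [project(W_V, t) for t in seen]
        projections += 2 * len(seen)
        outputs.append(attend(seen[-1], keys, values))
    return outputs, projections


def run_with_cache(tokens):
    """Project each token once and keep its key and value for later steps."""
    keys, values, outputs, projections = [], [], [], 0
    for token in tokens:
        keys.append(project(W_K, token))
        values.append(project(W_V, token))
        projections += 2
        outputs.append(attend(token, keys, values))
    return outputs, projections


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
slow_outputs, slow_count = run_without_cache(tokens)
fast_outputs, fast_count = run_with_cache(tokens)
print(slow_count, fast_count)  # 20 8
```

Both versions produce the same outputs; the cached one just does fewer projections, and the gap widens as the sequence gets longer.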

(See: Inference)

Model Context Protocol, or MCP, is an open standard that lets AI models connect to outside tools and data — your files, databases, or apps like Slack and Google Drive — without a developer building a custom connector for every single pairing. Think of it as a USB-C port for AI. Anthropic introduced MCP in 2024 and later handed it over to the Linux Foundation, and it’s since been adopted by OpenAI, Google, and Microsoft, making it one of the fastest-spreading standards in recent AI history.
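Under the hood, MCP messages are JSON-RPC 2.0. As a rough sketch of what a tool request looks like on the wire, the snippet below builds one by hand. The tool name `search_files` and its arguments are hypothetical, and the exact fields should be checked against the MCP specification rather than taken from this example.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# "search_files" is a made-up tool used only for illustration.
request = build_tool_call(1, "search_files", {"query": "Q3 budget"})
print(request)
```

In practice an MCP client library would construct and send these messages for you; the point is only that every tool, whatever it does, is reached through the same message shape.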

Mixture of Experts is a model architecture that splits a neural network into many smaller specialized sub-networks, or “experts,” and only activates a handful of them for any given task. Rather than routing every request through the entire model — like calling in your whole office for every question — an MoE model has a built-in “router” that picks just the right specialists for the job. This makes it possible to build enormous models that stay relatively fast and cheap to run, since only a fraction of the network is doing work at any one time. Mistral AI’s Mixtral model is a well-known example; OpenAI’s newer GPT models are also widely believed to use some version of this approach, though the company has never officially confirmed it.
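The router-plus-experts idea fits in a few lines. This is a toy sketch: the four “experts” are simple functions and the router’s scoring parameters are invented, whereas in a real MoE model both are learned neural network layers.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Four toy "experts": each is just a simple function of the input.
EXPERTS = [
    lambda x: x + 1.0,
    lambda x: x * 2.0,
    lambda x: x ** 2,
    lambda x: -x,
]

# Hypothetical router parameters: score_i = slope_i * x + offset_i
ROUTER = [(1.0, 0.0), (0.5, 1.0), (-1.0, 0.0), (-0.5, 0.5)]


def moe_forward(x: float, top_k: int = 2):
    """Route x to the top_k highest-scoring experts and mix their outputs."""
    scores = [slope * x + offset for slope, offset in ROUTER]
    ranked = sorted(range(len(EXPERTS)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]
    gates = softmax([scores[i] for i in chosen])
    # Only the chosen experts are evaluated; the rest stay idle.
    output = sum(gate * EXPERTS[i](x) for gate, i in zip(gates, chosen))
    return output, chosen


output, chosen = moe_forward(3.0)
print(chosen)  # [0, 1]
```

Different inputs wake up different experts: `moe_forward(-2.0)` selects experts 2 and 3 instead, while the other two do no work at all.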

(See: Neural network, Deep learning)

A neural network refers to the multi-layered algorithmic structure that underpins deep learning — and, more broadly, the whole boom in generative AI tools following the emergence of large language models.

Although the idea of taking inspiration from the densely interconnected pathways of the human brain as a design structure for data processing algorithms dates all the way back to the 1940s, it was the much more recent rise of graphical processing hardware (GPUs) — via the video game industry — that really unlocked the power of this theory. These chips proved well suited to training algorithms with many more layers than was possible in earlier epochs — enabling neural network-based AI systems to achieve far better performance across many domains, including voice recognition, autonomous navigation, and drug discovery.

(See: Large language model [LLM])

Open source refers to software — or, increasingly, AI models — where the underlying code is made publicly available for anyone to use, inspect, or modify. In the AI world, Meta’s Llama family of models is a prominent example; Linux is the famous historical parallel in operating systems. Open source approaches allow researchers, developers, and companies around the world to build on top of one another’s work, accelerating progress and enabling independent safety audits that closed systems cannot easily provide. Closed source means the code is private — you can use the product but not see how it works, as is the case with OpenAI’s GPT models — a distinction that has become one of the defining debates in the AI industry.

Parallelization means doing many things at the same time instead of one after another — like having 10 employees working on different parts of a project at the same time instead of one employee doing everything sequentially. In AI, parallelization is fundamental to both training and inference: modern GPUs are specifically designed to perform thousands of calculations in parallel, which is a big reason why they became the hardware backbone of the industry. As AI systems grow more complex and models grow larger, the ability to parallelize work across many chips and many machines has become one of the most important factors in determining how quickly and cost-effectively models can be built and deployed. Research into better parallelization strategies is now a field of study in its own right.
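The “10 employees” picture maps directly onto a worker pool. Here is a small sketch using Python’s standard library; the 0.05-second sleep is a made-up stand-in for real work, and threads are used only because they are easy to demonstrate. GPUs parallelize arithmetic in hardware, which this does not attempt to model.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(n: int) -> int:
    """Pretend to do some work that takes a while, then return n squared."""
    time.sleep(0.05)
    return n * n


def run_sequential(items):
    """One worker handles every item, one after another."""
    return [slow_square(n) for n in items]


def run_parallel(items, workers=10):
    """Up to `workers` items are in progress at the same time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(slow_square, items))


items = list(range(10))

start = time.perf_counter()
sequential_results = run_sequential(items)
sequential_seconds = time.perf_counter() - start

start = time.perf_counter()
parallel_results = run_parallel(items)
parallel_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
```

Both runs return the same answers; the parallel one simply finishes sooner because the waiting overlaps.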

RAMageddon is the fun new term for a not-so-fun trend that is sweeping the tech industry: an ever-increasing shortage of random access memory, or RAM chips, which power pretty much all the tech products we use in our daily lives. As the AI industry has blossomed, the biggest tech companies and AI labs — all vying to have the most powerful and efficient AI — are buying so much RAM to power their data centers that there’s not much left for the rest of us. And that supply bottleneck means that what’s left is getting more and more expensive.

That includes industries like gaming (where major companies have had to raise prices on consoles because it’s harder to find memory chips for their devices), consumer electronics (where memory shortage could cause the biggest dip in smartphone shipments in more than a decade), and general enterprise computing (because those companies can’t get enough RAM for their own data centers). The surge in prices is only expected to stop after the dreaded shortage ends but, unfortunately, there’s not really much of a sign that’s going to happen anytime soon.

Like AGI, recursive self-improvement is a threshold for how smart AI can get, and how little it may rely on humans. In the RSI scenario, AI models start improving themselves without human intervention, leading to a huge acceleration in capabilities and autonomy. In some tellings, this would be a cataclysmic moment akin to the singularity, a moment when AI models become immune to outside intervention. But RSI also describes a basic capability (can an AI model design its own successor?), which makes it much easier for engineers to try to build it. A number of recent AI startups have set out to build recursively self-improving models, but most of them dismiss the apocalyptic implications, presenting RSI as simply the next frontier for research.

Reinforcement learning is a way of training AI where a system learns by trying things and receiving rewards for correct answers — like training your beloved pet with treats, except the “pet” in this scenario is a neural network and the “treat” is a mathematical signal indicating success. Unlike supervised learning, where a model is trained on a fixed dataset of labeled examples, reinforcement learning lets a model explore its environment, take actions, and continuously update its behavior based on the feedback it receives. This approach has proven especially powerful for training AI to play games, control robots, and, more recently, sharpen the reasoning ability of large language models. Techniques like reinforcement learning from human feedback, or RLHF, are now central to how leading AI labs fine-tune their models to be more helpful, accurate, and safe.
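The try-things-and-collect-treats loop can be sketched with one of the simplest reinforcement learning setups, a multi-armed bandit. The payout probabilities and the epsilon-greedy strategy below are our choices for illustration; this is far simpler than how labs train language models.

```python
import random


def train_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays off most often by trial and error.

    reward_probs[i] is the (hidden) chance that action i earns a reward.
    The learner never sees these numbers, only the rewards it collects.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)  # learned value of each action
    pulls = [0] * len(reward_probs)        # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(estimates)), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        pulls[action] += 1
        # Nudge the running average toward the reward just received.
        estimates[action] += (reward - estimates[action]) / pulls[action]
    return estimates, pulls


estimates, pulls = train_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, [round(e, 2) for e in estimates])
```

With enough steps the learner should settle on the last action, the one with the highest payout rate, without ever being told which action is correct.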

When it comes to human-machine communication, there are some obvious challenges: people communicate using human language, while AI programs execute tasks through complex algorithmic processes informed by data. Tokens bridge that gap: they are the basic building blocks of human-AI communication, representing discrete segments of data that have been processed or produced by an LLM. They are created through a process called tokenization, which breaks down raw text into bite-sized units a language model can digest, similar to how a compiler translates human-readable source code into binary code a computer can understand. In enterprise settings, tokens also determine cost: most AI companies charge for LLM usage on a per-token basis, meaning the more a business uses, the more it pays.
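Here is a toy tokenizer that shows text being split into sub-word pieces and then priced. It is only a sketch: the vocabulary and the per-token price are invented, and production tokenizers learn their vocabularies from data rather than using a hand-written set.

```python
def tokenize(text: str, vocabulary: set[str]) -> list[str]:
    """Greedy longest-match tokenizer over a fixed vocabulary."""
    tokens = []
    longest = max(len(piece) for piece in vocabulary)
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocabulary:
                tokens.append(piece)
                i += size
                break
        else:
            # Unknown character: keep it as its own token so nothing is lost.
            tokens.append(text[i])
            i += 1
    return tokens


VOCABULARY = {"token", "iz", "ation", " ", "un", "believ", "able", "is"}
PRICE_PER_TOKEN = 0.00001  # hypothetical price in dollars

text = "tokenization is unbelievable"
tokens = tokenize(text, VOCABULARY)
print(tokens)
# ['token', 'iz', 'ation', ' ', 'is', ' ', 'un', 'believ', 'able']
print(f"{len(tokens)} tokens, estimated cost ${len(tokens) * PRICE_PER_TOKEN:.5f}")
```

Note that three words became nine tokens, which is why usage bills are counted in tokens rather than words.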

So again, tokens are the small chunks of text — often parts of words rather than whole ones — that AI language models break language into before processing it; they are roughly analogous to “words” for the purposes of understanding AI workloads. Throughput refers to how much can be processed in a given period of time, so token throughput is essentially a measure of how much AI work a system can handle at once. High token throughput is a key goal for AI infrastructure teams, since it determines how many users a model can serve simultaneously and how quickly each of them receives a response. AI researcher Andrej Karpathy has described feeling anxious when his AI subscriptions sit idle — echoing the feeling he had as a grad student when expensive computer hardware wasn’t being fully utilized — a sentiment that captures why maximizing token throughput has become something of an obsession in the field.

Developing machine learning AIs involves a process known as training. In simple terms, this refers to feeding data into a model so that it can learn from patterns and generate useful outputs. Essentially, it’s the process of the system responding to characteristics in the data that enables it to adapt outputs toward a sought-for goal, whether that’s identifying images of cats or producing a haiku on demand.

Training can be expensive because it requires lots of inputs, and the volumes required have been trending upwards — which is why hybrid approaches, such as fine-tuning a rules-based AI with targeted data, can help manage costs without starting entirely from scratch.

(See: Inference)

A technique where a previously trained AI model is used as the starting point for developing a new model for a different but typically related task — allowing knowledge gained in previous training cycles to be reapplied.

Transfer learning can drive efficiency savings by shortcutting model development. It can also be useful when data for the task that the model is being developed for is somewhat limited. But it’s important to note that the approach has limitations. Models that rely on transfer learning to gain generalized capabilities will likely require training on additional data in order to perform well in their domain of focus.

(See: Fine tuning)

Validation loss is a number that tells you how well an AI model is learning during training — and lower is better. Researchers track it closely as a kind of real-time report card, using it to decide when to stop training, when to adjust hyperparameters, or whether to investigate a potential problem. One of the key concerns it helps flag is overfitting, a condition in which a model memorizes its training data rather than truly learning patterns it can generalize to new situations. Think of it as the difference between a student who genuinely understands the material and one who simply memorized last year’s exam — validation loss helps reveal which one your model is becoming.
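One common use of that report card is early stopping: quit training once validation loss stops improving. Below is a minimal sketch, assuming a made-up list of per-epoch losses and a “patience” of two epochs. Both are illustrative choices, not a standard.

```python
def best_stopping_epoch(validation_losses, patience=2):
    """Return (epoch, loss) for the best validation loss seen before the
    loss failed to improve for `patience` epochs in a row. Epochs are 0-based."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # the model has likely started to overfit
    return best_epoch, best_loss


# Hypothetical run: validation loss falls, then creeps back up even though
# training loss (not shown) would keep falling. That is the overfitting signal.
validation_losses = [0.95, 0.70, 0.55, 0.50, 0.53, 0.58, 0.66]
print(best_stopping_epoch(validation_losses))  # (3, 0.5)
```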

Weights are core to AI training, as they determine how much importance (or weight) is given to different features (or input variables) in the data used for training the system — thereby shaping the AI model’s output.

Put another way, weights are numerical parameters that define what’s most salient in a dataset for the given training task. They achieve their function by applying multiplication to inputs. Model training typically begins with weights that are randomly assigned, but as the process unfolds, the weights adjust as the model seeks to arrive at an output that more closely matches the target.

For example, an AI model for predicting housing prices that’s trained on historical real estate data for a target location could include weights for features such as the number of bedrooms and bathrooms, whether a property is detached or semi-detached, whether it has parking, a garage, and so on.

Ultimately, the weights the model attaches to each of these inputs reflect how much they influence the value of a property, based on the given dataset.
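The housing example can be turned into a tiny training run. This sketch uses invented listings with just two features, and prices generated from known weights so we can check what the model learns. It starts from random weights and adjusts them with plain gradient descent, one of several ways weights can be trained.

```python
import random

# Hypothetical listings: (bedrooms, has_parking) -> price in units of $100,000.
# Prices were generated as 0.5 * bedrooms + 0.3 * has_parking.
LISTINGS = [
    ((1, 0), 0.5),
    ((2, 1), 1.3),
    ((3, 0), 1.5),
    ((4, 1), 2.3),
    ((2, 0), 1.0),
    ((3, 1), 1.8),
]


def predict(weights, features):
    """Multiply each input by its weight and add the results up."""
    return sum(w * x for w, x in zip(weights, features))


def train(listings, learning_rate=0.05, epochs=2000, seed=0):
    """Start from random weights and repeatedly nudge them to shrink the error."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in range(2)]
    n = len(listings)
    for _ in range(epochs):
        gradient = [0.0, 0.0]
        for features, price in listings:
            error = predict(weights, features) - price
            for i, x in enumerate(features):
                gradient[i] += 2 * error * x / n  # slope of mean squared error
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights


weights = train(LISTINGS)
print([round(w, 3) for w in weights])  # [0.5, 0.3]
```

The learned weights recover the values used to generate the prices: each extra bedroom adds about $50,000 and parking adds about $30,000 in this made-up dataset.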

This article is updated regularly with new information.


Google DeepMind Unionization Talks Are Off to a Rocky Start

Posted on July 3, 2026 by faz_business

Negotiations between Google DeepMind and its London-based employees over the possibility of unionization stumbled this week, after initial talks left union representatives feeling they had wasted their time, WIRED has learned.

In May, DeepMind employees asked Google to recognize the Communication Workers Union and Unite the Union as joint representatives. The company later denied that request, but agreed to participate in negotiations arbitrated by a third-party body.

An initial meeting on Wednesday was attended by union officers, DeepMind employees involved in the unionization push, the third-party arbitrator, and DeepMind HR representatives. Those advocating for unionization were left frustrated by the absence of DeepMind leadership figures.

“Recognition talks not being attended by senior management at the opening stage is a leading indicator that a company isn’t engaging in good faith. It’s just a time-wasting exercise,” claims John Chadfield, a CWU officer, who attended the meeting. “Negotiations have stalled at an early stage.”

DeepMind denies that negotiations have stalled. “The first step in the process is to define who the unions want to represent and the parties agreed on next steps to do this,” says Al Verney, a Google DeepMind spokesperson. “The appropriate representatives attended this initial meeting.”

During the meeting, a DeepMind employee read out a prepared letter, reviewed by WIRED, on behalf of colleagues who support unionization. “Instead of having meaningful dialogue with its employees about our concerns, Google DeepMind workers have been treated as a problem handed off to HR,” the letter states. The employee reading the statement was interrupted on two occasions by DeepMind HR representatives, according to multiple sources with knowledge of the meeting.

The letter goes on to allege that Google has attempted to quash open dialogue between DeepMind employees and crack down on dissent, by shutting down or reconfiguring internal chat venues, and preventing staff from responding to company-wide communications about the unionization bid. Employees that sought to dance around restrictions were “reprimanded” by HR, the letter alleges.

“The intention was to intimidate,” claims a DeepMind employee involved in drafting the letter, who asked to remain anonymous because they are not authorized to speak to the media. “These are well-established union-busting techniques.”

“We’ll continue to engage constructively in the…process and have open dialogue with employees,” says Verney. “For topics outside of this, we continue to offer employees a variety of other channels and opportunities to discuss their views.”

The push to unionize at DeepMind began in February 2025, when Google’s parent company Alphabet removed a pledge not to use AI for purposes like weapons development and surveillance from its ethics guidelines, WIRED previously reported.

“Those principles were a big part of why I joined DeepMind,” says a second DeepMind employee, who asked to remain anonymous for the same reason. “We basically just got rid of them all.”

What Happens If AI Causes 25% Unemployment? Anthropic Has a Concept of a Plan

Posted on June 11, 2026 by faz_business

It seems to be one of the most pressing questions in the world of AI these days. If artificial intelligence tools cause massive disruptions in the economy and unemployment soars, what should AI companies and the government do about it?

Anthropic released a new economic policy framework on Wednesday that aims to tackle these questions, and the company has pledged $350 million to help work through solutions. But it remains to be seen how the federal government under President Donald Trump will respond.

“We are not seeking job displacement. We are working to prevent or minimize it,” Anthropic explained in releasing the new paper. “Some amount of displacement, though we cannot say how much, may be an intrinsic consequence of the technology, and our responsibility is to prepare for it and respond to it.”

The company has three different proposals, one for a world with 5% unemployment, one with 10% unemployment, and one with so-called “unprecedented unemployment.” The current unemployment rate is 4.3%. The last time unemployment rose above 10% was in 2009, and before that in 1983. And the highest unemployment rate of the 20th century was during the Great Depression, when the unemployment rate hit 25% in 1933.

If unemployment only rises to 5%, Anthropic proposes the expansion of “new capital accounts seeded at birth,” and allowing young adults to benefit from them as well.

“Currently, these accounts can hold only index funds—not a stake in AI companies,” the company continued. “We also propose policies like workforce training grants, occupational licensing reform, and wage insurance, that make it easier for workers to find new roles and enter new industries.” Anthropic also proposes creating incentives for companies that retain and redeploy workers under the 5% plan.

The company explains in its paper that it’s unclear whether job disruptions will be a “temporary shock” or an “enduring restructuring, in which the demand for human labor is significantly and persistently lower.” But either way, Anthropic says something must be done.

“In the 10% scenario, our priority is expanded unemployment insurance, which we propose supplementing with sector-specific transition support and basic-needs relief,” Anthropic explains. “If AI does become a general substitute for human labor, policymakers will also need to consider the pace of its rollout, including by incentivizing firms to manage displacement gradually.”

Under the most dire “unprecedented unemployment” situation, which presumably means higher than 25%, Anthropic believes there will be a need for “income replacement,” as they call it, “for a large share of the workforce.”

“We’ll need new sources of tax revenue, and new ways of sharing this broadly, which might include basic income, sovereign wealth models, and equity-sharing mechanisms,” the company explains. “This scenario is novel economic territory, so we’re less certain about the right answers here.”

Anthropic claims in its paper that it’s not ready to advocate for specific policies in the worst-case scenario, but it says it’s investing in researching different mechanisms, like:

“Potential revenue sources could include increasing the capital gains tax, broad-based consumption taxes, sector-specific levies on AI use (measured by tokens, compute, or revenue), and scalable “digital dividends” funded by taxes on the digital sector.”
“Potential redistribution mechanisms could include universal basic income, AI sovereign wealth funds funded by investment stakes in AI-driven productivity, equity-sharing mechanisms giving workers partial ownership in AI enterprises, and dramatically expanded pre-distributive capital accounts building on existing models.”

Anthropic explains that the framework is U.S.-focused because they’re an American company, but that the principles are global.

“We hope to think through these questions with governments around the world, and to see them on the agenda at the G7 and the upcoming AI Summit in Geneva,” the company said.

From Donald Trump to Bernie Sanders, every elected politician seems concerned with how AI will impact the job landscape. But even the AI companies can’t give you a concrete idea of how many jobs will ultimately be lost. Anthropic admits as much.

You may be asking yourself, as we did, how much Claude may have played a role in coming up with these ideas. We reached out to Anthropic but haven’t heard back. Gizmodo will update this article if we learn the answer. It would be appropriate, if a bit odd, to discover that AI is coming up with the “answers” on how to deal with large-scale unemployment caused by AI.

It’s also something Sam Altman envisioned years ago when he was asked how his company would make money. As he said in 2019: “We’ve made a soft promise to investors that, ‘Once we build a generally intelligent system, that basically we will ask it to figure out a way to make an investment return for you.’”

How the UK Is Turning Sovereign AI Ambition Into Action With NVIDIA Technologies

Posted on June 9, 2026 by faz_business

A year ago at London Tech Week, NVIDIA founder and CEO Jensen Huang and U.K. Prime Minister Keir Starmer made a declaration: the U.K. would be an AI maker, not an AI taker.

At this year’s event, NVIDIA and its partners are showcasing how that commitment is producing real momentum across the nation’s infrastructure, startups and enterprises.

U.K. technology leaders are innovating across healthcare and life sciences, coding, agentic AI, inference and more — all running on sovereign AI deployments.

AI Minister Kanishka Narayan said: “A year ago, we said the UK would be an AI maker, not an AI taker. Today we’re delivering on that — with sovereign compute powering British startups to push the boundaries of what AI can do, from drug discovery to healthcare to robotics. This is what it looks like when a country backs its own talent with the infrastructure to match.

“NVIDIA’s decision to invest billions here is a reflection of the strength of what’s being built in Britain. We are determined to make sure the next generation of AI breakthroughs happens in this country, and we have everything we need to make it happen.”

Commitment to Compute

Over the past year, the number of AI cloud providers planning to deploy AI infrastructure on U.K. soil has doubled.

Nebius has announced plans to expand its customer base and cloud capabilities with three new deployments of advanced NVIDIA AI infrastructure, as the NVIDIA AI Cloud ecosystem partner continues to build out its commercial and AI R&D hub in London. Combined, the deployments are expected to reach 65 megawatts when fully ramped up in 2027.

CoreWeave is building in the U.K. Government’s AI Growth Zones, and seven more NVIDIA AI Cloud ecosystem partners have plans in the pipeline. BT and Nscale announced plans to build sovereign AI data centers across three existing BT sites in the U.K., combining NVIDIA AI infrastructure, Nscale’s full stack and BT’s trusted nationwide connectivity backbone.

From Fund to Frontier

Central to that sovereign compute story is Isambard-AI — the U.K.’s most powerful computer. Built on 5,400 NVIDIA GH200 Grace Hopper Superchips and running entirely on zero-carbon electricity, it’s the engine behind some of the U.K.’s most ambitious AI research.

The U.K. government’s Sovereign AI Fund is putting that capability to work by backing homegrown companies and providing the domestic infrastructure needed to scale their ambitions.

Among its first recipients is Ineffable Intelligence, which recently announced a collaboration with NVIDIA to build the future of reinforcement learning infrastructure.

Other recipients include four U.K.-based NVIDIA Inception startups, each pushing the AI frontier using Isambard-AI. These startups are:

Cosine Builds Sovereign Coding Platform

Cosine is building an end-to-end sovereign AI coding platform for highly regulated industries such as financial services, critical infrastructure and national security. Using Isambard, Cosine is training a new, large-parameter, mixture-of-experts, multimodal agentic LLM for natively handling data types beyond text and image.

“Access to Isambard enables the project, full stop,” said Alistair Pullen, cofounder and CEO of Cosine. “We already have the people who know how to do this. We have the data. We have the infrastructure and the training. The thing we’ve never had is this level of compute.”

Cursive Trains Self-Improving AI Systems

Cursive is building self-improving AI systems that learn continuously from real-world data, enabling them to operate autonomously over long periods of time. This is unlocked through new memory-augmented architectures with dramatically larger context windows, currently in development using the Sovereign AI Fund resources. In addition, the team recently adopted the NVIDIA Megatron-LM framework for distributed training at scale.

“The Sovereign AI Fund is more than just processing power — it’s a statement about investing in AI in the U.K.,” said Talfan Evans, cofounder and CEO of Cursive. “Sovereignty is actually now a buying criterion — and it’s a challenge to tap into the resources we uniquely have as U.K. and European companies.”

Doubleword Optimizes Inference to Deliver Abundant Intelligence Tokens

Doubleword, the U.K.’s first dedicated inference lab, optimizes every layer of the AI stack to maximize what it calls “IQ per dollar.” The company deploys open models including NVIDIA Nemotron 3 Super 120B and builds on the NVIDIA Dynamo inference framework.

On Isambard, Doubleword’s early results achieved 70x faster model cold starts — aka model loading times — and 4x lossless KV cache compression, critical advancements for long-running agentic workloads. The result: inference at 90-95% lower costs than other leading inference providers.
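For readers who think in multipliers rather than percentages, the cost claim above converts as shown in this short sketch. It is only arithmetic on the quoted 90-95% range. The function name is ours, and no actual provider pricing is assumed.

```python
def cost_multiplier(percent_lower: int) -> float:
    """Convert 'N% lower cost' into 'how many times cheaper'."""
    if not 0 <= percent_lower < 100:
        raise ValueError("percent_lower must be in [0, 100)")
    # Paying (100 - N)% of the original price means 100 / (100 - N) times cheaper.
    return 100 / (100 - percent_lower)


for pct in (90, 95):
    print(f"{pct}% lower cost is {cost_multiplier(pct):.0f}x cheaper")
# 90% lower cost is 10x cheaper
# 95% lower cost is 20x cheaper
```

In other words, the quoted range corresponds to inference that is roughly 10x to 20x cheaper than the comparison point.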

“Sovereign AI is most impactful at the inference layer,” said Meryem Arik, cofounder and CEO of Doubleword. “Inference is when you’re actually getting the value from the model — we want that value created in the U.K., with U.K. compute and U.K. data centers.”

Prima Mente Uses Foundation Models to Study Alzheimer’s and More

Prima Mente builds biological foundation models to identify new biomarkers, subtypes and drug targets of Alzheimer’s, Parkinson’s and ALS. With its Isambard allocation, the company is developing Pleiades 2, a foundation model combining five biological data modalities.

Achieving nearly 3x speedups in model training with NVIDIA Blackwell GPUs, Prima Mente also uses NVIDIA Parabricks for genomic data processing and NVIDIA Transformer Engine for model optimization.

“Research shows Alzheimer’s might be 25 different subgroups of disease, and we want to help by using AI to identify these subtypes and the biology within the cells as they change,” said Hannah Madan, cofounder of Prima Mente.

Video courtesy of Nebius and Prima Mente.

AI Talent, Policy and Production

NVIDIA’s £2 billion investment in the U.K. startup ecosystem — in collaboration with leading venture capital firms — is bringing new capital and advanced AI infrastructure to major U.K. hubs including London, Oxford, Cambridge and Manchester.

U.K. membership in the NVIDIA Inception program has increased by 50% over the past year. AI-native companies like Doubleword, Synthesia and PolyAI are scaling globally from U.K. roots.

At last year’s London Tech Week, NVIDIA announced a collaboration with the U.K. Department for Science, Innovation and Technology on 6G and AI skills. The 6G collaboration has seeded testbeds at four U.K. universities. In May, the NVIDIA Deep Learning Institute (DLI) delivered two new courses, added to support the nation’s wireless research community, to participants from over 30 U.K. universities.

Plus, as part of this AI skills collaboration, NVIDIA DLI courses are offered as part of QA’s AI Apprenticeships in England.

And the NVIDIA Developer Program now includes more than 200,000 U.K. developers.

The Sovereign AI Forum, which launched last year with seven charter members, convened the country’s AI leadership to turn policy into deployment roadmaps. Over the past year, the Forum has welcomed dozens of participants across government, industry and the startup community.

And enterprise AI is moving from pilot to production:

Apian is building digital twins of two National Health Service hospitals, combining autonomous devices, ground robots, computer vision and robotic simulation.
Deliverance AI is helping regulated enterprises to run, govern and scale AI agents inside their own environment — through a single control plane. The Agentic Operating System is built for organizations where data sovereignty is non-negotiable.
Glass Futures has installed an AI-driven digital twin of its glass furnace capable of testing and predicting new, optimal ways to make glass. The digital twin taps into NVIDIA accelerated computing and the NVIDIA PhysicsNeMo framework.
Orbital Industries has announced codesigned, NVIDIA Vera Rubin DSX AI Factory-compliant AI infrastructure that accelerates time to first token.
Reading Football Club is partnering with Stelia to establish an AI Centre of Excellence, combining Stelia’s full-stack AI platform with accelerated compute infrastructure from NVIDIA and Lenovo.

It all reflects momentous progress in U.K. AI leadership — and offers a glimpse of where it’s heading.

Join NVIDIA at London Tech Week.

NVIDIA GTC Taipei at COMPUTEX: Live Updates on What’s Next in AI

Posted on May 25, 2026 by faz_business

The future of AI is landing in Taipei. At NVIDIA GTC Taipei at COMPUTEX, the world’s developers, researchers and industry leaders are converging to dive into the latest breakthroughs shaping every industry, with topics ranging from AI factories and scaling infrastructure to agentic and physical AI and more.

Hear from NVIDIA founder and CEO Jensen Huang live on stage at Taipei Music Center on Monday, June 1, 11 a.m. Taipei time. Tune in early to catch the GTC Live at Taipei 2026 pregame show, featuring lively conversations with industry leaders about the latest innovations in AI and accelerated computing.

This is the place to find all the latest — stay tuned to the blog for live updates.

Sunday, May 24, 7 a.m. PT

It’s not a trip to Taipei without a stop at the Raohe St. night market. Matcha and mango shave ice hit the spot on a warm evening.

Saturday, May 23, 5:35 a.m. PT

The Countdown to NVIDIA GTC Taipei Begins

NVIDIA founder and CEO Jensen Huang meets with industry leaders, dignitaries, developers and NVIDIA employees ahead of GTC Taipei.

Hours after landing, Huang made a surprise visit to Meet-a-Claw, where NVIDIA and Taiwan’s developer community gathered for an afternoon of demos, tech talks and networking — an opportunity to get hands on with autonomous agents and OpenClaw.

Because OpenClaw is open source, it’s available for everyone to use and build their own AI agent. Huang described some of the ways OpenClaw agents secured by NVIDIA OpenShell can be of service for everything from software programming to marketing and content creation.

“It’s become a really, very powerful assistant,” Huang said. “The era of useful AI has arrived. That’s what this event is about, to show you what open source agents can do and then you can go create your own.”

Huang fielded a few questions from the assembled press, including the status of NVIDIA’s pending Taipei office. Huang smiled before offering his response.

“I think I’m going to give you an update on the headquarters this week,” he said. “It could be a secret … I might show you what the building is going to look like.”

Well, if it was a secret, it isn’t now. Come back for the latest on the NVIDIA Taipei office design and all the action in the run-up to NVIDIA GTC Taipei at COMPUTEX.

The air buzzed with excitement as NVIDIA founder and CEO Jensen Huang touched down in Taipei Saturday afternoon, greeted by a flurry of journalists and cameras. This set the tone for the weeks ahead — kickstarting the countdown to NVIDIA GTC Taipei at COMPUTEX.

Speaking with media on site, Huang said, “Vera Rubin is the largest product launch, probably in the history of Taiwan. Each one of the Vera Rubin systems consists of almost 2 million parts, and it includes 150 different ecosystem partners here in Taiwan to build it.”

Thursday, May 21, 9 a.m. PT

NVIDIA Wins COMPUTEX 2026 Best Choice Awards for Innovations Spanning AI Factories, Robotics and Autonomous Vehicles

NVIDIA Vera Rubin NVL72, NVIDIA Jetson Thor and NVIDIA Alpamayo were honored across four categories at Asia’s premier technology and computer trade exhibition.

At this year’s COMPUTEX Best Choice Awards (BCA), NVIDIA today received honors recognizing its innovation in AI computing, integrated circuits and autonomous vehicle (AV) development.

The NVIDIA Vera Rubin NVL72 rack-scale AI supercomputer won a Golden Award and the Sustainable Tech Special Award; the NVIDIA Jetson Thor platform for edge AI and robotics won a Golden Award; and the NVIDIA Alpamayo open platform for AV development won the Vehicle Technology and Smart Cockpit Category Award.

Entries, showcased at the premier computer and technology trade exhibition, were evaluated on their functionality, innovation and market potential.

Jensen Huang, founder and CEO of NVIDIA, will deliver a keynote at COMPUTEX on Monday, June 1, at 11 a.m. Taipei time.

NVIDIA Vera Rubin NVL72 Takes Home COMPUTEX Awards

Securing a Golden Award and the Sustainable Tech Special Award, Vera Rubin NVL72 connects 36 NVIDIA Vera CPUs and 72 NVIDIA Rubin GPUs — unified by the sixth-generation NVIDIA NVLink Switch for scale-up — with ConnectX-9 SuperNICs and Spectrum-X Ethernet Photonics co-packaged optics switches for scale-out and scale-across, as well as BlueField-4 DPUs to accelerate data processing across storage and security.

Vera Rubin NVL72 delivers up to 10x higher inference performance per watt and 10x lower cost per token. When paired with NVIDIA Groq 3 LPX, Vera Rubin NVL72 delivers up to 35x higher throughput per watt for trillion-parameter models.

Designed for agentic AI, reasoning and long-context workloads, it enables AI factories to scale intelligence inside the rack and across the data center with secure, continuously available deployment.

The Vera Rubin NVL72 sets the bar for scalability, resiliency and sustainable AI infrastructure. Its cable-free, hose-free, fanless modular tray design reduces assembly time from two hours to five minutes per compute tray.

The system’s power shelves deliver 6x more onboard energy storage for intelligent power smoothing, protecting both the rack and the broader power grid from steep load swings. In addition, its 100% liquid-cooled architecture operates at 45 degrees Celsius, meaning it drops seamlessly into existing liquid-cooled data centers and enables ambient-air, dry-cooler designs that redirect power from cooling overhead into token generation.

More BCA Wins for NVIDIA Technologies

NVIDIA Jetson Thor won a Golden Award as the most powerful edge AI compute platform built for physical AI and autonomous robots. Powered by the NVIDIA Blackwell GPU architecture, it delivers up to 2,070 FP4 teraflops of AI performance — 7.5x the compute and 3.5x the energy efficiency of the previous NVIDIA Jetson Orin generation — in a compact module configurable between 40 and 130 watts.

Already in production across hundreds of applications, Jetson Thor is built to bring generative AI to smart robots, industrial systems, medical devices and autonomous machines while maximizing run-time performance and memory optimization.

Plus, NVIDIA Alpamayo won the Vehicle Technology and Smart Cockpit Category Award for pioneering open, reasoning-based autonomous vehicle development. Alpamayo is designed to help developers tackle rare, complex long-tail driving scenarios that fall outside typical training experience, such as interpreting an ambiguous hand signal from a pedestrian, determining the right-of-way when traffic lights and road markings contradict each other, and safely passing an emergency vehicle parked partially in the lane ahead.

The Alpamayo open platform includes Alpamayo 1.5 and Alpamayo 1, 10-billion-parameter chain-of-thought reasoning vision language action models for AV research; AlpaSim, an open source, end-to-end simulation framework for high-fidelity AV development; and NVIDIA Physical AI Open Datasets, which include more than 1,700 hours of driving data across geographies and conditions.

Learn more about NVIDIA’s latest innovations at NVIDIA GTC Taipei, running June 1-4 at COMPUTEX.

NVIDIA and Google Cloud Empower the Next Wave of AI Builders

Posted on May 21, 2026 by faz_business

At this year’s Google I/O conference, NVIDIA and Google Cloud are accelerating the work of more than 100,000 developers in the companies’ joint developer community, which provides curated learning paths, hands-on labs and events that help them build using the full-stack NVIDIA AI platform on Google Cloud.

Launched at Google I/O last year, the community brings together developers, data scientists and machine learning engineers who want to sharpen their AI skills on the latest NVIDIA and Google Cloud technologies.

New additions for the community are rolling out this year, including a learning path for using the JAX library on NVIDIA GPUs, a new NVIDIA Dynamo codelab focused on inference optimizations, as well as monthly developer livestreams.

Over the last year, the community has become a go‑to hub for AI builders using NVIDIA‑accelerated tools for data science and machine learning. Members have built production‑ready retrieval-augmented generation applications on Google Kubernetes Engine (GKE) and instrumented observability for agent workloads.

These AI builders are also experimenting with new large language model research and prototyping hybrid on‑premises and cloud inference for real‑world use cases like sports analytics and enterprise data pipelines.

Building With Google DeepMind’s Gemma, NVIDIA Nemotron and Open Frameworks

NVIDIA and Google Cloud are equipping developers with learning resources and hands-on labs that combine NVIDIA libraries, open models and tools with Google Cloud’s AI platform — so they can build optimized, production‑ready AI applications faster.

For example, developers can accelerate data science and analytics with the NVIDIA cuDF library in Google Colab Enterprise or Dataproc, or deploy multi-agent applications by combining Google DeepMind’s Gemma 4 models, NVIDIA Nemotron open models and Google Agent Development Kit with Google Cloud G4 VMs powered by NVIDIA RTX PRO 6000 Blackwell GPUs in Google Cloud Run or with spot instances.

NVIDIA and Google Cloud work closely across open frameworks like JAX so developers can build, scale and productize JAX workloads on NVIDIA AI infrastructure on Google Cloud — from single‑GPU experiments to multi‑rack deployments — while getting strong performance and a consistent experience.

This work extends to Google Cloud AI Hypercomputer, where the MaxText framework uses these JAX optimizations to train large models efficiently on NVIDIA GPUs.

Building on the same foundation, NVIDIA Dynamo on GKE helps developers optimize large-scale inference — including mixture-of-experts models — so they can serve AI applications more efficiently with NVIDIA accelerated infrastructure on Google Cloud.

To help developers get hands-on with these capabilities, a new learning path on running and scaling JAX on NVIDIA GPUs and a new NVIDIA Dynamo on GKE inference codelab will become available next month for members in the Google Cloud and NVIDIA developer community.

Advancing Responsible AI With Google DeepMind’s SynthID and NVIDIA Cosmos

AI agents are increasingly built from a system of AI models — combining proprietary and open source models that reason, plan and act on users’ behalf.

Amid this shift, trust and transparency are foundational, so developers and organizations can understand how these systems work and what they generate.

NVIDIA was the first industry partner to collaborate with Google DeepMind on SynthID, an AI watermarking technology that embeds robust digital watermarks directly into AI‑generated content, which helps preserve the integrity of outputs from NVIDIA Cosmos world foundation models available on build.nvidia.com.

Cosmos models provide rich 3D perception and simulation capabilities for robots, autonomous machines and other physical AI systems, while SynthID brings content transparency to the imagery and video they rely on.

Together, they help preserve the integrity of AI‑generated content so developers can build and deploy agentic applications more responsibly across cloud, edge and real‑world environments.

Building on a Full-Stack NVIDIA and Google Cloud Platform

This year, Google I/O is putting the spotlight on new agentic experiences and tools for developers — and NVIDIA and Google Cloud are focused on ensuring builders have the infrastructure, software and learning resources they need to make the most of them.

For developers in the community building on NVIDIA and Google Cloud, the skills and tools they learn can scale, effortlessly taking projects from prototype to enterprise‑grade workloads.

At Google Cloud Next, Google Cloud and NVIDIA expanded their full‑stack platform to help developers train, deploy and operationalize agents on Google Cloud. This collaboration includes work on NVIDIA Vera Rubin-powered A5X instances, Google DeepMind Gemini models and more, and is being harnessed by leading AI labs and enterprises including OpenAI, Thinking Machines Lab, Schrödinger, Salesforce, Snap and CrowdStrike. Learn more in this blog.

Join the NVIDIA and Google Cloud developer community to connect with other builders and stay up to date on new tools, developer events and programs.

NVIDIA and ServiceNow Partner on New Autonomous AI Agents for Enterprises

Posted on May 6, 2026 by faz_business

Enterprise AI has learned to generate. It has learned to reason. Now companies are asking the next question: How should AI act?

Early agent systems have shown what’s possible, moving beyond simple prompts to take on more complex tasks. The next step is bringing those capabilities into enterprise environments — where agents must operate with context, control and consistency across real workflows.

At ServiceNow Knowledge 2026, NVIDIA founder and CEO Jensen Huang joined ServiceNow chairman and CEO Bill McDermott during the opening keynote to discuss the next phase of enterprise AI.

The companies are expanding their collaboration across the full stack, delivering specialized autonomous AI agents that are safe and easy to adopt — powered by NVIDIA accelerated computing, open models, domain-specific skills and secure agent execution software, and bringing together enterprise workflow context from ServiceNow Action Fabric and governance from ServiceNow AI Control Tower.

ServiceNow is introducing Project Arc, a long-running, self-evolving autonomous desktop agent designed for knowledge workers, including developers, IT teams and administrators.

Unlike standalone AI agents, Project Arc connects natively to the ServiceNow AI Platform through ServiceNow Action Fabric to bring governance, auditability and workflow intelligence to every action the autonomous desktop agent takes. It can access the local file systems, terminals and applications installed on a machine to complete complex, multistep tasks that traditional automation can’t handle, but with the controls enterprises actually need to deploy AI at scale.

The work is built around three requirements every company will need for long-running, autonomous agents: open models and domain-specific skills that can be customized; security that helps agents act without exposing sensitive data or systems; and AI factories that deliver efficient tokenomics.

Bringing this level of autonomy to enterprises requires control from the start.

Project Arc uses NVIDIA OpenShell, an open source secure runtime for developing and deploying autonomous agents in sandboxed, policy-governed environments. ServiceNow is building on and contributing to OpenShell to advance a common foundation for secure, enterprise-grade agent execution. With OpenShell, enterprises can define what an agent can see, which tools it can use and how each action is contained.
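To make the idea of a policy-governed agent concrete, here is a minimal deny-by-default sketch in Python. It is not OpenShell's API or configuration format: the AgentPolicy class, its fields and the guarded_call helper are hypothetical names invented for illustration. A real secure runtime would enforce containment at the operating-system level (and would handle cases such as symlinks) rather than rely on string checks like these.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical policy: what the agent may see and which tools it may use."""
    readable_roots: tuple[str, ...]
    allowed_tools: frozenset[str]

    def can_read(self, path: str) -> bool:
        # Normalize first so "/workspace/../etc" cannot slip past the check.
        target = PurePosixPath(posixpath.normpath(path))
        return any(target.is_relative_to(root) for root in self.readable_roots)

    def can_use(self, tool: str) -> bool:
        return tool in self.allowed_tools


def guarded_call(policy: AgentPolicy, tool: str, path: str) -> str:
    """Deny by default: proceed only if both the tool and the path are permitted."""
    if not policy.can_use(tool):
        return f"denied: tool '{tool}' is not allowed"
    if not policy.can_read(path):
        return f"denied: '{path}' is outside the readable roots"
    return f"allowed: {tool} on {path}"


policy = AgentPolicy(
    readable_roots=("/workspace",),
    allowed_tools=frozenset({"read_file"}),
)
print(guarded_call(policy, "read_file", "/workspace/report.md"))
print(guarded_call(policy, "shell", "/workspace/report.md"))
print(guarded_call(policy, "read_file", "/workspace/../etc/passwd"))
```

The design choice worth noting is the default: anything not explicitly listed is refused, which is the usual starting point for sandboxed execution.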

“Project Arc represents the next step in our ongoing collaboration with NVIDIA, bringing autonomous execution to the desktop,” said Jon Sigler, executive vice president and general manager of AI Platform at ServiceNow. “By combining OpenShell’s runtime layer with ServiceNow AI Control Tower, and powered by ServiceNow Action Fabric, we’re delivering the governance and security that enterprise AI requires.”

Open Models and Agent Skills Scale Enterprise AI

To be effective, enterprise AI systems must be adaptable. NVIDIA and ServiceNow are building on an open ecosystem that allows organizations to tailor models and applications to their specific domains and data.

NVIDIA agent skills enable specialized agents, such as ServiceNow AI Specialists, to deliver targeted capabilities across enterprise workflows. For example, the NVIDIA AI-Q Blueprint for building specialized deep research agents empowers ServiceNow AI Specialists to gather context, synthesize information and support more complex decision-making across business functions.

In addition, the NVIDIA Agent Toolkit, including NVIDIA Nemotron open models, provides flexible building blocks and specialized skills for developing customized AI applications. To help ensure these systems perform reliably in the real world, the companies are also advancing NOWAI-Bench, an open benchmarking suite for enterprise AI agents, integrated with the NVIDIA NeMo Gym library. NOWAI-Bench includes EnterpriseOps-Gym, one of the industry’s most challenging enterprise agent benchmarks, where Nemotron 3 Super currently ranks No. 1 among open source models.

Unlike general benchmarks, these evaluations focus on multistep workflows — where enterprise AI systems often encounter real challenges — helping teams build agents that perform reliably in production environments.

Efficient AI Factories

As AI agents become long running and always on, scaling them across millions of workflows requires not just capability but efficiency — making token economics central to enterprise AI.

NVIDIA AI factories are built to deliver the lowest-cost, most-efficient tokenomics for production AI. The NVIDIA Blackwell platform delivers more than 50x greater token output per watt than NVIDIA Hopper, resulting in nearly 35x lower cost per million tokens. For enterprises running agents across millions of workflows, that efficiency can determine how quickly AI moves from pilots to broad production use.

ServiceNow AI Control Tower integrates with the NVIDIA Enterprise AI Factory validated design, extending governance and observability to large-scale AI workloads. With added agent observability capabilities, organizations can monitor behavior in real time and manage AI systems across their full lifecycle — from deployment to optimization.

AI is becoming a new way that work gets done. What’s changing now is that the core pieces required to deploy it at scale — capable agents, built-in guardrails and proven performance — are all coming together.

The companies that move fastest will be the ones that give agents the infrastructure to act, the context to make decisions and the governance to keep every action accountable — and NVIDIA and ServiceNow are making this a reality for the world’s enterprises.

Learn more about NVIDIA OpenShell and the NVIDIA AI-Q Blueprint.

Greg Brockman Defends $30B OpenAI Stake: ‘Blood, Sweat, and Tears’

Posted on May 4, 2026 by faz_business

Two days before the Musk v. Altman trial began, Elon Musk asked OpenAI cofounder and president Greg Brockman about reaching a settlement. When Brockman suggested both sides drop their claims, Musk responded, “By the end of this week, you and Sam [Altman] will be the most hated men in America. If you insist, so be it.”

The message—which OpenAI’s lawyers made public on Sunday, and which Judge Yvonne Gonzalez Rogers subsequently refused to let the jury hear about—underscores what may be Musk’s larger goal in this trial. He appears to be trying to not only win over the jurors to potentially remove Brockman and CEO Sam Altman from power, but also stir up dirt on the two men and damage OpenAI’s public image.

As Brockman took the stand on Monday, Musk’s attorney Steven Molo quickly started questioning him about his compensation at OpenAI. Brockman revealed that his equity stake at OpenAI is currently worth more than $20 billion, and perhaps up to $30 billion. While Brockman initially promised to donate $100,000 to OpenAI when it was being set up, he said he ultimately never followed through.

Brockman has held a number of instrumental roles at OpenAI since he cofounded the company in 2015. In the startup’s early days, it operated out of his apartment in the Mission District of San Francisco. Today, he’s deeply involved with refocusing OpenAI on a few key products, such as Codex. In the past year, Brockman has also given millions to super PACs promoting AI and President Trump, and has previously said this increased political spending is related to OpenAI’s founding mission to create artificial general intelligence that benefits all of humanity.

In court on Monday, Molo tried to make the case that Brockman and Altman had essentially looted OpenAI’s original nonprofit, which Musk funded and helped create.

In its early days, OpenAI told investors and employees that its nonprofit mission took precedence over generating profit. Brockman testified that his financial interests are still, to this day, second to OpenAI’s nonprofit mission.

Brockman testified that when OpenAI created its for-profit arm in 2019, which received assets from the nonprofit, he was given a significant stake in the new entity. Early in OpenAI’s history, Brockman had referenced wanting to be a billionaire, writing in his personal journal, “Financially what will take me to $1B?”

On Monday, Molo pressed Brockman for several minutes about the vast wealth he had accumulated beyond his initial goal.

“Why not donate that $29 billion to the OpenAI nonprofit? Why didn’t you do that?” Molo asked. Brockman responded that he and others had poured “blood, sweat, and tears” into building OpenAI in the years since Musk left the company.

OpenAI’s foundation holds a stake of over $150 billion in the company, making it one of the richest nonprofits in history, Brockman said. That’s roughly five times Brockman’s ownership interest. Altogether, OpenAI employees hold about 25 percent of shares. The foundation has 27 percent. Brockman testified that OpenAI’s nonprofit had received less than $150 million from donors, implying Musk had been incidental to the company’s success and that the real drivers were those who stuck around to build out OpenAI.

Of course, Brockman’s stake in OpenAI could be worth much more than $30 billion if the company successfully goes public in the next two years. When asked whether OpenAI was exploring a potential IPO, Brockman said he believes so.

Nemotron Labs: What OpenClaw Agents Mean for Every Organization

Posted on May 2, 2026 by faz_business

Editor’s note: This post is part of the Nemotron Labs blog series, which explores how the latest open models, datasets and training techniques help businesses build specialized AI systems and applications on NVIDIA platforms. Each post highlights practical ways to use an open stack to deliver real value in production — from transparent research copilots to scalable AI agents.

By early 2026, the open source project OpenClaw had become a phenomenon. In January, its GitHub star count crossed 100,000 as developer interest surged. Community dashboards and traffic analytics showed more than 2 million visitors in a single week. By March, OpenClaw topped 250,000 stars — overtaking React to become the most-starred software project on GitHub in just 60 days.

Created by Peter Steinberger, OpenClaw is a self-hosted, persistent AI assistant designed to run locally or on private servers. The project drew attention for its accessibility and unbounded autonomy: Users could deploy an AI model locally without depending on cloud infrastructure or external application programming interfaces (APIs).

Most AI agents today are triggered by a prompt, complete a defined task and then stop running. A long-running autonomous agent, or “claw,” works differently. These agents run persistently in the background, completing tasks on their own and surfacing only what requires a human decision. They operate on a heartbeat: At regular intervals, they check their task list, evaluate what needs action, and either act or wait for the next cycle.
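The heartbeat cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not OpenClaw's implementation: the Task and HeartbeatAgent names, their fields and the 60-second default interval are all assumptions made for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    """One item on the agent's task list (hypothetical structure)."""
    name: str
    needs_action: Callable[[], bool]  # returns True when the task is due
    act: Callable[[], str]            # does the work and returns a summary
    needs_human: bool = False         # True if the result must be surfaced


@dataclass
class HeartbeatAgent:
    tasks: List[Task]
    interval_seconds: float = 60.0
    surfaced: List[str] = field(default_factory=list)

    def beat(self) -> int:
        """Run one heartbeat cycle and return how many tasks were acted on."""
        acted = 0
        for task in self.tasks:
            if not task.needs_action():
                continue  # nothing to do, wait for the next cycle
            summary = task.act()
            acted += 1
            if task.needs_human:
                self.surfaced.append(f"{task.name}: {summary}")
        return acted

    def run(self, cycles: int, sleep: Callable[[float], None] = time.sleep) -> None:
        """Run a fixed number of cycles; a real agent would loop indefinitely."""
        for _ in range(cycles):
            self.beat()
            sleep(self.interval_seconds)


if __name__ == "__main__":
    # Simulated check results for three consecutive heartbeats.
    alerts = iter([False, True, False])
    agent = HeartbeatAgent(
        tasks=[Task("disk-check", lambda: next(alerts),
                    lambda: "disk usage above 90%", needs_human=True)],
        interval_seconds=0.0,
    )
    agent.run(cycles=3)
    print(agent.surfaced)  # ['disk-check: disk usage above 90%']
```

Only the second cycle finds work to do, and only that result is surfaced for a human decision. The other two cycles pass silently.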

OpenClaw’s rapid adoption also sparked debate. Security researchers raised concerns about how self-hosted AI tools manage sensitive data, authentication and model updates. Others questioned whether local deployments could expose users to new risks — from unpatched server instances to malicious contributions in community forks. As contributors and maintainers worked to address these issues, OpenClaw’s rise prompted a broader conversation across the AI ecosystem about the trade-offs between openness, privacy and safety.

To help enhance the security and robustness of the OpenClaw project, NVIDIA is collaborating with Steinberger and the OpenClaw developer community to address potential vulnerabilities, as detailed in a recent blog post by OpenClaw.

NVIDIA contributes code and guidance focused on improving model isolation, better managing local data access and strengthening the processes for verifying community code contributions. The goal is to support the project’s momentum by contributing its security and systems expertise in an open, transparent way that strengthens the community’s work while preserving OpenClaw’s independent governance.

To help make long-running agents safer for enterprises, NVIDIA also introduced NVIDIA NemoClaw, a reference implementation that uses a single command to install OpenClaw, the NVIDIA OpenShell secure runtime and NVIDIA Nemotron open models with hardened defaults for networking, data access and security. NemoClaw serves as a blueprint for organizations to deploy claws more securely.

Inference Demand Multiplies With Each AI Wave

AI has moved through four phases, and the time between each is shortening. Predictive AI took years to become mainstream. Generative AI moved faster. Reasoning AI arrived faster still. Autonomous AI — the wave OpenClaw represents — is setting an even faster pace.

What compounds with each wave is inference demand. Generative AI increased token usage over predictive AI. Reasoning AI increased it another 100x. Autonomous agents, which run continuously and act across long time horizons, drive inference demand up by another 1,000x over reasoning AI. Each wave multiplies the compute required.
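
Because each multiplier is stated relative to the previous wave, the figures compound. The short calculation below works this out using generative AI as the baseline, since the text gives no number for the step from predictive to generative AI. The function and constant names are illustrative.

```python
# Multipliers quoted in the text, each relative to the previous wave.
REASONING_OVER_GENERATIVE = 100
AUTONOMOUS_OVER_REASONING = 1_000


def cumulative_multipliers(step_multipliers):
    """Turn wave-over-wave multipliers into multipliers over the baseline."""
    total = 1
    cumulative = []
    for step in step_multipliers:
        total *= step
        cumulative.append(total)
    return cumulative


# Relative to generative AI: reasoning AI, then autonomous AI.
relative_to_generative = cumulative_multipliers(
    [REASONING_OVER_GENERATIVE, AUTONOMOUS_OVER_REASONING]
)
```

Taken at face value, the quoted figures put autonomous agents at 100 × 1,000 = 100,000 times the inference demand of generative AI.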

This increase in token usage is enabling organizations to raise their productivity by orders of magnitude. For example, long-running agents can help researchers work through a problem overnight, iterate on a design across thousands of configurations, or monitor systems and surface only the anomalies that require human judgment, freeing up researchers’ workdays for higher-value tasks.

Choosing the Tool: When to Deploy a ‘Claw’

While generative AI has become a staple for on-demand tasks, there are specific scenarios where the persistent “heartbeat” of a claw offers distinct advantages. Determining when to move from a standard prompt-based AI to a long-running agent often comes down to the nature of the workflow:

From “On-Demand” to “Always-On”: While standard models are excellent for immediate, human-triggered queries, claws are often better suited for tasks that require continuous background monitoring or periodic system checks without a manual start.
Managing High-Iteration Loops: For complex problems, like testing thousands of chemical combinations or simulating infrastructure stress tests, a claw can manage the sheer volume of iterations that might otherwise be bottlenecked by human intervention.
Shifting from Suggestions to Actions: In many workflows, standard AI is used to provide information or drafts. A claw is often considered when the goal is for the AI to move into the execution phase — interacting with APIs, updating databases or managing files across a long time horizon.
Resource Optimization: For massive, token-heavy reasoning tasks, deploying a local claw on dedicated hardware like an NVIDIA DGX Spark personal AI supercomputer allows for more predictable costs and data privacy compared with high-frequency cloud API calls.
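
The four criteria above can be read as a simple checklist. The sketch below encodes them as a helper that lists which reasons apply to a given workflow. The `Workflow` fields, the reason labels and the iteration threshold are all hypothetical; the article gives no numeric cutoffs.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoff for "high-iteration"; the article gives no number.
HIGH_ITERATION_THRESHOLD = 1_000


@dataclass
class Workflow:
    needs_background_monitoring: bool  # must run without a manual start
    iterations_per_run: int            # e.g. parameter combinations to test
    executes_actions: bool             # calls APIs, updates databases, manages files
    token_heavy_and_private: bool      # large reasoning load on sensitive data


def reasons_for_claw(workflow: Workflow) -> List[str]:
    """Return the criteria from the list above that this workflow meets."""
    reasons = []
    if workflow.needs_background_monitoring:
        reasons.append("always-on")
    if workflow.iterations_per_run >= HIGH_ITERATION_THRESHOLD:
        reasons.append("high-iteration")
    if workflow.executes_actions:
        reasons.append("takes-actions")
    if workflow.token_heavy_and_private:
        reasons.append("local-resources")
    return reasons


one_off_draft = Workflow(False, 1, False, False)
overnight_sweep = Workflow(True, 5_000, True, False)
```

An empty list suggests a standard prompt-based tool is enough; the more reasons that apply, the stronger the case for a long-running agent.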

How Are Organizations Using Long-Running Autonomous Agents?

The practical applications of long-running autonomous agents span many functions and sectors.

In financial services, agents continuously monitor trading systems and regulatory feeds, flagging material events before the morning review. In drug discovery, agents sweep new scientific literature, extracting relevant findings and updating internal databases in real time without researcher intervention — a process that previously took weeks.

In engineering and manufacturing, agents speed problem analysis by testing thousands of parameter combinations, ranking results and flagging the configurations worth examining — and all this can happen overnight.

In IT operations, agents diagnose infrastructure incidents, apply known remediations and escalate only the novel problems — compressing average time to resolution from hours to minutes. At ServiceNow, AI specialists leveraging Apriel and NVIDIA Nemotron models can resolve 90% of tickets autonomously.
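
The IT operations pattern, applying known remediations and escalating only novel problems, amounts to a lookup with a fallback. The sketch below is a toy illustration: the incident signatures, remediation names and `triage` function are invented for the example and do not describe any vendor's system.

```python
from typing import Dict, List, Tuple

# Hypothetical runbook mapping incident signatures to known fixes.
KNOWN_REMEDIATIONS: Dict[str, str] = {
    "disk_full": "purge_temp_files",
    "service_down": "restart_service",
}


def triage(
    incidents: List[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split (incident_id, signature) pairs into auto-resolved and escalated."""
    resolved = []
    escalated = []
    for incident_id, signature in incidents:
        action = KNOWN_REMEDIATIONS.get(signature)
        if action is None:
            escalated.append(incident_id)  # novel problem: hand to a human
        else:
            resolved.append((incident_id, action))  # apply the known fix
    return resolved, escalated


resolved, escalated = triage(
    [("INC-1", "disk_full"), ("INC-2", "unknown_kernel_panic"), ("INC-3", "service_down")]
)
```

Here two of the three incidents map to a known fix and only `INC-2` reaches a person.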

How Can Companies Deploy Autonomous Agents Responsibly?

Autonomous agents are hands-on. They can send communications, write files, call APIs and update live systems. When an agent takes a wrong action, there are real consequences. Getting the accountability framework right from the start is essential, and organizations deploying autonomous agents in production must treat governance as a first-order requirement.

Organizations need to see what their agents are doing, inspect their reasoning at each step, audit their actions and intervene when needed.

Organizations deploying autonomous agents responsibly are focused on three priorities:

An open, auditable framework: NemoClaw is built on OpenClaw’s MIT licensed codebase, which means organizations own the full agent harness. They can read, fork and modify every layer of how their agents are built and deployed. That transparency enables teams to understand and control the system at the code level. Running open source models like NVIDIA Nemotron locally keeps sensitive workloads, including patient records, legal documents, financial transactions and proprietary research, within the organization’s own environment, ensuring that trace data stays under organizational control.
Securing the runtime environment: NemoClaw runs agents inside OpenShell, a sandboxed environment that defines precisely what the agent can and cannot do, enforcing clear permission boundaries from the start.
Local compute: NVIDIA DGX Spark supercomputers deliver data-center-class GPU performance in a deskside form factor built for continuous local inference that’s always on, with local model hosting and data that stays within the organization’s environment. NVIDIA DGX Station systems scale that capability for teams running multiple agents simultaneously across complex, sustained workloads.
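
To make the ideas of permission boundaries and auditability concrete, here is a minimal sketch of a default-deny policy check that records every request. It is not the OpenShell API: `Policy`, `GuardedAgent` and the action names are assumptions for illustration, and a real sandbox enforces limits at the operating-system level rather than in application code.

```python
import posixpath
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Policy:
    # Hypothetical policy shape; anything not listed is denied.
    allowed_actions: Set[str]
    allowed_path_prefixes: Tuple[str, ...] = ()


@dataclass
class GuardedAgent:
    policy: Policy
    audit_log: List[Dict[str, object]] = field(default_factory=list)

    def request(self, action: str, path: str, reason: str) -> bool:
        """Check a file action against the policy and record the decision."""
        normalized = posixpath.normpath(path)  # collapse "../" before checking
        allowed = (
            action in self.policy.allowed_actions
            and normalized.startswith(self.policy.allowed_path_prefixes)
        )
        self.audit_log.append(
            {"action": action, "path": normalized, "reason": reason, "allowed": allowed}
        )
        return allowed


policy = Policy(
    allowed_actions={"read_file", "write_file"},
    allowed_path_prefixes=("/workspace/",),
)
guarded = GuardedAgent(policy)
decisions = [
    guarded.request("write_file", "/workspace/report.txt", "save results"),
    guarded.request("write_file", "/workspace/../etc/passwd", "path traversal attempt"),
    guarded.request("send_email", "/workspace/draft.txt", "notify team"),
]
```

Every request lands in `audit_log` whether or not it was allowed, which is what lets a reviewer reconstruct what the agent tried to do and why.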

The organizations defining what autonomous agents do in practice are accumulating something valuable: months of live operational learning, governance frameworks developed through real workloads and agents that have absorbed the institutional context that makes them genuinely useful. This foundation will only deepen over time.

Get Started With NVIDIA NemoClaw

Access a step-by-step tutorial on how to build a more secure AI agent with NemoClaw on NVIDIA DGX Spark. Explore how NemoClaw can deploy more secure, always-on AI assistants with a single command.

Experiment with NemoClaw, available on GitHub, and join the community of developers on Discord building with NemoClaw using NVIDIA Nemotron 3 Super and Telegram on DGX Spark.

Stay up to date on agentic AI, NVIDIA Nemotron and more by subscribing to NVIDIA AI news, joining the community and following NVIDIA AI on LinkedIn, Instagram, X and Facebook.

Explore self-paced video tutorials and livestreams.

NVIDIA Launches Nemotron 3 Nano Omni Model, Unifying Vision, Audio and Language for up to 9x More Efficient AI Agents

Posted on April 28, 2026 by faz_business

AI agent systems today juggle separate models for vision, speech and language — losing time and context as they pass data from one model to the other.

Unveiled today, NVIDIA Nemotron 3 Nano Omni is an open multimodal model that brings these capabilities together into one system, enabling agents to deliver faster, smarter responses with advanced reasoning across video, audio, image and text. This best-in-class model gives enterprises and developers a production path for more efficient and accurate multimodal AI agents with full deployment flexibility and control.

Nemotron 3 Nano Omni sets a new efficiency frontier for open multimodal models with leading accuracy and low cost, topping six leaderboards for complex document intelligence, and video and audio understanding.

At a Glance

What it is: An open, omni-modal reasoning model; the highest-efficiency open multimodal model of its kind, with leading accuracy
What it handles: Text, images, audio, video, documents, charts and graphical interfaces (input); text (output)
Who it’s for: Enterprises and developers building fast, reliable agentic systems that need a multimodal perception sub-agent
How it works: Functions as the “eyes and ears” in a system of agents, working alongside models like Nemotron 3 Super and Ultra or proprietary models
Why it matters: Leading multimodal accuracy and 9x higher throughput than other open omni models with the same interactivity, resulting in lower cost and better scalability without sacrificing responsiveness
Architecture: 30B-A3B hybrid MoE with Conv3D, EVS, 256K context
Availability: April 28, 2026, via Hugging Face, OpenRouter, build.nvidia.com and 25+ partner platforms

AI and software companies already adopting Nemotron 3 Nano Omni include Aible, Applied Scientific Intelligence (ASI), Eka Care, Foxconn, H Company, Palantir and Pyler, with Dell Technologies, Docusign, Infosys, K-Dense, Lila, Oracle and Zefr evaluating the model.

“To build useful agents, you can’t wait seconds for a model to interpret a screen,” said Gautier Cloix, CEO of H Company. “By building on Nemotron 3 Nano Omni, our agents can rapidly interpret full HD screen recordings — something that wasn’t practical before. This isn’t just a speed boost: It’s a fundamental shift in how our agents perceive and interact with digital environments in real time.”

Nemotron 3 Nano Omni Enables Faster, Leaner Multimodal Agents

Consider an AI agent for customer support processing a screen recording while analyzing uploaded call audio and checking data logs — or an agent for finance tasked with parsing PDFs, spreadsheets, charts and voice notes. Today, most agentic systems accomplish these tasks with separate models for vision, speech and language.

This approach increases latency through repeated inference passes, fragments context across modalities, and adds cost and inaccuracies over time.

By combining vision and audio encoders within its 30B-A3B hybrid mixture-of-experts architecture, Nemotron 3 Nano Omni eliminates the need for separate perception models, driving inference efficiency at scale. It pairs this efficiency with strong multimodal perception accuracy, enabling AI systems to achieve 9x higher throughput than other open omni models with the same interactivity. The result is lower costs and better scalability without sacrificing responsiveness or quality.
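
A mixture-of-experts model gains efficiency by running only a few of its expert sub-networks for each input. If "30B-A3B" follows the common naming convention of roughly 30 billion total parameters with roughly 3 billion active per token (an assumption; the text does not spell this out), about a tenth of the model does work on any given token. The toy sketch below shows generic top-k routing over four scalar "experts"; it illustrates the general technique, not Nemotron's actual router.

```python
import math
from typing import Callable, List, Tuple


def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )
    chosen = ranked[:k]
    max_logit = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - max_logit) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(
    x: float,
    experts: List[Callable[[float], float]],
    router_logits: List[float],
    k: int,
) -> float:
    """Run only the selected experts and mix their outputs by router weight."""
    routes = top_k_route(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in routes)


# Four toy experts; only two are evaluated per input.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
router_logits = [0.1, 2.0, -1.0, 2.0]  # in a real model these depend on the input
output = moe_forward(3.0, experts, router_logits, k=2)
```

With these logits, experts 1 and 3 tie for the top score and each gets weight 0.5, so the output is 0.5 × 6 + 0.5 × 9 = 7.5, while the other two experts are never evaluated.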

In agentic systems, Nemotron 3 Nano Omni can work alongside other NVIDIA Nemotron open models, such as Nemotron 3 Super for high-frequency execution or Nemotron 3 Ultra for complex planning, as well as proprietary models from other providers, to power sub-agents for agentic workflows such as computer use, document intelligence and audio-video reasoning.

Computer use agents: Nemotron 3 Nano Omni powers the perception loop for agents navigating graphical user interfaces, reasoning over onscreen content and understanding user interface state over time. H Company’s latest computer use agent, powered by Nemotron 3 Nano Omni, uses a native input resolution of 1920×1080 pixels to achieve high-fidelity visual reasoning. In preliminary evaluations on the OSWorld benchmark, this integration showed a significant improvement in navigating complex graphical interfaces, drawing on Nemotron 3 Nano Omni’s ability to process very high-resolution images.
Document intelligence: The model interprets documents, charts, tables, screenshots and mixed-media inputs, enabling agents to reason across visual structure and text content coherently. This is critical for enterprise analysis and compliance workflows.
Audio and video understanding: For customer service, research and monitoring workflows, Nemotron 3 Nano Omni maintains audio-video context, tying what was said, shown and documented into a single reasoning stream instead of disconnected summaries.

Open and Customizable, Deployable Anywhere

Nemotron 3 Nano Omni is released with open weights, datasets and training techniques — giving organizations full transparency and control over how the model is customized and deployed.

Developers can use tools like NVIDIA NeMo for customization, evaluation and optimization for domain-specific use cases. Because the Nemotron family of models is open, organizations can deploy them in environments that meet regulatory, sovereignty or data localization requirements.

The Nemotron 3 family — including Nano, Super and Ultra models — has seen over 50 million downloads in the past year. Omni extends the family’s capabilities into multimodal and agentic domains.

The model is available on Hugging Face, OpenRouter and build.nvidia.com as an NVIDIA NIM microservice and through a broad ecosystem of NVIDIA Cloud Partners, inference platforms and cloud service providers.

Its open, lightweight architecture supports consistent deployment from local systems like NVIDIA Jetson modules, NVIDIA DGX Spark and DGX Station to data center and cloud environments.

Visit the NVIDIA technical blog for tutorials, cookbooks and deployment guides for Nemotron 3 Nano Omni use cases. Stay up to date on agentic AI, NVIDIA Nemotron and more by subscribing to NVIDIA news, joining the community and following NVIDIA AI on LinkedIn, Instagram, X and Facebook.

Explore self-paced video tutorials and livestreams.
