NVIDIA and ServiceNow Partner on New Autonomous AI Agents for Enterprises



Enterprise AI has learned to generate. It has learned to reason. Now companies are asking the next question: How should AI act?

Early agent systems have shown what’s possible, moving beyond simple prompts to take on more complex tasks. The next step is bringing those capabilities into enterprise environments — where agents must operate with context, control and consistency across real workflows.

At ServiceNow Knowledge 2026, NVIDIA founder and CEO Jensen Huang joined ServiceNow chairman and CEO Bill McDermott during the opening keynote to discuss the next phase of enterprise AI. 

The companies are expanding their collaboration across the full stack, delivering specialized autonomous AI agents that are safe and easy to adopt — powered by NVIDIA accelerated computing, open models, domain-specific skills and secure agent execution software, and bringing together enterprise workflow context from ServiceNow Action Fabric and governance from ServiceNow AI Control Tower.

ServiceNow is introducing Project Arc, a long-running, self-evolving autonomous desktop agent designed for knowledge workers, including developers, IT teams and administrators. 

Unlike standalone AI agents, Project Arc connects natively to the ServiceNow AI Platform through ServiceNow Action Fabric to bring governance, auditability and workflow intelligence to every action the autonomous desktop agent takes. It can access the local file systems, terminals and applications installed on a machine to complete complex, multistep tasks that traditional automation can’t handle, but with the controls enterprises actually need to deploy AI at scale.

The work is designed based on three requirements every company will need for long-running, autonomous agents: open models and domain-specific skills that can be customized and security that helps agents act without exposing sensitive data or systems — all running on AI factories that deliver efficient tokenomics.

Bringing this level of autonomy to enterprises requires control from the start.

Project Arc uses NVIDIA OpenShell, an open source secure runtime for developing and deploying autonomous agents in sandboxed, policy-governed environments. ServiceNow is building on and contributing to OpenShell to advance a common foundation for secure, enterprise-grade agent execution. With OpenShell, enterprises can define what an agent can see, which tools it can use and how each action is contained. 

“Project Arc represents the next step in our ongoing collaboration with NVIDIA, bringing autonomous execution to the desktop,” said Jon Sigler, executive vice president and general manager of AI Platform at ServiceNow. “By combining OpenShell’s runtime layer with ServiceNow AI Control Tower, and powered by ServiceNow Action Fabric, we’re delivering the governance and security that enterprise AI requires.” 

Open Models and Agent Skills Scale Enterprise AI

To be effective, enterprise AI systems must be adaptable. NVIDIA and ServiceNow are building on an open ecosystem that allows organizations to tailor models and applications to their specific domains and data.

NVIDIA agent skills enable specialized agents, such as ServiceNow AI Specialists, to deliver targeted capabilities across enterprise workflows. For example, the NVIDIA AI-Q Blueprint for building specialized deep research agents empowers ServiceNow AI Specialists to gather context, synthesize information and support more complex decision-making across business functions. 

In addition, the NVIDIA Agent Toolkit, including NVIDIA Nemotron open models, provide flexible building blocks and specialized skills for developing customized AI applications. To support real-world performance that these systems can perform reliably, the companies are also advancing NOWAI-Bench, an open benchmarking suite for enterprise AI agents, integrated with the NVIDIA NeMo Gym library. NOWAI-Bench includes EnterpriseOps-Gym, one of the industry’s most challenging enterprise agent benchmarks, where Nemotron 3 Super currently ranks No. 1 among open source models.

Unlike general benchmarks, these evaluations focus on multistep workflows — where enterprise AI systems often encounter real challenges — helping teams build agents that perform reliably in production environments.

Efficient AI Factories

As AI agents become long running and always on, scaling them across millions of workflows requires not just capability but efficiency — making token economics central to enterprise AI.

NVIDIA AI factories are built to deliver the lowest-cost, most-efficient tokenomics for production AI. The NVIDIA Blackwell platform delivers more than 50x greater token output per watt than NVIDIA Hopper, resulting in nearly 35x lower cost per million tokens. For enterprises running agents across millions of workflows, that efficiency can determine how quickly AI moves from pilots to broad production use.

ServiceNow AI Control Tower integrates with the NVIDIA Enterprise AI Factory validated design, extending governance and observability to large-scale AI workloads. With added agent observability capabilities, organizations can monitor behavior in real time and manage AI systems across their full lifecycle — from deployment to optimization.

AI is becoming a new way that work gets done. What’s changing now is that the core pieces required to deploy it at scale — capable agents, built-in guardrails and proven performance — are all coming together.

The companies that move fastest will be the ones that give agents the infrastructure to act, the context to make decisions and the governance to keep every action accountable — and NVIDIA and ServiceNow are making this a reality for the world’s enterprises.

Learn more about NVIDIA OpenShell and the NVIDIA AI-Q Blueprint

Greg Brockman Defends $30B OpenAI Stake: ‘Blood, Sweat, and Tears’


Two days before the Musk v. Altman trial began, Elon Musk asked OpenAI cofounder and president Greg Brockman about reaching a settlement. When Brockman suggested both sides drop their claims, Musk responded, “By the end of this week, you and Sam [Altman] will be the most hated men in America. If you insist, so be it.”

The message—which OpenAI’s lawyers made public on Sunday, and which Judge Yvonne Gonzalez Rogers subsequently refused to let the jury hear about—underscores what may be Musk’s larger goal in this trial. He appears to be trying to not only win over the jurors to potentially remove Brockman and CEO Sam Altman from power, but also stir up dirt on the two men and damage OpenAI’s public image.

As Brockman took the stand on Monday, Musk’s attorney Steven Molo quickly started questioning him about his compensation at OpenAI. Brockman revealed that his equity stake at OpenAI is currently worth more than $20 billion, and perhaps up to $30 billion. While Brockman initially promised to donate $100,000 to OpenAI when it was being set up, he said he ultimately never followed through.

Brockman has held a number of instrumental roles at OpenAI since he cofounded the company in 2015. In the startup’s early days, it operated out of his apartment in the Mission District of San Francisco. Today, he’s deeply involved with refocusing OpenAI on a few key products, such as Codex. In the past year, Brockman has also given millions to super PACs promoting AI and President Trump, and has previously said this increased political spending is related to OpenAI’s founding mission to create artificial general intelligence that benefits all of humanity.

In court on Monday, Molo tried to make the case that Brockman and Altman had essentially looted OpenAI’s original nonprofit, which Musk funded and helped create.

In its early days, OpenAI told investors and employees that its nonprofit mission took precedence over generating profit. Brockman testified that his financial interests are still, to this day, second to OpenAI’s nonprofit mission.

When OpenAI created its for-profit arm in 2019, which received assets from the nonprofit, Brockman testified that he was given a significant stake in the new entity. Early in OpenAI’s history, Brockman had referenced wanting to be a billionaire, writing in his personal journal, “Financially what will take me to $1B?”

On Monday, Molo pressed Brockman for several minutes about the vast wealth he had accumulated beyond his initial goal.

“Why not donate that $29 billion to the OpenAI nonprofit? Why didn’t you do that?” Molo asked. Brockman responded that he and others had poured “blood, sweat, and tears” into building OpenAI in the years since Musk left the company.

OpenAI’s foundation holds a stake of over $150 billion in the company, making it one of the richest nonprofits in history, Brockman said. That’s roughly five times Brockman’s ownership interest. Altogether, OpenAI employees hold about 25 percent of shares. The foundation has 27 percent. Brockman testified that OpenAI’s nonprofit had received less than $150 million from donors, implying Musk had been incidental to the company’s success and that the real drivers were those who stuck around to build out OpenAI.

Of course, Brockman’s stake in OpenAI could be worth much more than $30 billion if the company successfully goes public in the next two years. When asked whether OpenAI was exploring a potential IPO, Brockman said he believes so.

Nemotron Labs: What OpenClaw Agents Mean for Every Organization


Editor’s note: This post is part of the Nemotron Labs blog series, which explores how the latest open models, datasets and training techniques help businesses build specialized AI systems and applications on NVIDIA platforms. Each post highlights practical ways to use an open stack to deliver real value in production — from transparent research copilots to scalable AI agents.

By early 2026, the open source project OpenClaw had become a phenomenon. In January, its GitHub star count crossed 100,000 as developer interest surged. Community dashboards and traffic analytics showed more than 2 million visitors in a single week. By March, OpenClaw topped 250,000 stars — overtaking React to become the most-starred software project on GitHub in just 60 days.

Created by Peter Steinberger, OpenClaw is a self-hosted, persistent AI assistant designed to run locally or on private servers. The project drew attention for its accessibility and unbounded autonomy: Users could deploy an AI model locally without depending on cloud infrastructure or external application programming interfaces (APIs).

Most AI agents today are triggered by a prompt, complete a defined task and then stop running. A long-running autonomous agent, or “claw,” works differently. These agents run persistently in the background, completing tasks on their own and surfacing only what requires a human decision. They operate on a heartbeat: At regular intervals, they check their task list, evaluate what needs action, and either act or wait for the next cycle.

OpenClaw’s rapid adoption also sparked debate. Security researchers raised concerns about how self-hosted AI tools manage sensitive data, authentication and model updates. Others questioned whether local deployments could expose users to new risks — from unpatched server instances to malicious contributions in community forks. As contributors and maintainers worked to address these issues, OpenClaw’s rise prompted a broader conversation across the AI ecosystem about the trade-offs between openness, privacy and safety.

To help enhance the security and robustness of the OpenClaw project, NVIDIA is collaborating with Steinberger and the OpenClaw developer community to address potential vulnerabilities, as detailed in a recent blog post by OpenClaw.

NVIDIA contributes code and guidance focused on improving model isolation, better managing local data access and strengthening the processes for verifying community code contributions. The goal is to support the project’s momentum by contributing its security and systems expertise in an open, transparent way that strengthens the community’s work while preserving OpenClaw’s independent governance.

 To help make long-running agents safer for enterprises, NVIDIA also introduced NVIDIA NemoClaw, a reference implementation that uses a single command to install OpenClaw, the NVIDIA OpenShell secure runtime and NVIDIA Nemotron open models with hardened defaults for networking, data access and security. NemoClaw serves as a blueprint for organizations to deploy claws more securely.

Inference Demand Multiplies With Each AI Wave

AI has moved through four phases, and the time between each is shortening. Predictive AI took years to become mainstream. Generative AI moved faster. Reasoning AI arrived faster still. Autonomous AI — the wave OpenClaw represents — is setting an even faster pace.

What compounds with each wave is inference demand. Generative AI increased token usage over predictive AI. Reasoning AI increased it another 100x. Autonomous agents, which run continuously and act across long time horizons, drive inference demand up by another 1,000x over reasoning AI. Each wave multiplies the compute required.

This increase in token usage is enabling organizations to speed their productivity by orders of magnitude. For example, long-running agents can help researchers work through a problem overnight, iterate on a design across thousands of configurations, or monitor systems and surface only the anomalies that require human judgment — freeing up researchers’ work days for higher-value tasks.

Choosing the Tool: When to Deploy a ‘Claw’

While generative AI has become a staple for on-demand tasks, there are specific scenarios where the persistent “heartbeat” of a claw offers distinct advantages. Determining when to move from a standard prompt-based AI to a long-running agent often comes down to the nature of the workflow:

  • From “On-Demand” to “Always-On”: While standard models are excellent for immediate, human-triggered queries, claws are often better suited for tasks that require continuous background monitoring or periodic system checks without a manual start.
  • Managing High-Iteration Loops: For complex problems, like testing thousands of chemical combinations or simulating infrastructure stress tests, a claw can manage the sheer volume of iterations that might otherwise be bottlenecked by human intervention.
  • Shifting from Suggestions to Actions: In many workflows, standard AI is used to provide information or drafts. A claw is often considered when the goal is for the AI to move into the execution phase — interacting with APIs, updating databases or managing files across a long time horizon.
  • Resource Optimization: For massive, token-heavy reasoning tasks, deploying a local claw on dedicated hardware like an NVIDIA DGX Spark personal AI supercomputer allows for more predictable costs and data privacy compared with high-frequency cloud API calls.

How Are Organizations Using Long-Running Autonomous Agents?

The practical applications of long-running autonomous agents span every function and sector.

In financial services, agents continuously monitor trading systems and regulatory feeds, flagging material events before the morning review. In drug discovery, agents sweep new scientific literature, extracting relevant findings and updating internal databases in real time without researcher intervention — a process that previously took weeks.

In engineering and manufacturing, agents speed problem analysis by testing thousands of parameter combinations, ranking results and flagging the configurations worth examining — and all this can happen overnight. 

In IT operations, agents diagnose infrastructure incidents, apply known remediations and escalate only the novel problems — compressing average time to resolution from hours to minutes. At ServiceNow, AI specialists leveraging Apriel and NVIDIA Nemotron models can resolve 90% of tickets autonomously. 

How Can Companies Deploy Autonomous Agents Responsibly? 

Autonomous agents are hands-on. They can send communications, write files, call APIs and update live systems. When an agent produces a wrong action, there are real consequences. Getting the accountability framework right from the start is essential, and organizations deploying autonomous agents in production must treat governance as a first-order requirement.

Organizations need to see what their agents are doing, inspect their reasoning at each step, audit their actions and intervene when needed. 

Organizations deploying autonomous agents responsibly are focused on three priorities: 

  • An open, auditable framework: NemoClaw is built on OpenClaw’s MIT licensed codebase, which means organizations own the full agent harness. They can read, fork and modify every layer of how their agents are built and deployed. That transparency enables teams to understand and control the system at the code level. Running open source models like NVIDIA Nemotron locally keeps sensitive workloads, including patient records, legal documents, financial transactions and proprietary research, within the organization’s own environment, ensuring that trace data stays under organizational control.
  • Securing the runtime environment: NemoClaw runs agents inside OpenShell, a sandboxed environment that defines precisely what the agent can and cannot do, enforcing clear permission boundaries from the start. 
  • Local compute: NVIDIA DGX Spark supercomputers deliver data-center-class GPU performance in a deskside form factor built for continuous local inference that’s always on, with local model hosting and data that stays within the organization’s environment. NVIDIA DGX Station systems scale that capability for teams running multiple agents simultaneously across complex, sustained workloads. 

The organizations defining what autonomous agents do in practice are accumulating something valuable: months of live operational learning, governance frameworks developed through real workloads and agents that have absorbed the institutional context that makes them genuinely useful. This foundation will only deepen over time.

Get Started With NVIDIA NemoClaw

Access a step-by-step tutorial on how to build a more secure AI agent with NemoClaw on NVIDIA DGX Spark. Explore how NemoClaw can deploy more secure, always-on AI assistants with a single command.​ 

 

Experiment with NemoClaw, available on GitHub, and join the community of developers on Discord building with NemoClaw using NVIDIA Nemotron 3 Super and Telegram on DGX Spark.

Stay up to date on agentic AI, NVIDIA Nemotron and more by subscribing to NVIDIA AI news, joining the community and following NVIDIA AI on LinkedIn, Instagram, X and Facebook.  

Explore self-paced video tutorials and livestreams.



NVIDIA Launches Nemotron 3 Nano Omni Model, Unifying Vision, Audio and Language for up to 9x More Efficient AI Agents


AI agent systems today juggle separate models for vision, speech and language — losing time and context as they pass data from one model to the other.

Unveiled today, NVIDIA Nemotron 3 Nano Omni is an open multimodal model that brings these capabilities together into one system, enabling agents to deliver faster, smarter responses with advanced reasoning across video, audio, image and text. This best-in-class model gives enterprises and developers a production path for more efficient and accurate multimodal AI agents with full deployment flexibility and control. 

Nemotron 3 Nano Omni sets a new efficiency frontier for open multimodal models with leading accuracy and low cost, topping six leaderboards for complex document intelligence, and video and audio understanding.

AI and software companies already adopting Nemotron 3 Nano Omni include Aible, Applied Scientific Intelligence (ASI), Eka Care, Foxconn, H Company, Palantir and Pyler, with Dell Technologies, Docusign, Infosys, K-Dense, Lila, Oracle and Zefr evaluating the model. 

“To build useful agents, you can’t wait seconds for a model to interpret a screen,” said Gautier Cloix, CEO of H Company. “By building on Nemotron 3 Nano Omni, our agents can rapidly interpret full HD screen recordings — something that wasn’t practical before. This isn’t just a speed boost: It’s a fundamental shift in how our agents perceive and interact with digital environments in real time.”

Nemotron 3 Nano Omni Enables Faster, Leaner Multimodal Agents

Consider an AI agent for customer support processing a screen recording while analyzing uploaded call audio and checking data logs — or an agent for finance tasked with parsing PDFs, spreadsheets, charts and voice notes. Today, most agentic systems accomplish these tasks with separate models for vision, speech and language. 

This approach increases latency through repeated inference passes, fragments context across modalities, and adds cost and inaccuracies over time.

By combining vision and audio encoders within its 30B-A3B, hybrid mixture-of-experts architecture, Nemotron 3 Nano Omni eliminates the need for separate perception models, driving inference efficiency at scale. It pairs this efficiency with strong multimodal perception accuracy, enabling AI systems to achieve 9x higher throughput than other open omni models with the same interactivity. The result is lower costs and better scalability without sacrificing responsiveness or quality.

In agentic systems, Nemotron 3 Nano Omni can work alongside proprietary cloud models or other NVIDIA Nemotron open models — such as Nemotron 3 Super for high-frequency execution or Nemotron 3 Ultra for complex planning — as well as proprietary models from other providers, to power sub-agents for agentic workflows such as computer use, document intelligence and audio-video reasoning.

  • Computer use agents — Nemotron 3 Nano Omni powers the perception loop for agents navigating graphical user interfaces, reasoning over onscreen content and understanding user interface state over time. H Company’s latest computer usage agent, powered by Nemotron 3 Nano Omni, uses a native input resolution of 1920×1080 pixels to achieve high-fidelity visual reasoning. In preliminary evaluations on the OSWorld benchmark, this integration showed a significant leap in navigating complex graphical interfaces and used Nemotron 3 Nano Omni’s ability to process very high-resolution images. 
  • Document intelligence — Interprets documents, charts, tables, screenshots and mixed-media inputs, enabling agents to reason across visual structure and text content coherently. Critical for enterprise analysis and compliance workflows.
  • Audio and video understanding — For customer service, research and monitoring workflows, Nemotron 3 Nano Omni maintains audio-video context, tying what was said, shown and documented into a single reasoning stream instead of disconnected summaries.

Open and Customizable, Deployable Anywhere

Nemotron 3 Nano Omni is released with open weights, datasets and training techniques — giving organizations full transparency and control over how the model is customized and deployed. 

Developers can use tools like NVIDIA NeMo for customization, evaluation and optimization for domain-specific use cases. Because the Nemotron family of models is open, organizations can deploy them in environments that meet regulatory, sovereignty or data localization requirements.

The Nemotron 3 family — including Nano, Super and Ultra models — has seen over 50 million downloads in the past year. Omni extends the family’s capabilities into multimodal and agentic domains. 

The model is available on Hugging Face, OpenRouter and build.nvidia.com as an NVIDIA NIM microservice and through a broad ecosystem of NVIDIA Cloud Partners, inference platforms and cloud service providers. 

Its open, lightweight architecture supports consistent deployment from local systems like NVIDIA Jetson modules, NVIDIA DGX Spark and DGX Station to data center and cloud environments. 

Visit the NVIDIA technical blog for tutorials, cookbooks and deployment guides for Nemotron 3 Nano Omni use cases. Stay up to date on agentic AI, NVIDIA Nemotron and more by subscribing to NVIDIA news, joining the community and following NVIDIA AI on LinkedIn, Instagram, X and Facebook.  

Explore self-paced video tutorials and livestreams.



Adobe Agents Unlock Breakthrough Creative Intelligence With NVIDIA and WPP



AI agents are transforming how work gets done across all industries, accelerating everything from content creation to decision-making.

NVIDIA’s expanded strategic collaborations with Adobe and WPP are bringing agentic AI to the center of enterprise marketing operations across creative production and customer experience orchestration. 

As demand for personalized customer experiences surges, brands require intelligent systems that can plan, create, produce and activate content continuously — without compromising control, governance or brand integrity.

Consider a global retailer delivering the right offer, image, copy and price, across millions of product, audience and channel combinations — updated in minutes instead of months. 

For marketing and creative teams, that means moving from one-size-fits-all campaigns to tailored experiences that are always on, always relevant and on brand. All of it is powered by intelligent systems that continuously generate and deliver content without sacrificing control, governance or brand integrity.

The expanded collaborations bring together three complementary strengths: Adobe’s creative and customer experience platforms and the new Adobe CX Enterprise Coworker, WPP’s global media and marketing expertise, and NVIDIA’s accelerated computing and software stack, including NVIDIA Nemotron open models, NVIDIA Agent Toolkit and the NVIDIA OpenShell secure runtime for building and running secure agentic AI systems.

As these agents begin orchestrating multistep workflows, tapping sensitive data and triggering actions across marketing stacks, enterprises need a way to enforce clear rules of engagement so every operation remains compliant, on brand and within defined risk boundaries.

Powered by the NVIDIA OpenShell runtime, every agent operates within a secure, isolated environment, delivering enterprise-grade control, consistency and auditability across the entire marketing lifecycle, with verifiable policy management, answering the question, “What can the agent do?” and not just, “What policy is in place?” 

In governed environments, enterprises can also keep key workflows and intelligence services inside their trust boundary, including securely invoking Adobe CX Intelligence as part of customer experience agents.

A live demo of CX Enterprise Coworker — powered by NVIDIA Agent Toolkit, including the OpenShell runtime and Nemotron models — will be featured during Adobe Summit’s day-two keynote taking place Tuesday, April 21, at 9 a.m. PT.

The collaboration enables:

  • End-to-end agentic workflows: Adobe is developing creative and marketing agents that can generate, adapt and version on-brand assets. Adobe’s CX Enterprise Coworker orchestrates downstream customer experience workflows from personalization to activation, closing the loop between content creation and customer engagement.
  • Controlled execution with NVIDIA OpenShell: Agents run in a policy-based, containerized sandbox designed to keep execution governed, observable and auditable, helping enterprises safely deploy long-running agentic workflows on premises or in the cloud.
  • Commercially safe content at scale: Adobe Firefly Foundry, accelerated by NVIDIA AI infrastructure, can help organizations deeply tune custom models on their proprietary assets, enabling agents to generate commercially safe content at scale and aligned to brand identity.
  • A 3D digital twins solution for scalable marketing production: Adobe’s cloud-native 3D digital twin solution is now generally available, built on NVIDIA Omniverse libraries and OpenUSD. 3D digital twins serve as persistent product identities that agents use to automate and scale high-fidelity content creation across formats, markets and configurations.

Creative Intelligence Meets Performance Intelligence With Policy-Governed Agents

Governed environments such as the ones enabled by this collaboration act as a set of “guardrails” that keep AI operations observable and auditable, preventing the system from acting outside of a company’s specific data boundaries or brand rules.

By combining Adobe’s creative platforms, WPP’s media and marketing expertise and NVIDIA’s secure infrastructure with CX Enterprise Coworker, brands no longer have to choose between speed and safety. Autonomous agents can now generate, adapt and activate content at scale while operating within governed, policy-driven environments.

The result is a new foundation for agentic marketing — where creative intelligence, performance and trust are built in from the start and delivered at global scale.

Watch NVIDIA founder and CEO Jensen Huang’s Adobe Summit fireside chat with Adobe CEO Shantanu Narayen below.

RTX to Spark: Gemma 4 Accelerated for Agentic AI


Open models are driving a new wave of on-device AI, extending innovation beyond the cloud to everyday devices. As these models advance, their value increasingly depends on access to local, real-time context that can turn meaningful insights into action. 

Designed for this shift, Google’s latest additions to the Gemma 4 family introduce a class of small, fast and omni-capable models built for efficient local execution across a wide range of devices.  

Google and NVIDIA have collaborated to optimize Gemma 4 for NVIDIA GPUs, enabling efficient performance across a range of systems — from data center deployments to NVIDIA RTX-powered PCs and workstations, the NVIDIA DGX Spark personal AI supercomputer and NVIDIA Jetson Orin Nano edge AI modules.

Gemma 4: Compact Models Optimized for NVIDIA GPUs 

The latest additions to the Gemma 4 family of open models spanning E2B, E4B, 26B and 31B variants  are designed for efficient deployment from edge devices to high-performance GPUs.  

All configurations measured using Q4_K_M quantizations BS = 1, ISL = 4096 and OSL = 128 on NVIDIA GeForce RTX 5090 and Mac M3 Ultra desktops. Token generation throughput measured on llama.cpp b7789, using the llama-bench tool.

This new generation of compact models supports a range of tasks, including: 

  • Reasoning: Strong performance on complex problem-solving tasks.  
  • Coding: Code generation and debugging for developer workflows.   
  • Agents: Native support for structured tool use (function calling).  
  • Vision, Video and Audio Capabilities: Enables rich multimodal interactions for object recognition, automated speech recognition, and document or video intelligence. 
  • Interleaved Multimodal Input: Mix text and images in any order within a single prompt.  
  • Multilingual: Out-of-the-box support for 35+ languages, pretrained on 140+ languages. 

The E2B and E4B models are built for ultraefficient, low-latency inference at the edge, running completely offline with near-zero latency across many devices including Jetson Nano modules. 

The 26B and 31B modelsare designed for high-performance reasoning and developer-centric workflows, making them well suited for agentic AI. Optimized to deliver state-of-the-art, accessible reasoning, these models run efficiently on NVIDIA RTX GPUs and DGX Spark — powering development environments, coding assistants and agent-driven workflows.  

As local agentic AI continues to gain momentum, applications like OpenClaw are enabling always-on AI assistants on RTX PCs, workstations and DGX Spark. The latest Gemma 4 models are compatible with OpenClaw, allowing users to build capable local agents that draw context from personal files, applications and workflows to automate tasks. Learn how to run OpenClaw for free on RTX GPUs and DGX Spark or using the DGX Spark OpenClaw playbook. 

Getting Started: Gemma 4 on RTX GPUs and DGX Spark 

NVIDIA has collaborated with Ollama and llama.cpp to provide the best local deployment experience for each of the Gemma 4 models.    

To use Gemma 4 locally, users can download Ollama to run Gemma 4 models or install llama.cpp and pair it with the Gemma 4 GGUF Hugging Face checkpoint. Additionally, Unsloth provides day-one support with optimized and quantized models for efficient local fine-tuning and deployment via Unsloth Studio. Start running and fine-tuning Gemma 4 in Unsloth Studio today. 

Running open models like the Gemma 4 family on NVIDIA GPUs achieves optimal performance because NVIDIA Tensor Cores accelerate AI inference workloads to deliver higher throughput and lower latency for local execution. Plus, the CUDA software stack ensures broad compatibility across leading frameworks and tools, enabling new models to run efficiently from day one.  

This combination allows open models like Gemma 4 to scale across a wide range of systems — from Jetson Orin Nano at the edge to RTX PCs, workstations and DGX Spark — without requiring extensive optimization. 

Check out the NVIDIA technical blog for more details on how to get started with Gemma 4 on NVIDIA GPUs and learn more about NVIDIA’s work on open models. 

#ICYMI: The Latest Updates for RTX AI PCs 

✨ Catch up on RTX AI Garage blogs for a host of agentic AI announcements from NVIDIA GTC, such as new open models for local agents. These models include NVIDIA Nemotron 3 Nano 4B and Nemotron 3 Super 120B, and optimizations for Qwen 3.5 and Mistral Small 4. 

 NVIDIA recently introduced NVIDIA NemoClaw, an open source stack that optimizes OpenClaw experiences on NVIDIA devices by increasing security and supporting local models.  

🚀 Accomplish.ai announced Accomplish FREE, a no-cost version of its open source desktop AI agent with built-in models. It harnesses NVIDIA GPUs to run open weight models locally, while a hybrid router dynamically balances workloads between local RTX hardware and the cloud — enabling fast, private, zero-configuration execution without requiring an application programming interface key. 

Plug in to NVIDIA AI PC on FacebookInstagramTikTok and X — and stay informed by subscribing to the RTX AI PC newsletter. 

Follow NVIDIA Workstation on LinkedIn and X 



AI Research Is Getting Harder to Separate From Geopolitics


The world’s top AI research conference, the Conference on Neural Information Processing Systems—better known as NeurIPS—became the latest organization this week to become embroiled in a growing clash between geopolitics and global scientific collaboration. The conference’s organizers announced and then quickly reversed controversial new restrictions for international participants after Chinese AI researchers threatened to boycott the event.

“This is a potential watershed moment,” says Paul Triolo, a partner at the advisory firm DGA-Albright Stonebridge who studies US-China relations. Triolo argues that attracting Chinese researchers to NeurIPS is beneficial to US interests, but some American officials have pushed for American and Chinese scientists to decouple their work—especially in AI, which has become a particularly sensitive topic in Washington.

The incident could deepen political tensions around AI research, as well as dissuade Chinese scientists from working at US universities and tech companies in the future. “At some level now it is going to be hard to keep basic AI research out of the [political] picture,” Triolo says.

In its annual handbook for paper submissions, issued in mid-March, NeurIPS organizers announced updated restrictions for participation. The rules stated that the event could not provide services including “peer review, editing, and publishing” to any organizations subject to US sanctions, and linked to a database of sanctioned entities. It included companies and organizations on the Bureau of Industry and Security’s entity list and those on another list with alleged ties to the Chinese military.

The new rules would have affected researchers at Chinese companies like Tencent and Huawei who regularly present work at NeurIPS. The database also includes entities from other countries such as Russia and Iran. The US places limits on doing business with these organizations, but there are no rules around academic publishing or conference participation.

The NeurIPS handbook has since been updated to specify that the restrictions apply only to Specially Designated Nationals and Blocked Persons, a list used primarily for terrorist groups and criminal organizations.

“In preparing the NeurIPS 2026 handbook, we included a link to a US government sanctions tool that covers a significantly broader set of restrictions than those NeurIPS is actually required to follow,” the event’s organizers said in a statement issued Friday. “This error was due to miscommunication between the NeurIPS Foundation and our legal team.”

Before they reversed course, the conference organizers initially said that the new rule was “about legal requirements that apply to the NeurIPS Foundation, which is responsible for complying with sanctions,” adding that it was seeking legal consultation on the issue.

Immediate Backlash

The new rule drew swift backlash from AI researchers around the world, particularly in China, which produces a large quantity of cutting-edge machine learning papers and is home to a growing share of the world’s top AI talent. Several academic groups there issued statements condemning the measure and, more importantly, discouraging Chinese academics from attending NeurIPS in the future. Some urged Chinese academics to contribute instead to domestic research conferences, potentially helping increase the country’s influence in relevant science and tech fields.

The China Association of Science and Technology (CAST), an influential government-affiliated organization for scientists and engineers, said Thursday that it would stop providing funding for Chinese scholars traveling to attend NeurIPS and would use the money instead to support domestic and international conferences that “respect the rights of Chinese scholars.”

CAST also said it will no longer count publications at the 2026 NeurIPS conference as academic achievements when evaluating future research funding. It’s unclear if the organization will reverse course now that NeurIPS has walked back the new rule.

Washington Post’s Surveillance Pricing Under Fire From Dems Who Want to Ban the Practice



Some subscribers to the Washington Post have been receiving emails that their subscription rates will be going up, according to the Washingtonian. That part isn’t surprising, given the fact that Post owner Jeff Bezos has reportedly been upset that the newspaper is losing money, especially since ditching about half of his workforce. But some folks who scrolled down to the bottom of the email were surprised when they read about how the new price was determined: “This price was set by an algorithm using your personal data.”

It’s a concept called surveillance pricing, and it’s not entirely new. People can often be charged different prices for the same product, depending on any number of factors. If your phone battery is low, rideshare companies like Uber or Lyft might charge more because they know you’re desperate. Instacart was recently caught charging up to 23% more to some shoppers based on unknown criteria.

Many Democrats aren’t happy about it, including Rep. Greg Casar of Texas. On Monday,  Casar wrote on Bluesky that surveillance pricing “should be illegal,” adding, “I have a bill to ban it.”

Last year, Casar and Rashida Tlaib of Michigan introduced legislation called the Stop AI Price Gouging and Wage Fixing Act. And last month, two other Democrats in the Senate, Ben Ray Luján from New Mexico and Jeff Merkley from Oregon, introduced very similar legislation called the Stop Price Gouging in Grocery Stores Act of 2026.

The Washington Post hasn’t explained how it determines pricing by using personal data. But there could be a number of factors, including zip code, estimated income, and purchase history. Bezos, the founder of Amazon, presumably has more data on what people buy than just about anyone in the country. And he’s a big supporter of utilizing AI to maximize profits.

The problem is that AI can’t really make up for losses any business might incur by offering a bad product. The newspaper first hemorrhaged subscribers—250,000 in one week alone—after Bezos stopped the Washington Post editorial board from endorsing Kamala Harris in the 2024 presidential election against Donald Trump.

The Washington Post had no reporters at the Academy Awards on Sunday, according to the paper’s former culture writer. And it was the last of the major news outlets to report that the U.S. had started bombing Iran late last month. The paper has purged any writer on the opinion side deemed to be liberal and has instead become a mouthpiece for the Bezos worldview—a worldview that happens to align perfectly with that of the Trump regime.

Bezos has been criticized for buying the distribution rights to First Lady Melania Trump’s “documentary” Melania for a whopping $40 million, but the movie itself helps explain why he’d bother. There are several shots of the Trumps with Big Tech oligarchs like Elon Musk, Tim Cook, and Bezos himself. All of these guys need something from Trump, whether it’s space contracts or just tariff relief.

News of what Bezos has in store for the future of his newspaper doesn’t instill confidence that it can survive much longer as a respected institution. The Washington Post’s news side still breaks major stories, but the New York Times reports Bezos’s big idea was to chop the newsroom’s budget in half and demand twice the productivity through AI. Columnist Dana Milbank and economics correspondent Jeff Stein both announced they were leaving the Post on Monday.

Businesses are increasingly turning to algorithms to set their prices, and it doesn’t look like that’s going to change anytime soon unless legislators get involved. At least a dozen states are considering legislation about surveillance pricing, but so far, only New York has passed a law in this area. Unfortunately, it doesn’t have much teeth since it only requires companies to notify consumers when a price has been set with AI.

On the other hand, New York’s law may be the only reason we know that the Washington Post is using AI for subscription rates. The paper has little other incentive to include the disclaimer: “This price was set by an algorithm using your personal data.” Notifying consumers may not fix the problem of surveillance pricing, but at least people can take it into account when deciding where they want to spend their money.

New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI



Launched today, NVIDIA Nemotron 3 Super is a 120‑billion‑parameter open model with 12 billion active parameters designed to run complex agentic AI systems at scale. 

Available now, the model combines advanced reasoning capabilities to efficiently complete tasks with high accuracy for autonomous agents.

AI-Native Companies: Perplexity offers its users access to Nemotron 3 Super for search and as one of 20 orchestrated models in Computer. Companies offering software development agents like CodeRabbit, Factory and Greptile are integrating the model into their AI agents along with proprietary models to achieve higher accuracy at lower cost. And life sciences and frontier AI organizations like Edison Scientific and Lila Sciences will power their agents for deep literature search, data science and molecular understanding.

Enterprise Software Platforms: Industry leaders such as Amdocs, Palantir, Cadence, Dassault Systèmes and Siemens are deploying and customizing the model to automate workflows in telecom, cybersecurity, semiconductor design and manufacturing. 

As companies move beyond chatbots and into multi‑agent applications, they encounter two constraints.

The first is context explosion. Multi‑agent workflows generate up to 15x more tokens than standard chat because each interaction requires resending full histories, including tool outputs and intermediate reasoning. 

Over long tasks, this volume of context increases costs and can lead to goal drift, where agents lose alignment with the original objective.

The second is the thinking tax. Complex agents must reason at every step, but using large models for every subtask makes multi-agent applications too expensive and sluggish for practical applications.

Nemotron 3 Super has a 1‑million‑token context window, allowing agents to retain full workflow state in memory and preventing goal drift.

Nemotron 3 Super has set new standards, claiming the top spot on Artificial Analysis for efficiency and openness with leading accuracy among models of the same size. 

The model also powers the NVIDIA AI-Q research agent to the No. 1 position on DeepResearch Bench and DeepResearch Bench II leaderboards, benchmarks that measure an AI system’s ability to conduct thorough, multistep research across large document sets while maintaining reasoning coherence. 

Hybrid Architecture

Nemotron 3 Super uses a hybrid mixture‑of‑experts (MoE) architecture that combines three major innovations to deliver up to 5x higher throughput and up to 2x higher accuracy than the previous Nemotron Super model. 

  • Hybrid Architecture: Mamba layers deliver 4x higher memory and compute efficiency, while transformer layers drive advanced reasoning.
  • MoE: Only 12 billion of its 120 billion parameters are active at inference. 
  • Latent MoE: A new technique that improves accuracy by activating four expert specialists for the cost of one to generate the next token at inference.
  • Multi-Token Prediction: Predicts multiple future words simultaneously, resulting in 3x faster inference.

On the NVIDIA Blackwell platform, the model runs in NVFP4 precision. That cuts memory requirements and pushes inference up to 4x faster than FP8 on NVIDIA Hopper, with no loss in accuracy. 

Open Weights, Data and Recipes

NVIDIA is releasing Nemotron 3 Super with open weights under a permissive license. Developers can deploy and customize it on workstations, in data centers or in the cloud.

The model was trained on synthetic data generated using frontier reasoning models. NVIDIA is publishing the complete methodology, including over 10 trillion tokens of pre- and post-training datasets, 15 training environments for reinforcement learning and evaluation recipes. Researchers can further use the NVIDIA NeMo platform to fine-tune the model or build their own. 

Use in Agentic Systems

Nemotron 3 Super is designed to handle complex subtasks inside a multi-agent system. 

A software development agent can load an entire codebase into context at once, enabling end-to-end code generation and debugging without document segmentation. 

In financial analysis it can load thousands of pages of reports into memory,  eliminating the need to re-reason across long conversations, which improves efficiency. 

Nemotron 3 Super has high-accuracy tool calling that ensures autonomous agents reliably navigate massive function libraries to prevent execution errors in high-stakes environments, like autonomous security orchestration in cybersecurity.

Availability

NVIDIA Nemotron 3 Super, part of the Nemotron 3 family, can be accessed at build.nvidia.com, Perplexity, OpenRouter and Hugging Face. Dell Technologies is bringing the model to the Dell Enterprise Hub on Hugging Face, optimized for on-premise deployment on the Dell AI Factory, advancing multi-agent AI workflows. HPE is also bringing NVIDIA Nemotron to its agents hub to help ensure scalable enterprise adoption of agentic AI. 

Enterprises and developers can deploy the model through several partners:

The model is packaged as an NVIDIA NIM microservice, allowing deployment from on-premises systems to the cloud.

Stay up to date on agentic AI, NVIDIA Nemotron and more by subscribing to NVIDIA AI news, joining the community, and following NVIDIA AI on LinkedIn, Instagram, X and Facebook.

Explore self-paced video tutorials and livestreams.



How to watch Jensen Huang’s Nvidia GTC 2026 keynote


Nvidia kicks off its annual GTC developer conference in San Jose, California, next week with CEO Jensen Huang’s keynote scheduled for Monday at 11am PT / 2pm ET.

GTC — which stands for GPU Technology Conference — is Nvidia’s flagship annual event, where the chipmaker typically uses the spotlight to announce new products, champion partnerships, and lay out its vision for the future of computing. Huang’s keynote will focus on Nvidia’s role in the future of computing and AI. You can watch the two-hour address in person at the SAP Center or livestream the talk on the event’s website.

The broader three-day event is focused on what’s coming next for AI across industries including healthcare, robotics, and autonomous vehicles, among others.

On the software side, it’s rumored that Nvidia will release an open source platform for enterprise AI agents, dubbed NemoClaw, as originally reported by Wired. The platform would give businesses a structured way to build and deploy AI agents (software that can carry out multi-step tasks autonomously) and would position Nvidia to mirror similar offerings from companies like OpenAI.

On the hardware side, the company is also rumored to be releasing a new chip designed to accelerate the AI inference process — the process by which an AI model applies what it has learned to generate responses or make decisions, as distinct from the initial training process, which requires far more computing power. Faster, cheaper inference is widely seen as one of the last bottlenecks to scaling AI applications broadly. The chip, if confirmed, would represent Nvidia’s latest bid to dominate not just the training market, where it already commands an estimated 80% share, but the inference market as well, where competition from custom chips built by Google, Amazon and others is fast intensifying.

Kevin Cook, a senior equity strategist at Zacks Investment Research, told TechCrunch that attendees should also expect to learn what the company plans to do with its relationship with Groq, the inference company Nvidia reportedly paid $20 billion late last year to license its technology. There’s a lot of curiosity around this tie-up, given that Jonathan Ross, Groq’s founder, Sunny Madra, Groq’s President, and other members of the Groq team agreed to join Nvidia to help advance and scale that licensed tech.

There will, of course, also be a range of partnership announcements and demonstrations showcasing Nvidia’s AI capabilities across industries.

Techcrunch event

San Francisco, CA
|
October 13-15, 2026