NVIDIA and Google Cloud Empower the Next Wave of AI Builders



At this year’s Google I/O conference, NVIDIA and Google Cloud are accelerating the work of more than 100,000 developers in the companies’ joint developer community, which provides curated learning paths, hands-on labs and events that help them build using the full-stack NVIDIA AI platform on Google Cloud. 

Launched at Google I/O last year, the community brings together developers, data scientists and machine learning engineers who want to sharpen their AI skills on the latest NVIDIA and Google Cloud technologies. 

New additions for the community are rolling out this year, including a learning path for using the JAX library on NVIDIA GPUs, a new NVIDIA Dynamo codelab focused on inference optimizations and monthly developer livestreams.

Over the last year, the community has become a go‑to hub for AI builders using NVIDIA‑accelerated tools for data science and machine learning. Members have built production‑ready retrieval-augmented generation applications on Google Kubernetes Engine (GKE) and instrumented observability for agent workloads.

These AI builders are also experimenting with new large language model research and prototyping hybrid on‑premises and cloud inference for real‑world use cases like sports analytics and enterprise data pipelines. 

Building With Google DeepMind’s Gemma, NVIDIA Nemotron and Open Frameworks

NVIDIA and Google Cloud are equipping developers with learning resources and hands-on labs that combine NVIDIA libraries, open models and tools with Google Cloud’s AI platform — so they can build optimized, production‑ready AI applications faster.

For example, developers can accelerate data science and analytics with the NVIDIA cuDF library in Google Colab Enterprise or Dataproc, or deploy multi-agent applications by combining Google DeepMind’s Gemma 4 models, NVIDIA Nemotron open models and Google Agent Development Kit with Google Cloud G4 VMs powered by NVIDIA RTX PRO 6000 Blackwell GPUs in Google Cloud Run or with spot instances. 

NVIDIA and Google Cloud work closely across open frameworks like JAX so developers can build, scale and productize JAX workloads on NVIDIA AI infrastructure on Google Cloud — from single‑GPU experiments to multi‑rack deployments — while getting strong performance and a consistent experience. 

This work extends to Google Cloud AI Hypercomputer, where the MaxText framework uses these JAX optimizations to train large models efficiently on NVIDIA GPUs.

Building on the same foundation, NVIDIA Dynamo on GKE helps developers optimize large-scale inference — including mixture-of-experts models — so they can serve AI applications more efficiently with NVIDIA accelerated infrastructure on Google Cloud.

To help developers get hands-on with these capabilities, a new learning path on running and scaling JAX on NVIDIA GPUs and a new NVIDIA Dynamo on GKE inference codelab will become available next month for members in the Google Cloud and NVIDIA developer community.

Advancing Responsible AI With Google DeepMind’s SynthID and NVIDIA Cosmos

AI agents are increasingly built from a system of AI models — combining proprietary and open source models that reason, plan and act on users’ behalf. 

Amid this shift, trust and transparency are foundational, so developers and organizations can understand how these systems work and what they generate.

NVIDIA was the first industry partner to collaborate with Google DeepMind on SynthID, an AI watermarking technology that embeds robust digital watermarks directly into AI‑generated content, which helps preserve the integrity of outputs from NVIDIA Cosmos world foundation models available on build.nvidia.com.

Cosmos models provide rich 3D perception and simulation capabilities for robots, autonomous machines and other physical AI systems, while SynthID brings content transparency to the imagery and video they rely on. 

Together, they help preserve the integrity of AI‑generated content so developers can build and deploy agentic applications more responsibly across cloud, edge and real‑world environments.

Building on a Full-Stack NVIDIA and Google Cloud Platform

This year, Google I/O is putting the spotlight on new agentic experiences and tools for developers — and NVIDIA and Google Cloud are focused on ensuring builders have the infrastructure, software and learning resources they need to make the most of them. 

For developers in the community building on NVIDIA and Google Cloud, the skills they learn and the tools they adopt scale with them, taking projects from prototype to enterprise‑grade workloads.

At Google Cloud Next, Google Cloud and NVIDIA expanded their full‑stack platform to help developers train, deploy and operationalize agents on Google Cloud. This collaboration includes work on NVIDIA Vera Rubin-powered A5X instances, Google DeepMind Gemini models and more, and is being harnessed by leading AI labs and enterprises including OpenAI, Thinking Machines Lab, Schrödinger, Salesforce, Snap and CrowdStrike. Learn more in this blog.

Join the NVIDIA and Google Cloud developer community to connect with other builders and stay up to date on new tools, developer events and programs.

NVIDIA and ServiceNow Partner on New Autonomous AI Agents for Enterprises



Enterprise AI has learned to generate. It has learned to reason. Now companies are asking the next question: How should AI act?

Early agent systems have shown what’s possible, moving beyond simple prompts to take on more complex tasks. The next step is bringing those capabilities into enterprise environments — where agents must operate with context, control and consistency across real workflows.

At ServiceNow Knowledge 2026, NVIDIA founder and CEO Jensen Huang joined ServiceNow chairman and CEO Bill McDermott during the opening keynote to discuss the next phase of enterprise AI. 

The companies are expanding their collaboration across the full stack, delivering specialized autonomous AI agents that are safe and easy to adopt — powered by NVIDIA accelerated computing, open models, domain-specific skills and secure agent execution software, and bringing together enterprise workflow context from ServiceNow Action Fabric and governance from ServiceNow AI Control Tower.

ServiceNow is introducing Project Arc, a long-running, self-evolving autonomous desktop agent designed for knowledge workers, including developers, IT teams and administrators. 

Unlike standalone AI agents, Project Arc connects natively to the ServiceNow AI Platform through ServiceNow Action Fabric to bring governance, auditability and workflow intelligence to every action the autonomous desktop agent takes. It can access the local file systems, terminals and applications installed on a machine to complete complex, multistep tasks that traditional automation can’t handle, but with the controls enterprises actually need to deploy AI at scale.

The work is designed around three requirements every company will need for long-running, autonomous agents: open models and domain-specific skills that can be customized, security that helps agents act without exposing sensitive data or systems, and AI factories that deliver efficient tokenomics.

Bringing this level of autonomy to enterprises requires control from the start.

Project Arc uses NVIDIA OpenShell, an open source secure runtime for developing and deploying autonomous agents in sandboxed, policy-governed environments. ServiceNow is building on and contributing to OpenShell to advance a common foundation for secure, enterprise-grade agent execution. With OpenShell, enterprises can define what an agent can see, which tools it can use and how each action is contained. 
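The idea of defining what an agent can see and which tools it can use can be pictured as a deny-by-default allowlist check. The sketch below is illustrative only: `AgentPolicy`, `is_allowed`, the field names and the example paths are hypothetical, and none of it reflects OpenShell's actual API or policy format.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical policy: readable path roots and callable tools."""
    readable_roots: tuple = ()
    allowed_tools: frozenset = field(default_factory=frozenset)

    def can_read(self, path: str) -> bool:
        target = PurePosixPath(path)
        # Reject parent-directory hops, since pure paths are not normalised.
        if ".." in target.parts:
            return False
        # A path is visible only if it sits under one of the allowed roots.
        return any(target.is_relative_to(root) for root in self.readable_roots)

    def can_use(self, tool: str) -> bool:
        return tool in self.allowed_tools


def is_allowed(policy: AgentPolicy, tool: str, path=None) -> bool:
    """Deny by default: the tool must be allowed and any path must be readable."""
    if not policy.can_use(tool):
        return False
    return path is None or policy.can_read(path)


# Example policy with made-up values.
policy = AgentPolicy(
    readable_roots=("/workspace/tickets",),
    allowed_tools=frozenset({"read_file", "search"}),
)
print(is_allowed(policy, "read_file", "/workspace/tickets/123.txt"))
print(is_allowed(policy, "shell"))
```

A real runtime would enforce such rules outside the agent process, for example at the sandbox boundary, so the agent cannot bypass them. The check here only shows the shape of the decision.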

“Project Arc represents the next step in our ongoing collaboration with NVIDIA, bringing autonomous execution to the desktop,” said Jon Sigler, executive vice president and general manager of AI Platform at ServiceNow. “By combining OpenShell’s runtime layer with ServiceNow AI Control Tower, and powered by ServiceNow Action Fabric, we’re delivering the governance and security that enterprise AI requires.” 

Open Models and Agent Skills Scale Enterprise AI

To be effective, enterprise AI systems must be adaptable. NVIDIA and ServiceNow are building on an open ecosystem that allows organizations to tailor models and applications to their specific domains and data.

NVIDIA agent skills enable specialized agents, such as ServiceNow AI Specialists, to deliver targeted capabilities across enterprise workflows. For example, the NVIDIA AI-Q Blueprint for building specialized deep research agents empowers ServiceNow AI Specialists to gather context, synthesize information and support more complex decision-making across business functions. 

In addition, the NVIDIA Agent Toolkit, including NVIDIA Nemotron open models, provides flexible building blocks and specialized skills for developing customized AI applications. To help ensure these systems perform reliably in real-world conditions, the companies are also advancing NOWAI-Bench, an open benchmarking suite for enterprise AI agents, integrated with the NVIDIA NeMo Gym library. NOWAI-Bench includes EnterpriseOps-Gym, one of the industry’s most challenging enterprise agent benchmarks, where Nemotron 3 Super currently ranks No. 1 among open source models.

Unlike general benchmarks, these evaluations focus on multistep workflows — where enterprise AI systems often encounter real challenges — helping teams build agents that perform reliably in production environments.

Efficient AI Factories

As AI agents become long running and always on, scaling them across millions of workflows requires not just capability but efficiency — making token economics central to enterprise AI.

NVIDIA AI factories are built to deliver the lowest-cost, most-efficient tokenomics for production AI. The NVIDIA Blackwell platform delivers more than 50x greater token output per watt than NVIDIA Hopper, resulting in nearly 35x lower cost per million tokens. For enterprises running agents across millions of workflows, that efficiency can determine how quickly AI moves from pilots to broad production use.
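To see what a 35x reduction in cost per million tokens could mean for a monthly bill, the arithmetic can be sketched as follows. The baseline price and token volume are assumptions chosen for illustration, not published figures. Only the 35x factor comes from the text.

```python
def monthly_cost(tokens_per_month: float, cost_per_million: float) -> float:
    """Cost of serving a monthly token volume at a given price per million tokens."""
    return tokens_per_month / 1_000_000 * cost_per_million


# Assumed inputs for illustration only, not measured or published prices.
BASELINE_COST_PER_MILLION = 3.50    # hypothetical prior-generation cost, in dollars
TOKENS_PER_MONTH = 2_000_000_000    # hypothetical agent workload
COST_REDUCTION_FACTOR = 35          # the "nearly 35x" figure quoted in the text

baseline = monthly_cost(TOKENS_PER_MONTH, BASELINE_COST_PER_MILLION)
improved = monthly_cost(
    TOKENS_PER_MONTH, BASELINE_COST_PER_MILLION / COST_REDUCTION_FACTOR
)
print(f"baseline: ${baseline:,.2f}  improved: ${improved:,.2f}")
```

Because the cost scales linearly with token volume, the same factor applies at any workload size under these assumptions.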

ServiceNow AI Control Tower integrates with the NVIDIA Enterprise AI Factory validated design, extending governance and observability to large-scale AI workloads. With added agent observability capabilities, organizations can monitor behavior in real time and manage AI systems across their full lifecycle — from deployment to optimization.

AI is becoming a new way that work gets done. What’s changing now is that the core pieces required to deploy it at scale — capable agents, built-in guardrails and proven performance — are all coming together.

The companies that move fastest will be the ones that give agents the infrastructure to act, the context to make decisions and the governance to keep every action accountable — and NVIDIA and ServiceNow are making this a reality for the world’s enterprises.

Learn more about NVIDIA OpenShell and the NVIDIA AI-Q Blueprint.

Nemotron Labs: What OpenClaw Agents Mean for Every Organization


Editor’s note: This post is part of the Nemotron Labs blog series, which explores how the latest open models, datasets and training techniques help businesses build specialized AI systems and applications on NVIDIA platforms. Each post highlights practical ways to use an open stack to deliver real value in production — from transparent research copilots to scalable AI agents.

By early 2026, the open source project OpenClaw had become a phenomenon. In January, its GitHub star count crossed 100,000 as developer interest surged. Community dashboards and traffic analytics showed more than 2 million visitors in a single week. By March, OpenClaw topped 250,000 stars — overtaking React to become the most-starred software project on GitHub in just 60 days.

Created by Peter Steinberger, OpenClaw is a self-hosted, persistent AI assistant designed to run locally or on private servers. The project drew attention for its accessibility and unbounded autonomy: Users could deploy an AI model locally without depending on cloud infrastructure or external application programming interfaces (APIs).

Most AI agents today are triggered by a prompt, complete a defined task and then stop running. A long-running autonomous agent, or “claw,” works differently. These agents run persistently in the background, completing tasks on their own and surfacing only what requires a human decision. They operate on a heartbeat: At regular intervals, they check their task list, evaluate what needs action, and either act or wait for the next cycle.
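The heartbeat cycle described above can be sketched as a simple loop. This is a minimal illustration, not OpenClaw's implementation: `Task`, `heartbeat` and the example tasks are hypothetical names, and `max_cycles` exists only so the sketch terminates, where a real long-running agent would keep looping.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str
    needs_action: Callable[[], bool]   # evaluates whether the task is due
    act: Callable[[], str]             # performs the work and returns a summary
    needs_human: bool = False          # True if the result requires a human decision


def heartbeat(tasks: List[Task], interval_s: float, max_cycles: int,
              sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Run a fixed number of heartbeat cycles and return items surfaced to a human."""
    surfaced: List[str] = []
    for _ in range(max_cycles):
        for task in tasks:                  # check the task list
            if task.needs_action():         # evaluate what needs action
                result = task.act()         # act on it
                if task.needs_human:
                    surfaced.append(f"{task.name}: {result}")
        sleep(interval_s)                   # wait for the next cycle
    return surfaced


# Made-up tasks: one runs every cycle, one fires only while an alert is pending.
pending = ["disk usage above 90%"]
tasks = [
    Task("log-rotation", lambda: True, lambda: "rotated"),
    Task("alerts", lambda: bool(pending), lambda: pending.pop(), needs_human=True),
]
surfaced = heartbeat(tasks, interval_s=0.0, max_cycles=3)
print(surfaced)
```

Routine work is handled silently on every cycle, and only the item flagged as needing a human decision is returned to the caller.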

OpenClaw’s rapid adoption also sparked debate. Security researchers raised concerns about how self-hosted AI tools manage sensitive data, authentication and model updates. Others questioned whether local deployments could expose users to new risks — from unpatched server instances to malicious contributions in community forks. As contributors and maintainers worked to address these issues, OpenClaw’s rise prompted a broader conversation across the AI ecosystem about the trade-offs between openness, privacy and safety.

To help enhance the security and robustness of the OpenClaw project, NVIDIA is collaborating with Steinberger and the OpenClaw developer community to address potential vulnerabilities, as detailed in a recent blog post by OpenClaw.

NVIDIA contributes code and guidance focused on improving model isolation, better managing local data access and strengthening the processes for verifying community code contributions. The goal is to support the project’s momentum by contributing its security and systems expertise in an open, transparent way that strengthens the community’s work while preserving OpenClaw’s independent governance.

To help make long-running agents safer for enterprises, NVIDIA also introduced NVIDIA NemoClaw, a reference implementation that uses a single command to install OpenClaw, the NVIDIA OpenShell secure runtime and NVIDIA Nemotron open models with hardened defaults for networking, data access and security. NemoClaw serves as a blueprint for organizations to deploy claws more securely.

Inference Demand Multiplies With Each AI Wave

AI has moved through four phases, and the time between each is shortening. Predictive AI took years to become mainstream. Generative AI moved faster. Reasoning AI arrived faster still. Autonomous AI — the wave OpenClaw represents — is setting an even faster pace.

What compounds with each wave is inference demand. Generative AI increased token usage over predictive AI. Reasoning AI increased it another 100x. Autonomous agents, which run continuously and act across long time horizons, drive inference demand up by another 1,000x over reasoning AI. Each wave multiplies the compute required.
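Because each multiplier is stated relative to the previous wave, the factors compound. A short worked calculation, normalized so that generative AI equals 1 (the text gives no figure for the step from predictive to generative AI, so that step is left out):

```python
# Multipliers quoted in the text, each relative to the previous wave.
REASONING_OVER_GENERATIVE = 100
AUTONOMOUS_OVER_REASONING = 1_000

# Relative inference demand, normalised so that generative AI = 1.
generative = 1
reasoning = generative * REASONING_OVER_GENERATIVE
autonomous = reasoning * AUTONOMOUS_OVER_REASONING

print(f"autonomous vs generative: {autonomous:,}x")
```

Taken at face value, the two quoted factors imply roughly 100,000 times the inference demand of generative AI.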

This increase in token usage is enabling organizations to speed their productivity by orders of magnitude. For example, long-running agents can help researchers work through a problem overnight, iterate on a design across thousands of configurations, or monitor systems and surface only the anomalies that require human judgment — freeing up researchers’ work days for higher-value tasks.

Choosing the Tool: When to Deploy a ‘Claw’

While generative AI has become a staple for on-demand tasks, there are specific scenarios where the persistent “heartbeat” of a claw offers distinct advantages. Determining when to move from a standard prompt-based AI to a long-running agent often comes down to the nature of the workflow:

  • From “On-Demand” to “Always-On”: While standard models are excellent for immediate, human-triggered queries, claws are often better suited for tasks that require continuous background monitoring or periodic system checks without a manual start.
  • Managing High-Iteration Loops: For complex problems, like testing thousands of chemical combinations or simulating infrastructure stress tests, a claw can manage the sheer volume of iterations that might otherwise be bottlenecked by human intervention.
  • Shifting from Suggestions to Actions: In many workflows, standard AI is used to provide information or drafts. A claw is often considered when the goal is for the AI to move into the execution phase — interacting with APIs, updating databases or managing files across a long time horizon.
  • Resource Optimization: For massive, token-heavy reasoning tasks, deploying a local claw on dedicated hardware like an NVIDIA DGX Spark personal AI supercomputer allows for more predictable costs and data privacy compared with high-frequency cloud API calls.

How Are Organizations Using Long-Running Autonomous Agents?

The practical applications of long-running autonomous agents span every function and sector.

In financial services, agents continuously monitor trading systems and regulatory feeds, flagging material events before the morning review. In drug discovery, agents sweep new scientific literature, extracting relevant findings and updating internal databases in real time without researcher intervention — a process that previously took weeks.

In engineering and manufacturing, agents speed problem analysis by testing thousands of parameter combinations, ranking results and flagging the configurations worth examining — and all this can happen overnight. 

In IT operations, agents diagnose infrastructure incidents, apply known remediations and escalate only the novel problems, compressing average time to resolution from hours to minutes. At ServiceNow, AI Specialists built on Apriel and NVIDIA Nemotron models can resolve 90% of tickets autonomously.
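The IT operations pattern of remediating known problems and escalating novel ones reduces to a lookup with a fallback. The sketch below assumes a hypothetical runbook and incident format, and it does not represent any ServiceNow or NVIDIA interface.

```python
# Hypothetical runbook mapping a diagnosed cause to a known remediation.
KNOWN_REMEDIATIONS = {
    "disk_full": "purge temporary files",
    "service_down": "restart service",
}


def triage(incidents):
    """Split diagnosed incidents into auto-remediated and escalated lists."""
    remediated, escalated = [], []
    for incident_id, cause in incidents:
        fix = KNOWN_REMEDIATIONS.get(cause)
        if fix is not None:
            remediated.append((incident_id, fix))   # known problem: apply the fix
        else:
            escalated.append(incident_id)           # novel problem: hand to a human
    return remediated, escalated


# Made-up incidents for illustration.
remediated, escalated = triage([
    ("INC-1", "disk_full"),
    ("INC-2", "unknown_kernel_panic"),
    ("INC-3", "service_down"),
])
print(remediated, escalated)
```

In practice the diagnosis step would come from a model rather than a fixed label, but the escalation rule stays the same: anything outside the runbook goes to a person.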

How Can Companies Deploy Autonomous Agents Responsibly? 

Autonomous agents are hands-on. They can send communications, write files, call APIs and update live systems. When an agent produces a wrong action, there are real consequences. Getting the accountability framework right from the start is essential, and organizations deploying autonomous agents in production must treat governance as a first-order requirement.

Organizations need to see what their agents are doing, inspect their reasoning at each step, audit their actions and intervene when needed. 
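One way to picture these four needs (visibility, inspectable reasoning, auditing and intervention) is an append-only log combined with an approval hook. The sketch below is an assumption-laden illustration: `AuditLog`, `run_step` and the reviewer rule are hypothetical and do not describe NemoClaw or OpenShell internals.

```python
import json
from datetime import datetime, timezone


class AuditLog:
    """Append-only record of what an agent did and why."""

    def __init__(self):
        self._entries = []

    def record(self, agent, action, reasoning, approved):
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "action": action,
            "reasoning": reasoning,   # kept so reviewers can inspect each step
            "approved": approved,
        }
        self._entries.append(entry)
        return entry

    def export(self):
        # One JSON object per line, easy to ship to an existing log store.
        return "\n".join(json.dumps(e, sort_keys=True) for e in self._entries)


def run_step(log, agent, action, reasoning, approve):
    """Ask the reviewer hook before acting, and log the decision either way."""
    approved = bool(approve(action))
    log.record(agent, action, reasoning, approved)
    return approved


log = AuditLog()
# Hypothetical reviewer rule: block anything that deletes data.
reviewer = lambda action: not action.startswith("delete")
ran = run_step(log, "ops-agent", "restart web-01", "health check failed 3 times", reviewer)
blocked = run_step(log, "ops-agent", "delete /var/data", "disk is full", reviewer)
print(log.export())
```

Logging rejected actions as well as approved ones is a deliberate choice here, since the blocked attempts are often the most useful entries in an audit.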

Organizations deploying autonomous agents responsibly are focused on three priorities: 

  • An open, auditable framework: NemoClaw is built on OpenClaw’s MIT-licensed codebase, which means organizations own the full agent harness. They can read, fork and modify every layer of how their agents are built and deployed. That transparency enables teams to understand and control the system at the code level. Running open source models like NVIDIA Nemotron locally keeps sensitive workloads, including patient records, legal documents, financial transactions and proprietary research, within the organization’s own environment, ensuring that trace data stays under organizational control.
  • Securing the runtime environment: NemoClaw runs agents inside OpenShell, a sandboxed environment that defines precisely what the agent can and cannot do, enforcing clear permission boundaries from the start. 
  • Local compute: NVIDIA DGX Spark supercomputers deliver data-center-class GPU performance in a deskside form factor built for continuous local inference that’s always on, with local model hosting and data that stays within the organization’s environment. NVIDIA DGX Station systems scale that capability for teams running multiple agents simultaneously across complex, sustained workloads. 

The organizations defining what autonomous agents do in practice are accumulating something valuable: months of live operational learning, governance frameworks developed through real workloads and agents that have absorbed the institutional context that makes them genuinely useful. This foundation will only deepen over time.

Get Started With NVIDIA NemoClaw

Access a step-by-step tutorial on how to build a more secure AI agent with NemoClaw on NVIDIA DGX Spark. Explore how NemoClaw can deploy more secure, always-on AI assistants with a single command.

 

Experiment with NemoClaw, available on GitHub, and join the community of developers on Discord building with NemoClaw using NVIDIA Nemotron 3 Super and Telegram on DGX Spark.

Stay up to date on agentic AI, NVIDIA Nemotron and more by subscribing to NVIDIA AI news, joining the community and following NVIDIA AI on LinkedIn, Instagram, X and Facebook.  

Explore self-paced video tutorials and livestreams.



NVIDIA Launches Nemotron 3 Nano Omni Model, Unifying Vision, Audio and Language for up to 9x More Efficient AI Agents


AI agent systems today juggle separate models for vision, speech and language — losing time and context as they pass data from one model to the other.

Unveiled today, NVIDIA Nemotron 3 Nano Omni is an open multimodal model that brings these capabilities together into one system, enabling agents to deliver faster, smarter responses with advanced reasoning across video, audio, image and text. This best-in-class model gives enterprises and developers a production path for more efficient and accurate multimodal AI agents with full deployment flexibility and control. 

Nemotron 3 Nano Omni sets a new efficiency frontier for open multimodal models with leading accuracy and low cost, topping six leaderboards for complex document intelligence, and video and audio understanding.

AI and software companies already adopting Nemotron 3 Nano Omni include Aible, Applied Scientific Intelligence (ASI), Eka Care, Foxconn, H Company, Palantir and Pyler, with Dell Technologies, Docusign, Infosys, K-Dense, Lila, Oracle and Zefr evaluating the model. 

“To build useful agents, you can’t wait seconds for a model to interpret a screen,” said Gautier Cloix, CEO of H Company. “By building on Nemotron 3 Nano Omni, our agents can rapidly interpret full HD screen recordings — something that wasn’t practical before. This isn’t just a speed boost: It’s a fundamental shift in how our agents perceive and interact with digital environments in real time.”

Nemotron 3 Nano Omni Enables Faster, Leaner Multimodal Agents

Consider an AI agent for customer support processing a screen recording while analyzing uploaded call audio and checking data logs — or an agent for finance tasked with parsing PDFs, spreadsheets, charts and voice notes. Today, most agentic systems accomplish these tasks with separate models for vision, speech and language. 

This approach increases latency through repeated inference passes, fragments context across modalities, and adds cost and inaccuracies over time.

By combining vision and audio encoders within its 30B-A3B hybrid mixture-of-experts architecture, Nemotron 3 Nano Omni eliminates the need for separate perception models, driving inference efficiency at scale. It pairs this efficiency with strong multimodal perception accuracy, enabling AI systems to achieve up to 9x higher throughput than other open omni models with the same interactivity. The result is lower costs and better scalability without sacrificing responsiveness or quality.
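A mixture-of-experts model gains efficiency by running only a few experts per token. The sketch below shows generic top-k routing with toy experts. It is not Nemotron's actual router, and the reading of "30B-A3B" as roughly 30 billion total parameters with about 3 billion active per token is an assumption based on common naming conventions rather than something the text states.

```python
import math


def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(gate_logits[i] for i in chosen)          # subtract max for stability
    exps = [math.exp(gate_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x, experts, gate_logits, k):
    """Weighted sum of the outputs of only the selected experts."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(gate_logits, k))


# Toy experts: each just scales its input. Only 2 of the 4 run per token.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
gate_logits = [0.1, 2.0, -1.0, 2.0]                     # made-up router scores
routes = top_k_route(gate_logits, k=2)
output = moe_forward(1.0, experts, gate_logits, k=2)

# Active share implied by the name, assuming 30B total and 3B active parameters.
active_fraction = 3 / 30
print(routes, output, active_fraction)
```

Under that assumed reading, about a tenth of the parameters would do work on any given token, which is where the throughput advantage of sparse models comes from.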

In agentic systems, Nemotron 3 Nano Omni can work alongside proprietary models from cloud and other providers or other NVIDIA Nemotron open models, such as Nemotron 3 Super for high-frequency execution or Nemotron 3 Ultra for complex planning, to power sub-agents for agentic workflows such as computer use, document intelligence and audio-video reasoning.

  • Computer use agents: Nemotron 3 Nano Omni powers the perception loop for agents navigating graphical user interfaces, reasoning over onscreen content and understanding user interface state over time. H Company’s latest computer use agent, powered by Nemotron 3 Nano Omni, uses a native input resolution of 1920×1080 pixels to achieve high-fidelity visual reasoning. In preliminary evaluations on the OSWorld benchmark, this integration showed a significant leap in navigating complex graphical interfaces, drawing on Nemotron 3 Nano Omni’s ability to process very high-resolution images.
  • Document intelligence: Nemotron 3 Nano Omni interprets documents, charts, tables, screenshots and mixed-media inputs, enabling agents to reason coherently across visual structure and text content. This is critical for enterprise analysis and compliance workflows.
  • Audio and video understanding: For customer service, research and monitoring workflows, Nemotron 3 Nano Omni maintains audio-video context, tying what was said, shown and documented into a single reasoning stream instead of disconnected summaries.

Open and Customizable, Deployable Anywhere

Nemotron 3 Nano Omni is released with open weights, datasets and training techniques — giving organizations full transparency and control over how the model is customized and deployed. 

Developers can use tools like NVIDIA NeMo for customization, evaluation and optimization for domain-specific use cases. Because the Nemotron family of models is open, organizations can deploy them in environments that meet regulatory, sovereignty or data localization requirements.

The Nemotron 3 family — including Nano, Super and Ultra models — has seen over 50 million downloads in the past year. Omni extends the family’s capabilities into multimodal and agentic domains. 

The model is available on Hugging Face, OpenRouter and build.nvidia.com as an NVIDIA NIM microservice and through a broad ecosystem of NVIDIA Cloud Partners, inference platforms and cloud service providers. 

Its open, lightweight architecture supports consistent deployment from local systems like NVIDIA Jetson modules, NVIDIA DGX Spark and DGX Station to data center and cloud environments. 

Visit the NVIDIA technical blog for tutorials, cookbooks and deployment guides for Nemotron 3 Nano Omni use cases. Stay up to date on agentic AI, NVIDIA Nemotron and more by subscribing to NVIDIA news, joining the community and following NVIDIA AI on LinkedIn, Instagram, X and Facebook.  




OpenAI’s New GPT-5.5 Powers Codex on NVIDIA Infrastructure


AI agents have revolutionized developer workflows, and their next frontier is knowledge work: processing information, solving complex problems, coming up with new ideas and driving innovation. 

Codex, OpenAI’s agentic coding application, is enabling this new frontier. It’s now powered by GPT-5.5, OpenAI’s latest frontier model, which runs on NVIDIA GB200 NVL72 rack-scale systems. 

Over 10,000 NVIDIANs — across engineering, product, legal, marketing, finance, sales, HR, operations and developer programs — are already using GPT-5.5-powered Codex to achieve, in their words, “mind-blowing” and “life-changing” results. 

NVIDIA engineers have had access to GPT-5.5 through the Codex app for a few weeks, and the gains are measurable. The model is served on GB200 NVL72, which is capable of delivering 35x lower cost per million tokens and 50x higher token output per second per megawatt compared with prior-generation systems. Those economics make frontier-model inference viable at enterprise scale.

Debugging cycles that once stretched across days are closing in hours. Experimentation that previously required weeks is turning into overnight progress in complex, multi-file codebases. Teams are shipping end-to-end features from natural-language prompts, with stronger reliability and fewer wasted cycles than earlier models. 

OpenAI’s stunning progress is just the latest example of NVIDIA’s work with every frontier model company, not just to accelerate the use of AI agents inside NVIDIA, but to help the company’s partners build the world’s best, lowest-cost and most power-efficient models for everyone.

As NVIDIA founder and CEO Jensen Huang told employees in a company-wide email urging everyone to use Codex: “Let’s jump to lightspeed. Welcome to the age of AI.”

A Deployment Built for Enterprise Security 

Just like humans, every agent needs its own dedicated computer. 

To ensure seamless operation within secure enterprise environments, the Codex app supports remote Secure Shell (SSH) connections to approved cloud virtual machines, allowing agents to work with real company data without exposing it externally. 

To that end, NVIDIA IT rolled out cloud virtual machines (VMs) for every employee to run their agent safely. This provides a dedicated sandbox where the agent can operate at its maximum capabilities while maintaining full auditability. Users can control the Codex agent running in the cloud VM from a user interface that every employee is familiar with.

A zero-data retention policy governs NVIDIA’s deployment, and agents access production systems with read-only permissions through command-line interfaces and Skills — the same agentic toolkit NVIDIA uses to run automation workflows across the company.

A Decade of Full-Stack Collaboration

The GPT-5.5 launch and the Codex rollout reflect more than 10 years of collaboration between NVIDIA and OpenAI. The partnership began in 2016, when Huang hand-delivered the first NVIDIA DGX-1 AI supercomputer to OpenAI’s San Francisco headquarters.

Since then, the two companies have worked closely across the full AI stack. 

NVIDIA was a day-zero partner for OpenAI’s gpt-oss open-weight model launch, optimizing model weights for NVIDIA TensorRT-LLM and ecosystem frameworks including vLLM and Ollama. 

OpenAI has committed to deploying more than 10 gigawatts of NVIDIA systems for its next-generation AI infrastructure — a buildout that will put millions of NVIDIA GPUs at the foundation of OpenAI’s model training and inference for years ahead.

And OpenAI and NVIDIA are early silicon and codesign partners: OpenAI provides feedback that informs NVIDIA’s hardware roadmap, and in turn gains early access to new architectures. That relationship produced a concrete milestone — the joint bring-up of the first GB200 NVL72 100,000-GPU cluster. The cluster completed multiple large-scale training runs and set a new benchmark for system-level reliability at frontier scale.

GPT-5.5 is the product of that infrastructure running at full strength. 

Learn more in OpenAI’s announcement.

Adobe Agents Unlock Breakthrough Creative Intelligence With NVIDIA and WPP



AI agents are transforming how work gets done across all industries, accelerating everything from content creation to decision-making.

NVIDIA’s expanded strategic collaborations with Adobe and WPP are bringing agentic AI to the center of enterprise marketing operations across creative production and customer experience orchestration. 

As demand for personalized customer experiences surges, brands require intelligent systems that can plan, create, produce and activate content continuously — without compromising control, governance or brand integrity.

Consider a global retailer delivering the right offer, image, copy and price, across millions of product, audience and channel combinations — updated in minutes instead of months. 

For marketing and creative teams, that means moving from one-size-fits-all campaigns to tailored experiences that are always on, always relevant and on brand. All of it is powered by intelligent systems that continuously generate and deliver content without sacrificing control, governance or brand integrity.

The expanded collaborations bring together three complementary strengths: Adobe’s creative and customer experience platforms and the new Adobe CX Enterprise Coworker, WPP’s global media and marketing expertise, and NVIDIA’s accelerated computing and software stack, including NVIDIA Nemotron open models, NVIDIA Agent Toolkit and the NVIDIA OpenShell secure runtime for building and running secure agentic AI systems.

As these agents begin orchestrating multistep workflows, tapping sensitive data and triggering actions across marketing stacks, enterprises need a way to enforce clear rules of engagement so every operation remains compliant, on brand and within defined risk boundaries.

Powered by the NVIDIA OpenShell runtime, every agent operates within a secure, isolated environment that delivers enterprise-grade control, consistency and auditability across the entire marketing lifecycle. Its verifiable policy management answers the question, “What can the agent do?” and not just, “What policy is in place?”

In governed environments, enterprises can also keep key workflows and intelligence services inside their trust boundary, including securely invoking Adobe CX Intelligence as part of customer experience agents.

A live demo of CX Enterprise Coworker — powered by NVIDIA Agent Toolkit, including the OpenShell runtime and Nemotron models — will be featured during Adobe Summit’s day-two keynote taking place Tuesday, April 21, at 9 a.m. PT.

The collaboration enables:

  • End-to-end agentic workflows: Adobe is developing creative and marketing agents that can generate, adapt and version on-brand assets. Adobe’s CX Enterprise Coworker orchestrates downstream customer experience workflows from personalization to activation, closing the loop between content creation and customer engagement.
  • Controlled execution with NVIDIA OpenShell: Agents run in a policy-based, containerized sandbox designed to keep execution governed, observable and auditable, helping enterprises safely deploy long-running agentic workflows on premises or in the cloud.
  • Commercially safe content at scale: Adobe Firefly Foundry, accelerated by NVIDIA AI infrastructure, can help organizations deeply tune custom models on their proprietary assets, enabling agents to generate commercially safe content at scale and aligned to brand identity.
  • A 3D digital twins solution for scalable marketing production: Adobe’s cloud-native 3D digital twin solution is now generally available, built on NVIDIA Omniverse libraries and OpenUSD. 3D digital twins serve as persistent product identities that agents use to automate and scale high-fidelity content creation across formats, markets and configurations.

Creative Intelligence Meets Performance Intelligence With Policy-Governed Agents

Governed environments such as the ones enabled by this collaboration act as a set of “guardrails” that keep AI operations observable and auditable, preventing the system from acting outside of a company’s specific data boundaries or brand rules.

By combining Adobe’s creative platforms, WPP’s media and marketing expertise and NVIDIA’s secure infrastructure with CX Enterprise Coworker, brands no longer have to choose between speed and safety. Autonomous agents can now generate, adapt and activate content at scale while operating within governed, policy-driven environments.

The result is a new foundation for agentic marketing — where creative intelligence, performance and trust are built in from the start and delivered at global scale.

Watch NVIDIA founder and CEO Jensen Huang’s Adobe Summit fireside chat with Adobe CEO Shantanu Narayen below.

RTX to Spark: Gemma 4 Accelerated for Agentic AI


Open models are driving a new wave of on-device AI, extending innovation beyond the cloud to everyday devices. As these models advance, their value increasingly depends on access to local, real-time context that can turn meaningful insights into action. 

Designed for this shift, Google’s latest additions to the Gemma 4 family introduce a class of small, fast and omni-capable models built for efficient local execution across a wide range of devices.  

Google and NVIDIA have collaborated to optimize Gemma 4 for NVIDIA GPUs, enabling efficient performance across a range of systems — from data center deployments to NVIDIA RTX-powered PCs and workstations, the NVIDIA DGX Spark personal AI supercomputer and NVIDIA Jetson Orin Nano edge AI modules.

Gemma 4: Compact Models Optimized for NVIDIA GPUs 

The latest additions to the Gemma 4 family of open models, spanning E2B, E4B, 26B and 31B variants, are designed for efficient deployment from edge devices to high-performance GPUs.

All configurations were measured using Q4_K_M quantizations with batch size (BS) = 1, input sequence length (ISL) = 4096 and output sequence length (OSL) = 128 on NVIDIA GeForce RTX 5090 and Mac M3 Ultra desktops. Token generation throughput was measured on llama.cpp b7789, using the llama-bench tool.

This new generation of compact models supports a range of tasks, including: 

  • Reasoning: Strong performance on complex problem-solving tasks.  
  • Coding: Code generation and debugging for developer workflows.   
  • Agents: Native support for structured tool use (function calling).  
  • Vision, Video and Audio Capabilities: Enables rich multimodal interactions for object recognition, automated speech recognition, and document or video intelligence. 
  • Interleaved Multimodal Input: Mix text and images in any order within a single prompt.  
  • Multilingual: Out-of-the-box support for 35+ languages, pretrained on 140+ languages. 

The E2B and E4B models are built for ultraefficient, low-latency inference at the edge, running completely offline with near-zero latency across many devices, including Jetson Orin Nano modules.

The 26B and 31B models are designed for high-performance reasoning and developer-centric workflows, making them well suited for agentic AI. Optimized to deliver state-of-the-art, accessible reasoning, these models run efficiently on NVIDIA RTX GPUs and DGX Spark, powering development environments, coding assistants and agent-driven workflows.

As local agentic AI continues to gain momentum, applications like OpenClaw are enabling always-on AI assistants on RTX PCs, workstations and DGX Spark. The latest Gemma 4 models are compatible with OpenClaw, allowing users to build capable local agents that draw context from personal files, applications and workflows to automate tasks. Learn how to run OpenClaw for free on RTX GPUs and DGX Spark, or follow the DGX Spark OpenClaw playbook.

Getting Started: Gemma 4 on RTX GPUs and DGX Spark 

NVIDIA has collaborated with Ollama and llama.cpp to provide the best local deployment experience for each of the Gemma 4 models.    

To use Gemma 4 locally, users can download Ollama to run Gemma 4 models or install llama.cpp and pair it with the Gemma 4 GGUF Hugging Face checkpoint. Additionally, Unsloth provides day-one support with optimized and quantized models for efficient local fine-tuning and deployment via Unsloth Studio. Start running and fine-tuning Gemma 4 in Unsloth Studio today. 
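As a rough illustration of the Ollama route, the sketch below sends one prompt to a locally running Ollama server over its HTTP API. It assumes Ollama is installed and serving on its default port, and the model tag `gemma4` is a placeholder rather than a confirmed name; the exact tag should be taken from the Ollama model library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "gemma4"  # ASSUMPTION: placeholder tag, check the Ollama library for the real one


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running and the model pulled, `generate("Explain GGUF in one sentence.")` would return the model's reply as a string.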

Running open models like the Gemma 4 family on NVIDIA GPUs achieves optimal performance because NVIDIA Tensor Cores accelerate AI inference workloads to deliver higher throughput and lower latency for local execution. Plus, the CUDA software stack ensures broad compatibility across leading frameworks and tools, enabling new models to run efficiently from day one.  

This combination allows open models like Gemma 4 to scale across a wide range of systems — from Jetson Orin Nano at the edge to RTX PCs, workstations and DGX Spark — without requiring extensive optimization. 

Check out the NVIDIA technical blog for more details on how to get started with Gemma 4 on NVIDIA GPUs and learn more about NVIDIA’s work on open models. 

#ICYMI: The Latest Updates for RTX AI PCs 

✨ Catch up on RTX AI Garage blogs for a host of agentic AI announcements from NVIDIA GTC, such as new open models for local agents. These models include NVIDIA Nemotron 3 Nano 4B and Nemotron 3 Super 120B, and optimizations for Qwen 3.5 and Mistral Small 4. 

NVIDIA recently introduced NVIDIA NemoClaw, an open source stack that optimizes OpenClaw experiences on NVIDIA devices by increasing security and supporting local models.

🚀 Accomplish.ai announced Accomplish FREE, a no-cost version of its open source desktop AI agent with built-in models. It harnesses NVIDIA GPUs to run open weight models locally, while a hybrid router dynamically balances workloads between local RTX hardware and the cloud — enabling fast, private, zero-configuration execution without requiring an application programming interface key. 

Plug in to NVIDIA AI PC on Facebook, Instagram, TikTok and X, and stay informed by subscribing to the RTX AI PC newsletter.

Follow NVIDIA Workstation on LinkedIn and X.



How Autonomous AI Agents Become Secure by Design With NVIDIA OpenShell



Autonomous agents mark a new inflection point in AI. Systems are no longer limited to generating responses or reasoning through tasks. They can take action: Agents can read files, use tools, write and run code, and execute workflows across enterprise systems, all while expanding their own capabilities. 

Application-layer risk grows exponentially when agents continuously improve and evolve. The NVIDIA OpenShell runtime is being built to address this. 

Part of NVIDIA Agent Toolkit, OpenShell is an open source, secure-by-design runtime for running autonomous agents such as claws. It works by ensuring each agent runs inside its own sandbox, separating application-layer operations from infrastructure-layer policy enforcement.

This means security policies are out of reach of the agent — they’re applied at the system level. Instead of relying on behavioral prompts, OpenShell enforces constraints on the environment the agent runs in — meaning the agent cannot override policies, or leak credentials or private data, even if compromised. 

With OpenShell, enterprises can separate agent behavior, policy definition and policy enforcement. Organizations gain a single, unified policy layer to define and monitor how autonomous systems operate. Coding agents, research assistants and agentic workflows all run under the same runtime policies regardless of host operating system, simplifying compliance and operational oversight.

This is the “browser tab” model applied to agents: Sessions are isolated, resources are controlled and permissions are verified by the runtime before any action takes place.
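The idea of a runtime that verifies permissions before any action takes place can be sketched in a few lines. This is a generic deny-by-default illustration and not OpenShell's actual interface: `Policy`, `Runtime`, `act` and `PolicyViolation` are hypothetical names, and a real sandbox enforces the boundary at the operating system or container level rather than through Python conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: explicit allow-lists, everything else is denied."""
    allowed_tools: frozenset = frozenset()
    allowed_path_prefixes: tuple = ()


class PolicyViolation(Exception):
    """Raised when an action falls outside the policy."""


class Runtime:
    """Hypothetical runtime that checks and logs every action before running it."""

    def __init__(self, policy: Policy):
        self._policy = policy
        self.audit_log = []  # (tool, path, decision) tuples

    def act(self, tool: str, path: str) -> str:
        # Path normalization (e.g. resolving "..") is omitted to keep the sketch short.
        allowed = tool in self._policy.allowed_tools and any(
            path.startswith(prefix) for prefix in self._policy.allowed_path_prefixes
        )
        self.audit_log.append((tool, path, "allow" if allowed else "deny"))
        if not allowed:
            raise PolicyViolation(f"{tool} on {path} is not permitted")
        return f"{tool} executed on {path}"
```

An agent handed only `runtime.act` can request `read_file` on `/workspace/notes.txt` if the policy lists both, while a request for an unlisted tool is refused and still recorded in the audit log.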

Securing autonomous systems requires an integrated ecosystem. OpenShell is designed to add privacy and security controls for AI agents. NVIDIA is collaborating with security partners, including Cisco, CrowdStrike, Google Cloud, Microsoft Security and TrendAI, to align runtime policy management and enforcement for agents across the enterprise stack. 

OpenShell Provides an Enterprise-Grade Sandbox for Building Personal AI Assistants

NVIDIA NemoClaw is an open source reference stack that simplifies installing OpenClaw always-on assistants with the OpenShell runtime and NVIDIA Nemotron models in a single command. 

NemoClaw provides enthusiasts with an open reference for building self-evolving personal AI agents, or claws. Since security needs vary, NemoClaw provides a reference example for policy-based privacy and security guardrails to give users more control over their agents’ behavior and data-handling. Users can customize it for their specific use cases — much like adjusting security preferences for applications on a phone. 

NemoClaw includes an example configuration of OpenShell that defines how the agent should interact with systems. NemoClaw uses open source models like NVIDIA Nemotron alongside OpenShell. 

This enables self-evolving claws to run more securely in clouds, on premises or on personal computers, including NVIDIA GeForce RTX PCs and laptops or NVIDIA RTX PRO-powered workstations, as well as NVIDIA DGX Station and NVIDIA DGX Spark AI supercomputers.

Both OpenShell and NemoClaw are in early preview. NVIDIA is building in the open with the community and its partners to enable enterprises to scale self-evolving, long-running autonomous agents safely, confidently and in compliance with global security standards.

Get started with NVIDIA OpenShell and launch a ready‑to‑use environment on NVIDIA Brev, or explore the open source project on GitHub.

New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI



Launched today, NVIDIA Nemotron 3 Super is a 120‑billion‑parameter open model with 12 billion active parameters designed to run complex agentic AI systems at scale. 

Available now, the model applies advanced reasoning capabilities to help autonomous agents complete tasks efficiently and with high accuracy.

AI-Native Companies: Perplexity offers its users access to Nemotron 3 Super for search and as one of 20 orchestrated models in Computer. Companies offering software development agents like CodeRabbit, Factory and Greptile are integrating the model into their AI agents along with proprietary models to achieve higher accuracy at lower cost. And life sciences and frontier AI organizations like Edison Scientific and Lila Sciences will use the model to power their agents for deep literature search, data science and molecular understanding.

Enterprise Software Platforms: Industry leaders such as Amdocs, Palantir, Cadence, Dassault Systèmes and Siemens are deploying and customizing the model to automate workflows in telecom, cybersecurity, semiconductor design and manufacturing. 

As companies move beyond chatbots and into multi‑agent applications, they encounter two constraints.

The first is context explosion. Multi‑agent workflows generate up to 15x more tokens than standard chat because each interaction requires resending full histories, including tool outputs and intermediate reasoning. 

Over long tasks, this volume of context increases costs and can lead to goal drift, where agents lose alignment with the original objective.
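A small calculation shows why resending full histories inflates token counts. The turn sizes below are illustrative assumptions only, and the sketch is not meant to reproduce the 15x figure; it simply compares sending each turn once with resending the whole history at every step.

```python
def tokens_sent_with_full_history(turn_tokens: list[int]) -> int:
    """Total input tokens when every step resends all earlier turns."""
    total = 0
    history = 0
    for t in turn_tokens:
        history += t      # the new turn joins the history
        total += history  # the whole history is sent again
    return total


turns = [500] * 10  # ASSUMPTION: ten steps of 500 tokens each, purely illustrative
resent = tokens_sent_with_full_history(turns)
once = sum(turns)
print(resent, once, resent / once)  # prints: 27500 5000 5.5
```

Because the total grows roughly with the square of the number of steps, longer workflows widen the gap further.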

The second is the thinking tax. Complex agents must reason at every step, but using large models for every subtask makes multi-agent applications too expensive and sluggish for practical use.

Nemotron 3 Super has a 1‑million‑token context window, allowing agents to retain full workflow state in memory and preventing goal drift.

Nemotron 3 Super has set new standards, claiming the top spot on Artificial Analysis for efficiency and openness with leading accuracy among models of the same size. 

The model also powers the NVIDIA AI-Q research agent to the No. 1 position on DeepResearch Bench and DeepResearch Bench II leaderboards, benchmarks that measure an AI system’s ability to conduct thorough, multistep research across large document sets while maintaining reasoning coherence. 

Hybrid Architecture

Nemotron 3 Super uses a hybrid mixture‑of‑experts (MoE) architecture that combines several major innovations to deliver up to 5x higher throughput and up to 2x higher accuracy than the previous Nemotron Super model.

  • Hybrid Architecture: Mamba layers deliver 4x higher memory and compute efficiency, while transformer layers drive advanced reasoning.
  • MoE: Only 12 billion of its 120 billion parameters are active at inference. 
  • Latent MoE: A new technique that improves accuracy by activating four expert specialists for the cost of one to generate the next token at inference.
  • Multi-Token Prediction: Predicts multiple future words simultaneously, resulting in 3x faster inference.
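To make the MoE idea concrete, here is a minimal sketch of generic top-k expert routing, in which a router scores the experts and only the best-scoring few are evaluated for a given token. It is a textbook-style illustration with toy sizes and made-up expert functions, not a description of Nemotron 3 Super's actual router or its latent MoE technique.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k):
    """Weighted sum over the selected experts; the others are never called."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# ASSUMPTION: four toy "experts" that just scale their input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 2.0]
chosen = route_top_k(logits, k=2)
```

Here two of four experts run per input, so half the expert parameters are active; the 12-billion-of-120-billion figure above corresponds to an active fraction of 10%.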

On the NVIDIA Blackwell platform, the model runs in NVFP4 precision. That cuts memory requirements and pushes inference up to 4x faster than FP8 on NVIDIA Hopper, with no loss in accuracy. 
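The memory side of that claim can be approximated with simple arithmetic. The sketch below estimates weight storage only, under the assumption that every one of the 120 billion parameters is stored at the stated bit width; it ignores quantization scale factors, activations and the KV cache, so real footprints will differ.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 120e9  # total parameter count stated for Nemotron 3 Super

for name, bits in (("16-bit", 16), ("8-bit", 8), ("4-bit", 4)):
    print(f"{name}: ~{weight_memory_gb(PARAMS, bits):.0f} GB")
```

Under these assumptions, 4-bit weights need about half the storage of 8-bit weights and a quarter of 16-bit weights.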

Open Weights, Data and Recipes

NVIDIA is releasing Nemotron 3 Super with open weights under a permissive license. Developers can deploy and customize it on workstations, in data centers or in the cloud.

The model was trained on synthetic data generated using frontier reasoning models. NVIDIA is publishing the complete methodology, including over 10 trillion tokens of pre- and post-training datasets, 15 training environments for reinforcement learning and evaluation recipes. Researchers can further use the NVIDIA NeMo platform to fine-tune the model or build their own. 

Use in Agentic Systems

Nemotron 3 Super is designed to handle complex subtasks inside a multi-agent system. 

A software development agent can load an entire codebase into context at once, enabling end-to-end code generation and debugging without document segmentation. 

In financial analysis, it can load thousands of pages of reports into memory, eliminating the need to re-reason across long conversations, which improves efficiency.

Nemotron 3 Super has high-accuracy tool calling that ensures autonomous agents reliably navigate massive function libraries to prevent execution errors in high-stakes environments, like autonomous security orchestration in cybersecurity.

Availability

NVIDIA Nemotron 3 Super, part of the Nemotron 3 family, can be accessed at build.nvidia.com, Perplexity, OpenRouter and Hugging Face. Dell Technologies is bringing the model to the Dell Enterprise Hub on Hugging Face, optimized for on-premises deployment on the Dell AI Factory, advancing multi-agent AI workflows. HPE is also bringing NVIDIA Nemotron to its agents hub to help ensure scalable enterprise adoption of agentic AI.

Enterprises and developers can deploy the model through several partners:

The model is packaged as an NVIDIA NIM microservice, allowing deployment from on-premises systems to the cloud.

Stay up to date on agentic AI, NVIDIA Nemotron and more by subscribing to NVIDIA AI news, joining the community, and following NVIDIA AI on LinkedIn, Instagram, X and Facebook.

Explore self-paced video tutorials and livestreams.



Stripe wants to turn your AI costs into a profit center


Stripe on Monday released a preview of a new feature that could help AI startups (and other companies) solve the problem of passing through the underlying costs of AI model usage to their customers.

Stripe’s feature, however, goes even further than just passing through the costs of the tokens. It allows startups to charge a markup percentage on token usage. So a company can, for instance, charge an automatic 30% above the cost of the tokens that the startup will pay the model maker.

As Stripe described it, “Say you’re building an AI app: you want a consistent 30% margin over raw LLM token costs across providers. Billing automates the process.”

The billing feature lets the startup pick the AI models it uses. It tracks the API prices of those models. It then records the customers’ token usage and applies the profit-margin markup automatically.
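The arithmetic behind such a markup is straightforward and can be sketched as follows. This is not Stripe's API: the function name, the model name and the per-million-token prices are all made up for illustration.

```python
from decimal import Decimal

# ASSUMPTION: invented per-million-token prices, not real provider rates.
PRICE_PER_MILLION = {
    "model-a": {"input": Decimal("1.00"), "output": Decimal("4.00")},
}


def billed_amount(model: str, input_tokens: int, output_tokens: int, markup_pct: int):
    """Return (raw token cost, amount charged to the customer) in dollars."""
    rates = PRICE_PER_MILLION[model]
    cost = (rates["input"] * input_tokens + rates["output"] * output_tokens) / Decimal(1_000_000)
    charged = cost * (1 + Decimal(markup_pct) / 100)
    return cost, charged


cost, charged = billed_amount("model-a", 2_000_000, 500_000, markup_pct=30)
print(cost, charged)
```

With these made-up rates, 2 million input tokens and 500,000 output tokens cost $4.00, and a 30% markup brings the customer's charge to $5.20.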

As we’ve previously reported, there are a variety of ways that AI startups are charging for their wares. Many of them charge tiered monthly subscriptions that have usage-rate caps; once those are hit, the subscriber may be charged more for exceeding the limit.

For instance, Cursor last year changed the pricing on some of its tiers from unlimited use to rate-limited usage, with fees for extra consumption on top.

Without a usage cap, users could run up big bills for a startup with the model makers, and force the startup to operate in the red. This is especially acute for agentic startups. The more their customers use their agents, the more tokens they consume from the underlying model provider, be that OpenAI, Google Gemini, Anthropic or others — making pricing and business model decisions especially critical.

Stripe has also introduced its own AI gateway, a tool that gives users access to multiple models, letting them choose the best one for the job. But the billing tool also works with third-party gateways that are already popular, like those offered by Vercel and OpenRouter, according to a tweet by a Stripe product manager.

There are, of course, other startups offering AI model cost management features with their own gateways. OpenRouter, for instance, which grants access to over 300 models, charges a flat 5.5% markup over the token fees for its first-tier plan, and offers budget controls, too.

Stripe is not currently charging its own markup on the gateway, its product manager said on Twitter. The feature, however, is still in waitlist mode. Either way, if Stripe can help startups easily turn tracking and billing for this expense into a profit-maker, it could be a game-changer. Stripe did not immediately respond to a request for comment on when the feature may be generally available.