RTX to Spark: Gemma 4 Accelerated for Agentic AI


Open models are driving a new wave of on-device AI, extending innovation beyond the cloud to everyday devices. As these models advance, their value increasingly depends on access to local, real-time context that can turn meaningful insights into action. 

Designed for this shift, Google’s latest additions to the Gemma 4 family introduce a class of small, fast and omni-capable models built for efficient local execution across a wide range of devices.  

Google and NVIDIA have collaborated to optimize Gemma 4 for NVIDIA GPUs, enabling efficient performance across a range of systems — from data center deployments to NVIDIA RTX-powered PCs and workstations, the NVIDIA DGX Spark personal AI supercomputer and NVIDIA Jetson Orin Nano edge AI modules.

Gemma 4: Compact Models Optimized for NVIDIA GPUs 

The latest additions to the Gemma 4 family of open models spanning E2B, E4B, 26B and 31B variants  are designed for efficient deployment from edge devices to high-performance GPUs.  

All configurations measured using Q4_K_M quantizations BS = 1, ISL = 4096 and OSL = 128 on NVIDIA GeForce RTX 5090 and Mac M3 Ultra desktops. Token generation throughput measured on llama.cpp b7789, using the llama-bench tool.

This new generation of compact models supports a range of tasks, including: 

  • Reasoning: Strong performance on complex problem-solving tasks.  
  • Coding: Code generation and debugging for developer workflows.   
  • Agents: Native support for structured tool use (function calling).  
  • Vision, Video and Audio Capabilities: Enables rich multimodal interactions for object recognition, automated speech recognition, and document or video intelligence. 
  • Interleaved Multimodal Input: Mix text and images in any order within a single prompt.  
  • Multilingual: Out-of-the-box support for 35+ languages, pretrained on 140+ languages. 

The E2B and E4B models are built for ultraefficient, low-latency inference at the edge, running completely offline with near-zero latency across many devices including Jetson Nano modules. 

The 26B and 31B modelsare designed for high-performance reasoning and developer-centric workflows, making them well suited for agentic AI. Optimized to deliver state-of-the-art, accessible reasoning, these models run efficiently on NVIDIA RTX GPUs and DGX Spark — powering development environments, coding assistants and agent-driven workflows.  

As local agentic AI continues to gain momentum, applications like OpenClaw are enabling always-on AI assistants on RTX PCs, workstations and DGX Spark. The latest Gemma 4 models are compatible with OpenClaw, allowing users to build capable local agents that draw context from personal files, applications and workflows to automate tasks. Learn how to run OpenClaw for free on RTX GPUs and DGX Spark or using the DGX Spark OpenClaw playbook. 

Getting Started: Gemma 4 on RTX GPUs and DGX Spark 

NVIDIA has collaborated with Ollama and llama.cpp to provide the best local deployment experience for each of the Gemma 4 models.    

To use Gemma 4 locally, users can download Ollama to run Gemma 4 models or install llama.cpp and pair it with the Gemma 4 GGUF Hugging Face checkpoint. Additionally, Unsloth provides day-one support with optimized and quantized models for efficient local fine-tuning and deployment via Unsloth Studio. Start running and fine-tuning Gemma 4 in Unsloth Studio today. 

Running open models like the Gemma 4 family on NVIDIA GPUs achieves optimal performance because NVIDIA Tensor Cores accelerate AI inference workloads to deliver higher throughput and lower latency for local execution. Plus, the CUDA software stack ensures broad compatibility across leading frameworks and tools, enabling new models to run efficiently from day one.  

This combination allows open models like Gemma 4 to scale across a wide range of systems — from Jetson Orin Nano at the edge to RTX PCs, workstations and DGX Spark — without requiring extensive optimization. 

Check out the NVIDIA technical blog for more details on how to get started with Gemma 4 on NVIDIA GPUs and learn more about NVIDIA’s work on open models. 

#ICYMI: The Latest Updates for RTX AI PCs 

✨ Catch up on RTX AI Garage blogs for a host of agentic AI announcements from NVIDIA GTC, such as new open models for local agents. These models include NVIDIA Nemotron 3 Nano 4B and Nemotron 3 Super 120B, and optimizations for Qwen 3.5 and Mistral Small 4. 

 NVIDIA recently introduced NVIDIA NemoClaw, an open source stack that optimizes OpenClaw experiences on NVIDIA devices by increasing security and supporting local models.  

🚀 Accomplish.ai announced Accomplish FREE, a no-cost version of its open source desktop AI agent with built-in models. It harnesses NVIDIA GPUs to run open weight models locally, while a hybrid router dynamically balances workloads between local RTX hardware and the cloud — enabling fast, private, zero-configuration execution without requiring an application programming interface key. 

Plug in to NVIDIA AI PC on FacebookInstagramTikTok and X — and stay informed by subscribing to the RTX AI PC newsletter. 

Follow NVIDIA Workstation on LinkedIn and X 



How Autonomous AI Agents Become Secure by Design With NVIDIA OpenShell



Autonomous agents mark a new inflection point in AI. Systems are no longer limited to generating responses or reasoning through tasks. They can take action: Agents can read files, use tools, write and run code, and execute workflows across enterprise systems, all while expanding their own capabilities. 

Application-layer risk grows exponentially when agents continuously improve and evolve. The NVIDIA OpenShell runtime is being built to address this. 

Part of NVIDIA Agent Toolkit, OpenShell is an open source, secure-by-design runtime for running autonomous agents such as claws. It works by ensuring each agent runs inside its own sandbox, separating application-layer operations from infrastructure-layer policy enforcement.

This means security policies are out of reach of the agent — they’re applied at the system level. Instead of relying on behavioral prompts, OpenShell enforces constraints on the environment the agent runs in — meaning the agent cannot override policies, or leak credentials or private data, even if compromised. 

With OpenShell, enterprises can separate agent behavior, policy definition and policy enforcement. Organizations gain a single, unified policy layer to define and monitor how autonomous systems operate. Coding agents, research assistants and agentic workflows all run under the same runtime policies regardless of host operating system, simplifying compliance and operational oversight.

This is the “browser tab” model applied to agents: Sessions are isolated, resources are controlled and permissions are verified by the runtime before any action takes place.

Securing autonomous systems requires an integrated ecosystem. OpenShell is designed to add privacy and security controls for AI agents. NVIDIA is collaborating with security partners, including Cisco, CrowdStrike, Google Cloud, Microsoft Security and TrendAI, to align runtime policy management and enforcement for agents across the enterprise stack. 

OpenShell Provides an Enterprise-Grade Sandbox for Building Personal AI Assistants

NVIDIA NemoClaw is an open source reference stack that simplifies installing OpenClaw always-on assistants with the OpenShell runtime and NVIDIA Nemotron models in a single command. 

NemoClaw provides enthusiasts with an open reference for building self-evolving personal AI agents, or claws. Since security needs vary, NemoClaw provides a reference example for policy-based privacy and security guardrails to give users more control over their agents’ behavior and data-handling. Users can customize it for their specific use cases — much like adjusting security preferences for applications on a phone. 

NemoClaw includes an example configuration of OpenShell that defines how the agent should interact with systems. NemoClaw uses open source models like NVIDIA Nemotron alongside OpenShell. 

This enables self-evolving claws to run more securely in clouds, on premises or on personal computers, including NVIDIA GeForce RTX PCs and laptops or NVIDIA RTX PRO-powered workstations, as well as NVIDIA DGX Station and NVIDIA DGX Spark AI supercomputers.

Both OpenShell and NemoClaw are in early preview. NVIDIA is building in the open with the community and its partners to enable enterprises to scale self-evolving, long-running autonomous agents safely, confidently and in compliance with global security standards.

Get started with NVIDIA OpenShell and launch a ready‑to‑use environment on NVIDIA Brev, or explore the open source project on GitHub.

New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI



Launched today, NVIDIA Nemotron 3 Super is a 120‑billion‑parameter open model with 12 billion active parameters designed to run complex agentic AI systems at scale. 

Available now, the model combines advanced reasoning capabilities to efficiently complete tasks with high accuracy for autonomous agents.

AI-Native Companies: Perplexity offers its users access to Nemotron 3 Super for search and as one of 20 orchestrated models in Computer. Companies offering software development agents like CodeRabbit, Factory and Greptile are integrating the model into their AI agents along with proprietary models to achieve higher accuracy at lower cost. And life sciences and frontier AI organizations like Edison Scientific and Lila Sciences will power their agents for deep literature search, data science and molecular understanding.

Enterprise Software Platforms: Industry leaders such as Amdocs, Palantir, Cadence, Dassault Systèmes and Siemens are deploying and customizing the model to automate workflows in telecom, cybersecurity, semiconductor design and manufacturing. 

As companies move beyond chatbots and into multi‑agent applications, they encounter two constraints.

The first is context explosion. Multi‑agent workflows generate up to 15x more tokens than standard chat because each interaction requires resending full histories, including tool outputs and intermediate reasoning. 

Over long tasks, this volume of context increases costs and can lead to goal drift, where agents lose alignment with the original objective.

The second is the thinking tax. Complex agents must reason at every step, but using large models for every subtask makes multi-agent applications too expensive and sluggish for practical applications.

Nemotron 3 Super has a 1‑million‑token context window, allowing agents to retain full workflow state in memory and preventing goal drift.

Nemotron 3 Super has set new standards, claiming the top spot on Artificial Analysis for efficiency and openness with leading accuracy among models of the same size. 

The model also powers the NVIDIA AI-Q research agent to the No. 1 position on DeepResearch Bench and DeepResearch Bench II leaderboards, benchmarks that measure an AI system’s ability to conduct thorough, multistep research across large document sets while maintaining reasoning coherence. 

Hybrid Architecture

Nemotron 3 Super uses a hybrid mixture‑of‑experts (MoE) architecture that combines three major innovations to deliver up to 5x higher throughput and up to 2x higher accuracy than the previous Nemotron Super model. 

  • Hybrid Architecture: Mamba layers deliver 4x higher memory and compute efficiency, while transformer layers drive advanced reasoning.
  • MoE: Only 12 billion of its 120 billion parameters are active at inference. 
  • Latent MoE: A new technique that improves accuracy by activating four expert specialists for the cost of one to generate the next token at inference.
  • Multi-Token Prediction: Predicts multiple future words simultaneously, resulting in 3x faster inference.

On the NVIDIA Blackwell platform, the model runs in NVFP4 precision. That cuts memory requirements and pushes inference up to 4x faster than FP8 on NVIDIA Hopper, with no loss in accuracy. 

Open Weights, Data and Recipes

NVIDIA is releasing Nemotron 3 Super with open weights under a permissive license. Developers can deploy and customize it on workstations, in data centers or in the cloud.

The model was trained on synthetic data generated using frontier reasoning models. NVIDIA is publishing the complete methodology, including over 10 trillion tokens of pre- and post-training datasets, 15 training environments for reinforcement learning and evaluation recipes. Researchers can further use the NVIDIA NeMo platform to fine-tune the model or build their own. 

Use in Agentic Systems

Nemotron 3 Super is designed to handle complex subtasks inside a multi-agent system. 

A software development agent can load an entire codebase into context at once, enabling end-to-end code generation and debugging without document segmentation. 

In financial analysis it can load thousands of pages of reports into memory,  eliminating the need to re-reason across long conversations, which improves efficiency. 

Nemotron 3 Super has high-accuracy tool calling that ensures autonomous agents reliably navigate massive function libraries to prevent execution errors in high-stakes environments, like autonomous security orchestration in cybersecurity.

Availability

NVIDIA Nemotron 3 Super, part of the Nemotron 3 family, can be accessed at build.nvidia.com, Perplexity, OpenRouter and Hugging Face. Dell Technologies is bringing the model to the Dell Enterprise Hub on Hugging Face, optimized for on-premise deployment on the Dell AI Factory, advancing multi-agent AI workflows. HPE is also bringing NVIDIA Nemotron to its agents hub to help ensure scalable enterprise adoption of agentic AI. 

Enterprises and developers can deploy the model through several partners:

The model is packaged as an NVIDIA NIM microservice, allowing deployment from on-premises systems to the cloud.

Stay up to date on agentic AI, NVIDIA Nemotron and more by subscribing to NVIDIA AI news, joining the community, and following NVIDIA AI on LinkedIn, Instagram, X and Facebook.

Explore self-paced video tutorials and livestreams.



NVIDIA and Partners Show That Software-Defined AI-RAN Is the Next Wireless Generation



AI-RAN is moving from lab to field, showing that a software-defined approach is the only viable way to build future AI-native wireless networks.

Ahead of Mobile World Congress (MWC), running March 2-5 in Barcelona, NVIDIA and Nokia announced new AI-RAN collaborations with top telecom operators across Europe, Asia and North America, powered by NVIDIA AI-RAN platforms. Industry pioneers T-Mobile U.S., SoftBank and Indosat Ooredoo Hutchison (IOH) passed implementation milestones, taking NVIDIA-powered AI-RAN outdoors and over the air.

New benchmarking results from partners like SynaXG showed that AI-RAN running on NVIDIA platforms delivers high-speed, carrier-grade performance — meaning extreme reliability — across multiple 5G spectrum bands. And over 20 AI-RAN Alliance demos built on NVIDIA platforms will be showcased at MWC, highlighting how AI is boosting 5G performance and efficiency, and unlocking new edge AI applications.

All of this represents momentum and convergence toward a common, software-defined foundation that will set the stage for secure, open and AI-native 6G systems.

AI-RAN Goes From Lab to Live

Top telecom operators and partners are using NVIDIA platforms to bring AI-RAN to commercial deployment. 

T-Mobile U.S. demonstrated concurrent AI and RAN processing on NVIDIA AI-RAN platform using Nokia’s CUDA-accelerated RAN software. In T-Mobile’s over-the-air field environment, Nokia’s AirScale massive multiple-input and multiple-output (MIMO) radio in the 3.7GHz band supported commercial devices running applications like video streaming, generative AI and AI-powered video captioning, alongside 5G. 

SoftBank’s AITRAS live field trial achieved an industry-first, 16-layer massive MIMO using fully software-defined 5G running on NVIDIA’s AI-RAN platform, marking an important technical milestone toward AI-RAN commercialization. 

IOH has implemented software-defined 5G with Nokia’s vRAN software on NVIDIA AI-RAN platforms, moving from proof of concept to pre-commercial field validation. This milestone was showcased at MWC through Southeast Asia’s first AI-powered 5G call, where AI and network intelligence operated seamlessly to enable secure, real-time cross-border connectivity, including responsive remote control of a robotic dog over the live 5G network. This achievement demonstrates IOH’s readiness to scale AI-native network capabilities and bring intelligent connectivity to communities across Indonesia.

SynaXG demonstrated fully software-defined AI-RAN using NVIDIA AI Aerial — a suite of accelerated computing platforms, software libraries and tools to build, train, simulate and deploy AI-native wireless networks — running 4G, 5G in both sub-6GHz [FR1] and millimeter wave [FR2] spectrum bands, alongside agentic AI workloads, on a single NVIDIA GH200 server. This marks the world’s first implementation of AI-RAN on FR2 bands.

SynaXG’s setup activated 20 component carriers with both a centralized unit (CU) and distributed unit (DU) on one platform, achieving a throughput of 36 Gbps and under 10 milliseconds latency. These breakthrough results highlight AI-RAN-based 5G performance as well as seamless orchestration between AI and RAN workloads.

Tripled Pace of AI-RAN Innovation

This year’s MWC will see triple the number of AI-RAN innovations over last year, with 26 out of 33 AI-RAN Alliance demos built using NVIDIA AI Aerial and a software-defined architecture.

Some of these demos include:

  • DeepSig is reinventing how devices “speak” to networks by letting AI learn a smarter signal format at both ends of the link — the communications channel that connects two devices. An AI‑native air interface jointly learns how to best encode and decode signals using neural techniques at the device and base station, removing pilot overheads and adapting to site‑specific channels. Early results on NVIDIA platforms show up to about 2x higher throughput and better spectral and energy efficiency from the same spectrum.
  • SUTD, NVIDIA and partners will show how robots and autonomous vehicles can distribute their “thinking” across the device, edge and cloud — bringing split-inferencing from concept to implementation. By deciding in real time where each AI task runs, the demos prove how AI-RAN can meet tight latency, privacy and coverage service-level agreements to scale physical AI and vision language models through the network edge.
  • zTouch Networks and partners built an AI-RAN orchestration blueprint showing how operators can safely share GPUs across AI and RAN workloads. By using NVIDIA Multi-Instance GPU technology, the blueprint steers resources in real time, maximizing GPU utilization and improving energy management while ensuring RAN quality of service. This is a key step for making multi-tenant AI-RAN solutions ready for commercial use, so operators can turn GPU capacity into revenue.
  • Northeastern University and SoftBank will demonstrate an AI switching solution for NVIDIA AI Aerial that flips seamlessly and without data loss between AI and classic algorithms for channel estimation. This selects, in real time, the best possible processing solution at all times depending on conditions, improving stability and throughput while proving AI can coexist with classical approaches.

“AI-RAN is emerging as a unifying architecture for future radio networks,” said Alex Choi, chair of the AI-RAN Alliance. “By aligning operators, vendors and researchers around software-defined, GPU-accelerated architectures, we are boosting innovation, validating new concepts quickly and building the foundation for AI-native 6G, now.”

As intelligence moves into the physical world, autonomous systems such as robots and cars depend on AI-RAN networks to see, sense, reason and act.

Capgemini is working within Project ULTIMO, a Horizon Europe-funded initiative, to show how AI-RAN can support large-scale autonomous mobility services across European cities. Autonomous shuttles equipped with the NVIDIA Jetson Orin module process sensor data locally, while select video and telemetry streams are sent over 5G to agentic AI applications on NVIDIA AI-RAN servers. These workloads handle scene understanding, incident and safety detection, and accessibility insights at scale, while mission-critical 5G gets priority access to GPU resources.

A Growing Ecosystem

A growing ecosystem of partners is forming around NVIDIA-powered AI-RAN platforms, enabling operators to choose from a range of deployment solutions. NVIDIA Aerial RAN Computer (ARC) platforms harness the NVIDIA Grace CPU and a variety of GPUs, providing a high-performance, energy-efficient compute foundation for AI-native RAN infrastructure.

  • Quanta Cloud Technology (QCT) is announcing commercial off-the-shelf AI-RAN products that support NVIDIA ARC platforms and Nokia software, giving operators standardized building blocks for AI-RAN.
  • Supermicro is extending support across the full NVIDIA AI-RAN portfolio, including NVIDIA ARC-Pro and NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, as well as ARC-Compact systems with Nokia software.
  • WNC has introduced a new AI-optimized indoor and outdoor open radio unit, integrated with NVIDIA AI Aerial Testbed and NVIDIA ARC platforms, that supports 5GA and 6G use cases.
  • Eridan has launched a 4T4R O-RU along with its 2T2R O-RU, which was integrated with NVIDIA AI Aerial, and a DU running on the NVIDIA DGX Spark desktop supercomputer, combining spectrally efficient radios with GPU-based baseband processing to create a powerful and portable outdoor base station.
  • LITEON has completed integration of its sub-6 GHz and millimeter wave radio units with NVIDIA AI Aerial, and has expanded its collaboration with ecosystem partners like Supermicro and SynaXG to accelerate AI-RAN commercialization.

Laying the Foundation for Open, Secure, AI-Native 6G

NVIDIA’s latest State of AI in Telecom report showed that the industry is stepping up AI-native RAN and 6G investments — signaling a major intercept ahead of the traditional 6G deployment cycle, with 77% of respondents anticipating a much faster time to deployment of this new AI-native wireless network architecture.

This latest progress on software-defined AI-RAN is setting the stage for secure, open and AI-native 6G systems.

NVIDIA has already open sourced NVIDIA Aerial CUDA-accelerated RAN libraries, fueling the pace of AI-RAN innovation. NVIDIA has also now joined the OCUDU (Open CU DU) Ecosystem Foundation, hosted by the Linux Foundation, contributing to open source RAN software development to accelerate research and commercialization for next-generation wireless networks.

Learn more by meeting NVIDIA and partners at Mobile World Congress. Explore key insights from the State of AI in Telecom survey.

NVIDIA Advances Autonomous Networks With Agentic AI Blueprints and Telco Reasoning Models



Autonomous networks — intelligent, self-managing telecommunications operations — are moving from a future vision to a current priority for telecom operators. In the latest NVIDIA State of AI in Telecommunications report, network automation emerged as the top AI use case for investment and return on investment.

Automation is different from autonomy. Beyond executing predefined workflows, autonomous networks must understand operator intent, reason over tradeoffs and decide what actions to take. Reasoning models and AI agents fine-tuned on telecom data are key to enabling this shift.

For networks to become autonomous, there’s a need for an end-to-end agentic system that includes key components like telco network models and AI agents that talk to each other and use network simulation tools to validate actions.

Ahead of Mobile World Congress Barcelona, NVIDIA unveiled an open NVIDIA Nemotron-based large telco model (LTM), a comprehensive guide for building reasoning agents for network operations, and new NVIDIA Blueprints for energy saving and network configuration with multi-agent orchestration to help operators advance toward autonomy.

And as part of GSMA’s new Open Telco AI initiative — launching tomorrow — NVIDIA is releasing the new open source LTM, implementation guide and agentic AI blueprints as open resources through GSMA, an organization for the mobile communications industry.

Open Nemotron 3 Large Telco Model Brings Reasoning to Telecom 

For telcos to successfully operationalize generative and agentic AI across their operations, AI models must have the ability to understand the language of telecom and reason through complex workflows. NVIDIA has collaborated with AdaptKey AI to release a new open source, 30-billion-parameter NVIDIA Nemotron LTM that operators around the world can use to build autonomous networks.

Built on the NVIDIA Nemotron 3 family of foundation models and fine-tuned by AdaptKey AI using open telecom datasets including industry standards and synthetic logs, the LTM is optimized to understand telecom industry terminology and reason through workflows such as fault isolation, remediation planning and change validation.

As an open model, the Nemotron LTM gives telcos full transparency into how it was trained and what data was used, enabling secure and fast on‑premises deployment within their networks, where they can build and run agents directly. It also lets telcos safely adapt and extend telecom‑tuned reasoning with their own network and operational data, so they can move toward autonomous operations without sacrificing control over data or security.

Teaching AI Agents to Reason Like Network Engineers

NVIDIA and Tech Mahindra have published an open source guide that shows telecom operators how to fine-tune domain-specific reasoning models and build agents that can safely execute network operations center (NOC) workflows.

The guide outlines a framework for teaching models to reason like NOC engineers: focus on high‑impact, high‑frequency incident categories, translate expert resolutions into step‑by‑step procedures and turn those into structured reasoning traces that capture each action, tool call, outcome and decision. These traces become the “thinking examples” the model learns from, so it understands not just what to do, but why a particular sequence of checks and fixes is safe and effective.

Using the NVIDIA NeMo-Skills pipeline, operators can fine-tune a reasoning model on these traces, laying the foundation for telco-specialized AI agents that can reason and solve problems like a network engineer.

Maximizing Energy Efficiency With New Intent-Driven Energy Saving Blueprint

Autonomous networks rely on closed‑loop operation: models that understand the network, agents that act on intent and simulation that feeds results back into the system to validate and refine decisions. The new NVIDIA Blueprint for intent-driven RAN energy efficiency brings these pieces together, helping operators systematically reduce power consumption in 5G radio access networks (RAN) while maintaining quality of service.

The blueprint integrates network test and measurement leader VIAVI’s TeraVM AI RAN Scenario Generator (AI RSG) platform to generate synthetic network data — including cell utilization, user throughput and other traffic patterns — and convert it into a simple, queryable format.

An energy planning agent then reasons over the synthetic data to generate energy-saving policies that can be simulated in AI RSG, allowing operators to safely validate energy-saving policies in a closed loop to meet their intent without changing live configurations or impacting subscribers.

Telcos Put the NVIDIA Blueprint for Network Configuration to Work

The NVIDIA Blueprint for telco network configuration is being adopted by operators around the world.

Cassava Technologies is using the blueprint to build Cassava Autonomous Network, an agentic platform designed to optimize Africa’s diverse, multi-vendor mobile network environment. The platform implements three agents: one to monitor the network and recommend configuration changes, one to apply changes with documentation and governance, and one to assess the impact of changes made and safely roll them back if they have unintended effects.

NTT DATA is implementing the blueprint to bring intelligence to traffic regulation, helping the network manage surges when users reconnect after an outage, and is deploying it with a tier 1 operator in Japan.

An AI agent looks at real-time demand across the network and then decides when and how to admit new users on specific cells. As conditions stabilize, the agent adapts its decisions, turning what used to be manual configurations into a data-driven optimization cycle for more resilient mobile networks.

Evolving Network Configuration With Multi-Agent Orchestration

To help telcos design, observe and optimize complex agentic workflows across the RAN, NVIDIA and BubbleRAN are enhancing the NVIDIA Blueprint for telco network configuration with NVIDIA NeMo Agent Toolkit (NAT) and BubbleRAN Agentic Toolkit (BAT), complementary frameworks for multi-agent orchestration.

BubbleRAN is integrating NAT and BAT into its Opti-Sphere platform to manage network monitoring, configuration and validation agents more flexibly across containers and workloads, and connect them to tools that report network metrics and traffic status so they can continuously propose and validate configuration changes.

Telenor Group will be the first telco to adopt the blueprint with BubbleRAN to enhance its 5G network for Telenor Maritime, the group’s global connectivity provider at sea.

Learn more about the latest advancements in agentic AI for telecommunications at Mobile World Congress, taking place in Barcelona from March 2-5. 

See notice regarding software product information.

Survey Reveals AI Is Delivering Clear Return on Investment in Healthcare


AI is accelerating every aspect of healthcare — from radiology and drug discovery to medical device manufacturing and new treatment methods enabled by digital twins of the human body.

NVIDIA’s second annual “State of AI in Healthcare and Life Sciences” survey report reveals how the industry is moving from AI experimentation to execution, reaping return on investment (ROI) on core applications like medical imaging and drug discovery.

The industry is also embracing open source software and AI models to tackle specific use cases, as well as exploring using agentic AI to speed knowledge retrieval and research paper analysis.

Highlights from this year’s report include:

  • 70% of respondents said their organizations are actively using AI, up from 63% in 2024.
  • 69% said they’re using generative AI and large language models, up from 54%.
  • 82% said open source software and models are moderately to extremely important to their organizations’ AI strategy.
  • 47% said they’re using or assessing agentic AI.
  • 85% of executives said AI is helping increase revenue, and 80% said it’s helping reduce costs.

“Over the next 12-18 months, the most visible and scalable impact of AI will come from logistics and administrative streamlining,” said John Nosta, president of NostaLab, a healthcare think tank. “That’s where adoption curves are already steep — scheduling, documentation, coding, utilization management and care coordination.”

Read more below on some of the report’s key findings.

AI Adoption Ramps Up Across Healthcare and Life Sciences

AI adoption is up across every industry segment in this year’s survey — spanning digital healthcare, pharmaceutical and biotechnology, payers and providers, and medical technology and tools — with digital healthcare leading at 78%, followed by medical technology at 74%.

The top industry workload was generative AI and large language models, according to 69% of respondents. AI for data analytics and data science was the second most-used workload, followed by predictive analytics. New to the survey, agentic AI ranked fourth, with 47% of respondents saying they’re using or assessing AI agents.

“Scaling generative AI in healthcare starts with focusing on real clinical and operational problems, rather than the technology itself,” said Dr. Annabelle Painter, clinical AI strategy lead at Visiba U.K. “The organizations seeing impact are those that embed AI into existing workflows instead of layering AI on top as a separate tool.”

Healthcare and life sciences organizations are deploying these AI workloads across a variety of use cases, each specific to their primary functions. For example, 61% of respondents from medical technology said they’re using AI for medical imaging, such as radiologists using it to work more quickly and efficiently, while 57% from pharmaceutical and biotechnology said drug discovery is being driven by AI.

For the entire industry, the top AI use cases were clinical decision support (such as radiologists highlighting areas of concern on a scan), medical imaging and workflow optimization.

AI Budgets to Increase With Strong ROI

AI is helping healthcare and life sciences organizations become even better at their core competencies — underscoring strong ROI.

In addition to increasing annual revenue and reducing annual costs, AI is boosting back-office productivity through workflow optimization and is scaling across other key business operations such as patient interaction and administrative tasks.

For example, 57% of respondents from the medical technology segment reported seeing ROI from deploying AI for medical imaging. Nearly half (46%) of pharmaceutical and biotechnology respondents said AI for drug discovery and development was among their top ROI use cases.

The top ROI use case for digital healthcare providers was virtual health assistants and chatbots, according to 37%, while 39% of respondents from payers and providers (which include hospitals, primary care providers and insurance companies) cited administrative tasks and workflow optimization as their top area of ROI.

As a result of AI’s positive impact, 85% of respondents said their AI budgets would increase this year, with another 12% saying budgets would stay the same. For almost half of respondents (46%), AI spending will increase significantly, by more than 10%.

“Healthcare organizations that successfully integrate AI are those that explicitly fund and prioritize evaluation as a core operational function, ensuring AI delivers measurable improvements in safety, quality and patient care over time,” said Painter.

Using Open Source for Domain-Specific AI Deployment

Leaning into open source models and software allows enterprises to build domain-specific applications, lending them greater flexibility and efficiency while boosting business returns.

The healthcare industry has embraced open source, with 82% of survey respondents stating it’s moderately to extremely important to their AI strategy.

“Open models will shape the intellectual field,” said Nosta. “They are essential for exploration and for keeping the field honest. But in clinical environments where safety, liability and accountability are nonnegotiable, proprietary systems will remain necessary for validation, integration and trust. The key insight here is that discovery will be open, and deployment will demand stewardship.”

Download the “State of AI in Healthcare and Life Sciences: 2026 Trends” report for in-depth results and insights.

Sign up for NVIDIA’s healthcare and life sciences newsletter.

India Fuels Its AI Mission With NVIDIA


India is the nexus of AI innovation this week as the host of the AI Impact Summit, which brings together global heads of state and industry to chart the future of AI.

At the summit, taking place in New Delhi, industry leaders, government agencies, educational institutions and startups are sharing how they’re working with NVIDIA to drive the AI industrial revolution in the world’s most populous country.

These initiatives support the IndiaAI Mission, a government effort that’s infusing India’s AI ecosystem with over $1 billion to bolster the nation’s compute capacity and foster the development of sovereign AI datasets, frontier models and applications. The mission also supports AI education, startup innovation and frameworks for trustworthy AI.

Read how NVIDIA is supporting IndiaAI Mission priorities including:

NVIDIA Cloud Partners Boost India AI Infrastructure

To achieve its AI ambitions, India is investing heavily in its computing infrastructure. Under the IndiaAI Compute Pillar, the nation is building out its AI cloud offerings with systems including tens of thousands of NVIDIA GPUs.

NVIDIA is collaborating with next‑generation cloud providers Yotta, L&T and E2E Networks to deliver advanced AI factories to meet India’s growing need for AI compute and enable it to develop AI models and services that drive innovation.

  • Yotta is a hyperscale data center and cloud provider building large‑scale sovereign AI infrastructure for India, branded as Shakti Cloud, powered by over 20,000 NVIDIA Blackwell Ultra GPUs. Its campuses in Navi Mumbai and Greater Noida deliver GPU‑dense, high‑bandwidth AI cloud services on a pay‑per‑use model, designed to make advanced AI training and inference affordable and compliant for Indian enterprises and public sector customers.
  • Larsen & Toubro (L&T) is building sovereign, gigawatt-scale NVIDIA AI factory infrastructure in India to reinforce the country’s position as a global AI powerhouse in alignment with the IndiaAI Mission. The roadmap includes initial expansions in Chennai to 30 megawatts as well as a new 40-megawatt facility in Mumbai. These facilities will power sovereign cloud workloads and hyperscale deployments, delivering secure, energy‑efficient infrastructure for advanced AI applications.
  • E2E Networks is building an NVIDIA Blackwell GPU cluster on its TIR platform, hosted at the L&T Vyoma Data Center in Chennai. The TIR cloud compute platform will feature NVIDIA HGX B200 systems and NVIDIA Enterprise software as well as NVIDIA Nemotron open models to supercharge sovereign development across agentic AI, healthcare, finance, manufacturing and agriculture.

India’s AI cloud infrastructure will host workloads as well as manufacture intelligence for model training, fine-tuning and high‑scale inference. Capacity within these data centers will be reserved for model builders, startups, researchers and enterprises to build, fine-tune and deploy AI in India.

Further expanding access to NVIDIA AI infrastructure in India, Netweb Technologies is launching its Tyrone Camarero AI Supercomputing systems built on the NVIDIA Grace Blackwell architecture. The NVIDIA GB200 NVL4 platforms — manufactured in India by Netweb under the government’s “Make in India” mission — feature four NVIDIA Blackwell GPUs and two NVIDIA Grace CPUs to power scientific computing, model training and inference.

NVIDIA and India AI-Native Companies Build the Nation’s Frontier AI Models

Another key goal of the IndiaAI Mission — led by its Innovation Center Pillar — is to develop and deploy foundation models trained on India-specific data and domestic AI infrastructure.

For a nation as multilingual as India — with 22 constitutionally recognized languages and over 1,500 more recorded by the country’s census — frontier AI models are a powerful tool to help its more than 1.4 billion residents interact with technology in their primary language.

Organizations across the country are building AI applications with NVIDIA Nemotron to support public-sector services, financial systems and enterprise operations in multiple languages.

NVIDIA Nemotron open models, datasets, tools and libraries enable organizations to build frontier speech, language and multimodal models at scale and across languages for government, consumer and enterprise applications. It includes India-specific datasets like Nemotron-Personas-India, an open dataset built from publicly available census data using NeMo Data Designer that includes 21 million fully synthetic Indic personas to enable population-scale sovereign AI development.

Adopters in India of Nemotron — and NeMo Curator, an open library for multilingual and multimodal data curation — include:

  • BharatGen, a sovereign AI initiative supported by the Government of India aimed at strengthening the country’s multilingual and multimodal AI ecosystem. As part of this effort, BharatGen has developed a 17-billion-parameter mixture-of-experts (MoE) model from the ground up, using the NVIDIA NeMo framework for pretraining and the NeMo RL library for post-training. The open source models are designed to power applications across public services, agriculture, security and cultural preservation.
  • Chariot, a company building AI systems for speech and multimodal communication. Using the NeMo framework, Chariot is developing an 8-billion-parameter model for real-time text to speech, supporting applications that improve accessibility and digital interaction across consumer and enterprise use cases.
  • Commotion, backed by Tata Communications, which has developed an AI operating system to automate complex enterprise workflows. By integrating NVIDIA Nemotron models and speech capabilities, the platform enables governed, production-grade AI deployments, helping enterprises scale AI across critical business operations.
  • CoRover.ai, which has deployed NVIDIA Nemotron Speech open models and NVIDIA Riva libraries for end-to-end, ultralow-latency speech AI — including the NVIDIA Riva Whisper v3 model for multilingual automatic speech recognition in English, Hindi and Gujarati. Powering customer service applications for the Indian Railway Catering and Tourism Corporation, CoRover’s platform supports around 10,000 concurrent users and more than 5,000 daily ticket bookings.
  • Gnani.ai, which offers enterprises a multilingual agentic AI platform that can interact with customers through voice and text. Gnani is building a 14-billion-parameter speech-to-speech model built on NVIDIA Nemotron Speech models, datasets and NeMo libraries including NeMo libraries through NVIDIA Cloud Partner E2E Networks — with plans to expand to a 32-billion-parameter model. By fine-tuning the NVIDIA Nemotron Speech model for Indic languages, Gnani has achieved a 15x reduction in inference costs, enabling the company to scale to support more than 10 million calls per day for customers in telecom, banking and hospitality.
  • National Payments Corporation of India (NPCI), which operates India’s retail payment and settlement systems and is deploying AI models to support digital financial services. Building on its production deployment of the AI-powered UPI Help Assistant — a pilot initiative for India’s Unified Payments Interface (UPI) — NPCI is exploring training FiMi, a financial model for India, using the NVIDIA Nemotron 3 Nano model and its own datasets. The model, fine-tuned with the NeMo framework, will support multilingual customer service across India’s banking ecosystem.
  • Sarvam.ai, a leader in full-stack sovereign generative AI that provides enterprise-grade multimodal, speech-to-text, text-to-speech, translation and reasoning models. The company is open sourcing its Sarvam-3 series of text and multimodal large language model variants, trained for 22 Indic languages, English math and code. Sarvam is using NeMo Curator to construct high-quality multilingual training data while adopting a subset of NVIDIA Nemotron datasets. The foundation models were pre-trained from scratch across 3B, 30B and 100B parameter sizes using the NVIDIA NeMo framework and Megatron-LM, and post-trained with NeMo RL. Training was conducted on NVIDIA H100 GPUs through NVIDIA Cloud Partners, including Yotta. With these sovereign models, Sarvam.ai’s new Pravah platform enables production-grade inference for Indian government and enterprise applications.
  • Soket.ai, which is using a modern large-model training stack on open NVIDIA Nemotron technologies, including NVIDIA Megatron and NVIDIA NeMo. These open source components enable scalable experimentation, training stability and efficient GPU usage, while preserving full control over the model’s data, design and life cycle.
  • Tech Mahindra, which has developed an 8-billion-parameter foundation model tailored for Indian languages and dialects. The model, built with Nemotron, is being designed for use in classrooms, where it can help make educational materials available in a wider range of Indian languages including Hindi, Maithili and Dogri. The team generated synthetic data with Nemotron libraries and tools such as NeMo Data Designer and conducted supervised fine-tuning with NeMo AutoModel.
  • Zoho, which is advancing its Zia LLM platform with proprietary models built using NVIDIA NeMo on the NVIDIA Blackwell and Hopper platforms, integrated across its software-as-a-service applications. This privacy-first architecture delivers contextual, production-grade AI for critical business workflows like customer relation management and finance, ensuring technology sovereignty and enterprise security at a global scale.

Developers building sovereign AI systems can access NVIDIA Nemotron and NeMo today. Nemotron models can be deployed anywhere on NVIDIA-accelerated infrastructure — including on NVIDIA DGX Spark, which is now available in India through qualified partners including PNY, RP tech India, Tech Data, a TD SYNNEX Company, as well as on NVIDIA Marketplace. A version manufactured in India as part of the “Make in India” initiative is available through Netweb.

DGX Spark also runs sovereign AI models by Indian model builders including Sarvam.ai.

Government and Academic Partnerships to Support Research in AI for Science and Engineering

Under its Application Development Initiative Pillar, the IndiaAI Mission is supporting high-impact AI applications — and its Startup Financing Pillar aims to democratize funding availability for AI entrepreneurs across the country.

NVIDIA is collaborating with government agencies, research institutions, venture capital firms and startups to advance projects aligned with these goals.

NVIDIA is collaborating with the Anusandhan National Research Foundation (ANRF), a statutory body under the Indian government, to spur even more cutting-edge AI research across the nation’s leading academic institutions. The initiative will support ANRF’s AI for Science & Engineering program and future AI programs.

NVIDIA will offer ANRF grantee institutions complimentary access to NVIDIA AI Enterprise software and specialized technical mentorship through the NVIDIA AI Technology Center. The collaboration will also include AI bootcamps, workshops and hackathons to strengthen India’s AI research ecosystem.

NVIDIA is also partnering with prominent venture capital firms including Peak XV, Z47, Elevation Capital,, Nexus Venture Partners and Accel India to identify and fund promising startups of all stages that are building AI solutions for India and international use. More than 4,000 of India’s AI startups are already part of the NVIDIA Inception program.

For more from the India AI Summit, learn how NVIDIA and global industrial software leaders are partnering with India’s largest manufacturers — and how India’s global systems integrators are building enterprise AI agents with NVIDIA.

Nemotron Labs: How AI Agents Are Turning Documents Into Real-Time Business Intelligence


Editor’s note: This post is part of the Nemotron Labs blog series, which explores how the latest open models, datasets and training techniques help businesses build specialized AI systems and applications on NVIDIA platforms. Each post highlights practical ways to use an open stack to deliver value in production — from transparent research copilots to scalable AI agents.

Businesses today face the challenge of uncovering valuable insights buried within a wide variety of documents — including reports, presentations, PDFs, web pages and spreadsheets.

Often, teams piece together insights by manually reviewing files, copying data into spreadsheets, building dashboards and using basic search or template-based optical character recognition (OCR) tools that often miss important details in complex media.

Intelligent document processing is an AI-powered workflow that automatically reads, understands and extracts insights from documents. It interprets rich formats inside those documents — including tables, charts, images and text — using AI agents and techniques like retrieval-augmented generation (RAG) to turn the multimodal content into insights that other multi-agent systems and people can easily use.

With NVIDIA Nemotron open models and GPU-accelerated libraries, organizations can build AI-powered document intelligence systems for research, financial services, legal workflows and more.

These open models, datasets and training recipes have powered strong results on leaderboards such as MTEB, MMTEB and ViDoRe V3, benchmarks for evaluating multilingual and multimodal retrieval models. Teams can choose from among the best models for tasks like search and question answering.

How Document Processing Streamlines Business Intelligence

Document intelligence systems that can pull meaning from complex layouts, scale to huge file libraries and show exactly where an answer came from are incredibly useful in high-stakes environments. These systems:

  • Understand rich document content, moving beyond simple text scraping to capture information from charts, tables, figures and mixed-language pages and treating documents as a human would by recognizing structure, relationships and context​​.
  • Handle large quantities of shifting data, ingesting and processing massive collections of documents in parallel, and keeping knowledge bases continuously up to date.​​
  • Find exactly what users need, helping AI agents pinpoint the most relevant passages, tables or paragraphs to a query so they can respond with precision and accuracy.​​
  • Show the evidence behind answers by providing citations to specific pages or charts so teams can gain transparency and auditability, which is critical in regulated industries.​​

The result is a shift from static document archives to living knowledge systems that directly power business intelligence, customer experiences and operational workflows.

Document Intelligence at Work

Intelligent document processing systems built on NVIDIA Nemotron RAG models, Nemotron Parse and accelerated computing are already reshaping how organizations across industries gain insights from their documents.​​

Justt: AI-Native Chargeback Management and Dispute Optimization

In financial services, payment disputes create significant revenue loss and operational complexity for merchants, largely because the evidence needed to handle them lives in unstructured formats. Transaction logs, customer communications and policy documents are often fragmented across systems and difficult to process at scale, making dispute handling slow, manual and costly.

Justt.ai provides an AI-driven platform that automates the full chargeback lifecycle at scale. The platform connects directly to payment service providers and merchant data sources to ingest transaction data, customer interactions and policies, then automatically assembles dispute-specific evidence that aligns with card network and issuer requirements.

The platform’s AI-powered dispute optimization, powered by Nemotron Parse, applies predictive analytics to determine which chargebacks to fight or accept, and how to optimize each response for maximum net recovery. Leading hospitality operators like HEI Hotels & Resorts use the platform to automate dispute handling across their properties, recapturing revenue while maintaining guest relationships.

By pairing document-centric intelligence with decision automation, merchants can recapture a significant portion of revenue lost to illegitimate chargebacks while reducing manual review effort.​

Read about how Justt’s chargeback management tool autonomously processes financial data to handle disputes for merchants.

Docusign: Scaling Agreement Intelligence

Docusign is the global leader in Intelligent Agreement Management, handling millions of transactions every day for more than 1.8 million customers and over 1 billion users.

Agreements are the foundation of every business, but the critical information they contain are often buried inside pages of documents. To surface the information, Docusign needed high-fidelity extraction of tables, text and metadata from complex documents like PDFs so organizations could understand and act on obligations, risks and opportunities faster.

Docusign is evaluating Nemotron Parse for deeper contract understanding at scale. Running on NVIDIA GPUs, the model combines advanced AI with layout detection and OCR. The system can reliably interpret complex tables and reconstruct tables with required information. This reduces the need for manual corrections and helps ensure that even the most complex contracts are processed with the speed and accuracy their customers expect.

With this foundation, Docusign will transform agreement repositories into structured data that powers contract search, analysis and AI-driven workflows — turning agreements into business assets that help organizations and their teams improve visibility, reduce risk and make faster decisions.

Edison Scientific: Research Across Massive Literature Scale

Edison Scientific’s Kosmos AI Scientist helps researchers navigate complex scientific landscapes to synthesize literature, identify connections and surface evidence.​

Edison needed a way to rapidly and accurately extract structured information from large volumes of PDFs, including equations, tables and figures that traditional information parsing methods often mishandle.​

By integrating the NVIDIA Nemotron Parse model into its PaperQA pipeline, Edison can decompose research papers, index key concepts and ground responses in specific passages, improving both throughput and answer quality for scientists.​​ This approach turns a sprawling research corpus into an interactive, queryable knowledge engine that accelerates hypothesis generation and literature review.​

The high efficiency of Nemotron Parse enables cost-efficient serving at scale, allowing Edison’s team to unlock the whole multimodal pipeline.

Designing an Intelligent Document Processing Application With NVIDIA Technologies

A robust, domain-specific document intelligence pipeline requires technologies that can handle data extraction, embedding and reranking, while keeping the data secure and compliant with regulations.​​

  • Extraction: Nemotron extraction and OCR models rapidly ingest multimodal PDFs, text, tables, graphs and images to convert them into structured, machine-readable content while preserving layout and semantics.
  • Embedding: Nemotron embedding models convert passages, entities and visual elements into vector representations tuned for document retrieval, enabling semantically accurate search.​​
  • Reranking: Nemotron reranking models evaluate candidate passages to ensure the most relevant content is surfaced as context for large language models (LLMs), improving answer fidelity and reducing hallucinations.​​
  • Parsing: Nemotron Parse models decipher document semantics to extract text and tables with precise spatial grounding and correct reading flow. Overcoming layout variability, they turn unstructured documents into actionable data that enhances the accuracy of LLMs and agentic workflows.

These capabilities are packaged as NVIDIA NIM microservices and foundation models that run efficiently on NVIDIA GPUs, allowing teams to scale from proof of concept to production while keeping sensitive data within their chosen cloud or data center environment.

The most effective AI systems use a mix of frontier models and open source models like NVIDIA Nemotron, with an LLM router analyzing each task and automatically selecting the model best suited for it. This approach keeps performance strong while managing computing costs and improving efficiency.

Get Started With NVIDIA Nemotron

Access a step-by-step tutorial on how to build a document processing pipeline with RAG capabilities. Explore how Nemotron RAG can power specialized agents tailored for different industries.​

Plus, experiment with Nemotron RAG models and the NVIDIA NeMo Retriever open library, available on GitHub and Hugging Face, as well as Nemotron Parse on Hugging Face.

Join the community of developers building with the NVIDIA Blueprint for Enterprise RAG — trusted by a dozen industry-leading AI Data Platform providers and available now on build.nvidia.com, GitHub and the NGC catalog.

Stay up to date on agentic AI, NVIDIA Nemotron and more by subscribing to NVIDIA AI news, joining the community and following NVIDIA AI on LinkedIn, Instagram, X and Facebook.  

Explore self-paced video tutorials and livestreams.



Reflection raises $2B to be America’s open frontier AI lab, challenging DeepSeek


Reflection, a startup founded just last year by two former Google DeepMind researchers, has raised $2 billion at an $8 billion valuation, a whopping 15x leap from its $545 million valuation just seven months ago. The company, which originally focused on autonomous coding agents, is now positioning itself as both an open-source alternative to closed frontier labs like OpenAI and Anthropic, and a Western equivalent to Chinese AI firms like DeepSeek.

The startup was launched in March 2024 by Misha Laskin, who led reward modeling for DeepMind’s Gemini project, and Ioannis Antonoglou, who co-created AlphaGo, the AI system that famously beat the world champion in the board game Go in 2016. Their background developing these very advanced AI systems is central to their pitch, which is that the right AI talent can build frontier models outside established tech giants.

Along with its new round, Reflection announced that it has recruited a team of top talent from DeepMind and OpenAI, and built an advanced AI training stack that it promises will be open for all. Perhaps most importantly, Reflection says it has “identified a scalable commercial model that aligns with our open intelligence strategy.”

Reflection’s team currently numbers about 60 people — mostly AI researchers and engineers across infrastructure, data training, and algorithm development, per Laskin, the company’s CEO. Reflection has secured a compute cluster and hopes to release a frontier language model next year that’s trained on “tens of trillions of tokens,” he told TechCrunch.

“We built something once thought possible only inside the world’s top labs: a large-scale LLM and reinforcement learning platform capable of training massive Mixture-of-Experts (MoEs) models at frontier scale,” Reflection wrote in a post on X. “We saw the effectiveness of our approach first-hand when we applied it to the critical domain of autonomous coding. With this milestone unlocked, we’re now bringing these methods to general agentic reasoning.”

MoE refers to a specific architecture that powers frontier LLMs — systems that, previously, only large, closed AI labs were capable of training at scale. DeepSeek had a breakthrough moment when it figured out how to train these models at scale in an open way, followed by Qwen, Kimi, and other models in China.

“DeepSeek and Qwen and all these models are our wake up call because if we don’t do anything about it, then effectively, the global standard of intelligence will be built by someone else,” Laskin said. “It won’t be built by America.”

Techcrunch event

San Francisco
|
October 27-29, 2025

Laskin added that this puts the U.S. and its allies at a disadvantage because enterprises and sovereign states often won’t use Chinese models due to potential legal repercussions.

“So you can either choose to live at a competitive disadvantage or rise to the occasion,” Laskin said.

American technologists have largely celebrated Reflection’s new mission. David Sacks, the White House AI and Crypto Czar, posted on X: “It’s great to see more American open source AI models. A meaningful segment of the global market will prefer the cost, customizability, and control that open source offers. We want the U.S. to win this category too.”

Clem Delangue, co-founder and CEO of Hugging Face, an open and collaborative platform for AI builders, told TechCrunch of the round, “This is indeed great news for American open-source AI. Added Delangue, “Now the challenge will be to show high velocity of sharing of open AI models and datasets (similar to what we’re seeing from the labs dominating in open-source AI).”

Reflection’s definition of being “open” seems to center on access rather than development, similar to strategies from Meta with Llama or Mistral. Laskin said Reflection would release model weights — the core parameters that determine how an AI system works — for public use while largely keeping datasets and full training pipelines proprietary.

“In reality, the most impactful thing is the model weights, because the model weights anyone can use and start tinkering with them,” Laskin said. “The infrastructure stack, only a select handful of companies can actually use that.”

That balance also underpins Reflection’s business model. Researchers will be able to use the models freely, Laskin said, but revenue will come from large enterprises building products on top of Reflection’s models and from governments developing “sovereign AI” systems, meaning AI models developed and controlled by individual nations.

“Once you get into that territory where you’re a large enterprise, by default you want an open model,” Laskin said. “You want something you will have ownership over. You can run it on your infrastructure. You can control its costs. You can customize it for various workloads. Because you’re paying some ungodly amount of money for AI, you want to be able to optimize it as much as much as possible, and really that’s the market that we’re serving.”

Reflection hasn’t yet released its first model, which will be largely text-based, with multimodal capabilities in the future, according to Laskin. It will use the funds from this latest round to get the compute resources needed to train the new models, the first of which the company is aiming to release early next year.

Investors in Reflection’s latest round include Nvidia, Disruptive, DST, 1789, B Capital, Lightspeed, GIC, Eric Yuan, Eric Schmidt, Citi, Sequoia, CRV, and others.

MLCommons and Hugging Face team up to release massive speech data set for AI research


MLCommons, a nonprofit AI safety working group, has teamed up with AI dev platform Hugging Face to release one of the world’s largest collections of public domain voice recordings for AI research.

The data set, called Unsupervised People’s Speech, contains more than a million hours of audio spanning at least 89 different languages. MLCommons says it was motivated to create it by a desire to support R&D in “various areas of speech technology.”

“Supporting broader natural language processing research for languages other than English helps bring communication technologies to more people globally,” the organization wrote in a blog post Thursday. “We anticipate several avenues for the research community to continue to build and develop, especially in the areas of improving low-resource language speech models, enhanced speech recognition across different accents and dialects, and novel applications in speech synthesis.”

It’s an admirable goal, to be sure. But AI data sets like Unsupervised People’s Speech can carry risks for the researchers who choose to use them.

Biased data is one of those risks. The recordings in Unsupervised People’s Speech came from Archive.org, the nonprofit perhaps best known for the Wayback Machine web archival tool. Because many of Archive.org’s contributors are English-speaking — and American — almost all of the recordings in Unsupervised People’s Speech are in American-accented English, per the readme on the official project page.

That means that, without careful filtering, AI systems like speech recognition and voice synthesizer models trained on Unsupervised People’s Speech could exhibit some of the same prejudices. They might, for example, struggle to transcribe English spoken by a non-native speaker, or have trouble generating synthetic voices in languages other than English.

Unsupervised People’s Speech might also contain recordings from people unaware that their voices are being used for AI research purposes — including commercial applications. While MLCommons says that all recordings in the data set are public domain or available under Creative Commons licenses, there’s the possibility mistakes were made.

According to an MIT analysis, hundreds of publicly available AI training data sets lack licensing information and contain errors. Creator advocates including Ed Newton-Rex, the CEO of AI ethics-focused nonprofit Fairly Trained, have made the case that creators shouldn’t be required to “opt out” of AI data sets because of the onerous burden opting out imposes on these creators.

“Many creators (e.g. Squarespace users) have no meaningful way of opting out,” Newton-Rex wrote in a post on X last June. “For creators who can opt out, there are multiple overlapping opt-out methods, which are (1) incredibly confusing and (2) woefully incomplete in their coverage. Even if a perfect universal opt-out existed, it would be hugely unfair to put the opt-out burden on creators, given that generative AI uses their work to compete with them — many would simply not realize they could opt out.”

MLCommons says that it’s committed to updating, maintaining, and improving the quality of Unsupervised People’s Speech. But given the potential flaws, it’d behoove developers to exercise serious caution.