Why True AI Trust Requires Deterministic Reasoning


Amazon recently announced that they have extended their automated reasoning checks in Bedrock Guardrails. This acknowledges what we at Rainbird have long advocated — the critical importance of moving beyond pure probabilistic AI to achieve true reliability and trust in AI systems.

AWS’s move demonstrates the increasing market demand for more rigorous AI validation, although it’s crucial to understand the distinction between implementing guardrails and achieving truly deterministic AI reasoning.

Let’s explore why this all matters.

Understanding the Evolution: AWS’s Approach vs True Reasoning

AWS’s approach focuses on validating LLM outputs against predefined rules and policies through what they call “Automated Reasoning policies.” These policies can be extracted from a simple policy document and can operate on a set of variables and logical rules that are translated into natural language for accessibility.

However, there are several key distinctions between AWS’s validation approach and true deterministic reasoning:

Static Validation vs Dynamic Reasoning

AWS Bedrock Guardrails implements an “allow-all” approach with selective validation. This means the system attempts to “catch” hallucinations after the LLM has produced its output, and only where explicit policies are defined.

In contrast, Rainbird reasons over a world model that you get to design and control, captured in a knowledge graph. Decisions are explicitly derived from the graph, not via a probabilistic LLM, so no predicted outputs, just reasoned answers. 

This amounts to a ‘deny all’ approach, in which only verifiable decisions are produced. That fundamental difference is crucial for applications where determinism, precision and completeness are non-negotiable.
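The contrast between the two postures can be sketched in a few lines of Python. This is an illustrative toy, not the AWS or Rainbird API: the policy table, the knowledge table, the function names `allow_all_validate` and `deny_all_derive`, and the loan example are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical policy table: topic -> check applied to an LLM answer.
POLICIES: dict[str, Callable[[str], bool]] = {
    "max_loan_term": lambda answer: "30 years" in answer,
}

def allow_all_validate(topic: str, llm_answer: str) -> Optional[str]:
    """Allow-all: an answer passes unless an explicit policy rejects it."""
    check = POLICIES.get(topic)
    if check is not None and not check(llm_answer):
        return None          # caught by a defined policy
    return llm_answer        # topics without a policy pass unchecked

# Hypothetical modelled knowledge: only these answers can ever be produced.
KNOWLEDGE: dict[str, str] = {
    "max_loan_term": "The maximum loan term is 30 years.",
}

def deny_all_derive(topic: str) -> Optional[str]:
    """Deny-all: nothing is returned unless it comes from the model."""
    return KNOWLEDGE.get(topic)

# An unmodelled topic slips through allow-all but is refused by deny-all.
unchecked = allow_all_validate("early_repayment_fee", "There is never a fee.")
refused = deny_all_derive("early_repayment_fee")
```

In this sketch the allow-all path returns the unverified fee claim unchanged because no policy covers that topic, while the deny-all path returns nothing for it.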

Limited vs Complete Reasoning Chain

Whilst AWS’s solution can validate final outputs, it fundamentally operates as a validation layer on top of probabilistic LLM outputs. An LLM predicts an answer based on a distribution of probabilities, and then a rule checks if that prediction matches predefined criteria. This approach can catch obvious errors, but it cannot provide insight into how conclusions were reached or validate the reasoning process itself. 

In contrast, Rainbird’s knowledge graphs enable true causal reasoning from first principles. Rather than validating probabilistic outputs after the fact, our system builds deterministic logic chains that can explain not just what decision was made, but precisely how and why each step in the reasoning process was taken. This delivers complete visibility and validation of the entire decision-making process, not just its conclusion.

This distinction becomes particularly important in complex decision-making scenarios where multiple policies or rules interact. 

While a validation approach can check individual rules, it cannot understand or reason about their relationships and dependencies. For example, when evaluating eligibility for a financial product, multiple qualifying conditions might interact in subtle ways — income thresholds might vary based on employment status, which in turn could be affected by residency requirements. 

A validation-only system can check each rule in isolation but struggles to handle these interconnected relationships.

In contrast, a true reasoning system built on knowledge graphs can navigate these complex relationships naturally. Because the knowledge is explicitly modelled as an interconnected graph rather than a list of validation rules, the system can trace multiple paths through the knowledge, understand how different conditions influence each other, and arrive at conclusions that respect the full complexity of the domain. 

This means not just more accurate decisions, but also the ability to explain precisely how different factors influenced the outcome and why alternative paths weren’t taken.
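The financial product example can be made concrete with a small sketch. It is a plain Python function standing in for a graph engine, not Rainbird's implementation; the thresholds, the rule that self-employment is only recognised for residents, and the names `decide` and `Decision` are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds, for illustration only.
INCOME_THRESHOLD = {"employed": 25_000, "self_employed": 35_000}

@dataclass
class Decision:
    eligible: bool
    trace: list[str] = field(default_factory=list)

def decide(employment: str, resident: bool, income: int) -> Decision:
    trace: list[str] = []
    # Residency affects which employment statuses are recognised.
    if employment == "self_employed" and not resident:
        trace.append("self-employment is not recognised for non-residents")
        return Decision(False, trace)
    # Employment status determines the income threshold that applies.
    threshold = INCOME_THRESHOLD.get(employment)
    if threshold is None:
        trace.append(f"no threshold modelled for status '{employment}'")
        return Decision(False, trace)
    trace.append(f"threshold for {employment} is {threshold}")
    if income < threshold:
        trace.append(f"income {income} is below the threshold")
        return Decision(False, trace)
    trace.append(f"income {income} meets the threshold")
    return Decision(True, trace)

result = decide("self_employed", True, 30_000)
```

The same income of 30,000 is accepted for an employed applicant and rejected for a self-employed one, and the trace records which conditions led to each outcome.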

The Rainbird Difference: Beyond Validation to True Reasoning

Whilst guardrails can help prevent errors that can be pre-determined, Rainbird’s approach fundamentally transforms how AI makes decisions. Our technology doesn’t just validate outputs – it represents rules as weighted knowledge graphs that unlock true causal reasoning. This means:

  1. Deterministic by Design: Rather than attempting to constrain probabilistic outputs, our systems are inherently deterministic. Every decision follows explicit, traceable logic paths.
  2. Complete Causal Chains: Instead of simply checking if outputs match predefined rules, Rainbird provides complete causal proof for every decision, showing exactly how each conclusion was reached.
  3. Knowledge-First Architecture: Our approach begins with structured knowledge representation, enabling organisations to represent regulations, policy and operating procedures as precise models that can reason, rather than attempting to retrofit precision onto probabilistic models.

The Power of Neurosymbolic AI

What truly sets Rainbird apart is a neurosymbolic approach, combining the best of symbolic reasoning with modern machine learning. This hybrid architecture:

  • Enables precise reasoning over complex knowledge domains
  • Provides complete auditability of decision processes
  • Maintains deterministic outputs whilst handling natural language inputs
  • Eliminates hallucinations through structured knowledge representation

Looking Forward: Noesis and the Future of Trusted AI

With our new Noesis platform, we’re taking this proven approach even further. Noesis will enable developers to:

  • Automatically convert unstructured documents into executable knowledge graphs
  • Deploy deterministic reasoning capabilities through simple API calls
  • Integrate trusted AI capabilities into existing ML pipelines
  • Generate complete audit trails for every decision

The Broader Implications

AWS’s move into automated reasoning validation demonstrates growing market recognition of what Rainbird has been delivering for years: the need for more reliable, explainable AI in enterprise settings. 

However, true trust in AI requires more than guardrails – it demands systems built on deterministic reasoning from the ground up.

As organisations increasingly rely on AI for critical decisions, the ability to provide not just guardrails but complete causal reasoning is becoming more critical. This is where Rainbird’s decade of experience in delivering deterministic AI solutions to major enterprises has proven invaluable.

Whilst we welcome AWS’s recognition of the importance of automated reasoning in AI systems, the future demands more than validation layers. It requires AI systems that are inherently precise, deterministic, and explainable. This is the foundation Rainbird has built upon, and with Noesis, we’re making these capabilities accessible to developers everywhere.

The race to trustworthy AI isn’t just about constraining what AI can do wrong – it’s about building systems that do things right from the start. 

That’s the Rainbird difference.

Want to learn more about how Rainbird can bring deterministic AI to your organisation? Register for early access to Noesis and see the future of trusted AI in action.

The Case for Deterministic Agents in Agentic AI


The re-emergence of agentic AI—intelligent agents capable of autonomously planning, making decisions, taking actions, and continuously adapting to their environment—marks a significant shift in AI. Yet, with this shift comes a crucial challenge: how do we ensure that these autonomous agents are truly trustworthy and reliably aligned with organisational goals?

At Rainbird, we’ve developed a distinct type of AI agent that complements probabilistic agents in these existing systems. Our agents are built for deterministic reasoning, delivering precise and explainable decisions that are guaranteed to be justifiable. As organisations deploy various forms of AI agents, Rainbird’s deterministic agents provide a critical capability: the ability to make decisions where accuracy and transparency cannot be compromised.

The Crisis of Trust in AI Systems

Most sectors are facing unprecedented change due to AI, and nowhere is this more prominent than in professional services where the risks are considered by many to be existential. For centuries, the professional services business model has rested on twin pillars: the billable hour and unwavering trust. While AI promises to dramatically reduce the cost and time needed for complex tasks—from legal document review to financial audits—its probabilistic and non-deterministic nature simultaneously threatens the foundation of trust these firms have spent generations building. As AI drives the cost of time-based work towards zero, firms must confront an uncomfortable reality: their future value will depend almost entirely on their ability to be trusted advisors.

Yet here lies a critical contradiction for professional services firms: while embracing probabilistic AI models is necessary to remain competitive, this approach risks introducing subtle errors that can silently propagate through their work. Such risks are not hypothetical: in one incident, fabricated legal authorities generated by an AI system such as ChatGPT were presented to a tax tribunal, highlighting the dangers of relying on unverified AI outputs. Similar incidents have been widely reported. When errors slip past both AI and human checks, the damage extends beyond financial losses; it undermines the client relationship and strikes at the heart of the profession’s centuries-old commitment to accuracy and reliability.

The Dangerous Fallacy of Human Oversight

The common defence that ‘humans will verify AI outputs’ overlooks a crucial paradox in human psychology. Research into automation bias reveals a troubling pattern: the more reliable an AI system appears to be, the less likely humans are to thoroughly check its outputs. This creates a dangerous feedback loop where increased accuracy actually amplifies risk. As AI systems become more sophisticated and generate consistently reliable results, human operators naturally develop a sense of trust and complacency.

This erosion of vigilance means that when errors do occur—particularly in edge cases or novel situations—they’re more likely to slip through unnoticed. The irony is stark: the very success that makes AI systems valuable in reducing human workload simultaneously increases the potential impact of any undetected errors. This psychological blind spot makes human oversight an unreliable safeguard, particularly in high-stakes domains where a single missed error could have significant consequences.

Beyond Language Models: The Critical Gap in AI Decision Making

Recent advances in large language model (LLM) tooling have enabled AI agents to plan, interact with users, orchestrate tasks, and respond to complex queries. Yet even as these agentic systems become more sophisticated, a reliance on probabilistic outputs remains a significant shortcoming. Without a layer of deterministic reasoning, agents struggle to validate their conclusions, explain their actions, or demonstrate a causal logic chain from inputs to decisions.

Rainbird provides this missing link. Instead of simply generating predicted outputs, our platform empowers AI agents to operate over a structured model of the world, captured in knowledge graphs. This transforms agentic AI from a ‘guess and hope’ approach to one in which every outcome can be traced back to its logical foundations.

Why Deterministic Reasoning Matters for Agentic AI

To understand why deterministic reasoning is indispensable, we must examine the nature of agentic AI itself. Intelligent agents do more than answer questions: they take actions that may have real-world consequences. Whether it is automating complex workflows, researching tax treaties, or issuing financial recommendations, the stakes are high. In such scenarios, “good enough” may not be enough. Certainly, the market needs decision-making processes that can be fully understood, vetted, and trusted. At the very least, an agentic approach should allow developers to use “good enough” agents where appropriate and precise and deterministic agents when it really matters. 

By building decisions from explicitly modelled knowledge, Rainbird ensures that agents do not merely predict outcomes but instead derive logically rigorous conclusions from authoritative world models that were deliberately designed and approved. Whereas probabilistic systems will produce results with no clear justification—in most cases influenced by training that is based on the public internet—Rainbird shows exactly why each conclusion follows from the knowledge at hand. For enterprises that must meet rigorous compliance or regulatory standards, this is not just beneficial, it’s essential.

Mastering Contextual Complexity

Real-world applications often involve intricate interactions between multiple variables. Take healthcare scenarios evaluating treatment eligibility or insurance underwriting tasks that must consider risk factors, eligibility criteria, and multi-jurisdictional compliance; these challenges require sophisticated navigation of interconnected rules and relationships.

Rainbird’s knowledge graph approach captures these complexities explicitly, enabling agents to:

  • Reason causally about interconnected conditions
  • Understand how changes in one variable affect others
  • Provide detailed justifications for decisions
  • Maintain consistency across complex rule sets
  • Process natural language inputs while maintaining deterministic outputs
  • Transform human-readable policies into machine-executable logic
  • Eliminate hallucinations through grounded reasoning
  • Bridge the gap between human understanding and machine inference

Introducing Noesis: Accelerating Deterministic AI Development

The challenge of building deterministic AI agents has traditionally been the time and expertise required to create comprehensive knowledge graphs. Our new Noesis platform transforms this process, automatically converting organisational documentation and expertise into executable knowledge graphs in minutes rather than weeks. 

Noesis represents a step-change in the sophistication of deterministic AI agents, and how quickly organisations can deploy them. It automatically processes policy documents, procedures, and regulatory texts, transforming them into precise knowledge graphs while maintaining the rigorous logical structure that deterministic reasoning demands. This automated conversion of knowledge—from written and unstructured form to computable deterministic model—preserves the critical relationships and rules within that documentation while eliminating the manual effort traditionally required for knowledge graph creation.

The same technology is being used to automatically design interviews with domain experts, to elicit and encode layers of tacit knowledge into graphs that already understand a base level of regulation and policy.

Key capabilities include:

  • Automated extraction of decision logic from existing documentation
  • Built-in validation to ensure knowledge graph consistency
  • Developer-friendly APIs and SDKs for seamless integration
  • Comprehensive audit trails and explanation facilities
  • Enterprise-grade security and compliance features

For developers, this means being able to rapidly create deterministic AI agents that can be trusted with critical decisions. 

Building the Future of Trusted AI

As AI agents become increasingly autonomous, the focus must shift from controlling unpredictable AI outputs to building inherently reliable AI agents. Rainbird’s decade-long experience in delivering deterministic AI reasoning to major enterprises demonstrates that trust and transparency aren’t optional extras—they’re fundamental requirements.

The future of AI lies not in probabilistic models constrained by guardrails, but in systems that think clearly and consistently from the outset. Through deterministic logic, comprehensive knowledge modelling, and explainable reasoning chains, Rainbird is making this future a reality.

The Key to Safe AI in Financial Services


As financial institutions increasingly adopt Large Language Models (LLMs) to enhance customer experiences and streamline operations, a critical challenge has emerged: how can these powerful but inherently probabilistic systems be deployed safely in a highly regulated environment?

Today, we’re pleased to announce the publication of our latest white paper, Deterministic Graph-Based Inference for Guardrailing Large Language Models: An Approach to Compliance and Control in Financial AI, which addresses this exact challenge.

The Problem with LLMs in Financial Services

While LLMs like Claude and GPT bring unprecedented language capabilities to financial services, they come with significant limitations that pose real risks:

  • Lack of determinism: The same query can yield different results at different times
  • Hallucinations: LLMs can confidently generate entirely false information
  • Limited explainability: The “black box” nature makes regulatory compliance difficult
  • Vulnerability to prompt injection: Specially crafted inputs can manipulate model behavior

In financial contexts where precision, consistency, and regulatory compliance are non-negotiable, these limitations create substantial barriers to adoption.

The Solution: A Hybrid Approach

This white paper explores how deterministic graph-based inference systems can be integrated with LLMs to create AI solutions that are both powerful and predictable. This hybrid approach combines:

  • The linguistic fluency and generative capabilities of LLMs
  • The precision, consistency, and explainability of rule-based systems encoded in knowledge graphs

We detail two architectural patterns for implementation:

  1. Graph-First Reasoning: Where the deterministic inference engine serves as the primary decision-maker while the LLM acts as an interface layer
  2. Post-Generation Validation: Where the LLM generates responses that are subsequently verified and potentially corrected by the symbolic inference engine
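The two patterns can be outlined in Python as follows. This is a minimal sketch, not the white paper's implementation: the rule table `MIN_AGE`, the function names, and the stub functions standing in for an LLM are all hypothetical, and no real LLM is called.

```python
from typing import Callable

# Hypothetical rule base: product -> minimum age.
MIN_AGE = {"isa": 18, "junior_isa": 0}

def infer_eligible(product: str, age: int) -> bool:
    """Deterministic engine: the same inputs always give the same verdict."""
    return product in MIN_AGE and age >= MIN_AGE[product]

def graph_first(product: str, age: int, phrase: Callable[[bool], str]) -> str:
    """Pattern 1: the engine decides; the LLM (phrase) only words the result."""
    return phrase(infer_eligible(product, age))

def post_generation(product: str, age: int,
                    generate: Callable[[], bool],
                    phrase: Callable[[bool], str]) -> str:
    """Pattern 2: the LLM proposes a verdict; the engine verifies and corrects."""
    proposed = generate()
    verified = infer_eligible(product, age)
    if proposed != verified:
        proposed = verified  # correction step: the engine's verdict wins
    return phrase(proposed)

def stub_phrase(eligible: bool) -> str:
    """Stands in for an LLM turning a verdict into natural language."""
    return "You are eligible." if eligible else "You are not eligible."

def stub_generate_wrong() -> bool:
    """Stands in for an LLM that wrongly claims eligibility."""
    return True

answer_1 = graph_first("isa", 16, stub_phrase)
answer_2 = post_generation("isa", 16, stub_generate_wrong, stub_phrase)
```

Both calls end with the engine's verdict; the difference is whether the LLM was ever asked for a verdict in the first place.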

The Benefits for Financial Institutions

Financial institutions implementing this hybrid approach can expect:

  • Complete transparency and auditability of AI decisions
  • Elimination of hallucinations and non-compliant information
  • Regulatory compliance by design rather than by hope
  • Consistent and reliable responses that build customer trust

Implementation with Rainbird

The paper concludes with a detailed implementation framework leveraging Rainbird’s enterprise-grade knowledge graph reasoning platform. Our approach enables financial institutions to transform complex regulatory frameworks into executable, deterministic systems that can effectively guardrail LLM implementations at scale.

Major banks and financial services firms are already deploying Rainbird to address the critical compliance challenges outlined in this paper, encoding regulatory expertise into verifiable knowledge graphs that ensure AI-generated content remains fully compliant with intricate financial regulations.

Download the White Paper

Ready to explore how your institution can safely harness the power of LLMs while maintaining regulatory compliance? Download our white paper to learn how deterministic graph-based inference can transform your AI strategy.

Courtesy of NotebookLM, there is also a podcast based on this white paper here.

For more information on implementing these solutions in your organisation, contact our team for a consultation.

The Middle East Has Entered the AI Group Chat


Donald Trump’s jaunt to the Middle East featured an entourage of billionaire tech bros, a fighter-jet escort, and business deals designed to reshape the global landscape of artificial intelligence.

On the final stop of the tour in Abu Dhabi, the US president announced that unnamed US companies would partner with the United Arab Emirates to create the largest AI datacenter cluster outside of America.

Trump said that the US companies will help G42, an Emirati company, build five gigawatts of AI computing capacity in the UAE.

Sheikh Tahnoon bin Zayed Al Nahyan, who leads the UAE’s Artificial Intelligence and Advanced Technology Council and is in charge of a $1.5 trillion fortune aimed at building AI capabilities, said the move will strengthen the UAE’s position “as a hub for cutting-edge research and sustainable development, delivering transformative benefits for humanity.”

A few days earlier, as Trump arrived in Riyadh, Saudi Arabia announced Humain, an AI investment firm owned by the kingdom’s Public Investment Fund. The Saudi firm launched with blockbuster deals already inked with Nvidia, AMD, Qualcomm, and AWS—US tech giants capable of building the infrastructure needed to train and power cutting-edge AI models.

Trump said in a speech in Riyadh that US and Saudi companies would do deals worth hundreds of billions of dollars, with a focus on infrastructure, tech, and defense.

The deals forged in the Middle East this week are meant to strengthen the global importance of American silicon and AI, but they will also help nations like Saudi Arabia play a more significant role in the global race to develop and distribute cutting-edge technology.

“It will help the Saudis and the UAE become bigger players in providing AI infrastructure,” says Paul Triolo, a partner at DGA-Albright Stonebridge Group, a geopolitical consulting group. “It’s a big deal to get access to these GPUs.”

Saudi Arabia’s deal with Nvidia, which dominates the market for AI training hardware, will amount to 500 megawatts of capacity and involve “several hundred thousand of Nvidia’s most advanced GPUs over the next five years,” the company said in a statement.

According to one estimate, this could translate to around 250,000 of Nvidia’s most advanced chips, which are four times better at training and 30 times better at inference (running models that have already been trained) than the next-best offering. This capacity could lead Saudi Arabia to create frontier AI models.

AWS and Humain said they would jointly invest $5 billion in infrastructure in Saudi Arabia. AWS said in March that it will build an AI infrastructure zone in the country, investing more than $5.3 billion. Humain and AMD said they would spend $10 billion on AI infrastructure in Saudi Arabia and the US over the next five years.

Saudi Arabia, the UAE, and other nations in the region have vast quantities of oil money, access to plenty of power, and a strong desire to shift toward more high-tech economies by building out cutting-edge tech infrastructure. The countries also, however, have significant business ties to China, which sells technology to the region, placing them at the nexus of a growing geopolitical rivalry over the future of AI.

Diffusion Rule

A few days before Trump’s visit to the Middle East, his administration reversed a major Biden-era ruling that would have limited the sale of cutting-edge chips globally. The directive created tiers of nations with different access to cutting-edge chips, and sought to limit how many chips Saudi Arabia and the UAE could buy. Critics of the rule suggested it might push some countries to buy Chinese technology instead.

In a statement announcing the change, the US Bureau of Industry and Security said the Biden rule “would have stifled American innovation and saddled companies with burdensome new regulatory requirements” and “undermined U.S. diplomatic relations with dozens of countries by downgrading them to second-tier status.”

Why Human Oversight Fails AI


The Illusion of the Human Safety Net

As AI systems rapidly evolve from passive tools to autonomous agents, a dangerous assumption persists throughout the industry: that human oversight provides an adequate safety net for AI errors. This belief, that a person monitoring an AI’s decisions will reliably catch and correct mistakes, has become the default guardrail in many AI governance frameworks. Yet this approach fundamentally misunderstands both human psychology and the nature of modern AI systems.

The uncomfortable truth is that humans make exceptionally poor guardians for agentic, probabilistic AI. Our human cognitive architecture, evolved for a different world entirely, is ill-equipped to monitor complex AI decision-making. This mismatch creates a perfect storm where AI errors consistently slip through human oversight, sometimes with catastrophic consequences.

Why Human Oversight Fails

The limitations of human supervision extend far beyond mere inattention. Multiple factors conspire to make us unreliable guardians.

Automation bias renders objective oversight impossible. Humans exhibit an inherent tendency to trust computer-generated information over our own judgment. This isn’t simply laziness; it’s a deeply ingrained cognitive bias. When presented with AI recommendations or actions, humans consistently demonstrate an alarming propensity to defer to the machine, especially when it presents information with confidence and authority.

The tragic 2018 Uber self-driving car fatality in Arizona starkly illustrates this reality. The safety driver, meant to intervene if the AI faltered, had become complacent and distracted. This wasn’t an anomaly; it’s an inevitable result of how our brains respond to automation over time.

The opacity of modern AI creates an unbridgeable comprehension gap. Large language models and neural networks operate as “black boxes” that produce outputs through processes largely inaccessible to human understanding. How can a supervisor effectively evaluate a decision they fundamentally cannot understand? When an AI generates content that sounds plausible but contains subtle errors or fabrications, even expert reviewers may miss the problem entirely, as witnessed in embarrassing legal cases where lawyers have submitted entirely fictitious AI-generated case citations to courts.

The speed and volume of AI decisions overwhelm human capacity. As AI becomes more deeply integrated into business processes, the number of decisions requiring review exponentially increases. In domains like algorithmic trading, financial systems make thousands of micro-decisions per second, far beyond what any human could meaningfully monitor. By the time humans recognise a problem, significant damage may already be done.

From Linear Rules to Sophisticated Guardrails

If human oversight is inadequate, what’s the alternative? The answer lies not in simple linear rules, but in sophisticated deterministic guardrails: engineered constraints that reliably prevent AI systems from taking undesirable actions through a network of interconnected logical relationships.

Unlike the linear rule systems of the past that quickly became unmanageable and brittle, modern deterministic guardrails utilise graph-based knowledge structures that can represent complex regulatory frameworks and other knowledge-based processes with nuance and flexibility. These sophisticated structures encode complex causal relationships as formal, traceable networks of probabilities, weights and rules.

The power of graph-based deterministic inference is that it can handle the complexity and interconnectedness of real-world regulatory systems without sacrificing reliability. Unlike probabilistic AI models that produce varied, sometimes unpredictable outputs, deterministic graph systems follow explicit logical pathways with guaranteed outcomes that are entirely repeatable.

This approach creates a comprehensive safety system capable of understanding, for instance, that a financial product recommendation must simultaneously satisfy multiple interrelated regulatory requirements, including suitability for the client’s risk profile, all verifiable through traceable logical pathways.

This sophisticated graph-based approach can be deployed in two distinct architectural patterns: either as a validation layer to verify and correct LLM outputs, or as the primary reasoning engine with the LLM serving only as a natural language interface layer.

Pure Determinism: The Ultimate Safety Architecture

While validation of LLM outputs offers significant safety improvements, the most powerful configuration for high-stakes domains removes LLMs from the reasoning process entirely. In this pure deterministic architecture, graph-based inference systems handle all critical decisions independently, while LLMs serve solely as the interface layer, managing natural language understanding and communication. 

Most organisations operating in regulated environments would gladly sacrifice the general-purpose nature of LLMs (which, while impressive, is precisely what makes them prone to hallucination) for solutions that are narrower, domain-specific, 100% grounded in verified context, and utterly reliable. After all, why would a credit decisioning engine need to know about sports? Or a financial sanctions compliance system need to generate poetry? The flexibility to answer any question becomes far less valuable than the certainty of answering specific questions correctly every time—particularly when errors could trigger regulatory violations, financial losses, or reputational damage.

This approach completely removes the probabilistic element from the decision-making process itself. The LLM never makes substantive determinations; it simply translates between human language and the deterministic system. All core reasoning (eligibility determinations, compliance verdicts, risk assessments) occurs within the deterministic graph engine that traverses a knowledge network with logical precision.

This stands in contrast to the validation approach, where an LLM generates initial answers that are subsequently verified against the knowledge graph. In a pure deterministic configuration, the decision-making authority never resides with the probabilistic system. Instead, the inference engine and the graph become the authoritative reasoning component rather than just a guardrail.

The advantages of this “pure determinism” approach are profound:

  • Total elimination of hallucinations for critical decisions
  • Perfect repeatability across identical scenarios
  • Complete traceability of every decision to specific rules
  • True causal reasoning that follows explicit logical pathways
  • Independence from training data biases that affect LLMs

Consider a high-stakes financial services scenario: determining whether a transaction requires additional anti-money laundering scrutiny. With a pure deterministic approach, the LLM may help extract relevant transaction details from unstructured sources, but the actual determination comes exclusively from the graph-based inference engine traversing a precisely encoded network of regulatory requirements or other proprietary knowledge. This creates a system that is both conversational and absolutely reliable in its core reasoning functionality.

Accelerating Knowledge Graph Development

While graph building was historically a significant bottleneck requiring months of manual knowledge engineering, recent breakthroughs have transformed this process. 

Specialised LLMs—fine-tuned on all classes of human reasoning, knowledge engineering patterns and a cross section of domain problems—have unlocked the ability to programmatically generate sophisticated knowledge graphs at unprecedented speed. They can extract structured knowledge from regulatory documents, policies, and even domain expertise, and build accurate and computable knowledge graphs—and maintain them. This eliminates what historically was months of manual work, compressing it into days or even hours. 

This capability fundamentally changes the economic equation for implementing a sophisticated knowledge management layer in the enterprise.

Creating Safe Agentic Systems

Looking ahead, the most sophisticated AI applications will likely involve autonomous agents—AI systems that can independently perform complex tasks without continuous human direction. This evolution from passive tools to active agents magnifies all the risks already discussed and introduces new ones around the delegation of authority in multi-step decision processes.

The development of safe agentic systems demands more than ad hoc guardrails or human monitoring; it requires a comprehensive architecture where deterministic graph-based inference serves as the logical foundation for all critical decisions. Such systems can reliably constrain agent behavior within carefully defined operational boundaries while still allowing for the flexibility and generative capabilities that make AI valuable.

Unlike post-hoc human oversight, which attempts to catch problems after they occur, deterministic guardrails prevent problems by design. The system simply cannot act outside its defined parameters, just as a well-designed electrical system has circuit breakers that automatically prevent dangerous overloads without requiring human intervention.

For organisations seeking to deploy agentic systems, this approach offers a pathway to production without rework, significantly lowering risk. Agents can operate while sophisticated deterministic guardrails act as the compliance officer within, ensuring that outcomes adhere to regulatory, ethical, and safety boundaries. This unlocks a future where AI systems can act independently while maintaining the precision and reliability that high-stakes domains demand.

The Implementation Question

For organisations looking to adopt this approach, there are several key considerations. The graph-based guardrails must be designed with sufficient sophistication to capture the nuance and complexity of regulatory frameworks without becoming unmanageable. This requires specialised tooling. 

The integration between deterministic systems and LLMs must be carefully architected to ensure clear separation of responsibilities. In pure deterministic configurations, the LLM should have no authority to override or modify the determinations of the graph-based inference engine; it should simply be constrained to translating logical outputs into natural language.

Testing must be rigorous and scenario-based, focusing particularly on edge cases. Unlike probabilistic systems that can only be evaluated statistically, deterministic systems can be verified through automated testing of logical pathways.
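As a rough illustration of what verifying logical pathways can mean in practice, the sketch below enumerates every input combination for a small rule and checks each against an explicitly stated expectation. The rule `may_proceed`, its three inputs and the `ALLOWED` table are hypothetical and chosen only to keep the example short; a real knowledge graph would have far more pathways, but the same principle of checking exact outcomes rather than sampling statistics applies.

```python
from itertools import product


def may_proceed(in_policy: bool, within_limit: bool, human_approved: bool) -> bool:
    """Hypothetical rule: in-policy actions proceed if within limit or approved."""
    return in_policy and (within_limit or human_approved)


# The expected outcome, written out independently of the rule's implementation.
# Every combination not listed here must be denied.
ALLOWED = {
    (True, True, False),
    (True, True, True),
    (True, False, True),
}


def verify_all_pathways() -> int:
    """Check every input combination and return how many were verified."""
    checked = 0
    for combo in product([False, True], repeat=3):
        result = may_proceed(*combo)
        # Determinism: the same inputs must give the same answer every time.
        assert result == may_proceed(*combo)
        # Correctness: the answer must match the stated expectation exactly.
        assert result == (combo in ALLOWED)
        checked += 1
    return checked


print(verify_all_pathways())  # 8
```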

The Rainbird Approach

Rainbird has pioneered the application of deterministic graph-based inference as sophisticated guardrails for AI systems. The Rainbird platform is an ecosystem that enables organisations to transform complex regulatory frameworks and domain expertise into executable, deterministic knowledge graphs that can govern AI behavior with precision and reliability.

Rather than relying on brittle linear rules or unreliable human oversight, Rainbird’s approach uses programmatically-generated, sophisticated knowledge graphs to represent complex interrelationships between concepts, rules, and data. This creates guardrails that are simultaneously robust and flexible—capable of addressing complex regulatory requirements while adapting to evolving business needs.

For organisations deploying agentic AI, Rainbird’s newest capability, Noesis, provides a new approach to knowledge engineering. Noesis is a developer-first tool that automates the extraction and structuring of knowledge from regulatory documents and policies, transforming dense text into verifiable knowledge graphs with minimal human intervention. The result is sophisticated deterministic guardrails that scale with the complexity of the regulatory environment.

By encoding regulatory expertise into verifiable knowledge graphs, organisations can ensure that AI-generated content and decisions remain fully compliant with intricate regulations while providing the complete traceability and explainability demanded by regulators and stakeholders.

The future of AI governance isn’t about choosing between innovation and safety—it’s about taking a hybrid, neurosymbolic approach that enables both. By implementing deterministic graph-based inference as the logical foundation for agentic AI, organisations can build systems that operate in high-stakes environments, without sacrificing reliability, compliance, or trust.

For more information on implementing these solutions in your organisation, contact our team for a consultation.

Revelo’s LatAm talent network sees strong demand from US companies, thanks to AI


While many tech companies are mandating that their employees return to their offices, and putting an emphasis on building in-person teams, they are also turning in droves to Latin America to find developer talent — especially for post-training AI models.

Revelo, a full-stack platform of vetted developers in Latin America, is seeing a new surge in demand for engineers who can help with LLM training, Revelo co-founder and CEO Lucas Mendes told TechCrunch. Revelo has more than 400,000 developers on its platform and facilitates the hiring and payment process for its U.S. customers.

Mendes said this recent surge of demand for Revelo’s talent is driven by the next phase of the AI revolution: post-training LLMs.

“There’s a race for data, and especially expert human data, that can actually help LLMs be better at very specific high-value tasks,” Mendes said. “Coding is one of those tasks. And what happened last year is that we saw a surge in demand from [companies] building foundational models that are looking for engineers that can be effective experts and that can provide that human data to help their LLM code better.”

LLM training hires accounted for 22% of Revelo’s revenue in 2024.

Mendes added that often this demand looks like companies coming to them to find experts in specific coding languages to help fill gaps in the post-training they are already doing.

Revelo is supplying workers to U.S. enterprises Intuit, Oracle, and Dell, among others, including “nearly every major hyperscale AI provider.”

Revelo is not the only company looking to connect U.S. companies to programmers in Latin America; other companies like Terminal, Tecla and Near are just a few with the same goal.

This demand for developers skilled in post-training is just the latest hiring trend that Revelo has been able to ride since it was founded in late 2014.

Mendes said he launched Revelo alongside co-founder Lachlan de Crespigny because the war for talent was tight at the time, and they thought if they created a network of vetted talent in Brazil, companies would be able to find the talent they needed.

The demand was there and Revelo went on to raise more than $48 million in venture funding from firms including Social Capital, FJ Labs and Valor Capital Group. The company also expanded out of Brazil and into broader LatAm.

The Covid-19 pandemic expanded Revelo’s potential reach “massively,” Mendes added. “All of a sudden we started getting inbound from U.S. companies who suddenly realized that you can actually have really high-quality distributed teams and have some of those engineers are in Latin America,” Mendes said. “So what would happen usually is that they would hire one or two and really like the quality and especially the quality cost tradeoff and say, ‘Hey, I want more of these, where do I find them?’”

While the rise of distributed and remote work has largely started to fade as companies return to in-person work, Revelo has still managed to keep growing. Mendes joked that he hates to be the guy that goes against the buzz, but the demand for their LatAm talent has not diminished despite tech’s movement back to the office.

Mendes said he thinks that the demand from U.S. companies for these developers in Latin America has remained because these developers fall more into the “nearshoring” category of workers outside the U.S. as opposed to “offshoring.” He believes the fact that Revelo’s talent is located in the same time zones as their client companies makes these hires a lot more attractive.

Revelo is seeing enough demand that it has acquired five other competitors focused on LatAm talent in the last 30 months including Alto and Paretisa, which were announced in March.

“We’re building that global talent backbone for the age of AI and there will be more acquisitions in the future,” he said.

The TechCrunch AI glossary


Artificial intelligence is a deep and convoluted world. The scientists who work in this field often rely on jargon and lingo to explain what they’re working on. As a result, we frequently have to use those technical terms in our coverage of the artificial intelligence industry. That’s why we thought it would be helpful to put together a glossary with definitions of some of the most important words and phrases that we use in our articles.

We will regularly update this glossary to add new entries as researchers continually uncover novel methods to push the frontier of artificial intelligence while identifying emerging safety risks.


An AI agent refers to a tool that makes use of AI technologies to perform a series of tasks on your behalf — beyond what a more basic AI chatbot could do — such as filing expenses, booking tickets or a table at a restaurant, or even writing and maintaining code. However, as we’ve explained before, there are lots of moving pieces in this emergent space, so different people can mean different things when they refer to an AI agent. Infrastructure is also still being built out to deliver on envisaged capabilities. But the basic concept implies an autonomous system that may draw on multiple AI systems to carry out multi-step tasks.

Given a simple question, a human brain can answer without even thinking too much about it — things like “which animal is taller between a giraffe and a cat?” But in many cases, you often need a pen and paper to come up with the right answer because there are intermediary steps. For instance, if a farmer has chickens and cows, and together they have 40 heads and 120 legs, you might need to write down a simple equation to come up with the answer (20 chickens and 20 cows).
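The farmer puzzle can be written out as the intermediate steps a person would take on paper. Here is a minimal sketch in Python; the function name `solve_heads_and_legs` is our own, and it assumes chickens have two legs and cows have four.

```python
from typing import Tuple


def solve_heads_and_legs(heads: int, legs: int) -> Tuple[int, int]:
    """Return (chickens, cows) given the total number of heads and legs.

    The two equations are:
        chickens + cows = heads
        2 * chickens + 4 * cows = legs
    """
    # Step 1: subtract twice the first equation from the second,
    # which leaves 2 * cows = legs - 2 * heads.
    doubled_cows = legs - 2 * heads
    if doubled_cows < 0 or doubled_cows % 2 != 0:
        raise ValueError("no whole-number solution")
    # Step 2: halve to get the number of cows.
    cows = doubled_cows // 2
    # Step 3: the remaining heads belong to chickens.
    chickens = heads - cows
    if chickens < 0:
        raise ValueError("no whole-number solution")
    return chickens, cows


print(solve_heads_and_legs(40, 120))  # (20, 20)
```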

In an AI context, chain-of-thought reasoning for large language models means breaking down a problem into smaller, intermediate steps to improve the quality of the end result. It usually takes longer to get an answer, but the answer is more likely to be right, especially in a logic or coding context. So-called reasoning models are developed from traditional large language models and optimized for chain-of-thought thinking thanks to reinforcement learning.

(See: Large language model)

Deep learning is a subset of self-improving machine learning in which AI algorithms are designed with a multi-layered, artificial neural network (ANN) structure. This allows them to make more complex correlations compared to simpler machine learning-based systems, such as linear models or decision trees. The structure of deep learning algorithms draws inspiration from the interconnected pathways of neurons in the human brain.

Deep learning AIs are able to identify important characteristics in data themselves, rather than requiring human engineers to define these features. The structure also supports algorithms that can learn from errors and, through a process of repetition and adjustment, improve their own outputs. However, deep learning systems require a lot of data points to yield good results (millions or more). It also typically takes longer to train deep learning vs. simpler machine learning algorithms — so development costs tend to be higher.

(See: Neural network)

Fine-tuning means further training of an AI model to optimize performance for a more specific task or area than was previously a focal point of its training, typically by feeding in new, specialized (i.e. task-oriented) data.

Many AI startups are taking large language models as a starting point to build a commercial product but vying to amp up utility for a target sector or task by supplementing earlier training cycles with fine-tuning based on their own domain-specific knowledge and expertise.

(See: Large language model (LLM))

Large language models, or LLMs, are the AI models used by popular AI assistants, such as ChatGPT, Claude, Google’s Gemini, Meta’s AI Llama, Microsoft Copilot, or Mistral’s Le Chat. When you chat with an AI assistant, you interact with a large language model that processes your request directly or with the help of different available tools, such as web browsing or code interpreters.

AI assistants and LLMs can have different names. For instance, GPT is OpenAI’s large language model and ChatGPT is the AI assistant product.

LLMs are deep neural networks made of billions of numerical parameters (or weights, see below) that learn the relationships between words and phrases and create a representation of language, a sort of multidimensional map of words.

Those are created from encoding the patterns they find in billions of books, articles, and transcripts. When you prompt an LLM, the model generates the most likely pattern that fits the prompt. It then evaluates the most probable next word after the last one based on what was said before. Repeat, repeat, and repeat.
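To make the "repeat, repeat, and repeat" loop concrete, here is a toy sketch in Python. It is a drastic simplification: the corpus is made up, and the "model" only counts which word follows which, whereas a real LLM uses billions of learned parameters and looks at the whole preceding context. It also always picks the single most frequent next word, which real assistants do not necessarily do.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Tiny made-up corpus standing in for the books and articles an LLM learns from.
CORPUS = "the cat sat on the mat . the cat ate the fish ."


def train_bigrams(text: str) -> Dict[str, Counter]:
    """Count how often each word follows each other word."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def generate(counts: Dict[str, Counter], start: str, max_words: int) -> List[str]:
    """Repeatedly append the most likely next word after the last one."""
    words = [start]
    for _ in range(max_words - 1):
        followers = counts.get(words[-1])
        if not followers:
            break  # the last word was never seen with a follower
        # Pick the most frequent follower; ties are broken alphabetically.
        best = min(followers, key=lambda w: (-followers[w], w))
        words.append(best)
    return words


model = train_bigrams(CORPUS)
print(" ".join(generate(model, "the", 5)))  # the cat ate the cat
```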

(See: Neural network)

Neural network refers to the multi-layered algorithmic structure that underpins deep learning — and, more broadly, the whole boom in generative AI tools following the emergence of large language models. 

Although the idea of taking inspiration from the densely interconnected pathways of the human brain as a design structure for data processing algorithms dates all the way back to the 1940s, it was the much more recent rise of graphics processing units (GPUs), via the video game industry, that really unlocked the power of this theory. These chips proved well suited to training algorithms with many more layers than was possible in earlier epochs, enabling neural network-based AI systems to achieve far better performance across many domains, including voice recognition, autonomous navigation, and drug discovery.

(See: Large language model (LLM))

Weights are core to AI training as they determine how much importance (or weight) is given to different features (or input variables) in the data used for training the system — thereby shaping the AI model’s output. 

Put another way, weights are numerical parameters that define what’s most salient in a data set for the given training task. They achieve their function by applying multiplication to inputs. Model training typically begins with weights that are randomly assigned, but as the process unfolds, the weights adjust as the model seeks to arrive at an output that more closely matches the target.

For example, an AI model for predicting house prices that’s trained on historical real estate data for a target location could include weights for features such as the number of bedrooms and bathrooms, whether a property is detached or semi-detached, whether it has parking or a garage, and so on.

Ultimately, the weights the model attaches to each of these inputs reflect how much they influence the value of a property, based on the given data set.
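The house price example can be sketched as a tiny model in Python. Everything here is made up for illustration: the three features, the four data points and the prices are hypothetical, and the training loop is plain gradient descent on a weighted sum, which is far simpler than how large AI models are trained. It does show the two ideas from the definition: weights act by multiplying inputs, and they start out random and are adjusted to bring the output closer to the target.

```python
import random
from typing import List

# Hypothetical data: [bedrooms, bathrooms, has_parking] -> price in thousands.
FEATURES = [[2, 1, 0], [3, 1, 1], [3, 2, 1], [4, 2, 1]]
PRICES = [130.0, 200.0, 230.0, 280.0]


def predict(weights: List[float], features: List[float]) -> float:
    """Multiply each input by its weight, then add the results together."""
    return sum(w * x for w, x in zip(weights, features))


def mean_squared_error(weights: List[float]) -> float:
    """Average squared gap between predicted and actual prices."""
    errors = [predict(weights, f) - p for f, p in zip(FEATURES, PRICES)]
    return sum(e * e for e in errors) / len(errors)


def train(steps: int = 2000, learning_rate: float = 0.01, seed: int = 0) -> List[float]:
    """Start from random weights and nudge them to reduce the error."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    for _ in range(steps):
        gradients = [0.0, 0.0, 0.0]
        for f, p in zip(FEATURES, PRICES):
            error = predict(weights, f) - p
            for i, x in enumerate(f):
                gradients[i] += 2 * error * x / len(FEATURES)
        weights = [w - learning_rate * g for w, g in zip(weights, gradients)]
    return weights


print("error before training:", mean_squared_error(train(steps=0)))
print("error after training: ", mean_squared_error(train()))
```

Calling `train(steps=0)` returns the untouched random starting weights, so comparing the two printed errors shows how much the adjustments improved the fit.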

OpenAI co-founder John Schulman has left Anthropic after less than a year


Less than a year into his tenure at the company, OpenAI co-founder John Schulman is leaving Anthropic. The startup confirmed Schulman’s departure after The Information, Reuters and other publications reported on the exit.

“We are sad to see John go but fully support his decision to pursue new opportunities and wish him all the very best,” said Jared Kaplan, Anthropic’s chief science officer, in a statement the company shared with Engadget. Schulman left OpenAI last August alongside Peter Deng, the company’s former vice-president of consumer product. Schulman is considered one of the original architects of ChatGPT.

Following his departure from OpenAI, Schulman said he was joining Anthropic to focus on AI alignment (the process of making machine learning models safe to use) and out of a desire to return “to more hands-on technical work.” Schulman hasn’t publicly said why he decided to leave Anthropic, nor what he plans to do next. His X profile still says he “recently joined” Anthropic.

DeepSeek vs. ChatGPT: Hands On With DeepSeek’s R1 Chatbot


The DeepSeek AI chatbot, released by a Chinese startup, has temporarily dethroned OpenAI’s ChatGPT from the top spot on Apple’s US App Store.

The app is completely free to use, and DeepSeek’s R1 model is powerful enough to be comparable to OpenAI’s o1 “reasoning” model, except DeepSeek’s chatbot is not sequestered behind a $20-a-month paywall like OpenAI’s is. Also, the DeepSeek model was efficiently trained using less powerful AI chips, making it a benchmark of innovative engineering.

I’ve tested many new generative AI tools over the past couple of years, so I was curious to see how DeepSeek compares to the ChatGPT app already on my smartphone. After a few hours of using it, my initial impressions are that DeepSeek’s R1 model will be a major disruptor for US-based AI companies, but it still suffers from the weaknesses common to other generative AI tools, like rampant hallucinations, invasive moderation, and questionably scraped material.

How to Access the DeepSeek Chatbot

Users interested in trying out DeepSeek can access the R1 model through the Chinese startup’s smartphone apps (Android, Apple), as well as on the company’s desktop website. You can also use the model through third-party services like Perplexity Pro. In the app or on the website, click on the DeepThink (R1) button to use the best model. Developers who want to experiment with the API can check out that platform online. It’s also possible to download a DeepSeek model to run locally on your computer.

In order to use all the consumer features, you will need to create a user account that tracks your chats. “We store the information we collect in secure servers located in the People’s Republic of China,” reads the company’s privacy policy. Check out this article from WIRED’s Security desk for a more detailed breakdown about what DeepSeek does with the data it collects. It’s worth keeping in mind that, just as with ChatGPT and other American chatbots, you should always avoid sharing highly personal details or sensitive information during your interactions with a generative AI tool.

Is This Basically FreeGPT?

Yes and no! If you’re looking for a free chatbot to use, ChatGPT already includes plenty of free features. So do Anthropic’s Claude, Google’s Gemini, and Meta’s AI tool. So, why is the fact that DeepSeek is free notable? It’s about the raw power of the model that’s generating these free-for-now answers. As previously mentioned, DeepSeek’s R1 mimics OpenAI’s latest o1 model, without the $20-a-month subscription fee for the basic version and $200-a-month for the most capable model. This comes as a major blow to OpenAI’s attempt to monetize ChatGPT through subscriptions.

Another feature that’s similar to ChatGPT is the option to send the chatbot out into the web to gather links that inform its answers. DeepSeek does not have deals with publishers to use their content in answers; OpenAI does, including with WIRED’s parent company, Condé Nast. But the web search outputs were decent, and the links gathered by the bot were generally helpful.

Still, the current DeepSeek app does not have all the tools longtime ChatGPT users may be accustomed to, like the memory feature that recalls details from past conversations so you’re not always repeating yourself. DeepSeek also doesn’t have anything close to ChatGPT’s Advanced Voice Mode, which lets you have voice conversations with the chatbot, though the startup is working on more multimodal capabilities.

A Research Breakthrough, but Still Inaccurate

Though it may almost seem unfair to knock the DeepSeek chatbot for issues common across AI startups, it’s worth dwelling on how a breakthrough in model training efficiency does not even come close to solving the roadblock of hallucinations, where a chatbot just makes things up in its responses to prompts. Many of the outputs I generated included blatant falsehoods, confidently spewed out. For example, when I asked R1 what the model already knew about me without searching the web, the bot was convinced I’m a longtime tech reporter at The Verge. No shade, but not true!

Reece Rogers

A New Jam-Packed Biden Executive Order Tackles Cybersecurity, AI, and More


Four days before he leaves office, US president Joe Biden has issued a sweeping cybersecurity directive ordering improvements to the way the government monitors its networks, buys software, uses artificial intelligence, and punishes foreign hackers.

The 40-page executive order unveiled on Thursday is the Biden White House’s final attempt to kickstart efforts to harness the security benefits of AI, roll out digital identities for US citizens, and close gaps that have helped China, Russia, and other adversaries repeatedly penetrate US government systems.

The order “is designed to strengthen America’s digital foundations and also put the new administration and the country on a path to continued success,” Anne Neuberger, Biden’s deputy national security adviser for cyber and emerging technology, told reporters on Wednesday.

Looming over Biden’s directive is the question of whether president-elect Donald Trump will continue any of these initiatives after he takes the oath of office on Monday. None of the highly technical projects decreed in the order are partisan, but Trump’s advisers may prefer different approaches (or timetables) to solving the problems that the order identifies.

Trump hasn’t named any of his top cyber officials, and Neuberger said the White House didn’t discuss the order with his transition staff, “but we are very happy to, as soon as the incoming cyber team is named, have any discussions during this final transition period.”

The core of the executive order is an array of mandates for protecting government networks based on lessons learned from recent major incidents—namely, the security failures of federal contractors.

The order requires software vendors to submit proof that they follow secure development practices, building on a mandate that debuted in 2022 in response to Biden’s first cyber executive order. The Cybersecurity and Infrastructure Security Agency would be tasked with double-checking these security attestations and working with vendors to fix any problems. To put some teeth behind the requirement, the White House’s Office of the National Cyber Director is “encouraged to refer attestations that fail validation to the Attorney General” for potential investigation and prosecution.

The order gives the Department of Commerce eight months to assess the most commonly used cyber practices in the business community and issue guidance based on them. Shortly thereafter, those practices would become mandatory for companies seeking to do business with the government. The directive also kicks off updates to the National Institute of Standards and Technology’s secure software development guidance.

Another part of the directive focuses on the protection of cloud platforms’ authentication keys, the compromise of which opened the door for China’s theft of government emails from Microsoft’s servers and its recent supply-chain hack of the Treasury Department. Commerce and the General Services Administration have 270 days to develop guidelines for key protection, which would then have to become requirements for cloud vendors within 60 days.

To protect federal agencies from attacks that rely on flaws in internet-of-things gadgets, the order sets a January 4, 2027, deadline for agencies to purchase only consumer IoT devices that carry the newly launched US Cyber Trust Mark label.