Building Robust ML Pipelines for Real-World AI


Machine learning projects often start with a proof‑of‑concept, a single model deployed by a data scientist on her laptop. Scaling that model into a robust, repeatable production pipeline requires more than just code; it requires a discipline known as MLOps, where software engineering meets data science and DevOps. 

Overview: Why MLOps Best Practices Matter

Before diving into individual practices, it helps to understand the value of MLOps. According to the MLOps Principles working group, treating machine‑learning code, data and models like software assets within a continuous integration and deployment environment is central to MLOps. It’s not just about deploying a model once; it’s about building pipelines that can be repeated, audited, improved and trusted. This ensures reliability, compliance and faster time‑to‑market.

Poorly managed ML workflows can result in brittle models, data leaks or non‑compliant systems. A MissionCloud report notes that implementing automated CI/CD pipelines significantly reduces manual errors and accelerates delivery . With regulatory frameworks like the EU AI Act on the horizon and ethical considerations top of mind, adhering to best practices is now critical for organisations of all sizes.

Below, we cover a comprehensive set of best practices, along with expert insights and recommendations on how to integrate Clarifai products for model orchestration and inference. At the end, you’ll find FAQs addressing common concerns.

Establishing an MLOps Foundation

Building robust ML pipelines starts with the right infrastructure. A typical MLOps stack includes source control, test/build services, deployment services, a model registry, feature store, metadata store and pipeline orchestrator . Each component serves a unique purpose:

Source control and environment isolation

Use Git (with Git Large File Storage or DVC) to track code and data. Data versioning helps ensure reproducibility, while branching strategies enable experimentation without contaminating production code. Environment isolation using Conda environments or virtualenv keeps dependencies consistent.

Model registry and feature store

A model registry stores model artifacts, versions and metadata. Tools like MLflow and SageMaker Model Registry maintain a record of each model’s parameters and performance. A feature store provides a centralized location for reusable, validated features. Clarifai’s model repository and feature management capabilities help teams manage assets across projects.

Metadata tracking and pipeline orchestrator

Metadata stores capture information about experiments, datasets and runs. Pipeline orchestrators (Kubeflow Pipelines, Airflow, or Clarifai’s workflow orchestration) automate the execution of ML tasks and maintain lineage. A clear audit trail builds trust and simplifies compliance.

Tip: Consider integrating Clarifai’s compute orchestration to manage the lifecycle of models across different environments. Its interface simplifies deploying models to cloud or on‑prem while leveraging Clarifai’s high‑performance inference engine.

Ml Ops Best Practices - Compute orchestration

Automation and CI/CD Pipelines for ML

How do ML teams automate their workflows?

Automation is the backbone of MLOps. The MissionCloud article emphasises building CI/CD pipelines using Jenkins, GitLab CI, AWS Step Functions and SageMaker Pipelines to automate data ingestion, training, evaluation and deployment. Continuous training (CT) triggers retraining when new data arrives.

  • Automate data ingestion: Use scheduled jobs or serverless functions to pull fresh data and validate it.
  • Automate training and hyperparameter tuning: Configure pipelines to run training jobs on arrival of new data or when performance degrades.
  • Automate deployment: Use infrastructure‑as‑code (Terraform, CloudFormation) to provision resources. Deploy models via container registries and orchestrators.

Practical example

Imagine a retail company that forecasts demand. By integrating Clarifai’s workflow orchestration with Jenkins, the team builds a pipeline that ingests sales data nightly, trains a regression model, validates its accuracy and deploys the updated model to an API endpoint. When the error metric crosses a threshold, the pipeline triggers a retraining job automatically. This automation results in fewer manual interventions and more reliable forecasts.

ML Ops Best Practices - Inference

Version Control for Code, Data and Models

Why is versioning essential?

Version control is not just for code. ML projects must version datasets, labels, hyperparameters, and models to ensure reproducibility and regulatory compliance. MissionCloud emphasises tracking all these artifacts using tools like DVC, Git LFS and MLflow. Without versioning, you cannot reproduce results or audit decisions.

Best practices for version control

  • Use Git for code and configuration. Adopt branching strategies (e.g., feature branches, release branches) to manage experiments.
  • Version data with DVC or Git LFS. DVC maintains lightweight metadata in the repo and stores large files externally. This approach ensures you can reconstruct any dataset version.
  • Model versioning: Use a model registry (MLflow or Clarifai) to track each model’s metadata. Record training parameters, evaluation metrics and deployment status.
  • Document dependencies and environment: Capture package versions in a requirements.txt or environment.yml. For containerised workflows, store Dockerfiles alongside code.

Expert insight: A senior data scientist at a healthcare company explained that proper data versioning enabled them to reconstruct training datasets when regulators requested evidence. Without version control, they would have faced fines and reputational damage.

Testing, Validation & Quality Assurance in MLOps

How to ensure your ML model is trustworthy

Testing goes beyond checking whether code compiles. You must test data, models and end‑to‑end systems. MissionCloud lists several types of testing: unit tests, integration tests, data validation, and model fairness audits.

  1. Unit tests for feature engineering and preprocessing: Validate functions that transform data. Catch edge cases early.
  2. Integration tests for pipelines: Test that the entire pipeline runs with sample data and that each stage passes correct outputs.
  3. Data validation: Check schema, null values, ranges and distributions. Tools like Great Expectations help automatically detect anomalies.
  4. Model tests: Evaluate performance metrics (accuracy, F1 score) and fairness metrics (e.g., equal opportunity, demographic parity). Use frameworks like Fairlearn or Clarifai’s fairness toolkits.
  5. Manual reviews and domain‑expert assessments: Ensure model outputs align with domain expectations.

Common pitfall: Skipping data validation can lead to “data drift disasters.” In one case, a financial model started misclassifying loans after a silent change in a data source. A simple schema check would have prevented thousands of dollars in losses.

Clarifai’s platform includes built‑in fairness metrics and model evaluation dashboards. You can monitor biases across subgroups and generate compliance reports.

Reproducibility and Environment Management

Why reproducibility matters

Reproducibility ensures that anyone can rebuild your model, using the same data and configuration, and achieve identical results. MissionCloud points out that using containers like Docker and workflows such as MLflow or Kubeflow Pipelines helps reproduce experiments exactly.

Key strategies

  • Containerisation: Package your application, dependencies and environment variables into Docker images. Use Kubernetes to orchestrate containers for scalable training and inference.
  • Deterministic pipelines: Set random seeds and avoid operations that rely on non‑deterministic algorithms (e.g., multithreaded training without a fixed seed). Document algorithm choices and hardware details.
  • Infrastructure‑as‑code: Manage infrastructure (cloud resources, networking) via Terraform or CloudFormation. Version these scripts to replicate the environment.
  • Notebook best practices: If using notebooks, consider converting them to scripts with Papermill or using JupyterHub with version control.

Clarifai’s local runners allow you to run models on your own infrastructure while maintaining the same behaviour as the cloud service, enhancing reproducibility. They support containerisation and provide consistent APIs across environments.

Monitoring and Observability

What to monitor post‑deployment

After deployment, continuous monitoring is critical. MissionCloud emphasises tracking accuracy, latency and drift using tools like Prometheus and Grafana. A robust monitoring setup typically includes:

  • Data drift and concept drift detection: Compare incoming data distributions with training data. Trigger alerts when drift exceeds a threshold.
  • Performance metrics: Track accuracy, recall, precision, F1, AUC over time. For regression tasks, monitor MAE and RMSE.
  • Operational metrics: Monitor latency, throughput and resource usage (CPU, GPU, memory) to ensure service‑level objectives.
  • Alerting and remediation: Configure alerts when metrics breach thresholds. Use automation to roll back or retrain models.

Clarifai’s Model Performance Dashboard allows you to visualise drift, performance degradation and fairness metrics in real time. It integrates with Clarifai’s inference engine, so you can update models seamlessly when performance falls below target.

Real‑world story

A ride‑sharing company monitored travel‑time predictions using Prometheus and Clarifai. When heavy rain caused unusual travel patterns, the drift detection flagged the change. The pipeline automatically triggered a retraining job using updated data, preventing a decline in ETA accuracy. Monitoring saved the business from delivering inaccurate estimates to users.

MLOps Signup

Experiment Tracking and Metadata Management

Keeping track of experiments

Keeping a record of experiments avoids reinventing the wheel. MissionCloud recommends using Neptune.ai or MLflow to log hyperparameters, metrics and artifacts for each run.

  • Log everything: Hyperparameters, random seeds, metrics, environment details, data sources.
  • Organise experiments: Use tags or hierarchical folders to group experiments by feature or model type.
  • Query and compare: Compare experiments to find the best model. Visualise performance differences.

 Clarifai’s experiment tracking provides an easy way to manage experiments within the same interface you use for deployment. You can visualise metrics over time and compare runs across different datasets.

Security, Compliance & Ethical Considerations

Why security and compliance cannot be ignored

Regulated industries must ensure data privacy and model transparency. MissionCloud emphasises encryption, access control and alignment with standards like ISO 27001, SOC 2, HIPAA and GDPR. Ethical AI requires addressing bias, transparency and accountability.

Key practices

  • Encrypt data and models: Use encryption at rest and in transit. Ensure secrets and API keys are stored securely.
  • Role‑based access control (RBAC): Limit access to sensitive data and models. Grant least privilege permissions.
  • Audit logging: Record who accesses data, who runs training jobs and when models are deployed. Audit logs are vital for compliance investigations.
  • Bias mitigation and fairness: Evaluate models for biases across demographic groups. Document mitigation strategies and trade‑offs.
  • Regulatory alignment: Adhere to frameworks (GDPR, HIPAA) and industry guidelines. Implement impact assessments where required.

Clarifai holds SOC 2 Type 2 and ISO 27001 certifications. The platform provides granular permission controls and encryption by default. Clarifai’s fairness tools support auditing model outputs for bias, aligning with ethical principles.

Collaboration and Cross‑Functional Communication

How to foster collaboration in ML projects

MLOps is as much about people as it is about tools. MissionCloud emphasises the importance of collaboration and communication across data scientists, engineers and domain experts.

  • Create shared documentation: Use wikis (e.g., Confluence) to document data definitions, model assumptions and pipeline diagrams.
  • Establish communication rituals: Daily stand‑ups, weekly sync meetings and retrospective reviews bring stakeholders together.
  • Use collaborative tools: Slack or Teams channels, shared notebooks and dashboards ensure everyone is on the same page.
  • Involve domain experts early: Business stakeholders should review model outputs and provide context. Their feedback can catch errors that metrics overlook.

Clarifai’s community platform includes discussion forums and support channels where teams can collaborate with Clarifai experts. Enterprise customers gain access to professional services that help align teams around MLOps best practices.

Cost Optimization and Resource Management

Strategies for controlling ML costs

ML workloads can be expensive. By adopting cost‑optimisation strategies, organisations can reduce waste and improve ROI.

  • Right‑size compute resources: Choose appropriate instance types and leverage autoscaling. Spot instances can reduce costs but require fault tolerance.
  • Optimise data storage: Use tiered storage for infrequently accessed data. Compress archives and remove redundant copies.
  • Monitor utilisation: Tools like AWS Cost Explorer or Google Cloud Billing reveal idle resources. Set budgets and alerts.
  • Use Clarifai local runners: Running models locally or on‑prem can reduce latency and cloud costs. With Clarifai’s compute orchestration, you can allocate resources dynamically.

Expert tip: A media company cut training costs by 30% by switching to spot instances and scheduling training jobs overnight when electricity rates were lower. Incorporate similar scheduling strategies into your pipelines.

Emerging Trends – LLMOps and Generative AI

Managing large language models

Large language models (LLMs) introduce new challenges. The AI Accelerator Institute notes that LLMOps involves selecting the right base model, personalising it for specific tasks, tuning hyperparameters and performing continuous evaluationaiacceleratorinstitute.com. Data management covers collecting and labeling data, anonymisation and version controlaiacceleratorinstitute.com.

Best practices for LLMOps

  1. Model selection and customisation: Evaluate open models (GPT‑family, Claude, Gemma) and proprietary models. Fine‑tune or prompt‑engineer them for your domain.
  2. Data privacy and control: Implement pseudonymisation and anonymisation; adhere to GDPR and CCPA. Use retrieval‑augmented generation (RAG) with vector databases to keep sensitive data off the model’s training corpus.
  3. Prompt management: Maintain a repository of prompts, test them systematically and monitor their performance. Version prompts just like code.
  4. Evaluation and guardrails: Continuously assess the model for hallucinations, toxicity and bias. Tools like Clarifai’s generative AI evaluation service provide metrics and guardrails.

Clarifai offers generative AI models for text and image tasks, as well as APIs for prompt tuning and evaluation. You can deploy these models with Clarifai’s compute orchestration and monitor them with built‑in guardrails.

Best Practices for Model Lifecycle Management at the Edge

Deploying models beyond the cloud

Edge computing brings inference closer to users, reducing latency and sometimes improving privacy. Deploying models on mobile devices, IoT sensors or industrial machinery requires additional considerations:

  • Lightweight frameworks: Use TensorFlow Lite, ONNX or Core ML to run models efficiently on low‑power devices. Quantisation and pruning can reduce model size.
  • Hardware acceleration: Leverage GPUs, NPUs or TPUs in devices like NVIDIA Jetson or Apple’s Neural Engine to speed up inference.
  • Resilient updates: Implement over‑the‑air update mechanisms with rollback capability. When connectivity is intermittent, ensure models can queue updates or cache predictions.
  • Monitoring at the edge: Capture telemetry (e.g., latency, error rates) and send it back to a central server for analysis. Use Clarifai’s on‑prem deployment and local runners to maintain consistent behaviour across edge devices.

Example

A manufacturing plant deployed a computer vision model to detect equipment anomalies. Using Clarifai’s local runner on Jetson devices, they performed real‑time inference without sending video to the cloud. When the model detected unusual vibrations, it alerted maintenance teams. An efficient update mechanism allowed the model to be updated overnight when network bandwidth was available.

ML Ops Best Practices - Local Runners

Conclusion and Actionable Next Steps

Adopting MLOps best practices is not a one‑time project but an ongoing journey. By establishing a solid foundation, automating pipelines, versioning everything, testing rigorously, ensuring reproducibility, monitoring continuously, keeping track of experiments, safeguarding security and collaborating effectively, you set the stage for success. Emerging trends like LLMOps and edge deployments require additional considerations but follow the same principles.

Actionable checklist

  1. Audit your current ML workflow: Identify gaps in version control, testing or monitoring.
  2. Prioritise automation: Begin with simple CI/CD pipelines and gradually add continuous training.
  3. Centralise your assets: Set up a model registry and feature store.
  4. Invest in monitoring: Configure drift detection and performance alerts.
  5. Engage stakeholders: Create cross‑functional teams and share documentation.
  6. Plan for compliance: Implement encryption, RBAC and fairness audits.
  7. Explore Clarifai: Evaluate how Clarifai’s orchestration, model repository and generative AI solutions can accelerate your MLOps journey.

 

MLOps Best Practices - Contact us

Frequently Asked Questions

Q1: Why should we use a model registry instead of storing models in object storage?
A model registry tracks versions, metadata and deployment status. Object storage holds files but lacks context, making it difficult to manage dependencies and roll back changes.

Q2: How often should models be retrained?
Retraining frequency depends on data drift, business requirements and regulatory guidelines. Use monitoring to detect performance degradation and retrain when metrics cross thresholds.

Q3: What’s the difference between MLOps and LLMOps?
LLMOps is a specialised discipline focused on large language models. It includes unique practices like prompt management, privacy preservation and guardrails to prevent hallucinations

Q4: Do we need special tooling for edge deployments?
Yes. Edge deployments require lightweight frameworks (TensorFlow Lite, ONNX) and mechanisms for remote updates and monitoring. Clarifai’s local runners simplify these deployments.

Q5: How does Clarifai compare to open‑source options?
Clarifai offers end‑to‑end solutions, including model orchestration, inference engines, fairness tools and monitoring. While open‑source tools offer flexibility, Clarifai combines them with enterprise‑grade security, support and performance optimisations.



[The AI Show Episode 163]: AI Answers


From the environmental costs of data centers to the cultural biases baked into today’s models, Paul Roetzer and Cathy McPhillips answer your questions from our 50th Intro to AI class. Throughout the episode, they unpack the gray areas of AI-generated content, debate what the rise of agents means for work, and consider how creatives can stay ahead with AI.

Listen or watch below—and see below for show notes and the transcript.

 

Listen Now

Watch the Video

 

What Is AI Answers?

Over the last few years, our free Intro to AI and Scaling AI classes have welcomed more than 40,000 professionals, sparking hundreds of real-world, tough, and practical questions from marketers, leaders, and learners alike.

AI Answers is a biweekly bonus series that curates and answers real questions from attendees of our live events. Each episode focuses on the key concerns, challenges, and curiosities facing professionals and teams trying to understand and apply AI in their organizations.

In this episode, we address 20 of the most important questions from our August 14th Intro to AI class, covering everything from tooling decisions to team training to long-term strategy. Paul answers each question in real time—unscripted and unfiltered—just like we do live.

Whether you’re just getting started or scaling fast, these are answers that can benefit you and your team.

Timestamps

00:00:00 — Intro

00:05:13 — Question #1: Which environmental concern feels most urgent for the AI industry to solve in the near term—and who should be responsible for leading the solution?

00:07:58 — Question #2: How well do AI models reflect diverse languages and cultures, and will they ever move beyond an American-centric bias? Have you seen any progress on this front?

00:10:25 — Question #3: What risks and ownership issues come with AI-generated video and images in marketing? Has this evolved over the past few years? Have you seen any legal clarity, or will this remain a gray area in the near term? 

00:15:26 — Question #4: What are the best ways to start experimenting with AI agents, and are there good resources for building them? What’s a smart first step for a solo professional vs. a mid-sized team?

00:18:22 — Question #5: Is there value in using multiple platforms to cross-check results, or is committing to one ecosystem a better strategy? Is this a short-term strategy until the tools improve, or something to build into long-term workflows?

00:22:06 — Question #6: How should businesses weigh built-in AI assistants (like those in Google/Microsoft) versus standalone tools like ChatGPT? Do you think enterprises will eventually standardize on one, or live in a hybrid world?

00:24:30 — Question #7: Are we moving toward a standardized way for websites to guide how AI systems interact with their content?

00:29:27 — Question #8: How do you see different search engines being used or leveraged by AI companies?

00:32:24 — Question #9: How do you choose the right AI model for marketing, HR, and sales tasks? Is there a framework? We often focus on outcomes and use cases, but should we consider transparency, governance, or integration? 

00:34:56 — Question #10: What role do you see AI playing in building and managing communities? Is it more about efficiency (automation, moderation) or about enhancing human connection? 

00:38:31 — Question #11: From an information architecture perspective, what frameworks should teams use when integrating AI into CRM or workflow automation to keep systems scalable and secure?

00:40:51 — Question #12: What are the most common mistakes companies make when trying to ‘force-fit’ AI into a workflow?

00:42:23 — Question #13: Which AI tooling is best suited to develop and monitor a marketing communications strategy at SME vs. enterprise scale? Do you see different adoption patterns between small vs. large companies?

00:45:11 — Question #14: Do you think AI fluency will become a baseline requirement for executives, or is it creating an entirely new kind of leadership role?

00:46:55 — Question #15: What should creatives in fields like graphic design or UX/UI be thinking about as AI continues to evolve? What have you seen creative professionals do successfully to stay ahead?

00:52:29 — Question #16: How do you see coding and technical skills as careers in a world where today’s kids will grow up with AI? And if needed, what other skills should be developed in tandem? How might schools or parents prepare kids for that world?

00:55:35 — Question #17: What’s the best way to handle situations when AI gets things wrong, and how do you approach fact-checking? What processes and humans are needed? Has your answer changed as AI has improved?

00:58:39 — Question #18: If you had to narrow it down to just one ethical principle that matters most right now, which would it be—and why?

01:00:48 — Question #19: How should companies address internal concerns around data privacy, compliance, and governance? Do you see regulatory momentum changing how companies handle this?

01:01:53 — Question #20: Which AI applications do you expect to break through sooner than people think—and which ones are overhyped?

Links Mentioned


This episode is brought to you by Google Cloud: 

Google Cloud is the new way to the cloud, providing AI, infrastructure, developer, data, security, and collaboration tools built for today and tomorrow. Google Cloud offers a powerful, fully integrated and optimized AI stack with its own planet-scale infrastructure, custom-built chips, generative AI models and development platform, as well as AI-powered applications, to help organizations transform. Customers in more than 200 countries and territories turn to Google Cloud as their trusted technology partner.

Learn more about Google Cloud here: https://cloud.google.com/  


This episode is brought to you by AI Academy by SmarterX.

AI Academy is your gateway to personalized AI learning for professionals and teams. Discover our new on-demand courses, live classes, certifications, and a smarter way to master AI.

Learn more here.

Read the Transcription

Disclaimer: This transcription was written by AI, thanks to Descript, and has not been edited for content. 

[00:00:00] Paul Roetzer: AI isn’t the answer to every problem or every need to increase efficiency or productivity. It’s great to assess workflows. It’s great to look at problems differently, but AI isn’t always the answer sometimes. More human is the answer. Welcome to AI Answers a special q and a series from the Artificial Intelligence Show.

[00:00:18] I’m Paul Roetzer, founder and CEO of SmarterX and Marketing AI Institute. Every time we host our live virtual events and online classes, we get dozens of great questions from business leaders and practitioners who are navigating this fast moving world of ai, but we never have enough time to get to all of them.

[00:00:36] So we created the AI Answers Series to address more of these questions and share real time insights into the topics and challenges professionals like you are facing. Whether you’re just starting your AI journey or already putting it to work in your organization. These are the practical insights, use cases, and strategies you need to grow smarter.

[00:00:57] Let’s explore AI together.[00:01:00] 

[00:01:03] Welcome to episode 1 63 of the Artificial Intelligence Show. I’m your host, Paul Roetzer, along with my co-host today, Cathy McPhillips, our Chief Marketing Officer at Marketing Eye Institute and SmarterX. Welcome, Cathy. Thank you so much. It is weird to look across the screen and not see Mike there after there’s been so many of these.

[00:01:20] But this is, I mean, this is like our, our fourth together, right? Like I think, 

[00:01:24] Cathy McPhillips: yeah. 

[00:01:24] Paul Roetzer: So this is not Cathy replacing Mike. This is not our weekly show we do every Tuesday. This is a special edition we call AI Answers. So we introduced this series, I think it was what, June or July of 2025. Yeah. And the idea here is,   as part of our AI literacy project, we, we do a intro to AI class every month free, and we have now done 50 of them, and Cathy and I host that together.

[00:01:50] So we do that every month since the fall of 2021. And then we do a five essential steps to scaling AI class every month for free. And we are on. [00:02:00] 10th, 

[00:02:00] Cathy McPhillips: 10th or 10th. 10th is tomorrow, I guess, the day this drops. 

[00:02:03] Paul Roetzer: Yeah, the day this drops. All right. So Cathy and I are spending a lot of time virtually doing these things this week.

[00:02:08] So AI answers is a, you know, basically every other week or so, we do about two, two to three a month where we just go through and answer question. So when we do these intra AI classes and the scaling AI classes, we will get dozens of questions, and we usually get to maybe five to 10 of them,   on each episode or on each class.

[00:02:27] And so we introduce this new podcast series in partnership with Google Cloud, and we thank them for their support,   to just try and get through as many of these questions as we can. And so that’s the gist of it. it is literally just,   unscripted. Cathy has questions from the thing, and we answer ’em because in real time that’s what happens.

[00:02:43] The questions come in, we answer ’em. So Cathy and Claire on our team curate the questions and then we jump on a call. And,   so if there’s, if there’s questions Cathy asks that I don’t have great answers for. I didn’t prepare for it like it is. It is meant to be sort of real time. And if I can provide some [00:03:00] guidance on some things, we’ll direct you to other resources.

[00:03:03] So that’s what we’re gonna do today. Today’s episode is in addition to being presented by Google Cloud. It is brought to us by AI Academy, by SmarterX.   we announced this and launched this on August 19th. So this was just Tuesday of this week.   this is the thing we’ve been working on for 10 plus months.

[00:03:20] If you listen to the podcast regularly, you hear us talking about this. So we finally,   brought a bunch of new courses, professional certificates,   live experiences, product reviews, all these new things that we’ve built into our AI mastery membership program,   as part of AI Academy. So you can now go check it out.

[00:03:39] We have a brand new website, academy.SmarterX.ai You can go learn all about the individual plans. You can learn about our new business accounts that we’re really excited about, and you can kind of check that out. And you can also access the webinar from Tuesday, the launch event webinar, where we shared the entire vision.

[00:03:57] We went through, you know, really a lot of just making the [00:04:00] business case for AI education and training internally in your organization. The, I would say the majority of the presentation on the launch, it was actually more about,   educational value related to how to make that case and what the value of investing in AI literacy is.

[00:04:15] And then it ends with a kind of an overview of what we’re doing with AI Academy. So again, go to academy.SmarterX.ai and we will also, in the show notes, put a direct link to the launch event webinar,   which is available on demand. Cathy, I’m gonna turn it over to you and kick us off. 

[00:04:31] Cathy McPhillips: Okay. Let’s do this.

[00:04:33] Okay. So this week was different. So this is our fourth class and while oftentimes the questions are so very different, sometimes we do get a lot of the same. So usually Claire will export all of the questions. She’ll go through and do a read and give some recommendations, and I’ll run them through,   AI of some sort.

[00:04:50] Today I used or ChatGPT and said, put these in a flow so Paul and I can have a great conversation. I also ran them through Notebook LM to be sure that they weren’t questions that we’ve [00:05:00] asked, just to make it different just so people could go back to the other episodes. And this is all fresh, different questions and I tweaked ’em a little bit.

[00:05:06] So we’re continually figuring out how to evolve that process for these questions. All right. 

[00:05:13] Question #1: Which environmental concern feels most urgent for the AI industry to solve in the near term—and who should be responsible for leading the solution?

[00:05:13] Cathy McPhillips: Question number one. What, which environmental concerns feel most urgent for the AI industry to solve in the near term? And who should be responsible for leading the solution? We’re start strong. 

[00:05:23] Paul Roetzer: Yeah, really. Um. So just a, a little background, the environmental concerns, this is a question that does come up in various forms.

[00:05:29] Quite often the concerns are,   like I just literally saw this morning that,   Oracle is planning, planning to spend a billion dollars to power an openAI’s data center with gas turbines. Like that’s not great for the environment. Like so, so there are these very real, like immediate concerns where they don’t have enough power in the electrical grid to do the things they want to do.

[00:05:55] So they’re using gas powered machines to power these data centers like that as [00:06:00] an immediate and obvious challenge. The bigger picture here is to do what these major labs like Google and Meta and OpenAI and others want to do requires way more data centers than we currently have. And those data centers require way more energy than we currently have in the grid.

[00:06:22] And so we’re going to have to do things and it cannot all be clean energy. And so there’s a bit of a trade off. Well, there’s a significant trade off, I should say, probably for at least the next decade, where environmental concerns are largely going to be pushed aside by the US government at least, and the labs themselves.

[00:06:43] And the bet that they’re going to make is that if we build more intelligent ai, it will actually help us solve the bigger picture climate problem, long run. And so whether it comes to economics and jobs or energy, that is [00:07:00] generally the talking point of all the leaders of these labs is it’s a trade off.

[00:07:05]   it’s not gonna be where we want it to be in terms of,   being, you know, net zero in terms of the carbon emissions, like we’re gonna emit more carbon.   but. In the long run, we think it’s gonna enable us to solve the bigger problem. So it’s a very real issue. The thing I talked about on the podcast recently that any of us can actually do ourselves, it’s not a great thing, but basically use the smaller, more efficient models that that is.

[00:07:30] Like if you use a reasoning model, if you use image generation, video generation, those require way more compute power,   or if you use just a larger language model versus smaller, more efficient models. So I would say the one thing you can do if you really care deeply about this, it’s kind of like turning the lights off in the room when you leave.

[00:07:49] Like it’s a little thing, but use a smaller model like that that it adds up when you’re talking about billions of users of the AI technology. 

[00:07:58] Question #2: How well do AI models reflect diverse languages and cultures, and will they ever move beyond an American-centric bias? Have you seen any progress on this front?

[00:07:58] Cathy McPhillips: Okay. [00:08:00] Number two, how well do AI models reflect diverse languages and cultures, and will they ever move beyond an American centric bias? And have you seen any progress on that front?

[00:08:09] Paul Roetzer: Geez, a man picture. You’re from an intro class. This is incredible.   yeah, I mean, it’s gonna inherently be bias. There’s, I, I’ve talked about this a lot in the podcast. There’s bias in every element of this. The data that goes in to train the models, the post training of the models, the system prompt that the determines how the models behave, the languages they learn from all these things.

[00:08:31]   and the reality is that most of the models being used today, whether it’s in chat, GBT or Gemini, whatever, are trained by companies in California and the United States. And,   you know, I think that there’s a lot of effort to diversify that. But generally speaking, I think that’s basically where we’re at.

[00:08:50] Like there, they’re gonna be US based models. Now obviously, like China’s a major player, their deep seek is a Chinese based lab that made some waves earlier this year. [00:09:00] And so you’re gonna have other countries that, you know, build models that maybe are inherently trained on,   localized languages. For the most part, what’s happening is companies like Meta and Google and others are training on English, and then the models learn to translate into other languages.

[00:09:18]   I think a lot of it might come down to like post training and things like that, but yeah, I mean this is just kind of, it’s the way they work right now.   and I don’t know that that’s gonna change dramatically in the next year, you know, few years. 

[00:09:30] Cathy McPhillips: Yeah. I think wonder if the more we’re using these tools and the more the international folks,   non-English speaking folks are using the tools, you know, we talked about that.

[00:09:38] I think I wanna say it was on one of our mastery courses that people who are bilingual were using the tools in English and in their native language and we’re seeing the results. And does that contribute to this a little bit? 

[00:09:50] Paul Roetzer: Yeah, I mean, it could, I mean, OpenAI said that I think their largest user base is actually out of India.

[00:09:57] Like I think part of this is [00:10:00] just gonna be market driven, where, you know, where the users are, they’re going to have to adapt the products to be more localized to the user base. So I could see more diversification in that way where they, they just look at the market and say, okay, we have to start catering more to this audience.

[00:10:16] Sure. And it might come back to even the training of the models themselves or the, you know, the specialization of the models after they’ve initially been trained. 

[00:10:25] Question #3: What risks and ownership issues come with AI-generated video and images in marketing? Has this evolved over the past few years? Have you seen any legal clarity, or will this remain a gray area in the near term?

 

[00:10:25] Cathy McPhillips: Okay. Number three, what risks and ownership issues come with AI generated video and images and marketing? Has this evolved over the past few years and have you seen any legal clarity?

[00:10:35] Or is this still just a big gray area? 

[00:10:37] Paul Roetzer: Yeah, there’s not much legal clarity here. the basic premise,   whether it’s text or video or image or anything, is in the United States, if you use AI to create something, you can’t own a copyright to it. It’s gotten a little bit more fuzzy in the last few months,   because the current administration is not as friendly to creators, I would say.

[00:10:58] Like they, they don’t [00:11:00] really put as much stock in copyright.   there’s actually some who have influence within the administration who would like to just throw it away, that there is basically no, you know, no protections for copyright holders. So that could change things. But as of right now, the US   trademark office says that AI generates stuff, can’t hold a copyright.

[00:11:22] So if you’re gonna create videos, if you’re gonna create logos, things like that through marketing,   using ai, you do have to talk to your legal team and be very clear. If it’s something that’s very important to you to hold a copyright to and to be able to protect that,   under US law, then you want to have those conversations with your attorneys.

[00:11:42] I always tell people. We, we pay very close attention to this space. I have worked with IP attorneys for years. I have probably an above average understanding of what’s going on, but I am not an attorney and I am not providing legal advice. So I would just say, yeah, you gotta kind of,   [00:12:00] really just know is it something you want to be able to protect, that you would be willing to spend resources to protect and also understand It’s just getting so hard.

[00:12:09] Like,   one of the things that, you know, I think brands have to worry about, creators have to worry about is just how easy it is to deep fake somebody, like literally deep fake a podcast host and like start a new podcast that looks and sounds exactly like them. And that’s gonna happen to executives of companies.

[00:12:26] It’s gonna happen all across the spectrum. So this is a really important area to pay attention to, but there is not a lot of clarity right now as to where this is gonna go and how it will evolve there. There’s just, there’s a lot of court cases right now. Dealing with this, but I also still don’t feel like we’re gonna have clarity in the next year or two.

[00:12:43] I think it’s just gonna go on for a while. 

[00:12:46] Cathy McPhillips: And is there a difference between generating an image in a tool ver, you know, and using it or ideating in a tool and having an artist create it from that? Is that the same thing? 

[00:12:58] Paul Roetzer: Yeah, I mean, I think everything’s a gray [00:13:00] area. Like right when, yeah, when you submit an application to protect something, you have to like provide that clarity.

[00:13:06] And I think everything’s just gonna be case by case. And if you have to, you know, at some point go through an audit trail of how something was created, it’s going to be up to a reviewer within the patent and trademark office to determine whether that’s good enough. And that’s gonna be subjective on its own.

[00:13:22] There’s gonna be human bias tied to those decisions. So yeah, it’s, you know, I think the general guidance is if it’s something that’s really important for you, that you want to have the human as deeply in the loop as possible, and you want to be able to show the human involvement in that process.   no one is gonna take your word for it.

[00:13:41] If you say, well, it’s actually my idea. I gave it this and then all I had to do is this and this. It’s like, okay, show me the thread. Like, show me that chat. So I think you almost have to,   assume you’re gonna have to prove that the human element and you know, make sure you go through that process. So [00:14:00] yeah, my general guidance is like, again, if it’s critical, like a logo for your company, right?

[00:14:04]   you don’t want 95% of that work done by the AI because that’s something you want to be able to protect and you don’t want other people to steal it and put it on a baseball cap and you can’t do anything about it ’cause you actually used AI to create it. Like, that’s the kind of stuff I think about, 

[00:14:21] Cathy McPhillips: you know, and that responsible AI manifesto you did years ago.

[00:14:24] The one I always, the point and that I always bring back to people is like, legal precedent is lagging so far behind all of this. Like, do the right thing. 

[00:14:31] Paul Roetzer: Yes. Yeah. Yeah. and you know, I think part of it is people, people don’t know what the right thing is. Sometimes when it just comes to these like. Not even knowing that copyright is an issue with ai.

[00:14:41] I can’t tell you how many times I’ve stood on stage and said like, Hey, if you use outside creative firms or you know, outside copywriters, you need to have in your contract with them that they can’t use Gen AI unless you approve it because they may be transferring work to you that they used AI for and you don’t hold a copyright to it and they just stare at you like, [00:15:00] wait, what?

[00:15:01] And I mean, even last year at  MAICON, we had a whole panel about this and I think most people in the room, and this is what, September of 2024 were in shock that, that that was the thing. 

[00:15:13] Cathy McPhillips: And they’re really smart people in the room. 

[00:15:15] Paul Roetzer: Yeah. Really advanced marketers at an AI conference. So it’s still very early and I just think at minim  like an awareness that this is a thing is very important.

[00:15:26] Question #4: What are the best ways to start experimenting with AI agents, and are there good resources for building them? What’s a smart first step for a solo professional vs. a mid-sized team?

[00:15:26] Cathy McPhillips: Absolutely. Okay. Number four, what are the best ways to start experimenting with AI agents? And are there good resources for building them? And what are is, are there different steps between like a solo. Entrepreneur or, and a midsize team. 

[00:15:40] Paul Roetzer: Yeah. So first thing with AI agents is to know what they are. So they’re basically AI systems that can take actions to achieve a goal.

[00:15:46] Now, the confusion comes in with AI agents as to how autonomous they are. So this is, you know, it’s like, Hey, I’m just gonna ask the thing, do the work for me, and it’s gonna do it and it’s gonna be perfect, and I have to verify it. It’s like I, the human’s almost out of the loop. [00:16:00] That’s not where the vast majority of AI agents are today.

[00:16:03] The human is actually heavily in the loop, the best place to start that I think, gives people a, the best example of what an agent is and is going to be, is to go run a deep research project in Google Gemini, or ChatGPT. That’s an agent at work. You’re giving it a prompt. You’re saying, Hey, I wanna do a research report on,   you know, my competitors, here’s the three competitors.

[00:16:27]   here’s their websites. Can you run an analysis of positioning and pricing and product mix?   take a look at their leadership team. Like whatever you’re just, you’re asking for this thing, like you would ask another human to do a project for you. And then it goes and does it goes and looks at all their websites.

[00:16:44] It analyzes everything. It does a summary of it. It pulls out highlights and entities and all these things. That’s an agent at work. So the human set, the project gave the goal, the agent develops its plan of how it’s gonna do it. It goes and does it, and then it comes [00:17:00] back and creates the output. Now you as the human step back and it’s like, okay, is this all true?

[00:17:05] Like, am I gonna verify all the facts? Things like that. But that’s roughly an AI agent at work. It’s a AI system that can go do something.   and so again, it’s, there’s different degrees of autonomy of how much of the work it can do on its own and how much or how little the human needs to be involved.

[00:17:22] That’s where we’re progressing. Another way you could go look at it’s, go look at agent do ai. So this is Dharmesh Shaw, co-founder and CTO of HubSpot created Agent ai. And it allows you to build these much more rudimentary agents where there isn’t much autonomy. It’s kind of like the human sort of saying, okay, here’s my workflow.

[00:17:40] Wanna build an agent that does this workflow for me? The agent itself may not be doing a bunch of thinking and reasoning on its own, but it is executing a sequence of tasks. And so,   I think the agents are gonna get better. They’re gonna get smarter, they’re gonna get more reliable, they’re gonna require less human [00:18:00] instruction.

[00:18:00] But deep research, like I said, is is probably the best example for most people of this idea of an AI system that actually takes action, not just creates an output. 

[00:18:11] Cathy McPhillips: Yep. And we can include in the show notes, the deep research webinar that we did, that you did mm-hmm. To kind of go through that process, both with the input as well as with the output and what was, what’s possible.

[00:18:22] Question #5: Is there value in using multiple platforms to cross-check results, or is committing to one ecosystem a better strategy?

[00:18:22] Cathy McPhillips: That’s pretty cool. Yep. Okay. Number five. Is there value in using multiple platforms to crosscheck results or is committing to one ecosystem a better strategy? 

[00:18:33] Paul Roetzer: So I do this all the time.   you know, I told this story with our AI academy that we, I mentioned we just launched, I built two,   what I call ada AI teaching assistants.

[00:18:44] I built one at Google Gem and I built a custom GPT, same system instruction, same knowledge base, same everything. And because it was a very important project to me, I wasn’t sure if one was gonna be better or the other. And I wasn’t sure, based on the different tasks I was gonna ask of [00:19:00] it, if maybe Gemini was better at helping me write abstracts versus maybe chat GBT was better at images for the cover, you know, slide, things like that.

[00:19:09] And so I used both of them until I got to a point where I realized the gem from Google Gemini just. Was better at what I was looking for. It was good enough at everything that it stopped being worth my time to repeat the task in both of them. And I just spent probably 90% of my time working on with the gem instead of the custom GPT.

[00:19:31] Now, that’s not always gonna be the case.   the other thing I will do is like, if I output a research report, say in in cut in ChatGPT, I may give it to Gemini and have Gemini function as the critic that assesses it and verifies outputs, things like that. So I’m a big fan of, of having multiple, of using them, especially for really important work or, or, you know, deeper thinking where I want to get multiple perspectives.

[00:19:57] Sometimes they come out with roughly the same [00:20:00] output, verifies it. Sometimes you get like a, a little different thing. And so I really like it in those situations where you are doing planning and thinking and creativity and you just want to kind of bounce, bounce around the ideas. Um. If you use it as a critic to crosscheck the output.

[00:20:16] So let’s say you use Gemini to crosscheck ChatGPT, they both still hallucinate. Like you can’t just rely on Gemini to make sure everything in chat. GPT was factually correct. Like there’s no way to get the human out of the loop and I don’t know that there should be, honestly, in the near future.   but yes, I do the cross checking thing all the time.

[00:20:37] I constantly have both Gemini and ChatGPT active, and then IU depending on the project, I will use both of them sometimes 

[00:20:44] Cathy McPhillips: with bigger teams that, you know, can’t afford to have everyone have two different, you know, licenses. What do you recommend? 

[00:20:53] Paul Roetzer:   the, yeah, you make your bet. Like they’re both great.

[00:20:56] I mean, and I know some people like Anthropic, Claude,   some [00:21:00] people, you know, if we’re talking about corporate work, like you only have access to copilot. So it’s not just ChatGPT and Gemini, but, um. I mean, I think the models are roughly commoditized.   they’re, they’re kind of on par with each other.

[00:21:15] They sort of leapfrog each other every three to six months. But if you have access to Gemini or chat GBT or copilot,   I think you just work with the one you have. I don’t, I don’t know that you can go wrong, and I think they just kind of keep improving in different areas. I love Google, Gemini 2.5 Pro. I mean, that’s my go-to for work.

[00:21:35] I would say I probably use ChatGPT more personal, but I also really love,   the pro versions of ChatGPT, like they’re, they’re reasoning models, but I pay the 200 a month for that. Like it’s worth it for me. So, I don’t know, at a, at a very high level, like 20 bucks a month for Gemini, 20 bucks a month, Forche, GPTI mean, we’re talking about PhD level intelligence in your pocket.

[00:21:56] Like, it, it’s hard not to be able to justify 40 bucks a month if you [00:22:00] have enough use cases for them. Sure. But if you’re just like using it three, four times a month and no. You just pay for one of ’em and move on. 

[00:22:06] Question #6: How should businesses weigh built-in AI assistants (like those in Google/Microsoft) versus standalone tools like ChatGPT? 

[00:22:06] Cathy McPhillips: Yep. Okay. Question six. We kinda dipped our toes in this answer already.   how should businesses weigh built-in AI assistance like Google or Microsoft versus standalone tools like ChatGPT, and do you think enterprises will eventually standardize on one, or do you think we’ll just live in a hybrid world for the time being?

[00:22:24] Paul Roetzer: Yeah, I mean, it’s probably gonna follow very similar along to productivity software, you know, for the last 20 years. Like, companies are gonna, you know, have an in-house thing, whether they’re a Microsoft shop or a Google shop, or eventually maybe an openAI’s shop if they get into the productivity game, which it seems like they may.

[00:22:41]   so yeah, I think we’re gonna con, we’re gonna continue to live in this world where there’s choices, probably two to three is what normally happens. One of them is gonna have 40 to 60% of the market share, and then somebody’s gonna have 20% and someone’s gonna have single digits. Like, it’s probably gonna play out like that.

[00:22:56]   it’s like the problem I’ve seen, [00:23:00] I mean, we, so we have Google Workspace internally. Um. The Gemini app as a standalone is way better than Gemini built into Google Workspace. So like, if I go into Google Docs or Google Sheets, Gemini in those platforms is, is almost useless to me. Like I don’t use it yet.

[00:23:18] I think they’ll get there. But the Gemini standalone app is incredible. Mm-hmm. And then you can just export to Docs or sheets. So I kind of reverse work, right? I do my productivity in the app and then I bring it into the workspace.   the challenge people face within corporations that only have access to like, you know, copilot or something is sometimes it’s a watered down version of what you can get directly from ChatGPT PT.

[00:23:46] And that’s where the issues come in, is if people are, have a ChatGPT PT account themselves, and they’re used to working with the full version that’s available through there. And then because maybe they’re in a healthcare company or financial services or a law firm. [00:24:00] There’s more restrictions internally on what they want that copilot to be able to do, then they might just not have all the feature sets in their corporate environment that they have outside of it when it’s not watered down.

[00:24:13] And that’s where I think a lot of the frustration comes in where people are like, oh, I have copilot and it doesn’t really do what I want it to do. It may just be because there’s some guardrails in place that are limiting its functionality for you. But you know, it’s,   I think you’re gonna, you’re gonna use whatever your company gives you, basically.

[00:24:30] Question #7: Are we moving toward a standardized way for websites to guide how AI systems interact with their content?

[00:24:30] Cathy McPhillips: Right, right. Okay. Number seven. Are we moving toward, are we moving toward a standardized way for websites to guide how AI systems interact with their content? 

[00:24:42] Paul Roetzer:   this is a tricky one. So CloudFlare recently enabled a capability where you could basically say like, you don’t want the large language models to be able to learn from your content.

[00:24:55] You can kind of turn it off. So it’s like a, almost like a robot stock, TXT, where it’s like, don’t come and take my content. [00:25:00] That’s a, it’s a challenging environment. Like we are entering a whole new world of how search engine optimization works, how people discover content. We are definitely starting to see reports now of fewer click-throughs to websites because with Google’s ai, o ai mode now, and AI overviews, like they’re just getting the answers they need right from the search engine, or they’re just getting them right from the chat bot and or AI assistant and they’re not having to go to the website.

[00:25:28] So there’s no like, best practices yet. I I think this is a very much a, like a independent, so,   decision has to be made by brands. I would probably, at this point caution overreacting,   because we know so little about how consumer behavior is going to evolve. I would hesitate to wall off your content and think that that’s gonna get you ahead.

[00:25:55] It’s not ideal that we see traffic plummeting to corporate sites, [00:26:00] but we knew this was gonna happen. We said this like early last year on the podcast, like, I assumed our SEO goes to zero. Like I assumed our, our search traffic just goes to nothing. And so I, you know, years ago kind of followed this approach of like, well, let’s go to YouTube, let’s go to podcast.

[00:26:15] Like, let’s diversify our content. Just put it everywhere. and like, if people don’t come to our corporate website, fine. Like, so be it. Like we’ll just be where the audience is. And so I would, I think this is much bigger picture around your overall content strategy,   how people find you. If your company is dependent upon search traffic, you need to be urgently like assessing that because I think it’s very safe to assume whether you’re B2B, B2C or, or both.

[00:26:46]   we just can’t rely on search engine traffic the way we used to. 

[00:26:51] Cathy McPhillips: But going back to just answering our customer’s questions. I mean, the best thing we could be doing, in my opinion. Right, right. I mean, [00:27:00] yeah. And it’s interesting. We were doing some, I was in GA four a few weeks ago, and ChatGPT is our one of our top referers right now.

[00:27:06] And of course, like to your point, I was like, oh my gosh, what can I do right this second? Yeah. And with Academy, I just couldn’t stop what I was doing and focus on that. But you know, it does like, okay, let’s have a strategy behind what we’re doing and what’s up there and if it works with the LLMs then awesome.

[00:27:22] Paul Roetzer: Yeah. And I think, you know, we just kind of assume this, like the, so this AI answers is a great example,   of like just create value. Now the answer like this transcript will be on the internet. It’ll be sucked into the training data of all the models. And maybe the answer to these questions just shows up in ChatGPT with no citation.

[00:27:40] Like there’s a very good chance something like that happens. Um. I think we’re just playing the long game of like, okay, but what’s the alternative? I, we don’t put our transcripts online and we don’t solve for the customer, or like for the end user who just wants the knowledge. So we’re just making a bet on like, listen, let’s just create as much value as humanly possible.[00:28:00] 

[00:28:00] As a result of that, we build an audience of people who come to trust us and seek our knowledge out, whether it’s through their podcast choice or their YouTube channel choice, or the searches they consume, whatever. and like, it’ll just work out. Like I I, it’s weird because I’m so much a metrics driven person.

[00:28:17] Sometimes it’s hard to take that leap of like, I don’t know the actual metrics that’s gonna prove this is working, but sometimes you just have to use your instinct of like, the alternative seems like the less than ideal choice of just shut our content off from these engines.   and so we’re just gonna kind of take a leap of faith and do what we think is right.

[00:28:38] And I, I’m always a believer that like in the end, if you just solve for the audience. Everything works out. And so if we just stay focused on, Hey, we’ve got what, whatever the podcast gets now, 110,000 downloads a month, whatever it is, it was 45,004 months ago. Like it’s, it seems to be working like it seems to be helping people.

[00:28:57] The audience keeps growing and as long as [00:29:00] you do that and then you get the qualitative feedback from listeners about how it’s helping them and how it helped them with a CR career transition or help them reimagine their company. Like you just feel like it’s the right path, even if the numbers don’t always add up and tell you it is.

[00:29:13] and I think that’s where you have to make these like judgment choices that the AI’s not gonna make for you. AI don’t have human judgment and they don’t have human experience,   that that’s been gained over years. And sometimes you just have to trust that human side of it. 

[00:29:27] Question #8: How do you see different search engines being used or leveraged by AI companies?

[00:29:27] Cathy McPhillips: Right. Which leads us to question eight.

[00:29:30]   how do you see different search engines being used or leveraged by these AI companies? 

[00:29:36] Paul Roetzer:   yeah. I mean, this is such a unknown space. I mean, we’re watching a, a real time.   innovator’s dilemma right now with Google where other people are coming in and changing the search engine market and, you know, chat, CBT still dominates in terms of overall search to these,   chatbots and it changed the way people [00:30:00] seek information.

[00:30:00] And so the search engine company had had to evolve and in some ways they seem to be willing to cannibalize their original business, which is what, a year back, I don’t think most people thought they had the will,   to do. And they do seem to be willing to do it now. And so I think search overall is just going to evolve as consumer behavior and how we seek information changes.

[00:30:26] And I don’t know that anybody really has a great view into how that looks, two, three years out because there’s just too many big time variables. Like how much will voice play into all of this? Know, search historically has been, we type in something and we get a results page. Now it’s evolved to, we type in a prompt and we get a response from a chat bot or an AI assistant.

[00:30:51] Well, if Sury actually becomes intelligent, and if chat CPT voice gets integrated, and if meta has their way and [00:31:00] we start interacting with our glasses and you know, maybe Apple comes to the market with like AirPods that people actually just talk to the, maybe voice becomes the way we search and then all bets are off.

[00:31:12] So I think there’s too many people who see voice as the possible next major interface to be able to accurately predict what happens to search engines. Because whatever we think a search engine is today looks nothing like that. If voice becomes a dominant interface for, even if it’s just like Gen Z, like even if it’s just the next generation that uses voice all the time, then you’ll see this slow progression.

[00:31:37] So maybe there is like. I don’t even know what generation we are. Whatev whatever, gen X, whatever,   

[00:31:43] Cathy McPhillips: that you and I are. 

[00:31:44] Paul Roetzer: Yeah. What are we? We’re 

[00:31:45] Cathy McPhillips: Gen X. 

[00:31:46] Paul Roetzer: Okay. Yeah. 

[00:31:49] Cathy McPhillips: I’m very proud of that. 

[00:31:50] Paul Roetzer: Yeah. So like, maybe we don’t change, maybe we still like our search engine and maybe we type it in and maybe like, we’re always gonna kind of be more comfortable doing that.

[00:31:58]   but, but maybe [00:32:00] voice just gets really good and maybe we do change. So I don’t know. And I think it’s something, again, if you’re in a position within your organization where search matters, it’s a space you should be watching very closely because we’re learning new things each month as it goes by and we see new data points where now you’re starting to actually be able to watch the trend line of organic traffic, like plummeting to a lot of major sites.

[00:32:24] Question #9: How do you choose the right AI model for marketing, HR, and sales tasks? 

[00:32:24] Cathy McPhillips: Absolutely. Okay. Number nine.   we often focus on outcomes and use cases when selecting tools, but should we consider other things like transparency and governance integration? We’ve talked about how, you know, sometimes it’s best to pick a tool that. Aligns with your tech stack, but should we look at transparency, governance, environment?

[00:32:43] Should we think that big yet? 

[00:32:45] Paul Roetzer:   yeah. I mean, I think you should always be having those conversations.   you know, I think you, this is where the general AI policies come into play so much where, you know, you’re thinking about how your organization uses ai, you’re thinking about,   [00:33:00] kind of the user stories behind it.

[00:33:01] Okay, what’s HR gonna do? What’s marketing gonna do, what’s sales gonna do? how much do we need to put guardrails in place? And I’m kind of a believer in not getting too,   into the weeds on this. Like, you can’t con, you can’t govern every behavior. you want to govern the overall responsible usage of this technology and you wanna be clear on how to do it safely.

[00:33:25] Like, so for example, I just,   built the generative AI policies course for AI Academy and within that, it was the first time where I conceived of. AI agent guidance specifically related to computer use. And what that means is you can now through Anthropic, through Google and through openAI’s, enable these AI agents that can kind of take over your screen.

[00:33:50] You can also do it through Microsoft,   and they can perform things on your screen, like filling out forms, clicking on things they can actually go and interact, potentially even make [00:34:00] purchases on your behalf. I am a, a huge believer that should be outlawed within companies. Like your employees should not have the independent choice to turn on a computer use agent because there is so little known about the risks of those things.

[00:34:15] And so that has to be considered within your policies. And at this moment, like, I don’t know of people who have done that, like, because most business leaders aren’t even aware, computer use is a thing. So I think that, again, you have to know your, your employee base. You have to know the risks you have within that organization.

[00:34:34] But this is where legal and it really need to be deeply involved across different departments of the organization to make sure that we’re giving people the freedom to experiment with AI and to drive efficiency, productivity, performance with it, but also protecting them from themselves to make sure we’re not misusing the technology in a way that creates greater risk than we need to.

[00:34:56] Question #10: What role do you see AI playing in building and managing communities?

[00:34:56] Cathy McPhillips: Absolutely. Okay. Number 10, [00:35:00] what role do you see AI playing in building and managing communities? Is it more about efficiency, like automation and moderation, or about enhancing human connection? 

[00:35:09] Paul Roetzer: Yeah, I don’t know, I’m, you’re way more involved in our communities than I am, Cathy, so maybe you have something else to say here.

[00:35:14] But I think like, the way I think overall about automation is automate the things that are low impact, low human, where it’s just like, people just want the information. They, they, they don’t, not trying to like make a human connection to free your people up to spend more time on the human connection side.

[00:35:31] So, yeah, I mean, I think like if it’s,   automating, like I don’t, I don’t know, just random example.   let’s say if we took our podcast transcript from every Tuesday and we had an AI do a summarization of that, that it does in 25 seconds, that would take Claire two hours. Otherwise, nobody in our community cares if the summary of the transcript was written by AI or Claire.

[00:35:57] They, they just want the 10 bullet points of what are we [00:36:00] talking about this week. Now, if they had questions about why was Paul saying this and like, what does he mean by that thing he said. They’re gonna want me or Claire or you to come in and say, listen, I think here’s the intent of what he’s trying to say.

[00:36:13] They don’t want ChatGPT then interpreting. So I think that’s where you have to kind of like draw these lines of what is auto like automatable? What are the things we should automate? And then where are the things where the human should be there? And then how do we use the automation to free the humans up to do the more human stuff more often?

[00:36:31] I is that again, you’re, you’re in the time. I don’t know if it very explained. 

[00:36:35] Cathy McPhillips: I agree. You know, I tell this story, I told it about for like four years is that the first time I ever used AI working with the institute, I was writing  MAICON 2021 copy. And I was just like, what is this magic? Mm-hmm. And it saved me.

[00:36:49] So, I mean, and it was fine. I had to go through and edit a lot of stuff, obviously, but then I was like, that just saved me like half a day. So then it was like, I’m calling people, I’m emailing people one-on-one. Yeah. And that was such a better use [00:37:00] of my time. So that’s, you know, obviously that’s what we’re all doing right now with efficiency gains.

[00:37:04] But like right now, if Macy. Came to me and said, oh, I hand wrote, you know, I typed out all of the social and I got it all posted and it took me this long. I’m like, why didn’t you use AI to do that? So you could be in our community, engaging with our customers, listening to them, hearing what they need, getting to know them.

[00:37:21] That’s so much more valuable to our business and to us. And that’s just about, that would bring me so much more joy than writing social posts. 

[00:37:29] Paul Roetzer: Yeah. I think like in the responsibility of principles that you mentioned earlier, which we’ll put a link in the show notes. There was a line I wrote that said,   automation without dehumanization I think is what it said.

[00:37:41] And so this whole idea of like, yeah, we’re not trying to automate everything out. We’re not trying to automate relationships and human connection. We’re actually trying to enrich those things by automating the stuff that we should be automating that’s just data driven, repetitive, like no real human value to the output, other than they just want the information.

[00:37:57] And that was the whole premise of my AI for Writers Summit [00:38:00] keynote this year is like, when do we use ai? Like, when, when is it the human that should be in it? And even if we can use AI to automate the whole thing, when should, um Right. And I think that’s a, you know, it’s, it’s kind of a subjective thing.

[00:38:14] Like we all kind of make those choices, but hopefully your community managers,   make those choices. But again, even beyond community, like customer relationships, doing customer service, like when is a chat bot? Okay. and when does the human need to step in? Right. We have to make these choices. 

[00:38:29] Cathy McPhillips: And back to the, you know, writing social posts.

[00:38:31] Question #11:  From an information architecture perspective, what frameworks should teams use when integrating AI into CRM or workflow automation to keep systems scalable and secure?

[00:38:31] Cathy McPhillips: These are posts to distribute content, not to respond to somebody like that needs information. Yeah. So, yeah. Okay. Number 11, from an information architecture perspective, what frameworks should teams use when integrating AI into a CRM or workflow automation to keep the systems scalable and secure? So I think it goes back to that whole it and legal side of things.

[00:38:52] Paul Roetzer: Yeah. and you know, I think anytime you’re looking at workflow automation, the first thing you have to do is just define the workflow. Like, I think [00:39:00] so many times the greatest gains early going in adoption of AI is just take your 5, 10, 20 top workflows. Say, okay, here’s the 10 steps of this one, 15 steps of that.

[00:39:12] One, where can AI fit into these steps? Which ones do we want the humans to, you know, remain either in the loop or fully doing? And then from there, you really start to, you know, address these bigger questions around security. So maybe you look at something, I don’t know, just stay on the podcast example.

[00:39:29] Say there’s 50 steps in our workflow to do the podcast. Every week you go through and say, okay, 20 of these, we can use AI on two of these. We probably don’t want to though, because some data’s gonna go into the system that we don’t wanna put into the chat bot, whatever. And so you can then go through and kind of do it.

[00:39:45] So it starts with, you know, identification of the workflows. Then it starts with a breakdown, that workflow into the tasks that go into it. Then which ones can AI actually help us with? Then do we want AI to actually help us with this? Is it safe to use AI in this process?   and so [00:40:00] again, I’m, you know, I’m using the podcast, but you can expand this out to say like, what’s the workflow to do,   the customer,   analytics report each week.

[00:40:08] And so maybe there’s a step in that process where like, okay, well we can’t put this information into chat GBT, so even though it would help, let’s not do that yet, until we have an internal like private chat bot through an API or we don’t have any concern about data going back to openAI’s or somebody like that.

[00:40:24] So again, like depending on your level of sophistication, you may need to be working with other people within your organization cross-departmental to make those final decisions. Like, Hey, I’ve identified 20 ways I can make, make my efficiency improve. Here’s three. I’m a little uncertain about though, about whether we should do this, whether it’s a little gray area in our general AI policies.

[00:40:44]   can you know it team, can you please look at this and assess it, or, you know, the risk department, whatever. It’s depending on your industry. 

[00:40:51] Question #12: What are the most common mistakes companies make when trying to ‘force-fit’ AI into a workflow?

[00:40:51] Cathy McPhillips: Yeah. Which is a good flow into number 12. What are some common mistakes companies make when trying to force fit AI into a workflow? [00:41:00] 

[00:41:00] Paul Roetzer: AI is not always the answer.

[00:41:01] Like so often. I think that again, it’s,   I think it’s just a level of like, competency in what AI is capable of and when we should use it. And I think when people are very early in their comprehension of it and how to use it. Like again, AI agents might be a great example here. If you just hear that term and you think, oh, I’ll just make everything agentic, like, everything’s just gonna be like automated through agents.

[00:41:24] you probably don’t have like an advanced enough understanding of what agents are, where they are in their current capability. So like, again, for. AI Academy. I just built an AI agent’s 1 0 1 course. So all this is like kind of top of mind to me. you have to kind of understand the capabilities of AI and then that subjective part about when do we want the human, when do we want the AI to do things?

[00:41:48] And so everything, AI isn’t the answer to every problem or every need to increase efficiency or productivity. And so I think going in with that mindset that it’s great to assess workflows, it’s great to look [00:42:00] at problems differently, but AI isn’t always the answer. Sometimes more human is the answer.

[00:42:05] Sometimes simple automation that has nothing to do with ai. It’s just literally rules based like, Hey, we’re gonna set up this workflow with a make or Zapier or whatever. and it, it’s no AI at all. It’s just literally workflow automation. And so again, it comes down to understanding what the technology is capable of, and then you go from there.

[00:42:23] Question #13:  Which AI tooling is best suited to develop and monitor a marketing communications strategy at SME vs. enterprise scale? 

[00:42:23] Cathy McPhillips: Okay. Number 13, do you see adoption patterns differ between small businesses and large businesses, enterprises? 

[00:42:30] Paul Roetzer: This one’s probably real similar again, to any traditional technology or software decisions. I mean, certainly smaller companies can move quicker. They can decide, you know, in an afternoon the CEO of like 20 person, 50 person, a hundred person company.

[00:42:43] It’s like, all right, we’re, we’re getting ChatGPT team for everybody. We’re gonna roll it out. We’re gonna do a quick training session next Monday. And then I expect everyone to be like, using it by next week. Like, things happen fast. We see this with our AI academy. Like, you’ll get on a call and they’ll be like, all right, we, we want 25 licenses tomorrow for like our, our team.

[00:42:59] Like, [00:43:00] let’s go. There’s no procurement process, there’s no anything. I could just, you just move, you make decisions and you go. And then larger companies, obviously, you know, sometimes there’s bigger procurement side to this.   there’s more,   bureaucracy, there’s more,   guidelines. There’s a sometimes,   a less of a tolerance for risk.

[00:43:20] And so obviously things just move slower. Like we’ve advised some really big companies. Where say like a marketing team just wants to function like a small unit and doesn’t want to have to wait for the bureaucracy to figure everything else out. And so sometimes what happens in large companies is the IT department, the CIO, whomever they’re working with, say a Microsoft to do a massive installation, we’re talking about five, 10,000 licenses.

[00:43:47] And meanwhile, the marketing team’s like we, we just want 10 licenses to ChatGPT team so our team can build some GPT and send our emails faster like we newsletters or whatever. And so we’ve worked with those kinds of [00:44:00] organizations where we’ll just like, all right, fine. Like let’s just go do that. And sometimes you get permission, sometimes you don’t.

[00:44:06] Depending on your organization, you have to make those calls yourself. But you just go like, you default to like, we can’t wait six months for them to figure this out to maybe we get some copilot licenses in the marketing team. We just gotta go now. And so I think sometimes within large companies, you need individual business units with some autonomy to function,   in a, in a more nimble way that doesn’t put anything at risk like that.

[00:44:30] You know, make sure like the use cases are still safe within the gen ve policies, things like that. But yeah, that’s the biggest thing is the speed. I guess, is,   you know, small companies just move faster and they can take more risks. It’s how it’s always been though. Mm-hmm. This isn’t new to ai. 

[00:44:48] Cathy McPhillips: Yeah. My husband and I have that conversation a lot ’cause he’s an enterprise and he’s like, it’s just done.

[00:44:53] Like, yeah, we just did it. 

[00:44:55] Paul Roetzer: I mean, yeah. Taken months the way we function, it’s like, all right, we’re gonna launch an AI [00:45:00] academy and in like three months and it’s gonna have. 40 new courses and say, and you have, people are like, you’re gonna do what? Like, we would take three months to even decide the first course was gonna be, 

[00:45:10] Cathy McPhillips: right.

[00:45:11] Question #14: Do you think AI fluency will become a baseline requirement for executives, or is it creating an entirely new kind of leadership role?

[00:45:11] Cathy McPhillips:  Yeah. Okay. Number 14, do you think AI fluency will become a baseline requirement for executives? Or is it creating an entirely new kind of leadership role? 

[00:45:20] Paul Roetzer: Yeah, I mean, obviously we are,   very big believers that AI literacy is, is maybe the most important skill moving forward at all levels.   I think it’s gonna be very difficult to continue to maintain the authority and trust you have with your employee base as a leader if you don’t understand ai.

[00:45:43] So like, if you’re a CEO, A CMO,   head of hr, like whatever it is, your employees are going to be figuring this stuff out. And if they’re the ones always trying to explain to you or to get buy-in from you to do [00:46:00] something. They’re gonna get really frustrated because once you understand this stuff, it’s so obvious that it has tremendous benefits to the company, to the efficiency, the productivity, the performance, the creativity, the innovation, the decision making, problem solving.

[00:46:17] And so it’s very hard to run companies where the executive team is unaware of all of the ways they could be making the company smarter and better with ai. So, yes, I am, I do believe deeply that AI literacy, I fluency at the executive level is an imperative. And I think that’s gonna become very painfully obvious in the next six to 12 months at all levels.

[00:46:43] Like I think we’re getting there now with public companies because these executives are, you know, being asked about it on earnings calls every three months.   but I think we’re getting to the stage where, you know, it truly is required. 

[00:46:55] Question #15: What should creatives in fields like graphic design or UX/UI be thinking about as AI continues to evolve? 

[00:46:55] Cathy McPhillips: Absolutely. Okay. Number 15. What should creatives in [00:47:00] fields like graphic design or UX and UI be thinking about as AI continues to evolve and what have you seen creative professionals do successfully to stay ahead?

[00:47:09] Paul Roetzer: Yeah, this is interesting. So I was actually this morning listening to a Lex Fridman podcast with Sundar Pichai, the CEO of Alphabet and Google, and they were talking about the impact of VO three, their video generation model. And Sundar was, you know, bringing up the point of like, you know, you know, if we go back and we think about the disruption of media, you know, you go back 10 years, the idea that you could have podcasters like Lex Fridman who have these massive audiences, like that’s very disruptive to media companies.

[00:47:37] Like media companies we’re the gatekeepers. They’re the ones that that in for information out into the world. And that was it. And now we have tens of thousands, probably hundreds of thousands of podcasts. So we, like, we en empowered all these people that was through a distribution channel that wasn’t through ai, but we empowered all these other people to become gay peacekeepers themselves, to become media,   [00:48:00] channels.

[00:48:00] And I think the people who are really good at podcasting or meet each other, they’ll rise to the top just because, like everyone can create podcasts doesn’t mean everyone gets to build an audience. And so I think creativity as a whole is gonna follow a similar path. You’re, yes, like I can go in and create an eight second video now.

[00:48:18] I have zero ability to do video production, but I can do that now. But someone who does video for a living can do things I can’t even dream of. With VO three, like Claire on our team, Claire Claire’s way, well beyond any of our abilities with video creation. And so what Claire can do with VO three versus what you or I could do, Cathy, it’s like, it, it’s like magic.

[00:48:41] So I think that’s what’s gonna happen at all levels, whether it’s graphic design, video production, even with writing research, like all of these,   fields where we have to output something, where there’s creative elements to it. The people who are already good to great are just gonna 10 X up. They’re gonna have just [00:49:00] tremendous superpowers to improve their outputs and to improve the volume of outputs if they choose to.

[00:49:06] And then it’s gonna democratize it for everybody else who all of a sudden can now create stuff.   and so I think it’s gonna be a noisier space, but it’s gonna be a bigger pie of creativity. And yeah, I don’t know. I, I, that’s kind of how I think about it, is just like the people who sort of embrace this and figure it out, they’re still gonna be creative.

[00:49:25] Like they’re still gonna be designers and video professionals and writers,   but they’re just gonna have these kind of underlying superpowers. And you know, I think that’s exciting. But I can also see how, if you don’t wanna embrace it. It can be a bit daunting and it can feel like the thing that defines you maybe isn’t as special anymore.

[00:49:42] And I don’t think that’s true. I mean, my, my wife is an artist. My daughter’s an artist at 13. Like, I don’t think that at all, like they’re way more talented and if they choose to use AI and what they do, it’s just gonna level up what they’re capable of. 

[00:49:56] Cathy McPhillips: Right. Yeah. One of my really good friends is a, is in graphic [00:50:00] design, and for a long time he’s like, absolutely not.

[00:50:02] Absolutely not. And then recently he’s like, Hey, it’s doing all these things that I don’t wanna be doing so I can spend, you know, really be more creative. Or I’m using it for ideation with my team who isn’t creative, help us be able to communicate better with each other. So there are so many ways that he’s been using it that aren’t taking away anything from him.

[00:50:21] Paul Roetzer: Yeah. And I think that comes back to that awareness and understanding of if, if you, if you haven’t embraced AI yet and you just think it’s a replacement to you, like if, again, whether you’re a writer, graphic design, whatever. If you just look at it as that thing that’s gonna replace what you do and so you don’t want anything to do with it versus, well, maybe there’s like 50% of my job that I actually don’t enjoy.

[00:50:42] What if I just use it for that and I can actually do more with the other 50% now? And I think once people take the time, whether it’s coming to like our intro class or just have that first experience like, oh wait a second, this is amazing. Like I hate writing the report on Sunday [00:51:00] nights that my CEO wants and I don’t have to do that part anymore and I can be Sunday night back with my family and I can actually like do something else.

[00:51:07] I think once you find those use cases that make you realize you get to still be you and the thing that made you special still, you’re still special, like you still have those abilities, then I think you sort of change your perspective on ai. When you get, when you realize you still get a choice, you doesn’t have to replace you.

[00:51:25] You get to choose how you use it. 

[00:51:27] Cathy McPhillips: One of the first conversations I had with Jeremy on our team who started a few months ago was he was showing me this tool that could version out ads and do it well. And I was like, excuse me. What? Because right now that’s been me and Canva. 

[00:51:41] Paul Roetzer: Yeah. 

[00:51:41] Cathy McPhillips: And it takes forever.

[00:51:43] Paul Roetzer: Yeah. So, and there’s no fulfillment from there’s, you don’t get fulfillment in your job from that. It’s a task you have to do as part of your job 

[00:51:51] Cathy McPhillips: just being bitter about versioning out ads. 

[00:51:54] Paul Roetzer: Yeah. And honestly, like that’s a, that’s an interesting filter. Cathy is like, you know, we talk about with jobs GPT, you can go in and like, [00:52:00] here’s all the ways AI can help.

[00:52:01]   which a custom GPTI built that’s available with people, we’ll put it in the show notes, but one way you can think about it is like, if you just took a spreadsheet and went and wrote down like, okay, here’s the 25 things I do in my job. And then you made a column that says,   fulfillment. And it’s just a yes or no.

[00:52:15] Like, do I get fulfillment from doing this thing? Do I enjoy this part of my job? Take the things where you say no, and those are the first things you should automate. Like the things that give you fulfillment, bring more time up to do those things. 

[00:52:29] Question #16: How do you see coding and technical skills as careers in a world where today’s kids will grow up with AI?

[00:52:29] Cathy McPhillips: Yeah. Okay. Number 16, how do you see coding and technical skills as careers in a world where today’s kids will grow up with AI and if needed, what other skills should be developed in tandem?

[00:52:42] Paul Roetzer: I think I’ve talked about this one on the podcast where I’m, so my son is 12. He has taken a keen interest in coding, game design, robotics.   I’m all for it, like watching them play Minecraft, watching the things he builds when he goes to these coding camps, [00:53:00] you can just see it. It is teaching problem solving.

[00:53:04] It’s teaching, working through hard things, doing repetitive tasks that like require two, three hours of focus that is transferable. Like whatever coding looks like when he gets outta college in nine years or whatever. Anything he learns, these skills and behaviors will be applicable. And so like, would I, would I pay a hundred thousand dollars a year for a college right now for someone to go get a computer science degree if my son was a senior in high school?

[00:53:37] Like, that’s a conversation we would probably have to have of like, I don’t know that it’s necessary to do that. Like you could take these classes at, at Ohio University, like, and not spend a hundred thousand like great college, liberal arts college, do the computer science there. Like I would have a hard time with that.

[00:53:55] I would think more deeply about the true value of a computer science degree [00:54:00] versus getting that knowledge from anywhere and those skills from anywhere.   so I think the prestigious universities may struggle in the, in the coming years to justify the cost of a computer science degree. Not the degree, not like the degree itself isn’t valuable, it’s just is it as valuable as it would be at a major university?

[00:54:20] That’s something they’re gonna have to face. I think that’s probably already happening. I just saw a stat yesterday that computer science majors are, are having a, a very difficult time getting jobs right now. So I think we’re in this challenging job environment where there’s questions, but the technical skills, the behaviors, the traits developed are valuable and I think we have to figure out economically what that means to getting, you know, degrees in it and things like that.

[00:54:46] But I I am not at all discouraging my son from pursuing that path right now. I think it’s a very viable path. And if I was schools, I would, I would be leaning into training these skills and traits regardless of what the. [00:55:00] Job market may look like,   for computer science degrees at the moment, 

[00:55:03] Cathy McPhillips: but I think it’s also as important to be teaching them communication skills and relationship skills and all of that because you, we all need that, especially sometimes 

[00:55:13] Paul Roetzer: don’t go hand in hand.

[00:55:14] Like I do worry about that. It’s like problem solving, strategic planning, like you’re getting that, playing Minecraft and doing these things and building these environments, but like, okay, now let’s step outta this and let’s go to the playground. Let’s like, it is a hard balance to give kids those, those skills as well.

[00:55:29] But you’re a hundred percent right. the communication skills are, are fundamental and I would make sure they’re getting that balance. 

[00:55:35] Question #17: What’s the best way to handle situations when AI gets things wrong, and how do you approach fact-checking? 

[00:55:35] Cathy McPhillips: Number 17, what’s the best way to handle situations where AI gets things wrong and how do you approach fact checking?   what processes in humans are needed and has your answer changed as AI has gotten better and has AI gotten better?

[00:55:51] Paul Roetzer: Yeah, I mean, it’s getting better. The hallucination rate, the air rate is, is going down as the models get smarter, but it’s still there to the point where you can’t rely on the AI output [00:56:00] on its own without human fact checking, especially if it’s a important piece of information you’re putting out. So I shared this example.

[00:56:07] We talked about the AI gaps on the podcast recently, and one of them was the verification gap. It’s, I can go into Google, I can run a deep research project in Gemini right now. It’ll gimme this 40 page output that looks incredible. It has all kinds of data, dozens of citations, and it’s like, man, this on the surface looks better than any human I’ve ever hired would, would output.

[00:56:27] And then you dig into it and you’re like, okay, but the whole thing comes down to this one data point, and where did it get that data point from? And then you go into the citations and you’re like, Ooh, boy, I would never cite that source. And where did that come from? And then you start digging into it, and then the dominoes start falling where you’re like, this looks amazing.

[00:56:44] It looks like a PhD student wrote this thing. It’s all based on flawed assumptions and data, and so I have to throw the whole thing out. And so I think that’s the problem we see now is like people who don’t understand that these things get stuff wrong all the [00:57:00] time.   entities like, you know, facts, names, places, data points, whatever, and they just assume they can just publish whatever it comes or share internally, whatever it says.

[00:57:11] Like you do that surface level scan, it’s like, oh, it was amazing. I just did the five hour job in five minutes and I’m gonna send it to my boss. And then the boss looks at it and it’s like, wait a second, like two lines in. And I know that nobody checked this thing. And I think that that is the danger right now in companies is there’s so little true understanding of how these things work and where the heirs can occur.

[00:57:32] And so you have lower level managers outputting things with ChatGPT and Gemini, passing on to their leaders. The leader who has maybe some more domain expertise or intuition. Questions things more thoroughly than maybe the middle management does. And that’s where we’re kind of have problems. And the same with like interns and entry-level employees.

[00:57:52] Like they can do things really fast, but sometimes fast is not good. And I always say like re like [00:58:00] the simplest litmus test I always give is,   I, I’ve done this since like the early days of my agency, I would just ask somebody like, is this the best you can do? Like, gimme this research report, great, gimme this strategy.

[00:58:10] Great. Like, is this the best you can do? And if the answer is like, like internally, you’re like, yeah, I didn’t actually like check the sources or maybe I didn’t do like a full edit,   whether AI helped you or not, the question is the same. Is this the best you got? Because if I, if I’m gonna take the two hours to read this and I find errors in it, we got a problem.

[00:58:28]   and I think too many people are hitting the easy button right now when it comes to like using AI for research and planning. And I think there’s gonna be,   there’s gonna be some repercussions for that within businesses. 

[00:58:39] Question #18:  If you had to narrow it down to just one ethical principle that matters most right now, which would it be—and why?

[00:58:39] Cathy McPhillips: I agree. Number 18, if you had to narrow it down to just one ethical principle that matters most right now, what would it be and why?

[00:58:48] Paul Roetzer: Ooh, wow. So I don’t know. I mean, for me,   we talk a lot about this, but like everything we do is [00:59:00] about putting humans at the center of this, like unlocking human potential, not replacing it. Like I, I’m just a big believer that it’s, it’s too easy to just look at what AI is capable and say, well, let’s just, let’s get fewer people and let’s just do things.

[00:59:15] Let’s save some money, let’s increase our margins. Like, and ethically, I don’t think that’s the right thing. Like I think the right thing, ethically and morally is to say, how do we create more fulfilling lives for people? How do we create more time for people in their personal lives, their business lives, so they get more fulfillment outta their jobs and.

[00:59:33] Their family lives and like that’s the most important thing. Like I, if I didn’t think that was possible, I wouldn’t be doing what we’re doing. It’s why I’m doing it myself. Like I think, and I don’t know if I’ve ever publicly told this story, so whatever, but   so like the SmarterX logo, the icon is a black hole.

[00:59:49] Like I, I, nobody probably knows that other than Cathy who works with me on the logo design. But the whole premise of a black hole, if you don’t, you know, know the concept is as you [01:00:00] approach a black hole time dilates, it slows down because of the gravitational force of the black hole. And so I have fascination with cosmology.

[01:00:08] I have a fascination with physics and all these things. And so we were building the logo. I wanted the logo to represent the slowing down of time because to me, the greatest value that AI can give humanity is to slow time down. It’s the one thing none of us can get back. And so if we are able to automate some things that we don’t get a ton of fulfillment out of, and if that gives us more time to do the fulfilling things, or to be with our families and friends.

[01:00:33] Like we’ve, we’ve made an impact. And like, that’s why SmarterX exists. That’s why I started pursuing AI 13 years ago, was like, I wanted to create more time. And so that’s, to me, like keeping that centered in what we do is very important. 

[01:00:48] Question #19: How should companies address internal concerns around data privacy, compliance, and governance?

[01:00:48] Cathy McPhillips: That’s such a nice answer. Okay. Number 19, how should companies address internal concerns around data privacy, compliance, and governance?

[01:00:56] And do you see regulatory momentum changing how companies handle this? [01:01:00] 

[01:01:00] Paul Roetzer: This is definitely gonna be in, in many ways tied to what industry you’re in. And again, AI or no ai, like you are governed by these same policies and laws and regulations. And so you have to just accept that and be aware of that. Now, it is a dynamic environment.

[01:01:19] The laws are evolving,   the regulations with diff different industries. The data privacy regulations, all of this is a constantly evolving thing, but again, regardless of ai, that is true. AI is just accelerating a lot of it and creating more questions and unknowns that need to get addressed. But this is why it’s so important to work closely with legal team, with your risk team,   to do things within the parameters that keep your data safe, keep your customer’s information safe, keep your, your, your employees,   safe from doing things they shouldn’t be doing.

[01:01:53] Question #20: Which AI applications do you expect to break through sooner than people think—and which ones are overhyped?

[01:01:53] Cathy McPhillips: Yeah. Okay. Last question. Number 20.   which AI applications do you expect to break through sooner than [01:02:00] people think and which ones are overhyped? 

[01:02:03] Paul Roetzer:   so I think AI agents are overhyped for sure. They’re just, they’re just misunderstood.   and that’s the fault of the technology companies themselves that presented them as these autonomous things that they’re not.

[01:02:14] Yeah. That being said, two, three years from now, they’re not overhyped like I I think that long-term AI agents will transform the future of work and business. I just feel like out of the gate they got a little bit o over their skis. In terms of autonomy, I think the thing that’s overlooked right now is reasoning models.

[01:02:32] I really, very confidently believe that most business leaders have no concept of how significant reasoning models are like to high level knowledge work, strategic planning, decision making, problem solving innovation.   the ability to go through these chains of thought to think more deeply about problems.

[01:02:52] They get smarter the longer they think, like, that’s just weird. and most people have never [01:03:00] even tried a reasoning model knowingly. They’ve never run a deep research project. And I think once you do, you, you’re, you can’t look at anything the same. Like you look at business differently. So I think over the next, you know, six months or so, more and more business leaders are going to knowingly or unknowingly start experiencing the power of reasoning models.

[01:03:22] And I think that will accelerate change within businesses even more than we’re already seeing. 

[01:03:28] Cathy McPhillips: Wonderful. Since we still have you, and since tomorrow is Friday, August 22nd, and prices for MAICON early bird are ending, do you wanna give like a 32nd or six second plug on MAICON and some of the new speaker announcements?

[01:03:41] We have, 

[01:03:42] Paul Roetzer:   I don’t know, are we making speaker announcements? 

[01:03:44] Cathy McPhillips: We are. We’ve got a couple of them. 

[01:03:47] Paul Roetzer: So, yeah. So MAICON is October 14th to the 16th in Cleveland. This is our sixth annual, Cathy. It’s, is that right? It’s okay.   so you can go to  MAICON.AI. You can see the [01:04:00] agenda,   the speaker lineup.

[01:04:02] We do have what it looks like, I dunno, six or seven new speakers that we’ve just added. Are they added to the site now? 

[01:04:08] Cathy McPhillips: They are, 

[01:04:08] Paul Roetzer: yeah. I’m learning things when we do these podcasts. I didn’t know who was actually added to the site. So we have an incredible lineup,   on the main stage. Incredible lineup of breakout talks.

[01:04:17] There’s four amazing workshops.   and yeah, I mean, go to the site, check it out, and you can see all the speakers. And I’m, I think the marketing team’s probably gonna be spending out announcements of, you know, some of the keynotes that we’re adding,   as we go. So yeah, it’s, it, it’s awesome. You can do pod100 promo code and if you get in by Friday the 22nd, you can take advantage of the,   earlier bird pricing.

[01:04:40] Cathy McPhillips: Yes, you can. All right. Thank you, Paul, as always. And we will see everyone next time. 

[01:04:45] Paul Roetzer: Thank you. And thanks to Google Cloud for, for sponsoring the AI Answer series. Thanks for listening to AI Answers to Keep Learning. Visit SmarterX.ai where you’ll find on-demand courses, upcoming classes, [01:05:00] and practical resources to guide your AI journey.

[01:05:03] And if you’ve got a question for a future episode, we’d love to hear it. That’s it for now. Continue exploring and keep asking great questions about ai.



Features, Pricing & Use Cases


Why It’s Important to Look at GPT-5

The release of GPT-5 on August 7, 2025, was a major step forward in the progress of large-language models. A lot of people want to know how this new model stacks up against older ones and other systems that compete with it as businesses and developers quickly start using it.

GPT-5 gives you more context, better reasoning, fewer hallucinations, and a safer experience for users. But is it really the best choice for everything?

This article goes into great detail comparing GPT-5 to other LLMs, looking at its pros and cons, price, safety, and how well it works for different uses. We also talk about how Clarifai’s platform can help businesses work together and combine different models to get the best results and save money.

 


What We’ll Talk About

  • A brief history of GPT models and the LLM market, which is very competitive
  • The most important new things about GPT-5: size, reasoning, safety, and price
  • A look at the pros and cons of GPT-4, Claude, Gemini, Grok, and open-source models
  • In the business world, use cases include coding, making content, research, help, and regulated fields
  • Pricing and deployment problems, like how to combine Clarifai and keep costs low
  • Moral and safety issues, like fewer hallucinations and safer completions
  • New things and trends that could have an impact on the LLM environment in the future

By the end, you’ll know exactly what GPT-5 does well, what its competitors do well, and how to choose the best model for you.


The Expansion of GPT Models and Their Market

Quick Progress from GPT-1 to GPT-5

OpenAI’s GPT family has changed a lot since the first model came out in 2018. As each new generation came out, the number of factors, context length, and reasoning skills grew, which made conversations flow better and make more sense.

  • GPT-3.5 allowed for chat-style interactions.
  • GPT-4 added multimodal input through GPT-4o and improved reasoning.
  • GPT-5 now has a single system that automatically sends questions to the right model version.

There are three types of GPT-5: main, mini, and nano. There are four levels of reasoning for each: low, medium, and high. The model is a mix of a quick model for easy tasks, a deeper reasoning model for harder ones, and a real-time router that picks between the two.

This model is much better than earlier ones because it can take in up to 272,000 tokens and give out up to 128,000 tokens. It can hold long conversations and summarize long documents.

The Broader LLM Landscape

The competition has also moved quickly:

  • Claude (Anthropic): Known for constitutional AI and clear safety rules.
  • Gemini (Google): Works well with the Google ecosystem and supports many modes.
  • Grok (xAI): Targets open-source users by offering low prices and high performance.
  • Open-source (Llama 3, Mistral): Free, local options for projects that need privacy.
  • Clarifai platform: Makes it easier to set up, manage, and monitor models across LLMs.

You need to know these players because not every model works for everyone. In the next few sections, we’ll compare GPT-5 to each one in terms of features, price, and safety.


What GPT-5 Is Capable Of and What It Can Do

Longer Context and Reasoning Modes

The 272k token input limit and the 128k output limit are two of GPT-5’s best new features. This bigger context window lets the model read whole books, complicated codebases, or long meeting transcripts without stopping.

  • It can take in text and pictures, but it can only send out text.
  • DALL-E and GPT-4o make audio and images.

There are four levels of reasoning in GPT-5: low, medium, and high. This lets you choose how much computing power you need and how deep your answers are.

A real-time router chooses between a fast, smart model and a deeper reasoning model based on how complicated the conversation is. This mixed method makes sure that simple prompts work well while keeping strong reasoning for more difficult tasks.

Safe Completions & Reduced Hallucinations

OpenAI’s system card says that there have been big improvements in reducing hallucinations and making it easier to follow directions.

In GPT-5, safe completions are a new way to train that puts the safety of outputs ahead of binary refusal. GPT-5 doesn’t just refuse to answer a sensitive question; it changes its answer to follow safety rules while still being helpful.

The system card also talks about how to cut down on sycophancy by training the model not to agree with users too much. Prompt injection and deception are still problems, but early red-team tests show that GPT-5 does better than many of its competitors and has a lower success rate for behavior attacks.

Pricing & Competitive Costing

The prices for GPT-5 are very reasonable:

  • $1.25 per million input tokens
  • $10 per million output tokens

The GPT-5 small and nano models give even bigger discounts:

  • $0.25/m input (mini)
  • $0.05/m input (nano)

If you use input tokens again within a short amount of time, you get a 90% discount. This is very important for chat apps because they keep giving the same information about the conversation over and over.

So, GPT-5 costs less than GPT-4o and a lot less than Claude Opus ($15/m input, $75/m output) or Gemini Pro ($2.5/m input, $15/m output).

Model Variants & Modality Support

You can use the same software on a lot of different devices because there are three versions of GPT-5: main, mini, and nano.

  • GPT-5 mini is a less expensive option that doesn’t require as much reasoning.
  • GPT-5 nano is made for light uses like mobile apps or IoT devices.

But all of the models have the same way of training and keeping people safe.

Important: GPT-5 doesn’t support audio or image output by default. In GPT-4o and DALL-E, these features are still there.

GPT 5 vs other models


GPT‑5 vs GPT‑4 & GPT‑4o

Architectural Differences

GPT-4o had better latency and could take input from more than one source, but it still used only one model architecture.

GPT-5, on the other hand, uses a hybrid system with a real-time router and multiple models.

The result is better use of resources: simple tasks use the quick model, and complex questions use the deep reasoning model. Compared to GPT-4, GPT-5’s ability to switch automatically is a big step forward in architecture.

Context and Memory

GPT-4 could handle up to 32,000 tokens (and 128,000 for GPT-4 Turbo), but GPT-5 can handle 272,000 tokens and send back up to 128,000 tokens.

  • You can now summarize long technical documents or audio transcripts that are many hours long without having to break them up.
  • People don’t have to split content into smaller pieces anymore, which makes it easier to understand and less mentally taxing.

Reasoning and Performance

Early testers say that GPT-5 does its job better and makes fewer mistakes.

  • It is great at writing code, fixing big codebases, and solving hard math problems.
  • GPT-5 can answer hard questions and keep long chains of thought going because it has more ways of thinking.
  • According to Folio3, GPT-5 is better than GPT-4 at tasks like summarizing documents and answering hard questions.

Hallucinations & Safety

The system card for GPT-5 says that a lot of progress has been made in reducing hallucinations.

  • The safe completions system doesn’t stop responses; it just moderates them so they stay helpful.
  • Post-training also makes people less likely to be sycophantic, which means the model is less likely to agree with wrong things that users say.
  • Simon Willison says he hasn’t seen hallucinations in his daily life, but he knows experienced users stay away from prompts likely to cause them.

Pricing & Availability

  • When it comes to input costs, GPT-5 is less expensive than GPT-4o.
  • ChatGPT Pro subscribers can only get the high reasoning version, GPT-5 Pro, for $200 a month.
  • By default, all ChatGPT users can use the standard model.
  • When you use token caching discounts for conversations, you can save even more.

GPT 5 vs other models


GPT‑5 vs Claude, Gemini, Grok & Open‑Source Models

Claude (Anthropic) vs. GPT-5

People know that Claude Opus 4.1 has good safety rules and is honest about them.

  • Its context window (200k tokens) and reasoning depth are about the same as GPT-5’s high mode.
  • Big price gap: Claude Opus costs $15 per million input tokens and $75 per million output tokens — about 12× GPT-5’s input price.
  • Claude’s Sonnet and Haiku are cheaper, but less capable.
  • Claude is praised for careful answers and constitutional AI, making it a good fit for regulated industries.
  • Some developers think Claude is better than GPT-5 at creative writing or certain logic puzzles.
  • But many choose GPT-5 as default for its deeper reasoning and lower cost.

Gemini (Google) vs. GPT-5

Gemini 2.5 is very good at multimodal tasks and integrates with Google’s products.

  • Context windows: over 200k tokens.
  • Tiers: Flash and Pro.
  • Pricing: $2.50 per million input, $15 per million output — slightly more than GPT-5.
  • Strengths: Real-time web browsing and Google Workspace integration.
  • Weakness: May not match GPT-5 in deeper reasoning or safe completions.
  • Gemini relies more on refusal for safety, while GPT-5 moderates responses.
  • Choice: Gemini for rich multimodal experiences, GPT-5 for cost savings and reasoning.

Grok (xAI) vs. GPT-5

Grok 3 and Grok 4 are open-weight models from xAI, focused on open-source and community.

  • Pricing: $3 per million input, $15 per million output.
  • Performs well in coding and math tasks.
  • Appeals to developers who value transparency and self-hosting.
  • Weakness: No safe completions and higher hallucination rate than GPT-5.
  • GPT-5’s router and deeper reasoning give more consistent results.

Llama 3 and Mistral (Open-Source) vs. GPT-5

Free, open-source models that can run locally.

  • Great for privacy-sensitive applications or when cost is top priority.
  • Limitations: Smaller context windows and weaker reasoning than GPT-5.
  • Developers must manage safety, infrastructure, and governance.
  • For enterprise-grade reliability and safety, GPT-5 or Claude are better.
  • Clarifai’s local runners can host Llama or Mistral for low-cost inference and combine them with GPT-5 for complex tasks.

https://clarifai.com/openai/chat-completion/models/gpt-5


Industry‑Specific Performance & Use‑Case Comparisons

Coding & Software Development

GPT-5 is great at writing code and finding bugs.

  • Folio3 says GPT-5 outperforms GPT-4 in code generation, summarization, and answering complex queries.
  • Expanded 272k token context window enables processing of entire repositories or large code files.
  • Early adopters report GPT-5’s deeper reasoning reduces iterations when debugging or designing algorithms.

Other models:

  • Claude Opus: Strong at reasoning but more expensive.
  • Claude: Good for creative coding exercises or brainstorming.
  • Gemini: Works well with Google Cloud, generates code in Google Colab.
  • Grok: Open-source enthusiasts like it for transparency and cost, but requires manual prompting and verification.

Content Creation & Marketing

GPT-5 produces coherent long-form articles with fewer hallucinations and safe completions.

  • Great for blog posts, white papers, or scripts — maintaining tone and structure across thousands of tokens.
  • Claude: Safe and nuanced, but slower and pricier.
  • Gemini: Best for multimodal content (text + images, videos, tables).
  • Grok & open-source: Handle basic blog content at low cost, but weaker at complex narratives.

Research and Analysis

Researchers need to synthesize long reports and keep context across sources.

  • GPT-5’s large context and reasoning allow deep summarization of research papers and technical docs.
  • Safe completions reduce risk of hallucinated citations.
  • Claude: Provides careful summaries, but smaller context.
  • Gemini: Strong for up-to-date research via web browsing.
  • Grok & open-source: Cost-effective for internal docs, but need manual checking.

Customer Service & Support

In support, safety and cost are paramount.

  • GPT-5’s safe completions ensure compliant answers while staying helpful.
  • Mini and nano variants enable cost-efficient deployment in chatbots or IVR systems.
  • Claude: High safety, but costly — suited for regulated sectors.
  • Gemini: Multimodal support (e.g., screenshots, forms).
  • Open-source + Clarifai: Good for FAQs, while GPT-5 handles complex cases.

Regulated & High‑Risk Domains

Industries like healthcare, finance, and law require accuracy, safety, and auditability.

  • GPT-5: Focus on safe completions and hallucination reduction.
  • Its system card shows filtering of personal information from training data.
  • Claude: Constitutional AI may give stricter responses.
  • Gemini: Strong red-team testing and compliance integration.
  • Grok & open-source: Need extra governance and fine-tuning.
  • Clarifai: Adds secure hosting and audit tools for managing risk.

GPT 5 vs other models


Pricing, Accessibility & Deployment

Pricing Comparison

Based on what Simon Willison wrote in his blog, the table below shows the average price of inputs and outputs per million tokens.

Model

Input $/M tokens

Output $/M tokens

Notes

GPT-5

1.25

10.00

90% off reused tokens

Mini GPT-5

0.25

2.00

Less reasoning, cheaper

Nano GPT-5

0.05

0.40

For lightweight jobs

Claude Opus 4.1

15.00

75.00

Most expensive but strong safety

Claude Sonnet 4

3.00

15.00

Mid-tier performance

Claude Haiku 3.5

0.80

4.00

Cost-effective but limited

Gemini Pro 2.5 (>200k)

2.50

15.00

Large context, multimodal

Gemini Pro 2.5 (<200k)

1.25

10.00

Similar cost to GPT-5

Grok 4

3.00

15.00

Open weight and competitive

Grok 3 Mini

0.30

0.50

Lower cost but fewer capabilities

Mistral / Llama 3

0

0

Free, but hosting costs apply

 

Subscription Models & Access

  • GPT-5: Available to all ChatGPT users, even the free tier.
  • GPT-5 Pro (high reasoning): Only for ChatGPT Pro subscribers at $200/month.
  • Claude Opus: Requires an Anthropic subscription; advanced reasoning often reserved for enterprise.
  • Gemini: Free and paid tiers within Google Workspace.
  • Grok models: Accessible via xAI’s platform or open-source release.
  • Open-source models: Free, but require infrastructure for hosting.

Safety, Ethics & Reliability

Safe Completions & Moderated Responses

  • Traditional LLMs often refuse risky prompts outright.
  • GPT-5’s safe completions provide a middle ground: the model answers while removing harmful or disallowed content.
  • This makes GPT-5 more usable in education and support contexts where users may ask sensitive questions.
  • Safe completions rely on output-centric safety training, not binary classification.

Reduced Hallucinations & Sycophancy

  • OpenAI highlights that GPT-5 significantly reduces hallucinations and improves instruction-following.
  • Sycophancy reduction: Post-training teaches the model not to agree excessively with users.
  • Hallucinations still occur, especially with factual prompts outside training data.
  • Users must stay vigilant and fact-check in high-stakes contexts.

Data Privacy & Training Sources

According to the system card:

  • GPT-5 was trained on public data, partner data, and user-generated content.
  • OpenAI uses advanced filtering to minimize personal data.
  • Enterprises must still ensure compliance with data protection laws, anonymizing sensitive inputs before sending to the API.

Prompt Injection & Vulnerabilities

  • Prompt injection remains a major risk in deployed LLM apps.
  • OpenAI acknowledges GPT-5 is not immune — red-team tests targeted system-level vulnerabilities.
  • Mitigations:
    • Input sanitization
    • Retrieval augmentation
    • Ongoing monitoring
  • Clarifai supports these controls with retrieval pipelines and audit logs.

Implementation Considerations & Clarifai Integration

Choosing the Right Model for the Job

When selecting an LLM, weigh:

  • Task complexity
  • Budget constraints
  • Latency needs
  • Safety requirements

Examples:

  • Simple chatbots: GPT-5 mini or nano (low cost, fast).
  • Complex research/analysis: GPT-5 thinking or Claude Opus (deeper reasoning).
  • Multimodal tasks: Gemini.
  • Privacy/budget focus: Open-source models.

Clarifai orchestration can dynamically route queries based on these factors.

Orchestrating Multi‑Model Workflows

Developers can build pipelines where a query triggers multiple models in sequence or parallel.

Example pipeline:

  1. Intent classification: GPT-5 nano sorts the query.
  2. Retrieval: Clarifai’s vector search fetches relevant docs.
  3. Generation: Depending on classification, route to GPT-5 thinking, Claude Opus, or Gemini.
  4. Post-processing: Safe completions evaluate output safety.

This ensures optimal cost + performance while maintaining safety.

  • Clarifai’s caching lowers token costs.
  • Local runners enable on-prem deployments for compliance.

Evaluation & Monitoring

  • Track accuracy, relevance, latency, cost.
  • Monitor hallucination rate + user feedback to fine-tune selection.
  • Use A/B testing to compare GPT-5 vs. competitors.
  • Clarifai dashboards provide visual analytics + alerts when metrics drift.
  • Regular audits + human oversight maintain compliance and trust.

Future Trends & Emerging Topics

Toward Unified & Agentic Models

  • GPT-5’s hybrid system points to a future where different model types merge into a single architecture that balances speed and depth.
  • Researchers are exploring agentic AI → models that not only generate text but also plan and execute tasks using external tools.
  • GPT-5’s deeper reasoning + real-time router create a foundation for these future AI agents.

Open‑Weight & Transparent Models

  • Llama 3, Llama 4, and Mistral 8B (open-source) show the community’s commitment to transparency and autonomy.
  • Future GPT models may:
    • Provide greater training transparency
    • Possibly release open weights
  • Regulations could enforce higher transparency standards for powerful AI systems.

Improved Safety & Alignment

  • Efforts for fewer hallucinations and safer completions will continue.
  • Possible future improvements:
    • RAG (retrieval-augmented generation) built directly into LLMs → models fetch real data instead of relying only on memory.
    • Better prompt injection defenses
    • Context-aware moderation systems

Multimodal Expansion

  • GPT-5 cannot yet generate sounds or images.
  • Future updates may merge GPT-5 with DALL-E or voice models, enabling seamless multimodal interaction (text, vision, sound).
  • Competitors like Gemini already push in this direction, so OpenAI is likely to follow.

Clarifai’s Role in the AI Ecosystem

As the LLM landscape diversifies, Clarifai’s role becomes critical in orchestrating, monitoring, and securing AI systems.

  • Supports multiple models: GPT-5, open-source LLMs, computer vision models.
  • Offers vector search, compute orchestration, and local runners.
  • Expected to expand with:
    • Deeper integration into agentic workflows
    • Enhanced retrieval-augmented pipelines

Frequently Asked Questions: GPT-5 vs. Other Models

What are the differences between the versions of GPT-5?

  • Three versions: main, mini, and nano.
  • Each has four reasoning levels.
  • Main: full capabilities.
  • Mini/Nano: trade depth of reasoning for lower cost + faster speed.

What is the difference between GPT-4’s and GPT-5’s context windows?

  • GPT-5: 272,000 input tokens, 128,000 output tokens.
  • GPT-4 Turbo: 128,000 max.
  • GPT-5 is far more capable for long documents.

Is GPT-5 safer than older versions?

  • Yes. GPT-5 reduces hallucinations and offers safe completions instead of refusals.
  • It also uses post-training to reduce sycophancy.

How much does GPT-5 cost compared to other models?

  • GPT-5: $1.25 input / $10 output per million tokens.
  • Claude Opus: $15 input / $75 output.
  • Gemini Pro: $2.50 input / $15 output.
  • Grok 4: $3 input / $15 output.
  • GPT-5 mini and nano are even cheaper.

Which model is best for writing code?

  • GPT-5 excels in coding and debugging.
  • Claude: more creative/narrative output.
  • Grok: handles technical tasks cheaply.
  • Choice depends on complexity + budget.

Do I need Clarifai to use GPT-5?

  • No, but Clarifai offers:
    • Multi-model orchestration
    • Token caching (saves costs)
    • Local/private model hosting
    • Document retrieval for grounded responses
  • Especially useful in enterprise settings requiring multiple models + strict safety.

What sets GPT-5 apart from GPT-5 Pro?

  • GPT-5 Pro (a.k.a. thinking-pro) uses the deeper reasoning model exclusively.
  • Only for ChatGPT Pro members → $200/month.
  • Ideal for intensive reasoning tasks.

In 2025, Choosing the Right Model

GPT-5 represents a major leap forward in LLMs:

  • Longer context
  • Deeper reasoning
  • Safer outputs
  • Competitive pricing

Its hybrid architecture + flexible reasoning levels make it versatile across workloads. Safe completions + sycophancy reduction improve trustworthiness.

Compared to GPT-4/4obig improvements in memory and reasoning.
Against competitors (Claude, Gemini, Grok) → GPT-5 balances performance + affordability, though rivals retain niche strengths.

Key decision factors:

  • Task complexity
  • Cost sensitivity
  • Safety requirements
  • Multimodal needs

For many enterprises, a multi-model strategy via Clarifai offers the best of all worlds:

  • GPT-5 → deep reasoning
  • Gemini → multimodal tasks
  • Claude → high-safety environments
  • Open-source models → cost-sensitive/private workloads

Flexibility + responsible deployment will be essential to harness AI’s full power in 2025 and beyond.



How to Create an AI-Powered Search Strategy with Wil Reynolds [MAICON 2025 Speaker Series]


MAICON brings together top visionaries and experts in the field of AI during a three-day conference packed with actionable sessions and networking events—all to position you as the change agent your organization (and career) needs. In this ongoing speaker series, we’re featuring these extraordinary leaders, with forward-looking predictions, actionable tips you can use today, and a preview of their MAICON 2025 sessions. Continue reading “How to Create an AI-Powered Search Strategy with Wil Reynolds [MAICON 2025 Speaker Series]”

Enterprise Architecture & Use Cases


Introduction: Why RAG Matters in the GPT-5 Era 

The emergence of large language models has changed the way organizations search, summarize, code, and communicate. Even the most advanced models have a limitation: they produce responses that rely entirely on their training data. Without up-to-the-minute insights or access to exclusive resources, they may generate inaccuracies, rely on old information, or overlook specific details unique to the field.

Retrieval-Augmented Generation (RAG) bridges this gap by combining a generative model with an information retrieval system. Rather than relying on assumptions, a RAG pipeline explores a knowledge base to find the most pertinent documents, incorporates them into the prompt, and then crafts a response that is rooted in those sources.

The expected improvements in GPT-5, such as a longer context window, enhanced reasoning, and integrated retrieval plug-ins, elevate this method, transforming RAG from a mere workaround into a thoughtful framework for enterprise AI.

In this article, we take a closer look at RAG, how GPT-5 enhances it, and why innovative businesses should consider investing in RAG solutions that are ready for enterprise use. We explore various architecture patterns, delve into industry-specific use cases, discuss trust and compliance strategies, focus on performance optimization, and examine emerging trends such as agentic and multimodal RAG. A detailed guide with easy-to-follow steps and helpful FAQs makes it simple for you to turn ideas into action.


Brief Overview

  • RAG explained: It’s a system where a retriever identifies relevant documents, and a generator (LLM) combines the user query with the retrieved context to deliver accurate answers.
  • The importance of this issue: Pure LLMs often face challenges when it comes to accessing outdated or proprietary information. RAG enhances their capabilities with real-time data to boost precision and minimize errors.
  • The arrival of GPT-5: With its improved memory, enhanced reasoning capabilities, and efficient retrieval APIs, it significantly boosts RAG performance, making it easier for businesses to implement in their operations.
  • Enterprise RAG: Our solutions enhance various areas such as customer support, legal analysis, finance, HR, IT, and healthcare, providing value by offering quicker responses and reducing risk.
  • Key challenges: We understand the issues you face — data governance, retrieval latency, and cost. Our team is here to share best practices to help you navigate these effectively.
  • Upcoming trends: The next wave will be shaped by agentic RAG, multimodal retrieval, and hybrid models, paving the way for the next evolution.

What Is RAG and How Does GPT-5 Transform the Landscape?

Retrieval-Augmented Generation is an innovative approach that brings together two key elements:

  • A retriever that explores a knowledge base or database to find the most relevant information.
  • A generator (GPT-5) that takes both the user’s question and the retrieved context to craft a clear and accurate response.

This innovative combination transforms a traditional model into a lively assistant that can tap into real-time information, exclusive documents, and specialized datasets.

The Overlooked Aspect of Conventional LLMs

While large language models such as GPT-4 have shown remarkable performance in various tasks, they still face a number of challenges:

  • Limited understanding – They are unable to retrieve information released after their training period.
  • No proprietary access – They don’t have access to internal company policies, product manuals, or private databases.
  • Hallucinations – They occasionally create false information due to an inability to confirm it.

These gaps undermine trust and hinder adoption in critical areas like finance, healthcare, and legal technology. Increasing the context window alone doesn’t address the issue: research indicates that models such as Llama 4 see an improvement in accuracy from 66% to 78% when integrated with a RAG system, underscoring the significance of retrieval even in lengthy contexts.

RAG with GPT 5How RAG Works

A typical RAG pipeline consists of three main steps:

  1. User Query – A user shares a question or prompt. Unlike a typical LLM that provides an answer right away, a RAG system takes a moment to explore beyond itself.
  2. Vector Search – We transform your query into a high-dimensional vector, allowing us to connect it with a vector database to find the documents that matter most to you. Embedding models like Clarifai’s text embeddings or OpenAI’s text-embedding-3-large transform text into vectors. Vector databases such as Pinecone and Weaviate make it easier to find similar items quickly and effectively.
  3. Augmented Generation – The context we’ve gathered and the original question come together in GPT-5, which crafts a thoughtful response. The model combines insights from various sources, delivering a response that is rooted in external knowledge.

GPT-5 Enhancements

GPT-5 is anticipated to feature a more extensive context window, enhanced reasoning abilities, and integrated retrieval plug-ins that simplify connections with vector databases and external APIs.

These improvements minimize the necessity to cut off context or split queries into several smaller ones, allowing RAG systems to:

  • Manage longer documents
  • Tackle more intricate tasks
  • Engage in deeper reasoning processes

The collaboration between GPT-5 and RAG leads to more precise answers, improved management of complex problems, and a more seamless experience for users.

RAG with GPT 5


RAG vs Fine-Tuning & Prompt Engineering

While fine-tuning and prompt engineering offer great benefits, they do come with certain limitations:

  • Fine-tuning: Adjusting the model takes time and effort, especially when new data comes in, making it a demanding process.
  • Prompt engineering: Can refine outputs, but it doesn’t provide access to new information.

RAG addresses both challenges by pulling in relevant data during inference; there’s no need for retraining since you simply update the data source instead of the model. Our responses are rooted in the current context, and the system adapts to your data seamlessly through intelligent chunking and indexing.


Building an Enterprise-Ready RAG Architecture

Essential Elements of a RAG Pipeline

  • Gathering knowledge – Bring together internal and external documents such as PDFs, wiki articles, support tickets, and research papers. Refine and enhance the data to guarantee its quality.
  • Transforming documents into vector embeddings – Use models such as Clarifai’s Text Embeddings or Mistral’s embed-large. Keep them organized in a vector database. Fine-tune chunk sizes and embedding model settings to balance efficiency and retrieval precision.
  • Retriever – When a question comes in, transform it into a vector and look through the index. Utilize approximate nearest neighbor algorithms to enhance speed. Combine semantic and keyword retrieval to enhance accuracy.
  • Generator (GPT-5) – Create a prompt that incorporates the user’s question, relevant context, and directives like “respond using the given information and reference your sources.” Utilize Clarifai’s compute orchestration to access GPT-5 through their API, ensuring effective load balancing and scalability. With Clarifai’s local runners, you can seamlessly run inference right within your own infrastructure, ensuring privacy and control.
  • Evaluation – After generating the output, format it properly, include citations, and assess results using metrics such as recall@k and ROUGE. Establish feedback loops to continuously enhance retrieval and generation.

Architectural Patterns

  • Simple RAG – Retriever gathers the top-k documents, GPT-5 crafts the response.
  • RAG with Memory – Adds session-level memory, recalling past queries and responses for improved continuity.
  • Branched RAG – Breaks queries into sub-queries, handled by different retrievers, then merged.
  • HyDe (Hypothetical Document Embedding) – Creates a synthetic document tailored to the query before retrieval.
  • Multi-hop RAG – Multi-stage retrieval for deep reasoning tasks.
  • RAG with Feedback Loops – Incorporates user/system feedback to improve accuracy over time.
  • Agentic RAG – Combines RAG with self-sufficient agents capable of planning and executing tasks.
  • Hybrid RAG Models – Blend structured and unstructured data sources (SQL tables, PDFs, APIs, etc.).


Deployment Challenges & Best Practices

Rolling out RAG at scale introduces new challenges:

  • Retrieval Latency – Enhance your vector DB, store frequent queries, precompute embeddings.
  • Indexing and Storage – Use domain-specific embedding models, remove irrelevant content, chunk documents smartly.
  • Keeping Data Fresh – Streamline ingestion and schedule regular re-indexing.
  • Modular Design – Separate retriever, generator, and orchestration logic for easier updates/debugging.

Platforms to consider: NVIDIA NeMo Retriever, AWS RAG solutions, LangChain, Clarifai.


Use Cases: How RAG + GPT-5 Transforms Business Workflows

Customer Support & Enterprise Search

RAG empowers support agents and chatbots to access relevant information from manuals, troubleshooting guides, and ticket histories, providing immediate, context-sensitive responses. When companies blend the conversational strengths of GPT-5 with retrieval, they can:

  • Respond faster
  • Provide reliable information
  • Boost customer satisfaction

Contract Analysis & Legal Q&A

Contracts can be complex and usually hold important responsibilities. RAG can:

  • Review clauses
  • Outline obligations
  • Offer insights based on the expertise of legal professionals

It doesn’t just depend on the LLM’s training data; it also taps into trusted legal databases and internal resources.

Financial Reporting & Market Intelligence

Analysts dedicate countless hours to reviewing earnings reports, regulatory filings, and news updates. RAG pipelines can pull in these documents and distill them into concise summaries, offering:

  • Fresh insights
  • Evaluations of potential risks

Human Resources and Onboarding Support Specialists

RAG chatbots can access information from employee handbooks, training manuals, and compliance documents, enabling them to provide accurate answers to queries. This:

  • Lightens the load for HR teams
  • Enhances the employee experience

IT Support & Product Documentation

RAG simplifies the search and summarization processes, offering:

  • Clear instructions
  • Useful log snippets

It can process developer documentation and API references to provide accurate answers or helpful code snippets.

Research & Development

RAG’s multi-hop architecture enables deeper insights by connecting sources together.

Example: In the pharmaceutical field, a RAG system can gather clinical trial results and provide a summary of side-effect profiles.

Healthcare & Life Sciences

In healthcare, accuracy is critical.

  • A doctor might turn to GPT-5 to ask about the latest treatment protocol for a rare disease.
  • The RAG system then pulls in recent studies and official guidelines, ensuring the response is based on the most up-to-date evidence.

RAG with GPT 5


Building a Foundation of Trust and Compliance

Ensuring the Integrity and Reliability of Data

The quality, organization, and ease of access to your knowledge base directly affects RAG performance. Experts stress that strong data governance — including curation, structuring, and accessibility — is crucial.

This includes:

  • Refining content: Eliminate outdated, contradictory, or low-quality data. Keep a single reliable source of truth.
  • Organizing: Add metadata, break documents into meaningful sections, label with categories.
  • Accessibility: Ensure retrieval systems can securely access data. Identify documents needing special permissions or encryption.

Vector-based RAG uses embedding models with vector databases, while graph-based RAG employs graph databases to capture connections between entities.

  • Vector-based: efficient similarity search.
  • Graph-based: more interpretability, but often requires more complex queries.

Privacy, Security & Compliance

RAG pipelines handle sensitive information. To comply with regulations like GDPR, HIPAA, and CCPA, organizations should:

  • Implement secure enclaves and access controls: Encrypt embeddings and documents, restrict access by user roles.
  • Remove personal identifiers: Use anonymization or pseudonyms before indexing.
  • Introduce audit logs: Track which documents are accessed and used in each response for compliance checks and user trust.
  • Include references: Always cite sources to ensure transparency and allow users to verify results.

Reducing Hallucinations

Even with retrieval, mismatches can occur. To reduce them:

  • Reliable knowledge base: Focus on trusted sources.
  • Monitor retrieval & generation: Use metrics like precision and recall to measure how retrieved content affects output quality.
  • User feedback: Gather and apply user insights to refine retrieval strategies.

By implementing these safeguards, RAG systems can remain legally, ethically, and operationally compliant, while still delivering reliable answers.

https://clarifai.com/openai/chat-completion/models/gpt-5


Performance Optimisation: Balancing Latency, Cost & Scale

Latency Reduction

To improve RAG response speeds:

  • Enhance your vector database by implementing approximate nearest neighbour (ANN) algorithms, simplifying vector dimensions, and choosing the best-fit index types (e.g., IVF or HNSW) for faster searches.
  • Precompute and store embeddings for FAQs and high-traffic queries. With Clarifai’s local runners, you can cache models near the application layer, reducing network latency.
  • Parallel retrieval: Use branched or multi-hop RAG to handle sub-queries simultaneously.

Managing Costs

Balance cost and accuracy by:

  • Chunking thoughtfully:
    • Small chunks → better memory retention, but more tokens (higher cost).
    • Large chunks → fewer tokens, but risk missing details.
  • Batch retrieval/inference requests to reduce overhead.
  • Hybrid approach: Use extended context windows for simple queries and retrieval-augmented generation for complex or critical ones.
  • Monitor token usage: Track per-1K token costs and adjust retrieval settings as needed.

Scaling Considerations

For scaling enterprise RAG:

  • Infrastructure: Use multi-GPU setups, auto-scaling, and distributed vector databases to handle high volumes.
    • Clarifai’s compute orchestration simplifies scaling across nodes.
  • Streamlined indexing: Automate knowledge base updates to stay fresh while reducing manual work.
  • Evaluation loops: Continuously assess retrieval and generation quality to spot drifts and adjust models or data sources accordingly.

RAG vs Long-Context LLMs

Some argue that long-context LLMs might replace RAG. Research shows otherwise:

  • Retrieval improves accuracy even with large-context models.
  • Long-context LLMs often face issues like “lost in the middle” when handling very large windows.
  • Cost factor: RAG is more efficient by narrowing focus only to relevant documents, whereas long-context LLMs must process the entire prompt, driving up computation costs.

Hybrid approach: Direct queries to the best option — long-context LLMs when feasible, RAG when precision and efficiency matter most. This way, organizations get the best of both worlds.

 


Future Trends: Agentic & Multimodal RAG

Agentic RAG

Agentic RAG combines retrieval with autonomous intelligent agents that can plan and act independently. These agents can:

  • Connect with tools (APIs, databases)
  • Handle complex questions
  • Perform multi-step tasks (e.g., scheduling meetings, updating records)

Example: An enterprise assistant could:

  1. Pull up company travel policies
  2. Find available flights
  3. Book a trip — all automatically

Thanks to GPT-5’s reasoning and memory, agentic RAG can execute complex workflows end-to-end.

Multi-Modal and Hybrid RAG

Future RAG systems will handle not just text but also images, videos, audio, and structured data.

  • Multi-modal embeddings capture relationships across content types, making it easy to find diagrams, charts, or code snippets.
  • Hybrid RAG models combine structured data (SQL, spreadsheets) with unstructured sources (PDFs, emails, documents) for well-rounded answers.

Clarifai’s multimodal pipeline enables indexing and searching across text, images, and audio, making multi-modal RAG practical and enterprise-ready.

Generative Retrieval & Self-Updating Knowledge Bases

Recent research highlights generative retrieval (HyDe), where the model creates hypothetical context to improve retrieval.

With continuous ingestion pipelines and automatic retraining, RAG systems can:

  • Keep knowledge bases fresh and updated
  • Require minimal manual intervention

GPT-5’s retrieval APIs and plugin ecosystem simplify connections to external sources, enabling near-instantaneous updates.


Ethical & Governance Evolutions

As RAG adoption grows, regulatory bodies will enforce rules on:

  • Transparency in retrieval
  • Proper citation of sources
  • Responsible data usage

Organizations must:

  • Build systems that meet today’s regulations
  • Anticipate future governance requirements
  • Enhance governance for agentic and multi-modal RAG to protect sensitive data and ensure fair outputs


Step-by-Step RAG + GPT-5 Implementation Guide

1. Establish Goals & Measure Success

  • Identify challenges (e.g., cut support ticket time in half, improve compliance review accuracy).
  • Define metrics: accuracy, speed, cost per query, user satisfaction.
  • Run baseline measurements with current systems.

2. Gather & Prepare Data

  • Gather internal wikis, manuals, research papers, chat logs, web pages.
  • Clean data: remove duplicates, fix errors, protect sensitive info.
  • Add metadata (source, date, tags).
  • Use Clarifai’s data prep tools or custom scripts.
  • For unstructured formats (PDFs, images) → use OCR to extract content.

3. Select an Embedding Model and Vector Database

  • Pick an embedding model (e.g., OpenAI, Mistral, Cohere, Clarifai) and test performance on sample data.
  • Choose a vector database (Pinecone, Weaviate, FAISS) based on features, pricing, ease of setup.
  • Break documents into chunks, store embeddings, adjust chunk sizes for retrieval accuracy.

4. Build the Retrieval Component

  • Convert queries into vectors → search the database.
  • Set top-k documents to retrieve (balance recall vs. cost).
  • Use a mix of dense + sparse search methods for best results.

5. Create the Prompt Template

Example prompt structure:

You’re a helpful companion with a wealth of information. Refer to the information provided below to address the user’s inquiry. Please reference the document sources using square brackets. If you can’t find the answer in the context, just say “I don’t know.”

User Inquiry:

Background:

Response:

This encourages GPT-5 to stick to retrieved context and cite sources.
Use Clarifai’s prompt management tools to version and optimize prompts.

6. Connect with GPT-5 through Clarifai’s API

  • Use Clarifai’s compute orchestration or local runner to send prompts securely.
  • Local runner: keeps data safe within your infrastructure.
  • Orchestration layer: auto-scales across servers.
  • Process responses → extract answers + sources → deliver via UI or API.

7. Evaluate & Monitor

  • Monitor metrics: accuracy, precision/recall, latency, cost.
  • Collect user feedback for corrections and improvements.
  • Refresh indexing and tune retrieval regularly.
  • Run A/B tests on RAG setups (e.g., simple vs. branched RAG).

8. Iterate & Expand

  • Start small with a focused domain.
  • Expand into new areas over time.
  • Experiment with HyDe, agentic RAG, multi-modal RAG.
  • Keep refining prompts and retrieval strategies based on feedback + metrics.

Frequently Asked Questions (FAQ)

Q: How do RAG and fine-tuning differ?

  • Fine-tuning → retrains on domain-specific data (high accuracy, but costly and rigid).
  • RAG → retrieves documents in real-time (no retraining needed, cheaper, always current).

Q: Could GPT-5’s large context window make RAG unnecessary?

  • No. Long-context models still degrade with large inputs.
  • RAG selectively pulls only relevant context, reducing cost and boosting precision.
  • Hybrid approaches combine both.

 

Q: Is a vector database necessary?

  • Yes. Vector search enables fast, accurate retrieval.
  • Without it → slower and less precise lookups.
  • Popular options: Pinecone, Weaviate, Clarifai’s vector search API.

Q: How can hallucinations be reduced?

  • Strong knowledge base
  • Clear instructions (cite sources, no assumptions)
  • Monitor retrieval + generation quality
  • Tune retrieval parameters and incorporate user feedback

Q: Can RAG work in regulated or sensitive industries?

  • Yes, with care.
  • Use strong governance (curation, access control, audit logs).
  • Deploy with local runners or secure enclaves.
  • Ensure compliance with GDPR, HIPAA.

Q: Can Clarifai connect with RAG?

  • Absolutely.
  • Clarifai offers:
    • Compute orchestration
    • Vector search
    • Embedding models
    • Local runners
  • Making it easy to build, deploy, and monitor RAG pipelines.

RAG with GPT 5

Final Thoughts

Retrieval-Augmented Generation (RAG) is no longer experimental — it is now a cornerstone of enterprise AI.

By combining GPT-5’s reasoning power with dynamic retrieval, organizations can:

  • Deliver precise, context-aware answers
  • Minimize hallucinations
  • Stay aligned with fast-moving information flows

From customer support to financial reviews, from legal compliance to healthcare, RAG provides a scalable, trustworthy, and cost-effective framework.

Building an effective pipeline requires:

  • Strong data governance
  • Careful architecture design
  • Focus on performance optimization
  • Strict compliance measures

Looking ahead:

  • Agentic RAG and multimodal RAG will further expand capabilities
  • Platforms like Clarifai simplify adoption and scaling

 By adopting RAG today, enterprises can future-proof workflows and fully unlock the potential of GPT-5.

 



Top GPT-5 Applications for Enterprises & Developers


Introduction: A New Era of Intelligent Work

Generative AI has become a common tool in boardrooms and back offices since OpenAI’s GPT-4 came out. But the fact that GPT-5 will come out in August 2025 is more than just a small update. This means that the architecture is now unified and can switch between quick responses to conversations and complex analytical thinking without needing help from people. GPT-5 is more like a PhD-level expert than a chat assistant because it has longer context windows, can take in more than one type of input, has permanent memory, and has a lot lower hallucination rates.

This article talks about how GPT-5 changes the way businesses work, what it can do for certain industries, the risks it poses, and how businesses can use Clarifai’s platform to bring together different types of AI solutions. We’ll also give you a plan for how to use your model over the next 90 days, show you how it compares to those of your competitors, and give you a look at what AI will be like in the future.


Quick Summary

  • Unified Model: The Unified Model takes parts of different models and puts them together into a two-mode system: quick Chat and deep Thinking. A router that works in real time looks at each prompt.
  • Larger Context and Multimodality: GPT-5 Pro can handle up to 272,000 tokens and can natively handle text, photos, audio, and soon video.
  • Fewer Hallucinations: The number of mistakes is about 45% lower than GPT-4o, and the number of hallucinations is about 4.8%.
  • Enterprise Use Cases: Businesses can use GPT-5’s reasoning and ability to remember context to do things like agentic coding, making marketing materials, predicting finances, analysing the law, optimising operations, and helping customers.
  • Clarifai Integration: Businesses can use Clarifai’s compute orchestration, model runner, and vector search features to connect GPT-5 with powerful computer vision models. This makes sure that AI pipelines can handle a lot of different kinds of data and are safe.
  • Risks & Ethics: Prompt injection, obfuscation attacks, and hallucinations that last for a long time all need strong protections.

Now that we have these main points in mind, let’s look more closely at GPT-5.


Understanding GPT-5’s Unified Architecture and Key Features

The End of the Model Zoo

ChatGPT used to make users choose between different models, such as GPT-4o for multimodality, o-series models for reasoning, and micro models for cost-effectiveness. GPT-5 makes this less confusing by giving you one system.

A smart router is at the centre of it all. It looks at the complexity and purpose of each prompt and then sends it down one of two paths:

  • GPT-5 Chat is a simple and quick way to talk to people and get things done.
  • GPT-5 Thinking is a way of reasoning that takes a lot of resources and is only used in very hard situations, like high-stakes research, advanced coding, or multi-step analysis.

Users don’t have to choose models by hand anymore. The system automatically sends the suggestion to the right place, which makes it easier to use and encourages people to use it regularly. Baytech Consulting says this architecture not only makes things easier for users, but it also fits with OpenAI’s strategy of product-led growth, which pushes people to move up to higher levels.


Dramatic Improvements Over GPT-4o & 4.5

Reasoning and Hallucination Reduction

It’s not an exaggeration to say that GPT-5 is a “PhD-level expert.” Baytech says that the model gets all of the AIME 2025 maths problems right and 89.4% of the PhD-level science problems right, which cut hallucinations down to about 4.8%. GPT-5 is great for high-stakes areas because it makes hallucinations go down even more when you focus hard. This progress is because of integrated chain-of-thought thinking, which helps the model break problems down into steps that make sense.

Longer Context and Multimodality

GPT-5 has a context window of 256,000 tokens in regular models and 272,000 tokens in GPT-5 Pro. This lets it look through whole papers or conversations with more than one thread. It is totally multimodal. It can handle text, pictures, sound, and soon video all at the same time. For example, a doctor can upload a scan and notes about the patient, and GPT-5 will show them the data for the first time.

Persistent Memory and Personalisation

Earlier versions of GPT lost track of things after a few messages, but GPT-5 has persistent memory that keeps track of user preferences, conversation history, and other session details from one session to the next. This lets the system give personalised answers, change the tone and vocabulary based on user profiles, and pick up projects where they left off.

Auto-routing and Model Tiers

A real-time router is used by auto-routing to send prompts to the right mode. There are also different API sizes (normal GPT-5, small, nano) and pricing levels (Free, Plus, Pro, Team). These create cost and performance trade-offs. Companies can choose the small model for real-time tasks and the Pro version for mission-critical analysis.

GPT 5 Applications


Transforming Enterprise Workflows – GPT-5 Use Cases by Function

Engineering: From Assistant to Autonomous Agent

With GPT-5’s agentic coding, engineering teams might stop writing code that doesn’t need to be done and start solving problems in a more planned way. At the launch event, OpenAI showed off vibe coding, which is a type of coding where with just one question, a whole French language learning program with games and a way to keep track of progress was made.

What GPT-5 can do:

  • Write code and apps that are ready for production using natural language descriptions.
  • Check out multi-repo architecture reviews to learn about security, scalability, and risks.
  • Fix old code, lower technical debt, and write documentation.
  • Write unit tests, fix bugs, and connect with tools like GitHub Copilot and Azure AI Foundry to make agents that work from beginning to end.

Marketing: Hyper-Personalisation and End-to-End Campaign Creation

GPT-5’s ability to combine data and make content on a large scale changes marketing:

  • Hyper-personalized content: Connect GPT-5 to CRM data to make emails, landing pages, and ad copy that are unique to each customer.
  • Automated campaign kits: Tell GPT-5 to make full content packages that follow brand rules. These packages could include press releases, social media posts, email sequences, and blog drafts.
  • Deep market research synthesis: The model can look through hundreds of sources, such as market reports, competitor websites, and academic papers, and come up with strategic insights in just a few hours.

Sales: Strategic Account Planning

Sales teams can use GPT-5 to make plans for strategic accounts by looking at meeting notes, CRM notes, and stats on how often people use their products. “Make a strategic account plan for [customer] that lists goals, risks, opportunities, and next steps” is an example of a prompt that will give you a full plan that works with your sales goals.

Finance: Forecasting, Modelling and Due Diligence

GPT-5 is great for making plans and looking at money. Hebbia has shown that GPT-5 can read SEC filings and virtual data rooms to make full three-statement models, do multi-variable projections, and make scenario assessments.

Finance teams can:

  • Automate due diligence by adding up hundreds of pages of documents and highlighting risk factors and important metrics.
  • Do thorough analyses of differences and come up with ideas for how to lessen them that can be put into action.
  • Combine streams of real-time data to give you the most up-to-date market information and risk assessments.

Operations & Process Optimisation

Operations managers can use GPT-5 to make processes better by entering performance data and SOPs. The AI finds problems, comes up with solutions, and makes plans for how to fix them. It can also work with other agents to set up tasks, keep track of resources, and send alerts when things don’t go as planned.

Customer Support & Service Automation

GPT-5 is great for helping customers because it can handle many languages, modes, and tasks at the same time. When you add it to platforms like WorkBot, it can:

  • Give answers that are very relevant to the situation without forgetting what has been said before.
  • Handle text, audio, pictures, and documents all at once. This speeds up resolution time and makes agents’ jobs easier.
  • Answer questions in a way that fits the brand and solves problems before they happen.

Human Resources & Talent Management

Even though the sources don’t say so, GPT-5 can write job descriptions, screen resumes, make training programs, and analyse employee comments to find out how they feel. It can keep track of professional success and make personalised growth plans because it has a long memory and context window that doesn’t go away.


Sector-Specific Applications – Healthcare, Finance, Legal and Education

Healthcare: Enhancing Patient Understanding and Clinical Workflows

GPT-5 is helpful because it doesn’t hallucinate very often and knows more about medicine. During OpenAI’s presentation, a patient named Carolina used GPT-5 to make sense of a complicated biopsy result. This gave her more power over her treatment options.

Some important uses are:

  • Teaching patients: GPT-5 uses simple language to explain diagnoses, lab results, and treatment options, and it tells patients to talk to their doctors to make sure they are right.
  • Clinical research: Researchers can quickly put together a lot of information, look for patterns, and plan studies.
  • Multimodal diagnostics: Doctors can upload both images and text notes at the same time for a more complete first look.

Finance & Banking: Automating Analysis and Decision-Making

Companies that provide financial services were among the first to use GPT-5. Here are some ways to use it:

  • Automated financial modelling: GPT-5 makes three-statement models from unstructured data that are very accurate.
  • Scenario forecasting: It looks at a number of plans and makes changes based on how the market is doing and how well the company is doing.
  • Finding fraud and risk: The model looks at how people usually act during transactions and points out any strange behaviour.
  • Real-time updates on the market: GPT-5 can give you timely investment advice and risk alerts because it gets data in real time.

Legal & Compliance: Contract Analysis and Research

Legal teams can have GPT-5 look at the same document over and over again:

  • Contract analysis: The AI finds important terms, points out mistakes, and suggests changes.
  • Compliance monitoring: Checks to make sure that the company’s policies follow local laws and finds problems with the rules.
  • Case research: GPT-5 helps with making decisions by searching through legal databases and summarising past cases.

Education & Research: Personalised Learning and Knowledge Synthesis

Teachers and researchers can take advantage of GPT-5’s adaptive tutoring and research help:

  • Create personalised learning paths that explain things at different levels of difficulty.
  • Write a summary of what other researchers have said, compare their methods, and come up with ideas for experiments.
  • Help with multilingual support for research projects that include people from other countries.

GPT 5 Applications


Agentic AI and Multi-Agent Collaboration

Chatbots can only handle one request at a time, but GPT-5 adds the ability for multiple agents to work together. Different agents can work together on hard tasks and focus on one thing at a time.

For instance:

  • A Research Agent gets information from sources that can be trusted.
  • An Analysis Agent looks at the data to find patterns and useful information that can be used.
  • A Writing Agent writes well-written content that is right for the audience.

These agents work together perfectly, so you can do things like market research, write reports, and make product documentation in minutes instead of days. Lasting memory lets them keep the same context from one session to the next.

From the standpoint of implementation, API parameters such as reasoning_effort and verbosity let developers change how deep and detailed responses are. Also, GPT-5’s free-form function calling can talk to custom tools and old systems. Text-based instructions are great for using software that is only available to you.

These features make it possible for agentic coding, where GPT-5 does hard work on its own and sends the results to other agents or systems. Platforms like Azure AI Foundry and Clarifai manage all of this.


Strategic Adoption & Implementation: Selecting the Right Model and Building Pilots

Choosing the Right GPT-5 Model

OpenAI has a number of models that are good for different things:

  • GPT-5 Pro (ChatGPT – 272,000 tokens): More thinking, most correct; unlimited use of mission-critical features in a Pro subscription. Study, deep analytics, long talks that you have to pay for.
  • GPT-5 (Standard/API – 256k tokens): The best model for advanced analytics, agentic processes, and complex code. $1.25 in (input) and $10.00 out (output).
  • GPT-5 mini: Fast and cheap; better performance in real time than the competition. Apps, customer-facing representatives. $0.25 / $2.00.
  • GPT-5 nano: Minimal context. Ultra-low latency, large volume. Classification, Q&A, fine-tuning targets. $0.05 / $0.40.

Businesses should look at how much they can afford, how long it will take to get things done, and how hard it will be to do. If you use GPT-5 Pro for important tasks and GPT-5 small for less important ones, you can save money.


A 90-Day Integration Plan

Baytech’s strategic roadmap is a helpful guide for how to adopt:

  • Educate & Evangelise (Days 1–30): Hand out learning materials and hold workshops. Talk about what GPT-5 can and can’t do, with a focus on human control.
  • Identify Low-Risk, High-Impact Pilots (Days 31–60): Choose internal projects that are low risk and have a clear ROI. For instance, automating variance analysis, writing marketing emails, or making unit tests. Adjust reasoning_effort and verbosity.
  • Evaluate and Measure (Days 61–90): Set up measurements like efficiency gains (time saved per task) and business output uplift (more leads, quicker resolutions). Use results to argue for more widespread use.

Also, create internal data models and knowledge graphs to help agentic AI understand. This ensures GPT-5 can get to structured information, allowing more correct logic.


Risks, Limitations and Ethical Considerations

Prompt Injection and Obfuscation Attacks

Even though safety has gotten better, the problem of prompt injection still needs to be fixed. Baytech says that tests were able to change GPT-5 through attacks that are hard to figure out. The Techzine report says that the red-teaming group SPLX easily did an obfuscation attack, which hid bad instructions in harmless inputs.

The risk is higher because GPT-5 can act as an agent. If there aren’t strong rules in place, the model might follow bad instructions. Companies need to secure systems externally, clean inputs, and monitor continuously.

Residual Hallucinations and Reasoning Slips

Even a “PhD-level” model can make mistakes. During the launch demo, GPT-5 did a wrong decimal subtraction that shows a flaw in logic. Its lower but still present hallucination rate shows that human-in-the-loop verification is needed for critical business, legal, or medical use cases.

Safety Testing and Compliance

Microsoft’s AI Red Team ran a lot of tests on GPT-5 and found it had one of the best safety records among OpenAI models. But caution is still warranted. Companies should use compliance checks, audit trails, and role-based access. Bias detection and fact-checking tools help, but they don’t replace human judgement.

User Experience and Perception

At first, not everyone liked GPT-5. Techzine says some people didn’t like GPT-4o’s conversational style. GPT-5 sounded more businesslike, so OpenAI brought back GPT-4o for paying users.

Lesson: User experience matters. Offer model or style options in your AI products to meet diverse needs.

GPT 5 Applications


Competitive Landscape and Alternatives

Benchmarking Against Competitors

OpenAI’s GPT-5 enters a crowded market that includes Google’s Gemini 2.5 Pro, Anthropic’s Claude Opus 4, and xAI’s Grok. Baytech’s benchmark study shows GPT-5 is slightly better than Gemini 2.5 Pro in the Artificial Analysis Intelligence Index (69 vs. 65), and on MMLU-Pro (87% vs. 86%) and GPQA Diamond (85% vs. 84%). But other models may be better at creative tasks.

Pricing Strategy and Market Pressure

OpenAI’s two-part plan puts market pressure on rivals. GPT-5 Pro is for high-end enterprise tasks, while GPT-5 small offers similar functions at lower cost. This forces competitors to cut prices or improve capabilities.

Open-Weight Models: GPT-OSS and Beyond

Three days before GPT-5 launched, OpenAI released gpt-oss-120b and gpt-oss-20b under the Apache 2.0 license. These can run on local hardware, letting businesses keep sensitive data on-site while benefiting from OpenAI’s design. This hybrid approach suits regulated industries.

Real-World Adoption

Case studies show early success:

  • PwC: Launched ChatGPT Enterprise with secure identity management in UK & US.
  • Motor Oil Group and Physics Wallah: Use GPT-5 via Azure OpenAI Service.
  • Figma & Expedia: Integrated GPT-5 into design and travel workflows.

GPT 5 Applications


Future Trends and What’s Next

The Rise of AI Time

The launch of GPT-5 is what Forbes calls a “quadruple play” that accelerates innovation cycles and forces companies to operate on AI Time. OpenAI open-sourced small models, advanced its frontier model, served consumer + enterprise users, and provided models to governments — a multi-front strategy never seen before.

Multi-Agent Ecosystems and Hybrid Deployment

We can expect multi-agent frameworks to become standard, combining agents for research, writing, vision, and action. Hybrid strategies will mix open-source local models with proprietary cloud models, giving regulated sectors flexibility. Tools like Clarifai’s orchestrator and Azure AI Foundry will be central to ecosystem management.

Responsible AI and Regulation

As AI expands, governments enact rules such as the EU AI Act. Future models must embed safeguards, transparency, and user control. Research will focus on reducing prompt injection and hallucinations, while enterprises must invest in AI governance frameworks.

GPT-6 and Beyond

If GPT-5 unified models, GPT-6 may focus on real-time adaptability, longer memory, cross-modal synthesis, and embodied agents. Research aims to reduce hallucinations further, improve reasoning speed, and integrate cross-domain knowledge.

Preparing Your Business

Business leaders must prepare for continuous learning and adaptation. Build AI fluency, invest in data quality, and set up AI ethics committees. Clarifai’s platform bridges proprietary + open-source ecosystems, keeping businesses agile and compliant.


FAQs

How does GPT-5 differ from GPT-4?
GPT-5 has one architecture with quick Chat and deep Thinking modes, a longer context window (272k tokens), multimodal support, and a lower hallucination rate.

Which types of businesses benefit most from GPT-5?
Healthcare, finance, law, education, marketing, engineering, and customer service. Especially for regulated/high-stakes domains.

What GPT-5 model fits my business?
Use mini for cheap/fast tasks, standard for analytics, Pro for mission-critical workloads. Many use a mix.

How do I link GPT-5 with existing systems?
Use the OpenAI API, or platforms like Azure AI Foundry and Clarifai’s orchestrator. Tune with reasoning_effort and verbosity.

Is GPT-5 safe and trustworthy?
Safer than before, but still vulnerable to prompt injection and obfuscation. Needs human oversight, input sanitisation, external guardrails. Microsoft’s AI Red Team shows it has a strong safety profile.

What does Clarifai do to improve GPT-5 installations?
Combines GPT-5 with vision models, builds multimodal pipelines, deploys open-weight models on-prem. Offers flexibility, compliance, performance.

What’s next for AI?
Expect multi-agent ecosystems, hybrid deployments, stronger regulations, GPT-6 innovation.


Conclusion

GPT-5 is a big step forward for AI in business. By putting models together in one system, it opens new levels of automation and understanding by balancing speed and reasoning, expanding context and modality, and reducing hallucinations.

But it’s not easy to use GPT-5 — companies must navigate pricing, integration, risks, governance.
Clarifai is pivotal for multimodal workflows, letting businesses use GPT-5’s power while meeting privacy and compliance rules.

As we enter AI Time, the winners will be those who combine technology with strong governance, continuous learning, and flawless execution.



GPT-5, Google DeepMind Genie 3, Cloudflare vs. Perplexity, OpenAI’s Open Source Models, Claude 4.1 & New Data on AI Layoffs


GPT-5 finally landed, and the hype was matched with backlash. In this episode, Paul and Mike share their takeaways from the new model, provide insights into the gravity of DeepMind’s photorealistic Genie 3 world-model, unravel Perplexity’s stealth crawling controversy, touch on OpenAI’s open-weight release and rumored $500 billion valuation, and more in our rapid-fire section. 

Listen or watch below—and see below for show notes and the transcript.

Listen Now

Watch the Video

Timestamps

00:00:00 — Intro

00:04:57 — GPT-5 Launch and First Reactions

00:25:29 — DeepMind’s Genie 3 World Model

00:32:20 — Perplexity vs. Cloudflare Crawling Dispute

00:37:37 — OpenAI Returns to Open Weights

00:41:21 — OpenAI $500B Secondary Talks

00:44:26 — Anthropic Claude Opus 4.1 and System Prompt Update

00:49:57 — AI and the Future of Work

00:56:02 — OpenAI “Universal Verifiers”

01:00:42 — OpenAI Offers ChatGPT to the Federal Workforce

01:02:59 — ElevenLabs Launches AI Music

01:05:32 — Meta Buys AI Audio Startup

01:09:46 — Google AI Pro for Students

Summary:

GPT-5 Launch and Initial Reactions

OpenAI has unveiled GPT-5, calling it its smartest, fastest, and most useful model yet.

It’s the first “unified” system from the company, combining quick-response chat with deeper reasoning when needed. You don’t need to tweak any settings. Instead GPT-5 will route your requests to the right type of model for the job, depending on if it needs to think for longer or act fast.

The company says it outperforms earlier versions in coding, writing, health advice, and multimodal reasoning, with big reductions in hallucinations and a more honest approach when tasks can’t be completed. 

It also has a context window of 400,000 tokens and 128,000 max output tokens. And OpenAI notes that it has significantly fewer hallucinations and is about 45% less likely to contain factual errors than GPT-4o.

For coders, GPT-5 can spin up full apps from a single prompt, with better design sensibility and debugging skills. For health, it’s far less error-prone and more proactive about flagging issues, though it’s still no substitute for a doctor. Creative work also gets a lift, with more nuanced writing and “better taste” in design.

The launch includes GPT-5 Pro for extended reasoning, new preset personalities that change how the model responds, and API access in three sizes. Free users now get GPT-5 as the default, while Plus and Pro subscribers get higher limits and Pro access.

DeepMind’s Genie 3 World Model 

Google DeepMind has unveiled Genie 3, a breakthrough “world model” that can generate fully interactive, photorealistic environments in real time. Unlike earlier versions, Genie 3 can render at 24 frames per second, maintain visual and physical consistency for minutes at a time, and respond instantly to both navigation and text-based prompts.

The model can simulate anything from volcanic landscapes to enchanted forests, or recreate historical sites like ancient Athens—all based on a short description. Worlds evolve dynamically as you explore, and “promptable world events” let users change conditions on the fly, from altering weather to adding new objects.

This realism isn’t just for show. DeepMind sees world models as a key step toward AGI, offering limitless training grounds for AI agents to learn and adapt. Genie 3’s long-horizon consistency means agents can now tackle multi-step goals, opening the door for complex simulations in robotics, education, and science.

Still, the tech has limits: short interaction durations, constrained actions, and challenges with simulating multiple agents or perfectly accurate real-world locations. For now, it’s in a limited research preview, but DeepMind calls it a “significant moment” in the evolution of generative environments.

Perplexity v. Cloudflare Crawling Dispute

Cloudflare says AI search startup Perplexity has been disguising its web crawlers to bypass site blocks, a practice known as “stealth crawling.”

According to Cloudflare, when Perplexity’s bots hit a robots.txt rule or a firewall block, they sometimes swap their identity from “PerplexityBot” to something like “Google Chrome on macOS,” and rotate IP addresses that aren’t on its official list. 

Cloudflare says the company also changes its network identifiers to dodge detection, a tactic it claims has been used across tens of thousands of domains, making millions of requests each day. Perplexity is pushing back hard against Cloudflare’s claims.

In a detailed rebuttal, Perplexity denies intentional wrongdoing, calling Cloudflare’s post a “publicity stunt” and saying the company mixed up legitimate, user-triggered requests with bot activity, and even confused some of it with unrelated traffic from a tool called BrowserBase.

According to Perplexity, its AI assistants aren’t traditional web crawlers. They don’t systematically scrape and store the internet. Instead, they fetch specific pages in real time when a user asks a question, use that content to answer, and discard it with no training or long-term storage.

Perplexity argues this is no different from a browser or email client fetching a page on a user’s behalf, and warns that labeling such requests as “malicious” risks breaking legitimate tools and creating a two-tier internet where access depends on infrastructure gatekeepers.

Cloudflare has now delisted Perplexity as a verified bot and rolled out new methods to block its crawlers.


This episode is brought to you by our Academy 3.0 Launch Event.

Join Paul Roetzer and the SmarterX team on August 19 at 12pm ET for the launch of AI Academy 3.0 by SmarterX —your gateway to personalized AI learning for professionals and teams. Discover our new on-demand courses, live classes, certifications, and a smarter way to master AI. Register here.


This week’s episode is also brought to you by Intro to AI, our free, virtual monthly class, streaming live on Aug. 14 at 12 p.m. ET. Reserve your seat AND attend for a chance to win a 12-month AI Mastery Membership

For more information on Intro to AI and to register for this month’s class, visit www.marketingaiinstitute.com/intro-to-ai.

Read the Transcription

Disclaimer: This transcription was written by AI, thanks to Descript, and has not been edited for content. 

[00:00:00] Paul Roetzer: So the question has always been, does OpenAI have a secret sauce? Is there something they’re doing that was gonna allow them to get that six to 12 month lead over everybody else? The answer is no. Welcome to the Artificial Intelligence Show, the podcast that helps your business grow smarter by making AI approachable and actionable.

[00:00:20] My name is Paul Roetzer. I’m the founder and CEO of SmarterX and Marketing AI Institute, and I’m your host. Each week I’m joined by my co-host and marketing AI Institute Chief Content Officer Mike Kaput, as we break down all the AI news that matters and give you insights and perspectives that you can use to advance your company and your career.

[00:00:41] Join us as we accelerate AI literacy for all.

[00:00:48] Welcome to episode 161 of the Artificial Intelligence Show. I’m your host, Paul Roetzer, along with my co-host Mike Kaput. We are coming to recording, Monday, August 11th at 11:00 [00:01:00] AM ish Eastern Time. Our long awaited GPT-5 has arrived. Our, our team was like messaging us on Friday, like, are we gonna do an emergency podcast in Talk GPT-5?

[00:01:11] And I’m like, you’re gonna get these AI Academy courses finished, or you’re gonna gonna get an emergency podcast. So, Mike and I chose to focus on getting the AI Academy courses ready for launch instead of the emergency pod, but we’ll have plenty to discuss about GPT-5 today. all right, so this episode is brought to us by AI Academy by SmarterX, which I was just talking about.

[00:01:35] We are having our kind of relaunch event, I guess. We first introduced AI Academy in 2020. we have spent the last, almost year now completely re-imagining what academy is, how it functions, the technology behind it, how to infuse AI into it. the overall learner experience, how to build learning journeys, like everything has just been completely, revised, updated, [00:02:00] improved everything.

[00:02:01] And so on. August 19th at noon Eastern time, we will have a launch event. There’s a webinar you can sign up for to hear all about it. We’re gonna go through the vision and roadmap for AI Academy. We’re gonna talk about all the new on-demand courses and professional certificates that we are developing and launching that day.

[00:02:19] A bunch of ’em are coming out that day. We’re gonna talk about the new AI Academy live, which I’m super excited about, which is gonna be a regularly scheduled occurrence where members are actually gonna be able to join in live. talk, you know, not only with Mike and I, but go through deep dives, go through AI transformation, spotlights, book clubs, things like that.

[00:02:36] There’s a new learning management system coming later this year. We’re gonna preview that, how to build personalized learning journeys. We’re gonna talk about new business accounts where, companies, universities, people can come in, get five plus licenses. You get a whole bunch of. features and benefits, specific to those plus dramatically reduced pricing.

[00:02:56] And then we’re gonna have an ask us Anything session with me and Mike and Kathy. [00:03:00] So all kinds of stuff coming out. We have a new AI fundamental series, a third edition of our piloting AI series, a second edition of our Scaling AI series, which I am finalizing and literally between meetings and the podcast today.

[00:03:14] Mike did a new AI for, for professional services. also Mike, created a new AI and marketing series. So all of these are launching along with a bunch of other stuff. So go to SmarterX dot ai at the top of the page, there’s a banner you can click on to register for the webinar, and we’ll also drop that link in.

[00:03:33] So again, that webinar is free and it is happening on August 19th. this episode is also brought to us by Intro to ai. So this is, I have been teaching this class free every month since November or October of 2021. We are having our 50th edition of Intro to ai. this is happening Thursday, August 14th at noon so you can register.

[00:03:57] we’ve had, I think close to [00:04:00] 40,000 people have gone through this class since I started doing it almost four years ago. So it’s a about 30, 35 minutes. I do a live, kind of go through the fundamentals of ai and then we leave the last 25 minutes for questions. We usually get anywhere between 50 and a hundred questions.

[00:04:16] We do our best to answer as many as we can, and then the ones we can’t get to, we then do a, the week later we do an intro to ai, special for the podcast where we go through a bunch of other questions that we got. So, intro to, to ai 50th edition coming, Thursday, August 14th. and then we’ll do a follow-up podcast with some questions we didn’t get to.

[00:04:38] So I’ll put a link to the show notes in the show notes to intro to AI as well. And we will share all of that information. Alright, so, two great live events coming up August 14th and August 19th. Check those out. And now Mike, the long awaited GPT-5. Let’s get into it. 

[00:04:57] GPT-5 Launch and First Reactions

[00:04:57] Mike Kaput: All right, so first [00:05:00] topic, predictably openAI’s has unveiled GPT-5.

[00:05:03] They’re calling it their sma smartest, fastest, and most useful model yet it is the first unified system from the company. It combines quick response chat with deeper reasoning when needed. you don’t really need to tweak any settings. Instead, GPT-5 will route your requests to the right type of model that it deems to be correct for the job, depending if it needs to think for longer or act faster.

[00:05:31] The company says it outperforms earlier versions in coding, writing, health advice and multimodal reasoning. There are big reductions in hallucinations, and it says it has a more honest approach when tasks cannot be completed. It also has a context window of 400,000 tokens and 128,000 max output tokens.

[00:05:51] Now, another note on those hallucinations, openAI’s says it has significantly fewer hallucinations than GPT-4o and is [00:06:00] 45% less likely to contain factual errors compared to GPT-4o. For coders, GPT-5 can spin up full apps from a single prompt. It’s got really good design sensibility and debugging skills for health.

[00:06:12] It is far less error prone and more proactive about flagging issues. And creative work has also gotten a lift with more nuanced writing and better taste in design. Now this launch includes GPT-5 PRO for extended reasoning. There’s new preset personalities that change how the model responds. And API access across three different model sizes.

[00:06:35] Now, free users are now getting GPT-5 as the default, while plus and pro subscribers get higher limits and access to GPT-5 Pro. Now Paul, there’s a lot to unpack. Here’s a few different angles we’re gonna talk about here, but maybe let’s kick off by saying, what are your initial impressions of GT five?

[00:06:56] Paul Roetzer: A lot of my initial impressions come from [00:07:00] curating opinions of other people online who, whom I trust. And I, you know, I’ve read lots of their reviews, I have experiment with it a bit myself. I didn’t have, you know, I was working on the courses all weekend, so I couldn’t like really put it through a bunch of experiments, but I was, you know, dabbling in it.

[00:07:18] so when you follow the people we follow online, they generally were the people who weren’t super happy about this. So I think like, I want to, I wanna, my caveat here is like. It seems like a really good model. It it, it is not this life changing model that we all kind of have been anticipating for like a year and a half now of G PT five.

[00:07:41] It’s always been like, well, once GPT-5 gets here, then everything changes. So I will say, one, as part of the AI Academy, we are introducing a new Gen AI app series, and Mike and I were talking this morning and he’s gonna do a, GPT five review as the first, course in [00:08:00] that series. So we’ll have more to say.

[00:08:01] It’s like a 15, 20 minute product review, basically. So that’ll be dropping next week for academy members. But he, here’s my, my take Mike. I’m, I’m gonna try and like, you can hear a, a lot of, like, here’s the, you know, Ethan Mooch has a bunch of great stuff. Like, Brian Brickman, our friend, like they, there people have done like these great reviews.

[00:08:19] Allie Miller had had it, people who had access to it beforehand. There’s all these great reviews. So I’m gonna give more of like a zoom out, like what’s the impact here. So. First, it is not multimodal from the ground up. So when they say unified model, what they mean is it’s still like four or five different models that are packaged as one thing called GPT-5.

[00:08:42] And then there’s a router that based on your prompt decides which model it’s gonna use. So if it’s gonna use one that has reasoning, if it’s gonna use the traditional chat, if it’s gonna use image generation, video generation, like all that’s not in a single model. so I, you [00:09:00] know, I assume GPT six will be that it’ll be truly multimodal from the ground up.

[00:09:04] as far as I know, they didn’t give any updates on image generation or soa, their video generation as part of this. I think they made some tweaks to voice capabilities, maybe. I think they improved the voice a little bit. so we, on this podcast have for a while talked about the confusion of the model choice and.

[00:09:24] When you would go into ChatGPT last week, there was eight models to choose from. And the point we always made was the average user has no idea what the difference is between those. Oh, you know, 4 0 0 3, mini, like the average user has no idea. and so they would just use whatever default. And so our point was always why for the average user would you make them choose from a list of models that they don’t understand what the difference is.

[00:09:51] And so it would seem that this router is sort of heading in the right direction, but it actually caused chaos because [00:10:00] there is a small fraction of Chad CPT users who do understand what the different models are and have preferred models that they like to use. And what OpenAI did, sort of their first misstep, and we’ll go through a series of missteps that they made in this process, is that they almost just ignored.

[00:10:20] the loudest, the most vocal online users who do actually understand the different models and really liked some of the other models. ’cause what OpenAI did is they turned on g PT five and removed all the other models. And then when the router was doing its work, I go into ChatGPT, I give a prompt, help me write a business plan for this, you know, idea I have.

[00:10:43] I would have no idea which model it was actually using. So there was no transparency into what model was actually being used. And if there was a model, I used to like that. I liked the tone, the personality, the style, the format. It was gone. And so people were pissed by, by like, end of the day Thursday, people are [00:11:00] like, gimme my model back.

[00:11:01] Like, what I want 4.0, like, I like talking to 4.0. And so kind of surprisingly, Mike, it’s like, it’s almost like OpenAI didn’t understand their user base. Like yeah, there was obviously people who wanted that choice. and then there was this other faction of people who obviously were very attached to specific models and almost like emotionally attached to like 4.0.

[00:11:27] and five is a very different personality. it, it responds in like shorter bursts. Like it’s, it doesn’t have, you know, it’s not as like comforting and things like that. Like it’s just missing some of that. So there was one user, and I didn’t know this guy previously on X, but I thought he gave a great synopsis.

[00:11:46] I’ll just read this one. Put the link in. Alistair McClay is his name, and he said, open. I forgot who actually matters. Power users always lead the culture curve. They set the vibes for a product, especially in consumer software. They’re the [00:12:00] loudest most passionate and have the highest expectations. They are your biggest asset as a consumer company, and you need to keep them front of mind at all times.

[00:12:08] With the GPT-5 launch in chat, GPT OpenAI seems to have been so focused on the benefits of their new router could provide. To their less sophisticated users, which automatically switches the underlying model without telling them that they totally overlook the user group. That actually matters the most.

[00:12:25] If you put yourself in the shoes of Chad GPT Power user, it’s blatantly obvious. They will continue to want the ability to hard switch between models. It’s obvious. They will expect transparency in which model is being used by the router at any point in any time. And most important of all, it’s obvious.

[00:12:40] They will expect to have a reasonable notice period before their existing models deprecated. The response we saw was inevitable. The power users who make up the majority of the noise online quickly set the vibes of frustration, disappointment, and broken trust. People who used four oh or 4.5 were writing for writing were suddenly left with no good [00:13:00] alternative.

[00:13:00] Plus, users who had access to oh four Mini and o3 suddenly found themselves with a 200 message weekly cap on GPT-5 thinking and a router that wouldn’t tell them which model they were actually talking to. Not to mention, most people I’ve spoken to had no idea. There’s now a cap on GPT-5 thinking you only find out when you hit and lose access for the rest of the week.

[00:13:21] So it’s like, that’s a pretty good synopsis of what was going on. And then openAI’s immediately realized this, like Sam Altman was in full blown crisis communications mode by Thursday night, which told you like they just missed this. Like, damn, they didn’t think this through. So Altman tweeted and we’ll put links to all these tweets.

[00:13:43] so this was August 10th, this was on Sunday. If you’ve been following the GPT-5 rollout, one thing you might be noticing is how much of an attachment some people have to specific AI models. It feels different and stronger than the kinds of attachments people have to previous kinds of technology.

[00:13:59] And so [00:14:00] suddenly, suddenly deprecating old models that users depended on in their workflows was a mistake. This is something we’ve been closely tracking for the past year or so, but still hasn’t gotten much mainstream attention. people have used technology including AI in self-destruct, self-destructive ways.

[00:14:17] If a user’s in a mentally fragile state and prone to delusion, we do not want the AI to reinforce that. Most users can keep a clear line between reality and fiction or role play, but a small percentage cannot. We value user freedom as a core principle, but we also feel responsible in how we introduce new technology with new risks.

[00:14:34] So that’s the attachment thing. The rate limit thing was like almost just like sideswipe people. Mm-hmm. So this is, this is an interesting one, Mike, because not only did Sam tweet about this on Sunday, other openAI’s researchers we’re also tweeting about this. So, you know that this one was like a real hot button internally and with their users.

[00:14:54] And the thing that I think about with this one is their restrictions [00:15:00] on capacity, compute capacity to do inference so quick. Like, you know, there, there’s compute to train these models, but then when you and I use them, that’s inference. So when it delivers an answer. Reasoning, which is now baked into this, requires way more computed inference than a standard chat, as does video, as does image, things like that.

[00:15:20] And so the fact that they’re straight up saying this is an issue with capacity opens the door for Google in, in my opinion, like this is a really interesting play where open AI’s lack of maturity and infrastructure when it comes to compute and data centers mm-hmm. is not an issue for Google as much.

[00:15:41] So here was Sam’s tweet again on Sunday, said, today we are s significantly increasing rate limits for reasoning for chat. GPT plus users and all model class limits will shortly be higher than they were before GPT-5 and then today being Monday or Tuesday, they expect to share their thinking on how we are going to make capacity trade-offs [00:16:00] over the coming months.

[00:16:01] Meaning we, a lot of people like our product, we have 700 million users. And the more they use reasoning, the more these things, like we’re, we’re gonna just run outta capacity. Like we have to set rate limits, but people don’t want ’em. and then there was a couple other openAI’s people who also talked about the rate limits.

[00:16:17] Then the other one was that this was the first time we’ve seen this data that I thought was very fascinating. Mike was, we assumed, and we’ve talked about this, like I’ve said, I go, who talks all the time? I ask rooms of hundreds of people, like who’s ever used a reasoning model? Who’s used O three?

[00:16:33] And you get like five hands. And so our like vibe check or like just, you know, eyeball check was, I don’t less, less than 1%, less than 3% of people have any clue what a reasoning model even is. And this is as of like a month ago, openAI’s verified that for us. So yes, the vast majority of open AI’s users have no clue that reasoning models exist or what they do.

[00:16:55] So they have 700 million users, for many [00:17:00] people. GPT five is the first time they’re going to interact with a reasoning model, but they probably won’t know it now because it’s just baked into it. So Sam tweeted the percentage of users using reasoning models each day is significantly increasing. For example, for free users, we went from less than 1% to 7%, and for plus users 7% to 24%.

[00:17:25] Now, that’s a big jump, but that means that people who were paying the plus is 200 bucks a month. Right? Mike? Isn’t that the plus? Is? 

[00:17:31] Mike Kaput: Plus is 20 and then Pro is 200. Okay, so paying the paying tiers. Yeah. So 

[00:17:36] Paul Roetzer: of the people paying 20 bucks a month, only 7% were using the reasoning models, which is wild. Yeah. So, and that would tell you like once you go from seven to 24, now all of a sudden the compute capacity becomes massive.

[00:17:49] and then three other quick thoughts here. The big question with g PT five that we’ve all been waiting for an answer for is, was it going to be a leap [00:18:00] forward over the other frontier models? GPT-4, when it came out in March, 2023, was state of the art for a year and a half. It took, it took a year and a half for Google and others to create something on par with G PT four.

[00:18:12] So the question has always been, does openAI’s have a secret sauce? Is there something they’re doing that was gonna allow them to get that, you know, even six to 12 month lead over everybody else? The answer is no. Like there, my guess is Gemini three from Google, the next version of Claude, the next version of Grok, they will all leapfrog over G PT five.

[00:18:35] there’s some arguments that like Gemini 2.5 Pro is probably already like better than GPT-5 in some capacities. So we, we kind of have our answer that the frontier models have been commoditized. Like there, there is no apparent secret sauce at the moment, which means. We’re back into the game of distribution, who can put a, a comparable model in front of enough users?

[00:18:59] [00:19:00] So openAI’s has 700 million, that’s huge. But Apple, like, you’re, you’re back in the game. Like if you’re Apple, you realize like, hey, we don’t need like the best, we don’t have to build our own frontier model. If you’re Google, you have seven products with over a billion users, seven, seven power, platforms and product like distribution becomes massive again.

[00:19:20] And then the big question I have was like, well, what, what about gpt? I didn’t hear anything about gpt. No. And so I went and looked and it looks like the only thing that changed is the model selector of recommended model as the creator of the GPT is now five, five thinking, or five prob. Like that’s kinda all I can see.

[00:19:36] Yeah. So again, I just wanted to zoom out and be like, high level, the things we were really waiting for was like. Yeah, the model choice issue, was it gonna be a different frontier model than everything else that would cause people to switch back to chat GPT if they were like, love and claw or Gemini, things like that.

[00:19:52] And I, I, overall, it just seems like it’s probably a really, really smart model. The average user isn’t gonna notice the difference. [00:20:00] And there’s, there’s lots they touted, but there’s very little that seems truly, differentiated at this point. And I you, you spent more time with it though, Mike, did you have any other different impressions of it or any other initial feedback?

[00:20:14] Mike Kaput: Yeah, no, I largely agree with your take. I will say it just really struck me how much preferences matter here because personally, and this will seem crazy to some people, I love this model. Yeah. Like, I genuinely find it more useful simply because it is smarter, it is faster, really fast, which is really helpful.

[00:20:33] I get a lot more done. all of my prompts and workflows I’ve tested so far with it work better, which is amazing. I personally don’t have as much preference for switching models. I thought four oh was a little too dumb. Mm-hmm. O three was brilliant, but the form o3 Pro is like my favorite model to use it.

[00:20:52] There’s very much so. However, I would also get frustrated a bit sometimes with the formatting and the slowness of being able to not be [00:21:00] able to just go back and forth rapidly and kind of iterate and converse. For me, this model like squares that circle and like really provides the perfect balance. For me personally.

[00:21:10] I like the tone a lot more. That’s all personal preference. I’m really glad we have it. I think some people hate that it exists. It’s really interesting to see. And I would also add too, if you want to go down a horrifying rabbit hole, go to the ChatGPT subreddit because the stories of people, I don’t know how much of this is like too played up and like viral, but there are so many posts.

[00:21:35] Of people deeply emotionally attached to 4.0 that you feel like the posts are written by people going through withdrawal. Yeah. And it’s really, really weird. 

[00:21:44] Paul Roetzer: And that’s I think what Sam was referring to with that. Like, hey, some people get really attached as therapists, as friends, as companions and like we have a tough job here to balance, like what is unhealthy?

[00:21:58] ’cause they can see the chats [00:22:00] like yes, they know what people are doing with these things and they’re trying to balance like what is good for mental health versus like what is acceptable personal 

[00:22:10] Mike Kaput: choice. It’s really interesting to just see that play out. And they did have an interesting emphasis on health throughout all their, yeah.

[00:22:19] Launch materials. So I think they’re really just understanding that people, for better or for worse are turning to this for emotional and physical health needs. 

[00:22:27] Paul Roetzer: Very, very much. Have you run a comparison, like do you use 2.5 Pro from Gemini? Much? Yeah. How do you think quite a co compares head to head?

[00:22:35] Like have you done any side-by-side? 

[00:22:37] Mike Kaput: I’ve done, I haven’t done too much yet. I really like and rely on Gemini 2.45 Pro for a lot of things, but I usually just cycle between that and either O three slash four. Oh, depending on the use case, obviously it’s way better than 4.0, but just in terms of speed or the complexity of it, that’s kind of my next big thing is like, okay, let’s run, you know, ’cause I have GPTs and gems built out for [00:23:00] some of the same stuff.

[00:23:01] Let’s see how these stack up. I’ll be interested to see what that, how that plays out. And also I think we’ve been seeing more and more chatter even this morning that Google is releasing something like today or tomorrow. I’m convinced they’re 

[00:23:14] Paul Roetzer: just sitting there waiting. Like I think they know that they probably have.

[00:23:18] Maybe something that’ll perform better, at least on the evals. And they were just like, it was a game of chicken. Like you wanna go ahead and release yours first? Yeah, for sure. ’cause Open has done that to them so many times. So I would not be surprised at all if Google came out with something comparable or better in, in, in ways.

[00:23:34] Mike Kaput: And just one kind of final note or impression or kind of perspective here is I genuinely would encourage people just go without any bias, go use this model as extensively as you can. I mean, again, I find it extremely impressive. I also think we all might need to take a breath too. Mm-hmm. Because it’s so easy when we’re in this bubble to be like, you know, you’re gonna see whatever Google comes out with and you’re be like, openAI’s is dead.

[00:23:59] Or ChatGPT, BT [00:24:00] sucks. And it’s like, this is like the first thing that felt like minimum viable AGI to me, to be perfectly honest. But I feel like we, you could make that argument a 4.0 in, in a different context. Right. So I think it’s worthwhile to keep some perspective because this is a genuinely useful model to me, and it just works a lot of the time and I really appreciate that.

[00:24:20] Paul Roetzer: Yep. Yeah, I agree. And I think, get in there, try it. And again, like if people weren’t using reasoning models Yes. And all G PT five does is injects reasoning into their workflows without them even knowing it, it will feel like a leap forward. Yes. Because that’s the biggest thing is Mike and I’ve talked about this many times, using 2.5 Pro, using O three from chat GPT, that is like, at least for me, the majority of my uses is reasoning models now for higher level strategic thinking.

[00:24:51] Mm-hmm. so if you weren’t using those, then you don’t really comprehend how far along these models are [00:25:00] to. Changing 

[00:25:01] Mike Kaput: work, the nature of work. And I wonder once we get past this kind of initial freak out, like how many other stories we’ll see given those numbers you shared. I mean, giving 4, 5, 6 x the amount of people suddenly access to using reasoning models based on those numbers and how they’ve jumped.

[00:25:17] I, I wonder what we’re going to hear people say about this model moving forward too. 

[00:25:23] Paul Roetzer: Yeah. All good. Well, I’m looking forward to your course next week. Yeah, me too. Awesome. 

[00:25:29] DeepMind’s Genie 3 World Model

[00:25:29] Mike Kaput: All right, so next up, Google DeepMind has unveiled Genie three. This is a breakthrough, what they call world model that can generate fully interactive photorealistic environments in real time.

[00:25:41] So, unlike earlier versions of Genie, genie three can render at 24 frames per second, maintain visual and physical consistency for minutes at a time, and respond instantly to both navigation and text-based prompts. So this model can do things like simulate an entire virtual world. Volcanic landscapes, enchanted [00:26:00] forests that can recreate historical sites like ancient Athens, all based on a short description.

[00:26:06] And those worlds evolve. Think of being in a kind, dynamically evolving video game. They evolve as you explore, and there’s these profitable world events that let users change conditions on the fly from altering weather to adding new objects. So DeepMind actually says they see world models as a key step towards AGI because they give a kind of limitless virtual training ground for AI agents to use to learn and adapt.

[00:26:34] So Genie three. Long horizon consistency essentially means agents can now tackle multi-step goals. So this kind of opens the door for really complex simulations in fields like robotics, education and science. But right now, this is still somewhat limited. There’s pretty short interaction, durations constrained actions.

[00:26:55] And it is in a limited research preview. So you can go to the, we’ll provide the link in the show [00:27:00] notes. You can go test out some kind of pre-made examples, but you cannot directly use this yourself. But DeepMind still calls it kind of a pretty significant moment in the evolution of these generative environments.

[00:27:13] Now, Paul, I mean, I realize like world models, this can kind of seem a little bit sci-fi to a lot of people. It’s not available yet to the general public. We’ve got massive news with GPT-5 coming out. But we did wanna talk about this because it seems like world models are pretty important to the trajectory of where AI is going long term.

[00:27:32] So maybe you could talk us through why they matter so much. 

[00:27:36] Paul Roetzer: Yeah, it, it’s been a pursuit of labs for years. This idea of giving the machine the ability to understand the physical world, to create simulations that follow the laws of physics. And DeepMind in particular and Demi asaba, specifically ha have been talking a lot more about them over the last year.

[00:27:56] Like I was going back, when I was kind of getting ready for today [00:28:00] and just looking at the different times that we’ve featured quotes from Demis on the podcast where he was talking about world models and their importance. And they talked about, like even with vo, the video generation, how it just, I mean they, this is their words, like it, it just emerges.

[00:28:16] Like when you train it on enough video data, it starts to like understand the laws of physics. And when you then ask it to produce simulations, it just seems to do it. Now there’s tons of limitations and they highlight those in the launch post. But I mean, in essence it does open all of these possibilities for applications.

[00:28:37] And you know, I think that this idea of the path to AGI when they really start to think about embodying intelligence and like humanoid robots and those robots being able to. See something happening and kind of like think out ahead of, because I understand the laws of physics, I understand human nature, like what is likely to be happening next.

[00:28:56] And that comes whether you’re, you know, training autonomous vehicles or you’re training a [00:29:00] robot to, to work in a human environment. All of these things become kind of essential. And so there’s some cool examples. As you mentioned, Mike, you can play around with like modeling, modeling physical properties of the world.

[00:29:10] So like water and lightning and complex environmental interactions. simulating the natural world. So they talk about generating vibrant ecosystems from animal behaviors to intricate plant life. So it, again, it just like kind of learns and then it’s able to recreate these things. And so this could come into play in storytelling where you’re trying to create these narratives, video game development where it’s rendering in real time the environment.

[00:29:34] So imagine like right now, programmers write all the code to create everything that happens in the game. They create all the environments that stuff. This what they’re envisioning. Elon Musk talks a lot about this. He actually tweeted this week, and he thinks by next year this will be a reality where you could go in and prompt your own video game and like everything just starts happening in real time, creating everything that you see.

[00:29:57] and that’s kind of wild. And [00:30:00] then even like, another tangible example is like right now in a Tesla, when you have autonomous driving going, it shows very like video game-like simulation. It’s showing your car and it shows cars of like approximate size. It’ll show a truck or a motorcycle, but it’s not like watching a live stream video of the road around you.

[00:30:20] What this is saying and what, what Elon Musk implies Tesla is going toward is when you’re driving a Tesla and you’re watching the full self-driving do its thing, it will actually render the physical world to show on the display. But it’s not a live stream. It’s actually like a rendering occurring where it’s simulating this whole world.

[00:30:40] It’s. Yeah, it’s really crazy and it becomes massive in robotics because now you can like simulate these environments and the robots can train in them and all these kinds of things. So world models are huge. We talked about, Fe Fe Lee Spatial Intelligence as a company. She created, I forget what episode that is.

[00:30:56] We can drop the link in the show notes, but she’s [00:31:00] someone who’s been working intensely on this in addition to the research that’s going on in the major labs. 

[00:31:05] Mike Kaput: Yeah, it’s a good reminder too that we will, regardless of the hype or the, buildup of something like GPT-5, regardless of where the verdict ends on that, I mean, progress is happening on a lot of different fronts in ai and it is not slowing down on many of them.

[00:31:22] Paul Roetzer: Yeah. And it’s commonly like six to 12 months ahead of what the public is aware of. Mm-hmm. So if they’re releasing this, they’re obviously already probably far beyond this and within the lab itself. Yeah. and you get people like Elon Musk who just straight up tweet and say, yeah, I think this is coming in three months.

[00:31:40] And yeah. So I mean, if as mu again, like you, you have to, you have to filter like the stuff from Elon Musk you wanna read, but like if you, if you want like a true inside, like just clear train of thought of like what someone thinks it’s possible. Nobody is more honest than you about what he thinks is gonna happen and his [00:32:00] opinions of these other models and kind of where they’re going.

[00:32:02] And while he has a history of sort of over-hyping when technology will arrive, dude built a frontier model in like a year and a half that caught up to the best models in the world. So he, he knows a few things about science and technologies he’s kind of worth paying attention to from that side. 

[00:32:20] Perplexity vs. Cloudflare Crawling Dispute

[00:32:20] Mike Kaput: Alright, our next or third big kind of main topic this week is that CloudFlare says that AI search startup perplexity.

[00:32:28] Has been disguising its web crawlers to bypass site blocks. This is a practice known as stealth crawling. According to CloudFlare, when perplexity bots hit a robots dot txt rule or a firewall block, they sometimes swap their identity from what’s called perplexity bot to something like Google Chrome on Mac Os, and rotate IP addresses that aren’t on its official list.

[00:32:52] So basically, CloudFlare says the company is doing things to dodge detection, including also changing its [00:33:00] network identifiers, which is a tactic. It claims has been used across tens of thousands of domains making millions of requests each day. Perplexity has pushed back pretty hard against Cloud Cloudflare’s claims in a detailed rebuttal.

[00:33:14] Rebuttal. They said they deny intentional wrongdoing. They called cloudflare’s post a publicity stunt, and says the company mixed up legitimate user triggered requests with bot activity. Now, according to perplexity, it says its AI assistant aren’t really traditional web crawlers. They don’t systematically scrape and store the internet.

[00:33:34] Instead, they fetch specific pages in real time. When a user asks a question, they use that content to answer it, and then they discard it with no training or long term storage. So in response, CloudFlare has now delisted perplexity as a verified bot and rolled out new methods to block its crawlers. Now Paul, this is, seems a little technical on the surface kind of in the weeds, but it does seem like a pretty important issue because, correct me if I’m wrong, it seems [00:34:00] like at its core, this is about how AI companies are or are not respecting the boundaries set up by publishers and websites of how their content can and can’t be accessed and used.

[00:34:12] And there’s this big fear given how models were trained, how the content’s already been used, that this material is going to get scraped and used to train models are used to essentially bypass websites entirely. 

[00:34:25] Paul Roetzer: Yeah, which has been going on for the last few years. Like, that’s the thing is like none of this is, well, I mean, I guess the agent side is new, but 

[00:34:33] Mike Kaput: Yeah.

[00:34:34] Paul Roetzer: I mean, part of the issue with like the New York Times lawsuit against openAI’s and others was that they were bypassing paywalls, like to get access to information and stuff. And so, you know, I think in, in the case of perplexity, the problem that we’re running into it here is this is their mo Like, there was, I forget, I don’t remember, I’d have to go back and find the podcast episode we talked about, when Arvin was literally bragging about the fact that they used to scrape LinkedIn against the terms of use, [00:35:00] like that, that that is just what they do.

[00:35:02] And he was proud of the fact that they did it. And it’s kinda like we’re gonna do it until we get caught. So when you’re on the record saying you constantly do these kinds of things, it’s really hard to have credibility when you come out saying, no, we’re not doing anything wrong. It’s like, dude, you’ve, you’ve admitted to things like this before.

[00:35:23] So. The, you have to consider the company itself and its history when you’re looking at this, but when you remove that out, the reality at the end of the day is the rules of the web and business are being rewritten. Yeah. Like we’re gonna have these men messy instances where you have semantics of like, yeah, but we’re not really scraping.

[00:35:43] It’s an agent and an agent’s being requested by a user. So it’s actually really the user that’s visiting the website. So the, you know, how this gets played out, whether it’s through business agreements or court cases or whatever. We’re gonna have this very prolonged [00:36:00] transitional phase where we start running into these kinds of issues and AI agents are gonna be a massive part of this.

[00:36:05] Like, yeah, the more traffic on the web that comes from AI agents, the more challenging it’s gonna be for brands to deal with, for publishers to deal with. It’s kind of similar to, you know, how we’re struggling with copyright and like, were the models allowed to steal it or weren’t they allowed to steal it?

[00:36:21] Was it fair use or not fair use? There, there’s just gonna be so many unanswered questions that we’re gonna come up a against as agents permeate the web and more and more of the traffic and actions taken online are taken by agents. 

[00:36:34] Mike Kaput: Yeah. The fact they’re already having issues with this now, before we even have real, a real explosion of AI agents tells me that we are not ready for whatever’s about to happen.

[00:36:46] Paul Roetzer: Yeah. and I mean, as a, as a publisher of a website, as a, as a brand, you can just like say, well, we want, we don’t want these users or these agents or these, you know, bots to crawl our site. but then what, you’re [00:37:00] just gonna stay out of the chat bot ai, assistant AI agent economy. Like you don’t, your content’s not gonna show up anywhere.

[00:37:08] Yeah. There’s no simple answers, but, I I, and again, like this, when you look at like where, where’s the future of work? Like there’s gonna be people whose jobs is just to kind of figure this sort of stuff out to like wade through all the issues and challenges and figure out plans for this stuff. But yeah, this is, this is kind of a messy one.

[00:37:28] I think it’s just the tip of the spear basically. Like there’s a lot more coming 

[00:37:32] Mike Kaput: for sure. Alright, let’s dive into rapid fire this week. First up, 

[00:37:37] OpenAI Returns to Open Weights

[00:37:37] Mike Kaput: openAI’s has released its first open weight language model. Since GPT two, there are two new models, GPT dash oss, dash one 20 B and G PT dash OSS dash 20 B, that are free to download under the Apache 2.0 license, meaning anyone can run them locally, fine tune them and even use them commercially.

[00:37:58] They support chain of [00:38:00] thought reasoning, tool use and code execution. And the smaller 20 billion parameters with the 20 B stands for 120 billion and 20 billion parameter version. The 20 billion parameter version is able to run on a high-end consumer laptop. OpenAI says the models perform on par with some of its proprietary systems and in certain benchmarks, even exceed them all while being cheaper and faster to operate.

[00:38:25] CEO Sam Altman framed this release as a way to keep innovation in open models happening in the US amid competition from places like China’s Deep Sea. So Paul, I’m curious about openAI’s motivations here. Obviously they are, doing a few things. They’ve got a few things on their plate at the moment. So why spend a bunch of precious time and resources competing in open source at all when your entire business model relies on selling access to closed bottles?

[00:38:53] Paul Roetzer: Yeah, I mean, they’ve talked about the fact they were going to do this for a long time, that they were committed to, you know, the open source [00:39:00] community or just, you know, open weights. so we’ve known it was coming. I think, The way the labs are looking at this now, and we’ve talked a little bit about this before, I know deas has said point blank.

[00:39:13] This is what they’re doing is the open source versions that they’ll release are basically like last year’s proprietary models. So the proprietary models that they’re selling keep getting better, keep getting smarter, more generally capable, let’s say every eight to 12 months is the release cycle for a a next version.

[00:39:34] GPT five obviously took a little longer, but for the most part, the labs are, are looking at kind of that eight to 12 month release cycle of the next version. And so every roughly 12 months, the prior version that’s now kind of outdated. You open source, as long as it’s safe to open source it. And the belief obviously is that the paying users are still going to pay for the premier [00:40:00] version of what’s available.

[00:40:01] plus, you know, they’re, they’re still able to, you know, service the developer community. build those relationships, integrate, you know, APIs still drive a lot of revenue for these, you know, labs specifically openAI’s and Anthropic. It’s a ton of their revenue through their APIs. So it’s just having to service that developer community and be a part of it.

[00:40:21] And then just overall, like the mission of the organization. Now we’ve seen some pullback a little bit on this, like Zuckerberg, who’s been the ultimate champion of open source 

[00:40:28] Mike Kaput: Yeah. 

[00:40:29] Paul Roetzer: Has said already, like they, they may move off of that. They, they may, you know, keep some of their technology more in-house.

[00:40:35] But again, I think what they’ll do is they keep the current frontier model proprietary and then you open source the prior generations accepting that there’s a small portion of users who will just use the open source and not pay for the other stuff. But it’s just kind of the standard model of the labs seem to be following now.

[00:40:54] Mike Kaput: So it’s kind of a no risk way, at least no risk of cannibalizing your existing products. [00:41:00] To get developer goodwill, move the ecosystem forward, remain relevant with people still building on your open source 

[00:41:06] Paul Roetzer: model. Yeah, and I mean in some organizations they’re gonna want to build on the open source too.

[00:41:10] Like you get into an enterprise. So you may have enterprises that have 5,000 chat GPT enterprise licenses, but then the IT teams, you know, also building on top of the open source model, things like that. 

[00:41:21] OpenAI $500B Secondary Talks

[00:41:21] Mike Kaput: Alright, next steps. More openAI’s news. They are in early talks to let employees cash out some of their shares at a valuation of around $500 billion.

[00:41:30] So this is a secondary stock sale. it’s a deal that would potentially be worth billions, giving current and former staff a way to turn their paper wealth into real money while helping the company retain talent. In an era where Meta is trying to poach people for like nine figures. This would basically create a huge jump in openAI’s valuation from going to 500 billion from the last $300 billion valuation when they did a $40 billion financing round led by SoftBank.[00:42:00] 

[00:42:00] And it comes on the heels of an $8.3 billion funding boost. That was oversubscribed. And as openAI’s aggressively pushes on product, so we’ve got open weight models, G PT five, we’ll talk in a second about a federal deal to provide chat GPT to the federal government. Paul, I guess as we’re looking at employees being able to cash out of openAI’s, like what motivates a move like this right now?

[00:42:25] Paul Roetzer: Yeah, I mean, they’re, they’re being drawn by a lot of money from other labs and you have to find ways to, you know, motivate people to stay. You have to give that ability to get something off the table so it makes sense. I’m just looking, Mike real quick. I searched, larger companies in the world by market cap.

[00:42:44] Just to provide some perspective of the significance of a half a trillion dollars. So ExxonMobil, which was the largest company in the world for quite a while, their market cap is 455 billion. Mm. Netflix is 515 billion. MasterCard’s [00:43:00] five 19 Visa’s 6 49. there’s only, well, we got at the trillion dollar plus mark.

[00:43:08] We have Tesla, Berkshire Hathaway, TSMC, or TSM, Broadcom Meta, Amazon, alphabet, apple, Microsoft, Nvidia. That’s it. That’s a list of companies in the world that are a trillion or more. Yeah. and I, and there’s actually only two between a half a trillion and a trillion, so, or well, no, I guess that’s, there’s seven.

[00:43:32] It’s one of like the 20 to 25 biggest companies in the world. Yeah. At, at a half a trillion is what I’m saying. 

[00:43:37] Mike Kaput: That’s incredible. 

[00:43:38] Paul Roetzer: It’s a big number. 

[00:43:40] Mike Kaput: So we’re gonna start seeing a, a, a whole host of other AI researchers, being deca a hundred millionaires, billionaires at some point. Yeah, there 

[00:43:49] Paul Roetzer: was a crazy stat.

[00:43:50] I’d have to find it, but, so don’t, don’t quote me on like the exact numbers here, but it go look it up. the number of Nvidia employees who are [00:44:00] millionaires and the number who are worth like more than 25 million. It’s absurd because their stock in the company, if they’ve been there for any amount of time, like go back, say nine years or more, you worth 10, 20 million.

[00:44:13] Like, it’s crazy. That’s wild. Yeah. It’s, it’s a large percentage. but that’s what’s gonna happen within some of these, you know, massive AI companies is everybody who’s a part of ’em are just gonna make a ton of money. 

[00:44:26] Anthropic Claude Opus 4.1 and System Prompt Update

[00:44:26] Mike Kaput: All right. Next up, Anthropic has released Claude Opus 4.1, and it is a notable step up from Opus four.

[00:44:32] In coding research and reasoning tasks, it hits a 74.5% rating on SWE Bench, a benchmark that is a tough test for real world coding. some companies are reporting it’s better at pinpointing exact corrections in code without making unnecessary changes. The coding startup Windsurf says the improvement is roughly on par with the leap from sonnet 3.7 to sonnet four on their junior developer benchmark [00:45:00] and beyond.

[00:45:00] Code Opus 4.1 has stronger agentic search and detail tracking. It’s more effective for deep research and data analysis. And this upgrade is available to paid users, via Claude Code, the API, Amazon Bedrock and Google Cloud’s Vertex AI, all at the same price as before. Now, interestingly and related to this, just after the release, Anthropic researcher Amanda Asell, shared some more information about the overall updates to Claude’s system prompt.

[00:45:30] This is the master prompt that essentially influences how the model behaves and responds. So in addition to a new model, we gotta look kind of under the hood at how Claude works. these are basically a bunch of updates and tweaks to how Claude interacts with users. So, for example, aswell shared that one change was made that reins in overly casual language and needless swearing from the model.

[00:45:52] Another nudge is clawed to be even handed and critical. Rather than hyping up every idea hears. Claude will also be more direct if it’s suspect, someone might [00:46:00] be dealing with a mental health issue instead of only dropping subtle hints. So Paul, really cool. I mean, in any other news cycle, this would be a huge story.

[00:46:09] Obviously GPT-5 overshadows everything, but it was really cool to see Amanda giving us a peek under the hood of the system prompt too, because, I mean, correct me if I’m wrong, this is at least more transparent than it seems. Some of the labs have been about system prompts, at least until they’re forced to.

[00:46:27] Right? When there’s a huge change to a system prompt like they did when GPT-4o had the really controversial change in their personality. Or unfortunately, when Grock had some really recent unhinged racist behavior due to some system prompt issues. So, maybe talk me through what, what was cool to see about this system, prompt stuff here.

[00:46:47] Paul Roetzer: Yeah, Amanda’s sort of the lead on the personality behind Claude, so she’s great to follow. She, she’s pretty transparent on x about that stuff. the system prompts. You know, the labs aren’t very forthright in [00:47:00] them, but they’re not easy or they’re not hard to extract. So there’s a, I assume I think it’s a guy, I don’t know, but there’s a user on X called Pliny the Liberator.

[00:47:10] the handle is at elder underscore Plin. So we’ll put a link in and the guy drops the system prompts within like an hour after every major update. So he’s a hacker and he’s able to get into, you know, the system and figure out what the system prompts are. And then he publishes the entire system prompt on X.

[00:47:29] So like, if you ever wanna know what the system prompt is, just follow Pliny and you’ll know it. and I know he is been recruited by a lot of the labs. Anthropic in, in particular was trying to hire him recently, and he talked a little bit about that online. So the system prompts are intriguing.

[00:47:42] You actually learn a lot by seeing how they talk, you know, tell the systems to behave and things like that. semi-related, I listened like last week was my. I, I’ve been, I’ve been grinding to get these courses done and like my brain has been like on overdrive every day. So I’ve started a new thing where [00:48:00] like I just go for a run every night.

[00:48:01] So I run like three miles or something, and I’ve been listening to a lot of podcasts, so I put it on 1.75 speed. And you can get through a lot of podcasts, you know, taking a three mile run every night. and so I had like five I listened to last week that were all really good and maybe I’ll list them out in the newsletter this weekend, but one in particular, just to, to the whole point of the story, big technology podcast had an interview with Dario Ade.

[00:48:27] It was, Mike, you gotta listen to this interview. Pissed. Like, it was the most, I don’t, I don’t know, like he’s generally a pretty authentic guy and he kind of seems to wear his emotions on his sleeve a a little bit. But there was a quote where Jensen Wong, CO of Nvidia sort of accused him of being a doomed of like, and he, here’s, here’s the quote.

[00:48:51] I get very angry when people call me a dor when someone like this guy’s a dor. He wants to slow things down. He says, you heard what I just said. And he’s talking about [00:49:00] like his e efforts to like advance and accelerate ai. So my father died because of cures that could have happened a few years later. I understand the benefits technology.

[00:49:09] I’m sure you’ve heard the criticism. This is now the host asking this. I’m sure you’ve heard the criticism from people like Jensen who say, well, Dario thinks he’s the only one who can build this safely and therefore wants to control the higher, higher industry. Dario said, I’ve never said anything like that.

[00:49:24] That’s an outrageous lie. That’s the most outrageous lie I’ve ever heard it. And it just like he was, he was edgy. Yeah, like the whole thing. It’s fascinating about their, their model, their rivalry with openAI’s, how they make money, all this stuff, but like the domm and anthropics approach to safety and how they choose to release models when they release them, things like that.

[00:49:46] safety of the model. So we’ll put the link in. It’s, it is a really good interview. It’s like an hour long. but it’s worth it. It’s, it, it’s good. 

[00:49:57] AI and the Future of Work

[00:49:57] Mike Kaput: Alright, next up, we are still kind of trying to get [00:50:00] a clear picture of AI’s impact on the economy, and we might be making a little progress. So, first we got a report that outplacement firm Challenger Gray and Christmas announced that more than 10,000 US job cuts were directly linked to employers adopting generative AI in the first seven months of 2025.

[00:50:20] They also said that AI appears in four times as many descriptions compared to the previous period. Now, at the same time, though, according to some other reports, including one in the Wall Street Journal, a core question is baffling. Economists, if AI is so valuable in, say, replacing human labor or producing productivity gains.

[00:50:40] Why isn’t it showing up in terms of impact in the form of increased productivity at the macroeconomic level? Because so far economic economists say that AI is not showing up at all in GDP numbers, which is where they would expect to see AI’s impact if it was truly transforming the economy. [00:51:00] But according to a new study from researchers, including Eric Brisen, who we’ve mentioned before, he studies AI’s impact on the economy.

[00:51:08] AI’s impact may be showing up in some other numbers. So Brisen and his colleagues argue that while government data barely registers the value of generative ai, Americans gained an estimated $97 billion in what they call consumer surplus from free or low cost AI tools in 2024 alone. Now, the way they define and quantify this is.

[00:51:33] They basically estimated how much money a US adult would need to be paid to give up the usage of a free or low cost AI tool. And they estimated this based on a survey they ran at $98 per month. In other words, kind of the implicit estimate of the value that the user was getting out of those tools each month.

[00:51:53] Then they went and multiplied that by an estimated number of regular users of ai and they come up with that $97 [00:52:00] billion number. Essentially, they say consumers are getting $97 billion in value out of these tools. These are benefits that don’t appear in GDP because they accrue to users, not companies traditionally.

[00:52:11] GDP accounts only market transactions. So this kind of thing would be invisible. And Brin Sand’s colleagues say this is similar to the paradox that economists spotted with computers starting in the 1980s. You start to see the technology everywhere except in the productivity stats. So. Paul, it’s interesting to see, see real data on AI’s job impact those 10,000 jobs.

[00:52:37] Seems clear it’s having an impact. We know anecdotally through the conversations we’re having, it’s having an impact, but it’s not showing up in the economic data really. Can you maybe walk through the contradictions here in what we’re seeing? 

[00:52:51] Paul Roetzer: So the opinion piece is based on a forthcoming paper called GDP dash B, accounting for the value of [00:53:00] new and free goods.

[00:53:02] so I read this article three times. I think I was trying to like comprehend what they’re saying. so the way where, where I kind of landed on this, since this is a rapid fire item, the logic of the value not being counted in the GDP makes sense. So the reason they give as to why it’s not showing up at GDP is very logical and pretty straightforward.

[00:53:23] The math to get to 97 billion seems. Pretty subjective and like some math gymnastics. Like, it, it, it, it’s a really nice number to put in a headline, 97 billion, the consumer surplus concept and like how they calculate it by like saying, Mike, how much right would it take for you to not use Chad GPT? And you’re like, I don’t know, a hundred dollars.

[00:53:52] Like how do you, how do you come up with that number? So I, again, I I will withhold any judgment. I love the fact that we’re [00:54:00] doing this. I love that economists are trying to find other ways to measure value. I think it’s great and the paper itself may end up being exceptional and make perfect sense in the form of a 500 word opinion piece.

[00:54:12] It’s kind of hard to understand how they’re coming up with that number and how valid that number is. It makes for a nice headline though, and probably research worth. Reading through when it comes out. 

[00:54:24] Mike Kaput: Yeah. I feel like they should have waited for the paper to be Yeah. I don’t get it. It’s really, so 

[00:54:27] Paul Roetzer: it’s way too complex of a concept to try and do in, in a 500 word opinion piece.

[00:54:33] Mike Kaput: But, and I won’t go down the rabbit hole here since it is rapid fire. But it, the point here too is even if this research ends up being terrible, people are scratching their heads about like, ai, we’re seeing productivity gains in our own work. Is it, it’s just not diffused enough into the economy. Like where are the numbers showing up?

[00:54:50] But we’ve talked in the past, we are also sometimes skeptical. Are economists measuring the right thing? Are they aware of the productivity gains happening in other areas? So it’s definitely [00:55:00] a relevant conversation that we need to keep tabs on. 

[00:55:02] Paul Roetzer: Yeah, it’s just like, and again, I, again, I don’t wanna spend too much on this, but this is what it says.

[00:55:07] Rather than asking what people pay for a good, we ask what they would need to be paid to give it up. Hmm. So let’s. Play this out with chat. GPT. Let’s assume you were a chat GPT user maybe paying 20 bucks a month, who was in the camp that had never tried the reasoning model and didn’t know the full value of the system?

[00:55:23] Mike Kaput: Yeah. 

[00:55:23] Paul Roetzer: So I ask you, as someone who’s never used the reasoning model, right, what would, what would it take for you to give it up? And it’s like, I don’t know, 25 bucks, 50 bucks, a hundred bucks. You ask me or Mike, like, dude, I don’t even know. 5,000. Like, I, it’s just worth a lot of money to us. And so then it says, our own survey found their average valuation to forego these tools for one month is $98.

[00:55:44] Multiply that by 82 million users and 12 months in the $97 billion surplus surfaces. It’s like, wait, what? It just seems like a quite a leap to get to 97 billion. But again, I like the direction and I’m anxious to see the actual paper. They’re respected [00:56:00] economists and author. So yeah. 

[00:56:02] OpenAI “Universal Verifiers”

[00:56:02] Mike Kaput: Next up, a couple new articles are giving us a peek under the hood of chat.

[00:56:05] GPT. One of them tackles it from a highly technical perspective. The other from a behavioral one, both are pretty important to understand. If you want to understand where chat GPT and AI is headed. So first, the information reports that openAI’s is now using something called a quote, universal verifier as a quote, secret weapon within chat GPT.

[00:56:27] So basically, a universal verifier is a technique for checking whether an AI’s answers are not just plausible, but actually correct. Basically, like a referee AI model grading another model’s work, pulling in research from multiple sources. For example, in math, it would essentially have AI verifying each step that AI follows to solve a math problem.

[00:56:50] The information speculates that universal verifiers may have actually helped open AI’s latest model score, a gold medal At the International Math Olympiad, which we talked about in [00:57:00] past weeks, researchers say the approach could boost performance in domains that are subjective or hard to score from business decision making to creative tasks.

[00:57:09] Now second OpenAI themselves published a post called What We’re Optimizing Chat GPT for In it, they kinda lay out a short philosophy for how they’re optimizing chat, GPT. They say they are not trying to keep you in the app longer, they’re trying to help you get what you need and get back to your life.

[00:57:29] They wrote, quote, instead of measuring success by time spent or clicks, we care more about whether you leave the product having done what you came for. They also point out that people are increasingly relying on chat GPT for emotional and personal needs, and some new updates reflect that. Chat. GPT will now give gentle break reminders during long sessions.

[00:57:51] It will refuse to make decisions for you on high stakes, personal matters, and provide more thoughtful, grounded support when you are struggling. Apparently [00:58:00] OpenAI says they have worked with more than 90 physicians in over 30 countries, plus researchers in mental health and human computer interaction. To fine tune how the model responds in sensitive moments.

[00:58:11] So Paul, these are two really different looks at how chat GPT works under the hood, but I think they’re both useful to understand. So maybe first let’s really quickly touch on why do universal verifiers matter and then maybe talk about open openAI’s as like emotional and behavioral approach to how this works.

[00:58:30] Paul Roetzer: The verification gap that we’ve talked about numerous times is sort of illuminates why the verifiers would be so valuable. It’s the more you can have other agents or AI that can look at the output. So like if you get a deep research product that’s 42 pages long mm-hmm. And the human has to go through and verify it.

[00:58:50] Well if they build a really smart verifier on top of that and it checks all the stats and you know, makes sure all the citations are correct and the data’s real and you know, does [00:59:00] lookups of those things, like, it’s just increasingly able to do higher value work for humans. So they’re gonna be critical not only in the training of the models, the reinforcement learning of the models, but the actual use of them being a secret weapon.

[00:59:14] Seems like it’s probably a bit of an exaggeration. I know for a fact the other labs are working on these kinds of things. They’ve talked about them publicly, so I can’t imagine, I mean, maybe open eyes a month or two ahead on their use of a verifier. But, that seems like a pretty standard practice within labs to be building agents that can do the verification process.

[00:59:35] Mike Kaput: And, you know, it did strike me too, that some of their commentary around kind of the other side of it, like the emotional, behavioral stuff mm-hmm. Like, was really interesting. I could, like, I feel like there were a couple companies they weren’t naming that they were, that they were taking aim at in saying, you know, we’re not trying to engage you on the app and keep you clicking and eyeballed on it, et cetera.

[00:59:57] Paul Roetzer: Yeah. I think it was also part recruiting [01:00:00] and part retention of talent. They’re basically saying like, listen, if you go work for XAI or meta. You’re just selling yourself off to monetize this technology and keep people on platform. That’s what they need to do with their social platforms. it’s clicks and time on site and daily active use, hourly active use, whatever their, their metrics are.

[01:00:22] And that’s not what we’re doing here. So it’s sort of like a mission thing of like, it’s more than money, like we’re here to actually make the world better, not make more money on ads and clicks and time on site. So yeah, it was a pretty not so subtle dig it. I would imagine meta and X AI in particular.

[01:00:42] OpenAI Offers ChatGPT to the Federal Workforce

[01:00:42] Mike Kaput: All right, next up, OpenAI has struck a deal to make chat GPT Enterprise available across the entire US Federal Executive branch for the next year. So under the agreement, each agency of participates will for just $1 per agency, get access to openAI’s top [01:01:00] models. And get an extra 60 days of unlimited use of advanced tools like deep research and advanced voice mode.

[01:01:06] This also includes some custom training, a dedicated government user, community, and consulting support from Slalom and Boston Consulting Group. So obviously this program aims to cut time spent on red tape and paperwork, freeing public servants to focus on core mission, opening eyesight, some early pilots that show Promise.

[01:01:26] In Pennsylvania, employees saved about 95 minutes a day on routine tasks in North Carolina, 85% of staff and at 12 week trial reported positive experiences. So Paul, the focus on the executive branch is interesting. They call out literally in the announcement, the AI action plan. So I’m guessing this is somewhat related to or motivated by that.

[01:01:49] this definitely seems like a trend of openAI’s getting more embedded in federal and local governments, doesn’t it? 

[01:01:56] Paul Roetzer: Yeah. and obviously the. The [01:02:00] administration is just very, very aggressively moving in and doing deals on these things. Like we had, it came out over the weekend that Nvidia is now allowed to sell their H 20 chips, I think it is, to China.

[01:02:10] Yeah. And then I think Financial Times had the story that the, in essence bribed the government to allow to happen. So like 15% of the revenue for all those sales goes back to the federal government. So they, they basically bought an exclusion on the tariffs. 

[01:02:24] Mike Kaput: Yeah. 

[01:02:25] Paul Roetzer: And so we know that the government is wheeling and dealing all over the place.

[01:02:28] And so yes. On, on its surface. Great. It, it is probably gonna make for more efficient government, no doubt. My guess is sometime within the next 30 days, the information or Financial Times or some Bloomberg, somebody has the story of what was the quid pro quo here? Like what, what did Interesting. Yeah. Yeah.

[01:02:43] What did opening I get in exchange for giving the federal government these licenses for a dollar? Like it’s, yeah, I don’t know. It’s, it’s just, there’s always layers to this stuff, but. On the surface, great. It’ll make for more efficient governments if they’re trained how to actually use this stuff. 

[01:02:59] ElevenLabs Launches AI Music

[01:02:59] Mike Kaput: Right?[01:03:00] 

[01:03:00] All right, next step. 11 Labs, which is best known for its ai, voice technology, is now stepping into music With 11 music, an AI generator that can create fully produced songs from a simple text prompt in minutes. It can generate any genre or style with or without vocals and blend instruments and traditions into seamless original tracks.

[01:03:20] It is apparently built for both creativity and commerce. It has licensing options for film, TV, ads, gaming podcasts, and more. And the company frames it as a way for creators to kind of skip the stock music grind and produce fully unique soundscapes. Interestingly, AI expert and copyright advocate, ed Newton Rex, who we talk about often posted about how the company’s approach, at least initially seems to differ from market incumbents.

[01:03:45] He said co-founder of 11 Labs confirms that their new AI music models trained only on songs they’ve licensed. That is really good to see. When a handful of AI companies try to tell you generative AI can only be built with scraped copyright work, remember that the majority [01:04:00] of AI music models license their training data, including now 11 labs model.

[01:04:05] Very embarrassing for the couple of AI music companies that are known to train on people’s music without permission. Now, Paul, ed Newton, Rex and some follow up comments in this thread on x did say he’d like to see evidence backing up the claim of 11 labs as co-founder. He also asked a few times if they trained their voice model only on audio they’ve licensed.

[01:04:25] Did not get an answer, but at least this does seem like a step in the right direction. 

[01:04:31] Paul Roetzer: Yeah, the tech’s awesome. The is like anything else, like a whole, all these tools are great image generation, video generation music or whatever. there’s always this underlying Yeah. But yeah, it’s train time legally at some point.

[01:04:47] I, I mean, the story’s not gonna go, I don’t want the story to go away, per se, that like, I think this is. There’s people like Ed need to keep the pressure on these labs and find ways to, [01:05:00] compensate creators. I don’t know the answer to how that happens, but so many of the AI labs just seem to kind of like moved on.

[01:05:06] It’s like, man, of course we took their stuff, like, leave us alone. Like, it’s the general gist of how the labs respond whenever they’re called out on it. It just is what it is. So, I don’t know. I don’t know when we’re gonna finally have like a court case that changes anything or some industry agreement that changes things.

[01:05:24] But up until then, every time we talk about awesome something is, there’s always the, yeah, they, they did, they still copy material. Like 

[01:05:32] Meta Buys AI Audio Startup

[01:05:32] Mike Kaput: now some more, AI audio news Meta has quietly snapped up a company called Waveforms, which is a fast rising AI voice startup for an undisclosed s It is Meta’s second major AI audio acquisition in just about a month that follows their purchase of play ai.

[01:05:50] And this is all part of their new AI unit. Super Intelligence Labs Waveforms was founded only eight months ago, but had already raised $40 million from Andreessen [01:06:00] Horowitz, hit. They hit $160 million valuation. Their company, the company’s tech, is focused on passing the so-called quote speech Turing test.

[01:06:08] So basically making AI speech indistinguishable from humans and on building what they call emotional general intelligence to detect and respond to emotional cues. Two of the co-founders, Alexis Conno, and a former meta and openAI’s researcher who helped develop GPT-4o’s, advanced Voice and Corale Lamare, former Google ad strategist, both of them have reportedly joined Meta as part of this.

[01:06:33] So Paul Meta acquires play AI back in June. That’s a, they, that’s a quote, a startup that uses AI to generate human sounding voices. Waveforms is building emotional general intelligence. We’ve been talking in past episodes about meta’s aspirations to build personal super intelligence. I don’t know. This really seems to me like we’re heading in the direction of meta building, hyper personalized voice assistance or companions.

[01:06:58] Like what do you think 

[01:06:59] Paul Roetzer: [01:07:00] definitely seems to be going in that direction? I mean, I think, Zuckerberg’s been on record in recent podcasts talking about voice, plus glasses. Yep. You know, they basically think that the touch goes away as like a largely as an interface, and that most of your interactions with intelligence, with agents, with assistance happens through voice, and your interactions with the world around you.

[01:07:26] and so it makes sense that they would be kind of making lots of investments in this direction. and again, it gets, it gets back to that distribution question. Like, obviously openAI’s is going in the same direction. They’ve been putting a ton in voice. It seems like openAI’s probably had a lead.

[01:07:41] Maybe they still do on voice. Google’s obviously making major plays into voice. I do think, like as you were saying this, like the one thing that crossed my mind, I dunno if you have this issue, Mike, ’cause I think you use, ChatGPT voice as well. I love it, but I often use it when I’m driving. Yeah. And it drops [01:08:00] in like dead zones all the time.

[01:08:01] Definitely. It drives me crazy. And that goes to the whole, like the open source or like the opportunity for Apple to put a smaller voice model on the phone, like on device where I don’t have to be going off device to, to have that conversation. Those are like the windows of opportunity for someone like a Google with Pixel or Apple, you know, with the iPhone where I don’t have to leave and I can just have that uninterrupted voice conversation where like I’m talking, talking, talking, and I’m like, three minutes goes by and then I realize I lost the connection and the voice wasn’t there anymore.

[01:08:32] And you’re like, oh, everything I just said was perfect. I don’t want to have 

[01:08:36] Mike Kaput: to repeat that a hundred percent. That happens all the time. And I feel like despite how amazing in man’s voice mode is, I feel like voice is under. We rated or underutilized at the moment. So yeah, it’s not only just having it on device, but the type of device, right, like phone is the form factor right now we know openAI’s is coming out with some type of device.

[01:08:55] We don’t know what wearables maybe is the play. I feel like [01:09:00] Air Pods, yeah, air Pods would be incredible. just feels like this could be a real big unlock. 

[01:09:05] Paul Roetzer: Yeah, and it’s, it just seemed like a year ago, openAI’s was knocking on the door. Like they had basically solved it with their whisper, you know, technology and built it in and then it just feels like they lost momentum or they ran outta compute.

[01:09:17] Like I could be, it’s very possible they just couldn’t launch it because they, they didn’t have enough compute to do the all the other stuff. Yeah. But again, these are where those Apple Google where like the stalwarts, the people with the distribution with the devices, like that’s where the opportunity is.

[01:09:32] I assume whatever they’re building with Johnny, ive like, that’s probably tied to voice in some capacity. So yeah, I think there’s just gonna be a lot more to come with voice, you know, probably still in 2025. Yeah, for sure. 

[01:09:46] Google AI Pro for Students

[01:09:46] Mike Kaput: All right. Last but not least, we have as our last topic here, Google is making a big push to offer its most advanced AI tools to college students for free.

[01:09:56] It is committing a billion dollars to AI education, [01:10:00] training, and research in the us. So starting now, students in the US and they also added on Japan, Indonesia, Korea, and Brazil can sign up for a free 12 month Google AI Pro plan. That includes Gemini 2.5 Pro. For homework help and research notebook, LM for organizing ideas.

[01:10:17] VO three for AI generated videos, higher limits on Google’s AI coding agents, and two terabytes of storage. This release also debuts guided learning, which is a mode in Gemini that doesn’t just give answers, but actually walks students through problem step-by-step to deepen their understanding. In the us.

[01:10:36] Google also reports that over a hundred colleges have already joined their new AI for Education Accelerator, which is offering free AI training and Google career certificates to college students. CEO Sundar Phai says the goal is to put top tier AI in students’ hands and teach them how to use it well, helping them thrive as the first true generation of what he calls quote AI natives.

[01:10:59] Now, [01:11:00] Paul, I’m, I feel like this might have flown a bit under the radar with all the other news. I mean, I would have to benchmark it, but a billion dollars in commitments to US schools over three years seems pretty significant. the offer of free AI training and Google career certificates to every student, I mean, I just feel like I have a fair amount of conversations.

[01:11:19] I know you did too with teachers, higher ed institutions, this feels like something that could really move the needle if they stick the landing on it. 

[01:11:27] Paul Roetzer: Yeah, it’s great to see. And I don’t know what the connection is to the, like, in April there was the executive order from the White House on advancing artificial intelligence education for American youth.

[01:11:37] And then they just came out, I think it was like last month or something with, the policy plan because the executive order basically said that we would, policy the United States to promote AI literacy and proficiency among Americans by promoting the appropriate integration of AI into education, providing comprehensive AI training for educators, and fostering early exposure to AI concepts and technology to develop an AI ready workforce in the next generation of [01:12:00] American innovators.

[01:12:01] So that was like saying, Hey, we’re gonna do this. We’re gonna create a task force, and in 90 days, 180 days, whatever, this is the plan. I don’t know if this is connected to that, and a commitment from Google related to that, but it, I, it would seem they’re, they’re very closely aligned, at least. So I, yeah, I think this is great.

[01:12:17] I think we’re seeing more and more of this from the major AI companies, whether it’s Microsoft, openAI’s, Andros been releasing some great stuff. And so I would say like, as you’re building out, and ironically like I was building the AI Academy course this morning about building internal AI Academy.

[01:12:32] So this is very, very top of mind for me. think about these things as you’re building personalized learning journeys for your teams. It’s like, okay, we’re gonna have our core curricul but what can we pull from a Google? What can we pull? And obviously this is more K to 12, but conceptually, like, what can we pull from these different resources that can really enhance our people and prepare them for the future of work?

[01:12:52] And as you’re even starting to hire, like looking at what kind of curriculum have people gone through with their AI education, where are they already at with their [01:13:00] understanding and competency in this stuff? So yeah, it’s awesome to see this, you know, really large focus, not just from Google, but the White House and other major companies that AI literacy is, is absolutely critical to the future of work and innovation, not just in the us 

[01:13:16] Mike Kaput: but beyond that.

[01:13:18] Yeah, a hundred percent. All right, Paul. We made it through GPT-5 week. Thanks for breaking it. It all down. You feel 

[01:13:24] Paul Roetzer: different. Like I think my overall is like, it seems awesome. It’s just like after a year and a half of like waiting Yeah. You just, you thought the world was gonna change after GPT-5 came out.

[01:13:35] It feels like they did more backtracking than like actually accept. 

[01:13:40] Mike Kaput: Yeah. I don’t know. We’ll see. I feel like it’s like going to be much more impactful than I even realized now. Yeah. But it’s gonna be a lot more subtle. It’s, you know, like when 

[01:13:47] Paul Roetzer: we look back in like 30 days, 90 days, it’s like, so, oh wait, that actually was a bigger deal than maybe it’s those first 48 hours.

[01:13:54] Mike Kaput: Hey, I, what I just said could be out of date by this afternoon when Google releases something. But yeah, I do [01:14:00] think that we’re going to look back and be like, Hmm, okay. That might’ve been a subtle turning point. But again, it just shows like the bubble, the hype is outta control. 

[01:14:07] Yeah. and that we all live that.

[01:14:09] Paul Roetzer: Anyone listening to this show, at least, we generally live in a bubble and most of your peers have no idea that GPT-5 came out or what it is like. It’s funny, my, my dad who listens to the podcast every week, he’ll often like text me things and he texted me, I think the morning after and he goes, nothing on the news today.

[01:14:28] So he was like watching the news to see if GPT-5 was even talked about in mainstream media. And he is like, nothing. Wow. And so that, again, it tells you like we’re not to the point, like we’re waiting, waiting, waiting for a year and a half. The general public like care less. It’s a non-event to them.

[01:14:45] Mike Kaput: Until the next Studio Ghibli filter goes viral or something. Right. All right, Paul. Well thanks. Good stuff. Thanks again. 

[01:14:52] Paul Roetzer: All right. Thanks everyone. We’ll talk to you next week. Thanks for listening to the Artificial Intelligence Show. Visit SmarterX dot AI to [01:15:00] continue on your AI learning journey, and join more than 100,000 professionals and business leaders who have subscribed to our weekly newsletters, downloaded AI blueprints, attended virtual and in-person events.

[01:15:12] Take in online AI courses and earn professional certificates from our AI Academy and engaged in the Marketing AI Institute Slack community. Until next time, stay curious and explore ai.



Google DeepMind’s Genie 3 Could Be the Virtual World Breakthrough AI Has Been Waiting For


Google DeepMind just pulled back the curtain on Genie 3, a real-time, photorealistic “world model” that can conjure interactive environments straight from a text prompt. Continue reading “Google DeepMind’s Genie 3 Could Be the Virtual World Breakthrough AI Has Been Waiting For”