Episode 33 Apr 06, 2025 11:04 3.5K views

How Gemini 2.5 Pro Just Changed the Future of AI

About This Episode

In this episode of the AI Agents Podcast, we dive deep into the game-changing release of Gemini 2.5 Pro and how it's redefining AI capabilities across reasoning, creativity, and code generation.

Host Demetri Panici explores the key improvements in Gemini 2.5 over previous models, including its standout performance in complex tasks like advanced mathematics, logical reasoning, logistics planning, and long-form text responses.

With detailed comparisons against OpenAI's o3-mini, the episode highlights Gemini's powerful long-context capabilities and its surprisingly human-like communication style.

Listeners will get a first-hand look at how Gemini 2.5 Pro handles creative prompts, technical outputs, and futuristic thought experiments like brain-computer interfaces—showcasing its ability to synthesize information and adapt across disciplines.

Whether you’re an AI enthusiast, developer, or just curious about the latest in artificial intelligence, this episode unpacks why Gemini 2.5 might just set a new standard for state-of-the-art reasoning systems.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
⏰ TIMESTAMPS:
0:00 - Introducing The Launch Plan Output
0:32 - Meet The Host And Show Overview
1:08 - Gemini 2.5 Release Breakdown
2:24 - AI Benchmark Comparisons
3:28 - Live Code Demo And Game Creation
4:49 - Prompt Testing In Real Time
6:14 - Analyzing Creative Writing Prompts
7:21 - Logic-Based Prompt And Math Validation
8:02 - Socioeconomic Impact Of BCIs
9:19 - Final Thoughts On Gemini Vs O3
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Sign up for free ➡️ https://link.jotform.com/AtHF72NuSU
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Follow us on:
Twitter ➡️ https://x.com/aiagentspodcast
Instagram ➡️ https://www.instagram.com/aiagentspodcast
TikTok ➡️ https://www.tiktok.com/@aiagentspodcast
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

Transcript

All right. So, this gave a response that was correct in the math, I'd presume. And I like the format that it's spitting it out in. Um, it gives a breakdown of all of the thinking here, but let's go into the output. That's more important. Okay. Defines the goal. Lists the rocket properties. Okay. Checks the budget constraint and then does a calculation for overall success. Cool. Compare plans and select the best. Nice. Final launch plan. Showcases the different tonnage. Okay, that seems fine. Hi, my name is Demetri Panici and I'm a content creator, agency owner, and AI enthusiast. You're listening to the AI Agents Podcast brought to you by Jotform and featuring our very own CEO and founder, Aytekin Tank. This is the show where artificial intelligence meets innovation, productivity, and the tools shaping the future of work. Enjoy the show. Is it really happening?

Are we finally getting the Gemini model that's going to change the world? Am I finally going to learn how to start an episode of this podcast with a normal intro? Well, we already know the answer to one of those questions and it's about me with intros. But in all seriousness, Gemini 2.5 is released. It is an update by Gemini that's pretty much done away with the conception that it is, quote, not as good at thinking or reasoning. It hadn't hit that complex knowledge level yet that some of the other items here have. And you can see very simply in some of the examples here of the metrics that we have a product with higher scores than OpenAI's o3-mini on a myriad of different tests, including mathematics. Code bench, it's still a little bit worse. SimpleQA, much better. Visual reasoning, it's doing

well in comparison to 4.5 and Sonnet. Humanity's Last Exam, it's interesting, it scored the best on too. What is that? Humanity's Last Exam. It's a final closed-ended academic test of human knowledge. It spans a wide range of subjects from classics to ecology and involves questions that even seasoned experts would find challenging. Okay, so it's just showcasing the sheer amount of info it does know, which is good. And it's gotten much better at long context. Taking a look at this here, you can see a better graph that showcases the improvement versus the other products, like how it's doing better in mathematics than a lot of the models. It's doing better in science than a lot of the models. And for advanced coding, as you can see, this is an example that was given where it says create your own dinosaur game like the one that

you see when your internet is out to lunch, as in it's not working. And you can, in Google, usually play the dinosaur game. So take a look at it here. Make me a captivating game. Here's some code. Here's some code. It's showing the amount of time it's taking on the bottom left there. And from here, you see the copy and paste into this JavaScript preview, and look at that, the Dino Runner is able to be created. Granted, it is showcasing how to make a game that is a Google game, unless they took it from someone and I'm unaware. But yeah, it's here and it is definitely a bit better. So, from the standpoint that I see here, I am on Gemini Advanced. I do have that product because I pay for it cuz why not? What I want to do here is I want

to showcase some great examples of it doing well. So, I'm actually going to ask it. There's always a good way to interact with these AI. What are the three best prompts you can give me to test the capabilities of Gemini 2.5 Pro versus OpenAI o3? And I'm going to open up my OpenAI in another tab. You can see it's doing a bunch of reasoning, which we're not used to. Usually Gemini is fast, fast. All right. So let's try it out with three different things. We got prompt one. Write a short story. It's good for the creativity standpoint. Then complex logical reasoning. Okay. Planning and code generation. Cool. And number three, information synthesis. Okay, let's do it. Let's open up three tabs. Start new chats. All three. And we'll run it. Okay, we're going to grab prompt one. We're going to test it out.

Then we're going to go to prompt two. Test it out. Now, I do want to remind you that I don't have o1 pro. I just have the team plan. I'm not going to pay $200 a month at the moment for that. All right. So, we got all three of these going, right? It's taking about 20, 30 seconds to respond. So, let's go. This is prompt number three. This is prompt number two and prompt number one. All right, cool. So, what we have here is all three of these going, right? And this one is finished. So, let's just take a look first at the short story. The slow pulse of the deep earth is my clock. Ooh, dang. Sun warmth on my upper reaches. I like this. This is kind of kind of nice. The cool kiss of the rain on my bark. Ooh, the relentless

tug of gravity on my limbs. These are my events. I have felt the world tilt through ages. The ice crawl close and then retreat. The rivers shift their beds like slumbering serpents. Okay. Okay. Now, we got some bars here. Now, granted, it's a little bit, I don't know what the word is I'm looking for. A little odd in regards to like how gratuitous it's being with wording, but yeah, it's definitely unique. It's definitely melancholic, right? And detached. Before the fleeting ones, there were only the slow patterns. The green spread and receded with the great breaths of warmth and cold. The sky fire danced. The ground shuddered and settled. Okay, you're being very grandiose in the way you're talking, but I like it. Now, first of all, let's test. I think it got the tone decently right. Let's do a little word test. wordcounter.net. 644.

You can't count. That sucks. I feel like most of these AI products can't count. It's very odd. All right, let's go into write a short story. I have stood here for more than a millennium, rooted deep in the earth and stretching my limbs to the sky, witnessing the slow march of time. I am an ancient oak, sentient witness to eras that the fleeting lives of humans can scarcely imagine. I recall when the first human footprints disturbed the soft earth near my roots. Okay, this is good. It's not as grandiose. Let's see how accurate it was regarding the word count. Let's do a test there. 553. So, one did less, one did more. It's interesting. Gemini did more. You don't usually get that. By the way, just to call out, Gemini 2.5 has a 1 million token context window, but it has a 2 million one

coming soon, which is going to be very interesting. So, and then when it comes to o3, it seems like o3-mini has a context window of 200K. So, that's something to note. All right. Now, let's go back to the logistics option. This is essentially a good test because of the fact that what it does is it's complex logical reasoning. So, you're managing a logistics situation for a lunar base supplied by three types of cargo rockets. Type A, type B, type C. Okay, it breaks down different success rates and costs for each. You need to deliver exactly 115 tons of critical supplies within a budget of $60 million. Develop a launch plan. Specify the number of each rocket type to use that minimizes the overall risk of failure. Maximize the combined probability of all required launches succeeding. Okay, cool. All right, so this gave a response

that was correct in the math, I'd presume. And I like the format that it's spitting it out in. Um, it gives a breakdown of all of the thinking here, but let's go into the output. That's more important. Okay. Defines the goal. Lists the rocket properties. Okay. Checks the budget constraint and then does a calculation for overall success. Cool. Compare plans and select the best. Nice. Final launch plan. Showcases the different tonnage. Okay, that seems fine. I don't think there's anything wrong with this. And then if we go to the lunar base cargo planning, this is once again just seeming to be shorter. Granted, I will point out that the outputs here are odd because of the fact that this is like a way larger text size. I mean, like physically. So, let's just check the output length. 717 words versus 536. So, it's odd. Google's actually the longer one here for the most part and the more detailed explanations. That is surprising. Usually o3 and o1 are crazy

with that sort of thing and go off the walls. All right. So next we have: analyze the potential long-term (next 20 to 30 years) socioeconomic and ethical consequences of brain-computer interfaces. Ooh, as they become widely accessible and capable of high-bandwidth thought-to-text and thought-to-machine control. Consider the impacts on communication and language evolution, education and skill acquisition, employment and economic inequality, personal privacy and mental surveillance. Ooh, I will say, by the way, just a thought. To me it really does seem, when I get responses out of this new Gemini, you'll see it too when you look at it. Chat says, below is a detailed structured analysis. Here it says, okay, let's analyze the potential long-term socioeconomic and ethical consequences of widely accessible high-bandwidth brain-computer interfaces. So it's speaking like a person, which is a little weird

in comparison because I do feel like the more deep thinking models have had a problem doing that. But shout out, I've had no errors on all three of these. It was pretty much a perfect output in regards to it actually responding to everything I asked. And it gave real examples. And by default, since it is on Google, I just got to point this out. It is going to do search, where this is only within its ecosystem. Obviously, I could press search to do something with it. However, that is kind of important in my opinion. The fact that it is automatically gonna find sources without me needing to say, "Hey, look things up and give it support." Overall, I'm not going to make any claims as to whether the ethics of the AI are good, or about the ethics on AI, for either of them. I just

noticed these little things. The formatting is actually pretty cool inside of OpenAI. I think it's actually a little bit better formatted of a response. However, and that's because of the sizing of the text, which is odd, but I'm sure if we go into, I'm actually not even sure. Can we go into the settings and change that? I doubt it. It's usually not something that like they're going to give you to. No, I see nowhere to change it. So, it giving more detailed responses for all three was obviously weird, especially due to the fact that o3 has always been such a long responder, but this gave a longer and more humanlike speech to it. However, I do think the creative speech thing here was way over the top, though. It is seeming to get there and becoming more and more creative. So, they claim it's

more powerful than o3. Who knows? Everyone's saying that it was a big win. Everyone's saying that it's great at coding. I'm feeling pretty good about the fact that it's competing and improving above and beyond OpenAI right now cuz Google, I felt like for a while, was behind. And I just love that they're jockeying for position. I just love it. Uh, glad they got rid of the name Bard and I'm glad that they're jockeying for position and will continue to in this upcoming year. So, with that being said, let me know your thoughts on Gemini 2.5. Pretty interesting outputs, solid reasoning model, and I'm excited to see what you can do with it today. Please leave us a like and a review on Apple Podcasts, Spotify, whatever platform you watch on. Thank you so much for watching and we'll see you in the next

one. Peace.
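
Editor's note: the lunar logistics prompt from the episode (deliver exactly 115 tons within a $60 million budget using three rocket types, and maximize the chance that every launch succeeds) is small enough to check by brute force. The sketch below is one way to do that in Python. The per-rocket capacities, costs, and success rates are made-up placeholders, because the episode never reads the real figures aloud, and `best_launch_plan` is a hypothetical helper name, not anything either model produced.

```python
from itertools import product
from math import prod

# ASSUMPTION: placeholder specs, not the numbers from the episode's prompt.
# name -> (capacity in tons, cost in $ millions, success probability per launch)
ROCKETS = {
    "A": (10, 6, 0.98),
    "B": (25, 12, 0.95),
    "C": (40, 18, 0.90),
}
TARGET_TONS = 115  # stated in the episode
BUDGET_M = 60      # stated in the episode, in $ millions


def best_launch_plan(rockets, target_tons, budget_m):
    """Try every combination of launch counts and keep the feasible plan
    with the highest probability that all launches succeed.

    Returns (counts_by_type, total_cost, success_probability),
    or None when no plan meets both constraints.
    """
    names = list(rockets)
    # No type can fly more often than it takes to cover the target alone.
    max_counts = [target_tons // rockets[n][0] for n in names]
    best = None
    for counts in product(*(range(m + 1) for m in max_counts)):
        tons = sum(c * rockets[n][0] for c, n in zip(counts, names))
        cost = sum(c * rockets[n][1] for c, n in zip(counts, names))
        if tons != target_tons or cost > budget_m:
            continue  # must hit the tonnage exactly and stay within budget
        # Launches are treated as independent, so probabilities multiply.
        p_all = prod(rockets[n][2] ** c for c, n in zip(counts, names))
        if best is None or p_all > best[2]:
            best = (dict(zip(names, counts)), cost, p_all)
    return best


if __name__ == "__main__":
    print(best_launch_plan(ROCKETS, TARGET_TONS, BUDGET_M))
```

With these placeholder numbers the search settles on four type A and three type B launches, which uses the full $60 million. Swapping in the real figures from the prompt would let you verify either model's final launch plan instead of presuming the math is correct.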
