The Future of Tech and AI
About This Episode
Host Demetri Panici sits down with David Colwell from Tricentis to discuss the transformative power of generative AI, the shift toward automated reasoning, and the challenges of developing intelligent systems that can truly understand and execute human-level tasks.
David shares insights from Tricentis’ journey into AI-driven quality assurance and the surprising capabilities (and limitations) of emerging large language models like GPT and Claude.
Whether you're curious about how AI is impacting software testing, the surprising origins of generative AI, or what recursive models might mean for the future of work, this conversation offers a deep dive into the evolution of tech and automation.
Learn how tools, context, and reasoning models are shaping tomorrow's workflows—and why staying ahead in this space is as much about understanding systems thinking as it is about cutting-edge code.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
⏰ TIMESTAMPS:
0:00 - Into The World Of Legacy Code
1:32 - What Tricentis Is All About
3:35 - Building The AI Department
6:02 - Early Struggles With Generative AI
9:41 - Breakthroughs In Agentic Automation
12:01 - Understanding Reasoning Models
16:11 - Rise Of Superintelligent Models
19:02 - Why AI Feels Like Magic
24:48 - The Power And Limits Of Tool Use
39:14 - How Tricentis Adapts With AI
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Sign up for free ➡️ https://link.jotform.com/IMxV1mMywM
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Follow us on:
Twitter ➡️ https://x.com/aiagentspodcast
Instagram ➡️ https://www.instagram.com/aiagentspodcast
TikTok ➡️ https://www.tiktok.com/@aiagentspodcast
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Transcript
You would be amazed at some of the stuff I've seen, like, uh, programs that were built in the 60s, or, uh, in a language named after a dude who passed away 10 years ago, and no one knows how to maintain it anymore. And our tools just seem to work on all of it, which is phenomenal. And so that's kind of where Tricentis, uh, started to play: big companies with big QA problems that need really, really comprehensive solutions to them. Hi, my name is Demetri Panici and I'm a content creator, agency owner, and AI enthusiast. You're listening to the AI Agents podcast, brought to you by Jotform and featuring our very own CEO and founder, Aytekin Tank. This is the show where artificial intelligence meets innovation, productivity, and the tools shaping the future of work. Enjoy the show. Hello, my name is Demetri
and welcome back to another episode of the AI Agents podcast. In this episode, we are interviewing David Colwell from Tricentis. How are you doing, David? Doing great today. How are you, Demetri? Doing awesome. We're really excited to have you on. It's been, uh, been good chatting with you in the pre-screen. We were just, uh, nerding out about a new release that just came out. As we're recording this, about an hour previous to the, uh, start of our call, Claude 4.0 dropped. So, I was just learning a little bit about that from you, and that was fun. The AI nerd came out. Yeah, the AI nerd came out. We forgot we were going to record a podcast. Um, so you know, just to get started, I want to learn a little bit more about yourself and, uh, Tricentis.
Yeah. So, um, a little bit about me. I came from the world of quality assurance. Development was more of a hobby for me than a primary job. So I grew up in, you know, QA, doing quality assurance for organizations, consulting for organizations, uh, building QA practices, and the whole time it kind of struck me that, uh, QA was in need of really good tooling. And so I started, you know, using things like Selenium, automated tooling. They always felt a little weird, like we were asking people that didn't code for a living to code for their living. And so an opportunity came up at, uh, Tricentis in the Sydney office and they said, "Hey, do you want to come over and join us?" And so I jumped over onto that side of the fence. Tricentis is the leader in QA tooling. They offer tooling for managing QA,
optimizing practices, tracking code through to tests, automating literally anything. You would be amazed at some of the stuff I've seen, like, uh, programs that were built in the 60s, or, uh, in a language named after a dude who passed away 10 years ago, and no one knows how to maintain it anymore. And our tools just seem to work on all of it, which is phenomenal. And so that's kind of, uh, where Tricentis, uh, started to play: big companies with big QA problems that need really, really comprehensive solutions to them. So I won't rattle through our customer list. There's a ton of them. You can check it out on the website, which is tricentis.com. But my role within Tricentis, I actually kind of started out on the, uh, solution architecture side of it, which I was sort of trying my hand in more of
the, uh, customer-facing side of the role after having spent all my days in technology and back of house. That lasted about two years. Uh, I built the APAC division of the solution architects, and then the R&D side got tired of me suggesting all the cool new things we could do with the product and said, hey, how about you come and build them instead? And so I, uh, ended up back on the tech side. Um, I built our AI department at Tricentis in 2018. The goal of it was always, hey, there's a lot of, uh, there's a lot of labor in quality assurance that's not a ton of fun. Things like, I've got a test case and it runs, and then it breaks because the developers decided to refactor the front end code, and now the business users are saying, I thought you said this thing was going
to save us time and now it's breaking, what gives? And so I thought, all this labor is just a really good opportunity to apply AI, to see if we can make AI approximate good solutions. Um, we started out writing patents and designing neural networks for object detection and regression networks and all the things that used to be called AI prior to 2023, and, uh, you know, the generative AI things. It was actually kind of funny. We started in generative AI in 2019, I think. 2019. Wow. Yeah. You may not know this, but generative AI started in 2017. Uh, so that was the first transformer models. Um, back then there was some kind of Sesame Street naming convention. So the first models were things like BERT. Um, and that was a bidirectionally encoded representational transformer. I think there's always a way of making the acronym
into a word and making it humorous. Like, the first image detection models that we used were called YOLO models, which, for those of you that were, um, on the internet circa 2017, you'll find it very funny. Uh, for everyone else, they'll be like, "What the hell are you talking about?" But yes, so putting aside the weird humor of data scientists, um, we were looking at these models because everyone was excited about their ability to translate natural language, and translating natural language into something structured is a large part of what QA is, because you have to take a requirement written by a human, and humans aren't algorithms. They don't structure things in predictable ways with deterministic, you know, X-Y-Z formats, even though we tried. With things like BDD we said, you write Given, When, Then, and then business people said, how about
we don't, and you try to translate it into that for us. So attempts to change humanity did not work. So we thought maybe we can use these transformers to change humanity for us. So just understand what the human means. The answer, in very short, was no. It didn't work, because the early transformer models were very large for the time, which is hilarious now. You say, "Oh, it was a billion and a half parameters." Well, like, that's a baby model now. Back then, when GPUs with, you know, 16 gigabytes of memory were a big deal, uh, that was a big model, but also they were kind of rubbish. Like, uh, they understood language and they could write words. But there's a hilarious video online, actually, about, uh, one of the early GPT clones called Botnik writing a chapter of a Harry Potter novel, and it is
pure comedy gold. But it was celebrated by the, uh, machine learning community, because we're like, wow, it knew how to understand the concept of Harry Potter and writing and a novel, like, it knitted all that together. But to everyone else, they looked at it, and the first sentence was, um, "Ron was going to be spiders. He just was. There was nothing Harry could do about it." You're like, he was going to be spiders? He was going to be spiders. Yes. Not that he would eat them or look at them. He was rapidly becoming an arachnid, apparently. So that's incredible. I know. It actually, to this day, is the funniest chapter of Harry Potter I've ever read. At one point, uh, it gets kind of down a rabbit hole and it thinks that the Death Eaters are a form of middle management, and
they go up to the top of the castle and start having a meeting, and at the end of the meeting they, like, congratulate each other and, oh, well done. And you're just reading this and you're like, I can see how this flows as language, but as a work product, I find it wanting. So that was kind of the story with testing. We'd give it requirements and say, turn these into test cases, and it would just start rambling on about what test cases are and then just probably insert spiders at some point. So not super useful. Uh, and GPT-3, not 3.5, the Da Vinci ones that were kind of completions, um, they were a big step forward for us. Yeah. They were the first point where we were tricked into thinking that this could do the job. Uh, because it was a lot
better at writing, but it didn't actually produce good work product. It was now a lot harder to tell that it wasn't good work product, because it sounded a lot more like it was. It, like, used all the right terms, and it kind of threw things in there like, I'm going to do some functional tests and some regression tests and some whatever. And then you'd go, "Yeah, this looks great," and we all were kind of super hyped about it. But when we looked at the test cases and we started applying actual rigorous thinking to it, um, none of them were actually useful. So, like, all right, so what's the net effect? And we worked out that the net effect of generative AI in those early days was negative productivity, because you spend more time reviewing the output than
you would have to write it yourself. So, we had this issue because the outcome that we're after is somewhat exact. It's not like, um, it's not like we're writing a novel where you can just kind of edit it along the way and correct it bit by bit. You ended up needing to know all of the test cases that you needed on one hand, and all of the test cases that it wrote on the other hand, in order to figure out if it missed anything. And so that's double the work, because you need to write the output before you can review the output. So we kind of parked that for a while and leaned into agents, and we started building the first agents with, uh, ChatGPT 3.5, I think it was version 1102, the first version that could use tools. And we gave it access to one
of our AI, um, screen... not recording, but you'll get the idea. It's kind of like a screen recording and acting software that uses an object detection model. It's called Vision AI. There's a whole post about it on our blog to see how it works. And we allowed ChatGPT 3.5 to just kind of steer the model, saying, I want to click on that item, I want to look at this thing, I want to select from that dropdown. And it was okay, but it was another kind of sugar hit where we went, this is it. We've solved it. Humans don't need to do, you know, test automation anymore. The AI can craft it for us. And we figured out that no, it can't. It just kind of gets stuck in circles. It, uh, does pointless actions. The test it generates is really full
of noise, and when you give it to a human, they spend a long time going through and cleaning up the work that the AI did, because again, the output we're after is a bit precise. And so we kind of stopped investing too much into that and went down the same path as everyone else with assistive AI, shipping things like copilots. Um, and they're okay. Like, they help users use our products and answer questions. They provide a minor productivity uplift, which definitely pays for itself, but isn't, like, game-changing. But between that, there was a lot of shift in models. The biggest shift was o1. Um, yes. Now, out of all those, everyone kind of remembers DeepSeek and forgets o1. But, like, uh, o1 and DeepSeek are kind of the same thing. They're instances of the first reasoning models. And the reason that that was important to us
is because reasoning models are kind of a way of the AI reflecting on its work as a trained component, rather than us trying to force it with things like, um, what's that old thing that we used to do where you'd force the AI to reason through? It's chain-of-thought prompting. Yeah. Reasoning is like built-in chain-of-thought prompting that works 10 times better. So we went down the o1 models. Yes. And that's why they did it. Um, we went down the o1 models, and finally we could actually start building and shipping agents, things that would actually do the work that the human would do, check their output, be more precise, be more productive. And so we're kind of in this world where now the AI has caught up to where we want to be. And now the AI is racing ahead. And every
day we're kind of like, all right, what can we do? We need to adapt more things. Oh, MCPs appeared. Where did that come from? Quick, put that in the products. Oh, wait. We don't need it. No, wait. Yes, we do. It's kind of this iteration of trying to figure out which parts of the AI buzz cycle are actually durable components that are going to stick around for the long term and solve problems that we actually have, and which ones are the, uh, here today, gone tomorrow noise that gets wiped away when Anthropic or OpenAI decide that they're going to ship a protocol for it. So, that's kind of the world where we live at the moment. Um, yeah, and that's probably more than you wanted to know about my background. That was a good starting point. Um, I I was
uh, I was just letting you cook there for a minute. You kind of brought us through what you guys are doing and also what I feel like the history of AI has been, you know. Um, so it was definitely useful for a lot of us, cuz I wasn't actually aware of the actual year when these models started to be, you know, in their earliest infancy. Like, 2017 was not a number that came to mind immediately. Um, so it's pretty incredible when you think about it. We were actually somehow almost eight full years, then, into generative AI, right, based on that. Yeah. Yeah, we are. And there's this interesting, um, point in time in generative AI where you aren't entirely sure what happened. Um, it's called the, uh, or some people
call it the emergent point. Um, it's where the AI started to display emergent behavior. So things that we didn't expect in the AI, like it responding with emotion, um, just started to appear. And the reason that was unexpected is the models got a lot bigger. The Da Vinci model was 175 billion parameters, which was about 10 times the size of the next biggest model. Yeah. But it had been kind of a truism of machine learning up to that date that model size was, uh, a diminishing return. So you would see it sort of rising and then flattening out, um, asymptotically. Um, and that was true of computer vision models, and it had so far been true of transformers up to that point, where the accuracy and the quality that you get out of them diminished
with the size. Not, not went down, but the increase in quality was not, uh, linearly proportional to the amount of parameters that you had in it. Yeah, I would imagine. But then with GPT-3.5 there was almost this unknown jump in size. Like, when you went all the way up to these huge models, suddenly it was a lot better. Like, we expected it to be slightly better, because you're getting these diminishing returns, but instead of being slightly better, it was just a lot better. And so no one was quite sure what caused that jump at the time. There's some theories on it now, that maybe it had hit space limitations in the AI, and lots of words for people saying, I don't know, but I've got psychological terms that make it sound like I do. Um, but then from there there was like this
arms race with make model bigger, because model bigger is going to result in just exponentially better performance. That also kind of stopped being true. We're starting to see those sort of asymptotes again with the giant models like GPT-4.5, and, I don't know, when's GPT-5 coming out, OpenAI? How big is that going to be? They stopped publishing it. But as humans, we've started to see less of the giant leaps that we saw between GPT-3.5 and GPT-4, which was a massive leap in, uh, quality. And then when you look at GPT-4 to GPT-4o, it's a little bit harder to tell. When you skip like five model versions and say, like, GPT-4o to maybe o3, there are massive leaps, but it's more coming in those, like, reasoning jumps in between. So yeah, the history of generative AI is a weird one, because I think
as humans we've kind of stopped understanding what's happening in the machine, and we're just sort of along for the ride now. Oh, I think that's a totally fair point. I'm not sure if you've seen, there's an extra interesting chart out there that actually breaks down, uh, the quote intellect, uh, or IQ, right, of the AI. Have you seen the year-over-year change? I haven't seen the year-over-year change. Um, I saw how, like, it's gone to superintelligent, I think, with, was it o1 or o3? o3. o3, I think, was the first point where it was. Uh, I can maybe find it. It was on Twitter. Uh, well, whatever we want to call that platform, right? Um, saying it was on X just sounds weird, doesn't it? Yeah, just saying on X makes it sound like I'm saying a
variable. Um, so, right, a bunch of math professors are very math. Now this is interesting. So this is the increase here, and I have a question based off of this. So let me make this a two-parter. Okay. Yeah. Part one, I find it incredible that we've hit, o3 is at a 136 Mensa IQ score. Um, that's insane. Uh, that's number one. Number two is, last year we were actually in the realm of these, like, 4o models and stuff like that, and Claude, I think, was maybe actually leading the charge at one point here, was only really at like a 70 IQ. Like, that's a mic drop in my opinion, because I was looking at it, and 70 to 80 IQ is kind of the range of what the better models were a year ago, and I went, I looked
at it and just went, huh, you know. Like, so what are you seeing on your side, intangibly, at the company, that is coming out of this leap in intelligence? That, I feel like, to the layperson, um, I'm in a bubble, so I see it and I work with it every day, but how would you describe that to the average person, like, what you're getting out of this incredible improvement? So I think the easiest way to understand why it is hard to see the progress, unless you're giving it a discrete and measurable task, is imagine you're talking to someone of normal IQ, like you're talking to a person in, like, the 90s to hundreds kind of range. So I know it's not dead-on average, but, you know, someone in that range, of course. And you're asking them questions, you're
conversing, you're doing a lot of what you do with AI at the moment. Hey, help me write this thing, and they're writing you content. And then you go talk to someone of Mensa-level IQ and you say, "Hey, come help me write this thing." And they help you write that thing. If the task is sort of not an expert task and it's imprecise, then you're probably not going to perceive that as a giant gap, because you'll get decent results out of both. One might be perfect and the other one will be near perfect, but the gap is small. You only start to notice it when you give it incredibly difficult tasks. And I think what's making us feel like AI is the smartest dumb person I know, or the dumbest smart person I know, is you can give it a task and be
absolutely blown away. For example, in coding, you might say, "I've got this really difficult-to-optimize thing. Can you go through and find the areas where there's a memory leak, and then create a new vector map to improve the speed of this?" And you're like, "Wow, that was phenomenal." But then at the same time, you're like, "Hey, you know, I need you to help me plan a trip. I'm in Austin now. I'm not in Sydney anymore, so I don't know the landscapes well, but, you know, I'm going to travel to Utah." And it'll go, "Oh, well, the first step is you should just head far north until you get to Colorado and then drive north through Colorado." Like, it just gives you this linear path straight through them. Like, that was dumb. But it's not that it wasn't a
smart model. It's just sometimes they make these elementary-grade mistakes, which make us go, okay, it's not there. And the reality is the model can do a lot of the things that we can do. Now it's a scarily large amount of the tasks that we can do that a model can achieve. And almost all the problems that it's having at the moment are that it lacks feedback loops. It doesn't know when what it's doing is right or wrong, because half the time what's right is something that's in our head that we haven't explicitly called out as, I want to do exactly this. So we look at the model and say, well, that was a dumb thing to do, when the reality was that we were imprecise in our instruction. So that, and it can be, like, if you're telling it to go a
certain direction, it has no walls to bounce off. Sometimes it just goes straight off the track, which we see a lot of in coding. Uh, one of the common complaints, for example, of people using agentic coding tools is, well, I told it to fix this function, it ran the tests, the tests failed, so it deleted the whole class and wrote a new one, and now it's like, oh, the tests are passing, but the commit that I've got going into my repo is massive. And that's cuz it kind of just went a little off track, and then, because it had no guardrail to say, no, you can't do that, it just kept going. So we do see the significant uplift in performance, and it comes at the boundaries of a task, where it is actually truly hard to succeed. Yeah. So just
diving into this a little bit more, what I noticed is the exact same thing. So, in my experience, uh, just with o3 when it came out, I'm not going to say o1 wasn't a revelation, or like o3-mini wasn't a revelation, but the first time I really understood that AI was getting to reasoning levels was when I saw the way that o3 was operating in the back end. Because it started to give you that preview bubble that was a little bit more like, (a) it was in the background, it was like really churning through thought processes, and (b) the results were incredible. So do you recall the ability that, okay, for me this ability was a revelation. It had the image that I would give it, and it would essentially figure out, in five minutes or less, where I was in
the world. Mhm. It was like a weird moment where AI had reached a point where I was actually kind of concerned, you know. It was pretty funny, cuz like, I'm Greek, so I went to a Greek fest, and I'm in Chicago, so it figured out where I was based off of one picture at a parade, and then it figured out, um, I was inside a church, and like, a lot of the murals in a lot of these big churches, they're not that different, right? But in 5 minutes it figured out which one I was in, in both instances, and I started to scratch my head and I said, it had to do a lot of cross-referencing and thinking in order to be accurate in that context. So that was the first time
I saw it really push the limits. Um, then the other thing was, I was thinking of a content strategy for, like, my own LinkedIn or whatever, and trying to see how it could fit relative to what I was doing in my day-to-day life as, like, a business owner. And it was the first time it actually came up with something from a strategic standpoint that was better than I would have thought of in that time frame, right? I was kind of stuck on it, and generally speaking, I do content for a living, so the AI doesn't really help me with, like, strategy. Mhm. So, you know, like, things immediately will pop up much quicker than AI. So, it was just, it was very, it was cool in that sense. The image thing was disconcerting. How
do you feel about this increase in reasoning and kind of where it's leading, from a, uh, I don't want to say concerning standpoint, or if you have any concerns, but from kind of the context of what I just talked about? Well, I, uh, I just got a Slack message a minute and a half ago from one of my engineers who's playing with, uh, Claude 4, and, um, he said, "This thing's fantastic. I'll see you on a beach in Bondi in 2 years cuz we won't have a job." And, um, but at the same time, uh, I would be lying if I said I wasn't concerned, um, just because of the rate of change. And I've done a lot of thinking on this, because part of working as a leader in AI is you have to give thought to what the societal
impacts are that your, yeah, your tooling is going to have, because, um, we're all in the business of accelerating the robot revolution in some way, shape, or form. And so we have to give some thought to, what impact is this going to have on society? Um, now, we may not all act on that consideration of the impact it's going to have on society, but you have to think about it. And to me, the most concerning fact is, uh, we've been here before, but in different ways. So everyone says, oh, Industrial Revolution, all these type of things. Like, those are prior waves. Um, computation, you know, the accessibility of computers, you know, these type of things. What we miss is, each one of those was automation of a specific aspect of life. So the Industrial Revolution was automation of mechanical process. Yeah. Yeah. Stamping
things, um, building tools, like, all that kind of stuff. It was automation of that. Um, and because it was primarily physical, it had to be rolled out via a physical distribution channel. It took some time, and over that period of time people retrained, um, and we got used to the fact, hey, the machines are there, let's kind of ignore the work of a blacksmith, because now we've got a stamping machine. That's okay, though. What we'll do now is we're going to design tons and tons of better things and we're going to have the machines build them. And then computers came along, and the designers that were on paper are like, oh no. Now the computers are automation of process. So they're automating a sequence of things that you need to be able to do that aren't necessarily physical. It's like, um, the
automation of uh procedural calculation. And so we didn't have to calculate what a standard deviation is anymore. We don't need to deal with all the math when we're building a bridge. We can like let the computer do a lot of that work for us. And so we thought, well, that's okay. We'll spend more and more of our time like in thinking and strategizing and telling the computer what to do. Generative AI is the automation of thought. And like the revolutions before, the easiest things are the things that drop fastest. So things like, oh, I want to write a marketing article. It's like, well, that's writing. It's creative writing. That's right in line with what this thing does. So now, now LinkedIn is mostly populated with AI bots writing thought pieces on what AI bots wrote and then replying to them with AI bot summaries. So
that part of it, like the automation of thought, is slightly troubling, because everyone's like, "Oh, we'll find something else to do." Well, we gave up the doing, we gave up the, like, procedural calculus. If we give up the thinking, uh, where does humanity go from there? And more importantly, is that a skill you want to give up? Is that a skill you can afford to give up? Um, as an example, developers that use, and Microsoft published a paper on this, uh, I think it was in November, um, developers that use AI coding assistants lose the ability to write, uh, coding functions. Like, if you get them and say, "All right, go write a conditional loop to call this method," it takes them time. They have to go back and, you know, Google some of the stuff and pick it up. It's a muscle
that they stopped exercising. And we say that's okay, because you don't need to write that code anymore. The AI will write that code for you. You can spend your time thinking about, how do I string the code together and how do I design the approach. In your case, this was, like, how do I design the strategy for my content, uh, marketing idea, or my content idea, sorry. What happens when the AI starts to intrude on that level of thought? Like, oh, well, now the strategy is being designed for you by these more advanced reasoning models. The model that scares me the most is o4. Not Claude 4 or any of these other ones. The one that scares me the most is o4. Yeah. The reason it scares me the most is because it has the ability to use tools in its thinking process. So the
flaw of the reasoning process is that the entire reasoning process is internal. Now, every LLM hallucinates. All an LLM does, in fact, is hallucinate. It's just that we find some hallucinations useful. So the thought process is a sequence of hallucinations that we hope will correct itself towards the correct outcome. But since it can use tools in that hallucinatory thought process, those tools provide grounding. So it can go, "Oh, let me go out to the web and search for this. Let me, uh, look in the documents library and figure out what we've done before, and let me explore the real world while I'm thinking." And that matters, because thinking is the self-correction part of the AI's outcome process. Now that it can use tools in that self-correction process, it's entirely possible that it will start to become complete in its outcome. So rather than you saying, "Go
generate me a content strategy based on these things," and getting back kind of the general, vague outlines of what a content strategy should be, with some specifics in there that may or may not be correct, you start to get a very specific, tailored, grounded content strategy where you're like, "Oh, this is what I would have written. Maybe it's even better than what I would have written." At that point, I'm starting to go, I don't think we as humans can retrain fast enough for how quickly the AI models are moving. And that worries me a little bit, because if we give up the thinking ground, I don't know where we go to from there. Well, philosophers, maybe. Yeah. I'm in the same boat with you. I think it's interesting, because I'm probably very, what's the term, white-pilled on
it in general as to, like, outcomes for society, because I'm not quite sure whether in history we've ever been able to accurately predict the outcome, right, of these things. Like, for example, when ATMs came out, the one example I always use is that actually there was a decentralization of banks, so then we actually got more bank tellers, which is funny. So our prediction is usually pretty bad on this stuff. So, I'll just hopefully be optimistic, right? Then why are tech companies laying off developers? Well, yeah. Um, maybe because the AI is predicting it for them. Maybe, uh, my term, we, uh, only applied to humans before, uh, now the reasoning models are doing a better job. But it's interesting. I think that what you're pointing out with the thinking is definitely my concern as well, cuz I
noticed with with 03 that it was starting to use things in the background and then like you said with 04 it has tools as a base. So when it comes down to it all accessible advice on a public spectrum if it is truly capable of having access to it like I remember mentioning this like it's on record on the podcast like 3 4 months ago. I was like the limitation of these LLMs is actually not necessarily at the moment directly correlative to uh how much data it's trained on. It's the fact that it's limited to its own boundaries of information. Mhm. Right. And once we start having it native to the model like you're saying with 04 to where it has tool access um I'd imagine s you know by the time we get to the '05 realm of things it is essentially a self-learning
situation where it just has the ability to access all resources that are public knowledge, and then continuously do that in a loop until it has reached peak marketing guy, or peak whatever, in a niche, you know? So I'll tell you what, and Anthropic and OpenAI, if you're listening: the model that would convince me that it is now ready to replace humans more generally is the tool-using recursive model, because there are two limitations to AI. One limitation is access to information. The other limitation is context width: how much information it can hold in its mind at any given time. And it's demonstrable that if you give it more context, it becomes more distracted. So a lot of what AI systems engineers do, which is realistically what my role is, is design systems that use AI. We don't train the AIs anymore, because, you know, if we trained Claude 3.7 and then 4 comes out, chances are it's better than what we trained into 3.7. So we're just kind of like, don't waste your time there. But the reason there is still work for AI systems engineers is that the models themselves are kind of unary, if that makes sense. A model tries to complete the whole task itself, and if the task is sufficiently large, it will get distracted along the way. It'll go down part of it, complete part of the task, and say "I'm done" when it's not done. We've been able to prove with our agentic approaches that if you can correctly break those tasks down and systematically contain them, so that each one has a clear definition of input and output and each one has some validation metrics, it's
got access to only the suitable tools, because just giving it all the tools is a really bad idea; it gets kind of distracted as to which ones to use. If you go down that path, you can break big, complex problems down into very, very well-constructed solutions. It's not perfect with the current models, because the linkage between those pieces is externalized. It's not part of the model's reasoning context, which means it needs humans to kind of guide it between them. But the model that I'm convinced would replace that entire solution is the model that could recurse a task onto itself and say: I'm giving myself a task. I'm going to pause my prior context and I'm going to use this new context. If you trained that technique into a model, then it would become exceptional at breaking down work and determining input and output criteria for that work, which makes the whole AI system redundant. And that model would actually be able to cure its own illness, in that it could account for its own context length. I believe at that point it wouldn't actually need to learn, because it could just delegate work down to the point where access to the information in the tools, and the ability to reason and read, would be enough for it to achieve most goals. So that's the model architecture I think is probably going to appear, and I think it's happening next year. Wow. And when that happens... you know, it seems like we're in the mood for predicting. What do you think? Bondi. Um, what'd you say? What did you say? I said beach in Bondi. Um, beach in Bondi. Heck yeah, that'd be fun. I think we
will find out whether we can pay that bill, both in terms of energy and computation cost, and I think the only limiter at that point will be trust. So, okay: we trust other people more than we should, and we certainly trust them more than AI, and part of that trust is because we have mechanisms to recoup loss. If you are my Uber driver and you run off the road, then I can recoup that loss from you through the court system. Well, assuming I'm still alive; otherwise my family can. If you are an AI and you run off the road, then in theory the company that made you is responsible for that, and I, or my family, can recoup that loss from the company that made you. But the problem is that all these companies are putting the self-driving terms into their terms and conditions, which say the human using the AI is responsible for the AI's actions. In other words, you have to be behind the wheel. The problem there is we're going to reach the point where we can't actually meaningfully review the work of the AI. We're already kind of there. If you use agentic coding and it's writing code, sometimes it's almost impossible to keep up with what the AI is doing and to review whether it's correct or not. Given that fact, it becomes unreasonable to be expected to review it. So either the company just accepts the responsibility for it, or we go down one of two paths. One path is we say, "All right, well, you just can't use AI for that use case
because it's too high risk." The other path is we as a society allow for algorithmic error. So I say it's acceptable that the AI is sometimes wrong, because it's safer than humans in the same context. So if it's wrong, tough bananas, here you go. I don't think we can go down that path, because it opens an avenue for AI developers to effectively abrogate responsibility. So I think the only thing holding it back, in the year of the recursive model, whenever that comes out, will be society's willingness to allow it to do its work. Interesting. Okay. So then what does this mean, I guess, in the context of what you guys are doing specifically? Not you individually, right? It's funny, I feel like we didn't really spend much time, and we're actually getting decently close to the full amount of time on this podcast, but we didn't really talk about you guys, right? Like you as Tricentis: what does this mean for you? The easiest way to think about it is: AI uses tools. AI is the brain; it's not the hammer, if you will. So imagine, if Google didn't exist, how good would AI be at searching the web? It would be complete rubbish, because it would have to use Bing, and who knows what you would get back from that. Complete rubbish. Yeah. Sorry, Microsoft, whoever our Microsoft rep is. No, you never have to apologize. A company with that many billions of dollars being that bad does not deserve an apology. Yeah, that is true. But our bet at Tricentis is: we had
the best tools before AI came out for achieving the goal, so AI is just going to make those tools work better for more people. Okay. So our whole goal right now is just: how do we get those tools into the hands of the most people, and the most AIs, possible? It's basically taking your tools and giving them to a better craftsman. Exactly. Okay. Good. Yeah. And so that's how we view the future of AI and Tricentis: we are carving out a niche for ourselves, in that we want to have an out-of-the-box craftsman for you, where you engage with it and you say, "All right, I want you to use the tools for me. I want an agent that is going to, you know, look at a requirement and completely break it down into exactly the scenarios that need to be run, that picks up my enterprise knowledge, that can cover every aspect of that and can prove that it covers every aspect of that." And that's a niche for us, because when I look at any work in AI, I split it into two categories. There are interesting things that we need to do to solve our customers' problems, but there are also things that everyone else needs to do, right? So this could be: how do I map knowledge and create a graph out of that knowledge? We need this, because there's a ton of organizational knowledge that we have to map and graph in order to make this work. But that's not a problem related to testing and QA; that's a general problem. So we ignore those problems until a solution appears, because
I guarantee someone's working on it. The other set of problems are problems where we need to use AI for things related to testing. A very good example: you've got a requirement, or a bunch of requirements. One, how do you break that down in a systematic way so that you can prove you've got good coverage of the requirement? And two, how do you prove that traceability, so that someone can eyeball it quickly and go, "Yep, that works for me," and just approve it without having to read everything? Because the moment you make someone read the entire output of the AI, you kill about three quarters of the potential productivity, because we're really slow readers. So everything should be about trying to get away from the reading. Let the AI write; let you edit. And so that traceability is where we see some unique value for us in the AI game. But the foundational value, the thing that's going to help us stay relevant in the long term, is really just that our tools are going to be the best tools on the market, because they already are, and we're dumping a ton of money into making sure they stay that way. And that's advice that I give to the startups out there, the people that are looking at an AI startup: AI won't be your differentiator. We know, because you're using the same AI everyone else is. You can't turn that into a differentiator by saying "we're using Claude 4," like everyone else. So what is your differentiator? It's either your tools, your IP, or your user experience. Tools are a durable differentiator. IP is a transient differentiator, and user experience is a temporary differentiator
because everyone's going to be able to copy that. So that's the advice we give, and it's also our path at Tricentis. And, I don't know if there's a company pitch here: hey, go visit our website, we do really cool stuff. It's tricentis.com, and occasionally I publish blogs there which are super nerdy, about AI and how we go about building agents and stuff. So, you're welcome. I didn't even have to tell you to plug it. That was awesome. There you go. I'm all over this. No, I really appreciate it. It was kind of a fun chat about the future of AI, and I feel like a lot of people probably learned a little bit more about how these models actually work and the history of how we got to where we're at. So we really do appreciate you taking the time here with us today. Like he said, make sure to go to Tricentis' website at tricentis.com to check them out. Remember: their tools, plus the improving models, are essentially just better craftsmen helping those tools work. That being said, thank you so much for listening to this episode of the AI Agents podcast, and we'll see you in the next one. Bye. Bye.
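Editor's note: the agentic pattern David describes (breaking a large task into subtasks, each with its own narrow context, a clear input/output contract, and validation metrics before its result is accepted) can be sketched roughly as follows. This is a toy illustration only, not Tricentis' actual implementation; `run_model` is a hypothetical stand-in for a real LLM call.

```python
# Minimal sketch of agentic task decomposition: each subtask runs in
# isolation with its own prompt, and its output must pass a validation
# check (with retries) before the pipeline accepts it.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Subtask:
    name: str
    prompt: str                      # the subtask's own, narrow context
    validate: Callable[[str], bool]  # acceptance check on the output

def run_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; here it just "solves"
    # toy subtasks by transforming the prompt.
    return prompt.upper()

def run_pipeline(subtasks: list[Subtask], max_retries: int = 2) -> dict[str, str]:
    """Run each subtask in a fresh context; retry until its validator passes."""
    results: dict[str, str] = {}
    for task in subtasks:
        for _attempt in range(max_retries + 1):
            output = run_model(task.prompt)  # fresh call per attempt
            if task.validate(output):
                results[task.name] = output
                break
        else:
            raise RuntimeError(f"subtask {task.name!r} failed validation")
    return results

tasks = [
    Subtask("outline", "draft outline", lambda s: len(s) > 0),
    Subtask("review", "check coverage", lambda s: s.isupper()),
]
print(run_pipeline(tasks))
```

The point of the pattern is that validation and control flow live outside the model, which is exactly the "externalized linkage" David says a recursive model would internalize.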