How Sapien.io Is Changing Fine-Tuned LLMs
About This Episode
Demetri Panici interviews Rowan Stone from Sapien to uncover how the company is rethinking AI training data by decentralizing the data collection process.
Rather than relying on centralized data centers, Sapien empowers contributors worldwide to provide high-quality, structured knowledge, helping eliminate biases and enhance model performance.
Rowan details how Sapien’s innovative data-foundry framework and incentive structures enable scalable, high-quality input for LLMs — from autonomous vehicle data to nuanced human insight across vertical specialties.
With a growing network of 560,000+ contributors and clients like Midjourney and UN agencies, Sapien.io is setting a new standard for how enterprise-grade AI models are fine-tuned and optimized in a rapidly evolving tech landscape.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
⏰ TIMESTAMPS:
0:00 - Introduction
1:00 - Introducing Sapien And Rowan
5:19 - Understanding Decentralization In AI
9:20 - Real-World Applications Of Decentralized Data
14:05 - Advice For Startups On Funding Success
16:52 - The User Base of Sapien
18:58 - Building And Scaling An Early-Stage Team
21:44 - Improving Data Quality With Peer Review
25:58 - Agentic Models And Chain Of Reasoning
30:40 - Practical Use of Chain of Reasoning in the Workplace
34:08 - Why Work with Vertical Models
37:42 - The Next Leap In Humanoid AI Robotics
40:58 - The Intelligence Level of AI Models
45:08 - Final Thoughts
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Sign up for free ➡️ https://link.jotform.com/YzeGAspiCC
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Follow us on:
Twitter ➡️ https://x.com/aiagentspodcast
Instagram ➡️ https://www.instagram.com/aiagentspodcast
TikTok ➡️ https://www.tiktok.com/@aiagentspodcast
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Transcript
And so that's kind of how we settled out, and that's the first step that we took towards building what we frame now as an open protocol, or like a data foundry. And the reason that we frame it like that is that we're not focused on just one tiny little part of this puzzle anymore. We're no longer aiming to build a decentralized Scale AI, for lack of a better way to frame it. Instead, we want to build the framework that allows that knowledge transfer, allows that data capture, data collection, data structuring to be done by anyone, anywhere, for a variety of different enterprise customers who are actively hunting for that kind of data. Hi, my name is Demetri Panici, and I'm a content creator, agency owner, and AI enthusiast. You're listening to the AI Agents Podcast, brought to you by Jotform and featuring our very own
CEO and founder, Aytekin Tank. This is the show where artificial intelligence meets innovation, productivity, and the tools shaping the future of work. Enjoy the show. Hello, and welcome back to another episode of the AI Agents Podcast. In this episode, we're going to dive into a conversation that I'm having with Rowan Stone, who is with Sapien.io, which is a fine-tuned LLM model company. How you doing, Rowan? Doing good. How are you? I'm doing well. Doing well. I appreciate, um, you spending the time to have the conversation. There's a lot of interesting stuff you're doing at Sapien. So, just give everyone a little bit of an introduction as to what, uh, Sapien.io is and how you guys got your start and how you got to where you're at. Happy to. So, we are a data company. That's the really short version of what we do. AI models
obviously need to ingest huge amounts of data in order to become more sophisticated and really just to become more useful for each of us to use. And so our piece of this puzzle is providing that data to help the companies building them build the best possible models. And we do that in a slightly different way to the norm. The norm really is kind of like a centralized hub-and-spoke model. Tons of really cool companies building in this space, including people like Scale AI and Appen. And they typically have really large, almost warehouse-sized facilities full of people, and those people provide their nuanced understanding of the world, or they annotate things that they're seeing, and they basically just help structure data to make it more useful for these models. We do it a little bit differently. We enable anyone, anywhere, to meaningfully lean in and to earn
by providing their knowledge or by providing their nuanced understanding of the world around them, so that these models can become smarter and more useful for each of us. So, Sapien was started, uh, about a year and a half ago, and it was actually started via a conversation with one of the founders of one of the large data companies, and the conversation essentially questioned whether or not it would be possible to build a data labeling company in a decentralized model. That's kind of how Sapien started initially. And to flip that question, it was really more pointed. It was like, why are we still doing data capture, data curation, data structuring in this centralized way that's almost, in some ways, exploitative? You're finding the cheapest possible labor in the world and you're using them to help make some of the most sophisticated technology the world has ever
created better. And so the question was, like, it was 2024 or 2023 back then, surely there's a way we can do this better. Myself and a few others, I actually got involved a little bit later, but we're people who have been building in the kind of onchain world for quite a long time. And so we're very used to figuring out how to solve problems in a distributed or decentralized way. And so for us, we're approaching this problem from that angle. And that's kind of where we thought that we should start exploring. And the initial exploration was, can we still maintain super high quality data in this decentralized way? And we experimented with a bunch of different ways to potentially make that work. And we really netted out at aligning incentives in a meaningful way. And so that means that the people that are contributing, they need to
have some skin in the game. And that skin in the game they can earn their way up to and then have it kind of locked and staked, if you like. And that's really just collateral against their future work. But having some money on the line isn't enough. They also need to have their reputation linked to their work. And so they're using that reputation as collateral. They're building reputation in the system as time goes on. And ultimately they're able to get access to more sophisticated work, to do higher-paying work, and ultimately be able to do the quality control on their peers. And so that's kind of how we settled out, and that's the first step that we took. Now, you mentioned decentralized a lot. But I think there's a couple
things there that, um, if you wouldn't mind doing some 101 with some of us. As I know, you previously were working at, uh, Coinbase. I think you were the co-creator of Base, correct? Which is like the, um... basically, what I'm trying to get at is I'm not very well-versed in how blockchain works whatsoever. And I would imagine even a lot of us in the AI space aren't either. So, if you could explain what decentralized means, um, and, you know, maybe a brief introduction to that, I think that would probably be helpful. Yeah, for sure. And I think it's a word that's overused. And so I'm going to try my best not to overuse it. And I think one of the most important things is to recognize when it's useful. And so, before I explain decentralized, talking specifically about AI training
data, that's what we do. The quality of the training data is by far the most important thing. AI models are garbage in, garbage out. If you feed it sewage, your model's going to hallucinate. It's going to give you all kinds of nonsense answers. It's not going to be useful. But if you can hone a data set and get it to the point where you're very confident it's super high quality, clear, well-structured data, then your model is going to have really good outputs. One of the issues that we saw in the kind of past few years, I was going to say in the early days, like, we've not been going long enough for there to be early days, but over the past few years, when you centralize the curation of data, you inherently introduce a bunch of different problems, and those problems are primarily
biases. And so you could have a geographical bias, i.e. you've sourced or outsourced your data needs to a farm of people in the Philippines. Great. Like, that works. It's affordable. Downside: your model's probably going to think like a 20-year-old Filipino man. And, like, maybe you're building a model for that market and that's your expected outcome. In which case, great. But really, for you to remove geographical or sex or age or any type of bias or slant that could be in your data set, you want to decentralize the sourcing of that data. You want to have a good distribution from lots of different types of people in lots of different parts of the world, different cultural upbringings, different backgrounds, so that you have a rounded view of reality, and you can actually get to the point where you have something that you're confident is the
truth because you're taking in lots of different outputs. And so decentralized to me literally just means, rather than operating in a centralized fashion where you have, like, one entity or one group of people that are calling all of the shots, essentially gatekeepers for whatever the world is doing, we instead distribute that power, distribute that curation and collection of data, and we pull it in from lots of different places. In the onchain world, decentralization is typically used when you're talking about monetary policy and things like this. And so if we bring it towards things that apply to our everyday lives, a centralized entity is something like a bank. A bank has a ledger, and that ledger says Rowan owns $1, Demetri owns $2. And if Demetri wants to send Rowan one, only the bank can update the ledger, because only they have access. It's centralized. For
onchain stuff, for cryptocurrencies, part of the allure here is that we create a different system whereby everybody has a copy of the ledger, should they so wish, and therefore everybody can verify the legitimacy of a transaction, and things can work 24/7 with complete uptime. It's not like, oh, it's Saturday, sorry, you can't withdraw your money. It's, oh, it's Saturday, it still takes three or four seconds, and you can send it anywhere in the world, because we all have control and access in a decentralized way. Interesting. Not super relevant for this conversation, but that's typically what people mean when they talk decentralized. No, it's hard for me, right? Because I think, you know, I know some people in my life who are, um, lawyers, and, uh, what they always do, and I think it's, well, they don't always do it, but
they're told to do this in court, is define terms, right? And a lot of times we get into conversations like this about AI, uh, or, you know, obviously with your background in crypto, and people just use, you know, terms, and we don't know what they mean necessarily. So, I had to make it clear for the audience, because I think it's kind of hard, uh, to always presume everyone knows what we're talking about. And I guess from what we're talking about, you know, you're saying that you have this decentralized data set. I think it's great. Obviously, we don't want, um, bias in data sets, and I agree with you on the fact that it's garbage in, garbage out in the AI world. You'll see this even with frontier LLM models. But my question is, you know, you're obviously different
than your basic models out there and your, you know, your ChatGPTs, etc. So, what do you see as the main sectors that you're able to help, and maybe give an example in a specific sector? Um, if you list a couple industries, that's fine, but in one of them, what a use case could be where you can help out more so than these basic, well, they are advanced, but these more general models, I would say. Yeah. So, I'm going to start by saying that we're still a very young company, and so we've been doing a ton of exploration to try and figure out where we have extra leverage, like, what is our superpower. And so we've done a huge array of different types of data over the time we've been operating. Everything from audio speech through to medical
imaging, engineering schematics, having professors help models teach kids to do things like math or geography, all the way through to much more specialized things, um, which are a little bit tricky for most people to handle, like 3D and 4D data, which is really the kind of fuel behind autonomous vehicles. And so companies like Toyota or Amazon's Zoox, or, kind of the most, I guess, world-famous company building in this way, Tesla. Uh, they work by having a bunch of cameras and a bunch of sensors capturing data constantly, huge amounts of data, and they're driving millions of miles in lots of different places around the world. Now the problem is randomness. They're going to constantly encounter weird situations where, like, somebody cuts them up, or they see road works that they weren't expecting, or just something happens, the road layout changes, or perhaps they just
drive to a different part of town they haven't been in before. And so every time they're finding randomness, they need humans. They need people to help essentially teach them what they're looking at, but also how to reason their way through what they're looking at so that they can make a good safe decision and keep their occupants safe and protected. So, one of the superpowers that we found is that our users, we have about 560,000 people that are actively leaning in and providing their time, their knowledge. And we have an oversized number of gamers within the community because in the early days, we realized that this type of work is actually pretty dull in most cases. Annotating or providing uploads or capturing stuff, not fun. We wanted to make it a little bit more entertaining. So, we added some gamified elements. We ended up attracting a
bunch of people from the gaming community. And it just so happens that these people are really good at annotating 3D/4D data, like LiDAR data for autonomous vehicles. So, we found out that's one of our superpowers. And then the other superpower: because we have 560-ish thousand people and we're growing by about 4,000 people per day, capturing data across a bunch of different modalities from a completely distributed group of people is massively valuable to AI companies building different types of models. And so if they need to teach a model how to recognize human handwriting, for example, in lots of different languages, in lots of different styles, we can do that. We can have people jot on pieces of paper all around the world, upload those images, and all of a sudden the company has what they need to start training their models. And so I think
the short answer is anything that you can really think of is in scope, but we are trying to focus on a couple of key areas where we know that the community has strength and we know that we have the right tooling and the right QA/QC processes to make that data high quality. Interesting. Okay. So, you are obviously, um, in a position where you secured, um, a round of funding. Congratulations on that, by the way. Uh, we actually have some people who I'm sure are watching this that are early stage, um, maybe early stage to the point where, you know, they just have their idea for their app. What would you recommend to somebody who's more early stage in order to get to the point where they achieve said, you know, funding round? You know, well,
you don't have to get into crazy specifics, obviously, on fine contract details, whatever, but whatever you can give as a set of advice for how you've gotten to that point, because that's a big win for many. I think the most important thing you can do if you're early stage is really simply taking a step back and listening to your customers. Like, you cannot go and build in a silo. It sounds so obvious, but, like, even myself, this is not round number one. This is not my first rodeo, in the cheesy way that people say, but it's so easy to put blinkers on and be building something and completely ignoring the only people that really matter: your customers. And so the advice that I give myself, I would give to anybody else, is like, listen to the people that actually matter.
Listen to the people that you want to use your product, your service, whatever it is your business does. Find a way to solve their problems. That's the way you're going to get some traction, and then prove that traction to people that want to invest and seed in early-stage companies. And providing you can tell a convincing story and you have tangible traction, you're not going to struggle. It's going to be an easy conversation to raise capital, and you're going to know what direction you need to take your business, because you're listening to the customers constantly. And that's part of how we've evolved. The early idea of, let's build a decentralized Scale AI, wasn't the right idea. A big chunk of that business is going to end up going to zero quite quickly. Not Scale AI's business. They're evolving quick. They're superheroes in the data
world. Like, they're kicking ass in lots of different ways. Obviously, all big companies have their controversies, and I'm not going to comment on whatever, but generally speaking, they're up and to the right. They're doing great. So I'm not going to sit here and say bad things, but what we had in our heads on day one is very different to what we're actually building now. And the evolution of that thinking has been driven by the customers. Every time we win a new contract, a new customer, we learn a little bit more. We get a better understanding of what they need to build their particular model, and it moves us forward. It gives us the next few steps and allows us to continue on the journey. What were some of those questions that you, um, maybe asked, you know, more specifically? Because I'm trying
to get a grasp of this, because you were founded in '23, like the fall of '23. Uh, terabytes. Yeah. Okay. Yeah, that's a pretty quick, I was just saying it's a pretty quick, strong turnaround. And your user base, I think you said, was a couple hundred thousand, if I'm not wrong. Maybe I'm misspeaking. So I need to be a little bit careful talking about customers. We have 25 paying customers, and a variety of them we can openly talk about because they're on our website. Companies like Midjourney, if anyone's loving creating with Midjourney, we help provide data to Midjourney, to the United Nations, to Alibaba, to BU, uh, and a variety of others that I would love to talk about in the autonomous vehicle space. That's somewhere we're actively doubling down. I'll be a little bit careful in that space for now, but you want
to check out game.sapien.io or just sapien.io, you'll see some of the customers there. And at least in the early days, like, the questions that we're asking really are going to depend on the problem that you're solving, but you need to fully understand the problem space that you're trying to solve for, and you need to throw something out into the world and then aim to get as much feedback as you possibly can to figure out where it's working and where it's not working. In the crypto realm, which is where I've spent a good chunk of my career. Sure. Tracking and watching what users do is like a bit of a bad word. People don't want someone watching them. They want to have some level of privacy. Kind of goes against the original cypherpunk vibes of crypto. However, it's super important to understand
how your customers are using your product. It's really important. You need to have a detailed understanding of where they're actually engaging, where they're falling off. You need to have a clear view of the little tweaks that you make. Are they making positive funnel adjustments? Are they going down the wrong way? But, uh, yeah, I mean, it's hard to say what questions to ask, but... Sure. No, that's fair enough. I mean, understanding how they're using it is a good parameter. Uh, when I'm looking at your company and thinking about the aspects of everything that goes into building a company, I'm thinking also about, you know, what it takes to build a company like this. It doesn't just require product, it also requires people. So as you've grown this company, obviously you have this, uh, newfound funding, which
obviously is going to help with hiring and whatnot, but how are you able to, in the beginning, bootstrap? Or did you need to... like, what was that whole process like of scaling the team? It's really just about finding the right people. It's just a people problem. I say "just" like it's no big deal. It's a super big deal. I was going to say. But being able to bring the right few people in, having a good exec team around you, and empowering them to do good work. I think one of the worst things that you can do when you're assembling a team is be breathing down their neck. Like, you need to find someone, or a group of people, that you can actually properly trust to delegate the things that matter to, and then you need to literally walk away and give them
the space and the time to go and execute. And that's how you scale your own capacity and your abilities. You have to delegate, and it's super hard. It's something that I have struggled with for many, many years, and I continue to struggle with. I think it's one thing that most people, if they're really honest with themselves, do struggle with. Delegating is not easy. It's hard to trust that something will be done in the way you want it to be done. But it's so important, particularly in early stage, and as you start to scale, you start to see the benefits of being able to delegate to people that actually know what they're doing and get things done. But bootstrapping, uh, in this particular case, was a case of raising some funding and then experimenting with how we manage quality, getting the first
couple of customers in, listening to them, making a bunch of mistakes, like, don't get me wrong, tons and tons of mistakes, doing dumb things that don't scale. We're still doing dumb things that don't scale. Quality control, for example, is currently in a centralized model, like a more traditional setup. We have a bunch of people in a room checking the quality of data before it goes out to customers. I'm like, cool, but miles from scalable. It's literally dumb things that could never scale to mass and are not going to work longer term. But in startup mode, you do the things that don't scale while you build the systems and frameworks that do scale. And so that's exactly what we're doing. Interesting. And then how do you make those transitions? Like, for example, do you have a plan for getting out of
that situation? I'm guessing, I'm sure. I mean, by the way, I'm a small business owner too, and we produce content, and the primary thing that I'm limited with is I'm the on-camera guy. So we're working on, like, writing and stuff to eventually get me to not be on camera. But it's a lot. So I scaled the stuff that can scale, like the editing team, very easy to do that, social posts. But now I'm really working on this. So I'm just curious, where you're at now, how do you do that thing? It's like a weird cognitive dissonance moment until it's fixed, I'm guessing. Yeah, you need to carefully plan and scope it out. It's not an easy transition to make. For us, it's, I mean, I
might have to go a little bit deep here for a second, but the way that... Okay. The way that we need to do this is to give ourselves enough confidence that we can select the right members of the community and elevate their privileges to enable them to start doing quality control. Right now, it's a group of 50 to 70 people that we have literally manually recruited, and they can't cope with the amount of data that's going through the system. The demand is too high. We would need to double or triple the size of this team. It's not a scalable path to market. And so instead, like I mentioned before, we're going to layer in incentives and disincentives. That's going to help us go from, say, zero to 100% data accuracy, i.e. we need to QC everything because it could be complete garbage or it could
be amazing and everything in between, to having people incentive-aligned, which means that they're much more likely to do good work. They have some money on the line, so they're scared that they might lose that. They also have their reputation on the line, which directly impacts their earning ability. So again, they're scared to impact that. It incentivizes them to do good work, to behave, not to cheat, not to do anything nefarious. And the output, at least their output in testing, that needs to be proven in production, is that now we're not QCing zero to 100%. Instead, the band is much narrower. Maybe we're now QCing from, like, 50 to 60% up to 100. So, like, we've knocked down the workload. The second part is, now we need to distribute that workload among a much larger group of people. And we do that by allowing the people that build reputation in
the system, essentially the highest-reputation users in the system, to start doing the QC work of the lowest-reputation users. And so that way, no longer do we need this centralized QC. Instead, everybody is essentially doing peer QC on each other based upon their ranking and reputation. And that's how we really go from, one example: we took an order, I won't say from who, but we took an order in relatively recently from a large multinational in the autonomous vehicle space for $2 million US. Uh, with our current system and current centralized QC, this is easily six-plus months of literally making sure the data is good before we ship it. In the new model, we're pretty confident we can get this down to inside a month. And then longer term, we think we could even get this down inside a week. And so that's the order
of magnitude difference in terms of throughput and capacity by moving from things that don't scale to the things that are much more scalable. Interesting. Okay. Yeah, no, that's a very detailed process. I appreciate that. And what do you think the gains in regards to your product, um, are that you're going to see when you make that increase in, um, quality from 60 to 100%, for example? It's going to enable us to more reliably provide the quality of data that we want at significantly less effort, if that makes sense. And so it's a massive efficiency improvement. There's a lot less wasted work, where people are just doing things that are never actually going to be used. Um, but ultimately the aim is to deliver the best possible quality data to the customers. That's what they need. That's what they pay us
for. Without that, we don't exist, and so everything is honed to provide that data. Got it. So, like, for example, I mean, Midjourney, right? You work with them. Um, what does your data do in the context of them? I mean, if you can tell me, or you don't have to, I don't know. Can you discuss what you can help them with, or how does that work? Uh, I think it's better that I... So the problem here, just to be very transparent, is that data is one of the only areas that these companies can get an edge, and so they're very careful about it. Because if you think about it, like, the variables in play here are compute. This is like, can I throw loads of money at GPUs to make my model train faster and better? Yeah. Yeah. Yeah. You can
solve that with money. And then the next problem is, like, the algorithm, like, how are you actually doing the training? Again, there's some level of edge to be gained there, but generally, once an algo is out, everybody's using something similar. Everybody's doing open-source research, and so there's not much gain to be had there. Most companies are looking for an edge by literally finding the highest-quality, most unique data they can, so that their version of whatever model they're building, could be a mathematical model, it could be an autonomous vehicle driver model, doesn't matter, whatever it is, has something that the other models don't. And so they're all very careful about talking about what exactly they need done. They certainly don't publicize, like, hey, I'm working with so-and-so, providing XYZ. But, like, at a really high level, yeah, the best way to think about this stuff is
that they're either looking for something really specific, like we've talked about, 3D/4D LiDAR data to be annotated so the car understands, or, a lot of the time, they're looking for new knowledge. And so it's like chain-of-thought reasoning. And what I mean by this is, again, if we go to math as an example, a calculator will tell you that 5 + 5 is 10, but it won't explain to you why 5 + 5 is 10. And, like, this is a very simplistic example where clearly you don't need the explanation. But when you extrapolate up to something much more sophisticated, or even non-math problems, the way that you're training AI to be more sophisticated is by giving it not just the answer to a question, but the reasoning and the understanding of how to get to the answer. So chain of thought is
like, 5 + 5 is 10 because of X, Y, and Z; this is how you do it. And then the model becomes smarter and more able to do not just that specific question, but lots of others. And so chain-of-thought reasoning is a big part of the data collection that we do. And another big part is opinion. Humans are always going to be required to point reference into the world and kind of say, I like this better than that. Or it could be much broader survey information. But there's a variety of different kind of data capture things that are based on user preference, based on what we as humans prefer to see, prefer to hear, prefer whatever. So I think that's a really politically crappy answer to your question. No, it gave me something to work with, cuz I can
kind of bounce off that a little bit. Are you familiar with the, um, I guess, are you familiar with the improvements that were just made with the o3 model by, uh, ChatGPT? Not in a technical depth, but I use it myself, and it is. Okay. Yeah, because my understanding is they even are, in the background, executing Python, like, and doing chains of, uh, reasoning now. Which, originally, right, we went from, like, oh, like, 3.5, whatever that was called, um, I think 3.5, then 3.5 Turbo was when it got popular to the masses, if I'm not mistaken. And none of these hit, like, reasoning level, to my understanding, until, I think, o3-mini. It was either o3-mini or o1 that came out. And now what we're seeing is they're continuing to improve. But what I
think I've noticed, and maybe you can correct me on this, is the reasoning and chain of thinking is an interesting one, because I've worked with it, and this new model seems to be able to do incredible things. Like, I sent it a picture of me. I'm Greek, so, look, I was at a Greek Independence Day parade in Chicago, and I took a picture, and I was like, geo-guess this, pretty much. And it figured out where I was, and, uh, it showcased, like, its reasoning. It really does seem to be getting through that. So that was an interesting thing that you brought up, which is that next level. Um, and my question, kind of bouncing off of that, is, as you see these new models improve with all this data, where do you think this kind of sits in the agentic
realm? Obviously, this podcast is about AI agents, and I'm just curious where you think, in this year, these improvements for these models are going to take the practical use of chain of reasoning in the workplace. And when I mean the workplace, I mean, like, people being replaced in the workplace, now that we're having models actually be able to chain thoughts together. I think it really is the next big unlock, and we've been doing it for a while. It's not massively new for models. I think if we look at OpenAI as an example, uh, to be clear, we don't do any work with OpenAI. We actually don't work with any of the general models. We're typically working with enterprise businesses building the vertically specialized models. That's kind of what we do. Um, however, I'm familiar enough to talk it through. OpenAI has a bunch
of different advancements, and they're building a bunch of different models, and it's almost like each of these models is a Lego block, and they're now starting to assemble those. And so you had, initially, a large language model that was kind of generally pretty smart in a lot of different ways, but it was offline. And then you had a large language model that was pretty smart, but could also search online, and so it became more useful. And then you had a large language model that was pretty smart, and it could search online, and it could do detailed research, and so it became much more useful. Like, they basically extended the amount of time they let it think for, or they let it do, like, a proper hunt. And so rather than using X number of tokens and, like, a few seconds to answer your question, it's
like why don't you just take a minute or two and go and get all the info you think you need and come back and then answer with this like much larger data set. And that unlocked a ton of extra functionality. And now what you're seeing is a model that's pretty smart, that can hunt online, that can hunt online for an extended period in a kind of research way, and that can apply extended reasoning where it almost questions itself. Like it will find a bunch of stuff, ask itself some questions, find some more stuff, and do a few loops before it really comes back and says, "This is what I think the user wants to see in terms of an answer." And where this is super powerful from an agentic perspective is that where before you may have had to chain together four or five different
super bespoke agents to do really unique pieces of work because ultimately if you tried to give them too much they just fall over. You can now give a more meaty piece of work to one thing or you can have that one thing become the hub and you can have it being almost like your brain like your little CEO of your agent army and it will help figure out how to assimilate the work in lots of different ways. And so I think, let me be clear, I'm not building an army of agents. Like I'm vaguely familiar with what people are doing with them and I've played around a little bit myself and I think it is a hugely interesting part of the market. But for me, the big unlock here for agents is just having that ability to be doing much more complicated tasks as a
single agent rather than as an army. Interesting. Now, you mentioned just now that you're working with more of the vertical models, the vertically specialized models, sorry, than the general models. Um, yeah, I find that interesting. I'm also not, like, too surprised by that, right? Uh, from where you'd probably be sitting, I guess. Is this something that kind of makes you a differentiator from, um, some of the other, you know, uh, from the other situations? Because, you know, you saw ChatGPT just released the new, uh, image generator, right? They actually just released the API for that, so that's going to be an intriguing situation there. There are going to be so many blogs, every blog will not have a graphic designer making the graphics, and every LinkedIn carousel will be made automatically, just wait. But, um,
I'm just interested in why, necessarily, you are working with the more specific, uh, vertical models, and how do you feel like what you do helps differentiate them so that they can compete? I mean, ChatGPT's got all the funding, I'm guessing all the data, all the team. The short answer, like, I'd love to give you some it's-strategic yada yada nonsense. It really isn't. It's that that's where the demand is. That's where we can really get traction. And just to kind of take a step back, OpenAI, um, have huge data needs, but they have a lot of this capability in-house. They realized quite early on that that's the superpower. That's the unlock. They kind of got AI, in terms of large language models, to the next level of sophistication literally just by throwing industrial-scale human labor at the problem. And that's what
allowed them to leapfrog Meta, to leapfrog Google and all the rest of them. And so they're, like, the first mover here. They've been doing this for a long time. And they have a bunch of their own systems, resources, and partnerships that allow that to work. We're also a tiny little company. Like, it's really quite surprising to me that we managed to sign some of the customers that we have. And I'd love to be much more open about some of these customers, because some of the recent deals that we've signed are ridiculous, in my view. Like, a year-and-a-half-old company with 30-ish people, we shouldn't have the right to be in the room with some of these Fortune 100s. But yet here we are. And here we are because they need data and they're craving an edge. And because we've built a
decentralized network, it's not game-changing in terms of network size. I hope that one day we can build a game-changing network with millions or tens of millions of people. That has to be the aim. But even for now, with 560-ish thousand people, we have access to knowledge, to expertise, to information that they don't otherwise have access to. And so they're willing to give it a try, and they're willing to see if we can provide decent data. And when we do, they double down. Interesting. Okay. Yeah. No, the reason I asked that is because, maybe from my own side, I just find it amazing that a company like yourself is able to get this, you know, um, up and off the ground with companies like Midjourney and the like. Midjourney, I'd say, if you're familiar with AI, is actually a
household name. So, I think it's a bit of an interesting, what's the word I'm looking for, kind of story to tell, that you can be quick to market and get somebody, uh, in that category. Um, you don't need to, it doesn't take as much time as one might think, if you build the right team. Going back to the other questions I had. So for me, I guess, uh, the last question would just be: if you had to name one intriguing thing that you think will happen in the world of AI in the next calendar year that's on your radar, what would it be? Because every single day, I swear to God, there is something that is new. Like, for example, I was thinking to myself about a month ago, oh, I can't wait for the image generation stuff to come out with, uh,
ChatGPT, or, I'm a big automation person, I can't wait for, like, make.com to introduce AI agents. By the time I finish the sentence, they're out. So, that's just little bits and pieces of it, but at the larger scale, what do you think is going to happen in this year that could be on people's radar already? What do you think is going to be the main push, um, moving forward? I think it's less likely to be a kind of generational leap in function. It's much more likely to be a generational leap in modality. Modality. What I mean by that is, like, arguably we're already at AGI. Like, maybe not by every single possible tick box that we could make, but let's be real. We have supremely intelligent AI systems today that can do very advanced, very sophisticated reasoning across hundreds,
if not thousands, of different, very specialized areas. And they can do it at a level that's, like, master's degree, if not PhD level, in a lot of places. To me, that's pretty close to how I would define AGI. Like, maybe it's not ticking every box, but sure. I think the big unlock that I expect to see in the next year or so is that we're going to go from interacting with AI from a phone, from your computer, to interacting with different types of robots. And that, for me, seems to be the most logical evolution of where we are today. There's a ton of companies building humanoid robots, or even different variations of, like, household-type things. All of these will be powered by AI. All of these will need huge amounts of data, in much the same way your autonomous vehicles do. They're
going to walk around. They're going to roll around. They're going to see weird things. And they're going to need help to understand what these things are and how they can safely interact with them. But I see that as being the next phase of how we interact with AI. Um, okay. Aside from that, I think it's going to be application of what we already have, strung together to try and solve problems that we haven't been able to crack before. I think that's the most logical next step. But I'm not convinced that needs, like, a big technological unlock. I think it's more about just pointing at a problem and then just chewing on it long enough to get an answer. The bit I'm excited for, because, well, my background's blurred, but I've basically got a bookshelf full of sci-fi, is, like, humanoid robots that have real ability to
help, or real ability to do things. It sounds a little bit dystopian, it sounds a little bit kind of creepy, but it's the right next unlock, and it's going to enable manufacturing of a whole bunch of different things that we've never thought about and can't currently do. Yeah. You know, I was thinking about it when I looked at it. Um, there's, uh, this graph that I saw. It showcased the improvement in intelligence, from literally an IQ standpoint, for these models. Last year we were in the, uh, actually, I don't want to give away what it is. Where do you think we were at, on average, for, like, the name-brand models last year, from an IQ, on, like, the normal IQ scale where 100 is the median, sort of thing? Uh, last year, man, it's hard to think about what they were like last year.
It feels like every month they get significantly smarter. Maybe, like, above median, like 120 or something, last year, something like this. Well, that's actually kind of closer to where we are, uh, now. We're getting into, like, the 130-ish, uh, range, which is entertaining to me. So, there's this graph that I can pull up real quick. Um, but it's pretty incredible to me. So, I'll pull this up for you. I'm sharing this with the world. I find it entertaining. So, if we take a look at this, right? Take a look at this graph. This is where we're at now, right? This is what we're dealing with now. So, GPT-4.5 is, um, you know, around average, and some of the stuff's still below, you know, like o1 vision. You can see Grok 3 is a little bit over
o3-mini-high, and then OpenAI just released these models the other day, and obviously Gemini got really up there with the Pro experimental version. But, um, o3 is almost at the 140 range. The o4 models that are, uh, mini-high and mini are in the 120 range. Last year, let me pull it up. It was from the same account. This gentleman right here had the same, basically, he's a premium tester and, like, kind of does stuff early. So take a look at this chart from 2024. Wow. Big difference, and this is in a year. We've gotten so far, and this is why I'm going to start asking that question of the people that come on, because I just find it so intriguing. I mean, we weren't even at 100, for all intents
and purposes, last year. Um, Anthropic and, uh, ChatGPT were kind of at the higher end. So, does that give you an interesting perspective on where things are at and how things have changed? Because this o3 model dropping was actually insane. It's not really being talked about. Um, well, it's being talked about enough, actually. I take that back. Yeah, it's surprising. I mean, I feel like I used Claude a fairly decent amount. I feel like, I guess this is a very specific test, though. This is the Mensa Norway test or something. This is, like, a bit of a beast. But I feel like if I talk to the average IQ-100 person, I think last year most of these models were able to answer much more sophisticated things, in a lot more detail, than the average 100-IQ
person. So it's a hard thing to measure, but it's a really cool visual just to see that step change in overall intelligence. Very cool. Yeah. Yeah. Well, it's interesting, because what I found interesting about models in general, just a short tangent before we close things out, is that math has been something they've improved a lot on, but what they've actually just been okay at is the first step of philosophical questions, because they're just trained on, like, Hume or Quine or whatever. So, you know, they just have the information from the documents, uh, and the writing. So, that's always something to remember. You pointed this out earlier, that humans having to be the first input is a big part of it, because people forget it is actually just a parrot, at some level. It could
be a really good parrot. You know, it could know a lot of things, but, um, these parrots are starting to be pretty smart as well. So, I wanted to call that out and say that, you know, the data that you're using to help these more, uh, vertical platforms, I find that really intriguing. We haven't dealt with anybody who's been in this space before, so we really appreciate having you on the podcast. Is there anything that you'd like to say to close things out, to plug, before, um, we let you go? I cannot bail and not plug, so I have to plug. If you have used a model recently and you've asked a question about something that you're deeply involved in, and the answer that you got was just okay, it wasn't great, the reason
for that is that we haven't found a way yet to incentivize you to provide your knowledge and your detailed understanding to the people building these models. And so if you want to do that, we've figured out the right incentive framework, or what we hope to be the right incentive framework, and you can earn by transferring that knowledge and helping these models get smarter, get more sophisticated. And so the simple way to do it: game.sapien.io. Literally just click Earn Now. It'll figure out all the complexity for you. You don't need to worry about creating wallets and all this sort of stuff. Sign up with your email. There's a few tasks in the dashboard today, not very many. Most of the work is being done in private for now, until we build out reputation and onboarding in a more sophisticated way. But get involved, earn some cash,
transfer some knowledge. Awesome. Well, with that being said, thank you, Rowan, so much for being on the podcast. We appreciate it. Make sure to go to game.sapien.io, and we'll see you in the next one. Bye.