Episode 91 Oct 24, 2025 53:37 7.6K views

Voiceflow CEO Braden Ream - Building Real World Voice AI That Actually Works

About This Episode

Braden Ream, CEO of Voiceflow, joins the AI Agents Podcast to share the journey of building a powerful, real-world voice AI platform that truly works.

In this conversation, Braden explains how Voiceflow evolved from early Alexa skills to a full-fledged platform used by Fortune 500 companies to create conversational agents across chat and voice.

He dives into the limitations of traditional NLU systems, the shift toward agentic frameworks, and how LLMs revolutionized the state and response design process in voice tech.

Learn why streaming processing and speech-to-speech models are the next big leap in voice AI, how Voiceflow is navigating the balance between usability and power, and what the future holds for agent builders amid rapid advancements from OpenAI, Anthropic, and others.

If you're interested in the transformative potential of conversational AI for call centers, support automation, and beyond, this episode offers a deep dive into the tools and trends defining the next generation of voice AI.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
⏰ TIMESTAMPS:
0:00 - The Death Of Manual Design
1:22 - How Voiceflow Got Started
4:44 - Voice Versus Chat In AI
6:12 - Transitioning From NLU To LLMs
10:00 - Streaming AI Vs Batch Processing
13:00 - Unlocking Real-Time Voice Agents
20:00 - Serving Customers Of All Sizes
26:00 - Voice AI Model Comparisons
33:00 - The Bundling Of Agent Builders
39:00 - The AI Bubble And Valuation Debate
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Sign up for free ➡️ https://link.jotform.com/KgFhZ4kHi4
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Follow us on:
Twitter ➡️ https://x.com/aiagentspodcast
Instagram ➡️ https://www.instagram.com/aiagentspodcast
TikTok ➡️ https://www.tiktok.com/@aiagentspodcast
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

Transcript

And I was like, that's it. Manual conversation design is done. All of our revenue, all of our customers... while, yeah, it doesn't go away overnight. You know, I always joke to our team, the fax machine is still a multi-billion dollar market, but it's clear the technology is cresting, right? Traditional NLU intent-driven agents are going to move away, and it's going to move into agentic frameworks and things like that. >> Hi, my name is Dmitri Bonichi, and I'm a content creator, agency owner, and AI enthusiast. You're listening to the AI Agents Podcast, brought to you by Jotform and featuring our very own CEO and founder, Aytekin Tank. This is the show where artificial intelligence meets innovation, productivity, and the tools shaping the future of work. Enjoy the show. Hey there, and welcome back to another episode of the AI

Agents Podcast. In this episode, we have the founder of Voiceflow, Braden Ream. How you doing? >> Good. How you doing? >> I'm living the dream. And it seems like people who want the complete chat and voice AI platform are as well when they use Voiceflow. >> Yeah. I know it's corny, but yeah. >> It's funny, I feel like every founder hates their website's H1. Myself included. But you know, some things you've got to do for SEO. >> Absolutely. And just to give everyone more insight beyond the tagline, tell us a little bit about your background: first, how you got into AI, and second, how Voiceflow got started and a little of the story there. >> Sure. So Voiceflow is my second company. I

had a company in college for three or four years that I was running. That was a social networking thing: first product, first company, first time building a team, all that good stuff. And at the end of college, Alexa had just launched, at least in Canada, where I was living at the time; it had been around elsewhere for a couple of years. I started playing around with the SDK, building Alexa apps and Alexa skills, and I was like, "Wow, this is really cool." It was still pretty rough at the time, but I think it was clear that assistants, and, you know, what were then called

assistants, now called agents, were really going to be the way brands interacted with customers in the future, through voice, through chat, through any modality, really being an operating system between a brand and a customer. So as I was building these apps, I started to develop that thesis. My co-founders had also started other companies; we all kind of knew each other from that. And it just hit us: wow, all the tools out there to actually build these things are really bad. >> Yeah. >> They're not friendly to use. Those that are friendly to use are not powerful, and those that are powerful are not friendly to use. And we were like, where's the Figma-quality product for this space? Like,

you know, where is something that's collaborative, that allows my team to work together and build a great agent? Because at that time, people really viewed agents as chatbots in the bottom-right corner, or call center automation that you hate; it was a necessary evil. They weren't really viewed as products yet. And so, I don't want to be too grandiose about it, but we were probably one of the first companies that raised money to build an agent as a product. The whole product was an agent. It was an Alexa app. We raised about half a million dollars to go build probably one of the very few venture-backed Alexa skills, and as we were

building that, we just found that the tooling layer was super rough, and we ended up building Voiceflow as an internal tool through 2019. Then in 2020 we had a bunch of folks asking if they could start using our tool, and that was the light bulb moment: our app was called Storyflow, so why don't we make our internal tool, which was called Voiceflow, the whole company? And yeah, it's been a crazy journey since then. Hundreds of thousands of developers on the platform, over 4,000 customers, some huge Fortune 500s and Fortune 100s, small startups, kind of all over the gamut. >> Very cool. And making the transition into this new venture, it feels like there's been a lot of improvement over

the last year or so, especially on the voice side. I know from when we talked previously that your name is Voiceflow, and it's kind of interesting how the main focus of your service offering has shifted as voice has gotten better really recently. But to state it again, you were founded a couple of years ago, so maybe before... >> Yeah, before LLMs. >> Yeah, before LLMs. >> Two years before LLMs, yeah. Or two or three. And basically back then, the only way to get the agent to be better was through manual conversation design. You're designing more flows, and even today, this is still the way that most enterprises

do it. You've got a spreadsheet with intent-response pairs: user says this, you say this, and you have a couple thousand rows of all your different intent and response couplings. Then you'll have a Visio flowchart that shows how it all ties together. That's how traditional conversation design is done. The Visio flowchart is your state graph, how the conversation flows, and the Excel sheet is what to say. Voiceflow replaced all that. It was one native product. We're very focused on product design, a very design-driven company, so it's beautiful, it's easy to use, but you can also design it, prototype it, launch it, and scale it all in the same platform without having to use Excel

and Visio. Typically what would happen before was the designers would throw it over to the engineers, who would actually build the thing. That was the old workflow before Voiceflow. That's what led to our Series A and starting to really scale into the enterprise. And then when LLMs came out, it was very clear. I remember like it was yesterday: the day ChatGPT came out, I was sitting in our San Francisco office, and I started playing with it, and I was like, that's it, manual conversation design is done. All of our revenue, all of our customers... well, yeah, it doesn't go away overnight. I always joke to our team, the fax machine is still a multi-billion dollar market.

But it's clear the technology is cresting, right? Traditional NLU intent-driven agents are going to move away, and it's going to move into agentic frameworks and things like that. Now, at the time, when LLMs came out, we didn't know about agentic frameworks, because GPT-3.5, the first ChatGPT, didn't really know how to do that; that kind of came around with GPT-4, 4o, and others. But it was clear LLMs were the future of at least response design, and now we're getting to where even the state design is agentic. So it's been a big transition. We focused on chat for a while, to be honest, because even though we started as Voiceflow and voice was our primary interface, we quickly learned that voice was

just pretty rough back in 2019, 2020, 2021. We wanted to build the best agents, not just any agent, so we actually focused on chat for a long time. Only in the past year and a half has voice gotten really good, and very quickly. It's a bunch of things coming together all at once: improvements in speech-to-text, improvements in text-to-speech, speech-to-speech models, and token streaming, which is a huge unlock for latency, so you don't have to batch the response; you can stream a response in as it comes. All these things have come together to deliver really humanlike voice experiences. So yeah, it's been exciting to see the original vision for the company start to

come to fruition after a bunch of detours, just given where the tech was at. >> Absolutely. Just nerding out from a technical standpoint, explain the difference real quick, at a deeper level, between batch processing and the streaming you just mentioned, because I've heard of what batching is, but I hadn't heard about >> the new streaming. >> Yeah. I'll explain batch first. With traditional models, how it works is a user's utterance, which is essentially what they say, comes in and first goes to a classification layer. Traditionally that was an NLU, and by the way, that was the new thing back when voice was really getting off the

ground. NLU models were all the rage; now they're really being replaced. But that was your classification layer: it would take what the user said and determine which intent this is, an intent being a grouping of potential phrases that the NLU is trained on. So it might be "I need a refund," "I'd like a refund," "refund, please." Those are three of what are called sample phrases. Those live inside an intent that gets trained into the model. That's what helps the agent understand. The only AI in conversational AI pre-LLMs was that layer, because the response, the actual conversation itself, was all human-written. And so, if people want an interesting framework to think about, there are kind of three components of

an agent. The first is understanding: what did the user say? Two is deciding what to do next: the action. And the last is the actual response: the production of the response. Those are the three components. >> Conversational AI historically, pre-NLU, had all three human-written. Then NLU came onto the scene, and the understanding layer was AI, right? So you could use AI to understand what the user said, but it was still human-written action and human-written response. When LLMs first came out, they didn't immediately replace action; instead it was the response. So it would be AI, like an NLU, to understand what they said, followed by humans deciding what the flow is going to be, what state they're in, where to go next, what to do, and then an LLM would generate the

response: hey, we're in this state, I'm going to generate a welcome message or something like that. Agents are now all three of those things. We understand what the user says, we perform the action, and we produce a response. All three are now in the same layer, and they're all LLM-based. That's the evolution. And that was batch processing: at the very end of it, the action, all of it, gets produced at once. The classification has to finish first, and then it sends it over; it's all one big waterfall. The utterance comes in, you classify it, you send that to the state manager, the state manager decides what to do and say, and then it sends it back.
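As a rough illustration of the understand/decide/respond waterfall described here (this is not Voiceflow's actual code; the intents, actions, and phrases are made-up placeholders), a batch turn where each stage must fully finish before the next begins might look like:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    intent: str      # output of the "understand" layer
    action: str      # output of the "decide" layer
    response: str    # output of the "respond" layer

def classify(utterance: str) -> str:
    # Understand: stand-in for an NLU/LLM classification layer.
    return "refund_request" if "refund" in utterance.lower() else "fallback"

def decide(intent: str) -> str:
    # Decide: stand-in for the state manager choosing the next action.
    return {"refund_request": "start_refund", "fallback": "escalate"}[intent]

def respond(action: str) -> str:
    # Respond: stand-in for response generation (human-written or LLM).
    return {"start_refund": "Sure, let's get that refund started.",
            "escalate": "Let me connect you with a human."}[action]

def batch_turn(utterance: str) -> Turn:
    # Batch waterfall: each stage fully finishes before the next begins,
    # so the caller hears nothing until the whole turn is produced.
    intent = classify(utterance)
    action = decide(intent)
    return Turn(intent, action, respond(action))

print(batch_turn("I need a refund, please"))
```

The point of the sketch is only the sequencing: classification blocks the state manager, which blocks the response, which is the latency the streaming approach attacks.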

Streaming, like token streaming, is: as the model is listening to the user, a speech-to-text model is producing tokens, and the state manager, or in this case the agent, is computing on that, understanding it in real time and then producing a response. So it's kind of one continuous data stream. There are parts of that that are still going to be batch in between, I'm trying to keep it high level, but it's not as sequential as it used to be. >> Kind of sounds like a person's thought process of hearing something, processing it as it's happening, and then responding. It's kind of creepy. >> Yeah. [laughter] The old model would be like if you and I were having this podcast over voice notes, right? >> Exactly. >> You send it

over to me, I listen to it, I think about it, and I send my response back. Modern voice AI is more like what we're doing now, just having a conversation in real time. >> And what kind of things has that been able to unlock from a product standpoint for you? >> As a company, we don't ship junk. >> Mhm. >> I'll give a great example. When GPT-5 came out, it was having a ton of outages, because a lot of people were trying it out and they were rolling this model out to production for the first time. People were like, add it into Voiceflow, and we looked at it and were like, nah, not

yet, let's give it a week to cook. So we actually had it sitting behind a feature flag. It's very easy to add a model; we just had it sitting there until we felt it was ready, and then we added it. And when it came to voice, we just weren't doing voice with the old tech, because it wasn't good, right? What's the point of Voiceflow being this way to build incredible AI customer experiences if, yeah, the workflow is maybe easier, but the end customer experience is no better? We don't want to do that. With all the new voice AI tech, it's allowed us to confidently say,

okay, you can do phone calls with Voiceflow, you can do web embeds for voice conversations. We're now confidently doing all this stuff because we believe it creates a great customer experience. So it didn't just unlock a single use case; I think it unlocked really all use cases for us. If we get into specific use cases: anything that's conversational. In a lot of traditional call centers, the conversational AI was not actually resolving the conversation. It was just trying to route you to the right contact center queue. Say there's a refunds queue, a support queue, and maybe a card activation queue if it's a bank. A lot of traditional IVRs can't actually do the tasks, because it's too frustrating

to do it over the phone. You know what I mean? You don't want to have to press one for this and that; it's just too much. So instead, the entire job of the agent was to get you into the right queue, because that allows the queue agent to resolve your ticket faster: you're in a specific queue, a specific context. Maybe it gathers some information on the call tree on the way there. But all the agent was meant to do was either (a) make it so frustrating that you leave the call and don't even need support, and that's a real hidden incentive, or (b) gather as much context as possible so that by the time you get to the queue, the human agent is able to resolve things faster for

you. Now, with conversational agents, because they're so conversational, they're low latency, they have access to tools, they can figure things out on their own, you can actually resolve the whole query. That entire use case is a huge unlock. >> That is a big one. Yeah. And it's interesting: I took a look at your customer stories before the call, and I recognized some of the names on there. Turo and StubHub were a couple that were on there, and it's pretty intriguing what you've been able to do for companies of different sizes, including ones you've heard of. What is the primary customer size, as in how big is their company, so to speak, that you

feel like you're working with right now? >> So Voiceflow has a product-led growth motion; we don't hide behind a contact-sales button. In our space right now, I'd say there are sort of two different philosophies. One is: you don't care how the sausage gets made, you just care about the outcome, right? In that world, I'm going to have a landing page with beautiful case studies, here are all the outcomes, here are the ROI statements, contact sales. And that salesperson is going to spend all their time trying to learn your use case, what you're trying to do, and then they're going to build it all for you. And typically, because you're buying an outcome, the pricing is

going to be different: outcome-based pricing, price per resolution. That's more the customer-support-specific example. In that world, what they're often doing is offering a very lightweight product, maybe a dashboard or some ability to tweak the prompts, but they're doing forward-deployed engineering to build the whole thing for you, and then charging you based on a particular outcome. That's not Voiceflow. We have a different philosophy, which is: if you can't do it yourself, that's a product problem, right? Instead of saying, hey, we'll just do everything for you and charge you based on outcome, our whole vision, and the thing we love doing, is helping customers build

it themselves. Having been in this space for a long time now, we went from being the new kid on the block to one of the OGs, and I think the space is going to increasingly move to in-house teams. I saw this in the previous wave with NLU: agencies made a ton of money, forward-deployed engineering made a ton of money, and then once the contact center agent, once the agent, becomes seen as critical, it's similar to a mobile app. If it's a critical enough thing, you have a team that works on it. Maybe you're still using consultancies and implementation partners, but once it's critical infra, big brands, which especially care about customer experience, are going

to have a team: a product owner who says, I own the agent experience, I've got a small team of developers, designers, AI folks, whatever it might be. In that world, those people want to make changes to the customer experience themselves. We've heard horror stories from the other side of the table where they've built out a team because it's important, and in order to make a change, they have to submit a support ticket. Which is funny, because that support ticket for their automated support bot goes to a human, so that human can go in and make the change for them. I just don't think that's the future, personally. But I can see that right now, when the space is so hot and

everyone's just trying to get something, if you don't have a prompt engineer, if you don't have anyone who knows it, it's just easier to buy a solution: do it all for me. But I don't think that's the long-term trend of the space. >> Yeah, that's a good point, because there are companies hiding behind that book-a-sales-call button, and it's pretty interesting to me how companies are taking different approaches. I really >> Really quick, I realized I didn't answer your question. >> Yeah, I was about to say. >> I was explaining the two different models because it's actually really important context. We're self-serve and PLG, and we have services too; you

know, we work with enterprises, we do the fully deployed thing too. >> Yeah, the enterprise tier has got to have the contact-sales button; that's kind of a uniform SaaS thing. >> Yeah, 100%. So we do that; we'll build and maintain agents for companies. I'd like to think we have the best of both worlds, because if you want to build it yourself, great, it costs way less and you have that full iteration experience. If you want us to build it for you, that's fine; we do that all the time, we love doing it with customers, and it gets built right, which is always really important. But because we have that PLG motion, we've got companies of every possible size. It's actually kind of

difficult for us to say we're only going to go after a very specific type of company. We often explain it as fishing. If you're a sales-led company, you're kind of spearfishing: we're going for tuna, right? We're going for companies with 5,000 to 10,000 employees, maybe regional credit unions, whatever it might be. This is who our ICP is, and they spearfish. They can do that very successfully because a lot of the pipeline generation and brand awareness comes from targeted conferences, targeted podcasts, and outbound sales, where you're going after the same people. Voiceflow isn't spearfishing. We're throwing a net in the water where we think the

most tuna will be. If we want that same ICP, that same customer profile, instead of seeing it and throwing a spear, we're putting a net in the water, and we catch whales, we catch tuna, we catch sardines, the whole thing. So we've gotten customers as big as government agencies, like national governments, all the way down to individual entrepreneurs who are building an agent as their first product. We'd love to see more clustering, but it's kind of hard with the PLG motion. The other fun side is we have a big community, over 10,000 developers in our Voiceflow Discord, and those folks are going to be of every size. And I think there's something cool

there, because it's not like enterprises can't learn from someone who spent an entire weekend building a really amazing call flow, right? I think they can all learn from each other regardless of size. If anything, the folks who are bound by a lot of internal, I don't want to say bureaucracy, it's almost a dirty word, but internal process, they might benefit from seeing how folks who don't have that burden are able to move, for inspiration about what they could do. >> Yeah, fair point. You've seen some improvements, as you mentioned a bit ago, going from batch to the streaming flow, and obviously we've had model improvement after model improvement on

the text side as well. Claude 3.5 Sonnet, or sorry, 4.5 Sonnet, about two weeks ago, or a week and a half ago. Time flies, I don't know. GPT-5 Pro was just released, technically yesterday or today, I forget. And it's just going to continue to improve. So where do you think the improvements, especially on the voice side, are going to come from? Is it just quicker responses, more realism, or is there something else we might be missing? >> Yeah, it's likely the joining of TTS and STT. Speech-to-speech models are doing this, right? The current stack for most voice AI companies is: you have a speech-to-text model that turns your speech into text, you then process that with an LLM, and then it sends that to a text-to-speech model that

turns that back into speech. You do that as fast as possible with token streaming, trying to remove any latency between the components to provide the fastest, most fluid experience. There are now speech-to-speech models where there is no conversion into text: it is speech in, process the speech, and then speech out. I think OpenAI's Realtime API is aspirationally in that direction, at least from what I've seen today. There are a couple of simple tests I've done to see how connected the two are. I'm going to talk about two different models: I think one is true speech-to-speech, and I think the other is a much tighter orchestration kind

of being branded as a speech-to-speech model. With the Realtime API from OpenAI, if you ask it to detect your sentiment, it can't right now. So, for example, you say, hey, match my sentiment, and you speak really quietly or really angrily, whatever it might be; it's not going to match you on the other side. That's one test. The other is that there are still options to select which speech-to-text model you want, which in a true speech-to-speech model you wouldn't have; it would all be just one model. What that suggests to me is that they're still doing an orchestrated approach. It's not true speech-to-speech, at least not today, at least not what

we've seen on the API in the playground. Advanced Voice Mode might be different, because it is incredible; ChatGPT's Advanced Voice Mode is one of the most incredible products I've ever used. That might be different; it actually feels very different. But at least what we have on the API today is not that. That brings us to Google. I forget the name of the model, I think it might be Gemini 2.5 Flash or something like that. >> Yeah, 2.5. >> I'd have to pull up the exact name, I just want to make sure I get the right one. But that one passes those two tests I just talked about, whereas OpenAI doesn't, in terms of whether it's a true speech-to-speech model.
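For contrast with a true speech-to-speech model, the orchestrated stack described above (STT → LLM → TTS chained with token streaming) can be sketched as lazy Python generators. This only illustrates the data flow; the stage bodies are made-up stand-ins, not any vendor's actual API:

```python
from typing import Iterable, Iterator

def speech_to_text(audio_chunks: Iterable[bytes]) -> Iterator[str]:
    # STT stage: emits partial transcript tokens as audio arrives
    # (stand-in for a real streaming speech-to-text service).
    for chunk in audio_chunks:
        yield chunk.decode("utf-8")

def llm(tokens: Iterable[str]) -> Iterator[str]:
    # LLM stage: starts working before the full transcript exists
    # (stand-in for a token-streaming LLM call).
    transcript = []
    for tok in tokens:
        transcript.append(tok)
        yield f"[processing: {tok}]"
    yield "Transcript heard: " + "".join(transcript)

def text_to_speech(tokens: Iterable[str]) -> Iterator[bytes]:
    # TTS stage: synthesizes audio per token, so playback can begin
    # before the whole reply has been generated.
    for tok in tokens:
        yield tok.encode("utf-8")

def voice_turn(audio_chunks: Iterable[bytes]) -> Iterator[bytes]:
    # The three stages are chained as lazy generators, so data flows
    # through the pipeline continuously instead of in one big waterfall.
    return text_to_speech(llm(speech_to_text(audio_chunks)))

for audio_out in voice_turn([b"I ", b"need ", b"a ", b"refund"]):
    print(audio_out)
```

Because everything still passes through text, this is exactly the kind of pipeline that can't natively hear tone or sentiment, which is the test Braden describes for telling orchestration apart from true speech-to-speech.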

If you speak in a whispering tone and ask it, it knows and is able to match that. >> Oh wow. >> Yeah, it's quite neat. It's blazingly fast. It's able to detect things that only a true speech-to-speech model could do. You could probably do this on the orchestrated side, but it would be a lot of work, whereas for a speech-to-speech model it should be quite native. Now, the downside is we found it's very bad at tool calling, and I think it's likely because there's been so much work on LLM tool calling that to make a true voice agent model, you still need to rely on the tool-calling abilities of the LLM. So this might be to OpenAI's credit: yeah, it's not speech

to speech, but it's way better for most actual developers. You don't even need speech-to-speech; it's already so fast on its own, it's already an incredibly expressive TTS, and there's stuff you can do to conjoin them really tightly, but you still get the benefits of an LLM that's really good at tool calling. So, for example, if my goal were to give an incredible demo that's going to wow an executive, I'm going to use Google. If I'm going to build a production application, I'm going to use the Realtime API from OpenAI. That's kind of the difference: speech-to-speech is incredible from a conversational standpoint, incredibly fluid, but it's not capable yet, at least from what we've

been able to evaluate. But I think that's where voice AI is going to go. I do think speech-to-speech is the future; I just think the tool calling is not very good today. >> Yeah, that's fair. I think a good follow-up question is: when do you think the tool-calling capabilities will improve significantly? >> I don't know. It's tough to say, it's really tough to say. And I think the other question worth asking is: do we even need speech-to-speech? Today, from what I've seen, it's the future. But there's also a point at which your voice experience is so fast, the latency is so low,

that having the flexibility of doing orchestration in the middle, going off to do other things, having deterministic, scripted responses, or I think people are calling it directed dialogue now, the name is kind of changing, there's a point at which you may not even need a speech-to-speech model. I think that's another valid question, and OpenAI may have come to that realization. My gut tells me speech-to-speech is the future of the space, but I also might be wrong in that you just don't even need it, because the orchestrated approach has gotten so fast. Deepgram launched a model yesterday, or actually I think it launched on Friday, but it's kind of been

overshadowed by OpenAI's Dev Day, which I'm sure we'll chat about. It's blazing fast, it's really, really good. What they've done is incorporate more elements of, it's not just doing speech-to-text, it's essentially working more natively with some of the dialogue. Without getting too technical, it's optimized for conversation, and it performs way better than the other models we've seen from them when it comes to actual dialogue-driven voice agents. So who knows? It's an open question when it'll get better, and another open question whether we'll need it if it does. >> I think it's a fair and humble response to say, ah, it's hard to say, because a lot of this stuff kind of comes out of

nowhere. I know for me today, I just mentioned about you as well just mentioned about the chatbt improvements. Um like the agent builder that just came out today. Um they their SDK was just released too, right? or their agent SDK, I should say. Um, >> yeah, I think I think so. Yeah, like >> yeah, and same with Claude the other week. >> Yeah, we have um we've got a good relationship with the the folks at OpenAI and Anthropic and um we we sort of knew that they were building um agent kit ahead of time, like sort of a couple months before, you know, we were under NDA, now it's public. I can kind of talk about it, but we we knew they were building it. Um, and I think it leads to this really interesting idea of um the commoditization and bundling of agent builders

inside of products. If I go back two years, when agents were really starting to get off the ground, still small but growing, you had a lot of agent-building products, and Voiceflow was in that bucket, where the core product was an agent builder. The goal was orchestration: at one point it was prompt chaining, and then you get into agent chaining. And I think what's happening is, as the models plateau, at least in terms of their exponential intelligence curve, and even if they are getting better at a rapid rate, you get to the question of whether you need super-PhD-level intelligence for some of these questions, or

whether a bachelor's-degree level of intelligence is fine. For a lot of customer support queries, you just don't need a super PhD, so for a lot of use cases the models are going to be pretty comparable. In that world, if the models aren't advancing crazy fast anymore, and even if they are, you don't need it, you get into a weird spot as a model provider around lock-in: how do you ensure people continue to use your models? On the enterprise side you're doing recurring contracts and such, but for a lot of folks it's purely based off the API, and it's so easy to switch. Customers in Voiceflow, we have a model garden, and people switch constantly from

Anthropic to OpenAI to Meta, to whatever it might be. They're constantly switching between different models, so how do you create lock-in? And I think that's where OpenAI is really intelligently verticalizing: they're adding layers, going into the app layer to ensure usage of the compute layer. At first, people were thinking about OpenAI and Anthropic like AWS or GCP, but the difference is that moving your cloud is so hard, so much work, relative, I should say, to switching an API call to an AI model. So there was a bit more lock-in with the cloud, and then they did continue to build apps on top of the cloud, right?
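A rough sketch of why that switching cost is so low: when every vendor sits behind the same one-function interface, "moving clouds" becomes a one-line change in application code. The provider names and `Agent` class below are illustrative stand-ins, not Voiceflow's actual model-garden API.

```python
# Hypothetical sketch of why switching LLM vendors is cheap compared to
# switching clouds: the app depends on a minimal interface, so the vendor
# behind it is a one-line swap. Names here are illustrative, not any
# vendor's real SDK.
from dataclasses import dataclass
from typing import Callable

# A "provider" is just prompt -> reply; real adapters would wrap each
# vendor's client behind this same signature.
Provider = Callable[[str], str]

def openai_stub(prompt: str) -> str:
    return f"[openai] {prompt}"

def anthropic_stub(prompt: str) -> str:
    return f"[anthropic] {prompt}"

@dataclass
class Agent:
    provider: Provider

    def ask(self, prompt: str) -> str:
        return self.provider(prompt)

agent = Agent(provider=openai_stub)
print(agent.ask("Where is my order?"))
agent.provider = anthropic_stub  # "switching vendors" is this one line
print(agent.ask("Where is my order?"))
```

The application depends on the interface, not the vendor, which is exactly the opposite of a cloud migration, where storage, networking, and IAM all have to move together.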

And so everything did start to bundle up. I think OpenAI is doing the same thing, running that same playbook of bundling things in, but even faster, because unlike the cloud, where there's more friction to move from AWS to GCP, it's like two minutes to switch from OpenAI to Anthropic. So they have to go up into that layer. With that, I think agent builders are becoming a feature of a product, versus before, when they were standalone products, and I think that's the trend in the space. ElevenLabs is a great example: they added their agent builder as a way to drive more lock-in toward their TTS voices, toward the ElevenLabs

stack, because that's their core business. Then they layer on this top of funnel: hey, don't go use another vendor that has a multitude of TTS voices, we'll give it to you for free, or for a nominal fee that just covers hosting and development, and we're going to lock you into our stack. So I think that's what's happening: agent builders are becoming bundled into larger products. >> Okay. Yeah, that's a fair point, because I've been wondering the same thing, and it goes back to a conversation we had on our pre-screen, which I find fascinating, about what I guess you could call a bubble. I think that was the term you

might have used. There's an interesting position a lot of people are in with regard to funding and where we sit in the value attribution to companies right now. I believe you made the point that Anthropic, for example, or OpenAI or Gemini, is the foundational crutch of a lot of these tools; without the model, they wouldn't really have a product. Now they're able to take the models and utilize them to best effect in very specific use cases and setups, but it's this double attribution of value toward the model, right? Because Anthropic is getting funding, OpenAI is getting all that Microsoft money, and then the companies utilizing those models to best effect are getting funded again, in an interesting way. >> Well, could you speak a

little bit more to that and your thoughts on the industry as a whole? >> Yeah, it's funny. I've seen so much we're-in-a-bubble talk. Even in our pre-screen call, I talked about how I'm a big shareholder in AMD, at least big relative to my portfolio, and that did really well yesterday with OpenAI announcing a big compute deal; I think it popped 30% or something. >> 30% is crazy. Dang it, I'm a big Nvidia guy. I've been on Nvidia, and I thought I was cool for that. Now there's another one I should have jumped on. Dang it. >> Yeah, of all the semis, I think Intel is actually the biggest winner.

I think they're up like 80% or something. >> Good for them. >> And it's great. I think the US needs more of that; there's a whole conversation on national security and having semis, but that's a whole different thing. It's great that the semis are doing well. So, when people talk about a bubble: I'm so in this space, and I don't know about you, but I don't use Google search anymore. I really don't. I use Google search only out of convenience, because Chrome is my primary browser and it's right there. But I'm never searching. It's literally just my portal to the internet. It may as well just be a CLI.

And when I look at my consumption of Claude, when I look at my consumption of everything personally, and then at the ramp of Voiceflow and our customers, what they're doing and the success they're seeing, I don't think it's a bubble in the way a lot of people would define bubbles. Every bubble is different, with a bubble defined as overstretched valuations. The three tech cycles most people talk about start with the dot-com crash, where the internet was this brand-new thing and valuations got crazy, and I think it's because people didn't understand the business models yet. If you remember, there's this classic thing where everyone was focused on eyeballs, right? Eyeballs, eyeballs,

eyeballs. >> That was all that mattered in the dot-com era. >> And then there was this mismatch between the expectation that eyeballs meant revenue and customers; it just never converted to revenue, and so that bubble pops. That bubble was really a misunderstanding of tech business models: SaaS didn't exist, a lot of these things just didn't exist. Then we get into 2021, not even that long ago, which is crazy to say out loud. There, the revenue was real and the growth was insane, but it was a misunderstanding of the growth pull-forward the pandemic caused. We saw this at Voiceflow: our revenue accelerated like crazy, because suddenly all these

companies, everyone's working from home, the stimulus is pumped into the economy, everything went into overdrive. There was this whole concept of "the new normal," the phrase that kept getting thrown around, which is funny because that ended up not being the new normal. We actually went back to the mean, back to trend; it was just a pull-forward of demand, folks adopting digital services because they had to, they were all working from home, and that funneled through the rest of the tech space. So that was that bubble: real revenue, real demand, but a pull-forward in demand that we expected to last longer than it

did. Where we currently are, I think, is what you alluded to: value misattribution. Companies that are quoted as AI-native are raising at really high multiples right now, which is funny, because if I were a VC I would flip it. If you're AI-native, you might have really fast growth, but it's so hard to pick apart your value from the value of the underlying model, and ultimately that's what's reflected in margin. You've seen a lot of companies in the AI space with insane growth, but then you're seeing Cursor, a good example, having to change their pricing to figure out how to get margin in there, because if your value is

so coupled with a model, it can be difficult to layer your own margin on top. So you've seen companies have negative margins, unit margins, or maybe not unit, but specific customer margins. I think that's really challenging, and that's where the misattribution is coming in: if a company raises $100 million, then goes and spends that hundred million on Anthropic, then Anthropic gets a multiple and that company gets a high multiple. They both have real growth. Anthropic just added $100 million of ARR, and the company just added $100 million essentially as a reseller of Anthropic tokens with some value-added features on top. I think that's actually where the quote-unquote bubble is: this inability to segment value. We've always just

assumed tech has like 70 to 80% margins, and I just don't think that's true anymore for a lot of AI-native companies. I think this can be solved, but I do think that's where some of the overvaluedness comes from. The companies I think are going to be the strongest right now are existing companies that layered on AI, where the value is very clear. It was already clear before: if HubSpot, a CRM company, adds an Ask AI feature, adds customer support powered by AI, whatever it might be, the value was already clear and they're layering in these awesome features for customers. Versus, if your

value is so deeply tied to the model itself that if the model went away, you went away, that gets really hard. >> Yeah, that's a very good point. I actually know some companies that are in that margin range, but I don't think it's as common as one would expect. >> Or most of these companies were providing the entirety of their value that way. Do you think that's a fair way to put it? >> Well, it's murky. That's the challenge. >> Yeah, that's why it's confusing. >> Yeah. What is the value? It's the whole wrapper thing: we went into a phase of "don't fund wrappers,"

and then the wrappers had incredibly fast growth, because AI is valuable. There's no bubble when it comes to the value that customers and businesses get from AI, and because of that value, the growth is real. I think the question becomes: how do you price the thing so your distinct wrapper value is clear? Honestly, "wrapper" is used too thinly; everyone's a wrapper to some degree. You could go all the way down the stack until you get to the person mining the silicon, right? [laughter] The wrapper is just a way to define a value chain. >> That's fair, actually. I hadn't considered that; that's kind of true. >> Yeah, you can go all the way down, right?

It's kind of a question of how thick your wrapper is. I think the problem is that pricing models haven't evolved yet, in my opinion, to the place they need to be, where you are clearly segmenting your own value. You might be entirely powered by AI, but if you ask a customer and the customer knows what your value is on top of the model, I think you've probably achieved something. That's obviously tough to do, but it's probably the place you need to get to in order to have some sort of stable margin and pricing. Even the way you value companies: how much of this is model revenue, and how much

of this is your value-added revenue? In the past, our AWS bill did not rise exponentially; it's so marginal relative to revenue. Versus LLM costs, which are very real variable costs that scale with your revenue. >> Yeah, it's directly correlated for the most part. That's something I'm wondering about too. These models, as we're discussing, maybe from a valuation standpoint, company to company: you have Anthropic, where, if you haven't looked into it, Claude Code is just a hilarious cash-pile-burning situation right now, until they change their pricing, which they claim they will soon. There are guys running continuous Claude Code with subagents, basically on a $200-a-month plan, doing like $30,000 a month in

coding costs. That tells you very plainly that the value they're providing over their own cost is significant, so they're practically losing money. And then the companies getting funded that are quote-unquote wrappers might be in the same situation with margin issues. Do you think these companies, at some point, once they have that buy-in you were talking about earlier, that stickiness where they're going to be required, will adjust their pricing models? >> Yeah, I think you're going to see a lot of financial engineering over the next two years, because the goal of every company is to create more shareholder value, right? You know the meme. >> That is it. >> Stonks. So how do you create shareholder value? Because not all revenue is created

equal. Your margin matters, sure, but your repeatability also matters, your stickiness, because ultimately a company gets valued off its future cash flows, and the more predictable your revenue is, the more valuable your company is. That's why SaaS companies get valued more than, say, a consumer platform; not all growth and revenue is valued equally. Enterprise SaaS gets the highest multiple because it's the most predictable and the most sticky, so it's the easiest to underwrite as compounding revenue over time. Everything's kind of pay-as-you-go today, but I think you'll start to see a more creative resurgence of SaaS to try to get some of that repeatability back. So we do a little bit

of this at Voiceflow, where we might take an actual hit on absolute revenue for the sake of a plan. Voiceflow runs off what we call credits. It's essentially our currency that you can spend on whatever compute you want: text-to-speech, hosting, whatever it is, it's your credits. You can buy those credits at a sort of rack rate, or, if you want to commit to a plan, we're going to give you a big discount on them. So we're taking less revenue, but we're getting the repeatability, which is actually more valuable. There's going to be a lot of financial engineering like that, like the Claude Pro plan. I wouldn't be surprised if you get

your feature plan, and then your compute plan on top with overages tied in. All this stuff is going to be engineered to maximize the highest-quality revenue you can. Right now everyone's so focused on growth and distribution, which is the right thing to do, but I do think you'll see a lot of pricing models change to adapt. >> Okay. Yeah, that's interesting. It's fun speculating on this type of stuff. The main question I've asked people over the last little while, and the last question to close things out, would be: how do you think the job market is going to be impacted in the next two to five years by this agentic boom? >> Yeah,

I think the entertainment industry is going to get way bigger. >> Really? >> Yeah. At the most basic level, a job is like subsistence farming, or nomadic folks hunting and gathering: that's their job, and the goal of the job is to feed yourself and your family. Then, as technology has advanced, better irrigation methods, all this kind of stuff, essentially all that matters is the ability to produce more with less. Instead of every family having to do their own subsistence farming just to feed themselves, one family can feed two families, and that allows the other family to specialize. Now maybe

they're going to be the one building the tools. That's how technology, by allowing more output per unit of work, enables humans to do more things. When we look all the way through history, that's been the case. You and I sitting here on a podcast, >> Sure. >> I don't know if you're full-time, but if you were full-time, that would have been unthinkable 100 to 200 years ago. Influencers, too. And pro athletes are getting paid more than ever, right? Look at how the NFL salary caps have grown; it's unbelievable. Patrick Mahomes' $500 million deal. Go ask a pro athlete 100 years ago, inflation-adjusted, how insane that

would be. >> No, you're very right. I was just watching the America's Team documentary on Netflix. Troy Aikman's deal in the '90s, I want to say he was paid a record-setting $11.2 million six-year rookie contract, and then a $50 million eight-year contract extension. Say 50 divided by 8, so he was getting something like $63 million over 10 years at the time. >> Right. Adjusted for inflation, what's $63 million USD in 1995? It would have been about $133 million today, right? Which means that, adjusted for inflation, Mahomes' contract versus Troy Aikman's was roughly 4x. >> Yeah. >> Which is wild. >> And I think what's causing that is that the surplus value has to go somewhere, right? >> Entertainment. >> Yeah. It's going to go to entertainment. It's going to go to just buying

more things. The five-day work week is an invention; it wasn't always this way. That was Ford Motors, I believe. >> Yeah, right. It was Ford. >> They quote-unquote invented the five-day work week, and in 2021 there were conversations about a four-day work week. So as people get more time back, what are they going to spend it on? They're going to spend it on entertainment. That's the first point, on where the surplus value is going to go. But people still have to have jobs, right? That surplus value has to be spent on something, and so I think more jobs are

going to be created, particularly around entertainment, services, travel, leisure, all that kind of stuff. And then of course there will be lots of new white-collar and blue-collar jobs as well; we just don't know what they are yet. It's hard to predict the future. Prompt engineer is a real title now; that was not a title a couple of years ago. Things constantly evolve, and we constantly find ways for humans to do work. I'm an eternal optimist: humans don't want to sit around all day and do nothing. They want something they're passionate about or find fulfillment in, and ideally you find a job that fills that need. If not, you need a way to make money, and

someone's going to have a job for you: they're building their dream, they're building a company, and they'll have a job for you. Maybe it's not a dream if it's an IBM executive or something, but they have a job for you regardless; there will be some work to do. So with those two factors, the demand and the supply, I think the supply of jobs will be there, because the demand is only going to grow. It's not like this value gets destroyed; that value is going to go somewhere, and people are going to have jobs to do it. Maybe it's all of us working for Patrick Mahomes one day, but that's still a job. >>

Working for Patrick Mahomes. Yeah, definitely going to happen. As a Chiefs fan, I wouldn't mind working for Patrick Mahomes, sorry if you're not. Although I feel like Patrick Mahomes was working for the defense last night. Anyway, we appreciate having you on the show; it was a really fun chat. Is there anything else you'd like to say before we close this episode out? >> If you are looking to automate customer support, a contact center, outbound calling, inbound calling, chat, wherever it might be, reach out to us at voiceflow.com. I had to do a plug. We were talking about the invention of the five-day work week and got kind of off topic. Use Voiceflow. That's it. That's my plug. >> Absolutely. Well, I appreciate your time, and I hope everyone will

go check out Voiceflow. Thank you everyone for listening, and we'll see you in the next one. Bye. >> Thanks.