How Multimodal AI Replaces Keyword Detection for Safety
About This Episode
In this episode of the AI Agents Podcast, we sit down with Ina Jovicic, CEO and founder of Enough, to explore how AI is revolutionizing personal safety.
Ina shares the story behind building a wearable AI “mini bodyguard” that uses multimodal AI—combining audio and video—to detect threats and respond autonomously in real time.
In this episode, you’ll learn about:
- How early, rule-based AI compares to today’s reasoning-driven models
- The shift from single-input AI to multimodal understanding (audio and video)
- Why AI agents and reasoning models expand what products can handle
- How edge AI enables faster responses in critical, time-sensitive situations
- The role of hardware and optimization in real-time AI performance
Subscribe to AI Agents Podcast Channel: https://link.jotform.com/subscribe-to-podcast
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Sign up for free ➡️ https://www.jotform.com/
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Follow us on:
Twitter ➡️ https://x.com/aiagentspodcast
Instagram ➡️ https://www.instagram.com/aiagentspodcast
TikTok ➡️ https://www.tiktok.com/@aiagentspodcast
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Transcript
>> What has the experience been like as AI has evolved? You just mentioned that the AI was maybe the part you had considered the least at the start. As AI has improved, how has that impacted the product and made you adjust your way of thinking? Reasoning models only really came out this year, and if you've been doing this for the last three-ish years, I'm guessing the AI was fairly rudimentary at the beginning compared to now, from a model standpoint. So I'm curious how that has impacted the product from a functionality standpoint.
>> 100%. That has had a huge impact. Our entire AI architecture has improved so, so much, because we started with very simple things, for example spotting keywords from a dataset of phrases that could signal potentially dangerous situations. Now we can combine the multimodal inputs we have, video and audio, and put AI agent reasoning behind them, and that has entirely expanded the scope of what the AI is responsible for in the product. It means we're ready for more complex situations: if you only had audio and were purely looking at that, you would be missing a lot of context about a dangerous situation, whereas an agent that sees both the audio and the video can draw the right conclusions. I feel like that has been the main evolution within the AI for Enough.
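To make the contrast Ina describes concrete, here is a minimal sketch in Python of the two generations of detection: simple keyword spotting versus handing both modalities to a reasoning model. Everything in it (the keyword list, the prompt wording, and the `reason` callable standing in for a model API) is an illustrative assumption, not Enough's actual code.

```python
# Illustrative sketch only: Enough's real pipeline is not public.

DANGER_KEYWORDS = {"help me", "let me go", "stop following"}  # hypothetical dataset

def keyword_alert(transcript: str) -> bool:
    """Early, rule-based approach: flag the audio transcript if it contains
    any keyword from a curated danger list. Fast and simple, but blind to
    context (tone, sarcasm, or what the camera actually sees)."""
    text = transcript.lower()
    return any(kw in text for kw in DANGER_KEYWORDS)

def multimodal_alert(audio_transcript: str, scene_description: str, reason) -> bool:
    """Reasoning-driven approach: give both modalities to a reasoning model.
    `reason` is a stand-in for any LLM/agent call that takes a prompt and
    returns a string; the prompt itself is purely illustrative."""
    prompt = (
        "You assess personal-safety risk for a wearable device.\n"
        f"Audio transcript: {audio_transcript!r}\n"
        f"Video scene: {scene_description!r}\n"
        "Reply YES if the wearer appears to be in danger, otherwise NO."
    )
    return reason(prompt).strip().upper().startswith("YES")

# Example: audio alone sounds harmless, but video supplies the missing context.
# keyword_alert("we're just going for a walk")                  -> False
# multimodal_alert("we're just going for a walk",
#                  "stranger gripping the wearer's arm at night",
#                  reason)                                       -> likely True
```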
>> Yeah, that's interesting to me. This is a follow-up question on the tech, just out of curiosity: it's got to be hard, with what's going on with reasoning models, to analyze the situation fast, or is it not? Reasoning models obviously take a longer period of time, so I'm curious how you're managing to do it in a timely manner.
>> We're working a lot on the edge AI aspect as well. We don't have everything on edge right now, but I would say about 20% is on edge, so that's something we're tapping into much more than before. Obviously that goes hand in hand with how the hardware develops, and we're quite reliant on the hardware being really good to handle edge AI. But in terms of the audio and video, timing has not been an issue so far. We're at the point where we're going to be doing a lot of testing in the upcoming months, and I feel like that's going to give us the feedback loop we're looking for, but ultimately the timing of the reasoning hasn't been something we've struggled with. It's more about optimizing how much we can move onto the edge, because that's much, much faster. For example, we have safety words you can use right now that are automatically identified by our edge AI. That's, I don't know by how much, but way faster, and in those moments when it really matters, that's when we need to be super quick.
>> Yeah. Edge AI is not a term I had really heard until you just said it, but is it like a local AI type of processing?
>> That would be having the AI live on the device. Right now we have an eSIM on the device, so we're communicating with the cloud, and that's where the AI sits, but we have a small amount that is on the edge, making decisions much faster. Because it's very heavy, we're only doing a tiny percentage of that. We have a person on our team who is very experienced in that particular angle of edge AI, so we're constantly optimizing and looking at what more we can do there to make it even faster.
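As a rough illustration of the edge/cloud split Ina describes, here is a minimal sketch where a lightweight on-device check handles the latency-critical safety-word path, and everything else is deferred to the cloud, where the heavier multimodal reasoning would run. The trigger word, the function names, and the queue-based handoff are all assumptions, not Enough's implementation.

```python
# Illustrative sketch of an edge/cloud split, not Enough's implementation.
import queue

# Hypothetical user-configured safety phrase, detected fully on-device.
SAFETY_WORDS = {"pineapple"}

# Chunks that don't need an instant reaction are deferred to the cloud,
# where the heavier multimodal reasoning model lives.
cloud_queue: queue.Queue = queue.Queue()

def on_device_check(transcript_chunk: str) -> bool:
    """Runs on the wearable itself ('on the edge'), so a safety word can
    trigger a response in milliseconds with no network round trip."""
    return bool(set(transcript_chunk.lower().split()) & SAFETY_WORDS)

def trigger_alert() -> None:
    """Stand-in for whatever immediate local response the device takes."""
    print("ALERT: safety word detected on-device")

def handle_chunk(transcript_chunk: str) -> None:
    if on_device_check(transcript_chunk):
        trigger_alert()                     # latency-critical local path
    else:
        cloud_queue.put(transcript_chunk)   # defer to cloud reasoning

handle_chunk("everything is fine")  # -> queued for cloud analysis
handle_chunk("pineapple")           # -> immediate on-device alert
```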