Episode 13 Feb 06, 2025 13:43 1.5K views

Why ChatGPT Operator Will Change AI Agents Forever

About This Episode

ChatGPT's new Operator feature is set to revolutionize the way we interact with AI agents by enabling them to perform tasks on the web through the same interface a human uses. Powered by a model that combines GPT-4o's vision capabilities with advanced reasoning trained through reinforcement learning, this tool interacts with a browser in real time: navigating, clicking, and typing to complete tasks.

Whether it's booking a campsite, handling grocery orders, or searching for travel deals, Operator combines intuitive usability with precision. It's not just a tool; it's like having a trusted digital assistant that adapts to your preferences and works collaboratively while keeping you in control.

With features like parallel task execution, custom workflow integration, and personalized preferences, Operator is designed to meet the needs of individuals and businesses alike. While it’s still in its early research phase, the potential is enormous—transforming everything from automated task batching to building entire workflows that were once reserved for manual intervention.
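As a rough illustration of the parallel task execution mentioned above, here is a minimal Python sketch that runs several stand-in tasks at the same time. Everything in it is an assumption made for illustration: `run_task` and the task names are hypothetical, and the code only simulates work with a timer rather than driving Operator or a browser.

```python
import asyncio
import time


async def run_task(name: str, seconds: float) -> str:
    # Hypothetical stand-in for one agent conversation. A real agent would
    # drive a remote browser here; this sketch only waits.
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def run_in_parallel(tasks: dict[str, float]) -> list[str]:
    # gather() schedules every task concurrently and returns the results
    # in the same order the tasks were passed in.
    return await asyncio.gather(*(run_task(n, s) for n, s in tasks.items()))


if __name__ == "__main__":
    jobs = {"book campsite": 0.3, "order groceries": 0.2, "find travel deals": 0.1}
    start = time.perf_counter()
    results = asyncio.run(run_in_parallel(jobs))
    elapsed = time.perf_counter() - start
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

Because the three waits overlap, the total time is close to the longest single task rather than the sum of all three, which is the same point the episode makes with its cooking analogy.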

What could you achieve with a tool like this? Share your thoughts, and don't forget to subscribe for updates as we explore this exciting development further!
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
⏰ TIMESTAMPS:
0:00 - Introduction
0:55 - Exploring The Practical Benefits Of Operator
2:26 - ChatGPT Announcement
8:35 - How Operator Handles Custom Preferences
10:10 - ChatGPT Pro's Capabilities
12:13 - Operator’s Limitations And Future Potential
13:14 - Closing Thoughts
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Sign up for free ➡️ https://link.jotform.com/MSOxSDOTcX
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Follow us on:
Twitter ➡️ https://x.com/aiagentspodcast
Instagram ➡️ https://www.instagram.com/aiagentspodcast
TikTok ➡️ https://www.tiktok.com/@aiagentspodcast
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

Transcript

...and add all ingredients to the grocery cart. I think I already have... Okay, so this is incredible, by the way. I just want to talk about this from a practical standpoint. I know the price point of $200 per month is pretty high. I'm curious whether they're going to release this on just the Pro or the business model, and when I say Pro, I mean the premium; just Plus, I guess, is their $20-a-month option. $200 a month for a personal task like this is probably too high, but as you can see, it's pretty convenient to be like, "Hey, I want this recipe, can you put those things in my Instacart?" Hi, my name is Demetri Panici, and I'm a content creator, agency owner, and AI enthusiast. You're listening to the AI Agents Podcast, brought to you by Jotform and featuring our very own CEO and founder, Aytekin Tank. This is the show where artificial intelligence meets innovation, productivity, and the tools shaping the future of work. Enjoy the show.

Oh my goodness, it's finally happening: we're going to get AI agents with ChatGPT. I remember I talked about this with Aytekin on an episode a month or so ago. The ChatGPT Pro announcement was a little bit underwhelming. The cost was really high, and the advanced nature of what they were giving us wasn't necessarily good. However, you're finally going to get something that I think is going to be much more worth the money than you can imagine. So here we have Operator. It's a research preview of an agent that can use its own browser to perform tasks for you. They have a preview on their website, as you can see

right here. Basically, you can ask it to do something for you, and it's going to go through its own browser and perform that actual task. For example, it's asking to find a family-friendly campsite at Joshua Tree, and what you'll see is that it'll follow the steps you're asking for, and then you can tell it to actually book something. So watch this: "Find me a family-friendly campsite at Joshua Tree this weekend." You can press Enter on that, and from there it'll actually go on the internet and attempt to do this for you. It's going to navigate to Hipcamp, enter Joshua Tree, all these types of things, and then it's going to ask you whether this exact one is a good option, and you in this case could say "book it," and you

can actually book that. So ChatGPT Operator is something that is going to perform, I think, pretty well. Let's show you exactly what this ChatGPT announcement is. Yes, my name is Shu No, and I'm a researcher on Operator at OpenAI. And what is Operator? Operator is a research preview of an agent that uses a browser to help you do things. I have a three-year-old kid who likes pasta, so I'll make linguine with clams and ask it to buy the groceries for it. I'll use the Instacart tab. Operator can basically use any website, and it is not particularly optimized for Instacart, but the reason for using this tab is that it provides detailed instructions on how this website can be best utilized, just like a tutorial that a human can benefit from. So I'll

use the Instacart tab and ask it to solve the task: "Could you find a recipe for linguine with clams from the Allrecipes website and add all ingredients to the grocery cart? I think I already have some ingredients, like butter, vegetable oil," (oh, even omitting some stuff) "and black pepper, so you don't need to add them to the cart." It says that it'll find the recipe and then add all the ingredients to the cart, and, okay, it says that it'll confirm the ingredients and store with me before adding them to the cart. So let's start. This is interesting: the agent is actually giving you a human-in-the-loop step. I remember we talked about this with the booking in the previous example, but this

is important. Finding the recipe. I'm not doing anything from now on; Operator is just doing it, and I'm just watching what it's doing. What is interesting about Operator is that it is using a browser that is built for humans, and it is seeing exactly the same screen that I'm seeing right now and using keyboard typing and mouse clicking to control the browser, just like a human would. This is different from other agents that use an API or a programming-based interface, which programmers might understand but non-programmer users cannot understand really well. So this is important to note: it's going to be accessible. Because Operator is using this natural human interface, it's very easy to follow by just looking at what it's doing on the screen. Can you follow its progress? Yes, so one way of following

its progress is that I can zoom in to see the screen better. And Operator is powered by text-based chain-of-thought reasoning, so whenever it's doing things, it makes plans for how things can be done, and this can be followed through this list of to-dos. And it says, "I found a recipe. Which store would you prefer to use?" So I'll ask it to use Gus's. So often it asks clarifying questions whenever they are needed in the process of solving the task. There are cases where Operator has to take sensitive actions, like logging in or buying things. We built Operator to be safe in this situation, so Operator is designed to... Okay, this is very good: the fact that they've made it so that it will not do things unless you give

it confirmation for more sensitive things. I appreciate that. So it's giving you a step-by-step breakdown, and it's giving you an explanation of what's happening. These are all extremely important. ...ask us to take control, to log in by ourselves, or whenever checking is needed. Oh, okay, so we can take over during the process too. That's huge. So it's almost like a mini browser within your ChatGPT. ...double-check whether the list is correct, and then check out by myself. Amazing, thank you so much. I appreciate you showing it to us. This is incredible. I think this is the future of the year. Operator is going to be one of many, and there are going to be so many more things to talk about, so let's break down some of the ways that this actually

works. Essentially, it says it's powered by a new model called Computer-Using Agent (CUA). Combining GPT-4o's vision capabilities with advanced reasoning through reinforcement learning, CUA is trained to interact with graphical user interfaces: the buttons, menus, and text fields people see on a screen. Operator can "see" (through screenshots) and "interact" (using all the actions a mouse and keyboard allow) with a browser, enabling it to take action on the web without requiring custom API integrations. If it encounters challenges or makes mistakes, Operator can leverage its reasoning capabilities to self-correct. When it gets stuck and needs assistance, it simply hands control back to the user, ensuring a smooth and collaborative experience. So you're going to have to be on your computer and interact with it during the steps, which I do find interesting, but it probably will take less time in

a way if you want to step away while it's finding the groceries or whatnot and then come back. It also says here that to get started, you can simply describe the task and it'll handle the rest. Users can choose to take control of the remote browser at any point. Users can personalize their workflows in Operator by adding custom instructions, either for all sites or for specific ones, such as setting preferences for airlines on Booking.com. Okay, interesting. Operator lets users save prompts for quick access on the homepage, ideal for repeated tasks like restocking groceries on Instacart. Similar to using multiple tabs in a browser, users can have Operator run multiple tasks simultaneously by creating new conversations, like ordering a personalized enamel mug on Etsy while booking a campsite on Hipcamp. So let's check this video out as well. ...for New York City

for October 1st. To get started, you're going to click on your account, and you'll see this Websites tab on the side. For me, I really like to use Priceline to book my trips, so I'm going to add some custom instructions for Priceline. I like to keep my vacation schedules really flexible, so I like fully refundable rates, and I also like to make sure I have one meal planned in advance, so I like hotels that offer free breakfast. So I'm going to type, "Always look for..." It's almost like your mini GPTs and agents within each. Now every time I use Priceline, the model will have access to these instructions, so I won't have to remind it. If my preferences were always factored in by an agent, I would definitely be comfortable with using it

for this. I'm not usually comfortable with this kind of stuff, but I appreciate that. Also, people, we know they're using Screen Studio because of that. The best. So I'm going to go ahead. I'm trying to plan some trips in advance for the year, so I'm going to start by typing Priceline. I'm trying to plan a trip to New York right now, so I'm going to say, "Look at hotel options for New York City for October 1st to October 7th. I have no preference for bed size." I've fired it off, and now it's going to start searching and keep my preferences in mind. And then at this point, are you doing anything, or is Operator hands-off? I'm not doing anything right now; Operator is doing everything by itself. So I'll actually come back once it's found all

the details, and it'll ask me to confirm that the details look correct. Then either I can take over and check out, or I can ask the model to check out, and before it clicks a button, it'll ask me to confirm that too. Well, great. Yeah, thank you. Awesome. I think that we're looking at something that is going to be absolutely outstanding. Truth be told, I like where this is headed. And for me, as somebody who's a tutorial person, I think this is going to be really interesting, because allegedly ChatGPT Pro is going to have a lot more capabilities as well. It's also going to give you extended access to Sora video generation, and access to the research preview of Operator, obviously, but o1 pro mode basically is

a higher reasoning level. You're going to get higher limits for video and screen sharing in advanced voice, and I'm just curious to see how I use this. Right here, it expands the remote session, allowing me to enter my own details. Pretty important. So I'm going to... it clicks on the item. I mean, for the most part, this is one of those things where, you know, there's been RPA, robotic process automation, where you can predefine a set of parameters for it to follow, and it'll click and click and click in the way that you told it to. But this is next level. This is just it being a mini person. And the fact that we found out it can do multiple tasks at once means you can get all your chores done, if you need

to do them virtually, in such a quick way. For example, you can book a trip, get groceries, and plan ahead on getting new swim trunks or something like that for a vacation, in a much shorter period of time. You're not doing those one at a time, you're not doing them sequentially; you're doing them in parallel. And as anyone who knows task batching will tell you, if you're doing things in parallel, you're getting more work done. For example, if you're working at your desk and you're also getting the laundry done by running it in the background, that's two things at once. It's very common practice in cooking, right? You have one thing cooking on the stove and one thing in the oven, and those things are going, and you're not just sitting there waiting; you're also cutting something,

you're also boiling the water for the pasta. This same concept can be utilized in this new Operator model. So if you're willing to spend the 200 bucks to try it out, I mean, why not? If you're somebody who thinks they can benefit from this, I don't see why not, and I look forward to seeing its capabilities moving forward. Some of the limitations, though: it's an early research preview, and while it's already capable of handling a wide range of tasks, it's still learning, evolving, and making mistakes. For instance, it currently encounters challenges with complex interfaces like creating slideshows or managing calendars. Early user feedback will play a vital role in enhancing its accuracy, reliability, and safety. What's next is that you're going to get CUA in the API, so they're planning to expose the model powering

Operator, CUA, in the API so that developers can build their own computer-using agents. This is going to be incredible. Imagine having a recurring task that tells it to go scour the internet for certain things. That's amazing. They're going to expand Operator to Plus, Team, and Enterprise users and integrate its capabilities directly into ChatGPT in the future, once they are more confident in its safety and usability at scale, unlocking seamless real-time and asynchronous task execution. So for me, in my company, I'm definitely going to be using this. Having what is basically, from my perspective, a person that can do stuff like this is incredible. If there is a limit to an API, you can train this AI to do the tasks in the interface of the product, so then it kind of serves as its own API bridge. That's incredible.
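The loop described in this episode (look at the screen, pick a mouse or keyboard action, ask before sensitive steps, hand control back when needed) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not OpenAI's implementation or the CUA API: `Action`, `run_agent`, `scripted_policy`, and the `SENSITIVE` set are all hypothetical names, and the "model" is a scripted list of actions rather than a real vision model.

```python
from dataclasses import dataclass
from typing import Callable

# Action kinds treated as sensitive in this sketch. This set is an assumption;
# the episode only names logging in and buying things as examples.
SENSITIVE = {"login", "purchase"}


@dataclass
class Action:
    kind: str          # e.g. "click", "type", "login", "purchase", "done"
    target: str = ""   # what the action applies to


def run_agent(policy: Callable[[str], Action],
              confirm: Callable[[Action], bool],
              max_steps: int = 20) -> list[str]:
    """Observe-act loop: show the policy a screenshot, then apply its action."""
    log: list[str] = []
    screenshot = "start"  # placeholder for a real screenshot
    for _ in range(max_steps):
        action = policy(screenshot)
        if action.kind == "done":
            log.append("done")
            break
        # Human-in-the-loop gate: sensitive actions need explicit approval,
        # otherwise the agent stops and hands control back to the user.
        if action.kind in SENSITIVE and not confirm(action):
            log.append(f"handed control to user before {action.kind}")
            break
        log.append(f"{action.kind} {action.target}".strip())
        screenshot = f"after {action.kind}"  # pretend the page changed
    return log


def scripted_policy(actions: list[Action]) -> Callable[[str], Action]:
    # Hypothetical stand-in for the model: replays a fixed list of actions
    # and reports "done" once the list is exhausted.
    it = iter(actions)
    return lambda _screenshot: next(it, Action("done"))


if __name__ == "__main__":
    steps = [Action("click", "search box"),
             Action("type", "linguine with clams"),
             Action("purchase", "cart")]
    print(run_agent(scripted_policy(steps), confirm=lambda a: False))
```

Passing a `confirm` callback that returns `True` lets the purchase step go through, while one that returns `False` stops the run before checkout, mirroring the confirmation and takeover behavior shown in the demo.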

So excited for this to go on the Team plan so I can try it out. If anyone's interested in commenting on this, please do leave a comment down below and leave us a review. What are your thoughts on Operator, and what are your thoughts on this podcast? Thank you so much for watching. We'll see you in the next one.