Patrick O'Shaughnessy

InferenceJS: Real-time computer vision in your browser

Build AI that runs where your users are—directly in the browser. Max Schridde, Full Stack Engineer at Roboflow, explores dataset curation, training, and deploying with InferenceJS, then showcases a live scavenger hunt game. Scan a QR code, play along, and compete for Roboflow swag while experiencing computer vision on the web.

Published: Nov 21, 2025
Uploaded: Jun 13, 2026
File type: YouTube

Full transcript

AI-generated transcript with timestamped sections.

0:06-1:37

[00:06] My name is Max Schridde, and I am a full-stack software engineer at Roboflow. And today, we're going to be diving into Roboflow InferenceJS. So how can we get a real-time computer vision application [00:18] and model running in the browser? [00:22] So, a quick look at the agenda. We're going to do a very brief computer vision overview. [00:26] I know there's probably a lot of you that have experience with it, but we're just [00:30] kind of throwing out some terms that we're going to be using throughout the presentation. [00:33] We'll explore what Roboflow is, why it came to be, [00:37] and what the various products and ecosystem are that we've created for this space. [00:42] And then finally, we'll dive into Roboflow InferenceJS and jump into some demos. [00:46] And at the end, if we have time, I'm hoping to actually get you guys involved [00:50] with an interactive demo. [00:53] So at Roboflow, we like to think of computer vision as giving software the sense of sight. [00:59] But it doesn't just stop there, right? We're teaching AI to look out for us. [01:05] There are many different applications for computer vision. [01:09] These are just a few that are listed that we have a lot of clients and customers engaging with, but the list is endless. [01:16] You know, it can be used for defect detection and inventory tracking. [01:20] In the healthcare and medicine space, where CV might actually save the most lives, [01:24] we have a lot of doctors and people in the field talking about how computer vision is almost like a second set of eyes that never blinks, right? So we're using it for medical imaging analysis [01:34] as well as guiding procedures and training.

1:37-3:11

[01:37] And we also have some more exotic, kind of fun use cases. So Roboflow powers [01:42] Wimbledon Instant Replay. [01:44] Scientists are using it for exoplanet discovery and much more. [01:48] A lot of different use cases here [01:51] and some exciting stuff. [01:54] Jumping into computer vision, [01:56] an overview 101, very briefly. This is an iterative process, right? So you have data that you're collecting. [02:03] You're going to label and annotate that data, and then you're going to jump into the learning phase, where you're training it. [02:09] And after you train, you move into the test and evaluation phase, and this is where you repeat, right? You're going to often find that there's room for improvement, places where you can add better dataset [02:20] data points or improve your labeling. And once you finally get to the place where you're ready to deploy your model into production, you are able to do that in many different ways on Roboflow and run inference. [02:33] Particularly today, we're going to focus on running inference in the browser. [02:38] Thank you. [02:39] So where did Roboflow come from? This is a quote from our CTO. [02:43] But back in 2019, 2020, [02:46] our CEO and CTO kind of came to this realization that there was not a lack of inspiration in the computer vision space. [02:52] There was just a major gap when it came to tooling and infrastructure in order to bring these computer vision ideas to life. [02:58] And so Roboflow was formed. And we like to say, as our CTO said here, it's our mission to remove any barriers that might prevent [03:06] any of these developers, people that have computer vision ideas, from succeeding.

3:13-4:54

[03:13] So here you'll see a couple of different examples of computer vision [03:17] in action out in the wild. [03:19] There are different task types occurring here: detection, instance segmentation. Some of the state-of-the-art models that we at Roboflow are pushing out, our RF-DETR models, are on display here. And then you'll also see down at the bottom examples of Workflows. So we'll dive into Workflows a little bit more later. But Workflows is what allows you to not stop at training computer vision models, [03:45] but kind of create a composite pipeline where you have multiple different models. [03:48] You're able to expand with logic, custom logic, pre-built blocks that we have, [03:54] the ability to integrate with third-party APIs and applications, and kind of run more analysis, right? So we have tracking here. [04:02] We have street analysis, counting problems, [04:05] and much more. [04:10] Just a couple of high-level KPIs here. I really just want to point out that we're developer-friendly and production-ready, and we really want to emphasize that we are trying to get you as fast as possible from [04:23] development into production. One of the cool stats here: we have more than half of the Fortune 100 building with Roboflow, [04:29] and many other engineers and developers. [04:34] This is kind of a high-level snapshot of some of the products that we offer. [04:38] And we'll dive into each of these a little bit more in the following slides. There are many other products that we have and tools that we offer. We'll dive into one in one of the demos, an exciting new research preview that we have. But at a high level, again, when you think about kind of the computer vision process,

4:54-6:28

[04:54] these are a lot of the tools that we have in order to aid in getting those models to production. [05:00] So, first I want to talk about Universe. Universe is the world's largest collection of open source computer vision datasets and pre-trained models. [05:09] So this is a great entry point into the computer vision space and the Roboflow ecosystem. [05:14] It's very easy. You can use these datasets and models and push them out to production immediately and start solving real customer problems. [05:21] They can also be used to help you in your AI-assisted annotation and labeling as you move on to build more custom fine-tuned models. [05:30] I would also say that these can be used as a training checkpoint. So you can actually start [05:38] not from base, but from a Universe model, and then continue to fine-tune from there. [05:46] Hopping into the annotation phase, we have a full suite of AI-assisted annotation tools [05:52] that really help augment the human labeling process [05:55] and even hopefully allow you to fully automate your data labeling pipeline. [05:59] So these are some of the examples. [06:01] You're able to use custom models that you train or that you pull from Universe for label assist, and [06:07] you can use some of your favorite foundation models [06:09] for auto labeling. And then we also include in this flow a bunch of team workflows. You're able to kind of control and manage [06:17] who's doing the data labeling, [06:19] who's reviewing, who's approving. You're able to also incorporate a bunch of dataset pre-processing and augmentation and continue to grow that dataset

6:28-8:09

[06:28] before you head into training, so that you're making sure that you're building a generalizable model and not something that's just memorizing the data that you have. [06:37] Once you get into the training process, we have a full hosted model training infrastructure. It comes with a bunch of GPUs that are ready for you to spin up and train on. [06:46] You can see that we can train on many different task types, many different model sizes. Again, what's really important here is that this is just an iterative process, right? So you start small, and then you train. You continue to run all of the model evaluation, digging into the confusion matrices, [07:02] digging into vector analysis to identify where those gaps are, not only in the data that you have, but also if there was mislabeling, and you kind of run this iterative process until you get to those KPIs, maybe it's mAP scores [07:14] or other things, and you feel ready to kind of deploy that into production. [07:20] Before we get to the actual deployment part, this is an example of Workflows. So I briefly mentioned this before, but it's a low-code [07:29] interface to help you build pipelines and applications. [07:31] We realize that a lot of the computer vision problems out there don't just end with a computer vision model, right? You need to incorporate different pieces of logic. You want to send notifications, upload data, continue to grow your dataset so that you can [07:45] maybe actively learn on the data you're collecting in production. And so with Workflows, we enable that ability. [07:51] And you're able to also build workflows that have composite models, right? So a kind of common flow is that you use an object detection model to find objects in some video stream, and then you're able to crop and extract, run it through a classification, run some extra custom logic on there,

8:09-9:45

[08:09] maybe ping some of your external API integrations. And so all of that is powered through our Workflows engine. [08:18] And then finally, the deploy phase. So our goal is to get you from that development to the deployment phase in minutes. [08:25] There's a new, I mentioned it earlier, research preview called Roboflow Rapid that we will be demoing here [08:31] today, but really we want to get rid of all that cruft in the middle and just get your model out into the world so you can start solving real customer problems [08:40] as soon as possible. We have various different deployment options. So by default, when you train a model, [08:47] you get an endpoint in the Roboflow cloud that you can start hitting immediately. [08:51] You're also able to run your own dedicated instances in the cloud, [08:56] and there's support for on-device at the edge, and, importantly for today, in the browser via InferenceJS. [09:07] So InferenceJS, what is it? This is a coworker of mine just giving an example. We're pulling in Microsoft COCO, the pre-trained model from Universe, [09:17] kind of a common, well-known dataset. And InferenceJS is just a custom layer that's running on top of TensorFlow.js, [09:26] and again, just enabling real-time inference [09:29] in the browser. So we all know kind of what the benefits of this are, but we really want to promote that magical user experience: it's fast, it's in the browser, it's cheap, and it also gives the opportunity for privacy with the user if they're concerned about their data.
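
Editor's sketch: the "detect, then crop, then classify" flow described above can be pictured as a small composition of functions. This is only an illustration of the idea, not the Workflows engine or any Roboflow API. Every name here (`Box`, `Detection`, `runCompositePipeline`, the stage types, and the 0.5 default threshold) is hypothetical, and the stages are passed in as plain functions so the sketch stays self-contained.

```typescript
// Hypothetical types for illustration; none of these names come from Roboflow's APIs.
interface Box { x: number; y: number; width: number; height: number; }
interface Detection { label: string; confidence: number; box: Box; }
interface PipelineResult { detection: Detection; category: string; }

// Each stage is injected, so a real detector, cropper, or classifier could be swapped in.
type Detect<F> = (frame: F) => Detection[];
type Crop<F> = (frame: F, box: Box) => F;
type Classify<F> = (crop: F) => string;

// Run detection, drop low-confidence boxes, then crop and classify each survivor.
function runCompositePipeline<F>(
  frame: F,
  detect: Detect<F>,
  crop: Crop<F>,
  classify: Classify<F>,
  minConfidence = 0.5, // assumed default, not a documented Roboflow value
): PipelineResult[] {
  return detect(frame)
    .filter((d) => d.confidence >= minConfidence)
    .map((d) => ({ detection: d, category: classify(crop(frame, d.box)) }));
}
```

Extra custom logic, such as sending a notification or calling an external API, would slot in as further steps applied to the returned results.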

9:46-11:19

[09:46] Okay. [09:47] Simple code snippet, not going to spend much time here. But you can see, you pretty much just plug in your model ID, the model version you're training with, [09:56] and your publishable API key. [09:58] And then you just get ripping and you're able to run inference in the browser. [10:03] OK, so this is a coding demo. We're doing a little vibe coding here. I was a little lazy. But I'm using Lovable, [10:11] pretty much giving it some feedback, or in the prompt, telling it to build a simple computer vision app. There's some complexity here around integration with InferenceJS. In particular, we're working to improve the documentation there so that wouldn't be needed. [10:24] But you can see Lovable is spinning up a quick computer vision application that's going to be connected to my webcam. [10:33] Again, I'm going to pull in Microsoft COCO here from Universe. [10:37] Start detection. [10:40] Once the model gets loaded, you can see we're detecting me in the stream. You'll also notice that we're not able to successfully detect the iPhone. And a lot of this has to do with [10:52] the Microsoft COCO dataset and when the data was collected and trained. [10:55] So now we're going to hop over to Roboflow Rapid. [10:59] So this is the new research preview that we're getting ready to release over at Roboflow. [11:03] This is a video-first flow. [11:06] And one of the two main value props is that it [11:10] gets you from data upload to deployed model in under five minutes. [11:15] We like to think of Roboflow Rapid as computer vision 2.0.
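
Editor's sketch: the slide's snippet is not captured in the transcript, so the following is a rough reconstruction of the pattern it describes (model ID, version, and publishable key go in, predictions come out). The `EngineLike` interface is a hypothetical stand-in. Its `startWorker` and `infer` method names follow what the `inferencejs` package's `InferenceEngine` is understood to expose, but treat the exact signatures, the prediction fields, and the 0.5 threshold as assumptions to check against the official documentation.

```typescript
// Assumed shape of a prediction; only the fields used below are listed.
interface Prediction { class: string; confidence: number; }

// Hypothetical stand-in for the inference engine, so this sketch runs without the real package.
interface EngineLike {
  startWorker(modelId: string, version: number, publishableKey: string): Promise<string>;
  infer(workerId: string, frame: unknown): Promise<Prediction[]>;
}

interface ModelConfig { modelId: string; version: number; publishableKey: string; }

// Load the model once, then return a function that runs inference on each frame
// and keeps only predictions at or above the confidence threshold.
async function createDetector(
  engine: EngineLike,
  config: ModelConfig,
  minConfidence = 0.5, // assumed default
): Promise<(frame: unknown) => Promise<Prediction[]>> {
  const workerId = await engine.startWorker(
    config.modelId,
    config.version,
    config.publishableKey,
  );
  return async (frame: unknown) => {
    const predictions = await engine.infer(workerId, frame);
    return predictions.filter((p) => p.confidence >= minConfidence);
  };
}
```

In a webcam app, the returned function would typically be called once per animation frame with the current video frame, and the surviving predictions drawn onto a canvas overlay.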

11:19-12:49

[11:19] So this is AI steering model development, human assisted. [11:24] What that means is we have a lot of the AI label-assist tools: text prompting, zero- to few-shot object detection, box prompting. And you can see I only have to collect [11:37] minimal amounts of data points. And once I approve this, we're going to build the model, and in real time, [11:44] actually test the model that you just built on the data that you provided. [11:48] So I sped this up a little bit, but this, you'll have to trust me, took around 30 seconds for it to run on the video. [11:54] And now you can see I have a model that's correctly identifying me as a person and the phone. [11:59] I acknowledge that this is very simple, and it doesn't cover all cases, but for purposes of the demo. [12:05] I'm going to head back over to the application, [12:08] grab my new model ID. [12:11] Thank you. [12:14] And it's version one. [12:17] It takes me a while to copy the API keys here, too, so I apologize. [12:22] I could build a model faster than copying the API key. [12:30] And yeah, once the model loads, you'll see [12:33] I am now able to correctly identify the phone. One thing you'll notice is at an angle, it kind of misses it. And so this would be an example where, through model eval, I would recognize, OK, I need to go back, provide [12:45] a couple more data points of the side angle of the phone, [12:48] and retrain.
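
Editor's sketch: one simple way to make "it kind of misses it at an angle" measurable during model evaluation is to compute recall on a labeled subset (for example, side-angle frames) using intersection over union (IoU) to decide whether a predicted box matches a ground-truth box. This is a generic technique, not Roboflow's evaluation code. The names `CornerBox`, `iou`, and `recallAtIou`, the greedy matching, and the 0.5 threshold are all illustrative choices.

```typescript
// Box given by its top-left (x1, y1) and bottom-right (x2, y2) corners.
interface CornerBox { x1: number; y1: number; x2: number; y2: number; }

const area = (b: CornerBox): number =>
  Math.max(0, b.x2 - b.x1) * Math.max(0, b.y2 - b.y1);

// Intersection over union of two boxes, in the range [0, 1].
function iou(a: CornerBox, b: CornerBox): number {
  const w = Math.max(0, Math.min(a.x2, b.x2) - Math.max(a.x1, b.x1));
  const h = Math.max(0, Math.min(a.y2, b.y2) - Math.max(a.y1, b.y1));
  const intersection = w * h;
  const union = area(a) + area(b) - intersection;
  return union > 0 ? intersection / union : 0;
}

// Fraction of ground-truth boxes matched by some prediction with IoU >= threshold.
// Matching is greedy and one-to-one: each prediction can match at most one ground truth.
function recallAtIou(
  groundTruth: CornerBox[],
  predicted: CornerBox[],
  threshold = 0.5,
): number {
  if (groundTruth.length === 0) return 1; // nothing to find, so nothing was missed
  const used = new Set<number>();
  let matched = 0;
  for (const gt of groundTruth) {
    let bestIndex = -1;
    let bestIou = threshold;
    predicted.forEach((p, i) => {
      const value = iou(gt, p);
      if (!used.has(i) && value >= bestIou) {
        bestIndex = i;
        bestIou = value;
      }
    });
    if (bestIndex >= 0) {
      used.add(bestIndex);
      matched += 1;
    }
  }
  return matched / groundTruth.length;
}
```

Comparing this number for front-facing frames against side-angle frames would show which subset needs more labeled data before retraining.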

12:51-14:23

[12:51] So, OK, so I've got two minutes left here, so we're going to try to rip this really quickly. [12:56] This is a scavenger hunt interactive demo. So scan the QR code, [13:00] and we are using the Microsoft COCO [13:03] model to power object detection on device. [13:07] I tried to fine-tune the classes, so it's just going to be stuff that is around you in the room. [13:13] You're going to be prompted to log in and then give a yes or no on whether you're okay with the application capturing your images. That's strictly for leaderboard visualization. We're not going to do anything with that data. [13:26] But you can also say no. [13:29] Um... [13:30] Yeah, so I'm going to give you a minute to play around with this. [13:33] And [13:35] at the end, come find me outside. We're going to see who was engaged, and I've got some swag to hand out for it. [13:45] I tried this with my roommates a couple of days ago and I thought, [13:48] this would be a fun adult game. You're at a party. [13:52] Ah, Wi-Fi is slow. OK, well, I hope you guys have a chance to try it out outside later. [13:58] Um... [14:00] I would say next steps, call to action here. So this slide was actually made a couple of weeks ago, and so it's outdated. We have RF-DETR, a state-of-the-art [14:08] uh, [14:09] model for segmentation, being released. We actually just released that, so we're very excited about it. [14:14] The last thing is we want to be able to get Workflows running in the browser as well. [14:18] So right now, the models can be run in the browser. If you want to integrate with Workflows,

14:23-14:36

[14:23] you still have to connect with the cloud, but our team is working actively to try to bring this into the browser as well. [14:32] All right, groovy. Thank you, everyone. It was a pleasure being up here, and I hope to talk to you outside.