Patrick O'Shaughnessy

LiteRT.js, Google’s high-performance WebAI runtime

Harness the power of LiteRT on the web! This talk introduces LiteRT.js, Google's new WebAI runtime that runs your custom .tflite models with WebGPU. Matthew Soulanille, ODML Software Engineer, and Chintan Parikh, Product Manager, cover features of the library, its best-in-class performance, how it complements Google's existing Web AI ecosystem, and what's coming next.

Published Nov 21, 2025
Uploaded Jun 13, 2026
File type: YouTube

Full transcript

AI-generated transcript with timestamped sections.

0:04-1:38

[00:04] Hi, good morning, good afternoon, everyone. [00:06] We are here to talk to you about LiteRT.js, Google's high-performance WebAI runtime. [00:12] My name is Chintan Parikh. I'm a product manager. [00:16] And my name is Matthew Soulanille. I'm a software engineer. [00:19] All right, with that, let's dive in. [00:21] Thank you. [00:23] As we know, a lot of the open-source ML [00:25] ecosystem generally prefers PyTorch. [00:29] It has become the more prominent way of running all of these models on device. [00:34] But the journey to do this isn't always straightforward. [00:37] There are many steps [00:39] involved. [00:42] Even where other frameworks like ONNX have tried to simplify this, it's still not that straightforward. [00:48] It's also [00:49] critical to remind [00:52] everyone here that the web is one of Google's core platforms, and driving ML innovation on the web is critical for the health of the web ecosystem. [01:02] So to that end, we would like to introduce LiteRT.js today. [01:06] This is Google's new runtime for the web. [01:11] LiteRT.js, with JS standing for JavaScript, [01:15] has many great features. One of them is being able to run models on [01:20] several of the popular browsers. [01:22] We've also expanded support for WebGPU; [01:26] WebGPU is going to enable improved performance [01:30] on GPU accelerators in the browser. And finally, what's even more interesting is that this is a model format shared with Android.

1:39-3:17

[01:39] So you can think of this as an effort to offer a cross-platform solution. The models that now run on mobile in the TFLite format are also something you can scale over to the web. [01:52] Now, it's critical to remember that LiteRT.js [01:56] is built on top of LiteRT. [01:59] LiteRT is our runtime that provides tools for model conversion and optimization, enabling efficient deployment [02:09] on edge devices. We've scaled to offer model coverage from TensorFlow, PyTorch, Keras, and JAX. And in terms of efficient deployment, we are also expanding GPU and CPU support, with a lot more to come. [02:25] So what we're going to focus on for much of the talk today [02:30] is the challenge we've described and how we can use LiteRT.js [02:35] to simplify [02:37] running ML on the web. [02:40] I'm going to hand it off to Matthew [02:42] to tell you more about how this works. Thanks, Chintan. [02:45] To start off, let's go over a bit of the architecture of LiteRT.js. [02:50] Starting from your normal web application, written in JavaScript or TypeScript, or really any language that can call into JavaScript, [02:59] we've got LiteRT.js [03:01] as essentially just an npm package that you import like any normal JavaScript library. [03:07] This layer, still written in JavaScript, calls into the LiteRT library, which itself is written in C++ but compiled to WebAssembly to run on the web.

3:17-4:59

[03:17] LiteRT itself has a few different accelerators. We've got multithreaded CPU acceleration via XNNPACK, [03:24] Google's high-performance neural network operator library. [03:26] We've also got WebGPU acceleration via ML Drift, [03:31] and both of these call into the standard browser [03:34] backends you'd expect. We're also working on adding a new NPU accelerator in the future. To talk a bit more about the GPU accelerator: this is actually the same WebGPU accelerator that MediaPipe uses, which you saw earlier today in Jason's presentation. It really lets us get high-performance real-time pipelines [04:00] running in LiteRT.js. [04:03] So, [04:05] I promised PyTorch conversion. Let's talk a bit about how that works. [04:09] It's [04:11] a multi-step process, but this first step is the most important, because this is [04:18] where we actually convert the model. [04:21] We'll use Google's AI Edge Torch package, which is part of our suite of tools for converting from PyTorch and working with LiteRT. [04:32] To do this, [04:33] we'll start with the Depth Anything V2 model that's hosted on Hugging Face. [04:40] We download it using their API for pulling models from the platform. [04:46] We wrap it in a small wrapper, just to control what the inputs and outputs look like. Here, we're assigning the pixel values as the input, and then we're going to pull only the predicted depth output of the model,

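The wrapper described above exists to pin the exported graph to exactly one input and one output. A minimal, dependency-free sketch of the idea follows. In the talk this is a PyTorch module; here a plain callable stands in so the sketch runs without PyTorch, and the names `DepthOnlyWrapper` and `fake_depth_model` are hypothetical. It assumes the wrapped model accepts a `pixel_values` keyword and returns a mapping with a `predicted_depth` entry, as Hugging Face depth-estimation models do.

```python
from typing import Any, Callable, Mapping


class DepthOnlyWrapper:
    """Hypothetical stand-in for the conversion wrapper.

    In PyTorch this would be a torch.nn.Module whose forward() does the same
    thing; a plain callable is used so the sketch runs without PyTorch.
    """

    def __init__(self, model: Callable[..., Mapping[str, Any]]) -> None:
        self.model = model

    def __call__(self, pixel_values: Any) -> Any:
        # Single input: the preprocessed image tensor, passed by keyword.
        outputs = self.model(pixel_values=pixel_values)
        # Single output: drop everything except the depth map.
        return outputs["predicted_depth"]


def fake_depth_model(pixel_values: Any) -> Mapping[str, Any]:
    # Hypothetical model that, like many Hugging Face models, returns several outputs.
    return {"predicted_depth": [[0.25, 0.75]], "hidden_states": ("unused",)}


wrapped = DepthOnlyWrapper(fake_depth_model)
print(wrapped([[0.0, 1.0]]))  # [[0.25, 0.75]]
```

Keeping only `predicted_depth` means the traced graph, and therefore the exported .tflite file, exposes just the tensor the web app will read.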
4:59-6:30

[04:59] since we don't really care about any of the other outputs. [05:02] Once we have this small wrapper, [05:04] we can pass it to the AI Edge Torch converter. If you've ever converted ONNX models, this should look pretty similar: we create a random sample input that we pass to the model while tracing it, [05:16] and then we can save the result as a .tflite file. [05:19] Now, this .tflite file, which you can visualize in Model Explorer if you want, [05:24] is kind of big. It's almost 100 megabytes, which is more than you'd want to serve a user on a web page. [05:32] But we can make it smaller using the next tool in this process, [05:36] the AI Edge Quantizer. [05:38] Quantization is a form of model compression where you take a large model and make it a lot smaller, [05:45] which makes it much easier to load on [05:48] small [05:50] or more constrained platforms like the web. It also helps with memory usage. To do that, [05:58] we [05:59] import AI Edge Quantizer and use it to load the Depth Anything model we previously saved as a .tflite file. [06:05] We can choose a quantization recipe. In this case, we'll use the dynamic recipe with int8 weights [06:11] and float32 activations, [06:14] although the AI Edge Quantizer comes with a lot of different recipes to choose from, [06:18] and you can make your own if you want one that perfectly fits your model. [06:22] After that, we export the model to a new .tflite file. [06:26] And we see it's a lot smaller now, at only about 27 megabytes.
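To see why the file shrinks by roughly 4x, here is a minimal sketch of the weight half of a dynamic-range recipe: each float32 weight tensor is stored as int8 codes plus a float scale, while activations stay float32. It uses symmetric per-tensor quantization purely as an illustration; it is not the AI Edge Quantizer's implementation, whose recipes may use other granularities (for example per-channel scales). The helper names are hypothetical.

```python
def quantize_int8_symmetric(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: each w is approximated by code * scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range [-127, 127].
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    # What a kernel effectively reconstructs before computing in float32.
    return [c * scale for c in codes]


def estimated_size_mb(float32_mb: float, quantized_fraction: float = 1.0) -> float:
    """Size after storing `quantized_fraction` of the float32 bytes as int8 (1 byte instead of 4)."""
    return float32_mb * (quantized_fraction / 4 + (1 - quantized_fraction))


weights = [0.02, -0.5, 0.31, 1.27, -1.0]
codes, scale = quantize_int8_symmetric(weights)
print(codes)                     # [2, -50, 31, 127, -100]
print(estimated_size_mb(100.0))  # 25.0
```

The sketch's 25 MB for a fully quantized 100 MB model is in the same range as the roughly 27 MB reported in the talk; the gap is plausibly tensors left in float32 plus quantization metadata, though the talk does not break it down.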

6:30-8:02

[06:30] That's much more reasonable to serve on the web. [06:32] With that done, we can now test the model. [06:37] For this, we'll use the LiteRT.js model tester. [06:41] If you run this npx command, you can run it locally on your computer right now. [06:46] We use it [06:48] by uploading the model to the local web page it serves. [06:52] And we can see that [06:55] it runs on CPU in about 550 milliseconds, but GPU actually failed. [07:00] So this is a very good way to make sure that what you've [07:05] converted actually works. [07:07] For our purposes, at least for this demo, the CPU performance is good enough. [07:13] Once we've verified that the model works well enough for our purposes, [07:17] we can take the final step of actually running it with LiteRT.js on the web. This package, LiteRT.js Core, is the main entry point for running models in LiteRT.js. [07:29] We grab the model we previously converted and quantized, [07:34] load it with LiteRT.js, and specify WASM as the accelerator in this case, since WebGPU failed. [07:40] Then we [07:42] create an input tensor for it, which you would fill from something like a webcam frame, and run the model. We get our depth tensor output, as we [07:53] arranged in the model setup when we converted it. This matches [07:58] the Python wrapper class we created earlier.
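Creating the input tensor usually means converting a camera frame or canvas into the layout the converted model expects. A minimal sketch follows, assuming the frame arrives as row-major RGBA8 bytes (the layout of a canvas `ImageData` buffer) and that the model takes a float32 `[1, 3, H, W]` tensor normalized with ImageNet mean and standard deviation. Those normalization constants are an assumption; check the preprocessing config of the model you converted. `rgba_to_nchw_float32` is a hypothetical helper; in the browser the same loop would write into a `Float32Array`.

```python
from array import array

# Assumed ImageNet-style normalization; verify against your model's preprocessing.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def rgba_to_nchw_float32(rgba: bytes, width: int, height: int,
                         mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Convert row-major RGBA8 pixels to a normalized float32 buffer in [1, 3, H, W] order."""
    if len(rgba) != width * height * 4:
        raise ValueError("expected width * height * 4 bytes of RGBA data")
    plane = width * height
    out = array("f", [0.0]) * (3 * plane)  # zero-filled float32 buffer
    for i in range(plane):
        for c in range(3):  # R, G, B; the alpha byte at offset 3 is dropped
            value = rgba[4 * i + c] / 255.0
            # Planar layout: all R values, then all G values, then all B values.
            out[c * plane + i] = (value - mean[c]) / std[c]
    return out, (1, 3, height, width)


# A 2x1 image: one red pixel, one green pixel, both fully opaque.
pixels = bytes([255, 0, 0, 255, 0, 255, 0, 255])
tensor, shape = rgba_to_nchw_float32(pixels, width=2, height=1,
                                     mean=(0.0, 0.0, 0.0), std=(1.0, 1.0, 1.0))
print(shape)         # (1, 3, 1, 2)
print(list(tensor))  # [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```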

8:02-9:35

[08:02] And we can see now, with this demo, if we [08:06] upload an image and click Run, we get a pretty good depth estimation from this local model, [08:11] and it takes about half a second. [08:14] So, [08:15] what if you don't have a PyTorch model that you want to run? Well, there are a lot of models already converted to LiteRT, given that it started as an Android runtime. [08:26] You can find a bunch of models on Kaggle and Hugging Face if you filter for the LiteRT runtime in the model search. [08:34] And [08:35] I have tested a fair number of these, [08:38] but feel free to try [08:41] your own models as well. These are all run in the same model tester I showed before. [08:47] Let's focus on this Real-ESRGAN model. It's an image upscaling model that takes 128 by 128 patches and upscales them to 512 by 512. [08:59] We can take this model from Hugging Face [09:02] and write a demo around it. [09:05] That's a bit more complicated than the sentence makes it sound, but we can see we're getting [09:10] pretty good performance on multithreaded CPU [09:14] and much better performance on WebGPU. So of course, [09:18] when WebGPU works, [09:19] it's definitely the choice for these larger models. [09:22] Now, I said this wasn't quite as simple as just [09:26] writing the demo. [09:28] It's a little more complicated in this case: we have to cut the image up, [09:32] upscale each individual piece, and put it back together.
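The cut-upscale-stitch step can be sketched as follows. This is a minimal sketch assuming a grayscale image stored as nested lists, a model that maps a `tile` x `tile` patch to a patch `scale` times larger (128 to 512 for the Real-ESRGAN model above), and edge replication to fill tiles that run past the image border. `upscale_tiled` and `nearest_neighbor_x4` are hypothetical; the latter only stands in for one model run so the sketch can be tested.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale pixels indexed as image[row][col]


def upscale_tiled(image: Image, upscale_tile: Callable[[Image], Image],
                  tile: int = 128, scale: int = 4) -> Image:
    """Split into tile x tile patches, upscale each one, and stitch the results."""
    height, width = len(image), len(image[0])
    out = [[0.0] * (width * scale) for _ in range(height * scale)]
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            # Always hand the model a full tile; pixels past the border repeat the edge.
            patch = [[image[min(top + r, height - 1)][min(left + c, width - 1)]
                      for c in range(tile)] for r in range(tile)]
            upscaled = upscale_tile(patch)  # expected size: (tile*scale) x (tile*scale)
            # Copy back only the region that corresponds to real (unpadded) pixels.
            rows = min(tile, height - top) * scale
            cols = min(tile, width - left) * scale
            for r in range(rows):
                out[top * scale + r][left * scale:left * scale + cols] = upscaled[r][:cols]
    return out


def nearest_neighbor_x4(patch: Image) -> Image:
    # Stand-in for one model call: repeat every pixel 4 times in each direction.
    return [[v for v in row for _ in range(4)] for row in patch for _ in range(4)]


image = [[float(10 * r + c) for c in range(5)] for r in range(3)]  # 3 x 5 image
result = upscale_tiled(image, nearest_neighbor_x4, tile=2)
print(len(result), len(result[0]))  # 12 20
```

Production upscalers often overlap neighboring tiles and blend the seams, because a model's output near a tile edge can differ from what it produces with full context; the talk doesn't say whether this demo does that.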

9:35-11:02

[09:35] But if you already have a pipeline you like, written in TensorFlow.js, [09:41] you can reuse that pipeline with LiteRT.js. [09:44] For this example, we've got the MediaPipe hand pose estimation pipeline, [09:49] written in TensorFlow.js. [09:51] Here's a simple overview of how that works: we load the TensorFlow.js model here from my model, and then we run model.execute. This is how you would normally run it in TensorFlow.js. [10:08] If we use the LiteRT.js TFJS interop package, we can swap that model load and that model inference out for these two lines. [10:18] We also have to swap the model for its TFLite version. [10:21] But by swapping just those two pieces, we can call this model with TFJS input tensors [10:26] and get [10:27] the TFJS output tensors you'd expect, [10:30] even though we're running the model in LiteRT. [10:32] This lets us reuse the more complicated logic you might already have written in your TFJS pipelines. [10:39] And we can see that here. Here it is running in TFJS. [10:43] And we can swap to LiteRT.js. It takes a second to load, and yes, it's still running the same pipeline. All we had to do was swap the model out and change a few lines of code. [10:54] So at this point, [10:56] I'll hand it back to Chintan to talk about what's coming next. [11:01] Thank you, Matthew.
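The interop works because the rest of the pipeline only depends on a model that accepts and returns tensors in the format it already uses. A minimal sketch of that adapter idea follows; it is not the LiteRT.js TFJS interop API. `InteropModel`, `hand_pipeline`, and the toy runtime are hypothetical, and the method name `execute` only mirrors the TensorFlow.js call mentioned above.

```python
from typing import Any, Callable, List, Sequence


class InteropModel:
    """Hypothetical adapter: pipeline-format tensors in, pipeline-format tensors out,
    with a different runtime doing the actual inference in between."""

    def __init__(self, runtime_run: Callable[[List[Any]], List[Any]],
                 to_runtime: Callable[[Any], Any],
                 from_runtime: Callable[[Any], Any]) -> None:
        self.runtime_run = runtime_run
        self.to_runtime = to_runtime
        self.from_runtime = from_runtime

    def execute(self, inputs: Sequence[Any]) -> List[Any]:
        converted = [self.to_runtime(t) for t in inputs]  # pipeline -> runtime format
        outputs = self.runtime_run(converted)             # inference in the other runtime
        return [self.from_runtime(t) for t in outputs]    # runtime -> pipeline format


def hand_pipeline(model, frame: List[float]) -> float:
    # Existing pipeline logic: unchanged whichever backend `model` wraps.
    (scores,) = model.execute([frame])
    return max(scores)


def toy_runtime(tensors):
    # Toy "runtime" that works on tuples and doubles every value.
    return [tuple(2 * x for x in t) for t in tensors]


model = InteropModel(toy_runtime, to_runtime=tuple, from_runtime=list)
print(hand_pipeline(model, [0.1, 0.4, 0.2]))  # 0.8
```

Only the construction of `model` changes when the backend changes, which is the property the talk relies on when it swaps two lines and keeps the rest of the TFJS pipeline.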

11:08-12:39

[11:08] That was exciting, and we are thrilled to see what you will do with this. So let's also talk about what we're working on for LiteRT.js and what's coming next. [11:17] We really want to emphasize that LiteRT.js is a big area of focus. [11:24] We are committed [11:25] to future innovation and to adding more capabilities that offer more useful experiences and use cases to end users. To that end, we are taking the first steps to expand WebGPU model support in the months to come. That will be followed by support for WebNN, which will open the door to many other applications. [11:54] Another point: now that you've seen the awesome demos Matthew just showed, [12:01] if you're looking to try it out, it might take you just five minutes. You're free to download a model from Hugging Face. There's also a web model tester to help you test the model and decide what the best compute platform might be [12:15] to run it on. [12:16] And with that, I want to thank you all for your attention and your time. We also have a QR code up here. [12:24] You're welcome to scan it; it has links to documentation, ways to connect with us, and a place to share feedback, and we also welcome input on any of your future use cases. So with that, thank you very much [12:37] for your time. [12:38] Thanks.
