
The 3B vision model runs in the browser (after a 3GB model download). There's a very cool demo of that here: https://huggingface.co/spaces/mistralai/Ministral_3B_WebGPU
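For reference, running a vision model in the browser with Transformers.js and WebGPU looks roughly like this. This is a minimal sketch: the model id is a placeholder, and the linked Space shows the exact model and task it actually uses.

    import { pipeline } from "@huggingface/transformers";

    // One-time ~3GB weight download; the browser caches it afterwards.
    const captioner = await pipeline(
      "image-to-text",
      "onnx-community/placeholder-3b-vision", // placeholder model id
      { device: "webgpu", dtype: "q4" },      // WebGPU backend, quantized weights
    );

    const [out] = await captioner("https://example.com/photo.jpg");
    console.log(out.generated_text);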

Pelicans are OK but not earth-shattering: https://simonwillison.net/2025/Dec/2/introducing-mistral-3/



I'm reading this post and wondering what kind of crazy accessibility tools one could build. It's a bit out there, but imagine a tool that describes a web video for a blind user as it happens: not just the speech, but the actual on-screen action.
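A rough sketch of the plumbing for that idea: sample frames from a <video> element and speak a caption for each. The captioner() function is a hypothetical stand-in for whatever local vision model you run; the speech output uses the browser's real Web Speech API.

    // Hypothetical real-time video describer.
    async function describeAsItPlays(
      video: HTMLVideoElement,
      captioner: (frame: ImageData) => Promise<string>, // placeholder model hook
      intervalMs = 3000,
    ) {
      const canvas = document.createElement("canvas");
      const ctx = canvas.getContext("2d")!;

      const tick = async () => {
        if (video.paused || video.ended) return;
        // Copy the current frame off the video element.
        canvas.width = video.videoWidth;
        canvas.height = video.videoHeight;
        ctx.drawImage(video, 0, 0);
        const frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
        const caption = await captioner(frame);
        // Read the caption aloud via the Web Speech API.
        speechSynthesis.speak(new SpeechSynthesisUtterance(caption));
      };

      const timer = setInterval(tick, intervalMs);
      video.addEventListener("ended", () => clearInterval(timer));
    }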


This isn't local, but Gemini models can process very long videos and provide descriptions with timestamps if asked.

https://ai.google.dev/gemini-api/docs/video-understanding#tr...
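For what it's worth, the upload-then-ask flow with the official @google/genai TypeScript SDK looks roughly like this; the model name and file path are placeholders, and the linked docs have the authoritative call shapes.

    import { GoogleGenAI, createUserContent, createPartFromUri } from "@google/genai";

    const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

    // Upload the video once; for larger files you may need to poll
    // until processing finishes before referencing it.
    const file = await ai.files.upload({ file: "clip.mp4" }); // placeholder path

    const response = await ai.models.generateContent({
      model: "gemini-2.0-flash", // any video-capable Gemini model
      contents: createUserContent([
        createPartFromUri(file.uri!, file.mimeType!),
        "Describe what happens in this video, with timestamps.",
      ]),
    });

    console.log(response.text);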


Nor would it describe things as they happen; it would need pre-processing. So in the end, very different :)


> The image depicts an older man...

Ouch



