
The 3B vision model runs in the browser (after a 3GB model download). There's a very cool demo of that here: https://huggingface.co/spaces/mistralai/Ministral_3B_WebGPU
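For reference, running a vision model in the browser with Transformers.js and WebGPU looks roughly like this. This is a minimal sketch: the model id is a placeholder, and the linked Space shows the exact model and task it actually uses.

    import { pipeline } from "@huggingface/transformers";

    // One-time ~3GB weight download; the browser caches it afterwards.
    const captioner = await pipeline(
      "image-to-text",
      "onnx-community/placeholder-3b-vision", // placeholder model id
      { device: "webgpu", dtype: "q4" },      // WebGPU backend, quantized weights
    );

    const [out] = await captioner("https://example.com/photo.jpg");
    console.log(out.generated_text);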

Pelicans are OK but not earth-shattering: https://simonwillison.net/2025/Dec/2/introducing-mistral-3/



I'm reading this post and wondering what kind of crazy accessibility tools one could build. It's a bit out there, but imagine a tool that describes a web video for a blind user as it happens: not just the speech, but the actual on-screen action.
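A rough sketch of the plumbing for that idea: sample frames from a <video> element and speak a caption for each. The captioner() function is a hypothetical stand-in for whatever local vision model you run; the speech output uses the browser's real Web Speech API.

    // Hypothetical real-time video describer.
    async function describeAsItPlays(
      video: HTMLVideoElement,
      captioner: (frame: ImageData) => Promise<string>, // placeholder model hook
      intervalMs = 3000,
    ) {
      const canvas = document.createElement("canvas");
      const ctx = canvas.getContext("2d")!;

      const tick = async () => {
        if (video.paused || video.ended) return;
        // Copy the current frame off the video element.
        canvas.width = video.videoWidth;
        canvas.height = video.videoHeight;
        ctx.drawImage(video, 0, 0);
        const frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
        const caption = await captioner(frame);
        // Read the caption aloud via the Web Speech API.
        speechSynthesis.speak(new SpeechSynthesisUtterance(caption));
      };

      const timer = setInterval(tick, intervalMs);
      video.addEventListener("ended", () => clearInterval(timer));
    }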


This isn't local, but Gemini models can process very long videos and provide descriptions with timestamps if asked.

https://ai.google.dev/gemini-api/docs/video-understanding#tr...
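For what it's worth, the upload-then-ask flow with the official @google/genai TypeScript SDK looks roughly like this; the model name and file path are placeholders, and the linked docs have the authoritative call shapes.

    import { GoogleGenAI, createUserContent, createPartFromUri } from "@google/genai";

    const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

    // Upload the video once; for larger files you may need to poll
    // until processing finishes before referencing it.
    const file = await ai.files.upload({ file: "clip.mp4" }); // placeholder path

    const response = await ai.models.generateContent({
      model: "gemini-2.0-flash", // any video-capable Gemini model
      contents: createUserContent([
        createPartFromUri(file.uri!, file.mimeType!),
        "Describe what happens in this video, with timestamps.",
      ]),
    });

    console.log(response.text);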


Nor would it describe things as they happen; it would need pre-processing. So in the end, very different :)


> The image depicts an older man...

Ouch



