
My assumption is that there could be higher-dimensional 'token' representations that can be used instead of vision tokens (though obviously not human-interpretable). I wonder if there is a compression optimisation that takes the activation vectors at a specific layer of the network, chosen to provide the most context for the least memory.
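
To make the trade-off I have in mind concrete, here is a toy sketch in plain Python. It is not how any real model does this: `hidden` is random data standing in for activations at some layer, and `mean_pool` and `reconstruction_error` are hypothetical helpers I made up for illustration. Mean pooling is just the simplest stand-in for a learned compressor. It merges runs of token vectors into one latent vector each, then reports how many floats are stored against how much information is lost.

```python
import random

def mean_pool(vectors, group):
    """Average each run of `group` consecutive vectors into one latent vector."""
    pooled = []
    for start in range(0, len(vectors), group):
        chunk = vectors[start:start + group]
        dim = len(chunk[0])
        pooled.append([sum(v[i] for v in chunk) / len(chunk) for i in range(dim)])
    return pooled

def reconstruction_error(vectors, pooled, group):
    """Mean squared error when each vector is replaced by its group's latent."""
    total = 0.0
    count = 0
    for idx, v in enumerate(vectors):
        latent = pooled[idx // group]
        total += sum((a - b) ** 2 for a, b in zip(v, latent))
        count += len(v)
    return total / count

random.seed(0)
# Hypothetical stand-in for activations at one layer: 16 tokens, 8 dimensions.
hidden = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]

for group in (1, 2, 4, 8):
    pooled = mean_pool(hidden, group)
    floats_stored = len(pooled) * len(pooled[0])
    err = reconstruction_error(hidden, pooled, group)
    print(f"group={group} floats={floats_stored} mse={err:.4f}")
```

The question in my comment amounts to asking which layer, and which compressor in place of `mean_pool`, gives the lowest error for a given number of stored floats.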

