
Can someone explain how to use prompt caching with Llama 4?


Depends on which front end you use. In text-generation-webui, for example, prompt caching is just a checkbox under the Model tab that you tick before clicking "load model".


I basically want to interface with llama.cpp via an API from Node.js.
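
For that case, llama.cpp ships an HTTP server (llama-server) whose /completion endpoint takes a cache_prompt field, so from Node.js it's just a fetch call. Here's a minimal sketch, assuming Node 18+ (built-in fetch) and a locally running llama-server; the model path and port are placeholders, and the endpoint/field names are from llama.cpp's server README, so double-check them against your build:

    // Sketch: calling llama.cpp's llama-server from Node.js 18+.
    // Start the server first, e.g. (placeholder model path):
    //   ./llama-server -m ./models/your-model.gguf -c 8192 --port 8080

    const SERVER = "http://127.0.0.1:8080";

    async function complete(prompt) {
      const res = await fetch(`${SERVER}/completion`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          prompt,
          n_predict: 256,
          // Reuse the KV cache for the longest common prefix with the
          // previous request, so a long static system prompt is only
          // processed once.
          cache_prompt: true,
        }),
      });
      if (!res.ok) throw new Error(`llama-server returned ${res.status}`);
      const data = await res.json();
      return data.content;
    }

    async function main() {
      // Keep the static part of the prompt byte-identical across calls;
      // the cache only helps when the prefix actually matches.
      const prefix = "You are a helpful coding assistant.\n\n";
      console.log(await complete(prefix + "Q: Reverse a string in JS.\nA:"));
      console.log(await complete(prefix + "Q: Read a file in Node.js.\nA:"));
    }

    main().catch(console.error);

llama-server also exposes an OpenAI-compatible /v1/chat/completions route, so if you'd rather not hand-roll requests you can point the openai npm client at it with a local base URL.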

What are some of the best coding models that run locally today? Do they have prompt caching support?



