
The index has the most up-to-date term frequency information, but it is logistically inaccessible, and it's not really practical to interrogate it when extracting keywords (you would need this information for 100 billion terms), so a somewhat stale copy is kept in memory and used for that process instead.
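To make the keyword extraction side concrete, here is a rough sketch of scoring a document's terms against an in-memory snapshot of document frequencies. The names (`StaleTermStats`, `extract_keywords`) and the TF-IDF style weighting are assumptions for illustration only, not the actual implementation being described.

```python
import math
from collections import Counter


class StaleTermStats:
    """Hypothetical in-memory snapshot of document frequencies.

    It is assumed to be rebuilt from the index now and then, so the
    counts lag behind whatever the live index holds.
    """

    def __init__(self, doc_freq, total_docs):
        self.doc_freq = dict(doc_freq)
        self.total_docs = total_docs

    def idf(self, term):
        # Smoothed IDF; terms missing from the snapshot count as df = 0.
        df = self.doc_freq.get(term, 0)
        return math.log((self.total_docs + 1) / (df + 1)) + 1.0


def extract_keywords(tokens, stats, top_k=3):
    """Rank a document's terms by count * idf using only the snapshot."""
    counts = Counter(tokens)
    scored = {term: n * stats.idf(term) for term, n in counts.items()}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scored, key=lambda t: (-scored[t], t))[:top_k]


stats = StaleTermStats({"the": 900, "index": 50, "bm25": 2}, total_docs=1000)
tokens = ["the", "the", "index", "bm25", "the"]
keywords = extract_keywords(tokens, stats, top_k=2)
```

The point is only that each lookup is a dictionary access in memory, with no round trip to the index, at the cost of slightly out-of-date counts.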

When searching with BM25, it is much more accessible, since you already fetch that information indirectly as part of looking up the document lists, and this is typically only done up to about a dozen times per query.
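As a rough illustration of why the frequency comes almost for free at query time, here is a toy BM25 scorer where the document frequency is just the length of the posting list you already looked up. The data layout (`postings` as a dict of term to `{doc_id: term_count}`) and the function name are assumptions for this sketch; a real index would store this quite differently.

```python
import math


def bm25_scores(query_terms, postings, doc_lengths, k1=1.2, b=0.75):
    """Score documents with BM25 over a toy in-memory inverted index.

    postings:    term -> {doc_id: term count in that document}
    doc_lengths: doc_id -> document length in tokens
    """
    n_docs = len(doc_lengths)
    avg_len = sum(doc_lengths.values()) / n_docs
    scores = {}
    for term in query_terms:
        plist = postings.get(term, {})
        # Document frequency falls out of the posting list lookup itself.
        df = len(plist)
        if df == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in plist.items():
            # Length normalisation term from the standard BM25 formula.
            norm = k1 * (1 - b + b * doc_lengths[doc_id] / avg_len)
            contribution = idf * tf * (k1 + 1) / (tf + norm)
            scores[doc_id] = scores.get(doc_id, 0.0) + contribution
    return scores


postings = {"index": {1: 2, 2: 1}, "bm25": {2: 3}}
doc_lengths = {1: 10, 2: 10, 3: 10}
scores = bm25_scores(["index", "bm25"], postings, doc_lengths)
```

With only a handful of terms per query, that is a handful of lookups, which is a very different scale from needing frequencies for every term while extracting keywords.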


