This model is quite impressive. It's not just useful for math/research thanks to its strong reasoning; it also maintained a very low hallucination rate of 1.1% on the Vectara Hallucination Leaderboard:
https://github.com/vectara/hallucination-leaderboard
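If anyone wants to poke at the scoring themselves, the HHEM model that powers the leaderboard is open on Hugging Face. Here's a minimal sketch of scoring a (source, summary) pair, assuming the CrossEncoder interface from the v1 model card (newer HHEM versions may need a different loading path, so check the current model card):

    # Score a (source, summary) pair for factual consistency using the open
    # HHEM model. Assumes the sentence-transformers CrossEncoder interface
    # documented for v1; newer versions may load differently.
    from sentence_transformers import CrossEncoder

    model = CrossEncoder("vectara/hallucination_evaluation_model")
    pairs = [
        ("The capital of France is Paris.",        # source passage
         "Paris is the capital city of France."),  # generated summary
    ]
    scores = model.predict(pairs)  # ~1.0 = consistent, ~0.0 = hallucinated
    print(scores)

Scores close to 1 mean the summary is factually consistent with its source; the leaderboard aggregates these scores over many summaries per model.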
These days it's common in large companies to see multiple teams building isolated RAG applications.
It's similar to the "Shadow IT" problem from the early cloud era - it caused a big headache for IT teams.
I work at Vectara, and we see this all the time.
How are others experiencing this?
Fascinating, thanks for calling that out: I found 1.0 promising in practice, but it had hallucination problems. Then I saw it got 57% of questions wrong on open-book true/false, and I wrote it off completely - there's no reason to switch to it for speed and cost if it's just a random generator. So a 1.1% hallucination rate here is a great outcome.
Speaking of which, I wonder how they'd do on SimpleQA. OpenAI is a negative outlier there compared to Anthropic. That benchmark also gets at hallucination and "inappropriate certainty".
We recently launched UDF (user-defined function) reranking as part of the RAG stack, and we think it enables a lot of interesting use cases that go beyond simple relevance. For example, it supports reranking by distance (geo-location), by recency, and more.
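To make that concrete, here's a rough Python sketch of the kind of blended scoring a UDF can express - illustrative only; the field names, weights, and decay constants are made up, and this is not our actual UDF syntax:

    import math
    import time

    def haversine_km(lat1, lon1, lat2, lon2):
        # Great-circle distance between two (lat, lon) points, in km.
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp = math.radians(lat2 - lat1)
        dl = math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * 6371 * math.asin(math.sqrt(a))

    def rerank_score(result, query_lat, query_lon):
        # Blend base semantic relevance with recency and geo proximity.
        relevance = result["score"]  # base retrieval relevance in [0, 1]
        age_days = (time.time() - result["published_ts"]) / 86400
        recency = math.exp(-age_days / 30)  # decays with age (~monthly scale)
        dist_km = haversine_km(query_lat, query_lon, result["lat"], result["lon"])
        proximity = 1 / (1 + dist_km / 10)  # decays with distance
        return 0.6 * relevance + 0.25 * recency + 0.15 * proximity

    # Sort retrieved results by the blended score instead of raw relevance:
    # results.sort(key=lambda r: rerank_score(r, 37.77, -122.42), reverse=True)

The point is that retrieval relevance becomes just one signal among several, weighted however the application needs.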
I wanted to ask the HN community for advice: what are some real use cases you have that could benefit from UDF reranking in RAG?
About a year ago we launched the Vectara Destination in partnership with the Airbyte team, to help developers accelerate Generative AI applications. Congrats to the Airbyte team on this great launch - looking forward to 2.0!