A header-only C vector database library (github.com/abdimoallim)
78 points by abdimoalim 16 hours ago | 33 comments



As data stores go, this is basically in-memory only. The save and load process is manually triggered by the user, and the save process is not crash-safe, nor does it do any integrity checks.
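For anyone wondering what "crash safe" plus an integrity check could look like, here is a minimal sketch, not the library's code. It assumes a POSIX system; `save_atomic` and `fnv1a` are hypothetical names. The idea is to write a checksum and the payload to a temporary file, flush it to disk, and only then rename it over the real file, so a crash mid-write leaves the old file untouched.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <unistd.h>   /* fsync (POSIX) */

/* FNV-1a hash, used here as a cheap checksum so a loader can detect
   a truncated or corrupted file. Not cryptographic. */
static uint32_t fnv1a(const void *data, size_t len) {
    const unsigned char *p = data;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical helper: writes [checksum][payload] to "<path>.tmp",
   flushes it to disk, then renames it over <path>.
   Returns 0 on success, -1 on failure. */
static int save_atomic(const char *path, const void *data, size_t len) {
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp) return -1;

    FILE *f = fopen(tmp, "wb");
    if (!f) return -1;

    uint32_t sum = fnv1a(data, len);
    int ok = fwrite(&sum, sizeof sum, 1, f) == 1
          && (len == 0 || fwrite(data, 1, len, f) == len)
          && fflush(f) == 0
          && fsync(fileno(f)) == 0;   /* push the bytes to stable storage */
    if (fclose(f) != 0) ok = 0;

    /* rename() replaces the destination atomically on POSIX filesystems. */
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

A loader would read the first four bytes, recompute `fnv1a` over the rest, and reject the file on mismatch. Full durability would also need an fsync on the containing directory, which this sketch leaves out.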

I also don't think it has any indexes? If so, search time scales with the number of entries.
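To make the cost concrete, this is roughly what an index-free search amounts to. It is a sketch under assumptions, not the library's implementation: `nearest` and `dist2` are hypothetical names, vectors are stored row-major in one flat array, and squared Euclidean distance stands in for whatever metric is actually used. Every query touches every stored vector, so the cost is O(n * dim).

```c
#include <stddef.h>
#include <float.h>

/* Squared Euclidean distance between two vectors of length dim. */
static float dist2(const float *a, const float *b, size_t dim) {
    float sum = 0.0f;
    for (size_t i = 0; i < dim; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Hypothetical brute-force scan over n vectors stored row-major in
   `vectors` (n * dim floats). Returns the index of the closest vector,
   or (size_t)-1 if the store is empty. */
static size_t nearest(const float *vectors, size_t n, size_t dim,
                      const float *query) {
    size_t best = (size_t)-1;
    float best_d = FLT_MAX;
    for (size_t i = 0; i < n; i++) {
        float d = dist2(vectors + i * dim, query, dim);
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;
}
```

An approximate-nearest-neighbor index trades some accuracy to avoid visiting every entry; the scan above is exact but linear.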


I feel like there are two kinds of developers. The ones who shit all over other people's preferences and turn everything into an almost religious discussion, and the ones who prefer to just build stuff.

Get over it. Some people like header only.


Agreed. Once you've spent hours fighting with C build tools under a deadline, it becomes very easy to see why this is beneficial.

Some people may not have known the difference and probably thought it was more akin to a naming convention.

I'm obviously not talking about the people asking "what is it".

It would be a lot better for the community if you directly replied to the objectionable content with a civil response.

No need to be an asshole; we can all discuss things civilly.

Does declaring a function as inline do anything for any modern compiler? I understood that this is basically ignored now and the compiler makes its own decisions based on what is fastest.

The idea that it does nothing is a persistent myth. Both GCC and Clang heed it although neither treats it as a mandate:

https://tartanllama.xyz/posts/inline-hints/

This library seems to have the annotation on every function, though, so it's possible the author is just following a convention of always using it for functions defined in header files (it'd be required if the functions weren't declared `static`).
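For reference, the convention being described looks something like this. It is a sketch with made-up names (`vec_util.h`, `dot`), not code from the library.

```c
/* Contents of a hypothetical header, vec_util.h. */
#ifndef VEC_UTIL_H
#define VEC_UTIL_H

#include <stddef.h>

/* static: every translation unit that includes this header gets its own
   private copy, so including it from several .c files cannot produce
   duplicate-symbol errors at link time.
   inline: a hint to the optimizer, which remains free to ignore it. */
static inline float dot(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

#endif /* VEC_UTIL_H */
```

Drop the `static` and also drop the `inline`, and two .c files including the header would each emit an external definition of `dot`, which fails at link time. Keep `inline` without `static` and C99 treats the header copy as an inline definition only, so exactly one .c file still has to provide the external definition (typically via an `extern inline` declaration).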


One obvious benefit for a header only library is that it suppresses the warning you get when a static function isn't used.

It is not a benefit if you do not get warnings about unused functions. With any proper library, you would also not get warnings for functions that are part of the API that are not used, but you would get warnings about non-exported functions internal to a translation unit that are accidentally not used. This is a good thing.

Kind of. At the end of the day the compiler can do almost anything it wants outside of undefined behavior, which isn't much of a guard rail.

In reality, header-only libraries allow for deep inlining: the compiler may optimize very specifically for your code and usage.

The situation is a bit more exaggerated with C++ because of templates, but there are some remaining gains to be had in C alone.


In the world of Kubernetes and languages where a one-liner brings in a graph of 1700 dependencies, and oceans of YAML, it's suddenly important for a C thing to be one file rather than two.

C libraries have advertised "header-only" for a long time. It's because there is no package manager or dependency management, so you're literally copying all your dependencies into your project.

This is also why everyone implements their own (buggy) linked-list implementations, etc.

And header-only is more efficient to include and build with than header+source.


I never copied my dependencies into my C project, nor does it usually take more than a couple of seconds to add one.

There are a number of extremely shitty vendor toolchain/IDE combos out there that make adding and managing dependencies unnecessarily painful. Things like only allowing one project to be open at a time, or compiler flags needing to be manually copied to each target.

Now that I'm thinking about it, CMake also isn't particularly good at this the way most people use it.


There are certainly bad vendor toolchains, but I want to push back against the idea that this is a general C problem. Even for the worst toolchains I have seen, dropping in a pair of .c/.h files would not have been difficult. So it is still difficult to see how a header-only library makes a lot of sense.

One of the worst I've experienced had a bug where adding too many files would cause intermittent errors. The people affected resorted to header-izing things. It was an off-by-one in how it was constructing arguments to subshells, causing characters to occasionally drop.

But, more commonly I've seen that it's just easier to not need to add C files at all. Add a single include path and you can avoid the annoyances of vendoring dependencies, tracking upstream updates, handling separate linkage, object files, output paths, ABIs, and all the rest. Something like Cargo does all of this for you, which is why people prefer it to calling rustc directly.


Exactly; I can't understand this obsession with header-only C "libraries".

Writing new C code in 2026 is already an artisanal statement, so why not go all the way in making it?

Useful for embedded devices? Or for ephemeral processes where crashes and disk updates aren't important?

As a non-C programmer, why would "header only" be a good thing?

It's not.

It's a tradeoff between ease of integration (just download the .h file into your project folder and #include it in your source file, instead of worrying about source build system vs. target build system, cross-compiling headaches, etc.)

and compilation times: any time you change a source file that includes the header, your compiler also has to recompile the dependency along with it (assuming you haven't used precompiled headers).


I'm completely ignorant about this, but wouldn't it be possible to compile your project separately to improve compilation times? For instance, if you're using OP's vector library, which is self-contained, could you compile that first and just once?

Let's say add.h defines a function you need, like:

    int add(int a, int b){
        // Long logic and then this
        return a+b;
    }

Let's say this is your main.c.

    #include "add.h"

    int main(void) {
      return add(5,6);
    }

The preprocessor just copies the contents of add.h into your main.c whenever you're trying to compile main.c. (let's ignore the concept of precompiled headers for now).

What you can do instead is put only the declaration of add in add.h, which just tells the compiler that add takes two integers and returns an integer.

    int add(int a, int b);

You can then put the definition of add in add.c, compile that to add.o, and link it with main.o at link time to get your final binary, without having to recompile add.o every time you change main.c.

Precompiled headers: https://maskray.me/blog/2023-07-16-precompiled-headers
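Some single-file libraries (the stb headers are the well-known example) sit between the two approaches with an implementation macro. Nothing in this thread says OP's library works this way; the sketch below is only an illustration, and `ADD_IMPLEMENTATION` is a made-up name. It is shown as one file so it compiles on its own; in practice the marked part would live in add.h.

```c
/* In exactly one .c file of the project, before including the header: */
#define ADD_IMPLEMENTATION

/* ---- what a hypothetical single-file add.h would contain ---- */
#ifndef ADD_H
#define ADD_H
int add(int a, int b);   /* declaration: seen by every file that includes it */
#endif /* ADD_H */

#ifdef ADD_IMPLEMENTATION
/* definition: compiled only in the one file that asked for it */
int add(int a, int b) {
    /* long logic would go here */
    return a + b;
}
#endif /* ADD_IMPLEMENTATION */
/* ---- end of add.h ---- */
```

Every other source file includes the header without the macro and sees only the declaration, so the body is compiled once, much like a separate add.c. The cost is the same as with add.c: without link-time optimization the body cannot be inlined into those other files.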


Unless you enable link-time optimization, you lose inlining across translation units and the performance that comes with it.

The whole thing is essentially a workaround for lack of sufficiently good/easy ways to package code in the ways people want to use it.


It often also means it was written more correctly. There is a bit of an art to designing a header only library and it can strike a different balance between code size and runtime speed optimization.

In strict terms when you place implementation in a .c file you probably want that code to be shared when different things call it, and the compiler will "link" to that same implementation.

When you have a header only library the compiler is free to optimize in more ways specific to your actual use case.


Extremely easy copy paste deployment into projects

C's package management story is unfriendly to say the least. A header only library simplifies it dramatically, and makes it much more straightforward to integrate dependencies into your application.

Header-only C libraries are such an underappreciated pattern for embedding into larger projects. For vector search specifically, having something you can just drop into an existing C/C++ codebase without pulling in a whole database dependency is really appealing. Curious about the indexing strategy: is it brute force, or does it support approximate nearest neighbor?

Why call it a header? It could be just a source file. Including sources is uncommon, but why not? Solid "amalgamation" builds are a thing too.

In the early days of CUDA it was pretty common to just #include all your sources, since linking was such a nightmare.

Would it work to replace the memory store with mmap?
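In principle a flat array of floats maps onto a file quite naturally. A minimal sketch of the idea, assuming a POSIX system and a store that is just `count` contiguous floats; `map_floats` and `unmap_floats` are hypothetical helpers, and whether the library's internal layout allows this is not something the thread establishes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create or open `path`, size it to hold `count`
   floats, and map it read/write. Returns NULL on failure. */
static float *map_floats(const char *path, size_t count) {
    size_t bytes = count * sizeof(float);
    if (bytes == 0) return NULL;          /* zero-length mappings are invalid */

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return NULL;

    if (ftruncate(fd, (off_t)bytes) != 0) {
        close(fd);
        return NULL;
    }

    /* MAP_SHARED: stores through the pointer are carried to the file. */
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                            /* the mapping outlives the descriptor */
    return p == MAP_FAILED ? NULL : (float *)p;
}

/* Releases a mapping created by map_floats. Returns 0 on success. */
static int unmap_floats(float *p, size_t count) {
    return munmap(p, count * sizeof(float));
}
```

The returned pointer can be used like a malloc'd array. This gives persistence without an explicit save step, but not crash safety: the kernel writes pages back in whatever order it likes, so a crash can still leave a half-updated file unless the format is designed for it (and `msync` is used at the right points).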


