It’s a very nice and detailed benchmark suite! Great effort! Can you please shar...

burntsushi · 2025-09-23T14:03:55 1758636235

i9-12900K, x86-64.

There is definitely no AVX-512 support on my CPU. Which is also true for most of my users. I don't bother with AVX-512 for that reason.

Another substantial population of my users are on aarch64, which memchr has optimizations for. I don't think StringZilla does.

ashvardanian · 2025-09-23T14:11:07 1758636667

Makes sense! I mostly focus on newer AVX-512 variants as opposed to older AVX2-only CPUs. As for aarch64, it is supported with NEON, SVE, and SVE2 kernels for some tasks. The last two are rarely useful, unless you run on AWS Graviton 3 (previous gen) or some of the supercomputers with custom chips like Fujitsu A64FX.

burntsushi · 2025-09-23T14:30:31 1758637831

> newer AVX-512 variants as opposed to older AVX2-only CPUs

This is exactly my issue with targeting AVX-512. It isn't just absent on "older AVX2-only CPUs." It's also absent on many "newer AVX2-only CPUs." For example, the i9-14900K. I don't think any of the other newer Intel CPUs have AVX-512 either. And historically, whether an x86-64 CPU supported AVX-512 at all was hit or miss.

AVX-512 has been around for a very long time now, and it has just never been consistently available.
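Because availability is this patchy, the usual way to cope is to check CPU features at runtime and fall back to a narrower path. Here is a minimal sketch in Rust of what that check can look like. The `SimdTier` enum and `detect_simd_tier` function are hypothetical names made up for illustration, and are not taken from memchr or StringZilla; only the standard library's feature-detection macros are real.

```rust
// Hypothetical illustration: report the widest SIMD tier the running CPU offers.
// The tier names are assumptions for this sketch, not any library's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SimdTier {
    Avx512,
    Avx2,
    Neon,
    Scalar,
}

pub fn detect_simd_tier() -> SimdTier {
    // On x86-64, prefer AVX-512 Foundation when present, then AVX2.
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx512f") {
            return SimdTier::Avx512;
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return SimdTier::Avx2;
        }
    }
    // On aarch64, check for NEON.
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return SimdTier::Neon;
        }
    }
    // Anything else falls back to a plain scalar implementation.
    SimdTier::Scalar
}
```

A caller would typically run `detect_simd_tier()` once and cache the result, then pick the matching kernel. On CPUs without AVX-512, this sketch would simply never return the `Avx512` tier.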

vlovich123 · 2025-09-23T14:38:12 1758638292

It’s mainly available in data centers, but yes, it’s missing in consumer parts. And for a while, even in data centers, you wanted to be careful about using it due to Intel’s issues with clock downscaling, but that hasn’t been true for a few years.

ashvardanian · 2025-09-23T14:53:38 1758639218

The consumer situation is changing. A few years ago, when I was working with a team on some closed-source HPC stuff, we got everyone Tiger Lake-based laptops to simplify AVX-512 R&D. Now, Zen4-based desktop CPUs also support it.

But it’s fair to say that I’m mostly focusing on the datacenter/supercomputing hardware, both on the x86 and Arm side.

vlovich123 · 2025-09-23T15:47:47 1758642467

Targeting AVX-512 on Intel consumer parts is pointless. But yes, AMD does continue to ship AVX-512 chips, so completely ignoring AVX-512 on consumer isn’t ideal.

William_BB · 2025-09-23T15:35:43 1758641743

Could you elaborate on SVE and SVE2? Is that because it's only 128 bits? I think my MacBook (Apple silicon) is one of the two.

ashvardanian · 2025-09-23T15:54:27 1758642867

Yes, at the scale of 128-bit registers, NEON is mostly enough, except for a few categories of instructions missing in that ISA subset, like scatter/gather ops, that can yield a 30% boost over serial memory accesses: https://github.com/ashvardanian/less_slow.cpp/releases/tag/v...