
It’s a very nice and detailed benchmark suite! Great effort! Can you please share the CPU model you are running on? I suspect it’s an x86 CPU without AVX-512 support.


i9-12900K, x86-64.

There is definitely no AVX-512 support on my CPU. Which is also true for most of my users. I don't bother with AVX-512 for that reason.

Another substantial population of my users are on aarch64, which memchr has optimizations for. I don't think StringZilla does.


Makes sense! I mostly focus on newer AVX-512 variants as opposed to older AVX2-only CPUs. As for aarch64, it is supported with NEON, SVE, and SVE2 kernels for some tasks. The last two are rarely useful, unless you run on AWS Graviton 3 (previous gen) or some of the supercomputers with custom chips like Fujitsu A64FX.


> newer AVX-512 variants as opposed to older AVX2-only CPUs

This is exactly my issue with targeting AVX-512. It isn't just absent on "older AVX2-only CPUs." It's also absent on many "newer AVX2-only CPUs." For example, the i9-14900K. I don't think any of the other newer Intel CPUs have AVX-512 either. And historically, whether an x86-64 CPU supported AVX-512 at all was hit or miss.

AVX-512 has been around for a very long time now, and it has just never been consistently available.
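
For readers following along: one common way to cope with this inconsistency is to detect CPU features at runtime and dispatch to the widest kernel the machine actually reports. Below is a minimal sketch of that idea, assuming a library with separate AVX-512, AVX2, NEON, and scalar code paths. The names `SimdTier`, `pick_tier`, and `detect_tier` are hypothetical and are not taken from memchr or StringZilla; only the standard library's feature-detection macros are real APIs.

```rust
/// SIMD tiers a byte-search routine might dispatch between.
/// (Hypothetical names, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SimdTier {
    Avx512,
    Avx2,
    Neon,
    Scalar,
}

/// Pure selection logic: prefer the widest tier the CPU reports.
fn pick_tier(has_avx512: bool, has_avx2: bool, has_neon: bool) -> SimdTier {
    if has_avx512 {
        SimdTier::Avx512
    } else if has_avx2 {
        SimdTier::Avx2
    } else if has_neon {
        SimdTier::Neon
    } else {
        SimdTier::Scalar
    }
}

/// x86-64: ask the running CPU what it supports.
#[cfg(target_arch = "x86_64")]
fn detect_tier() -> SimdTier {
    // Assumption: byte-wise kernels need AVX-512BW on top of AVX-512F.
    let avx512 = std::arch::is_x86_feature_detected!("avx512f")
        && std::arch::is_x86_feature_detected!("avx512bw");
    let avx2 = std::arch::is_x86_feature_detected!("avx2");
    pick_tier(avx512, avx2, false)
}

/// aarch64: NEON is the only tier this sketch considers.
#[cfg(target_arch = "aarch64")]
fn detect_tier() -> SimdTier {
    let neon = std::arch::is_aarch64_feature_detected!("neon");
    pick_tier(false, false, neon)
}

/// Any other architecture falls back to scalar code.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn detect_tier() -> SimdTier {
    pick_tier(false, false, false)
}
```

Calling `detect_tier()` once at startup and caching the result keeps the check off the hot path. With this ordering, a CPU that reports AVX2 but not AVX-512 (as described above for the i9-12900K) would select `SimdTier::Avx2`, so the AVX-512 kernel only pays off for users whose hardware actually has it.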


It’s mainly available in data centers, but yes, it’s missing in consumer parts. And for a while, even in data centers, you wanted to be careful about using it due to Intel’s issues with clock downscaling, but that hasn’t been true for a few years.


The consumer situation is changing. A few years ago, when I was working with a team on some closed source HPC stuff, we got everyone Tiger Lake-based laptops to simplify AVX-512 R&D. Now, Zen4-based desktop CPUs also support it.

But it’s fair to say that I’m mostly focusing on the datacenter/supercomputing hardware, both on the x86 and Arm side.


If you’re targeting AVX-512 on Intel consumer parts, it’s pointless. But yes, AMD does continue to ship AVX-512 chips, so completely ignoring AVX-512 on consumer hardware isn’t ideal.


Could you elaborate on SVE and SVE2? Is that because it's only 128 bits? I think my MacBook (Apple silicon) is one of the two.


Yes, at the scale of 128-bit registers NEON is mostly enough, except for a few categories of instructions missing in that ISA subset, like scatter/gather ops, which can yield a 30% boost over serial memory accesses: https://github.com/ashvardanian/less_slow.cpp/releases/tag/v...



