Hacker News
The most useful GCC options and extensions (antoarts.com)
46 points by g3orge on Sept 14, 2012 | 14 comments


Major omission from this article: vector extensions. GCC (and Clang) support first-class SIMD vector values and let you use the standard operators (+, -, *, /) on them. This makes writing computer graphics and other 3D code a lot nicer.

Like this:

  typedef float vec4 __attribute__((vector_size(16)));
  vec4 a = { 1, 2, 3, 4 }, b = { 5, 6, 7, 8 };
  vec4 c = a + b;
What GCC does not allow (IIRC) is GLSL/OpenCL-style shuffle syntax like position.xyzw or (vec4)(a.xx, b.yy). Clang has __builtin_shufflevector, and GCC 4.7 and later have __builtin_shuffle; on older GCC versions you have to fall back on SIMD intrinsics for SSE, NEON, etc.
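
For readers on newer compilers, here is a minimal sketch of a generic shuffle without target intrinsics. It assumes GCC 4.7 or later (which, as far as I know, is where `__builtin_shuffle` first appeared) or Clang with `__builtin_shufflevector`. The `ivec4` typedef and the `reverse4` name are illustrative, not from the article.

```c
typedef float vec4 __attribute__((vector_size(16)));
typedef int ivec4 __attribute__((vector_size(16)));

/* Reverse the lanes of v: (x, y, z, w) -> (w, z, y, x). */
static inline vec4 reverse4(vec4 v)
{
#if defined(__clang__)
    /* Clang: the indices are compile-time constants that select lanes
       from the concatenation of the two operands. */
    return __builtin_shufflevector(v, v, 3, 2, 1, 0);
#else
    /* GCC: the mask is an integer vector with the same lane count and
       lane width as the vector being shuffled. */
    const ivec4 mask = { 3, 2, 1, 0 };
    return __builtin_shuffle(v, mask);
#endif
}
```

Calling `reverse4` on { 1, 2, 3, 4 } should give { 4, 3, 2, 1 }; individual lanes can be read with ordinary subscripts such as `r[0]`.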

Interesting thing about shuffles: Clang's implementation of the ARM NEON intrinsics actually defines the NEON shuffle intrinsics in terms of __builtin_shufflevector. I've noticed that Clang emits much better NEON code than GCC does. GCC's NEON backend is a lot worse than its SSE backend, and shuffle instructions are a major difference between NEON and SSE.


Can you give an example where such shuffle instructions are useful? I'm genuinely curious.


The shuffle instructions are commonly used when encoding or decoding integer serialization formats or processing integer arrays. They can also be a fast intermediate step in some bit-twiddling algorithms, which would otherwise require multiple sequential operations to achieve an equivalent distribution of bits for processing.

If you are not working on high-performance bit-twiddling algorithms then you will probably have limited use for shuffle intrinsics. For those applications, it can save a few clock cycles for each call relative to more naive methods.


Shuffles are what make SIMD go. It's trivial to take simple repeated math ops and do them in parallel instead of 4 times (the SIMD vector width) sequentially, but shuffling vectors cleverly is where you can get big performance wins.

Basic example, doing addition of 4 values in 2 ALU operations:

  vec4 sum(vec4 v) // return v.x+v.y+v.z+v.w repeated 4 times
  {
    vec4 temp = v + v.yxwz;  // (x+y, x+y, z+w, z+w)
    return temp + temp.zwxy; // every lane now holds x+y+z+w
  }
Practical examples: https://github.com/rikusalminen/threedee-simd (work in progress) Requires this: http://gruntthepeon.free.fr/ssemath/
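
The swizzle syntax above is GLSL-style pseudocode. A rough compilable equivalent, assuming GCC 4.7+ (`__builtin_shuffle`) or Clang (`__builtin_shufflevector`), might look like the sketch below. The `SHUF4` macro and the `hsum4` name are made up for illustration.

```c
typedef float vec4 __attribute__((vector_size(16)));
typedef int ivec4 __attribute__((vector_size(16)));

/* Hypothetical helper: pick lanes i0..i3 of v, hiding the compiler difference. */
#if defined(__clang__)
#define SHUF4(v, i0, i1, i2, i3) __builtin_shufflevector((v), (v), i0, i1, i2, i3)
#else
#define SHUF4(v, i0, i1, i2, i3) __builtin_shuffle((v), (ivec4){ i0, i1, i2, i3 })
#endif

/* Returns x+y+z+w in every lane, using two adds and two shuffles. */
static inline vec4 hsum4(vec4 v)
{
    vec4 temp = v + SHUF4(v, 1, 0, 3, 2);   /* v.yxwz: (x+y, x+y, z+w, z+w) */
    return temp + SHUF4(temp, 2, 3, 0, 1);  /* temp.zwxy: x+y+z+w in all lanes */
}
```

For the input { 1, 2, 3, 4 } every lane of the result should be 10.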


Cross product is a good example: (v1.yzx * v2.zxy) - (v1.zxy * v2.yzx)
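
As a sketch of how that expression could be written with the generic vector extensions (assuming GCC 4.7+ or Clang; `SHUF4`, `YZX`, `ZXY` and `cross3` are illustrative names, and the fourth lane is simply carried along and ends up as zero):

```c
typedef float vec4 __attribute__((vector_size(16)));
typedef int ivec4 __attribute__((vector_size(16)));

/* Hypothetical helper: pick lanes i0..i3 of v, hiding the compiler difference. */
#if defined(__clang__)
#define SHUF4(v, i0, i1, i2, i3) __builtin_shufflevector((v), (v), i0, i1, i2, i3)
#else
#define SHUF4(v, i0, i1, i2, i3) __builtin_shuffle((v), (ivec4){ i0, i1, i2, i3 })
#endif

#define YZX(v) SHUF4(v, 1, 2, 0, 3)  /* (y, z, x, w) */
#define ZXY(v) SHUF4(v, 2, 0, 1, 3)  /* (z, x, y, w) */

/* 3D cross product in the first three lanes; lane 3 is w*w - w*w = 0. */
static inline vec4 cross3(vec4 v1, vec4 v2)
{
    return YZX(v1) * ZXY(v2) - ZXY(v1) * YZX(v2);
}
```

With v1 = (1, 2, 3) and v2 = (4, 5, 6) this should produce (-3, 6, -3), which matches the textbook formula.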


If you think of processors as plumbing, shuffle operations are like pipes: they put the data in the right place for other operations. General shuffle operations let you do things like convert RGB data to RGBA data without trouble. Shuffling is a more flexible version of the more common pack, unpack, and bytewise shift operations.
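
To make the RGB to RGBA case concrete, here is a hedged sketch using a two-operand byte shuffle. It assumes GCC 4.7+ or Clang, that four packed RGB pixels sit in the first 12 bytes of a 16-byte vector, and that the alpha value should be 255. The names `u8x16` and `rgb_to_rgba` are invented for this example.

```c
typedef unsigned char u8x16 __attribute__((vector_size(16)));

/* Expand 4 packed RGB pixels (bytes 0..11 of rgb) into 4 RGBA pixels.
   Indices 0..15 select from rgb, indices 16..31 select from alpha. */
static inline u8x16 rgb_to_rgba(u8x16 rgb)
{
    const u8x16 alpha = { 255, 255, 255, 255, 255, 255, 255, 255,
                          255, 255, 255, 255, 255, 255, 255, 255 };
#if defined(__clang__)
    return __builtin_shufflevector(rgb, alpha,
                                   0, 1, 2, 16,  3, 4, 5, 16,
                                   6, 7, 8, 16,  9, 10, 11, 16);
#else
    const u8x16 mask = { 0, 1, 2, 16,  3, 4, 5, 16,
                         6, 7, 8, 16,  9, 10, 11, 16 };
    return __builtin_shuffle(rgb, alpha, mask);
#endif
}
```

Whether this compiles down to a single shuffle instruction depends on the target (for example, byte shuffles on x86 need SSSE3); otherwise the compiler falls back to slower code.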


I never understood why binary literals weren't included in the language.

Other nice extensions: `x ?: y` (shorthand for `x ? x : y`), vector extensions, C++ support for `__restrict__`, and all those attributes (deprecated, format, warning, pure, etc.)
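
A small sketch of three of those attributes in use. The function names (`format_into`, `count_char`, `old_count_char`) are hypothetical and exist only to show the syntax.

```c
#include <stdarg.h>
#include <stdio.h>

/* format: the compiler checks the variadic arguments against the printf-style
   format string (argument 3 is the format, arguments start at position 4). */
__attribute__((format(printf, 3, 4)))
static int format_into(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(buf, size, fmt, args);
    va_end(args);
    return n;
}

/* pure: the result depends only on the arguments and on memory that is read,
   so the optimizer may merge repeated calls with the same inputs. */
__attribute__((pure))
static int count_char(const char *s, char c)
{
    int n = 0;
    for (; *s != '\0'; s++)
        n += (*s == c);
    return n;
}

/* deprecated: any call to this function produces a compile-time warning. */
int old_count_char(const char *s, char c) __attribute__((deprecated));
```

With the format attribute in place, a call such as `format_into(buf, sizeof buf, "%d", "text")` should be flagged by -Wformat (part of -Wall).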

Regarding assembler output: the `-Wa,-ahl` flag (together with `-g`) gives you a listing with the C code interleaved.

And regarding warnings (especially for C++) the world doesn't stop with -Wall -Wextra. There are a bunch of nice warning options like `-Weffc++ -Wfloat-equal -Wdouble-promotion` and so on.
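
To illustrate what two of those flags catch, here is a sketch that shows the offending patterns in comments and one way to write around each. The function names are made up.

```c
/* -Wdouble-promotion: `x * 0.5` would warn because the float x is implicitly
   promoted to double; the `f` suffix keeps the arithmetic in float. */
static float half(float x)
{
    return x * 0.5f;
}

/* -Wfloat-equal: `a == b` on floating-point values would warn; comparing the
   difference against a tolerance avoids the exact equality test. */
static int nearly_equal(float a, float b, float tolerance)
{
    float diff = a - b;
    if (diff < 0.0f)
        diff = -diff;
    return diff <= tolerance;
}
```

The tolerance is an assumption of this sketch; what counts as "close enough" depends entirely on the application.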


The binary ?: operator is great for providing a fallback for NULL pointers, especially in Objective-C where nil is everywhere. Kind of like || in JavaScript.
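
A minimal sketch of that fallback pattern in plain C, using the GNU `?:` extension (the `name_or_default` function is hypothetical):

```c
#include <stddef.h>

/* GNU extension: `a ?: b` means `a ? a : b`, with `a` evaluated only once. */
static const char *name_or_default(const char *name)
{
    return name ?: "(anonymous)";
}
```

Passing NULL yields the fallback string, and any non-NULL pointer is returned unchanged. Compiling with -pedantic will warn, since this is not standard C.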


As to binary literals, there is a straightforward translation from and to hexadecimal. It might be hard to get used to it at first, but when you see enough of them you just know what it looks like in binary.
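
For comparison, a sketch with GCC's `0b` binary literals (a GNU extension in C at the time, later standardized in C23 and C++14) next to the equivalent hex, where each hex digit stands for four bits. The constant names are illustrative.

```c
enum {
    MASK_BIN = 0b11110000,  /* binary literal, GNU extension */
    MASK_HEX = 0xF0         /* 1111 -> F, 0000 -> 0 */
};

/* Checked at compile time: both spellings denote the same value. */
_Static_assert(MASK_BIN == MASK_HEX, "0b11110000 is 0xF0");
_Static_assert(0b10100101 == 0xA5, "1010 -> A, 0101 -> 5");
```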


Isn't the whole point of -Wall -Wextra to include all the warnings? It seems rather unfortunate to have to add a bunch more flags to do what I want.


I thought so too. But even -Wextra does not include everything. Some flags can certainly be a bit annoying (e.g., -Weffc++). A project I'm currently working on has a really long list of additional warning flags. Such a long list is not practical for manual calls to g++; with a GCC spec file, however, one should be able to add something like -Wextreme.


Clang has -Weverything which does indeed include everything.


At the start, I thought it was just going to point out the obvious, but there were some really useful extensions in there, such as the likely/unlikely macros.
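
For anyone who has not seen them, these macros are usually thin wrappers around `__builtin_expect`. A sketch, with `sum_positive` as a made-up example function:

```c
/* Branch hints: tell the compiler which outcome of a condition is expected. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sums the non-negative entries; negative values are hinted as the rare case. */
static long sum_positive(const int *values, int count)
{
    long total = 0;
    for (int i = 0; i < count; i++) {
        if (unlikely(values[i] < 0))
            continue;  /* expected to be taken rarely */
        total += values[i];
    }
    return total;
}
```

The hint only affects code layout and branch prediction heuristics, not the result, so it is worth measuring before sprinkling it everywhere.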


One piece of advice I've heard is to benchmark your app under -Os compared to -O2 or higher.

The idea is that these days cache misses are king, so smaller code means fewer misses.

Of course, you need a real-world load, a test on the right CPU architecture for your problem, etc.



