
> With expression templates you can translate one to the other, this is a static manipulation that does not depend on compiler level.

How does that work on an implementation level? First thing that comes to mind is specialization, but I wouldn't be surprised if it were something else.

> What does depend on the compiler is whether the incidental trivial function calls to operators gets optimized away or not.

> Of course in many cases the optimization level does matter: if you are optimizing small vector operators to simd inlining will still be important.

Perhaps this is the source of my confusion; my uses of expression templates so far have generally been "simpler" ones which rely on the optimizer to unravel things. I haven't been exposed much to the kind of matrix/BLAS-related scenarios you describe.



Partial specialization, specifically: match certain patterns and convert them to something else. For example:

  #include <cmath>  // for std::fma

  struct F { double x; };
  enum Op { Add, Mul };
  auto eval(F x) { return x.x; }
  template<class L, class R, Op op> struct Expr;
  template<class L, class R> struct Expr<L, R, Add> { L l; R r;
    friend auto eval(Expr self) { return eval(self.l) + eval(self.r); } };
  template<class L, class R> struct Expr<L, R, Mul> { L l; R r;
    friend auto eval(Expr self) { return eval(self.l) * eval(self.r); } };
  // Pattern match: (x*y) + z  ->  fma(x, y, z)
  template<class L, class R, class R2> struct Expr<Expr<L, R, Mul>, R2, Add> { Expr<L, R, Mul> l; R2 r;
    friend auto eval(Expr self) { return std::fma(eval(self.l.l), eval(self.l.r), eval(self.r)); } };
  template<class L, class R>
  auto operator +(L l, R r) { return Expr<L, R, Add>{l, r}; }
  template<class L, class R>
  auto operator *(L l, R r) { return Expr<L, R, Mul>{l, r}; }

  double optimized(F x, F y, F z) { return eval(x * y + z); }
  double non_optimized(F x, F y, F z) { return eval(x + y * z); }
optimized() always generates a call to fma; non_optimized() does not. Compile with -O1 to see the difference (that level inlines the trivial calls but does little else). -O0 also emits the fma, but it is lost in the noise.

The magic happens by specifically matching the pattern Expr<Expr<L, R, Mul>, R2, Add>; try to add a rule to optimize x+y*z as well.


Hrm, OK, that makes sense. Thanks for taking the time to explain! I'm guessing that optimizing x+y*z would entail something similar to the third eval() definition, but matching Expr<L, Expr<L2, R2, Mul>, Add> instead.

I think at this point I can see how my initial assertion was wrong - specialization isn't fully orthogonal to expression templates, as the former is needed for some of the latter's use cases.

Does make me wonder how far one could get with rustc's internal specialization attributes...




