
How do you handle it in numpy if you want something like ((x - y)/2 + z) * w, all elementwise over the arrays? Naively, that creates three intermediate arrays that are immediately discarded.
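Spelled out step by step, the naive evaluation allocates a fresh temporary at every binary op (a minimal NumPy sketch):

```python
import numpy as np

x, y, z, w = (np.random.rand(1000) for _ in range(4))

# Each binary operation below allocates a new array:
t1 = x - y        # temporary 1
t2 = t1 / 2       # temporary 2
t3 = t2 + z       # temporary 3
result = t3 * w   # final result; t1..t3 are garbage
```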


You use Julia and let loop fusion handle it. You get 1 allocation if you want to store the result in a new array, or zero allocations if you already have the array allocated.

https://julialang.org/blog/2017/01/moredots/


There are plenty of CPU and GPU NumPy accelerators available.

* Numba: https://numba.pydata.org/

* JAX: https://jax.readthedocs.io/en/latest/notebooks/quickstart.ht...


numba - instead of writing it in Rust, you write it in Numba, which is also almost another language. Not bad, but it needs to be taken into account, and it's not pure numpy.
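For flavor, a Numba kernel for this expression is essentially a plain Python loop; the `@njit` decorator is left as a comment here so the sketch runs even without Numba installed:

```python
import numpy as np
# from numba import njit   # with Numba installed, uncomment this...

# @njit                    # ...and this, to compile the loop to machine code
def fused_kernel(x, y, z, w, out):
    # One traversal, no temporaries: the whole expression per element.
    for i in range(x.shape[0]):
        out[i] = ((x[i] - y[i]) / 2.0 + z[i]) * w[i]
    return out

x, y, z, w = (np.random.rand(100) for _ in range(4))
out = fused_kernel(x, y, z, w, np.empty_like(x))
```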


Is there an optimal solution to that for all sizes of the arrays?

E.g. I’d expect that for small/medium sizes, re-writing as a fold over the input arrays would be fastest, because you’d only traverse once and everything would fit in cache.

However, if the 3 arrays’ combined size is larger than L1 cache, I’d be strongly tempted to bet on the naïve 3-operation approach being faster. You’d save so much time on cache-line flush/reload activity.

But then if 2 arrays are larger than L3, I’d expect the fold / single traversal to win again, because iterating over 2 arrays at a time no longer behaves any differently from iterating over 3.

Untested hypothesis.
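One way to split the difference is a blocked single traversal: one pass over the data, but reusing a scratch buffer small enough to stay in cache (a sketch; the block size is a made-up tuning knob, not a measured optimum):

```python
import numpy as np

def fused_blocked(x, y, z, w, block=8192):
    # Single traversal in cache-sized blocks, reusing one scratch buffer.
    out = np.empty_like(x)
    buf = np.empty(block, dtype=x.dtype)
    for i in range(0, x.size, block):
        s = slice(i, min(i + block, x.size))
        b = buf[: s.stop - s.start]        # scratch view for this block
        np.subtract(x[s], y[s], out=b)     # b = x - y
        b /= 2                             # in place
        b += z[s]                          # in place
        np.multiply(b, w[s], out=out[s])   # write result block
    return out

x, y, z, w = (np.random.rand(20000) for _ in range(4))
result = fused_blocked(x, y, z, w)
```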



Isn't that what Theano/Aesara is for? Still based on a NumPy interface AIUI, but automagically compiling NumPy-style code to efficient code on the CPU or GPU.


In-place operators?
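E.g., chaining NumPy's in-place operators and the ufunc `out=` parameter keeps the whole expression down to one working buffer (a sketch; not necessarily the fastest ordering):

```python
import numpy as np

x, y, z, w = (np.random.rand(1000) for _ in range(4))

out = np.empty_like(x)
np.subtract(x, y, out=out)  # out = x - y, no temporary
out /= 2                    # in place
out += z                    # in place
out *= w                    # in place
```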




