In principle it works on Nvidia, but Nvidia still does not support OpenCL 1.2...
However, a new pluggable engine for OpenCL 1.2 is planned, and it will include optimizations for Nvidia, so this is not out of reach.
How about Intel/AMD on-die GPUs? (And any thoughts on whether that's likely to help -- now, or with future generations -- as opposed to just sticking to the CPU side?)
The library itself does NOT depend on any particular OpenCL version, since it supports pluggable engines.
Currently the only available engine is focused on AMD GCN GPUs and uses OpenCL 2.0 features in its kernels.
Other engines are possible and planned, notably one that would work with older versions of OpenCL and support tuning for Nvidia.
This is really exciting, thanks for sharing. OpenCL is probably the future of everything. AMD is going to be an important part of it for sure. Fantastic to see this explained in the Clojure context; it feels right at home!
I wrote my master's thesis on general-purpose GPU programming. The only two reasons you'd want to code for the GPU are that it can be much faster for some workloads (essentially identical computations that can be trivially parallelized) and, in some cases, that it can use much less power.
In the particular case of matrix multiplication, lots of problems can be described as matrices (say, Google's website ranking), and doing calculations on huge matrices is something the GPU is great at.
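To make the ranking example concrete, here is a minimal sketch (plain Python; the 3-page link matrix is made up for illustration) of PageRank-style power iteration. The whole algorithm boils down to repeated matrix-vector multiplication, which is exactly the bulk arithmetic GPUs handle well.

```python
def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

# Column-stochastic link matrix: entry [i][j] is the probability of
# moving from page j to page i. (Hypothetical 3-page web.)
links = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]

# Start from a uniform rank vector and iterate.
rank = [1 / 3, 1 / 3, 1 / 3]
for _ in range(50):
    rank = mat_vec(links, rank)

# Ranks still sum to 1 because each column of the matrix sums to 1.
print(rank)
```

A real web graph has billions of pages, so each iteration is an enormous matrix-vector product; that scale is where offloading to a GPU pays off.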
That said it is unlikely to be something most people want.
Whenever you need to implement an algorithm that is described using matrices, it is much easier and faster to call linear algebra functions than to code the loops yourself.
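As an illustration (plain Python; the function name and data are made up), this is the kind of hand-rolled triple loop that a linear-algebra library replaces with a single optimized, and potentially GPU-backed, call:

```python
def matmul(a, b):
    """Naive O(n^3) matrix multiply over lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
        for i in range(n)
    ]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```

A library call expressing the same product is both clearer and dramatically faster, since optimized BLAS-style kernels exploit caches, vector units, or the GPU rather than iterating element by element.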
Neanderthal comes into the picture when you have lots of numbers and still need the computation to be fast: (nano/micro/milli)seconds instead of minutes and hours (or perhaps days).
So virtually any kind of numerical software written in Clojure (or Java) can benefit hugely from this.
It entirely depends on how you want to program your GPU. If you want fine-grained control, you should really use raw OpenCL/CUDA. If, however, you need something higher level, it may be better to choose one of the DSLs that are available, e.g. Haskell's Accelerate (for array programming) or a DSL from the Delite/Forge project.
For general-purpose high-level programming, however, there isn't really anything acceptable. Skeleton/pattern-based programming is covered fairly well by frameworks such as SkelCL or SkePU, but (from what I've seen) there doesn't seem to be a nice functional, or even procedural, general-purpose language for GPU programming.