More

Chirono · 2025-06-28T17:35:56 1751132156

Exactly. This number is so so much bigger than 10^100000 or however many grains of sand would fit, that dividing by that amount doesn’t really change it, certainly not enough to bring it down closer to 9,999,999sub10

Chirono · on Oct 8, 2024

The two other changes they mention have been widely adopted, and are included in at least some of the models they benchmark against. It seems they list them for completeness as changes to the original transformer architecture.

aDyslecticCrow · on Oct 8, 2024

Nicely spotted! Then, I really look forward to seeing this method tested by others! Epic stuff.

Chirono · on Aug 20, 2024

This is just an artefact of tokenisation though. The model simply isn’t ever shown the letters that make up words, unless they are spelled out explicitly. It sees tokens representing groups of words. This is a little like saying a human isn’t intelligent because they couldn’t answer your question that you asked in an ultrasonic wavelength. If you’d like to learn more this video is a great resource: https://youtu.be/zduSFxRajkE?si=LvpXbeSyJRFBJFuj

Chirono · on June 1, 2024

I used to use K professionally inside a hedge fund a few years back. Aside from the terrible user experience (if your code isn’t correct you will often just get ‘error’ or ‘not implemented’ with no further detail), if the performance really was as stellar as claimed, then there wouldn’t need to be a no benchmark clause in the license. It can be fast, if your data is in the right formats, but not crazy fast. And easy to beat if you can run your code on the GPU.

lmeyerov · on June 1, 2024

The last point is spot on... Pandas on GPUs (cudf) gets you both the perf + usability, without having to deal with issues common to stack/array languages (k) and lazy languages (dask, polars). My flow is pandas -> cudf -> dask cudf , spark, etc.

More recently, we have been working on GFQL with users at places like banks (graph dataframe query language), where we translate down to tools like pandas & cudf. A big "aha" is that columnar operations are great -- not far from what array languages focus on -- and having a static/dynamic query planner so optimizations around that helps once you hit memory limits. Eg, dask has dynamic DFS reuse of partitions as part of its work stealing. More SQL-y tools like Spark may make plans like that ahead of time. In contrast, that lands more on the user if they stick with pandas or k, eg, manual tiling.

anonu · on June 1, 2024

I've been using kdb/q since 2010. Started at a big bank and have used it ever since.

Kdb/q is like minimalist footwear. But you can run longer and faster with it on. There's a tipping point where you just "get it". It's a fantastic language and platform.

The problem is very few people will pay 100k/month for shakti. I'm not saying people won't pay and it won't be a good business. But if you want widespread adoption you need to create and an ecosystem. Open sourcing it is a start. Creating libraries and packages comes after. The mongodb model is the right approach IMO

ccorcos · on June 1, 2024

Can you elaborate on what mongo did right? My understanding is that AWS is stealing their business by creating a compatible api

wink · on June 7, 2024

MongoDB's biggest feat is marketing. Recovering from people comparing your database with a black hole is... something.

tiffanyh · on June 1, 2024

Would you recommend K?

Is something else better (if so what)?

chongli · on June 1, 2024

I think the main reason to use any of these array languages (for work) is job security. Since it's so hard to find expert programmers if you can get your employer to sign off on an array language for all the mission-critical stuff then you can lock in your job for life! How can they possibly replace you if they can't find anyone else who understands the code?

Otherwise, I don't see anything you can do in an array language that you couldn't do in any other language, albeit less verbosely. But I believe in this case a certain amount of verbosity is a feature if you want people to be able to read and understand the code. Array languages and their symbol salad programs are like the modern day equivalent of medieval alchemists writing all their lab notes in a bespoke substitution cipher. Not unbreakable (like modern cryptography) but a significant enough barrier to dissuade all but the most determined investigators.

As an aside, I think the main reason these languages took off among quants is that investing as an industry tends toward the exultation of extremely talented geniuses. Perhaps unintelligible "secret sauce" code has an added benefit of making industrial espionage more challenging (and of course if a rival firm steals all your code they can arbitrage all of your trades into oblivion).

rscho · on June 1, 2024

I'm sorry but you really sound like you judge APLs from an outsider pov. For sure, it's not job security that's keeping APLs afloat, because APLs are very easy to learn. A pro programmer would never use K or any APL. But pro mathematicians or scientists needing some array programming for their job will.

nathan_compton · on June 1, 2024

I have been a programmer, scientist, etc, at various times in my life and I have programmed in J. I don't think there is any compelling reason to use J over Matlab, R, or Python and very many reasons not to. Vector languages are mind expanding, for sure, but they have done a very poor job keeping up with the user experience and network effects of newer languages.

A few years ago I wrote a pipeline in J and then re-implemented it R. The J code was exactingly crafted and the R code was naive, but the R code was still faster, easier to read and maintain and, frankly, easier to write. J gives a certain perverse frisson, but beyond that I don't really see the use case.

rscho · on June 1, 2024

I disagree. For the stuff I'm doing, I've been rewriting the same 1-screen program in different ways many times. J made that easy, R would make that much more tedious since I'd have >250 lines instead of 30. Of course if I'm doing something for which R has libs, I'll do it in R. Additionally, I'll very probably end up rewriting my final program in another language for speed and libs, because by then I'll know precisely what I want to write. IMO, the strength of array languages lies in easy experimentation.

nathan_compton · on June 3, 2024

I think there is an element of truth to this, but the context where this ends up being an advantage is personal and idiosyncratic.

rscho · on June 1, 2024

Are you a quant, or doing very specialized numerical things that aren't in libs on 2D datasets? Then 10X yes. If not, no. Everything else will be better

nathan_compton · on June 1, 2024

I'm not a quant but having used "data science" languages, ocaml, J, R, etc, I strongly doubt that an array language offers any substantial advantages at this date. I could be wrong, of course, but it seems unlikely.

rscho · on June 1, 2024

J only is a data science language in the sense that its core is suited to dataset manipulation, but it's severely lacking in modelling librairies. If you're doing exotic stuff and you know exactly what you're doing, I can see the rationale but otherwise it's R/Python any day. It makes plenty of sense for custom mathematical models, such as simulations though.

koolala · on June 1, 2024

J and OPFS

koolala · on June 1, 2024

no benchmark clause sounds like webgpu

Chirono · on Feb 14, 2024

For anyone wondering what this does, it looks like it produces optimal configurations for belt balancers given a specified number of input and output belts. Belt balancers evenly distribute items between belts: https://wiki.factorio.com/Balancer_mechanics

perihelions · on Feb 14, 2024

Fun to see that that math (linked in that wiki) coincides with early 20th-century analog telephony problems—the problem of how to arrange electromechanical switches to support large numbers of simultaneous circuits, avoiding bottlenecks.

There's people reading Bell Labs technical papers from the 1950's as part of the process of playing an addictive video game.

TeMPOraL · on Feb 14, 2024

> There's people reading Bell Labs technical papers from the 1950's as part of the process of playing an addictive video game.

It's a nice break from reading orbital mechanics textbooks as part of the process of playing another addictive video game.

skirmish · on Feb 15, 2024

I personally bought this real world fighter plane tactics book for playing military flight sims: https://www.amazon.com/dp/0870210599 .

perihelions · on Feb 14, 2024

Origin of my HN name! :)

beacon294 · on Feb 14, 2024

What games?

kroltan · on Feb 14, 2024

Bell labs => Factorio

Orbital mechanics => Kerbal Space Program

Izikiel43 · on Feb 14, 2024

Sounds like kerbal space program?

jconnop · on Feb 14, 2024

The latter is surely Kerbal Space Program.

Chirono · on Jan 24, 2024

In some cases you can directly test hypotheses like that, but more often than not, there isn’t a way to test without just trying.

Chirono · on Jan 23, 2024

Nice paper. I particularly like how they talk through the ideas they tried that didn’t work, and the process they used to land on the final results. A lot of ML papers present the finished result as if it appeared from nowhere without trial and error, perhaps with some ablations in the appendix and I wish more papers followed this one in talking about the dead ends along the way.

sjwhevvvvvsj · on Jan 23, 2024

Nothing would benefit the scientific enterprise more than explicitly publishing papers about failed experiments.

How much public money, time, and careers have been wasted chasing something that is already known not to work?

vacuity · on Jan 23, 2024

Unfortunately, careers don't get advanced that way. There are very backwards incentives.

sjwhevvvvvsj · on Jan 23, 2024

I’m aware, I left the academic world in no small part that I refused to write papers that weren’t worth reading. A high quality, but short, CV is a career ender these days. I’m happier now though!

Chirono · on Dec 1, 2023

That’s only true for linearly ordered structures, but isn’t true for partially ordered ones.

For example, set inclusion. Two different sets can be neither greater than not smaller than each other. Sets ordered by inclusion form a partially ordered lattice.

Chirono · on Nov 28, 2023

Interesting! Do you have a link to that research?

l33tman · on Nov 28, 2023

Certainly: https://arxiv.org/abs/2306.05720

It's a very interesting paper.

"Even when trained purely on images without explicit depth information, they typically output coherent pictures of 3D scenes. In this work, we investigate a basic interpretability question: does an LDM create and use an internal representation of simple scene geometry? Using linear probes, we find evidence that the internal activations of the LDM encode linear representations of both 3D depth data and a salient-object / background distinction. These representations appear surprisingly early in the denoising process−well before a human can easily make sense of the noisy images."

Chirono · on Nov 27, 2023

From reading this book you’d have a very good grasp of the underlying theory, much more than many ML engineers. But you’d be missing out on the practical lessons, all the little tips and intuitions you need to be able to get systems working in practice. I think this just takes time and it’s as much an art as it is a science.