foundry27's comments

I just tried the same puzzle in o3 using the same image input, but tweaked the prompt to say “don’t use the search tool”. Very similar results!

It spent the first few minutes analyzing the image and cross-checking various slices of it to make sure it understood the problem. Then it spent the next 6-7 minutes trying to work through the problem analytically from various angles. It decided this was likely a mate-in-two (part of the training data?), but concluded that the key to solving it would be to first convert the position into something more easily solvable. At that point it started trying to pip install all sorts of chess-related packages, and when it couldn't get that to work it started writing a simple chess solver in Python by hand (which didn't work either). At one point it thought the script had found a mate-in-six that turned out to be due to a script bug, but I found it impressive that it didn't just trust the script's output - instead it analyzed the proposed solution and determined the nature of the bug in the script that caused it. Then it gave up on the script and went back to plain analysis for another five minutes, at which point the thinking got cut off and displayed an internal error.

15 minutes total, didn't solve the problem, but fascinating! There were several points where, if the model were a bit more "intelligent", I could absolutely see it reasoning its way to the answer following those same steps.


Claude gets the right answer, but it misplaces the pieces in its initial analysis, which means its answer doesn't actually follow from the position it analyzed.

What's going on? Did it just get lucky? Did it memorize the answer but misplace the pieces in its recall? Did it actually compute anything?

https://claude.ai/share/d640bc4c-8dd8-4eaa-b10b-cb3f83a6b94b

This is the board as it sees it (incorrect):

https://lichess.org/editor/kb6/pp6/2P5/8/8/3K4/8/R7_w_-_-_0_...


I told it that it was a mate-in-two puzzle, and it solved it for me

https://chatgpt.com/share/680f4a02-4cc4-8002-8301-59214fca78...

It worked through some lines, then decided to try listing all possible moves, since there can't be that many. It tried importing packages that didn't work, then wrote code to generate the permutations itself.


tl;dr for anyone who may be put off by the article length:

OP built an arena allocator in Go using unsafe to speed up allocation, especially for cases where you're allocating a bunch of stuff that you know lives and dies together. The main issue they ran into is that Go's GC needs to know the layout of your data (specifically, where the pointers are) to work correctly, and if you just allocate raw bytes with unsafe.Pointer, the GC might mistakenly free things pointed to from your arena because it can't see those pointers properly. The fix, which works even for pointers stored in the arena (as long as they point to other stuff in the same arena), is to keep the whole arena alive while any part of it is still referenced. That means (1) keeping a slice (chunks) pointing to all the big memory blocks the arena got from the system, and (2) using reflect.StructOf to create new types for these blocks that include an extra pointer field at the end (pointing back to the Arena). So if the GC finds any pointer into a chunk, it'll also find the back-pointer, therefore mark the Arena as alive, and therefore keep the chunks slice alive. Then they get into a bunch of really interesting optimizations to remove various internal checks and write barriers using funky techniques you might not've seen before
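The back-pointer trick can be sketched roughly like this (a toy version for illustration - the names Arena, newChunk, chunkSize are mine, not the article's actual API, and real arenas would do far more):

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

const chunkSize = 1 << 10

// Arena keeps every chunk reachable for as long as the Arena itself is alive.
type Arena struct {
	chunks []unsafe.Pointer
}

// newChunk builds a chunk type via reflect.StructOf whose final field is a
// back-pointer to the Arena. If the GC finds any pointer into the chunk, it
// scans the chunk's type, sees the back-pointer, marks the Arena alive, and
// so keeps the chunks slice (and every other chunk) alive too.
func (a *Arena) newChunk() unsafe.Pointer {
	t := reflect.StructOf([]reflect.StructField{
		{Name: "Data", Type: reflect.ArrayOf(chunkSize, reflect.TypeOf(byte(0)))},
		{Name: "Back", Type: reflect.TypeOf((*Arena)(nil))},
	})
	v := reflect.New(t)
	v.Elem().Field(1).Set(reflect.ValueOf(a)) // install the back-pointer
	p := v.UnsafePointer()
	a.chunks = append(a.chunks, p)
	return p
}

func main() {
	a := &Arena{}
	p := a.newChunk()
	fmt.Println(p != nil, len(a.chunks)) // true 1
}
```

Note this sketch declares the whole Data region as plain bytes, so the GC still wouldn't trace pointers *stored inside* the chunk - that's exactly why the scheme only works when arena-resident pointers target the same arena, which the back-pointer keeps alive wholesale.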


I believe this is out of date. There’s a very explicit opt in/out slider for permitting training on conversations that doesn’t seem to affect conversation history retention.


It’s always a touch ironic when AI-generated replies such as this one are submitted under posts about AI. Maybe that’s secretly the self-reflection feedback loop we need for AGI :)


So strange, too: their other comments seem normal, but suddenly they decided to post a GPT comment.


At least one other is LLM generated too, from what I saw.


I’ve been using xonsh as my daily driver for a few years now, and it’s a massive productivity booster!

Broadly speaking, I’ve found that most of the reported compatibility and usability concerns in their GitHub issues have boiled down to user error rather than any kind of defect in the shell itself. That’s not to say there aren’t any issues, but they’re few and far between, and it’s more than solid enough for regular use. It isn’t bash, and you shouldn’t expect to execute a bash script with xonsh or use bash idioms (even though some compatibility layers exist, e.g. for translating your ~/.bashrc and sourcing envvars from bash scripts).


Barbour is criminally underrated as a physics author. He’s published a lot of interesting ideas regarding the role of time, or lack thereof, in modern theories! (The End of Time, and its treatment of Causality as a direct substitute for time in any future theory of everything, was very fun)


Someone came up with a very similar theory (two arrows of time diverging from the same point, the big bang). They even gave their theory the same name: Janus.

https://januscosmologicalmodel.com/januspoint

There are other players concerned with similar ideas:

- Negative mass, Farnes: https://en.wikipedia.org/wiki/Dark_fluid

- Mirror-image universe going backwards in time from the big bang, Turok: https://www.newscientist.com/article/mg25734230-100-neil-tur...


If that’s how it’s being advertised, and that’s the reason people are giving it a shot based on that advertising, then I certainly do! And so, I imagine, did the people who have left feedback so far!


I get that sliding in references to a passion project on top-scoring articles might seem like an easy way to give the project exposure, but commenting the same thing over and over comes off as a bit boorish. And just plugging the URL isn’t really contributing anything to the discussions IMO. Why not show us something your tool explained or summarized from the articles that isn’t obvious from a cursory read? Citing the tool as the source for something cool wouldn’t be nearly as in-your-face.


I won't do it again, my sincere apologies


Aye, there’s the kicker. The correct configuration of hardware resources to run and multiplex large models is just as much of a trade secret as the model weights themselves when it comes to non-hobbyist usage, and I wouldn’t be surprised if optimal setups are in many ways deliberately obfuscated or hidden to keep a competitive advantage.

Edit: outside the HPC community specifically, I mean


The economic barrier to entry probably has a lot to do with it. I'd happily dig into this problem and share my findings but it's simply too expensive for a hobbyist that isn't specialized in it.


I think the approximate reciprocal approach is interesting here. The doc mentions multiplying the dividend by ~1.00025 in the math to avoid FP error so you don’t end up off-by-one after truncation, but I think this hack is still incomplete! On some inputs (like 255, or other unlucky values near powers of two), you might get borderline rounding behaviour that flips the final integer by one. It’s easy to forget that single-precision floats don’t line up neatly with every 8-bit integer ratio in real code, and a single off-by-one can break pixel ops or feed subtle bugs into a bigger pipeline.

I suspect a hybrid scheme like using approximate reciprocals for most values but punting to scalar for unlucky ones could handle these corner cases without killing performance. That’d be interesting to benchmark


There are only 65280 possible inputs, that's easily small enough to test every value for correctness.
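Exhaustive verification really is trivial at this scale. As one illustration (this is the classic fixed-point identity for dividing by 255, not the article's float-reciprocal path), a brute-force check over every input takes a fraction of a second:

```go
package main

import "fmt"

// div255 computes floor(x/255) with integer ops only: add 1 plus a
// high-byte correction, then shift. The identity is exact for all
// x in [0, 65280), which covers every possible 8-bit * 8-bit product sum.
func div255(x uint32) uint32 {
	return (x + 1 + (x >> 8)) >> 8
}

func main() {
	// Exhaustively compare against true integer division for all inputs.
	mismatches := 0
	for x := uint32(0); x < 65280; x++ {
		if div255(x) != x/255 {
			mismatches++
		}
	}
	fmt.Println("mismatches:", mismatches) // mismatches: 0
}
```

The same loop structure would flush out any borderline inputs in the float-reciprocal version just as easily - swap div255 for the SIMD kernel under test and count disagreements.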

