Most prompt engineering is done by changing words, running the model, and squinting at the output. Over the past few weeks I've built a toolkit that lets you measure what's actually happening inside the model instead.
You define regions of your prompt (instructions, examples, constraints, whatever), run the pipeline on any HuggingFace model, and get back per-layer attention heatmaps, cooking curves showing how attention to each region evolves through the network, and logit lens snapshots. Supports Llama, Qwen, Mistral, Gemma out of the box. Self-contained engine script you can scp to a GPU box and run with no dependencies beyond transformers. The repo is designed so that Claude can handle the whole pipeline end-to-end including interpreting results in a grounded domain-specific way.
I built it to tune system prompts for another project and realized the general approach was useful enough to extract. The "before and after" comparison tooling ended up being the part I use most.
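The core computation behind those per-region curves can be sketched in a few lines. This is a minimal illustration, not the repo's actual API: the function name, the `(layers, heads, seq, seq)` attention shape, and the "last token, head-averaged" aggregation are my assumptions about how such a tool would work.

```python
import numpy as np

def region_attention_curves(attn, regions):
    """Share of attention the final token pays to each named prompt
    region, at every layer.

    attn: array of shape (layers, heads, seq, seq), softmax attention weights
    regions: dict name -> (start, end) token span, end exclusive
    """
    # Attention from the last token, averaged over heads: (layers, seq)
    last_tok = attn[:, :, -1, :].mean(axis=1)
    curves = {}
    for name, (start, end) in regions.items():
        # Summing softmax weights over a span gives that region's share.
        curves[name] = last_tok[:, start:end].sum(axis=-1)  # (layers,)
    return curves

# Toy example: 4 layers, 2 heads, 10-token prompt, two labeled regions.
rng = np.random.default_rng(0)
raw = rng.random((4, 2, 10, 10))
attn = raw / raw.sum(axis=-1, keepdims=True)  # normalize rows like softmax
curves = region_attention_curves(attn, {"instructions": (0, 4), "examples": (4, 8)})
```

Plotting each region's curve over the layer axis is the "how attention evolves through the network" view.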
Hi all, thank you all for the OUTPOURING of support for the MIRA project over the past few weeks. It trips me out that people are creating discussions, lodging bugs for me to fix, and even proposing feature improvements!
This release represents focused work on MIRA's relationship with self, time, and context. Since the original 1.0.0 release, generic OpenAI/local providers have reached full feature parity with the native Anthropic format, working_memory has been reworked so the model receives a HUD (for lack of a better word) in a sliding assistant message containing reminders and relevant memories, and the context window has been adjusted to better articulate the passage of time between messages.
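MIRA's actual implementation isn't shown here, but the sliding-HUD idea can be sketched roughly like this (the message shapes, field names, and `hud` marker flag are my assumptions, not MIRA's code):

```python
from datetime import datetime, timezone

def build_context(history, reminders, memories, now=None):
    """Insert a single 'HUD' assistant message just before the latest user
    message. It slides forward each turn: any previous HUD is dropped first,
    so reminders and surfaced memories never accumulate in the window.
    """
    now = now or datetime.now(timezone.utc)
    msgs = [m for m in history if not m.get("hud")]  # drop last turn's HUD
    hud_text = "\n".join(
        [f"[HUD] current time: {now.isoformat()}"]
        + [f"reminder: {r}" for r in reminders]
        + [f"memory: {m}" for m in memories]
    )
    hud = {"role": "assistant", "content": hud_text, "hud": True}
    # Place the HUD right before the newest user message.
    return msgs[:-1] + [hud, msgs[-1]]

history = [
    {"role": "user", "content": "hey"},
    {"role": "assistant", "content": "hi!"},
    {"role": "user", "content": "what did we decide yesterday?"},
]
ctx = build_context(history, ["be concise"], ["user prefers dark mode"])
ctx2 = build_context(ctx, ["be concise"], [])  # next turn: old HUD replaced
```

Because the HUD is rebuilt (with a fresh timestamp) rather than appended each turn, the window stays flat and the model always sees the current time.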
In the 1.0.0 release I did not realize how many users would be operating the application totally offline. Significant improvements have been made on this front, and the offline/self-hosted path now has rock-solid reliability.
Also, since the original 1.0.0 release I have switched to an AGPL 3.0 open-source license.
Various other improvements have been made and are contained in the release notes for releases 2025.12.30-feat and 2025.12.24.
Thank you all again for all of the feedback. It is wildly satisfying to work on a project so diligently for so long and then have it embraced by the community. Keep the feature requests comin'!
Are you running it locally or the hosted version? I ask because Anthropic models are really good about not lying that they executed a tool call, but with other providers/models they will sometimes lie to your face.
Self-hosting Postgres is so incredibly easy. People are under this strange spell that they need to use an ORM or always reach for SQLite when it’s trivially easy to write raw SQL. The syntax was designed so lithium’d out secretaries were able to write queries on a punchcard. Postgres has so many nice lil features.
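To the "trivially easy" point, raw SQL really does read close to plain English. A small sketch (run here against the stdlib sqlite3 driver purely so the snippet is self-contained; against Postgres you'd send the same SQL through psycopg and a connection string):

```python
import sqlite3

# Raw SQL, no ORM layer: create, insert, and query with a join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id),
                         total   REAL NOT NULL);
    INSERT INTO users  VALUES (1, 'ada'), (2, 'grace');
    INSERT INTO orders VALUES (1, 1, 9.50), (2, 1, 12.00), (3, 2, 3.25);
""")
rows = conn.execute("""
    SELECT u.name, SUM(o.total) AS spent
    FROM users u JOIN orders o ON o.user_id = u.id
    GROUP BY u.name ORDER BY spent DESC
""").fetchall()
print(rows)  # [('ada', 21.5), ('grace', 3.25)]
```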
:D I’d also like to thank David Hahn for obsessively (and arguably compulsively) learning about a topic way out of his depth and then manifesting it till the cops took him away.
(As I said above I changed to an AGPL earlier today but I'll speak to my BSL logic)
I liked BSL because the code ~was~ proprietary for a time so someone couldn't duplicate my software I've worked so hard on, paywall it, and put me out of business. I'm a one-man development operation and a strong gust of wind could blow me over. I liked BSL because it naturally decayed into a permissive open source license automatically after a timeout. I'd get a head start but users could still use it and modify it from day one as long as they didn't charge money for it.
Totally fair - but just call it Source Available then.
Open Source has a specific definition and this license does not conform to that definition.
Stating it is open source creates a bait and switch effect with people who understand this definition, get excited, then realize this project is not actually open source.
Could you please stop that? First, it is not true: "open source" has nothing to do with the Open Source Initiative; the term existed long before it. Second, you are pushing people to keep their source closed (not available), which is not a good thing.
"Open Source has a specific definition and this license does not conform to that definition."
To be fair, this wouldn't be an issue if Open Source stuck with "Debian Free Software". If you really want to call it a bait and switch, open source did it first.
I use a two-step generation process that avoids both memory explosion in the context window and the one-turn-behind problem.
When a user sends a message I:
generate a vector of the user message ->
pull in semantically similar memories ->
filter and rank them ->
send a first API call with the memories 'pinned' from the last turn plus the top 10 memories just surfaced; this call's job is to intelligently pick the actually worthwhile memories and 'pin' them until the next turn ->
do the main LLM call with an up-to-date, thinned list of memories.
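The retrieval half of the steps above can be sketched as follows. This is a toy illustration under my own assumptions (function names, embedding dimensions, cosine ranking, exclusion of already-pinned items); the real filter/rank logic and the LLM "pinning" call aren't shown.

```python
import numpy as np

def surface_memories(query_vec, memory_vecs, memories, pinned, top_k=10):
    """Vector recall + rank, producing the shortlist handed to the first
    (memory-picking) API call: last turn's pins plus the top_k fresh hits.
    """
    # Cosine similarity between the query and every stored memory vector.
    q = query_vec / np.linalg.norm(query_vec)
    m = memory_vecs / np.linalg.norm(memory_vecs, axis=1, keepdims=True)
    sims = m @ q
    ranked = [memories[i] for i in np.argsort(sims)[::-1]]
    # Filter out anything already pinned, keep the top_k freshest hits.
    fresh = [mem for mem in ranked if mem not in pinned][:top_k]
    # The first LLM call would now choose which of these to 'pin' for the
    # next turn; the main call then runs with that thinned list.
    return pinned + fresh

rng = np.random.default_rng(1)
memories = [f"memory {i}" for i in range(50)]
vecs = rng.normal(size=(50, 8))
shortlist = surface_memories(rng.normal(size=8), vecs, memories,
                             pinned=["memory 3"])
```

The key property is that the second (main) call only ever sees pins plus a bounded top-k list, so the window stays small no matter how large the memory store grows.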
I can't say with confidence that this is ~why~ I don't run into the model getting super flustered and crashing out, though I'm familiar with what you're talking about.