Most prompt engineering is done by changing words, running the model, and squinting at the output. Over the past few weeks I've built a toolkit that lets you measure what's actually happening inside the model instead.
You define regions of your prompt (instructions, examples, constraints, whatever), run the pipeline on any HuggingFace model, and get back per-layer attention heatmaps, cooking curves showing how attention to each region evolves through the network, and logit lens snapshots. Supports Llama, Qwen, Mistral, Gemma out of the box. Self-contained engine script you can scp to a GPU box and run with no dependencies beyond transformers. The repo is designed so that Claude can handle the whole pipeline end-to-end including interpreting results in a grounded domain-specific way.
I built it to tune system prompts for another project and realized the general approach was useful enough to extract. The "before and after" comparison tooling ended up being the part I use most.
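The core computation behind those per-region curves can be sketched in a few lines. This is a minimal illustration, not the repo's actual API: the function name, the `(layers, heads, seq, seq)` attention shape, and the "last token, head-averaged" aggregation are my assumptions about how such a tool would work.

```python
import numpy as np

def region_attention_curves(attn, regions):
    """Share of attention the final token pays to each named prompt
    region, at every layer.

    attn: array of shape (layers, heads, seq, seq), softmax attention weights
    regions: dict name -> (start, end) token span, end exclusive
    """
    # Attention from the last token, averaged over heads: (layers, seq)
    last_tok = attn[:, :, -1, :].mean(axis=1)
    curves = {}
    for name, (start, end) in regions.items():
        # Summing softmax weights over a span gives that region's share.
        curves[name] = last_tok[:, start:end].sum(axis=-1)  # (layers,)
    return curves

# Toy example: 4 layers, 2 heads, 10-token prompt, two labeled regions.
rng = np.random.default_rng(0)
raw = rng.random((4, 2, 10, 10))
attn = raw / raw.sum(axis=-1, keepdims=True)  # normalize rows like softmax
curves = region_attention_curves(attn, {"instructions": (0, 4), "examples": (4, 8)})
```

Plotting each region's curve over the layer axis is the "how attention evolves through the network" view.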
Hi all, thank you all for the OUTPOURING of support for the MIRA project over the past few weeks. It trips me out that people are creating discussions, lodging bugs for me to fix, and even proposing feature improvements!
This release represents focused work on MIRA's relationship with self, time, and context. Since the original 1.0.0 release, generic OpenAI/local providers have reached full feature parity with the native Anthropic format, working_memory has been reworked so the model receives a HUD (for lack of a better word) in a sliding assistant message containing reminders and relevant memories, and the context window has been adjusted to better articulate the passage of time between messages.
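MIRA's actual implementation isn't shown here, but the sliding-HUD idea can be sketched roughly like this (the message shapes, field names, and `hud` marker flag are my assumptions, not MIRA's code):

```python
from datetime import datetime, timezone

def build_context(history, reminders, memories, now=None):
    """Insert a single 'HUD' assistant message just before the latest user
    message. It slides forward each turn: any previous HUD is dropped first,
    so reminders and surfaced memories never accumulate in the window.
    """
    now = now or datetime.now(timezone.utc)
    msgs = [m for m in history if not m.get("hud")]  # drop last turn's HUD
    hud_text = "\n".join(
        [f"[HUD] current time: {now.isoformat()}"]
        + [f"reminder: {r}" for r in reminders]
        + [f"memory: {m}" for m in memories]
    )
    hud = {"role": "assistant", "content": hud_text, "hud": True}
    # Place the HUD right before the newest user message.
    return msgs[:-1] + [hud, msgs[-1]]

history = [
    {"role": "user", "content": "hey"},
    {"role": "assistant", "content": "hi!"},
    {"role": "user", "content": "what did we decide yesterday?"},
]
ctx = build_context(history, ["be concise"], ["user prefers dark mode"])
ctx2 = build_context(ctx, ["be concise"], [])  # next turn: old HUD replaced
```

Because the HUD is rebuilt (with a fresh timestamp) rather than appended each turn, the window stays flat and the model always sees the current time.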
In the 1.0.0 release I did not realize how many users would be operating the application totally offline. Significant improvements have been made on this front, and the offline/self-hosted path now has rock-solid reliability.
Also, since the original 1.0.0 release I have switched to an AGPL 3.0 open-source license.
Various other improvements have been made and are contained in the release notes for releases 2025.12.30-feat and 2025.12.24.
Thank you all again for all of the feedback. It is wildly satisfying to work on a project so diligently for so long and then have it embraced by the community. Keep the feature requests comin'!
Are you running it locally or the hosted version? I ask because Anthropic models are really good about not lying that they executed a tool call, but with other providers/models they will sometimes lie to your face.
Self-hosting Postgres is so incredibly easy. People are under this strange spell that they need to use an ORM or always reach for SQLite when it’s trivially easy to write raw SQL. The syntax was designed so lithium’d out secretaries were able to write queries on a punchcard. Postgres has so many nice lil features.
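To the "trivially easy" point, raw SQL really does read close to plain English. A small sketch (run here against the stdlib sqlite3 driver purely so the snippet is self-contained; against Postgres you'd send the same SQL through psycopg and a connection string):

```python
import sqlite3

# Raw SQL, no ORM layer: create, insert, and query with a join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id),
                         total   REAL NOT NULL);
    INSERT INTO users  VALUES (1, 'ada'), (2, 'grace');
    INSERT INTO orders VALUES (1, 1, 9.50), (2, 1, 12.00), (3, 2, 3.25);
""")
rows = conn.execute("""
    SELECT u.name, SUM(o.total) AS spent
    FROM users u JOIN orders o ON o.user_id = u.id
    GROUP BY u.name ORDER BY spent DESC
""").fetchall()
print(rows)  # [('ada', 21.5), ('grace', 3.25)]
```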
:D I’d also like to thank David Hahn for obsessively (and arguably compulsively) learning about a topic way out of his depth and then manifesting it till the cops took him away.
(As I said above I changed to an AGPL earlier today but I'll speak to my BSL logic)
I liked BSL because the code ~was~ proprietary for a time so someone couldn't duplicate my software I've worked so hard on, paywall it, and put me out of business. I'm a one-man development operation and a strong gust of wind could blow me over. I liked BSL because it naturally decayed into a permissive open source license automatically after a timeout. I'd get a head start but users could still use it and modify it from day one as long as they didn't charge money for it.
Totally fair - but just call it Source Available then.
Open Source has a specific definition and this license does not conform to that definition.
Stating it is open source creates a bait and switch effect with people who understand this definition, get excited, then realize this project is not actually open source.
Could you please stop that? First, it is not true: "open source" has nothing to do with the Open Source Initiative; the term existed long before it. Second, you are pushing people to keep their source closed (not available), which is not a good thing.
"Open Source has a specific definition and this license does not conform to that definition."
To be fair, this wouldn't be an issue if Open Source stuck with "Debian Free Software". If you really want to call it a bait and switch, open source did it first.
I use a two-step generation process that avoids both memory explosion in the context window and the one-turn-behind problem.
When a user sends a message I:
generate a vector of the user message ->
pull in semantically similar memories ->
filter and rank them ->
send a first API call with the memories 'pinned' from the last turn plus the top 10 memories just surfaced; this call's job is to intelligently pick the actually worthwhile memories and 'pin' them until the next turn ->
do the main LLM call with an up-to-date, thinned list of memories.
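The retrieval half of the steps above can be sketched as follows. This is a toy illustration under my own assumptions (function names, embedding dimensions, cosine ranking, exclusion of already-pinned items); the real filter/rank logic and the LLM "pinning" call aren't shown.

```python
import numpy as np

def surface_memories(query_vec, memory_vecs, memories, pinned, top_k=10):
    """Vector recall + rank, producing the shortlist handed to the first
    (memory-picking) API call: last turn's pins plus the top_k fresh hits.
    """
    # Cosine similarity between the query and every stored memory vector.
    q = query_vec / np.linalg.norm(query_vec)
    m = memory_vecs / np.linalg.norm(memory_vecs, axis=1, keepdims=True)
    sims = m @ q
    ranked = [memories[i] for i in np.argsort(sims)[::-1]]
    # Filter out anything already pinned, keep the top_k freshest hits.
    fresh = [mem for mem in ranked if mem not in pinned][:top_k]
    # The first LLM call would now choose which of these to 'pin' for the
    # next turn; the main call then runs with that thinned list.
    return pinned + fresh

rng = np.random.default_rng(1)
memories = [f"memory {i}" for i in range(50)]
vecs = rng.normal(size=(50, 8))
shortlist = surface_memories(rng.normal(size=8), vecs, memories,
                             pinned=["memory 3"])
```

The key property is that the second (main) call only ever sees pins plus a bounded top-k list, so the window stays small no matter how large the memory store grows.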
I can't say with confidence that this is ~why~ I don't run into the model getting super flustered and crashing out, though I'm familiar with what you're talking about.