Windows 95 was a vast improvement in looks over 3.x. Tastes differ, of course, but I found it very aesthetic, not ugly at all, and used the classic look until Windows 7 was EOL'd.
I asked it to recite "potato" 100 times because I wanted to benchmark CPU vs GPU speed. It's on line 150 of planning. It has already recited the requested thing 4 times and started drafting the 5th response.
Qwen3.5 pretty much requires a long system prompt; otherwise it goes into a weird planning mode where it reasons for minutes about what to do and double- and triple-checks everything it does. Both Gemini's and Claude Opus 4.6's prompts work pretty well, but they're so long that whatever you're using to run the model has to support prompt caching. Asking it to "Say the word "potato" 100 times, once per line, numbered.", for example, results in the following reasoning, followed by the word "potato" on 100 numbered lines, using the smallest (and therefore dumbest) quant, unsloth/Qwen3.5-35B-A3B-GGUF:UD-IQ2_XXS:
"User is asking me to repeat the word "potato" 100 times, numbered. This is a simple request - I can comply with this request. Let me create a response that includes the word "potato" 100 times, numbered from 1 to 100.
I'll need to be careful about formatting - the user wants it numbered and once per line. I should use minimal formatting as per my instructions."
Good to know, thanks. I just ran ollama with qwen3.5:27b. Currently it's stuck on picking a format:
Let's write.
Wait, I'll write the response.
Wait, I'll check if I should use a table.
No, text is fine.
Okay.
Let's write.
Wait, I'll write the response.
Wait, I'll check if I should use a bullet list.
No, just lines.
Okay.
Let's write.
Wait, I'll write the response.
Wait, I'll check if I should use a numbered list.
No, lines are fine.
Okay.
Let's write.
Wait, I'll write the response.
Wait, I'll check if I should use a code block.
Yes.
Okay.
Let's write.
Wait, I'll write the response.
Wait, I'll check if I should use a pre block.
Code block is better.
Yeah, it tends to get stuck in loops like that a lot with everything set to defaults. I wonder if they distilled Gemini at some point; I've seen it get stuck in a similar "I will now do [thing]. I am preparing to do [thing]. I will do it." failure mode a couple of times as well.
I don't quite get the low temperature coupled with the high penalty. We get thinking loops due to low temperature, and then counter them with a high penalty. That seems backward.
For Qwen3.5 27B, I got good results with --temp 1.0 --top-p 1.0 --top-k 40 --min-p 0.2 and no penalty. It lets the model explore (temp, top-p, top-k) without going off the rails (min-p) during reasoning. No loops so far.
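For intuition on why min-p keeps exploration from going off the rails even at temp 1.0: it filters tokens relative to the best candidate rather than against an absolute threshold. A minimal sketch (`min_p_filter` is a hypothetical helper, simplified from what real samplers like llama.cpp actually do):

```python
def min_p_filter(probs, min_p=0.2):
    """Keep only tokens whose probability is at least min_p times the
    top token's probability, then renormalize the survivors."""
    cutoff = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= cutoff}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

# High temperature flattens the distribution, but min-p still prunes
# the long tail *relative to the best candidate*.
probs = {"potato": 0.5, "tomato": 0.3, "Wait": 0.15, "garbage": 0.05}
filtered = min_p_filter(probs, min_p=0.2)  # cutoff = 0.2 * 0.5 = 0.1
```

Because the cutoff scales with the top token's probability, the filter stays permissive when the model is genuinely uncertain and strict when one continuation clearly dominates.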
The guidelines are a little hard to interpret. At https://huggingface.co/Qwen/Qwen3.5-27B, Qwen says to use temp 0.6, pres 0.0, rep 1.0 for "thinking mode for precise coding tasks," and temp 1.0, pres 1.5, rep 1.0 for "thinking mode for general tasks." Those parameters swing wildly between the two, and I don't know whether printing potato 100 times counts as more of a "precise coding task" or a "general task."
When setting up the batch file for some previous tests, I decided to split the difference between 0.6 and 1.0 for temperature and use the larger recommended values for presence and repetition. For this prompt, it probably isn't a good idea to discourage repetition, I guess. But keeping the existing parameters worked well enough, so I didn't mess with them.
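For anyone unsure what those presence and repetition knobs actually do to the next-token distribution, here's a rough sketch. This is simplified for illustration (`apply_penalties` is a hypothetical helper; real samplers differ in details), combining the CTRL-style repetition penalty with a flat presence penalty:

```python
def apply_penalties(logits, generated,
                    presence_penalty=0.0, repetition_penalty=1.0):
    """Sketch of two common anti-repetition mechanisms applied to
    raw logits before sampling. Simplified for illustration."""
    out = dict(logits)
    for tok in set(generated):
        if tok not in out:
            continue
        # CTRL-style repetition penalty: shrink positive logits of
        # already-seen tokens, push negative ones further down.
        if out[tok] > 0:
            out[tok] /= repetition_penalty
        else:
            out[tok] *= repetition_penalty
        # Presence penalty: flat subtraction for any token that has
        # appeared at least once, regardless of how often.
        out[tok] -= presence_penalty
    return out

# With the "general tasks" numbers above (pres 1.5, rep 1.0), a token
# the model keeps emitting ("Wait") gets knocked down hard.
penalized = apply_penalties({"Wait": 2.0, "potato": 1.0},
                            generated=["Wait", "Wait"],
                            presence_penalty=1.5,
                            repetition_penalty=1.0)
```

Which is why a presence penalty is an odd fit for a prompt whose correct answer is the same word 100 times: it punishes exactly the repetition the user asked for.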
well hold on now, maybe it’s onto something. do you really know what it means to “recite” “potato” “100” “times”? each of those words could be pulled apart into a dissertation-level thesis and analysis of language, history, and communication.
either that, or it has a delusional level of instruction following. doesn’t mean it can’t code like sonnet though
It's still amusing to see how seemingly simple things can put it into a loop
it is still going
> do you really know what it means to “recite” “potato” “100” “times”?
Asking the user a question is an option. Sonnet did that a bunch when I was trying to debug a network issue. It also forgot facts that had been checked and told to it before...
I wonder how much certain models have been trained to avoid asking too many questions. I’ve had coworkers who’ll complete an entire project before asking a single additional question to management, and it has never gone well for them. Unsurprising that the same would be true for the “managing AI” era of programming.
The thing I struggle most with, honestly, is when AI (usually GPT5.3-Codex) asks me a question and I genuinely don’t know the answer. I’m just like “well, uh… follow industry best practice, please? unless best practice is dumb, I guess. do a good. please do a good.” And then I get to find out what the answer should’ve been the hard way.
Still, I would probably abandon the name for trademark-enforcement reasons. It's low-hanging fruit for them if they want to kill you.
(this is also why the Pentium was called the Pentium instead of a number like earlier processors, since bare numbers couldn't be trademarked.. and why the Nintendo logo was embedded into Game Boy cartridge ROMs)
I don't use it a lot, but when I do it's pretty much 2 patterns:
* "search on steroids" - get me to the thing I need, or tell me whether the thing I need exists; give me a few examples and I can get it running.
* getting the trivial and uninteresting parts out of the way, like writing some helper function for stuff I'm doing now. I'll just call the AI, let it do its thing, and continue writing code in the meantime, then look back, check if it makes sense, and use it.
So I'm not really cheating myself out of the learning process, just outsourcing the parts I know well enough to check for correctness while saving time writing them.
> Can you create incredibly useful code without that knowledge today?
You could do that without that knowledge back in the day too; we've had languages higher-level than assembler forever.
It's just that the range of knowledge needed to maximize machine usage is far smaller now. Before, you had to know how to write a ton of optimizations yourself; nowadays you have to know how to write your code so the compiler has an easy job optimizing it.
Before, you had to manage memory accesses; nowadays, making sure you're not jumping across memory too much and being aware of how the cache works is enough.
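The cache-awareness point above mostly comes down to access order. A sketch of the two traversal patterns (illustration only: Python lists of lists aren't contiguous in memory, but in C or numpy the row-wise loop walks memory sequentially while the column-wise loop jumps a full row-length stride on every access):

```python
N = 4
matrix = [[row * N + col for col in range(N)] for row in range(N)]

def sum_row_major(m):
    """Visit elements in storage order: sequential, cache-friendly."""
    total = 0
    for row in m:
        for x in row:
            total += x
    return total

def sum_col_major(m):
    """Visit elements column by column: one row-length stride per
    access, which thrashes the cache in contiguous-array languages."""
    total = 0
    for col in range(len(m[0])):
        for row in m:
            total += row[col]
    return total
```

Both compute the same sum; the only difference is the order memory gets touched, which is exactly the kind of thing the hardware now rewards you for getting right instead of hand-written optimizations.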
Or more so: machines have gotten so fast, with so much disk and memory, that people can ship slopware filled with bloat and the UX is almost as responsive as Windows 3.1 was
If 2FA means "use a second factor that's on the same device as the first factor" (as with phone apps in many cases: password plus 2FA from email/SMS/an authenticator app on the same device), I disagree.
It's actually extremely similar: the agent has to figure out a way to associate the next logical steps with the (often disconnected or nonsensical) directives the executive gave them.
It might be a little easier with a dog though. With a dog, you just give it treats and it doesn't care how you interpret what it typed.
They were functionally just fine; good, even, compared to some modern abominations.
But the look was just plain and ugly, even compared to some alternatives at the time.
> Things started going downhill, in my opinion, with the Windows XP "Fisher-Price" Luna interface and the Microsoft Office 2007 ribbon.
Yeah, I just ran it with the 2000-compatible look; still ugly, but at least it doesn't waste screen space.