As someone who worked in the field of "semantic XML processing" at the time, I can tell you that while the "XML processing" part was (despite plenty of unnecessary complications) well understood, the "semantic" part was purely aspirational and never well understood. The common theme with the current flurry of LLMs and their noisy proponents is that in both cases it is possible to do worthwhile and impressive demos with these technologies, and even real applications that do useful things, but people who have their feet on the ground know that XML doesn't engender "semantics" and LLMs are not "conscious". Yet the hype peddlers keep the fire burning by suggesting that if you just do "more XML" and build bigger LLMs, then at some point real semantics and actual consciousness will somehow emerge like a chicken hatching from the egg. And, being emergent properties, who is to say semantics and consciousness will not emerge, at some point, somehow? A "heap" of grains is emergent after all, and so is the "wetness" of water. But I have strong doubts about XHTML being more semantic than HTML5.

And anyway, even if Google had nefarious intentions, and even if they managed to steer the standardization, one also has to concede that all search engines before Google were encumbered by too much structure and too-rigid approaches. When you were looking for a book in a computerized library at that point, it was standard to be sat in front of a search form with many, many fields: one for the author's name, one for the title, and so forth. Searching was not only a pain, it was also very hard for a user without prior training. Google demonstrated it could deliver far better results with a single short form field filled out by naive users who just plonked down three or five words that were on their mind, et voilà. They made it plausible that instead of imposing a structure onto data at creation time, it may be more effective to discover associations in the data at search time (well, at indexing time, really).
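To make that last point concrete, here's a toy inverted index in Python. This is nothing like Google's actual pipeline, just a minimal sketch of the idea: the structure (which words appear in which documents) is derived from the data at indexing time, not supplied by authors at creation time. The documents and queries are made up for illustration.

    from collections import defaultdict

    docs = {
        1: "the quick brown fox",
        2: "the lazy brown dog",
        3: "quick thinking saves the day",
    }

    # Build the index by scanning the documents themselves;
    # nobody had to fill in author/title/subject fields up front.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)

    def search(query):
        # Return documents matching every query word (AND semantics).
        sets = [index.get(w, set()) for w in query.split()]
        return set.intersection(*sets) if sets else set()

    print(search("quick brown"))  # {1}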

As for the strictness of documents, I'm not sure what it would give us that we don't get with sloppy documents. OK, web browsers could refuse to display a web page if any one image tag is missing the required `alt` attribute. So what happens then, will web authors duly include alt="picture of a cat" for each picture of a cat? Maybe, to a degree, but the other 80% of alt attributes will just contain some useless drivel to appease the browser. I'm actually more in favor of strict documents than I used to be, but on the other hand we (I mean web browsers) have become quite good at reconstructing usable HTML documents from less-than-perfect sources, and the reconstructed source is also a strictly validating one. So I doubt this is the missing piece; I think the semantic web failed because the idea never was strong, clear, compelling, well-defined and rewarding enough to catch on with enough people.
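You can watch that reconstruction happen outside a browser, too. A minimal sketch, assuming the html5lib package is installed (pip install html5lib): it implements the HTML5 spec's error-recovery rules, so feeding it sloppy markup yields a well-formed tree, much as browsers do.

    import html5lib
    from xml.etree import ElementTree

    # Unclosed <p>, misnested <b>/<i>, unquoted attribute, no alt.
    sloppy = '<p>unclosed <b>bold <i>both</b> still italic? <img src=cat.jpg>'

    # html5lib applies the spec's recovery rules and returns an
    # ElementTree element representing a complete, valid document.
    tree = html5lib.parse(sloppy)

    # Serializing the recovered tree gives strictly well-formed markup.
    print(ElementTree.tostring(tree, encoding="unicode"))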

If we're honest, we still don't know, 25 years later, what 'semantic' means after all.


