So there's a private option that's terrible (WebMD) and a private option that's terrific (Merck Manuals). And it's the terrible one that ranks well at Google.
Sounds like a Google problem and apparently only a Google problem.
That's why I keep wanting to see what a search engine's results would look like if it heavily punished the presence of advertisements in a result. All the SEO spam pages are ad-driven, so cutting out everything that follows that incentive should remove the pages built on the SEO spam pattern that ruins search results.
Punishment/vengeance is a popular idea around here, but you have to also remember that a search engine is supposed to bring you the most relevant results.
Filtering out, say, Stack Overflow or Reddit because it has ads doesn't help you when it answers your question and is perhaps the only thing on the internet that truly does.
People seem to think there's this ad-less replica of the internet, sitting right behind our ad-riddled internet, where everything they want exists for free, it's just hidden. In reality, the websites making money are the ones providing the vast majority of things people are looking for.
Maybe instead of heavily punishing websites with ads, a search engine could punish heavily ad-driven websites. A lot of the SEO-exploiting blog mills are filled to the brim with ads; the goal is to get you to view as many ads as possible, not to provide good content that happens to be funded by ads.
Does this site (in general, not just this page) have more than 5 advertisements per page? Between 2 and 4?
Does this site attempt to load 12 trackers? "Only" 4 trackers? Just 1?
Does an AI text analysis of the first few paragraphs match on this nonsense?:
> Fixing your gadget is important. Many people find that their gadget sometimes breaks. Gadget helps us do action easier, and improves our lives. We all hate it when our Gadget doesn't work the way we expect it to. It can be frustrating. Read below for tips on how to fix your gadget. (Followed by 3 more paragraphs of filler before getting to regurgitated gems like "reboot it".)
I'm sure we have the AI tech now to semantically spot this bullshit and downrank it. Right? (OK, maybe I overestimate how easy this would be. Forgive me, I'm just ranting here.)
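If I sketch what I'm imagining in code (every threshold here is pure invention on my part, and the filler classifier is a hypothetical black box, just to make the rant concrete):

```python
def seo_spam_penalty(ad_count: int, tracker_count: int, filler_score: float) -> float:
    """Combine ad load, tracker load, and a filler-text score into a
    ranking penalty in [0, 1]. Every threshold is an illustrative guess,
    not anything a real engine is known to use."""
    penalty = 0.0
    # More than 5 ads per page: heavy penalty; 2-4: moderate.
    if ad_count > 5:
        penalty += 0.4
    elif ad_count >= 2:
        penalty += 0.2
    # Trackers, bucketed the same way as in the comment above.
    if tracker_count >= 12:
        penalty += 0.3
    elif tracker_count >= 4:
        penalty += 0.15
    # filler_score: output of some hypothetical text classifier rating how
    # much the opening paragraphs read like "Fixing your gadget is important..."
    penalty += 0.3 * max(0.0, min(1.0, filler_score))
    return min(penalty, 1.0)

# A ranker might then scale relevance down:
# adjusted = relevance * (1.0 - seo_spam_penalty(ads, trackers, filler))
```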
Do you mean the advertising company that runs a search engine should punish pages in the results that... show their ads? Or just when it's a "lot" of their ads? Or should they only do that if the pages are showing ads from their competitors?
I’m honestly surprised that Google thinks that a page with N ads deserves N times the CPM. The more ads, the less attention each ad can grab, no? I wonder whether just treating ads as zero sum (regardless of ad provider) — such that a page with 5 ads, 2 of which are Google ads, gets a payout of 2/5 the CPM of a page with one Google ad and no other ads — would basically drive all these SEO mills out of business. While also not really impacting honest ad-sponsored sites (like Reddit), that only tend to run one ad per page.
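To spell out the arithmetic (the function and the proportional rule below are just my reading of the zero-sum proposal, not anything Google actually does):

```python
def zero_sum_payout(cpm: float, google_ads: int, total_ads: int) -> float:
    """Pay out CPM scaled by the share of ad slots that are Google ads,
    treating all ads on the page as competing for the same attention."""
    if total_ads == 0 or google_ads == 0:
        return 0.0
    return cpm * google_ads / total_ads

# One Google ad and no others: the full CPM.
assert zero_sum_payout(10.0, 1, 1) == 10.0
# Five ads, two of them Google's: 2/5 of the CPM, as in the comment above.
assert zero_sum_payout(10.0, 2, 5) == 4.0
```

Under that rule, stuffing a page with extra ads never increases the payout; it can only dilute it.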
Or maybe the search engine shouldn't care about ads at all, and just figure out what is good content and what is bad content, and what actually answers queries well.
When I said "punish", I meant that the ranking algorithm should do it. It's not about vengeance, it's about filtering out SEO spam. The problem with filtering SEO spam by detecting it as such is that it's by definition an arms race. That's why I propose, instead of chasing the symptom (SEO spam), attacking the incentive structure that causes it (ads).
"that heavily punishes presence of advertisements in a result." while that is pleasing to read at face value, it has two fundamental problems:
1. it's orthogonal to relevance of content (semi-solvable algorithmically I suspect)
2. it's antithetical to Google's core business model (a lot tougher nut to crack)
The entire point of my comment was that it's not orthogonal. Ads are what fuel the clickbait and SEO-driven articles. Nobody, for example, would ever pay a subscription to a website that is just waffle filler. While Stack Overflow has ads, it's much better in that regard than the SEO spam pages.
Aren't you contradicting yourself there? Stack Overflow is ad-supported, but is good. But you want search engines to penalize sites that have ads?
I hate ads, but I don't think we should be focusing on them here. Some sites that have ads have garbage content, and some sites that have ads have useful content. Just... find the useful content, and return it in search results. I know "just" is doing an awful lot of heavy lifting there, but I don't think "has/does not have ads" is as important a signal to a search engine's algorithm as you think it is.
I disagree. The way content is presented matters. Splitting an article into 4-6 pages and filling those pages with ads makes me not want to read that content. I'd much rather go somewhere that has the same text in a single page and only a few ads.
The ideal search engine would show me the ad-free page first given otherwise identical quality. Of course Google will never do anything like that. That's why I'm hoping for an alternative search engine to do so.
Imagine a world where the biggest search engine made its money from advertising. In that kind of a world, wouldn’t the search engine primarily be incentivized to show you the results pages with the most advertising, regardless of the quality of the content?
Anyone who wants attention is motivated to do SEO. Should engines downrank every site that has good SEO? That is, downrank every site that ranks highly?
They already look at things like clickthrough, dwell time, and bounce-back. If enough people dislike Example.com enough to avoid clicking on it, or come back to search after visiting it, the engine learns that it is a bad result.
Maybe the problem is that most people like what you don't like.
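As a toy model of how signals like that might be aggregated per result (the 10-second cutoff and the weights are made up; a real pipeline is surely far more involved):

```python
from dataclasses import dataclass

@dataclass
class ClickEvent:
    url: str
    dwell_seconds: float
    returned_to_search: bool  # did the user bounce back to the results page?

def engagement_score(events: list[ClickEvent], url: str) -> float:
    """Toy aggregate: longer dwells help, quick bounce-backs hurt.
    The cutoff and the 30x weight are invented for illustration."""
    clicks = [e for e in events if e.url == url]
    if not clicks:
        return 0.0
    bounces = sum(1 for e in clicks
                  if e.returned_to_search and e.dwell_seconds < 10)
    avg_dwell = sum(e.dwell_seconds for e in clicks) / len(clicks)
    return avg_dwell - 30.0 * bounces / len(clicks)
```

Of course, as a reply below points out, these metrics measure wanting rather than having, and they can be gamed.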
No, the key is to differentiate SEO'd pages with useful content from SEO'd pages with useless content.
This is a game as old as search engines. In 2005, it meant filtering out sites that were just lists of keywords, not coherent sentences and paragraphs. It meant giving extra points to articles with structure, such as header tags and paragraphs, as opposed to just blobs of text. It meant using PageRank to organically discover which pages real people thought were useful.
It's a much subtler and more difficult problem in 2022, but there are also better tools to do it (big NLG models). It just seems that Google lost interest in quality control at some point.
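For reference, the organic-discovery part is just classic power iteration; a bare-bones PageRank (damping 0.85, as in the original paper) looks something like this:

```python
def pagerank(links: dict[str, list[str]],
             damping: float = 0.85, iters: int = 50) -> dict[str, float]:
    """Classic power-iteration PageRank over a page -> outlinks map.
    Every page, including link targets, must appear as a key."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank everywhere
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))
```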
And I would guess they lost interest in quality control because of Chrome's market penetration. Chrome is a browser monopoly at this point, and with Google being the default search engine on Chrome, they no longer have to give quality results to maintain their search user base. On top of that, they control such a large share of the ad market that any SEO spam website is more likely than not to be using AdSense. Which means they have a financial incentive to deliver page views to SEO spam sites, which tend to have higher ad/content ratios.
That stuff definitely helps. That's also why so many people now just search Reddit. However, wouldn't it be nice if the search engine could be smart enough to figure that out itself?
The problem is that people clicking+dwelling on something is not highly correlated with it serving their needs.
See: clickbait YouTube videos that show you something you really want to see in the thumbnail, then spend 10 minutes doing something else before showing it, and when you see it it’s a tiny aside with no more context than what you got in the thumbnail. If it’s even in the video at all.
Those videos have both high clickthrough (thus “click bait”) and also high dwell time (from people waiting for the thing they wanted to see to show up.) They do also have high bounceback, but only from people who recognize what’s going on. “A new sucker’s born every minute”, and those suckers will click the video and watch it, because they don’t yet know the principle that this specific kind of enticing thumbnail+title format implies that they won’t find what they want here.
These metrics all measure, effectively, “wanting” rather than “having.” It’s like measuring food by how addictive it is, rather than by how satisfied it makes you. You’ll end up optimizing toward cheetos — literally flavoured air — rather than toward anything that fills your stomach. People might enjoy cheetos while they’re eating them, but if they’re genuinely hungry, cheetos won’t solve their problem — they’ll still be hungry afterward.
> Sounds like a Google problem and apparently only a Google problem.
I want something like webrings to become A Thing again. A user-curated search engine. And the users doing the curating need to be vetted. I don't know if this is even possible, but I get tired of having to come to HN to get a human recommendation that is miles better than the algorithmic crap from the current search engines.
Unsolvable since it's a network of fallible humans we attempt to topologically score.
You can make decent attempts, such as academic peer review. Even this system perpetuates its own problems (beta amyloid) and has perverse incentives (publish or perish, falsified results), though.
Semantic web had some good ideas about this. Networks of signed FOAF data attached to articles and posts. You could form a side graph of trust information that you could revoke at any time.
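Very roughly something like this; real FOAF is RDF and the assertions would be cryptographically signed, but a toy dictionary version of a revocable trust graph might look like:

```python
# Hypothetical web-of-trust edges: each user asserts trust in others
# and can revoke an assertion at any time. Signing is omitted here.
trust_edges = {
    ("alice", "bob"): 0.9,     # alice vouches strongly for bob
    ("bob", "carol"): 0.7,
    ("alice", "mallory"): 0.0, # revoked: weight zeroed out
}

def trust_from(me: str, author: str, max_hops: int = 3) -> float:
    """Best trust path from me to an article's author, multiplying edge
    weights along the way (a common web-of-trust heuristic, not FOAF spec)."""
    if me == author:
        return 1.0
    if max_hops == 0:
        return 0.0
    best = 0.0
    for (src, dst), weight in trust_edges.items():
        if src == me and weight > 0:
            best = max(best, weight * trust_from(dst, author, max_hops - 1))
    return best

print(trust_from("alice", "carol"))  # 0.9 * 0.7 = 0.63
```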
You do. I imagine people or groups curating lists of pages or sites - they decide what to put on their lists, but you decide to include them in your personal search engine or not. Or you could fork their list and edit as you see fit.
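A toy version of that forkable curation layer (every name and list here is hypothetical):

```python
# Hypothetical curated "webring" lists: subscribe to curators you trust,
# fork and edit their lists, and search only within the merged result.
curated_lists = {
    "medical-ring": {"nhs.uk", "merckmanuals.com"},
    "webdev-ring": {"developer.mozilla.org"},
}

def my_index(subscriptions: list[str],
             added: frozenset = frozenset(),
             removed: frozenset = frozenset()) -> set[str]:
    """Merge subscribed lists, then apply personal edits (your 'fork')."""
    merged: set[str] = set()
    for name in subscriptions:
        merged |= curated_lists.get(name, set())
    return (merged | set(added)) - set(removed)

def filtered_search(results: list[str], allowed: set[str]) -> list[str]:
    """Keep only results whose domain someone you trust vouched for."""
    return [url for url in results if any(domain in url for domain in allowed)]

allowed = my_index(["medical-ring"], added=frozenset({"en.wikipedia.org"}))
print(filtered_search(
    ["https://www.webmd.com/a", "https://www.merckmanuals.com/b"], allowed))
# -> ['https://www.merckmanuals.com/b']
```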
It's a bit of a conundrum: on one side, the NHS and (in a different area) MDN are better, more authoritative sources, so Google should promote those. On the other, this would mean Google can no longer claim neutrality or hide behind "the algorithm", which has been their legal defense against a ton of lawsuits in which the plaintiffs argued that some website should rank higher or lower.
What lawsuit? There is no legal basis for a lawsuit. As a private corporation, Google is free to rank search results however they like regardless of whether that's done by humans or algorithms.
So what? You haven't cited a successful lawsuit against Google on that issue, or even a plausible legal theory. Have you discussed this with an actual attorney?
Google chooses to rank "authoritative" sites based on its own notion of authoritativeness (which they don't share, but which they decide).
They implement it as agnostic tuning as much as possible, avoiding any single human cherry-picking sites. They use panels of humans (mturk-style) for quality ratings.
Could you imagine the outrage if Google said "The government is always the best source about everything?"
What even would be the point? Use the government search engine for that use case.
Google already does this. Searching for "YMYL" (Your Money or Your Life) should produce useful results:
> For pages about clear YMYL topics, we have very high Page Quality rating standards because low quality pages could potentially negatively impact a person’s health, financial stability, or safety, or the welfare or well-being of society.
I disagree. There is lots of useful information around the internet that is hard to find. A lot of content simply doesn't have the keywords it needs to be discovered.
For example, somebody could search for "best rewards credit card for married couple with normal lifestyle" and find some listicle full of referral links that's a few years old. But the best advice might be in a Reddit or Twitter discussion titled "what's a good card for my P2 with no AF, low MSR, and at least 2CPP" whose replies are terse comments like "CSP?" or "BofA Custom; select CNP transactions as the 3% category and order everything online. they give high CL if you want to AU instead". There's enough jargon and levels of understanding that Google can't find the best advice, only good keyword matches.
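To make the mismatch concrete, a naive keyword-overlap scorer (a toy baseline of my own, not Google's actual ranking) scores the stale listicle far above the genuinely useful jargon thread:

```python
def keyword_overlap(query: str, document: str) -> int:
    """Naive relevance: count shared lowercase words between query and doc."""
    return len(set(query.lower().split()) & set(document.lower().split()))

query = "best rewards credit card for married couple"
listicle = "best rewards credit card 2019 top picks for married couples referral"
reddit = "good card for my P2 with no AF low MSR at least 2CPP CSP BofA"

print(keyword_overlap(query, listicle))  # 6: the stale listicle wins
print(keyword_overlap(query, reddit))    # 2: the better answer barely registers
```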
I recently had a similar issue when I searched for something well-phrased and generic ("how to stop wood joints from squeaking"). The results were lackluster, but after a few attempts, I found the most helpful results were actually under a specific application ("how to stop a bed frame from squeaking").
So what's my point? It's not Google's fault. They are trying their best to optimize text search, with some fancy word associations and other stuff to help. But it's going to take a lot of effort to make an efficient, scalable, general-purpose AI that can achieve near-human understanding of text and then find the most relevant articles related to that. This "super-Google" would have to comprehend every post and comment on the internet, contextualize the knowledge ("a squeaky bed is caused by squeaky wood joints or fasteners, so this advice is useful for any kind of squeaking wood furniture"), and quickly generate results for every query.
It's not WebMD's fault they have ad revenue that lets them hire writers to SEO their articles with the best keywords. Nor is it Merck's fault that they are using specific language that doesn't cover all the phrases that a person could search. Nor is it Google's for making a search engine without human comprehension. It's just a technological gap that can't be bridged in the present day.
Google can make or break any online business they want to. It is what it is... We let them get there... That being said, I haven't even tried to use Bing, and it's pretty much impossible to convince me that Microsoft Edge is worth a second look after all the years of MSIE, and how Windows has been slowly devolving over time.
If worst comes to worst, just add "reddit" to your search term, and then all you have to do is determine whether the answers you find look like they came from a human, a spammer, or a corporation.