So there's a private option that's terrible (WebMD) and a private option that's terrific (Merck Manuals). And it's the terrible one that ranks well at Google.
Sounds like a Google problem and apparently only a Google problem.
That's why I keep wanting to see what a search engine's results would look like if it heavily punished the presence of advertisements in a result. All the SEO spam pages are ad-driven, so cutting out everything that follows that incentive should remove the pages built on the SEO spam pattern that ruins search results.
Punishment/vengeance is a popular idea around here, but you have to also remember that a search engine is supposed to bring you the most relevant results.
Filtering out, say, Stack Overflow or Reddit because it has ads doesn't help you when it answers your question and is perhaps the only thing on the internet that truly does.
People seem to think there's this ad-less replica of the internet, sitting right behind our ad-riddled internet, where everything they want exists for free, it's just hidden. In reality, the websites making money are the ones providing the vast majority of things people are looking for.
Maybe instead of heavily punishing websites with ads, a search engine could punish heavily ad-driven websites. A lot of the SEO-exploiting blog mills are filled to the brim with ads; the goal is to get you to view as many ads as possible, not to provide good content that happens to be funded by ads.
Does this site (in general, not just this page) have more than 5 advertisements per page? Between 2 and 4?
Does this site attempt to load 12 trackers? "Only" 4 trackers? Just 1?
Does an AI text analysis of the first few paragraphs match on this nonsense?:
> Fixing your gadget is important. Many people find that their gadget sometimes breaks. Gadget helps us do action easier, and improves our lives. We all hate it when our Gadget doesn't work the way we expect it to. It can be frustrating. Read below for tips on how to fix your gadget. (Followed by 3 more paragraphs of filler before getting to regurgitated gems like "reboot it".)
I'm sure we have the AI tech now to semantically spot this bullshit and downrank it. Right? (OK, maybe I overestimate how easy this would be. Forgive me, I'm just ranting here.)
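If I sketch what I'm imagining in code (every threshold here is pure invention on my part, and the filler classifier is a hypothetical black box, just to make the rant concrete):

```python
def seo_spam_penalty(ad_count: int, tracker_count: int, filler_score: float) -> float:
    """Combine ad load, tracker load, and a filler-text score into a
    ranking penalty in [0, 1]. Every threshold is an illustrative guess,
    not anything a real engine is known to use."""
    penalty = 0.0
    # More than 5 ads per page: heavy penalty; 2-4: moderate.
    if ad_count > 5:
        penalty += 0.4
    elif ad_count >= 2:
        penalty += 0.2
    # Trackers, bucketed the same way as in the comment above.
    if tracker_count >= 12:
        penalty += 0.3
    elif tracker_count >= 4:
        penalty += 0.15
    # filler_score: output of some hypothetical text classifier rating how
    # much the opening paragraphs read like "Fixing your gadget is important..."
    penalty += 0.3 * max(0.0, min(1.0, filler_score))
    return min(penalty, 1.0)

# A ranker might then scale relevance down:
# adjusted = relevance * (1.0 - seo_spam_penalty(ads, trackers, filler))
```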
Do you mean the advertising company that runs a search engine should punish pages in the results that... show their ads? Or just when it's a "lot" of their ads? Or should they only do that if the pages are showing ads from their competitors?
I’m honestly surprised that Google thinks that a page with N ads deserves N times the CPM. The more ads, the less attention each ad can grab, no? I wonder whether just treating ads as zero sum (regardless of ad provider) — such that a page with 5 ads, 2 of which are Google ads, gets a payout of 2/5 the CPM of a page with one Google ad and no other ads — would basically drive all these SEO mills out of business. While also not really impacting honest ad-sponsored sites (like Reddit), that only tend to run one ad per page.
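To spell out the arithmetic (the function and the proportional rule below are just my reading of the zero-sum proposal, not anything Google actually does):

```python
def zero_sum_payout(cpm: float, google_ads: int, total_ads: int) -> float:
    """Pay out CPM scaled by the share of ad slots that are Google ads,
    treating all ads on the page as competing for the same attention."""
    if total_ads == 0 or google_ads == 0:
        return 0.0
    return cpm * google_ads / total_ads

# One Google ad and no others: the full CPM.
assert zero_sum_payout(10.0, 1, 1) == 10.0
# Five ads, two of them Google's: 2/5 of the CPM, as in the comment above.
assert zero_sum_payout(10.0, 2, 5) == 4.0
```

Under that rule, stuffing a page with extra ads never increases the payout; it can only dilute it.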
Or maybe the search engine shouldn't care about ads at all, and just figure out what is good content and what is bad content, and what actually answers queries well.
When I said "punish", I meant that the ranking algorithm should do it. It's not about vengeance, it's about filtering out SEO spam. The problem with filtering SEO spam by detecting it as such is that it's by definition an arms race. That's why I propose, instead of chasing the symptom (SEO spam), attacking the incentive structure that causes it (ads).
"that heavily punishes presence of advertisements in a result." while that is pleasing to read at face value, it has two fundamental problems:
1. it's orthogonal to relevance of content (semi-solvable algorithmically I suspect)
2. it's antithetical to Google's core business model (a lot tougher nut to crack)
The entire point of my comment was that it's not orthogonal. Ads are what fuel the clickbait and SEO-driven articles. Nobody, for example, would ever pay a subscription to a website that is just waffle filler. While Stack Overflow has ads, it's much better in that regard than the SEO spam pages.
Aren't you contradicting yourself there? Stack Overflow is ad-supported, but is good. But you want search engines to penalize sites that have ads?
I hate ads, but I don't think we should be focusing on them here. Some sites that have ads have garbage content, and some sites that have ads have useful content. Just... find the useful content, and return it in search results. I know "just" is doing an awful lot of heavy lifting there, but I don't think "has/does not have ads" is as important a signal to a search engine's algorithm as you think it is.
I disagree. The way content is presented matters. Splitting an article into 4-6 pages and filling those pages with ads makes me not want to read that content. I'd much rather go somewhere that has the same text in a single page and only a few ads.
The ideal search engine would show me the ad-free page first given otherwise identical quality. Of course Google will never do anything like that. That's why I'm hoping for an alternative search engine to do so.
Imagine a world where the biggest search engine made its money from advertising. In that kind of a world, wouldn’t the search engine primarily be incentivized to show you the results pages with the most advertising, regardless of the quality of the content?
Anyone who wants attention is motivated to do SEO. Should engines downrank every site that has good SEO? That is, downrank every site that ranks highly?
They already look at things like clickthrough, dwell time, and bounce-back. If enough people dislike Example.com enough to avoid clicking on it, or come back to search after visiting it, the engine learns that it is a bad result.
Maybe the problem is that most people like what you don't like.
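As a toy model of how signals like that might be aggregated per result (the 10-second cutoff and the weights are made up; a real pipeline is surely far more involved):

```python
from dataclasses import dataclass

@dataclass
class ClickEvent:
    url: str
    dwell_seconds: float
    returned_to_search: bool  # did the user bounce back to the results page?

def engagement_score(events: list[ClickEvent], url: str) -> float:
    """Toy aggregate: longer dwells help, quick bounce-backs hurt.
    The cutoff and the 30x weight are invented for illustration."""
    clicks = [e for e in events if e.url == url]
    if not clicks:
        return 0.0
    bounces = sum(1 for e in clicks
                  if e.returned_to_search and e.dwell_seconds < 10)
    avg_dwell = sum(e.dwell_seconds for e in clicks) / len(clicks)
    return avg_dwell - 30.0 * bounces / len(clicks)
```

Of course, as a reply below points out, these metrics measure wanting rather than having, and they can be gamed.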
No, the key is to differentiate SEO'd pages with useful content from SEO'd pages with useless content.
This is a game as old as search engines. In 2005, it meant filtering out sites that were just lists of keywords, not coherent sentences and paragraphs. It meant giving extra points to articles with structure, such as header tags and paragraphs, as opposed to just blobs of text. It meant using PageRank to organically discover which pages real people thought were useful.
It's a much subtler and more difficult problem in 2022, but there are also better tools to do it (big NLG models). It just seems that Google lost interest in quality control at some point.
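For reference, the organic-discovery part is just classic power iteration; a bare-bones PageRank (damping 0.85, as in the original paper) looks something like this:

```python
def pagerank(links: dict[str, list[str]],
             damping: float = 0.85, iters: int = 50) -> dict[str, float]:
    """Classic power-iteration PageRank over a page -> outlinks map.
    Every page, including link targets, must appear as a key."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank everywhere
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))
```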
And I would guess they lost interest in quality control because of Chrome's market penetration. Chrome is a browser monopoly at this point, and with Google being the default search engine on Chrome, they no longer have to give quality results to maintain their search user base. On top of that, they control such a large share of the ad market that any SEO spam website is more likely than not to be using AdSense. Which means they have a financial incentive to deliver page views to SEO spam sites, which tend to have higher ad/content ratios.
That stuff definitely helps. That's also why so many people now just search Reddit. However, wouldn't it be nice if the search engine could be smart enough to figure that out itself?
The problem is that people clicking+dwelling on something is not highly correlated with it serving their needs.
See: clickbait YouTube videos that show you something you really want to see in the thumbnail, then spend 10 minutes doing something else before showing it, and when you see it it’s a tiny aside with no more context than what you got in the thumbnail. If it’s even in the video at all.
Those videos have both high clickthrough (thus “click bait”) and also high dwell time (from people waiting for the thing they wanted to see to show up.) They do also have high bounceback, but only from people who recognize what’s going on. “A new sucker’s born every minute”, and those suckers will click the video and watch it, because they don’t yet know the principle that this specific kind of enticing thumbnail+title format implies that they won’t find what they want here.
These metrics all measure, effectively, “wanting” rather than “having.” It’s like measuring food by how addictive it is, rather than by how satisfied it makes you. You’ll end up optimizing toward cheetos — literally flavoured air — rather than toward anything that fills your stomach. People might enjoy cheetos while they’re eating them, but if they’re genuinely hungry, cheetos won’t solve their problem — they’ll still be hungry afterward.
> Sounds like a Google problem and apparently only a Google problem.
I want something like webrings to become A Thing again. A user-curated search engine. And the users doing the curating need to be vetted. I don't know if this is even possible, but I get tired of having to come to HN to get a human recommendation that is miles better than the algorithmic crap from the current search engines.
Unsolvable since it's a network of fallible humans we attempt to topologically score.
You can make decent attempts, such as academic peer review. Even this system perpetuates its own problems (beta amyloid) and has perverse incentives (publish or perish, falsified results), though.
Semantic web had some good ideas about this. Networks of signed FOAF data attached to articles and posts. You could form a side graph of trust information that you could revoke at any time.
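Very roughly something like this; real FOAF is RDF and the assertions would be cryptographically signed, but a toy dictionary version of a revocable trust graph might look like:

```python
# Hypothetical web-of-trust edges: each user asserts trust in others
# and can revoke an assertion at any time. Signing is omitted here.
trust_edges = {
    ("alice", "bob"): 0.9,     # alice vouches strongly for bob
    ("bob", "carol"): 0.7,
    ("alice", "mallory"): 0.0, # revoked: weight zeroed out
}

def trust_from(me: str, author: str, max_hops: int = 3) -> float:
    """Best trust path from me to an article's author, multiplying edge
    weights along the way (a common web-of-trust heuristic, not FOAF spec)."""
    if me == author:
        return 1.0
    if max_hops == 0:
        return 0.0
    best = 0.0
    for (src, dst), weight in trust_edges.items():
        if src == me and weight > 0:
            best = max(best, weight * trust_from(dst, author, max_hops - 1))
    return best

print(trust_from("alice", "carol"))  # 0.9 * 0.7 = 0.63
```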
You do. I imagine people or groups curating lists of pages or sites - they decide what to put on their lists, but you decide to include them in your personal search engine or not. Or you could fork their list and edit as you see fit.
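A toy version of that forkable curation layer (every name and list here is hypothetical):

```python
# Hypothetical curated "webring" lists: subscribe to curators you trust,
# fork and edit their lists, and search only within the merged result.
curated_lists = {
    "medical-ring": {"nhs.uk", "merckmanuals.com"},
    "webdev-ring": {"developer.mozilla.org"},
}

def my_index(subscriptions: list[str],
             added: frozenset = frozenset(),
             removed: frozenset = frozenset()) -> set[str]:
    """Merge subscribed lists, then apply personal edits (your 'fork')."""
    merged: set[str] = set()
    for name in subscriptions:
        merged |= curated_lists.get(name, set())
    return (merged | set(added)) - set(removed)

def filtered_search(results: list[str], allowed: set[str]) -> list[str]:
    """Keep only results whose domain someone you trust vouched for."""
    return [url for url in results if any(domain in url for domain in allowed)]

allowed = my_index(["medical-ring"], added=frozenset({"en.wikipedia.org"}))
print(filtered_search(
    ["https://www.webmd.com/a", "https://www.merckmanuals.com/b"], allowed))
# -> ['https://www.merckmanuals.com/b']
```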
It's a bit of a conundrum: on one side, the NHS and (in a different area) MDN are better, more authoritative sources, so Google should promote those. On the other, this would mean Google can no longer claim neutrality or hide behind "the algorithm", which has been their legal defense against a ton of lawsuits in which the plaintiffs argued that some website should rank higher or lower.
What lawsuit? There is no legal basis for a lawsuit. As a private corporation, Google is free to rank search results however they like regardless of whether that's done by humans or algorithms.
So what? You haven't cited a successful lawsuit against Google on that issue, or even a plausible legal theory. Have you discussed this with an actual attorney?
Google chooses to rank "authoritative" sites based on its own notion of authoritativeness (which they don't share, but which they decide).
They implement it as agnostic tuning as much as possible, avoiding any single human cherry-picking sites. They use panels of humans (mturk-style) for quality ratings.
Could you imagine the outrage if Google said "The government is always the best source about everything?"
What even would be the point? Use the government search engine for that use case.
Google already does this. Searching for "YMYL" (Your Money or Your Life) should produce useful results:
> For pages about clear YMYL topics, we have very high Page Quality rating standards because low quality pages could potentially negatively impact a person’s health, financial stability, or safety, or the welfare or well-being of society.
I disagree. There is lots of useful information around the internet that is hard to find. A lot of content simply doesn't have the keywords it needs to be discovered.
For example, somebody could search for "best rewards credit card for married couple with normal lifestyle" and find some listicle full of referral links that's a few years old. But the best advice might be in a Reddit or Twitter discussion titled "what's a good card for my P2 with no AF, low MSR, and at least 2CPP" whose replies are terse comments like "CSP?" or "BofA Custom; select CNP transactions as the 3% category and order everything online. they give high CL if you want to AU instead". There's enough jargon and levels of understanding that Google can't find the best advice, only good keyword matches.
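To make the mismatch concrete, a naive keyword-overlap scorer (a toy baseline of my own, not Google's actual ranking) scores the stale listicle far above the genuinely useful jargon thread:

```python
def keyword_overlap(query: str, document: str) -> int:
    """Naive relevance: count shared lowercase words between query and doc."""
    return len(set(query.lower().split()) & set(document.lower().split()))

query = "best rewards credit card for married couple"
listicle = "best rewards credit card 2019 top picks for married couples referral"
reddit = "good card for my P2 with no AF low MSR at least 2CPP CSP BofA"

print(keyword_overlap(query, listicle))  # 6: the stale listicle wins
print(keyword_overlap(query, reddit))    # 2: the better answer barely registers
```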
I recently had a similar issue when I searched for something well-phrased and generic ("how to stop wood joints from squeaking"). The results were lackluster, but after a few attempts, I found the most helpful results were actually under a specific application ("how to stop a bed frame from squeaking").
So what's my point? It's not Google's fault. They are trying their best to optimize text search, with some fancy word associations and other stuff to help. But it's going to take a lot of effort to make an efficient, scalable, general-purpose AI that can achieve near-human understanding of text and then find the most relevant articles related to that. This "super-Google" would have to comprehend every post and comment on the internet, contextualize the knowledge ("a squeaky bed is caused by squeaky wood joints or fasteners, so this advice is useful for any kind of squeaking wood furniture"), and quickly generate results for every query.
It's not WebMD's fault they have ad revenue that lets them hire writers to SEO their articles with the best keywords. Nor is it Merck's fault that they are using specific language that doesn't cover all the phrases that a person could search. Nor is it Google's for making a search engine without human comprehension. It's just a technological gap that can't be bridged in the present day.
Google can make or break any online business they want to. It is what it is... We let them get there... That being said, I haven't even tried to use Bing, and it's pretty much impossible to convince me that Microsoft Edge is worth a second look after all the years of MSIE, and how Windows has been slowly devolving over time.
If worst comes to worst, just add "reddit" to your search term, and then all you have to do is determine whether the answers you find look like they came from a human, a spammer, or a corporation.