Slaves of the feed - This is not the realtime we've been looking for (000fff.org)
26 points by ThomPete on Dec 15, 2009 | 31 comments


So how can we solve this problem of drinking from a firehose?

I think a system like Google Reader might have some traction here, wherein friends can recommend items to each other. With enough friends, you could make a metafeed out of those items. Make it a new service: you get paid for drinking from the firehose, and you pay to get human-filtered feeds. Throw in some 'liked/hated this firehose drinker' ratings and a pile of people, and you get interest spaces.


There are decades of research in information retrieval that can easily be applied to this problem. Before Google showed up with PageRank, many algorithms were based on properties of a given document and the historical properties of the set (for example, TF-IDF).
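
The TF-IDF idea mentioned above can be sketched in a few lines. This is a toy version with a whitespace tokenizer and an invented three-item corpus, not a production scorer:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score each term in each document by TF-IDF: terms frequent in
    one document but rare across the whole set score highest."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                       # document frequency per term
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append({t: (tf[t] / len(tokens)) * math.log(n / df[t])
                       for t in tf})
    return scores

docs = ["realtime search is hard",
        "realtime feeds need filtering",
        "filtering spam is hard"]
scores = tf_idf(docs)
# "realtime" appears in two of three docs, so in the first document it
# scores lower than "search", which appears in only one.
```

A real system would add stemming, stop-word removal, and incremental updates as the "historical properties of the set" change, but the ranking principle is the same.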

You don't need to drink from a firehose. The firehose will be throttled and filtered to give you the useful information that you want. This is where the future of real-time search lies.

Disclosure: I'm CTO of Collecta, one of these real-time search companies.


Search clearly isn't the answer. Can you create a search term that encompasses all of your interests? I hope not.

Surely filtering the "fire hose" with the query "programming news" will not yield satisfactory results.


That's pretty much what we have now, except we haven't really made it so that the monkeys picking out the good information are being rewarded proportional to the value they create. Right now, it's the monkey who can put the most ads in front of the most monkeys who gets the banana, not the monkeys who find the most worthwhile information from the most obscure sources.


He correctly identifies a problem, but no amount of interdevice communication is going to tell me which 5 of boingboing's posts are most likely to be of personal interest.

Here's what'd help me with the bottleneck: a unified inbox with a wicked smart relevance algorithm. VR? Probably not.


We've had plain old naive-Bayesian filtering for ages - I'm surprised more feed-readers don't use it.
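
A bare-bones version of such a filter, a multinomial naive Bayes with add-one smoothing, trained on invented feed items:

```python
import math
from collections import Counter, defaultdict

class FeedFilter:
    """Minimal naive-Bayes feed filter: train on items you marked as
    interesting or boring, then classify new ones."""
    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.item_counts = Counter()
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.item_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        total = sum(self.item_counts.values())
        best, best_score = None, float("-inf")
        for label in self.item_counts:
            # log prior + log likelihoods with add-one smoothing
            score = math.log(self.item_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in text.lower().split():
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

f = FeedFilter()
f.train("new linux kernel release", "interesting")
f.train("tiger woods scandal update", "boring")
f.train("celebrity gossip tiger woods", "boring")
```

After even this tiny amount of training, `f.classify("tiger woods news")` comes back "boring", which is exactly the behavior a feed reader could learn silently from what you skip.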


Feed items that truly contain new information aren't something you can train for with Bayes in the traditional way, because novelty implies something that hasn't been seen before, rather than something similar to what has already been seen. Searching for meaningfully different stuff rather than similar stuff is a whole different problem.

I'd like a treemap view of Google Reader rather than a list, where I could visually filter the feeds in real time by adding negative and positive keywords/tags, and click around in the various levels of the treemap.

Click on "technology" => make a new treemap of significant tags in that area, click on the next tag to dive into that space, etc., and eventually see all the feed items somehow. Perhaps present them at all levels via some sidebar or hover thing. Color-code items for popularity / activity / freshness.

I'm thinking something like Dabble DB but with a treemap view and a Cappuccino JS interface.


Here's another way to approach the user's end of it: have them log in to their Google, Facebook, Twitter, et al, and shove all of it into a single stream. Let them filter by source, topic, relevance, time, and anything else you can implement. Record their preferences by watching what items they click through or spend time on or interact with, and your relevance sort gets better. Add other users' data, and you have a fantastic recommendation engine without ever making anyone rate anything.
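
The implicit-feedback part of this could start very simply. A sketch with made-up update rules (the class name, the +1/dwell bonus, and the -0.1 skip penalty are all invented, not tuned values):

```python
from collections import defaultdict

class RelevanceModel:
    """Learn per-topic weights implicitly from behavior: clicks and
    dwell time raise a topic's weight, skipped items lower it slightly."""
    def __init__(self):
        self.weights = defaultdict(float)

    def observe(self, topics, clicked, seconds_spent=0):
        # Dwelling longer on an item counts as a stronger signal.
        delta = (1.0 + seconds_spent / 30.0) if clicked else -0.1
        for t in topics:
            self.weights[t] += delta

    def score(self, topics):
        """Rank a new item by the learned weights of its topics."""
        return sum(self.weights[t] for t in topics)

m = RelevanceModel()
m.observe(["tech", "linux"], clicked=True, seconds_spent=60)
m.observe(["celebrity"], clicked=False)
```

The point the comment makes holds: nobody ever rates anything explicitly, yet `m.score` already prefers Linux stories over celebrity ones for this user.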

Another problem to solve with current news reading is duplication. Even if you do want to read about Tiger Woods, there's a lot of duplication in the echo chamber of the internet.
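
One standard way to catch echo-chamber duplicates is word shingling plus Jaccard similarity. A toy sketch (the 0.5 threshold and shingle size are arbitrary choices, not recommendations):

```python
def shingles(text, k=3):
    """Word k-shingles: the set of overlapping runs of k words."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Fraction of shingles two texts share."""
    return len(a & b) / len(a | b) if a | b else 0.0

def is_duplicate(story_a, story_b, threshold=0.5):
    """Treat two stories as echo-chamber copies if enough of their
    word sequences overlap."""
    return jaccard(shingles(story_a), shingles(story_b)) >= threshold
```

Two rewrites of the same wire story share most of their shingles and get collapsed, while genuinely different stories don't. At firehose scale you'd swap exact Jaccard for MinHash sketches, but the idea is identical.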


Yep, that's the position Google is in to analyze data; it's incredibly powerful to have almost the whole activity stream of the internet.

An interesting side note: apparently Google could activate face recognition for Google Goggles but has chosen not to for now over privacy concerns.


As much as I hate to say it, it seems like this is something that could be easily accomplished by piggybacking off of Twitter.

Just drop the link, some tags, and misc attributes along with a short description. Is it much more complicated than that?


Remember the end goal here is to deliver content to the user that they want to spend time consuming. Why use Twitter for anything but another way to find content?


I'm not sure I understand your comparison. What is the difference between finding content I want to spend time consuming and finding content another way (via some content-filtering process) that has every intention of limiting the content to what I "want" to consume?

I agree. The problem is definitely a matter of related vs. unrelated content. My suggestion (or babble) was just pointing out how easily we could implement source-content-filtering logic on top of an already existing interface, if we felt that was the best way to go about it.


I just mean that I don't see a benefit of using Twitter as a delivery system.


Of course. No doubt.


If you re-frame the problem not as searching for "different gold" but "similar chaff" (and hiding it) then simple trainable filters look more useful.


I can see an interesting parallel to searching for prime numbers, which are basically our purest "new, unknown, primitive" facts. In this model, a Bayesian filter trained to detect "sameness" would almost be a generalization of a sieve of Eratosthenes.
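
The analogy can be made concrete. The sieve repeatedly discards candidates that are "similar to" (divisible by) something already accepted; whatever survives is genuinely new:

```python
def sieve(n):
    """Sieve of Eratosthenes, written as a novelty filter: keep a
    number only if it is not 'explained by' anything kept so far."""
    candidates = list(range(2, n + 1))
    primes = []
    while candidates:
        p = candidates[0]          # the next surviving "novel" item
        primes.append(p)
        # discard everything similar to (divisible by) it
        candidates = [c for c in candidates if c % p != 0]
    return primes
```

A sameness-detecting filter over feed items would play the role of the divisibility test here, which is why it reads as a generalization of the sieve.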


That sounds like a great interface. But it doesn't stop you from having to do all of the filtering/categorizing. Maybe if after having done all of that work on your end, there would be a nice way to share that with followers.


Nah, the only categorization that really needs to be done is by topic. It's easy to tell if something is, say, a tech article, or tech/programming, or tech/linux, based on source and keywords. If your algorithm did that once per item that's in anyone's feed, everything else is refinement. Ideally, a human won't need to touch it at all. Even if the machine gets it wrong, it could be corrected with a "this is in the wrong place" button. In any case, an article shouldn't rise too high in any category that it's not relevant for.
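
The source-plus-keyword scheme described above could look something like this. All category names, keyword lists, and source mappings are hypothetical examples:

```python
# Hypothetical rules: keyword sets per category, plus fallbacks for
# sources whose topic is already known.
CATEGORY_KEYWORDS = {
    "tech/programming": {"compiler", "python", "api", "code"},
    "tech/linux": {"linux", "kernel", "ubuntu"},
    "tech": {"startup", "software", "gadget"},
}
SOURCE_CATEGORIES = {"lwn.net": "tech/linux"}

def categorize(title, source):
    """Assign the category with the most keyword hits; if nothing
    matches, fall back on what we know about the source."""
    words = set(title.lower().split())
    best, best_hits = None, 0
    for cat, keywords in CATEGORY_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = cat, hits
    return best or SOURCE_CATEGORIES.get(source, "uncategorized")
```

The "this is in the wrong place" button from the comment would just add or remove keywords from these sets, so humans only touch the machine's mistakes.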


I'd like to see the categories automatically created via SVD. I wonder how people would react to the knowledge that their favorite posts are written by 30-something males on Saturdays, and include the word "has" twice as often as the word "my"?
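
A toy illustration of SVD-derived categories with NumPy. The corpus and counts are invented: two celebrity-ish posts and two programming-ish posts, with the term-document matrix built by hand:

```python
import numpy as np

# Rows = terms, columns = posts; entries = term counts.
terms = ["golf", "scandal", "compiler", "python"]
counts = np.array([[3, 2, 0, 0],   # golf
                   [2, 3, 0, 0],   # scandal
                   [0, 0, 3, 1],   # compiler
                   [0, 0, 1, 3]])  # python

# SVD factors the matrix into latent "categories": each singular
# direction groups terms (rows of U) and posts (rows of Vt) that
# co-occur, without anyone naming the categories up front.
U, s, Vt = np.linalg.svd(counts.astype(float))

# The strongest latent dimension loads on the first two posts (the
# celebrity cluster); signs can flip, so compare absolute loadings.
```

On real data the latent dimensions are exactly where the unnerving correlations the comment jokes about would surface, for example a dimension tied to posting day or author demographics rather than topic.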


I wouldn't mind an interface with some knobs and dials, given that one can escape the linearity of the feed, and replace that with a recombinant stratified interface which can present the information space from several different vantage points. It'd be a line of flight interface:

“This is how it should be done: Lodge yourself on a stratum, experiment with the opportunities it offers, find an advantageous place on it, find potential movements of deterritorialization, possible lines of flight, experience them, produce flow conjunctions here and there, try out continuums of intensities segment by segment, have a small plot of new land at all times. It is through a meticulous relation with the strata that one succeeds in freeing lines of flight...” – Deleuze and Guattari

http://www.linesofflight.net/linesofflight.htm


An interface that weird had better be devastatingly well-implemented, or people will just be confused.


Yep it could learn from the usage.


I don't think computers can help anyone find truly interesting things. An algorithm would have to be really complicated to discover the fact that just because so many people are interested in New Moon, or Tiger Woods, that doesn't mean I am.

When "relevancy" algorithms are attempted they don't do nearly as good a job at serving niche groups as "community" moderated algorithms such as HN.

HN is an algorithm in which each person's brain is an equation that estimates the "worth" of a piece of information. As a result, the final ranking is the output of an extremely complicated system of "equations", as it were, that is good at finding relevant information.

I don't expect computers to be good at doing that for another ten years or so.


"An algorithm would have to be really complicated to discover the fact that just because so many people are interested in New Moon, or Tiger Woods, that doesn't mean I am."

A Naive Bayesian text classifier could easily learn that. If you fed it news stories that you are interested in it would quickly discover that you never click on stories about Tiger Woods.

It would then classify incoming news for you.

In fact, my old email project POPFile has been adapted to support things like RSS and NNTP filtering with ease.


That would work to eliminate "noise", but such algorithms still do a poor job of helping you find new content and subjects you have never expressed interest in before. I still trust groups of real people to help me in this regard more than I would trust an aggregator.


Theoretically, you could have a (lazily-evaluated) subscription to the entire universe of discourse. It would start out completely hidden, but would gradually become visible in your reader as your filters trim down the stream to a tractable level. Thus, you wouldn't so much be saying "tell me something I don't know," but rather "tell me everything, and leave out the 99.9999% I don't care about."
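
The lazily-evaluated subscription maps naturally onto generators. A sketch with an invented four-item "universe" standing in for the unbounded stream:

```python
def universe():
    """Stand-in for the stream of everything published anywhere."""
    items = [
        {"topic": "tiger-woods", "score": 0.9},
        {"topic": "haskell", "score": 0.2},
        {"topic": "new-moon", "score": 0.8},
        {"topic": "compilers", "score": 0.7},
    ]
    yield from items

def subscribe(stream, wanted_topics):
    """Lazily filter the universal stream: items are only examined as
    the reader pulls them, so subscribing to 'everything' costs
    nothing up front."""
    return (item for item in stream if item["topic"] in wanted_topics)

visible = list(subscribe(universe(), {"haskell", "compilers"}))
```

This is exactly "tell me everything, and leave out what I don't care about": the filter predicate, not the subscription list, does the trimming.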


"I don't think computers can help anyone find truly interesting things. An algorithm would have to be really complicated to discover the fact that just because so many people are interested in New Moon, or Tiger Woods, that doesn't mean I am."

I don't see why, in principle, the full Bayesian inference couldn't be done. In other words, automatically compute how interesting the item is based on how the masses liked it in aggregate, how individuals with similar tastes to you liked it, how similar the item is to other items that you have liked, etc.
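
A crude stand-in for that inference is a weighted blend of the three signals. To be clear, this is a linear combination with invented weights, not a fitted Bayesian model, but it shows why mass popularity alone needn't dominate:

```python
def interest_score(item, weights=(0.2, 0.4, 0.4)):
    """Blend three signals into one relevance score: mass popularity,
    ratings from users with similar taste, and content similarity to
    items this user liked. Weights here are made up, not learned."""
    w_pop, w_nbr, w_sim = weights
    return (w_pop * item["global_popularity"]
            + w_nbr * item["neighbour_rating"]
            + w_sim * item["content_similarity"])

# A Tiger Woods story: hugely popular, but unlike anything I read.
tiger = {"global_popularity": 0.95, "neighbour_rating": 0.1,
         "content_similarity": 0.05}
# A niche story: obscure globally, loved by people with my tastes.
niche = {"global_popularity": 0.02, "neighbour_rating": 0.8,
         "content_similarity": 0.9}
```

Because the personal signals outweigh the global one, the niche story wins, which is the behavior the parent comment says a simple popularity ranking can't deliver.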


The problem with this approach is feasibility. Companies like Netflix and Amazon have built some complex recommendation engines, and I still don't think they're processing as much data as an all-encompassing, social-network-data-aggregating recommendation engine would.

I believe Reddit was at one point going to try to create a news recommendation engine but found it impractical as the number of users grew. The number of permutations of what people in just one user's network like and dislike is staggering.


Given that it's presently possible to offload data processing to a number of cloud-based computing services, I'm not sure that we should be afraid of the computation cost. If you are willing to make some assumptions about independence (for example) you can do a lot of the computations in parallel. A more subtle issue would be how much the use of social net data would gain Amazon or Netflix over their current recommendation engines. I suspect it would be quite a lot, but that's just a guess.
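
The independence point is what makes the parallelism easy: if each user's recommendations depend only on that user's own data, the per-user work can be farmed out. A sketch with a thread pool standing in for a real distributed backend (the scoring function and feed data are invented):

```python
from concurrent.futures import ThreadPoolExecutor

def score_user(user_items):
    """Score one user's feed. Assuming independence between users,
    this needs no data from anyone else, so every call can run in
    parallel (or on a separate machine entirely)."""
    user, item_scores = user_items
    return user, sum(item_scores) / len(item_scores)

feeds = {"alice": [0.9, 0.7], "bob": [0.2, 0.4, 0.6]}
with ThreadPoolExecutor() as pool:
    results = dict(pool.map(score_user, feeds.items()))
```

Collaborative signals break strict independence, of course, which is where the "more subtle issue" about blending in social data comes in.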


The idea is that interdevice communication creates knowledge of context by logging everything you do, something we don't currently have the ability to do. I know it needs more thinking, but there is something there, imho.


Lots of great comments, thanks so much.



