Hacker News

That's the entire point. With the right single piece of data, you can draw powerful and useful inferences. You don't need data on the 100m dash running speed of every single human being on Earth. That one item can be sufficient. It all depends on what the data is and what you're trying to infer. How is that incorrect?


But that's not what he's doing. He isn't concluding that everyone runs slower than the record (which is almost a tautology), he's trying to conclude things about the average running speed of people. That's extremely deceptive, because you can't actually conclude a damn thing about the average based on the minimum.


Except that the average can't be lower than the minimum, but that isn't much.


What can you infer from just knowing one human being can run 100m in 9.81s?

What if you knew instead that one particular human being can run 100m in 16s?


We can’t infer much about how fast others run. Only that it’s possible to run 100m in 9.81s. We don’t know if everyone else can do that, or no one.


Yes we do, we know it's the world record. That's the data point.


I would not call it "powerful" that you can infer from "X is the world record" that "nobody is faster than X".

It's not even inference, it's simply saying the same thing in different ways.

And in any case, it's of course not very relevant to his main argument, since we don't have any such data points about aliens.

If we knew that "humans are the smallest intelligent beings in the universe", then yes, we could "infer" that all aliens are larger. But that is trivial and pointless.


The world record is not 1 data point, it is a property of all human running in history.


What is the datapoint in this case?

1. Bolt runs 100m in 9.58s

or

2. Bolt runs 100m in 9.58s and he’s the fastest ever


It's what he said: Usain Bolt's 100m World Record of 9.58 seconds.


And what can you infer from that?


Given that it's a world record, you can infer that the majority of the species cannot run so quickly.


That is not a "powerful and useful inference", it's not an inference at all. It is simply the definition of a world record.


The point is that one "single datapoint" does not necessarily mean the same thing as another; it depends on what that datapoint is and the context around it. The myth he was specifically busting in the original paragraph was that you cannot learn anything from a single datapoint, but that's only true if the datapoint is drawn at random from the distribution.

A single datapoint can tell you a lot, if you know other things about it. In this case, knowing that it's a world record dramatically changes its utility. It's powerful in the sense that it tells you a lot about the overall dataset, not in the sense that it's a novel insight.


The global minimum of a distribution is not, in any way, a "single datapoint". It is a property of all the data points that ever were - in this case, it contains information about all humans that have ever run in the history of the species.

It is in no way comparable to knowing how many humans there are, which can't actually tell you anything about how many aliens there could be.
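A minimal Python sketch of the distinction drawn above, assuming a made-up, normally distributed sample of 100m times (the parameters are hypothetical, not real athletics data): the minimum is computed from every sample and therefore bounds all of them, while one randomly drawn time guarantees nothing about the rest.

```python
import random

# Hypothetical sample of 100m times in seconds; the mean and spread are
# invented for illustration only, not taken from real athletics data.
random.seed(0)
times = [random.gauss(16.0, 2.0) for _ in range(10_000)]

# The "record" is computed from every sample, so it bounds all of them.
record = min(times)
assert all(t >= record for t in times)

# A single randomly drawn time carries no such guarantee about the others.
one_draw = random.choice(times)
slower_than_draw = sum(t > one_draw for t in times)
print(f"record={record:.2f}s, one draw={one_draw:.2f}s, "
      f"{slower_than_draw} of {len(times)} runners are slower than the draw")
```

Knowing the record tells you something about every runner in the sample; knowing one arbitrary runner's time only tells you that such a time is possible.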


Sure, if you happen to know the global minimum value, then you know a lot. For aliens though, we don't have anything resembling that, so the conclusions we can draw are much less interesting.


The purpose was to debunk that myth so that he could derive further (though less informative) conclusions about aliens, which the author then goes on to do.

That is, once you've admitted that a single datapoint can in fact provide a great deal of information in the extreme case, you should expect to find (some) information in the other cases too, and that's what he does. This holds as long as you don't assume it's a purely random draw, which he doesn't.


There is no such myth. Nobody is surprised that you can say something about a distribution if you know some facts about the distribution. Using "single data-point" in two completely different ways is just bait and switch.

Any data pertaining to humans would be a single data-point in the true sense, which in itself gives almost zero information. You can't compare that to knowing global properties of the sample or the distribution; they're completely different things.

