
Whether a sample is representative is a different issue from whether the response rate is high.

For example, suppose you have a population of 10,000 jobs, 9,000 of which are hiring for Clojure and 1,000 of which are hiring for Forth. If you sample only from the 9,000 Clojure jobs, you might conclude that 100% of all 10,000 jobs are for Clojure. In reality, only 90% are.

Instead, you can sample 100 of the 10,000 jobs at random. The expected share of Clojure jobs in that sample is 90%. Any single sample will be noisy, but that noise can be accounted for statistically, for example with a confidence interval.
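To make the Clojure/Forth example concrete, here is a rough simulation sketch. The 9,000/1,000 split comes from the example above; the seed, the variable names, and the normal-approximation interval are my own assumptions for illustration, not the only way to account for the noise.

```python
import random

random.seed(0)  # arbitrary seed, only so the sketch is repeatable

# Population from the example: 9,000 Clojure jobs and 1,000 Forth jobs.
population = ["Clojure"] * 9000 + ["Forth"] * 1000

# Biased "sample": only jobs drawn from the Clojure subset.
biased_sample = [job for job in population if job == "Clojure"]
biased_share = sum(job == "Clojure" for job in biased_sample) / len(biased_sample)

# Simple random sample of 100 jobs, drawn without replacement.
random_sample = random.sample(population, 100)
p_hat = sum(job == "Clojure" for job in random_sample) / len(random_sample)

# Approximate 95% interval via the normal approximation (an assumption;
# it is crude when p_hat is close to 0 or 1).
std_err = (p_hat * (1 - p_hat) / len(random_sample)) ** 0.5
interval = (p_hat - 1.96 * std_err, p_hat + 1.96 * std_err)

print(f"biased share of Clojure jobs: {biased_share:.0%}")
print(f"random-sample estimate:       {p_hat:.0%}")
print(f"approximate 95% interval:     {interval[0]:.0%} to {interval[1]:.0%}")
```

The biased sample reports 100% no matter how many jobs it contains, while the random sample lands near 90% and comes with a rough error bar.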

If the population that you want to draw conclusions about is, say, every job offered in the US in 2021, it will be difficult to find either a data set that contains this whole population or a data set that is arguably a random subset of it. So representativeness is hard.

You could adjust your population definition to achieve plausible representativeness. For example, take the population of all developer jobs at companies that had an IPO between 2018 and 2021. Maybe you have a way to compile this data set from some source. Then you limit the scope of your claims but you will be more credible.

Another thing that you can do is take an existing data set that you know to be representative and compare the distribution of job characteristics in your sample to that. For example, you might find that your sample is more likely to include web development jobs than your reference data set. Then you know that your sample is not representative, and you know in what way it isn't. Or you might find that your sample is comparable to your reference data set. This can give you some confidence that your findings generalize.
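A minimal sketch of that comparison, assuming you already have category shares from a trusted reference data set. All categories, shares, and counts below are made up for illustration, and the chi-square goodness-of-fit statistic is just one reasonable way to summarize the mismatch.

```python
from collections import Counter

# Hypothetical shares from a reference data set assumed to be representative.
reference_shares = {"web": 0.40, "backend": 0.35, "embedded": 0.25}

# Hypothetical sample of 100 job postings, labeled by category.
sample_jobs = ["web"] * 60 + ["backend"] * 30 + ["embedded"] * 10


def chi_square_statistic(observed_counts, expected_shares):
    """Pearson chi-square goodness-of-fit statistic against expected shares."""
    n = sum(observed_counts.values())
    return sum(
        (observed_counts.get(category, 0) - n * share) ** 2 / (n * share)
        for category, share in expected_shares.items()
    )


observed = Counter(sample_jobs)
n = len(sample_jobs)

# Per-category gap between the sample share and the reference share.
gaps = {c: observed.get(c, 0) / n - share for c, share in reference_shares.items()}
statistic = chi_square_statistic(observed, reference_shares)

# 5.991 is the 5% critical value of the chi-square distribution with
# 2 degrees of freedom (number of categories minus one).
looks_different = statistic > 5.991

for category, gap in gaps.items():
    print(f"{category:>9}: sample minus reference = {gap:+.2f}")
print(f"chi-square statistic: {statistic:.2f}, differs at 5% level: {looks_different}")
```

With these made-up numbers the sample over-represents web jobs, which tells you both that it is not representative and in which direction it is skewed.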



Agreed. Even if we don't know the distribution or representativeness of the samples, we can still make some guesstimates, as I just did. They are as good or as bad as what you see on TIOBE or other ranking services. The error is likely to be larger the smaller the population under consideration, which is why I'm rather skeptical about the salary statistics on Clojure.



