GitHub and StackOverflow started out really biased in terms of their communities - GitHub with Ruby, and StackOverflow with Microsoft languages. Do you think they've sufficiently lost that bias?
On another note, you know what else could be great sources of data? Google Scholar, CiteSeerX and arXiv. It'd be really interesting to compare language usage between "industry" and academia.
I think StackOverflow has; GitHub still seems a bit biased towards scripting languages like Ruby or JavaScript, though. In any case, since you show the graphs from the various sources, I don't think it matters if one source is more biased than another. If, for example, you used CodePlex as a data source, you'd see a huge bias towards C# -- but visitors to your site could simply draw their own conclusions from the charts.
I'm not sure I even trust that... I do lots of bindings from OCaml to C, and whereas I consider them to be OCaml projects, GitHub sees they're more C by LOC and counts them as C.
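To make the concern concrete, here's a minimal sketch of a "biggest language wins" labelling rule. This is only an assumption about how such a heuristic could behave, not GitHub's actual implementation; the function name `primary_language` and the line counts are made up for illustration.

```python
def primary_language(loc_by_language):
    """Return the language with the most lines of code.

    Hypothetical heuristic for illustration only; not GitHub's real logic.
    """
    return max(loc_by_language, key=loc_by_language.get)


# Hypothetical OCaml-to-C bindings project: the C stubs outnumber the OCaml
# code, even though the author thinks of it as an OCaml project.
bindings_project = {"OCaml": 1200, "C": 3400}
label = primary_language(bindings_project)
print(label)
```

Under a rule like this, a bindings project gets filed under whichever language has the larger count, regardless of which community the project actually serves.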
If nearly all Ruby programmers put their code on GitHub, and you don't count it but you do count Google Code, then isn't that a bias in the sample? Wouldn't it make more sense to include all the major code hosting sites, including GitHub?
Also, maybe add GitHub and StackOverflow as data sources?