Ask HN: Any good books on graphing/charting/visualization?
47 points by lukev on April 14, 2010 | 38 comments
My day job has me working on a project that has vast amounts of data available in tabular form, but no way to analyze the data except to search it and display it in more tables. Pages and pages of tables.

I'd love to build a way to query the data and display the results visually, and I'm looking for books that demonstrate various techniques for visualizing data that (in many cases) is quite complex. Right now, my experience doesn't really extend beyond basic pie/bar/scatter graphs.

I've heard amazing things about Tufte, but looking at the previews of his books on Amazon they seem mostly focused on artistic presentations of information - something a marketer or analyst would create manually, not dynamic charts generated from terabytes of data. Is that the case? Does it still have useful information for the sort of thing I'm doing, or can anyone recommend something more suitable?



Edward Tufte's book The Visual Display of Quantitative Information is a monumental book. He writes not about how to make your graphs look pretty, but how to display vast quantities of data and distill them down into useful graphics that communicate themselves effectively.

He provides examples of good and bad graphs, but more importantly, explains what exactly it is that makes those examples good and bad, and further generalizes it so you understand how to make good visualizations. If you don't want to shell out the money for it, it's probably at your library (remember those?).

Additionally, if I were you, I'd stay away from statistical approaches to displaying information unless you have some background in statistics or are willing to learn it -- it tends to be highly technical and is probably too complex for what you're trying to do. Basic stats might help you, but not as much as Tufte will.


> Edward Tufte's book The Visual Display of Quantitative Information is a monumental book.

Agreed, it's absolutely excellent. Thanks to Y Combinator for listing it in the book list.

> Additionally, if I were you, I'd stay away from statistical approaches to displaying information...

Not agreed. In my opinion you might have missed what I felt was a main point of that book: Always learn the appropriate statistics required to understand the data, choose a correct visualization method to communicate those statistics effectively, and once you've understood it fully, confirmed the results, and removed all the cruft, then publish it.


What I meant was to shy away from approaches that are PURELY based on statistics if you have no background in it, because it can get overwhelming quickly.

Of course, if it's worth it to invest the time required to have a fundamental understanding of statistics, by all means do so -- but if this is a one-time or a short-term project, I'm not sure the time commitment is worth it.


Hm... In my experience, the best part of a project is being able to learn something new while doing it.


For pure data visualization, Tufte is absolutely brilliant and helps you avoid a whole raft of bad behavior that leads to sloppy, hard-to-understand graphics. I can't second your suggestion enough.

I would add that his one day course is a great way to get started reading his books (especially if you can get your employer to pay for it). He's a great speaker and you get all the books as part of the course. I wish all the makers of charting libraries, toolkits, and data analysis software were more familiar with his work. It would save us from some truly awful junk.

One area where I disagree with him frequently is when he strays from data visualization into user interface design. In general, I find his user interface preferences result in UIs that are too cluttered. One of his main rules is that data presentations should be very dense. However, I disagree that this approach works as well for user interfaces as it does when visualizing data. If you look at his web site (http://www.edwardtufte.com/) you can see his UI philosophy on display. I find Don Norman to be much better in this area than Tufte.


The principles of Tufte's that I always come back to are:

Maximize the data/ink ratio - figure out how to show more data with fewer lines, symbols, colors

Clarify by adding data - show the broad trend but allow a viewer to drill down into specific areas of interest

Here is a good example of both points, 2200 data points coherently graphed: http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0...


I do a lot of charting for financial services software. The best practical book that I've found is The Wall Street Journal Guide to Information Graphics: The Dos and Don'ts of Presenting Data, Facts, and Figures. Simple but practical guidelines for displaying pie/line/area graphs.

But for your situation, check out some of these sites which focus on more complicated graphing techniques:

http://www.perceptualedge.com/examples.php

http://blogof.francescomugnai.com/2009/04/50-great-examples-...

http://interface.fh-potsdam.de/infodesignpatterns/news.php

http://patternbrowser.org/

http://webdesignledger.com/inspiration/15-stunning-examples-...

http://www.tableausoftware.com/public/


I would highly recommend learning R (http://www.r-project.org/). It is very easy to directly query databases and R has many visualisation packages, including the awesome ggplot2 (http://had.co.nz/ggplot2/) based on the grammar of graphics. I'm writing an R graphs cookbook and my startup's visualisation product is also built on R (see profile and feel free to email me if you need any help).

Also, look at Ben Fry's Processing books (http://benfry.com/). Here's an introductory tutorial: http://blog.blprnt.com/blog/blprnt/your-random-numbers-getti...

If you're familiar with Python, check out Matplotlib (https://www.packtpub.com/matplotlib-python-development/book).


I second Python+Matplotlib and toss in another notion: animation. With Matplotlib you can generate a plot, save it as a .png file, step to the next plot, save it, etc. all unattended. Then combine your plots into a movie file using one of many programs designed to do just that and presto! your own movie.

I've done this on occasion and with success. When it fits, animation works wonders. And when it doesn't, look for "duck" in Tufte.
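
For anyone who wants to try the unattended frame-by-frame approach, here is a minimal sketch. The function name, the sine-wave data, and the file naming scheme are placeholders I made up; the only Matplotlib calls assumed are the non-interactive Agg backend, plt.subplots, savefig and plt.close.

```python
import math
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def save_frames(out_dir, n_frames=30, n_points=200):
    """Render one PNG per time step and return the file paths in order."""
    xs = [2 * math.pi * i / (n_points - 1) for i in range(n_points)]
    paths = []
    for frame in range(n_frames):
        # Hypothetical data: a sine wave whose phase advances each frame.
        phase = 2 * math.pi * frame / n_frames
        ys = [math.sin(x + phase) for x in xs]

        fig, ax = plt.subplots(figsize=(4, 3))
        ax.plot(xs, ys)
        ax.set_ylim(-1.1, 1.1)  # fixed axes so consecutive frames are comparable
        ax.set_title(f"frame {frame}")

        # Zero-padded names sort correctly when the frames are combined later.
        path = os.path.join(out_dir, f"frame_{frame:04d}.png")
        fig.savefig(path, dpi=80)
        plt.close(fig)  # release the figure before drawing the next one
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    frames = save_frames(tmp, n_frames=5)
    print(len(frames), "frames written, first is", os.path.basename(frames[0]))
```

Keeping the axis limits fixed across frames matters more than it looks: if each frame autoscales, the movie shows the axes jumping around instead of the data changing.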


I like Processing, and you can do pretty much anything you want with it, but I will point out that it was not necessarily built with charts, graphs, and plots in mind; it's relatively "low-level." Any type of graph or visualization you want will have to be built pretty much from scratch with the primitives available. This might not put you off, but it's something to keep in mind.


Yeah, in that sense R is a better choice, but making good-looking graphs in R takes a bit of learning.


I just got the book "R in Action" from the Manning MEAP program. Chapter 3 is all about the graphing capabilities built into R. I had a hard time trying to learn that stuff from the online docs but I find the book very easy to follow.


Oh cool, I haven't read that book, will look it up. The UseR series books are very good too, although a bit expensive. If you have access to a good University library, I recommend looking them up.


Any experience with Incanter? Supposedly it's a port of R to Clojure, and since I love Clojure and everything else is already running on the JVM I'd lean towards using it as opposed to another standalone program if it's any good.


I have used it, but as a warning I'm not an expert user of R, Clojure, or Incanter. Incanter is very pleasant to use, because I personally much prefer coding in Clojure to R. The R language is powerful, but I don't find it obvious at all. You can also use Processing from Clojure if you need to roll your own charts, in addition to the very capable charting library that's already there to handle most routine visualization.

That said, Incanter is immature compared to R. If Incanter does what you need, it might be a great fit, but R has a huge community and list of libraries right now. There's an R to Clojure bridge, but if you don't yet know R I'm not sure it's very helpful.

Finally, Incanter is developing at a break-neck pace. Even if it doesn't do what you want today, it might tomorrow. Literally. I'd love to see the user base grow, because Clojure seems like a perfect fit for statistical computing.


I haven't tried Incanter yet, but from what I've read about it, it sounds very interesting. Perhaps that is a good way to get started learning Clojure (like I'm learning Python via Matplotlib). Since Incanter is still in early stages of development, I think I'd still prefer R to do any serious work, because it's probably a lot more solid and has a lot more packages.


Tufte is great, but he's extremely heavy and a bit dated. He is about 80% brilliant and 20% completely missing the point. It's very strange.

If you are looking for a smaller book, I've found the WSJ Guide to Information Graphics by Dona Wong to be pretty decent and pretty straightforward, and it's about 100 pages. It's not too focused on finance either, although that's what I got it for (I do front end development for a financial analysis company - lots of charting).

http://www.amazon.com/Street-Journal-Guide-Information-Graph...


I second this. I found his books interesting, but a sizable portion of the content is opinion rather than facts that are demonstrated via studies and the like. They're fun to look at, and there are a few important principles in them, but they're more of a coffee table book than a real reference.


I would check out Visualizing Data - http://oreilly.com/catalog/9780596514556

Also, I enjoy this site http://flowingdata.com.


Visualizing Data is a good choice, but I would also suggest you note the recent string of critiques of content-free visualization on flowingdata and start out with Tufte if you are new to the field.


I've read The Visual Display of Quantitative Information by Tufte, and I think it would benefit you even though you are not talking about manually generating charts. For example, he talks about how it's easy to be misleading with a chart based on how you calibrate the axes, which is something you'd still need to do even with dynamically generated visualizations.
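
To make the axis point concrete, here is a small sketch (the function names and the numbers are made up for illustration). It compares how much taller one bar is drawn than another when the value axis starts at zero versus at a truncated baseline, and expresses the distortion as the "lie factor" Tufte defines in VDQI: the size of the effect shown in the graphic divided by the size of the effect in the data.

```python
def drawn_height_ratio(a, b, baseline=0.0):
    """How many times taller bar b is drawn than bar a when the axis starts at baseline."""
    if baseline >= min(a, b):
        raise ValueError("baseline must be below both values")
    return (b - baseline) / (a - baseline)


def lie_factor(a, b, baseline=0.0):
    """Relative change shown in the graphic divided by relative change in the data."""
    if a <= 0 or a == b:
        raise ValueError("need a > 0 and a != b")
    shown = drawn_height_ratio(a, b, baseline) - 1.0
    actual = b / a - 1.0
    return shown / actual


# Hypothetical values: B is 10% larger than A.
for baseline in (0, 90):
    ratio = drawn_height_ratio(100, 110, baseline)
    factor = lie_factor(100, 110, baseline)
    print(f"axis starts at {baseline}: B drawn {ratio:.2f}x as tall as A, lie factor {factor:.1f}")
```

A dynamically generated chart could run a check like this whenever it autoscales an axis and fall back to a zero baseline (or at least label the break) when the factor gets large.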


The Tufte books are brilliant. For dynamic charts, his first book (The Visual Display of Quantitative Information) is the most relevant; it covers the theory - how to tell a good representation from a bad one - and the basics.

Readings in Information Visualization ( http://www.amazon.co.uk/Readings-Information-Visualization-I... ) is a collection of papers covering a wide range of techniques for a wide range of tasks.

Apart from that, it's mostly a matter of picking up interesting ideas wherever you find them. flowingdata.com is nice, same with http://www.informationisbeautiful.net/


Information is Beautiful is great for fun stuff, but there are a ton of bad practices on that site; it's not for serious infographics. Be careful copying from it unless you are really able to tell where he's having fun and where his charts are serious (because he's very capable of both).


Agreed. I was assuming that you have read your Tufte :) and can tell informative visuals from those that are primarily pretty or entertaining.


Yes, the books are excellent, great reads, though the first one as you say is the essential one.

Once you read these, you will start seeing 'Data Ducks' everywhere! I speak from experience.


Leland Wilkinson's "The Grammar of Graphics" http://www.amazon.com/Grammar-Graphics-Leland-Wilkinson/dp/0... is also excellent and fully implemented in the R programming language/statistics package ( http://www.amazon.com/ggplot2-Elegant-Graphics-Data-Analysis... )


If you're big on visualization check out Harvard's www.CS171.org. I'm enrolled in the class right now and it's been very enriching. I think it is also available as opencourseware.

Books: http://www.cs171.org/syllabus.html

Resources http://www.cs171.org/resources.html


Most of these (rather good) suggestions revolve around learning the theory of representing data. But how does one practically accomplish these visualization tasks?

I have been delving in this area for the past couple months, and even though I am still learning, I will give my practical suggestions to the programmer:

1) First accept that there is no silver bullet to data visualization. You pick the tool that makes the most sense. Sometimes you have to write a Java program, sometimes a Python program, and yes, even sometimes an Excel spreadsheet. Don't be picky--just get it done.

2) Programmatically speaking, there are ways to represent truly massive terabyte datasets.

- You can learn Processing (used by Ben Fry in Visualizing Data) which is based on Java and pretty simple to learn. My caveat is that you can't run these scripts server-side, that is, it doesn't generate jpgs or pngs on demand due to headless mode constraints.

- You can use Beautiful Soup in Python to easily modify XML data for SVG graphics. Check out this: http://flowingdata.com/2009/11/12/how-to-make-a-us-county-th...

- You can learn Java's image library (I haven't done this so I can't really give any advice, but this is what Processing simplifies I think)

- You can use Excel to easily pump out bar/pie/line graphs

- You can use the Google Chart API

- You can use Flash. Check out AmCharts for that Mint-y goodness.

3) Learn statistics. Browse the Netflix Prize forums. Struggle with MatLab or R or Octave. You need to learn how to efficiently handle large datasets in memory to better sift through the essential information you need. For very very large sets that absolutely cannot be handled in memory, you'll want to check out Hadoop + MapReduce. Check out Cloudera's distribution for Hadoop. Handling data is every bit as important as visualizing it.
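
As an illustration of the SVG bullet above, here is a minimal sketch of a choropleth-style recoloring. It swaps in Python's standard-library xml.etree.ElementTree in place of Beautiful Soup so that it runs without extra packages; the inline SVG, the region ids, the values, the class breaks and the color ramp are all invented for the example and are not taken from the linked tutorial.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # write plain <svg>/<path> tags on output

# Hypothetical input: one <path> per region, with the region code as its id.
SVG_SOURCE = """<svg xmlns="http://www.w3.org/2000/svg" width="120" height="40">
  <path id="A" d="M0 0 H40 V40 H0 Z"/>
  <path id="B" d="M40 0 H80 V40 H40 Z"/>
  <path id="C" d="M80 0 H120 V40 H80 Z"/>
</svg>"""

# Made-up values per region and a made-up light-to-dark color ramp.
RATES = {"A": 2.5, "B": 7.0, "C": 11.3}
RAMP = ["#f1eef6", "#bdc9e1", "#74a9cf", "#0570b0"]
BREAKS = [4.0, 8.0, 12.0]  # upper bounds of the first three classes


def color_for(value):
    """Pick the ramp color for the first class whose upper bound exceeds value."""
    for upper, color in zip(BREAKS, RAMP):
        if value < upper:
            return color
    return RAMP[-1]  # anything at or above the last break gets the darkest color


def shade_regions(svg_text, values):
    """Return the SVG text with a fill style set on every path that has a value."""
    root = ET.fromstring(svg_text)
    for path in root.iter(f"{{{SVG_NS}}}path"):
        region = path.get("id")
        if region in values:
            path.set("style", f"fill:{color_for(values[region])}")
    return ET.tostring(root, encoding="unicode")


print(shade_regions(SVG_SOURCE, RATES))
```

The same loop works on a real map file as long as each shape carries an id you can join against your data; the heavy lifting (aggregating the data down to one number per region) happens before this step.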


I had a similar situation except that I wasn't as smart as you to consider books in the first place.

But I did use some really good tools. I highly recommend using Prefuse (yes, it's Java, but it ships with great examples and it's open source). If you like Prefuse, then try Flare (ActionScript based). As far as I know, Prefuse supports querying from tables (my data backend was Postgres). Here's Prefuse: http://prefuse.org/ Here's Flare: http://flare.prefuse.org/

And for a dash of inspiration and more ideas: http://www.visualcomplexity.com/vc/


Fun fact - Tufte's in Arlington giving a talk and we're taking a 15 minute break right now. Compelling speaker and thrilling read (he's giving away four of his books to every attendee).


I attended a one-day seminar of Tufte's one time and agree that it was pretty good. But he was by no means "giving away" his books; they were built into the (considerable) price of the event. I also recall being amazed at how many people showed up. This was in a large hotel ballroom in SF and we were jam packed in there.

Another fun fact: when he couldn't get his first book (VDQI) published the way he wanted, he mortgaged his house and published it himself. Respect.


FWIW, compared to similar seminars I thought the price of Tufte's was very reasonable - IIRC, shy of $400, and includes $160 worth of books. I also thought the class was quite good, BTW.


My favorite line was: "Bad design pokes a finger into the eye of thought."

He even brought out his iPhone & iPad and commented on the easily-navigated UI.

Normal registration was $380 per person, though full-time students received a $180 discount.

I half-expected him to show some of Randall Munroe's hi-res graphics (such as http://xkcd.com/657/), but alas, it would have been too perfect =)


If you are considering using Python, Beginning Python Visualization (http://www.amazon.com/Beginning-Python-Visualization-Transfo... ) seems quite good to me. It is of course a niche product targeting those who intend to use Python. If you are looking for a more broad-based grounding in visualization, it is probably not your best choice.


To me it sounds like you want to be using a tool like Matlab or matplotlib in Python to automatically generate various types of plots from your data. There are a wide variety of books about Matlab, and I don't really know one better than the rest. For Python, there's "Beginning Python Visualization" by Vaingast. It's pretty introductory, but provides good starting points. The matplotlib web site also provides a gallery of example plots with code.


Please keep in mind color blind people. I'm red/green blind and about 1/3 of the charts I run across are meaningless to me. Here are a couple sites with info: http://wearecolorblind.com/ http://www.vischeck.com/
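
One cheap automated check, offered as a rough sketch and not a substitute for the simulators linked above: make sure two series colors differ in lightness and not only in hue. The code below uses the WCAG-style relative luminance and contrast ratio formulas; the function names are mine, and treating luminance contrast as a proxy for red/green legibility is an assumption, since it does not simulate color vision deficiency.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to linear light."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a '#rrggbb' color, from 0 (black) to 1 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors, from 1 (identical lightness) to 21."""
    hi, lo = sorted((relative_luminance(color_a), relative_luminance(color_b)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


# A typical red/green pair versus orange/blue from the Okabe-Ito palette.
pairs = {
    "red vs green": ("#ff0000", "#008000"),
    "orange vs blue": ("#E69F00", "#0072B2"),
}
for name, (a, b) in pairs.items():
    print(f"{name}: contrast ratio {contrast_ratio(a, b):.2f}")
```

Pairs that score close to 1 are distinguished almost entirely by hue, so they are good candidates for a different palette or for a second cue such as line style or direct labels.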


A colleague says great things about Cleveland's "The Elements of Graphing Data." It talks about how to leverage the way people perceive graphs in order to convey information. I've flipped through it, and it's on my to-read list. Sorry I can't say more about it.


Has anyone had a chance to check out "Beautiful Visualization: Looking at Data through the Eyes of Experts" (http://j.mp/9SxXza)?

It was just published today according to Amazon.



