There are several types of learning, and some people are better at (or prefer) one over the others. Some people do well with text. Others do well with pictorial representations, others do better with demonstrations and talks, and some learn by doing/failing/doing again.
I personally am good with pictorials and the do/fail iteration style of learning. Text is passable, but I am usually translating it to pictures in my head. There have been times when I have gotten more out of one crappy but representative picture than I have out of an hour-long lecture. Good pictures that engage the viewer are even better. In this case I was easily able to imagine the gophers walking along doing their tasks, and just "see" the flow because the picture was pretty representative. I could have gotten the same message from boxes and arrows, but not as quickly, I think.
So why should we have more good diagrams? Because they would help more people of a certain type learn to program more easily.
Oh. Well, either way, the illustrations in the Ruby book are just comics, a cute way to make side points, but the pictures themselves don't actually convey information in the same way as the gopher diagrams in the talk. Please feel free to reinterpret my comment retroactively to be about how that is the case :)