Archive for the 'Visualization' Category

Dublin Trip

Tomorrow I’m off to Ireland with an i2 colleague – we’re taking part in the Visual Analysis of Complex Networks (VACN) Workshop & Visualization Cook-Off Competition at the Complex & Adaptive Systems Laboratory at University College Dublin.

We’re going to be talking about some recent updates we’ve made to Analyst’s Notebook and some of our future plans. More excitingly, we are going to spend some quality time with the authors of some brilliant open source tools – Gephi, Tulip, Visone & Pajek. Each of these is a fantastic tool in its own right. It is going to be fun to find out how the tools that the day job has developed over many years compare against the young upstarts in the field ;-)

If you are in the area, drop by or drop me a note if you fancy meeting up!

Visual Timelines and Narrative

The well-loved xkcd webcomic posted a great timeline sketch of film plots the other day.

xkcd movie timeline

It was noticed by the visualization & infographic blog community, and Walter Rafelsberger and Daniel McLaren did some nice follow-up work, but for the most part people just seemed to say how cool it was and move on.

I thought I’d try to put it into perspective.

Drawing narrative ‘persona’ lines along a timeline is a common technique. The ‘persona’ usually represents a physical object – a person, for example – and the vertical direction usually encodes some sort of proximity, often geographic. Let’s see a few examples…

Marey’s train timetables (you can find them in the Tufte books) drew lines for each train:

Marey's Train Timeline from Tufte's Visual Display of Quantitative Information

From the physics world, Penrose diagrams are concise depictions of space-time which allow event causality ‘cones’ to be plotted. Typically time runs bottom to top in these diagrams and observers are plotted as lines.

Penrose Diagram

This well-crafted musical visualization (pdf) from Jon Snydal & Marti Hearst has pitch as proximity, and the lines show structural patterns as the motif is repeated with variations:

Improviz Music Timelines

A few years ago, the BBC ran a programme about comedy heroes which I remember for its title and credit sequences. They showed the interweaving careers of British comedians over the decades.

Here proximity represents collaboration on a TV programme.

The wonderful JunkCharts blog showed this timeline narrative of Wall Street Bank acquisitions:

JunkCharts’ Wall Street Acquisitions Timeline

And finally, a few years ago in the day job we put together a system for drawing out diagrams that can convey meetings and assignations:

Mumbai Attacks Timeline

What are the aesthetic and legibility rules that govern these kinds of diagrams?  Are there rules similar to graph drawing aesthetics?

I think there are some guidelines – and a small code sketch after the list shows a few of them in action.

  • Meetings, significant events, etc. can be shown as joining lines: most of the narrative power comes out of this simple drawing metaphor.
  • Other line crossings are to be avoided.
  • When they can’t be avoided, use good visual design to allow the eye to follow what is going on.
  • Make sure the line labels are legible across the diagram.  It is no good only labelling the left side: by the time one has scrolled over to the right, the labels are out of view.  On a static picture this means repeating the labels along the lines, as in the xkcd example, and labelling the right-hand side too.
  • Colour is good for categories of persona.
  • Colour plays an important role in helping your eye distinguish between lines.
  • Thick lines are easier on the eye than thin ones.
  • Curved lines are preferable to straight lines – they are just easier to follow.
  • Lines can start late and end early.  If that line is a character in a movie, abrupt termination means the worst has happened ;-)
  • Line style can change as the story evolves and you can use this for narrative effect. In the xkcd Jurassic Park example, the dotted line shows a velociraptor is in prison.
  • Parallel lines work really well.
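
To make a few of these concrete, here is a minimal Canvas sketch of persona lines. The data, colours and element id are all invented for illustration: each persona is a series of (time, proximity) points, a meeting is simply a shared point, and the lines are thick, curved and labelled at both ends.

    // Invented data: each persona is a named series of (time, proximity) points.
    // A meeting is just two personas sharing a point (here at 120, 80).
    var personas = [
      { name: 'Alice', colour: '#c0392b', points: [[0, 40], [60, 80], [120, 80], [200, 30]] },
      { name: 'Bob',   colour: '#2980b9', points: [[0, 120], [120, 80], [200, 140]] }
    ];

    var ctx = document.getElementById('timeline').getContext('2d');
    ctx.lineWidth = 4;                         // thick lines are easier on the eye

    personas.forEach(function (p) {
      ctx.strokeStyle = p.colour;              // colour helps the eye distinguish lines
      ctx.beginPath();
      ctx.moveTo(p.points[0][0], p.points[0][1]);
      for (var i = 1; i < p.points.length; i++) {
        var a = p.points[i - 1], b = p.points[i], mid = (a[0] + b[0]) / 2;
        ctx.bezierCurveTo(mid, a[1], mid, b[1], b[0], b[1]);  // curves, not corners
      }
      ctx.stroke();
      ctx.fillStyle = p.colour;                // label both ends, not just the left
      ctx.fillText(p.name, p.points[0][0], p.points[0][1] - 6);
      var end = p.points[p.points.length - 1];
      ctx.fillText(p.name, end[0], end[1] - 6);
    });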

Perhaps the least talked-about point in Manuel Lima’s manifesto was ‘Embrace Time’.  I agree with Manuel that we should be working on this – it would be great to see more effort in the area.

Using Javascript for Visualization?

People have been predicting the rise of Javascript visualization implementations for a while now, but is this really going to happen?

First, let’s look at the positive signs:

However, looking around the web, how many examples of visualizations are there? Well, I’ve found some interesting ones like Matt Ryall’s visualizations of wiki data and Social Collider. There are more in the InfoVis research community.

But there aren’t many. So what is preventing it becoming more widespread?

One factor is Microsoft’s stubborn reluctance to support standards like Canvas. For commercial purposes IE is impossible to ignore.
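
In practice that means feature detection and a fallback. A minimal sketch – the two drawing functions are placeholders of mine, and excanvas is Google’s ExplorerCanvas shim which emulates Canvas in IE:

    var canvas = document.createElement('canvas');
    if (canvas.getContext && canvas.getContext('2d')) {
      drawVisualization(canvas.getContext('2d'));   // native Canvas path
    } else {
      // The IE route: either serve a static image instead, or include
      // excanvas.js before this script so that getContext() exists.
      showStaticImageFallback();
    }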

Another factor is the language itself:

Javascript books: a cheap dig!

For me one of the biggest barriers is the development environment.  I’ve tried a few, the best I’ve found being JSEclipse (now part of Flex).  I must be missing something ;-)

So how is this going to develop?  My guess is that we are still a couple of years away from more mainstream adoption.  But there is no doubt that it is coming.

Update: I chatted with Mike Bostock and Marian Dörk at VisWeek about their Javascript environments. Safari and TextMate seemed to be their preferred environments for writing code…

Summer internship at i2, Cambridge UK

Time for a shameless plug! If there are any students of information visualization out there looking for an interesting internship this summer – my company is offering one. Looking forward to hearing from you…

Visualization Goals & Features

What are the goals of visualization? And what are the features that support those goals?

My 10 cents’ worth:

The basic goal is to facilitate reasoning and thought about what is being visualized. That reasoning could revolve around causality, hypothesis, predictions, inferences, habits, modus operandi, contradictions, uncertainty, and a whole host of problems the user is trying to solve. Often the reasoning revolves around external data and/or knowledge too. Visualization should expose structure in the data such as patterns, clusters, gaps, bursts of activity, outliers & trends, etc. And at the end of the reasoning process the great thing about visualization is that one should end up with a picture that can be used to disseminate one’s insights to other people.

So what key features enable these goals to be achieved?
* A Summarization/Overview to give the big picture
* Zoomability
* Drill-down on data for the detail
* Easy navigation around the visual
* Filtering information by category or query
* Different types of visualization expose different patterns (geographic, timeline, textual, lists, link diagrams, etc.)
* Brushing & linking visualizations together can help the filtering & exploration (see the sketch after this list)
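
As promised, a sketch of the brushing & linking plumbing. The view objects here are entirely invented – assume each one can redraw itself given a selection, and reports the ids it has brushed:

    var selection = {};                           // shared set of selected item ids

    function linkViews(views) {
      views.forEach(function (view) {
        view.onBrush = function (brushedIds) {    // fired by the view's own mouse handling
          selection = {};
          brushedIds.forEach(function (id) { selection[id] = true; });
          views.forEach(function (v) { v.redraw(selection); });  // all views update together
        };
      });
    }

    linkViews([scatterplot, timeline, mapView]);  // hypothetical view objects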

Other basic things which must be in place in order to succeed:
* Ease of import and export – and adhering to any standards
* Some basic searching of the data
* One must be able to read the data – in particular any text
* Scaling well as the data size gets very large
* Links out to other systems for further information are key
* Links back in to the visualization from other systems can also be powerful
* Interoperability with other visualization tools and other applications in general
* Commentary, scribbling and drawing on the visualizations is a great way to add understanding – a picture alone is rarely enough

And don’t forget the more esoteric things too:
* It needs to provoke a positive emotional response, so it must look good and not conflict with users’ expectations
* It can use standard visual symbolism, conventions & metaphors
* It must use the basic visual variables well (shape, colour, position, etc.)
* Transitions between visualizations must be smooth to allow the user to keep their context
* It should use design techniques like ‘information scent’ & obvious affordances
* It should facilitate playfulness wherever possible – don’t punish ‘mistakes’!

Phew – glad I got that off my chest – back to the day job :-)

Details Matter

Recently I’ve noticed a few examples where smallish visual design details really matter.

What Colin Ware in “Visual Thinking for Design” refers to as “multiscale structure” is shown off very effectively in Vizster and SocialAction.



Another example is the way that Thinkbase constructs links of similar types:

Notice how the actors & roles aren’t linked directly to the film but go via intermediate nodes. Also see how well the space is used and how the links are short and of similar length – both of these are great for aesthetics and for link-following tasks.

Both these approaches build on the position visual variable to ‘clump’ the like-nodes together effectively. But these designs also add other visual variables (connecting line, enclosing shape) which assist visual pattern finding and, importantly, can offer affordances for interactivity too.
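
Here is a sketch of the Thinkbase-style data transformation, with invented data: rather than one direct actor-film edge per credit, insert a role node between them, so the roles naturally clump around their film.

    // Invented input: flat actor/film/role credits.
    var credits = [
      { actor: 'Sam Neill',  film: 'Jurassic Park', role: 'Dr. Alan Grant' },
      { actor: 'Laura Dern', film: 'Jurassic Park', role: 'Dr. Ellie Sattler' }
    ];

    // Rewrite each actor->film edge as actor->role->film.
    function insertRoleNodes(credits) {
      var nodes = {}, edges = [];
      credits.forEach(function (c) {
        nodes[c.actor] = { type: 'actor' };
        nodes[c.film] = { type: 'film' };
        var roleId = c.film + '/' + c.role;       // role nodes are unique per film
        nodes[roleId] = { type: 'role', label: c.role };
        edges.push([c.actor, roleId]);
        edges.push([roleId, c.film]);
      });
      return { nodes: nodes, edges: edges };
    }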

Design & Reach

Following on from my last post, Mike Danziger and I chatted on email & he wrote up some impressions of the InfoVis conference. Stephen Few responded to some of the points, and a couple (1, 2) of subsequent postings, and some other comments (3, 4) have shown that people are interested. Sorry for being so late in responding myself – the day job sometimes gets in the way!

For me, the key contribution has been Pat Hanrahan’s. I feel the same way & I’m grateful to him for providing some academic respectability to what would otherwise just be my own opinions. From my own pragmatic software industry perspective, I’d like to say something about how his suggestions could be taken forward.

Delivery mechanisms are key: to appeal to the masses one needs reach. Interactive visualizations must be delivered to people’s eyes & to their fingertips. Static images in papers aren’t enough: people don’t have much time or patience & won’t enjoy having to read lots of text in order to learn how the interaction works.

One approach is to put good visualization capability into commonly used tools such as Excel (1). That way people can manage their data themselves. Because the user can load and edit the data behind the visualizations, a high degree of skill is needed when crafting the software so that it has the necessary flexibility. Each tool has different extension points & platforms. In practical terms this means a software company is forced to choose a very small list of supported environments & workflows.

The more obvious route is to exploit the immediacy and universality of web delivery mechanisms. Thanks to Flash, Silverlight & Java there is a huge audience out there with suitable runtimes. It is good to see more and more experimental visualizations using these. (Though problems with data management are still there of course…)

Reach isn’t enough: in order to bring something compelling to people one must embrace designers. Graphic designers, user experience designers, interaction designers, the works! The right kind of designers can keep a visualization clean, useful & informative but also imbue it with style, panache & memorability. There is a design revolution happening now in the software industry & it will sweep up information visualization tools along the way.

The combination of the need for reach and good design is the main reason why I’m so interested in the Adobe platform. Because they already have designers using their tools, they don’t need to woo them to a new platform. Add a massive install base (Flash) and increasingly workable languages (MXML, AS3) and it is hard to dismiss. Nice to see I’m not alone in thinking this.

Sacramento Thoughts

I got back from the IEEE Visualization conference in Sacramento a few days ago – it was highly enjoyable and I met some great people there.

I’ve been struggling to come to terms with the quantity of reading I now have to do. I’ve also found it hard to summarize my thoughts on all that I heard.

I think my personal best paper award would go to Jeff Heer’s “Design Considerations for Collaborative Visual Analytics”.

On a similar topic, Fernanda Viégas said something that caught my attention: instead of focusing on the classic visualization question of scaling the amount of data being visualized, the Many-Eyes project scales the size of the audience.

However, each data set in the Many-Eyes site is isolated. Processing of the data has to be done in advance in order to bring it down to a manageable size, and data sets do not have any intersection points with each other. (Although they do allow comments to refer to other data sets, along with other navigation aids.)

Classic information visualization research seems to follow a pattern something like this:
* Researcher gets hold of a dataset from somewhere.
* They consider various encodings of it.
* While doing that they achieve some level of domain knowledge.
* They develop an isolated visualization system – this is what they spend most of their time on (I can’t blame them – it is the fun bit).
* They achieve some insights of their own which gives them a warm glow.
* Some short evaluation is tacked on to keep the reviewers happy when they get the paper.

From an outsider’s perspective:
* In many cases the dataset is considered in isolation from other potentially interesting & relevant data sets.
* The quality of the encodings chosen depends on the knowledge of the researcher, and this can vary quite a bit.
* The system developed tends to be isolated from other applications & systems – that makes it easier to develop. Often there are no multi-user aspects, but this at least seems to be changing.
* Insights almost always are with regard to knowledge gleaned from outside the data set. E.g., a downturn in the number of farmers (in census data) could be explained by increasing agricultural mechanisation (innate knowledge), or the popularity of a certain baby name might coincide with a celebrity (search for ‘Celine’ here). There is often an implied “cause and effect” hypothesis in these kind of insights.

Going back to Viégas’ comments, I suspect that the true problem lies in scaling not just the audience – though that of course is important – but also the number and type of datasets being visualized.

The ‘perfect visualization tool’ would be able to cope with new data sets being thrown at it. Linkage would be automatically established between elements of the data sets (e.g., Joe Bloggs from one data set would be recognised as the same Joe Bloggs from another data set). The data sets could have a wide variety of schemas and come from wildly different sources. The various visualizations in the tool would be automagically updated with the relevant encoding of the new data, and new visualizations which have suddenly become appropriate would be displayed. The user would be able to reach many new insights because all the data is cross-referenced and, generally speaking, most insights come when combining data. Plus the visualization, being perfect, would show those insights clearly.
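
As a toy illustration of just the linkage step – crude, name-based matching only; real entity resolution is a hard research problem in itself:

    // Merge records from several data sets when their normalised names match.
    function normaliseName(name) {
      return name.toLowerCase().replace(/[^a-z ]/g, '').replace(/\s+/g, ' ').trim();
    }

    function linkRecords(dataSets) {
      var entities = {};                    // normalised name -> records from every set
      dataSets.forEach(function (ds) {
        ds.records.forEach(function (r) {
          var key = normaliseName(r.name);  // 'Joe  Bloggs!' and 'joe bloggs' now match
          (entities[key] = entities[key] || []).push({ source: ds.name, record: r });
        });
      });
      return entities;
    }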

Mike Cammarano’s talk on his work with the DBpedia data was interesting from this angle, in that the data was inherently heterogeneous & extensible. Of course, the Semantic Web research agenda is of interest here too, despite lying outside of information visualization research.

As Matthew Ericson showed, the sheer craft and skill needed to combine data well and communicate it effectively means that it is difficult to see a perfect visualization tool being realised in an automated way. I guess this makes it an interesting research area!

Another aspect of developing web-based social visualizations is that there is much more potential for gathering information about how users actually use the visualizations: server-side logs can be designed to keep track of almost every action. This would lack the rigour of a properly controlled lab experiment, but that would be counterbalanced by the sheer number of possible users, so I’d say there must be huge benefits in this approach. (And of course making sense of the logs could be another data visualization challenge!)
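
The mechanics can be very light indeed – something along these lines, where the /log endpoint and the event vocabulary are whatever the server chooses to define:

    var pending = [];                       // batch events rather than posting one by one

    function logEvent(action, detail) {
      pending.push({ action: action, detail: detail, time: new Date().getTime() });
    }

    setInterval(function () {               // flush to the server every few seconds
      if (pending.length === 0) return;
      var xhr = new XMLHttpRequest();
      xhr.open('POST', '/log', true);
      xhr.setRequestHeader('Content-Type', 'application/json');
      xhr.send(JSON.stringify(pending));
      pending = [];
    }, 5000);

    // Called from the visualization's handlers, e.g.:
    // logEvent('filter', { category: 'farmers' });
    // logEvent('zoom',   { level: 3 });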

On a separate topic I found Stephen Few’s capstone talk rather unsettling – I understand why he is so passionate about designing clear visuals, but sometimes that passion can err on the abrasive side. And that style won’t endear the visualization community to the world out there. I also think he underestimates the power of playfulness and fun in reaching out to an audience – come on – Swivel’s option to ‘bling your graph’ is just funny! Another worry is that the very Spartan style of visuals he favours actually imposes an aesthetic in its own right, for all of its good intentions and intelligent rationale. We should accept some people just won’t like that aesthetic.

However, his tutorial was a really excellent Tuftean summary of all that is great and good about the subject, so I guess he can be forgiven! And when you see graphics like graphwise (thanks Nathan) you can see how much work there is to be done :-)

Visual Variables

I’ve been designing some new visualizations recently, and reading around the InfoVis literature for examples to analyse and best practices to follow. I’ve found myself returning again and again to a diagram in Mackinlay’s 1986 paper and various papers which follow it. So much so that I’ve reworked a version of the diagram and pinned it to the wall behind my desk for quick reference:

It shows a theoretical model of accuracy when performing reasoning tasks with an image. The model wasn’t developed empirically, but experimentation has since backed it up for some comparisons and some analytical tasks. I find it a handy thing to have when working out how visualizations and infographics are put together. And when I have some data that I’m designing a representation for, it helps me choose which visual variables to use.
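
For what it’s worth, here is how I use it day to day. The orderings below are my recollection of the diagram’s top entries – check the paper before leaning on the exact order:

    // Mackinlay's accuracy ranking (abridged, from memory -- verify against the paper).
    var ranking = {
      quantitative: ['position', 'length', 'angle', 'slope', 'area',
                     'volume', 'density', 'colour saturation', 'colour hue'],
      ordinal:      ['position', 'density', 'colour saturation', 'colour hue',
                     'texture', 'connection', 'containment'],
      nominal:      ['position', 'colour hue', 'texture', 'connection',
                     'containment', 'density', 'colour saturation', 'shape']
    };

    // Suggest the most accurate variables not already spent on other fields.
    function suggestEncoding(dataType, used) {
      return ranking[dataType].filter(function (v) { return used.indexOf(v) < 0; });
    }

    suggestEncoding('quantitative', ['position']);   // -> ['length', 'angle', ...]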

Visualization Techniques for Temporal Information

Over the last couple of years I’ve been collecting various articles and links on temporal information visualisation. I thought it was about time I collated them and tried to put some context and analysis around the various techniques. Here is the article (pdf, 1MB).