Visual Analytics for Security

A few weeks ago Jörn Kohlhammer invited me to give a talk at the VisMaster Industry Day in Darmstadt, Germany.  It was a relaxed informal meeting where I caught up with some friends like Enrico Bertini – and I even finally got to meet one of my heroes – Jarke van Wijk – which was really exciting.

My talk was on Visual Analytics for Security.  I gave an overview of the work of analysts in the crime and intelligence worlds and the unique challenges they face. Many of those challenges arise from the subject of their analysis: people, in all their complexity.  I hope this comes across from the slide deck.

Visualization Intern Time Again

Phew, I just got budget approval for another internship – if you or anyone you know might be interested in a visualization internship in Cambridge UK this summer – please apply!

Visual Timelines and Narrative

The well-loved xkcd webcomic posted a great timeline sketch of film plots the other day.

xkcd movie timeline

It was noticed by the visualization & infographic blog community, and Walter Rafelsberger and Daniel McLaren did some nice follow-up work, but for the most part people just seemed to be saying how cool it was and moving on.

I thought I’d try to put it into perspective.

Drawing narrative ‘persona’ lines along a timeline is a common technique.  The ‘persona’ usually represents a physical object – a person for example – and the vertical direction usually represents some sort of proximity.  Often geographic proximity.  Let’s see a few examples…

Marey’s train timetables (you can find them in the Tufte books) drew lines for each train:

Marey's Train Timeline from Tufte's Visual Display of Quantitative Information

From the physics world, Penrose diagrams are a concise depiction of space-time which allow event causality ‘cones’ to be plotted. Typically time runs bottom to top in these diagrams and observers are plotted as lines.

Penrose Diagram

This well-crafted musical visualization (pdf) from Jon Snydal & Marti Hearst has pitch as proximity, and the lines show structural patterns as the motif is repeated with variations:

Improviz Music Timelines

A few years ago, the BBC ran a programme about comedy heroes which I remembered for the credits and title sequences.  They show the interweaving careers of British comedians over the decades.

Here proximity represents collaboration on a TV programme.

The wonderful JunkCharts blog showed this timeline narrative of Wall Street Bank acquisitions:

JunkCharts' Wall Street Acquisitions Timeline

And finally, a few years ago in the day job we put together a system for drawing out diagrams that can convey meetings and assignations:

Mumbai Attacks Timeline

What are the aesthetic and legibility rules that govern this kind of diagram?  Are there rules similar to graph drawing aesthetics?

I think there are some guidelines.

  • Meetings, significant events, etc. can be shown as joining lines: most of the narrative power comes out of this simple drawing metaphor.
  • Other line crossings are to be avoided.
  • When they can’t be avoided,  use a good visual design to allow the eye to follow what is going on.
  • Make sure the line labels are legible across the diagram.  It isn’t any good just labelling the left side because by the time one has scrolled over to the right the labels will be out of view.  On a static picture this means repeating the label as in the xkcd example. Also consider labelling the right hand side too.
  • Colour is good for categories of persona.
  • Colour plays an important role in helping your eye distinguish between lines.
  • Thick lines are easier on the eye than thin ones.
  • Curved lines are preferable to straight lines – they are just easier to follow.
  • Lines can start late and end early.  If that line is a character in a movie, abrupt termination means the worst has happened ;-)
  • Line style can change as the story evolves and you can use this for narrative effect. In the xkcd Jurassic Park example, the dotted line shows a velociraptor is in prison.
  • Parallel lines work really well.
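
The 'avoid other line crossings' guideline can be turned into a number. Here is a minimal sketch, not taken from any of the tools above: it assumes the timeline is stored as a list of time steps, each listing the persona names from top to bottom. The names `count_crossings`, `total_crossings` and the sample `layout` are made up for illustration. Lines that start late or end early are simply absent from some steps.

```python
from itertools import combinations


def count_crossings(order_before, order_after):
    """Count pairs of persona lines that swap vertical order between two steps."""
    # Position of each persona at the later time step.
    pos_after = {name: i for i, name in enumerate(order_after)}
    # Only lines present at both steps can cross (lines may start late or end early).
    shared = [name for name in order_before if name in pos_after]
    crossings = 0
    for upper, lower in combinations(shared, 2):
        # 'upper' was drawn above 'lower' before; a crossing flips that order.
        if pos_after[upper] > pos_after[lower]:
            crossings += 1
    return crossings


def total_crossings(layout):
    """Sum the crossings over every pair of consecutive time steps."""
    return sum(count_crossings(a, b) for a, b in zip(layout, layout[1:]))


# Hypothetical layout: three personas over three time steps, top to bottom.
layout = [
    ["Alice", "Bob", "Carol"],
    ["Bob", "Alice", "Carol"],
    ["Bob", "Carol", "Alice"],
]
print(total_crossings(layout))
```

A layout routine could try different vertical orderings and keep the one with the lowest count, while still letting lines that meet sit next to each other.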

Perhaps the least talked about point in Manuel Lima’s manifesto was ‘Embrace Time’.  I agree with Manuel that we should be working on this and it would be great to see more effort in this area.

Visual Analytics Panel 2009

I’m appearing on a panel at VAST today, talking about the investigation & analysis process in law enforcement & national security.  Here’s what I wrote as a high-level overview:

A common theme across many cases is the discovery of identifiers of interest: names, addresses, phone numbers, email addresses and bank account numbers, amongst others. Patterns of activity are deduced, and the connections between individuals, together with the timing and location of key events such as sightings and phone calls, can lead to hypotheses and lines of inquiry that help drive the direction of the investigation as a whole. Relationship link diagrams, timelines and maps are the three most commonly expressed visualization needs.

Collaboration has been emphasized in recent years. In terms of the typology presented in Illuminating the Path, we find that typically collaboration is asynchronous remote (different time, different place), or synchronous local (same place, same time). In some shift patterns one sees continuous work done by a revolving team (same place, different time), but that is relatively uncommon. Asynchronous remote collaboration is typically achieved by emailing files. This ‘baton-passing’ approach shares a lot with the way that documents are authored in many professions. The key advantage of this approach is that information can be exchanged freely across organizational firewalls: disadvantages are that there is no definitive version of the information and multiple copies of the document can cause confusion.

In the case of (same place, same time) collaboration, this is done using a shared screen at a desk, within meeting rooms equipped with projectors and/or interactive whiteboards, or often done away from the computer entirely in a relatively informal context. In the latter case, printouts of visualizations are often pointed at and scribbled on. Printing is much more important than may be first realized. As cases get complex, it is common to print out the current known state of the case and pin it up on a wall for the investigation team to see and draw on. Evidential and other procedural requirements, especially within the law enforcement domain, mean that visualizations must fit with a ‘paper trail’ of documents.

Analysts have a very strong sense of ownership over the products they produce, and visualizations are no exception. Analysts raise concerns that their visualizations may be misinterpreted when viewed outside of the context of the task at hand. To ameliorate this, and also to facilitate basic reporting needs, visualizations are very commonly embedded as pictures within textual reports. In this state, they lose their interactivity and the consumer cannot ‘drill-down’ on the information represented. Such images are often produced in a separate ‘production’ stage after the analysis has been done. At the reporting stage, it is very common for a visualization to have to fit onto an A4/Letter size piece of paper!  Visual Analytic tools in general tend to neglect the reporting aspects of the job.

For the future, many of the general challenges facing analysts are practical ones. Tool support for versioning, auditing data access, document searching and collaboration could be better. Tools need to be easily deployable by IT staff if they are to have any hope of adoption. The amount of available data is growing, but perhaps more importantly there are now more and more data sources that need to be checked during an investigation. Any help in getting data saves the analyst valuable time. Lastly, improved summarization/aggregation techniques for large data sets would be very welcome.

Visual Analysis Tools: Practical Considerations

So, you’ve spent months working in the lab developing a new visualisation technique or system and you finally got some time with real users. They really like what they’ve seen. You’ve done a good job of writing the paper, it has been accepted and appears on your resume.

But hang on a minute – despite your best intentions and the users’ approval, they aren’t actually using the system right now.  So should you commercialize the idea?  This would mean the ideas are exploited and perhaps could give you some money back for all that hard work.

What are the practical steps you will need to take?

You’ll have to make sure that you actually own the IP on the system too. I’d do that bit first.

Then there is the standard set of business problems like marketing, sales channels, CRM systems, pricing.  And the usual software infrastructure stuff of build systems, installers, change management, testing, documentation.

Installers are a nightmare. ‘But it works on my machine!’ isn’t going to cut it. In real IT environments, the IT manager is a key person you will need on your side. And his/her department will need to test your application for compatibility and other things first.  For desktop applications it isn’t uncommon for deployments to lag from 3 to 5 years behind the current version. That can be very frustrating for you and for your users.

But actually, I’d argue that all those things are a lot easier than the business of really understanding the users’ needs: that is much harder.  Did the new system really improve their performance? Were they just trying to be helpful and polite when they said they liked it? Or are you seeing well-known experimental biases like the observer-expectancy effect or the Hawthorne effect?

What about the user’s workflow?  How does the tool fit into their existing processes?

If it did increase their performance, could they put a value on that?  And I’m not being theoretical – I’m talking about a real dollar value here. Or some measure of success in terms of the business drivers of the organisation. You will struggle to sell it unless you can talk in business terms that your buyers will use.

In this context one has to question statements like ‘the goal of visualisation is insight, not pictures’. Actually I’d argue that the end goal is action, not insight. The true aim is taking better decisions.

Don’t be disheartened: it is a long list, but provided you are delivering enough value and you think about these issues up front, you can save yourself a lot of pain later on. And if you don’t want to think about these things, maybe you could even strike up a licensing deal with someone who does.

Using Javascript for Visualization?

People have been predicting the rise of Javascript visualization implementations for a while now, but is this really going to happen?

First, let’s look at the positive signs:

However, looking about the web, how many examples of visualizations are there? Well, I’ve found some interesting ones like Matt Ryall’s visualizations of wiki data and Social Collider. There are more in the InfoVis research community.

But there aren’t many. So what is preventing it becoming more widespread?

One factor is the stubbornness of Microsoft in its reluctance to support standards like Canvas. For commercial purposes IE is impossible to ignore.

Another factor is the language itself:

Javascript books: a cheap dig!

For me one of the biggest barriers is the development environment.  I’ve tried a few, the best I’ve found being JSEclipse (now part of Flex).  I must be missing something ;-)

So how is this going to develop?  My guess is that we are still a couple of years away from more mainstream adoption.  But there is no doubt that it is coming.

Update: I chatted with Mike Bostock and Marian Dörk at VisWeek about their Javascript environments. Safari and TextMate seemed to be their preferred environments for writing code…

Oops I crashed Gmail

I think I might have crashed Gmail last week. Seriously.

The crash is well documented with the usual set of vague excuses including ‘high load on the service’. But was it my fault?

I had just got a new HTC Hero and was trying to migrate my contacts from my old phone.  I’d got myself into a state where I’d got the new contacts onto the phone from Google, but at that point I realised most of the fields were misaligned.

Not thinking about what I was doing, I deleted all the contacts from the phone and then started to edit my contacts within Gmail. What I didn’t realise was that hundreds of these deleted phone contacts were also being deleted from my Gmail contacts. My phone also locked up. While I was trying to correct my list in Gmail I kept getting these weird errors that the contacts I was editing didn’t exist any more. Very odd. The contacts list was getting shorter and shorter. And suddenly BANG, I got a big error in Gmail saying ‘Your contact list is not available right now, please try again later’.  I tweeted about it and a friend tweeted back to say that everyone else was having the same problem.

Coincidence?  Almost certainly. I guess I’ll never know. I just have this lingering sense of guilt about it ;-)

Proud Sponsors…

I’m very happy to announce that with the new change of management at the day job, i2 will be sponsoring VisWeek this year!

As ever, the conference programme looks exciting. It will be a great chance to meet customers and also to see what the academic community has been doing in 2009. Can’t wait…

Visualizations of Habit and Routine

Lately I’ve become interested in the design of visualizations that draw out patterns in habit & routine. To explain what I mean, here are a bunch of nice examples…

Let’s start with a visualization of a twitter user’s posting habits from xefer.com:

This simple diagram of a baby’s sleep times comes from Trixie Tracker:

Simple but effective! Thanks to Nathan’s flowingdata for these two examples. (See also a wonderful visualization of the stabilization of a baby’s sleep patterns in Winfree “The Timing of Biological Clocks” Page 31, also shown in Card et al “Information Visualization…” Page 5/6).

It seems that some form of heatmap is the most common means of representing habitual behaviours – see e.g., Andrienko et al for a visualization of traffic densities around Milan (red is lots of traffic):

This picture of hotel visitation patterns (Weaver et al) shows the number of visitors over a weekly timescale:

I like the summary at the bottom and right of the main area showing aggregated trends.
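
For anyone wanting to try this kind of habit heatmap on their own data, here is a rough sketch of the binning behind it. It is not the method used in any of the papers above. It simply assumes you have a list of event timestamps and want a day-of-week by hour-of-day grid, plus the row and column totals that play the role of the summaries at the right and bottom. The names `habit_grid`, `marginals` and the sample `events` are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def habit_grid(timestamps):
    """Bin timestamps into a 7 x 24 grid: rows are weekdays (Mon=0), columns are hours."""
    counts = Counter((t.weekday(), t.hour) for t in timestamps)
    return [[counts[(day, hour)] for hour in range(24)] for day in range(7)]


def marginals(grid):
    """Return per-day totals (row sums) and per-hour totals (column sums)."""
    per_day = [sum(row) for row in grid]
    per_hour = [sum(column) for column in zip(*grid)]
    return per_day, per_hour


# Hypothetical events: three Monday-morning events and one late on a Tuesday.
events = [
    datetime(2009, 10, 5, 9, 15),
    datetime(2009, 10, 5, 9, 40),
    datetime(2009, 10, 6, 22, 5),
    datetime(2009, 10, 12, 9, 30),
]

grid = habit_grid(events)
per_day, per_hour = marginals(grid)
print(grid[0][9], per_day[0], per_hour[22])
```

Each cell of `grid` would then be mapped to a colour, with `per_day` and `per_hour` drawn as small bar charts along the edges.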

Nathan Eagle & Alex Pentland’s paper on “Eigenbehaviours” differentiates various routine patterns from a dataset & presents them clearly:

This reminds me of van Wijk & van Selow’s classic paper too.

Does anyone have any suggestions on other visualizations of habits and routines?

Summer internship at i2, Cambridge UK

Time for a shameless plug! If there are any students of information visualization out there looking for an interesting internship this summer – my company is offering one. Looking forward to hearing from you…