Visual Analytics Panel 2009

I’m appearing on a panel at VAST today, talking about the investigation & analysis process in law enforcement & national security.  Here’s what I wrote as a high-level overview:

A common theme across many cases is the discovery of identifiers of interest: names, addresses, phone numbers, email addresses and bank account numbers, amongst others. From these, patterns of activity are deduced. Connections between individuals, and the timing and location of key events such as sightings and phone calls, can lead to hypotheses and lines of inquiry that help drive the direction of the investigation as a whole. Relationship link diagrams, timelines and maps are the three most commonly expressed visualization needs.

Collaboration has been emphasized in recent years. In terms of the typology presented in Illuminating the Path, we find that typically collaboration is asynchronous remote (different time, different place), or synchronous local (same place, same time). In some shift patterns one sees continuous work done by a revolving team (same place, different time), but that is relatively uncommon. Asynchronous remote collaboration is typically achieved by emailing files. This ‘baton-passing’ approach shares a lot with the way that documents are authored in many professions. The key advantage of this approach is that information can be exchanged freely across organizational firewalls: disadvantages are that there is no definitive version of the information and multiple copies of the document can cause confusion.

In the case of (same place, same time) collaboration, this is done using a shared screen at a desk, within meeting rooms equipped with projectors and/or interactive whiteboards, or often done away from the computer entirely in a relatively informal context. In the latter case, printouts of visualizations are often pointed at and scribbled on. Printing is much more important than may be first realized. As cases get complex, it is common to print out the current known state of the case and pin it up on a wall for the investigation team to see and draw on. Evidential and other procedural requirements, especially within the law enforcement domain, mean that visualizations must fit with a ‘paper trail’ of documents.

Analysts have a very strong sense of ownership over the products they produce, and visualizations are no exception. Analysts raise concerns that their visualizations may be misinterpreted when viewed outside of the context of the task at hand. To ameliorate this, and also to facilitate basic reporting needs, visualizations are very commonly embedded as pictures within textual reports. In this state, they lose their interactivity and the consumer cannot ‘drill-down’ on the information represented. Such images are often produced in a separate ‘production’ stage after the analysis has been done. At the reporting stage, it is very common for a visualization to have to fit onto an A4/Letter size piece of paper!  Visual Analytic tools in general tend to neglect the reporting aspects of the job.

For the future, many of the general challenges facing the field are practical ones. Tool support for versioning, auditing data access, document searching and collaboration could be better. Tools need to be easily deployable by IT staff if they are to have any hope of adoption. The amount of available data is growing, but perhaps more importantly there are now more and more data sources that need to be checked during an investigation. Any help in getting data saves the analyst valuable time. Lastly, improved summarization/aggregation techniques for large data sets would be very welcome.

Visual Analysis Tools: Practical Considerations

So, you’ve spent months working in the lab developing a new visualisation technique or system and you finally got some time with real users. They really like what they’ve seen. You’ve done a good job of writing the paper, it has been accepted and appears on your resume.

But hang on a minute – despite your best intentions and the users’ approval, they aren’t actually using the system right now.  So should you commercialize the idea?  This would mean the ideas are exploited and perhaps could give you some money back for all that hard work.

What are the practical steps you will need to take?

You’ll have to make sure that you actually own the IP on the system too. I’d do that bit first.

Then there are the standard set of business problems like marketing, sales channels, CRM systems, pricing.  And the usual software infrastructure stuff of build systems, installers, change management, testing, documentation.

Installers are a nightmare. ‘But it works on my machine!’ isn’t going to cut it. In real IT environments, the IT manager is a key person you will need on your side. And his/her department will need to test your application for compatibility and other things first.  For desktop applications it isn’t uncommon for deployments to lag from 3 to 5 years behind the current version. That can be very frustrating for you and for your users.

But actually, I’d argue that all those things are a lot easier than the business of really understanding the users’ needs: that is much harder.  Did the new system really improve their performance? Were they just trying to be helpful and polite when they said they liked it? Or are you seeing well-known experimental biases like the observer-expectancy effect or the Hawthorne effect?

What about the user’s workflow?  How does the tool fit into their existing processes?

If it did increase their performance, could they put a value on that?  And I’m not being theoretical – I’m talking about a real dollar value here. Or some measure of success in terms of the business drivers of the organisation. You will struggle to sell it unless you can talk in business terms that your buyers will use.

In this context one has to question statements like ‘the goal of visualisation is insight, not pictures’. Actually I’d argue that the end goal is action, not insight. The true aim is taking better decisions.

Don’t be disheartened: these issues make a long list, but provided you are delivering enough value and you think about them up-front, you can save yourself a lot of pain later on. And if you don’t want to think about these things, maybe you could even strike up a licensing deal with someone who does.

Using Javascript for Visualization?

People have been predicting the rise of Javascript visualization implementations for a while now, but is this really going to happen?

First, let’s look at the positive signs:

However, looking about the web, how many examples of visualizations are there? Well, I’ve found some interesting ones like Matt Ryall’s visualizations of wiki data and Social Collider. There are more in the InfoVis research community.

But there aren’t many. So what is preventing it becoming more widespread?

One factor is the stubbornness of Microsoft in its reluctance to support standards like Canvas. For commercial purposes IE is impossible to ignore.

Another factor is the language itself:

Javascript books: a cheap dig!


For me one of the biggest barriers is the development environment.  I’ve tried a few, the best I’ve found being JSEclipse (now part of Flex).  I must be missing something ;-)

So how is this going to develop?  My guess is that we are still a couple of years away from more mainstream adoption.  But there is no doubt that it is coming.

Update: I chatted with Mike Bostock and Marian Dörk at VisWeek about their Javascript environments. Safari and TextMate seemed to be their preferred environments for writing code…

Oops I crashed Gmail

I think I might have crashed Gmail last week. Seriously.

The crash is well documented with the usual set of vague excuses including ‘high load on the service’. But was it my fault?

I had just got a new HTC Hero and was trying to migrate my contacts from my old phone.  I’d got myself into a state where I’d got the new contacts onto the phone from Google, but at that point I realised most of the fields were misaligned.

Not thinking about what I was doing, I deleted all the contacts from the phone and then started to edit my contacts within Gmail. What I didn’t realise was that hundreds of these deleted phone contacts were also being deleted from my Gmail contacts. My phone also locked up. While I was trying to correct my list in Gmail I kept getting these weird errors that the contacts I was editing didn’t exist any more. Very odd. The contacts list was getting shorter and shorter. And suddenly BANG, I got a big error in Gmail saying ‘Your contact list is not available right now, please try again later’.  I tweeted about it and a friend tweeted back to say that everyone else was having the same problem.

Coincidence?  Almost certainly. I guess I’ll never know. I just have this lingering sense of guilt about it ;-)

Proud Sponsors…

I’m very happy to announce that with the new change of management at the day job, i2 will be sponsoring VisWeek this year!

As ever, the conference programme looks exciting. It will be a great chance to meet customers and also to see what the academic community has been doing in 2009. Can’t wait…

Visualizations of Habit and Routine

Lately I’ve become interested in the design of visualizations that draw out patterns in habit & routine. To explain what I mean, here are a bunch of nice examples…

Let’s start with a visualization of a twitter user’s posting habits from xefer.com:


This simple diagram of a baby’s sleep times comes from Trixie Tracker:
Simple but effective! Thanks to Nathan’s flowingdata for these two examples. (See also a wonderful visualization of the stabilization of a baby’s sleep patterns in Winfree “The Timing of Biological Clocks” Page 31, also shown in Card et al “Information Visualization…” Page 5/6).

It seems that some form of heatmap is the most common means of representing habitual behaviours – see e.g., Andrienko et al for a visualization of traffic densities around Milan (red is lots of traffic):
This picture of hotel visitation patterns (Weaver et al) shows the number of visitors over a weekly timescale:
I like the summary at the bottom and right of the main area showing aggregated trends.
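As a rough illustration of the kind of weekly heatmap described above, here is a minimal Python sketch that bins event timestamps into a day-of-week by hour-of-day grid and computes the row and column totals that could feed the summaries at the right and bottom. The function name `habit_grid` and the sample events are made up for illustration; this is not how any of the systems mentioned here are implemented.

```python
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def habit_grid(timestamps):
    """Count events per (day of week, hour of day) cell.

    Returns the 7x24 grid plus per-day and per-hour totals,
    which can be drawn as summary bars beside the heatmap.
    """
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday
        grid[ts.weekday()][ts.hour] += 1
    per_day = [sum(row) for row in grid]          # right-hand summary
    per_hour = [sum(col) for col in zip(*grid)]   # bottom summary
    return grid, per_day, per_hour


# Hypothetical sample data: two Monday-morning events and one late on Tuesday
events = [
    datetime(2009, 10, 12, 9, 15),
    datetime(2009, 10, 12, 9, 45),
    datetime(2009, 10, 13, 22, 5),
]
grid, per_day, per_hour = habit_grid(events)
for day, total in zip(DAYS, per_day):
    print(day, total)
```

Each cell count would then be mapped to a colour, with the totals giving the aggregated trends.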

Nathan Eagle & Alex Pentland’s paper on “Eigenbehaviours” differentiates various routine patterns from a dataset & presents them clearly:
This reminds me of van Wijk & van Selow’s classic paper too.

Does anyone have any suggestions on other visualizations of habits and routines?

Summer internship at i2, Cambridge UK

Time for a shameless plug! If there are any students of information visualization out there looking for an interesting internship this summer – my company is offering one. Looking forward to hearing from you…

The Search Box

I’ve been a fan of Google Suggest since I first saw it – and I think that the news that it will now become the default experience for all Google searches is very significant because it will change everyone’s expectations of what should happen when they start typing into a search box everywhere:

Of course several search boxes already do this kind of thing (the search box & ‘awesome bar’ in Firefox 3, the ‘omnibox’ in Google Chrome) but I believe that its use will become widespread not just for web based searching but also within desktop applications.

In 2005 just after Google Suggest came out I implemented my own version of it – it doesn’t take much to get the basics right – you just need a reverse index. A bit more work and some stats can help eliminate common ‘stop words’ & give spelling suggestions; some further work and you can index two or three word phrases. Some of this stuff ended up in a desktop product in my day job.

Of course in a disconnected desktop product one doesn’t have the huge amount of statistical information on what is searched for & clicked on, but to get a simple version of ‘suggest’ up and running can be quite quick and gives the user real tangible benefits.
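To make the basics concrete, here is a minimal Python sketch of a ‘suggest’ index along the lines described above. It is not the code that went into the product: the class name `SuggestIndex`, the tiny stop-word list and the prefix-to-term lookup (standing in for the reverse index) are all assumptions made for illustration, and spelling suggestions are left out.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real one could be derived from corpus statistics.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to"}


class SuggestIndex:
    def __init__(self, max_phrase_len=2):
        self.max_phrase_len = max_phrase_len
        self.counts = Counter()             # term or phrase -> frequency
        self.by_prefix = defaultdict(set)   # typed prefix -> matching terms

    def add_document(self, text):
        words = re.findall(r"[a-z0-9]+", text.lower())
        for n in range(1, self.max_phrase_len + 1):
            for i in range(len(words) - n + 1):
                phrase = words[i:i + n]
                # Skip terms that start or end with a stop word
                if phrase[0] in STOP_WORDS or phrase[-1] in STOP_WORDS:
                    continue
                term = " ".join(phrase)
                self.counts[term] += 1
                for k in range(1, len(term) + 1):
                    self.by_prefix[term[:k]].add(term)

    def suggest(self, prefix, limit=5):
        candidates = self.by_prefix.get(prefix.lower().lstrip(), ())
        # Most frequent first, ties broken alphabetically
        ranked = sorted(candidates, key=lambda t: (-self.counts[t], t))
        return ranked[:limit]


index = SuggestIndex()
index.add_document("Bank account numbers and bank statements")
index.add_document("The bank account of the suspect")
print(index.suggest("ba"))
```

Storing every prefix is wasteful for large vocabularies; a sorted term list or a trie would do the same job in less memory, but the dictionary keeps the sketch short.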

The search box _should_ never be the same again…

Visualization Goals & Features

What are the goals of visualization? And what are the features that support those goals?

My 10 cents worth:

The basic goal is to facilitate reasoning and thought about what is being visualized. That reasoning could revolve around causality, hypothesis, predictions, inferences, habits, modus operandi, contradictions, uncertainty, and a whole host of problems the user is trying to solve. Often the reasoning revolves around external data and/or knowledge too. Visualization should expose structure in the data such as patterns, clusters, gaps, bursts of activity, outliers & trends, etc. And at the end of the reasoning process the great thing about visualization is that one should end up with a picture that can be used to disseminate one’s insights to other people.

So what key features enable these goals to be achieved?
* A Summarization/Overview to give the big picture
* Zoomability
* Drill-down on data for the detail
* Easy navigation around the visual
* Filtering information by category or query
* Different types of visualization expose different patterns (geographic, timeline, textual, lists, link diagrams, etc.)
* Brushing & linking visualizations together can help the filtering & exploration

Other basic things which must be in place in order to succeed:
* Ease of import and export – and adhering to any standards
* Some basic searching of the data
* One must be able to read the data – in particular any text
* Scaling well as the data size gets very large
* Links out to other systems for further information are key
* Links back in to the visualization from other systems can also be powerful
* Interoperability with other visualization tools and other applications in general
* Commentary, scribbling and drawing on the visualizations is a great way to add understanding – a picture alone is rarely enough

And don’t forget the more esoteric things too:
* It needs a positive emotional response so it must look good and not conflict with users’ expectations
* It can use standard visual symbolism, conventions & metaphors
* It must use the basic visual variables well (shape, colour, position, etc.)
* Transitions between visualizations must be smooth to allow the user to keep their context
* It should use design techniques like ‘information scent’ & obvious affordances
* It should facilitate playfulness wherever possible – don’t punish ‘mistakes’!

Phew – glad I got that off my chest – back to the day job :-)

Intellipedia

I can’t remember when I first read about Intellipedia, but I know it was a long time ago. The other day I watched a really inspirational video from the people inside the CIA who successfully overcame typical middle-management obstacles in order to produce it. I particularly like their comments on the evils of email and shared drives for collaboration. The original proposal document is also well worth a read.

Interesting too that Intellipedia’s iVideo sharing component reportedly uses Adobe Flash, but there is no public mention of using visualization tools yet, despite visualization techniques being widely used in the intelligence analysis community.