Interesting year…

In January 2011 I resigned from my safe cushy job. Security blanket discarded, I read lots of start-up books & met lots of business people. I decided to start a company. After a bit of bureaucratic admin, I began to assemble a team, a group of suppliers for things that could be outsourced & a network of business mentors and advisors. Thanks to the friendly folks at ideaSpace the company now has an office in a great location alongside a great bunch of other start-ups.

Meanwhile and throughout the year I wrote a shed load of code. I abandoned Python (temporarily?) and threw myself into JavaScript both client-side and server-side, embracing nodeJS and many other projects. The vibrancy, diversity and downright helpfulness of the people in the JS community has been massively galvanizing.

Alongside the coding I started the hunt for a lead customer – someone who’d value the ideals of the company and benefit from the growing code-base. Following some good fortune, we found an opportunity. After a long negotiation we signed a deal that served both parties’ interests, and delivery began in earnest. Having people bash, prod and break things, while suggesting feature after feature, has been incredibly useful. We couldn’t have asked for better partners. And in December we completed the first part of the contract successfully.

I’ve been blown away by the enthusiasm friends have for the venture and the wholehearted support of my family. I didn’t predict how much I’d learn about my own character, flaws and all. I’ve made some mistakes, sure, but nothing I can’t back out of later on.

It has been an amazing year, but the real test is coming in 2012. Can we build a brand and repeat the business model? What else can we make and how can we sell it? What will happen when the product is released to the public? A collective shrug of the shoulders or a genuine pipeline of customers?

With these questions and many more, it is challenging to work out which are the important ones and which should be tackled first. I never expected this seemingly perpetual uncertainty. Nor the constant testing of one’s abilities in unfamiliar situations. But I’ve never felt so alive and so enthusiastic about the future :-)

Me Me Me Me Me


I’ve worked at i2 for more than twelve years.

I’ve seen one office move, two company sales, three major releases, one acquisition, five CEOs and countless reorganisations. I’ve seen the company grow from 30 to about 300 people. I’ve worked for three line managers, under three CTOs. I’ve moved desks eight times.  Since I joined there have been five major versions of Windows. I’ve managed six different people. I’ve had five different roles. I’ve done ten trips to the US, attended seven major conferences, sponsored two of them, done five user groups, attended twelve international workshops & spoken at twenty+ events.

I’ve programmed in six major languages. I’ve worked for 2996 working days. I’ve had roughly 7000 meetings & sent 160,000 emails.

I’ve had enough – I’m moving on.

While I’ve enjoyed the career-development opportunities that come with being in a growing company, I’ve often found myself yearning for that small-company feeling, where it is easier to innovate & be in closer touch with customers. I’m looking for a leaner, meaner environment where one person can make a big impact. I want to throw myself back into code again.

Over the last few years I’ve found my own preferences & prejudices diverging from the company line. I’ve spent a lot of time in newer, more open technology stacks. I’ve stopped using any Microsoft tools. I’m really not a big fan of heavy ‘enterprise’ frameworks, and prefer a lighter, quicker way. I’ve grown to love Python and JavaScript rather than C#. Increasingly I’ve found myself playing with web frameworks rather than desktop tools. There will be some fascinating convergences in the next few years, and I want to be on the front line as they happen.

Bring it on!

 

My Sunbelt Top 7

Last week I was lucky enough to get to INSNA’s Sunbelt conference in Florida. Here are my top seven papers:
  1. Kevin Lewis presented some work done with Andrew Papachristos on the structure of gang warfare in Chicago using data on inter-gang murders.  Kevin described putting a stronger methodology on data from an earlier paper (pdf). One thing I loved about it was the mapping of street terms to abstract network structures – ‘payback’ = reciprocity, ‘untouchables’ = high out-degree, low in-degree, etc.
  2. Jamie F Olson talked about the statistical properties of centrality measures of communication networks over time. I didn’t quite grok the talk but the gist was that by varying the time-window size and comparing centralities across time periods it is possible to identify the ‘best’ sampling window for the network. For example, he showed that a week was a good period to sample some email data. Apparently a preprint may be available soon on his personal page – I’ll be looking out for that!
  3. Ulrik Brandes (with Bobo Nick) gave a beautifully crafted visualisation design paper. They used Gestalt principles to put together a sparkline-inspired glyph for showing network dynamics. Very elegant.
  4. Elisha Peterson had some smart ideas for keeping node positions stable in visualisations of dynamic networks. He did this by putting springs between the versions of the same node in adjacent time slices (before & after). It seemed to make things more stable at the expense of some computational complexity.
  5. Lin Freeman shared his insights on the many ways of finding cohesive subgroups in networks. He gave a clear and concise history of various methods from the social sciences, maths & physics, then outlined measures of success (modularity Q, EI conductance, Freeman Segregation index, Pearson’s correlation ratio) before running the algorithms over a collection of data sets. Success depends not only on the algorithm but also, of course, on the cohesiveness of the data. Conclusion? Good: Correspondence Analysis, Leading Eigenvector, WalkTrap, Fast Greedy. Not so good: Factions, Tabu, others. I hope this work gets written up in a review paper soon.
  6. Mark Lauchs talked about the networks involved in a massive police corruption case in Queensland, Australia, which were exposed by the Fitzgerald Inquiry. This talk demonstrated that it is probable that ‘dark networks’ can never be found automatically: the bad cops were structurally similar to the good cops. The only practical way of uncovering the network inside is to identify at least one bad egg, and then use network structures to work outward from there to get the wider picture.
  7. Joshua Marineau had some interesting insight into the benefit of negative ties within an organisation. Although it has been shown that individuals who have negative ties under-perform, he claimed that being positively connected to someone who themselves has negative ties can actually be an advantage.
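Modularity Q was one of the success measures in Lin Freeman’s comparison. Here is a minimal sketch of Newman’s modularity for a given partition of an undirected, unweighted graph. It computes the fraction of edges inside each community minus the fraction you would expect at random given the degrees. The function name is my own, purely for illustration. This is not the code used in the talk, and NetworkX ships its own implementation.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman's modularity Q for an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    """
    m = 0
    internal = defaultdict(int)    # edges with both ends in community c
    degree_sum = defaultdict(int)  # total degree of the nodes in community c
    for u, v in edges:
        m += 1
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    if m == 0:
        raise ValueError("graph has no edges")
    # Q = sum over communities of (observed internal fraction) - (expected fraction)^2
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two triangles joined by a single bridging edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, split))  # 5/14, about 0.357
```

Putting every node in one community always gives Q = 0, so higher values mean the partition captures more internal cohesion than chance would.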
The legendary hospitality suite was as friendly as ever too ;-)

The Birth of a Link

This diagram is absolutely fascinating.
It appears in Easley & Kleinberg’s new book and originally comes from an excellent paper by Crandall et al. (2008) (pdf).
It is a sort of anatomy of how links between people are created: it tries to capture the birth moment and the forces before and after it.
The upward curve is intriguing but straightforward to explain by homophily – like seeking like.
The most interesting bit is the curve just before the first communication occurs. People suddenly become more similar: a kind of gravitational attraction occurs in the affiliation network and the first communication is sparked into life, closing the triads.
Although it is tempting to explain this by building physics-based models, as the paper does, I can’t help feeling there is a simpler explanation. I would guess that the base of the curve is generally where ‘awareness’ happens. At this moment the two editors become aware of each other, and a basic psychological effect takes over: simple curiosity. People actively seek each other out, viewing each other’s activities and building a picture of what kind of person the other is. Partly this is also to de-risk the first encounter in order to make the right first impression.
It isn’t often that one sees abstract concepts like curiosity in science, but I guess that is the power of big data & a great set of research questions ;-)
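To make ‘getting more similar’ concrete, here is a minimal sketch that tracks how two editors’ similarity evolves over time. It assumes similarity is the Jaccard overlap of the sets of articles each editor has edited so far. That is one plausible choice, and Crandall et al.’s exact measure and time units may differ. The function names and the toy data are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two affiliation sets: |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_over_time(edits_a, edits_b, times):
    """edits_a, edits_b: lists of (timestamp, article) pairs, one list per editor.

    Returns the Jaccard similarity of the articles each editor had edited
    up to and including each timestamp in `times`.
    """
    result = []
    for t in times:
        seen_a = {article for ts, article in edits_a if ts <= t}
        seen_b = {article for ts, article in edits_b if ts <= t}
        result.append(jaccard(seen_a, seen_b))
    return result

alice = [(1, "X"), (2, "Y"), (5, "Z")]
bob = [(1, "W"), (3, "Y"), (4, "Z")]
print(similarity_over_time(alice, bob, [1, 2, 3, 4, 5]))
```

Aligning many such curves on the moment of first contact, and averaging them, would give a picture like the one in the diagram.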

Dublin

Dublin Trip

Tomorrow I’m off to Ireland with an i2 colleague – we’re taking part in the Visual Analysis of Complex Networks (VACN) Workshop & Visualization Cook-Off Competition at the Complex & Adaptive Systems Laboratory at University College Dublin.

We’re going to be talking about some recent updates we’ve made to Analyst’s Notebook and some of our future plans. More excitingly, we are going to spend some quality time with the authors of some brilliant open source tools – Gephi, Tulip, Visone & Pajek. Each of these is a fantastic tool in its own right. It is going to be fun to find out how the tools the day job has developed over many years compare against the young upstarts in the field ;-)

If you are in the area, drop by or drop me a note if you fancy meeting up!

Social Network Analysis

One of the nice things about my new role is that I get to find out what is happening in lots of other research areas.

I’m really delighted that I persuaded i2 to donate some money to INSNA, the international network of social network analysts. Next week we’ll be travelling to Italy to attend their annual conference and I’m really looking forward to spending some time with the community.

I’m going to be learning about NetworkX and CASOS/*ORA from the experts and giving a citation prize to Mark Newman for his work on betweenness centrality. We’ll be going to lots of interesting sessions and learning about the current state of the art in social network analysis, with a view to helping us choose the next steps in our SNA programme.

And it just so happens that Italy is my favourite country and June is one of the best months to visit. It’s a hard life!

Visual Analytics for Security

A few weeks ago Jörn Kohlhammer invited me to give a talk at the VisMaster Industry Day in Darmstadt, Germany.  It was a relaxed informal meeting where I caught up with some friends like Enrico Bertini – and I even finally got to meet one of my heroes – Jarke van Wijk – which was really exciting.

My talk was on Visual Analytics for Security. I gave an overview of the work of analysts in the crime and intelligence worlds and the unique challenges they face. Many of those challenges arise from the subject of their analysis: people, in all their complexity. I hope this comes across from the slide deck.

Visualization Intern Time Again

Phew, I just got budget approval for another internship – if you or anyone you know might be interested in a visualization internship in Cambridge UK this summer – please apply!

Visual Timelines and Narrative

The well-loved xkcd webcomic posted a great timeline sketch of film plots the other day.

xkcd movie timeline

It was noticed by the visualization & infographic blog community, and Walter Rafelsberger and Daniel McLaren did some nice follow up work, but for the most part people just seemed to be saying how cool it was and moving on.

I thought I’d try to put it into perspective.

Drawing narrative ‘persona’ lines along a timeline is a common technique.  The ‘persona’ usually represents a physical object – a person for example – and the vertical direction usually represents some sort of proximity.  Often geographic proximity.  Let’s see a few examples…

Marey’s train timetables (you can find them in the Tufte books) drew lines for each train:

Marey's Train Timeline from Tufte's Visual Display of Quantitative Information

From the physics world, Penrose diagrams are a concise depiction of space-time which allow event causality ‘cones’ to be plotted. Typically time runs bottom to top in these diagrams and observers are plotted as lines.

Penrose Diagram

This well-crafted musical visualization (pdf) from Jon Snydal & Marty Hearst has pitch as proximity, and the lines show structural patterns as the motif is repeated with variations:

Improviz Music Timelines

A few years ago, the BBC ran a programme about comedy heroes which I remembered for the credits and title sequences.  They show the interweaving careers of British comedians over the decades.

Here proximity represents collaboration on a TV programme.

The wonderful JunkCharts blog showed this timeline narrative of Wall Street bank acquisitions:

JunkChart's Wall Street Acquisitions Timeline

And finally, a few years ago in the day job we put together a system for drawing out diagrams that can convey meetings and assignations:

Mumbai Attacks Timeline

What are the aesthetic and legibility rules that govern these kinds of diagrams? Are there rules similar to graph drawing aesthetics?

I think there are some guidelines.

  • Meetings, significant events, etc. can be shown as joining lines: most of the narrative power comes out of this simple drawing metaphor.
  • Other line crossings are to be avoided.
  • When they can’t be avoided,  use a good visual design to allow the eye to follow what is going on.
  • Make sure the line labels are legible across the diagram.  It isn’t any good just labelling the left side because by the time one has scrolled over to the right the labels will be out of view.  On a static picture this means repeating the label as in the xkcd example. Also consider labelling the right hand side too.
  • Colour is good for categories of persona.
  • Colour plays an important role in helping your eye distinguish between lines.
  • Thick lines are easier on the eye than thin ones.
  • Curved lines are preferable to straight lines – they are just easier to follow.
  • Lines can start late and end early. If that line is a character in a movie, abrupt termination means the worst has happened ;-)
  • Line style can change as the story evolves and you can use this for narrative effect. In the xkcd Jurassic Park example, the dotted line shows a velociraptor is in prison.
  • Parallel lines work really well.
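The ‘avoid other line crossings’ guideline can be measured, much as crossing counts are in layered graph drawing. Here is a minimal sketch under some simplifying assumptions. Each time step is a top-to-bottom ordering of persona names. Lines between adjacent steps are drawn so that two lines cross exactly when their vertical order flips. Meetings (deliberately joined lines) are not modelled. The function name is hypothetical.

```python
from itertools import combinations

def count_crossings(slices):
    """Count pairwise line crossings in a persona timeline.

    slices: list of lists; each inner list holds the personas present at one
    time step, ordered top to bottom. Lines may start late or end early, so
    only personas present in both adjacent slices are compared.
    """
    total = 0
    for before, after in zip(slices, slices[1:]):
        pos_before = {p: i for i, p in enumerate(before)}
        pos_after = {p: i for i, p in enumerate(after)}
        shared = [p for p in before if p in pos_after]
        for p, q in combinations(shared, 2):
            # A pair crosses if its vertical order flips between the two slices.
            if (pos_before[p] - pos_before[q]) * (pos_after[p] - pos_after[q]) < 0:
                total += 1
    return total

# A and B swap places once; A's line then ends early.
print(count_crossings([["A", "B", "C"], ["B", "A", "C"], ["B", "C"]]))  # 1
```

A layout routine could use a count like this to compare candidate orderings at each time step and keep the one with fewer crossings.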

Perhaps the least talked about point in Manuel Lima’s manifesto was ‘Embrace Time’.  I agree with Manuel that we should be working on this and it would be great to see more effort in this area.

Visual Analytics Panel 2009

I’m appearing on a panel at VAST today, talking about the investigation & analysis process in law enforcement & national security.  Here’s what I wrote as a high-level overview:

A common theme across many cases is the discovery of identifiers of interest: names, addresses, phone numbers, email addresses and bank account numbers, amongst others. Patterns of activity, connections between individuals, and the timing and location of key events such as sightings and phone calls are deduced; these can lead to the generation of hypotheses or lines of inquiry that help drive the direction of the investigation as a whole. Relationship link diagrams, timelines and maps are the three most commonly expressed visualization needs.

Collaboration has been emphasized in recent years. In terms of the typology presented in Illuminating the Path, we find that collaboration is typically asynchronous remote (different time, different place) or synchronous local (same place, same time). In some shift patterns one sees continuous work done by a revolving team (same place, different time), but that is relatively uncommon. Asynchronous remote collaboration is typically achieved by emailing files. This ‘baton-passing’ approach shares a lot with the way documents are authored in many professions. Its key advantage is that information can be exchanged freely across organizational firewalls; its disadvantages are that there is no definitive version of the information, and multiple copies of the document can cause confusion.
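The time/place typology is a simple two-by-two matrix. Here is a minimal sketch of it as a data structure, with each cell annotated by the practices described here. The enum and function names are illustrative only and are not taken from Illuminating the Path.

```python
from enum import Enum

class Collaboration(Enum):
    """The four cells of the time/place collaboration matrix."""
    SYNCHRONOUS_LOCAL = ("same time", "same place")         # shared screens, meeting rooms, wall charts
    ASYNCHRONOUS_LOCAL = ("different time", "same place")   # revolving shift teams (relatively uncommon)
    SYNCHRONOUS_REMOTE = ("same time", "different place")
    ASYNCHRONOUS_REMOTE = ("different time", "different place")  # emailing files ('baton-passing')

def classify(same_time: bool, same_place: bool) -> Collaboration:
    """Map a collaboration session onto the time/place matrix."""
    if same_place:
        return Collaboration.SYNCHRONOUS_LOCAL if same_time else Collaboration.ASYNCHRONOUS_LOCAL
    return Collaboration.SYNCHRONOUS_REMOTE if same_time else Collaboration.ASYNCHRONOUS_REMOTE

print(classify(same_time=False, same_place=False).value)  # ('different time', 'different place')
```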

In the case of (same place, same time) collaboration, this is done using a shared screen at a desk, within meeting rooms equipped with projectors and/or interactive whiteboards, or often done away from the computer entirely in a relatively informal context. In the latter case, printouts of visualizations are often pointed at and scribbled on. Printing is much more important than may be first realized. As cases get complex, it is common to print out the current known state of the case and pin it up on a wall for the investigation team to see and draw on. Evidential and other procedural requirements, especially within the law enforcement domain, mean that visualizations must fit with a ‘paper trail’ of documents.

Analysts have a very strong sense of ownership over the products they produce, and visualizations are no exception. Analysts raise concerns that their visualizations may be misinterpreted when viewed outside of the context of the task at hand. To ameliorate this, and also to facilitate basic reporting needs, visualizations are very commonly embedded as pictures within textual reports. In this state, they lose their interactivity and the consumer cannot ‘drill-down’ on the information represented. Such images are often produced in a separate ‘production’ stage after the analysis has been done. At the reporting stage, it is very common for a visualization to have to fit onto an A4/Letter size piece of paper!  Visual Analytic tools in general tend to neglect the reporting aspects of the job.

For the future, many of the general challenges facing analysts are practical ones. Tool support for versioning, auditing data access, document searching and collaboration could be better. Tools need to be easily deployable by IT staff if they are to have any hope of adoption. The amount of available data is growing, but perhaps more importantly there are now more and more data sources that need to be checked during an investigation. Any help in getting at the data saves the analyst valuable time. Lastly, improved summarization/aggregation techniques for large data sets would be very welcome.