Archive for the 'Uncategorized' Category

Naming the baby

After nine months’ gestation, it is time to name the baby.

The company is called Cambridge Intelligence.  Choosing names is often difficult but this was straightforward.  We’re in Cambridge and the product line adds a layer of intelligence to your data. Plus we’re from an intelligence community background.

We know we’re setting ourselves up for a fall.  When we do something dumb, as we inevitably will at some point, people will laugh and say ‘not so intelligent now, are you?’  But hey, one has to start somewhere!

So what is the product and the opportunity?

Lots of visualisation systems are kind of old-fashioned. Remember the days when you had to get a CD in the mail before installing new software?  Now you get a download link instead, but there are still a lot of problems with any approach that needs an install.  Desktop software creates problems for enterprises.  Desktop software has a massive total cost of ownership.  It is far cheaper and easier to deploy a web application from a server and make it available to hundreds of people directly.

Most apps in enterprises are already deployed wholly in the browser.  But visualisation systems remain stubbornly desktop based. Vendors often claim that access to the local machine is necessary for performance reasons.

With the power of current browsers, that position is no longer sustainable.

Our first product is called KeyLines.

KeyLines is a commercial strength software development kit for visualising network data in your browser. It is designed to fit into your Service Oriented Architecture (SOA).

Lots of developers are either just starting out in JavaScript, or don’t want to make the transition from strongly typed languages like Java & .NET, where they are comfortable with their choices and have great tool support.  And many more developers don’t have experience of developing graphical applications.

These developers can pick up KeyLines, and with a minimal amount of JavaScript they can have a graphical component embedded in their web application.  KeyLines handles all the rendering code & event handling.  The developer decides what data should be shown and how.

KeyLines works everywhere in the enterprise – even on old machines running IE – as well as the CEO’s beloved iPad ;-)

Developers can keep their server-stack the same: KeyLines is agnostic about where the data comes from.

Development managers and system integrators will be happy too: their development costs can be kept low.  The last thing they need is an over-confident developer who spends a year developing something that would be easier to just buy in.  And the visualization part will be supported on proper commercial terms, long after the development team has moved on to other things.

KeyLines can be used anywhere that understanding networks is important: that is, anywhere one can get business value from looking at and analysing them.

So. That is the theory. Time will tell if we are right.

Contact us if you want to learn more!

 

Interesting year…

In January 2011 I resigned from my safe cushy job. Security blanket discarded, I read lots of start-up books & met lots of business people. I decided to start a company. After a bit of bureaucratic admin, I began to assemble a team, a group of suppliers for things that could be outsourced & a network of business mentors and advisors. Thanks to the friendly folks at ideaSpace the company now has an office in a great location alongside a great bunch of other start-ups.

Meanwhile, and throughout the year, I wrote a shed load of code. I abandoned Python (temporarily?) and threw myself into JavaScript both client-side and server-side, embracing Node.js and many other projects. The vibrancy, diversity and downright helpfulness of the people in the JS community has been massively galvanizing.

Alongside the coding I started the hunt for a lead customer – someone who’d value the ideals of the company and benefit from the growing code-base. Following some good fortune, we found an opportunity. After a long negotiation we signed the deal to our mutual interests and delivery began in earnest. Having people bash, prod and break things, together with suggesting feature after feature has been incredibly useful. We couldn’t have asked for better partners. And in December we completed the first part of the contract successfully.

I’ve been blown away by the enthusiasm friends have for the venture and the wholehearted support of my family. I didn’t predict how much I’d learn about my own character, flaws and all. I’ve made some mistakes, sure, but nothing I can’t back out of later on.

It has been an amazing year, but the real test is coming in 2012. Can we build a brand and repeat the business model? What else can we make and how can we sell it? What will happen when the product enters the public domain? A collective shrug of the shoulders or a genuine pipeline of customers?

With these questions and many more, it is challenging to work out which are the important ones and which should be tackled first. I never expected this seemingly perpetual uncertainty. Nor the constant testing of one’s abilities in unfamiliar situations. But I’ve never felt so alive and so enthusiastic about the future :-)

My Sunbelt Top 7

Last week I was lucky enough to get to INSNA’s Sunbelt conference in Florida. Here are my top seven papers:
  1. Kevin Lewis presented some work done with Andrew Papachristos on the structure of gang warfare in Chicago using data on inter-gang murders.  Kevin described putting a stronger methodology on data from an earlier paper (pdf). One thing I loved about it was the mapping of street terms to abstract network structures – ‘payback’ = reciprocity, ‘untouchables’ = high out-degree, low in-degree, etc.
  2. Jamie F Olson talked about the statistical properties of centrality measures of communication networks over time. I didn’t quite grok the talk but the gist was that by varying the time-window size and comparing centralities across time periods it is possible to identify the ‘best’ sampling window for the network. For example, he showed that a week was a good period to sample some email data. Apparently a preprint may be available soon on his personal page – I’ll be looking out for that!
  3. Ulrik Brandes (with Bobo Nick) gave a beautifully crafted visualisation design paper. They used Gestalt principles to put together a sparklines-inspired glyph for showing network dynamics.  Very elegant.
  4. Elisha Peterson had some smart ideas for keeping node positions stable in visualisations of dynamic networks. He did this by putting springs between versions of the same node across the time slices (before & after). It seemed to make things more stable at the expense of some computational complexity.
  5. Lin Freeman shared his insights on the many ways of finding cohesive sub-groups in networks. He gave a clear and concise history of various methods from social sciences, maths & physics. Then an outline of measures of success (modularity Q, EI conductance, Freeman Segregation index, Pearson’s correlation ratio) before running the algorithms over a collection of data sets.  Success depends not only on the algorithm but also of course on the cohesiveness of the data. Conclusion?  Good:  Correspondence Analysis, Leading Eigenvector, WalkTrap, Fast Greedy. Not so good: Factions, Tabu, others.  I hope this work gets written up in a review paper soon.
  6. Mark Lauchs talked about the networks involved in a massive police corruption case in Queensland, Australia that were exposed by the Fitzgerald Inquiry. This talk demonstrated that it is probable that ‘dark networks’ can never be found automatically: the bad cops were structurally similar to the good cops. The only practical way of uncovering the network inside is to identify at least one bad egg, and use network structures to work from there to get the wider picture.
  7. Joshua Marineau had some interesting insight into the benefit of negative ties within an organisation.  Although it has been shown that individuals who have negative ties under-perform, he claimed that being positively connected to someone who themselves has negative ties can actually be an advantage.
The legendary hospitality suite was as friendly as ever too ;-)
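
The mapping from street terms to network measures in the first talk can be sketched in a few lines. This is only an illustration of the general idea, not the method from the paper: the edge list is invented, and the helper names `degree_counts` and `reciprocity` are hypothetical. Each edge points from attacker to victim, so ‘payback’ shows up as reciprocated edges and ‘untouchables’ as nodes with high out-degree and low in-degree.

```python
from collections import defaultdict

# Hypothetical directed edges (attacker, victim), invented for illustration.
attacks = [
    ("A", "B"), ("B", "A"),              # 'payback': A and B hit each other
    ("C", "A"), ("C", "B"), ("C", "D"),  # C attacks many, nobody hits back
]

def degree_counts(edges):
    """Return (out_degree, in_degree) dicts for a directed edge list."""
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return dict(out_deg), dict(in_deg)

def reciprocity(edges):
    """Fraction of distinct directed edges whose reverse edge also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (src, dst) in edge_set if (dst, src) in edge_set)
    return mutual / len(edge_set)

out_deg, in_deg = degree_counts(attacks)
recip = reciprocity(attacks)

for node in sorted(set(out_deg) | set(in_deg)):
    print(node, "out:", out_deg.get(node, 0), "in:", in_deg.get(node, 0))
print("reciprocity:", recip)
```

In this toy data, C is the ‘untouchable’: it has three outgoing edges and none coming in, while the A and B pair accounts for all of the reciprocity.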

The Birth of a Link

This diagram is absolutely fascinating.
It comes from Easley & Kleinberg’s new book, which takes it from an excellent paper by Crandall et al (2008) (pdf).
It is a sort of anatomy of how links between people are created: it tries to capture the birth moment and the forces before and after it.
The upward curve is intriguing but straightforward to explain by homophily – like seeking like.
The most interesting bit is the curve just before the first communication occurs.  People get suddenly more similar – a kind of gravitational attraction occurs in the affiliation network and the first communication is sparked into life, closing the triads.
Although it is tempting to explain this by creating physics-based models, as the paper does, I can’t help feeling there is a simpler explanation. I would guess that the base of the curve is generally where ‘awareness’ happens.  At this moment the editors become aware of each other, and at that point a basic psychological effect takes over: simple curiosity. People actively seek each other out, viewing each other’s activities and building a picture of the type of the other person. Partly this is also to de-risk the first encounter in order to make the right first impression.
It isn’t often that one sees abstract concepts like curiosity in science, but I guess that is the power of big data & a great set of research questions ;-)
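
To make the shape of that curve concrete, here is a minimal sketch of how similarity between two editors could be tracked over time. It assumes similarity is the cosine of the two editors’ per-article edit counts, which is a choice made for illustration; the paper should be checked for its exact definition. The snapshot data and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two dicts mapping article -> edit count."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_over_time(snapshots_a, snapshots_b):
    """Similarity at each time step, given cumulative activity snapshots."""
    return [cosine_similarity(a, b) for a, b in zip(snapshots_a, snapshots_b)]

# Hypothetical cumulative edit counts for two editors at three time steps.
snapshots_a = [{"Cats": 3}, {"Cats": 3, "Graphs": 1}, {"Cats": 3, "Graphs": 4}]
snapshots_b = [{"Graphs": 5}, {"Graphs": 6}, {"Graphs": 7, "Cats": 1}]

sims = similarity_over_time(snapshots_a, snapshots_b)
for step, value in enumerate(sims):
    print(step, round(value, 3))
```

Aligning many such series on the moment of first communication and averaging them would give a curve of the kind shown in the diagram; the toy series here simply rises as the two editors start working on the same articles.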

Social Network Analysis

One of the nice things about my new role is that I get to find out what is happening in lots of other research areas.

I’m really delighted that I persuaded i2 to donate some money to INSNA, the international network of social network analysts. Next week we’ll be travelling to Italy to attend their annual conference and I’m really looking forward to spending some time with the community.

I’m going to be learning about NetworkX and CASOS/*ORA from the experts and giving a citation prize to Mark Newman for his work on betweenness centrality. We’ll be going to lots of interesting sessions and learning about the current state-of-the-art in social network analysis, with a view to helping us choose the next steps in our SNA programme.

And it just so happens that Italy is my favourite country and June is one of the best months to visit. It’s a hard life!

Visual Analytics for Security

A few weeks ago Jörn Kohlhammer invited me to give a talk at the VisMaster Industry Day in Darmstadt, Germany.  It was a relaxed informal meeting where I caught up with some friends like Enrico Bertini – and I even finally got to meet one of my heroes – Jarke van Wijk – which was really exciting.

My talk was on Visual Analytics for Security.  I gave an overview of the work of  analysts in the crime and intelligence worlds and the unique challenges they face. Many of those challenges arise from the subject of their analysis: people, in all their complexity.  I hope this comes across from the slide deck.

Visualization Intern Time Again

Phew, I just got budget approval for another internship – if you or anyone you know might be interested in a visualization internship in Cambridge UK this summer – please apply!

Visual Analytics Panel 2009

I’m appearing on a panel at VAST today, talking about the investigation & analysis process in law enforcement & national security.  Here’s what I wrote as a high-level overview:

A common theme across many cases is the discovery of identifiers of interest: names, addresses, phone numbers, email addresses, bank account numbers, amongst others. Patterns of activity are deduced from these. Connections between individuals, and the timing and location of key events like sightings and phone calls, can lead to the generation of hypotheses or lines of inquiry which help drive the direction of the investigation as a whole. Relationship link diagrams, timelines and maps are the three most commonly expressed visualization needs.

Collaboration has been emphasized in recent years. In terms of the typology presented in Illuminating the Path, we find that typically collaboration is asynchronous remote (different time, different place), or synchronous local (same place, same time). In some shift patterns one sees continuous work done by a revolving team (same place, different time), but that is relatively uncommon. Asynchronous remote collaboration is typically achieved by emailing files. This ‘baton-passing’ approach shares a lot with the way that documents are authored in many professions. The key advantage of this approach is that information can be exchanged freely across organizational firewalls: disadvantages are that there is no definitive version of the information and multiple copies of the document can cause confusion.

In the case of (same place, same time) collaboration, this is done using a shared screen at a desk, within meeting rooms equipped with projectors and/or interactive whiteboards, or often done away from the computer entirely in a relatively informal context. In the latter case, printouts of visualizations are often pointed at and scribbled on. Printing is much more important than may be first realized. As cases get complex, it is common to print out the current known state of the case and pin it up on a wall for the investigation team to see and draw on. Evidential and other procedural requirements, especially within the law enforcement domain, mean that visualizations must fit with a ‘paper trail’ of documents.

Analysts have a very strong sense of ownership over the products they produce, and visualizations are no exception. Analysts raise concerns that their visualizations may be misinterpreted when viewed outside of the context of the task at hand. To ameliorate this, and also to facilitate basic reporting needs, visualizations are very commonly embedded as pictures within textual reports. In this state, they lose their interactivity and the consumer cannot ‘drill-down’ on the information represented. Such images are often produced in a separate ‘production’ stage after the analysis has been done. At the reporting stage, it is very common for a visualization to have to fit onto an A4/Letter size piece of paper!  Visual Analytic tools in general tend to neglect the reporting aspects of the job.

For the future, many of the general challenges are practical ones. Tool support for versioning, auditing data access, document searching and collaboration could be better. Tools need to be easily deployable by IT staff if they are to have any hope of adoption. The amount of available data is growing, but perhaps more importantly there are now more and more data sources that need to be checked during an investigation. Any help in getting data saves the analyst valuable time. Lastly, improved summarization/aggregation techniques for large data sets would be very welcome.

Visual Analysis Tools: Practical Considerations

So, you’ve spent months working in the lab developing a new visualisation technique or system and you finally got some time with real users. They really like what they’ve seen. You’ve done a good job of writing the paper, it has been accepted and appears on your resume.

But hang on a minute – despite your best intentions and the users’ approval, they aren’t actually using the system right now.  So should you commercialize the idea?  This would mean the ideas are exploited and perhaps could give you some money back for all that hard work.

What are the practical steps you will need to take?

You’ll have to make sure that you actually own the IP on the system too. I’d do that bit first.

Then there is the standard set of business problems like marketing, sales channels, CRM systems, pricing.  And the usual software infrastructure stuff of build systems, installers, change management, testing, documentation.

Installers are a nightmare. ‘But it works on my machine!’ isn’t going to cut it. In real IT environments, the IT manager is a key person you will need on your side. And his/her department will need to test your application for compatibility and other things first.  For desktop applications it isn’t uncommon for deployments to lag from 3 to 5 years behind the current version. That can be very frustrating for you and for your users.

But actually, I’d argue that all those things are a lot easier than the business of really understanding the users’ needs: that is much harder.  Did the new system really improve their performance? Were they just trying to be helpful and polite when they said they liked it? Or are you seeing well-known experimental biases like the observer-expectancy effect or the Hawthorne effect?

What about the user’s workflow?  How does the tool fit into their existing processes?

If it did increase their performance, could they put a value on that?  And I’m not being theoretical – I’m talking about a real dollar value here. Or some measure of success in terms of the business drivers of the organisation. You will struggle to sell it unless you can talk in business terms that your buyers will use.

In this context one has to question statements like ‘the goal of visualisation is insight, not pictures’. Actually I’d argue that the end goal is action, not insight. The true aim is taking better decisions.

Don’t be disheartened: these issues make a long list, but provided you are delivering enough value and you think about these things up-front, you can save yourself a lot of pain later on. And if you don’t want to think about these things, maybe you could even strike up a licensing deal with someone who does.

Oops I crashed Gmail

I think I might have crashed Gmail last week. Seriously.

The crash is well documented with the usual set of vague excuses including ‘high load on the service’. But was it my fault?

I had just got a new HTC Hero and was trying to migrate my contacts from my old phone.  I’d got myself into a state where I’d got the new contacts onto the phone from Google, but at that point I realised most of the fields were misaligned.

Not thinking about what I was doing, I deleted all the contacts from the phone and then started to edit my contacts within Gmail. What I didn’t realise was that hundreds of these deleted phone contacts were also being deleted from my Gmail contacts. My phone also locked up. While I was trying to correct my list in Gmail I kept getting these weird errors that the contacts I was editing didn’t exist any more. Very odd. The contacts list was getting shorter and shorter. And suddenly BANG, I got a big error in Gmail saying ‘Your contact list is not available right now, please try again later’.  I tweeted about it and a friend tweeted back to say that everyone else was having the same problem.

Coincidence?  Almost certainly. I guess I’ll never know. I just have this lingering sense of guilt about it ;-)