Evaluation of Temporal Visualisations

Academic visualisation papers often include a small evaluation section towards the end. These evaluations are rarely comparative, and they are usually carried out by visualisation researchers rather than HCI specialists. The result is often a strong bias in favour of the authors' own "cool" visualisations.

Timeline Compare

The 'timeline compare' project above is an example: although it was well intentioned and well designed, its evaluation used a very small sample and offered no working prototype for participants to try.

Evaluating interactive visualisations is a subject in its own right: it is a branch of human-computer interaction research. It has inherent problems. Tools are biased towards the data sets and tasks they were designed for, so they can be hard to compare, and users vary widely in expertise and domain knowledge. A firm methodology is therefore necessary.

Experimental evaluations consist of experiments designed to support or refute hypotheses. Typically these revolve around users' ability to perform tasks, measured in terms of time taken and accuracy. Aesthetics and general usability are assessed with questionnaires, and sessions are often video-recorded. A good experimental evaluation is therefore very labour intensive. There is also a danger that reporting focuses on results of the form "X is better than Y for task Z", when the underlying reasons also need examination.
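To make the time-and-accuracy measures concrete, here is a minimal sketch of how task trials might be summarised per tool and task. The record layout (`Trial`) and the `summarise` function are hypothetical, chosen only for illustration, and the numbers in the usage example are made-up placeholders rather than results from any study.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    """One participant attempting one task with one tool (hypothetical record layout)."""
    participant: str
    tool: str
    task: str
    seconds: float  # time taken to complete the task
    correct: bool   # whether the participant's answer was right


def summarise(trials):
    """Return {(tool, task): (mean_seconds, accuracy)} for each condition."""
    groups = defaultdict(list)
    for t in trials:
        groups[(t.tool, t.task)].append(t)

    summary = {}
    for key, group in groups.items():
        mean_time = mean(t.seconds for t in group)
        # Accuracy is the fraction of trials answered correctly.
        accuracy = sum(t.correct for t in group) / len(group)
        summary[key] = (mean_time, accuracy)
    return summary


if __name__ == "__main__":
    # Placeholder values for illustration only, not real data.
    trials = [
        Trial("p1", "X", "find peak", 12.0, True),
        Trial("p2", "X", "find peak", 18.0, False),
        Trial("p1", "Y", "find peak", 20.0, True),
    ]
    print(summarise(trials))
```

A table like this yields exactly the "X is better than Y for task Z" kind of statement; it says nothing about why, which is where the questionnaires and recordings come in.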

Published evaluations are often restricted to very specific visualisations, so it can be difficult to derive general principles from them.

Ideally I'd like to summarise the lessons learnt, but the literature on evaluating temporal visualisations lacks the depth to support this. Instead, it seems that a task-focused, 'fit for purpose' study should be carried out for each domain.

