Monday, 7 July 2014

Data: the wood... or the trees?

People happily use the terms 'data' and 'information' as if they were interchangeable; they are not. It does not take much reflection to realise that, although all data is indeed information, not all information can be considered to be data. 
Consider the rings of a tree:

It is not entirely counter-intuitive to consider the rings of a tree, as shown here, to be an example of a natural data visualisation. Rich in information - bearing witness not just to its age, but also to the environmental changes it has endured - surely this is data in its purest form? Metaphorically perhaps, but not in reality.
To extract the sense contained here requires a large amount of interpretation and contextual knowledge. We need to know how often the rings form. We need to correlate the variations in the rings with their environmental causes. We need, in fact, to have a 'ring theory' to make sense of these strange markings that run through the trunk of a tree. Importantly, we will need to apply this ring theory to the information in order to convert it into data that we can communicate to others in a way that they will be able to process and manipulate.

Now, you may agree with me that there is information here, albeit information that requires a supporting body of knowledge to extract, yet disagree with me when I say it is not data. After all, we have a sequence of observed regularities to which a simple algorithm can be applied to yield a fact, viz. the age of the tree. I admit that these are key components of what I believe to be the list of necessary features for an information structure to qualify as data, but they are not sufficient conditions. If you remove the annotations from the image above, all you have is a set of irregular concentric circles.

For information to be data it must be identifiable: the variables need to be labelled in some way that corresponds with the world. There's no point knowing the age of a tree when you don't know which tree. You have information, but not meaningful, useful information that you can convert into knowledge, because you have no data. Data has to be about something, and that piece of the puzzle needs to be a built-in property of any information structure or content instance that can sensibly be called 'data'. A primary condition for something to be 'data' is that it be transferable: that other agents can interpret it, manipulate it and draw conclusions from it.
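
To make the distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `TreeRecord` structure, its field names, the identifier `oak-017`, the ring widths and the one-ring-per-year rule are all hypothetical, not measurements of any real tree. It contrasts a bare list of ring widths with a labelled record that says which tree the rings belong to and can be handed to another agent.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

# Bare observations: a sequence of ring widths with nothing to say
# which tree they came from or how to read them (illustrative values).
bare_rings = [2.1, 1.8, 2.4, 0.9, 1.5]


@dataclass
class TreeRecord:
    """A hypothetical labelled record: the rings plus what they are about."""

    tree_id: str                  # which tree (assumed identifier scheme)
    sampled_year: int             # when the rings were counted
    ring_widths_mm: List[float]   # the observations themselves
    rings_per_year: int = 1       # the 'ring theory', stated explicitly

    def age_years(self) -> int:
        # The simple algorithm: count the rings, apply the theory.
        return len(self.ring_widths_mm) // self.rings_per_year


record = TreeRecord(tree_id="oak-017", sampled_year=2014,
                    ring_widths_mm=bare_rings)

# Transferable: serialise the record, send it, and rebuild it elsewhere.
payload = json.dumps(asdict(record))
received = TreeRecord(**json.loads(payload))
```

The bare list supports the same ring count, but only the record carries its own labels, so only the record lets whoever receives it say that this particular tree has this particular age.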

In future posts I aim to investigate these criteria and others to build a comprehensive definition of the meaning of the term 'data', but for now I hope you'll agree (at least) with the assertion that not all information is data. 

Now, enjoy a visualisation of some data about trees that - necessarily - does not depend upon a count of their rings.