Monday, 7 July 2014

Data: the wood... or the trees?

People happily use the terms 'data' and 'information' as if they were interchangeable; they are not. It does not take much reflection to realise that, although all data is indeed information, not all information can be considered to be data. 
Consider the rings of a tree:

It is not entirely counter-intuitive to consider the rings of a tree, as shown here, to be an example of a natural data visualisation. Rich in information - bearing witness not just to its age, but also to the environmental changes it has endured - surely this is data in its purest form? Metaphorically perhaps, but not in reality.
To extract the sense contained here requires a large amount of interpretation and contextual knowledge. We need to know how often the rings form. We need to correlate the variations in the rings with their environmental causes. We need, in fact, to have a 'ring theory' to make sense of these strange markings that run through the trunk of a tree. Importantly, we will need to apply this ring theory to the information in order to convert it into data that we can communicate to others in a way that they will be able to process and manipulate.

Now, you may agree with me that there is information here, albeit requiring a supporting body of knowledge to be extracted, yet disagree with me when I say it is not data. After all, we have a sequence of observed regularities to which a simple algorithm can be applied which results in a fact, viz. the age of the tree. I can't help admitting that these are key components of what I believe is a list of necessary features for an information structure to qualify as data, but they are not sufficient conditions. If you remove the annotations from the image above, all you have is a set of irregular concentric circles.

For information to be data it must be identifiable: the variables need to be labelled in some way that corresponds with the world. There's no point knowing the age of a tree when you don't know which tree. You have information, but not meaningful, useful information that you can convert into knowledge, because you have no data. Data has to be about something, and that piece of the puzzle needs to be a built-in property of any information structure or content instance that can sensibly be called 'data'. A primary condition for something to be 'data' is that it be transferable: other agents must be able to interpret it, manipulate it and draw conclusions from it.
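
For those who think better in code, here is a rough sketch of the distinction, not a definitive model. Everything in it is an assumption made up for illustration: the ring widths, the `TreeRingRecord` name, the `tree_id` scheme, and above all the 'ring theory' that exactly one ring forms per year.

```python
from dataclasses import dataclass
from typing import List

# Bare information: a sequence of ring widths in millimetres (invented values).
# Nothing here says which tree they came from or what they measure.
ring_widths_mm = [2.1, 1.8, 2.4, 0.9, 1.5]


@dataclass(frozen=True)
class TreeRingRecord:
    """Hypothetical labelled record: the same widths, plus what they are about."""
    tree_id: str                  # which tree (assumed identifier scheme)
    species: str                  # what kind of tree
    sampled_year: int             # year the sample was taken
    ring_widths_mm: List[float]   # the observations themselves

    def age_in_years(self) -> int:
        # 'Ring theory' assumption: exactly one ring forms per year.
        return len(self.ring_widths_mm)

    def germination_year(self) -> int:
        # Only answerable because the record carries its own context.
        return self.sampled_year - self.age_in_years()


# The same numbers, now identifiable and transferable to another agent.
record = TreeRingRecord("oak-017", "Quercus robur", 2014, ring_widths_mm)
print(record.tree_id, record.age_in_years(), record.germination_year())
```

The bare list supports a count and nothing more. The record can be handed to someone else, who can tell which tree it describes and can question the one-ring-per-year assumption for themselves.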

In future posts I aim to investigate these criteria and others to build a comprehensive definition of the meaning of the term 'data', but for now I hope you'll agree (at least) with the assertion that not all information is data. 

Now, enjoy a visualisation of some data about trees that - necessarily - does not depend upon a count of their rings.

Sunday, 22 June 2014

The Philosophy of Data

Data Philosophy – a beginning

This is a blog about the philosophy of data. In many ways that means it’s a blog about nearly everything. Only ‘nearly’ because there are limits to what can be called ‘data’. What those limits are is, in my view, a subject for philosophical investigation, as are the following questions:

  • What is ‘Data’?
  • What is the Mind/Data relationship?
  • Are there laws governing the nature of Data?
  • Does Data exist objectively?

Seasoned philosophers will recognise the themes referenced by these questions, but I suspect they are new issues to many of those who practise the arts of identifying, retrieving, manipulating and presenting data. Everyone will notice that I have not raised any questions of ethics or value here. If those kinds of questions are pressing for you - sorry, that’s just not the sort of philosophy I do.

I intend to explore these questions (and others) as well as the problems that will inevitably arise from attempts to answer them. I welcome all comments and suggestions. I am not going to be able to answer any of these questions by myself, but together we can possibly make some progress towards understanding how to ask the best ones.

I promise to keep my posts short and to include at least one #dataviz in each one (because…you know…the internet).

Here's an indication of where data philosophy lies in the current zeitgeist against the fortunes of its soaraway sibling, data science: