Two approaches to coding
I’ve wanted to write about coding for some time, but somehow there was always something easier to write. Coding is becoming fashionable; moreover, I’ve noticed that journal reviewers are becoming more and more interested in it.
The trigger for this post came from another blog. In a recent post, Allan McDougall does a great job describing an approach to coding advocated by Sally Thorne. And given that I think about coding in a different way than Prof. Thorne, I decided to write.
Let me start by saying that I have coded data for longer than I care to remember. It was during the project on border communities that Ulrike Meinhof and I decided that we needed a way to get a grip on the masses of data we were collecting. It was the time when the first coding software, very different from what we know now, started appearing. We tested a number of programmes, finally settling on one. I’ve been using it ever since, incidentally, even though there is practically no resemblance between what it was then and what it is now.
Now, before I tell you about the approach to coding I use, let me tell you about the first approach to coding. There are of course many versions of it; I use Prof. Thorne’s account (well, as described by Allan) only to point to what I think are the main tenets of the approach, and I accept that my account is likely to miss much of the nuance. Still, the main point is that coding is revealing of the data we collected; it is a result of ‘deep understanding’ of the data (whatever that might mean). Whether done procedurally or in a way in which the coder simply engages with the data, coding is part of the analysis.
Indeed, Dr McDougall quotes his tweet:
Coding is not the same as deep, real exploration of qualitative data. We need to shift into deeper form of inquiry before codes.
I’m not exactly sure what deep and real exploration is. Still, it seems that coding happens towards the end of the process of engagement with the data. Though it is unclear (probably because the paper, and in the process the blogpost, focused on coding), it seems that coding is part of the analytical process, that part which engages with the data in a more formal and structured way. Indeed, it seems that coding gives the process this structure.
Now, the second approach to coding, the one which I prefer and practise, reverses the place of coding. In such an understanding, coding starts the process of data engagement. It is a way of managing a set of data which is too large to be repeatedly read and re-read. Much as individual utterances or exchanges, written on index cards, used to be sorted, arranged and re-arranged, often on the floor, coding is an efficient way of arranging those index cards.
Of course, just like arranging index cards on the table, coding is not ‘innocent’: any ‘arrangement’ or categorisation is an intervention. It is important therefore to understand what is being done. Taking my own work, such arrangements, and therefore coding, are always based on intersubjectively identified text fragments. So, for example, I can code for mentions of depression (i.e. there will be an identifiable reference to the illness), usage of emotion labels, references to children in the informants’ stories and suchlike. The coding is always based on ‘objective’ linguistic aspects of the text.
Now, what is crucial in that understanding of coding is that it is not used as part of the analytic process. Just as I might separate stories by men and women, as I have done, I can identify all those fragments which refer to informants’ children (more can be said here, obviously). And this is key: coding only offers me an easy way to focus on the part of the corpus I am interested in. It is only then that the analysis begins and I look for ‘objective’ linguistic patterns such as, for example, agency structures in the text.
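To make the sorting idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the codebook, the three fragments and the function names are invented, and it is not a description of any particular coding software. It only shows that a code can be tied to surface forms anyone can find in the text, and that coding then does no more than select part of the corpus.

```python
import re

# Hypothetical codebook: each code is tied to word forms
# that any reader could locate in the text.
CODEBOOK = {
    "depression": ["depression", "depressed"],
    "children": ["son", "daughter", "child", "children", "kids"],
}

# Hypothetical fragments, invented for this example only.
FRAGMENTS = [
    {"id": 1, "gender": "m", "text": "When the depression came, my son stopped visiting."},
    {"id": 2, "gender": "f", "text": "I was depressed for most of that winter."},
    {"id": 3, "gender": "m", "text": "Work was all there was back then."},
]


def codes_for(text, codebook=CODEBOOK):
    """Return the set of codes whose word forms occur as whole words in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return {code for code, terms in codebook.items() if words & set(terms)}


def select(fragments, code):
    """Sort step only: return the fragments that carry the given code."""
    return [f for f in fragments if code in codes_for(f["text"])]
```

Calling select(FRAGMENTS, "children") simply hands back the fragments that mention children. Nothing has been analysed yet; the pile of index cards has only become smaller.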
And here is the crucial difference between the two approaches to coding. The first is used as part (if not most) of the analytical process and the results of the research are based on the coding. In the other approach nothing of the sort happens. Coding is a process of sorting the data. The analysis starts after the coding. Put differently, in the first approach coding is central and indispensable; in the other, coding is incidental and, in the case of smaller sets of data, unnecessary.
This raises a number of issues, particularly with the first approach. I think that criticisms similar to those I made with regard to thematic analysis (see my post on how not to do qualitative research) can be made with regard to coding in the first approach. What exactly is ‘understanding data’, let alone ‘real exploration of the data’? If understanding is key, moreover, how do I ‘understand’ a storyline or a theme? How can there be a pattern without a reference to how the pattern is identified? In other words, if a coder/researcher claims discovery of a pattern, shouldn’t I, a researcher across time and space, be able to identify it myself? So I keep repeating the question: as you code your data, what are you looking for? And would I find, roughly, the same things if I coded your data? I do realise, of course, that I ask my questions firmly placed in my kind of coding. I still think that they are worth asking, even if only to be dismissed.
That said, I’m not trying to pass off ‘my’ coding practice as unproblematic. It has its own issues. Where does a fragment start? While I can easily use means of textual cohesion in identifying fragments, can I also use means of achieving the more nebulous coherence? Such questions are not trivial and require a different post. However, I would argue their significance pales in comparison, as they do not directly inform the analysis. Yes, they may make me miss an additional example or eight, but given that in qualitative research I don’t claim to propose how often a phenomenon occurs, I barely care. In other words, my claim that, for example, men linguistically distance themselves from their emotions will not suffer from the fact that I miss 10 per cent of all the fragments. But yes, I suppose I can miss one type of such distancing, perhaps two. The rest of my findings are sound, though.
I know that I have probably already been too boring about such methodological issues. Yet, even if qualitative researchers are glorified story-tellers, even if our subjectivity is important in the research, our story-telling should be different from writing fiction or from journalism. In other words, neither our coding nor our other results should be based only on a whim which we authorise by investing it with academicness. There must be more to qualitative research and the analyses it offers. My claim (made in the book on men and emotions) is not only ‘mine’. It results from an analysis which is, should someone wish to do it, repeatable (which is not to say it can be replicated, which is a concept from a different story).
As I read qualitative research, I don’t want to read John’s or Mary’s stories about people they met. I want to read analysis which, if probed, goes past their ‘subjectivity’. This is because as much as I might love John and Mary, I am scarcely interested in their individual stories. I still want research results.
Now, interestingly, it is the first type of coding which has become way more popular than the other. So, I end with a plea to journal reviewers. It might be a good idea not to mix the two, as you now do, completely ignoring that there isn’t just one coding practice.