Information is interesting. Data is dull.
It’s like going to a play. Behind the dazzling performance on stage, a lot of difficult, time-consuming, and mundane work goes on “behind the scenes”. Similarly, many Business Intelligence software vendors offer up “Great and Powerful Oz” presentations full of fireworks and angelic choruses.
Yet, ask about the issue of data quality and you might be encouraged to ignore the “man behind the curtain” even though he seems to be pulling a lot of important levers.
Entire books have been written about data quality. However, one key concept to remember when thinking about using data for reporting and analytics is "intended use". Data from a transactional system may have very different quality requirements than data used for analysis.
For example, in a recent project for a client, our firm was asked to build a decision support application using data from a system nurses used to record patient survey information. The system worked well for the nurses' job of interviewing patients about health status. Most questions were closed-ended, multiple-choice responses. However, some crucial data was collected as free text. This "unstructured" data was easy for nurses to scan when reviewing patient cases, but when analysts sought to use it to identify trends, it was all but unusable.
Another example from the same system was patient age, which was chosen from a pre-defined pick list. The pick list reduced keying errors in the transactional system, but the data was stored as text rather than in a numeric format. Nurses could review transaction system data and see age easily; analysts, however, could not calculate average age or age ranges.
The solution for both issues was to thoroughly review the data to be extracted from the source (transaction) system and evaluate its fitness for use in the decision support system.
In this project, the text-formatted age data was converted to a number as it was loaded into the reporting database. The free-text responses required a more involved solution: the transaction system itself was changed to force more responses into a pick-list format, making the data more consistent.
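To make the age cleanup concrete, here is a minimal sketch of what that load-time conversion might look like. The function name, sample values, and parsing rule are illustrative assumptions, not the client's actual code; the point is simply that once text ages become integers, averages and ranges become computable.

```python
# Hypothetical sketch of the age cleanup step: the source system stored
# age as text (e.g. "45", "62 yrs", or blank), so the load into the
# reporting database converts each value to an integer, or None when it
# cannot be parsed.
import re
from statistics import mean

def parse_age(raw):
    """Convert a text-formatted age to an int, or None if unparseable."""
    if raw is None:
        return None
    match = re.search(r"\d+", raw)  # first run of digits in the text
    return int(match.group()) if match else None

# Sample rows as they might arrive from the transaction system.
raw_ages = ["45", "62 yrs", "", "38", None]

clean_ages = [parse_age(a) for a in raw_ages]
valid = [a for a in clean_ages if a is not None]

print(clean_ages)   # [45, 62, None, 38, None]
print(mean(valid))  # average age is now computable
```

Doing this conversion once, at load time, keeps the analytics database clean without touching the nurses' workflow in the transaction system.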
When planning important reporting and analytics efforts, think first about the underlying data before choosing a "front end" application. Once the source data issues are thoroughly understood, then figure out how best to manipulate and view the data. If you don't pay attention to the data man behind the curtain, you'll soon get that bad feeling, like Dorothy, that you're not in Kansas anymore.