It’s not unusual for the quality of database software to exceed the quality of the data it processes, yet from the end-user’s viewpoint, system quality is limited by the lesser of the two. Companies everywhere are faced with databases full of inaccuracies and out-of-date or missing information. The problem is as obvious as the nose on your face, but like your own nose, it can be difficult to see. It’s hard for companies to come directly to grips with their own data-quality problems, though nobody has trouble seeing the other guy’s. What companies tend to see instead is a problem in the aggregate of software + data. Since the software is always easier to fix than the data (there is just so awfully much data), companies set out to fix or replace the software.
As none of this makes much sense, the essential thing to discuss here is not why we shouldn’t do it, but why we do it even though we shouldn’t. Part of the reason is a special instance of news improvement (see Pattern 45): The bad news that 2.4 percent of this month’s invoices were returned as undeliverable makes its way up the hierarchy, being greeted at each level with the angry question, “Well, what the hell are you going to do about this and damn quick?”
The damn-quick part immediately precludes extensive manual fixing. The vague answer is that a serious “data cleansing” effort will be started pronto. This charming little phrase means different things as it moves up toward the CEO level. At the bottom of the hierarchy, data cleansing means getting on the phone and Internet and poring over correspondence files to research and correct each separate bad datum. At the top, it means working smarter, somehow teasing out the right data by cleverly processing the bad data. Since funding comes from the top, the funds that are allocated are typically tied to the working smarter approach rather than to a small army of clerks to do the real work.
It’s worth pointing out that data can be corrupted (for example, by incorrect computing), and in this case, there are some at least partially automated ways to undo the damage by retrieving earlier backed-up versions. Similarly, when the same data are separately recorded in multiple systems, some automated data cleansing can help to isolate the better variant. In both cases, automated data cleansing depends on an ability to exploit data redundancy. While it’s easy to imagine an example of redundancy coming to our rescue (System A has an old address, but here’s a break: System B has the new one), real instances of poor data quality that can be automated away are few and far between.
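To make the redundancy idea concrete, here is a minimal sketch of the "System A has an old address, System B has the new one" case. All names and fields (`system_a`, `system_b`, the `updated` date) are hypothetical, and the rule shown — prefer the most recently updated copy — is just one plausible heuristic, not a general solution:

```python
from datetime import date

# The same customer record held redundantly in two systems; each copy
# carries the date it was last confirmed or updated.
system_a = {"customer_id": 1001, "address": "12 Old Lane", "updated": date(2009, 3, 1)}
system_b = {"customer_id": 1001, "address": "7 New Street", "updated": date(2014, 6, 15)}

def pick_better_variant(*records):
    """Choose the most recently updated copy of a redundant record."""
    return max(records, key=lambda r: r["updated"])

best = pick_better_variant(system_a, system_b)
print(best["address"])  # the more recent address wins
```

Note how little this automates: it only works when the redundancy exists, when the timestamps are themselves trustworthy, and when "newer" really means "better" — which is exactly why such cases are few and far between.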
The major cause of declining data quality over time is change. This spoilage in the asset we call “corporate data” can only be repaired by manual effort. Imagining otherwise just puts off the day of reckoning.