The inevitable increases in computing power have brought us the ability to process larger and larger volumes of data in our data warehouses and operational data stores, as well as in our transactional systems. Decreases in the relative cost of RAM have made it possible to hold entire data sets in memory for processing, bringing about a new processing paradigm, and storage, as ever, continues to get cheaper. The result of this revolution has been the opportunity for companies to tackle data sources previously infeasible because of the amount or type of data they produce. This revolution, of course, is the phenomenon called Big Data.
Yet with these new sources, we find ourselves facing the same challenges we have always faced in how we interact with our data. Simply deploying new technology won’t answer questions such as:
- Where does this data come from? Is it correct and accurate?
- How do I integrate it with data from other sources and data I already have in my systems?
- How are we using this data throughout the organisation?
- What kinds of conclusions can I draw from the data and what are its limitations?
- Who is responsible for making sure that this data is accurate, timely, and available for us to use in our decision-making?
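One way to make questions like these concrete is to codify them as automated checks against dataset metadata. The sketch below is a toy illustration only; the field names and rules are my own assumptions, not a standard or any particular product’s schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class DatasetRecord:
    """Minimal metadata a governance process might require for each dataset."""
    name: str
    source: str                      # where the data comes from (provenance)
    owner: str                       # who is accountable for accuracy and availability
    last_updated: datetime           # timeliness for decision-making
    known_limitations: list = field(default_factory=list)

def governance_issues(ds: DatasetRecord,
                      max_age: timedelta = timedelta(days=1)) -> list:
    """Return the governance questions this dataset cannot yet answer."""
    issues = []
    if not ds.source:
        issues.append("no documented source (provenance unknown)")
    if not ds.owner:
        issues.append("no accountable owner")
    if datetime.now(timezone.utc) - ds.last_updated > max_age:
        issues.append("data may be stale for decision-making")
    if not ds.known_limitations:
        issues.append("limitations have not been assessed")
    return issues

# A hypothetical sensor feed registered without an owner or limitations review:
sensor_feed = DatasetRecord(
    name="plant_floor_sensors",
    source="factory MQTT broker",
    owner="",
    last_updated=datetime.now(timezone.utc),
)
for issue in governance_issues(sensor_feed):
    print(issue)
```

Even a lightweight check like this turns governance from a statement of intent into a repeatable process, which is the point of the questions above.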
Yet these questions do not demand new answers; they were just as relevant to the data we generated from our transactional systems, purchased from data-gathering sources, or created through our analytics platforms. The key then, as now, is effective data governance.
As a reminder, data governance is nothing more than a set of processes to manage data assets in the organisation. It might be as simple as password-protecting database accounts or as complex as a formal Data Review Board with data architects, IT leaders, and representatives of the business meeting to plan the use and innovation of data sources in the company’s analytics.
The difference in the “modern era” of post-Big-Data challenges is that an organisation’s data governance process plays an increasingly important role in maintaining the integrity of the company’s data. Organisations that pay only lip service to data governance when integrating external data streams or high-volume transactional content (such as real-time logistics tracking or plant-floor sensor output) may not even realise their mistake until business units need to combine this data with existing sources and cannot, or, worse yet, can, but only in ways that produce inconsistencies supporting contradictory conclusions.
Richard Neale said in a thought-provoking article that “the quality of the data will vary depending on the information governance that surrounds the source.”1 I’ll take that one step further: the quality of the data will be determined principally by the data governance practice that manages it. The manner in which we manage our data through effective and deliberate processes will determine whether the data we choose to gather drives powerful, incisive decision-making or leads us down blind alleys of self-deception.
So do you have a data governance practice? How comfortable with it are you in the face of mounting pressures for more and more data?
DataHub Writer: Douglas R. Briggs
Mr. Briggs has been active in the fields of Data Warehousing and Business Intelligence for the entirety of his 17-year career. He was responsible for the early adoption and promulgation of BI at one of the world’s largest consumer product companies and developed their initial BI competency centre. He has consulted with numerous other companies about effective BI practices. He holds a Master of Science degree in Computer Science from the University of Illinois at Urbana-Champaign and a Bachelor of Arts degree from Williams College (Mass.).