Meaningful analysis relies on more than just the quality of the data itself, data scientists say. Not only must individual data points be accurately compiled, but they must be consistently formatted, stored and presented. It’s a huge (and often underrated) task when it comes to gleaning insights from data, often involving work from data scientists and data analysts.
In data-science parlance, this is called “data governance.” The idea emphasizes a consistent process of data-handling; or, in the words of David Ricciardi, president of Proximo, an analytics firm in Jersey City, N.J.: “A consistent process that supports clean, consistent, reliable data.”
As organizations compile information in multiple formats from an increasing number of sources, the importance of this critical (but distinctly unglamorous) area of data use is only going to grow. And while artificial intelligence (A.I.) and other technologies become more sophisticated, it falls to tech professionals at the team level to make sure the data that feeds their company’s analytics is, as they say, ready for prime time. Like it or not, data governance is mission-critical.
“The thing that most companies discover when they go to launch a better analytics program or a better intelligence program is that they need to have a data governance process,” said John Sumser, principal consultant at HRExaminer, an industry analyst firm in the San Francisco Bay area. “The naming conventions and field size and all those pathetic little things are in disarray.” In order to take advantage of new analytics tools, he added, “It’s almost always necessary to go through a data governance process.”
Hunkering Down for Consistency
Creating such a process isn’t a short-term effort, Sumser said. It can take a year and involve political battles as much as technical planning. Different departments may refer to the same data in different ways, for example, and the resulting inconsistencies prevent technology from uncovering the data’s patterns. “You have to get apples to apples inside of the data,” Sumser said. “So the data cleaning and governance step is always the first real step.”
Obviously, such details don’t take care of themselves: Someone has to own the process, Ricciardi said. In many ways, that person’s role is to be the enforcer of consistency while the data team interviews stakeholders across the business, compiles data from different sources, and makes sure it’s all up-to-date. Data governance is something that stakeholders must pay constant attention to.
“Somebody’s got to be in charge of that,” Ricciardi said. “Some people will think everybody’s in charge, but if everybody’s in charge, nobody’s in charge.” The remedy, he said, is having someone focused specifically on the consistency of data-gathering.
Consistent Nuts and Bolts
Bear in mind, good data governance isn’t only about consistent data storage: It’s an important aspect of almost every component of data management. Consistent terminology must be used from the time data is gathered through the moment analytics are presented to any stakeholder.
For example, a company may have a definition of “anniversary date” that everyone understands: the day a worker signed the organization’s employment agreement. “You would want to make sure that you’re getting a system that uses the same terminology on the user interface as it does on a report,” Ricciardi said. “To me, that’s harmonization. Everything’s on the same note. It’s got the same sound, the same pitch.”
Data Governance: A Single Source of Truth
One way to ensure such harmonization is to create a data glossary, which will act as what many in the business call “a single source of truth.” Data glossaries allow users to see exactly how terms are defined, and how terms in one system map to their use in another. For example, a company’s official term for the day an employee began work may be “anniversary date,” but the glossary will note that legacy systems use the term “start date.”
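To make the glossary idea concrete, here is a minimal Python sketch of how such a mapping might be stored and used. It is an illustration only, not a description of any tool mentioned by the sources: the structure, the function names (`build_alias_index`, `to_canonical`, `harmonize_record`) and the sample record are all hypothetical, and the definition text simply reuses the “anniversary date” example from above.

```python
# Hypothetical glossary: each canonical term carries a definition and the
# alternative names (aliases) that other systems use for the same thing.
GLOSSARY = {
    "anniversary date": {
        "definition": "The day a worker signed the organization's employment agreement.",
        "aliases": {"start date"},  # term used by legacy systems in the example
    },
}


def build_alias_index(glossary):
    """Map every known name (canonical or alias) to its canonical term."""
    index = {}
    for canonical, entry in glossary.items():
        for name in {canonical, *entry["aliases"]}:
            key = name.strip().lower()
            # Refuse ambiguous glossaries where one name maps to two terms.
            if key in index and index[key] != canonical:
                raise ValueError(f"'{name}' maps to more than one canonical term")
            index[key] = canonical
    return index


def to_canonical(term, index):
    """Return the canonical term for a name, or None if it is not in the glossary."""
    return index.get(term.strip().lower())


def harmonize_record(record, index):
    """Rename a record's fields to canonical terms; unknown fields are kept as-is."""
    return {to_canonical(field, index) or field: value for field, value in record.items()}


if __name__ == "__main__":
    index = build_alias_index(GLOSSARY)
    legacy_row = {"Start Date": "2020-01-06", "employee": "A. Example"}  # made-up sample
    print(harmonize_record(legacy_row, index))
```

In this sketch, a row exported from a legacy system with a “Start Date” column comes out with that field renamed to “anniversary date,” while fields the glossary does not know about pass through unchanged. A real glossary would also need an owner and a review process, which is the harder part the sources describe.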
While making data consistent is a simple idea, it’s extremely difficult to execute, especially in large companies. Many organizations implement different systems for various segments of the business. That can result in payroll that stores data in one way and workforce-management systems that store data in another. It’s the polar opposite of good data governance.
Other organizations may have homegrown legacy technology in the mix, which reports data using even more terms. To make things extra-complicated, the group managing one system may be located in New York while the team managing another is in Chicago. Such dynamics make imposing consistent processes and nomenclature, to say the least, challenging.
It’s a hard and dirty job, data scientists agree. But for an organization to get full value from its data, someone has to wrangle with data governance.