“There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every two days.” – Eric Schmidt (former Technical Advisor, Alphabet/Google)
Data is the new currency; it is everywhere, and it continues to grow exponentially in its various formats – structured, semi-structured, and unstructured. But whatever the format, businesses cannot afford to slack on the proper accumulation and categorization of data – otherwise known as Data Governance – if optimum value is to be obtained from these data sources. As Jay Baer, a marketing and customer experience expert, remarked, “We are surrounded by data but starved for insights.”
So, what exactly is Data Governance, and what are its key elements?
Imagine that you wanted to rebrand and relaunch a failing product and needed insight to do so – say, year-to-date (YTD) sales analyses for the previous five years, or customer feedback for those same years. But the data for your analysis is fragmented between different storage units, or lost to an accidental local deletion. What a loss in revenue, time, insights, and progress.
Here is where the need arises for a proper Data Governance strategy, to ensure such irreparable damage does not happen. A proper Data Governance strategy lays out the procedures to maintain, classify, retain, access, and secure the data related to a business. With data growing exponentially every day, fueled mainly by Big Data and digital transformation – global data volume is estimated to reach 181 zettabytes by 2025 – a proper Data Governance strategy to ensure proper data usage becomes imperative.
Below are four elements that are key to a proper Data Governance strategy:
To prepare a proper Data Governance strategy, one must first understand the circle of life that data goes through. Data is created, used, shared, maintained, stored, archived, and finally deleted. Understanding these core stages forms the basis of Data Lifecycle Management.
For example, John Doe applied for the position of QA Manager. He applied online on the company’s website (creation); his resume was shortlisted by the hiring team (used) and sent to HR to offer him the job (shared). John accepted the job and started work with the company. His details were kept with HR, updated annually, and retained for tax and legal purposes (maintained and stored).
Finally, John retired, and his file was handed to the Data Steward (archived), where it may be retained or eventually purged (deleted), depending on legal retention policies.
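To make these stages concrete, here is a minimal sketch in Python of tracking a record through its lifecycle. The stage names follow the list above; the transition rules are illustrative assumptions, not drawn from any particular governance tool:

```python
from enum import Enum

class Stage(Enum):
    CREATED = "created"
    USED = "used"
    SHARED = "shared"
    MAINTAINED = "maintained"
    STORED = "stored"
    ARCHIVED = "archived"
    DELETED = "deleted"

# Allowed forward transitions between lifecycle stages (assumed for illustration).
TRANSITIONS = {
    Stage.CREATED: {Stage.USED, Stage.DELETED},
    Stage.USED: {Stage.SHARED, Stage.DELETED},
    Stage.SHARED: {Stage.MAINTAINED, Stage.DELETED},
    Stage.MAINTAINED: {Stage.STORED},
    Stage.STORED: {Stage.ARCHIVED},
    Stage.ARCHIVED: {Stage.DELETED},
    Stage.DELETED: set(),
}

def advance(current: Stage, target: Stage) -> Stage:
    """Move a record to its next lifecycle stage, rejecting invalid jumps."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target

# John Doe's file: created -> used -> shared -> maintained -> stored -> archived
stage = Stage.CREATED
for nxt in (Stage.USED, Stage.SHARED, Stage.MAINTAINED, Stage.STORED, Stage.ARCHIVED):
    stage = advance(stage, nxt)
print(stage.value)  # archived
```

Note that a draft can legitimately take the short path from shared straight to deleted, as the presentation example below shows.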
Now compare that to the data lifecycle of a draft sales PowerPoint presentation. The presentation will be created, used, shared, and probably deleted in favor of the final version, which will go through the entire Data Lifecycle Management process. Understanding the data is critical; that is where Data Quality Management comes to the fore.
Let’s go back to the example of relaunching a failing brand. You have finally found all the pertinent files covering the past five years. But interspersed with the sales and promotion figures are files for a final presentation, along with the numerous draft versions leading up to it.
What do you keep? What is needed and what is not, and how do you know the difference? This is where Data Quality Management (DQM) comes in. Essential questions to ask about data when observing DQM are: Is the data accurate? Is it complete? Is it consistent across sources? Is it current, or has it gone stale? Is it unique, or duplicated elsewhere? And is it valid, i.e., fit for its intended use?
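Checks like these can be partly automated. The sketch below, using hypothetical sales-record fields, flags records that fail basic completeness and validity tests:

```python
from datetime import date

# Hypothetical sales records pulled from fragmented sources.
records = [
    {"sku": "A-100", "units": 250, "period": date(2021, 3, 31)},
    {"sku": None,    "units": 120, "period": date(2022, 6, 30)},   # incomplete
    {"sku": "A-100", "units": -40, "period": date(2023, 9, 30)},   # invalid
]

def quality_issues(rec: dict) -> list[str]:
    """Return a list of data-quality problems found in one record."""
    issues = []
    if not rec.get("sku"):
        issues.append("incomplete: missing SKU")
    if rec.get("units") is None or rec["units"] < 0:
        issues.append("invalid: negative or missing unit count")
    if rec.get("period") and rec["period"] > date.today():
        issues.append("invalid: period in the future")
    return issues

for rec in records:
    for issue in quality_issues(rec):
        print(rec, "->", issue)
```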
Now that we have answered all those questions, imagine for a moment all this data – structured, unstructured, and semi-structured – sitting in data silos or data lakes as one giant beast. Which raises the proverbial question – who will bell the cat?
Who will take on this humongous task of Data Quality Management, i.e., classifying, archiving, storing, creating best practice guidelines, and ensuring data security and integrity?
This is where Data Stewardship comes into play. Appointing a single person or, better, a committee to create and oversee all the tasks of Data Management is the optimum step in the eventual buildup of a good Data Governance strategy.
The main job of a stewarding committee is to ensure that data is properly collected, managed, accessed when needed, and disposed of at the end of the retention period.
Some essential functions of a data stewarding committee are: defining classification and quality standards, documenting data ownership, enforcing access and retention policies, and monitoring compliance with company, local, and global regulations.
Should you keep the data or not, classify it or not, share it or not? There are many questions regarding the usefulness and usability of data. But one thing stands out – whatever the answers, all data must be kept secure throughout the entire Data Lifecycle Management process, right up to the point of deletion.
Data security tools – be it encryption, resiliency, masking, or ultimate erasure – have to be deployed along with policies to ensure that the company’s data is safe, secure, and used only by the proper personnel.
While the classification and maintenance of data are crucial factors in governance, the time factor is as important an element as any other. The question of how long to retain data is central to the archival process, and the answer is not black and white.
With local and global policies changing constantly in response to new and sometimes imposing requirements, data retention periods vary from year to year. Those responsible for maintaining or classifying data often need to store it immediately, pending proper relevance tagging. While in-house storage units can house the data temporarily, they are vulnerable to security breaches, accidental deletion, and data fragmentation.
The obvious choice is to store the data in a centralized cloud archive with protective features like encryption, secure access controls, flexibility, and scalability.
Because the data is located in a secure, centralized cloud archive, employees distributed across geographical locations can access it at any time, from any time zone, and concurrently with other employees.
But should all employees have access to everything? The centralized cloud archive enforces Attribute-Based Access Control (ABAC), granting employees rights based on the attributes assigned to them. These rights are usually defined when creating DQM strategies, or set by Data Stewards in response to changing company, local, and global policies.
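As a rough illustration of how such a check works, the sketch below evaluates a user’s attributes against a resource’s policy in Python; the attribute names and rules are assumptions for illustration, not the access model of any specific archive product:

```python
# Attributes assigned to an employee and to a data object (illustrative only).
user = {"department": "finance", "region": "EU", "clearance": 2}
resource = {"department": "finance", "region": "EU", "min_clearance": 2}

def abac_allows(user: dict, resource: dict) -> bool:
    """Grant access only when the user's attributes satisfy the resource's policy."""
    return (
        user["department"] == resource["department"]
        and user["region"] == resource["region"]
        and user["clearance"] >= resource["min_clearance"]
    )

print(abac_allows(user, resource))                       # True
print(abac_allows({**user, "clearance": 1}, resource))   # False: insufficient clearance
```

The point of ABAC is that access follows from attributes rather than from a fixed list of named users, so a steward can change one attribute or policy rule instead of re-permissioning every file.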
A centralized cloud archive system has built-in deduplication technology, ensuring that only one final copy of each file is kept. This is in stark contrast to in-house data silos, which promote data fragmentation and unnecessary duplication of files.
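Content addressing is one common way deduplication is implemented: the archive keys each file by a hash of its contents, so identical bytes are stored exactly once. A minimal sketch, assuming SHA-256 as the content hash:

```python
import hashlib

store: dict[str, bytes] = {}  # content hash -> single stored copy

def archive(content: bytes) -> str:
    """Store content once; identical bytes map to the same hash key."""
    digest = hashlib.sha256(content).hexdigest()
    store.setdefault(digest, content)  # no-op if this content already exists
    return digest

a = archive(b"Q3 sales deck, final version")
b = archive(b"Q3 sales deck, final version")  # duplicate upload
print(a == b, len(store))  # True 1 -- one copy kept for two uploads
```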
All the data is stored on a single searchable platform, making it easier for data consumers to find and explore it independently. Self-service access lets consumers reach any data they have permission for, without having to manually request access from the data owners.
A centralized cloud archive system is much more cost-effective than traditional storage methods, as it eliminates the need for businesses to purchase and maintain their own data storage infrastructure.
With a centralized cloud archive system, all stored data is available in a searchable format, making it easier for stakeholders to understand the information and use it for decision-making. This improves transparency and accountability within the organization.
Moreover, with automated reporting and a data monitoring system in place, there is transparency as to who is accessing data, when, and from where.
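As an illustration, an access-monitoring layer typically records entries like the one sketched below; the field names are a common pattern assumed for the example, not a specific product’s log format:

```python
import json
from datetime import datetime, timezone

def log_access(user_id: str, file_id: str, action: str) -> str:
    """Build an audit record answering who accessed what, how, and when (UTC)."""
    entry = {
        "user": user_id,
        "file": file_id,
        "action": action,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(entry)

print(log_access("jdoe", "sales-2021.xlsx", "read"))
```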
When a file is no longer needed, has served its retention period, and has been approved for deletion by the data stewarding committee, it is easy to locate this redundant file in the centralized cloud archive and permanently delete it.
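A sketch of how such a retention check might gate the deletion step – the categories and retention periods here are assumptions for illustration, since real periods come from legal and stewardship policy:

```python
from datetime import date, timedelta

# Illustrative retention periods per category; real values come from legal policy.
RETENTION = {"tax": timedelta(days=7 * 365), "draft": timedelta(days=90)}

def eligible_for_deletion(category: str, archived_on: date,
                          approved_by_steward: bool, today: date) -> bool:
    """A file may be purged only after retention expires AND stewards approve."""
    return today >= archived_on + RETENTION[category] and approved_by_steward

check_date = date(2024, 1, 1)
print(eligible_for_deletion("draft", date(2023, 1, 1), True, check_date))  # True
print(eligible_for_deletion("tax", date(2023, 1, 1), True, check_date))    # False: still retained
```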
SaaS cloud data archiving platforms that offer high reliability and availability with built-in disaster recovery sites (as Vaultastic does) can drastically reduce the recovery point objective (RPO) and recovery time objective (RTO) anxieties of CXO teams, while also eliminating the effort of performing separate data backups.
The centralized cloud archive is also more secure than other data warehouses or in-house storage. Cloud-based data archiving platforms like Vaultastic leverage the cloud’s shared security model to provide multi-layered protection against cyber attacks.
Regular patching, two-factor authentication, encryption, and other relevant security controls ensure that a business’s data is kept in a tight vault.
More and more businesses are migrating to the cloud for their solutions, primarily their archiving solutions. The reasons are many – cost-effectiveness, data security, deduplication, better infrastructure, IT support, user-friendliness; the list is endless.
Once a business has established its Data Governance strategy and implemented it, the next step is to ensure this data is secured in a proper location.
What better place than the cloud, which is proving its practicality day by day?
If you have your Data Governance strategy in place, Vaultastic, an elastic cloud-based data archiving service powered by AWS, can help you quickly implement your strategy.
Vaultastic excels at archiving unstructured data in the form of emails, files, and SaaS data from a wide range of sources. A secure, robust platform with on-demand data services significantly eases data governance while optimizing data management costs by up to 60%.