Data lifecycle management cuts across several pillars of a data and analytics strategy.
The lifecycle covers the processes and technologies that take data from source, through transformation workflows, to useful analytics delivered to the right consumers. It keeps your data fresh, your insights valuable and your teams invested in getting the most out of your data and analytics processes.
But data lifecycle management is also vitally important to data governance. Having data management workflows and accompanying rules in place is fundamental to good data protection and information security.
When thinking about data protection, you’ll have to consider how long information remains available to users and consumers, and how access rights might differ by role within the organisation. For example, in HR, certain data may only be available during the recruitment phase, or only in aggregated form.
Similarly, compliance with legislation and regulation is paramount, so you must follow the correct protocols for data retention, for example with GDPR-related information. There may also be industry-specific compliance requirements, such as those covering patient information in healthcare. It’s important to establish internal best practices and standards with these compliance needs in mind, as attaining certifications (such as ISO) will require evidence that you have appropriate controls in place throughout the data lifecycle. This means regularly auditing the information you collect and store.
Important considerations for the data lifecycle are the retention and disposal policies that apply to the variety of data types an organisation handles. For example, moving data to low-frequency-access storage is a useful alternative to simply disposing of it, and tiered storage systems help you retain data based on how often it is accessed: infrequently accessed storage can be considerably cheaper than storage that gives you immediate, high-performance access to retained data.
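As a minimal sketch of the tiering idea, the rule below picks a storage tier from a dataset’s last-access date. The thresholds and tier names are hypothetical assumptions for illustration; they loosely mirror the hot/cool/archive tiers most cloud object stores offer, and in practice you would tune them to your own access patterns and provider pricing.

```python
from datetime import date
from typing import Optional

# Hypothetical thresholds -- adjust to your own access patterns.
HOT_DAYS = 30    # accessed within the last month: keep on fast storage
COOL_DAYS = 180  # untouched for 1-6 months: move to a cheaper, slower tier

def storage_tier(last_accessed: date, today: Optional[date] = None) -> str:
    """Pick a storage tier for a dataset based on its last-access date."""
    today = today or date.today()
    age_days = (today - last_accessed).days
    if age_days <= HOT_DAYS:
        return "hot"
    if age_days <= COOL_DAYS:
        return "cool"
    return "archive"  # rarely touched: candidate for archival or disposal review
```

For example, a dataset last read two weeks ago stays "hot", while one untouched for a year falls through to "archive" and becomes a candidate for the disposal policy discussed above.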
Archiving is another key aspect of data lifecycle management: how long does raw source data consumed by downstream processes and transformations need to be retained? Can this data be archived or disposed of and then reproduced from the source on demand if needed? Try to answer these questions as early as possible when designing your data lifecycle, to avoid issues down the line.
You need to think about how future transformation projects can impact your data sources, and consider automating the validation of data throughout its lifecycle including aggregations and analytics.
In this context, data quality and integrity checks can act as gates to progress: workflows are automatically halted when issues are detected, forcing them to be investigated and resolved. This ensures issues are fixed as and when they arise and maintains the continuous quality and integrity of your data.
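A minimal sketch of such a gate is shown below, assuming a simple list-of-dicts batch; the same idea applies to DataFrames or warehouse tables, and dedicated data quality tools offer far richer checks than this illustration.

```python
class QualityGateError(Exception):
    """Raised to halt a pipeline when a batch fails its quality checks."""

def quality_gate(rows, required_fields):
    """Validate a batch before it moves to the next pipeline stage."""
    problems = []
    for i, row in enumerate(rows):
        missing = [f for f in required_fields if row.get(f) in (None, "")]
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")
    if problems:
        # Raising here halts the workflow, so downstream transformations
        # never consume bad data and the issue must be resolved first.
        raise QualityGateError("; ".join(problems))
    return rows  # clean batches pass through unchanged
```

A clean batch passes straight through to the next stage; a batch with missing mandatory fields raises `QualityGateError` and stops the run with a message describing every failing row.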
Opportunities to automate the data lifecycle must be identified and made part of the data and analytics roadmap. Automation can span almost all aspects of the data lifecycle, including data quality and integrity checks, execution of extraction, loading and transformation (ELT or ETL) pipelines, data cataloguing, scheduling, compliance and more.
Finally, you should gauge the costs associated with data storage and movement in both the short and long term, as these costs can mount quickly if not managed carefully. Defining a solid technology architecture early can help ensure efficient and appropriate use of resources.
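To see why tiering matters for cost, here is a back-of-the-envelope comparison. The per-GB-month rates are hypothetical assumptions for illustration only; real cloud prices vary by provider, region and retrieval pattern, and archive tiers usually add retrieval fees not modelled here.

```python
# Hypothetical storage rates in $/GB-month (illustrative only).
HOT_RATE = 0.023      # fast, general-purpose object storage
ARCHIVE_RATE = 0.004  # infrequent-access archive tier

def monthly_cost(gb: float, rate: float) -> float:
    """Monthly storage cost for a given volume and per-GB rate."""
    return gb * rate

# Keeping 10 TB entirely on hot storage...
all_hot = monthly_cost(10_000, HOT_RATE)
# ...versus keeping only the actively used 1 TB hot and archiving the rest.
tiered = monthly_cost(1_000, HOT_RATE) + monthly_cost(9_000, ARCHIVE_RATE)
```

Under these assumed rates, the all-hot option costs $230 a month against $59 for the tiered split, illustrating how unmanaged storage placement quietly multiplies cost as data volumes grow.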
This brief introduction to data lifecycle management highlights some of the steps you can take to improve overall data processes, or to put effective processes in place if you’re just getting started on your data and analytics journey. As with anything business-related, it’s people that are key to the success of any of these methods, and to your overall data and analytics strategy.
Fundamental to efficient data lifecycle management is ensuring that ownership and accountability are assigned, and appointing data ‘champions’ or stewards to assist with implementation, collaboration between departments, and compliance with governance and best practice.
This article is part of our Data-Driven SMB series. For more information, advice and resources on how to accelerate your organisation’s data and analytics maturity, click here or contact us today.
Nick Finch is currently CTO at TrueCue, leading the engineering effort behind the TrueCue Platform, as well as CIO across Concentra. He is a specialist in building and leading teams focused on delivering scalable, bespoke cloud-based solutions and products, with over 19 years’ experience in technical and leadership roles across data analytics, development, infrastructure, information security and QA.