How Much Data Is Enough?

Data warehouse management and data analytics have always faced the challenge of deciding what data to store and how long to keep it. This is even more relevant with today’s Data Lakes and the possibility of storing increasing volumes of information at lower cost in the cloud. Additionally, new data types, such as IoT or social media data, have emerged that provide millions of records that may or may not be important to analytics at a later stage.

Regulatory compliance is often used as the benchmark for what data should be stored and for how long. Retention periods can range from a few years to decades, or be indefinite. In the airline industry, aircraft maintenance records must be held for the life of the asset plus 7 years. For most aircraft that amounts to a period of 20-30 years during which records need to be retained. The magic 7-year mark is another benchmark; it is mainly relevant to financial and tax data but has been defined as the regulatory retention policy in many cases.

Whilst regulatory retention requirements are fairly clear-cut and well defined, what about the other data types that make up the majority of corporate data?

We can probably all agree that the value of raw data diminishes over time to the point that it is no longer relevant. Aggregated data might never reach the point of no value. A good example is trend information such as stock prices or stock index information that is still somewhat relevant even after 100 years.

The problem is that we don’t know whether data might be useful in the future. New analytics models might need to analyse new patterns and formulas and test them over longer periods of time and in different market situations. With that in mind, many data managers are reluctant to remove data, and as a result many organisations hold large pools of stale data that are increasingly hard to manage.

It all comes down to a cost/benefit assessment, which in itself is problematic. How can we put an easily measurable value on data? One way is to assess usage patterns and translate data access into a data governance metric that is then used to assess the retention policy.
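As an illustration of turning access patterns into a metric, here is a minimal Python sketch. The `DatasetUsage` fields, the storage price and the idle threshold are all assumptions made for the example, not figures from any particular platform. It derives a storage cost per read and flags datasets that have not been read at all for a long time.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetUsage:
    """Hypothetical usage profile for one dataset (all fields are assumptions)."""
    name: str
    size_gb: float
    last_accessed: date
    reads_last_365_days: int


def cost_per_read(usage, cost_per_gb_year):
    """Yearly storage cost divided by reads in the same year; None if never read."""
    if usage.reads_last_365_days == 0:
        return None
    return usage.size_gb * cost_per_gb_year / usage.reads_last_365_days


def days_idle(usage, today):
    """Number of days since the dataset was last accessed."""
    return (today - usage.last_accessed).days


def review_candidates(usages, today, max_idle_days):
    """Names of datasets with no reads in the past year and idle beyond the threshold."""
    return [
        u.name
        for u in usages
        if u.reads_last_365_days == 0 and days_idle(u, today) > max_idle_days
    ]
```

A metric like this only tells us which data is not being used; whether unused data can actually be removed still depends on the retention policy for its domain.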

A clearly defined data governance framework must be put in place. The cornerstone of the framework is a clear understanding and profiling of the data that is available. Secondly, a clear retention policy must be defined. This should not be a sweeping statement across the organisation but must be defined within the individual data domains.
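A retention policy defined per data domain could be expressed along the following lines. This is a sketch only: the domain names and all retention periods other than the 7-year mark for financial data are hypothetical placeholders, and a real policy would come out of the governance framework itself.

```python
from datetime import date

# Retention period in years per data domain.
# Only the 7-year figure reflects the common financial/tax benchmark;
# the other domains and values are hypothetical.
RETENTION_YEARS = {
    "finance": 7,
    "iot_raw": 2,
    "social_media_raw": 1,
}


def retention_end(domain, created):
    """Date on which a record created on `created` leaves its retention period."""
    years = RETENTION_YEARS[domain]  # KeyError for a domain without a policy
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # 29 February mapped onto a non-leap year: fall back to 28 February
        return created.replace(year=created.year + years, day=28)


def is_past_retention(domain, created, today):
    """True once `today` is strictly after the end of the retention period."""
    return today > retention_end(domain, created)
```

Looking up an undefined domain raises an error on purpose, so that no data falls back to a sweeping organisation-wide default.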

 

Some hard decisions need to be made on how long raw data should be held. As hard as it sometimes is to let go of data, the cost/benefit ratio often does not warrant further retention, even if the data is held in low-cost archive storage.

Let Fusion Professionals work with you to develop the most appropriate Data Governance framework and strategy for your organisation.

Achim Drescher

Achim Drescher is the Managing Consultant of the Big Data and Analytics Practice at Fusion Professionals.

With 30 years in the IT industry, he is an expert in Enterprise Software and Data Architecture, Data Governance frameworks and modern analytics platforms for Big Data and Data Lakes.

https://www.linkedin.com/in/achimdrescher/

achim.drescher@fusionprofessionals.com
