Data Analytics Trends in 2019
In recent years data volumes have increased dramatically, creating major challenges for traditional analytics platforms in terms of storage, management and cost. This trend will continue to accelerate, requiring companies to find better and more cost-effective ways to manage their data workloads.
Data Lakes vs Data Hubs
The first answer to the massive increase in data volumes was the Data Lake. The main aim was to use cheap storage to hold the raw data and to extract, transform and load (ETL) only the data needed for immediate use. This reduces storage cost, as only the data that is required is moved to the centralised data warehouse or to higher aggregation levels within the lake.
The problem is that there is no clear consensus on what a Data Lake is or how it should be architected. Most of the focus has been on storing raw data, with far less attention paid to usage, servicing, security and privacy. Consequently, many are questioning the value of a data lake.
As a result, the Data Lake concept will become less popular and is likely to be replaced by well-architected Data Hubs. The Data Hub is the natural progression from the Data Lake: it not only focuses on storage but also provides layers of fast structured data in a cloud environment and a servicing layer that includes self-service as well as API access.
The focus for large-scale data users will shift to how the hub can deliver the various data sources and insights effectively to the business and to customers in near real time. This will create the much-needed value proposition that was originally envisaged for the Lake.
De-Centralisation of Data
Cloud storage has made it easy to store vast amounts of data at relatively low cost. As a result, teams across the organisation are accumulating many separate pockets of raw data.
The question that every organisation will have to ask itself is whether to hold on to a centralised analytics platform in the form of a data warehouse or whether a decentralised approach is better.
The advantage of the decentralised approach is that the data is stored and maintained by the owners of this data. They can best manage quality, retention, privacy and security. However, to allow synergies across the organisation a centralised framework for governance, data discovery and data servicing must be in place.
The centrepiece of this framework will be an Information Catalogue that integrates the data on a semantic level and provides tools that allow Data Scientists and business people to access the data across the organisation. Analytics sandboxes will be required that can provide masked data for analytics modelling and pattern development.
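One way to provide "masked data" to a sandbox is to pseudonymise direct identifiers before the data leaves the owning domain. The sketch below is illustrative only; the column names, the salt and the truncated hash length are assumptions, and a real deployment would use a managed secret and an agreed masking policy.

```python
import hashlib

# Hypothetical salt; in production this would come from a secret store.
SALT = "sandbox-2019"

def mask_value(value: str) -> str:
    """Replace a direct identifier with a stable pseudonym so records can
    still be joined across datasets without exposing the raw value."""
    digest = hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()
    return digest[:12]

def mask_records(records, sensitive_fields):
    """Return copies of the records with the sensitive fields pseudonymised;
    non-sensitive analytical fields pass through untouched."""
    masked = []
    for record in records:
        clean = dict(record)
        for field in sensitive_fields:
            if field in clean:
                clean[field] = mask_value(str(clean[field]))
        masked.append(clean)
    return masked

# Toy customer extract (made-up data).
customers = [
    {"customer_id": "C-1001", "email": "jane@example.com", "spend": 420.50},
    {"customer_id": "C-1002", "email": "raj@example.com", "spend": 99.00},
]
sandbox_ready = mask_records(customers, sensitive_fields=["customer_id", "email"])
```

Because the pseudonym is deterministic for a given salt, the same customer masks to the same token in every dataset, which preserves joins for modelling while keeping raw identifiers out of the sandbox.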
The requirement for well-designed Data Governance frameworks will continue to grow, with decentralised Data Hubs and huge data volumes on one side and increasingly demanding privacy and security regulations on the other. As a result, it will be critical for organisations to invest in new organisational structures with clearly defined accountabilities for data as an asset.
Spearheading the changes in Data Governance are roles like the Chief Data Officer (CDO), who oversees a number of Data Stewards (also known as Data Curators) in their domain. The stewards ensure the quality, management and discoverability of decentralised data within the various business domains.
In a "best in class" scenario, data is stored and managed in a decentralised framework across the organisation while governance remains centralised. It will be critical that these data sources can be embedded into a centralised catalogue. New tools are coming onto the market that help to discover data across the organisation and identify synergies automatically.
Tools and processes must be in place that allow staff to create productionised data pipelines that feed from different decentralised data sources to provide business and customer insights.
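As a minimal sketch of such a pipeline step, the example below joins two independently owned sources into one per-customer view. The source functions, field names and data are assumptions standing in for real extracts (for example API pulls or hub queries), not a reference to any specific tool.

```python
# Stand-ins for extracts from two decentralised, domain-owned sources.
def fetch_sales():
    return [
        {"customer": "C-1", "amount": 120.0},
        {"customer": "C-2", "amount": 80.0},
    ]

def fetch_support_tickets():
    return [{"customer": "C-1", "open_tickets": 2}]

def build_customer_view(sales, tickets):
    """Merge per-customer figures from the two sources into one insight view,
    defaulting missing figures to zero so the view stays complete."""
    view = {}
    for row in sales:
        view.setdefault(row["customer"], {"amount": 0.0, "open_tickets": 0})
        view[row["customer"]]["amount"] += row["amount"]
    for row in tickets:
        view.setdefault(row["customer"], {"amount": 0.0, "open_tickets": 0})
        view[row["customer"]]["open_tickets"] = row["open_tickets"]
    return view

insights = build_customer_view(fetch_sales(), fetch_support_tickets())
```

In a production pipeline each fetch function would be replaced by a governed connector to the owning domain, with the join logic scheduled and monitored centrally.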
The trend of moving analytics platforms from on-premise infrastructure to the cloud will continue. Cloud offerings provide more flexible and often more cost-effective storage solutions with a number of advantages. First, it is more effective to bring the processing to the data than to bring the data to the processing. Second, serverless compute and the ability to spin up massive clustered analytics platforms on demand for short periods allow users to decentralise the analytics workload in a cost-effective way.
Organisations have to be careful not to fall into the trap of approaching their cloud strategy with the monolithic mindset of the last 20 years. A successful strategy is to develop a layered data architecture that hooks decentralised data aggregation levels into a centralised data delivery framework allowing all parts of the organisation to access data appropriate to their clearance and data requirements.
Wider Experimentation with Machine Learning and Artificial Intelligence
Machine Learning (ML) and Artificial Intelligence (AI) have been buzzwords for a long time now. Many R&D focused organisations have productionised ML and AI implementations, but the adoption of this technology has been slow.
In the coming years, many more organisations will start experimenting with ML and will find new use cases in which the technology is useful and adds value. In data analytics, this will require new skill sets in BI departments: Data Engineers will need advanced knowledge of modern analytics technology such as Hadoop, Spark and various Machine Learning algorithms.
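To make the idea of "experimenting with ML" concrete, here is a deliberately minimal classifier, a 1-nearest-neighbour model written from scratch. The churn scenario, feature names and data points are invented for illustration; a real experiment would use a library such as Spark MLlib or scikit-learn and far more data.

```python
import math

def predict(train, features):
    """Label a new point with the label of its closest training point
    (1-nearest-neighbour by Euclidean distance)."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = min(train, key=lambda row: distance(row[0], features))
    return nearest[1]

# Toy churn data: (monthly_spend, support_calls) -> outcome label.
training_data = [
    ((10.0, 5.0), "churn"),
    ((90.0, 0.0), "stay"),
    ((85.0, 1.0), "stay"),
    ((15.0, 4.0), "churn"),
]

# A low-spend, high-call customer lands nearest the "churn" examples.
label = predict(training_data, (12.0, 6.0))  # -> "churn"
```

Even a toy like this shows the experimental loop the article describes: pick features, fit or define a model, and check whether its predictions carry business value before investing further.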
The spectrum of ML training models is quite diverse and the rate of innovation in this field remains high. As a result, any investment in this space needs to be tightly embedded in the long-term data strategy to ensure that the value added is clearly identified before a new project starts.