As a data analytics company based in fast-paced London, we witness an ever-changing landscape in which every month seems to bring a new trend. Drawing on that experience, we've curated what we believe are four of the most significant, already-established trends in data analytics infrastructure.
Taking back control: Metadata management
As data lakes continue to evolve and grow, the looming threat of these lakes turning into swamps becomes apparent. At TrueCue, we're seeing increased interest in metadata management as organisations move further in their journey to derive value from data.
Metadata management is moving beyond basic data catalogues towards more sophisticated solutions, including data lineage functionality. This is becoming key as analysts grow increasingly wary of the data they extract: they want to know where it comes from and which transformations it has undergone.
We’ve also observed organisations refresh their views on data warehousing, after having experienced the challenges associated with data lakes first-hand. This has sparked interest in our flagship product, the TrueCue Platform, a PaaS offering which automates data warehouse creation and management.
Regardless of whether analysts are implementing simple reports or multi-layer neural networks, one thing is for certain: they'll need to find data and they'll need to trust it. Even though metadata management and data warehousing aren't new technologies, we expect them to experience continued growth as data management, particularly data governance, moves to the forefront of organisational priorities.
Moving fast: Real-time processing
The era of data being refreshed daily or weekly isn’t over. In many cases, a higher load frequency isn’t required or cost-effective. Batch loads present many benefits, being simpler to manage and verify. However, for some use cases, batch loading is just too slow.
Real-time analytics is growing in importance not because it caters to impatient data analytics practitioners, but because it enables use cases that weren’t previously achievable using out-of-the-box software. For example, using real-time analytics it’s possible to monitor online shopping transactions and detect anomalies before they translate into hundreds of missed sales.
As companies move forward to embrace the benefits of both real-time and batch processing, data analytics infrastructure is evolving to support this. A prime example of this evolution is the Lambda architecture, which sets out a basic pattern for organisations to work with both real-time and batch data.
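To make the pattern concrete, here is a minimal sketch of the Lambda architecture's core idea in Python (our illustration, not a TrueCue implementation): a batch layer periodically recomputes views over the master dataset, a speed layer counts events that arrive between batch runs, and queries merge the two.

```python
from collections import Counter

class LambdaView:
    """Toy serving layer that merges a batch view with a speed view."""

    def __init__(self):
        self.batch_view = Counter()  # recomputed periodically from the master dataset
        self.speed_view = Counter()  # incremental, real-time updates

    def run_batch(self, master_dataset):
        # Batch layer: recompute the view from scratch, then discard
        # the speed-layer counts it now supersedes.
        self.batch_view = Counter(master_dataset)
        self.speed_view.clear()

    def ingest_event(self, event):
        # Speed layer: update incrementally as each event streams in.
        self.speed_view[event] += 1

    def query(self, key):
        # Serving layer: a query sees batch plus real-time results.
        return self.batch_view[key] + self.speed_view[key]
```

For example, after `run_batch(["sale", "sale", "refund"])` followed by `ingest_event("sale")`, `query("sale")` returns 3: two sales from the batch view plus one that arrived in real time.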
Connecting the dots: Graph databases
These databases are gaining traction due to their ability to make sense of data with millions of interconnected points. The design principle behind them is simple. Rather than tables and columns, graph databases are based on vertices and edges. A vertex is a data point, which represents an entity of a certain type, such as a person. An edge connects vertices, representing a relationship. For example, edges can be used to convey that a person knows another person.
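The vertex/edge model described above can be sketched in a few lines of Python (an illustrative toy, not how a production graph engine stores data): vertices are typed entities with properties, and edges are labelled links between them.

```python
class Graph:
    """Toy property graph: typed vertices connected by labelled edges."""

    def __init__(self):
        self.vertices = {}  # vertex id -> properties dict
        self.edges = []     # (source id, label, target id) triples

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, source, label, target):
        self.edges.append((source, label, target))

    def neighbours(self, vid, label):
        # Follow all outgoing edges with the given label.
        return [t for s, l, t in self.edges if s == vid and l == label]
```

With this, `add_edge("alice", "knows", "bob")` records the relationship, and `neighbours("alice", "knows")` traverses it. Real graph databases index these traversals so that following an edge is cheap regardless of graph size.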
It's possible to store this information in a traditional, relational database engine. However, relational performance degrades quickly as more points are connected, since each hop along a relationship typically requires another join. Graph databases scale much more efficiently for this workload, with far less performance loss as the number of relationships grows.
Graph databases are useful for delivering data analytics solutions which explore relationships between people, whether in a virtual social network or in a professional setting. An example of such a solution is TrueCue's Organisational Network Analytics, shown in the screenshot below.
For the record, Organisational Network Analytics (ONA) is an approach which helps organisations understand how knowledge flows between employees and departments, highlighting risks and opportunities.
Bridging the gap: Semi-structured and structured data working together
More businesses are acknowledging the benefits of both semi-structured and structured data. In turn, they're opting for infrastructure that can support both. Semi-structured data, epitomised by the JSON format, is particularly useful in transactional settings, as it's less restrictive and allows IT to react more quickly to business-driven changes.
However, semi-structured data has always posed a challenge for data analytics. In most cases, it would be converted to structured data via awkward ETLs and only then combined with other data sources. Nowadays, solutions such as Snowflake emphasise their ability to work with both types of data on demand and combine them as needed, without intermediate ETLs. This is achieved through a combination of non-standard syntax and semi-structured data types. Microsoft hasn't been left behind, delivering functionality such as OPENJSON, which allows analysts to query semi-structured data from a SQL-based environment.
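The idea behind these features can be illustrated with plain Python and its standard `json` module (a language-neutral sketch of the concept, not the OPENJSON or Snowflake syntax itself): each record is a JSON document whose shape may vary, and the fields of interest are projected at query time rather than through an up-front ETL.

```python
import json

# Hypothetical order documents; the second lacks a customer block,
# illustrating how semi-structured schemas vary record by record.
raw_orders = [
    '{"id": 1, "total": 25.0, "customer": {"name": "Ada"}}',
    '{"id": 2, "total": 40.0}',
]

rows = []
for doc in raw_orders:
    order = json.loads(doc)
    rows.append({
        "id": order["id"],
        "total": order["total"],
        # .get() tolerates a missing path, much as on-demand JSON
        # queries must tolerate absent attributes.
        "customer_name": (order.get("customer") or {}).get("name"),
    })
```

The result is a flat, table-like list of rows that can be joined with structured sources, produced only when the query needs it.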
We expect to see more architectures deriving value from these capabilities, as a new generation of data analysts becomes equally comfortable with SQL and NoSQL databases.
If you’d like to discuss any aspect of your data analytics strategy with us, we’d love to hear from you. Contact us today.
Ramiro is an experienced BI professional who developed his expertise at IBM and MicroStrategy before joining TrueCue. Having designed data warehouses for several multinational companies, Ramiro is currently deployed as a Technical Architect in an ambitious, cross-organisational Public Sector engagement. Equipped with an M.Sc. in Information Systems (Lund) and an MBA (London Business School), Ramiro leverages his interdisciplinary background to provide solutions that maximise business value without compromising on technical aspects.