Recently, I went to an executive event in London. I’ve been to several of these over the years. This one was designed to match managers who have a challenge and hold a budget with vendors who make a compatible solution. The thing I particularly like about these events is hearing what customers have to say. It is important to step back and hear what challenges customers are facing so that we know whether what we’re selling is well aligned with current trends and whether what we’re offering will help solve customer problems.

In terms of customer challenges, there were two prevailing themes from this recent event, with Digital Transformation being a distant third.

  1. How do we ensure we are meeting the requirements of GDPR?
  2. How do we get value out of our data?

The first challenge, that of ensuring compliance with GDPR, is in one way a subset of the second challenge in that you need to understand your data before you can decide if you are compliant or not. For example, finding personally identifiable information (PII) amongst all the data you have available is only possible if you have a way to really understand your data.
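To make the PII point concrete, here is a minimal sketch in Python of scanning raw records for two kinds of identifiers. The patterns, the sample records and the function name `find_pii` are assumptions made for illustration; they are not taken from any product, and real PII discovery needs far broader coverage than two regular expressions. IP addresses are included because GDPR can treat online identifiers as personal data.

```python
import re

# Illustrative patterns only (assumed for this sketch, not exhaustive).
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_pii(records):
    """Return (record_index, pii_type, matched_text) for every match found."""
    hits = []
    for index, text in enumerate(records):
        for pii_type, pattern in PII_PATTERNS.items():
            for match in pattern.finditer(text):
                hits.append((index, pii_type, match.group()))
    return hits


# Hypothetical log lines standing in for data held in a Data Lake.
sample_records = [
    "2018-03-01 login ok user=jane.doe@example.com src=10.0.0.7",
    "2018-03-01 disk usage at 81 percent",
]

pii_hits = find_pii(sample_records)
for hit in pii_hits:
    print(hit)
```

Even this toy version shows the dependency described above: you can only flag a record once you know what its contents look like and mean.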

The second challenge is one that we at Gemini Data are very used to hearing from our customers, and yet it is surprising, since you could argue that our industry has already solved the difficult part of this challenge: how to collect vast amounts of disparate and often unstructured data and make it available for search. The IT industry answered that challenge with many open source and enterprise software products, the market leader arguably being Splunk. Customers finally had the ability to search data from any device or application in their data centre.

The term Data Lake appeared circa 2011 to describe the notion of having a single store of all data in the enterprise which is then accessible to users.

The idea of a Data Lake was to commoditise data within an organisation. Data belongs to the organisation so everyone in the organisation should be able to access it, right? Any user can ask any questions about Security, Service, Ops, Business Insights etc. and the Data Lake will have the answers.

The reality, sadly, is quite different. The clue is in the description of a Data Lake above: accessibility.

Whilst the average user can search for basic terms in the Data Lake, it turns out that you have to be quite technical and well trained in the Data Lake's proprietary search language to do anything beyond very basic word searches. To really understand what is going on in your data, you often have to write complex searches that join different data sources together and then build visual elements so that you can easily access those searches.
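As a rough illustration of why this takes technical skill, the sketch below answers one fairly ordinary business question, "which departments are seeing failed logins?", by joining two sources by hand. It is written in plain Python rather than any particular product's search language, and the record layouts, field names and the function `failed_logins_by_department` are all hypothetical.

```python
# Hypothetical records from two separate sources in the Data Lake.
login_events = [
    {"user_id": "u1", "host": "db-01", "outcome": "failure"},
    {"user_id": "u2", "host": "web-01", "outcome": "success"},
    {"user_id": "u1", "host": "db-01", "outcome": "failure"},
]
staff_directory = [
    {"user_id": "u1", "name": "Alice", "department": "Finance"},
    {"user_id": "u2", "name": "Bob", "department": "Engineering"},
]


def failed_logins_by_department(events, directory):
    """Join events to the directory on user_id and count failures per department."""
    department_of = {person["user_id"]: person["department"] for person in directory}
    counts = {}
    for event in events:
        if event["outcome"] != "failure":
            continue
        # Users missing from the directory are grouped under "unknown".
        department = department_of.get(event["user_id"], "unknown")
        counts[department] = counts.get(department, 0) + 1
    return counts


failures = failed_logins_by_department(login_events, staff_directory)
print(failures)
```

The person asking the question has to know that two sources exist, which field links them and how to handle records that do not match, which is more than most business users can be expected to know.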

So most organisations take the approach of training a very small number of staff to understand the complexities of the Data Lake in detail and then translate business requirements into visual dashboards, often underpinned by complex searches. You can see where this is heading. This small group of Data Lake Ninjas becomes a bottleneck for the wider user base in an organisation. I used to work at a reseller of these products and I can tell you that driving adoption within a customer was hard work because of this problem.

Hence the analogy of a frozen lake. All the data is there underneath but no one can access it.

What is required is a layer of intelligence above the Data Lake that removes the complexities associated with data wrangling and at the same time can display data in a way that is accessible to the average user in an organisation.

To do this, you need to rethink the way you present data to the user.

First, you need to understand how all the different entities in any data sources might be connected and find a plain English way to describe their relationships. At Gemini we call this our Semantic Ontology.

Next, you need to use Artificial Intelligence to do some of the legwork and fill in the gaps. At Gemini, we use Machine Reasoning to derive relationships that are implicit but not explicitly defined in the data.
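The general idea behind these two steps can be sketched in a few lines of Python. This is a toy illustration and not Gemini's implementation: the facts, the relationship names and the single inference rule in `infer_uses` are all assumptions invented for the example.

```python
# Hypothetical facts stored as (subject, relationship, object) triples,
# with relationships phrased in plain English.
facts = {
    ("alice", "logged in to", "db-01"),
    ("db-01", "hosts", "payroll-app"),
}


def infer_uses(known_facts):
    """Assumed rule: if X logged in to H and H hosts A, then X may have used A."""
    inferred = set()
    for subject, relation, host in known_facts:
        if relation != "logged in to":
            continue
        for other_subject, other_relation, app in known_facts:
            if other_subject == host and other_relation == "hosts":
                inferred.add((subject, "may have used", app))
    return inferred


def describe(triple):
    """Render a triple as a plain English phrase."""
    return " ".join(triple)


implicit = infer_uses(facts)
sentences = sorted(describe(triple) for triple in implicit)
print(sentences)
```

Neither source records that Alice touched the payroll application; the relationship only appears once the two facts are connected, and because the relationship names are plain English the result can be shown to a user as it stands.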

Finally, you need to display data in a way that is easy for people to consume. The human brain is an associative or relational database. By this I mean that we can understand and remember a great many things as long as we can create relationships in our minds about them. So doesn't it make sense that the software we use all day long should present data in a similar way? It does to us at Gemini.

Gemini Enterprise is constantly analysing the data in your Data Lake so that when a user wants to access the Data Lake to find an answer, we already have the answer beautifully displayed in a way that makes sense to them. If they drill down, they get plain English definitions of entities and relationships.

We call this Continuous Data Analytics. Come check it out...