Analytics is good. Deeper still, AI-enriched data analytics with the ability to perform analytical sweeps across massive datasets is also good. One step further, AI-enriched data analytics that draws on new strains of generative AI (gen-AI) and Large Language Model (LLM) resources – and that can respond to queries posed in natural language via Natural Language Understanding (NLU) – can be even better at democratizing access to analytics.
But with great AI-empowered data analytics platform power comes great responsibility.
Good data analytics
Although we might argue that every form of data analytics is basically good, the way we use the insights resulting from our information analytics actions can be both good and bad. This proposition, explanation and clarification come from Alan Jacobson in his role as chief data & analytics officer at Alteryx.
“Imagine an employee attrition analysis exercise where an organization ingests data values and points related to all individuals inside the company,” said Jacobson. “If we find that (for example) the highest levels of employee attrition are among people who regularly record working a 60 to 70 hour week, that might not be too much of a surprise – but it can inform managers to watch for people burning the candle at both ends. No problem so far, this is good AI-driven data analytics.”
But we need to think more broadly here. Jacobson extends the example: what if the company also has information detailing whether any given employee is married or not? Clearly, this type of information can only be collected in some countries (and, in the case of the USA, in some states), but where local jurisdictions allow, it may be recorded.
“In this scenario, if we also correlate employee attrition rates with marital status and find that people who are married are also more likely to quit their jobs – and that statistic might be allied to the number of hours worked or not – then we can do two things about it,” explained Jacobson. “On the one hand, the organization can start to spend more time caring about employee wellbeing and making sure that individuals’ office responsibilities do not impinge on their work-life balance – that would be a good thing. Conversely, we could simply instruct the HR function to generally only hire single people – and that would quite definitely be a bad use of data analytics due (obviously) to the fact that it infringes personal freedoms.”
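To make Jacobson’s example concrete, a minimal sketch of that kind of attrition correlation is shown below – the column names, thresholds and figures are purely hypothetical illustrations, not an Alteryx workflow or real HR data:

```python
# Hypothetical sketch of the attrition analysis described above; the schema
# ("hours_per_week", "marital_status", "left_company") and values are invented.
import pandas as pd

df = pd.DataFrame({
    "hours_per_week": [38, 65, 70, 40, 62, 45, 68, 39],
    "marital_status": ["single", "married", "married", "single",
                       "married", "single", "married", "single"],
    "left_company":   [0, 1, 1, 0, 1, 0, 1, 0],
})

# Attrition rate by workload band - the 'good' use: it points managers towards
# wellbeing interventions for people burning the candle at both ends.
df["long_hours"] = df["hours_per_week"] >= 60
print(df.groupby("long_hours")["left_company"].mean())

# Attrition rate by marital status - mechanically identical, but acting on this
# result in hiring decisions would be the 'bad' use Jacobson warns against.
print(df.groupby("marital_status")["left_company"].mean())
```

The point is that the two queries are computationally identical; whether the analysis is ‘good’ or ‘bad’ lies entirely in what the organization chooses to do with the resulting numbers.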
Given this conundrum then, how should we ensure that we only enable, allow and perform ‘good’ AI-driven data analytics?
A large portion of this comes down to education and to building diverse teams that will problem-solve together. With easy-to-use tools that allow us to drag and drop data and perform analytics, less time is spent today learning syntax and programming skills, but more time is needed to formulate the right questions to ask. That means not only framing a question the data can answer, but also thinking about what we would do with the answer and asking whether those outcomes would be beneficial or not.
Chris Royles is field CTO for the EMEA region at hybrid data management platform company Cloudera. Reminding us that the emergence of generative AI has raised awareness and fuelled conversations about the technology and its potential business benefits, Royles says that if AI-driven analytics is to scale, then ensuring trust in data through strong governance and interpretability is essential.
Trust in AI-driven analytics
“With business data residing across a mix of on-premises and public cloud environments, organizations must ensure they adhere to every country’s different data collection and governance laws. By building a corpus of trusted information, organizations can use it to enhance employee and customer interactions. To build this trust further, AI services need to be current, aware of the latest data and be able to respond to changes in real time,” he said.
Again bringing the conversation around to knowledge and skills, the Cloudera team advocate comprehensive employee training and the fostering of a data-driven culture. This way, they say, organizations can successfully implement AI-driven data analytics projects that are robust enough not to stray into the bad zone.
“Organizations must be clear on what they want the outcome to be and how AI and analytics will help. Running AI models can cost a lot, so it is important to identify high-value use cases with a clear return on investment. Once this has been established, organizations must ensure data quality and accessibility. Taking a ‘data as product’ approach can help build reliable data sources and pipelines, so they remain current and are of suitable quality and relevancy to the AI service and its users. Promoting and fostering collaboration between teams is important too,” summarized Royles.
Known for its work with the ‘data mesh’ approach (a technology paradigm that advocates decentralized data architectures in which information is aligned and stored according to particular business domains, use cases or other operational areas), Cloudera obviously wants to emphasize the additional data control mechanisms here as a means of helping to build AI on safer foundations. The company suggests that this approach can help scale how an organization’s people engage with data, either as product owners or as consumers. This in turn is said to help ensure that AI models and analytics align with business needs.
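As a rough illustration of what a ‘data as product’ quality gate might look like in practice – the column names, rules and thresholds below are assumptions made for this sketch, not Cloudera’s implementation – a data product can be checked for freshness, completeness and validity before an AI service is allowed to consume it:

```python
# Hypothetical 'data as product' contract check; the rules (freshness,
# completeness, allowed values) are illustrative, not a Cloudera API.
import pandas as pd

def validate_data_product(df: pd.DataFrame) -> list[str]:
    """Return a list of contract violations; an empty list means the product is fit to serve."""
    issues = []

    # Freshness: the newest record should be no more than 24 hours old.
    newest = pd.to_datetime(df["updated_at"], utc=True).max()
    if pd.Timestamp.now(tz="UTC") - newest > pd.Timedelta(hours=24):
        issues.append("stale data: newest record is older than 24 hours")

    # Completeness: key business fields must not contain nulls.
    for col in ("customer_id", "region"):
        if df[col].isna().any():
            issues.append(f"missing values in required column '{col}'")

    # Validity: categorical fields must stay within an agreed domain.
    if not df["region"].isin({"EMEA", "AMER", "APAC"}).all():
        issues.append("unexpected region codes found")

    return issues
```

A gate like this is one way of keeping a data product current and of suitable quality for the AI service and its users, in the spirit of Royles’s point, before any model ever sees the data.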
In his position as VP of automation at IFS, Bob De Caux has plenty to share on this subject. He suggests that the effect of ‘bad’ AI is typically perceived when it has a detrimental impact on human beings, whether it’s a welfare system incorrectly determining that an individual is no longer eligible, or a transport firm’s system deciding that the least productive driver is the ideal model because they incurred the lowest maintenance costs on their vehicle.
“Provided people are involved in making decisions that affect people, there is likely not going to be significant change noticed,” said De Caux. “Issues with uncaring employers will persist and will continue to be the responsibility of human beings, not AI. We know that AI makes us more efficient, but it cannot replace the human touch. This applies even if an AI solution is not near a human interaction. If costs rise against profit, you can be sure people will notice – even if they’re informed of that change by another information system.”
From his perspective, De Caux explains that the risk is that AI could be left unsupervised – that it might learn from bad data, not account for drift and be used to target incomplete or simply irrelevant metrics. “That’s really at the start of AI integration,” he said. “It will always be necessary – with traditional information systems and with AI – to carefully consider what you want out of the system, what you’re going to feed into the system and what success looks like. Anyone expecting a panacea, or expecting to simply throw AI at a problem and get good results, is likely to be disappointed.”
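As one hedged illustration of what ‘accounting for drift’ can mean in practice – the feature, data and alert threshold below are invented for this sketch and are not an IFS implementation – a simple two-sample test can flag when live inputs no longer resemble the data a model was trained on:

```python
# Hypothetical drift check: compare the live distribution of a model input
# against its training-time distribution using a Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
training_hours = rng.normal(loc=42, scale=5, size=5_000)   # reference data at training time
live_hours = rng.normal(loc=55, scale=8, size=1_000)       # recent production data

statistic, p_value = ks_2samp(training_hours, live_hours)

# A very small p-value means the live data no longer looks like the training
# data, so predictions built on it deserve review rather than blind trust.
if p_value < 0.01:
    print(f"Drift detected (KS statistic = {statistic:.3f}); review or retrain the model")
else:
    print("No significant drift detected")
```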
Raw AI, possibly bloody
Having worked to build a strong proportion of both traditional (i.e. before gen-AI) and current-era automation solutions into its platform, the IFS team think that ‘raw AI’ solutions that simply consume data and produce predictions will need some development work to provide clean data and to verify that the predictions are reasonable. Providers of those solutions will expect their customers to be aware of drift and know how to deal with it. Again, failing to do so will have some effect that is noticed outside that AI integration.
“Conversely, turnkey AI solutions will have been built with experience in a specific industry,” says De Caux. “Provided the data is standardized in that industry, the solutions may be able to perform much of the data cleansing and produce reasonable predictions from the outset – but only for very specific use cases. All examples of ‘bad AI’ stress the need for a robust framework for accountability, fairness and responsibility.”
Michael Queenan is co-founder and CEO of Nephos Technologies, a specialist data services integrator. He suggests that if we look at AI using analytics to make decisions at their most binary level, it becomes a purely black-and-white affair – and, consequently, there is no consideration of what is good or fair.
“For example, if you look at using AI to make decisions on data points around how to reduce the UK’s NHS medical service waiting times, an answer could be to refuse to treat anyone who is over the age of 55, obese and a smoker, as this group has a very high cost of care. This would be using analytics to solve a problem but wouldn’t be ‘good’ or fair,” he said.
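To show how that kind of black-and-white outcome falls straight out of a purely cost-driven objective, here is a deliberately naive, hypothetical sketch – the patient records and budget figure are invented – in which nothing in the calculation encodes fairness at all:

```python
# Deliberately naive, hypothetical illustration: an objective told only to
# minimise cost will 'solve' waiting lists by dropping the most expensive
# cohort. No part of this logic considers what is good or fair.
import pandas as pd

patients = pd.DataFrame({
    "patient_id":    [1, 2, 3, 4, 5, 6],
    "expected_cost": [1_200, 9_800, 8_500, 2_000, 11_000, 900],
})

budget = 5_000

# Treat the cheapest patients until the budget runs out - efficient on paper,
# indefensible in practice, and exactly the trap Queenan describes.
ranked = patients.sort_values("expected_cost")
treated = ranked[ranked["expected_cost"].cumsum() <= budget]
print(treated)
```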
Looking further ahead then, the Nephos leader notes that whilst companies are just starting to adopt fairly benign LLM AI tools today, more general AI that is designed to match human intelligence is waiting in the wings. Concerns about today’s AI-driven data analytics could be dwarfed by those surrounding such systems, which could surpass human intelligence, become self-aware (according to some) and even pose a threat to our very existence.
Open transparency & accountability
“Researchers and policymakers are actively working on strategies to ensure gen-AI systems are designed to prioritize human values,” said Queenan. “Added to this, interdisciplinary research and open dialogue among AI practitioners, policymakers and ethicists will foster the exchange of ideas and best practices to ensure gen-AI systems are designed with safety and robustness in mind. Further here, we can say that open source approaches and collaborations between academia, industry and civil society will also be crucial to ensure organizations enhance transparency and accountability.”
The consensus among industry practitioners (and even among the occasionally more vociferously charged professional advocates, futurists and evangelists) is that AI has the potential to bring enormous benefits to humanity and that, with responsible stewardship, we can ensure it remains a force for good while minimising the risks it poses.
We can certainly surmise that AI systems, if left unchecked, tend to amplify and reinforce our own biases, which they have typically ‘learned’ from our poor-quality data. Ultimately, as it is with dogs, so it is with AI – there are no bad AIs, only bad owners.