The insurance industry has been scrutinized for years over unfair bias in its practices. Bad data and biased business practices have long gone hand in hand in insurance. Unfortunately, marginalized populations bear the consequences.
Some industry experts, including a former US insurance commissioner, believe that discrimination will become the biggest AI regulatory issue. That's because customer data can easily reveal too much adverse information, allowing insurance companies to cherry-pick only the most desirable risks.
What is bad data for insurance businesses?
When building models, training data matters – a lot. Consider the example of body mass index (BMI) in life insurance. It shows how a lack of diverse, representative, high-quality insurance data produced an “ideal risk” standard that stood for 80 years before the American Medical Association decried it as inherently biased.
In this case, BMI standards were derived from height and weight data collected predominantly from white men. Recent research shows that BMI does not account for factors like bone density and muscle mass, making it an inaccurate risk assessment measure for many people.
As the BMI example shows, a lack of data can create availability bias (an overreliance on data that’s easily accessed) – which leads to bad outcomes. And because data is the fuel for artificial intelligence, it follows that feeding bad data into AI systems will lead to poor results.
What are algorithms and why do they matter?
An AI algorithm is a set of step-by-step instructions designed to accomplish a specific task or solve a specific problem. Synthetic data generation – creating artificial data that mirrors the statistical properties of real data – relies on AI algorithms such as machine learning models and neural networks.
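To make this concrete, here is a minimal, hypothetical sketch of one simple generation approach: estimate the joint distribution of a few numeric policyholder features from “real” data, then sample brand-new records from it. The column meanings and numbers below are invented for illustration; production generators (including neural networks) are far more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical "real" data: age, annual mileage, prior claims.
real = rng.multivariate_normal(
    mean=[45.0, 12_000.0, 0.4],
    cov=[[120.0, 9_000.0, 0.5],
         [9_000.0, 4.0e6, 30.0],
         [0.5, 30.0, 0.3]],
    size=1_000,
)

# "Train" the generator: estimate the joint distribution from the data.
mu = real.mean(axis=0)
sigma = np.cov(real, rowvar=False)

# Sample synthetic records that preserve the means and correlations
# without copying any individual row.
synthetic = rng.multivariate_normal(mu, sigma, size=1_000)

# The two correlation matrices should closely match.
print(np.corrcoef(real, rowvar=False).round(2))
print(np.corrcoef(synthetic, rowvar=False).round(2))
```

Because the synthetic rows are drawn from the fitted distribution rather than copied, they retain the correlations a model needs without reproducing any individual customer's record.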
Bias: A 4-letter word
Historically, insurers have used ZIP codes or territory codes to calculate insurance premiums. But seemingly innocent variables like these can act as proxies for sensitive attributes – such as race, gender or religion. Such variables can, in turn, hide bias.
Consider a 2017 ProPublica investigation of disparities in auto insurance premiums in Chicago, where ZIP codes were used as a primary data point for setting rates. Subsequent research showed that drivers in minority ZIP codes paid higher premiums, even holding constant factors such as age, coverage, gender and loss history.
In the most egregious example, the premium quoted for an otherwise identical driver was more than 300% higher in ZIP codes where the population was more than 50% minority. And premiums were higher in minority neighborhoods for every one of the 34 companies quoted.
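For readers curious what “holding factors constant” looks like in practice, below is a hedged, simulated sketch of the standard approach: regress the quoted premium on legitimate risk factors plus a minority-ZIP indicator. The data is fabricated with a built-in disparity purely to illustrate the method; it is not the ProPublica data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 5_000

# Simulated risk factors and a minority-ZIP indicator (all fabricated).
age = rng.uniform(18, 80, n)
prior_claims = rng.poisson(0.3, n)
minority_zip = rng.integers(0, 2, n)  # 1 = majority-minority ZIP code

# Simulated premiums with a built-in $200 disparity, for illustration.
premium = (600 - 2.0 * age + 150 * prior_claims
           + 200 * minority_zip + rng.normal(0, 50, n))

# Ordinary least squares: premium ~ intercept + controls + indicator.
X = np.column_stack([np.ones(n), age, prior_claims, minority_zip])
coef, *_ = np.linalg.lstsq(X, premium, rcond=None)

print(f"premium gap attributable to minority ZIP: ${coef[3]:.0f}")
```

If the indicator's coefficient stays large after the controls, the disparity cannot be explained by the legitimate risk factors in the model.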
If biases like this are not assessed and mitigated, vulnerable populations will be further marginalized. AI will only exacerbate the inequities.
Where generative AI comes into play
Most business cases of generative AI (GenAI) feature large language model (LLM) capabilities. But another type of GenAI – synthetic data – is especially useful for addressing data concerns like privacy and fairness. Synthetic data offers modelers the advantage of not having to rely on data masking to protect sensitive personal data.
Too good to be true? Not at all.
A real-world example of synthetic data results
In 2022, SAS, in collaboration with Syntho and the Dutch AI Coalition, demonstrated that synthetic data produced more reliable results than anonymized data while maintaining the deep statistical patterns required for more advanced analysis.
Such advances, coupled with growing concerns about protecting privacy, are why IDC predicts that by 2027, 40% of the AI algorithms insurers use throughout the policyholder value chain will rely on synthetic data to guarantee fairness within the system and comply with regulations.
Synthetic data for insurance: holy grail or AI snake oil?
Synthetic data, in and of itself, will not heal all wounds. Remember: you still need original data to create the synthetic data. Because of that, biases in the original data can carry straight through into the synthetic data.
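A toy simulation makes the point. In this hypothetical sketch, a naive generator faithfully learns a “real” data set in which one group is charged more, and the premium gap reappears in the synthetic output:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 10_000

group = rng.integers(0, 2, n)                       # protected attribute
premium = 900 + 150 * group + rng.normal(0, 50, n)  # biased "real" data

# Naive generator: learn and resample each group's premium distribution.
means = {g: premium[group == g].mean() for g in (0, 1)}
stds = {g: premium[group == g].std() for g in (0, 1)}

synthetic_group = rng.integers(0, 2, n)
synthetic_premium = np.where(
    synthetic_group == 1,
    rng.normal(means[1], stds[1], n),
    rng.normal(means[0], stds[0], n),
)

real_gap = premium[group == 1].mean() - premium[group == 0].mean()
synthetic_gap = (synthetic_premium[synthetic_group == 1].mean()
                 - synthetic_premium[synthetic_group == 0].mean())
print(f"real gap: ${real_gap:.0f}   synthetic gap: ${synthetic_gap:.0f}")
```

The generator did exactly what it was trained to do: it preserved the statistics of the original data – bias included.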
Any dialogue on the safe consumption of AI, including GenAI, must acknowledge several truths:
- Bias creates inequities.
- All models possess bias.
- Bias can be mitigated, but not eliminated – as the sketch below illustrates.
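On that last point, here is a minimal sketch of one classic mitigation technique – reweighing (Kamiran and Calders) – applied to simulated underwriting decisions. All data is fabricated for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 10_000

group = rng.integers(0, 2, n)  # protected attribute
# Biased historical outcome: group 1 is denied coverage more often.
denied = rng.random(n) < np.where(group == 1, 0.35, 0.20)

# Reweighing: give each (group, outcome) cell the weight it would have
# if group and outcome were statistically independent.
weights = np.empty(n)
for g in (0, 1):
    for y in (False, True):
        cell = (group == g) & (denied == y)
        weights[cell] = (group == g).mean() * (denied == y).mean() / cell.mean()

for g in (0, 1):
    m = group == g
    weighted_rate = np.average(denied[m], weights=weights[m])
    print(f"group {g}: raw denial rate {denied[m].mean():.2f}, "
          f"reweighted {weighted_rate:.2f}")
```

The weighted denial rates equalize across groups, but any bias encoded in proxy features the weights don't touch survives – which is why mitigation is never elimination.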
To position themselves as leaders in this space, organizations need to develop their own trustworthy AI principles. They should also:
- Foster a culture of data literacy and the use of data-driven decisions.
- Empower employees to call out unintended AI risks.
- Embrace a code of data ethics as an integral part of their enterprise.
Recently, SAS hosted a project with a large insurer experimenting with synthetic data and credit scoring. The results were encouraging. But the ensuing discussion also highlighted some ugly truths about the use of credit and other factors in premium rating. For example:
- Multiple studies have confirmed that minorities and female drivers pay more for auto insurance.
- Driving history can be influenced by police bias.
- Tracking driving behavior through smart devices can be skewed based on road conditions that vary among neighborhoods.
What’s next for synthetic data in insurance?
There are many ways for insurers to use GenAI. They can use generative models to create scenarios, then proactively identify risks and predict outcomes. GenAI can inform decisions about pricing and coverage. It can automate claims processing to lower costs and improve customer experience and satisfaction. And it can strengthen fraud detection and make targeted risk prevention recommendations to customers that reduce the likelihood of claims.
Synthetic data holds the key to breaking the cycle of bias perpetuated in the insurance industry.
Rather than focusing on the potential negative aspects of AI, the collective insurance community should ask the right questions and place a deliberate focus on the quality of the data used to generate synthetic data. Do that, and we can protect privacy and significantly reduce bias – all while unlocking the tremendous value of generative AI.