Or should we not? After all, humans are inclined towards bias and strange beliefs, says Dr. Shidan C. Murphy, Director of APAC Solutions Specialists, Altair.
The image at the top of this story shows an innocuous formation of clouds. Look closely, however, and you can discern something that resembles the side profile of a person’s face.
Similarly, in our daily lives, when we are presented with a wide range of data to make decisions with, we are prone to perceiving patterns that seem real but are illusions produced by unconscious mental processing. In reality, there is often no pattern at all. This is likely a remnant of instincts from prehistoric times, when our ancestors’ sensitivity to potential hazards could mean the difference between life and death: for them, believing and acting as if a threat existed was safer than ignoring one.
Fast forward to the digital era, and even the smartest people in the world still rely on the same ancient mental machinery to make important decisions. For many reasons, they ignore statistical logic or simply do not bother to verify their decision methodology or data quality.
As a result, this tendency has led people to make countless incorrect decisions. Now, in the age of data analytics, we have an opportunity to revamp bias-prone methodologies and data models before our comparatively archaic brains warp data insights into misinformation and biased predictions.
Tackling Simpson’s Paradox
In the data science field, scientists are well aware of the statistical phenomenon where an association between two variables in a population emerges, disappears or reverses when the population is divided into subpopulations. This is called Simpson’s Paradox.
A famous example of this paradox is the alleged gender bias in graduate admissions at the University of California, Berkeley in 1973. Due to Simpson’s Paradox, observational data about male and female applicants misled the public into “seeing” a gender bias when the real cause was a confounding variable in how men and women were choosing their majors.
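The reversal is easy to reproduce with a small calculation. The sketch below uses invented admissions counts (not the real 1973 Berkeley figures) chosen so that women are admitted at a higher rate in every department, yet at a lower rate overall, because they apply disproportionately to the more competitive department:

```python
# Invented counts of (admitted, applied) per department and group.
# These are illustrative numbers, NOT the actual 1973 Berkeley data.
counts = {
    "Dept Easy": {"men": (80, 100), "women": (9, 10)},
    "Dept Hard": {"men": (20, 100), "women": (57, 190)},
}

def rate(admitted, applied):
    return admitted / applied

# Per department, women are admitted at a HIGHER rate in both cases.
for dept, groups in counts.items():
    m = rate(*groups["men"])
    w = rate(*groups["women"])
    print(f"{dept}: men {m:.0%}, women {w:.0%}")

# Aggregated across departments, the association reverses: men's overall
# rate is higher, because most women applied to the competitive department.
men_adm = sum(g["men"][0] for g in counts.values())
men_app = sum(g["men"][1] for g in counts.values())
wom_adm = sum(g["women"][0] for g in counts.values())
wom_app = sum(g["women"][1] for g in counts.values())
print(f"Overall: men {rate(men_adm, men_app):.0%}, "
      f"women {rate(wom_adm, wom_app):.0%}")
```

The confounding variable here is which department each group applied to, which is exactly the effect that misled observers in the Berkeley case.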
In addition to this paradox, other common problems, such as the multicollinearity discussed below, can also make data misleading.
Naturally, laypeople and scientists alike encounter these phenomena, but many aren’t aware of how to deal with them. As a result, incorrect insights can carry over into their professional opinions and judgement, leading to devastating errors.
How Data Scientists Combat Bias
According to Dr. Shidan C. Murphy, Director of Data Analytics, APAC, Altair, one challenging part of building models comes from checking, validating and ensuring data quality. For example, when data contains a high degree of intercorrelation between explanatory variables, this “multicollinearity” can lead to incorrect regression analysis results.
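A standard way to detect multicollinearity is the variance inflation factor (VIF): regress each explanatory variable on the others and compute 1 / (1 − R²). The sketch below, using synthetic data for illustration, flags two nearly duplicated variables with large VIFs while an independent variable scores near 1:

```python
import numpy as np

# Synthetic data: x2 is almost a copy of x1, so the two are highly
# intercorrelated; x3 is independent of both.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing it on the others."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # add an intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

for j in range(X.shape[1]):
    print(f"x{j + 1}: VIF = {vif(X, j):.1f}")
```

A common rule of thumb treats a VIF above 5 or 10 as a sign that a variable’s regression coefficient cannot be trusted on its own.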
To account for Simpson’s Paradox, multicollinearity, and other common faulty assumptions in statistical methods, data scientists often rely on well-established mathematical and data processing algorithms to detect bias. “I’ve had people come up to me and say, ‘Look, we’ve built hundreds of models, but we cannot seem to find a good statistical solution to our problem.’ However, endlessly building models is just p-value hunting: trying to find the best possible fit without evidence of model robustness,” said Murphy. “A good model is fine, but anyone can build a good model on one set of data. A high-quality model must prove accurate on other sets of data and over time.” Without this discipline, people trying to use data science to solve problems may end up with thousands of models fitted to the data, producing the non-existent conclusions and predictions they were seeking.
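Murphy’s point about proving a model on other data can be illustrated with a minimal overfitting sketch (synthetic data, plain NumPy): an overly flexible model fits the original sample better than a simple one, yet falls apart on fresh data standing in for “other sets of data across time”:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    # The true relationship is linear plus noise.
    x = rng.uniform(-1, 1, n)
    y = 2 * x + rng.normal(scale=0.5, size=n)
    return x, y

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_train, y_train = make_data(20)
x_test, y_test = make_data(1000)  # stand-in for future, unseen data

simple = np.polyfit(x_train, y_train, 1)      # matches the true form
flexible = np.polyfit(x_train, y_train, 15)   # chases noise in the sample

print("train MSE:", mse(simple, x_train, y_train),
      mse(flexible, x_train, y_train))
print("test  MSE:", mse(simple, x_test, y_test),
      mse(flexible, x_test, y_test))
```

The flexible model “wins” on the data it was built on, which is exactly why a good fit on one data set proves nothing by itself.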
When this type of behavioural bias leads to misleading conclusions, decisions based on those conclusions can be disastrous for users down the line. The problem is that most data scientists must do a great deal of work to screen data for biases, run extensive processes to “debias” their data models and machine learning algorithms, and educate their employers and top management, who may unknowingly ask for p-value hunting.
An Easier Way to Eliminate Bias
It’s obvious that Murphy is dedicated to educating people on the dangers of biased machine learning, predictive analytics, and statistical methodologies. After all, this is knowledge that benefits industries and consumers alike. Altair is equally supportive of minimizing bias, and it owns patented, automated bias-detection solutions that make it easier than ever for data scientists to detect and eliminate bias.
“The interactive decision trees feature – which our software [Altair® Knowledge Studio®] is famous for – lets data scientists edit, build, and enforce custom business rules and logic on top of the data model to test out various variables,” Murphy said. “It therefore allows anyone – even people who do not code – to run models and test for biases and other patterns.”
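The general idea behind editable decision trees can be sketched in plain Python. This is not Altair Knowledge Studio and the rules and field names are hypothetical; it only illustrates enforcing a human-edited business rule on top of learned splits, and probing the model for bias by flipping a sensitive attribute:

```python
def score_applicant(applicant):
    # Human-edited business rule enforced ahead of the learned splits:
    # a long clean history approves regardless of income (hypothetical rule).
    if applicant["years_clean_history"] >= 10:
        return "approve"
    # Splits a tree might have learned from data (illustrative only).
    if applicant["income"] >= 50_000:
        return "approve"
    if applicant["debt_ratio"] < 0.3:
        return "approve"
    return "reject"

def is_invariant_to(model, applicant, attribute, values):
    """Bias probe: the prediction should not change when only the
    sensitive `attribute` changes."""
    preds = {model({**applicant, attribute: v}) for v in values}
    return len(preds) == 1

applicant = {"income": 30_000, "debt_ratio": 0.5,
             "years_clean_history": 2, "gender": "F"}
print(score_applicant(applicant))                                   # reject
print(is_invariant_to(score_applicant, applicant, "gender", ["F", "M"]))
```

Because the rules are explicit, even a non-coder can read, edit, and test them, which is the appeal of interactive trees over opaque models.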
The bottom line is that advancements in data analytics have reached the point where they can be used to overcome biases ingrained in us since the beginning of time. Data scientists, and indeed all employees involved in product and service development, should avail themselves of these tools to better meet customer expectations in this new, hyperconnected world.