The Machine Learning Fairy Tale – Does it End Happily?

There are two types of people on earth: those who need to know the story-ending spoiler, and those who will do everything to avoid it.

If you are one of the latter, you might be shocked to hear that some people actually like spoilers and want to know the ending before they watch a film or read a book. In fact, I’m one of them. To be clear: I don’t go about my day telling people what happens in Game of Thrones. But, in the context of this post, I will do exactly that. Believe me, if you know even just a little bit about the ending, you are going to love this story even more. Think of it this way: we already know that most stories are likely to have a happy ending. Most fairy tales start with “once upon a time” and end with “they lived happily ever after”. For the majority of us, the most interesting part of the whole endeavour is witnessing how specific characters overcome obstacles to reach the inevitable happy place.

What does Machine Learning’s story ending look like? 

If you had a chance to look at my first piece, “Before You Jump into the World of Machine Learning”, you may have been wondering: “Is there a final answer (a.k.a. prediction outcome) at the end?” Spoiler alert Number 1: Yes, there is. Without it, the story ultimately cannot end. However, even if we uncover a prediction outcome, it may not be exactly what we were searching for, meaning there may be more stories awaiting our ML adventurers. Finding answers in the first book doesn’t always mean we’ve read the whole series and concluded the tale. We need to ask ourselves the harder question: “What is the REAL ending message in the ML World?” I personally think it might be the same as in fairy tales:

“They found their data-driven decision with minimal prediction error and lived happily ever after.”

When I was a student, I thought Machine Learning was all about prediction algorithms. In reality, this is not quite true. Far more important are the implementation, application, and usage of ML predictions. Sometimes it might be safer to call the result a data-driven decision instead. I believe the ultimate purpose of using ML is to benefit mankind as much as possible, but there is a plethora of questions to be asked, processes to be created and decisions to be made before we reach that point.

As Jon Kleinberg, Computer Science Professor at Cornell University, mentions in a faculty paper, “Human Decisions and Machine Predictions”:

“Algorithms are a lens into human decision making.”

If you’re interested in learning more about how ML can be used to improve human decision-making, specifically to construct unbiased judicial decisions in a courtroom, then I highly recommend the read.

Before we go too far down Alice’s rabbit hole: I do not think machines should make all decisions for humankind. If we do that, we could end up with this:

I am more focused on the necessary conditions we as a species need to uncover to find those happy endings. Let’s put aside the definition of the decision and its consequences (error types) for a moment and focus only on the math-free explanation. We need to get comfortable with the whole point of an ML exercise: how can we measure or evaluate whether we are going to be satisfied with its decision or not? This is not an easy job. Why? Just look at the world around us. We live with uncertainty, which is full of noise, randomness and errors.

I believe statistics and data science are the study of measuring uncertainty. That study is supported by big data cloud environments from companies like Amazon (AWS), Google (Cloud), and Microsoft (Azure), where large volumes of data can be stored effectively to make the measurement of uncertainty more accurate. To wrap all these thoughts up: if we can quantify the uncertainty and measure the decision error rate precisely, we may hold the key to making satisfactory decisions.
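
For readers who do want a peek behind the math-free curtain, here is a minimal Python sketch of what "measuring the decision error rate" with its uncertainty might look like. It is only an illustration under my own assumptions: the function names `error_rate_interval` and `is_acceptable`, the 95% level, and the example threshold are all hypothetical, and the Wilson score interval is just one common way to put a range around an observed error rate.

```python
import math


def error_rate_interval(errors, total, z=1.96):
    """Return (observed rate, lower bound, upper bound) for an error rate.

    Uses the Wilson score interval; z=1.96 corresponds to roughly 95%.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = errors / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


def is_acceptable(errors, total, threshold):
    """Accept only if even the pessimistic (upper) bound is below the threshold."""
    _, _, upper = error_rate_interval(errors, total)
    return upper < threshold


# Same observed error rate (5%), very different amounts of evidence.
print(is_acceptable(5, 100, threshold=0.10))
print(is_acceptable(50, 1000, threshold=0.10))
```

The two calls share the same observed error rate, but the smaller sample leaves a much wider range of plausible values, so a decision-maker may reasonably ask for more data before accepting it.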

Are we there yet?

Spoiler alert Number 2: All of this comes down to a continuous cycle of ‘yes’ vs. ‘no’ until we can all accept the decision. Working with data is best imagined as the steps of a journey. Each data set has unique characteristics, such as its size, the presence of missing values, the number of features (columns), imbalances in the target feature, etc. Whatever those characteristics are, there is no way for us to know them for sure until we explore the data. Many investors and decision-makers are hesitant to get to know the data inside and out, but without the journey of exploring the data, we cannot reach the happy endings.
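
As a rough idea of what a first step of that exploration could look like, the sketch below summarises the characteristics mentioned above (size, missing values, number of features, and balance of the target) using only the Python standard library. The function name `profile_dataset`, the representation of a data set as a list of dictionaries with `None` marking a missing value, and the toy churn example are all assumptions made for illustration.

```python
from collections import Counter


def profile_dataset(rows, target):
    """Summarise basic characteristics of a data set.

    rows: list of dicts, one per record; None marks a missing value.
    target: name of the column we want to predict.
    """
    n_rows = len(rows)
    columns = sorted({key for row in rows for key in row})

    # Share of records in which each column is missing.
    missing_rate = {
        col: sum(1 for row in rows if row.get(col) is None) / n_rows
        for col in columns
    } if n_rows else {}

    # Share of each class among records that have a target value.
    counts = Counter(
        row[target] for row in rows if row.get(target) is not None
    )
    labelled = sum(counts.values())
    class_share = {label: n / labelled for label, n in counts.items()}

    return {
        "n_rows": n_rows,
        "n_features": len([c for c in columns if c != target]),
        "missing_rate": missing_rate,
        "class_share": class_share,
    }


# Hypothetical toy data: three customers stayed, one churned.
customers = [
    {"age": 34, "income": 52000, "churned": "no"},
    {"age": None, "income": 61000, "churned": "no"},
    {"age": 45, "income": None, "churned": "no"},
    {"age": 29, "income": 48000, "churned": "yes"},
]
print(profile_dataset(customers, target="churned"))
```

Even a summary this small already hints at two questions worth asking before any modelling: what to do about the missing values, and whether the minority class is large enough to learn from.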

ML algorithms are not guaranteed to deliver what you need without any bias. Actually, it is often the opposite. Although this is becoming less common, there are many examples of imperfectly trained ML algorithms. For example, in 2015 Google Photos mistakenly labelled photos of Black people as “gorillas”. ML algorithms learn how to behave by analyzing training data, and that data reflects where in life it comes from. You can look at it as a collection of individual lives. Just like humans, ML algorithms can pick up our bad habits, biases and prejudices, and will naturally give them back to us as biased predictions. This is exactly where data scientists should be involved, so that we don’t misinterpret the prediction result and guide decision-makers down the wrong yellow brick road.

Going back to my original question: are we there yet? I have to say no. Even if we remove all noise, randomness, error, and even bias from our training data, once the prediction model is applied to real-world data it will eventually meet another form of uncertainty, which means a greater chance of prediction errors. But don’t be disappointed! We now know that there is always a chance of failure in absolutely everything due to uncertainty. Every journey has its ups and downs, heroes and villains, but I strongly believe that Machine Learning is a journey toward the golden egg: keeping error rates well below acceptable thresholds so we can all live happily ever after.
