Posts

Showing posts with the label Data Analytics

Solving Customer Churn with a hammer!

Learning when data should take a back seat and give way to domain knowledge is a valuable skill. Suppose you built a machine learning model on your customer data to predict churn risk. Now that you have a risk score for each customer, what do you do next? Do you filter the top n% by risk and send them a discount coupon in the hope that it will prevent churn? But what if price is not what drives churn for many of these customers? They might have been treated poorly by customer service, which drove them away from your company's product. Or an indirect competitor's product or service might have removed the need for your company's product altogether (this happened to companies like Blockbuster and Kodak in the past!). There could be a myriad of factors, but you get the point! Dashboards and models cannot guide any company's strategic actions directly. If companies try to use them without additional context, more often tha...

Can you defeat Monty Hall to win a Batmobile?

You slipped after accidentally stepping on a banana peel and somehow fell into another dimension where people are in game shows all the time. As you dust yourself off and stand up, you realize you are in the 1960s version of the game show “Let’s Make a Deal.” The host of this show, the late Monty Hall, looks at you suspiciously at first but then presents three doors in front of you and asks you to choose one. You don’t trust strangers, so you demand to know what’s happening before you make your next move. Monty Hall patiently explains that there’s a brand new Batmobile behind one of the doors (yes, Batman is real in this dimension), and goats behind the other two. You could own the Batmobile if you correctly guess the door behind which it is hidden. You pull your Batsuit out of your pocket to don the mask of the world’s greatest detective (as per DC Comics) and analyze the three doors with a careful gaze. You look meticulously for any minuscule details that might give away ...

What is a common challenge faced by engineers, thieves, and business decision-makers?

There was a time when the only way for humans to get around was on foot. But the invention of the wheel changed the game, as it allowed them to cover longer distances than before without expending as much energy. Further inventions such as steam locomotives and airplanes have only expanded this distance limit. And now, with the advent of rockets, spacecraft, and rovers, we have begun to explore other planets in our solar system that are millions of kilometers away! On a higher level, we can understand that our world-class engineers are in the business of expanding our horizons. However, on a much deeper level, they are grappling with challenges like improving the conversion efficiency of fuel energy into mechanical energy, refining engine designs, and developing better materials. Consider steam engines, for example. When these engines were invented, they amazed the world with their ability to perform mechanical work using water. However, this pro...

Stemming vs Lemmatization

Text is one of the messiest forms of data you will ever work with. And there is always some amount of redundancy in the data, because a word in the text can take many forms due to: 1. Different spellings, e.g., "color" and "colour". 2. Contractions, e.g., "can't" is a contraction of "cannot". 3. Inflected forms, i.e., changed forms of a word that indicate distinctions such as tense, number, person, etc., e.g., "walked" is the past tense form of "walk". Fixing the redundancies due to different spellings and contractions is a bit complicated, as it might require some manual intervention. However, inflected forms are commonly handled through two methods: stemming and lemmatization. Stemming is the faster way to deal with inflected words. It simply chops off the endings of words in the hope of reaching the word's root form. It works well when converting words like "walking" or "walked" to "walk". But, ask ...
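To make the "chop off the endings" idea concrete, here is a minimal sketch of a toy suffix-stripping stemmer. It is not the algorithm used by any real library (real stemmers such as Porter's apply many more rules); the function name `naive_stem`, the suffix list, and the minimum stem length are all assumptions made for illustration.

```python
def naive_stem(word, suffixes=("ing", "ed", "s"), min_stem_len=3):
    """Toy stemmer (hypothetical): strip the first matching suffix.

    A suffix is removed only if at least `min_stem_len` characters remain,
    which keeps very short words such as "was" untouched.
    """
    word = word.lower()
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["walking", "walked", "walks", "ran", "caring"]:
        print(w, "->", naive_stem(w))
```

With these rules, "walking", "walked", and "walks" all reduce to "walk". The irregular form "ran" is left unchanged, and "caring" is cut down to "car", which shows how blunt pure suffix chopping can be.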

A Poisson pondering

It might seem counterintuitive: how could a sample size be infinite? But let us remember the classic application of the Poisson distribution: modeling the probability of a given number of events occurring in a fixed time interval. To better understand it with an example, consider the probability of a given number of buses arriving at a station in an hour. Now, to look at it from a binomial distribution perspective, we could divide this one hour into sixty Bernoulli trials. Each trial models whether a single bus arrives in that particular minute. The problem with this approach is that it restricts the number of buses that could arrive in any given minute to one. In the real world, it is not a completely unimaginable scenario that multiple buses could arrive back-to-back within a minute. We could increase the granularity by opting for 3600 Bernoulli trials instead of 60. It would mean we moved away from looking at minute intervals to seconds, and our proble...
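The effect of slicing the hour into more and more Bernoulli trials can be checked numerically. The sketch below is illustrative only: the average rate of 4 buses per hour and the choice of k = 3 are assumed values, not taken from the post. It keeps the expected count fixed by setting p = rate / n and compares the binomial probability with the Poisson probability as n grows from 60 to 3600.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for n independent Bernoulli trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, rate):
    """P(X = k) for a Poisson distribution with mean `rate`."""
    return exp(-rate) * rate**k / factorial(k)


RATE = 4  # assumed average number of buses per hour (hypothetical)
K = 3     # number of arrivals whose probability we compare (hypothetical)

target = poisson_pmf(K, RATE)
errors = {}
for n in (60, 3600):  # one trial per minute, then one trial per second
    approx = binomial_pmf(K, n, RATE / n)
    errors[n] = abs(approx - target)
    print(f"n={n}: binomial={approx:.6f}, poisson={target:.6f}")
```

Under these assumptions, the gap between the two probabilities shrinks as the number of trials increases, which is the sense in which the Poisson distribution behaves like a binomial with an ever larger number of trials and an ever smaller success probability.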