From Math to Code
When I entered the software industry twenty years ago, I did it without a computer science degree or much of a mathematical education. As a result, I’ve always felt a little intimidated when I come across computer science papers or descriptions of software solutions that contain a lot of math.
That’s why I’m particularly excited to introduce today’s episode and guest chef. Today, Michael Herold joins us to dispel some of the mystique around translating mathematical formulas to software. He’ll show you how you can apply a step-by-step process to go from an algorithm definition to working, useful code. Enjoy!
Video transcript & code
Demystifying the Language of Mathematics
If you're like many people -- myself included -- you struggled with learning mathematics in school.
You vaguely remember lessons from a teacher or professor, but can't remember the specifics. And that's okay most of the time. However, at some point, you will need to solve a problem that uses mathematics.
If you realize that you'll need to use math, you might start to feel fear. This is natural!
You're about to confront a traumatic experience from your childhood. Oh, and did I mention? The mathematics that you'll need to use is statistics. Take a deep breath! It will be okay. You're probably asking, "How do you know that I'm going to need to use statistics?" The simple answer is data.
The topic of data is huge right now. Big data, data mining, machine learning ... all of these topics are grounded in statistics. However, you don't need to go that deep for statistics to be useful. At the end of the day, statistics is about asking a question about some data. Statistics uses mathematical techniques to explain the features of your dataset.
Let's make this a little more concrete. Let's take a look at an example from doing pricing research.
It's based on a real problem I worked on recently.
Let's say we issued a survey to our customers, asking them at what price our service would seem too cheap to be of good quality.
After this survey, we take a look at the results and see that people don't agree on a price. In economics terms, they have varying degrees of price sensitivity. We decide that we would like to see the data in terms of "what percent of people think that a particular price is too cheap."
To answer this question, we're going to use a simple technique called a cumulative distribution function, or CDF. When we look up the definition of the CDF, it says that it is a function where the right-hand side is equal to the probability that a random variable is less than or equal to x.
When we read some more, we come across the mathematical notation for the CDF.
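The slide showing the notation is not reproduced in this transcript. As a stand-in, this is the standard textbook form of the definition quoted above, not necessarily the exact rendering used in the episode:

```latex
F_X(x) = P(X \le x)
```

Read aloud, it says the same thing as the sentence: the function's value at x is the probability that X is less than or equal to x.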
If you're like me, it's at this point that you take a deep breath and pause.
Maybe, if it's a good day, you can work through the definition or equation and figure out exactly what it means. Or you might decide that it's not worth working on right now and move on to another task.
But this is a Ruby Tapas episode! We don't have that option, so let's figure this out.
If we were handed an overly large user story, we'd break it up into its constituent parts to make it more manageable. Let's approach this definition in the same way. To do that, let's go back to the definition.
Let's pick out a piece of this definition and see if we can put a pin in its meaning. How about the "random variable" part?
What is a random variable? For our purposes, we can treat a random variable as a list of observations drawn from a given distribution. Thinking back to our problem, what could that mean?
The responses given to us by our survey takers form our random variable! All we have to do is label that column as "X" and we see the random variable for what it is.
When we translate that into Ruby, all we need to do is to track those observations in an array.
cheap = [49.99, 99.99, 75.00, 30.00, 45.00, 100.00]
That wasn't so bad.
What can we pick out next from the definition?
We now have a random variable, which was denoted as a capital X in the mathematical notation. Looking at the textual definition, we see something similar: a lowercase x.
What could that x mean?
Generally, when doing analytical statistics, it helps to measure your variables in terms that a human can easily understand and envision. In this case, we want to group our observations into steps on a scale.
A scale for our purposes can be any scale that captures all of your observations but is still understandable when you look at it. For example, we create a scale that starts at $5.00 and spans all the way to $100.00 in increments of $5.00. This gives us 20 steps, few enough for a person to take in at a glance.
Like before, if we relabel the table, it helps us express our result in the same style and language as the definition.
Now we have our big X and our little x ... how do we represent the little x in Ruby?
Ruby gives us a nice tool for generating this scale.
cheap = [49.99, 99.99, 75.00, 30.00, 45.00, 100.00]
First, we start with the lowest value on our scale: 5.
cheap = [49.99, 99.99, 75.00, 30.00, 45.00, 100.00]
scale = 5
Then, we use the Numeric#step method, which integers inherit, indicating we want to step by 5 up to 100. When we look at the result, we see exactly what we want: a scale from 5 to 100 in steps of 5.
cheap = [49.99, 99.99, 75.00, 30.00, 45.00, 100.00]
scale = 5.step(by: 5, to: 100) # =>
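The step call returns an enumerator rather than an array, so its contents are not visible until something iterates over it. As a quick check, and assuming nothing beyond the scale defined above, it can be forced into an array (the name steps is only for illustration):

```ruby
# The same scale as above, converted to a plain array for inspection.
scale = 5.step(by: 5, to: 100)
steps = scale.to_a

puts steps.first # lowest step on the scale
puts steps.last  # highest step on the scale
puts steps.size  # number of steps
```

The episode's code keeps the enumerator as it is, since map works on it directly.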
Great! We now have the two sets of variables that we need: the observations and the scale to measure against. Let's find the next piece of the puzzle.
What looks like a good next candidate? How about the word function? We use functions all the time when we're programming. That sounds like something we can do.
cheap = [49.99, 99.99, 75.00, 30.00, 45.00, 100.00]
scale = 5.step(by: 5, to: 100)
We're going to want to apply the function to each step on the scale. That means we need something easily reusable. Let's use a method. We'll call it cdf. As parameters, it takes a step on the scale (our x from the definition) and the list of observations that we'll compare against.
cheap = [49.99, 99.99, 75.00, 30.00, 45.00, 100.00]
scale = 5.step(by: 5, to: 100)
def cdf(x, observations:)
end
We'll map the scale, one element at a time, and pass it into the cdf method. This gives us the resulting values for the CDF.
cheap = [49.99, 99.99, 75.00, 30.00, 45.00, 100.00]
scale = 5.step(by: 5, to: 100)
def cdf(x, observations:)
end
scale.map { |x| cdf(x, observations: cheap) }
Great! Now we have the framework for our problem. Let's look for what the method should do.
What's the most obvious phrase for what the feature should do? Reading through the definition leads us to pick the "less than or equal to x" part as the most obvious next phrase. Translating that into Ruby should be easy.
Now, we know we want to find some qualifying observations out of the set of all of the observations.
cheap = [49.99, 99.99, 75.00, 30.00, 45.00, 100.00]
scale = 5.step(by: 5, to: 100)
def cdf(x, observations:)
end
scale.map { |x| cdf(x, observations: cheap) }
"Qualifying" means that the observation is less than or equal to x, where x is the step in our scale.
cheap = [49.99, 99.99, 75.00, 30.00, 45.00, 100.00]
scale = 5.step(by: 5, to: 100)
def cdf(x, observations:)
  qualifying_observations = observations.count { |observation|
    observation <= x
  }
end
scale.map { |x| cdf(x, observations: cheap) }
Alright, so we're splitting the observations into continually overlapping subsets. That doesn't immediately feel right, so let's go back to the definition one last time.
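To make the overlap concrete, here is a small sketch that returns the qualifying observations themselves instead of their count. The at_or_below helper and the variable names are hypothetical and not part of the episode's code:

```ruby
cheap = [49.99, 99.99, 75.00, 30.00, 45.00, 100.00]

# Hypothetical helper: returns the observations at or below x,
# rather than just counting them as cdf does.
def at_or_below(x, observations:)
  observations.select { |observation| observation <= x }
end

up_to_50 = at_or_below(50, observations: cheap)
up_to_75 = at_or_below(75, observations: cheap)

p up_to_50 # => [49.99, 30.0, 45.0]
p up_to_75 # => [49.99, 75.0, 30.0, 45.0]
```

Every observation that qualifies at $50 also qualifies at $75, which is the overlap described above.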
Most of the remaining words in the definition are filler that is there to make the definition more human-readable.
The outlier is a single word: probability. This looks like the last piece of the puzzle!
How do we calculate a probability?
Calculating a probability for a sample of a random variable is an easy affair.
First, we count the number of qualifying observations.
cheap = [49.99, 99.99, 75.00, 30.00, 45.00, 100.00]
scale = 5.step(by: 5, to: 100)
def cdf(x, observations:)
  qualifying_observations = observations.count { |observation|
    observation <= x
  }
end
scale.map { |x| cdf(x, observations: cheap) }
Then, we count the total number of observations. (You can find out more about this use of the count method in episode #283.)
cheap = [49.99, 99.99, 75.00, 30.00, 45.00, 100.00]
scale = 5.step(by: 5, to: 100)
def cdf(x, observations:)
  qualifying_observations = observations.count { |observation|
    observation <= x
  }
  total_observations = observations.count
end
scale.map { |x| cdf(x, observations: cheap) }
Next, we divide the qualifying amount by the total amount.
cheap = [49.99, 99.99, 75.00, 30.00, 45.00, 100.00]
scale = 5.step(by: 5, to: 100)
def cdf(x, observations:)
  qualifying_observations = observations.count { |observation|
    observation <= x
  }
  total_observations = observations.count
  qualifying_observations / total_observations
end
scale.map { |x| cdf(x, observations: cheap) }
To avoid the integer division that would turn almost all of these values into zero, we convert the number of total observations to a float.
cheap = [49.99, 99.99, 75.00, 30.00, 45.00, 100.00]
scale = 5.step(by: 5, to: 100)
def cdf(x, observations:)
  qualifying_observations = observations.count { |observation|
    observation <= x
  }
  total_observations = observations.count
  qualifying_observations / total_observations.to_f
end
scale.map { |x| cdf(x, observations: cheap) }
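To see why the conversion matters, here is the same division in isolation. The numbers are illustrative (three qualifying observations out of six) and the variable names are hypothetical:

```ruby
qualifying = 3
total = 6

# Dividing two integers truncates the result toward zero.
p qualifying / total      # => 0

# Converting either operand to a float keeps the fractional part.
p qualifying / total.to_f # => 0.5
```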
And there we have it! We have successfully converted a mathematical definition into Ruby code. While it is not the prettiest code, or the most performant,...
cheap = [49.99, 99.99, 75.00, 30.00, 45.00, 100.00]
scale = 5.step(by: 5, to: 100)
def cdf(x, observations:)
  qualifying_observations = observations.count { |observation|
    observation <= x
  }
  total_observations = observations.count
  qualifying_observations / total_observations.to_f
end
scale.map { |x| cdf(x, observations: cheap) } # => [0.0,
# 0.0,
# 0.0,
# 0.0,
# 0.0,
# 0.16666666666666666,
# 0.16666666666666666,
# 0.16666666666666666,
# 0.3333333333333333,
# 0.5,
# 0.5,
# 0.5,
# 0.5,
# 0.5,
# 0.6666666666666666,
# 0.6666666666666666,
# 0.6666666666666666,
# 0.6666666666666666,
# 0.6666666666666666,
# 1.0]
...we get the expected result; it's a Minimum Viable Product for calculating a CDF from a set of observations.
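The bare list of numbers is easier to read when each value sits next to the price step it belongs to. This is one possible way to do that, sketched with the same cdf logic as above; the cdf_by_price name and the output format are assumptions made for illustration:

```ruby
cheap = [49.99, 99.99, 75.00, 30.00, 45.00, 100.00]
scale = 5.step(by: 5, to: 100)

def cdf(x, observations:)
  qualifying_observations = observations.count { |observation| observation <= x }
  qualifying_observations / observations.count.to_f
end

# Hypothetical name: maps each step on the scale to its CDF value.
cdf_by_price = scale.map { |x| [x, cdf(x, observations: cheap)] }.to_h

# Print one line per step, showing the price and the share as a percentage.
cdf_by_price.each do |price, share|
  puts format("$%6.2f  %5.1f%%", price, share * 100)
end
```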
So, we were able to take a mathematical definition and translate it into a working program that performed the described function. How did we do this?
First, we read the problem. This is where we start when we work on a user story, so it makes a great place to start when we're working with mathematics.
Next, we broke the problem down into smaller parts. We do this when we take larger specifications and split them into user stories that we can then implement in small pieces. The same principle applies here.
Once you have the smaller parts, you implement them one at a time. This helps you focus on a small, solvable problem. When working on a software project, this often is represented by a single user story. When it comes to understanding a mathematical definition, it takes the form of a question such as "What does a random variable mean?" or "What should the function do?"
Fundamentally, these are the same thing, only expressed differently.
Last, you refactor or revisit your implementation and then iterate on it. By iterating through these small pieces of functionality, you make it easier to keep the problem to a manageable size in your head. This makes it easier to focus on the next step.
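As one example of what such a refactoring pass might look like (a sketch, not the episode's own follow-up), Integer#fdiv performs float division directly, so the intermediate variables and the to_f call can go away. The result variable is named only for illustration:

```ruby
cheap = [49.99, 99.99, 75.00, 30.00, 45.00, 100.00]
scale = 5.step(by: 5, to: 100)

# One possible tidier cdf: count the qualifying observations and
# divide by the total using float division in a single expression.
def cdf(x, observations:)
  observations.count { |observation| observation <= x }.fdiv(observations.size)
end

result = scale.map { |x| cdf(x, observations: cheap) }
p result
```

Like the original, this returns NaN for an empty list of observations, so a caller that can receive no survey responses would need to handle that case separately.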
Hopefully, seeing this process broken down on an atypical problem shows you the power of an iterative process that is focused on small deliverables. Even when applying the process to something seemingly abstract, like understanding mathematical definitions, you can use it to make progress.
Focusing on small deliverables can make even the hardest of tasks easier to tackle.
That's all I have for you today. Happy hacking!