Chris Stucchio
Selected blog posts
Posts listed on the homepage are the more popular or interesting ones. Click here for the chronological blog listing.
Boosting as a scheme for transfer learning
Here's a scenario that I believe to be common. I've got a dataset I've been collecting over time, with features \(x_1, \ldots, x_m\). This dataset will generally represent decisions I want to make at a certain time. This data is not a timeseries; it's just data I happen to have …
Calibrating a classifier when the base rate changes
In a previous job, I built a machine learning system to detect financial fraud. Fraud was a big problem at the time - for simplicity of having nice round numbers, suppose 10% of attempted transactions were fraudulent. My machine learning system worked great - as a further set of made-up round numbers …
Shareholder Short-Termism Theory has Died of COVID-19
It's become a popular meme that "shareholders only care about the next quarter". Lots of people make arguments like this - for example, Jamie Dimon and Warren Buffett. As the meme goes, shareholders only care about the next quarter of earnings, and CEOs make decisions accordingly - sacrificing long term profitability to …
Isotonic: A Python package for doing fancier versions of isotonic regression
Frequently in data science, we have a relationship between X and y where (probabilistically) y increases as X does. The relationship is often not linear, but rather reflects something more complex. Here's an example of a relationship like this:

In this plot of synthetic data we have a non-linear but increasing …
The Final Stage of Grief (about bad data) is Acceptance
AI Ethics, Impossibility Theorems and Tradeoffs
Bayesian Linear Regression (in PyMC) - a different way to think about regression

Consider a data set, a sequence of points \((x_1, y_1), (x_2, y_2), \ldots, (x_k, y_k)\). We are interested in discovering the relationship between x and y. Linear regression, at its simplest, assumes a relationship between x and y of the form \(y = \alpha x + \beta + e\). Here, the variable \(e\) …
Bayesian A/B Testing - my talk at Gilt
I gave a talk on Friday about Bayesian A/B testing at Gilt's engineering seminar. You can view them here.
Wingify releases Bayesian A/B tester
I've written a number of posts here about A/B testing, and readers have probably observed that I favor the Bayesian approach. I'm very happy to announce that Wingify (my employer) has released SmartStats - a fully Bayesian A/B testing engine. I've always maintained that you should A/B test …
Don't use Hadoop - your data isn't that big

"So, how much experience do you have with Big Data and Hadoop?" they asked me. I told them that I use Hadoop all the time, but rarely for jobs larger than a few TB. I'm basically a big data neophyte - I know the concepts, I've written code, but never at …
A High Frequency Trader's Apology, Pt 1
I'm a former high frequency trader. And following the tradition of G.H. Hardy, I feel the need to make an apology for my former profession. Not an apology in the sense of a request for forgiveness of wrongs performed, but merely an intellectual justification of a field which is …
Popular Topics
ai ethics
- AI Ethics, Impossibility Theorems and Tradeoffs
- Low Rate Loans for Ladies, Stags Pay Extra - The Role of Ethics in AI/ML
- Alien Intelligences and discriminatory algorithms
- The Mathematics of Paul Graham's Bias Test
bandit algorithms
- The Adversarial Bandit is not a Statistics Problem
- Bayesian Bandits - optimizing click throughs with statistics
- Why Multi-armed Bandit algorithms are superior to A/B testing
conversion rate optimization
- Measuring Bernoulli Probabilities in the Presence of Delayed Reactions
- Has your conversion rate changed? An introduction to Bayesian timeseries analysis with Python.
- Attribution Theory is Misguided
- Bandit Algorithm and A/B Testing Tutorial
- How to measure a changing conversion rate (with python code)
- Analyzing conversion rates with Bayes Rule (Bayesian statistics tutorial)