Chris Stucchio

Selected blog posts

Posts listed on the homepage are the more popular or interesting ones. Click here for the chronological blog listing.

Boosting as a scheme for transfer learning

Here's a scenario that I believe to be common. I've got a dataset I've been collecting over time, with features \(x_1, \ldots, x_m\). This dataset will generally represent decisions I want to make at a certain time. This data is not a time series; it's just data I happen to have …

more ...


Calibrating a classifier when the base rate changes

In a previous job, I built a machine learning system to detect financial fraud. Fraud was a big problem at the time - for simplicity of having nice round numbers, suppose 10% of attempted transactions were fraudulent. My machine learning system worked great - as a further set of made-up round numbers …

more ...


Shareholder Short-Termism Theory has Died of COVID-19

It's become a popular meme that "shareholders only care about the next quarter". Lots of people make arguments like this - for example, Jamie Dimon and Warren Buffett. As the meme goes, shareholders only care about the next quarter of earnings, and CEOs make decisions accordingly - sacrificing long term profitability to …

more ...


Isotonic: A Python package for doing fancier versions of isotonic regression

Frequently in data science, we have a relationship between X and y where (probabilistically) y increases as X does. The relationship is often not linear, but rather reflects something more complex. Here's an example of a relationship like this:

In this plot of synthetic data we have a non-linear but increasing …

more ...


The Final Stage of Grief (about bad data) is Acceptance


AI Ethics, Impossibility Theorems and Tradeoffs


Bayesian Linear Regression (in PyMC) - a different way to think about regression

simple regression

Consider a data set, a sequence of points \((x_1, y_1), (x_2, y_2), \ldots, (x_k, y_k)\). We are interested in discovering the relationship between x and y. Linear regression, at its simplest, assumes a relationship between x and y of the form \(y = \alpha x + \beta + e\). Here, the variable \(e\) …

more ...


Bayesian A/B Testing - my talk at Gilt

I gave a talk on Friday about Bayesian A/B testing at Gilt's engineering seminar. You can view it here.

more ...


Wingify releases Bayesian A/B tester

I've written a number of posts here about A/B testing, and readers have probably observed that I favor the Bayesian approach. I'm very happy to announce that Wingify (my employer) has released SmartStats - a fully Bayesian A/B testing engine. I've always maintained that you should A/B test …

more ...


Don't use Hadoop - your data isn't that big

image possibly inspired by this post

"So, how much experience do you have with Big Data and Hadoop?" they asked me. I told them that I use Hadoop all the time, but rarely for jobs larger than a few TB. I'm basically a big data neophyte - I know the concepts, I've written code, but never at …

more ...


A High Frequency Trader's Apology, Pt 1

I'm a former high frequency trader. And following the tradition of G.H. Hardy, I feel the need to make an apology for my former profession. Not an apology in the sense of a request for forgiveness of wrongs performed, but merely an intellectual justification of a field which is …

more ...


Read more

ai ethics

bandit algorithms

conversion rate optimization

high frequency trading