Blog
Welcome to my web pages! The tabs at the top lead to data science projects, papers on statistical issues, or info about me. Here you'll find an occasional post on a topic that caught my interest...

Forecasting Wind Power
April 30, 2017
Data collected on the power output of an array of wind turbines provide an interesting case study for machine learning algorithms. I tried a random forest from scikit-learn, the gradient boosting algorithm XGBoost, and a generalized linear model from the R package GAMLSS. Choosing the best model can be surprisingly tricky...
Taking the Confusion out of the Confusion Matrix
July 26, 2016
How Bayes' theorem helps organize the bewildering array of performance metrics that can be estimated from a classifier's confusion matrix. 
Predicting Flight Delays with a Random Forest
July 6, 2016
Following up on a workshop on random forests organized by the NYC meetup group Women in Machine Learning and Data Science. 
The Eternal Sunshine of Causal Thinking
May 9, 2016
A review of Samantha Kleinberg's latest book, "WHY: A Guide to Finding and Using Causes".