During the week, I often collect and share links I find interesting on Elder Research’s Slack. These are some links shared March 10–14, 2025.
“Overfitting to Theories of Overfitting”
The advice people draw from the bias-variance boogeyman is downright harmful. Models with lots of parameters can be good, even for tabular data. Boosting works, folks! Big neural nets generalize well. Don’t tell people that you need fewer parameters than data points. Don’t tell people that there is some spooky model complexity lurking around every corner.
Use a test set to select among the models that fit your training data well. It’s not that complicated.
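That advice is simple enough to fit in a few lines. Here's a minimal sketch, assuming scikit-learn and a synthetic tabular dataset (both my choices, not Recht's): fit several candidate models, some with far more parameters than data points would "allow," and keep whichever does best on held-out data.

```python
# A minimal sketch of "use a test set to select among models," assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Synthetic tabular data, purely for illustration.
X, y = make_classification(n_samples=2000, n_features=30, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=500, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# Fit every candidate on the training split, then keep whichever scores best
# on the held-out split. Parameter counts never enter into the decision.
scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores)
print(f"Selected model: {best}")
```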
The post is part of a larger series on Ben Recht’s blog, and it sparked some good discussion.
- “Blurred contexts”
- “Thou Shalt Not Overfit”
- “Flavors of overfitting”
- “Prediction Games”
- “Holding out for an explanation”
- “Overfitting to theories of overfitting”
- “The Adaptivity Paradox”
- “In defense of typing monkeys”
- “What does test set error test?”
- “Nomological Networks”
“What’s new in the world of LLMs, for NICAR 2025”
Notes from Simon Willison’s recent review of large language models in 2024. I’ve been reading more and more about this lately, and I’m trialing ChatGPT Plus (for myself) and Cursor (at work). I’m trying to give this stuff a fair shot, even though I’m still ambivalent about its value and its effects on creativity and originality.
Check out these wild graphs showing how StackOverflow queries for things like “R” and “Pandas” have been wiped out in the ChatGPT era.
I suppose this is related to StackOverflow’s pivot, but I wonder where it all ends up. Robert Robison got to the right take, I think, which is to ask where the equilibrium state is. Are we going to get to a place where LLMs totally replace StackOverflow and the like? Will we find a spot where LLMs can answer the easy questions, leaving us to talk to actual humans for the harder ones? Or, as I fear, will LLMs put StackOverflow out of business, so that we no longer have a centralized place for these kinds of discussions?
“The State of Machine Learning Competitions, 2024 Edition”
Many, many statistics about 2024’s slate of ML competitions. I haven’t spent much time on competitions, so I was surprised by how many there are and by the size of the prize pools. There’s a pretty healthy variety of solution types, too; it isn’t all deep learning all the time.