1978
It’s 1978. You work for a big company, and you want to predict something with data. Your job title is “Statistician”; that’s what you studied in grad school.
You will use a stat package to build and fit your model. Data Processing likes SAS because IBM pushes it. Everyone in DP is ex-IBM; if IBM sold cigarettes, they’d all be chain smokers. If you need something not available in SAS, you can use a Fortran executable built by one of your colleagues or write your own.
The company invested in dumb terminals for you and your team, so you don’t have to punch cards like you did in college. Still, every experiment is an overnight batch job. DP won’t let you run jobs during the day because they think you will crash the General Ledger. They’re very strict about that. If you forget and submit something during the day, the operator will kill it.
You measure project timelines in years. The credit risk project you’re currently working on will go live in 1980, if everything goes according to plan. That only allows a couple of weeks for model fitting; deployment into the order entry and collection systems will take a year.
“Automated predictive modeling” isn’t a thing. You have relatively few choices to make while building your model. You select a statistical method based on the nature of the problem and the data.
Choosing variables, or features, is one of your few decisions. Data is scarce, so you rarely have more than two dozen features from which to choose. Still, you can’t simply dump all of your data into a multiple regression procedure and hope for the best. You learned model-building methodology in grad school, but doing things the “right” way takes time.
Stepwise regression seems attractive. The technique, invented in 1960, uses statistical criteria to iteratively add or remove variables from a model. It’s automatic. BMDP, SAS, and SPSS added it to their regression tools in their earliest releases.
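To make the mechanics concrete, here is a minimal sketch of forward stepwise selection in modern Python, with statsmodels standing in for the old SAS procedure. The DataFrame `df`, the target name, and the 0.05 entry threshold are illustrative assumptions, not the 1960 implementation.

```python
# Forward stepwise selection by p-value, in the spirit of Efroymson (1960).
import pandas as pd
import statsmodels.api as sm

def forward_stepwise(df: pd.DataFrame, target: str, alpha_in: float = 0.05):
    remaining = [c for c in df.columns if c != target]
    selected = []
    while remaining:
        # Try adding each remaining variable; keep the one with the
        # smallest p-value, if it clears the entry threshold.
        pvals = {}
        for cand in remaining:
            X = sm.add_constant(df[selected + [cand]])
            fit = sm.OLS(df[target], X).fit()
            pvals[cand] = fit.pvalues[cand]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha_in:
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```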
That sounds great! Now you can put all your predictors into a model, submit the batch job, drive home in your Oldsmobile Cutlass (with Turbo-Hydramatic automatic transmission), and watch Laverne & Shirley with the kids. In the morning, you drive to the office and wait for your printouts to arrive in the interoffice mail.
At a conference, you learn that stepwise regression overfits the models, leading to inflated coefficients, biased standard errors, invalid p-values, unstable models, and irreproducibility. The technique accelerates the production of shitty models.
Well, that’s a buzzkill.
You learned about interaction effects in grad school. In a linear model, an interaction term captures the joint effect of two predictors beyond their separate main effects, and it can markedly improve a model’s prediction accuracy. You want to add interactions to your models, but with the tools available in your mainframe stat packages, you must write code to define and test every possible interaction. A dataset with 20 potential main effects has 190 possible two-way interaction effects. That’s a lot of coding.
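For a sense of the tedium, here is a hedged sketch of that brute-force scan in modern Python: enumerate all 190 pairs, fit a model per interaction, and keep the significant ones. The names `df` and `target` are illustrative.

```python
# Scan every two-way interaction among the candidate features.
from itertools import combinations
import statsmodels.api as sm

def scan_two_way_interactions(df, target, alpha=0.05):
    features = [c for c in df.columns if c != target]
    hits = []
    for a, b in combinations(features, 2):     # 20 features -> 190 pairs
        X = df[[a, b]].copy()
        X[f"{a}:{b}"] = df[a] * df[b]          # the interaction term
        fit = sm.OLS(df[target], sm.add_constant(X)).fit()
        if fit.pvalues[f"{a}:{b}"] < alpha:
            hits.append((a, b, fit.pvalues[f"{a}:{b}"]))
    return sorted(hits, key=lambda t: t[2])    # strongest first
```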
One of your colleagues shows you some Fortran executables for AID and THAID. AID stands for Automatic Interaction Detection. Two guys at UMich published a paper about it fifteen years ago. You specify a “target” variable in your dataset; AID iteratively splits the dataset into successively smaller segments, choosing each split to maximize the difference in the target variable between the resulting segments.
AID works with continuous targets. THAID is for categorical targets.
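A rough sketch of what one AID-style split computes, under the usual description of the algorithm: try every binary split on every feature and keep the one that maximizes the between-group sum of squares of the target. Applied recursively, this grows the tree the printout describes. The code is an illustration in modern Python, not Morgan and Sonquist’s Fortran.

```python
# Find the single best AID-style split for a continuous target.
import numpy as np

def best_aid_split(X: np.ndarray, y: np.ndarray):
    best = (None, None, -np.inf)            # (feature, threshold, BSS)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:   # candidate thresholds
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            # Between-group sum of squares: how far each segment mean
            # sits from the overall mean, weighted by segment size.
            bss = (len(left) * (left.mean() - y.mean()) ** 2
                   + len(right) * (right.mean() - y.mean()) ** 2)
            if bss > best[2]:
                best = (j, t, bss)
    return best
```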
You try AID and it’s amazing. The output is rudimentary, but it clearly shows the most powerful interaction effects. You don’t have to code and test every possible interaction. That’s a big time saver.
Even better, when you sketch the results on a chalkboard, you get a diagram that looks like a tree. You ask one of the artists in Graphic Design to make a polished chart, which you show to your internal clients in Advertising. They go nuts.
Marketing people love that graphic shit.
1993
It’s 1993. You’ve seen a lot of changes at your company since 1978. For starters, your job title is now “Analyst” instead of “Statistician.”
Data Processing is now “Information Technology.” Same IBM sycophants, different department name. Your team has a dedicated DEC VAX server. That was a struggle. IT wanted to stick with IBM and expand the mainframe, but DEC quoted a VAX 4000 for a fraction of the cost. Your boss told IT to shut up and buy it. You could see them cringe when DEC rolled the new box into their previously all-IBM data center.
The dumb terminals are gone, and your team has new Dell OptiPlex machines. Those babies have eight megs of RAM, 200 megs of hard drive, a 14-inch monitor, and Windows NT.
You just upgraded to SAS Release 6 on the VAX. SAS does a good job supporting all the different hardware vendors. They’ve pulled away from SPSS, who seem mostly focused on PC users. Same for the new players, such as Statgraphics, StatSoft, and Stata. Lots of graphics, so people who don’t know what they are doing can play with statistics.
At last, SAS supports generalized linear models. You only had to wait 20 years since Nelder and Wedderburn published their paper. Generalized linear models do not automate the modeling process; instead, they unify various model types under a single framework. The link function and error distribution are hyperparameters that the analyst can manipulate.
Unfortunately, there’s no built-in function to optimize all those hyperparameters. You can build a grid search with SAS macros if you’re so inclined, and it won’t take forever to run on the VAX. At least you can run it during the day without crashing the General Ledger.
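Here is roughly what that hand-rolled grid search would do, sketched in Python with statsmodels rather than the SAS macro language. The candidate families below are illustrative and would have to suit the data (Gamma and Poisson, for instance, want positive or count-like responses).

```python
# Grid search over GLM error distributions and link functions,
# ranked by AIC. `df` and the target column name are illustrative.
import statsmodels.api as sm

FAMILIES = [
    sm.families.Gaussian(sm.families.links.Identity()),
    sm.families.Poisson(sm.families.links.Log()),
    sm.families.Gamma(sm.families.links.Log()),
    sm.families.Gamma(sm.families.links.InversePower()),
]

def glm_grid_search(df, target):
    X = sm.add_constant(df.drop(columns=[target]))
    results = []
    for fam in FAMILIES:
        fit = sm.GLM(df[target], X, family=fam).fit()
        results.append((fam.__class__.__name__,
                        fam.link.__class__.__name__, fit.aic))
    return sorted(results, key=lambda r: r[2])   # best AIC first
```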
You’ve heard about artificial neural networks. NeuralWare, a startup based near Pittsburgh, sends you an invitation to a training class. Your boss approves, and you attend. It’s a well-organized introduction, complete with some hands-on work.
You return to the office with a trial version of the software on a floppy disk. You install it on your OptiPlex and try it with some small datasets.
The software is easy to use. Artificial neural networks offer an open and flexible framework for modeling. But there are a million choices to make, and NeuralWare has no optimizer or built-in rules of thumb to guide you. What transfer functions should you use? Do you need hidden layers? How many, and how big?
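A modern sketch of the choices in play, with scikit-learn standing in for NeuralWare (which worked nothing like this); every argument below is a knob the 1993 user had to set by hand.

```python
# Every hyperparameter here is a decision NeuralWare left to the user.
from sklearn.neural_network import MLPClassifier

net = MLPClassifier(
    hidden_layer_sizes=(8, 4),   # how many hidden layers? how big?
    activation="tanh",           # which transfer function?
    learning_rate_init=0.01,     # how fast to learn?
    max_iter=2000,               # how long to train?
    random_state=0,
)
# net.fit(X_train, y_train)     # X_train / y_train are illustrative
```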
You spend a few days messing around with your datasets. Each experiment takes a long time to run, and no matter what you tweak, the results aren’t as good as what you get with logistic regression.
You thank the nice people at NeuralWare and return the trial software.
Your team has more success with Classification and Regression Trees (CART). You bought Breiman’s book at a conference a few years ago. You ran AID, THAID, and CHAID on the mainframe, so you’re familiar with the concept. CART synthesizes the best of those algorithms with new features, such as multivariate splits and automated pruning. Salford Systems, a small startup, licenses CART software that runs on the VAX.
Internal clients like the results you get with CART. They understand a decision tree. You can easily implement CART’s output as SQL queries in the new Oracle data warehouse. For run-of-the-mill response models, you can give Marketing a solution in a few days.
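A sketch of that tree-to-SQL handoff, assuming a fitted scikit-learn tree as a stand-in for Salford’s CART: walk the tree and emit one query per leaf. The `customers` table, the column names, and the binary response are illustrative.

```python
# Translate a fitted binary DecisionTreeClassifier into one SQL
# query per leaf, each tagging its segment with a response rate.
from sklearn.tree import DecisionTreeClassifier

def tree_to_sql(tree: DecisionTreeClassifier, feature_names, table="customers"):
    t = tree.tree_
    queries = []

    def walk(node, conds):
        if t.children_left[node] == -1:            # leaf node
            counts = t.value[node][0]              # class tallies at leaf
            where = " AND ".join(conds) or "1 = 1"
            queries.append(
                f"SELECT *, {counts[1] / counts.sum():.3f} AS response_rate "
                f"FROM {table} WHERE {where};")
            return
        name, thr = feature_names[t.feature[node]], t.threshold[node]
        walk(t.children_left[node], conds + [f"{name} <= {thr:.4g}"])
        walk(t.children_right[node], conds + [f"{name} > {thr:.4g}"])

    walk(0, [])
    return queries
```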
IT gets very huffy when you ask them to implement a logistic regression model. They have to rewrite it in COBOL, and it takes forever.
Comments

My introduction to autoML was MuMIn, an R package that didn’t pitch itself as autoML but rather pitched [generalized] linear model selection as an information-theoretic problem. AutoML at its best, for me, remains an algorithm-agnostic solution to an information-theoretic problem.
I recapitulated this history in grad school, down to most of the tools, though I did use JMP for a semester before returning to SAS, and I didn’t write any Fortran until I started using R.
When I first learned neural nets, it was 2002, and my stats professor cautioned me: neural networks are what really smart people who believe in magic end up sacrificing their careers for. After a summer dalliance, I got back to work with GLMs and some agent-based modeling (ecology).
Don’t be them, he said. Alas, I wasn’t them.