AutoGluon for the win.
Yes, AutoGluon for the win!
What's next? An AutoML LLM?
If you mean “replace the AutoML back end with an LLM that writes code and a battery of agents to run and analyze experiments,” that is like training dogs to wait on tables.
Simpler for an LLM to invoke AutoGluon.
Or XGBoost. Fast and scalable to GPUs.
Funny thing about that. In 99% of all AutoML experiments, the algo that tops the leaderboard is either XGBoost or LightGBM.
So all you really need is XGBoost with feature engineering.
Yes, 100%. Most of the time XGBoost is all you need (with great FE).
The most important sentence in the whole article is:
Models don’t drive value if you can’t deploy them.
So why are they not being deployed? If it is because your favorite modeling software makes it hard to do, then why are you using it? If it is because it is hard to get an adequate model, then why? If it is because management-imposed processes are too cumbersome, then how should they be amended? If it is because key people get pulled off of projects which are then left to die, then how shall we prevent that? In any case, the reasons for modeling project failure need to be analyzed, causes found and solutions devised without firing more people than is absolutely necessary (the optimal number is zero).
Next we have the question of AutoML replacing human analysts. My observation of at least a couple of attempts to automate the computer programming and web development professions out of existence suggests that this is not very likely. As you correctly point out, somebody has to tell the computer what is required in sufficient detail for the output to be useful. What comes out must be validated (as lawyers are quickly discovering); assuming it is valid, it must be interpreted; and the whole process must be transparent enough that sufficiently knowledgeable people can determine whether everything was done correctly. Otherwise, it is like a computer-generated mathematical proof that fails to persuade human mathematicians because the theorem wasn't proven to them. Doing all of that suggests that a different skill set than that generally possessed by corporate executives will be required.
Regardless, I think commercial AutoML software is viable, but only if it is worth paying for at prices high enough to turn a profit. That means that it has to be continuously improved upon in order to stay ahead of open source developers (no cash cows allowed). Alternatively, consultants and analysts can pool their talents to improve the state of the art and make it available by such means as they think proper (to include open source development). Either way, human expertise will continue to be required. Certainly imagination will be.
The barriers to model deployment are institutional, not technical.
Commercial AutoML cannot "stay ahead" of open source AutoML if the metric is model quality. User appeal is the primary differentiator.
My recent experience at Ford would definitely support your first sentence. Your second paragraph might well be correct, though I find it intriguing that open source implementations of CART still have not caught up algorithmically with what Salford Systems was putting out a quarter of a century ago (and Minitab-neglected TreeNet still strikes me as superior to open source GBM implementations).
CART was/is a great tool
Table-waiting dogs would be a real draw for me to visit a restaurant at least once, just to see it done. That's about all I've got for LLM AutoML too. I'm eagerly awaiting the OpenClaw Data Science headline that puts someone out of business. The best thing about human data scientists is that, while on average not very skilled or insightful, they can't work around the clock or while you sleep; their stuff usually doesn't fuck up your business until you put the model in production, and that still happens so rarely as to be a nominal risk. For safety's sake, hire human data scientists but don't let them use OpenClaw. ;-)
Businesses will trust LLMs to perform low-value low-impact tasks, but they will not trust LLMs to perform high-value high-impact tasks.
Ergo, to the extent that data science is in the latter category, those jobs are safe.
As a DS whose first job (2015-2019) was at SparkBeyond, which did do feature engineering, thankyouverymuch: neat summary!
I hadn't thought about the MLOps part; our issues were more about the clients actually managing to access their data and targets.
It usually goes like this:
(1) Decide on model needs
(2) Hunt for data
(3) Wrangle data
(4) Build models
(5) Explain models
(6) Secure approval to implement models
(7) Figure out how to implement models
(8) Implement models
(9) Monitor models
(loop)
Steps 2-3 are more of a Möbius loop.
Thank you for a fantastic article!
I’ve been working on AutoML tools since 2016. I’m the author of mljar-supervised, an open-source AutoML tool with automated documentation and support for ML fairness [1].
Current AutoML solutions focus mainly on algorithm selection and hyperparameter tuning. They are still not very strong in feature construction.
In mljar-supervised, we introduced a feature called Golden Features, which searches for combinations of existing features using simple mathematical operations to improve predictive power.
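To make the idea concrete, here is a minimal sketch of a pairwise feature search in plain Python. It is not mljar-supervised's implementation: the function names, the set of four operations, and the scoring rule (absolute Pearson correlation with the target) are all assumptions chosen for illustration.

```python
import itertools
import math
import operator


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def safe_ratio(a, b):
    """Division that maps a zero denominator to 0.0 instead of raising."""
    return a / b if b != 0 else 0.0


# Hypothetical operation set; a real tool may use a different one.
OPERATIONS = {
    "sum": operator.add,
    "diff": operator.sub,
    "product": operator.mul,
    "ratio": safe_ratio,
}


def search_pairwise_features(columns, target, top_k=3):
    """Score every (pair of columns, operation) candidate against the target.

    columns: dict mapping feature name to a list of floats.
    Returns the top_k candidates as (score, name, values) tuples.
    Scoring by |Pearson correlation| is an illustrative assumption.
    """
    candidates = []
    pairs = itertools.combinations(columns.items(), 2)
    for (name_a, col_a), (name_b, col_b) in pairs:
        for op_name, op in OPERATIONS.items():
            values = [op(a, b) for a, b in zip(col_a, col_b)]
            score = abs(pearson(values, target))
            candidates.append((score, f"{name_a}_{op_name}_{name_b}", values))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:top_k]


# Toy data: the target is width * height, which no single column captures.
columns = {
    "width": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "height": [6.0, 1.0, 4.0, 2.0, 5.0, 3.0],
    "noise": [0.3, 0.9, 0.1, 0.7, 0.5, 0.2],
}
target = [w * h for w, h in zip(columns["width"], columns["height"])]
best = search_pairwise_features(columns, target, top_k=1)[0]
print(best[1], round(best[0], 3))
```

On this toy data the search should surface the width-times-height product as the top candidate. In practice the score would be computed on held-out data, since an exhaustive search over pairs can easily overfit.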
From my recent experiments with LLMs, I see that they are surprisingly good at feature engineering. They can generate new features that are often very useful.
Recently, I built a tool called AutoLab Experiments (available in MLJAR Studio), which uses AI to optimize ML pipelines. It is conceptually similar to Andrej Karpathy’s AutoResearch. I also wrote a short comparison here [2].
Overall, I believe that combining these two approaches, classic AutoML and LLM-generated code, can lead to very powerful systems.
[1] https://github.com/mljar/mljar-supervised
[2] https://mljar.com/blog/autoresearch-karpathy-autonomous-ai-research/
Yeah, that's not feature engineering, sorry.
(That's feature selection, plus or minus interactions.)
I was a big fan of your work a few years ago; I need to check it out again.
Thank you, sir!
Check PerpetualBooster also.