Notes on Dataiku
Dataiku prepares for an exit. This is not news; Forbes covered their IPO plans two years ago. Besides, every startup constantly preps for an exit unless it wants to join the Living Dead.
Even though Dataiku does not appear on lists of top IPOs to expect, there are signs that the company will exit sooner rather than later. Wellington Management was the sole investor in Dataiku’s Series F in December 2022. Wellington doesn’t fuck around with seeds and early-stage rounds. When Wellington takes a stake, they expect an early exit.
It’s like when Paulie takes a “silent partnership” in your restaurant. He expects a liquidity event.
Dataiku raised $200 million in that last round. That’s below average for a Series F, but we were in the middle of VC Winter, or whatever they called it. Company headcount has been flat since then, so I expect they still have cash in the bank. Even so, at eighteen months, that last round is getting old.
Venture money does not improve with age. Raise money every two years or join the Living Dead.
Recently, Dataiku added some hired guns: a new President, a Chief Revenue Officer, an EMEA GM, and a new Chief People Officer. The CRO and EMEA GM have “over 25 years of experience.” The Chief People Officer only has “extensive experience,” but it doesn’t matter; nobody cares about Chief People Officers.
Oops, I see the CRO spent time at AppDynamics. We'd better count the spoons.
Lately, the “expert-for-hire” firms want inside intel about Dataiku. When investors smell a deal brewing, they don’t wait for the S-1: they ask “expert-for-hire” firms to snoop for inside information. Of course, they don’t call it that, and the “expert-for-hire” firms swear up, down, and sideways that they never ask people to reveal inside information.
However, inside information is the only proven exception to market efficiency. Seeking alpha, they call it, wink wink.
Is Dataiku ready for an IPO? It’s unclear what “ready” means anymore. Databricks was “ready” for an IPO two years ago. I respect the company more for NOT doing one; it means the founders are in it for the long haul and see lots of upside in the business.
Founders rush for an IPO when they are bored, or they figure the business has peaked. Tom Siebel sold Siebel Systems to Oracle just as it was about to be crushed by Salesforce. Startup leaders don’t sell when they are confident about the future of the business.
Last September, Dataiku claimed an ARR of “more than” $230 million, up from $150 million in FY 2022. That may be enough; the median revenue for tech IPOs in 2021 was $207 million. Of course, that was back in the boom years. Investors are more discerning now after getting burned by C3.ai and its ilk. C3 IPOed with about $150 million in revenue, traded briefly at $160, and now sees $30 on good days. Anyone who read the S-1 could see that coming.
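For anyone who wants the arithmetic spelled out, here is a quick sketch using only the figures quoted above. It treats the $230 million as a floor, since the claim was "more than," and it computes simple growth between the two reported figures, not an annualized rate, because the exact period between them isn't stated.

```python
# Figures quoted in the text: ARR in millions of USD, C3.ai price in USD per share.
arr_fy2022 = 150.0
arr_latest = 230.0  # reported as "more than" $230M, so this is a lower bound

# Simple growth between the two reported ARR figures (not annualized).
growth = (arr_latest - arr_fy2022) / arr_fy2022
print(f"ARR growth between the two reported figures: at least {growth:.0%}")

# C3.ai: decline from the brief peak to the "good day" price.
c3_peak = 160.0
c3_now = 30.0
drawdown = (c3_peak - c3_now) / c3_peak
print(f"C3.ai decline from peak: {drawdown:.0%}")
```

That works out to growth of at least 53% on the Dataiku side and a decline of roughly 81% for C3.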
Dataiku’s announced revenue figures show the kind of topline growth investors like. That growth is even more impressive considering Dataiku’s flat headcount: the company has dramatically improved its GTM efficiency. Lots of startups grow the topline by flooding the zone with people. In contrast, Dataiku leverages sell-through partnerships with Databricks, Snowflake, and AWS. You don’t need as many feet on the street if you can slipstream the big players.
Dataiku also gets a lot of business through its partnership with oil service giant Schlumberger. That company, recently greenwashed and doing business as SLB, is the world's largest offshore drilling company and contractor. This is deliciously ironic since the Dataiku founders are into saving the planet.
As Don Corleone says to Sollozzo: It makes no difference, it don't make any difference to me what a man does for a living, you understand. But your business is a little dangerous.
Dataiku describes itself as an artificial intelligence and machine learning company. That’s a wee bit of a stretch. Data Science Studio, or DSS, Dataiku’s core product, is an analytics platform built for business users. It integrates nicely with leading data platforms and has a graphical user interface. Users point and click; DSS goes beep beep boop, translates those gestures into SQL or whatever, and passes them to the data platform for execution.
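To make "translates those gestures into SQL" concrete, here is a toy sketch of the pushdown idea. This is not Dataiku's code or API; the VisualRecipe class, its fields, and the orders table are all made up for illustration. The point is only that a chain of point-and-click choices can be compiled into one SQL statement that the data platform executes.

```python
from dataclasses import dataclass, field


@dataclass
class VisualRecipe:
    """Hypothetical stand-in for a chain of point-and-click steps."""
    table: str
    filters: list = field(default_factory=list)      # raw SQL predicates
    group_by: list = field(default_factory=list)     # column names
    aggregates: dict = field(default_factory=dict)   # output name -> SQL expression

    def to_sql(self) -> str:
        # Illustration only: no quoting or escaping of identifiers or values.
        select_items = list(self.group_by) + [
            f"{expr} AS {name}" for name, expr in self.aggregates.items()
        ]
        sql = f"SELECT {', '.join(select_items) or '*'} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        if self.group_by:
            sql += " GROUP BY " + ", ".join(self.group_by)
        return sql


# The "gestures": pick a table, filter a region, total spend per customer.
recipe = VisualRecipe(
    table="orders",
    filters=["region = 'EMEA'"],
    group_by=["customer_id"],
    aggregates={"total_spend": "SUM(amount)"},
)
print(recipe.to_sql())
```

The generated string is what gets handed to Snowflake, Databricks, or whatever sits underneath. The front end never touches the data itself.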
That’s not a dunk. In most organizations, there are ten business analysts for every “expert” data scientist. When you’re selling seats, you get a lot more butts when your software has a graphical user interface.
I used to do win-loss analysis for DataRobot. Customers who chose Dataiku over DataRobot cited “no need for machine learning” or “users like graphical user interface.”
That’s why Dataiku partners so well with Databricks and Snowflake. The only people who can use Databricks and Snowflake are data nerds who write Scala or work with Snowpark or whatever. CDOs who only empower data nerds get fired. Dataiku offers an attractive front end for the lower-IQ folks in the business. They use it to create their own “data products” they can download to Excel.
Partnering with AWS, Databricks, and Snowflake also makes Dataiku look good because the DSS back end is less than robust. (I’m being diplomatic.) DSS creates scalable workloads by pushing them into high-performance platforms – like Databricks and Snowflake. That works well for data-wrangling functions or embarrassingly parallel tasks.
Unfortunately, there are some critical machine learning functions it can’t push down very well. Like running Hugging Face models:
“Running local large-scale HuggingFace models is a complex and very costly setup, and both quality and performance tend to be below proprietary LLM APIs. We strongly recommend that you make your first experiments in the domain of LLMs using Hosted LLM APIs.”
Or AutoML:
“When selecting [the High Performance] prediction style, DSS will select various tree-based models with a very deep hyper-parameter optimization search. This will generally give the best possible prediction performance at the expense of interpretability. Training time will be strongly increased.”
I attended a Dataiku webinar where the presenter recommended running this overnight.
Quoting the docs:
“DSS can scale most of its processing by pushing down computation to Elastic computation clusters powered by Kubernetes.”
The keyword is “most.” George Mallory survived “most” of his Everest expedition. Then he died. Running “most” of your workloads in K8S is like being “almost” pregnant. Your workload either runs on K8S or it doesn't.
That last mile kills you every time.
The workloads that Dataiku can’t push down to K8S or a scalable data platform run on a server. That’s fine if you have a small data science team or you don’t need to do serious machine learning. DSS will struggle if you want to do computationally intensive stuff, like hosting transformers or building good models with AutoML.
Dataiku made a big splash last year by partnering with NVIDIA. That should fix everything, right? Run that shit on one of NVIDIA’s big boxes.
Oh, wait. NVidia DGX support is experimental and covered by Tier 2 support.
Well, that inspires confidence.
What about generative AI?
Dataiku offers Dataiku Answers: a packaged, scalable web application democratizing enterprise-ready large language model (LLM) chat and retrieval-augmented generation (RAG) usage across business processes and teams.
Say that three times fast.
Can you picture it? The CEO sits with her management team, brainstorming solutions for tough business problems. Suddenly, she slams her fist on the table. “Dammit, we need more people using enterprise-ready large language model chat and retrieval-augmented generation across our business processes and teams! And we need it NOW!”
Search for “Dataiku Answers” in the DSS docs, and you get nothing. That means it’s not a product—it’s a concept car they put together because Goldman says they need a GenAI story for the roadshow.
I will get hate mail for saying this, but RAG is a workaround for shitty LLMs. Developers use RAG because LLMs are out-of-date, lack customer-specific knowledge, and tend to hallucinate. RAG helps a developer patch these problems, and it's cheaper than incremental training or fine-tuning. But it’s only a tool for skilled developers, not the business analysts who like Dataiku.
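If you have never built one, the retrieve-then-prompt pattern behind RAG looks roughly like the sketch below. It is a toy under loud assumptions: retrieval is bag-of-words cosine similarity where real systems use embeddings and a vector store, the three documents are invented, and there is no LLM call at all. The function names are mine, not from any library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True
    )
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Paste the retrieved text into the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Invented customer-specific knowledge the base model would not have.
docs = [
    "Refund requests must be filed within 30 days of purchase.",
    "The cafeteria is closed on public holidays.",
    "Enterprise contracts renew annually on the signing date.",
]
prompt = build_prompt("How many days do customers have to request a refund?", docs)
print(prompt)
```

Even in the toy version, someone has to decide how to chunk the documents, how to score them, how many to keep, and what to do when retrieval pulls the wrong thing. Those are developer decisions.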
By the way, those developers already use LangChain, which they will give up when you tear it from their cold, dead hands. Dataiku can GTFO.
“Democratized RAG” is bullshit. Businesses will not entrust RAG to people who don’t know what they are doing.
If you’re an investor snooping around Dataiku, look at the company’s track record as an analytics platform. They claim pretty good numbers in that business. If the ARR figures they’ve published are legit, it’s a good story.
Look elsewhere if you want to place bets on advanced machine learning and generative AI. Companies like Anyscale, Hugging Face, and Weights & Biases are serious players in GenAI’s core disciplines. They believe in that stuff and know how to make it work.

As always, well thought out and entertaining! Can't wait for the IPO or whatever is next though, gotta put those kids through college.
Good stuff. I hope they do well. I’ve always admired DSS and its ability to push down. When u writing a DataRobot article? We need a fresh take now the dust has settled.