During the California Gold Rush of 1849, the people who got reliably rich were rarely the prospectors. They were the merchants selling picks, shovels, denim pants, and provisions to the tens of thousands of miners flooding the Sierra Nevada foothills. Levi Strauss did not find gold; he sold trousers to people looking for it and built one of the most durable brands in American business history. The analogy to the current AI boom is imperfect in many ways, but at its core it captures something essential about where the most defensible value in the AI ecosystem is being built.
At Milestone AI Ventures, we have spent the past several years developing a specific conviction about the ML infrastructure layer — the software, platforms, and tooling that machine learning teams use to build, train, evaluate, deploy, and monitor AI systems. This conviction has been validated, in spectacular fashion, by the growth of companies like Weights & Biases, Hugging Face, and Scale AI, all of which have become foundational to how serious AI teams operate. In this piece, we want to articulate precisely why we believe ML infrastructure is the most interesting investment category in AI, what the market dynamics look like, and how we think about identifying the next generation of infrastructure companies that will define how AI is built over the next decade.
The Scale of the Opportunity: A Market That Did Not Exist Five Years Ago
To understand the ML infrastructure opportunity, it helps to appreciate how young the category is. Five years ago, most machine learning teams were cobbling together experiment tracking through handwritten spreadsheets and ad hoc logging systems. Model versioning was a custom script that a senior engineer had written and nobody else fully understood. Data labeling was a painful manual process managed through email threads and contractor spreadsheets. The concept of "MLOps" as a defined discipline did not exist — the term was not in widespread use until 2019, and the first MLOps-focused VC thesis pieces did not appear until 2020.
What changed? Three things happened simultaneously that created the market. First, deep learning models became good enough to deploy in production at scale — meaning companies had real business incentives to invest in the infrastructure required to do so reliably. Second, the size and complexity of models grew dramatically, making the informal approaches that worked for a research prototype completely inadequate for production systems. Third, a generation of talented ML engineers who had built these systems at Google, Facebook, and Uber left to start companies — and they built the tools they wished they had had at their previous employers.
The result is a market that has grown from essentially nothing to several billion dollars in annual recurring revenue in roughly five years, with most analysts projecting continued growth at 30-40% CAGR through the rest of the decade. IDC estimates that global AI infrastructure spending will exceed $300 billion by 2026. Even capturing a small fraction of that spend in the tooling and platform layer represents an enormous commercial opportunity.
The Validation: Three Companies That Built Categories
Before we describe our forward-looking thesis, it is worth examining in some detail the three companies that have done the most to validate the ML infrastructure category, because understanding why they won illuminates what we look for in the next generation of companies.
Weights & Biases: The Experiment Tracking Category
Weights & Biases was founded in 2017 by Lukas Biewald, Chris Van Pelt, and Shawn Lewis; Biewald and Van Pelt had previously founded CrowdFlower (later Figure Eight, acquired by Appen). The company's initial product was deceptively simple: a better way to track machine learning experiments. When you train a neural network, you are constantly making decisions about hyperparameters, architecture choices, data augmentation strategies, and a dozen other variables. Understanding which combination of choices produced which results — and being able to reproduce those results later — was a solved problem in theory but a genuine pain point in practice for every ML team in the world.
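To make the pain point concrete, here is a deliberately minimal sketch of the bookkeeping an experiment tracker automates. The `RunTracker` class and its API are illustrative inventions for this post, not Weights & Biases' actual design: the point is simply that every run's configuration is recorded alongside its metrics, so runs can be compared and reproduced later.

```python
import uuid

class RunTracker:
    """Toy experiment tracker: stores each run's config and metrics
    so results can be compared and reproduced later."""

    def __init__(self):
        self.runs = []

    def start_run(self, config):
        run = {"id": uuid.uuid4().hex[:8], "config": dict(config), "metrics": {}}
        self.runs.append(run)
        return run

    def log(self, run, **metrics):
        run["metrics"].update(metrics)

    def best(self, metric, mode="min"):
        scored = [r for r in self.runs if metric in r["metrics"]]
        key = lambda r: r["metrics"][metric]
        return min(scored, key=key) if mode == "min" else max(scored, key=key)

tracker = RunTracker()
for lr in (0.1, 0.01, 0.001):
    run = tracker.start_run({"lr": lr, "batch_size": 32})
    # Stand-in for a real training loop producing a validation loss.
    tracker.log(run, val_loss=round(abs(lr - 0.01) + 0.2, 4))

best = tracker.best("val_loss")
print(best["config"])  # the hyperparameters of the best run
```

Even this toy version answers the two questions every ML team asks constantly: which configuration won, and what exactly was it? A production tool adds collaboration, visualization, and durable storage on top of the same core record-keeping.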
Weights & Biases built a clean, collaborative tool that made experiment tracking genuinely pleasant rather than painful. They grew primarily through bottom-up adoption — individual ML researchers started using the free tier because it was better than their existing tools, their colleagues started using it, and eventually enterprise procurement got involved to manage the sprawl. This bottom-up, product-led growth model is one of the most powerful distribution mechanisms in enterprise software, and W&B executed it exceptionally well.
By the time they raised their $135 million Series C in October 2021, led by Felicis Ventures at a valuation north of $1 billion, Weights & Biases had hundreds of thousands of users across hundreds of enterprise customers, including OpenAI and virtually every serious AI research lab and AI-focused enterprise in the world. More importantly, the round validated that developer-focused ML tooling could achieve venture-scale outcomes through pure product excellence rather than aggressive sales-led go-to-market.
The lesson for seed-stage investors is that the best ML infrastructure companies often do not look like typical enterprise software companies in the early stages. They look like beloved developer tools — fast, clean, free to start, and adopted enthusiastically by the technical community before enterprise buyers get involved.
Hugging Face: The Model Hub and Open-Source AI Platform
Hugging Face's trajectory is perhaps the most remarkable in the ML infrastructure category. The company was founded in 2016 as an AI-powered chatbot for teenagers — a consumer product that pivoted hard into developer tooling after releasing the Transformers library in 2019. The Transformers library became, essentially overnight, the standard way that the global ML research community accessed and used pre-trained language models. It is now one of the most widely used open-source machine learning libraries in the world, with over 100,000 pre-trained models hosted on the Hugging Face Hub and downloaded millions of times every day.
This open-source adoption created a commercial flywheel that powered Hugging Face to extraordinary venture outcomes. Their 2023 Series D raised $235 million at a $4.5 billion valuation — one of the highest valuations in enterprise AI tooling at that point. The round was led by Salesforce Ventures with participation from Google, Amazon, NVIDIA, Qualcomm, IBM, and a who's-who of strategic technology investors, all of whom recognized that Hugging Face had become the standard platform for model hosting and sharing in the open-source AI ecosystem.
What is fascinating about Hugging Face is that they built enormous business value on top of giving away immense amounts of value for free. The Hub hosts models from Google, Meta, Mistral, EleutherAI, and hundreds of other organizations — none of whom pay Hugging Face to host those models. But the traffic, developer community, and enterprise relationships that flow from being the central repository for the open-source AI ecosystem create commercial opportunities that are genuinely difficult to replicate. Their enterprise offering, which gives companies the ability to host private models on the Hub infrastructure with enterprise security controls, is a natural upsell to an existing user base that already trusts and depends on the platform.
The Hugging Face lesson for infrastructure investors: owning the community often matters more than owning the code. The most defensible ML infrastructure companies are those that become the meeting place for a technical community — where practitioners share work, learn from each other, and build habits that eventually translate into commercial relationships.
Scale AI: The Data Labeling and AI Feedback Category
Scale AI was founded in 2016 by Alexandr Wang, then a 19-year-old MIT freshman who dropped out to pursue the company full-time, together with co-founder Lucy Guo. The original insight was simple and powerful: the quality of supervised machine learning models is fundamentally constrained by the quality of the labeled training data, and most companies were doing data labeling in-house with informal processes that produced inconsistent results at high cost. Scale built a managed service for data labeling that combined a large network of human annotators with quality control software and task management tooling designed specifically for ML workflows.
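Scale's production quality-control systems are proprietary, but the simplest version of the underlying idea, consensus labeling, fits in a few lines: weight each annotator's vote by an estimated reliability score and take the weighted majority. The function, the annotator names, and the reliability scores below are illustrative assumptions, not Scale's actual algorithm.

```python
from collections import defaultdict

def consensus_label(annotations, reliability):
    """Weighted majority vote: each annotator's vote counts in
    proportion to an estimated reliability score in [0, 1]."""
    votes = defaultdict(float)
    for annotator, label in annotations:
        votes[label] += reliability.get(annotator, 0.5)  # unknown raters get a neutral weight
    return max(votes, key=votes.get)

# Hypothetical reliability scores, e.g. estimated from gold-standard tasks.
reliability = {"ann_a": 0.95, "ann_b": 0.60, "ann_c": 0.55}
annotations = [("ann_a", "pedestrian"), ("ann_b", "cyclist"), ("ann_c", "cyclist")]
print(consensus_label(annotations, reliability))  # two weaker raters outvote one strong one
```

The commercial insight is that the reliability scores themselves are learned from gold-standard tasks and historical agreement, which is where years of iterative improvement accumulate into a moat.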
The company grew rapidly by targeting the self-driving car sector, which had enormous data labeling requirements for sensor fusion, object detection, and scene understanding tasks. Early customers included Lyft, Uber, General Motors, and Toyota Research Institute. As computer vision and NLP applications expanded beyond autonomous vehicles, Scale expanded its service offering to cover text annotation, conversation data generation, document processing, and eventually the reinforcement learning from human feedback (RLHF) pipelines that are central to training instruction-following language models like GPT-4 and Claude.
Scale AI reached a $7.3 billion valuation with its Series E in 2021, and its $1 billion round in May 2024 valued the company at $13.8 billion, with a customer base that included OpenAI, Anthropic, Meta, Microsoft, and the U.S. Department of Defense. Their government-focused offering, Donovan, became a significant business in its own right, reflecting the reality that government agencies building AI systems have the same data quality and feedback pipeline requirements as commercial enterprises.
Scale's trajectory illustrates a critical insight about ML infrastructure: the most valuable companies often solve problems that are unglamorous but absolutely essential. Nobody builds a conference presentation around data labeling pipelines. But every serious AI system depends on them, and the companies that build those pipelines at enterprise scale, with the quality controls and governance features that large enterprises require, capture enormous value in the process.
The Data Infrastructure Layer: dbt, Fivetran, and the AI Data Stack
One dimension of the ML infrastructure opportunity that deserves particular attention is the intersection of traditional data infrastructure and AI-specific tooling. The companies that built the modern data stack — the set of tools that data teams use to ingest, transform, model, and analyze business data — are now becoming critical infrastructure for AI systems, because AI systems need clean, well-governed, up-to-date data to function reliably.
dbt Labs, which raised a $222 million Series D in February 2022 at a $4.2 billion valuation, built the standard tool for data transformation in modern data warehouses. Their open-source dbt Core product is used by virtually every sophisticated data team that runs a cloud data warehouse, and their commercial dbt Cloud offering provides the managed infrastructure and collaboration features that enterprise teams require. As AI teams increasingly need to define, test, and version the data transformations that feed their models, dbt becomes essential infrastructure for the AI stack as well as the analytics stack.
Fivetran, which raised a $565 million Series D in September 2021 at a $5.6 billion valuation, built the standard tool for automated data pipeline management — the software that ingests data from hundreds of source systems (SaaS applications, databases, APIs) and loads it into data warehouses. Every AI system that relies on up-to-date business data needs a reliable ingestion layer, and Fivetran's 300+ pre-built connectors make it the fastest path to getting business data into the format required for AI workloads.
Airbyte, the open-source alternative to Fivetran that raised a $150 million Series B in December 2021, took a different approach — building a community-driven connector ecosystem where the open-source community contributes and maintains connectors, driving faster coverage of long-tail data sources that commercial vendors cannot economically support. Their model demonstrates that a community-driven, open-core strategy can work as well in data integration as it has in model hosting.
The common thread across all of these data infrastructure companies is that they become more valuable as AI adoption increases, because every AI application requires high-quality data pipelines. We view the data infrastructure layer as partially AI-adjacent infrastructure (companies that were not built specifically for AI but benefit from AI adoption) and partially AI-essential infrastructure (companies that are becoming critical path for AI deployments specifically). Both are interesting to us, though our focus at the seed stage is primarily on companies in the second category.
What We Look for in the Next Generation of ML Infrastructure Companies
Understanding the historical winners in ML infrastructure is one input into our investment thesis. The more actionable question is: what are the characteristics of the next generation of ML infrastructure companies that are being built now, and how do we identify them early enough to invest at the seed stage?
Based on our portfolio construction and the patterns we have observed in the companies that have succeeded in this category, we have developed a framework around five core criteria.
Technical depth that creates a sustainable moat. The ML infrastructure companies that last are not those that built a nice user interface on top of existing open-source tooling. They are companies that made genuine technical contributions — new algorithms, new data structures, new architectural approaches — that make their product meaningfully better at the core task in ways that are difficult to replicate. Weights & Biases' artifact versioning and lineage tracking are technically sophisticated systems that took significant engineering investment to build correctly. Scale AI's quality consensus algorithms and task routing logic represent years of iterative improvement. We look for seed-stage teams that have the technical depth to make those kinds of contributions, not just the business acumen to package existing technology in a user-friendly wrapper.
Developer-first distribution with enterprise expansion potential. The most successful ML infrastructure companies grew bottom-up — individual practitioners adopted the tool because it was genuinely better for their workflow, and enterprise procurement followed adoption rather than leading it. This is in contrast to the classic enterprise software model where the buyer is not the user and the sales team drives adoption. We look for companies where the primary user (the ML engineer or data scientist) is the primary advocate, and where the product's value is so clear that it does not require a sophisticated sales process to demonstrate.
Position at a workflow bottleneck. The most defensible infrastructure companies sit at genuine bottlenecks in the ML development workflow — places where inefficiency has a direct, measurable cost on model quality or deployment velocity. Experiment tracking sits at the bottleneck of understanding what you have tried and what worked. Data labeling sits at the bottleneck of getting the quality training data that determines model performance. Model serving sits at the bottleneck of getting working models into production reliably. We are most interested in companies that have identified a genuine bottleneck and built a solution that makes crossing that bottleneck measurably faster or cheaper.
Expansion across the AI development lifecycle. The best infrastructure companies do not remain narrowly focused on a single workflow step. They expand across the AI development lifecycle as they build trust with users and learn about adjacent pain points. Weights & Biases expanded from experiment tracking into model registry, artifact management, and model evaluation. Scale AI expanded from data labeling into RLHF pipelines, evaluation services, and government AI systems. We look for seed-stage companies that have a clear initial wedge and a credible expansion roadmap that creates the potential for a platform business rather than a point solution.
Data network effects or compounding learning advantages. The infrastructure companies that build the most durable moats are those where usage generates data that improves the product, which attracts more usage, which generates more data. Scale AI's quality consensus algorithms improve with more labeling tasks processed. Hugging Face's model recommendations improve as more users interact with and evaluate models on the Hub. These data network effects are different from traditional network effects in that they improve the product's core capability rather than just its utility (though both are valuable). We spend significant time in diligence understanding whether a company's product generates proprietary data advantages that compound over time.
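To ground the technical-depth criterion above, consider artifact versioning, one of the systems mentioned there. Its core idea is content addressing: an artifact's version identity is a hash of its bytes, so identical content deduplicates automatically and any change yields a new version, and parent links give you lineage for free. The `ArtifactStore` below is an illustrative toy, not Weights & Biases' actual implementation.

```python
import hashlib

class ArtifactStore:
    """Content-addressed store: an artifact's version ID is the hash of
    its bytes, so identical content dedupes and any change yields a new
    version. Parent links record lineage (which inputs produced what)."""

    def __init__(self):
        self._blobs = {}    # digest -> bytes
        self._lineage = {}  # digest -> list of parent digests

    def put(self, data: bytes, parents=()):
        digest = hashlib.sha256(data).hexdigest()[:12]
        self._blobs[digest] = data
        self._lineage[digest] = list(parents)
        return digest

    def get(self, digest):
        return self._blobs[digest]

    def lineage(self, digest):
        return self._lineage[digest]

store = ArtifactStore()
dataset = store.put(b"rows of training data")
model = store.put(b"serialized model weights", parents=[dataset])
print(store.lineage(model))  # the model records which dataset produced it
```

The toy version is trivial; the engineering depth is in making this work for terabyte-scale artifacts, partial diffs, and distributed teams, which is exactly the kind of hard-to-replicate investment the criterion describes.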
The AI Infrastructure Market in 2024 and Beyond
We are writing this in September 2024, at a moment when the AI infrastructure market has matured considerably from its 2021-2022 frothiness but continues to grow at rates that most enterprise software categories never achieve. Several dynamics are worth noting for investors and founders thinking about this space.
The commoditization of foundation models is accelerating the importance of the infrastructure layer. As GPT-4, Claude, Llama, and Mistral converge toward similar capability levels for most mainstream tasks, the companies that differentiate on the quality of their training pipelines, evaluation infrastructure, and deployment systems will pull away from those relying on raw model capability. This is creating urgency among enterprise AI teams to invest in the infrastructure layer — to build the feedback pipelines, evaluation frameworks, and governance systems that allow them to continuously improve their AI systems rather than depending on their foundation model provider's next release.
The regulatory environment is creating new infrastructure requirements. The EU AI Act, the Biden Administration's AI Executive Order, and emerging state-level AI regulations all create compliance and governance requirements for AI systems — requirements for documentation of training data, testing for bias and safety, explainability of model decisions in high-stakes applications, and audit trails for model behavior. These requirements create demand for infrastructure that most AI teams do not currently have. The companies that build governance and compliance tooling for AI systems are addressing a market that is being created by regulatory fiat, which gives it a different risk profile than pure market-driven adoption.
The talent market is driving infrastructure standardization. As ML engineering talent has become scarcer and more expensive, companies are increasingly standardizing on the tools that new hires already know from their previous employers or from the open-source community. This creates a virtuous cycle for the market leaders — a senior ML engineer who has been using Weights & Biases for three years will advocate for using it at their new company, creating enterprise adoption without enterprise sales effort. For founders building new infrastructure tools, this dynamic creates both an opportunity (build something that the community loves and adoption follows) and a challenge (overcome the switching costs and familiarity of the incumbent platforms).
Our Portfolio and Where We Are Investing
At Milestone AI Ventures, our ML infrastructure thesis has led us to invest across several specific subcategories where we believe the next generation of defining companies is being built.
We are actively investing in AI evaluation and testing infrastructure — companies that help ML teams rigorously assess model behavior before deployment, continuously monitor model performance in production, and build automated regression testing pipelines that catch model degradation before it reaches end users. Evaluation is one of the most underdeveloped areas of ML infrastructure relative to its importance: a model that works in the lab but behaves unexpectedly in production is not just a technical failure, it is a business and reputational risk that enterprise buyers are increasingly unwilling to accept.
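As an illustration of what an automated regression gate in such a pipeline can look like, here is a minimal sketch; the metric names and tolerances are hypothetical. A candidate model is blocked from deployment if any tracked metric regresses past its allowed tolerance against the current baseline, even when other metrics improve.

```python
def regression_gate(baseline, candidate, tolerances):
    """Return (passed, failures): the candidate must not regress any
    metric by more than its allowed tolerance versus the baseline."""
    failures = []
    for metric, tol in tolerances.items():
        drop = baseline[metric] - candidate[metric]
        if drop > tol:
            failures.append(f"{metric}: dropped {drop:.3f} (tolerance {tol})")
    return (not failures, failures)

baseline  = {"accuracy": 0.91, "f1": 0.88}
candidate = {"accuracy": 0.92, "f1": 0.83}  # better accuracy, but f1 regressed
passed, failures = regression_gate(baseline, candidate,
                                   {"accuracy": 0.01, "f1": 0.01})
print(passed, failures)
```

The design point is that the gate is per-metric: a model that improves the headline number while silently degrading a subgroup or safety metric is exactly the failure mode that reaches production when evaluation is informal.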
We are also investing in AI governance and lineage tooling — companies that help enterprises understand and document the data that trained their models, the decisions made during model development, and the ongoing behavior of models in production. These capabilities are table stakes for regulated industries (finance, healthcare, insurance) and are becoming requirements in general enterprise contexts as AI systems are used to make consequential decisions.
Finally, we are actively looking at the emerging category of AI agent infrastructure — the orchestration frameworks, tool integration platforms, and state management systems that support AI agents executing multi-step tasks autonomously. This is a nascent market where there is significant technical uncertainty, but the eventual scale of the opportunity is enormous, and we believe the infrastructure standards for agentic AI are being established now by the companies building the first generation of production agentic systems.
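The orchestration pattern at the heart of these agentic systems can be reduced to a very small loop: a planner decides the next tool call from the current state, the runtime executes it, and the result is threaded back into state until the planner declares the task done. The sketch below uses stub tools and a hard-coded planner where a production system would call an LLM; every name in it is illustrative.

```python
def run_agent(goal, tools, plan, max_steps=10):
    """Minimal orchestration loop: plan() maps state to the next tool
    call; the loop executes tools and threads results back into state."""
    state = {"goal": goal, "history": []}
    for _ in range(max_steps):
        step = plan(state)          # an LLM call in a real system
        if step is None:            # planner signals the task is done
            break
        name, args = step
        result = tools[name](**args)
        state["history"].append((name, args, result))
    return state

# Stub tools standing in for real search/summarization integrations.
tools = {
    "search": lambda query: f"results for {query!r}",
    "summarize": lambda text: text.upper(),
}

def plan(state):
    done = [name for name, _, _ in state["history"]]
    if "search" not in done:
        return ("search", {"query": state["goal"]})
    if "summarize" not in done:
        return ("summarize", {"text": state["history"][-1][2]})
    return None

final = run_agent("ml infrastructure market", tools, plan)
print(len(final["history"]))  # two tool calls executed
```

Everything hard about agent infrastructure lives around this loop: durable state when a step fails mid-task, sandboxing and permissioning of tools, observability over multi-step traces, and bounding cost and latency. Those are the layers where we expect standards, and companies, to emerge.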
Conclusion: Infrastructure Is Where Empires Are Built
The history of technology is littered with examples of the infrastructure layer outperforming the application layer in long-run value creation. Oracle outlasted nearly every early database application company. Cisco captured more value than the early internet service providers it supplied. AWS has generated more enterprise value than almost any of the applications built on top of it. The pattern is not universal — Amazon the application company did fine — but it reflects a real dynamic: infrastructure that other systems depend on has structural advantages in defensibility, switching costs, and gross margin that application companies struggle to replicate.
We believe the ML infrastructure category is in the process of generating the same dynamic in the AI era. Today's Weights & Biases, Hugging Face, and Scale AI will be joined by a new generation of infrastructure companies that own the experiment management, model governance, data quality, deployment orchestration, and agent infrastructure layers of the AI stack. At Milestone AI Ventures, we are deploying capital into this generation of companies with conviction that the picks-and-shovels metaphor is not just marketing copy — it is an accurate description of where the most durable value in the AI ecosystem is being built.
If you are building ML infrastructure — experiment management, data quality, model governance, evaluation tooling, agent infrastructure, or anything else that makes AI teams faster and better — we would like to hear from you. Reach us at founders@mstone-ai.com or through the contact form on our website.
Marcus Reyes is a General Partner at Milestone AI Ventures. He previously led engineering infrastructure at Scale AI and holds a B.S. in Computer Science from Stanford University. The views expressed here are his own and do not constitute investment advice.