The Hidden Data Harvest in AI Consulting

The narrative around the AI productivity puzzle has been stuck in a loop: tools exist, but output doesn’t budge. The obvious answer—better implementation—now has a price tag attached to it. Last week, OpenAI and Anthropic announced they are launching private-equity-backed consulting businesses, raising billions from the likes of Blackstone, TPG, and Goldman Sachs. The surface story is that these firms will help mid-market companies integrate AI into their workflows. The deeper story, the one that actually explains the economic logic of these ventures, is about something else entirely: data.

The Overlooked Angle

Most coverage treats these consulting arms as a natural extension of the AI companies’ product lines—a way to capture more value by offering services alongside software. That framing misses the point. The real purpose of these ventures is not to generate consulting fees. It is to acquire the one resource that current AI models desperately lack: high-fidelity, proprietary, workflow-specific training data from actual enterprise operations.

Why This Small Detail Matters

The productivity puzzle exists because general-purpose generative AI models produce generic outputs. They can write a memo, but they cannot reliably manage a supply chain exception, negotiate a vendor contract, or reconcile a complex P&L. These tasks require context that is buried in company-specific processes, legacy systems, and human decision patterns. No amount of internet-scale pretraining will capture that. The only way to get that data is to embed human consultants into the workflow and record every decision, override, and output. That is exactly what the PE-backed ventures are designed to do.

The Economic Mechanism

Let us look at the unit economics of this model. A traditional consulting engagement charges by the hour or by the project. Margins are typically 20-30% for a well-run firm, but scale is linear—more consultants, more revenue. That is not a particularly attractive business for investors who expect multiples on their capital. However, if the consulting engagement is structured to collect every interaction, every prompt, every correction, and every approved output, then each hour of consulting is also generating an asset: a labeled dataset of human-in-the-loop AI usage in a specific domain. This dataset is worth far more than the consulting fee because it can be used to fine-tune a specialized model that eventually automates the very tasks the consultants are doing.

The PE firms are not betting on billable hours. They are betting on the eventual creation of proprietary, domain-specific models that can be sold as software subscriptions with 80%+ gross margins. The consulting arm is the cost of acquiring that data. It is a capital-intensive R&D expense disguised as a revenue-generating service.
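To make the margin contrast concrete, here is a minimal sketch comparing the profit kept on the same revenue under the two models. The $1 million revenue figure is hypothetical and chosen only for illustration. The 25% services margin is the midpoint of the 20-30% range cited above, and 80% is the floor of the software range. Treating the two figures as the same kind of margin is a simplifying assumption.

```python
def profit(revenue: float, margin: float) -> float:
    """Profit retained from `revenue` at a given margin (0.0 to 1.0)."""
    return revenue * margin


REVENUE = 1_000_000        # hypothetical annual revenue, for illustration only
SERVICES_MARGIN = 0.25     # midpoint of the 20-30% consulting range in the text
SOFTWARE_MARGIN = 0.80     # floor of the "80%+" software range in the text

services_profit = profit(REVENUE, SERVICES_MARGIN)
software_profit = profit(REVENUE, SOFTWARE_MARGIN)
ratio = software_profit / services_profit

print(f"Services profit: ${services_profit:,.0f}")
print(f"Software profit: ${software_profit:,.0f}")
print(f"Software keeps {ratio:.1f}x as much per dollar of revenue")
```

Under these assumptions the gap is roughly threefold before accounting for scale: services revenue grows with headcount, while a subscription does not need more consultants to serve more clients.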

The Cost Structure of Data Acquisition

Compare this to other data acquisition strategies. Synthetic data generation is cheap but low quality—it lacks the noise and friction of real operations. Scraping public data is cheap but not proprietary. Buying data from brokers is expensive and shallow—a typical enterprise dataset costs thousands per company, and still misses the fine-grained decision context. The only way to get high-quality, proprietary, domain-specific data is to embed a human into the actual workflow. A senior consultant costs roughly $300-$500 per hour. For a 100-hour engagement, that is $30,000-$50,000 per client. If the resulting dataset can train a model that replaces even one low-level analyst (salary $70k/year), the one-time data cost is recouped in less than a year. Multiply that by thousands of clients, and the dataset becomes a multi-billion dollar asset. The consulting fees are simply a vehicle to defray the cost of data collection while the real value accumulates in the training corpus.
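The payback arithmetic in this paragraph can be sketched as follows. All inputs come from the figures above (100 hours, $300-$500 per hour, a $70k analyst salary); the function names are illustrative, not part of any real tool.

```python
def engagement_cost(hours: float, hourly_rate: float) -> float:
    """One-time cost of a consulting engagement."""
    return hours * hourly_rate


def payback_years(one_time_cost: float, annual_salary_saved: float) -> float:
    """Years needed for the saved salary to cover the one-time cost."""
    return one_time_cost / annual_salary_saved


HOURS = 100                # engagement length used in the text
ANALYST_SALARY = 70_000    # annual salary of the analyst being replaced

for rate in (300, 500):    # low and high ends of the senior consultant rate
    cost = engagement_cost(HOURS, rate)
    years = payback_years(cost, ANALYST_SALARY)
    print(f"${rate}/hour: cost ${cost:,.0f}, payback {years:.2f} years")
```

At both ends of the rate range the payback comes in under a year, consistent with the claim above. The sketch ignores model training and serving costs. It also assumes that the data from a single engagement is enough to replace the analyst, which the paragraph itself presents as a conditional.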

Why Private Equity?

Private equity firms are structurally suited to this play. They have long time horizons: typical fund lives are 10-12 years. They are comfortable with capital-intensive, low-margin operations in the near term if the endgame is a high-margin software platform. They also have a ready-made client pool: their own portfolio companies. A PE firm like TPG can mandate that its portfolio companies engage the AI consulting arm, guaranteeing immediate deal flow. This captive channel accelerates data collection and reduces acquisition costs. The AI companies get the data; the PE firms get the upside from the eventual model licensing. It is a symbiotic structure that a traditional VC-backed software company could not replicate.

The Strategic Consequence

If this interpretation is correct, the winners will not be the companies that simply deploy AI fastest. The winners will be the companies that generate the most valuable training data during the deployment process. This creates a stark bifurcation. Large enterprises with in-house AI teams already have access to their own data and can build custom models. Mid-market companies, which lack that capability, become the prime targets for the consulting data harvest. They pay for implementation, but the resulting dataset flows back to the AI company—or worse, to a consortium of PE investors.

This also explains the scale of the investments: $4 billion for OpenAI’s venture and $1.5 billion for Anthropic’s. These numbers make sense only if the endgame is a high-margin software platform, not a services business. The consulting revenue alone could never justify a $10 billion valuation. The data asset can.

What Most Commentary Gets Wrong

The lazy take is that these consulting ventures are a sign that AI companies cannot sell software alone, so they must offer services to close the deal. That view ignores the asymmetry between the cost of consulting and the value of data. Another common mistake is to assume that the productivity puzzle will be solved by better models alone. It will not. It will be solved by better data—specifically, data that reflects the real friction of enterprise operations. The consulting arms are the most efficient way to collect that data at scale, even if the efficiency of the consulting itself is low.

Some observers worry that these ventures will fail because consulting is hard to scale. That misses the point entirely. The consulting is not meant to scale. It is a temporary bridge to automated intelligence. Once the data from enough engagements trains a sufficiently competent model, the consultants become obsolete—replaced by the very software they helped create. The business model flips from services to software, and the margins explode.

The Hard Business Lesson

If you are a mid-market executive considering hiring one of these AI consulting teams, ask yourself: Who owns the data generated during the engagement? If the answer is not you, then you are not a customer; you are feedstock. The AI productivity puzzle will be solved not by technology breakthroughs but by the accumulation of the rarest resource: operational data that has been labeled by human experts. The companies that control that data will control the future of enterprise AI. The rest will pay the toll.
