Deal Trends
AI and Data IP Diligence in Software M&A
"Standard IP diligence misses the questions that determine value in AI software M&A. The diligence playbook needs to be rebuilt around training data, model provenance, and downstream license rights."
Software M&A has long had a recognized IP diligence playbook: review the corporate IP assignments, confirm the chain of title to material IP, evaluate the open-source compliance posture, and assess the contractual scope of the company's customer-facing licenses. For traditional software targets, that playbook is sufficient. For AI-powered software targets - which now constitute a meaningful share of software M&A - the playbook is materially incomplete.
The new diligence questions cluster around three areas: the data used to train the company's models, the provenance of the models themselves, and the downstream license rights the company has granted to its customers. Each of these areas carries risks that the standard IP package does not address, and each of them can move valuation by single-digit or even double-digit percentages of the headline price.
Training data is the area where the law is most unsettled and the diligence work most consequential. The questions to answer are: What data was used to train the models the target relies on? Was the data lawfully acquired and lawfully used for the training purpose? What contractual restrictions, license terms, or regulatory constraints (privacy, sectoral data rules) apply to the data? What documentary record exists to support the answers? Litigation over AI training data is actively developing in multiple jurisdictions, and an acquirer needs to understand the target's posture before pricing the asset.
Model provenance is the second area. Many AI products are built on a stack that combines proprietary models, fine-tuned versions of foundation models from third-party providers, and open-source model components. Each layer carries its own license terms, attribution obligations, and downstream-use restrictions. The diligence work is to map the model stack, identify the license obligations at each layer, and confirm the company's use is consistent with each set of terms. Foundation model license terms in particular have evolved rapidly and contain restrictions (on commercial use, on output use, on competitive applications) that a casual reading easily misses.
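For deal teams that track the provenance map in structured form, the mapping exercise described above can be sketched roughly as follows. This is a minimal illustration, not a prescribed tool: the `ModelLayer` record, the `flag_conflicts` helper, the layer names, and the license labels are all hypothetical, and the restriction categories simply mirror the ones named in the text.

```python
from dataclasses import dataclass, field


@dataclass
class ModelLayer:
    """One layer of the target's model stack (all fields are illustrative)."""
    name: str
    kind: str  # "proprietary", "fine-tuned foundation", or "open-source"
    license_name: str
    restrictions: set[str] = field(default_factory=set)  # uses the license restricts
    attribution_required: bool = False


def flag_conflicts(stack, actual_uses):
    """Return (layer name, use) pairs where the target's use hits a license restriction."""
    findings = []
    for layer in stack:
        # Intersect what the license restricts with what the target actually does.
        for use in sorted(layer.restrictions & actual_uses):
            findings.append((layer.name, use))
    return findings


# Hypothetical three-layer stack; none of these licenses refer to real terms.
stack = [
    ModelLayer("in-house ranking model", "proprietary", "company-owned"),
    ModelLayer(
        "fine-tuned chat model",
        "fine-tuned foundation",
        "hypothetical provider license",
        restrictions={"competitive applications", "output use for training"},
    ),
    ModelLayer(
        "embedding component",
        "open-source",
        "hypothetical non-commercial license",
        restrictions={"commercial use"},
        attribution_required=True,
    ),
]

# How the target actually uses the stack, as established in diligence interviews.
actual_uses = {"commercial use", "output use for training"}

findings = flag_conflicts(stack, actual_uses)
for layer_name, use in findings:
    print(f"{layer_name}: review '{use}' against the license terms")
```

A flagged pair is a prompt for legal review of the actual license text, not a conclusion that the target is in breach; the value of the structure is that no layer of the stack goes unexamined.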
Downstream license rights - the rights the company has granted to its customers in respect of model outputs - are the third area. Standard SaaS terms may not adequately address questions specific to AI-generated content: Who owns the output? Can the customer use the output to train its own models? Does the company retain rights in the output that conflict with the customer's expected ownership? These are practical commercial questions, but they are also legal questions, and the answers determine how well the company's customer relationships hold up in a litigation or audit scenario.
The documentary record for each of these areas should be specifically requested, specifically reviewed, and specifically incorporated into the diligence summary. Standard data-room categories typically do not include training data inventories, model provenance maps, or AI-specific customer-license analyses. Buyers who add these as explicit categories - and sellers who prepare them in advance of process launch - find the diligence phase materially shorter and more conclusive.
The representation and warranty package should be calibrated to the AI dimension as well. The standard IP representations need to be supplemented with reps on training-data lawfulness, model provenance, output ownership, and AI-specific regulatory compliance. The reps and warranties insurance market has begun pricing these risks separately; an acquirer who expects the standard policy to cover the full AI risk profile will often be disappointed by the resulting exclusions. A targeted underwriting conversation about the AI-specific representations, with supporting diligence work, can convert a broad exclusion into a narrow one.
The talent dimension overlaps with the IP analysis in AI-powered targets. Key technical contributors - the engineers and researchers who built the model stack - are often the people best positioned to answer the diligence questions on training data and model provenance. Retaining them after closing is frequently essential to keeping the diligence record accurate as the company's model stack evolves. The talent retention plan should be drafted with these dependencies in mind.
AI software M&A is not a new asset class so much as a new diligence and documentation discipline applied to an existing asset class. The deals that close cleanly are the deals where the diligence playbook was rebuilt for the AI dimension, the documentary record was developed before launch, and the representations and warranties package was negotiated with awareness of the specific risks the asset presents. The deals that go badly are the ones where the standard playbook was applied to a non-standard asset.
What we are watching
We will return to this topic across the coming quarter. If you are actively negotiating a transaction where these issues are live, we'd welcome a confidential conversation.
Three takeaways
- The market is settling, but the diligence bar is rising.
- Preparation, not posture, is the source of speed.
- The right structure can move price more than another round of negotiation.
