Protecting Intellectual Property in AI Development Agreements

As AI development increasingly involves third-party datasets, pre-trained models, and collaborative development arrangements, the question of who owns what has never been more commercially significant, or legally uncertain.

Rini Mathew

28 March 2026 · Updated 4 April 2026

Key Takeaways

AI systems contain five distinct IP layers (training data, architecture, weights, outputs, derived models), each requiring separate contractual allocation
Model weights — often the most valuable component — lack clear statutory protection in most jurisdictions, making contractual provisions the primary safeguard
Joint IP ownership rarely works in practice due to conflicting rules across jurisdictions; avoid it unless genuinely planned for
Every AI development agreement should include an IP audit, explicit fine-tuning provisions, and regulatory compliance considerations

The IP Challenge in AI Development

Traditional intellectual property frameworks were designed for a world where creation was linear: an inventor conceived an idea, reduced it to practice, and secured protection. AI disrupts every assumption in that chain.

Consider a typical AI development scenario: a company commissions a vendor to build a machine learning model. The vendor uses open-source frameworks (PyTorch, TensorFlow), pre-trained foundation models (licensed under various terms), the client's proprietary training data, the vendor's proprietary algorithms, and third-party datasets acquired under licence. The resulting model generates outputs that may incorporate elements of all of these inputs.

Who owns the model? The training pipeline? The outputs? The weights? Without clear contractual allocation, the answer depends on a tangle of copyright law, trade secret doctrine, patent principles, and licence terms, each varying by jurisdiction. AI systems that cost millions to develop can become unusable if IP ownership is disputed.

The Five Layers of IP in AI Systems

To negotiate AI development agreements effectively, you must first understand the distinct IP layers at play. Each layer has different ownership dynamics and protection mechanisms.

1. Training Data

Training data is the foundation of any AI system, and the IP issues begin here.

Proprietary data: If the client provides its own data (customer records, transaction histories, operational data), the client typically retains ownership. But the agreement must address what the developer can do with insights derived from that data
Licensed data: Third-party datasets come with licence restrictions that flow through to the AI system. A dataset licensed for "research purposes only" cannot be used to train a commercial model without additional permissions
Scraped or publicly available data: The legality of web scraping for AI training varies significantly by jurisdiction. The EU's Text and Data Mining exceptions (DSM Directive Articles 3-4) permit mining by research organisations for scientific research (Article 3) and mining for other purposes, including commercial ones, unless the rightsholder has expressly reserved its rights by opting out (Article 4). The US relies on fair use analysis, with cases like Thomson Reuters v. ROSS Intelligence and New York Times v. OpenAI shaping the boundaries
Synthetic data: Data generated by AI models to augment training sets raises novel questions: is synthetic data derived from copyrighted material itself subject to the original copyright?

2. Model Architecture and Algorithms

The design of the neural network (its architecture, hyperparameters, and training methodology) represents a distinct IP layer.

Custom architectures: Novel model designs may be patentable, though the bar for software patents varies by jurisdiction. The US, China, and Japan are relatively permissive; the EU and India are more restrictive
Trade secrets: Training methodologies, hyperparameter configurations, and data preprocessing pipelines are often best protected as trade secrets rather than patents, provided adequate confidentiality measures are in place
Open-source components: Most AI systems rely on open-source frameworks. The licence terms (Apache 2.0, MIT, GPL) determine what restrictions flow through to the resulting system. GPL-licensed components can be particularly problematic for proprietary AI systems, because copyleft terms can require derivative works to be distributed under the same licence

3. Trained Model Weights

Model weights, the numerical parameters learned during training, are the most commercially valuable and legally uncertain layer.

Weights are not clearly protectable under copyright in most jurisdictions (they are numerical values, not "creative expression")
Trade secret protection is available if the weights are kept confidential, but may be lost if the model is deployed in a way that allows the weights to be extracted
Patent protection for specific weight configurations is theoretically possible but practically difficult to enforce
The contractual allocation of weight ownership is therefore critical; it may be the only reliable protection

4. AI-Generated Outputs

The outputs of AI systems (text, images, code, predictions, recommendations) present the newest IP challenge.

Copyright status: Most jurisdictions require human authorship for copyright protection. The US Copyright Office has consistently refused to register purely AI-generated material: in Zarya of the Dawn it registered the human-written text and arrangement but not the AI-generated images, and it refused registration of Théâtre D'opéra Spatial. The UK is an exception, assigning copyright in computer-generated works to "the person by whom the arrangements necessary for the creation of the work are undertaken"
Commercial implications: If AI-generated outputs cannot be copyrighted, they cannot be exclusively licensed. This affects business models built on AI-generated content, code, or designs
Contractual solutions: Agreements should allocate rights to outputs regardless of their copyright status, using contractual mechanisms (assignment, exclusive licence, usage rights) to provide commercial certainty even where statutory IP protection is unavailable

5. Fine-Tuned and Derived Models

When a pre-trained foundation model is fine-tuned on proprietary data, the resulting model contains elements of both the original model and the new training data.

Foundation model licences increasingly address fine-tuning rights (Meta's Llama licence, Mistral's various tiers, Google's Gemma terms)
The distinction between the base model (licensed) and the fine-tuning delta (potentially owned) must be clearly articulated in agreements
Some licences restrict the use of outputs to improve competing models, a provision that can constrain entire product strategies if overlooked
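The base model versus delta distinction can be made concrete. The following is a minimal sketch, assuming the delta is understood as the per-parameter difference between the fine-tuned weights and the base weights; real projects may instead use adapter layers or another technique, and the function names and toy values here are hypothetical.

```python
# Illustrative only: weights are modelled as plain name -> list-of-floats mappings.
# Assumes both models have identical parameter names and shapes.

def weight_delta(base, tuned):
    """Return the per-parameter difference (tuned - base)."""
    if base.keys() != tuned.keys():
        raise ValueError("base and fine-tuned models must share parameter names")
    return {
        name: [t - b for b, t in zip(base[name], tuned[name])]
        for name in base
    }


def apply_delta(base, delta):
    """Rebuild the fine-tuned weights from the licensed base plus the delta."""
    return {
        name: [b + d for b, d in zip(base[name], delta[name])]
        for name in base
    }


# Toy values standing in for a licensed base model and its fine-tuned version.
base = {"layer1": [1.0, 2.0], "layer2": [0.5]}
tuned = {"layer1": [1.5, 2.0], "layer2": [0.25]}

delta = weight_delta(base, tuned)      # the part the agreement may assign
rebuilt = apply_delta(base, delta)     # unusable without the licensed base
```

On this view the delta has little value without continued access to the base model, which is why an agreement that assigns the delta should also address the licence to the base.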

Structuring the AI Development Agreement

With the IP layers identified, the development agreement must address each one explicitly. Ambiguity in AI IP clauses is not just a legal risk; it all but guarantees a future dispute.

IP Ownership Allocation

The most critical clause in any AI development agreement is the IP ownership allocation. Three models predominate:

Model	Client Owns	Developer Owns	Best For
Client ownership	Model, weights, training pipeline, outputs	Pre-existing IP only	Bespoke systems where exclusivity is critical
Developer ownership with licence	Perpetual licence to use the model and outputs	Model, platform, methodology	Platform-based AI services, SaaS models
Joint ownership	Rights to use and modify the model	Rights to use and modify the model	Collaborative R&D (use with caution)

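A note on joint ownership: Joint IP ownership is frequently proposed as a "fair" compromise but rarely works well in practice, because the default rules differ by jurisdiction and by type of right. Under English law, a patent co-owner can work the invention itself without accounting to the others but needs their consent to license or assign it, while a copyright co-owner needs the others' consent to exploit the work at all. Under US law, a co-owner can generally exploit the IP and grant non-exclusive licences alone, but an exclusive licence requires every co-owner's consent. Under Indian law, a patent co-owner can work the invention itself but cannot license or assign its share without the other co-owners' consent. Unless the parties genuinely intend and have planned for shared exploitation, avoid joint ownership.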

Background and Foreground IP

Every AI development agreement should clearly distinguish:

Background IP: Pre-existing intellectual property that each party brings to the project (the developer's algorithms, the client's training data, licensed third-party components)
Foreground IP: New intellectual property created during the project (the trained model, novel algorithms, processed datasets)
Sideground IP: IP created by a party during the project but outside its scope (increasingly relevant as developers work on multiple AI projects simultaneously)

Background IP should remain with its original owner, with licences granted only to the extent necessary for the project. Foreground IP allocation is the subject of negotiation. Sideground IP should be addressed to avoid disputes about what was created "within" versus "outside" the project scope.

Key Contractual Provisions

Beyond ownership allocation, the following provisions are essential in AI development agreements:

Data rights and restrictions:

Specify what data the developer can access, use, retain, and derive insights from
Prohibit use of client data to train models for other clients (a common source of dispute)
Address data return and deletion obligations at project completion or termination
Include audit rights to verify compliance with data use restrictions

Open-source compliance:

Require the developer to disclose all open-source components and their licence terms
Prohibit use of copyleft-licensed components (GPL, AGPL) without prior written consent
Include indemnities for open-source licence violations
Require an open-source Bill of Materials (BOM) for the delivered system
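A Bill of Materials also makes the copyleft restriction checkable by machine. The sketch below is illustrative only: the component names are invented, the BOM format is an assumption, and matching on licence identifier prefixes is a simplification that does not replace a proper licence review.

```python
# Hypothetical BOM check: flags copyleft components that lack written consent.
# Licence strings follow SPDX-style identifiers; the prefix list mirrors the
# GPL and AGPL families named in the clause above.
COPYLEFT_PREFIXES = ("GPL", "AGPL")


def flag_copyleft(bom, approved=frozenset()):
    """Return names of copyleft-licensed components not in the approved set."""
    flagged = []
    for component in bom:
        licence = component["licence"].upper()
        if licence.startswith(COPYLEFT_PREFIXES) and component["name"] not in approved:
            flagged.append(component["name"])
    return flagged


# Invented example entries, as a developer might disclose them.
bom = [
    {"name": "framework-a", "licence": "Apache-2.0"},
    {"name": "tokenizer-b", "licence": "MIT"},
    {"name": "preprocess-c", "licence": "GPL-3.0-only"},
    {"name": "server-d", "licence": "AGPL-3.0-only"},
]

needs_consent = flag_copyleft(bom)
after_consent = flag_copyleft(bom, approved={"server-d"})
```

Here `approved` stands in for components the client has consented to in writing, so the second call reports only the component still awaiting consent.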

Foundation model terms:

Identify all pre-trained models used and their licence terms
Ensure the development agreement is consistent with foundation model licences (e.g., Llama's acceptable use policy, commercial use thresholds)
Address what happens if a foundation model's licence terms change (increasingly common as providers tighten commercial terms)

Model performance and acceptance:

Define measurable acceptance criteria (accuracy, latency, fairness metrics)
Specify testing methodology and datasets
Address ongoing performance obligations and model drift
Include provisions for model retraining and the IP implications of updated versions
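Acceptance criteria are easiest to enforce when they can be expressed as thresholds and checked mechanically. The following is a rough sketch of that idea; the metric names and threshold values are placeholders chosen for illustration, not recommended figures, and in practice they would come from the agreed acceptance schedule.

```python
# Placeholder criteria: each metric has a direction ("min" or "max") and a threshold.
CRITERIA = {
    "accuracy": ("min", 0.90),
    "latency_ms": ("max", 200.0),
    "fairness_gap": ("max", 0.05),
}


def evaluate_acceptance(measured, criteria=CRITERIA):
    """Return metric -> pass/fail; a metric that was not reported counts as a fail."""
    results = {}
    for metric, (direction, threshold) in criteria.items():
        value = measured.get(metric)
        if value is None:
            results[metric] = False
        elif direction == "min":
            results[metric] = value >= threshold
        else:
            results[metric] = value <= threshold
    return results


# Invented test results measured on the agreed test dataset.
measured = {"accuracy": 0.93, "latency_ms": 250.0, "fairness_gap": 0.03}

results = evaluate_acceptance(measured)
accepted = all(results.values())
```

Treating an unreported metric as a failure is a design choice in this sketch: it puts the burden of demonstrating performance on the party delivering the model.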

Jurisdiction-Specific Considerations

IP protection for AI systems varies significantly across the jurisdictions most relevant to technology companies:

Issue	India	European Union	United States
Copyright & AI works	No explicit provisions for AI-generated works; Copyright Office has not issued definitive guidance	EU AI Act imposes transparency obligations that may require disclosure of training data, potentially conflicting with trade secret protection	Copyright Office guidance evolving; increasing specificity on what human involvement creates copyrightable output
Patent eligibility	Higher bar under Section 3(k) of the Patents Act; "technical effect" arguments can overcome the exclusion	More restrictive approach to software patents; Database Directive provides sui generis protection for AI training databases	Uncertain post-Alice; USPTO has issued guidance on AI-assisted inventions
Trade secret protection	Relies on contract law and common law of confidential information; no dedicated statute	Trade Secrets Directive provides harmonised protection; DSM Directive creates text and data mining exceptions for AI training	Defend Trade Secrets Act provides federal protection, the most robust statutory framework for AI methodologies
Work-for-hire / ownership	Section 17 of the Copyright Act applies to employees; contractor arrangements require careful structuring	Varies by member state; generally requires explicit contractual assignment	Work-for-hire doctrine well established, but "specially commissioned works" are limited to nine statutory categories that do not expressly include software, so contractor-built models usually need an express assignment

Practical Recommendations

For technology companies entering AI development agreements, whether as client or developer, the following practices reduce risk and create commercial certainty:

Conduct an IP audit before negotiation: Identify every IP input (data, models, code, algorithms) and its ownership status, licence terms, and restrictions before drafting commences
Be specific about model components: "The Client owns all IP in the Deliverables" is insufficient. Specify ownership of each layer: training data, architecture, weights, outputs, and derived models
Address the fine-tuning scenario explicitly: If the project involves fine-tuning a foundation model, clearly articulate who owns the delta and what rights each party has to the combined system
Include robust confidentiality provisions: Given the limitations of statutory IP protection for AI (particularly model weights and training methodologies), contractual confidentiality may be the most effective protection mechanism
Plan for model lifecycle: AI models require ongoing retraining, updating, and maintenance. The initial development agreement should address who owns, and who is responsible for, subsequent versions
Consider regulatory compliance implications: The EU AI Act's transparency requirements may limit the extent to which AI systems can be kept confidential. Build regulatory disclosure obligations into your IP protection strategy
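The layer-by-layer recommendation lends itself to a simple completeness check on a draft. The sketch below assumes a hypothetical representation of the ownership clause as a mapping from IP layer to owner; the party names and the draft itself are invented for illustration.

```python
# The five layers named in this article, used as a drafting checklist.
LAYERS = ("training data", "architecture", "weights", "outputs", "derived models")


def unallocated_layers(allocation):
    """Return the IP layers for which the draft names no owner."""
    return [layer for layer in LAYERS if not allocation.get(layer)]


# Invented draft allocation that forgets to deal with derived models.
draft = {
    "training data": "Client",
    "architecture": "Developer",
    "weights": "Client",
    "outputs": "Client",
}

missing = unallocated_layers(draft)
```

Any layer returned by the check is one where ownership would fall back on default legal rules rather than the contract.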

Conclusion

AI development agreements demand a level of IP specificity that traditional technology contracts rarely required. The five distinct layers of IP in any AI system, from training data to derived models, each carry different ownership dynamics, different statutory protections, and different commercial implications. A single generic IP assignment clause cannot address this complexity.

The companies that protect their AI investments most effectively are those that treat IP allocation as a foundational commercial exercise, not a legal afterthought. That means conducting an IP audit before negotiations begin, drafting provisions that address each layer explicitly, and building in the flexibility to accommodate a regulatory landscape that continues to evolve. In a field where the law has not yet caught up with the technology, the contract remains the most reliable safeguard.

Rini Mathew · Founder, Lawsel Advisory
