
Protecting Intellectual Property in AI Development Agreements

As AI development increasingly involves third-party datasets, pre-trained models, and collaborative development arrangements, the question of who owns what has never been more commercially significant, or legally uncertain.

Rini Mathew

28 March 2026 Updated 4 April 2026

Key Takeaways

  • AI systems contain five distinct IP layers (training data, architecture, weights, outputs, derived models), each requiring separate contractual allocation
  • Model weights — often the most valuable component — lack clear statutory protection in most jurisdictions, making contractual provisions the primary safeguard
  • Joint IP ownership rarely works in practice due to conflicting rules across jurisdictions; avoid it unless genuinely planned for
  • Every AI development agreement should include an IP audit, explicit fine-tuning provisions, and regulatory compliance considerations

The IP Challenge in AI Development

Traditional intellectual property frameworks were designed for a world where creation was linear: an inventor conceived an idea, reduced it to practice, and secured protection. AI disrupts every assumption in that chain.

Consider a typical AI development scenario: a company commissions a vendor to build a machine learning model. The vendor uses open-source frameworks (PyTorch, TensorFlow), pre-trained foundation models (licensed under various terms), the client's proprietary training data, the vendor's proprietary algorithms, and third-party datasets acquired under licence. The resulting model generates outputs that may incorporate elements of all of these inputs.

Who owns the model? The training pipeline? The outputs? The weights? Without clear contractual allocation, the answer depends on a tangle of copyright law, trade secret doctrine, patent principles, and licence terms, each varying by jurisdiction. AI systems that cost millions to develop can become unusable if IP ownership is disputed.


The Five Layers of IP in AI Systems

To negotiate AI development agreements effectively, you must first understand the distinct IP layers at play. Each layer has different ownership dynamics and protection mechanisms.

1. Training Data

Training data is the foundation of any AI system, and the IP issues begin here.

  • Proprietary data: If the client provides its own data (customer records, transaction histories, operational data), the client typically retains ownership. But the agreement must address what the developer can do with insights derived from that data
  • Licensed data: Third-party datasets come with licence restrictions that flow through to the AI system. A dataset licensed for "research purposes only" cannot be used to train a commercial model without additional permissions
  • Scraped or publicly available data: The legality of web scraping for AI training varies significantly by jurisdiction. In the EU, the DSM Directive's text and data mining exceptions permit mining by research organisations and cultural heritage institutions for scientific research (Article 3), and mining for other purposes, including commercial ones, unless the rightsholder has expressly reserved its rights, i.e. opted out (Article 4). The US relies on fair use analysis, with cases like Thomson Reuters v. ROSS Intelligence and New York Times v. OpenAI shaping the boundaries
  • Synthetic data: Data generated by AI models to augment training sets raises novel questions: is synthetic data derived from copyrighted material itself subject to the original copyright?

2. Model Architecture and Algorithms

The design of the neural network (its architecture, hyperparameters, and training methodology) represents a distinct IP layer.

  • Custom architectures: Novel model designs may be patentable, though the bar for software patents varies by jurisdiction. The US, China, and Japan are relatively permissive; the EU and India are more restrictive
  • Trade secrets: Training methodologies, hyperparameter configurations, and data preprocessing pipelines are often best protected as trade secrets rather than patents, provided adequate confidentiality measures are in place
  • Open-source components: Most AI systems rely on open-source frameworks. The licence terms (Apache 2.0, MIT, GPL) determine what restrictions flow through to the resulting system. GPL-licensed components can be particularly problematic for proprietary AI systems, because distributing a derivative work can trigger an obligation to release its source code under the same licence

3. Trained Model Weights

Model weights, the numerical parameters learned during training, are the most commercially valuable and legally uncertain layer.

  • Weights are not clearly protectable under copyright in most jurisdictions (they are numerical values, not "creative expression")
  • Trade secret protection is available if the weights are kept confidential, but is lost if the model is deployed in a way that allows extraction
  • Patent protection for specific weight configurations is theoretically possible but practically difficult to enforce
  • The contractual allocation of weight ownership is therefore critical; it may be the only reliable protection

4. AI-Generated Outputs

The outputs of AI systems (text, images, code, predictions, recommendations) present the newest IP challenge.

  • Copyright status: Most jurisdictions require human authorship for copyright protection. The US Copyright Office has consistently refused to register AI-generated material: it registered the text and arrangement of Zarya of the Dawn but not its AI-generated images, and refused registration for Théâtre D'opéra Spatial. The UK is a notable exception: section 9(3) of the Copyright, Designs and Patents Act 1988 treats the author of a computer-generated work as "the person by whom the arrangements necessary for the creation of the work are undertaken"
  • Commercial implications: If AI-generated outputs are not protected by copyright, there is no exclusive statutory right to license or to enforce against third parties, who may be free to copy them. This affects business models built on AI-generated content, code, or designs
  • Contractual solutions: Agreements should allocate rights to outputs regardless of their copyright status, using contractual mechanisms (assignment, exclusive licence, usage rights) to provide commercial certainty even where statutory IP protection is unavailable

5. Fine-Tuned and Derived Models

When a pre-trained foundation model is fine-tuned on proprietary data, the resulting model contains elements of both the original model and the new training data.

  • Foundation model licences increasingly address fine-tuning rights (Meta's Llama licence, Mistral's various tiers, Google's Gemma terms)
  • The distinction between the base model (licensed) and the fine-tuning delta (potentially owned) must be clearly articulated in agreements
  • Some licences restrict the use of outputs to improve competing models, a provision that can constrain entire product strategies if overlooked
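To make the base-versus-delta distinction concrete: technically, a fine-tuning delta can be expressed as the element-wise difference between the fine-tuned weights and the base weights, and the fine-tuned model is recovered only by adding that delta back onto the licensed base. The sketch below is a simplified illustration, assuming hypothetical parameter names stored as plain Python lists; real models store tensors, and techniques such as LoRA store low-rank adapter matrices rather than a full weight difference. It shows why the delta on its own is not a usable model without the base weights it was derived from.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of weight values.
Weights = Dict[str, List[float]]


def compute_delta(base: Weights, fine_tuned: Weights) -> Weights:
    """Element-wise difference between fine-tuned and base weights (the 'delta')."""
    if base.keys() != fine_tuned.keys():
        raise ValueError("base and fine-tuned models must share parameter names")
    delta: Weights = {}
    for name, base_values in base.items():
        tuned_values = fine_tuned[name]
        if len(base_values) != len(tuned_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        delta[name] = [t - b for b, t in zip(base_values, tuned_values)]
    return delta


def apply_delta(base: Weights, delta: Weights) -> Weights:
    """Reconstruct the fine-tuned model; this step requires the base weights."""
    return {name: [b + d for b, d in zip(values, delta[name])] for name, values in base.items()}


# Illustrative values only.
base = {"layer.weight": [0.5, -1.0, 2.0]}
fine_tuned = {"layer.weight": [0.75, -1.0, 1.5]}
delta = compute_delta(base, fine_tuned)
print(delta)  # {'layer.weight': [0.25, 0.0, -0.5]}
print(apply_delta(base, delta) == fine_tuned)  # True
```

In contractual terms, whoever owns the delta still needs a continuing licence to the base model to use the combined system, which is why the two should be allocated separately.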

Structuring the AI Development Agreement

With the IP layers identified, the development agreement must address each one explicitly. Ambiguity in AI IP clauses is not just a legal risk; it all but guarantees a future commercial dispute.

IP Ownership Allocation

The most critical clause in any AI development agreement is the IP ownership allocation. Three models predominate:

| Ownership model | Client's rights | Developer's rights | Best for |
|---|---|---|---|
| Client ownership | Owns the model, weights, training pipeline, and outputs | Retains pre-existing IP only | Bespoke systems where exclusivity is critical |
| Developer ownership with licence | Perpetual licence to use the model and outputs | Owns the model, platform, and methodology | Platform-based AI services, SaaS models |
| Joint ownership | Rights to use and modify the model | Rights to use and modify the model | Collaborative R&D (use with caution) |

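A note on joint ownership: Joint IP ownership is frequently proposed as a "fair" compromise but rarely works well in practice, because the default rules differ by jurisdiction and by type of IP. Under English law, a patent co-owner can work the invention without accounting to the others but cannot license or assign it without their consent, while copyright co-owners generally need each other's consent for any exploitation. Under US law, a patent co-owner can exploit and license without the other's consent or any duty to account, while a copyright co-owner can grant non-exclusive licences alone but must account for profits, and exclusive licences require all co-owners. Under Indian law (Section 50 of the Patents Act), each patent co-owner can work the invention for its own benefit but needs the others' consent to grant a licence or assign its share. Unless the parties genuinely intend and have planned for shared exploitation, avoid joint ownership.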

Background and Foreground IP

Every AI development agreement should clearly distinguish:

  • Background IP: Pre-existing intellectual property that each party brings to the project (the developer's algorithms, the client's training data, licensed third-party components)
  • Foreground IP: New intellectual property created during the project (the trained model, novel algorithms, processed datasets)
  • Sideground IP: IP created by a party during the project but outside its scope (increasingly relevant as developers work on multiple AI projects simultaneously)

Background IP should remain with its original owner, with licences granted only to the extent necessary for the project. Foreground IP allocation is the subject of negotiation. Sideground IP should be addressed to avoid disputes about what was created "within" versus "outside" the project scope.

Key Contractual Provisions

Beyond ownership allocation, the following provisions are essential in AI development agreements:

Data rights and restrictions:

  • Specify what data the developer can access, use, retain, and derive insights from
  • Prohibit use of client data to train models for other clients (a common source of dispute)
  • Address data return and deletion obligations at project completion or termination
  • Include audit rights to verify compliance with data use restrictions

Open-source compliance:

  • Require the developer to disclose all open-source components and their licence terms
  • Prohibit use of copyleft-licensed components (GPL, AGPL) without prior written consent
  • Include indemnities for open-source licence violations
  • Require a software bill of materials (SBOM) listing every open-source component in the delivered system
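One way a client might operationalise the disclosure and copyleft-consent requirements is a simple screen over the delivered bill of materials. The sketch below assumes a hypothetical inventory in which each component is tagged with an SPDX licence identifier; PyTorch (BSD-3-Clause) and TensorFlow (Apache-2.0) are real examples, but the other component names are invented, and the copyleft list is deliberately partial (it covers only the GPL and AGPL families named above). It flags components that still need prior written consent; it is not a substitute for reviewing the actual licence terms.

```python
from dataclasses import dataclass
from typing import List

# Partial, illustrative set of SPDX identifier prefixes for GPL and AGPL licences.
# Note: "LGPL-..." identifiers do not match these prefixes and are not flagged.
COPYLEFT_PREFIXES = ("GPL-", "AGPL-")


@dataclass
class Component:
    name: str
    spdx_licence: str               # SPDX identifier as disclosed by the developer
    consent_obtained: bool = False  # client's prior written consent recorded?


def needs_consent(component: Component) -> bool:
    """True if the component is GPL/AGPL-licensed and consent has not been recorded."""
    return component.spdx_licence.startswith(COPYLEFT_PREFIXES) and not component.consent_obtained


def screen_bom(components: List[Component]) -> List[str]:
    """Return a human-readable list of components that block acceptance."""
    return [f"{c.name} ({c.spdx_licence})" for c in components if needs_consent(c)]


bom = [
    Component("torch", "BSD-3-Clause"),
    Component("tensorflow", "Apache-2.0"),
    Component("example-tokenizer", "GPL-3.0-only"),  # hypothetical component
    Component("example-serving-layer", "AGPL-3.0-or-later", consent_obtained=True),  # hypothetical
]
print(screen_bom(bom))  # ['example-tokenizer (GPL-3.0-only)']
```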

Foundation model terms:

  • Identify all pre-trained models used and their licence terms
  • Ensure the development agreement is consistent with foundation model licences (e.g., Llama's acceptable use policy, commercial use thresholds)
  • Address what happens if a foundation model's licence terms change (increasingly common as providers tighten commercial terms)

Model performance and acceptance:

  • Define measurable acceptance criteria (accuracy, latency, fairness metrics)
  • Specify testing methodology and datasets
  • Address ongoing performance obligations and model drift
  • Include provisions for model retraining and the IP implications of updated versions
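Acceptance criteria are easiest to enforce when each one is a named metric with an agreed threshold and direction, evaluated on an agreed test dataset. A minimal sketch, assuming hypothetical metric names and threshold values that the parties would negotiate; it does not compute the metrics themselves, it only checks reported results against the contractual schedule.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Criterion:
    metric: str             # hypothetical metric name from the acceptance schedule
    threshold: float
    higher_is_better: bool


def failed_criteria(criteria: List[Criterion], results: Dict[str, float]) -> List[str]:
    """Return the criteria that were not reported or that miss their threshold."""
    failures = []
    for c in criteria:
        value = results.get(c.metric)
        if value is None:
            failures.append(f"{c.metric}: not reported")
        elif c.higher_is_better and value < c.threshold:
            failures.append(f"{c.metric}: {value} < {c.threshold}")
        elif not c.higher_is_better and value > c.threshold:
            failures.append(f"{c.metric}: {value} > {c.threshold}")
    return failures


# Illustrative thresholds and results only.
criteria = [
    Criterion("accuracy", 0.92, higher_is_better=True),
    Criterion("p95_latency_ms", 200.0, higher_is_better=False),
    Criterion("demographic_parity_gap", 0.05, higher_is_better=False),
]
results = {"accuracy": 0.94, "p95_latency_ms": 240.0, "demographic_parity_gap": 0.03}
print(failed_criteria(criteria, results))  # ['p95_latency_ms: 240.0 > 200.0']
```

Because the check is mechanical, the same schedule can be re-run after each retraining cycle, which gives the drift and updated-version provisions an objective trigger.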

Jurisdiction-Specific Considerations

IP protection for AI systems varies significantly across the jurisdictions most relevant to technology companies:

| Issue | India | European Union | United States |
|---|---|---|---|
| Copyright & AI works | No AI-specific provisions; Section 2(d)(vi) of the Copyright Act treats the person who causes a computer-generated work to be created as its author, but its application to generative AI is unsettled and the Copyright Office has not issued definitive guidance | DSM Directive creates text and data mining exceptions relevant to AI training; the EU AI Act requires providers of general-purpose AI models to publish a sufficiently detailed summary of their training content, which may be in tension with trade secret protection | Copyright Office guidance evolving; increasing specificity on what human involvement creates copyrightable output |
| Patent eligibility | Higher bar: Section 3(k) of the Patents Act excludes computer programs per se and algorithms; "technical effect" arguments can overcome the exclusion | More restrictive approach to software patents; separately, the Database Directive's sui generis (non-patent) right may protect training databases that reflect substantial investment | Uncertain post-Alice; USPTO has issued guidance on AI-assisted inventions |
| Trade secret protection | Relies on contract law and the common law of confidential information; no dedicated statute | Trade Secrets Directive provides harmonised protection | Defend Trade Secrets Act provides federal protection, arguably the most robust statutory framework for AI methodologies |
| Work-for-hire / ownership | Section 17 vests works made by employees in the course of employment in the employer; contractor arrangements require careful structuring, typically a written assignment | Varies by member state; generally requires explicit contractual assignment | Work-for-hire doctrine well established, but a contractor's "specially commissioned work" qualifies only if it falls within one of the statutory categories and is covered by a signed written agreement; software often falls outside these, so an express assignment is advisable |

Practical Recommendations

For technology companies entering AI development agreements, whether as client or developer, the following practices reduce risk and create commercial certainty:

  1. Conduct an IP audit before negotiation: Identify every IP input (data, models, code, algorithms) and its ownership status, licence terms, and restrictions before drafting commences
  2. Be specific about model components: "The Client owns all IP in the Deliverables" is insufficient. Specify ownership of each layer: training data, architecture, weights, outputs, and derived models
  3. Address the fine-tuning scenario explicitly: If the project involves fine-tuning a foundation model, clearly articulate who owns the delta and what rights each party has to the combined system
  4. Include robust confidentiality provisions: Given the limitations of statutory IP protection for AI (particularly model weights and training methodologies), contractual confidentiality may be the most effective protection mechanism
  5. Plan for model lifecycle: AI models require ongoing retraining, updating, and maintenance. The initial development agreement should address who owns, and who is responsible for, subsequent versions
  6. Consider regulatory compliance implications: The EU AI Act's transparency requirements may limit the extent to which AI systems can be kept confidential. Build regulatory disclosure obligations into your IP protection strategy
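Recommendations 1 and 2 can be combined into a simple audit register: one entry per IP input, recording its layer, owner, licence, and restrictions, plus a check that each of the five layers has an explicit allocation and that no input's licence conflicts with commercial use (for example, a dataset licensed for research purposes only). The sketch below uses hypothetical field names, restriction tags, and entries; it illustrates the structure of the exercise rather than any standard template.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The five IP layers discussed in this article.
LAYERS = ("training_data", "architecture", "weights", "outputs", "derived_models")


@dataclass
class AuditEntry:
    item: str                      # a dataset, model, or code component (hypothetical)
    layer: str                     # one of LAYERS
    owner: Optional[str]           # None means ownership has not been allocated yet
    licence: str = "proprietary"
    restrictions: List[str] = field(default_factory=list)  # hypothetical tags, e.g. "research_only"


def audit_gaps(entries: List[AuditEntry], commercial_use: bool) -> List[str]:
    """List layers without an allocated owner and inputs whose restrictions block commercial use."""
    issues = []
    for layer in LAYERS:
        layer_entries = [e for e in entries if e.layer == layer]
        if not layer_entries or any(e.owner is None for e in layer_entries):
            issues.append(f"unallocated layer: {layer}")
    for e in entries:
        if commercial_use and "research_only" in e.restrictions:
            issues.append(f"licence conflict: {e.item} is restricted to research use")
    return issues


entries = [
    AuditEntry("client transaction data", "training_data", owner="Client"),
    AuditEntry("third-party benchmark corpus", "training_data", owner="Licensor",
               licence="research licence", restrictions=["research_only"]),
    AuditEntry("custom model architecture", "architecture", owner="Developer"),
    AuditEntry("trained weights", "weights", owner="Client"),
    AuditEntry("model outputs", "outputs", owner=None),
]
for issue in audit_gaps(entries, commercial_use=True):
    print(issue)
# unallocated layer: outputs
# unallocated layer: derived_models
# licence conflict: third-party benchmark corpus is restricted to research use
```

Each issue the register surfaces maps to a clause that still needs drafting, which is the practical point of running the audit before negotiation rather than after.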

Conclusion

AI development agreements demand a level of IP specificity that traditional technology contracts rarely required. The five distinct layers of IP in any AI system, from training data to derived models, each carry different ownership dynamics, different statutory protections, and different commercial implications. A single generic IP assignment clause cannot address this complexity.

The companies that protect their AI investments most effectively are those that treat IP allocation as a foundational commercial exercise, not a legal afterthought. That means conducting an IP audit before negotiations begin, drafting provisions that address each layer explicitly, and building in the flexibility to accommodate a regulatory landscape that continues to evolve. In a field where the law has not yet caught up with the technology, the contract remains the most reliable safeguard.

Rini Mathew · Founder, Lawsel Advisory
