
The rise of generative artificial intelligence (hereinafter “AI”) calls for its due recognition in copyright law. The Indian legal system does not yet recognise AI systems as authors. AI systems are considered tools for human use, and the output generated is not attributed to the AI itself. However, the narrative may shift when AI systems become capable of generating “original” works rather than regurgitations of pre-existing works. Training AI requires a database of pre-existing works that are often protected but used without the author’s permission. Yet copyright law permits authors to borrow or take inspiration from existing works despite the copyright in them.[1] All works are influenced to some degree by pre-existing works, and therefore certain leeway must be given to AI systems to temporarily reproduce or consume pre-existing works as a form of learning.
The database of copyrighted works used for machine learning is merely a tool that the AI needs in order to identify patterns. AI-generated outputs that result from applying those patterns, much as an artist works in an existing art style, should be permissible because what carries over into the output is the pattern, not the copyrighted work itself. This kind of non-expressive, transformative use does not infringe the rights of authors, and is protected under Section 52 of the Indian Copyright Act, 1957 (hereinafter “ICA”), which lays down the doctrine of fair use/fair dealing. The same protection should extend to AI systems, because such use differs from AI-generated outputs that are simply compilations of existing copyrighted material, which are not, and should not be, shielded from infringement claims. The law must distinguish between ideas and their expression, protecting the latter and not the former. Recognising this distinction is crucial in adapting copyright law to the realities of AI innovation and ensuring that progress is not held back by overly broad interpretations of infringement.
Defining Originality in Copyright Law
Originality is a sine qua non for copyright protection.[2] While the ICA does not explicitly define what “original” means, courts have attempted to interpret it through various legal doctrines. The Indian legal system initially adopted the doctrine of ‘Sweat of the Brow’, under which copyright protection was granted on the basis of the skill, labour and effort involved in creating a work, rather than its originality or creativity.[3] The Madhya Pradesh High Court in Mishra Bandhu Karyalaya v. Shivratanlal Koshal held that novel research or innovation is not a prerequisite for copyright protection. Consequently, compilations like gazettes, maps, and encyclopaedias are eligible for copyright.[4]
The position in India has since shifted significantly from the approach of English courts, and the standard for copyright protection has been raised. In the landmark judgment in Eastern Book Company v. D.B. Modak, the Supreme Court discarded the ‘Sweat of the Brow’ doctrine and shifted to a ‘Modicum of Creativity’ doctrine.[5] In this case, Eastern Book Company, the legal publisher of the Supreme Court Cases (SCC) law reports, alleged that its copyright was being infringed by parties distributing software containing judgments it had edited and formatted. The modifications made by the publisher included, among other elements, cross-references, headnotes, and selected extracts from judgments. The court introduced the concept of a “flavour of minimum creativity.”[6] It stated that copyright protection has a threshold of originality which requires some degree of creative input. Novelty or innovation is not the decisive test, but a basic level of creative judgment is necessary. The publisher was not claiming copyright in the judicial orders and judgments themselves, which are already in the public domain. The modifications it made involved the application of legal expertise and discretion, and therefore met the requisite standard of creativity to qualify for copyright protection.
Subsequent rulings have reaffirmed this position, expressly denying copyright to mere compilations that lack creative input.[7] Therefore, the principle holds that copyright protects content originating from an author that involves some independent expression, and not mere replication. This approach avoids setting an excessively high threshold for originality, while still ensuring that copyright law protects genuine contributions to a work without extending protection to uncreative efforts.
The Idea-Expression Dichotomy
The doctrine of idea-expression dichotomy has developed significantly in Indian jurisprudence. The principle holds that copyright protection does not extend to mere ideas per se; it protects only the author’s expression of those ideas. Article 9(2) of the Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS) affirms this:
“Copyright protection shall extend to expressions and not to ideas, procedures, methods of operation or mathematical concepts as such.”[8]
The Supreme Court in R.G. Anand v. Delux Films stated that copyright protection does not extend to mere ideas.[9] Therefore, taking inspiration or elements from a work cannot constitute infringement, so long as it is independent of the original expression. This doctrine serves a similar purpose as Modicum of Creativity – both ensure that the threshold of protection is not too high. By excluding ideas from copyright protection, the doctrine attempts to prevent monopolisation because no idea can truly be original. Every idea originates from certain pre-existing notions, and it would be a daunting task to ascertain and attribute the rightful ownership of an idea. Thus, it exists in the public domain, allowing creativity and innovation to be built upon freely.
With respect to generative AI and the output it generates, one must determine whether the AI is presenting compilations of pre-existing expression or compiling ideas to create its own distinct expression, the latter being permitted under copyright law. For instance, if an AI learns from existing works that a fantasy story involves a wizard school, dark villain, Latin-sounding spells, etc., it is only learning ideas and tropes of a story. If the output it generates is a new story that draws on those ideas but with novel characters and plot lines, it is analogous to an inspired human author and should not be deemed as infringement. AI’s ability to emulate a genre of stories is lawful because “fantasy” is only an idea, not protectable expression in itself.
The decision in Barbara Taylor Bradford v. Sahara Media Entertainment recognised that a work which is taken, and then used for producing a subsequent work that is so changed and muted as to make it transformed, and a different work altogether, would not generate an actionable claim for the owner.[10] Copyright holders have rights to control reproduction, adaptation, publication, etc., of the expressive elements of their work. However, prohibition from accessing the work itself would be perverse and contrary to the intent of copyright protection. For extraction of ideas, it is necessary for the AI system to access the whole copyrighted expression, and even store it, as long as it is not exposed to the public in its expressive form.[11] Provided the appropriate precautions are in place to avoid non-transformative outputs, the idea-expression dichotomy draws a line between reusing protected expression and using ideas within a work to create something new, which is lawful.
AI Training as a Non-Expressive, Transformative Use
It has been established that generative AI can be capable of producing “original” outputs, provided that it has access to a database of pre-existing works. For AI outputs to be protected from infringement claims under copyright law, the outputs must meet the threshold for originality and be a novel, transformative expression of an idea. To facilitate this lawful process, a distinction must be drawn between training an AI on copyrighted works and an AI consuming or exploiting those works for their expressive value.
The AI training process is inherently non-expressive. When an AI system is given a database of thousands of novels, it is not “enjoying” the creative expression in the way a reader would. Rather, it extracts patterns, structures, and correlations. For example, in ANI v. OpenAI, the plaintiff argued infringement occurs at both training and output stages, claiming that even the vectorization (tokenization) of text was an infringing act.[12] Vectorization is a simple process of converting data into numerical vectors that machines can understand and process mathematically, specifically for machine learning and pattern recognition. This act cannot be construed as infringement – it is a non-expressive use of a work that is not reproduced in its original expression.
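To make the term concrete, the following is a minimal sketch of vectorization, assuming a toy whitespace tokeniser and a simple word-count representation. The function names (`tokenize`, `build_vocabulary`, `vectorize`) and the sample sentences are hypothetical illustrations and are not drawn from the pleadings or from any real AI system. Production systems typically use subword tokenisers and learned embeddings, and their token sequences preserve word order, which this toy count vector does not.

```python
from collections import Counter

def tokenize(text):
    """Lowercase the text and split on whitespace, stripping punctuation."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if word:
            tokens.append(word)
    return tokens

def build_vocabulary(documents):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for doc in documents:
        for token in tokenize(doc):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def vectorize(text, vocab):
    """Return a count vector: position i holds how often token i occurs."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocab)
    for token, n in counts.items():
        if token in vocab:
            vector[vocab[token]] = n
    return vector

# Hypothetical two-sentence corpus, used only for illustration.
corpus = ["The wizard cast a spell.", "The villain feared the wizard."]
vocab = build_vocabulary(corpus)
vectors = [vectorize(doc, vocab) for doc in corpus]
print(vocab)
print(vectors)
```

In this sketch each sentence becomes a list of integers recording how often each known word appears, which is the kind of numerical form a machine can process mathematically.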
Machine learning is similar to the human learning process.[13] Content is not consumed for its story, style, or aesthetic appeal. Instead, it serves as raw data from which the AI extracts knowledge about language, styles, or other features. An AI system’s internal processes are not accessible to the public and do not amount to a reproduction of the original work. This fundamental difference in purpose categorises AI training as a transformative use of pre-existing works. In copyright law, a use is deemed “transformative” if it employs the original work for a new and different purpose, altering the context or function of the work. Take the example of a story: a story is written to be read by humans for entertainment or education (an expressive use). An AI system copies that story into its training database for the purpose of analysing linguistic patterns or narrative structures, in order to then generate its own novel story (a non-expressive use). The Delhi High Court in University of Cambridge v. B.D. Bhandari held that copying portions of textbooks to create a guidebook was for a “substantially different purpose” than the original textbook’s purpose, making it a transformative use that does not amount to infringement.[14]
There remains a risk that emerging AI systems will regurgitate large portions of an original work without adding any transformative value. However, technical solutions and reasonable care can mitigate this risk. For example, AI developers can implement output filters (sometimes referred to as “overlap filtering”) that detect and block large verbatim reproductions of training data, and can apply “deduplication” to the training data itself to reduce memorisation.
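The two safeguards can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not a description of any developer’s actual safeguards: the function names are hypothetical, the filter compares runs of five consecutive words, and the 20% threshold is an arbitrary value chosen only for the example.

```python
def word_ngrams(text, n):
    """Return the set of all runs of n consecutive words in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def build_index(training_texts, n):
    """Collect every n-word run that appears anywhere in the training texts."""
    index = set()
    for text in training_texts:
        index |= word_ngrams(text, n)
    return index

def overlap_ratio(output, index, n):
    """Fraction of the output's n-word runs that also occur in the index."""
    grams = word_ngrams(output, n)
    if not grams:
        return 0.0
    return len(grams & index) / len(grams)

def passes_filter(output, index, n=5, max_ratio=0.2):
    """Accept an output only if its verbatim overlap stays under max_ratio.

    Both n and max_ratio are assumed values for illustration.
    """
    return overlap_ratio(output, index, n) <= max_ratio

def deduplicate(texts):
    """Drop exact duplicates (ignoring case and spacing), keeping the first."""
    seen = set()
    unique = []
    for text in texts:
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique

# Hypothetical training text and candidate outputs.
training = ["the quick brown fox jumps over the lazy dog near the river bank"]
index = build_index(training, 5)
verbatim = "the quick brown fox jumps over the lazy dog"
novel = "a slow grey wolf walks under the bright moon tonight"
print(passes_filter(verbatim, index))  # blocked: copied word for word
print(passes_filter(novel, index))     # allowed: no shared five-word runs
```

A filter of this kind only catches word-for-word copying. It does not detect close paraphrase, so it would be one precaution among several rather than a complete answer to the regurgitation risk.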
Fair Dealing Principle
There exists in the black-letter law a means to accommodate and protect AI training from copyright infringement claims. Section 52(1) of the ICA lays down several acts that “shall not constitute an infringement of copyright.”[15] Section 52(1)(a) specifically permits fair dealing with any work (excluding computer programs) for the purposes of:
- private or personal use, including research,
- criticism or review, whether of that work or another work, and
- the reporting of current events and current affairs, including the reporting of a lecture delivered in public.[16]
The ICA does not yet include an explicit exception for using works to train AI. The most plausible approach is to treat AI training as a form of private use. The intent of this provision is to permit individuals to use copyrighted works for personal enlightenment, analysis or scholarship, without commercial exploitation of the work’s expression. AI systems follow the same process. Training a generative AI system is akin to research: the process involves systematic analysis of a large database to discover patterns and information, like a data-driven research task. In ANI v. OpenAI, Prof. (Dr.) Arul Scaria suggested that since AI systems assist humans with learning and research, storing works for such purposes should be deemed permissible.[17] Courts should interpret “private use” liberally, so that it covers AI training as a form of research, rather than literally as an end-user’s private use of a work. The fair dealing principle does not explicitly limit “research” to non-commercial research. It could include a company’s internal use of data for development, as long as the use is not public-facing.
A use is more likely to be fair where it does not create a substitute for the original work, a consideration the US Supreme Court weighed in Google LLC v. Oracle Am., Inc.[18] AI training neither deprives the author of a sale or use of their work nor generates a competing product by reproducing large portions of the original work. Although Section 52 is regarded as exhaustive, a liberal interpretation can still bring AI training within the provision for “private use.” Inclusion requires either a willingness to adopt liberal interpretations or the addition of an entirely new exception, given India’s growing reliance on technology. Ultimately, a purposive, principle-based reading of Section 52 strongly favours protecting AI training under the fair dealing principle.
Conclusion
As AI transforms how creativity and knowledge are produced, copyright law must evolve to distinguish between uses that infringe an author’s rights and uses that promote innovation. Read together, the doctrines of originality and idea-expression dichotomy and the fair dealing principle under Section 52 of the ICA allow copyright law to accommodate AI training as a permissible non-expressive, transformative use of copyrighted works. Indian jurisprudence has long acknowledged that ideas are not protected by copyright. Courts have shown flexibility in recognising transformative uses and prioritising access to knowledge, which logically supports treating AI training as a lawful act. From a policy perspective, embracing AI training as fair dealing would promote technological evolution that is compliant with copyright law.
This stance does not shield negligent AI systems or outputs that are mere compilations of pre-existing works. Instead, it encourages responsible AI development in which AI systems learn from their database to create novel outputs. By condemning the former and protecting the latter, the law incentivises AI developers to implement safeguards that ensure compliance and protect authors’ rights. To cement this further, copyright law must either be interpreted liberally to apply to AI systems or be amended to include new provisions directed at technological evolution. At present, relying on the spirit of Section 52 and the larger objectives of copyright law is a suitable basis for protecting AI training against infringement claims.
[1] A. Swetha Meenal and Sayantan Kumar, Keeping Up with the Machines – Can Copyright Accommodate Transformative Use in the Age of Artificial Intelligence, 11, IJIPL, 260 (2020)
[2] Dixit Parakh, Copyright Protection in India: Doctrine of Originality, BoudhikiP (Jan. 10, 2025), https://www.boudhikip.com/copyright-protection-in-india-doctrine-of-originality/
[3] Dixit Parakh, Copyright Protection in India: Doctrine of Originality, BoudhikiP (Jan. 10, 2025), https://www.boudhikip.com/copyright-protection-in-india-doctrine-of-originality/
[4] Mishra Bandhu Karyalaya v. Shivratanlal Koshal, 1969 SCC OnLine MP 35
[5] Shuchi Mehta, Analysis of Doctrines: Sweat of Brow, Modicum of Creativity & Originality in Copyright, IndiaLaw LLP Blog (Jan. 8, 2015), https://www.indialaw.in/blog/law/analysis-of-doctrines-sweat-of-brow-modicum-of-creativity-originality-in-copyright/.
[6] Eastern Book Company v. D.B. Modak, (2008) 1 SCC 1
[7] Reckeweg & Co. Gmbh. v. Adven Biotech (P) Ltd., 2008 SCC OnLine Del 1741
[8] Agreement on Trade-Related Aspects of Intellectual Property Rights art. 9(2), Apr. 15, 1994, Marrakesh Agreement Establishing the World Trade Organization, Annex 1C, 1869 U.N.T.S. 299.
[9] R.G. Anand v. Delux Films, (1978) 4 SCC 118
[10] Barbara Taylor Bradford v. Sahara Media Entertainment Ltd., 2003 SCC OnLine Cal 323
[11] Akshat Agarwal, Indian Copyright Law and Generative AI – Part 2: Transformative and Extractive Use, IPRMENTLAW (May 29, 2024), https://iprmentlaw.com/2024/05/29/indian-copyright-law-and-generative-ai-part-2-transformative-and-extractive-use/.
[12] ANI Media (P) Ltd. v. Open AI Inc, 2024 SCC OnLine Del 8120
[13] Ibid.
[14] University of Cambridge v. B.D. Bhandari, 2011 SCC OnLine Del 3215
[15] The Copyright Act, No. 14 of 1957, § 52(1), INDIA CODE (2024).
[16] Ibid.
[17] ANI Media (P) Ltd. v. Open AI Inc, 2024 SCC OnLine Del 8120
[18] Google LLC v. Oracle America, 2021 SCC OnLine US SC 64
Author: Mayukhi Pillai
