Anthropic case: does the training of LLMs involve unauthorised copyright use?

Insights | 27 June 2025

A US court decision concerning the training of Claude, an LLM system owned by Anthropic, is the first to assess whether the use of copyright material to train LLMs constitutes ‘fair use’ under US law. What implications could it have for Australia?

Artists, creatives and owners of copyright are increasingly bringing litigation against companies that develop, train and operate Generative AI (GenAI) platforms and their underlying Large Language Models (LLMs). 

A common thread in these cases is the assertion that training LLMs involves the unauthorised use of original copyright works in a way that infringes the copyright in those works.

On 25 June 2025, the United States District Court for the Northern District of California handed down a decision in the class action case of Bartz, Graeber, Johnson and others v Anthropic PBC (No. C24-05417 WHA).

The case is seminal for the following reasons:

  • It highlights the technological processes by which LLMs can be trained on copyright material – in this instance, the copyright material was uploaded into a central database and then digitally copied to train the LLMs.
  • It is the first decision to assess whether the use of copyright material to train LLMs is considered ‘fair use’ under US laws. ‘Fair use’ is a defence to a copyright infringement claim.
  • Many jurisdictions have similar or comparable concepts of a ‘fair use’ doctrine under their copyright laws. The case thus provides useful context and guidance for considering how various copyright laws outside of the United States may apply to the training of LLMs. 

The case

A class action lawsuit was launched in September 2024 against Anthropic PBC by three lead plaintiff authors (Authors). The Authors asserted that Anthropic infringed their copyright subsisting in various books and other literary works to train Anthropic’s proprietary GenAI system, Claude. Copies of these books were included in a dataset of purchased and pirated books that were housed in a centralised repository. Some of these books were then used to train the underlying Claude LLMs.

‘Claude’ refers to the family of LLMs and an AI assistant developed by Anthropic. It is typically used for text generation, code explanation and generation, summarisation and data analysis. 

In these circumstances, the training of Claude proceeded as follows (the Process; an illustrative sketch follows this list):

  • Anthropic would collect vast swathes of works and upload them into a centralised library.
  • Each work was copied from the central library to create a working copy for the training set, before being ‘cleaned up’ to remove lower value and repeating text (such as headers, footers, page numbers).
  • Each ‘clean’ copy was then translated into a ‘tokenised’ copy with simpler forms, and these tokens were repeatedly copied as part of training. Copies were used to iteratively map statistical relationships between every text-fragment and every sequence of text-fragments so that a completed LLM could receive new text inputs and return new text outputs as if it were a human reading prompts and writing responses. This mapping was so complete it was likened to ‘memorising’ the works.
  • Each fully trained LLM then retained ‘compressed’ copies of the works it had trained upon. 
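
For readers interested in what these steps can look like in practice, the sketch below illustrates, in simplified Python, a cleaning, tokenisation and training-example pipeline of the general kind described in the Process. It is not Anthropic’s code: the function names, the patterns used to strip headers and page numbers, and the toy word-level tokeniser are assumptions for illustration only. Production LLM pipelines use far more sophisticated subword tokenisers and large-scale distributed training.

    import re

    # Illustrative sketch only: a toy pipeline of the general kind described in
    # the Process. Real LLM pipelines use subword tokenisers (e.g. byte-pair
    # encoding) and train over billions of tokens; this simply shows the shape
    # of the copy -> clean -> tokenise -> train-on-fragments steps.

    def clean(raw_text: str) -> str:
        """Remove lower-value, repeating text such as page numbers and running headers."""
        kept = []
        for line in raw_text.splitlines():
            stripped = line.strip()
            if re.fullmatch(r"\d+", stripped):                # bare page numbers
                continue
            if re.fullmatch(r"(?i)chapter \d+.*", stripped):  # assumed running-header pattern
                continue
            kept.append(stripped)
        return " ".join(line for line in kept if line)

    def tokenise(text: str) -> list[str]:
        """Translate the cleaned copy into simpler token forms (here, lowercase words)."""
        return re.findall(r"[a-z']+", text.lower())

    def training_examples(tokens: list[str], context: int = 4):
        """Yield (context, next token) pairs: the statistical mapping between text
        fragments and the fragments that follow them, built up repeatedly during training."""
        for i in range(context, len(tokens)):
            yield tokens[i - context:i], tokens[i]

    if __name__ == "__main__":
        working_copy = "CHAPTER 1\nIt was the best of times, it was the worst of times.\n42\n"
        cleaned = clean(working_copy)
        tokens = tokenise(cleaned)
        for ctx, nxt in list(training_examples(tokens))[:3]:
            print(ctx, "->", nxt)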

Notably, Anthropic brought an early motion for summary judgment on the issue of ‘fair use’ only, and so the judgment effectively bypasses any discussion of whether the threshold issue of copyright infringement had been made out.

In considering the ‘fair use’ issue, the judge found that, among other things, making copies of the works for the purpose of training LLMs to receive text inputs and return text outputs was ‘fair use’ under US copyright law.[1] While the four primary factors comprising the ‘fair use’ assessment on balance pointed towards fair use, most significantly, Judge Alsup found that the purpose and character of using the copyright works to train LLMs to generate new text was ‘quintessentially transformative’.

Essentially:

  • Judge Alsup rejected the argument that, because training the LLMs was akin to using works to train a person to read and write, the Authors should be able to exclude Anthropic from this use. Rather, the Authors cannot rightly exclude anyone from using their works for training or learning, as learning to read and write from texts is a normal part of the creative process. Users might need to pay for access to the works initially, but to make someone pay each time they use, read, recall from memory or later draw upon a work (as if each such use were an infringement) would be untenable.
  • Anthropic’s LLMs did not reproduce to the public any given work’s creative elements or an identifiable expressive style (whether or not such a style is itself copyrightable). Claude might output grammar, composition or a style distilled from many works, but this would be no different from an individual doing the same after reading classic texts because they liked the expression, memorising them and then emulating a blend of the writing.
  • Judge Alsup accepted that as Anthropic’s LLMs were trained on the copyright works to ‘make something different’ and not to ‘race ahead, replicate or supplant’ the works, the use was transformative. The reproduction of the works was thus necessary for the transformative use.

Key takeaways

How LLM training is conceived: Judge Alsup’s reasoning conceptualises the training of LLMs as an effectively fresh, iterative and cumulative learning activity that makes transformative use of copies of the underlying copyright works, as opposed to unoriginal, wholesale copying or reproduction of those works.

How copyright material can be used to train LLMs: The decision sheds light on the technological processes that can be used to train LLMs on copyright material. The precise technological processes used are highly relevant factors to be taken into account when assessing any copyright infringement claim.

A note of caution: Companies would be mistaken in concluding that the decision is a free pass to use copyright works to train LLMs on the basis that all training of LLMs constitutes ‘fair use’. In this case, Claude had not published the copyright works, or outputs reproducing them, to end users. Further, Anthropic otherwise remained exposed to liability for building out its centralised library with pirated and otherwise unauthorised copies of the works used to train its LLMs. Judge Alsup emphasised that the outcome could have been markedly different had these circumstances been otherwise.

The Australian context: If this case were brought in Australia under the Copyright Act 1968 (Cth), a court would likely have to consider at least two key issues: (1) whether the Process described above constitutes a ‘reproduction’ of the underlying copyright material, or a substantial part of that material, so as to constitute an infringement of copyright; and (2) whether there is any available defence. 

On the first issue, given the Process seemingly entails the making of a digital copy of the underlying copyright material, on its face it appears that a reproduction has been made.

On the second issue, the comparable concept to ‘fair use’ under the Copyright Act 1968 (Cth) is the doctrine of ‘fair dealing’. In contrast to ‘fair use’, ‘fair dealing’ is exhaustively set out and prescriptive. Given that the various fair dealing defences are tightly defined, there is a significant question mark over whether they would apply to this factual scenario.

To keep updated with all things AI, intellectual property and the law, get in contact with our AI and emerging technology specialists at Hall & Wilcox.


[1] 17 U.S.C. § 107 (section 107 of Title 17 of the United States Code).
