Archived chats under legal hold
OpenAI said in a message on its website that the 20 million chats include a random sample of ChatGPT conversations from December 2022 to November 2024 and do not include chats from business customers.
“We presented a number of privacy-preserving options to The Times, including targeted searches on samples (for example, to search for chats that may have included the text of a New York Times article so that they only received conversations relevant to their claims), as well as classifying high-level data on how ChatGPT was used in the sample. These were rejected by The Times,” OpenAI said.
Chats are stored in a secure system that is “protected under legal hold, meaning it cannot be accessed or used for purposes other than to meet legal obligations,” OpenAI said. The NYT “will be legally bound not to make any data public outside of the court process at this time,” and OpenAI said it would fight any efforts to make user conversations public.
An October 30 NYT filing accused OpenAI of disregarding prior agreements, saying “its conduct in this case led to its refusal to produce even a small sample of billions of model outputs.” The filing continued:
Immediate production of output log samples is required to remain on track for the February 26, 2026, discovery deadline. OpenAI’s proposal to run searches on this small subset of its model outputs on Plaintiffs’ behalf is as inefficient as it is inadequate to allow Plaintiffs to analyze how “real world” users interact with a core product at the center of this lawsuit. Plaintiffs cannot reasonably conduct expert analysis of how OpenAI’s models function in its original consumer-facing product, how retrieval augmented generation (“RAG”) functions to deliver news content, how consumers interact with that product, and how frequently hallucinations occur without access to the model outputs.
OpenAI said the NYT’s search requests were initially limited to logs “related to Times content” and that it “is working to satisfy those requests by sampling conversation logs. At the end of that process, the News plaintiffs filed a motion with a new demand: Instead of finding and producing logs ‘related to Times content,’ OpenAI must hand over the entire 20-million-log sample ‘via hard drive.’”