AI Fiction in the Wild

This website hosts real-world, anonymized AI-user conversations in which users requested some form of fiction generation, including stories, novels, scripts, roleplay, hypothetical scenarios, erotic imaginings, and more. Researchers collected these conversations with users’ consent between 2023 and 2024. The models were powered by GPT-3.5 and GPT-4.

We organize conversations by estimated user to show common patterns, including story permutations and revisions.

Content warning: This dataset contains conversations that are explicit, offensive, and sexually graphic. You can hide content that has been tagged explicit or toxic. Be aware that other offensive material may not be caught by these filters.

User privacy and demographics

Because the users are anonymized, we don’t know much about them or how representative they are. The AI models were hosted on HuggingFace, were freely available, did not require a login or account, and did not have rate limits. This might mean that users came from lower-income backgrounds or from countries where ChatGPT is banned, or that they were more invested in pushing the boundaries of the model with explicit or prohibited content, since their chats were not connected to a real account. These users may also skew more technically literate and “online” than the average ChatGPT user, since they had to find their way to HuggingFace, a community for machine learning models and data.

While users consented to share their data, there are still many ethical and privacy questions to consider when analyzing this data. We might wonder how many of these users actually read the agreement or fully understood the implications of their decision. Many users shared personal or sensitive information in their chats despite agreeing to share them publicly. See Antoniak et al. for more.

Search conversations →