AI Fiction in the Wild
This website hosts anonymized ChatGPT-user conversations where users requested some form of fiction generation, including stories, novels, scripts, roleplay, hypothetical scenarios, erotic imaginings, and more. The data is drawn from WildChat and was collected voluntarily and with users’ consent between 2023 and 2024. The chatbots were powered by GPT-3.5 and GPT-4.
We organize conversations by estimated user to show common patterns, including story permutations and revisions. You can scroll down to see an example.
Content warning: This dataset contains conversations that are violent, offensive, and sexually graphic. You can hide content that has been tagged "explicit" or "toxic," but not all toxic, offensive, or sexually graphic conversations are caught by these filters.
User privacy and demographics
The users featured in the WildChat data (Zhao et al.) consented to have their data used for research purposes. You can read their paper for more details. However, ethical and privacy considerations remain. We might wonder how many of these users actually read the agreement or fully understood the implications of their decision.
Additionally, though the conversations are anonymized (including only a hashed IP address and, based on this IP, an estimated geographic state or region) and though users agreed to share their chats publicly, some users still shared personal and sensitive information. The WildChat authors implemented automated procedures to anonymize personally identifiable information (e.g., personal names, email addresses), but subsequent studies showed that sensitive data remained (Antoniak et al.). The authors of that work notified the WildChat creators, who took steps to scrub this additional information.
In choosing to study and release this data, we acknowledge a trade-off between competing concerns. While we believe the potential risks and harms for the WildChat users have been sufficiently mitigated, some may still exist. We believe this analysis is justified by users’ explicit consent and by the urgency of helping researchers and the public understand what people are really doing with these tools.
Because the WildChat users are anonymous, we also do not have a clear sense of their demographic breakdown or representativeness. The chatbots were hosted on Hugging Face, a community for sharing machine learning models and datasets, so we might guess, as the original creators do, that these individuals skew more technically literate and “online” than your average user. These instances of ChatGPT were also free, did not require a login or account, and did not have the same rate limits as a regular account. This might mean that users were from lower-income backgrounds, were from countries where ChatGPT is banned, or were more invested in pushing the boundaries of the model with explicit or prohibited content. WildChat is not a representative sample of all ChatGPT users: recent research shows that power users are especially dominant in WildChat compared with a random sample of Microsoft Bing Copilot user chat logs (Hicke and Tomlinson).