Embedding Explorer
What am I looking at?
Each dot is one of ~193,000 fiction-related ChatGPT conversations from the WildChat dataset. Conversations with similar content appear close together.
Each conversation's first user message was embedded with all-MiniLM-L6-v2 (Sentence Transformers, 384 dims), reduced to 50 dimensions with PCA, then projected to 2D with UMAP (n_neighbors=30, min_dist=0.1, cosine metric).
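The reduction pipeline above can be sketched as follows. The random matrix stands in for the real 384-dim MiniLM embeddings, and the UMAP step is shown as a comment since it requires the separate umap-learn package:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the 384-dim all-MiniLM-L6-v2 sentence embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 384))

# Step 1: reduce to 50 dimensions with PCA.
pca50 = PCA(n_components=50).fit_transform(emb)
print(pca50.shape)  # (1000, 50)

# Step 2 (sketch, assumes umap-learn is installed): project to 2D with UMAP
# using the parameters described above.
# import umap
# xy = umap.UMAP(n_neighbors=30, min_dist=0.1, metric="cosine").fit_transform(pca50)
```

PCA first is a common speed/noise trade-off: UMAP's neighbor search is much cheaper in 50 dimensions than in 384.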
Topic labels come from two-level HDBSCAN clustering on the 50-dim PCA embeddings (not the 2D coordinates). Coarse clusters (min cluster size 500 points) define broad topics; fine clusters (min cluster size 50 points) define subtopics. Each cluster was labeled by GPT-4o from a sample of 20 representative prompts.
Loading sample...
Click to pin & view details