Date: 2026-02-12

My fetish survey is now at 970k people, and I have desperately been wanting to share a raw version of the dataset with you guys so you can take a look. It’s on Zenodo here (https://doi.org/10.5281/zenodo.18625249), if you want to cite it. The raw folder is here. Even if you don’t know anything about data analysis, you can probably just load it into an AI like Claude Code and ask it questions about the data. Ask it to make some cool charts about anything you’re curious about. Also included are files with the back-end survey structure and a bit of info about column changes, which can help with context and with figuring out what the columns mean.
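If you would rather poke at the data directly than go through an AI, something like the following minimal Python sketch could be a starting point. It assumes the raw folder contains a CSV file with a header row; the file format, the path `survey.csv`, and the column name `age` are placeholders for illustration, not the real names, so check the included survey-structure files for the actual ones.

```python
import csv
from collections import Counter


def read_rows(file_obj):
    """Read an open CSV file into a list of dicts keyed by the header row."""
    return list(csv.DictReader(file_obj))


def value_counts(rows, column):
    """Count how often each value appears in one column."""
    return Counter(row[column] for row in rows)


def load_csv(path):
    """Open a CSV by path (hypothetical path; swap in the real file)."""
    with open(path, newline="", encoding="utf-8") as f:
        return read_rows(f)


# Example usage (placeholder names, adjust to the real dataset):
#   rows = load_csv("survey.csv")
#   print(len(rows), "rows;", len(rows[0]), "columns")
#   print(value_counts(rows, "age").most_common(5))
```

Because many columns are binned, the values come back as plain strings here; convert to numbers only where the column really is numeric.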

Releasing this has taken me a while, because fetishes are sensitive info and I want to make extra super duper double dog dare you sure that responses are anonymized.

So: I dropped 98.4% of responses, keeping only a small sample that is very representative of the population (on age, gender, politics, and a few other things). I aggressively binned a bunch of the columns, and took a baseball bat to over half of the demographic info. I then added noise to the data in a few different layers, using different methods. I’m unfortunately not going to tell you very much about this (to protect the anonymization), but I came begging at the doors of various stats nerds to help me, and now it’s modified to a degree that I feel good about. If you see a row and think it matches somebody specific, you are almost certainly wrong.

The correlations are mostly preserved, but the noise means that any correlation you find will be smaller than it was in the original data. In general, correlations are reduced by around 25%; e.g., an r=0.5 correlation was likely ~0.62 in the original data. Almost all correlations are reduced by somewhere between 15% and 30%.
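To turn that rule of thumb into numbers, here is a rough sketch of the arithmetic. It assumes that "reduced by X%" means observed = original × (1 − X); that reading, and the function names, are assumptions for illustration, since the actual noise procedure is not disclosed. Under this reading an observed 0.5 backs out to about 0.67, while reading the 25% as "the original was about 25% larger" gives about 0.62, so treat either figure as a rough guide rather than a correction factor.

```python
def estimate_original_r(observed_r: float, reduction: float = 0.25) -> float:
    """Back out a pre-noise correlation, ASSUMING observed = original * (1 - reduction)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    estimate = observed_r / (1.0 - reduction)
    # A correlation cannot exceed 1 in magnitude, so clamp the estimate.
    return max(-1.0, min(1.0, estimate))


def plausible_original_range(observed_r: float, low: float = 0.15, high: float = 0.30):
    """Bracket the original correlation using the stated 15-30% reduction range."""
    a = estimate_original_r(observed_r, low)
    b = estimate_original_r(observed_r, high)
    return (min(a, b), max(a, b))


print(round(estimate_original_r(0.5), 2))                          # 0.67
print(tuple(round(x, 2) for x in plausible_original_range(0.5)))   # (0.59, 0.71)
```

The takeaway is that effect sizes found in the released data are likely underestimates, and the sign and rough ranking of correlations matter more than the exact values.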

Somewhat ironically, despite all the anonymizing, the aggressively representative sampling means that the base rates in this release are likely closer to reality than those in the normal dataset I usually work with.

I did limit the sample from the very beginning to ages 14-32 and to responses from Western countries only (US/Canada and Europe). I removed some information about some very extreme fetishes. Most of the data is pretty robust, but I wouldn’t be surprised if there were a few errors in there; if you find something extremely counterintuitive, there might be an issue with the data! Maybe I flipped a sign somewhere when cleaning it. Let me know and I’ll double-check anything weird.

I hope you like it, I wanna see if you find anything cool!