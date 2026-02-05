Behold, a common sentiment:

I do think I am one of the world’s top scientists on sex and gender kinks, actually, which is extremely tragic and an indictment of our entire civilization. I’m just an uneducated enthusiast who got into this because I had some questions nobody else had answered yet. My work has plenty of limitations, I barely know any stats, and I haven’t even published a single .pdf.

But unfortunately, the rest of the field isn’t much better. Think I’m exaggerating? Let’s take a look at the competition.

I used Elicit’s systematized review to pull ~500 published papers focused on fetish, kink, and paraphilia, from vanilla to taboo. I narrowed it down to papers similar to my category of work: specifically surveys (ideally ones the researcher runs and analyzes directly) looking at relationships or comparisons between groups.

To steelman the competition, I narrowed further by keeping only journals in the top 25% of SJR rankings, weighting heavily toward recent publication (the oldest paper in my sample was from 2000, but the median year was 2016; I have no papers after 2022, since very recent papers are harder to find free online), and keeping only papers with more than 10 citations (median 28).

This left me with 51 papers. Hopefully this is a solid representation of the best, most influential survey fetish work in the field! Surely this must be more reliable than my own work!

I used Elicit to extract specific information about each paper, re-ran each full paper individually through ChatGPT to verify that information, manually checked all disagreements, and spot-checked the final results by reading many of the papers myself, focusing on high-citation, high-n ones. I stuck to simple questions (what was the sample size? where did the participants come from?), and the AIs were pretty accurate on these.

So let’s explore what these papers consist of! How good are their samples?

The median sample size was 404 (samples ranged from 24 to 8,718 participants). That’s about 0.04% of my Big Kink Dataset, which is now at 970,000 responses, roughly 2,400x as big.
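The arithmetic behind that comparison, using only the two numbers above, is short enough to check directly:

```python
# Numbers from the text: the median sample size of the 51 papers,
# and the current size of the Big Kink dataset.
median_paper_n = 404
big_kink_n = 970_000

share_pct = median_paper_n / big_kink_n * 100  # median paper as a % of the dataset
ratio = big_kink_n / median_paper_n            # how many times bigger the dataset is

print(f"{share_pct:.3f}%")  # 0.042%
print(f"{ratio:,.0f}x")     # 2,401x
```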

Is my sample size overkill? No. Not at all. It’s vital if you’re trying to detect rare things in data, especially if you want to check a lot of rare things (to avoid statistical tomfoolery), and doubly so if you want to carefully check for confounders. For example: roughly 13k people reported pedophilic inclinations in my data. If we want to split this out by degree of interest, by sex, by gender, by age, by ‘were they abused in childhood’, and maybe restrict to non-sociopaths for good measure, we are suddenly dealing with tiny bins. You need an extraordinarily overpowered sample to do delicate, high-confidence extraction at the margins.
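To see how fast the bins shrink, here’s a rough sketch. The number of levels per split is a hypothetical choice for illustration, not the survey’s actual categories; only the ~13k figure comes from the text.

```python
from math import prod

# Hypothetical number of levels for each way of splitting the data.
splits = {
    "degree of interest": 3,
    "sex": 2,
    "gender": 3,
    "age band": 5,
    "abused in childhood": 2,
    "non-sociopath filter": 2,
}

respondents = 13_000  # roughly the number reporting pedophilic inclinations
cells = prod(splits.values())           # every combination of levels
per_cell = respondents / cells          # average size if spread perfectly evenly

print(cells)              # 360
print(round(per_cell))    # 36
```

Real responses are never spread evenly, so many cells end up far smaller than this average, which is where a huge sample starts to matter.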

But still - size isn’t everything! My dataset is biased, after all; it’s a convenience sample, which is basically ‘what’s a population that’s easy for the researcher to access’? Surely most high-quality research isn’t simply what’s conveni-

78.4% of papers in the Top 51 used what I considered convenience samples: the researcher went to an already-existing group of people (university students, online groups, people walking down the street, etc.) and plopped their survey in front of them. I didn’t count things like ‘uses a national database’ or ‘went into a prison to hand worksheets to prisoners’ (which seems very inconvenient) as convenience samples. I did count microtask platforms (used at least partially by 5 papers).

51% of the papers used online convenience samples, where the researchers posted their surveys somewhere online (just like me!)

12 of the papers had what I considered an underspecified source: they said things like “participants were drawn from online forums” and left me unable to evaluate how much selection bias might be present.

Seven offered some sort of reward for completing the survey, like raffles, college credit, gift cards, etc. This is much lower than I anticipated, and much lower than the amount of criticism I hear about this would suggest! Surprisingly few papers drew exclusively from college students, too.

One of my main concerns with fetish research is that studies often draw from targeted groups, which are much more likely to be biased in ways relevant to the research. Like, if you’re trying to conclude stuff about the personalities of BDSM users by surveying people who go to BDSM clubs vs. normies on the street, you’re almost certainly gonna have confounders from the types of people who like going to clubs at all. And a ton of the studies pulled from exactly this kind of source - often online, often forums or subreddits.

I don’t use targeted groups in this way, and find that to be maybe the biggest advantage of my research.

59% used targeted samples, where they went directly into some already-existing cluster of people with the trait they were trying to study. Only 3 studies used neither convenience samples nor targeted groups.

Anonymity is super important when measuring sex stuff; by default I’m suspicious of fetish surveys where participants feel they can be identified at all, especially when looking at highly taboo stuff. So how many of these surveys were fully anonymous?

I set the bar a bit high for this one, because I imagine participants feel skittish about reassurances of anonymity if they know any of their information could be traced back to them at all. For example, one study had researchers mailing random codes to people, which they then used to identify themselves. This was supposed to preserve anonymity, but presumably the participants knew that somewhere there was a database matching those codes to their mailing addresses and legal names.

Only 39.2% were fully anonymous; another 23.5% were in a grey area (attempts at anonymity were made but it was unclear how effective they were, or the methodology was left unspecified).
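As a quick sanity check, the percentages reported so far can be turned back into paper counts. This sketch assumes every percentage uses all 51 papers as its denominator; the counts are back-calculated, not separately reported.

```python
TOTAL = 51  # papers in the final sample

# Percentages as reported in the text.
reported_pct = {
    "convenience sample": 78.4,
    "online convenience sample": 51.0,
    "targeted sample": 59.0,
    "fully anonymous": 39.2,
    "anonymity grey area": 23.5,
}

# Nearest whole number of papers consistent with each percentage.
counts = {k: round(p / 100 * TOTAL) for k, p in reported_pct.items()}

for k, n in counts.items():
    print(f"{k}: {n}/{TOTAL} = {n / TOTAL:.1%}")
```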

I also think concealing your hypothesis is good: did the researchers take any explicit steps to obscure from participants what they were testing? Would participants have a hard time guessing what the researchers were trying to do? I set a fairly high bar here too, and only 4 studies passed it.

I think overall this is a bit less important, because often studies are more exploratory, or are studying a topic where ‘knowing what the researchers are studying’ is unlikely to shift responses that much.

But okay - let’s do a quick scoring of all these studies. You get one point each if your sample is non-convenience, non-targeted, fully anonymous, and offered no reward (half a point if the answer is ‘maybe’).

I’m also going to add a point if the study was properly randomized - like, the participants were selected randomly from a nationally representative database in some way. Four studies passed this bar (mine did not).

I’m also gonna add half a point if the study has more than 404 participants (the median), or a full point if it has more than 1,500.
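Here’s the rubric above as a minimal sketch. The field names are hypothetical, and the Big Kink encoding is inferred from the post (convenience sample, not targeted, not randomized, far over 1,500 participants, with anonymity and no reward assumed); it happens to reproduce the 4.0 score reported below.

```python
from dataclasses import dataclass

# Each criterion is answered yes (1.0), maybe (0.5), or no (0.0).
YES, MAYBE, NO = 1.0, 0.5, 0.0
MEDIAN_N = 404

@dataclass
class Study:
    non_convenience: float
    non_targeted: float
    fully_anonymous: float
    no_reward: float
    randomized: float  # e.g. random draw from a nationally representative database
    n: int             # number of participants

def score(s: Study) -> float:
    """Sum the yes/maybe/no criteria, then add a size bonus (0.5 or 1, not both)."""
    total = s.non_convenience + s.non_targeted + s.fully_anonymous + s.no_reward + s.randomized
    if s.n > 1500:
        total += 1.0
    elif s.n > MEDIAN_N:
        total += 0.5
    return total

big_kink = Study(non_convenience=NO, non_targeted=YES, fully_anonymous=YES,
                 no_reward=YES, randomized=NO, n=970_000)
print(score(big_kink))  # 4.0
```

The maximum possible score under this rubric is 6.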

Here’s the distribution of scores. My Big Kink survey scores a 4.0; four surveys scored higher than mine, and they were these:

Paraphilic Sexual Interests and Sexually Coercive Behavior: A Population-Based Twin Study

Development and psychometric validation of the Sexual Fantasies and Behaviors Inventory.

Exhibitionistic and Voyeuristic Behavior in a Swedish National Population Survey

Sexual Arousal and Sexually Explicit Media (SEM): Comparing Patterns of Sexual Arousal to SEM and Sexual Self-Evaluations and Satisfaction Across Gender and Sexual Orientation

This mostly measures sample quality in terms of freedom from selection bias: is it big, anonymous, random, non-targeted, etc.? By these standards I think mine likely scores in the top 10% (of the most-cited papers in top journals doing the same category of research as I am).

I do want to point out that I may be a little unfair by focusing on papers looking at correlations, relationships, factor development, etc. (like, ‘do women into gangbangs have better or worse relationships?’). I am not looking at papers attempting to establish good base rates (like, what % of the world is aroused by bestiality?).

This is because I don’t think my data is a super reliable source for establishing base rates. I think it’s still helpful for estimating them - my massive sample size and huge number of questions let me control for many things correlated with the ways my population deviates from normal - but the more you fuck around with data in that way, the less robust it gets and the more assumptions you have to make. For example, I can weight my data to be representative of the national population, but it seems likely that people above 70 in my dataset are weirder than the average 70-year-old, while the 20-year-old who takes my survey is probably closer in weirdness to the average 20-year-old, which means we might find odd correlations when checking age. This is partly why I cap most of my analyses at age 35!
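A minimal sketch of weighting by age band (post-stratification) shows why this gets fragile. All shares and trait rates below are made-up numbers for illustration, not figures from the survey or the census.

```python
# Hypothetical share of each age band in the national population vs. the sample.
population_share = {"18-34": 0.30, "35-69": 0.58, "70+": 0.12}
sample_share     = {"18-34": 0.80, "35-69": 0.19, "70+": 0.01}

# Each respondent is weighted by how under- or over-represented their band is.
weights = {b: population_share[b] / sample_share[b] for b in population_share}

# Hypothetical rate of some trait within each band of the sample.
trait_rate = {"18-34": 0.20, "35-69": 0.15, "70+": 0.10}

raw = sum(sample_share[b] * trait_rate[b] for b in trait_rate)
weighted = sum(sample_share[b] * weights[b] * trait_rate[b] for b in trait_rate)

print(round(weights["70+"], 2))         # 12.0
print(round(raw, 4), round(weighted, 4))  # 0.1895 0.159
```

Here each 70+ respondent counts about 12 times as much as they would unweighted, so if those few people are weirder than the average 70-year-old, the weighting multiplies that distortion instead of correcting it.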

I do sometimes quote base rates from my data, and I think many of those times it’s a mistake. I usually make this mistake because conversations go like: ‘in your survey, do childhood experiences correlate with fetishes?’ ‘yes, it’s nuanced, etc.’ ‘oh - wow, how many people are into xyz?’ and I just say the straight number from my survey, without caveating that it’s the raw number and I didn’t attempt to weight it. The caveats get cached as background in my brain, and I fail to make them a regular part of what I say.

But the vast majority of my published writeups (as opposed to talking about it live) involve looking at relations between elements, not about base rates - and relationships are more robust than base rates. This is why I’m comparing my research to other research that is also working on relationships, not base rates.

But okay - still, my sample is a random internet survey that went viral on TikTok and now just ranks super high in Google search or whatever. Maybe other published academics with online samples still understand better where their samples come from, and somehow have a better grasp of how much to trust their data. How do I know my survey hasn’t been polluted by mass goofery?

Well - what would we expect to see if people goofed off in my survey? Some categories of this are easy to spot and remove: people who answer very fast, pick extremes for all the questions, give inconsistent response sets, explicitly tell me they were dishonest, say they’re 69 years old, etc.
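A minimal sketch of those screening rules, under loudly stated assumptions: the field names, the 300-second speed cutoff, the 1-5 answer scale, and the "same item asked twice" consistency check are all hypothetical, not the survey’s actual implementation.

```python
from dataclasses import dataclass

# Hypothetical cutoffs; the real survey's thresholds aren't given in the post.
MIN_SECONDS = 300
SCALE_MIN, SCALE_MAX = 1, 5
MAX_REPEAT_GAP = 2

@dataclass
class Response:
    seconds_taken: float
    likert_answers: list[int]       # answers on a 1-5 scale
    repeated_item: tuple[int, int]  # the same question asked twice, both answers
    admitted_dishonesty: bool

def looks_like_goofing(r: Response) -> bool:
    too_fast = r.seconds_taken < MIN_SECONDS
    # Every answer sits at one end of the scale (ignores empty answer lists).
    all_extremes = bool(r.likert_answers) and all(
        a in (SCALE_MIN, SCALE_MAX) for a in r.likert_answers
    )
    first, second = r.repeated_item
    inconsistent = abs(first - second) > MAX_REPEAT_GAP
    return too_fast or all_extremes or inconsistent or r.admitted_dishonesty

responses = [
    Response(900, [2, 4, 3, 5, 1], (4, 4), False),  # plausible
    Response(45, [3, 3, 2, 4, 3], (3, 3), False),   # too fast
    Response(800, [1, 5, 5, 1, 5], (5, 5), False),  # all extremes
    Response(700, [2, 3, 4, 3, 2], (1, 5), False),  # contradicts itself
]
kept = [r for r in responses if not looks_like_goofing(r)]
print(len(kept))  # 1
```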

But trolling and honesty are predictable, and I’ve done a lot of work experimenting with how you can influence and measure honesty in surveys; this isn’t just a ‘people are wild and we dunno’ situation - we kind of do know, and you can get whiffs of what fuckery is afoot.

If people goofed en masse, I should expect to see my survey data deviate from established findings. If responders aren’t taking my survey seriously, it would be weird to see their data somehow still strongly replicating a bunch of established research, right?

Well, let’s spot check some stuff with established research! In my data (this is raw, not controlling for anything), I find:

Women are more neurotic (d=.6) and more agreeable (d=.06)

Openness correlates with IQ (r=.2)

Neuroticism correlates with the number of mental illnesses people reported (r=.29)

Childhood abuse of various sorts correlates with mental illness (r=.3, .35 ish)

Narcissism anticorrelates with agreeableness (r=.32) and extroversion (-.18)

Childhood abuse correlates with neuroticism (r=.2)

Extraversion correlates with sexual partner count (.17)

Older people are more agreeable (.13), more conscientious (.16), and less neurotic (-.07)

Taller people report being smarter (.1) and more extroverted (.06)

Fathers being present in the home is associated with later onset of sex (.11)

Later birth order correlates with lower IQ (-.07)

This is just a quick snippet; some numbers are a bit off, but that likely depends a lot on exact wording, whether we controlled for the right variables, etc. I also used only two items for each of the OCEAN traits (10 questions total), which is lower fidelity. And some things don’t replicate on a quick first pass of my data (like, self-reported narcissism doesn’t correlate with body count), though they might if I looked more closely at how the original study was done and whether it matches what I’m measuring. But overall, roughly speaking, I have a lot of replications in there. It would be pretty suspicious for lots of goofball trolls online to manage to answer in ways similar to how people answer ‘official’ online surveys by academics.
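For readers unfamiliar with the d values above: Cohen’s d is the gap between two group means measured in pooled standard deviations, so d=.6 means the groups’ averages differ by a bit over half a standard deviation. A minimal sketch of the standard pooled-SD version (the post doesn’t say exactly which variant was used), with toy numbers rather than survey data:

```python
from statistics import mean, stdev

def cohens_d(a: list[float], b: list[float]) -> float:
    """Mean difference between groups a and b, in pooled-standard-deviation units."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5

# Toy scores for two hypothetical groups (not survey data).
group_a = [2, 4, 6]
group_b = [1, 3, 5]
print(cohens_d(group_a, group_b))  # 0.5
```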

But what about analysis? How am I doing?

I’m way less sure about this one. I think good analysis is harder for a layman to figure out than survey design; for example, early on I learned how correlations worked, then went and made a bunch of claims about correlations (or lack thereof) in my data, failing to notice that this can be badly misleading when your data isn’t close to normally distributed! There are lots of ways to accidentally fuck up stats, so I’ve tried to be very conservative with my statistical analysis. I rarely even run regressions - they’re simple, and I can do them in code, but I don’t feel I understand what they’re doing to the data deeply enough to be sure I won’t accidentally make a mistake somewhere. So I usually just pull slices from my data instead, which is very simple and straightforward.
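One way this bites: Pearson’s r measures *linear* association, so heavily skewed data can make a perfect relationship look weaker than it is. A rank-based measure like Spearman’s rho is one common alternative. This sketch is an illustration of that general point, not the method used in the survey analysis; both functions are written out by hand so nothing beyond the standard library is needed.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = list(range(10))
y = [math.exp(v) for v in x]  # perfectly monotonic, but extremely skewed

print(round(pearson(x, y), 2))  # 0.72
print(spearman(x, y))           # 1.0
```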

It’s also hard to compare my statistical methods with others’, because many statistical best practices exist to handle the limitations of smaller samples. Like, it’s great to include confidence intervals in your graphs, but when I generate graphs, half the time the CI is so tiny it’s not even visible, so I sometimes don’t bother. Or, I get to carefully investigate my data for lots of potential confounders in a way other surveys simply cannot. It would be unfair to penalize me for not showing CIs, and unfair to penalize other researchers for not investigating all confounders; those standards just aren’t a good match for our data!
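To see why the CIs disappear, here’s a sketch using the simple normal-approximation (Wald) 95% interval for a proportion, at the worst case p = 0.5. The two sample sizes are the papers’ median and the current dataset size; the choice of interval is an assumption for illustration.

```python
import math

def ci_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation (Wald) 95% CI for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (404, 970_000):
    print(n, round(ci_half_width(0.5, n), 4))
```

At n = 404 the interval is about ±4.9 percentage points; at n = 970,000 it’s about ±0.1, which is thinner than a line on most charts.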

I also just don’t really like doing analysis as much. I can joyfully build surveys until the world ends, but actually sorting through the data properly feels like eating vegetables. I might be top-tier at survey design, but I’m probably in the bottom 10% of statisticians. This probably leads to worse work overall.

(I do have a sneaking suspicion that the people writing many of these papers I’m reading don’t actually know how their statistics work, either, though, based on how frequently people find basic statistical errors in published stuff.)

In an ideal world, my research would be mid at best, cute at worst. I think it’s a pretty sad state of affairs that a completely uneducated amateur starts running surveys on sex stuff cause she’s curious, and ends up with a dataset that can easily compete with the top stuff in top journals. This is an indictment of the field. It’s not that I’m good, it’s that everyone else is bad.

I think most of the issues with the field are not with the individual researchers. They might be looking at more biased, smaller samples than mine, but I think this is still a reasonable thing to do given limited funding and the tyranny of overprotective IRBs! I think any evidence is better than none; finding a small population and being able to describe it gives us another clue for piecing the world together, and this is valuable. Limitations do not make data useless! Biased data is still data!

The primary issue seems to be that the social science incentive structure has gone to fucking loony toons town. I’m a total outsider, and have been sniffing around getting some of my work published, and I am constantly dumbfounded in these conversations. Academics have a wide variety of advice for me, but almost none of it has to do with improving my work itself - it’s all stuff like ‘ok figure out which peers will review your paper and then make sure you cite them, or else they will get offended and reject it’, or ‘way more people would take you seriously if you put that blog post into .pdf format’, or ‘I think journals might reject you for having a mononym ‘aella’, can you make up a last name?’, or ‘maybe you should deliberately leave a mistake in your paper so the peers that review you can feel special for finding it and not terrorize you with stupid changes that make your paper worse’, or ‘they probably won’t accept a paper that’s entirely novel and groundbreaking, it has to be building on previous research, you know, part of the existing conversation’. And even when they do have advice about the work itself, it’s specifically about what’s ‘in’ right now - “Oh, journals really like it if you run two separate studies to compare groups. Yes I know you have data for both groups in the same survey, but to get published you should copypaste the questions and run them again, the url just has to be different.”

(I AM NOT EXAGGERATING FOR ANY OF THIS, ALL OF THE ABOVE IS NEAR-VERBATIM WHAT PUBLISHED ACADEMICS HAVE TOLD ME, AHHHHH)

Growing up, academia seemed like a lofty goal. I thought that was where serious people went to do serious work, and if I entered the cathedral I’d be amazed at how skilled and knowledgeable everyone was, and focused on figuring out the truth, and doing science, and glowing with a glow of smartness. But instead I’ve peeked inside the door and found everyone standing in a circle with their thumb inside the butt of the person in front of them, creating a thumb-in-butt circle.

And the academics I talk to are very self-aware about this. They say “Yeah, I’m in a thumb-butt circle. It’s not great.” and I’m like “why not take your thumb out of the butt” and they say “well then Gary back there would take his thumb out of my butt, and unfortunately that thumb is important for my job”.

I go ‘ok well have fun’ and pull my head out and go back to my blog and to twitter, where I find hundreds of comments saying my work is not comparable to real research, which has standards and peer review and no selection bias. They point toward the glowing cathedral and say ‘why aren’t you in there if you’re so great’. and just. I can’t believe we’re in a world where something so broken has so much prestige, and where people don’t know how broken it is. My stupid survey is at least slightly less stupid than most; at least it’s something. It’s an inadequate, imperfect block on top of a pile of debris.

In a functioning system, we would probably see the field of sex research collecting much bigger samples, because they’d have put more effort into figuring out how to make surveys more fun (which is a novel concept, not ‘part of an existing conversation’), how to make their scales more efficient (instead of highly redundant items prized for how well they correlate with each other), and how to increase survey completion rates (narrowing questions, getting rid of unnecessary consent forms). They would be able to ask more direct, explicit questions without the IRB clamping down over fears of doing emotional harm to their subjects. In a functioning system, all the internet accusations about my work would be true; I would be doing useless work compared to what’s already been done. I hope one day we get there.