This is maybe a slightly incensed post, so forgive me in advance, but you will not believe the number of comments I get that
a) assume I am claiming more than I am based on my data
I don’t claim to be perfect here, but I do try to be very careful when making strong claims about what my data is saying, particularly in my smaller surveys among subpopulations. Generally I feel most comfortable simply reporting the results of the survey, along with stuff like the sample size, the fact that it’s self-reported, who took the survey, etc.
Maybe I’m wrong here, but I generally make the assumption that “self-reported internet survey, n=253” conveys that I am being open about the limitations of the survey and that any implications should be considered in the light of those limitations.
Knowingless is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
b) assume imperfect data is equivalent to no data
In one world, most clowns prefer pie. In the other world, most clowns prefer anal sex. You’re not sure which world you’re in; you guess there’s about a 50/50 chance of either. So you go and post an internet survey for all the clowns in your neighborhood to ask if they prefer pie or anal sex. Ten clowns respond, and nine of them prefer pie.
This is very imperfect data - it’s not a random sample at all, and it’s a tiny dataset - but it should still update your probabilities of what world you’re in, because you ask the question “in which world, the pie or anal sex world, is it more likely that you get this result?” Maybe you update to 60% clowns-prefer-pie world now instead of 50%.
This is how I interpret studies in general - they are stones thrown in various buckets weighed against each other. Of course randomized, large samples are bigger stones, but even small updates can be useful, especially in a field where there hasn’t been much research done before at all!
This is also my understanding of why likelihood ratios are useful. I don’t understand them very deeply, but enough that it seems like they give you a mathematical, rigorous way to actually evaluate studies weighted together. I do really want to switch to using likelihood ratios fully once I figure out how to program it.
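If I were to program the clown update, a minimal sketch might look like this. The 70%/30% “most clowns” numbers are made up purely for illustration (the example never pins down what “most” means), and the model assumes independent, unbiased responses - exactly what a real internet survey doesn’t give you:

```python
from math import comb

# Two hypothetical worlds. "Most clowns prefer pie" is taken to mean 70%
# do; in the other world only 30% do. These exact numbers are made-up
# assumptions for illustration.
p_pie_world = 0.7
p_anal_world = 0.3

# Observed data: 9 of 10 surveyed clowns prefer pie.
n, k = 10, 9

def binom_likelihood(p, n, k):
    """Probability of seeing exactly k pie-preferrers out of n, if each
    clown independently prefers pie with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Likelihood ratio: how much more probable is this result in the pie world?
lr = binom_likelihood(p_pie_world, n, k) / binom_likelihood(p_anal_world, n, k)

# Bayes' rule with a 50/50 prior: posterior odds = prior odds * LR.
posterior_odds = 1.0 * lr
posterior = posterior_odds / (1 + posterior_odds)
print(f"likelihood ratio: {lr:.1f}")
print(f"posterior P(pie world): {posterior:.4f}")
```

The formal posterior comes out far more confident than the informal “60%” above, precisely because this toy model ignores selection bias and tiny-sample weirdness - which is the part that still needs judgment.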
c) assume small sample sizes are auto bad
I usually get huge sample sizes, but occasionally survey subpopulations where there’s much smaller samples. Small samples aren’t bad, it just means you can be less certain about the conclusions you draw from them. This is fine! It’s okay to have data you’re less certain about. A little data is better than no data. If I presented small sample sizes and made an overconfident claim based on it, then that would be an error. But the error is in the claims, not the data.
d) assume that gigantic sample sizes are useless
I agree you can get useful statistical significance out of much smaller datasets than e.g. my most recent survey at 450k respondents. But if you wanna check statistical significance (which is not a great way of looking at data, to be fair) and you have a lot of questions, then larger datasets are important to avoid p-hacking, where you basically just ask enough questions until one of them accidentally comes up positive and then you point at it and go “see, statistically significant!”
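The many-questions problem is easy to see with a quick simulation. Under a true null hypothesis a p-value is uniformly distributed on (0, 1), so a survey full of unrelated questions can be modeled by drawing one uniform p-value per question (the 20-question, alpha = 0.05 setup is an arbitrary illustration):

```python
import random

random.seed(0)

# Under a true null hypothesis, a p-value is uniformly distributed on (0, 1).
# So a survey with many unrelated questions, each tested at alpha = 0.05,
# can be simulated by drawing one uniform p-value per question and seeing
# how often at least one dips below the threshold purely by chance.
n_questions = 20
alpha = 0.05
trials = 10_000

false_alarms = 0
for _ in range(trials):
    p_values = [random.random() for _ in range(n_questions)]
    if min(p_values) < alpha:  # at least one "significant" hit by luck
        false_alarms += 1

print(f"chance of >=1 false positive across {n_questions} questions: "
      f"{false_alarms / trials:.2f}")
# Analytically: 1 - (1 - alpha)**n_questions = 1 - 0.95**20, about 0.64
```

So with 20 null questions at the usual threshold, you “find something” about two-thirds of the time - which is why stricter thresholds (and the bigger samples needed to survive them) matter.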
Larger datasets are also important to get at rare subpopulations! If you want to draw good conclusions about people who are really into necrophilia, for example, you have to ask a lot of people to get enough necrophiles to understand anything about them.
(I do prefer this method, because my guess is “people who happen to be necrophiles taking a survey not about necrophilia” is much less susceptible to selection bias than if you posted a necrophilia-specific survey on the necrophilia subreddit, for example)
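The arithmetic here is simple but worth making concrete. With a prevalence of 0.5% - a number assumed purely for illustration, not a real estimate - the total sample you need scales like this:

```python
import math

# Hypothetical prevalence -- 0.5% is assumed for illustration,
# not a real estimate of how common necrophilia is.
prevalence = 0.005

def respondents_needed(target_subgroup_size, prevalence):
    """Total respondents needed so the *expected* subgroup count hits the
    target (expected count = n * prevalence)."""
    return math.ceil(target_subgroup_size / prevalence)

# To expect roughly 200 necrophiles in the sample:
print(respondents_needed(200, prevalence))  # -> 40000
```

So getting even a couple hundred members of a 1-in-200 subpopulation takes tens of thousands of respondents - which is where the giant datasets earn their keep.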
e) assume because I use internet surveys, the data is bad
Internet surveys do introduce limitations and bias to the sample, yes! For example, if I’m surveying sex workers, I’ll get online ones and not ones on the street, and this is a huge skew.
But again, limited data is better than no data. At the very worst we’re getting information about the kind of sex workers who answer surveys, which can still be useful!
When I’m trying to do Srs Research, I try to design surveys that are difficult to easily game in large numbers; I check outlier responses, include contradictory questions to catch people answering randomly, ask self-reports about honesty, etc.
I also check responses based on the place on the internet they came from, to see how much “reddit people” and “tiktok people” etc. differ from each other. If you’re like “this doesn’t count, it’s just reddit people” - well, we can check how much the “redditness” of the people actually changes the results!
(so far the greatest impact between sources tends to be age and gender)
f) assume we’re just learning about “people who follow aella”
I’ve thought about this a lot! I have around a million followers across all my platforms, and people follow me for very different reasons. I also have had surveys go viral, so I get responses from people who have never heard my name before.
And key to this, I can see how their responses differ. I have a pretty good grasp on the degree to which “people who follow me” is a unique demographic. And surprise - for most (though admittedly not all!) things I measure (mostly sex stuff, which is the majority of my focus), they’re very similar to other sources.
The kind of response I would love to hear is “How much is this data representative? People who follow you tend to be more sex-positive, do you have any comparable data from less aella-focused sources so we can get a sense of how much that’s impacting the results?”
g) assume twitter polls are completely worthless
Say it with me now, limited data is better than no data.
I mostly use twitter polls as loose, top-level broad strokes to find things that might be more interesting to study in depth later. I’m under no illusions that this is rigorous research and I never pretend that it is.
However it might be more trustworthy than you think! I often tweet the same twitter polls over time, I tweet the same polls but worded differently, I test various types of compliance (e.g., asking people a series of math problems from extremely easy to slightly hard) to see what kinds of questions people are more likely to troll on vs not. (The Lizardman’s Constant is not constant!)
And my polls regularly reflect real phenomena! For example, I asked about people’s predictions about the severity of Covid starting in early 2020, and watched the predictions steadily increase in severity as I repeated the poll every few months. It seems likely that this reflects an actual increase in people’s worry about Covid, as opposed to people trolling or randomly selecting buttons.
I often re-test twitter polls in full surveys posted to other demographics in other places on the internet, and generally they do replicate.
I also am really familiar with my twitter follower demographics, so I can anticipate when stuff might be confounded or warped due to selection bias.
So mostly I treat my polls as probably-pointing-at-something-real-but-we-should-more-thoroughly-check-before-making-very-bold-claims-based-on-them
h) assume I claim to be a data scientist
I’m more of a data hobbyist. Sometimes other people describe me as a data scientist, and while this might be technically true (as in, I am doing science on data), I don’t have the level of familiarity with statistics or data processing that most professionals have. I’m learning as I go, and trying to be fully open about my methodology and data so other people can correct mistakes I make.
This feels like a much better process to me - to have thousands of people, hundreds of them with degrees in data science, picking apart my process and looking at the data directly, suggesting better methods of doing things. It feels significantly more transparent than a lot of research done in academia.
To register my current level of stats familiarity:
I feel a pretty deep (as in, I know the principles of how the math is done and can rotate this in my brain from different angles) intuitive grasp of stuff like correlations, p-values, p-hacking, and standard deviations. I think I understand ANOVA tests, confidence intervals and likelihood ratios, but need more putting-it-into-practice to be sure. I have a poor/light understanding of stuff like factor analysis, regressions and controlling for variables. I mean poor understanding as in, I understand the principles, but not at all the gears of how it’s done, and am unable to rotate the problems in my head. I also have a light understanding of failure modes besides p-hacking, as in I occasionally read articles where people explain how stats can fail, but I don’t think I will understand those deeply until I make (or narrowly avoid making) the mistake myself.
I am still pretty bad at ‘best practices’ and haven’t yet developed an intuition for the best way to present data; e.g. I often get feedback that I should have included a key metric I didn’t think about. I am pretty unfamiliar with many stats terms outside of the ones listed above, though I might be somewhat familiar with the concepts they’re pointing at.
Although I’m not perfect, I still think I am very good at survey design and construction! As in, I regularly read surveys written by academics in published research fields and find them to be absolutely terrible. I have a deep and intuitive understanding of the failure modes of bad question wordings. I don’t mean that I never fail at wording a question, but that I think compared to the field and style of surveys I’m doing, my questions are significantly higher quality than average.
i) assume that published peer-reviewed research by credentialed scientists is really that good
A huge amount of research in academia is done on student populations, with way smaller sample sizes than what I get, and from an equally if not more biased population. Again, limited data is better than no data, and it's totally fine for academic research to use small, biased samples. But from my perspective, despite my populations and samples being of similar or better quality than those in published academia, people are more critical of my samples, more overconfidently assume my data is useless, and think I’m making way stronger claims than I am, compared to the way I see people talking about academic research. My guess is that because I'm not wearing the metaphorical white lab coat and because I don't use academic language, people don't perceive me as carrying authority, and thus don't trust my research as much.
Of course, this is bad. You should trust research for the quality of the research, regardless of who does it.
I also sometimes wonder how much of the disproportionate criticism is leveled because I'm an open sex worker on the internet. I don't think this is all of it, and I don't mean to dismiss valid concerns with the sex-worker-discrimination card, but given how confidently I see people dismiss my data based on misguided ideas, I am a bit suspicious.
Utilizing imperfect information properly also requires a certain precision of thought. One has to be aware of all the contextual elements which go into relying on that data. While it varies amongst the population, everyone has a maximum capacity of variables they can simultaneously hold in their mind before the complexity falls apart into chaos. When people hit their personal limit, it is always interesting (and sometimes disappointing) to see who shifts from "This is too complex for me to properly understand" to "This cannot be understood." My general preference is to avoid the latter.
Another element I find is that, in their quest for certainty, people seem to heavily discount how much fun imperfect information is. Combining a variety of reasons into a hunch and finding out whether or not one got it right is way more entertaining than adding two-plus-two together and always getting four.
Or the fact that the more one practices making and refining decisions with imperfect information, the more one hones their instincts to navigate its blind spots. People who scoff at thinking about a situation in probabilistic terms and updating those probabilities based on new information are usually more likely to dismiss someone as simply lucky or blessed with good fortune (source: anecdotal inkling).
Which is partly true. An intuited probability model still involves chance. But, for similar reasons you raised in your post, it unfairly discounts the fact that someone intentionally (if imperfectly) positioned themselves to heighten their odds. While simplistic, over time a small edge derived from imperfect information can compound into a completely different life experience compared to those who simply shrug and say, "It can't be known."
Good post. You are very good at survey methods and you will get multivariate regression and factor analysis in due time. Really nothing to critique in your essay, but here is one main observation: if the critique is so broad and general that “anyone can say the sample is too [large / small / restrictive / biased / …]” and the critic is not providing details on why the critique is relevant to the specific analysis or interpretation you are working on, then put little weight on that feedback. When the critique is targeted, backed up with a citation or thought experiment that fits the situation, and paired with recommendations on how to improve your game, weigh that feedback more. Anyone who has taken one course in research methods can come up with 15 different “rival hypotheses”; the good critic is one who is discerning and helpful and who critiques the most relevant issue with tact, like a good teacher would.