How Your Survey Responder Lies
comparing honesty rates across survey types
I occasionally run a series of polls where I ask people to solve extremely simple math problems that slowly increase in complexity. I ran them again real quick as I write this to show you. Watch what happens to the correct answer (which is always 2):
[Polls #1–#5: five embedded Twitter polls, math problems of increasing difficulty]
In order of difficulty, the % of people who chose the correct answer is 87%, 82%, 95%, 92%, and 91%. Obviously something else is going on - 13% of people are not mistaken about what 1+1 equals!
The numbers here are a bit noisy due to low sample size, but in general whenever I’ve asked variations on this set of questions, the number of correct votes roughly increases with the difficulty of the problem, before hitting a ceiling when the difficulty is too high, and slowly decreases with that difficulty.
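To get a sense of just how noisy those percentages are, here's a quick sketch of a 95% confidence interval for one of them. (The ~100 votes per poll is my assumption for illustration, not a figure from the polls themselves.)

```python
import math

def binomial_ci(p_hat, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - z * se, p_hat + z * se)

# 87% correct out of an assumed ~100 voters
low, high = binomial_ci(0.87, 100)
print(f"95% CI: {low:.2f} to {high:.2f}")  # roughly 0.80 to 0.94
```

With samples that small, a five-point swing between questions is comfortably within the margin of error.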
It seems that people enjoy voting falsely when nobody could possibly mistake their vote for proof of actual idiocy. If your friend says 1+1=3, you probably take this as evidence for their jovial rakishness more than for their failure to pass kindergarten. But if they get (4+6)-8 wrong, well, I dunno. Probably they’re not that bad at math, but… they might be?
But this troll streak is not an absolute rule!
I ran a survey - a survey proper, not a twitter poll. For my Fancy Survey, people had to click on a link and visit a webpage where they input their age, and ethnicity, and level of education, and gender identity, and a bunch of boring stuff. Then they got some questions including political views, and some about sexual preferences (a thing I survey often). Then they got one or two of the math questions, at random. Around 170 people answered each math question.
Here’s how they did:
Basically: inside the survey, the accuracy mostly decreased with the difficulty of the problem. The troll effect seems to be completely gone here.
It’s interesting that the least-accurate question for twitter was 19-17 instead of 1+1. My guess is that 1+1 should have been even less accurate, but there was a whole angry debacle a year or two ago over 2+2=5, and by answering 1+1=2 people are signaling they don’t agree with the liberal math nonsense. 19-17, though, has no such controversy associated with it, so people feel more free to troll.
Let’s try another one: Agree or disagree - “A VCR is a good place to store a peanut butter sandwich”
I don’t think a VCR is a good place to store a peanut butter sandwich. I might go so far as to venture that my opinion is also the correct answer to this question.
61.4% of twitter users voted correctly here. And in my Slightly More Professional Survey, this jumped to 81% of responders.
Let’s check a compliance test:
73% vs 90%, beautiful.
So far most of these questions have been easy, where there’s an obvious answer, and people have to choose whether or not to click it. They know they’re being tested, and there’s a tension between “do I do what the tester wants, or do I fuck around?”
I think probably the immediate visibility and informality of the twitter poll gives people a jolt of delight to fuck with the results, as long as fucking with the results would be funny. 1+1 = 3 is hilarious when it’s in front of a crowd and everybody sees the stupid percentages being stupid, but it’s boring when it’s inside the solemn walls of a researcher’s GuidedTrack URL.
(3³ − 23) ÷ 2 = 3, however, is deadly serious. Nobody finds that funny, crowd or not. It’s the one question where the twitter poll and the survey collapsed to almost exactly the same % of correct answers.
Much of good survey design is getting a decent model of the mind and incentives of your audience. Egos are predictable. You have to tease them out like taffy, and if an ego is sticking to your results too hard you have to find other questions to trap it.
Let’s move away from the troll questions and look more directly at ego. When does it flare up?
I asked:
“I treat everyone equally, regardless of background”
“Advertising influences my purchasing decisions”
“I’m satisfied with my sex life”
The do-you-treat-people-equally one saw twitter people answering “disagree” very slightly more often than the survey ones, but this is a bit hard to disentangle because my survey-takers tend to be a little more liberal (by about 5%) than my twitter audience, and conservatives are probably going to honestly answer ‘no’ more often.
But the rest had basically no change between X polls and the survey. Being affected by advertising, having a good sex life - the venue didn’t seem to change the answers.
However this one:
“Are you above or below average compared to the other people who answer this poll/survey?”
is pretty hilarious. 79% of twitter-poll people said they were above average, but 89% of my Fancy Survey-takers said they were! I think people who knew their answers would be visible, and that it would be embarrassing if the bars didn’t align 50/50, felt more pressure to answer humbly. Fancy Survey, in private, presented no such pressure.
I also asked about sexual dominance. A common theory is that men are less likely to admit wanting to be sexually rapey because it’s stigmatized, so this would predict the more anon and hidden their answers, the more honesty you’d see. This… maybe held up in my tests? Slightly? I asked:
“Are you sometimes aroused by fantasies of nonconsent, where *you* are the *aggressor/perpetrator*?” with some ‘don’t worry we won’t judge you’ text.
If you count ‘not at all’ as 0 and ‘yes’ as 2, the mean is actually basically the same (about 0.92). But the twitter result was more polarized - in public, people were more likely to say they were either into or not into nonconsent; in the survey, people said ‘a little.’ I think this is probably just noise, since the sample sizes aren’t huge. If so, the immediate visibility of an answer doesn’t impact people’s willingness to admit being into a sexually taboo thing.
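The "same mean, more polarized" pattern is easy to illustrate with made-up numbers (these are invented response counts, not my actual data):

```python
from statistics import mean, pstdev

# Hypothetical codings: 0 = not at all, 1 = a little, 2 = yes
twitter = [0] * 50 + [2] * 46             # polarized: mostly extremes
survey = [0] * 20 + [1] * 64 + [2] * 20   # clustered at 'a little'

print(mean(twitter), mean(survey))      # means nearly identical (~0.96 vs 1.0)
print(pstdev(twitter), pstdev(survey))  # spreads very different (~1.0 vs ~0.62)
```

Which is why comparing only the means can hide a real difference in how people answered.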
Okay - but this is looking at like, how does public visibility of specific answers change those answers? Something I’m more curious about is - how does offering test results change answers?
I build a lot of data-backed surveys that tell you how you differ from other people by giving you cool results. This gets more users to take them for free, subverting the whole ‘have to bribe undergrads with a grade’ or ‘have to pay bored microtaskers $2 each to please answer my questions’ problem that plagues most survey research.
But ego does change things. How much?
I duplicated my Fancy Survey, removed less-important questions, and renamed it the Quick Personality Feedback Survey. I asked “what’s your spirit animal?” and said that if you took my survey you’d find out! I also suggested people share their result so others could see what animal they are. To actually give them an animal, I tacked some preliminary results from a different project I’m working on onto a few basic questions.
Here are the questions I measured. Guess which, if any, showed people answering differently when they thought they were being evaluated:
How horny are you right now?
Women are too easily offended
Women have a purity that men lack
Are you sometimes aroused by fantasies of nonconsent, where *you* are the *aggressor/perpetrator*?
Are you sometimes aroused by fantasies of nonconsent, where *you* are the *receiver/victim*?
Do you sometimes find people who are 15 years old, sexually attractive?
How many people have you had sex with?
The correct answer to this question is “Slightly agree”. Please select it.
Advertising influences my purchasing decisions
I’m satisfied with my sex life
I have made decisions primarily to protect my pride
I treat everyone equally, regardless of background.
On a scale of 1-10, how attractive are you?
How smart are you, compared to the other people who will answer this survey?
Overall, how satisfied are you with your life these days?
Three questions seemed to show a meaningful difference between groups: “Women have a purity that men lack” (d=.25), “How horny are you right now?” (d=.23), and “Advertising influences my purchasing decisions” (d=.25). The spirit-animal audience said women were purer, said they were hornier right now, and were more likely to deny that advertising impacted their decisions. The rest were basically the same.
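Those effect sizes are Cohen's d - the difference in group means divided by a pooled standard deviation. A minimal sketch, with invented 1-7 agreement scores rather than the real responses:

```python
from statistics import mean, stdev
import math

def cohens_d(a, b):
    """Cohen's d: standardized mean difference using a pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

# Illustrative 1-7 agreement scores, not the actual survey data
group_a = [4, 5, 5, 6, 4, 5, 6, 5]
group_b = [4, 4, 5, 5, 4, 4, 5, 4]
print(round(cohens_d(group_a, group_b), 2))  # prints 0.96
```

A d of around .25 is a small effect - real, but the two groups' answer distributions still mostly overlap.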
This is a pretty confusing cluster of questions. Why those and not the others? Why the horny question and not, like, being aroused by noncon? Why the ad question but not the pride or the equality or the smartness question? Based on the sample sizes (remember, ~600 and ~900), I think it’s probably not noise, but it’s patternless enough that I keep wanting to think it’s noise - or else something totally different is underlying it that I haven’t figured out a theory to match yet.
But still - overall, most of them showed no meaningful shift.
Ok so I’ve compared three types of surveys here - twitter polls, a normal generic survey, and an incentivized-by-finding-your-spirit-animal survey. I didn’t try super hard to get huge sample sizes, but they should be big enough to work.
And the message is: it’s hard to generalize about any of these. I don’t think it’s right to say a poll, survey, or incentive is worthless - only that some categories of questions are affected, sometimes. Twitter polls seem especially susceptible to questions that allow trolling opportunities, but seem kind of okay otherwise? At least not worse than normal surveys? Incentivized surveys also do pretty well, but with weird fluctuations in a small handful of questions, which is hard to interpret but maybe points to a desire for a more flattering outcome.
And the common claim that “people are less likely to admit highly stigmatized sexual fantasies” fails to find support here. The rate of reported interest was basically the same across all surveys.
Of course, maybe there’s some warping going on by the fact people are answering online questions at all, and it warps all the questions across all the tests I did equally, so I can’t tell.
But my point is we shouldn’t be applying a single rule universally across all surveys to determine if the questions are reliable or not! You’ve got to look at where and how it was shared, and how the questions were phrased, and to whom, and then imagine taking the survey and feel where your ego flares up, and then squint hard at those specific questions to see if you can make them more resilient somehow. It’s an art. It’s also a science, but still an art.

Agh it makes me so annoyed that you take this more seriously than most of academia, and that there’s no way to express this in other contexts without it being dismissed as some kind of anti science sentiment. (Ironic since the point is that Science isn’t being scientific.)
Sometimes when I am taking surveys, including yours because I recognize those questions, I worry whether I understand the question the way it is intended.
For example, when I answered the question about being influenced by advertising, I answered in a strong positive, because I used my own internal rule that I will not willingly buy anything for which I have seen an advertisement. That's a very powerful influence. But I assume that you meant "Do advertisements convince you to buy the product advertised". I debated for several minutes before answering because I wasn't sure which way my answer would be taken. And the two interpretations lead to exactly opposite answers.
This happens rather often for me in surveys, where there is a common understanding that I assume is at work, but I approach from outside that frame. Then I have to take a guess what to say.
The math ones are much easier in that respect - math is always math.
You didn't mention that you often end your surveys with a question about whether we answered honestly. What results do you get from that question? Do many people admit to lying? Does it correlate with troll answers?