Agh it makes me so annoyed that you take this more seriously than most of academia, and that there’s no way to express this in other contexts without it being dismissed as some kind of anti science sentiment. (Ironic since the point is that Science isn’t being scientific.)
Sometimes when I am taking surveys, including yours because I recognize those questions, I worry whether I understand the question the way it is intended.
For example, when I answered the question about being influenced by advertising, I answered in a strong positive, because I used my own internal rule that I will not willingly buy anything for which I have seen an advertisement. That's a very powerful influence. But I assume that you meant "Do advertisements convince you to buy the product advertised". I debated for several minutes before answering because I wasn't sure which way my answer would be taken. And the two interpretations lead to exactly opposite answers.
This happens rather often for me in surveys, where there is a common understanding that I assume is at work, but I approach from outside that frame. Then I have to take a guess what to say.
The math ones are much easier in that respect, since math is always just math.
You didn't mention that you often end your surveys with a question about whether we answered honestly. What results do you get from that question? Do many people admit to lying? Does it correlate with troll answers?
i think you probably answered correctly, because a negative influence is still an influence!
eta: maybe that would be a useful clarification for the question -
"Advertising influences my purchasing decisions (making you more or less likely to purchase)"
Yes that kind of clarification would be helpful in quite a number of cases. Survey design is hard, it is one reason I love these articles that dive into it.
Can relate to that, this makes a number of those questions hard to answer for me. Maybe a survey on how people understood the survey would be interesting?
Re the theory that "conservatives would be more likely to honestly answer they don't treat everyone else equally": I would actually have the opposite prior, that liberals care a lot about equality and so would hold themselves to an impossible bar ("everyone's acting racist me included") while conservatives would be more "sure I treat everyone the same I'm a good person", basically understanding the question differently.
I remember hearing a mindfulness researcher describe that people are *more* likely to answer that they act on impulse after a mindfulness course, because before they weren't conscious of it and afterwards they are.
Really good insights thank you.
Some scenarios are still not feasible, but I would love to see a study of people filling out surveys while wearing those thought-reading headsets, comparing their internal thoughts to what they stated. A cheaper version would be filming them with an infrared camera while they complete it (there are very good studies showing blood-flow changes with lying, vastly more effective than a polygraph, etc.). Essentially, ways to compare stated vs. revealed preferences.
The survey about how they understood the survey, I would definitely participate in that. And yes I would love to see the results.
On the self-purity differences between liberals and conservatives, Jonathan Haidt did great work on that back before he started on social media harming kids, in particular in his book *The Righteous Mind*. Both sides will confirm for different reasons. The framing is what pushes one way or the other.
Many conservatives believe that they do treat people equally regardless of background, and would say that liberals are the ones treating people differently.
Always reminded of Scott Alexander's "Lizard Man Constant" when stuff like this comes up:
https://slatestarcodex.com/2013/04/12/noisy-poll-results-and-reptilian-muslim-climatologists-from-mars/
This does seem applicable to the topic.
What I really want to know is how well a joke answer on the math question correlates with a non-compliant answer on the 'type mildly agree' question or the 'a VCR is a good place to store a peanut butter sandwich' question. This tells us whether we can give a shibboleth question and then throw out those surveys that don't answer correctly to improve the fidelity of later questions.
On the other hand, do joke answers correlate with any other serious tendencies, such that throwing out joke answers would bias survey results in some meaningful way?
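As a toy sketch of that filtering idea (all column names, responses, and the data itself are invented for illustration), one could flag each response against the shibboleth items and only tabulate later questions for responses that pass:

```python
from collections import Counter

# Invented toy responses; 2 is the correct answer to "19 - 17 = ?".
rows = [
    {"math": 2, "vcr_sandwich": "strongly disagree", "later_q": "agree"},
    {"math": 4, "vcr_sandwich": "strongly agree",    "later_q": "disagree"},
    {"math": 2, "vcr_sandwich": "disagree",          "later_q": "agree"},
    {"math": 2, "vcr_sandwich": "strongly agree",    "later_q": "disagree"},
]

def passes_shibboleths(row):
    """Keep a response only if it gets the easy math item right and
    does not endorse storing a sandwich in a VCR."""
    return (row["math"] == 2
            and row["vcr_sandwich"] not in {"agree", "strongly agree"})

kept = [r for r in rows if passes_shibboleths(r)]
print(Counter(r["later_q"] for r in kept))  # → Counter({'agree': 2})
```

Whether dropping the failures improves fidelity or biases the sample is exactly the open question raised above.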
C’mon, you can’t ask for “1 + 1” or “19 - 17” and not provide the answers “11” and “That’s a range, not a number”, respectively.
Legitimately read 19 - 17 as a range. Leapt through some logic about primes and oddness to get 3. "19-17", it's cleaner.
I'm not sure if you were worried about this in-survey, but in survey it says "19-17=?" and the answers are 1, 2, 3, and 4. I think this is probably pretty clear?
I don't recall seeing the "=", did both the twitter and survey versions have that? Either way, it was a dumb moment and comes out as noise.
ah you're right that I didn't include this in the twitter polls. I did double-check, and I def included them in the survey tho!
Yeah, I think that even without the equals it should be clear to most people, I just had a moment.
Well, both of those have an ASCII hyphen-minus. The subtraction should be “19 − 17”, and the range—in ordinary text, not in a mathematical formula—“19–17”.
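For the curious, the three look-alike characters can be told apart by code point; a quick stdlib-only check:

```python
import unicodedata

# The three look-alikes: ASCII hyphen-minus, the true minus sign,
# and the en dash used for ranges in running text.
for ch in ["-", "\u2212", "\u2013"]:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# → U+002D  HYPHEN-MINUS
# → U+2212  MINUS SIGN
# → U+2013  EN DASH
```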
By the way, it’d be interesting if the problems with a “÷” had “6” as an answer, for those who mistook it for a “+”. Mistaking it for a “−” doesn’t matter, because the result is still 2.
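The arithmetic behind that suggestion checks out; taking the "(3³ − 23) ÷ 2" example mentioned below, a "÷" misread as "+" gives 6, while a "÷" misread as "−" still lands on 2:

```python
a = 3**3 - 23      # 27 - 23 = 4
print(a / 2)       # as written: 2.0
print(a + 2)       # "÷" misread as "+": 6
print(a - 2)       # "÷" misread as "−": 2
```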
“(3³ − 23) ÷ 2” definitely needs a footnote 3: <https://xkcd.com/1184/>.
Nice, really enjoyed this. One thing to consider: How much of this is measurement error (e.g. misclicks)?
Context: In political science, scholars have long struggled with explaining why individuals' responses to the same policy question often differ between waves. One common explanation for this is measurement error (people misunderstand the question, accidentally click on the wrong one, or in the old days the response was incorrectly recorded by the pollster on the phone). That's why social scientists like to use multi-item scales to improve robustness (e.g. see https://www.cambridge.org/core/journals/american-political-science-review/article/strength-of-issues-using-multiple-measures-to-gauge-preference-stability-ideological-constraint-and-issue-voting/16F6AF97F7B71AA0112EC9ADF78B553A)
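A minimal sketch of the multi-item-scale idea: combine several items tapping the same attitude and check their internal consistency, e.g. with Cronbach's alpha (the scores below are invented toy data, not from any real survey):

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of respondent rows (one list of
    item scores per respondent):
    alpha = k/(k-1) * (1 - sum(item variances) / variance of totals)"""
    k = len(items[0])
    cols = list(zip(*items))  # one column per item
    item_var_sum = sum(variance(col) for col in cols)
    total_var = variance([sum(row) for row in items])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Toy data: three items that mostly move together, so alpha is high.
scores = [
    [5, 4, 5],
    [2, 2, 1],
    [4, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]
print(round(cronbach_alpha(scores), 2))  # → 0.93
```

The point being that one misclick among several correlated items barely moves the scale score, whereas it flips a single-item measure entirely.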
If all of these examples of trolling were on the same survey, it would be interesting to check if the wrong/troll answers are correlated or not. In other words, are people just getting one wrong, or multiple? If it's the former, then you're probably seeing measurement error. If it's the latter, it's probably trolling as you suggest.
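One simple way to run that check (with invented 0/1 flags for "gave a wrong/troll answer") is a 2×2 cross-tab and the phi coefficient: near-zero phi points to independent slips (measurement error), while high phi points to the same people trolling across items:

```python
import math

# Invented flags: 1 = wrong/troll answer, 0 = compliant answer.
wrong_a = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
wrong_b = [0, 0, 1, 0, 1, 0, 1, 1, 0, 0]

n = len(wrong_a)
n11 = sum(a and b for a, b in zip(wrong_a, wrong_b))          # wrong on both
n10 = sum(a and not b for a, b in zip(wrong_a, wrong_b))      # wrong on A only
n01 = sum(b and not a for a, b in zip(wrong_a, wrong_b))      # wrong on B only
n00 = n - n11 - n10 - n01                                     # wrong on neither

# Phi coefficient: Pearson correlation for two binary variables.
phi = (n11 * n00 - n10 * n01) / math.sqrt(
    (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
)
print(n11, round(phi, 2))  # → 3 0.8
```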
Philosophy 😁🎉
I would like to see the p values for
H: people answering polls on twitter behave differently from people answering questions on surveys
on these various questions
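For two independent samples like this, the standard tool would be a pooled two-proportion z-test; a stdlib-only sketch with invented counts (not the article's actual numbers):

```python
from math import erf, sqrt

def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value for H0: the two proportions are equal,
    via the pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                      # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))     # pooled standard error
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Invented counts: 70% correct of 2000 Twitter votes vs 97% correct
# of 500 survey responses -- a gap this large at these sample sizes
# is wildly significant.
print(two_proportion_p(1400, 2000, 485, 500))
```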
I'd love to see one where you set the context that one of the easier questions will be used to weed out bots and trolls and that their answers will be omitted from the data set. I bet that has a big effect on the outcome.
Well it's hard to know, right? Like it seems a certain subset of person is likely to troll on questions that are prone to get trolled on, but the troll nature seems to disappear in other questions.
I *do* usually remove responses that have troll behavior anywhere just to be safe, but I do wonder if I'm being overly conservative here.
Nah... If you have good reason to expect it, I'd remove them too. But I'm all about clean data, so I'm probably biased.
This definitely feels like the type of thing worth publishing officially by teaming up with a credentialed co-author (mostly just to get publishable credentials associated with the work, so as to facilitate publication).
I think the explanation here is simple: this is an equally-honest population to the survey-takers, but they're *different people.* Run a similar survey with a different disguise and you'll get different results. Well, partially different - 'horny' will still be higher, because you're targeting a population at a time when they're more likely to be horny than when scrolling on X or being paid to do surveys. (...You may browse X horny more often than the median user.)
I suggest reskinning this as 'What should your Fursona be?' and running it again; I predict the 'horny' effect will stay (and probably get bigger), but the other two will vanish, and 2-4 new things will be anomalously high or low.
Before seeing the results for the last questions, I flagged three as 'seems likely.' Horny, 'Slightly agree', and pride. Technically one of those was right, but my logic was only partially accurate even for that one, so I'm not sure I count it.
How I got to this conclusion:
People taking a survey for fun to pick a spirit animal are:
- not working (survey-takers) nor at work (Twitter, frequently), and so more likely to *be* horny
- the kind of person who takes a survey to tell them their spirit animal
There's more than one type of person that would say that 'Women have a purity that men lack', and some of them are e.g. conservative Christians, but people very into pop-spirituality (and particularly the surface-level, vaguely un-PC 'spirit animal' type) are unusually likely to be women and agree with this. It's a signal of femininity and pop-spirituality to be taking the survey at all.
Advertising isn't as obvious, but I think it makes more sense when you consider the most glaring thing that *didn't* change alongside female purity: 'women are too easily offended'. That's a sign that (unless this is just noise) this isn't just femininity, it's a *specific* subculture. And I don't think it's hard to draw a circle around a crunchy hippy, woo-y subculture that is proud not to pay attention to advertising (truthfully or not), thinks women are purer/more natural, and thinks spirit animals are a neat idea.
The specific follow-up could use workshopping. Fursona seems easy to adapt and targeting a very different audience (more male or at least AMAB, more gay, more kinky, more educated). Some other disguise could work more effectively at hitting a more neutral or more Twitter-like population. Is there some extremely vague personality typing system that appeals to wordcels more and would let you mix in all of these? IDK.
Thanks for the interesting read!
Try adding a pair of eyes to the math questions in the X poll so responders feel like they are being watched. I know it sounds hocus-pocus, but I’m reading Thinking, Fast and Slow by Daniel Kahneman and it turns out we are way more sincere when we feel observed. It’s surprisingly easy to trick the brain—specifically the 'System 1' part—into behaving nicely just by using that visual cue.
My bet is that the proportion of correct answers would increase for the easy questions specifically, where people like to troll. So basically it could close the gap between the X poll and the real survey. I'm really curious if it would work, I hope you try it!
Also, instead of a bar plot for the math data, you could try using a spider/radar chart. I think it would look nice and make it even easier to interpret the percentages of correct answers. I wrote a quick guide about them here:
https://scientistmom1.substack.com/p/is-an-80-score-a-failure-or-a-triumph
Does the category of "non-consent fantasy where you're the aggressor" include things like "they're high on sex pollen that's made them too horny to say no"?
My guess is that your survey takers don't consider "yes, I'm into non-consent fantasies where I'm the aggressor" to be a stigmatized or taboo answer, because they know that you yourself are into CNC and aggressors. Your survey takers are likely to be people who admire you, or at least wish to aggress their way into your pants, so those answers are more likely to be evidence of them trying to give you what they imagine you'd find a desirable response.
My secondary theory is that users of X generally have a much higher proportion of jackasses and pains in the asses and trolls than likely any other site for a survey.
It feels so good to read such coherent thoughts
Have you ever calibrated against "Sex in America, a definitive survey"? (It's on the Internet Archive, if you need a copy.) They were more careful about methodology than anyone before or since. They selected random areas in the US, then random people in those areas. They sent questionnaires. There were phone interviews. There were in-person visits by trained interviewers. At the end they paid people to finish the process. So they got a very high response rate from a non self selected sample. Sample size around 3,000 people.
How does that compare with your survey results?