I think what you have done is fascinating overall. However, I felt this post was a bit of a tangle, and I didn't get the "so what". Could you please think about a summary that says either why what you found matters or why it interested you?

Thanks

Trans women are female. Trans men are male. Why the hell do you always group trans men with cis women and trans women with cis men in your studies? Why do you only separate nonbinary people by sex assigned at birth? Why do you use "transwomen" and "transmen", classic TERF language?

This is interesting! As a trans woman (whose data points are in your survey) it’s curious to know that as a group we’re really similar to AMAB Enbies across all of your questions. I suppose in a lot of ways that makes some sense.

Getting into the weeds, though, some of the really narrow numerical differences make the results hard to interpret. When the lowest correlation you measure is 0.89 across such a huge question set, it's hard to get an intuitive sense of what that means, or of what 0.99 across the data set really means.

What might be super interesting is to take that same matrix but, for certain sub-categories of questions or specific questions, measure *how different* the distributions of responses are. For example, take a smattering of questions you think are interesting, look at how the answers to each one are distributed across each gender, and put a number on how distinct (or not) those distributions are.

This might help the data interpretation be a little clearer, versus looking at calculated averages that may have very subtle differences.

From my cursory understanding of statistics and some Wikipedia searching, the Bhattacharyya distance (or the Bhattacharyya coefficient, if you find 0 to 1 more intuitive than 0 to infinity) is useful here, since your answers are already nicely binned into 0-7. It should be fairly easy to calculate as well, I think, and it spits out a nice number at the end.

You could double-check your implementation by looking at the question on self-identified gender. The diagonal of your matrix should be all zeros and every other cell infinity. That is because, by definition, each group's answers to that question fall into a single bucket (M, F, MtF, FtM, Enby AMAB, Enby AFAB), so any two groups' distributions are either perfectly identical or perfectly non-overlapping.
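
A minimal sketch of this in Python, assuming each group's answers to one question have already been tallied into per-bucket counts (for example, eight buckets for a 0-7 scale). It uses the standard definitions BC = sum of sqrt(p_i * q_i) and D_B = -ln(BC). All function names and the `hypothetical` counts are illustrative only, not taken from the survey's actual code or data.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(counts: Sequence[float]) -> List[float]:
    """Turn raw per-bucket answer counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("a histogram needs at least one response")
    return [c / total for c in counts]


def bhattacharyya_coefficient(counts_p: Sequence[float], counts_q: Sequence[float]) -> float:
    """BC = sum_i sqrt(p_i * q_i): 1 for identical distributions, 0 for non-overlapping ones."""
    if len(counts_p) != len(counts_q):
        raise ValueError("both histograms must use the same buckets")
    p, q = normalize(counts_p), normalize(counts_q)
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def bhattacharyya_distance(counts_p: Sequence[float], counts_q: Sequence[float]) -> float:
    """D_B = -ln(BC): 0 for identical distributions, infinity for non-overlapping ones."""
    bc = bhattacharyya_coefficient(counts_p, counts_q)
    if bc <= 0.0:
        return math.inf
    # Clamp at 0 in case floating-point rounding pushes bc a hair above 1.
    return max(0.0, -math.log(bc))


def distance_matrix(histograms: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise Bhattacharyya distances between every pair of groups for one question."""
    return {
        (a, b): bhattacharyya_distance(histograms[a], histograms[b])
        for a in histograms
        for b in histograms
    }


if __name__ == "__main__":
    # Sanity check described above: on the self-identified gender question each
    # group answers only with its own bucket, so every histogram is one-hot.
    groups = ["M", "F", "MtF", "FtM", "Enby AMAB", "Enby AFAB"]
    hypothetical = {
        g: [100 if j == i else 0 for j in range(len(groups))] for i, g in enumerate(groups)
    }
    for (a, b), d in distance_matrix(hypothetical).items():
        print(f"{a:>10} vs {b:<10} {d}")
```

Running the same `distance_matrix` on the histograms for any other question gives a per-question grid of distances that can be read directly as "how distinct are these two groups' answers here".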

Expand full comment
Mar 29, 2023 (edited)

I would be very curious to see this broken down by age of transition. Are trans women who underwent female puberty closer to cis women, or to non-cis people who transitioned later in life? This, I think, would help answer the question of whether those differences are due to socialisation or physiology.

Amazing sample you are working with. Huge props for collecting it. For the notion of similarity, I would have zero-meaned (centered) the scores on each question before taking the correlation between groups. What I mean is: if you have a 1-100 rating on something rare or very bad (e.g., "I am a serial killer"), then almost everyone will score 100. When you take the correlation, that question will tend to count for more than one where people give a range of responses and the mean sits somewhere mid-scale.

You end up getting a lot more separation between the groups, because you are comparing how each group differs from the global average. So instead of correlations between 0.9 and 1, I got the following correlations of bio_male with each group: biomale 1.00, biofemale -0.21, cisbiomale 1.00, cisbiofemale 0.02, queerbiomale 0.13, queerbiofemale -0.90, enbybiomale 0.24, enbybiofemale -0.88. It's interesting that bio_males and bio_females are only a bit different (-0.21), but queerbiofemale is basically the opposite of bio_male (-0.90), as if the identity is to be what straight males are not. (Granted, I did not drop any of the questions specifically about gender. You can see my work at https://docs.google.com/spreadsheets/d/1eaosAMYxjVTi-nUkxWZ10K3teM0X9z4NEwv13T5K7kc/edit?usp=sharing )
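
A minimal sketch of the centering step in plain Python, assuming you already have a per-question mean score for each group. The function names are hypothetical, and treating the "global average" as the respondent-weighted mean across groups (when group sizes are supplied; unweighted otherwise) is an assumption, not necessarily what the linked spreadsheet does. The toy numbers are made up purely to show the effect.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def center_by_question(
    group_means: Dict[str, Sequence[float]],
    group_sizes: Optional[Dict[str, int]] = None,
) -> Dict[str, List[float]]:
    """Subtract the global per-question mean from every group's per-question mean.

    Assumption: with group_sizes, the global mean is weighted by respondents per
    group; without it, every group counts equally.
    """
    groups = list(group_means)
    n_questions = len(group_means[groups[0]])
    weights = group_sizes or {g: 1 for g in groups}
    total = sum(weights[g] for g in groups)
    global_mean = [
        sum(weights[g] * group_means[g][q] for g in groups) / total
        for q in range(n_questions)
    ]
    return {g: [group_means[g][q] - global_mean[q] for q in range(n_questions)] for g in groups}


def group_correlations(
    group_means: Dict[str, Sequence[float]],
    centered: bool = True,
    group_sizes: Optional[Dict[str, int]] = None,
) -> Dict[Tuple[str, str], float]:
    """Correlate every pair of groups across questions, optionally after centering."""
    data = center_by_question(group_means, group_sizes) if centered else group_means
    return {(a, b): pearson(data[a], data[b]) for a in data for b in data}


if __name__ == "__main__":
    # Made-up means for two groups on four questions. Both groups sit near 95 on
    # question 1 and near 10 on question 2, which dominates the raw correlation.
    toy = {"group_a": [95, 10, 52, 48], "group_b": [93, 12, 48, 52]}
    raw = group_correlations(toy, centered=False)[("group_a", "group_b")]
    centered = group_correlations(toy)[("group_a", "group_b")]
    print(f"raw r = {raw:.3f}, centered r = {centered:.3f}")
```

On those toy numbers the raw correlation comes out around 0.995 while the centered one is -1.0: the shared "everyone answers ~95 here and ~10 there" pattern is removed, leaving only how each group deviates from the average.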

There are some more complicated notions of similarity that wouldn't be so hard to implement. For example, train a classifier to predict gender from personality. I have done this on a large (100k) sample of Big Five data and gotten ~80% accuracy. The errors are typically pretty informative when displayed as a confusion matrix: for each class, show how often it was misclassified as each other class. This has the advantage of modeling nonlinear relationships in the data, since you can use a neural net or random forest as the classifier.
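
A rough sketch of the confusion-matrix idea in plain Python. To stay dependency-free it swaps in a simple nearest-centroid classifier, which is not the neural net or random forest mentioned above and will not capture nonlinear relationships; in practice you might replace `fit_centroids`/`predict` with something like scikit-learn's `RandomForestClassifier`. `X` is assumed to hold one list of personality scores per respondent and `y` the matching gender label; both, and all function names here, are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def fit_centroids(X: Sequence[Sequence[float]], y: Sequence[str]) -> Dict[str, List[float]]:
    """Stand-in classifier: store the mean feature vector of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(centroids: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


def confusion_matrix(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Rows are true classes, columns predicted classes, as fractions of each row."""
    counts = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    matrix = {}
    for t in labels:
        row_total = sum(counts[t].values())
        matrix[t] = {p: (counts[t][p] / row_total if row_total else 0.0) for p in labels}
    return matrix


def holdout_confusion(
    X: Sequence[Sequence[float]], y: Sequence[str], test_fraction: float = 0.2, seed: int = 0
) -> Dict[str, Dict[str, float]]:
    """Train on a random split and report the confusion matrix on held-out respondents."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
    y_true = [y[i] for i in test]
    y_pred = [predict(centroids, X[i]) for i in test]
    return confusion_matrix(y_true, y_pred, sorted(set(y)))
```

Reading across a row tells you, for respondents whose true group is that row, what fraction the model assigned to each group; large off-diagonal cells point to pairs of groups the model finds hard to tell apart, which is the informative part.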

Jan 30, 2023 (edited)

You're missing the category name "cis man" in the second table.

My intuition would be that these results are dominated by gender progressivism, as women are more progressive than men and trans people are more progressive than cis people.