


People tell me that, as a female scientist, I need to stand up for myself if I want to succeed: Lean in, close the confidence gap, fight for tenure. Being a woman in science means knowing that the odds are both against you being there in the first place and against you staying there. Some of this is due to bias; women are less likely to be hired by science faculty, to be chosen for mathematical tasks and to have their papers deemed high quality. But there are also other barriers to success. Female scientists spend more time rearing children and work at institutions with fewer resources.

One measure of how female scientists are faring is how many papers they write. Papers are the coin of academic science, like court victories to lawyers or hits to baseball players. A widely read paper could earn a scientist tenure or a grant. Papers map money, power and professional connections, and that means we can use them to map where female scientists are succeeding and where inequality prevails.

To this end, I downloaded and statistically analyzed 938,301 scientific papers from the arXiv, a website where physicists, mathematicians and other scientists often post their papers. I inferred the authors’ gender from their first names, using a names list of 40,000 international names classified by native speakers.[1] Women’s representation on the arXiv has increased significantly over the 23 years my data set covers:


But I wanted to see not only how many papers women wrote, but also on how many they earned the coveted positions of first author (indicating the scientist primarily responsible for the paper) and last author (indicating the senior scientist who supervised the work). The news is both good and bad. When a female scientist writes a paper, she is more likely to be first author than the average author on that paper. But she is less likely to be last author, writes far fewer papers and is especially unlikely to publish papers on her own. Because she writes fewer papers, she ends up more isolated in the network of scientists, with additional consequences for her career.

The average male scientist authors 45 percent more papers than the average female scientist;[2] he authors more than twice as many solo papers, on which he is the only author. (Solo papers can look particularly impressive because the scientist gets all the credit for the work.) Among multi-author papers for which every author’s gender is identifiable, papers with all male authors are 60 times as common as papers with all female authors, and twice as common as papers with any female author.[3]


As a consequence, women end up at the fringes of the scientific world. We can consider two scientists “connected” if they’ve collaborated on a paper, but even though women tend to work on papers with more authors, they have significantly fewer collaborators and are significantly less central to the overall community of people publishing scientific papers.[4] This social isolation matters because of nepotism: Being friends with a scientist who reviews a paper, grant or job application can provide a crucial bonus.
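To make "connected" and "central" concrete, here is a minimal Python sketch that builds a co-authorship network from author lists and scores each scientist by betweenness centrality, the measure named in footnote 4. The function names and the toy author lists are hypothetical, and the scoring uses Brandes' standard algorithm for unweighted graphs; it illustrates the idea and is not the code behind the analysis.

```python
from collections import defaultdict, deque
from itertools import combinations

def coauthor_graph(papers):
    """Link every pair of authors who share at least one paper."""
    graph = defaultdict(set)
    for authors in papers:
        for a in authors:
            graph[a]  # touch the key so solo authors appear as isolated nodes
        for a, b in combinations(set(authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted graph)."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        order = []                          # nodes in order of distance from source
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        n_paths = dict.fromkeys(graph, 0)   # number of shortest paths from source
        n_paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:                        # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = dict.fromkeys(graph, 0.0)
        for w in reversed(order):           # accumulate from the farthest nodes back
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}

# Toy data: a chain of collaborations Ana - Bo - Cy - Di.
papers = [["Ana", "Bo"], ["Bo", "Cy"], ["Cy", "Di"]]
graph = coauthor_graph(papers)
scores = betweenness(graph)
collaborators = {name: len(neighbors) for name, neighbors in graph.items()}
```

In the toy chain, Bo and Cy sit on the shortest paths between other people and so score higher than Ana and Di, who each have a single collaborator. A scientist with few co-authors tends to end up in the position of Ana or Di.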

One female scientist I spoke with suggested that women may appear on fewer papers because their contributions are often ignored. “Some men get added to papers even if their contribution was cosmetic, yet women who contributed ideas (and perhaps even writing or data) are left out,” said the woman, who blogs pseudonymously as Female Science Professor.

Maria Mateen, a friend of mine and a psychology researcher at Stanford, offered another explanation for why men write more papers: They are more likely to be “principal investigators” (PIs), senior researchers who run their own labs. In many fields, PIs get their names on papers by default, usually as last author, because they provide funding or resources for the scientists who do most of the work. When I identified PIs in my data set (scientists who were last authors on at least three papers with four or more authors), they were indeed less likely to be women: 12 percent of PIs were women, as opposed to 17 percent of scientists overall. And these PIs wrote far more papers and more first-author papers as well. But though this effect may partially explain the gender discrepancy in publication counts, it probably does not fully explain it: When we compare male PIs to female ones, or male non-PIs to female non-PIs, the men still have more papers.
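The PI rule described above (last author on at least three papers that each have four or more authors) is simple enough to sketch in a few lines of Python. The function name, parameter names and toy author lists below are hypothetical, and the sketch assumes each paper is an ordered list of author names that are already disambiguated.

```python
from collections import Counter

def find_pis(papers, min_last_author_papers=3, min_authors=4):
    """Return authors who were last author on at least `min_last_author_papers`
    papers that each list `min_authors` or more authors."""
    last_author_counts = Counter(
        authors[-1] for authors in papers if len(authors) >= min_authors
    )
    return {
        author
        for author, count in last_author_counts.items()
        if count >= min_last_author_papers
    }

# Toy data: each paper is its author list in byline order.
papers = [
    ["A", "B", "C", "Lee"],
    ["D", "E", "F", "G", "Lee"],
    ["A", "C", "E", "Lee"],
    ["A", "B", "Kim"],        # only three authors, so it does not count
    ["B", "C", "D", "Kim"],   # Kim has one qualifying paper, below the threshold
]
pis = find_pis(papers)
```

Here only Lee qualifies. Both thresholds are keyword arguments so that the sensitivity of the PI share to the cutoff can be checked.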

Women might compensate for writing fewer papers by more frequently ending up as first author on the papers they do write. Of the 938,301 papers, 200,485 had multiple authors whose gender I could discern, and of these 56,765, or 28.3 percent, had at least one female author.[5] Knowing that women are often less assertive and less inclined to negotiate, I expected to find that they would be pushed out of first authorship. But I found the opposite. After I discarded all papers with only a single author (for which it makes little sense to talk about first authorship) and all papers with authors listed in alphabetical order (to account for the fact that, in fields like mathematics where author order is alphabetical, being first author is no longer prestigious) I was left with 74,829 papers. Had male and female authors been equally likely to come first, there would be 9,683 papers with female-first authors; instead, there are 10,941, or 13 percent more than expected.[6] (This difference, like all differences described, is statistically significant.)
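The filtering and the expected-versus-observed comparison can be sketched as follows. The article does not spell out how the expected count was computed, so this Python sketch makes an assumption: if every author on a paper is equally likely to come first, the chance of a female first author equals the share of that paper's authors who are female, and the expected count is the sum of those shares. The function name and toy records are hypothetical.

```python
def first_author_stats(papers):
    """papers: lists of (last_name, gender) tuples in byline order, gender 'F' or 'M'.

    Returns (papers kept, expected female first authors, observed female first authors)
    after dropping solo papers and papers whose authors are listed alphabetically.
    """
    kept, expected, observed = 0, 0.0, 0
    for authors in papers:
        if len(authors) < 2:
            continue  # first authorship is not meaningful for a solo paper
        last_names = [name.casefold() for name, _ in authors]
        if last_names == sorted(last_names):
            continue  # alphabetical byline: author order carries no signal
        kept += 1
        n_female = sum(1 for _, gender in authors if gender == "F")
        # Assumption: under the null, each author is equally likely to be first.
        expected += n_female / len(authors)
        observed += authors[0][1] == "F"
    return kept, expected, observed

# Toy data.
papers = [
    [("Zhou", "F"), ("Adams", "M")],                                 # kept
    [("Baker", "M"), ("Cole", "F")],                                 # alphabetical, dropped
    [("Smith", "M")],                                                # solo, dropped
    [("Young", "M"), ("Diaz", "F"), ("Evans", "M"), ("Brown", "M")], # kept
]
kept, expected, observed = first_author_stats(papers)

# The article's own figures: 10,941 observed against 9,683 expected.
excess = 10941 / 9683 - 1
```

With the article's counts, `excess` is about 0.13, matching the 13 percent reported. Note that a two-author paper has a 50 percent chance of looking alphabetical by accident, which is why the filter discards so many papers.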

But remember that another coveted position on a scientific paper is last author. This often indicates the senior scientist who supervised the work. In the arXiv data set, women are 13 percent less likely to be last authors, possibly because, as noted above, they are less likely to be principal investigators in both the arXiv data set and in previous analyses.


There’s a chance that women are overrepresented as first authors only because they’re underrepresented as last authors. To address this, I looked at all papers with three or more authors and compared how often women were first author to how often they were middle author, and how often they were last author to how often they were middle author. This prevents first authorship from affecting last authorship or vice versa. The results were largely in line with what I found for the entire set: Women were overrepresented in first author positions (relative to middle author) by 8.9 percent and underrepresented in last author positions (relative to middle author) by 10.5 percent.
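One way to run this kind of comparison, sketched in Python under stated assumptions: for papers with three or more authors, compute the share of women among first, middle and last authors separately, then express the first and last shares relative to the middle share. The exact statistic used in the analysis is not given in the text, so the function name, the ratio and the toy bylines below are illustrative only.

```python
def position_shares(papers):
    """papers: lists of author genders ('F' or 'M') in byline order.

    Returns the female share among first, middle and last authors,
    using only papers with three or more authors.
    """
    counts = {"first": [0, 0], "middle": [0, 0], "last": [0, 0]}  # [female, total]
    for genders in papers:
        if len(genders) < 3:
            continue  # a paper needs at least one middle author
        for i, gender in enumerate(genders):
            if i == 0:
                position = "first"
            elif i == len(genders) - 1:
                position = "last"
            else:
                position = "middle"
            counts[position][0] += gender == "F"
            counts[position][1] += 1
    return {pos: female / total for pos, (female, total) in counts.items()}

# Toy data.
papers = [
    ["F", "M", "M"],
    ["F", "F", "M"],
    ["M", "F", "M", "M"],
    ["M", "M", "F"],
    ["F", "M"],  # skipped: no middle author
]
shares = position_shares(papers)
first_vs_middle = shares["first"] / shares["middle"] - 1
last_vs_middle = shares["last"] / shares["middle"] - 1
```

Because both ratios share the middle-author baseline, a shortage of women in the last position cannot by itself inflate the first-author figure.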

Women are more likely to be first authors in fields in which they are better represented. A paper written in a field with more female authors is more likely to have a female first author, even when we control for how many authors on the paper are women.[7] This effect is like one I observed when studying how women performed in online classes: The more women in a class, the higher the grades they earned relative to men.

This doesn’t necessarily mean that women are benefiting directly from interactions with other women, however. Perhaps the fields with the most women are somehow friendlier to women, making it easier for women to excel and end up as first author. On the other hand, I found evidence that women tend to work together. If a paper has one female author, the other authors on the paper are 35 percent more likely to be female than the share of female authors in the field overall would predict.[8] A different study found that female scientists tended to hire more women than male scientists did (and that the gap between whom elite male and female scientists hired was particularly large).

The arXiv data set goes back only 23 years and does not contain every paper in every field. Most papers on the arXiv are in math or physics, and some are in computer science, finance and biology. But there are no papers in the social sciences, and some scientists may not post papers on the arXiv.[9] Still, my conclusions are consistent with previous analyses, which have found that female academics publish fewer papers and tend to publish with other women. (One study also found that women’s papers receive fewer citations, data I did not have for the arXiv.)

I also spoke to Jevin West, a professor at the University of Washington, who studies scholarly publication and conducted a similar analysis of gender and authorship using the JSTOR archives. JSTOR, which is not freely available, contains papers back to 1545 and also includes papers from the social sciences and humanities. West said he thought the arXiv contained a fairly comprehensive collection of papers in the fields it focuses on, and our analyses agreed on several points: Women published fewer papers in the JSTOR data set as well, and they were less likely to be last authors. Curiously, the overrepresentation of women in first-author positions may be specific to the hard sciences. Although women were more likely to be first authors in fields such as ecology and molecular biology in the JSTOR data set, they were not in law or sociology.

Once we’ve identified the gender gaps, the next step is to explain them. How much of women’s underrepresentation is due to bias and how much to other factors? While it’s clear that gender bias in science exists, it’s hard to prove merely by examining publication data (though some convincing cases have been made). Other studies have shown that female scientists spend more time on non-research activities, like child-rearing and teaching, tend to work at institutions that emphasize teaching over research and are more likely to leave the workforce for family reasons. Social dynamics with male scientists may also affect female scientists detrimentally. Women also tend to cite themselves less, self-promote less, negotiate less and see smaller performance gains from competition. Wendy Cieslak, the former principal program director for nuclear weapons science and technology at Sandia National Laboratories, emphasized the importance of the confidence gap. “We often don’t recognize and accept that [it] is holding us back until much later in life, when we look back,” she said.

The intense competition in academic science, combined with the gender gap — and uncertainty about how much of that gap is due to bias — is enough to drive a female scientist a little crazy. Was I left off a paper because I’m not smart enough or because I’m female? Do I need to negotiate more forcefully to keep up with my male peers, or will doing so backfire? I once applied for a fellowship and was told, “While clearly a very smart student, applicant’s ‘confidence’ comes across as arrogance.” I wondered whether the reviewer would have written that had I been a man.

The data gives us two causes for hope. The first is that while we are far from gender parity in the sciences, we’re getting closer. As the first chart above shows, women are gaining ground in papers published and posted to the arXiv, and their representation has also increased in the JSTOR data set.

The second is that the rise of big data has made it far easier to study gender inequality. A recent Wharton School study showing that professors are less likely to respond to students who are women or minorities was made far easier by the ability to email 6,548 professors rather than enlist an army of stamp-licking graduate students. Anyone with a computer can analyze a data set in an effort to find signs of inequality. I hope that the next time I look at the arXiv, I will find more of these analyses. And I hope that more of them will be by lone women.


  1. Names lists are often used in analyzing gender, and this particular names list was used in a previous gender analysis. I discarded androgynous names like Pat. Some names are not included in the names list, meaning authors of certain nationalities may be underrepresented. Inferring gender as a binary variable (i.e. male or female) based on name also does not account for scientists whose gender does not fit a binary description or does not match that implied by their name.
  2. This is not just because women have entered scientific fields more recently and had less time to publish; it also appears when I look at each year individually. To compare the number of papers written by men and women, I looked at all papers for which I could identify the gender of any author (not just all authors). I did this to avoid a potential bias: Women tended to write papers with more authors, and it’s harder to identify the gender of all authors for papers with more authors (because there are more names to fail on), meaning that these papers disproportionately get thrown out if we look only at papers for which we can determine every author’s gender.
  3. It is worth noting that women might appear to write fewer papers in part because they are more likely to change their last names when they marry, making it look as though their papers were published by two different scientists.
  4. I use betweenness centrality as a measure of how central a scientist is.
  5. This percentage of the entire data set may seem low, but it isn’t a problem as long as the selection isn’t biased in favor of male or female names, because we still have plenty of papers from which to draw statistically significant conclusions; a previous analysis using the same names list showed no evidence of such bias.
  6. Throwing out the papers with authors listed in alphabetical order loses statistical power but should yield unbiased estimates, unless there’s some weird correlation between sex and the lexicographic order of last name. Just in case, I also ran a logistic regression in which each author on each paper was one data point, with the independent variables being the lexicographic rank of the author’s last name (relative to the paper’s other authors) and whether she was female, and the dependent variable being whether she was first author. Both these methods concluded that women were overrepresented in the first author position: The first method yielded 13 percent more female first authors than expected, and the second method yielded an odds ratio of 1.14. One could also compare the fraction of women’s first authorships to the fraction of all women’s authorships, as was done in a PLOS ONE paper. Women were again overrepresented among first authors when I replicated this method.
  7. I ran a regression in which each paper was a data point and the dependent variable was whether it had a female first author, discarding papers with only one author; the independent variables were the fraction of the paper’s authors who were female and the average fraction of authors on a paper in the field who were female.
  8. This is not just because women tend to cluster in similar fields; I saw the effect when I controlled for field.
  9. Some scientists might also have the same names and thus be counted in my analysis as a single scientist. This effect might be particularly pronounced for men, because there are more of them to overlap, and it would tend to underestimate the relative number of female authors.
