In an upcoming @ieeesoftware article I've conducted a longitudinal study of the authors of commits from @swheritage, analyzing 1.6 billion commits contributed by 33 million distinct authors over a period of 50 years. Short thread w/ links at the end 👇 [1/6]

Key findings are a mixture of good and bad news. The bad news is that, even at this scale, female authors are massively under-represented in contributions: male authors have contributed more than 92% of all commits over the past 50 years [2/6]

The good news is that the ratio of commits by female authors has grown steadily over the past 50 years, reaching 10% of all contributions for the first time in 2019 [3/6]

The ratio of active female authors (having contributed at least one commit per year) shows similar stable growth. Both trends are more evident and stable in the 2005-2020 period than in previous decades [4/6]

If these trends were to continue 🤞, gender diversity in contributions could increase significantly over the next decade (the extrapolation about when it will reach 50% is left as an exercise ☺) [5/6]

For the gory details, more stats, discussion, and limitations, check out the full paper. Early access version on @ieeesoftware: ; preprint: [6/6]

@zacchiro How representative is the data older than CVS, say?

The steady disappearance of women in computing since the 80s–90s is well studied, notably by Isabelle Collet (e.g., <>). How well do commit counts capture that history?

The general decline in women's participation until 2005 is also clearly visible in my data.

But the sample size is significantly smaller in early years (the corpus grows exponentially with a very stable trend, so if you go *back* 50 years it shrinks a lot).

I agree with you that in terms of (hope for) diversity, the most interesting findings are those about the most "recent" 15 years.

@zacchiro 50 years ago was ~13 years before the concept of “free software” was articulated, so that looks like an anachronism. :-)

The trend you observe in the 2005–2019 period is surely representative, and a tiny bit encouraging. Nice work!

@civodul and that's one reason why I wrote "public code" instead of "free software" :-)
