October 9, 2017
Dear arXiv Leaders,
TL;DR: It would be great if arXiv provided a feature that allows anonymized posts — posts where personal information and comments are withheld until the authors decide to reveal them or some number of years have passed. This may help communities that use double-blind reviewing to overcome some of their struggles regarding arXiv. I elaborate below.
I am a Professor at The University of Texas at Austin and my primary research area is Computer Architecture. My community does not yet heavily use arXiv, but I know quite a few of us are increasingly interested. One thing holding some of us back is the potentially large impact arXiv postings have on the double-blind review process we uniformly use in all our conferences. I am worried that arXiv is re-introducing bias against under-represented groups into the review process.
Specifically, my community has generally been working to minimize any conscious or unconscious bias in the paper review process, and we continue to refine that process. A cornerstone of our approach is double-blind review, in which author names are withheld, usually until a paper is accepted. Bias tied to how an author's name sounds and to the author's institution has been documented and analyzed.
Posting to arXiv disrupts this process in two important ways. First, it becomes common for people to post to arXiv before submitting for review, which leaks significant information about the authors. Second, anecdotes I’ve heard from colleagues in communities that do extensively use arXiv suggest that some reviewers are significantly influenced by whether they themselves have already seen the paper on arXiv. This is an even greater concern because it not only brings back all the racial, country-of-origin, religious, and gender biases but also improves the chances of those who are already well connected.
I’ve heard of some conferences forbidding authors from first posting on arXiv, primarily because of concerns similar to those I raise above. However, I think that’s a big mistake and we should all recognize the benefits of sharing discoveries efficiently. So, finally, a proposal: “anonymized arXiv”.
The anonymized arXiv will allow authors to post and share their work without revealing their personal information and institution. Such papers will still get arXiv references to cite, still be searchable, still collect any statistics tracked by arXiv and others, and still allow authors to stake their claims. At the same time, not exposing personal information will allow those communities that are serious about eliminating bias to continue their efforts without giving up the significant benefits of using arXiv.
I suggest that arXiv itself should not make, check, or enforce any anonymity policies. Rather, arXiv should simply provide the mechanism that enables venues to do so. One policy might be that a venue will decline to review articles that are not anonymized, leaving authors to weigh when to reveal their personal information in an arXiv posting against which venues they would still be able to submit to. Authors will be able to choose to reveal their names at any time, and perhaps a default time of some small number of years would be used to automatically reveal authorship.
If I am the first to suggest this, I hope I have provided you with food for thought. If this is not the first time this has come up, well, then +1.
Why is this important?
Science is best when it is open and inclusive and when ideas are shared rapidly. Using arXiv takes care of the rapid sharing part and provides open access, but can severely hamper inclusiveness and openness to ideas. As also discussed below, arXiv posts are highly visible (a good thing) and the chance that a reviewer knows the identity of the authors is far greater than if the paper is not posted to arXiv. Knowing the identity and affiliation of the authors introduces bias to the review process (see McKinley2015 or TomkinsHeavlin2017 for examples). Consider for yourself whether, for example, you would be able to write completely fair reviews for two similar papers where one is authored by a respected team at a top research university and the second is a single-author paper from a university you’ve never heard of in a country without a strong research track record.
Why does it matter? Isn’t it just like technical reports and such?
The issue with arXiv is that it provides push alerts and is very visible on Google Scholar. This means that unblinding is far more likely when a paper is posted to arXiv than when it is posted in some other way.
Won’t arXiv object to anonymous posts?
Yes, but these posts are not anonymous. They are anonymized. All author information must be included as usual and the posts will become non-anonymized after a period of time or when de-blinded by the authors.
Do professional societies already have preprint rules that prohibit this anonymized arXiv?
Good question. We’re figuring this out, but there is already precedent for some venues to have strong preprint rules even if the professional society’s rules are lax.