Tuesday, July 14, 2020

Birds of a feather: Reviewing SE research papers

A few weeks ago, I was asked by Patanamon Thongtanunam, Margaret-Anne (Peggy) Storey, Liliana Pasquale, and Jane Cleland-Huang if I could chair a "Birds of a feather" session at ICSE 2020. After a few email exchanges, we settled on the topic of "Reviewing SE research papers".

This took place on Friday, 10th July 2020 at 17:10 (UTC), on Zoom like any other session at ICSE 2020. There were more than 30 participants in the call. I had prepared a few questions to drive the discussion and emailed a few people ahead of time so that they could chime in.

We heard from Margaret-Anne (Peggy) Storey, Premkumar Devanbu, Lionel Briand, Denae Ford, Matt Dwyer, Nenad (Neno) Medvidovic, Paul Ralph and Sarah Nadi.

I asked Neno to start us off by presenting some details from the ACM Peer Review Task Force. He emphasized the need for good reviewers. The task force is currently looking at how to motivate reviewers to review papers. Some of the others chimed in on this topic at various points. Sarah brought up the question of motivation for a reviewer: the reviewing load has increased so much, and what benefit does a reviewer get from doing yet another review? One suggestion Matt made was that, as researchers, we should aim to review at least as many papers as the number of reviews our own papers receive. This is the philosophy I was taught in the various places I have studied and worked. Nonetheless, there is a problem: there are not enough reviewers, and we need to figure out how to solve that too. While discussing how to increase the number of reviewers, it was also brought up that we should be looking at how to improve review quality as well.

The first question I had prepared was: what is a good review for an SE paper (and what type of SE paper are you referring to)? Matt started off by noting that there are two parts to a review: (a) the decision and (b) the review itself. Regarding the decision, he suggested that, as good reviewers, we should take a stand and help make a decision: either we support the paper for acceptance or we do not think it is good enough to be published. Regarding the review itself, he thinks a review should provide constructive and actionable feedback so that the authors can take steps to address the issues.

The next panelist was Denae, who has published in SE venues as well as HCI venues like CHI and CSCW. She compared the PCs of the two communities: in the HCI community, students are involved in the review process much earlier and in a more official capacity. She echoed Matt's comments and reiterated that we as reviewers need to provide clear details of what needs to be improved and how we think it should be improved, especially if we are knowledgeable in the area.

The next speaker was Peggy, who talked about a "Who, What, and How" framework that she developed with Neil Ernst (who was on the call as well) and that they use when reviewing papers. The first thing they look for is who will benefit from the research. The next is the "what": what type of research contribution is it? This is where the type of research matters. Is it a descriptive paper, an exploratory one, or a solution-based contribution? Is it about understanding a problem or about addressing a problem? The last thing they look at is the "how": what methodologies were used, and is the chosen methodology appropriate for the "what" the authors are trying to answer? She also warned about reviewing a paper for which we may have the technical expertise but not the contextual knowledge. For example, we may know about the "what" but perhaps not how research is done in an industrial setting. More details can be seen here.

Prem talked about an anti-pattern he calls the drive-by review. He defines this as a "reflexive deployment of a very generic critique to a paper in a manner that is usually not well-matched to the specific context". He tied it back to Neno's earlier comment that one might write this type of review because of a lack of time. He gave examples of "cheap shots" that could be applied to any paper. Such a review also fails the requirements that Matt mentioned above: it is seldom useful or actionable. Prem then went on to show how such cheap shots can have ripple effects that impact our research area. As an example of a good review, he talked about how every good review should start with a detailed summary of the paper's goals and contributions. And if we are going to raise a generic criticism, there has to be a clear argument for how it is relevant to this particular paper. This helps the authors see whether we, as reviewers, understood their paper. Slides from Prem can be found here.

At this point, I chimed in, arguing that such drive-by reviews can negatively impact students who are submitting their first paper. Matt added that, in a talk at a doctoral symposium, he told the students that reviewing is a social contract: one should write reviews the way one expects others to review one's own work.

The next speaker was Lionel, who offered an alternative, sobering, and pragmatic point: we as a community have been discussing good reviews for decades now, so if everyone knows what a good review is, why are we still getting bad reviews? There may be a different problem that is the root cause here. In the meantime, he trains his students to cope with unfair criticism. He then went on to talk about industrial papers and how the criteria for evaluating them have to be different, although he believes that industrial and academic papers all need to be in one track since they are all research. He also mentioned that impact is very difficult for academic reviewers to assess and is hence undervalued. He also talked about the technical diversity in SE: with so many sub-areas in the field, it is difficult to get a set of reviewers who are all experts, and this problem is bigger at a general conference like ICSE/FSE. He therefore prefers the journal model, where there is more time to find expert reviewers. But due to the large number of conferences, there is now little to no time for people to accept journal review invitations. At this point, there was a suggestion from Sarah Nadi in the chat window asking about open reviews.

The next speaker was Paul. He and Romain Robbes have been working as part of an ACM SIGSOFT initiative to improve peer review. He believes the initiative is needed because there is no consensus on what a good review is in our area; there are wildly differing opinions. To build consensus, he noted, it should be a true consensus from the whole community and not just the senior people. He gave an example of what we should look for when reviewing a systematic literature review, and said that such guidelines are being developed for 9 different types of papers. Denae asked at that point how these guidelines were being developed and who was asked for feedback. This question was rooted in how the checklist/criteria will be able to support the experiences of newcomers to the community (not just Ph.D. students) as well as previously successful 'ICSE/FSE veteran' submitters (more on this below). Paul replied that 40 different people have contributed to the standards and that the first draft will be made available for the larger community to comment on (I see it as a kind of RFC). Meanwhile, in the chat, Sarah and Lionel worried about cookie-cutter papers and shopping-list reviews. A possible answer to this was that the guidelines are just that: guidelines. They should not be used as a checklist to accept or reject a paper, and they will include examples of when exceptions apply. As reviewers, we already have an implicit list that we use when reviewing a paper; that list is not based on consensus and is hard for an author to write to. Having these guidelines will help, since authors could either meet a guideline or explain why a particular aspect of it does not apply, thereby helping reviewers better understand the contribution. Neno also warned that we should not try to break the review down into categories and assign scores to each, as this has not worked in the past. One reason is that it is hard to quantify a year or two's worth of research with a few metrics.

Prem added that one of the things authors want is to be seen and heard: if the reviewer convinces the author in the review that they have read and understood the paper, then that in itself will be a good review for the authors. Lionel pointed out that we should aim to reduce the number of bad reviews, but we cannot avoid them completely. That is where an associate editor (AE) in a journal can step in and say what they want the authors to do in a revision; thus a two-layer review system can reduce the number of bad reviews that an author needs to address. A parallel discussion in the chat window at this time was: what can a PC chair or associate editor do when they receive a bad review? What if there is a power dynamic that prevents the PC chair from taking any harsh action? In a journal review, perhaps an AE can ignore the bad review or ask the author to focus on other aspects. But in a conference, how can a junior PC member, or a PC chair who is not as senior as the reviewer, call them out? I think this remains a problem.

After the talk, Denae clarified that the concept she was referring to in her question to Paul is a dimension of procedural fairness (related book): if you give people a voice in the matter, they will be more apt to adopt a set of rules. One way to do this is by having the community 'vote' on the rules, or by having the collective community contribute to the rules through a low-friction approach that encourages feedback.

YouTube link to the session: https://youtu.be/9Wup0WUvPWI


Slides from the BoF session:

Margaret-Anne (Peggy) Storey: Slides.

Premkumar Devanbu: Slides.

Other Links (suggested in the chat and after):

Neil Ernst: On appealing an editor's decision.

Denae Ford: To support the 'newly minted Ph.D.' perspective, she suggested a couple of pieces she has written on her blog that could help: dealing with paper rejections and celebrating the wins even when there are rejections.