
Think big, then scrutinize: evaluating thought leadership with scientific methods

Published on Nov 18, 2024
This Pub is a Comment on
Uncovering the Benefits and Challenges of Continuous Integration Practices
Description

In 2006, Fowler and Foemmel defined ten core Continuous Integration (CI) practices that could increase the speed of software development feedback cycles and improve software quality. Since then, these practices have been widely adopted by industry and subsequent research has shown they improve software quality. However, there is poor understanding of how organizations implement these practices, of the benefits developers perceive they bring, and of the challenges developers and organizations experience in implementing them. In this paper, we discuss a multiple-case study of three small- to medium-sized companies using the recommended suite of ten CI practices. Using interviews and activity log mining, we learned that these practices are broadly implemented but how they are implemented varies depending on their perceived benefits, the context of the project, and the CI tools used by the organization. We also discovered that CI practices can create new constraints on the software process that hurt feedback cycle time. For researchers, we show that how CI is implemented varies, and thus studying CI (for example, using data mining) requires understanding these differences as important context for research studies. For practitioners, our findings reveal in-depth insights on the possible benefits and challenges from using the ten practices, and how project context matters.

Thank goodness we have thought leaders.

These are individuals who are recognized as authorities in a particular field, industry, or domain, and we trust them and follow them not only for their deep domain knowledge, but also for their creative and innovative thinking. The most influential thought leaders have the power to inspire change at massive scale, sparking industry revolutions in tooling, processes, philosophies and more.

But how do we know that these thought leaders are right? That the ideas they propose are sound? That the methods and strategies and techniques they evangelize are effective in achieving the gains they purport to achieve?

Too often, we don’t. Too often, we accept the ideas of thought leaders without holding them up to empirical scrutiny – without doing science on them, if you will.

And so, thank goodness that in addition to thought leaders, we have scientists. Like the ones who authored “Uncovering the benefits and challenges of continuous integration practices” [1]. Using a mixed-methods research design (i.e., analyzing both quantitative and qualitative data), Omar Elazhary and his colleagues investigate a set of ten core Continuous Integration (CI) practices that were proposed back in 2006 [2] by two thought leaders in the software industry:

  1. Maintain a single source repository

  2. Automate the build

  3. Make your build self-testing

  4. Everyone commits to the mainline every day

  5. Every commit should build the mainline on an integration machine

  6. Keep the build fast

  7. Test in a clone of the production environment

  8. Make it easy for anyone to get the latest executable

  9. Ensure that system state and changes are visible

  10. Automate deployment

These ten core CI practices went on to be adopted by many individuals, teams, and organizations. Not always all ten at once on any given project, of course – technical limitations and resource constraints can hinder adoption, as this paper partly shows – but the practices were discussed, accepted, and, to varying degrees, implemented by Ops and Software Engineering teams.
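To make the list a little more concrete, here is a minimal sketch – my own illustration, not anything prescribed by the paper or by Fowler and Foemmel – of how a team might wire a few of these practices into a single script that a CI server runs on every commit. The build and test commands and the ten-minute time budget are assumptions for the sake of the example.

```python
"""Hypothetical entry point a CI server might run on every commit to the mainline.

Illustrative sketch of practices 2 (automate the build), 3 (make the build
self-testing), and 6 (keep the build fast); commands and budget are assumptions.
"""
import subprocess
import sys
import time

# Illustrative build and test commands; a real project would substitute its own.
BUILD_STEPS = [
    ["python", "-m", "pip", "install", "-e", "."],  # practice 2: automate the build
    ["python", "-m", "pytest", "--maxfail=1"],      # practice 3: make the build self-testing
]
TIME_BUDGET_SECONDS = 10 * 60  # practice 6: keep the build fast (assumed 10-minute budget)


def run_build() -> int:
    started = time.monotonic()
    for step in BUILD_STEPS:
        print("running:", " ".join(step))
        result = subprocess.run(step)
        if result.returncode != 0:
            # Any failing step fails the build, giving fast feedback on the offending commit.
            return result.returncode
    elapsed = time.monotonic() - started
    if elapsed > TIME_BUDGET_SECONDS:
        print(f"build passed but took {elapsed:.0f}s, over the {TIME_BUDGET_SECONDS}s budget")
        return 1
    print(f"build passed in {elapsed:.0f}s")
    return 0


if __name__ == "__main__":
    sys.exit(run_build())
```

A CI service would invoke a script like this on an integration machine for every commit to the mainline (practices 4 and 5) – and how exactly an organization wires that up is precisely the kind of implementation detail the study finds varies from team to team.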

And yet, fifteen years later, Elazhary and his co-authors write that while there has been research studying CI’s impact on software quality, at the time this study was conducted, existing research had not yet contributed to “understanding how the ten CI practices each play a role in quality improvement or developer productivity.”

So, that’s fifteen years this list of core CI practices had been around without the individual practices being examined empirically. That’s a long time, folks. As the authors note, we lack empirical evidence around “how or if organizations implement the individual practices,” as well as developers’ perceptions of both their benefits and their implementation challenges. As software practitioners, we should be asking: Without this empirical examination, how can we feel confident that these practices are actually worth implementing?

In this paper, the authors fill this gap in the research by taking each of the ten CI practices deemed “core” by the software thought leaders who proposed them and outlining what the existing empirical evidence has to say about that practice (again, not a whole lot). They then contribute to what we do know by using evidence to elucidate how each practice is implemented and rationalized, what trade-offs software practitioners perceive, and how implementation differs between contexts.

Outside of the fact that it fills a research gap opened up by thought leadership, there’s a lot of other stuff I love about this paper:

Robust Treatment of Qualitative Data: This is a mixed-methods study, meaning both quantitative and qualitative data were used to answer the research questions. Qualitative data collection and analysis is really challenging and really time-consuming, and it is so easy to cut corners here (I know this first-hand, having just collaborated with Dr. Carol Lee and Dr. Cat Hicks on the qualitative data analysis for a study examining the effectiveness of a code review anxiety intervention). The robustness with which these researchers approached their qualitative data collection and analysis (as described in their methods section) felt outstanding to me, particularly when taken in the context of the larger body of software engineering research, which, in general and at this moment in time, does not always follow best practices in qualitative methods [3] [4] [5].

Replication Package: A replication package in the context of scientific research is a set of materials and resources provided by researchers to enable others to replicate, reproduce, or verify the results of their study. It typically includes all the data, code, scripts, and documentation necessary for an independent researcher to repeat the original analysis and check whether they obtain the same results. Replication packages are important for promoting transparency, rigor, and reproducibility in science, as well as building trust in the findings by making the research process more open and accountable.
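To make that definition a bit more concrete, here is a minimal sketch – my own illustration, with an assumed layout and assumed file names, not the actual structure of this study’s package – of how you might sanity-check a replication package checkout before attempting to reproduce an analysis.

```python
"""Hypothetical check that a replication package contains the expected artifacts.

The directory layout below is an illustrative assumption, not this study's
actual package structure.
"""
from pathlib import Path

EXPECTED = [
    "README.md",                       # documentation: how to rerun the analysis
    "data/interviews",                 # anonymized qualitative data (e.g., coded transcripts)
    "data/activity_logs",              # quantitative data mined from tooling
    "scripts/analysis.py",             # code that reproduces the reported results
    "protocols/interview_guide.pdf",   # interview materials practitioners could reuse
]


def missing_artifacts(root: str) -> list[str]:
    """Return the expected artifacts that are absent from the package checkout."""
    base = Path(root)
    return [item for item in EXPECTED if not (base / item).exists()]


if __name__ == "__main__":
    missing = missing_artifacts("replication-package")
    if missing:
        print("missing artifacts:", ", ".join(missing))
    else:
        print("package looks complete; ready to attempt reproduction")
```

Even an informal internal case study benefits from this kind of inventory: if a colleague can’t find your interview guide, your data, and your analysis scripts, they can’t check your work.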

As a software practitioner who always reads research like this with a mind toward adapting it to organizational contexts, I see replication packages as potential blueprints for how to conduct case studies within my own organization or team. I think in particular, the qualitative interview materials provided in this study’s replication package could be helpful to folks hoping to put together an informal case study:

“The interviews helped us form case report profiles that give a deeper introduction to the three organizations and how they use CI practices. The interviews further helped us answer our three research questions.”

Here, you can think about being your own scientist. The interview scripts were developed by folks with deep expertise in research methods. By replicating these methods, software practitioners can partly replicate the research in their own organizations.

These authors also include “implications for developers” call-outs, which I wish we saw more of in software engineering research. Reading empirical research is cognitively demanding as it is, without having to also answer the question, “How can we use this on our team?”

Construct Development: The authors describe future research they hope to do in order to develop a set of valid CI constructs, which is huge. Construct development is a scientific practice that ensures we’re all talking about the same thing when we talk about a thing (e.g., grit, developer experience, flow state, productivity, etc.). Developing a construct is time-consuming and requires not only deep expertise in a field or domain, but also a certain level of consensus-gathering in a community of practice, and usually, multiple steps of empirical testing, validation and iteration within the specific populations that you want to measure [6] [7].

As you read this research, I encourage you to think about other processes and paradigms in the software space that are widely accepted, and then do a little sleuthing to discover what kind of empirical research has been done to validate that process or paradigm.

And if you can’t find anything, maybe reach out to your favorite software engineering scientist to put that research gap on their radar 😀
