Thank goodness we have thought leaders.
These are individuals who are recognized as authorities in a particular field, industry, or domain, and we trust and follow them not only for their deep domain knowledge, but also for their creative and innovative thinking. The most influential thought leaders have the power to inspire change at massive scale, sparking industry revolutions in tooling, processes, philosophies, and more.
But how do we know that these thought leaders are right? That the ideas they propose are sound? That the methods and strategies and techniques they evangelize actually deliver the gains they purport to achieve?
Too often, we don’t. Too often, we accept the ideas of thought leaders without holding them up to empirical scrutiny – without doing science on them, if you will.
And so, thank goodness that in addition to thought leaders, we have scientists. Like the ones who authored “Uncovering the benefits and challenges of continuous integration practices” [1]. Using a mixed-methods research design (i.e., analyzing both quantitative and qualitative data), Omar Elazhary and his colleagues investigate a set of ten core Continuous Integration (CI) practices that were proposed back in 2006 [2] by two thought leaders in the software industry:
Maintain a single source repository
Automate the build
Make your build self-testing
Everyone commits to the mainline every day
Every commit should build the mainline on an integration machine
Keep the build fast
Test in a clone of the production environment
Make it easy for anyone to get the latest executable
Ensure that system state and changes are visible
Automate deployment
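To make a few of these concrete, here is a minimal, hypothetical sketch – not taken from the paper or the original 2006 article – of what "automate the build" and "make your build self-testing" might look like as a single entry point that an integration machine runs on every mainline commit. The specific commands, the pytest test runner, and the time budget are all assumptions for illustration:

```python
#!/usr/bin/env python3
"""A hypothetical CI entry point: one script an integration machine could
run on every mainline commit. It sketches "automate the build" and
"make your build self-testing" only; the commands and the time budget are
illustrative assumptions, not taken from the paper."""

import subprocess
import sys
import time

# Every step is a plain command, so the whole build is reproducible from a
# single entry point (automate the build).
STEPS = [
    ("build", ["python", "-m", "pip", "install", "-e", "."]),
    ("test", ["python", "-m", "pytest", "-q"]),  # make your build self-testing
]

BUILD_BUDGET_SECONDS = 600  # keep the build fast (assumed threshold)


def main() -> int:
    start = time.monotonic()
    for name, cmd in STEPS:
        print(f"[ci] {name}: {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Any failing step fails the whole build, so breakage is visible
            # immediately rather than discovered later at integration time.
            print(f"[ci] step {name!r} failed with exit code {result.returncode}")
            return result.returncode
    elapsed = time.monotonic() - start
    if elapsed > BUILD_BUDGET_SECONDS:
        print(f"[ci] warning: build took {elapsed:.0f}s (budget: {BUILD_BUDGET_SECONDS}s)")
    print("[ci] build and tests passed")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Real pipelines (GitHub Actions, Jenkins, GitLab CI, and so on) express the same idea declaratively, but the contract is the same: one automated command, run on every commit, that either passes or visibly fails.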
These ten core CI practices went on to be widely noticed and adopted by people, teams, and organizations. Not always all ten simultaneously on any given project, of course – technical limitations and resource constraints can hinder adoption, as this paper partly shows – but these practices were discussed, accepted, and, to varying degrees, adopted by Ops and Software Engineering teams.
And yet, fifteen years later, Elazhary and his co-authors write that while there has been research studying CI’s impact on software quality, at the time this study was conducted, existing research had not yet contributed to “understanding how the ten CI practices each play a role in quality improvement or developer productivity.”
So, that’s fifteen years that this list of core CI practices has been around without being examined empirically. That’s a long time, folks. As the authors note, we lack empirical evidence around “how or if organizations implement the individual practices,” as well as developers’ perceptions of both their benefits and implementation challenges. As software practitioners, we should be asking: Without this empirical examination, how can we feel confident that these practices are actually worth implementing?
In this paper, the authors fill this gap in the research by taking each of the ten CI practices deemed “core” by the software thought leaders who proposed them and outlining what the existing empirical evidence has to say about that practice (again, not a whole lot). They then contribute to what we do know by using evidence to elucidate how each practice is implemented and rationalized, what trade-offs software practitioners perceive, and how implementations differ between contexts.
Beyond the fact that it fills a research gap opened up by thought leadership, there’s a lot of other stuff I love about this paper:
Robust Treatment of Qualitative Data: This is a mixed-methods study, meaning both quantitative and qualitative data were used to answer the research questions. Qualitative data collection and analysis is really challenging, and really time-consuming, and it is so easy to cut corners here (I know this first-hand, having just collaborated with Dr. Carol Lee and Dr. Cat Hicks on the qualitative data analysis for a study examining the effectiveness of a code review anxiety intervention). The robustness with which these researchers approached their qualitative data collection and analysis (as described in their methods section) felt outstanding to me, particularly when taken in the context of the larger body of software engineering research, which, in general and at this moment in time, does not always follow best practices in qualitative methods [3] [4] [5].
Replication Package: A replication package in the context of scientific research is a set of materials and resources provided by researchers to enable others to replicate, reproduce, or verify the results of their study. It typically includes all the data, code, scripts, and documentation necessary for an independent researcher to repeat the original analysis and check whether they obtain the same results. Replication packages are important for promoting transparency, rigor, and reproducibility in science, as well as building trust in the findings by making the research process more open and accountable.
As a software practitioner who always reads research like this with a mind toward adapting it to organizational contexts, I see replication packages as potential blueprints for how to conduct case studies within my own organization or team. I think in particular, the qualitative interview materials provided in this study’s replication package could be helpful to folks hoping to put together an informal case study:
“The interviews helped us form case report profiles that give a deeper introduction to the three organizations and how they use CI practices. The interviews further helped us answer our three research questions.”
Here, you can think about being your own scientist. The interview scripts were developed by folks with deep expertise in research methods. By replicating these methods, software practitioners can partly replicate the research in their own organizations.
These authors also include “implications for developers” call-outs, which I wish we saw more of in software engineering research. Reading empirical research is cognitively demanding as it is, without having to also answer the question, “How can we use this on our team?”
Construct Development: The authors describe future research they hope to do in order to develop a set of valid CI constructs, which is huge. Construct development is a scientific practice that ensures we’re all talking about the same thing when we talk about a thing (e.g., grit, developer experience, flow state, productivity). Developing a construct is time-consuming and requires not only deep expertise in a field or domain, but also a certain level of consensus-gathering in a community of practice and, usually, multiple rounds of empirical testing, validation, and iteration within the specific populations you want to measure [6] [7].
As you read this research, I encourage you to think about other processes and paradigms in the software space that are widely accepted, and then do a little sleuthing to discover what kind of empirical research has been done to validate that process or paradigm.
And if you can’t find anything, maybe reach out to your favorite software engineering scientist to put that research gap on their radar 😀