
A fine-grained data set and analysis of tangling in bug fixing commits

Herbold, Steffen; Trautsch, Alexander; Ledel, Benjamin; Aghamohammadi, Alireza; Ghaleb, Taher A.; Chahal, Kuljit Kaur; Bossenmaier, Tim; Nagaria, Bhaveet; Makedonski, Philip; Ahmadabadi, Matin Nili; Szabados, Kristof; Spieker, Helge; Madeja, Matej; Hoy, Nathaniel; Lenarduzzi, Valentina; Wang, Shangwen; Rodríguez-Pérez, Gema; Colomo-Palacios, Ricardo; Verdecchia, Roberto; et al.

Abstract:

Context
Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they study not only bugs but also other concerns that are irrelevant to the study of bugs.

Objective
We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits.

Methods
We use a crowdsourcing approach for manual labeling to validate, for each line in bug fixing commits, whether the change contributes to the bug fix. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus.
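The consensus rule described above can be sketched as follows. This is a minimal illustration that assumes each changed line carries exactly four labels as plain strings. The function name `consensus_label` and label strings such as `"bugfix"` are hypothetical and are not taken from the study's tooling.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(labels: Sequence[str], votes_needed: int = 3) -> Optional[str]:
    """Return the label given by at least `votes_needed` of the four participants.

    Hypothetical helper: returns None when no label reaches the threshold,
    i.e. the line has no consensus.
    """
    if len(labels) != 4:
        raise ValueError("each line is expected to be labeled by four participants")
    # With four votes and a threshold of three, at most one label can qualify,
    # so inspecting the most frequent label is sufficient.
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= votes_needed else None


# Hypothetical label strings, for illustration only.
print(consensus_label(["bugfix", "bugfix", "bugfix", "refactoring"]))  # bugfix
print(consensus_label(["bugfix", "bugfix", "test", "test"]))           # None
```

A 2:2 split, as in the second call, is exactly the kind of line that remains without consensus.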

Results
We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to production code files, this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label, leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case.
...
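The results compare bug-fixing lines against all changed lines and against lines in production code files only. The sketch below shows that comparison on toy data. It assumes each changed line is represented by a file category and its consensus label (or `None` without consensus). The `ChangedLine` type, the `bugfix_ratio` function, the category names, and the data are all hypothetical. The sketch yields a single point estimate and does not reproduce how the study derives its lower and upper bounds.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ChangedLine:
    file_kind: str        # hypothetical categories, e.g. "production", "test", "documentation"
    label: Optional[str]  # consensus label, or None if participants did not agree


def bugfix_ratio(lines: Iterable[ChangedLine], only_production: bool = False) -> float:
    """Fraction of the selected lines whose consensus label is "bugfix".

    Lines without consensus stay in the denominator and count as non-bugfix.
    """
    selected = [ln for ln in lines if not only_production or ln.file_kind == "production"]
    if not selected:
        return 0.0
    return sum(ln.label == "bugfix" for ln in selected) / len(selected)


# Toy data, not from the study.
commit = [
    ChangedLine("production", "bugfix"),
    ChangedLine("production", "bugfix"),
    ChangedLine("production", "refactoring"),
    ChangedLine("test", "test"),
    ChangedLine("documentation", "documentation"),
    ChangedLine("test", None),
]
print(f"{bugfix_ratio(commit):.2f}")                        # 0.33
print(f"{bugfix_ratio(commit, only_production=True):.2f}")  # 0.67
```

Restricting the denominator to production code files removes test and documentation lines. This is why the ratio rises in the toy data, which matches the direction of the reported effect.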


Publisher's version
DOI: 10.5445/IR/1000150986
Published on 24 September 2022
Original publication
DOI: 10.1007/s10664-021-10083-5
Citations: Scopus 18; Web of Science 11; Dimensions 23
Affiliated institution(s) at KIT: Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Publication type: Journal article
Publication month/year: November 2022
Language: English
Identifier: ISSN 1382-3256, 1573-7616
KITopen-ID: 1000150986
Published in: Empirical Software Engineering
Publisher: Springer
Volume: 27
Issue: 6
Pages: Article no. 125
Published online in advance on 2 July 2022
Indexed in: Web of Science, Scopus, Dimensions