Empirical research into open source evolution – should we adjust the agenda to what seems possible?
In: Workshop on Maintenance and Evolution of Free-Libre-Open Source Software - MEFLOSS 2008, 03 Oct 2008, Beijing, China.
The quantitative study of software evolution seems to have begun in the early 1970s with the studies by Manny Lehman and Les Belady of "large program growth dynamics" in the IBM 360-370 operating system. The phenomenon studied was subsequently called software evolution. Their work led to what are now the eight Lehman laws of software evolution and to other insights such as the SPE program classification. During its first 30 or so years, the empirical study of software evolution was pursued almost exclusively by Lehman and a few other investigators, and it was limited to the analysis of metrics data from proprietary systems. Open source has opened up this topic to almost any interested researcher with Internet access, knowledge of the data extraction tools and sufficient computing power, not just to those with contacts in industry. This is because open source offers unlimited access to code and to other artifacts, and researchers are free to disclose what they find (this is not, by the way, the same as saying that open source researchers should be free from ethical considerations). Open source has triggered an increasing interest in empirical studies of software evolution, as reflected in the growing number of empirical studies of open source evolution being published. The MEFLOSS workshop also testifies to this interest. With more than 10 years' involvement in software evolution research, I ask myself whether the actual possibilities of conducting thorough and solid scientific work in this area will match our current expectations (see, for example, the impressive list of what are, to me, difficult issues in the MEFLOSS'08 call for papers). Without any doubt, there are big opportunities for research in this field (e.g. data availability). There are, however, big challenges, some of which are, in my view, not sufficiently addressed. I would go further and argue the following: research in this area is still in its infancy.
The most serious challenges need to be tackled systematically if we are one day to achieve an understanding of the evolution of software similar to that achieved in traditional engineering disciplines such as mechanical or electrical engineering.