Measuring the cost and impact of open source software innovation on GitHub
Open Source Software (OSS), as defined by the Open Source Initiative, is computer software whose source code is released under a license in which the copyright holder grants anyone the rights to study, change, and distribute the software for any purpose. OSS is developed, maintained, and extended both within and outside the private sector, through contributions from independent developers as well as from people at universities, government research institutions, businesses, and nonprofits. Examples include the Apache web server and the R statistical programming language. Despite its ubiquity and extensive use, reliable measures of the scope and impact of OSS developed outside the business sector are scarce. OSS development, a vital component of scientific activity, is not well measured in existing federal statistics on innovation. Many OSS projects are developed and maintained in free repositories such as GitHub, and the information embedded in these repositories, including the code, contributors, and development activity, is publicly available.
In this paper, we use data from GitHub, the largest such platform, with over 30 million users and developers worldwide, to obtain information about OSS projects. We collect 7.8 million project repositories along with metadata such as author, license, commits (approved code edits), and lines of code. We adopt methods from software engineering to estimate the resource cost of creating OSS: we use lines of code as the measure of effort to estimate the time spent on software development, and we calculate the monetary value using the average compensation of computer programmers from Bureau of Labor Statistics wage data, plus other costs based on national accounts methodologies. Preliminary estimates show that the resource cost of developing these OSS projects exceeds $928 billion at 2017 costs. Finally, we propose using network analysis methods developed for bibliometrics and patent analysis to study the impact of these projects and the actors in the OSS ecosystem.
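The lines-of-code-to-dollars step can be illustrated with a short sketch. This is not the authors' implementation: the abstract does not name a specific effort model, so we assume Boehm's basic COCOMO in organic mode (effort = 2.4 × KLOC^1.05 person-months). The annual wage and the overhead multiplier, which stands in for the non-wage costs from national accounts methodologies, are hypothetical parameters chosen only for illustration.

```python
# Minimal sketch: lines of code -> effort (person-months) -> dollar cost.
# ASSUMPTION: basic COCOMO (Boehm, 1981), organic mode. The paper's actual
# model and parameters may differ.

COCOMO_A = 2.4   # organic-mode coefficient (person-months per KLOC^b)
COCOMO_B = 1.05  # organic-mode exponent (> 1 means diseconomies of scale)


def effort_person_months(lines_of_code: int) -> float:
    """Estimated development effort in person-months for one repository."""
    kloc = lines_of_code / 1000
    return COCOMO_A * kloc ** COCOMO_B


def resource_cost(lines_of_code: int, annual_wage: float,
                  overhead_multiplier: float = 1.0) -> float:
    """Convert estimated effort into dollars.

    annual_wage: hypothetical average annual compensation per programmer.
    overhead_multiplier: hypothetical factor for non-wage costs
        (1.0 means wages only).
    """
    monthly_wage = annual_wage / 12
    return effort_person_months(lines_of_code) * monthly_wage * overhead_multiplier


if __name__ == "__main__":
    # Hypothetical inputs: lines of code for three repositories.
    repo_loc = [1_200, 45_000, 310_000]
    # Estimate each repository separately, then sum the costs.
    total = sum(resource_cost(loc, annual_wage=90_000, overhead_multiplier=1.5)
                for loc in repo_loc)
    print(f"Estimated total resource cost: ${total:,.0f}")
```

Because the assumed effort function is nonlinear in lines of code, estimating each repository separately and summing, as in the usage block, gives a different total than applying the model once to the pooled lines of code; the per-repository approach treats each project as an independent development effort.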