This document discusses software ecosystems and their complex evolution. It defines a software ecosystem as interdependent software projects that evolve together. Research analyzes ecosystems using ideas from biology and complex systems across disciplines. Ecosystems are huge networks of thousands of interdependent parts and contributors that are difficult to manage and grow superlinearly over time. They exhibit properties of complex networks like following power laws and being small worlds. Simple models can explain ecosystem growth patterns.
Software Ecosystem Evolution Insights
1. Software Ecosystem Evolution. It’s complex!
Tom Mens, COMPLEXYS Research Institute
University of Mons, Belgium
http://blog.christianposta.com/images/disorganized.png
2. Software Ecosystems
A software ecosystem is a collection
of [interdependent] software
projects that are developed and
evolve together in the same
environment. Mircea Lungu
(PhD, 2008)
4. Interdisciplinary research
“Many challenges we face are not solvable by people
remaining in their single discipline silos”…
www.newscientist.com/article/mg20928002-100-open-your-mind-to-interdisciplinary-research/
5. Interdisciplinary research
Ecological Studies of Open Source Software Ecosystems
• Use ideas from biological ecology to understand
and improve evolution of software ecosystems
• Ongoing research project (2012-2017)
in collaboration with Prof. Philippe Grosjean
COMPLEXYS Research Institute of UMONS
• Use ideas from complex systems research
across different scientific disciplines.
10. > 317K packages
> 728K dependencies
in June 2016
https://exploringdata.github.io/vis/npm-packages-dependencies/ (July 2013)
11. Software Ecosystems
Are inherently socio-technical
– Thousands of interdependent
software parts
– Thousands of interacting
contributors
T Mens. An ecosystemic and socio-
technical view on software maintenance
and evolution. ICSME 2016 keynote.
12. Software Ecosystems
Are difficult to manage
– Unclear structure
– Backward incompatible changes, breaking
dependencies
– Unexpected removal of software components
– Departure of key contributors
– Cascading security problems
– Nontransparent painful submission/update policies
– Violations of policies (versioning, licensing, …)
13. Software Ecosystems
Are all different
Every software ecosystem
– has specific habits, expectations, change policies
– uses specific tools
Bogart et al. How to break an API: Cost negotiation and community values in
three software ecosystems. FSE 2016
14. Software Ecosystems
Share similar topologies
– Most non-isolated packages (~90%) belong to a
single weakly-connected component
Alexandre Decan, Tom Mens, Maelick Claes:
- On the topology of package dependency networks: A comparison of programming language
ecosystems, WEA 2016
- An empirical comparison of dependency issues in OSS packaging ecosystems, SANER 2017
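The weakly-connected-component observation can be checked on any dependency list. The following is a minimal sketch using only the Python standard library; the function name and the toy dependency pairs are illustrative assumptions, not data from the cited studies.

```python
from collections import defaultdict

def largest_weak_component_share(dependencies):
    """Share of non-isolated packages in the largest weakly connected component.

    `dependencies` is an iterable of (package, required_package) pairs.
    Packages without any dependency edge never appear in the input, so they
    are excluded, which matches the "non-isolated" qualifier.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Edge direction is ignored: that is what "weakly" connected means
    for source, target in dependencies:
        root_a, root_b = find(source), find(target)
        if root_a != root_b:
            parent[root_a] = root_b

    sizes = defaultdict(int)
    for node in list(parent):
        sizes[find(node)] += 1
    return max(sizes.values()) / len(parent) if parent else 0.0

# Hypothetical toy data: one component of five packages and a separate pair
toy_dependencies = [
    ("a", "b"), ("b", "c"), ("d", "c"), ("e", "a"),
    ("x", "y"),
]
share = largest_weak_component_share(toy_dependencies)
print(f"{share:.0%} of non-isolated packages are in the largest component")
```

On a real ecosystem the pairs would come from package metadata (for example the declared dependencies of each package).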
16. Mirroring hypothesis
Conway’s law
Software structure tends to mirror the
organisational/social structure
A.k.a. socio-technical congruence
alignment between technical dependencies and
social coordination in a project
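One simple way to put a number on socio-technical congruence is the fraction of coordination needs (implied by technical dependencies) that are matched by actual communication. The sketch below is a simplified measure written for illustration; the function name, the data layout, and the toy project are assumptions, and it is not claimed to be the exact metric of any cited paper.

```python
def congruence(ownership, dependencies, communications):
    """Fraction of required developer pairs that actually communicate.

    ownership:      dict mapping developer -> set of components they work on
    dependencies:   set of (component, component) technical dependencies
    communications: set of frozenset({developer, developer}) observed contacts
    """
    required = set()
    for comp_a, comp_b in dependencies:
        for dev_a, comps_a in ownership.items():
            if comp_a not in comps_a:
                continue
            for dev_b, comps_b in ownership.items():
                # Two different developers on dependent components need to coordinate
                if dev_a != dev_b and comp_b in comps_b:
                    required.add(frozenset((dev_a, dev_b)))
    if not required:
        return 1.0  # nothing to coordinate, so nothing is missing
    return len(required & communications) / len(required)

# Hypothetical toy project
ownership = {"ann": {"parser"}, "bob": {"lexer"}, "cy": {"cli"}}
dependencies = {("parser", "lexer"), ("cli", "parser")}
communications = {frozenset(("ann", "bob"))}

score = congruence(ownership, dependencies, communications)
print(f"congruence = {score:.2f}")
```

A value of 1.0 means every coordination need is covered; lower values indicate dependencies that cross developer boundaries without matching communication.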
17. Mirroring hypothesis
Conway’s law
• Evidence in favor: commercial “in-house” development
MacCormack et al. “Exploring the duality between product and
organizational architectures: A test of the mirroring hypothesis.” Research
Policy, 2012.
18. Mirroring hypothesis
Conway’s law
• Evidence against: “community-based” development
Colfer et al. “The mirroring hypothesis: Theory, evidence and
exceptions.” Harvard Business School, 2010.
More modular software with emergent structure?
reminiscent of “complex systems”?
Syeed et al. “Socio-technical congruence in the Ruby ecosystem.”
OpenSym, 2014.
19. Complex Systems
“A new approach to science that investigates how
relationships between parts give rise to the
collective behaviors of a system and how the
system interacts and forms relationships with its
environment.”
Emergence: process whereby larger entities,
patterns, and regularities arise through interactions
among smaller or simpler entities that themselves
do not exhibit such properties.
22. Complex Networks
Citation from Melanie Mitchell:
“network thinking is providing novel ways to think
about difficult problems such as how to do efficient
search on the Web, […] how to manage large
organisations, how to preserve ecosystems, […] and,
more generally, what kind of resilience and
vulnerabilities are intrinsic to natural, social, and
technological networks, and how to exploit and
protect such systems.”
23. Complex Networks
Examples of complex technological networks
– World-Wide Web
– Software dependency graphs
– Social networks (e.g. Facebook)
– Socio-technical software ecosystems
25. Complex Networks
Small-world property
• Like random networks but …
• Low average path length
between any two nodes
– 6 degrees of separation
• High clustering coefficient
• Clusters of components
linked through “hubs”
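The two quantities behind the small-world property, average path length and clustering coefficient, are straightforward to compute. The following is a minimal standard-library sketch on a toy ring lattice; the graph, the shortcut edges, and all function names are illustrative assumptions.

```python
from collections import deque
from itertools import combinations

def average_path_length(adjacency):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for start in adjacency:
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs if pairs else 0.0

def average_clustering(adjacency):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    coefficients = []
    for neighbours in adjacency.values():
        k = len(neighbours)
        if k < 2:
            coefficients.append(0.0)
            continue
        # Count how many pairs of neighbours are themselves linked
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
        coefficients.append(2 * links / (k * (k - 1)))
    return sum(coefficients) / len(coefficients)

def ring_lattice(n, k):
    """Undirected ring where each node links to its k nearest neighbours per side."""
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency

lattice = ring_lattice(20, 2)
lattice_path = average_path_length(lattice)
lattice_clustering = average_clustering(lattice)

# Add two long-range shortcuts to the same lattice
shortcut = ring_lattice(20, 2)
for a, b in [(0, 10), (5, 15)]:
    shortcut[a].add(b)
    shortcut[b].add(a)
shortcut_path = average_path_length(shortcut)

print(f"lattice:   path length {lattice_path:.2f}, clustering {lattice_clustering:.2f}")
print(f"shortcuts: path length {shortcut_path:.2f}, clustering {average_clustering(shortcut):.2f}")
```

The comparison illustrates the idea: a few long-range links shorten paths noticeably while local clustering stays comparatively high.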
26. Complex Networks
Example of small-world property
• Bugzilla collaboration networks in large OSS projects
M. Zanetti, E. Sarigol, I. Scholtes, C. Tessone, F. Schweitzer. A quantitative study
of social organisation in open source software communities, 2012
27. Complex Networks
Skewed distributions (power law behaviour)
• Few nodes with very high in (resp. out) degree
• Many nodes with very small in (resp. out) degree
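A skewed degree distribution can be made visible by counting in- and out-degrees and asking how much of the total the top few nodes hold. The sketch below uses a made-up dependency list; the package names and helper names are assumptions for illustration only.

```python
from collections import Counter

def degree_counts(dependencies):
    """In- and out-degree per node for (package, required_package) pairs."""
    in_degree, out_degree = Counter(), Counter()
    for source, target in dependencies:
        out_degree[source] += 1
        in_degree[target] += 1
        # Register both endpoints in both counters so zero-degree nodes are kept
        in_degree[source] += 0
        out_degree[target] += 0
    return in_degree, out_degree

def top_share(degrees, fraction=0.1):
    """Share of the total degree held by the top `fraction` of nodes."""
    ranked = sorted(degrees.values(), reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)

# Hypothetical toy ecosystem: ten packages depend on "core", two also on "util"
toy_dependencies = [(f"pkg{i}", "core") for i in range(10)]
toy_dependencies += [("pkg0", "util"), ("pkg1", "util")]

in_degree, out_degree = degree_counts(toy_dependencies)
histogram = Counter(in_degree.values())  # in-degree value -> number of nodes

print("in-degree histogram:", dict(sorted(histogram.items())))
print(f"top 10% of nodes receive {top_share(in_degree):.0%} of all dependencies")
```

Plotting such a histogram on log-log axes is the usual first visual check for power-law behaviour, although a straight line on that plot is not proof on its own.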
28. Complex Networks
Scale-freeness
• Observed degree distribution is very similar
regardless of the scale of the observation
Scale-free networks are more resilient to changes
– Robust to deletion of random (non-hub) nodes
– Vulnerable to the deletion of hubs
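The contrast between random deletion and hub deletion can be illustrated with a small simulation. The sketch below uses a hand-built hub-and-spoke network rather than a genuinely scale-free one, purely to keep the example short; the network, the seed, and the function names are all assumptions.

```python
import random

def largest_component_size(adjacency, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen, best = set(removed), 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best

def hub_and_spoke(hubs, leaves_per_hub):
    """Toy network: fully linked hubs, each with its own set of leaf nodes."""
    adjacency = {}

    def link(a, b):
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    names = [f"hub{i}" for i in range(hubs)]
    for i, hub in enumerate(names):
        for other in names[i + 1:]:
            link(hub, other)
        for j in range(leaves_per_hub):
            link(hub, f"{hub}-leaf{j}")
    return adjacency

network = hub_and_spoke(3, 10)
intact = largest_component_size(network, [])

# Targeted deletion: remove the three highest-degree nodes
by_degree = sorted(network, key=lambda n: len(network[n]), reverse=True)
after_hub_removal = largest_component_size(network, by_degree[:3])

# Random deletion: remove three random nodes, averaged over many trials
rng = random.Random(42)
trials = [
    largest_component_size(network, rng.sample(sorted(network), 3))
    for _ in range(200)
]
random_average = sum(trials) / len(trials)

print(f"intact: {intact}, after hub removal: {after_hub_removal}, "
      f"after random removal (mean): {random_average:.1f}")
```

The same measurement applied to a real dependency or collaboration network gives a rough estimate of how much the ecosystem relies on its key components or key people.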
29. Complex Networks
Possible applications for SECOs
• Provide prediction/forecasting models
– of how SECOs emerge
– of how SECOs grow
• Estimate SECO resilience after major disturbances
• Assess risk of deleting hub nodes
(key components or key people) bus factor!
30. How do SECOs grow?
Popular model: preferential attachment
Barabasi et al. Emergence of Scaling in Random Networks.
Science 286, 1999
31. How do SECOs grow?
Popular model: preferential attachment
Reasons for preferential attachment
• Popularity
“the rich get richer”
• Quality
“the good get better”
• Mixed
those reaching critical mass first will become stars
Barabasi et al. Emergence of Scaling in Random Networks.
Science 286, 1999
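Preferential attachment is simple to simulate: each new node links to existing nodes with probability proportional to their current degree. The following is a minimal sketch of that growth rule; the parameter names, the seed, and the choice of a small fully connected starting core are assumptions made for this illustration.

```python
import random
from collections import Counter

def preferential_attachment(total_nodes, links_per_node, seed=0):
    """Grow a network where new nodes prefer already well-connected nodes."""
    rng = random.Random(seed)
    m = links_per_node
    # Start from a small fully connected core of m + 1 nodes
    edges = [(a, b) for a in range(m + 1) for b in range(a)]
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list selects a node with probability proportional to degree
    endpoints = [node for edge in edges for node in edge]
    for new_node in range(m + 1, total_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
    return edges

edges = preferential_attachment(total_nodes=500, links_per_node=2)
degree = Counter(node for edge in edges for node in edge)
histogram = Counter(degree.values())

print("edges:", len(edges))
print("smallest degree:", min(degree.values()), "largest degree:", max(degree.values()))
print("nodes with the smallest degree:", histogram[min(degree.values())])
```

Early nodes tend to accumulate far more links than late ones ("the rich get richer"), which produces the few-hubs, many-leaves pattern described on the preceding slides.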
32. How do SECOs grow?
Extension of preferential attachment model to
simulate growth of complex software systems
By mimicking the principle of coupling & cohesion
Li et al. Multi-Level Formation of Complex Software
Systems. Entropy 18(178), 2016
33. Simple growth models
for complex systems
• A complex system may have thousands of variables
and degrees of freedom
• Yet, some of its dynamic behaviour can be
explained surprisingly well by simple models (like
exponential, logistic, or Gompertz)
– Due to emergent organisation and properties
– Due to constraints limiting the degrees of freedom
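The three growth models named above each need only two or three parameters. The sketch below writes them in their standard textbook forms; the parameter values (initial size `n0`, rate `r`, ceiling `k`) are arbitrary assumptions chosen only to show the shapes.

```python
import math

def exponential(t, n0, r):
    """Unbounded growth: N(t) = n0 * exp(r * t)."""
    return n0 * math.exp(r * t)

def logistic(t, n0, r, k):
    """S-shaped growth towards ceiling k, symmetric around its inflection point."""
    return k / (1 + ((k - n0) / n0) * math.exp(-r * t))

def gompertz(t, n0, r, k):
    """S-shaped growth towards ceiling k, with a slower approach to the ceiling."""
    return k * math.exp(math.log(n0 / k) * math.exp(-r * t))

# Arbitrary illustrative parameters (for example, number of packages over time)
n0, r, k = 100, 0.5, 10_000

print(f"{'t':>3} {'exponential':>14} {'logistic':>10} {'gompertz':>10}")
for t in range(0, 31, 5):
    print(f"{t:>3} {exponential(t, n0, r):>14.0f} "
          f"{logistic(t, n0, r, k):>10.0f} {gompertz(t, n0, r, k):>10.0f}")
```

All three curves start at `n0`; the exponential keeps growing, while the logistic and Gompertz curves level off at `k`, which is how a constraint limiting the degrees of freedom shows up in such a model.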
34. Simple growth models
for complex systems
G. West, J. Brown. The origin of allometric scaling laws in biology from
genomes to ecosystems: towards a quantitative unifying theory of biological
structure and organization. Journal of Experimental Biology, 2005
Allometric Scaling
• Many fundamental phenomena in living systems
scale as a simple power law
35. Simple growth models
for complex systems
Allometric Scaling
Growth rate of a mammal’s
mass or size during its life time
36. Simple growth models
for complex systems
Allometric Scaling
Metabolic rate scales
as a ¾ power of mass
37. Simple growth models
for complex systems
Allometric Scaling
• Expected lifespan of mammal increases
as a ¼ power of mass
• Animal heart rate decreases
as a –¼ power of mass
• Population density in ecosystems decreases
as a –¾ power of body size
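An allometric relation has the form Y = c · M^b, so the exponent b is the slope of a straight line in log-log space. The sketch below recovers that exponent by ordinary least squares on the logarithms; the data are synthetic values generated with b = ¾ and an arbitrary prefactor, not measurements.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = c * x**b on log-log axes; returns (b, c)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    covariance = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    variance = sum((lx - mean_x) ** 2 for lx in log_x)
    exponent = covariance / variance
    prefactor = math.exp(mean_y - exponent * mean_x)
    return exponent, prefactor

# Synthetic data: masses spanning several orders of magnitude, rates = 2 * M**0.75
masses = [0.02, 0.3, 4, 70, 500, 4000]
rates = [2.0 * m ** 0.75 for m in masses]

exponent, prefactor = fit_power_law(masses, rates)
print(f"fitted exponent: {exponent:.3f}, prefactor: {prefactor:.3f}")
```

The same fit could be applied to software data, for instance growth rate against artefact size, to test whether a comparable scaling exponent appears.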
38. How do SECOs grow?
Evidence of allometric scaling in software?
– Growth rate as a function of artefact size?
(software components, individual software
systems, software ecosystems)
– Lifetime as a function of artefact size?
– …
39. Ongoing Work
What social factors affect growth, resilience of
SECOs?
• Temporary or permanent effect of joiners and
leavers?
• Impact of competing SECOs
(e.g. Ruby on Rails vs. Node.js vs Django)
• Impact of technological disruptions
(e.g. migration to git; new major release; …)
Rely on complex network theory to study these…
40. Previous Work
• Challenges in software ecosystems research. A Serebrenik, T Mens.
IWSECO-WEA 2015
• When GitHub meets CRAN: An analysis of inter-repository package
dependency problems. A Decan et al. SANER 2016
• An ecosystemic and socio-technical view on software maintenance
and evolution. T Mens. ICSME 2016 keynote
• On the topology of package dependency networks: A comparison of
three programming language ecosystems. A Decan, T Mens, M
Claes. WEA 2016
• Social and technical evolution of software ecosystems: A case study
on Rails. E Constantinou, T Mens. WEA 2016
• An empirical comparison of dependency issues in OSS packaging
ecosystems. A Decan, T Mens, M Claes. SANER 2017
• Socio-technical evolution of the Ruby ecosystem in GitHub. E
Constantinou, T Mens. SANER 2017
Editor's notes
This talk will be more visionary in nature. I do not intend to present research we are currently conducting,
but rather explore how we can rely on interdisciplinary research,
borrowing techniques and theories from biology and complex systems,
to understand and support software ecosystem evolution.
Original focus of ECOS was to borrow ideas from biology/ecology/natural ecosystems for understanding SECOs. We have opened up the idea to include ideas and techniques from other disciplines as well. I will present some of those interdisciplinary techniques later in my talk.
Definition: “in the same environment” => this environment includes the community of ecosystem contributors, the hardware and software development platform underlying the ecosystem (e.g. language-specific, operating system specific), possible interactions with other ecosystems, and many more.
(We do not focus on the business aspects of sw ecosystems in our research.)
All of these ecosystems are quite large, containing (tens of) thousands of different software components, with many interdependencies, an evolution history of many years, a large and active community of contributors.
Studying such software ecosystems can be quite challenging
Developing and maintaining components within these ecosystems can also be quite challenging.
Alexander Serebrenik => you probably all know who he is, since he is ICSME chair.
Alexandre Decan first carried out research on formal database theory, but I managed to convert him to the more practical side of SE research.
Bogdan Vasilescu obtained his PhD with Serebrenik and, after a two-year postdoc at UC Davis, has now joined CMU in Pittsburgh.
Explain that SECOs are huge in terms of number of LOC, number of contributors, duration in time, number of packages and dependencies, number of commits, …
Also explain that, compared to other research disciplines, they can still seem small, e.g. compared to research in particle physics, astronomy, or DNA analysis, where the data to be analysed is several orders of magnitude larger.
Taking into account SECO differences is important
to provide ecosystem-specific support for SECO maintenance and evolution
to generalise research findings across SECOs
Before coming to my next example of interdisciplinary research, let me again take another side track, and talk about the mirroring hypothesis, aka Conway’s law.
M. Cataldo, J. D. Herbsleb, and K. M. Carley. Socio-technical congruence: A framework for assessing the impact of technical and work dependencies on software development productivity. In Int’l Symp. Empirical Software Engineering and Measurement, pages 2–11. ACM , 2008.
Further evidence against can be found in the paper “Socio-Technical Congruence in the Ruby Ecosystem” by Syeed et al. in OpenSym 2014. (Based on an analysis of the Ruby software ecosystem.)
The behaviour of a complex system is more than the sum of its parts: the behaviour of the system as a whole cannot be understood by looking at the individual entities that compose it in isolation; it emerges from the interactions between them.
“A new approach to science that investigates how relationships between parts give rise to the collective behaviors of a system and how the system interacts and forms relationships with its environment.”
Emergence: process whereby larger entities, patterns, and regularities arise through interactions among smaller or simpler entities that themselves do not exhibit such properties.
Small-world property: Low average path length between any two nodes. Highly-clustered components linked through hubs
Skewed distributions (power law behaviour): Few nodes with very high in-degree (resp. out-degree), many nodes with very small in-degree (resp. out)
Scale-freeness: Observed degree distribution is very similar regardless of the scale of the observation
The concept of a small world was originally observed in the late 1960’s by the social psychologist Stanley Milgram.
- S. Milgram, “The Small World Problem,” Psychology Today, 2, 1967 pp. 60–67.
- J. Travers and S. Milgram, “An Experimental Study of the Small World Problem,” Sociometry, 32(4), 1969 pp. 425–443.
Bugzilla collaboration networks of several software development communities.
(Graph visualisation of the largest connected component)
Nodes represent contributors to the project, links represent collaborations between them. This picture illustrates the significant differences between highly centralized projects (left), projects with a modular or hierarchical social organisation (middle) and projects with less structured collaboration networks (right).
Robustness to deletion means that the deletion does not change the structural/topological properties of the network, which remains scale-free and small-world, with a skewed degree distribution, after the deletion…
The vulnerability to deletion of hub nodes could be linked easily to the aforementioned notions of technical and social bus factors. Hub nodes have a considerably higher bus factor, since the ecosystem/network is much more vulnerable to their deletion. This implies that managers of the (eco)system should take care to “protect” these hub nodes from getting deleted…
Examples of major disturbances:
Competing SECOs (e.g., Ruby on Rails vs Node.js vs Django)
New technologies: SVN versus Git
Major new releases, e.g. Gnome 2 to Gnome 3
Several models have been proposed that lead to scale-free networks.
A popular model is “preferential attachment” proposed in 1999 by Barabasi et al.
While this growth mechanism seems plausible, other mechanisms have been proposed. It remains an open question which mechanism actually causes the scale-free networks we can observe.
Preferential attachment has been used in software evolution research by several authors:
Valverde et al. [20] suggest that the emergence of scaling arises from a logical optimisation process.
Myers et al. [15] proposed the process of refactoring to improve the structure of existing code as a possible explanation for the emergence of scale-free networks in software.
Inspired by Darwin’s ideas of evolutionary adaptation, Venkatasubramanian et al. proposed a generic model based on network parameters such as efficiency, robustness, cost, and environmental selection pressure [21]. Using a genetic algorithm their model was able to generate different types of network structures, depending on the chosen parameters.
Li et al. [8] proposed an extended model of preferential attachment adapted to software systems, and used it to simulate growth models that mimic the well-known design principle of low coupling and high cohesion. If software developers strive towards this principle, they will naturally obtain systems containing highly cohesive components that are lowly coupled between them, reminiscent of the hubs and clusters structure.