Tag Archives: community

Contributor Sustainability Impacts Risk and Adoption of OSS Projects

I’ve spent a lot of time over the years thinking about the sustainability of open source projects and the role that contributor sustainability plays in overall project sustainability. When I was co-chair of the CNCF Contributor Strategy Technical Advisory Group, contributor sustainability came up often as a concern for CNCF projects, and the most common question was about how to get more people contributing to our projects. This is a hard problem, but there are some resources at the bottom of this post to help grow your contributor base and increase the sustainability of your open source projects.

What I think many people underestimate is how contributor sustainability is viewed through the lens of risk by companies who are deciding whether to adopt your project. It’s easy to think that your project is different. No one will leave, and the project will be wildly successful forever, but that’s not how many companies think about open source adoption. Some companies think hard about which projects to adopt, especially if those technologies are crucial for delivering solutions to their customers and would be difficult to replace if the project suddenly became unavailable. Projects with a single dominant contributor, or with contributions coming almost entirely from a single company, are going to be perceived as riskier, and companies will be less likely to adopt or use them. This is especially true given the recent wave of companies relicensing open source projects and putting them under proprietary licenses. Put in simple terms, contributor sustainability risk makes it harder to get people to adopt your open source projects.

When I was Director of Open Source Community Strategy at VMware, I would often evaluate the risks of adopting specific open source projects, especially if we were considering building commercial products that incorporated those open source technologies in ways that were critical to delivering products to our customers. Contributor sustainability played a big role in deciding whether we would adopt a project. This was especially true for projects that were more strategically important for us, and which would be hard to replace if the project became unsustainable in the future. Given the choice, we’d select projects with better contributor sustainability, which would be a lower risk for us as a company.

Just last week, I was looking at an open source project where almost all of the contributions came from employees of the company driving the project, and a single lead developer made the vast majority of the contributions and code reviews / approvals. That lead developer and their employer are single points of failure for the project. These single points of failure introduce risk for potential adopters and are likely to make people think twice before using a project. If I were a company looking for a solution, I would be unlikely to select a project that could suddenly stop receiving updates (including security updates) if something happened to the dominant contributor or the company.
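One rough way to spot these single points of failure is to measure how concentrated contributions are. The sketch below is similar in spirit to the CHAOSS Contributor Absence Factor and Elephant Factor metrics: it counts the smallest number of people, or of employers, responsible for at least half of all commits. The commit list, the names, and the 50% threshold are hypothetical choices for illustration, not data from the project described above.

```python
from collections import Counter

def absence_factor(counts, threshold=0.5):
    """Smallest number of top contributors whose combined share of all
    contributions reaches `threshold` (0.5 means half of the work)."""
    total = sum(counts.values())
    running = 0
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        running += count
        if running >= threshold * total:
            return n
    return 0  # only reached when there are no contributions at all

# Hypothetical (author, employer) pairs, one per commit.
commits = (
    [("lead_dev", "AcmeCorp")] * 6
    + [("coworker", "AcmeCorp")] * 2
    + [("volunteer", "independent"), ("contractor", "OtherCo")]
)

by_person = Counter(person for person, _ in commits)
by_org = Counter(org for _, org in commits)

print("top contributor share:", by_person.most_common(1)[0][1] / len(commits))
print("people needed for 50% of commits:", absence_factor(by_person))
print("orgs needed for 50% of commits:", absence_factor(by_org))
```

A result of 1 for either count means that a single person or a single company accounts for at least half of the project's activity.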

In summary, contributor risk stemming from a single person or a single employer makes your project riskier and less likely to be adopted.

While growing your contributor base is hard work, there are quite a few resources to help you improve contributor sustainability and gain a better understanding of how companies think about risk when adopting open source projects. Here are a few of those resources, most of which also have links to additional resources:

Update: You might also be interested in reading this follow-up post: Companies Can Mitigate Sustainability Risks

Photo by Jan Kopřiva on Unsplash

CHAOSS Data Science Working Group

When I started in the role of Director of Data Science for CHAOSS, one of the first things I did was start the Data Science Working Group (WG) as a way to build community around the data science work that many of us were already doing within the CHAOSS project. I am incredibly proud of what we’ve accomplished in less than 2 years.

Yesterday, we published a CHAOSS blog post about what we’ve been working on lately, but here are a few highlights.

We’ve released 7 Practitioner Guides: Introduction, Contributor Sustainability, Responsiveness, Organizational Participation, Security, Building Diverse Leadership, and Sunsetting an Open Source Project. I’ve covered these in more detail in 2 recent blog posts about Using CHAOSS Practitioner Guides to Improve your OSS Projects and From Data to Action: Building Healthy and Sustainable Open Source Projects.

We are also driving several research projects out of the working group. I’ve already blogged about the Relicensing and Forks research that I’ve been working on, but we also have research looking into projects that move from private ownership into a foundation, archived projects, and a collection of research taxonomies.

You can read the CHAOSS blog post to learn more!

I also wanted to remind people that like all of the CHAOSS working groups, the Data Science WG is open to everyone! All you need to join the Data Science WG is an interest in using data to understand the open source world around us. Most of our work is analysis of data, writing guides, and discussions about using metrics. You don’t need any special skills, and you don’t need to know any advanced statistics, machine learning, or AI. We’re even planning a CHAOSS Data Science Hackathon, which will be co-located with Open Source Summit North America and CHAOSScon in Denver, CO on June 26, 2025. To learn more, visit our repository, join our meetings, or reach out to us in the #wg-data-science channel in CHAOSS Slack. We hope you’ll join us!

VMware and Other Updates

I realized that I haven’t posted anything in over a year and a half here, but I’ve definitely been busy! The biggest change is that Pivotal was acquired by VMware a few months ago, and I have moved into the Open Source Program Office as Director of Open Source Community Strategy where I continue to work remotely from my flat in the UK. I love my new job, and I get to work with a bunch of really amazing people! While I haven’t been blogging here, I have written several blog posts on the VMware Open Source Blog about building community and strategy.

I’ve been doing quite a few talks at conferences and other events, including some virtual ones, on a wide variety of topics including community building, open source metrics, Kubernetes, and more. Links to presentations, along with videos where available, can be found on the speaking page.

I’m one of the rotating hosts for the new CHAOSScast podcast where we chat about a wide variety of open source metrics topics. I also wrote a post on the CHAOSS blog with a video that talks about how I’m using metrics at VMware to learn more about the health of our open source projects. If you’re as passionate about data and metrics as I am, CHAOSS is an open source community that welcomes contributors of all types, and it’s a fun group of people, so you should join us!

I’ve joined the OpenUK Board of Directors to help promote collaboration around open technologies (open source, open hardware, and open data) throughout the UK. We have weekly presentations that are free for anyone to attend every Friday, and we’re always looking for volunteers who want to help out on a wide variety of committees.

There are also a few other miscellaneous things that I’ve done recently:

I hope to see all of you around the internet, and maybe we’ll even be able to catch up in person after this silly pandemic is over!

Extracting Data from Open Source Communities

On Sunday at FOSDEM, I have a 5-minute lightning talk about extracting data from open source communities in the HPC, Big Data, Data Science devroom (slides).

Open source communities are filled with huge amounts of data just waiting to be analyzed. Getting this data into a format that can be easily used for analysis may seem intimidating at first, but there are some very useful open source tools that make this task relatively easy.

The primary tools used in this talk are the open source Metrics Grimoire tools that take data from various community sources and store it in a database where it can be easily queried and analyzed.

Tools covered:

  • CVSAnalY to gather and analyze source code repository data
  • MLStats to gather and analyze mailing list data
  • Other Metrics Grimoire tools for bug trackers, IRC, Wikis and more
  • Gource to visualize source code repository data

MLStats and CVSAnalY – Installation and data import:

It’s very easy to get started with MLStats and CVSAnalY and use them to import data from your mailing lists and code repositories.

  1. Install each tool from its source directory:
     $ python setup.py install

  2. Create a database for each tool:
     mysql> create database mlstats;
     mysql> create database cvsanaly;

  3. Import data:
     $ mlstats http://URLOFYOURLIST
     $ cvsanaly2 /path/to/repo

MLStats – Queries to extract data:

  • Top 100 subjects by message count (most replied-to threads):

    SELECT subject, COUNT(*) AS total
    FROM messages
    GROUP BY subject
    ORDER BY total DESC
    LIMIT 100;

  • Other queries:

    • # of messages from a specific person

    • # of messages per person from email domain


    • Find all messages with specific word in subject line (patch)

    • More queries
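The "Other queries" above can be tried without importing a real mailing list. This sketch runs them against a tiny in-memory SQLite database using Python's built-in sqlite3 module rather than MySQL. The `messages` and `messages_people` tables and the `type_of_recipient` column are modeled loosely on mlstats, but treat the schema, the rows, and the addresses as simplified assumptions and check them against your own mlstats database before reusing the SQL.

```python
import sqlite3

# Simplified, hypothetical stand-in for an mlstats database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (message_id TEXT PRIMARY KEY, subject TEXT);
CREATE TABLE messages_people (
    message_id TEXT, type_of_recipient TEXT, email_address TEXT
);
INSERT INTO messages VALUES
    ('m1', '[PATCH] fix iommu mapping'),
    ('m2', 'Re: [PATCH] fix iommu mapping'),
    ('m3', 'Question about build'),
    ('m4', 'Re: Question about build'),
    ('m5', 'Re: Question about build');
INSERT INTO messages_people VALUES
    ('m1', 'From', 'alice@example.com'),
    ('m2', 'From', 'bob@company.com'),
    ('m2', 'To', 'alice@example.com'),
    ('m3', 'From', 'carol@company.com'),
    ('m4', 'From', 'alice@example.com'),
    ('m5', 'From', 'bob@company.com');
""")

def messages_from(email):
    """Number of messages sent by one address (recipients are ignored)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM messages_people "
        "WHERE type_of_recipient = 'From' AND email_address = ?",
        (email,),
    ).fetchone()
    return row[0]

def messages_per_person(domain):
    """Messages sent per address within one email domain, busiest first."""
    return conn.execute(
        "SELECT email_address, COUNT(*) AS total FROM messages_people "
        "WHERE type_of_recipient = 'From' AND email_address LIKE ? "
        "GROUP BY email_address ORDER BY total DESC",
        ("%@" + domain,),
    ).fetchall()

def subjects_containing(word):
    """Subjects containing a word (SQLite's LIKE ignores ASCII case)."""
    rows = conn.execute(
        "SELECT subject FROM messages WHERE subject LIKE ? ORDER BY message_id",
        ("%" + word + "%",),
    )
    return [subject for (subject,) in rows]

print(messages_from("alice@example.com"))
print(messages_per_person("company.com"))
print(subjects_containing("patch"))
```

Filtering on the sender role matters: without it, every message a person was merely copied on would count as one they wrote.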

CVSAnalY – Queries to extract data:

  • Number of commits per person for a single email domain:

    SELECT p.name, p.email,
    COUNT(DISTINCT s.id) AS num_commits
    FROM people p
    JOIN scmlog s ON p.id = s.author_id
    WHERE p.email LIKE '%company.com'
    GROUP BY p.name, p.email
    ORDER BY num_commits DESC;

  • Other queries:

    • Top commit authors all time

    • # of commits for specific person
    • More queries
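The remaining CVSAnalY queries (top commit authors and commits for a specific person) can be sketched the same way. This example uses an in-memory SQLite database with stripped-down `people` and `scmlog` tables that contain only the columns used in the query above (`people.id`, `name`, `email` and `scmlog.id`, `author_id`); the real CVSAnalY tables have more columns, and the people and commits here are made up.

```python
import sqlite3

# Hypothetical, stripped-down CVSAnalY-style tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE scmlog (id INTEGER PRIMARY KEY, author_id INTEGER);
INSERT INTO people VALUES
    (1, 'Alice', 'alice@example.com'),
    (2, 'Bob', 'bob@company.com'),
    (3, 'Carol', 'carol@company.com');
INSERT INTO scmlog (author_id) VALUES (1), (2), (2), (3), (2);
""")

TOP_AUTHORS = """
SELECT p.name, COUNT(DISTINCT s.id) AS num_commits
FROM people p
JOIN scmlog s ON s.author_id = p.id
GROUP BY p.id, p.name
ORDER BY num_commits DESC
LIMIT ?
"""

COMMITS_FOR_PERSON = """
SELECT COUNT(DISTINCT s.id)
FROM people p
JOIN scmlog s ON s.author_id = p.id
WHERE p.email = ?
"""

def top_authors(limit=10):
    """Authors ranked by number of commits, most active first."""
    return conn.execute(TOP_AUTHORS, (limit,)).fetchall()

def commits_for(email):
    """Number of commits authored by one email address."""
    return conn.execute(COMMITS_FOR_PERSON, (email,)).fetchone()[0]

print(top_authors(3))
print(commits_for("alice@example.com"))
```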

Other Metrics Grimoire Tools:

Gource:

Gource is an amazing tool to visualize activity from your source code repositories. I did a full talk about Gource on Friday at the FLOSS Community Metrics meeting, so have a look at that blog post for details about using Gource.

Consulting Again

As most of you know, I moved to London to start working toward a PhD last January. Now that I’m off to a good start on the PhD, I find that I actually miss working, so I’m going to start consulting again.

I’ll be working part-time at The Scale Factory here in London. I’m interested in doing consulting projects related to building communities, open source, data analysis, etc. You can find all of the details on my consulting page. I’m also open to doing other types of projects.

If you are interested in getting my help for any of your projects, please email me: dawn@scalefactory.com.

Network Analysis and Community Visualizations

As usual, I’ve been neglecting my blog; however, you may notice that I finally did a little redesign using a modern template to make it more mobile-friendly and accessible, and to avoid Google’s search penalties for sites that aren’t mobile-friendly. With this fresh new design, I decided that I needed something more recent than my last post in January.

So, I thought it would be nice to talk about my presentations from OSCON and the FLOSS Community Metrics Meeting in lovely Portland, OR in July.

If you want to skip my ramblings and get right to the content, you can find all of the code, data sets, and instructions, along with links to the presentation materials on SlideShare, in my OSCON 2015 GitHub repository. UPDATE (Aug 23): The video for the OSCON portion is available now, too.

If you missed this presentation and want to see it live and in person, I’ll be doing similar talks at LinuxCon Seattle in August and LinuxCon Dublin in October. You might also be interested in reading the interview that Nicole Engard did with me on Opensource.com right before the conference, where I talked about my OSCON presentation and metrics in general.

What is Network Analysis?

The presentations both centered around network analysis, which studies relationships between units and looks for patterns and structure in those relationships. This is an oversimplified definition of network analysis, since it’s a fairly complicated discipline, so the best way to describe it is with a few examples of how people use network analysis.

  • My presentations looked at relationships and activity between people participating in an open source project.
  • It’s also used to study the relationships between organizations. Examples include looking at which companies have common people on their board of directors or to look at parent / subsidiary relationships between companies.
  • People are also using it to study animal social networks, like aggression and dominance between horses or food sharing between birds.
  • Someone at the University of Greenwich is doing historical social network analysis to look at the networks of people in medieval Scotland by using data from witness signatures on legal documents.
  • Friendship networks, work relationships, and other ways that people interact are also common examples of network analysis.

MetricsGrimoire Tools

The MetricsGrimoire is the go-to set of tools that you’ll probably want to use to gather data from your open source community and store it in a database where you can write queries to extract the information you need. In these talks, I used mlstats data, but in my research, I also make heavy use of CVSAnalY. The OSCON 2015 GitHub repository README file has more instructions, but in short, you need to install mlstats, create the database, run mlstats on your mailing list to import the data into this new mlstats database, and finally use database queries to extract the data used for this presentation. You can also use my oscon.py script from the GitHub repository to extract the data.

Static Network Visualization

I took the output from the oscon.py script and used a combination of RStudio and Visone to visualize the data and create the network. To keep the data set to a manageable size, I used data from January 2015 from one of the Linux kernel mailing lists (IOMMU). In the end, we created a network diagram showing mailing list replies between people. The people with the most replies (degree centrality) are shown with larger circles (nodes), and the number of replies between any two people is shown by bolder or lighter arrows. Again, the OSCON 2015 GitHub repository README file has all of the details and instructions for how to do this, so I won’t duplicate it here.
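The reply network described above comes down to two numbers: how many times one person replied to another (the arrow weight) and how many distinct people each person is linked to (the node size). This sketch computes both from hypothetical `(message_id, sender, in_reply_to)` rows. It is not the oscon.py script, and how you pull the reply-to information out of your own mlstats data is left as an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical rows; in_reply_to is None for a message that starts a thread.
messages = [
    ("m1", "alice", None),
    ("m2", "bob", "m1"),
    ("m3", "carol", "m1"),
    ("m4", "alice", "m2"),
    ("m5", "bob", "m4"),
    ("m6", "dave", "m3"),
]

def reply_edges(rows):
    """Count replies for each (replier, original author) pair."""
    author_of = {msg_id: sender for msg_id, sender, _ in rows}
    edges = Counter()
    for _, sender, parent in rows:
        # Skip thread starters, replies to messages outside the data set,
        # and people replying to themselves.
        if parent in author_of and author_of[parent] != sender:
            edges[(sender, author_of[parent])] += 1
    return edges

def degree(edges):
    """Number of distinct people each person exchanged replies with."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {person: len(others) for person, others in neighbours.items()}

edges = reply_edges(messages)
print(edges)          # arrow weights
print(degree(edges))  # node sizes
```

The weighted edge list is the kind of input that network tools such as Visone can import, although the exact file format each tool expects is not covered here.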

Dynamic Visualization

Gource is a tool that most people use to visualize each person’s commits to a source code repository; however, it can also be used with custom data. If you’ve never used Gource, you might want to take a brief detour and look at some of the many Gource visualizations on YouTube. I only had time in my OSCON talk to briefly cover Gource, but luckily, I was able to spend 20 minutes on the topic during the FLOSS Community Metrics Meeting the weekend before OSCON. In the presentation, I showed how to create a custom log format file using mailing list data from mlstats and feed it into Gource for visualization. See the OSCON 2015 GitHub repository README file for details about exactly how I did this.
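Gource’s custom log format is a plain-text file with one pipe-separated event per line: a Unix timestamp, a username, an action (A for added, M for modified, D for deleted), and a file path. The sketch below shows one possible mapping from mailing list posts to that format: each thread becomes a "file", the first post adds it, and replies modify it. That mapping, the `iommu` directory name, and the rows themselves are assumptions for illustration; the README mentioned above describes the approach used in the talk.

```python
from datetime import datetime, timezone

# Hypothetical mailing list rows: (sent_at, sender, thread_subject, is_reply).
rows = [
    (datetime(2015, 1, 5, 9, 30, tzinfo=timezone.utc), "bob", "fix iommu mapping", True),
    (datetime(2015, 1, 5, 9, 0, tzinfo=timezone.utc), "alice", "fix iommu mapping", False),
    (datetime(2015, 1, 6, 14, 0, tzinfo=timezone.utc), "carol", "dma question", False),
]

def to_gource_lines(rows, list_name="iommu"):
    """Format rows as Gource custom log lines: timestamp|username|type|file.
    Each thread becomes a 'file'; the first post adds it (A), replies modify it (M)."""
    lines = []
    for sent_at, sender, subject, is_reply in sorted(rows, key=lambda r: r[0]):
        # '|' separates fields and '/' would create extra directories, so replace both.
        path = f"{list_name}/{subject.replace('|', '-').replace('/', '-')}"
        action = "M" if is_reply else "A"
        lines.append(f"{int(sent_at.timestamp())}|{sender}|{action}|{path}")
    return lines

for line in to_gource_lines(rows):
    print(line)
```

Sorting by timestamp keeps the replay chronological. Save the lines to a file, one per line, and pass it to Gource with `gource --log-format custom mailing_list.log` (the file name is only an example).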

What Else?

There are many different tools available for visualizing social networks. I used Visone because it runs on most major operating systems and is fairly easy to get started with, but there are plenty of other options that you might want to play around with.

Python has quite a few packages for social network analysis, such as NetworkX. I haven’t had a chance to play with these much yet, but I know others who do quite a bit of their analysis using these tools, so they are on my list to try.

The final thing that I want to stress is that network analysis is much more than just cool graphs that let you look at your data. Visualizations are often the first step to see what might be happening in your network, but for those of us doing this type of work, they are only the beginning. The next steps usually involve many different calculations and measures to really understand what might be going on in the community. One example is how we sized each node by its degree centrality, which is simply the number of links that person had. It’s easy to explain, but it’s not a particularly sophisticated measure of network centrality, and other measures do a better job of capturing how well-connected people are, which gives you a better measure of influence. For example, if I regularly talk to 2 people within the Linux kernel community, and those people are Linus Torvalds and Greg K-H, I’m likely to be better connected within the network as a whole than if I’m talking to 10 other people with little or no influence.
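To make the Linus-and-Greg example concrete, the sketch below compares degree centrality with eigenvector centrality, one of the measures that gives more weight to links with well-connected people. It uses plain Python on a made-up network, and the node names are hypothetical. NetworkX provides `degree_centrality` and `eigenvector_centrality` functions if you would rather not write the power iteration yourself, though its degree centrality is normalized by the number of other nodes rather than being a raw count.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected graph stored as a dict of neighbour sets."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Raw number of links per person (what the node sizes showed)."""
    return {node: len(neighbours) for node, neighbours in graph.items()}

def eigenvector_centrality(graph, iterations=200):
    """Power iteration: a person's score grows with the scores of the people
    they are linked to. Iterating on (A + I) rather than A keeps the loop from
    oscillating; for a connected graph it converges to the same eigenvector."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new = {node: scores[node] + sum(scores[n] for n in graph[node])
               for node in graph}
        top = max(new.values())
        scores = {node: value / top for node, value in new.items()}
    return scores

# Hypothetical network: "linus" and "greg" are hubs linked to each other and
# to nine developers (including "well_placed"); "busy" talks to ten people,
# only one of whom (dev1) has any other connections.
edges = [("linus", "greg")]
for person in ["well_placed"] + [f"dev{i}" for i in range(1, 9)]:
    edges += [(person, "linus"), (person, "greg")]
edges.append(("busy", "dev1"))
edges += [("busy", f"newcomer{i}") for i in range(1, 10)]

graph = build_graph(edges)
deg = degree_centrality(graph)
ev = eigenvector_centrality(graph)
for name in ("well_placed", "busy"):
    print(f"{name}: degree={deg[name]}, eigenvector={ev[name]:.2f}")
```

In this made-up network, "busy" has five times as many links as "well_placed", but "well_placed" scores higher on eigenvector centrality because both of its links lead to the hubs.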

If you are interested in my academic research, I also did a presentation recently at an academic conference here in the UK. That presentation and others can be found on my Academic page.

Photo credits

OSCON photo by Luis Cañas-Díaz and the FLOSS Metrics Gource photo by Stephen Walli.

Your Metrics Strategy at FLOSS Community Metrics

I’m here in Brussels today for the FLOSS Community Metrics meeting, and I just gave a presentation about how to build Your Metrics Strategy. If you are interested, have a look at my presentation materials.

Talk description:

You probably know that community metrics are important, but how do you come up with a plan and figure out what you want to measure? Most open source projects have a very diverse community infrastructure with code repositories, IRC, mailing lists, wikis and other content sites, forums, and more. Deciding where to focus and what to measure across these many technologies can be a challenge.

What you measure can have a huge impact on behavior within the community, and you want to make sure that you are encouraging people to contribute in sane ways by measuring the activities that matter for your project.

In this presentation, I’ll talk about how you decide what to measure and give you examples of how I’ve done this at Puppet Labs and in other projects.

Photo credit: Sophie on Flickr

The Past Few Months: A Recap

Since I haven’t been blogging here on my own blog lately, I thought maybe a short post talking about what I have been doing would be interesting for at least a few people!

While neglecting this blog, I have been blogging elsewhere and have been spending a lot of time traveling and speaking at conferences. I’ve also been busy with all sorts of other work, so I’ll try to give you the short recap of my activities over the past few months.

A quick summary of a few things that I’ve been doing / blogging / whatever:

The run-down of some recent talks that I’ve given:

Lastly, a few upcoming talks:

I don’t have the patience for digging through the spam to find the legitimate comments on my blog, so comments here are disabled. However, I love feedback and you can reach out to me as @geekygirldawn on Twitter or via various other methods located in the sidebar.

Lessons about Community from Science Fiction

If you think you’ve seen this presentation before, you’re wrong! In the spirit of making sure that every talk at Monki Gras is handcrafted and unique, I prepared a completely new set of slides and lessons just for Monki Gras.

While it is probably obvious from the title, this talk focuses on community tips told through science fiction. While the topic is fun and a little silly, the lessons about communities are real and tangible. Here are just a few of the things that I explored in this presentation:

  • Borg assimilation and bringing new community members into your collective for new ideas.
  • Specialization is for insects. The best community members are the ones who can help in a wide variety of ways.
  • Community members are valuable, don’t treat them like minions.
  • Travel to strange new worlds and meet interesting people.

You can get the slides (with my speaker notes) on SlideShare.

Note: Comments are disabled on this post, since I’m tired of dealing with spam, but please ping me on Twitter, @geekygirldawn, or at the email address in the presentation if you have any questions.

What Science Fiction Can Teach Us About Building Communities

At LinuxCon North America in New Orleans and at LinuxCon Europe in Edinburgh, I presented “What Science Fiction Can Teach Us About Building Communities”.

You can download or view the presentation from Edinburgh or get the original version from New Orleans.

Description
Communities are one of the defining attributes that shape every open source project, not unlike how Asimov’s 3 laws of robotics shape the behavior of robots and provide the checks and balances that help make sure that robots and community members continue to play nicely with others. When looking at open source communities from the outside, they may seem small and well-defined until you realize that they seem much larger and more complex on the inside, and they may even have a mind of their own, not unlike the TARDIS from Doctor Who. We can even learn how we should not behave in our communities by learning more about the Rules of Acquisition and doing the opposite of what a good Ferengi would do. My favorite rules to avoid include “Greed is eternal”, “You can always buy back a lost reputation” and “When in doubt, lie”. This session focuses on tips told through science fiction.

Note: Comments are disabled on this post, since I’m tired of dealing with spam, but please ping me on Twitter, @geekygirldawn, or at the email address in the presentation if you have any questions.

Updated October 22, 2013: Added the Edinburgh information to this post, instead of creating a new post, since the version presented in Edinburgh contained only small changes from the New Orleans version.