Tag Archives: metrics

Strategy Before Metrics

I’ve been involved with open source project metrics for a very long time, and people often ask me which metrics they should use, but this isn’t really a question that I can answer. What you measure and how you interpret those metrics depend on your goals and what your organization is trying to accomplish. For this blog post, I’m using “organization” loosely. It could mean your company, university, or non-profit, but it might also mean aligning with funding organizations if your OSPO or other open source efforts are funded by another organization.

The CHAOSS Practitioner Guide: Introduction – Things to Think about When Interpreting Metrics mentions:

“one of the best places to start isn’t actually with the metrics, but by spending some time understanding the overall goals for the project. If the project is primarily driven by one organization or owned by an organization, you should also consider the goals for that organization. By thinking strategically about the overall goals, you’ll be in a better place to decide what you need to measure to determine whether the project is achieving its goals. Open source projects generate a tsunami of data that can be overwhelming, but by focusing on the goals, you can develop a metrics strategy that helps you focus on the metrics that matter most for a particular project.”

It can help to ask yourself, “what is important for my organization or the project?” This often means starting with your team’s open source strategy and aligning it with your organization’s goals. The most important part of putting together strategies and plans for your open source efforts is aligning them with the overall goals for your organization. If you take the time and effort to make sure you support the overall strategy for your organization, it will be much easier to justify continuing these efforts during the next planning or funding cycle. This will also help you make the case to senior management, executives, funders, and others on the leadership team who aren’t likely to be involved in the low-level details. Explaining how your open source contributions support the goals of your organization can help the executive or leadership team understand the strategic importance of this work so that you can continue your work in open source.

Once you have a strategy defined that aligns with the strategy of your organization, then you can figure out what you need to measure to show whether you are achieving your goals. There are a couple of reasons that starting with the goals is important, since metrics can go awry if you aren’t focused on the right things.

  • You won’t know if you are successful if you aren’t measuring the right things. If you aren’t measuring the things that are important for your project or organization, you won’t know if you are making progress in the areas that you care the most about. For example, if you want to improve the performance of a particular piece of open source software, you’ll want to have success criteria and measurement based on specific types of performance. If you want to gain influence within an open source project, maybe you measure increases in contributions or the number of employees moving into positions of influence. 
  • Measurement impacts behavior, and people do different things depending on what you measure. For example, if you publish metrics that focus on the number of comments on issues, you are likely to start getting more comments on issues. If what you are really trying to do is get people to help review and approve contributions, then additional comments on issues might not help as much as looking at reviews on change requests (pull requests / merge requests). 

Once you decide on your success criteria, you need to make sure that you can get the data required to measure it and start measuring it now to get a good baseline. There are plenty of tools available to gather contribution data about open source projects. Some of the commonly used tools can be found in the CHAOSS project, but you can also likely get a pretty good sense for the data by looking at your code repositories and other communication channels. GitHub, for example, has some pretty good data under the insights tab.

After you have your metrics, you need to actually do something with them to show that you are making progress toward accomplishing your goals. Think about which metrics you should be showing to your leadership and which ones should be shared with your team and the broader community. But communicating metrics is much more than just showing some charts or graphs. You also need to interpret those metrics and tell the story about what they mean. The CHAOSS Practitioner Guides can help you think about the interpretation and how you might tell the story about what your metrics mean. Without interpretation and explanation, all you have are numbers, which are way less powerful than the story about what the data means in the context of how it helps your organization achieve its goals.

Here are a few additional links and resources to help you think about building your metrics strategy and telling the story about what the metrics mean:

Photo by Hassan Pasha on Unsplash.

A New Chapter at CHAOSS

For my regularly scheduled (once every year and a half) blog post, I wanted to announce that July 3rd is my last day at VMware, and I will be joining the CHAOSS project as their new Director of Data Science.

It was really hard to leave VMware after almost 5 years (including my time at Pivotal). The work was fun, and I worked with so many amazing people that I will miss dearly! But as many of you know, I have a deep passion for data, and in particular open source community metrics, so the opportunity to work full time on the CHAOSS project is the dream job that I just couldn’t turn down. I’ve been working in this space for 10+ years with the CHAOSS project, and before CHAOSS, I was working with Bitergia and a variety of open source tools that later evolved into the software that is now part of the CHAOSS project. I’ll be taking July off and then will be starting my role with CHAOSS in August. A big thank you to the Alfred P. Sloan Foundation for making this possible through the grant that is funding the Director of Data Science position and other CHAOSS project initiatives.

I will be continuing my work on the OpenUK Board and as co-chair for the CNCF Contributor Strategy Technical Advisory Group, which have kept me very busy in addition to my work at VMware and in my role on the CHAOSS Governing Board.

Over the past year and a half, I’ve done quite a few presentations on topics ranging from how companies can work in open source communities to open source health / metrics to leading in open source, which can be found on my Speaking page. The highlight was giving a keynote about growing your contributor base at KubeCon EU in front of an audience of 10,000+, which was amazing and terrifying at the same time! 

In addition to my world tour of conference presentations, I was quoted in a Linux Foundation Diversity Report, won a few awards for my UK work in open source as part of the OpenUK Honours list in 2021 and 2023, and I’ve written a few blog posts since my last post here on my own blog:

On the personal side, Paul and I bought a new house in November, and we have become the people who sit in their back garden and talk about how adorable the squirrels and birds are. Since we live in an area near quite a bit of green space, we have regular visits from foxes and even spotted one badger on our backyard wildlife camera! 

Since I don’t post here often, if you want to keep up with what I’ve been doing, I post occasionally on Mastodon and Instagram.

Extracting Data from Open Source Communities

On Sunday at FOSDEM, I have a 5-minute lightning talk about extracting data from open source communities in the HPC, Big Data, Data Science devroom (slides).

Open source communities are filled with huge amounts of data just waiting to be analyzed. Getting this data into a format that can be easily used for analysis may seem intimidating at first, but there are some very useful open source tools that make this task relatively easy.

The primary tools used in this talk are the open source Metrics Grimoire tools that take data from various community sources and store it in a database where it can be easily queried and analyzed.

Tools covered:

  • CVSAnalY to gather and analyze source code repository data
  • MLStats to gather and analyze mailing list data
  • Other Metrics Grimoire tools for bug trackers, IRC, Wikis and more
  • Gource to visualize source code repository data

MLStats and CVSAnalY – Installation and data import:

It’s very easy to get started with MLStats and CVSAnalY and use them to import data from your mailing lists and code repositories.

  1. Install

     $ python setup.py install

  2. Create database

     mysql> create database mlstats;
     mysql> create database cvsanaly;

  3. Import data

     $ mlstats http://URLOFYOURLIST
     $ cvsanaly2 /path/to/repo

MLStats – Queries to extract data:

  • Top 100 messages (most replied to threads):

    SELECT subject, COUNT(*) AS total
    FROM messages
    GROUP BY subject
    ORDER BY total DESC
    LIMIT 100;

  • Other queries:

    • # of messages from a specific person

    • # of messages per person from email domain


    • Find all messages with specific word in subject line (patch)

    • More queries
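To experiment with these mailing list queries without a full MLStats import, here is a minimal Python sketch that runs two of them against an in-memory SQLite database. The one-column `messages` table, the sample subjects, and the `subjects_containing` helper are made up for illustration; a real MLStats database lives in MySQL and has many more columns.

```python
import sqlite3

# Hypothetical stand-in for the MLStats `messages` table; only `subject` is modeled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (subject TEXT)")
sample_subjects = (
    ["Re: [PATCH] fix typo in README"] * 3
    + ["Re: release planning"] * 2
    + ["Question about patching"]
)
conn.executemany(
    "INSERT INTO messages (subject) VALUES (?)",
    [(subject,) for subject in sample_subjects],
)

# Most replied-to threads: the same GROUP BY query as in the list above.
top_threads = conn.execute(
    "SELECT subject, COUNT(*) AS total FROM messages "
    "GROUP BY subject ORDER BY total DESC LIMIT 100"
).fetchall()


def subjects_containing(connection, word):
    """Return the distinct subjects that contain `word` (substring match via LIKE)."""
    rows = connection.execute(
        "SELECT DISTINCT subject FROM messages WHERE subject LIKE ?",
        (f"%{word}%",),
    )
    return sorted(row[0] for row in rows)


patch_subjects = subjects_containing(conn, "patch")

for subject, total in top_threads:
    print(f"{total:3d}  {subject}")
print(patch_subjects)
```

Because LIKE matches substrings, a search for "patch" also picks up subjects such as "Question about patching", so you may want a stricter pattern like "%[PATCH]%" if you only care about patch submissions.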

CVSAnalY – Queries to extract data:

  • Number of commits per person by email domain:

    SELECT p.name, p.email,
           COUNT(DISTINCT s.id) AS num_commits
    FROM people p, scmlog s
    WHERE p.email LIKE "%company.com"
      AND p.id = s.author_id
    GROUP BY p.email
    ORDER BY num_commits DESC;

  • Other queries:

    • Top commit authors all time

    • # of commits for specific person
    • More Queries
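The commit queries can be tried out the same way. This is a rough Python sketch against an in-memory SQLite database, assuming trimmed-down `people` and `scmlog` tables that contain only the columns used in the query above. The sample rows and the helper names `commits_by_domain` and `top_authors` are hypothetical, and a real CVSAnalY database has a much richer schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified versions of the CVSAnalY `people` and `scmlog` tables.
conn.executescript(
    """
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE scmlog (id INTEGER PRIMARY KEY, author_id INTEGER);
    INSERT INTO people VALUES
        (1, 'Ada', 'ada@company.com'),
        (2, 'Bo', 'bo@company.com'),
        (3, 'Cy', 'cy@example.org');
    INSERT INTO scmlog (author_id) VALUES (1), (1), (1), (2), (3), (3);
    """
)


def commits_by_domain(connection, domain):
    """Commits per person whose email address ends in @domain."""
    sql = """
        SELECT p.name, p.email, COUNT(DISTINCT s.id) AS num_commits
        FROM people p
        JOIN scmlog s ON p.id = s.author_id
        WHERE p.email LIKE ?
        GROUP BY p.name, p.email
        ORDER BY num_commits DESC
    """
    # Including the "@" avoids matching look-alike domains such as notcompany.com.
    return connection.execute(sql, (f"%@{domain}",)).fetchall()


def top_authors(connection, limit):
    """Top commit authors of all time, regardless of email domain."""
    sql = """
        SELECT p.name, COUNT(s.id) AS num_commits
        FROM people p
        JOIN scmlog s ON p.id = s.author_id
        GROUP BY p.id, p.name
        ORDER BY num_commits DESC
        LIMIT ?
    """
    return connection.execute(sql, (limit,)).fetchall()


company_commits = commits_by_domain(conn, "company.com")
leaders = top_authors(conn, 2)
print(company_commits)
print(leaders)
```

Passing the domain as a query parameter instead of pasting it into the SQL string keeps the helper reusable for other companies and avoids quoting problems.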

Other Metrics Grimoire Tools:

Gource:

Gource is an amazing tool to visualize activity from your source code repositories. I did a full talk about Gource on Friday at the FLOSS Community Metrics meeting, so have a look at that blog post for details about using Gource.

Using Gource to Visualize Your Repositories

Today at the FLOSS Community Metrics meeting in Brussels, Belgium, I gave a short, 5-minute lightning talk about using Gource to visualize your source code repositories with a focus on navigating the myriad of Gource configuration options and how to tweak them to make Gource work better for your repository. In this blog post, I’ll give an overview of the talk, but for all of the details or to replicate the demo, you should have a look at the GitHub repository for the talk.

In the talk, I did a visualization of the MailingListStats (mlstats) repository from the Metrics Grimoire suite of tools, and here is the video generated using these options:

gource -f --logo images/bitergia_logo_sm.png --title "MailingListStats AKA mlstats" --key --start-date '2014-01-01' --user-image-dir images -a 1 -s .05 --path ../MailingListStats

Option Details:

  • --path /path/to/repo (or omit and run Gource from the top level of the repo dir)
  • -f show full screen
  • --logo images/bitergia_logo_sm.png
  • --title "MailingListStats AKA mlstats"
  • --key (shows color key for file types)
  • --start-date '2014-05-01'
  • --user-image-dir images (Directory with .jpg or .png images of users ‘Full Name.png’ for avatars)
  • -a 1 (auto skip to next entry if nothing happens in x seconds – default 3)
  • -s .05 (speed in seconds per day – default 10)
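If you end up running Gource against several repositories, it can help to assemble the command in a small script. The sketch below only builds and prints the command line from the options listed above; the function `build_gource_command` and its parameter names are my own invention for illustration, and the paths are the ones from the example command.

```python
import shlex


def build_gource_command(
    repo_path,
    title,
    start_date,
    logo=None,
    user_image_dir=None,
    auto_skip_seconds=1,
    seconds_per_day=0.05,
    fullscreen=True,
    show_key=True,
):
    """Return the Gource command as a list of arguments (nothing is executed)."""
    cmd = ["gource"]
    if fullscreen:
        cmd.append("-f")
    if logo:
        cmd += ["--logo", logo]
    cmd += ["--title", title]
    if show_key:
        cmd.append("--key")
    cmd += ["--start-date", start_date]
    if user_image_dir:
        cmd += ["--user-image-dir", user_image_dir]
    cmd += ["-a", str(auto_skip_seconds), "-s", str(seconds_per_day)]
    cmd += ["--path", repo_path]
    return cmd


command = build_gource_command(
    repo_path="../MailingListStats",
    title="MailingListStats AKA mlstats",
    start_date="2014-01-01",  # value taken from the example command above
    logo="images/bitergia_logo_sm.png",
    user_image_dir="images",
)
print(shlex.join(command))
```

To actually launch the visualization, you could pass the list to `subprocess.run(command, check=True)` on a machine where Gource is installed.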

You can also manipulate the video while Gource is running:

  • Space bar to pause
  • Ctrl + / – to speed up or slow down
  • Use arrow keys to move camera
  • Mouse over timeline widget at the bottom and click on a date to move in time.

For additional information:

Network Analysis and Community Visualizations

As usual, I’ve been neglecting my blog; however, you may notice that I finally did a little redesign using a modern template to make it more mobile-friendly and more accessible to avoid the Google search penalties. With this fresh new design, I decided that I needed something more recent than my last post in January.

So, I thought it would be nice to talk about my presentations from OSCON and the FLOSS Community Metrics Meeting in lovely Portland, OR in July.

If you want to skip my ramblings and get right to the content, you can find all of the code, data sets, instructions and links to the presentation materials on SlideShare by visiting my OSCON 2015 GitHub repository. UPDATE (Aug 23): The video for the OSCON portion is available now, too.

If you missed this presentation and want to see it live and in person, I’ll be doing similar talks at LinuxCon Seattle in August and LinuxCon Dublin in October. You might also be interested in reading the interview that Nicole Engard did with me on Opensource.com right before the conference to give me a chance to talk about my OSCON presentation and metrics in general.

What is Network Analysis?

The presentations both centered around network analysis, which studies relationships between units and looks for patterns and structure in those relationships. This is an oversimplified definition of network analysis, since it’s a fairly complicated discipline, so the best way to describe it is with a few examples of how people use network analysis.

  • My presentations looked at relationships and activity between people participating in an open source project.
  • It’s also used to study the relationships between organizations. Examples include looking at which companies have common people on their board of directors or to look at parent / subsidiary relationships between companies.
  • People are also using it to study animal social networks, like aggression and dominance between horses or food sharing between birds.
  • Someone at the University of Greenwich is doing historical social network analysis to look at the networks of people in medieval Scotland by using data from witness signatures on legal documents.
  • Friendship networks, work relationships, and other ways that people interact are also common examples of network analysis.

MetricsGrimoire Tools

MetricsGrimoire is the go-to set of tools that you’ll probably want to use to gather data from your open source community and store it in a database where you can write queries to extract the information you need. In these talks, I used mlstats data, but in my research, I also make heavy use of CVSAnalY. The OSCON 2015 GitHub repository README file has more instructions, but in short, you need to install mlstats, create the database, run mlstats on your mailing list to import the data into this new mlstats database, and finally use database queries to extract the data used for this presentation. You can also use my oscon.py script from the GitHub repository to extract the data.

Static Network Visualization

I took the output from the oscon.py script and used a combination of RStudio and Visone to visualize the data and create the network. To keep the data set to a manageable size, I used data from one of the Linux kernel mailing lists (IOMMU) from January 2015. In the end, we created a network diagram showing mailing list replies between people. The people with the most replies (degree centrality) are shown with larger circles (nodes), and the number of replies between any two people is shown by bolder or lighter arrows. Again, the OSCON 2015 GitHub repository README file has all of the details and instructions for how to do this, so I won’t duplicate it here.
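To make the node sizes and arrow weights a little more concrete, here is a small Python sketch of the counting behind them. It assumes the reply data has already been reduced to (sender, recipient) pairs, which may not match what oscon.py actually outputs, and the names are invented.

```python
from collections import Counter

# Hypothetical reply pairs: (person who replied, person they replied to).
replies = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "alice"),
    ("carol", "bob"),
]

# Replies between any two people: this is what drives how bold an arrow is.
edge_weights = Counter(replies)

# Replies sent plus replies received per person: this drives the node size.
degree = Counter()
for (sender, recipient), count in edge_weights.items():
    degree[sender] += count
    degree[recipient] += count

for person, total in degree.most_common():
    print(f"{person}: {total}")
```

In this toy data, alice ends up with the largest node because she is involved in five of the six replies, and the arrow from alice to bob is the boldest because it represents two replies.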

Dynamic Visualization

Gource is a tool that most people use to easily visualize source code commits by each person for any repository; however, it can also be used with custom data. If you’ve never used Gource, you might want to take a brief detour and look at some of the many Gource visualizations on YouTube. I only had time in my OSCON talk to briefly cover Gource, but luckily, I was able to spend 20 minutes on the topic during the FLOSS Community Metrics Meeting the weekend before OSCON. In the presentation, I showed how to create a custom log format file using mailing list data from mlstats and feed it into Gource for visualization. See the OSCON 2015 GitHub repository README file for details about exactly how I did this.
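As a rough illustration of the idea, Gource's custom log format is a pipe-delimited line of Unix timestamp, user name, action type (A, M, or D), and a file path, with an optional color. The Python sketch below turns a few made-up mailing list messages into such lines. Treating each thread subject as a "file" under a directory named after the list is my assumption for this example and is not necessarily how the talk's repository does it.

```python
from datetime import datetime, timezone


def to_gource_log_line(sent_at, sender, mailing_list, subject, action="A"):
    """Format one message as a Gource custom log line: timestamp|user|type|path."""
    timestamp = int(sent_at.timestamp())
    # Pipes would break the field layout; slashes would create extra directories.
    safe_sender = sender.replace("|", " ")
    safe_subject = subject.replace("|", " ").replace("/", "-")
    return f"{timestamp}|{safe_sender}|{action}|{mailing_list}/{safe_subject}"


# Hypothetical messages: (sent time, sender, list name, subject).
messages = [
    (datetime(2015, 1, 6, 9, 30, tzinfo=timezone.utc), "Bob Example", "iommu", "Re: DMA mapping question"),
    (datetime(2015, 1, 5, 12, 0, tzinfo=timezone.utc), "Alice Example", "iommu", "DMA mapping question"),
]

# Gource expects entries in chronological order, so sort by send time first.
log_lines = [
    to_gource_log_line(*message)
    for message in sorted(messages, key=lambda message: message[0])
]
print("\n".join(log_lines))
```

Writing these lines to a file and passing it to Gource with the `--log-format custom` option should then animate mailing list activity in the same way as commits.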

What Else?

There are so many different tools available to do visualization of social network analysis. I used Visone because it runs on most major operating systems, and it’s fairly easy to get started with, but there are so many other options that you might want to play around with.

Python has quite a few packages that provide social network analysis, like NetworkX, for example. I haven’t had a chance to play with this much yet, but I know others who do quite a bit of their analysis using these tools, so they are on my list to try.

The final thing that I want to stress is that network analysis is so much more than just having cool graphs that allow you to look at your data. The visualizations are often the first step to see what might be happening in your network, but for those of us doing this type of work, it’s just the first step. The next steps usually involve many different calculations and measures to really understand what might be going on in the community. One example is how we changed the node size based on degree centrality for how many links that person had. It’s easy to explain, but it’s not a particularly sophisticated measurement of network centrality, and there are others that do a better job of looking at how well-connected people are to give you a better measure for influence. For example, if I regularly talk to 2 people within the Linux kernel, and if those people are Linus Torvalds and Greg K-H, I’m likely to be better connected within the network as a whole than if I’m talking to 10 other people with little or no influence.
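To see why degree centrality alone can mislead, here is a toy Python sketch of that last example. Instead of a full measure such as eigenvector centrality, it uses a deliberately simple stand-in: the sum of the degrees of the people you talk to. The network, the names, and the numbers are all invented for illustration.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph as a mapping of person -> set of neighbors."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


edges = []
# Two well-connected maintainers who each talk to 20 contributors.
for hub in ("maintainer_1", "maintainer_2"):
    edges += [(hub, f"{hub}_contact_{i}") for i in range(20)]
# "me" talks only to the two maintainers.
edges += [("me", "maintainer_1"), ("me", "maintainer_2")]
# "someone_else" talks to 10 people who talk to nobody else.
edges += [("someone_else", f"quiet_person_{i}") for i in range(10)]

graph = build_graph(edges)
degree = {person: len(neighbors) for person, neighbors in graph.items()}
# Simplified influence score: how well connected are the people I talk to?
neighbor_degree_sum = {
    person: sum(degree[neighbor] for neighbor in neighbors)
    for person, neighbors in graph.items()
}

for person in ("me", "someone_else"):
    print(person, degree[person], neighbor_degree_sum[person])
```

By degree, someone_else (10 links) looks far more central than me (2 links), but once the connections of the neighbors are taken into account, the ranking flips. Proper centrality measures in packages like NetworkX refine this idea by looking further than one step away.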

If you are interested in my academic research, I also did a presentation recently at an academic conference here in the UK. That presentation and others can be found on my Academic page.

Photo credits

OSCON photo by Luis Cañas-Díaz and the FLOSS Metrics Gource photo by Stephen Walli.

Your Metrics Strategy at FLOSS Community Metrics

I’m here in Brussels today for the FLOSS Community Metrics meeting, and I just gave a presentation about how to build Your Metrics Strategy. If you are interested, have a look at my presentation materials.

Talk description:

You probably know that community metrics are important, but how do you come up with a plan and figure out what you want to measure? Most open source projects have a very diverse community infrastructure with code repositories, IRC, mailing lists, wikis and other content sites, forums, and more. Deciding where to focus and what to measure across these many technologies can be a challenge.

What you measure can have a huge impact on behavior within the community, and you want to make sure that you are encouraging people to contribute in sane ways by measuring the activities that matter for your project.

In this presentation, I’ll talk about how you decide what to measure and give you examples of how I’ve done this at Puppet Labs and in other projects.

Photo credit: Sophie on Flickr

Open Source Community Metrics and State of the Puppet Community

Many of you probably know that I’ve spent the past week in Belgium for Puppet Camp Ghent and FOSDEM. I’ll be writing a blog post on the Puppet Labs blog later this week to talk about Puppet Camp Ghent, but I wanted to at least get my presentations out here while I finished writing the longer post.

Puppet Camp Ghent was amazing. I saw a few old friends and connected in person with quite a few community members that I had not yet met in person. Overall, I was very happy with the event, and the people at HoGent were great hosts. There were so many amazing presentations, and we’re getting them uploaded to the Puppet Camp page as soon as we get the slides from the speakers. Here is the presentation that I delivered on the State of the Puppet Community.

state-of-puppet-community

I had an amazing time at FOSDEM, too. I helped facilitate the Configuration / Systems Management DevRoom on Saturday along with a DevRoom dinner that evening. I love working in such a collaborative industry. The DevRoom and the dinner were organized collaboratively with our primary competitors, but we all worked together to pull it off in a way that benefited the industry. Aside from the DevRoom, I got to see a lot of old friends and had a great time!

At FOSDEM, I also gave a short version of my Open Source Community Metrics talk. If you are interested in open source metrics, you might rather look at the longer version that I presented at LinuxCon Barcelona in November. I also had a great conversation with Jesus at Bitergia, and they are doing some awesome stuff with open source community metrics that you should look at if you are interested in metrics.

Next on my agenda are trips to Stockholm, Sweden and Oslo, Norway for two more Puppet Camps in the next two weeks before heading back home to Portland.

Blogging Elsewhere

I’ve been neglecting my blog, but it isn’t because I haven’t been blogging. I just haven’t been blogging here 🙂

Here are a few posts that I’ve written over the past couple of months for work as part of my Community Manager gig at Puppet Labs:

I also wrote a few posts on my cookbook site, What Dawn Eats. Don’t forget that you can buy the cookbook (available in paperback, kindle edition or PDF)!

Open Source Community Metrics: LinuxCon Barcelona

I wanted to share the presentation that I will be giving today at LinuxCon Barcelona at 1:20pm, Open Source Community Metrics: Tips and Techniques for Measuring Participation. This is similar to the presentation that I gave a few weeks ago at the LibreOffice Conference in Berlin, but I have added some new data and included different examples. You might also be interested in seeing the Puppet Community Metrics that I recently started posting on the Puppet Labs website.

You can download the presentation from SlideShare.

Talk Abstract:

Do you know what people are really doing in your open source project? Having good community data and metrics for your open source project is a great way to understand what works and what needs improvement over time, and metrics can also be a nice way to highlight contributions from key project members. This session will focus on tips and techniques for collecting and analyzing metrics from tools commonly used by open source projects. It’s like people watching, but with data.

The best thing about open source projects is that you have all of your community data in the public at your fingertips. You just need to know how to gather the data about your open source community so that you can hack it all together to get something interesting that you can really use. This session will be useful for anyone wanting to learn more about the communities they manage or participate in.

LibreOffice Conference: Open Source Metrics

Today at the LibreOffice Conference in Berlin, I will be presenting a session titled, “Open Source Community Metrics: Tips and Techniques for Measuring Participation.” It has tools, techniques and examples of metrics from the LibreOffice project, Puppet and MeeGo to illustrate several ways to gather and interpret the metrics for your open source project.

If you are interested in watching the presentation, it will be on the LibreOffice Conference live stream starting at 18:00 CEST in Berlin or 9am Pacific time.

You can also download a copy of the presentation from SlideShare.

Talk Abstract

Do you know what people are really doing in your open source project? Having good community data and metrics for your open source project is a great way to understand what works and what needs improvement over time, and metrics can also be a nice way to highlight contributions from key project members. This session will focus on tips and techniques for collecting and analyzing metrics from tools commonly used by open source projects. It’s like people watching, but with data.

The best thing about open source projects is that you have all of your community data in the public at your fingertips. You just need to know how to gather the data about your open source community so that you can hack it all together to get something interesting that you can really use. We’ll start with some general guidance for coming up with a set of metrics that makes sense for your project and talk about the LibreOffice community metrics. The focus of the session will be on tips and techniques for collecting metrics from tools commonly used by open source projects: Bugzilla, MediaWiki, Mailman, IRC and more. It will include both general approaches and technical details about using various data collection tools, like mlstats. The final section of the presentation will talk about techniques for sharing this data with your community and highlighting contributions from key community members. For anyone who loves playing with data as much as I do, metrics can be a fun way to see what your community members are really doing in your open source project.