After my post on stinky metrics, a reader asked for examples of good metrics. Good measures are difficult to convey because they are context specific: teams often have very different ways of measuring, filtering out noise, and evaluating success.
Therefore it's best to start with classes of metrics, and leading indicators are better than trailing ones. One of my favorite classes of measurement is peer review, which happens at development time, before code enters production. Here are three peer review metrics.
I pulled code review stats over three months and found that the teams with the highest change rate and lowest incident rate had the following metrics in common.
1) Internal team reviews. Peer reviews were done by the internal team, not an external change review board. The internal team had context on the work, which led to better questions and better insights.
2) High review iterations. Reviews took two to three iterations on average. These back-and-forth cycles indicated that reviewers asked in-depth questions; there was no rubber-stamp approval.
3) Diversity of reviewers and contributors. There was a diversity of contributors and reviewers. Over the course of three months everyone reviewed everyone else's work. To me this indicated a high degree of shared team knowledge, and that contributions had to meet the bar set by every team member.
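To make the second and third metrics concrete, here is a minimal sketch of how you might compute them from review data. The record shape (author, reviewers, iterations) is an assumption for illustration; your review tool will expose its own fields.

```python
from itertools import combinations

# Hypothetical review records; real data would come from your review tool's API.
reviews = [
    {"author": "ana",   "reviewers": ["ben", "chris"], "iterations": 3},
    {"author": "ben",   "reviewers": ["ana"],          "iterations": 2},
    {"author": "chris", "reviewers": ["ana", "ben"],   "iterations": 2},
]

# Metric 2: average review iterations (back-and-forth cycles per review).
avg_iterations = sum(r["iterations"] for r in reviews) / len(reviews)

# Metric 3: diversity of reviewer/contributor pairings.
# How many of the possible author-reviewer pairs actually occurred?
people = {r["author"] for r in reviews} | {p for r in reviews for p in r["reviewers"]}
possible_pairs = {frozenset(pair) for pair in combinations(people, 2)}
observed_pairs = {
    frozenset((r["author"], rev)) for r in reviews for rev in r["reviewers"]
}
pair_coverage = len(observed_pairs & possible_pairs) / len(possible_pairs)

print(f"average iterations per review: {avg_iterations:.1f}")
print(f"reviewer/contributor pair coverage: {pair_coverage:.0%}")
```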
From talking to teams, it seemed that high performing teams shared a set of cultural norms for testing and quality evaluation. These norms were expressed as patterns for what to test, how to structure testable interfaces, what tools to use, and how to structure risky code.
As an example, one team spent a lot of time creating mock objects to stub out service calls. It was never an explicit ask, just part of how they worked. The same team scrutinized asynchronous calls closely.
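Here is a minimal sketch of that kind of stubbing, using Python's unittest.mock; the checkout and payment_client names are hypothetical, not from the team's codebase.

```python
import unittest
from unittest.mock import Mock

# Hypothetical code under test: charges an order through an external payment service.
def checkout(order_id, payment_client):
    response = payment_client.charge(order_id)
    return response["status"] == "ok"

class CheckoutTest(unittest.TestCase):
    def test_checkout_succeeds_without_calling_real_service(self):
        # Stub out the service call so the test never touches the network.
        fake_client = Mock()
        fake_client.charge.return_value = {"status": "ok"}

        self.assertTrue(checkout("order-42", fake_client))
        fake_client.charge.assert_called_once_with("order-42")

if __name__ == "__main__":
    unittest.main()
```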