Six Examples of Measuring Incomplete Metrics ...and How to Fix Them
If you want to create an effective charity, how do you know if it is effective? A very common answer is, by doing detailed measurement and evaluation. But this is harder than it sounds. While research is hard to get right in general, the difficulty starts just with the measurement… which metric should you measure?
The Case of the Hard Worker
Imagine that you work for an EA organization. You work really hard. In fact, last week you completed 154 pomodoros of work. Was it effective?
It’s hard to know without knowing what the pomodoros were actually spent on. Sure, your inputs matter some and it’s possible that more pomodoros means more good happening. But instead you need to measure your outputs. Time spent is just a means to accomplish certain high-impact goals. How much good did you accomplish with those pomodoros?
The Case of the Growing Organization
Imagine that you’re the CEO of an EA organization. Your EA organization was founded in 2011 and back then you had a budget of $30K. Now it’s 2016 and you have a budget of $300K. That’s 10x growth in five years! That’s amazing! ...But is it effective?
The size of the budget your organization spends, like the amount of time your employees work, is only a means to an end. Again, what matters is the results of your budget and what you accomplish with it. It’s possible to do more good with $30K than $300K.
The Case of the Fundraiser
Imagine that I’m a fundraiser for GiveWell top charities. How do I know whether I’m effective? What if I told you that I was able to send out over one million cold contact fundraising emails in 2015? Would you think I’m an effective fundraiser?
Sure, it’s likely that emails lead to donations and that more emails means more donations. But without tracking whether people are (a) opening the email, (b) clicking the donate link, and (c) actually making a donation through that link, it’s impossible to tell. Only by measuring actual donations made can we truly measure our impact.
The Case of the Charity Club
Imagine I created a college group called “Charity Club” where we had monthly meetings about donations and career choice. In 2015, we got 100 new members and held ten meetings with an average attendance of thirty. Does this mean “Charity Club” was effective?
Sure, people attending meetings about donating to effective causes is likely a good thing. But attending meetings about donations is not good in itself… instead what matters is people actually making effective donations. So we instead should measure the actual donation habits of club members.
But what if club members were just more likely to donate, entirely irrespective of the existence of the club? Ideally we would measure counterfactual impact by randomly admitting only some of the interested people into the club and comparing their donation habits with those of the people who weren’t allowed in. ...But chances are that this RCT is not very practical.
The Case of the Immigration Reform
Imagine that I think improving immigration in the US is important for economic growth and the welfare of immigrants. So I set up an advocacy website that encourages people to write to their congressperson and encourage immigration reform. ...How do I know if this website is effective?
One way is that I can measure web traffic. More traffic should be good, right? But what if I get a lot of visitors but no one follows through and writes to their congressperson? Okay, that’s bad, so maybe I should measure the number of letters to congresspeople that get delivered.
That does measure our influence over the public process, but what if the letters get ignored? How do we know our letters lead to legislation change? What if the legislation would have changed anyway?
What we really want to measure is counterfactual legislation change. To do this, we construct an RCT where we randomly select some legislators to be targeted and some not to be, and then we see whether the targeted legislators are more likely to sponsor immigration reform than the non-targeted legislators.
While web traffic, or even the number of letters sent, is a positive thing and might contribute to more immigration reform, it could easily not be connected to immigration reform.
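To make the RCT idea a bit more concrete, here is a minimal sketch of the comparison between targeted and non-targeted legislators. Everything in it is an assumption for illustration: the legislator names, the outcomes, and the function names are made up, and a real study would also need a large enough sample and a significance test before reading anything into the difference.

```python
import random


def assign_groups(legislators, seed=0):
    """Randomly split legislators into a targeted half and a control half."""
    rng = random.Random(seed)
    shuffled = list(legislators)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def sponsorship_rate(group, sponsors):
    """Share of a group that went on to sponsor immigration reform."""
    if not group:
        raise ValueError("group must not be empty")
    return sum(1 for member in group if member in sponsors) / len(group)


def estimated_effect(targeted, control, sponsors):
    """Difference in sponsorship rates: targeted minus control."""
    return sponsorship_rate(targeted, sponsors) - sponsorship_rate(control, sponsors)


# Hypothetical legislators and made-up outcomes, for illustration only.
legislators = [f"legislator_{i}" for i in range(20)]
targeted, control = assign_groups(legislators, seed=42)
sponsors = set(targeted[:4]) | set(control[:2])

effect = estimated_effect(targeted, control, sponsors)
print(f"Targeted sponsorship rate: {sponsorship_rate(targeted, sponsors):.0%}")
print(f"Control sponsorship rate:  {sponsorship_rate(control, sponsors):.0%}")
print(f"Estimated effect:          {effect:+.0%}")
```

The point of the random split is that the two groups should differ only in whether they were targeted, so the gap in sponsorship rates estimates the counterfactual impact rather than just counting letters. With groups this small, though, a gap could easily be noise.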
Only measuring the right thing helps us check.
The Case of the Developing World Charity
Imagine you’re the executive director of a charity that does unconditional cash transfers to the global poor. How do you know you’re doing good?
The problem is there is nothing inherently good about people having more money than they used to. Money is just a means to an end. So to know how much good we are doing, we need to see what is happening with the money that we give. What are people spending it on? Does the money actually make them happier? To find out, we need to measure the effects of giving money, hopefully with an RCT.
Why might charities focus on incomplete metrics?
In each of the above examples, we consider the metric being measured to be incomplete: the metric needs to be investigated further before we can clearly connect it with positive impact. But why might charities focus on these incomplete metrics?
It is easier
Some metrics are way easier to measure than others. It’s far easier to measure the web traffic to your advocacy website than to do an RCT on your legislative impact. It’s even much easier to measure web traffic than to measure the actual number of letters sent. Thus increases in web traffic get cited a lot as a criterion for success, even when they may not be connected to the charity's real goals.
It looks more impressive
Reporting on several metrics looks more impressive because you are showing more data, even if the data is not reflective of the good your charity is doing. Additionally, the more metrics you have, the easier it is to cherry-pick the ones that are going well and play down the ones that are not going as well. This kind of practice makes your organization more appealing to donors and members, even if it is ultimately an illusion.
Charities are unsure what the important metrics are
This is true particularly with younger charities, as well as charities with less of a clear focus.
When the goals are not clear, the result is often reporting on several unhelpful metrics or missing very important ones.
Ways to avoid this mistake
Think of the number one most important metric
This is hugely important as it makes clear what your organization is really aiming to do, and how you will measure it. Letting the public know your most important metric also allows them to focus on what really matters.
Be sure of the connection between your metric and real good happening in the world
Even with a straightforward metric, you have to make sure that it really translates into good getting done. With money raised, you would need to look at the charities you are moving money to, and make sure they will accomplish good with extra donations. For website traffic, you would have to make sure website traffic really correlates with the actions that you want to achieve, and furthermore make sure those actions correlate with more impact.
Think about whether it would be possible to cheat this metric
Is it possible to “game” this metric, thus making it less valuable? For example, if I wanted to gain a bunch of website traffic, it would be quite easy for me to invest in non-targeted online ads or just directly buy “views” to my website. Although this would boost my website traffic, it’s very unlikely to cause any real good on the metric I really care about.
Watch out for counterfactuals
An easy mistake to make is to measure metrics that have many possible causes. Given that many organizations are working towards the same goals, it is necessary to be able to isolate the impact your organization is having when compared to the wider movement.
Be cautious of longer causality chains
Consider an unconditional cash transfer charity.
Their “chain to impact” looks like this:
We give grants of unconditional cash transfers to the global poor → The global poor spend the money on what they desperately need → They are happier because they could afford a basic necessity or invest in their future → Good is achieved
Furthermore, we’re pretty confident in each link of this causal chain because there are multiple studies supporting each link.
Now consider an organization that fundraises for the unconditional cash transfer charity and cites web traffic as their metric of success:
Website traffic → People are then more interested in donating → More people go to the cash transfer charity website to learn more → More money is donated → The transferred cash is spent on basic necessities or investments → People are happier → Good is achieved
Not only is this chain longer, but there is also a huge problem in the assumption that website traffic results in more donations. While we can easily track website visits, it’s very difficult to track how many of these visits translate into more donations, and it’s easy for the metric to get cheated by getting large amounts of "lower quality" traffic. Generally, the more steps you have, the more confidence you need to have in each of the steps working.
Focusing on the right metrics in Charity Entrepreneurship
For folks like us interested in creating the most effective charities, we also need to be careful about metrics. We care most about having the largest counterfactually positive impact on global well-being (for both humans and nonhuman animals). We don’t want to look at any incomplete metrics, like the number of people we help, the size of our budget, or how many people read our blog posts.
But well-being isn’t a very precisely defined metric and very few RCTs look at this.
Instead, there are many more precisely defined metrics that we could measure, such as the impact of our charity on improving the length of life, on reducing the burden of disease and disability, on improving income, improving subjective well-being, etc. After a lot of research we have opinions on each of these metrics. While each of the metrics is a lot more nuanced than the ones in our examples and none of them are clear examples of incomplete metrics, we do think that some of these metrics are more complete than others relative to measuring the ultimate goal of global well-being and we plan on writing up detailed thoughts on them soon.