

This is one of the documents describing Charity Entrepreneurship’s 2019/2020 research process. A summary of the full process is here.



This document explains why and how CE uses weighted factor models (WFMs) as part of its research process. A WFM consists of generating a set of criteria, assigning a weight to each, and then assessing how a possible option scores on each of them. It is particularly useful because it allows researchers to combine a large number of objective and subjective factors and to identify which ones drive the results. However, it has some weaknesses, such as its lack of flexibility, which suggests that it is best used in combination with other methods.

CE uses WFMs at three stages of our research. At the first stage (idea sort), each intervention is assessed for twenty minutes using all the methodologies, including WFMs. At the second stage (prioritization report), two hours are spent on each intervention using this method, but only for our animal research area. Finally, the WFM is one of the four methods used in the eighty-hour assessment of each of the top interventions (intervention report). Concretely, lead researchers fill in a prebuilt model covering four main criteria: strength of the idea, execution difficulty, limiting factors, and externalities. Each of these is estimated using built-in questions, which are more or less specific depending on the amount of time allocated.

Table of contents:

1. What is the weighted factor model
2. Why is this a helpful methodology
3. Why it is not our only or endline perspective
4. How much weight we give to the weighted factor model
5. How CE generated the weighted factor model
6. Different lengths of weighted factor estimates
7. Criteria
​8. Deeper reading



Broadly, the process of creating a WFM involves generating preset criteria and weightings and then evaluating how a possible option scores on each of these. WFMs often involve a number of preset criteria, ranging from three to twelve factors. They typically generate an endline score based on the option score and the criteria score (normally multiplied together). Both hard factors (such as population size in absolute numbers) and soft factors (such as a score out of ten for population size) can be used in WFMs. The way our team uses WFMs involves pre-generating consistent research questions that are asked across all charity ideas to produce a score for each criterion.
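The core calculation can be sketched in a few lines. This is a minimal sketch: the criteria names follow CE's four main criteria, but the weights and scores below are invented for illustration, not CE's actual values.

```python
# A minimal weighted factor model sketch. Criteria names follow CE's four
# main criteria; the weights and 0-10 scores are invented for illustration.

WEIGHTS = {
    "strength_of_idea": 0.4,
    "execution_difficulty": 0.25,
    "limiting_factors": 0.2,
    "externalities": 0.15,
}

def wfm_score(scores):
    """Endline score: each 0-10 criterion score multiplied by its preset weight, then summed."""
    return sum(WEIGHTS[c] * s for c, s in scores.items())

idea_a = {"strength_of_idea": 8, "execution_difficulty": 5,
          "limiting_factors": 6, "externalities": 7}
idea_b = {"strength_of_idea": 6, "execution_difficulty": 9,
          "limiting_factors": 7, "externalities": 5}

# Idea B edges out idea A despite a weaker core idea, because the
# preset weights also count execution difficulty and limiting factors.
print(wfm_score(idea_a), wfm_score(idea_b))
```

Because the weights and questions are fixed in advance, the same calculation is applied identically to every idea, which is what makes the scores comparable.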

Example weighted factor model (Charity Entrepreneurship 2019)


Example of synthesized expert data (Charity Entrepreneurship 2018)


The WFM is a highly versatile tool because it can clearly incorporate a large number of factors, including subjective ones, while using a numerical calculation to determine the endline result. This allows it to produce surprising results and makes it easier to track down the factors that drive a given result.

Reasons this is a helpful tool (in rough order of strength)

  • Systematism in idea comparison

  • Enables comparison of all ideas with equal rigor 

  • Reduced gaps

  • Allows integration of multiple factors

  • Sandboxing

  • Allows soft and hard inputs to be combined

  • More angles for learning

  • Preregistration

  • Understandability

  • Encourages quantified consideration

  • Can lead to novel conclusions 

  • Makes it easier to communicate conclusions

Systematism in idea comparison: WFMs encourage considering the same aspects of criteria across multiple ideas. This allows much closer idea comparison than the other models, which each have more idea-to-idea variability. For example, comparing an idea’s limiting factor in very similar terms across all charity ideas can lead to a much stronger sense of how well an idea does in this aspect. 

Enables comparison of all ideas with equal rigor: We ensure that we apply equal rigor when evaluating the ideas by answering the same research questions that define the criteria in the same way, spending the same amount of time on each idea, and evaluating it in the same way.  

Reduced gaps: Many models can be largely affected by unconsidered factors or gaps in the information. For example, if a single important factor was not included in a cost-effectiveness analysis (CEA), it would be hard to detect but could largely affect the results. Due to the same questions being asked across all interventions and the same factors being filled in, there is a lower chance of gaps affecting one idea but not another in a WFM. 

Allows integration of multiple factors: Many models are not conducive to including many different factors in a single number. For example, CEAs do not handle limiting factor concerns very well unless multiple CEAs are done for many different possible levels of scale. Similarly, many CEAs do not include strength of evidence other than as a simple discount at the end of the calculation, which does not capture how to weigh different types of uncertainty (e.g., Knightian vs. non-Knightian).  

Sandboxing: A large difference between CEAs and WFMs is the total weight that a single factor can hold. In a CEA, one very large number can swamp many small numbers. For example, if an intervention affects a huge number of beings but has a very low chance of working, this initial huge number can make all the other numbers in the CEA trivial. Because each factor has an effective maximum weight, a single factor affects a WFM far less. You could say the impact of that factor is “sandboxed” within a single factor.
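The contrast can be sketched with hypothetical numbers (not drawn from any real CEA): a multiplicative CEA lets one enormous input dominate, while a WFM caps each factor's contribution at its maximum score times its weight.

```python
# Hypothetical numbers, for illustration only.

# CEA-style expected impact: population * probability of success.
cea_a = 1e12 * 0.0001   # vast population, tiny chance of working
cea_b = 1e6 * 0.5       # modest population, good chance of working
# Idea A dominates purely because of the huge population input.

# WFM-style: the same considerations scored 0-10, so each factor can
# contribute at most (10 * its weight) and cannot swamp the rest.
def wfm(scale, tractability, w_scale=0.5, w_tractability=0.5):
    return w_scale * scale + w_tractability * tractability

wfm_a = wfm(scale=10, tractability=1)  # scale maxed out, still bounded
wfm_b = wfm(scale=5, tractability=8)
# Idea B now comes out ahead: the scale factor is "sandboxed".
```

The bounded 0-10 scale is doing the sandboxing here; whether that bound is a feature or a distortion depends on whether you think truly enormous scale should dominate the comparison.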

Allows soft and hard inputs to be combined: Some important factors are easy to capture in a single hard number, for example, “total population affected by measles”; other factors are impossible to put a hard number on, for example, “the tractability of founding a new charity in India.” In a WFM, these factors can be given a soft number in a consistent and comparable way. These soft numbers can then be combined with harder numbers via Z-scores to determine which ideas are outliers in terms of many positive factors.

More angles for learning: One of the purposes of our research process overall is to generate better empirical information about how to rule charity ideas in or out more quickly in the future. The WFM is the only system we use in which subcomponents could be correlated individually with our endline results. For example, we could determine whether the evidence base strongly predicts which interventions are recommended after deep reports are conducted. Pulling out a single aspect like this from a CEA or expert interviews would not be easy.

Preregistration: In many ways a WFM leaves the fewest areas open to interpretation, with preset questions and descriptions of how different items should score written ahead of time. This means that researchers with fairly different starting points and intuitions will more often reach the same conclusions, compared to systems that are more open to researchers’ interpretations. This concern most affects our informed consideration (IC) but can also largely affect CEAs.

Understandability: Intuitive systems can be built into a WFM, making it quick and easy to understand relative to other systems. Color coding is easily used to show areas of comparative strength and weakness across a large number of ideas. Both expert views (EpV) and IC lend themselves to written paragraphs, which are slower to digest. A CEA’s endline number is quick to understand, but the full logic and weightings behind it take longer to unpack than in any other system.

Encourages quantified consideration: Like CEAs, WFMs encourage quantified and numerical consideration of factors. By default, most people (including experts) do not think in quantitative terms. For example, when asked if an event will happen, most people treat this as a binary question (yes/no) rather than thinking about the probability of the event happening. WFMs require quantitative inputs for each variable, which encourages quantitative thinking and calibration (e.g., an event being 20% vs. 80% likely).

Can lead to novel conclusions: Like CEAs, WFMs can lead to surprising conclusions. Because the methodology and the calculations are preset, it is common that after filling in the data, a WFM will suggest something is high impact that would not have appeared so from a softer, higher-level look.

Makes it easier to communicate conclusions: Because all the factors are researched and scored separately, we can easily distill the advantages and disadvantages of each idea and explain why a given idea is better than another.




Last year this model was the primary one we used when comparing charity ideas, although in some cases we also used unweighted factor models. We think that although this model has considerable promise, it also has many weaknesses that can be counteracted by using multiple models. We also see considerable learning value in testing multiple models and seeing which ones best predict our endline mixed model conclusions. 

Flaws of WFM (in order of importance) 

  • Not a commonly used methodology 

  • Low flexibility

  • Limited question cross applicability 

  • Considerable upfront time required

  • Can make nonnumerical data look numerical

  • Can be hard to determine source or reasoning of weighting criteria 

Not a commonly used methodology: WFMs are not commonly used elsewhere in the same formal way we use them. Thus there are few established norms and a lower level of initial understanding from both researchers and readers. It also suggests there might be an unknown but good reason why this sort of methodology is not used more often.

Low flexibility: This system is the least flexible and adaptable across different charity ideas with the questions, full methodology, and criteria weightings all preset. This reduces bias but can also give a large amount of weighting to a factor that might be important overall but far less important for a specific idea.  

Limited question cross applicability: A subsidiary concern of low flexibility is that some preset questions will not be important to cover for a given idea, but research hours will go into them anyway. Likewise, a consideration that is important but specific to a given charity is less likely to be covered by this methodology.

Considerable upfront time required: A huge amount of upfront methodological time is required when compared to other systems, because most of the methodology is designed ahead of time and closely followed throughout the process. This means that research is not produced for a long time at the start of a research year and also does not yield feedback loops as quickly when updating the methodology.

Can make nonnumerical data look numerical: A concern with the WFM is that it assigns numerical ratings to nonnumerical data. This can both confuse and mislead people when considering the objectivity of the system if not explained clearly. 

Can be hard to determine source or reasoning of weighting criteria: Endline weights are often the only factor closely examined, and because endline weightings represent a large number of questions and sources of evidence, it can be hard to track down which questions factored into a given weighting and how heavily each one counted.



Despite its flaws, we view the WFM as a highly important aspect of our process; we see it as having many of the benefits of CEAs but also being somewhat less error prone and likely to have gaps. Ultimately we think that WFMs, as one of our four perspectives, will generally get between one-quarter and one-half of our total endline weighting, with there being considerable variation depending on the specific charity idea and cause area. We expect WFM to be stronger in areas where there are many different factors at play and limited hard data.





Related posts hyperlinked

Detailed information on which questions are asked to research each criterion, and how a score for each criterion is generated, is covered below.


5.2. Z-scores


A Z-score is a numerical measurement, used in statistics, of a value’s relationship to the mean of a group of values, measured in terms of standard deviations from the mean. If a Z-score is 0, it indicates that the data point’s score is identical to the mean score. A Z-score of 1.0 would indicate a value that is one standard deviation from the mean. Z-scores may be positive or negative, with a positive value indicating the score is above the mean and a negative score indicating it is below the mean.
Z-scores can be used informally to:
i) Standardize values measured across multiple different criteria, so they can be combined into an overall score and compared to other ideas. For example, we can have an overall z-score for a given idea based on how it compares to an average in terms of CEA, expressed in $ per DALY; population size affected, expressed in millions; and crowdedness, expressed in percentage of the problem addressed by other entities.    
ii) Assess how a given idea scores compared to all the other ideas considered (including an average idea), for example, idea x is better than 70 percent of the ideas on our list. 
iii) Spot what values are anomalous. For example, if one of the factors in the scale was an objective number such as population size, a Z-score value would show which countries are outliers relative to others even though population size can differ by orders of magnitude. 
iv) Reduce the risk of some biases. If scores are not converted to Z-scores, we may happen to use a higher range of values for one criterion but not for another, effectively changing its weight. For example, suppose a given intervention is evaluated on each factor on an arbitrary scale of 1 to 10. On the criterion of scale your scores vary significantly and you tend to give out sevens and eights, while on the criterion of tractability you give very consistent scores of four or five. The net effect is that even if you think tractability is more important, you end up weighting scale more heavily. Converting the scores to Z-scores takes care of this.
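Point iv) can be made concrete with a short sketch using Python's statistics module. The scores below are invented: "scale" is rated generously and "tractability" conservatively, and standardization removes the accidental extra weight that the generous scoring would otherwise carry.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Population Z-scores: (x - mean) / standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]

# Invented 1-10 scores for five ideas. Raw sums would overweight "scale"
# simply because its scores sit in a higher, wider range.
scale = [7, 8, 8, 7, 10]
tractability = [4, 5, 4, 5, 5]

# After standardization both criteria have mean 0 and spread 1,
# so adding them weights each criterion equally.
for z_s, z_t in zip(z_scores(scale), z_scores(tractability)):
    print(f"scale {z_s:+.2f}  tractability {z_t:+.2f}  combined {z_s + z_t:+.2f}")
```

After this transformation, an explicit weight can still be applied to each criterion; the point is that the weighting is then deliberate rather than an accident of how generously each criterion happened to be scored.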
More on z-scores can be seen here and in “The Failure of Risk Management” by Douglas W. Hubbard.



Color coding is used throughout the spreadsheet to increase ease of reading, with red values generally being areas of weakness and green values being areas of strength. This can allow the reader to quickly see which areas to look deeper into and which areas lead to the resulting total score. 
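As a sketch of the idea, a score can be bucketed into the same red/yellow/green bands a spreadsheet's conditional formatting would use. The thresholds here are invented for illustration, not CE's actual cutoffs.

```python
# Illustrative thresholds on Z-scores; the actual spreadsheet rules may differ.
def cell_color(z):
    if z <= -0.5:
        return "red"     # area of comparative weakness
    if z >= 0.5:
        return "green"   # area of comparative strength
    return "yellow"      # roughly average

print([cell_color(z) for z in (-1.2, 0.1, 0.9)])
```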





In five minutes, the larger-scale questions cannot be considered in depth. Instead, each factor can be intuitively considered and given a ranking; these rankings can then be added together, resulting in a total score. The factors should be understood first (by reading the content in this document and the linked content for each of the metrics). 

Questions to consider

  • How well does the intervention do on only this factor?

  • If I researched deeper, do I expect this intervention would score well or poorly on this factor?

  • Does thinking through each of these subfactors change the score?

Expected outcomes

  • Four subjective ratings (one for each of the large categories)​



  • Consider looking at an important factor as a whole for a few hours and then attributing the ratings to all the interventions for that factor. For example, look at funding allocation in the space for three hours, and then rate all the ideas on availability of funding.

  • This is less relevant for a factor such as strength of the evidence, for example, for which you will be better off going through each intervention one after the other and spending one to two minutes looking for supporting studies on Google.



At the two-hour stage, each question can be considered from the full set of prebuilt questions with deeper research occurring for the most important question within each section. In this stage the questions work as more of a guide than a necessary list to answer. This year, this stage will only be applied to animal advocacy. In other cause areas we will use a different two-hour methodology. 

Questions to consider
The weighted factor model questions

Expected outcomes

  • The first-page summary filled out, connected to sources and ready for polishing

  • A description for each section and subsection and why it was scored that way 

  • A simple causal chain




At twenty hours, each question in the document should be considered, researched, and answered. Ratings should be given thoughtfully and updated as new evidence comes in.

Template for 20h WFM:

These questions are guides and should be answered, even if tentatively. More questions that seem applicable to the specific charity idea can be added and answered.






Key question: When you look at the theory of change, including an assessment of the strength of evidence and a rough cost-effectiveness model, how promising does this idea look?

Theory of change and cost effectiveness

We suggest completing this section at the end of the weighted factor model. NB: do not build a cost-effectiveness model here, as that is an entirely separate section. This section is more to flesh out other people’s CEA models and plausible theories of change, which will then be used to create a CEA.

Key question: What is the plausible path to impact of this charity idea? What do current estimates or expert views of cost effectiveness look like? Does it seem like it compares favorably to other ideas in the area? 


  • Consider possible paths to impact

    • How are other actors in the space making an impact in this area? If you made three different possible paths to impact, what would each look like?

  • What is the causal chain of this charity idea leading to impact in the world?

    • Create a theory of change including long-term effects, and list the evidence base for each step including soft steps like chance of sentience (best resources). If no empirical evidence can be found for long-term effects, note how confident you are in your predictions about it. 

    • What aspects of it are least certain? What aspects are most important? Research each of these.

  • Are there any alternative ways to approach it that have the potential of making it more cost effective? Such as...

    • Only doing part of the intervention (e.g., marketing new in vitro meat rather than inventing it)

    • Partnering with another organization

    • Pairing it with other interventions that have similar distribution channels 

    • Implementing it on a mass scale, even if it is only effective for a percentage of the population

    • Starting in the country that could absorb change fastest

  • What metric will make the most sense to use when making a formal CEA?

    • Welfare points? DALYs? SWB? 

    • Review previously written metric documents.

  • What do the experts think about the cost effectiveness of this intervention? Are there any CEA models created by other organizations or costs that can be pulled out of studies?


Key question: Overall, how well-evidenced does this intervention look? Does the supporting evidence come from many different robust sources? 


  • Experts

    • What do the experts think from reading their online content?

    • What specific pieces of data do experts point to?

    • Search for critiques of this intervention. What do people say is wrong with it?

  • Quality and quantity of studies

    • Approximately how many randomized controlled trials (RCTs) or other well-designed studies have there been on this intervention or related interventions?

    • Format them into an evidence table like this one

    • Use this template for assessment of evidence base

  • Historical evidence

    • How many charitable dollars have been spent on this intervention?

    • Has there been historical success of interventions like this one?

    • How much good has been done with the average dollar donated to this cause historically?

    • Does this intervention seem responsible for a disproportionate amount of impact per spending put into it?

    • Are there any situation-related factors that should be discounted? 

  • Other sources of data

    • How direct is the effect?

    • Are there any positive macro-level data? 

      • Is there any empirical analysis of broad (e.g., country-level) trends that could inform us about the effectiveness of this approach? 

      • What would we see in a world where this sort of intervention works? What would we see in a world where it does not?

      • Do these macro-level data support what the experts claim about the impact of historical work?

Evidence: Robustness

  • Does the intervention look good across multiple values and epistemic views? 

  • Does this intervention rely heavily on one type or piece of evidence? 

  • Could many things change in the evidence of this intervention that would change the endline conclusion on it?

  • Would this intervention still do well on other plausible metrics outside of the one selected for these questions?

  • Are there any other sources of confirming or disconfirming evidence that could be researched? Brainstorm at least five and research the top two. 

  • Do multiple pieces of evidence converge on the same answer or do they point in opposite directions?​​



​Key question: What is the main limiting factor to scaling this intervention? At what size does it cap the intervention?

Funding availability 

  • Is there reason to think that this intervention will be difficult to market?

  • How much will this appeal to our different groups of funders? Within EA? Tangential to EA? Disconnected from EA?

  • What is the total amount of funding in this area?

  • How much would it cost to run this intervention at the smallest level of scale? Is funding available for that?

  • What is the probability that a new funding pool would appear after introducing an intervention?

  • How high is the bar for funding? Are weak projects getting funded in a similar space?

  • How hard or easy would it be to build a funding base up over time?

Talent availability

  • How specific and difficult-to-find a skill set does this charity need in its staff? 

  • What skill sets are required to execute this intervention successfully? Are the skills overlapping with the talent gaps in the EA movement?

  • How many existing charities working in the same field are hiring for leadership and management positions?

  • How many existing charities working in the same field are hiring for positions that are crucial for the success of this intervention?

  • Could the talent gap be closed by non-EA/AR experts? 

Counterfactual replaceability

  • Are there any strong charities working in this area?

  • Are there many other people/groups in the area?

    • Who are the other major bodies working on this intervention?

    • How many standard charities seem to be working in the area? 

    • Are charities actively planning on moving into the area?

    • How common is this specific intervention in nonprofits that work in the cause area?

    • Is there a government program in place doing this intervention?

    • Are new orgs getting founded in the area? 

    • What percentage of this problem seems to be left uncovered by other actors in the space (a rough idea)?

  • Are they big and competent?

    • Are they doing a good job? 

      • Are they doing what seems to be the most cost-effective intervention?

      • Do they seem scientifically minded and aiming to do the most good? 

      • What is your rough impression of them? 

    • What areas are covered? What do the coverage rates look like? Is it money or is there some other obstacle to full coverage?

  • Will they generate relevant research that your group can use as time goes on?

  • Could organizations/groups be influenced to do a better job? 

  • If the area is not crowded, then why?

  • Is the attention toward this intervention growing fast?

Size of problem

  • On what scale could this intervention be delivered? Is it an ultimate limit or just a matter of diminishing returns?

  • Approximately how many animals/humans are affected, and what percentage of the problem is covered by other orgs?

Logistical bottlenecks 

  • What is the total number of hires who would be effective in this area?

  • Is the intervention a one-off thing or does it need to be regularly repeated?

  • Would this need to be evaluated with an RCT before scaling?

  • How fast could this be scaled?

  • Could it be run not only by employees but also by interns and volunteers? 

  • Are some parts of the intervention easy to automate?

  • Would it be easy to build a community around the organization?




Key question: Overall, how hard is it to set up and run this intervention well relative to others on the list?

Difficulty of founding 

  • Would a charity in this area have to establish any major partnerships with corporate or government bodies, or could it be mostly run with in-house staff?

  • What do experts see as the biggest challenge of running this intervention? 

  • Cost structure

    • Where will the money be spent on this intervention? What is the rough breakdown of costs between staff, materials, manufacturing, logistics (supply chain and distribution), technology, and administration?

    • How much of the work of administering the program can be performed by employees recruited locally? 

    • Are any of the steps provided for free or at discounted rates? (e.g., flyers can be provided for free by other NGOs; Google grants)

    • Can we bring down costs over time with scale, technology, or the right expertise? 

  • Will this area seem intimidating to founders?

Difficulty of running well 

  • Do experts consider executing this intervention to be easy or difficult relative to other interventions?

  • How many people would the charity have to interact well with? 

  • Are there any cultural factors that may make performing this intervention difficult? 

  • What is the biggest barrier to successful implementation?

  • What have been the stumbling blocks for charities/governments who have tried this intervention before?

  • Are there any unexpected factors that make this intervention vulnerable to lower impact if things don’t go as planned? For example, reliance on tech, need for highly skilled people, low uptake of intervention, etc.

  • Does it carry value-drift risks, i.e., factors that might make a strong EA founder lose their focus on altruism?

  • Is cost effectiveness highly sensitive to details of ongoing staff decision-making, i.e., will it fail to be cost effective if they make a subtle wrong call on a, b, c, d, e, f…?

  • Is there anyone (private persons/farmer’s associations/political bodies/other activists) who would be opposed to this intervention? How much influence and power do they have? Do you see a potential for conflict resolution?

  • Have past projects in the area failed? 

Feedback loop 

  • How hard is it to test the effectiveness of this intervention?

    • How feasible is it to run micro pilots or RCTs on this intervention?

    • How quickly could this intervention be tested? (e.g., how long does treatment take?)

  • How clear are progress metrics on a short time frame, e.g., month to month?

  • What would be the evidence that would show this charity is working:

    • Well? 

    • OK?

    • Poorly?

  • How easy would it be for an outsider to see if the charity is performing well or not?

Probability of success 

  • Is this intervention hits-based or reliable in generating results?

  • What has the historical hit rate been for other groups doing this?

  • Is the expected value of this a power law distribution or bell curve distribution? 

  • Is the success metric more robust or more fragile? Does the metric rely on many assumptions to count as a “hit”?



​Key question: What other possible negative and positive effects will this charity have? How large are they estimated to be? How much evidence is there? How much confidence do you have in the effects?

Within cause area

  • Could this intervention turn people off the cause area as a whole?

  • Could this intervention affect group interactions within the cause area in a negative way?

  • Look for data on whether anyone has reported being negatively affected by this intervention in the past for any reason.

  • What is the risk of this intervention causing damage within the cause area?

  • Can this risk be mitigated? If so, how?

  • What is the risk of value drift for cofounders who work in this area?

  • Relative to other ideas in the area, will it be easy or hard to be transparent while implementing this idea?

  • Could a field be largely changed in a more positive direction if a charity was founded in the area?

  • What are the odds this field will grow significantly in the next ten years? Fifty years?

  • Does this intervention have a net effect of support or harm for other charities in the area?

  • If successful, would the actions increase the probability of success in similar actions of other groups? (e.g., legislative change in one country would make it easier to change the law in another country.)

  • Would starting a charity in this area be generally recognized as a good thing? From supporters of this cause area?

Outside of cause area

  • Are there any huge flow-through effects that could dwarf the direct effects of this cause (e.g., the meat eater problem, the small animal replacement problem)?

  • Are there any significant positive effects that could improve the impact (e.g., environmental impacts, animal impacts)?

  • Does it affect the culture’s morals in a positive direction or expand the moral circle?

  • Does this intervention have effects in terms of helping humans’ GDP or speeding up science and technology?

  • Does this intervention promote cross-applicable ideas such as anti-speciesism that could help in the far future?

  • Is there any evidence that this program will reduce wild animal suffering (including bugs)?

Information value

  • Relative to other interventions based on priority programs, does this intervention provide the charity world with learning value? (e.g., additional evidence)

  • Is the field young or new?

  • Would starting a charity in this area give you skills you could pass on to other charity entrepreneurs?


  • Will this charity be able to help other new charities in the future?

  • Is it a good time to start this charity vs. starting it at any other point in the future?​

How WFM compares to other methods used (timeline of an 80-hour report) 

  • [10 hours] Broad undirected reading and crucial considerations (informed consideration)

  • [16 hours] Directed research (weighted factor model)

  • [10 hours] Finding and talking to experts (experts)

  • [20 hours] Cost-effectiveness analysis creation (CEA)  

  • [4 hours] Directed research (weighted factor model) 

  • [10 hours] Summary writing and internal contemplation (informed consideration) 

  • [10 hours] Showing endline report to experts (experts)

Expected outcomes

  • The full document of questions filled out in sufficiently clear detail that a reader could follow it

  • A description for each section and subsection and why it was scored that way

  • A strong theory of change model

  • Filled-out details in the charity idea comparison spreadsheet 



1) Our process for narrowing down which charity ideas to research
2) Metrics
3) Cost-effectiveness
4) The importance of evidence
5) The importance of being flexible
6) Why you should care about scalability
7) Why you should care about indirect effects
8) How logistics will influence which intervention you pick
9) Counterfactual impact: what would happen if you didn’t act?
10) Why we look at the limiting factor instead of problem scale
11) Using a spreadsheet to make good decisions 
12) Sequence thinking vs cluster thinking
13) Larrick, Richard P. “Broaden the decision frame to make effective decisions.” Handbook of Principles of Organizational Behavior (2009): 461–480.
