A pre-analysis plan checklist (2023)

A pre-analysis plan is a step-by-step plan setting out how a researcher will analyze data which is written in advance of them seeing this data (and ideally before collecting it in cases where the researcher is collecting the data). They are recently starting to become popular in the context of randomized experiments, with Casey et al. and Finkelstein et al.’s recent papers in the QJE both using them. There is also some discussion in political science – see this recent paper by Macarten Humphrey’s and co-authors.

There are several goals in specifying an analysis plans, but one important reason is to avoid many of the issues associated with data mining and specification searching by setting out in advance exactly the specifications that will be run and with which variables. This is particularly important for interventions which have a whole range of possible different outcomes, like the CDD programs looked at by Casey et al. They look at 334 different outcomes, and illustrate that they could have picked 7 outcomes that made their program look like it strengthened institutions, or alternatively have picked 6 alternate outcomes that make the program look like it weakened institutions. This is less of an issue in evaluating many other policies in which there are one or two most important key outcome (e.g. profits and sales for a firm intervention, attendance and test scores for a school intervention, or incidence of some disease for some health interventions). But even in those cases there are often many different possible choices of how to measure the key outcome, so some ex-ante discipline on how this outcome is defined can be useful.

(Video) 17 Pre-Registration and Pre-Analysis Plans: A Pre-Analysis Plan Example - A Sierra Leone Study (FR)

I’m new to writing these, but have now done them for four different projects. They take quite a bit of work to put together, but I have found two other really useful results from doing them. First, they help in thinking through questionnaire design. By mapping every equation of interest that we want to estimate to the specific questions in the questionnaire that measure the variables in these equations, we can make sure that we don’t inadvertently omit variables that we need to know, as well as thinking more carefully about how to measure outcomes in ways which are most amenable to use in analysis. Second, once the data is collected, data analysis is much quicker and easier, since the pre-analysis plan provides a roadmap to follow through, and you pretty much have half the paper already written.

So what should you include in a pre-analysis plan? Here is my checklist for writing one for an evaluation using a randomized experiment – many of the same items would apply for using other types of data:

1. Description of the sample to be used in the study: this should include discussion of how the sample was obtained, what the expected sample size is, how randomization was done (see my paper with Miriam Bruhn for a checklist on what should be reported on this), and what variables will be included in tests of randomization balance and in tests of survey attrition.

(Video) Reshaping Institutions: Evidence on Aid Impacts Using a Pre-analysis Plan

2. Key data sources: discussion of what the key sources of data will be for the study, including which surveys are planned, and what types of administrative data are planned.

3. Hypotheses to be tested throughout the causal chain: this should specify the key outcomes of interest, the steps along the causal chain to be measured, and the subgroup or heterogeneity analysis that is to be done and the hypotheses that accompany each of these tests. These should be as specific as possible, and link each outcome specifically to how it will be measured. For example, rather than just saying the outcome will be employment, you should say that the “outcome will be employment, as measured by question D21 on the follow-up questionnaire which asks whether the individual currently works for 20 hours or more per week.”

4. Specify how variables will be constructed: this includes, for example, where log or levels of particular variables will be used, how missing variables will be handled, what procedures will be used to deal with outliers, etc. For example, “hours worked per week in last month employed will be measured by question D25 on the follow-up survey. This will be coded as zero for individuals who are not currently working; This will be top-coded at 100 hours per week (99th percentile of baseline response) to reduce influence of outliers. No imputation for missing data from item non-response at follow-up will be performed. We will check whether item non-response is correlated with treatment status following the same procedures as for survey attrition, and if it is, construct bounds for our treatment estimates that are robust to this.”

(Video) How to Prepare for Your Trading Day; My Pre Market Checklist 👍

5. Specify the treatment effect equation to be estimated: for example, is a difference-in-differences, ancova, or post specification to be used? What controls will be included in the regression? How will standard errors be calculated? The exact equation to be estimated should be written out.

6. What is the plan for how to deal with multiple outcomes and multiple hypothesis testing? As noted in my post last week, there are a number of methods for dealing with multiple hypothesis testing. These typically involve either aggregating different measures into a single index – in which case one needs to specify precisely which variables will get included in this aggregate; or what variables will be considered as part of the same family when looking at outcomes within a family of domains.

7. Procedures to be used for addressing survey attrition: what checks will be done for attrition, and what adjustments will be made if these checks show that there is selective attrition?

(Video) The Ultimate Tax Planning Checklist!

8. How will the study deal with outcomes with limited variation? An issue which can arise is that the intent is to look at impacts on an outcome which ex post it turns out that everyone in the control group does and where the intended treatment is meant to increase this outcome. There is no power to be gained from looking at this type of outcome, and including it in a family of outcomes can reduce the power to detect an overall impact. So, for example one can write “In order to limit noise caused by variables with minimal variation, questions for which 95 percent of observations have the same value within the relevant sample will be omitted from the analysis and will not be included in any indicators or hypothesis tests. In the event that omission decisions result in the exclusion of all constituent variables for an indicator, the indicator will be not be calculated.” Likewise, one might pre-determine that outcomes which have item non-response rates above a certain threshold will be omitted.

9. If you are going to be testing a model, include the model: in many cases papers include a model to explain their findings. But often these models are written ex post as a way of trying to interpret the results that are found, but presented in a way that makes it seem like the point of the paper is to test this model. Setting out a model in advance makes clear the model the authors have in mind before seeing the data – as well as making sure they collect information on all the key parameters in this model.

10. Remember to archive it: Since part of the purpose of doing this is to pre-commit to examining particular measures and outcomes, it is important to make sure this can be verified by others. A couple of examples are the JPAL Hypothesis Registry (which is where I have filed mine so far) and the EGAP (Experiments in Governance and Politics) registry. Note that this does not need to mean that the pre-analysis plans are publicly viewable before the study is released – instead they are time-stamped and released upon the request of the researcher. This is important if you are worried about the possibilities of contamination of the experiment if its details are available online. The AEA and 3ie are currently developing registries of RCTs and developing country evaluations respectively, which will allow, but as far as I know so far not mandate, the option of filing such a detailed plan.

(Video) Do THIS Before EVERY Trade you Take: Profitable Trading Checklist 😅📈

If you are interested in writing one, some examples can be found in the 4 currently publicly viewable examples at the JPAL registry, this example for a program in Afghanistan filed in the EGAP registry, and here is one of mine – for a project that tests vocational training in Turkey.

I should note that there are a number of concerns researchers have about tying their hands too much – Casey et al discuss these trade-offs in some detail, and I think most people agree that the use of these plans is not to rule out the possibility of surprise discoveries (see Bill Easterly’s satirical post showing Columbus had no impact since he did not find what he was initially looking for). Their use is still rare in economics, so I am sure we will learn a lot more about how to do them and use them over the coming years. Readers: Any other ideas of key things that should be in the checklist that I am omitting?


How do you write a pre-analysis plan? ›

In A Pre-Analysis Plan Checklist, David McKenzie outlines the elements to include in a PAP: a description of the sample to be used in the study, key data sources, hypotheses to be tested throughout the causal chain, a specification of how variables will be constructed and how the treatment effect equation will be ...

What are pre-analysis plans? ›

A pre-analysis plan (PAP) is filed publicly, typically before intervention start but at the latest before data analysis begins, and describes how the researchers plan to conduct the study and analyze the resulting data.

What should a preliminary analysis include? ›

Preliminary analyses on any data set include checking the reliability of measures, evaluating the effectiveness of any manipulations, examining the distributions of individual variables, and identifying outliers.

What are the 4 steps to write an analysis? ›

Choose your argument. Define your thesis. Write the introduction. Write the body paragraphs.

How do you conduct a preliminary analysis? ›

A preliminary analysis is fairly simple. First, outline your plan. Second, examine the potential market and the commercial viability of your plan. Third, assess the strengths and weaknesses of your plan.

What is the main objective of preliminary analysis? ›

The objectives of preliminary data analysis are to edit the data to prepare it for further analysis, describe the key features of the data, and summarize the results. This chapter deals with quantitative and qualitative approaches to achieving these objectives.

What are the 3 steps of analysis? ›

These steps and many others fall into three stages of the data analysis process: evaluate, clean, and summarize.

What is an analysis plan? ›

An analysis plan helps you think through the data you will collect, what you will use it for, and how you will analyze it. Creating an analysis plan is an important way to ensure that you collect all the data you need and that you use all the data you collect. Analysis planning can be an invaluable investment of time.

What are the 8 stages of data analysis? ›

data analysis process follows certain phases such as business problem statement, understanding and acquiring the data, extract data from various sources, applying data quality for data cleaning, feature selection by doing exploratory data analysis, outliers identification and removal, transforming the data, creating ...

How many steps are there in analysis? ›

There are six steps for Data Analysis.

What are the main parts of an analysis? ›

The elements to be analyzed are plot, setting, characters, point of view, figurative language, and style. This will serve as the evidence/support throughout your paper. Setting: Is there anything to note about the role that time of day plays in the story?

What is the format of analysis? ›

Format analysis is used to validate the pattern of characters used to store a data value in selective columns (for example, telephone numbers, Social Security numbers) that have a standard general format.

What is the analysis stage of a project? ›

The purpose of the Analysis Phase is to thoroughly examine and evaluate the viable options identified in the Feasibility Phase and to arrive at an optimum high-level solution that will satisfy the client's requirements and the project's constraints.

What are the two main types of analysis? ›

Descriptive and inferential are the two general types of statistical analyses in quantitative research.

What are the three 3 kinds of data analysis? ›

There are three types of analytics that businesses use to drive their decision making; descriptive analytics, which tell us what has already happened; predictive analytics, which show us what could happen, and finally, prescriptive analytics, which inform us what should happen in the future.

What is data analysis with example? ›

It is concerned with “What is likely to happen.” Used in sales analysis, it often combines demographic data and purchase data with other data points to predict the actions of customers. For example, as the demographics of a certain area change, this will affect the ability of certain businesses to exist there.

What does a preliminary analysis look like? ›

Preliminary analyses on any data set include checking the reliability of measures, evaluating the effectiveness of any manipulations, examining the distributions of individual variables, and identifying outliers.

What is preliminary analysis explain it in detail with all phases? ›

Preliminary analysis means a decision process and document that evaluates a determination of source classification, of applicable emission and effluent sources and throughput and consumption, emission factors, monitoring plans and a determination of compliance with applicable environmental requirements within the EMS ...

What is the initial analysis process? ›

Perspective Initial data analysis is the process of data in- spection steps to be carried out after the research plan and data collection have been finished but before formal statistical analyses. The purpose is to minimize the risk of incorrect or misleading results.

What is the main purpose of analysis? ›

Analysis (PL: analyses) is the process of breaking a complex topic or substance into smaller parts in order to gain a better understanding of it.

What is preliminary project analysis report? ›

Preliminary Project Report means the report to be prepared by the Construction Consultant of its review of the Project Cost Budget, the Plans and Specifications, the Construction Schedule, the Disbursement Schedule, the Construction Manager's Agreement, the Trade Contracts, the Major Contracts, tests and all other ...

What comes after preliminary analysis? ›

After the preliminary analysis, the analyst should report the findings to management, with recommendations outlining the acceptance or rejection of the proposal.

What is checklist analysis method? ›

Checklist analysis (CLA) is a technique that can be used to identify and manage risk. The checklist is developed by listing items, steps, or tasks and is then analyzed against criteria to determine if the procedure is completed correctly.

What is simple checklist method? ›

Under checklist method, a checklist is forwarded to the rater regarding the performance and behaviour of the employees. The rater on analysing the question and the employees, rates the employees. Such questions carry score which is given by the HR manager.


1. Do THIS before every trade - 10 step trade checklist
2. Architectural Site Analysis Part 1 - The Checklist
(First In Architecture)
3. Pre Production Planning Checklist
(Demand Metric)
4. Affiliate Marketing Checklist - The Ultimate Checklist for Beginners
(Tyler Stokes)
5. Checklist for Starting a Business | SUCCESS from the Start
(The Freedom Strategy)
6. Day 1 | Rachael Meager: Improved Specification: Pre-registration and Pre-Analysis Plans
(Berkeley Initiative for Transparency in the Social Sciences (BITSS))
Top Articles
Latest Posts
Article information

Author: Twana Towne Ret

Last Updated: 11/23/2022

Views: 5824

Rating: 4.3 / 5 (64 voted)

Reviews: 95% of readers found this page helpful

Author information

Name: Twana Towne Ret

Birthday: 1994-03-19

Address: Apt. 990 97439 Corwin Motorway, Port Eliseoburgh, NM 99144-2618

Phone: +5958753152963

Job: National Specialist

Hobby: Kayaking, Photography, Skydiving, Embroidery, Leather crafting, Orienteering, Cooking

Introduction: My name is Twana Towne Ret, I am a famous, talented, joyous, perfect, powerful, inquisitive, lovely person who loves writing and wants to share my knowledge and understanding with you.