
Promoting Transparency and Rigor in Experimental Work

Updated: Sep 6, 2021


Alongside my work on the Sri Lanka housing lottery, I’ve been helping out on a project related to tax administration in Indonesia. In 2018, Professor Hanna and her co-authors launched a large-scale RCT in which tax offices (KPPs) in Indonesia were randomly assigned to receive additional personnel or increased funding to help them improve tax enforcement and increase tax revenues. As part of this larger study, the team conducted a survey experiment to understand whether priming tax office employees to think about tax fairness changes their approach to tax enforcement.

Tax officers were asked to consider a hypothetical increase in their KPP’s tax revenue target and how they would prioritize hypothetical taxpayers for follow-up enforcement activities such as sending penalty notifications for late payment of taxes, making in-person visits to collect delinquent payments, and conducting audits. One-third of the respondents were asked for their opinion on progressive taxation and the fair distribution of tax burdens before ranking taxpayers, while the rest answered these questions afterwards. I have been analyzing data from the survey experiment to determine whether being primed to think about tax equity changes how tax officers prioritize taxpayers for additional follow-up. This is my first time working on an RCT, and I’ve enjoyed learning more about best practices for experimental research in economics!


There has been a big push towards increased transparency in the field over the last few years. In settings where researchers have a lot of discretion over how to process and analyze data, for example when there are several ways to define the key outcome variable or several sub-groups within which one could test for treatment effects, there may be a tendency to engage in data mining in the hope of uncovering a statistically significant treatment effect. Researchers may experiment with different combinations of regressors, different sub-samples, different functional forms, or different estimation techniques, and then only report those regressions with significant coefficients or coefficients with a particular sign (Hoover and Perez, 2000). As noted in Deaton (2009): “A sufficiently determined examination of any trial will eventually reveal some subgroup for whom the treatment yielded a significant effect of some sort, and there is no general way of adjusting standard errors to protect against the possibility. In drug trials, the FDA rules require that analytical plans be submitted prior to trial, and at least one economic experiment—Moving to Opportunity—has imposed similar rules on itself.”


This is a concern because these cherry-picked regression estimates do not reflect the true treatment effect of the program being evaluated. Casey, Glennerster, and Miguel (2012) measured the impact of the GoBifo program, a community-driven development project, on social capital. They collected over 300 indicators to measure social capital, conducted a comprehensive examination of all outcomes, and concluded that the program had zero impact overall. Importantly, the authors noted that running 300+ regressions and selectively reporting the few statistically significant results would have given the impression that GoBifo had positive or negative impacts on particular aspects of social capital, whereas reporting the results of all regressions made it clear that there was no impact on the vast majority of the outcome variables. Similarly, if researchers test for treatment effects in every possible sub-group, they are likely to find a significant result in one of them purely by chance.
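To see how easily this can happen, here is a small simulation (an illustrative sketch of my own, not part of the GoBifo analysis) in which the treatment has no effect on any outcome, yet a handful of the 300 outcomes still come out “significant” at the 5 percent level purely by chance:

```python
# Illustrative simulation: with no true treatment effect, testing many outcomes
# at the 5% level still produces some "significant" results by chance alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_outcomes = 500, 300            # sample size and number of outcome variables
treat = rng.integers(0, 2, n)       # random treatment assignment with no real effect

false_positives = 0
for _ in range(n_outcomes):
    y = rng.normal(size=n)          # outcome completely unrelated to treatment
    _, p = stats.ttest_ind(y[treat == 1], y[treat == 0])
    false_positives += p < 0.05

print(f"{false_positives} of {n_outcomes} outcomes significant at 5% despite zero true effect")
```

Reporting only those few lucky outcomes would look like evidence of an effect; reporting all 300 makes the overall null result obvious.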


To address this issue, pre-analysis plans (PAPs) have become increasingly common for RCTs. PAPs require researchers to lay out the outcomes of interest, regression specifications, and key dimensions of heterogeneity/sub-group analysis before they start analyzing the data. This practice discourages data mining and can lend greater credibility to the findings because the analysis approach is laid out ex-ante.


Another solution to the problem is blind analysis, commonly used in physics. Data or analysis results are hidden from the researcher so they do not have access to information (particularly treatment assignments) that might motivate them to select certain estimation approaches or specifications over others. In keeping with this, we have been using a fake treatment variable when estimating the treatment effects of the survey experiment, rather than the true treatment assignments, until the code and specifications have all been finalized. This seems like a great way of ensuring that analytical decisions aren’t being driven by the results of the experiment, and are guided by theory/first principles instead.
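As a rough sketch of what this kind of blinding can look like in practice (the column name and function below are hypothetical, not our project’s actual code), one might scramble the treatment column before any analysis code is written:

```python
# Hypothetical sketch of blinding: swap the true treatment column for a shuffled
# placebo assignment so that specification choices can't be driven by real results.
import numpy as np
import pandas as pd

def blind_treatment(df: pd.DataFrame, treat_col: str = "treatment", seed: int = 123) -> pd.DataFrame:
    """Return a copy of the data whose treatment column is a fake, shuffled assignment."""
    rng = np.random.default_rng(seed)
    blinded = df.copy()
    blinded[treat_col] = rng.permutation(blinded[treat_col].to_numpy())  # same group sizes, scrambled labels
    return blinded

# All regressions are written and debugged against blind_treatment(data);
# the true assignments are only swapped back in once the specifications are frozen.
```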


Another recent development is the increased use of randomization inference to test the null hypothesis of no treatment effects. Although random assignment of the treatment variable enables the identification and estimation of causal effects, Young (2019) showed that conventional econometric tests, which are only asymptotically accurate, can produce false positives, i.e. causal effects may appear to be statistically significant when they are not. Randomization inference is an alternative approach that allows for the construction of exact tests of statistical significance. Under this approach, the distribution of the test statistic is known regardless of the sample size or the true nature of the errors, i.e. we need not rely on assumptions that are only valid in large samples. This enables us to compute exact p-values for each estimated coefficient, and thus determine whether the coefficient is significant at the usual thresholds of 0.1, 0.05, and 0.01.
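A minimal sketch of how randomization inference works for a simple difference in means (my own illustration, not the project’s code) is to re-draw the treatment assignment many times and compare the observed estimate against the resulting null distribution:

```python
# Randomization inference for a difference in means: permute the treatment labels
# to build the distribution of the statistic under the sharp null of no effect.
import numpy as np

def randomization_p_value(y: np.ndarray, treat: np.ndarray, n_perm: int = 10000, seed: int = 0) -> float:
    """Two-sided p-value for the sharp null of no treatment effect for any unit."""
    rng = np.random.default_rng(seed)
    observed = y[treat == 1].mean() - y[treat == 0].mean()
    null_stats = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(treat)                        # re-randomize the assignment
        null_stats[i] = y[shuffled == 1].mean() - y[shuffled == 0].mean()
    return float(np.mean(np.abs(null_stats) >= abs(observed)))  # share of placebo draws as extreme as observed
```

In practice the permutations should mirror the actual randomization design (for example stratified or clustered assignment), but the underlying logic is the same.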


Young (2019) reported that in a sample of 53 experimental studies published in the journals of the American Economic Association, the number of statistically significant results found using randomization inference was 13 to 22 percent lower than the number found using standard inference techniques. The discrepancies in significance between the two approaches became particularly pronounced when authors conducted exploratory data analysis in restricted samples, for example checking for treatment effects in a sub-group or testing for heterogeneity of treatment effects based on demographic and other non-randomized characteristics. Keeping this in mind, I have been conducting randomization inference tests for all coefficients of interest in the survey experiment. After studying the theory and conceptual framework behind randomization inference in my econometrics class this spring, I’ve really enjoyed figuring out how to implement it!


PAPs, blind analysis, and randomization inference are an important step towards greater transparency and rigor in social science research. PAPs are particularly useful when researchers are partnering with organizations that have a vested interest in the outcome of the program evaluation. That being said, it is certainly difficult to scope out all possible theories before data analysis begins, and having to specify the research plan in advance can reduce flexibility. As the study evolves, researchers may come across new information that they wish to incorporate into their research design, for example testing for heterogeneous treatment effects along a new dimension. Since randomization inference reduces the risk of false positives, using this approach can help lend credibility to results that emerge from exploratory analyses that were not specified ex-ante.


While working on the tax RCT project, I’ve had the chance to build technical skills in econometrics as well as think about big picture questions of transparency, reproducibility, and ethics in research. This internship has been a great introduction to the nuances of conducting program evaluations and randomized experiments, and I look forward to drawing on my experiences once I start working on my own research projects!

