Originally posted on the Global Anticorruption Blog
Many anticorruption advocates are excited about the prospects that “big data” will help detect and deter graft and other forms of malfeasance. As part of a project in this vein, titled Curbing Corruption in Development Aid-Funded Procurement, Mihály Fazekas, Olli Hellmann, and I have collected contract-level data on how aid money from three major donors is spent through national procurement systems; our dataset comprises more than half a million contracts and stretches back almost 20 years. But good data alone isn’t enough. To be useful, the data need a group of interested and informed users who have both the tools and the skills to analyse them, uncover misconduct, and then lobby governments and donors to listen to and act on the findings. The analysis of big datasets to find evidence of corruption (for example, the method developed by Mihály Fazekas to identify “red flags” of corruption risks in procurement contract data) requires statistical skills and software, both of which are in short supply in many parts of the developing world, such as sub-Saharan Africa.
Yet some ambitious recent initiatives are trying to address this problem. Lately I’ve had the privilege to be involved in one such initiative, led by Oxford mathematician Balázs Szendrői, that helps empower a group of young African mathematicians to analyse “big data” on public corruption.
The first step in this project was to develop software; this may seem trivial, but many cash-strapped African universities simply don’t have the resources to purchase the latest statistical software packages. The African Maths Initiative (AMI), a Kenyan NGO that works to create a stronger mathematical community and culture of mathematics across Africa, has helped to solve this problem by developing a new open-source program, R-Instat (which builds on the popular but difficult-to-learn statistics package R), funded through crowdfunding. R-Instat is still in development but is on track for launch in July this year. AMI has also helped develop a menu in R-Instat designed specifically for analysing procurement data and identifying corruption risk indicators.
Once we’ve got the data and the software to analyse it, the third and most crucial ingredient is the people. For “big data” to be useful as an anticorruption tool, we need to bring together two groups: people who understand how to analyse data, and people who understand how procurement systems can be manipulated to corrupt ends. Communication between the two is essential. So last month I tried to do my part by visiting AIMS Tanzania, an institute that offers a one-year, high-level Master’s programme to some of Africa’s best maths students, to help conduct a one-day workshop. After a preliminary session in which we discussed the ways in which the procurement process can be corrupted, and how that might manifest in certain red flags (such as single-bidder contracts), the students had the opportunity to use the R-Instat software to analyse the aid-funded procurement dataset that my colleagues and I had created. Students formed teams and developed their own research questions, which they attempted to answer by using R-Instat to run analyses on the data.
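For readers curious what such a red-flag check looks like in code, here is a minimal sketch, written in Python with pandas rather than R-Instat, of flagging single-bidder contracts and summarising them by country. The table and its column names (country, sector, supplier, num_bidders, contract_value) are hypothetical stand-ins, not the structure of our actual dataset.

```python
import pandas as pd

# Hypothetical procurement table; the real dataset's columns will differ.
contracts = pd.DataFrame({
    "country":        ["A", "A", "B", "B", "B"],
    "sector":         ["education", "health", "education", "education", "roads"],
    "supplier":       ["Acme Ltd", "Beta Co", "Acme Ltd", "Acme Ltd", "Gamma Inc"],
    "num_bidders":    [1, 4, 1, 2, 5],
    "contract_value": [120_000, 80_000, 200_000, 150_000, 300_000],
})

# Red flag: the contract attracted only a single bidder.
contracts["single_bidder"] = contracts["num_bidders"] == 1

# Share of each country's contracts that carry the flag.
print(contracts.groupby("country")["single_bidder"].mean())
```

A high single-bidder rate proves nothing on its own, of course; it simply marks where closer scrutiny may be worthwhile.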
Even the simplest analyses revealed interesting patterns. Why did one country’s receipts from the World Bank drop off a cliff one year and never recover? Discussion turned up a few possible reasons: perhaps a change of government led donors to change policy, or perhaps the country reached a stage of development at which it no longer qualified for aid. Students became excited as they realised how statistical methods could be applied to identify, understand and solve real-world problems. Some teams came up with really provocative questions, such as the group that wanted to know whether Francophone or Anglophone countries were more vulnerable to corruption risks. Their initial analysis revealed that contracting in the Francophone countries was more strongly associated with red flags. When they extended the analysis to a wider selection of countries, the results remained broadly similar. Another group found that one-quarter of contracts in the education sector in one country had been won by just one company, and that more than half of contracts by value in this sector had been won by three companies, all of which had suspiciously similar names. Again, there might be perfectly innocent reasons for this, but in just a couple of hours we had a set of preliminary results that certainly warrant further analysis. Imagine what we might find with a little more time!
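As a rough illustration of the kind of concentration check that last group ran, a sketch along the following lines (again in Python/pandas, using the same hypothetical column names as above) would measure how many of a sector’s contracts, and how much of its value, go to the top suppliers. This is only an illustrative reconstruction, not the students’ actual R-Instat analysis.

```python
import pandas as pd

def supplier_concentration(contracts: pd.DataFrame, country: str,
                           sector: str, top_n: int = 3):
    """Return (share of contracts won by the biggest supplier,
    share of contract value won by the top_n suppliers)
    for one country and sector."""
    subset = contracts[(contracts["country"] == country) &
                       (contracts["sector"] == sector)]
    if subset.empty:
        return None, None
    # Share of contract count taken by the single biggest winner.
    top1_count_share = subset["supplier"].value_counts(normalize=True).iloc[0]
    # Share of total contract value taken by the top_n winners.
    value_by_supplier = (subset.groupby("supplier")["contract_value"]
                         .sum().sort_values(ascending=False))
    topn_value_share = value_by_supplier.head(top_n).sum() / value_by_supplier.sum()
    return top1_count_share, topn_value_share

# e.g. supplier_concentration(contracts, country="B", sector="education")
```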
It is programmes like these, which develop the tools and cultivate the skills of the next generation of analysts, that will determine whether the promise of “big data” as an anticorruption tool will be realised in the developing world.
Post written by Dr. Elizabeth Dávid-Barrett of the University of Sussex