Big data. Small data. Better data.

An interview with Nathaniel Heller, Results for Development. Originally published here

By now, I think we can all agree that we’ve reached the peak of big data, returned to base camp, washed our kit and started planning the next climb. For a short while, big data was presented as the solution to all our problems. The premise was simple — collect more data, make it look pretty, push it out and people would start using it to make decisions that would end poverty, expose corruption and reverse unsustainable exploitation of our environment.

But things didn’t work out that way. In the rush to deliver data to the people, the people forgot the people. Bigger didn’t mean better and data dashboards became graveyards filled with withering flowers.

Data designed for the living need to be centred around humans and the unique needs we all have. Results for Development (R4D) is an organisation that puts the users of data at the centre of all their efforts to achieve sustainable progress in health, education and nutrition. I spoke to Nathaniel Heller, Executive Vice President for Integrated Strategies at R4D, to learn more about their user-centric approach to data and the importance of thinking ‘small’ when it comes to helping people make better use of data.

“There’s a mistaken belief that if we present people with pretty data, good decisions will happen,” said Nathaniel. “But data isn’t the only input into decision-making. You have to consider the capacity of the governments or organisations involved to carry out the task they’ve been given and what hurdles they have to overcome. The use of data in decision-making is much more nuanced than simply making more data available.”

R4D works with change agents to find long-lasting solutions. Focusing on identifying important and transformational data, R4D will only invest in data tools if there’s a strong case for it. “Sometimes it seems like there’s a data problem,” explained Nathaniel, “but once you start talking to people about what they need, you’ll see there’s another underlying issue that has nothing to do with the data.” It’s these underlying issues that R4D’s user-centric approach to problem solving uncovers.

To illustrate his point, Nathaniel told me about a current project that he’s particularly excited about. R4D spent about a year poring over many different kinds of country-level agricultural data in several African countries to identify opportunities for agricultural transformation, the kind of macro shift that has the potential to lift tens of millions out of poverty and address nutritional needs. The initial idea was to create a dashboard and open up access to the data, assuming this would motivate national political leaders to embrace a push for change. But when R4D spotted an opportunity in the data (only a tiny percentage of smallholder farmers in Kenya use inorganic fertiliser), they decided to shift strategies.

In Kenya, getting the right fertiliser can be an expensive and time-consuming effort for farmers. A half-day journey to the market might end with the purchase of the wrong fertiliser, or worse, a counterfeit product that does more harm than good. R4D and their partners at the Local Development Research Institute saw an opportunity to create a service that would help people locate the right fertiliser, for the right price, from a location within easy travelling distance.

MazaoPlus+, an SMS service for farmers (with an accompanying Android app used by field agents to onboard users), was built in just two weeks. More than 10,000 farmers have already subscribed to receive fertiliser advice via their phones. We have to wait until harvest time to see if the service has helped improve yields through better access to fertiliser, but Nathaniel sees great potential in it, both in terms of the agricultural impact and the potential for scaling up into something bigger.

Screenshot of dashboard page on the MazaoPlus+ platform.

“The Kenyan fertiliser SMS service is a good example of our methods where we emphasise fit-for-purpose principles when it comes to leveraging data; we often focus on the small data, not the big data,” said Nathaniel. “We thought it through first and built second, which is exactly how every project should go.”

Small data is a term that’s never been as popular as big data, but it describes data presented in a volume and format that are easy for humans to access and use. Whereas reams of big data can be collected and processed by artificial intelligence, small data is curated by humans for other humans. The personal touch of small data ensures the solutions being developed to improve education, healthcare and agricultural systems are meeting a real need and supporting change.

On their website, R4D speaks about “artificial solutions”, whereby resource-constrained governments find themselves forced to adopt data-for-development tools without adequate planning or data uptake strategies. I asked Nathaniel how these artificial solutions could be avoided. “When someone proposes a solution, you start by asking, ‘has anyone (other than the funder) asked for this?’” said Nathaniel. “If they say yes, good, but if not you need to dig deeper and ask more questions. Structured interviews with potential users provide lots of interesting feedback that will help you understand their needs and pain points, enabling you to determine if the root cause of their problems really is a data issue, or something else entirely.”

Talking and listening to your users to learn what they need is common sense but it’s always worth reminding ourselves why. As Nathaniel and R4D have shown, understanding the needs of people and developing a solution that’s tailored to them will always be more effective than taking a ‘store-bought’ solution and moulding it to their situation. After all, one-size-fits-all rarely fits anyone. When — and only when — data is identified as the true issue, every care must be taken to curate it and package it in ways that are accessible, usable and useful for the users. These are principles Vizzuality shares with R4D, so let’s think small when it comes to big data.

Turning Big Data Into a Useful Anticorruption Tool in Africa

Originally posted on the Global Anticorruption Blog

Many anticorruption advocates are excited about the prospect that “big data” will help detect and deter graft and other forms of malfeasance. As part of a project in this vein, titled Curbing Corruption in Development Aid-Funded Procurement, Mihály Fazekas, Olli Hellmann, and I have collected contract-level data on how aid money from three major donors is spent through national procurement systems; our dataset comprises more than half a million contracts and stretches back almost 20 years. But good data alone isn’t enough. To be useful, there must be a group of interested and informed users, who have both the tools and the skills to analyse the data to uncover misconduct, and then lobby governments and donors to listen to and act on the findings. The analysis of big datasets to find evidence of corruption (for example, the method developed by Mihály Fazekas to identify “red flags” of corruption risks in procurement contract data) requires statistical skills and software, both of which are in short supply in many parts of the developing world, such as sub-Saharan Africa.

Yet some ambitious recent initiatives are trying to address this problem. Lately I’ve had the privilege to be involved in one such initiative, led by Oxford mathematician Balázs Szendrői, that helps empower a group of young African mathematicians to analyse “big data” on public corruption.

The first step in this project was to develop software; this may seem trivial, but many cash-strapped African universities simply don’t have the resources to purchase the latest statistical software packages. The African Maths Initiative (AMI), a Kenyan NGO that works to create a stronger mathematical community and culture of mathematics across Africa, has helped to solve this problem by developing a new open-source program, R-Instat (which builds on the popular but difficult-to-learn statistics package R), funded through crowd-sourcing. Still in development, it is on track for launch in July this year. AMI has also helped develop a menu on R-Instat that can be used specifically for analysing procurement data and identifying corruption risk indicators.

Once we’ve got the data and the software to analyse it, the third and most crucial ingredient is the people. For “big data” to be useful as an anticorruption tool, we need to bring together two groups: people who understand how to analyse data, and people who understand how procurement systems can be manipulated to corrupt ends. Communication between the two is essential. So last month I tried to do my part by visiting AIMS Tanzania, an institute that offers a one-year high-level Master’s programme to some of Africa’s best math students, to help conduct a one-day workshop. After a preliminary session in which we discussed the ways in which the procurement process can be corrupted, and how that might manifest in certain red flags (such as single-bidder contracts), the students had the opportunity to use the R-Instat software to analyse the aid-funded procurement dataset that my colleagues and I had created. Students formed teams and developed their own research questions, which they attempted to answer by using R-Instat to run analyses on the data.
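
To give a sense of what a simple red-flag calculation can look like, here is a minimal sketch in Python rather than R-Instat. It is not the method used in the workshop or by the project: the records, the field names (`country`, `bidders`) and the function name are all invented for illustration. It computes, for each group, the share of contracts that attracted exactly one bid.

```python
from collections import defaultdict

# Hypothetical contract records; field names and values are invented
# for illustration and do not come from the project's dataset.
contracts = [
    {"country": "A", "bidders": 1},
    {"country": "A", "bidders": 4},
    {"country": "A", "bidders": 1},
    {"country": "B", "bidders": 3},
    {"country": "B", "bidders": 1},
    {"country": "B", "bidders": 5},
    {"country": "B", "bidders": 2},
]


def single_bidder_share(records, group_key):
    """Return, per group, the fraction of contracts that received exactly one bid."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec["bidders"] == 1:
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}


shares = single_bidder_share(contracts, "country")
for country, share in sorted(shares.items()):
    print(f"{country}: {share:.0%} of contracts had a single bidder")
```

Because the grouping key is a parameter, the same function could compare any grouping a team cares about (sector, donor, year, or language group), provided that field exists in the records. A high share is only a prompt for further questions, not evidence of wrongdoing.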

Even the simplest analyses revealed interesting patterns. Why did one country’s receipts from the World Bank drop off a cliff one year and never recover? Discussion revealed a few possible reasons: Perhaps a change of government led donors to change policy, or the country reached a stage of development where it no longer qualified for aid? Students became excited as they realized how statistical methods could be applied to identify, understand and solve real-world problems. Some teams came up with really provocative questions, such as the group that wanted to know whether Francophone or Anglophone countries were more vulnerable to corruption risks. Their initial analysis revealed that contracting in the Francophone countries was more often associated with red flags. They extended the analysis to include a wider selection of countries, and found broadly similar results. Another group found that one-quarter of contracts in the education sector in one country had been won by just one company, and more than half of contracts by value in this sector had been won by three companies, all of which had suspiciously similar names. Again, there might be perfectly innocent reasons for this, but in just a couple of hours, we had a set of preliminary results that certainly warrant further analysis. Imagine what we might find with a little more time!
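
The concentration finding above boils down to two shares: the fraction of contracts, and the fraction of contract value, won by the largest suppliers. A rough sketch in Python follows (the students worked in R-Instat, so this is not their code). The supplier names, the amounts and the function name are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter

# Hypothetical (supplier, contract value) pairs for one sector; all names
# and amounts are invented for illustration.
awards = [
    ("Alpha Supplies Ltd", 500_000),
    ("Alpha Supply Ltd", 300_000),
    ("Beta Builders", 100_000),
    ("Alpha Supplies Ltd", 200_000),
    ("Gamma Books", 50_000),
    ("Delta Print", 50_000),
    ("Alpha Suppliers Ltd", 250_000),
    ("Epsilon Works", 50_000),
]


def top_supplier_shares(awards, n):
    """Return the n largest suppliers by total value, plus their combined
    share of the contract count and of the contract value."""
    value_by_supplier = Counter()
    count_by_supplier = Counter()
    for supplier, value in awards:
        value_by_supplier[supplier] += value
        count_by_supplier[supplier] += 1
    top = [supplier for supplier, _ in value_by_supplier.most_common(n)]
    total_value = sum(value_by_supplier.values())
    total_count = sum(count_by_supplier.values())
    value_share = sum(value_by_supplier[s] for s in top) / total_value
    count_share = sum(count_by_supplier[s] for s in top) / total_count
    return top, count_share, value_share


top, count_share, value_share = top_supplier_shares(awards, 3)
print(top)
print(f"{count_share:.0%} of contracts, {value_share:.0%} of value")
```

Spotting that the top suppliers have near-identical names is left to the analyst here; automating that step would need a fuzzy string comparison, which this sketch deliberately leaves out.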

It is programs like these, which develop the tools and cultivate the skills in the next generation of analysts, that will determine whether the promise of “big data” as an anticorruption tool will be realized in the developing world.

Post written by Dr. Elizabeth Dávid-Barrett of the University of Sussex