Should the Colombian national government invest in a road, a school, or a hospital? Would they have the same impact? Will they even get built? In a developing country every penny counts, and maximizing the return on investment across thousands of projects is not easy. How should a policy maker prioritize an investment?
The National Planning Agency (NPA) is the entity that oversees the allocation of the investment budget among the ministries of the Colombian national government and the local and regional entities. Every year the NPA allocates on average $15 billion (5% of GDP) to hundreds of different projects. Its main objective is to design policies and direct the investment budget to the most efficient public policies, those with the largest impact in areas such as infrastructure development, an efficient justice system, poverty reduction, and wealth redistribution. Every year the Agency struggles to understand which projects are the most efficient and which are the most likely to be completed without additional resources.
Big data and machine learning can increase the efficiency of the policies selected and implemented in the country. They make it possible to use information from previously selected and implemented projects and to draw out trends that help identify the most efficient projects.
One type of project that the NPA approves and invests in comes from proposals by local entities (state governors and mayors). Once the agency receives the proposals, it analyzes them and sends them to an investment committee, which decides whether or not to fund them. For the committee, it is a big challenge to predict which projects will actually be implemented, which could suffer from corruption, and which will need additional resources to be completed.
To address this, NPA management has created a small team of data scientists who have been pooling information from over 8,000 projects that were approved by this committee and developed in the country. They have been developing a machine learning tool that draws trends among these projects and identifies which are most likely to be successful. With a confidence level of 76%, the system has been able to predict which projects will not be successful and raise a red flag for the committee. If this mechanism had been applied four years ago, the system would have avoided the misuse of almost 30% of the resources (equivalent to almost $3 billion).
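As a rough illustration of how a red flag could be derived from past projects, the sketch below uses a simple frequency baseline: it flags a proposal when similar past projects failed often. This is not the NPA's actual model, whose details are not described here. The records, the grouping by sector and area, and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical historical records: (sector, area, completed).
# These values are invented for illustration only.
HISTORY = [
    ("roads", "rural", False),
    ("roads", "rural", False),
    ("roads", "rural", True),
    ("roads", "urban", True),
    ("roads", "urban", True),
    ("schools", "rural", True),
    ("schools", "rural", False),
    ("schools", "urban", True),
    ("hospitals", "urban", True),
    ("hospitals", "urban", False),
]


def failure_rates(history):
    """Return the share of past projects not completed, per (sector, area) group."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for sector, area, completed in history:
        totals[(sector, area)] += 1
        if not completed:
            failures[(sector, area)] += 1
    return {group: failures[group] / totals[group] for group in totals}


def red_flag(proposal, rates, threshold=0.5):
    """Flag a (sector, area) proposal whose group failed at or above the threshold.

    Groups with no history are not flagged, because there is no evidence either way.
    """
    rate = rates.get(proposal)
    return rate is not None and rate >= threshold


RATES = failure_rates(HISTORY)

if __name__ == "__main__":
    for proposal in [("roads", "rural"), ("roads", "urban"), ("ports", "urban")]:
        print(proposal, "red flag" if red_flag(proposal, RATES) else "no flag")
```

A real system would use many more project characteristics and a trained model, but the committee-facing output has the same shape: a proposal goes in and a flag comes out.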
Another example, which is being developed and will be implemented in the next two years, is an enhancement of the social programs offered by the national government through machine learning. The mechanism is helping make the data more reliable and showing which mix of policies has had the highest impact on alleviating poverty. The data scientists are identifying the impact of programs such as housing subsidies, conditional cash transfers, and pension aid, and will suggest the appropriate mix of programs given the population’s observable characteristics.
The NPA has started a medium-term effort to increase the use of machine learning for decision making inside the national government. Over the next five to ten years, the government has committed to designing and implementing the data infrastructure across all the ministries, automating processes and procedures, and building the capabilities to mine the data. It will also formalize data science careers in Colombia and create incentives for people to study the field.
Even though this process is very helpful for decision making and the prioritization of investments, the government should guard against data biases. The poorest municipalities might be over-penalized because of the difficulty of making the first investments in these areas. I would suggest calibrating the mechanism so that it weighs these considerations and does not perpetuate the poverty cycles in these underserved communities. I would also suggest that the NPA consider information from national surveys and the census, and track the historical progress of municipalities and the synergies created between projects.
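One way to check whether the poorest municipalities are being over-penalized is to compare how often proposals from each group are flagged. The sketch below is a minimal audit under invented assumptions: the group labels, risk scores, and the 0.5 threshold are hypothetical and do not come from the NPA's system.

```python
def flag_rate_by_group(scored, threshold):
    """Return the share of proposals flagged in each group.

    `scored` is a list of (group, risk_score) pairs; a proposal is flagged
    when its risk score is at or above `threshold`.
    """
    counts = {}
    flagged = {}
    for group, score in scored:
        counts[group] = counts.get(group, 0) + 1
        flagged[group] = flagged.get(group, 0) + (1 if score >= threshold else 0)
    return {group: flagged[group] / counts[group] for group in counts}


# Hypothetical risk scores, invented for illustration only.
SCORED = [
    ("poorest", 0.8), ("poorest", 0.7), ("poorest", 0.55), ("poorest", 0.4),
    ("other", 0.6), ("other", 0.3), ("other", 0.2), ("other", 0.1),
]

FLAG_RATES = flag_rate_by_group(SCORED, threshold=0.5)
# Difference in flag rates between the two groups; a large gap is a signal
# to review the model or its threshold, not proof of bias on its own.
GAP = FLAG_RATES["poorest"] - FLAG_RATES["other"]

if __name__ == "__main__":
    print(FLAG_RATES, GAP)
```

A large gap does not by itself show that the mechanism is unfair, since the underlying difficulties may be real, but it tells the committee where calibration deserves a closer look.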
The importance of data processing in decision making is undeniable. Colombia has started implementing machine learning to better identify efficient policies and maximize the impact of taxpayers’ money on the country’s development. Nonetheless, it is important to ask how much we value this additional efficiency and how much we are willing to risk privacy breaches and information leakage or misuse in the future. How should Colombia move forward, and how can it protect people from these threats?
Colombian National Budget 2018. www.minhacienda.gov.co. Retrieved Nov 10, 2018.
 Conversation with Juan Camilo Mejia, former member of the Division of Data Analysis at Colombia’s National Planning Agency. November 10, 2018.
 CONPES 3920 of 2018. https://colaboracion.dnp.gov.co/CDT/Conpes/Econ%C3%B3micos/3920.pdf