Monday, June 9, 2014

GISRUK 2014 - Crowdsourced Data and Agent-based Modelling

Quite a while after the event, I should say that I enjoyed the conference at GISRUK this April, and I particularly enjoyed talking to some really great researchers about their current projects. It's rare that I've attended a conference in which so many sessions were so applicable to my research; many thanks to the organisers and the amazing job they did! Also, whoever suggested the ceilidh is great and should feel really good about themselves.

As for my presentation, I was interested in how rapidly the datasets generated during crisis-response efforts become usable in a modelling context. Essentially, I wanted to address the question: at what point during a crisis can we meaningfully use crowdsourced data?


The context

A while back, Andrew Crooks and I built a basic model of aid distribution in Haiti after the 2010 earthquake (documented in this paper). The model made use of a number of different data sources, many of which had to be crowdsourced because of the paucity of good information about Haiti before the earthquake.

Figure from the paper. Data on the devastation focused on Port-au-Prince. Model inputs include: geo-referenced information about the (A) amount of damage to buildings, (B) the population, and (C) roads and aid points.

Based on the underlying data about where people were, how damaged their houses were likely to be, and the state of the road network, we simulated how different aid centre setups might be utilised by the population, trying to gain a sense of the relative quality of the various setups. The setups were analysed based on how much aid was successfully distributed and how energetic/healthy the population was at the end of the simulation.

By design, agent movement was influenced by the road setup. In the original paper, the road network was drawn from OpenStreetMap a few months after the earthquake; as a result, the model was run on a dataset which volunteers had had time to construct and clean. And they did a tremendous amount of work, as shown here:


OpenStreetMap - Project Haiti from ItoWorld on Vimeo.

Given that the amount of data available to responders changed rapidly over the first few days of the crisis, I wondered when the data could be pulled into an agent-based model and meaningfully used.


Testing the model

To that end, I decided to run the model with a particular aid centre setup on a variety of different extractions of the data, essentially running the model on the road map as it existed at different points in time. The six datasets are drawn from Geofabrik's extractions of OpenStreetMap, which I obtained from the GeoCommons repository. To give a sense of how the road network developed over time, the following image compares the road mapping within Port-au-Prince during the second week of mapping:

Map showing OpenStreetMap data for Port-au-Prince as of different days in January 2010. Newer roads are in yellow.

So how did the model fare on these different datasets?

As in the original paper, I compare the results along a number of metrics, specifically the cumulative energy remaining across all individuals at the end of the simulation and the amount of aid which goes unconsumed in each case. The following table shows the resulting statistics from 100 runs of the model on each of the extracted datasets, with 80 units of aid initially available.


Dataset               Energy                 Aid Leftover
GeoFabrik (Jan 18)    1732063215 (554143)    15.2 (2.2)
GeoFabrik (Jan 19)    1732124066 (689490)    14.7 (2.8)
Ouest (Jan 27)        1732148510 (552321)    20.1 (1.6)
Ouest (Jan 29)        1732659245 (594081)    20.5 (1.3)
Ouest (Feb 9)         1732432082 (634083)    20.4 (1.5)
Ouest (Feb 16)        1731946456 (676668)    20.2 (1.4)

The average value (standard deviation) of each metric for the given dataset.
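As a quick illustration of how the table's summary statistics are derived, here is a minimal sketch: each cell is the mean of a metric over the runs on one dataset, with the sample standard deviation in parentheses. The function name and the sample values are my own, not from the model's code.

```python
from statistics import mean, stdev

def summarise(runs):
    """Return (mean, sample standard deviation) for a list of per-run values."""
    return mean(runs), stdev(runs)

# e.g. leftover aid from a handful of hypothetical runs on one dataset
aid_leftover = [15, 17, 14, 16, 14]
avg, sd = summarise(aid_leftover)
print(f"{avg:.1f} ({sd:.1f})")  # prints "15.2 (1.3)"
```

With 100 runs per dataset, the same calculation produces each "average (standard deviation)" cell above.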

Analysing the results

Looking at the results, it seems clear that there is some variation depending on the dataset. Individuals choose whether to seek out aid based on the perceived costs of getting to it, and the results reflect this decision-making. In the amount of aid left over, there's a clear break between the scenarios using data from January 27 onward and those using the January 18-19 data. In terms of energy levels, the difference is far less pronounced. If more individuals seek out aid, there is more crowding around the centres, which can be costly in terms of energy and can limit aid distribution.
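To make the cost-based decision concrete, here is an illustrative toy rule (not the paper's actual decision function; every name and threshold is an assumption): an agent attempts the trip only if the perceived round-trip cost along the mapped road network leaves it a safety margin of energy.

```python
def seeks_aid(energy, distance_to_centre, cost_per_unit=1.0, margin=0.5):
    """Agent goes for aid only if the round trip leaves a safety margin of energy."""
    perceived_cost = 2 * distance_to_centre * cost_per_unit
    return perceived_cost < energy * (1 - margin)

# On a sparse early road map, the perceived route to a centre is longer,
# so fewer agents attempt the trip:
print(seeks_aid(energy=100, distance_to_centre=20))  # True  (short mapped route)
print(seeks_aid(energy=100, distance_to_centre=40))  # False (long apparent detour)
```

Under a rule like this, adding newly mapped roads shortens perceived routes, which changes who seeks aid and hence both the leftover aid and the crowding around centres.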

So what does this suggest for future work about crowdsourcing data?

Firstly, the fact that the same aid setup scenario produces different results depending on the dataset suggests that the datasets we use are in fact important, and that we haven't been worrying about this in vain all this time (a comfort!).

Secondly, the results shown here suggest that despite the changing quality of the road network, the results of the model are usable fairly early in the game; the food utilisation changes, but not radically, and the overall energy levels remain fairly constant. Differences certainly exist, but they won't prevent the results from being usable.

Future steps

Based on this work, I remain enthusiastic about the potential of agent-based models to contribute in this kind of context. I made some bold assertions in the previous paragraph, and I think they bear further exploration:
  • exactly how do the results change over time, in terms of the experiences of individuals or specific populations?
  • can we project which areas will be most influenced by the data and either forecast or direct data-gathering accordingly?
  • what steps can modellers take to build uncertainty and the assumptions of uncleaned data into these models, so that they are usable in these contexts?
  • where are the tipping points in the dataset? How do behaviours interact with the data we utilise?
Obviously these are bigger than a single blogpost, but I'll be thinking about them and posting about them in the future!
