Tuesday, June 17, 2014

MSF Scientific Day: Understanding the Spread of Disease

I wrote earlier about having had the pleasure of working with Ivan Gayton of Medecins Sans Frontieres (MSF) on a project which was shown at MSF's Scientific Day. The goal was to explore how agent-based modelling can help researchers understand how disease spreads in space and time.

ABM is a powerful tool in this context because it can combine a number of the important factors which influence the spread of disease. In particular, it can bring together aspects of disease which elude other methodologies:

  • ABM can incorporate spatiality in a way that, for example, classic SIR models can't (the standard SIR formulation is sketched just after this list), capturing the kinds of spatial patterns that John Snow highlighted in his famous cholera mapping, an early forerunner of GIS
  • because ABM can simulate individuals with heterogeneous ages, health statuses, access to resources, and so forth, it allows for a richer representation of disease transmission (e.g. the unusual tendency of swine flu to infect the young more than the elderly). System dynamics models of any kind have difficulty doing the same
  • human behaviours such as seeking treatment or moving around the environment rather than staying home obviously influence the spread of disease; these dynamic processes can't be addressed by standard GIS techniques
  • all of these apply not only to the susceptible individuals, but to the disease and its vectors as well: simulating how tsetse flies move relative to a river, or how long a reservoir of bacteria will persist on a water pump, allows us to project the interactions among all of these elements
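
For contrast, the standard SIR model mentioned in the first bullet consists of just three coupled differential equations over aggregate compartments (S susceptible, I infected, R recovered, with N = S + I + R):

\[
\frac{dS}{dt} = -\beta \frac{SI}{N}, \qquad
\frac{dI}{dt} = \beta \frac{SI}{N} - \gamma I, \qquad
\frac{dR}{dt} = \gamma I
\]

Nothing in these equations knows where anyone is or who anyone is: the transmission rate β and recovery rate γ are population-wide constants, which is exactly the limitation the bullets above describe.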

Putting my money where my mouth is, Ivan and I have been developing a simulation - or rather, a library to support simulations - which can simply and quickly generate patterns of the spread of disease. The model Ivan presented at MSF Scientific Day highlighted how human-to-human infections such as influenza can spread through an environment, in contrast with water-borne or insect-borne diseases.

A sample visualisation of human-to-human infection is shown in the following video, which reflects a very simple model: individuals move around the environment throughout the incubation period, and only choose to seek treatment or stay home once the second, symptomatic phase of the disease begins. Individuals have homes, and may choose to visit random "friends" weighted by how distant they are: they therefore frequently make short trips and occasionally travel much longer distances.

[Video: a simulated outbreak of a human-to-human infection spreading through the environment]
This is a very simple case, with very simple behaviours and transmission mechanisms; regardless, it tells the story of how individuals bring diseases home from the larger city to their small towns. It suggests how an epidemic might develop in space and time, and it represents a first step toward understanding how we can respond to and limit the outbreak of disease.
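
To make those behaviours concrete, here is a minimal Python sketch of that logic. It is not the library itself (which lives on GitHub), and every name and parameter in it is hypothetical: agents move between home and distance-weighted "friend" locations while susceptible or incubating, stay home once symptomatic, and pass the infection on co-location (I assume here that agents are infectious during incubation as well).

```python
import math
import random

# Disease phases as described above: move freely while incubating,
# stay home (or seek treatment) once the second phase begins.
SUSCEPTIBLE, INCUBATING, SYMPTOMATIC, RECOVERED = range(4)

INCUBATION_STEPS = 20     # length of the incubation phase (hypothetical)
SYMPTOMATIC_STEPS = 30    # length of the symptomatic phase (hypothetical)
TRANSMISSION_PROB = 0.1   # chance of infection on co-location (hypothetical)


def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


class Person:
    def __init__(self, home, friends):
        self.home = home        # (x, y) coordinates of home
        self.friends = friends  # home locations of acquaintances
        self.location = home
        self.state = SUSCEPTIBLE
        self.clock = 0

    def choose_destination(self):
        # Friends are chosen with probability inversely related to distance:
        # frequent short trips, occasional much longer ones.
        weights = [1.0 / (1.0 + dist(self.home, f)) for f in self.friends]
        return random.choices(self.friends, weights=weights)[0]

    def step(self):
        if self.state == SYMPTOMATIC:
            self.location = self.home  # stay home during the second phase
        elif random.random() < 0.5:
            self.location = self.choose_destination()
        else:
            self.location = self.home
        self._advance_disease()

    def _advance_disease(self):
        if self.state == INCUBATING:
            self.clock += 1
            if self.clock >= INCUBATION_STEPS:
                self.state, self.clock = SYMPTOMATIC, 0
        elif self.state == SYMPTOMATIC:
            self.clock += 1
            if self.clock >= SYMPTOMATIC_STEPS:
                self.state = RECOVERED


def transmit(people, radius=1.0):
    # Anyone infectious exposes nearby susceptible individuals.
    infectious = [p for p in people if p.state in (INCUBATING, SYMPTOMATIC)]
    for p in people:
        if p.state != SUSCEPTIBLE:
            continue
        if any(dist(p.location, q.location) < radius
               and random.random() < TRANSMISSION_PROB
               for q in infectious):
            p.state = INCUBATING


# A tiny run: 200 people with five acquaintances each, one seed infection.
homes = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(200)]
people = [Person(h, random.sample(homes, 5)) for h in homes]
people[0].state = INCUBATING
for _ in range(500):
    for p in people:
        p.step()
    transmit(people)
```

The inverse-distance weighting is what produces the sort of pattern seen in the video: mostly short local trips, with rare long-range jumps that can carry the disease from the city out to the towns.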

I'm very interested in this project, and while I don't have much time to devote to it, I hope to be writing about it here in the future. In the meantime, I have some (shamefully underdocumented) code available on GitHub here, so do watch both these spaces for updates.

Monday, June 9, 2014

GISRUK 2014 - Crowdsourced Data and Agent-based Modelling

Quite a while after the event, I should say that I very much enjoyed GISRUK this April, and I particularly enjoyed talking to some really great researchers about their current projects. It's rare for me to attend a conference in which so many sessions are so applicable to my research; many thanks to the organisers for the amazing job they did! Also, whoever suggested the ceilidh is great and should feel really good about themselves.

As for my presentation, I was interested in how rapidly the datasets generated by crowdsourcing efforts during a crisis become usable in a modelling context. Essentially, I wanted to address the question: at what point during a crisis can we begin to use crowdsourced data in our models?


The context

A while back, Andrew Crooks and I built a basic model of aid distribution in Haiti after the 2010 earthquake (documented in this paper). The model made use of a number of different data sources, many of which had to be crowdsourced because of the paucity of good information about Haiti before the earthquake.

Figure from the paper. Data on the devastation focused on Port-au-Prince. Model inputs include geo-referenced information about (A) the amount of damage to buildings, (B) the population, and (C) roads and aid points.

Based on the underlying data about where people were, how damaged their houses were likely to be, and the state of the road network, we simulated how different aid centre setups might be utilised by the population, trying to gain a sense of the relative quality of the various setups. The setups were analysed based on how much aid was successfully distributed and how energetic/healthy the population was at the end of the simulation.
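
The scoring itself is simple to state. A minimal sketch of the two metrics, using hypothetical attribute names (the model's actual internals differ):

```python
def evaluate_setup(people, aid_centres, initial_aid):
    """Score one aid-centre setup at the end of a simulation run.

    Hypothetical accessors: each person tracks remaining `energy`, and
    each centre tracks `aid_distributed`; not the paper's actual code.
    """
    total_energy = sum(p.energy for p in people)
    leftover = initial_aid - sum(c.aid_distributed for c in aid_centres)
    return total_energy, leftover
```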

By design, agent movement was influenced by the road network. In the original paper, the network was drawn from OpenStreetMap a few months after the earthquake; as a result, the model was run on a dataset which volunteers had had time to construct and clean. And they did a tremendous amount of work, as shown here:


OpenStreetMap - Project Haiti from ItoWorld on Vimeo.

Given that the amount of data available to responders changed rapidly over the first few days of the crisis, I wondered when the data could be pulled into an agent-based model and meaningfully used.


Testing the model

To that end, I decided to run the model with a particular aid centre setup on a variety of different extractions of the data, essentially running the model on the road map as it existed at different points in time. The six datasets are drawn from extractions of OpenStreetMap produced by Geofabrik, which I took from the GeoCommons repository. To give a sense of how the road network developed over time, the following image compares the state of the road mapping within Port-au-Prince on different days during the second week of mapping:

Map showing OpenStreetMap data for Port-au-Prince as of different days in January 2010. Newer roads are shown in yellow.
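
To make the comparison concrete, each snapshot can be loaded as a road graph and handed to the same simulation. Below is a minimal sketch of that step using geopandas and networkx — my choice of tooling for illustration only, not what the original model was built on, and with hypothetical filenames:

```python
import geopandas as gpd
import networkx as nx

def load_road_graph(shapefile_path):
    """Build a simple routable graph from one OSM road extract.

    Assumes the extract contains plain LineString geometries; the
    filenames below are hypothetical stand-ins for the real extracts.
    """
    roads = gpd.read_file(shapefile_path)
    graph = nx.Graph()
    for line in roads.geometry:
        coords = list(line.coords)
        for a, b in zip(coords, coords[1:]):
            # Edge weight = segment length, so route costs track distance.
            length = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
            graph.add_edge(a, b, weight=length)
    return graph

snapshots = {
    "Geofabrik (Jan 18)": "roads_2010-01-18.shp",
    "Geofabrik (Jan 19)": "roads_2010-01-19.shp",
    "Ouest (Jan 27)": "roads_2010-01-27.shp",
    "Ouest (Feb 16)": "roads_2010-02-16.shp",
}
graphs = {name: load_road_graph(path) for name, path in snapshots.items()}
```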

So how did the model fare on these different datasets?

As in the original paper, I compare the results along a number of different metrics, specifically the cumulative amount of energy left among all of the individuals at the end of the simulation and the amount of aid which goes unconsumed in each case. The following table shows the resulting statistics from 100 runs of the model on each of the extracted datasets, with 80 units of aid initially available.


Dataset               Energy                  Aid Leftover
Geofabrik (Jan 18)    1732063215 (554143)     15.2 (2.2)
Geofabrik (Jan 19)    1732124066 (689490)     14.7 (2.8)
Ouest (Jan 27)        1732148510 (552321)     20.1 (1.6)
Ouest (Jan 29)        1732659245 (594081)     20.5 (1.3)
Ouest (Feb 9)         1732432082 (634083)     20.4 (1.5)
Ouest (Feb 16)        1731946456 (676668)     20.2 (1.4)

The average value (standard deviation) of each metric for the given dataset.
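
For completeness, the experiment driver is easy to sketch: run the model repeatedly on each snapshot and summarise the metrics. Here `run_model` is a hypothetical stand-in for the actual simulation from the paper:

```python
import statistics

RUNS = 100       # runs per dataset, as in the table above
AID_UNITS = 80   # aid initially available in each run

def compare_datasets(graphs, run_model):
    """Summarise each road-network snapshot over repeated model runs."""
    for name, graph in graphs.items():
        energies, leftovers = [], []
        for _ in range(RUNS):
            energy, leftover = run_model(graph, AID_UNITS)
            energies.append(energy)
            leftovers.append(leftover)
        print(f"{name}: energy {statistics.mean(energies):.0f} "
              f"({statistics.stdev(energies):.0f}), "
              f"aid leftover {statistics.mean(leftovers):.1f} "
              f"({statistics.stdev(leftovers):.1f})")
```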

Analysing the results

Looking at the results, it seems clear that the outcomes vary with the underlying data. Individuals make choices about whether to seek out aid depending on the costs they perceive in getting to it, and the results reflect this decision-making. In terms of the amount of aid left over, there is a clear break between the scenarios using data generated by January 19 and those using data from January 27 onward; in terms of energy levels, the difference is far less pronounced. If more individuals seek out aid, there is more crowding around the centres, which can be costly in terms of energy and can limit aid distribution.

So what does this suggest for future work about crowdsourcing data?

Firstly, the fact that the same aid setup scenario produces different results depending on the dataset suggests that the datasets we use are in fact important, and that we haven't been worrying about this in vain all this time (a comfort!).

Secondly, the results shown here suggest that despite the changing quality of the road network, the results of the model are usable fairly early in the game; the aid utilisation changes, but not radically, and the overall energy levels remain fairly constant. Differences certainly exist, but they are not large enough to render the results unusable.

Future steps

Based on this work, I remain enthusiastic about the potential of agent-based models to contribute in this kind of context. I made some bold assertions in the previous paragraph, and I think they bear further exploration:
  • exactly how do the results change over time, in terms of the experiences of individuals or specific populations?
  • can we project which areas will be most influenced by the data and either forecast or direct data-gathering accordingly?
  • what steps can modellers take to build the uncertainty and assumptions of uncleaned data into these models, so that they are usable in these contexts?
  • where are the tipping points in the dataset? How do behaviours interact with the data we utilise?
Obviously these are bigger than a single blogpost, but I'll be thinking about them and posting about them in the future!

Monday, June 2, 2014

Reboot!

After a long silence, I can report that I've ~FINISHED MY THESIS~ and am reemerging into the world of humans. To that end, I'll attempt to actually blog in the near future, god help us all.

Since I last posted, there have been a few big-ticket items I wanted to mention:

  1. I successfully wrote and defended my thesis; as of May 17, I am now Dr. Wise (definitely either a superhero or a supervillain name; it remains to be seen which)

  2. CASA was generous enough to award me the best paper award at GISRUK 2014, for research I'll be posting about here in the coming weeks

  3. friend and collaborator Ivan Gayton of Medecins Sans Frontieres (MSF) presented, at MSF Scientific Day on May 30, some work we've been doing on simulating the spread of disease. I'll talk about that work a bit more as it develops further
That's about the shape of things at the moment. I'll be writing here every week or so about what I'm doing, so please do watch this space!