Smith turned to a tech-savvy EEB colleague, Sohini Ramachandran, PhD, assistant professor of biology, who with Michael Goldberg ’13 wrote a program to encode the data embedded in the text. A year later, their code had generated enormous spreadsheets organized by variables like location, pathogen, vector, and numbers of hospitalizations and deaths.
“We were floored,” Smith says. “In disease biogeography, there was nothing like that anywhere.”
Crunching the Numbers
Ramachandran, a faculty member in the Center for Computational Molecular Biology who studies the geographical distribution of human genetic variation, says that in the past, researchers had to comb through outbreak records by hand to analyze a single disease over time and space, or multiple diseases at a particular moment. Her program, which parses sentences to extract pertinent information, changes the game. “Computation is key to generate a data set of this scale,” Ramachandran says. “We got to contribute a new type of data for this field.”
Their spreadsheets include a field for additional text that retains some of the storytelling element of the source material. “The prose made the data set exciting,” Ramachandran says. “It wasn’t geared toward future aggregate analyses, but it had a lot of details,” like an outbreak of 54 cryptosporidiosis cases at a wedding in Pennsylvania that was traced to raspberries, or a tuberculosis outbreak associated with an accounting office in Japan.
Climate, including temperature and precipitation, will be another variable in the data set. Smith says that, historically, natural changes in climate have been tied to changes in infectious disease. During the Little Ice Age, in which temperatures worldwide plunged for about 500 years beginning in 1300, outbreaks flourished in societies stressed by famine. Could the current warming trend similarly pave the way for more or bigger epidemics? Smith also wondered about the overall impact of human-specific diseases, like measles, versus zoonoses, such as Ebola, which humans catch from an animal host. Which would be worse, globally, for human health? Now, with their database, the team could address some of her field’s long-standing debates.
“We had a million questions to go after,” Smith says. “But we could only tackle them if we brought in other experts from around campus.”
John Mustard ScM’86 PhD’90, professor of geological sciences, says Smith approached him about mapping the data with two tools: geographic information systems (GIS), to correlate latitude and longitude with outbreak variables like date, number, and type; and remote sensing, such as satellite imaging, which measures the sun’s radiation reflected off the Earth’s surface, to track how landscapes have changed over time.
Though Mustard notes that Ebola is a “classic example” of what happens when humans directly interact with wildlife in formerly inaccessible areas, his research focuses on landscapes highly modified by human activities like agriculture. A remote sensing expert, he has for years tracked rapid land use changes in Brazil, where enormous swaths of rainforest have given way to huge soybean fields, to provide data that ecologists and social scientists can use to understand the environmental, economic, and other implications. Now, with Smith, he’s applying the technology to human health.
The project, for which the team received a two-year grant from the University’s new Institute for the Study of Environment and Society (see sidebar), is in its pilot stages, and they are now focusing on individual countries, including Brazil and India. The latter, Mustard says, presented new challenges. “The