Aedes Forecasting Challenge 2019

Aedes aegypti and Ae. albopictus are the vectors of chikungunya, dengue, yellow fever, and Zika viruses, the most important arboviruses globally. These vectors are present across broad regions of the United States, but their spatiotemporal distribution is dynamic and not well understood. Data are limited and current models based on those data have not been evaluated on external data. Evaluating model-based, county-level forecasts on future data can help clarify model accuracy and utility, the seasonal and geographical dynamics of these species, and key directions for future research. These advances can contribute to improved preparedness for arboviral invasion in the U.S. and in other regions where Aedes suitability may be limited and changing.

Challenge

This is an open forecasting challenge to predict the monthly presence of Ae. aegypti and Ae. albopictus in a subset of U.S. counties during the 2019 calendar year. The forecasting targets are described on the Targets page. The subset of counties is listed on the Data page along with historical data for each county. Participation guidelines are described on the Participation page and evaluation criteria are described on the Evaluation page.

Timeline

  • Project announcement and data release: February 4, 2019.
  • Data release completed: March 1, 2019.
  • Registration deadline: Teams may register up to Wednesday, March 27, 2019.
  • Forecast deadlines: The first forecast should be for April 2019 and is due on March 31, 2019 at 11:59 PM Eastern Standard Time (UTC−05:00). Forecasts for May through December are due at 11:59 PM EST on the final day of the preceding month (i.e. the 30th or 31st), with the last forecast (for December 2019) due on November 30, 2019. More details are available on the Participation page.
  • Forecast evaluation: Early 2020, as soon as final surveillance data for 2019 are available.

Targets

Forecasts will be made on a monthly basis for the presence of Aedes aegypti or Ae. albopictus in a subset of US counties.

Monthly presence of Ae. aegypti or Ae. albopictus

DEFINITION

Presence of either species for a given calendar month and county will be determined by collection data for adult mosquitoes from that month and county. If any adult mosquito of the species is collected on any day of the month, the species will be considered “present” for that month. If no adults are reported and trapping effort is reported and consistent with historical data for that county, the species will be considered “absent”. Final determinations will be made when final collection data from counties are available (December 2019 or later). Note that collection efforts are highly variable across counties and “absence”, as defined here, does not necessarily mean that the species is truly absent.
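The determination rule above can be sketched as a small function. This is only an illustration: the function name, its arguments, and the `effort_consistent` flag (standing in for the organizers' judgment about whether trapping effort matches the county's history) are assumptions and are not part of the challenge materials.

```python
from typing import Optional


def classify_month(adults_detected: bool,
                   effort_reported: bool,
                   effort_consistent: bool) -> Optional[int]:
    """Return 1 (present), 0 (absent), or None (no determination)."""
    # Any adult of the species collected during the month means "present".
    if adults_detected:
        return 1
    # "Absent" requires reported trapping effort that is consistent with
    # the county's historical effort (a judgment call, passed in as a flag).
    if effort_reported and effort_consistent:
        return 0
    # Assumption: without usable effort data, no determination is made.
    return None


print(classify_month(adults_detected=False, effort_reported=True,
                     effort_consistent=True))
```

Returning `None` for the third case is a design choice of this sketch; it keeps "no determination" distinct from "absent".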

MOTIVATION

Because of their role as vectors for arboviruses, Ae. aegypti and Ae. albopictus mosquitoes are targeted by many mosquito surveillance and control programs. Forecasts of where and when these mosquitoes are likely to be found can help mosquito control agencies efficiently plan and implement these programs.

Mosquito surveillance data

The data are now complete for 95 counties in 8 states (as of March 1, 2019; previously only a subset had been provided). There are at least two years of historical data for each participating county (2017-2018), but some have many more. Surveillance methods differ among counties and may vary over time (e.g. many counties do not trap in the winter months). The compiled data include indicators of trapping effort, including trap type, number of collections, and number of trap nights by county. Forecasters should assume that trapping effort for each county will be similar in 2019 to what was reported for 2018. Forecasts for counties with substantially different trapping effort in 2019 will be analyzed separately and not included in overall scores.

The data are provided in standardized csv files below with one file for each state (including all participating counties). A detailed description of the fields included in these csv files is also included below.

Other data sources may also be used for forecasting. There are no restrictions as to what data may be used. Forecasters may use as many data sources as they wish, but all should be listed in the model description (see Participation page). Links to potentially useful environmental data sources are included below on this page.

DATA USE

Data providers are indicated for each state. These data providers and the 2019 Epidemic Prediction Initiative Aedes Forecasting Challenge should be acknowledged when using these data.

California

41 counties: Alameda, Butte, Colusa, Contra Costa, Fresno, Glenn, Imperial, Inyo, Kern, Kings, Lake, Los Angeles, Madera, Marin, Merced, Mono, Monterey, Napa, Orange, Placer, Riverside, Sacramento, San Benito, San Bernardino, San Diego, San Francisco, San Joaquin, San Luis Obispo, San Mateo, Santa Barbara, Santa Clara, Santa Cruz, Shasta, Solano, Sonoma, Stanislaus, Sutter, Tulare, Ventura, Yolo, Yuba

Data providers: Alameda County Vector Control Services District, Alameda County Mosquito Abatement District, Antelope Valley Mosquito and Vector Control District, Butte County Mosquito and Vector Control District, Colusa Mosquito Abatement District, Consolidated Mosquito Abatement District, Contra Costa Mosquito and Vector Control District, Coachella Valley Mosquito and Vector Control District, Delano Mosquito Abatement District, Delta Vector Control District, East Side Mosquito Abatement District, Fresno Mosquito and Vector Control District, Fresno Westside Mosquito Abatement District, Greater LA County Vector Control District, Imperial County Vector Control, Owens Valley Mosquito Abatement Program, Kern Mosquito and Vector Control District, Kings Mosquito Abatement District, Los Angeles West Vector and Vector-borne Disease Control District, Lake County Vector Control District, Long Beach Vector Control Program, Madera County Mosquito and Vector Control District, Marin-Sonoma Mosquito and Vector Control District, Merced County Mosquito Abatement District, City of Moorpark Vector Control, Napa County Mosquito Abatement District, North Salinas Valley Mosquito Abatement District, Northwest Mosquito and Vector Control District, Orange County Mosquito and Vector Control District, Placer Mosquito and Vector Control District, Riverside County Department of Environmental Health Vector Control Program, San Bernardino County Mosquito and Vector Control, San Diego County Dept of Environmental Health Vector Control, San Mateo County Mosquito and Vector Control District, Sacramento-Yolo Mosquito and Vector Control District, Mosquito and Vector Management District of Santa Barbara County, San Benito County Agricultural Commission, Santa Cruz County Mosquito and Vector Control District, San Francisco Public Health Environmental Health Section, San Gabriel Valley Mosquito and Vector Control District, Shasta Mosquito and Vector Control District, Oroville Mosquito Abatement District, San Joaquin County Mosquito and Vector Control District, Solano County Mosquito Abatement District, Santa Clara County Vector Control District, Sutter-Yuba Mosquito and Vector Control District, Tulare Mosquito Abatement District, Turlock Mosquito Abatement District, Ventura County Environmental Health Division, West Side Mosquito and Vector Control District, West Valley Mosquito and Vector Control District, California Vector-borne Disease Surveillance (CalSurv) Gateway University of California Davis

Download California data

Connecticut

2 counties: Fairfield, New Haven

Data provider: Connecticut Agricultural Experiment Station

Download Connecticut data

Florida

25 counties: Calhoun, Collier, Escambia, Gadsden, Hillsborough, Holmes, Jackson, Jefferson, Lee, Liberty, Madison, Manatee, Martin, Miami-Dade, Okaloosa, Osceola, Pasco, Pinellas, Polk, Santa Rosa, St. Johns, Taylor, Wakulla, Walton, Washington

Data providers: Florida State University, Collier Mosquito Control District, Escambia County Mosquito Control, Hillsborough County Public Works Department, Lee County Mosquito Control District, Manatee County Mosquito Control District, Martin County Public Works Department, Miami-Dade County Department of Solid Waste Management, Osceola County Public Works, Pasco County Mosquito Control District, Pinellas County Mosquito Control, Polk County Mosquito Control, Anastasia Mosquito Control District, South Walton County Mosquito Control District

Download Florida data

New Jersey

8 counties: Cumberland, Essex, Mercer, Monmouth, Morris, Salem, Sussex, Warren

Data providers: Cumberland County Mosquito Control Division, Essex County Mosquito Control, Mercer County Mosquito Control, Monmouth County Mosquito Control, Morris County Division of Mosquito Control, Salem County Mosquito Control, Sussex County Office of Mosquito Control, Warren County Mosquito Control Commission

Download New Jersey data

New York

8 counties: Bronx, Kings, Nassau, New York, Queens, Richmond, Rockland, Westchester

Data providers: New York City Department of Health and Mental Hygiene, Nassau County Department of Health, Rockland County Department of Health, Westchester County Department of Health, New York State Department of Health, Columbia University

Download New York data

North Carolina

5 counties: Forsyth, New Hanover, Pitt, Transylvania, Wake

Data providers: Forsyth County Department of Public Health, New Hanover County Vector Control, Pitt County Environment Health, Transylvania Public Health, North Carolina State University

Download North Carolina data

Texas

3 counties: Cameron, Hidalgo, Tarrant

Data providers: City of Brownsville Public Health, Hidalgo County Health and Human Services, Tarrant County Public Health

Download Texas data

Wisconsin

3 counties: Dane, Milwaukee, Waukesha

Data provider: University of Wisconsin-Madison

Download Wisconsin data

Data Dictionary

The dataset includes monthly adult Ae. aegypti and Ae. albopictus trapping data by trap type. This includes data on the total trapping effort (number of sites, trap nights, and collections) and the collections of both species (number of positive collections and number of adults collected). Each line represents all collections for a single county in a single month with a single trap type. Note that some counties have multiple lines for a single month because they employed more than one trap type.

DESCRIPTION OF VARIABLES

state

State of collection

Examples: Florida, California

statefp

Two-digit FIPS code for state of collection (https://www.census.gov/geo/reference/codes/cou.html)

Examples: 12, 06

county

County of collection, without the word "County"

Examples: Broward, Los Angeles

countyfp

Three-digit FIPS code for county of collection (https://www.census.gov/geo/reference/codes/cou.html)

Examples: 011, 037

year

Four-digit year of collection

Examples: 2016, 2017

month

Numeric month of collection (1-12)

Examples: 1, 10

trap_type

Type of trap used for collection, includes:

  • BGS: BG sentinel traps including those baited with CO2
  • GAT: Traps that target gravid Aedes females with enclosed water containers including BGGAT and AGO traps.
  • GRAV: Traps that target gravid females with open water containers including CDC gravid traps. These traps target Culex species but may also detect Aedes species.
  • CO2: Traps baited with CO2 including CDC and ABC light traps and Fay-Prince traps (BG sentinel traps using CO2 are classified as BGS).

For collections where only presence or absence was recorded, “-PRES” is included at the end of the trap type name.

num_sites

Number of distinct trap sites

num_collection_events

Number of distinct trap collections. This may be less than the number of trap nights when a trap was set for multiple nights.

num_trap_nights

Cumulative number of nights distinct traps were run at all sites

num_aegypti_collected

Number of collected mosquitoes identified as Aedes aegypti. This value may be unavailable and marked as NA when collections were classified as present or absent rather than counted. If only presence or absence was recorded for a trap, the presence or absence is indicated in num_collections_aegypti.

num_albopictus_collected

Number of collected mosquitoes identified as Aedes albopictus. This value may be unavailable and marked as NA when collections were classified as present or absent rather than counted. If only presence or absence was recorded for a trap, the presence or absence is indicated in num_collections_albopictus.

num_collections_aegypti

Number of collections where Aedes aegypti were detected.

num_collections_albopictus

Number of collections where Aedes albopictus were detected.
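As an illustration of how these fields fit together, the sketch below sums positive collections per county and month across trap types. The sample rows are made up for the example and are not real surveillance data. The column order, and the assumption that the num_collections_* fields are populated even for presence-only ("-PRES") rows, follow the field descriptions above but have not been checked against the actual files.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; not real surveillance data.
SAMPLE = """state,statefp,county,countyfp,year,month,trap_type,num_sites,num_collection_events,num_trap_nights,num_aegypti_collected,num_albopictus_collected,num_collections_aegypti,num_collections_albopictus
Florida,12,Lee,071,2018,6,BGS,4,8,8,12,0,3,0
Florida,12,Lee,071,2018,6,CO2-PRES,2,2,2,NA,NA,1,0
Florida,12,Lee,071,2018,7,BGS,4,8,8,0,0,0,0
"""


def to_int(value):
    """Parse an integer field, treating NA (or empty) as 0 for summing."""
    return 0 if value in ("NA", "") else int(value)


def monthly_detections(csv_text):
    """Sum positive collections per (state, county, year, month) across trap types."""
    totals = defaultdict(lambda: {"aegypti": 0, "albopictus": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["state"], row["county"], int(row["year"]), int(row["month"]))
        for species in ("aegypti", "albopictus"):
            # num_collections_* is used because mosquito counts may be NA
            # for presence-only trap types.
            totals[key][species] += to_int(row["num_collections_" + species])
    return dict(totals)


detections = monthly_detections(SAMPLE)
for key, counts in sorted(detections.items()):
    print(key, {species: n > 0 for species, n in counts.items()})
```

A county and month with several trap types then collapses to a single detected or not-detected flag per species, which is the quantity the forecasts target.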

Environmental data

This list includes potential sources of useful environmental data. There is no obligation to use any particular data source or to limit data sources to these.

Participation guidance

How to participate

To participate in the challenge, one team member must register on this website and, after logging in, register specifically for the Aedes Forecasting Challenge (instructions for registration). The forecast submissions will be made using this account. Only one forecast can be submitted for each deadline per account and the same account must be used for all submissions. If teams wish to submit different forecasts (e.g. from different models), they need to use multiple accounts.

Full participation requires:

  • Electronic submission of forecasts for all included counties and months by the respective deadlines (details below and on the Evaluation page).
  • Submission of a model description document by email (details below).

Download registration instructions

Electronic submission

Forecast format

Forecasts should be made for each month in csv files matching the format in this submission template. Each csv should contain forecasts for all counties and both species, but only for a single month. For internal record keeping, teams may find it useful to include the month in the file name.

Download forecast submission template

The forecast file includes one line for each forecast. That line includes:

  • location: “State” and “County” as written in the data files with a hyphen: “State-County”. For example, “California-San Diego” or “Connecticut-Fairfield”. Omit the word “County” and include spaces between words within the county or state name. The template provided above should match the state and county fields in the provided surveillance data.

  • target: “Ae. aegypti” or “Ae. albopictus”

  • type: “binary” for all. Indicates that it is a yes or no forecast.

  • unit: “present” for all. Indicates the forecasts are for the observation of the species. For example, a forecast of 0.8 indicates an 80% chance of being trapped and reported (and a 20% chance of not being trapped and reported).

  • value: A probability for the observation of a specific species in that particular month and county. The value indicates the probability on a scale from 0 to 1, where 0 is certain absence, 0.5 is an equal chance of absence or presence (50/50), and 1.0 is certain presence. In the template, the values are 0.5 for all species and counties. These should be changed to reflect the forecast of each team. Note: when a forecast has high confidence but not absolute certainty, it may be beneficial to assign a very small probability to the unexpected outcome to prevent a very low score should that occur, e.g. 0.999 instead of 1.0 for presence or 0.001 instead of 0 for absence. See the Evaluation page for details on scoring.
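The fields listed above can be assembled with a few lines of code. The sketch below is a minimal example under stated assumptions: the header names and column order are taken from the field list and may differ from the actual template, the helper names are hypothetical, and the clipping bounds reuse the 0.001 and 0.999 values mentioned in the note on the value field.

```python
import csv
import io

LOW, HIGH = 0.001, 0.999  # bounds taken from the note on the value field
FIELDS = ["location", "target", "type", "unit", "value"]  # assumed header


def clip(p, low=LOW, high=HIGH):
    """Keep a probability away from exactly 0 or 1."""
    return min(max(p, low), high)


def forecast_rows(probabilities):
    """Build one row per forecast from {(state, county, species): probability}."""
    rows = []
    for (state, county, species), p in sorted(probabilities.items()):
        rows.append({
            "location": f"{state}-{county}",  # "State-County", no "County" suffix
            "target": species,
            "type": "binary",
            "unit": "present",
            "value": clip(p),
        })
    return rows


def to_csv(rows):
    """Serialize rows to csv text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


example = {
    ("California", "San Diego", "Ae. aegypti"): 1.0,
    ("Connecticut", "Fairfield", "Ae. albopictus"): 0.35,
}
rows = forecast_rows(example)
print(to_csv(rows))
```

Starting from the downloaded template and changing only the value column is likely the safer route in practice, since it guarantees the location and target strings match what the server expects.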

Submission process

Registered participants will have access to a Submit page. The individual csv files can be uploaded on that page any time before the specific deadlines. Take care to match the forecast time frame, as there is no built-in check for this. The due dates correspond to 11:59 PM EST the day before the forecasted month begins. For example, forecasts for April are due by 11:59 PM EST on March 31 and should be submitted in the row showing that due date. If you do not see the due date, check the “previous submissions” and “future submissions”. Likewise, forecasts for May have the due date April 30. Forecasts may be submitted and updated at any time prior to the due date. Successful submission can be checked by clicking on the “Open JSON” link (JSON is the format used by the server). Note that if you are in a different time zone, the date displayed for each deadline may be one day off, but the deadline is still 11:59 PM on the last day of the month in the Eastern Standard Time Zone.
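The due-date rule (the last day of the month before the forecasted month) can be computed rather than looked up. The following sketch uses only the Python standard library; the function name is hypothetical, and it returns calendar dates only, leaving the 11:59 PM EST cutoff to the reader.

```python
import calendar
from datetime import date


def due_date(forecast_year, forecast_month):
    """Last calendar day of the month before the forecasted month."""
    if forecast_month == 1:
        year, month = forecast_year - 1, 12
    else:
        year, month = forecast_year, forecast_month - 1
    # monthrange returns (weekday of first day, number of days in month).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


for month in range(4, 13):  # April through December 2019
    print(calendar.month_name[month], "forecast due",
          due_date(2019, month).isoformat())
```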

Model description

Each team should select their best model for forecasts and submit a brief model description (details below) by email to aedeschallenge@cdc.gov prior to the first forecast deadline. Teams may update their model during the challenge provided they submit an updated model description. The description should include the following components:

  1. Date
  2. Team name: This should match the registration name and may be used in forecast visualizations on the website (predict.cdc.gov).
  3. Team members: List every person involved with the forecasting effort and their institution. Identify a team leader and include the email address of the team leader.
  4. Agreement: Include the following statement: “By submitting these forecasts, I (we) indicate my (our) full and unconditional agreement to abide by the project's rules and data use agreements.” See the participation agreement below.
  5. Model description (no more than 400 words): Is the model mechanistic or statistical? Is it an instance of a known class of models? The description should include sufficient detail for another modeler to understand the approach being applied. It may include equations, but that is not necessary. If multiple models are used, describe each model and how they were combined.
  6. Data sources: What data were used in the model? Historical case data? Weather data? Other data?
  7. Computational resources: What programming languages/software tools were used to write and execute the forecasts?
  8. Publications: Does the model derive directly from previously published work? If so please include references.

Participation agreement

All participants provide consent for their forecasts to be published in real-time on the CDC’s Epidemic Prediction Initiative website (https://predict.cdc.gov/), GitHub page (https://github.com/cdcepi), and, after the season ends, in a scientific journal describing the results of the challenge. The forecasts can be attributed to a team name (e.g., John Doe University) or anonymous (e.g., Team A) based on individual team preference. Team names should be limited to 25 characters for display online. The team name registered with the EPI website will be displayed alongside a team’s forecasts. Any team may publish results from their forecast at any time, but no participating team may publish the results of another team’s model in any form without the team’s consent. Any mosquito surveillance data used should acknowledge the sources of those data as stated on the Data page. The manuscript describing the accuracy of forecasts across teams will be coordinated by a representative from CDC.

Evaluation

All forecasts will be scored by comparison to data reported by participating counties in 2019. Those data will be collected from participating counties when the data are complete for the year (late 2019 or early 2020). Forecasts will be ranked by average logarithmic score (see below) across all counties and months for each species separately. If reported trapping effort in 2019 is substantially different from 2018 for some counties/months, those counties/months may be removed from the analysis. Forecasts for counties without trapping data for a given month (e.g. November and December) will not be scored.

The top ranked team will be announced publicly by CDC in early 2020.

Eligibility

To be eligible for an overall ranking, teams must:

  • Submit forecasts for every county listed on the Data page and for every month of the challenge: April, May, June, July, August, September, October, November, and December.
  • Submit forecasts electronically prior to the respective deadline (the day before each new month starts).
  • Submit a model description (see Participation page).

Forecasts from teams that do not submit all required forecasts may still be evaluated, but they will not be ranked for overall performance. Teams may consider submitting some naive forecasts (e.g. 0.5) if they have difficulty producing all forecasts by the deadlines.
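A team could check its own submissions against these requirements before each deadline. The sketch below is one possible way to do so; the function name and the `submitted` data structure are hypothetical, and the two locations stand in for the full county list on the Data page.

```python
SPECIES = ("Ae. aegypti", "Ae. albopictus")
MONTHS = range(4, 13)  # April through December


def missing_forecasts(submitted, locations):
    """Return (month, location, target) combinations with no forecast.

    `submitted` maps a month number to the set of (location, target)
    pairs contained in that month's file.
    """
    missing = []
    for month in MONTHS:
        have = submitted.get(month, set())
        for location in locations:
            for target in SPECIES:
                if (location, target) not in have:
                    missing.append((month, location, target))
    return missing


# Two locations stand in for the full list of participating counties.
locations = ["Texas-Cameron", "Texas-Hidalgo"]
complete_month = {(loc, sp) for loc in locations for sp in SPECIES}
submitted = {m: set(complete_month) for m in MONTHS}
submitted[7].discard(("Texas-Hidalgo", "Ae. albopictus"))

print(missing_forecasts(submitted, locations))
# [(7, 'Texas-Hidalgo', 'Ae. albopictus')]
```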

Results

Preliminary results will be distributed to all teams in early 2020. A joint manuscript will be prepared by the project organizers to disseminate all forecasts, findings of this analysis, and the general performance of submitted forecasts. Participants may publish their own forecasts and results at any time.

Logarithmic Score

All forecasts are probabilistic: each is a probability, ;;p;;, that the mosquito species is reported in a specific month in a specific county. Reported data from 2019 will be used to classify each species, county, and month combination, ;;x;;, as present (1) or absent (0) following the definition on the Targets page. The logarithmic score is calculated as:

$$S(p,x) = x\text{ln}(p) + (1-x)\text{ln}(1 - p)$$

Logarithmic scores (;;S;;) can be averaged across many different predictions. In this case they will be averaged for all included counties and months for each species separately.

Example: A forecast predicts there is a probability of 0.2 (i.e. a 20% chance) that Ae. aegypti is reported in County X in June 2019. Collection of an adult Ae. aegypti in June is reported later in 2019. The logarithmic score is therefore ;;\text{ln}(0.2) = -1.6;;. Alternatively, if no Ae. aegypti were reported, the logarithmic score would be higher, ;;\text{ln}(1 - 0.2) = \text{ln}(0.8) = -0.22;;.
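The example can be reproduced in a few lines. In this sketch the score is ln(p) when the species is reported and ln(1 - p) when it is not, matching the worked numbers above; the function names are hypothetical, and returning negative infinity for a certain forecast that turns out wrong is a choice made here to avoid a math domain error.

```python
import math


def log_score(p, x):
    """Logarithmic score for probability p and outcome x (1 present, 0 absent)."""
    prob_of_outcome = p if x == 1 else 1.0 - p
    if prob_of_outcome == 0.0:
        return -math.inf  # certain forecast that turned out wrong
    return math.log(prob_of_outcome)


def mean_log_score(forecasts):
    """Average score over an iterable of (p, x) pairs."""
    scores = [log_score(p, x) for p, x in forecasts]
    return sum(scores) / len(scores)


print(round(log_score(0.2, 1), 2))  # -1.61
print(round(log_score(0.2, 0), 2))  # -0.22
```

Because a single score of negative infinity makes the average negative infinity as well, keeping probabilities slightly away from 0 and 1 protects the mean score.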

Notes

  • A 50/50 chance or a probability of 0.5 gives a logarithmic score of -0.69 regardless of whether the species was observed or not.
  • A forecast probability of 0 will give a logarithmic score of negative infinity if the species is reported. The same is true for a probability of 1 when the species is not reported. Even a small probability for unlikely events can substantially improve average scores. For example, a forecast probability of 0.01 will give a logarithmic score of -4.6 if the species is observed.
