FluSight 2015–2016

FluSight: Seasonal Influenza Forecasting

Influenza (flu) is a respiratory virus that can result in illness ranging from mild to severe. Each year, millions of people get sick with influenza, hundreds of thousands are hospitalized, and thousands die from flu. Tracking flu activity to inform prevention measures is an important public health function that is currently performed by CDC’s flu surveillance system, which can lag behind real-time flu activity. But what if it were possible to predict flu activity accurately weeks or months in advance for multiple locations? While this is not currently possible, the goal of flu forecasting is to provide a more timely, forward-looking tool that health officials can use to target medical interventions, inform earlier public health actions, and allocate resources for communications, disease prevention, and control. The potential benefits of flu forecasting are significant.

Since 2013, the Influenza Division at the Centers for Disease Control and Prevention has worked with external researchers to improve the science and usability of influenza forecasts by coordinating seasonal influenza prediction challenges. This work includes defining prediction targets, facilitating data access, establishing evaluation metrics to assess accuracy, and developing forecast visualizations.

Eight research teams have developed flu forecasting models and currently provide flu activity forecasts to CDC. This beta website houses the weekly influenza activity forecasts submitted by these teams. It’s important to note that these are not CDC forecasts and that the forecasts on this website are not endorsed by CDC. These forecasts are based on different models, can vary significantly, and may be inaccurate.

The baseline is an indicator of the influenza season. It is defined as two standard deviations above the average ILINet percentage in non-influenza weeks over the previous three years (see www.cdc.gov/flu/weekly/overview.htm).
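As a rough sketch of the baseline calculation, the Python function below takes the weighted ILINet percentages from the non-influenza weeks of the previous three years, pooled into one list, and returns their mean plus two standard deviations. Which weeks count as non-influenza weeks is determined by CDC and is simply taken as input here; the use of the sample standard deviation (`statistics.stdev`) is an assumption, and `ilinet_baseline` is a hypothetical name.

```python
import statistics


def ilinet_baseline(non_flu_week_pcts):
    """Mean plus two standard deviations of pooled non-influenza-week ILINet percentages.

    Hypothetical helper: the selection of non-influenza weeks is assumed to be
    done beforehand, and the sample standard deviation is an assumption.
    """
    mean = statistics.mean(non_flu_week_pcts)
    sd = statistics.stdev(non_flu_week_pcts)  # needs at least two values
    return mean + 2 * sd
```

In practice, the published 2015-16 baseline values for the US and each HHS region should be used rather than recomputed values.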

Forecast Targets

For each week during the influenza season, participants provide national and regional probabilistic forecasts for the entire influenza season (seasonal targets) and for the next four weeks (short-term targets, often referred to as Nowcasts or Nearcasts). The seasonal targets are the start week, the peak week, and the peak intensity of the 2015-2016 influenza season. The short-term targets are the percentage of outpatient visits for ILI one, two, three, and four weeks ahead of the date of the forecast.

Start week

DEFINITION

The start of the season is the MMWR surveillance week when the percentage of visits reported through ILINet reaches or exceeds the baseline value for three consecutive weeks. MMWR week definitions are available at http://wwwn.cdc.gov/nndss/script/downloads.aspx, and updated 2015-16 ILINet baseline values for the US and each HHS region are available at http://www.cdc.gov/flu/weekly/overview.htm. Forecasted start week values should be for the first week of the three-week period.
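The start-week rule amounts to a scan for the first run of three consecutive weeks at or above the baseline. This is a minimal sketch, assuming the season's values are supplied in chronological order as parallel lists of MMWR week numbers and weighted ILINet percentages; `season_start_week` is a hypothetical helper, not part of any CDC tool.

```python
def season_start_week(weeks, pcts, baseline, run_length=3):
    """Return the first MMWR week of the first run of `run_length` consecutive
    weeks with ILINet percentage >= baseline, or None if no such run exists.

    Hypothetical helper; `weeks` and `pcts` are parallel lists in season order.
    """
    run = 0
    for idx, pct in enumerate(pcts):
        run = run + 1 if pct >= baseline else 0  # "reaches or exceeds"
        if run == run_length:
            return weeks[idx - run_length + 1]  # first week of the run
    return None
```

Because the weeks are passed in season order, a season that crosses the new year (..., 52, 1, 2, ...) needs no special handling.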

MOTIVATION

Accurate and timely forecasts for the start of the season can be useful in planning for influenza prevention and control activities. For the general public, the start of the season offers an important opportunity to take preventive measures, such as getting vaccinated, before flu becomes widespread. For clinicians and public health authorities, the start of the season indicates that influenza should be high on their list of possible diagnoses for patients with respiratory illness. This is particularly important for the management of hospitalized patients and high-risk patients with suspected influenza when early treatment with influenza antivirals can be critical.

Peak week

DEFINITION

The peak week is the MMWR surveillance week with the highest weighted ILINet percentage in the 2015-16 influenza season. MMWR week definitions are available at http://wwwn.cdc.gov/nndss/script/downloads.aspx.
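A sketch of the peak-week definition, assuming the same parallel lists of MMWR weeks and weighted ILINet percentages. It returns every week that ties for the maximum, because the scoring rules allow for multiple peak weeks; the maximum value itself is the peak intensity. `season_peak` is a hypothetical name.

```python
def season_peak(weeks, pcts):
    """Return (peak intensity, list of peak weeks) for one season.

    Hypothetical helper: ties are exact matches on the reported values.
    """
    peak_intensity = max(pcts)  # highest weighted ILINet percentage
    peak_weeks = [week for week, pct in zip(weeks, pcts) if pct == peak_intensity]
    return peak_intensity, peak_weeks
```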

MOTIVATION

Accurate and timely forecasts for the peak week can be useful for planning and promoting activities to increase influenza vaccination prior to the bulk of influenza illness. For healthcare, pharmacy, and public health authorities, a forecast for the peak week can guide efficient staff and resource allocation.

Peak intensity

DEFINITION

The peak intensity will be defined as the highest numeric value that the weighted ILINet percentage reaches during the 2015-16 influenza season.

MOTIVATION

Accurate and timely forecasts for the peak week and intensity of the influenza season can be useful for influenza prevention and control, including the planning and promotion of activities to increase influenza vaccination prior to the bulk of influenza illness. For healthcare, pharmacy, and public health authorities, a forecast for the peak week and intensity can help with appropriate staff and resource allocation since a surge of patients with influenza illness can be expected to seek care and receive treatment in the weeks surrounding the peak.

Short-term

DEFINITION

Short-term forecasts target the percentage of ILINet outpatient visits for influenza-like illness one, two, three, and four weeks ahead of the date of the forecast.
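To make the four horizons concrete, the hypothetical helper below lists the MMWR week numbers targeted one to four weeks ahead, wrapping from the last week of the year back to week 1. It assumes the forecast is labeled with the MMWR week it is made from, and that the year has 52 MMWR weeks unless told otherwise (some MMWR years have 53).

```python
def short_term_target_weeks(forecast_week, weeks_in_year=52, horizons=(1, 2, 3, 4)):
    """MMWR week numbers for each horizon, wrapping past the last week to week 1.

    Hypothetical helper; `weeks_in_year` must be set to 53 for 53-week MMWR years.
    """
    return [(forecast_week - 1 + h) % weeks_in_year + 1 for h in horizons]
```

For example, a forecast labeled week 50 in a 52-week year targets weeks 51, 52, 1, and 2.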

MOTIVATION

Forecasts capable of providing reliable estimates of influenza activity over the next month are critical because they allow healthcare and public health officials to prepare for and respond to near-term changes in influenza activity and bridge the gap between reported incidence data and long-term seasonal forecasts.

Data for forecasting and evaluation

On this page, you can find influenza data for the United States that can be used to develop and evaluate predictive models. This includes current data that can be downloaded from CDC each week and historical data including data that was available during each epidemiological week from 2009/10 to 2014/15.

Current ILINet Data

Current data on the weekly proportion of people seeing their health care provider for influenza-like illness (ILI) are reported through the ILINet system for the United States and for each HHS health region. These data can be accessed directly from CDC. Alternatively, the R package cdcfluview (available from CRAN) can be used to access the data.

For example, the following R code retrieves national and HHS regional ILINet data for 1997 through 2015:

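library(cdcfluview)
# National ILINet data, 1997-2015
usflu <- get_flu_data(region = "national", data_source = "ilinet", years = 1997:2015)
# ILINet data for HHS Regions 1-10, 1997-2015
regionflu <- get_flu_data(region = "hhs", sub_region = 1:10, data_source = "ilinet", years = 1997:2015)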

Historical ILINet Data (2009/10 to 2014/15)

This dataset provides the national and regional ILINet data that were available at any given week in previous influenza seasons. For example, data in the 2013/14 file with Release_Week = 1342 include all data for the 2013/14 season that were available as of week 42 of that season.
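The Release_Week example suggests a YYWW code (1342 for 2013, week 42). Assuming that layout, the sketch below decodes the code and keeps only the most recent release available as of a given week, so a model can be fit to the data as they looked at the time. The row layout (dicts with a `Release_Week` key) and the function names are hypothetical; adapt them to the columns in the downloaded files.

```python
def parse_release_week(code):
    """Split a Release_Week code such as 1342 into (2013, 42).

    Assumes a YYWW layout (inferred from the example) and years 2000-2099.
    """
    yy, ww = divmod(int(code), 100)
    return 2000 + yy, ww


def latest_release(rows, as_of):
    """Rows from the most recent release at or before `as_of` (a YYWW code).

    YYWW integers sort chronologically within one century, so plain integer
    comparison is enough. `rows` is a hypothetical list of dicts.
    """
    eligible = [row["Release_Week"] for row in rows if row["Release_Week"] <= as_of]
    if not eligible:
        return []
    newest = max(eligible)
    return [row for row in rows if row["Release_Week"] == newest]
```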

Data files are available for download for the 2009/10, 2010/11, 2011/12, 2012/13, 2013/14, and 2014/15 seasons.

Forecast Evaluation

All forecasts will be evaluated against the weighted observations pulled from the ILINet system in week 28, and the logarithmic score will be used to measure the accuracy of each forecast's probability distribution. Logarithmic scores will be averaged across time periods, across the seasonal targets and the four week-ahead targets, and across locations, providing both target-specific and overall measures of model accuracy. Unlike last year, forecast accuracy will be measured by log score only. Nonetheless, forecasters are asked to continue submitting point predictions, which should aim to minimize the absolute error (AE).
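As an illustration of the averaging step, the sketch below groups log scores by target and by location and also reports the overall mean. The input layout, a list of (location, target, score) tuples, and the function name are assumptions; how time periods are grouped is not reproduced here, and the scores in the example are made up.

```python
from collections import defaultdict
from statistics import mean


def average_log_scores(records):
    """Average (location, target, score) records by target, by location, and overall.

    Hypothetical helper; `records` must be a non-empty list.
    """
    by_target = defaultdict(list)
    by_location = defaultdict(list)
    for location, target, score in records:
        by_target[target].append(score)
        by_location[location].append(score)
    overall = mean(score for _, _, score in records)
    return (
        {target: mean(scores) for target, scores in by_target.items()},
        {location: mean(scores) for location, scores in by_location.items()},
        overall,
    )


# Made-up scores, for illustration only
records = [
    ("US National", "Peak week", -0.5),
    ("US National", "1 wk ahead", -1.5),
    ("HHS Region 1", "Peak week", -1.0),
]
per_target, per_location, overall = average_log_scores(records)
```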

Logarithmic Score

If $p$ is the set of binned probabilities for a given forecast and $p_i$ is the probability assigned to the bin $i$ containing the observed outcome, the logarithmic score is:

$$S(p, i) = \ln(p_i)$$

The probability assigned to the correct bin (based on the weighted ILINet value) plus the probabilities assigned to the immediately preceding and following bins will be summed to determine the probability assigned to the observed outcome:

$$S(p, i) = \ln(p_{i-1} + p_i + p_{i+1})$$

If the correct bin is the first or last bin, the probabilities will be summed over the first three or last three bins, respectively. In the case of multiple peak weeks, the probabilities assigned to the bins containing the peak weeks and to their preceding and following bins will be summed. Undefined natural logs (which occur when the probability assigned to the observed outcome is 0) will be assigned a score of -10. Forecasts that are not submitted (e.g., if a week is missed) or that are incomplete (e.g., probabilities summing to more than 1.1) will also be assigned a score of -10.

Example: A forecast assigns a probability of 0.2 (i.e., a 20% chance) that the flu season starts in week 44, 0.3 that it starts in week 45, and 0.1 that it starts in week 46, with the remaining 0.4 (40%) distributed across other weeks according to the forecast. Once the flu season has started, the prediction can be evaluated, and the ILINet data show that the season started in week 45. The probabilities for weeks 44, 45, and 46 are summed, and the forecast receives a score of ln(0.6) = -0.51. Had the season started in a different week, the score would be based on the probability assigned to that week plus the probabilities assigned to the preceding and following weeks.
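The scoring rule, including the edge-bin and multiple-peak-week cases and the -10 substitution, can be sketched in Python as follows. Bins are identified by list index. The function names, the use of `None` for a missing forecast, the choice to take the union of neighboring bins when several peak weeks are observed (so no bin is counted twice), and the even spread of the remaining 0.4 in the example are assumptions of this sketch. Following the rules above, only a summed probability of exactly 0 is replaced by -10.

```python
import math

FLOOR_SCORE = -10.0  # assigned for undefined logs and missing/invalid forecasts


def scoring_window(i, n):
    """Bin indices summed for observed bin i out of n bins."""
    if i == 0:
        return range(0, 3)  # first bin: first three bins
    if i == n - 1:
        return range(n - 3, n)  # last bin: last three bins
    return range(i - 1, i + 2)  # preceding, observed, and following bins


def flusight_log_score(probs, observed):
    """Log score of binned probabilities `probs` for observed bin index(es)."""
    if probs is None or sum(probs) > 1.1:  # missing or invalid forecast
        return FLOOR_SCORE
    if len(probs) < 3:
        raise ValueError("need at least three bins")
    if isinstance(observed, int):
        observed = [observed]  # a single observed bin
    counted = set()
    for i in observed:  # union of windows, so multiple peak weeks never double-count
        counted.update(scoring_window(i, len(probs)))
    total = sum(probs[j] for j in counted)
    return math.log(total) if total > 0 else FLOOR_SCORE


# Worked example from the text: bins for weeks 40-50, remaining 0.4 spread evenly (assumed)
weeks = list(range(40, 51))
probs = [0.05] * len(weeks)
for week, p in {44: 0.2, 45: 0.3, 46: 0.1}.items():
    probs[weeks.index(week)] = p
print(round(flusight_log_score(probs, weeks.index(45)), 2))  # -0.51
```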


Absolute Error

Absolute error (AE) is the absolute difference between the forecast x and the observation y:

$$AE(x, y) = |x - y|$$

For example, a forecast predicts that the flu season will start in week 45, and the season actually begins in week 46. The absolute error of the prediction is |45-46| = 1 week.
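One way to derive a point prediction from a binned forecast is to report its median, because the median of a distribution minimizes the expected absolute error. The sketch below is an assumption, not a CDC requirement, and its function names are hypothetical. It takes the median of a start-week forecast with 0.2, 0.3, and 0.1 on weeks 44, 45, and 46 (the remaining 0.4 spread evenly over weeks 40-43 and 47-50, also an assumption) and scores it against an observed start in week 46.

```python
def median_of_bins(values, probs):
    """Smallest bin value whose cumulative probability reaches half the total mass."""
    half = 0.5 * sum(probs)  # normalize in case probabilities do not sum exactly to 1
    cumulative = 0.0
    for value, p in zip(values, probs):
        cumulative += p
        if cumulative >= half:
            return value
    return values[-1]


def absolute_error(forecast, observed):
    """AE(x, y) = |x - y|."""
    return abs(forecast - observed)


weeks = list(range(40, 51))
probs = [0.05] * len(weeks)  # assumed even spread of the remaining probability
for week, p in {44: 0.2, 45: 0.3, 46: 0.1}.items():
    probs[weeks.index(week)] = p

point = median_of_bins(weeks, probs)
print(point, absolute_error(point, 46))  # 45 1
```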