Published
Edited
Sep 7, 2022
# Forecasting stream temperature
### Predicting temperature exceedance
How likely is it that stream temperature will exceed 75°F?

Ideas:
- Show how forecasts change with different numbers of ensemble models, and explain the sources of uncertainty.
- Show how forecasts change as the issue time approaches the actual date. How accurate are we 7 vs. 3 vs. 1 day out?

The forecast data is limited to a single site (1450), with issue times from 6/27 to 7/07. My thought is to show predictions for 3 days at each issue time to see how they improve through time.
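One way to answer the exceedance question is to count how many ensemble members predict a value above the threshold. The sketch below is only an illustration under assumptions: `exceedanceProbability` is a hypothetical helper, the example rows are made up, and the `ensemble` field name is a guess (`max_temp` is the column used elsewhere in this notebook, assumed here to be in °F).

```javascript
// Hypothetical helper: fraction of ensemble members whose forecast
// max temperature exceeds a threshold, for one issue time and one target date.
function exceedanceProbability(members, thresholdF = 75) {
  if (members.length === 0) return NaN; // no members, no estimate
  const nOver = members.filter(d => d.max_temp > thresholdF).length;
  return nOver / members.length;
}

// Made-up ensemble members for a single issue time and target date
// (the `ensemble` field name is an assumption).
const exampleMembers = [
  { ensemble: 1, max_temp: 73.8 },
  { ensemble: 2, max_temp: 75.6 },
  { ensemble: 3, max_temp: 76.1 },
  { ensemble: 4, max_temp: 74.9 }
];

// Two of the four members are above 75, so the estimate is 0.5.
const pExceed = exceedanceProbability(exampleMembers);
```

Treating each member as equally likely is the simplest choice; it says nothing about how well calibrated the ensemble is.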
### Overview of data and possible dimensions
* Many ensembles with different predictions, plus a final mean prediction
* For a given date, predictions are made from 8 days out through 0 days out
* Accuracy of predictions across lead times
* Accuracy of predictions within a single lead time, regardless of date
  * E.g., day 1 prediction accuracy
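Accuracy within a single lead time could be summarized as a root-mean-square error per lead time. This is a minimal sketch, not the notebook's method: `rmseByLeadTime` and the `lead_time`, `predicted`, and `observed` field names are hypothetical, and the rows are made-up values.

```javascript
// Hypothetical helper: RMSE of predictions grouped by lead time (in days).
function rmseByLeadTime(rows) {
  const sums = new Map(); // lead_time -> { sse, n }
  for (const { lead_time, predicted, observed } of rows) {
    const acc = sums.get(lead_time) || { sse: 0, n: 0 };
    acc.sse += (predicted - observed) ** 2;
    acc.n += 1;
    sums.set(lead_time, acc);
  }
  // Convert accumulated squared errors into RMSE per lead time.
  return new Map(
    [...sums].map(([lead, { sse, n }]) => [lead, Math.sqrt(sse / n)])
  );
}

// Made-up rows: two dates at a 1-day lead and two at a 7-day lead.
const exampleRows = [
  { lead_time: 1, predicted: 75, observed: 74 },
  { lead_time: 1, predicted: 73, observed: 74 },
  { lead_time: 7, predicted: 78, observed: 74 },
  { lead_time: 7, predicted: 70, observed: 74 }
];

const rmse = rmseByLeadTime(exampleRows);
```

In this made-up example the 7-day RMSE is larger than the 1-day RMSE, which is the pattern the comparison across lead times is meant to expose.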
### First chart
* Time series + 90% CI: adjust the number of ensembles and which lead times are shown, and see how the forecast differs
* Opacity, so that overlap is visible
* Live calculation of the mean of all shown ensembles
* Actual observations

Data: forecast_1450, which holds 10 issue dates of forecasts (for all lead times), with all ensembles

**NOTE: forecast_1450 doesn't include the CIs**
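Since the CIs are not in the data, one option is to derive a band from the ensemble members that are currently shown. The sketch below assumes the band is the empirical 5th to 95th percentile of the member values, which may differ from how the forecast's own 90% CI is defined; `quantileSorted` and `summarizeEnsembles` are hypothetical helpers and the input values are made up.

```javascript
// Linear-interpolation quantile of an ascending-sorted numeric array, p in [0, 1].
function quantileSorted(sorted, p) {
  if (sorted.length === 0) return NaN;
  const i = (sorted.length - 1) * p;
  const lo = Math.floor(i);
  const hi = Math.ceil(i);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (i - lo);
}

// Hypothetical helper: mean and empirical 5%-95% band of the shown members.
function summarizeEnsembles(values) {
  const sorted = [...values].sort((a, b) => a - b); // numeric sort on a copy
  const mean = sorted.reduce((s, v) => s + v, 0) / sorted.length;
  return {
    mean,
    lower: quantileSorted(sorted, 0.05),
    upper: quantileSorted(sorted, 0.95),
    n: sorted.length
  };
}

// Made-up max_temp values from five ensemble members for one date.
const exampleSummary = summarizeEnsembles([70, 72, 74, 76, 78]);
```

Recomputing this summary whenever the set of shown ensembles changes would give the live mean described above, and the band narrows or widens with the members included.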
function get_issue_dates(in_data) {
  // Build a dictionary keyed by formatted issue date (YYYY-MM-DD), with an
  // empty array per date to be filled with ensemble data later
  const formatTime = d3.timeFormat("%Y-%m-%d");
  const date_dict = {};
  for (let l = 0; l < in_data.length; l++) {
    date_dict[formatTime(in_data[l].issue_time)] = [];
  }
  return date_dict;
}
function get_line_chart_data(chart_1_data, ensembles) {
  // Get array of ensembles
  const ensembleArray = makeEnsembleArray(ensembles);
  // Use first ensemble to define issue dates (since same for all ensembles)
  const ensembleData_1 = filterDataToEnsemble(chart_1_data, 1);
  const issue_dates_dict = get_issue_dates(ensembleData_1);
  // Iterate over issue dates to populate the dictionary with
  // data for each ensemble on each issue date
  for (const issue_date in issue_dates_dict) {
    for (let m = 0; m < ensembleArray.length; m++) {
      // The helpers are called with ensemble number m+1; the result is stored at index m
      const ensembleData = filterDataToEnsemble(chart_1_data, m + 1);
      issue_dates_dict[issue_date][m] = filterEnsembleDataToIssueDate(ensembleData, m + 1, issue_date);
    }
  }
  return issue_dates_dict;
}


// ensembleArray.forEach(function(ensemble_num) {
// let ensembleData = filterDataToEnsemble(dataIn_1, ensemble_num)
// let ensembleIssueDates = {}
// for (var i=0; i< ensembleData.length; i++) {
// ensembleIssueDates[ensembleData[i].issue_time] = {}
// };
// for (var issue_date in ensembleIssueDates) {
// data[issue_date] = {}
// data[issue_date][ensemble_num] = filterEnsembleDataToIssueDate(ensembleData, issue_date)
// }
// data['issue dates'] = ensembleIssueDates
// // data[ensemble_num] = {};
// // data[ensemble_num]['issue_dates'] = ensembleIssueDates;
// // for (var issue_date in data[ensemble_num]['issue_dates']) {
// // data[ensemble_num]['issue_dates'][issue_date] = filterEnsembleDataToIssueDate(ensembleData, issue_date)
// // }
// })

### Second chart
* Concept: focus on the difference between observed and predicted values, and show how the CI changes across lead times
* Facet(?) diagram: confidence intervals of the predictions, with a toggle for how many days out predictions are shown; each lead time is a new row
* Toggles: lead time

Data: 1 date at all the different lead times; precalculate the mean and CI for predictions
chart_2_data = forecasts.filter(d => formatDate(d.time) === filterDate)
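// tidy, mutate, groupBy, summarize, mean, min, max, and n are assumed to
// come from @tidyjs/tidy (import cell not shown)
function get_chart2_data(forecasts, filterDate) {
  // Keep forecasts for the target date, then store the date as a
  // YYYY-MM-DD string so it can be used as a grouping key
  const single_day = tidy(
    forecasts.filter(d => formatDate(d.time) === filterDate),
    mutate({ time: d => formatDate(d.time) })
  );

  // Summarize max_temp across ensembles for each issue time
  return tidy(
    single_day,
    groupBy(
      ['issue_time', 'time'],
      summarize({
        mean_max_temp: mean('max_temp'),
        min_max_temp: min('max_temp'),
        max_max_temp: max('max_temp'),
        n_ensembles: n()
      })
    )
  );
}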
chart_2_data_summary = get_chart2_data(forecasts, filterDate)
forecasts[7].time === forecasts[15].time
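// Always false: == compares Date object identity, not the instant in time,
// so the dates are compared as formatted strings below instead
new Date(forecasts[7].time) == new Date(forecasts[15].time)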
formatDate = d3.timeFormat("%Y-%m-%d")
formatDate(forecasts[7].time)
formatDate(forecasts[7].time) === formatDate(forecasts[15].time)
filterDate = formatDate(forecasts[7].time)
forecasts
chart_2_data.length
### Third chart
* Concept: highlight accuracy for a given lead time; each row is a lead time
* Time series bubble matrix, with dot size scaled by the difference between observed and predicted, and color running light to dark, where dark = small confidence interval
* x-axis = date (time)
* y-axis = lead time (day 7 at top) (time - issue time)
* See [this punchcard plot](https://observablehq.com/@observablehq/integration-test-flakiness)
* Could later add a toggle for site
* Could later encode RMSE, too
* Precalculate the radius and CI metric: range of CI across all dates and all lead times

Data: one site, max temp predictions for X dates for all lead times
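The precalculation for the bubble matrix might look like the sketch below. It rests on several assumptions: `bubbleRows` and `leadTimeDays` are hypothetical helpers, each summary row is assumed to carry `issue_time` and `time` as Date objects plus `mean_max_temp`, `lower`, and `upper`, and observations are assumed to be available in a Map keyed by a YYYY-MM-DD string. The example values are made up.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Lead time in whole days: forecast target date minus issue time.
function leadTimeDays(issueTime, time) {
  return Math.round((time - issueTime) / MS_PER_DAY);
}

// Hypothetical helper: one bubble per (date, lead time), with the two
// encodings from the list above: error for radius, CI range for color.
function bubbleRows(summaries, observedByDate) {
  return summaries.map(s => {
    const date = s.time.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
    const observed = observedByDate.get(date); // undefined gives NaN error
    return {
      date,
      lead_time: leadTimeDays(s.issue_time, s.time),
      abs_error: Math.abs(s.mean_max_temp - observed),
      ci_range: s.upper - s.lower
    };
  });
}

// Made-up summary for one target date, issued three days earlier.
const exampleSummaries = [
  {
    issue_time: new Date("2021-06-28"),
    time: new Date("2021-07-01"),
    mean_max_temp: 76.5,
    lower: 74,
    upper: 78
  }
];
const exampleObserved = new Map([["2021-07-01", 75]]);

const bubbles = bubbleRows(exampleSummaries, exampleObserved);
```

The radius and color scales can then be fitted once to the full range of `abs_error` and `ci_range` across all dates and lead times, so bubbles stay comparable between rows.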
## Data in
forecasts = FileAttachment("forecasts_1450.csv").csv({ typed: true })
Inputs.table(forecasts)
SummaryTable(forecasts)
## Imports
d3 = require("d3@^6.1")
import { SummaryTable } from "@observablehq/summary-table"