Edited
Feb 21, 2023
{
return EMBED(SUMMARY(`
We propose to develop a cloud-based interactive visualization service that allows users to explore and analyze data in a more intuitive and interactive way. This would make it easier for them to find insights and make decisions.
`, SUMMARY(`
A community data repository is a repository of datasets that are created, curated, and shared by the community that uses them. We propose to extend cloud-based functionalities of a data repository beyond data services by providing interactive visualization services. This would allow users to explore and analyze data in a more intuitive and interactive way, making it easier for them to find insights and make decisions.
`, SUMMARY(`
A community data repository is a repository of datasets that are created, curated, and shared by the community that uses them.
`, `
SCIENTISTS organize their research around data. When a
research community starts to continuously accumulate, curate
and share community datasets, community data repositories become
a catalyst of future research. They are also a powerful source to
engage the public, make science relevant, and deepen societal
impact of science. Such continuously growing and open datasets
are precious assets of the whole world. As an example, the
data used in this work is the Climate Forecast System (CFS)
ensemble data repository [52] from NOAA National Centers for
Environmental Prediction (NOAA NCEP) that captures annual
global atmospheric patterns at spatial 0.5×0.5 degree precision
and a temporal resolution of 6 hours.
`), SUMMARY(`
We propose to extend cloud-based functionalities of a data repository beyond data services by providing interactive visualization services. This would allow users to explore and analyze data in a more intuitive and interactive way, making it easier for them to find insights and make decisions.
`, `
While cloud has already become the leading solution for
creating distributed systems to support large data repositories,
in this work, we propose to extend cloud-based functionalities
of a data repository beyond data services. Our primary proposal
suggests interactive visualization services can become a component
of cloud-based data repositories too.
`))))

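// Wrap the summary tree in a paragraph and indent the contents of every <details> level.
function EMBED(x) {
return htl.html`<style>details > * { margin-left: 2em; }</style><p>${x}`;
}
// Render one collapsible node: `output` is the summary line, and `inputs` are the
// texts (or nested SUMMARY nodes) it was derived from, numbered from 1.
function SUMMARY(output, ...inputs) {
return htl.html.fragment`
<details>
<summary>${output}</summary>
${inputs.map((d, i) => htl.html.fragment`<p>${i+1}:${d}`)}
</details>
`;
}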
}
$0 = {
const $0 = `\
SCIENTISTS organize their research around data. When a
research community starts to continuously accumulate, curate
and share community datasets, community data repositories become
a catalyst of future research. They are also a powerful source to
engage the public, make science relevant, and deepen societal
impact of science. Such continuously growing and open datasets
are precious assets of the whole world. As an example, the
data used in this work is the Climate Forecast System (CFS)
ensemble data repository [52] from NOAA National Centers for
Environmental Prediction (NOAA NCEP) that captures annual
global atmospheric patterns at spatial 0.5×0.5 degree precision
and a temporal resolution of 6 hours.

While cloud has already become the leading solution for
creating distributed systems to support large data repositories,
in this work, we propose to extend cloud-based functionalities
of a data repository beyond data services. Our primary proposal
suggests interactive visualization services can become a component
of cloud-based data repositories too.

Such a broadened scope of data repositories can help lower
many barriers of adoption by diverse user communities. For
example, researchers can accelerate their work by being able to
always see the latest data to test and refine their hypotheses without
needing to maintain an up-to-date local replica of the entire dataset.
They can also collaborate with more people, by being able to share
their findings with others who do not have, or cannot afford to
have, an entire local copy of the data.

Recently, interactive volume visualization as a service has
been shown to be feasible [49], [50]. In this work, we focus
on creating interactive parallel flow visualization as a service,
because flow visualization is important to many disciplines,
including atmosphere, ocean, fusion, petroleum, aerodynamics,
and cardiovascular biophysics, where “seeing” the flow is often the
first step of scientific research.

In addition, interactive parallel flow visualization offers a
unique opportunity to study how to use heterogeneous cloud
resources to achieve consistent parallel accelerations in support
of synchronous user interactions. Cloud resources are a promising alternative to traditional HPC computing and visualization
resources that require scientists to have a working relationship with
supercomputing centers. For this reason, our work focuses
on using cloud platforms to make leading-edge scientific datasets
interactively usable at an incremental cost.

Our prototype system is called Visualization Cloud Instances
(VCIs). VCIs work collaboratively as a self-organizing swarm for
parallel computing. Each swarm appears as a single cloud service,
i.e. a VCI Service, which can be used locally on an institutional
cluster, or remotely on a public cloud such as Amazon AWS. Our
results show that the VCI approach is able to support large flow
data and ensembles of data and still maintain crucial cloud-based
characteristics: (i) built for large flow data and ensembles of flow
data; (ii) achieves fast interactivity; (iii) instantaneously available;
(iv) serves multiple users concurrently; (v) serves users locally
and remotely; (vi) supports a variety of user devices; and (vii)
lightweight to integrate into applications.

Desktop applications can use VCI Service through a JavaScript
library, vci.js, which transparently manages parallelism, performance, and fault-tolerance. By hosting NOAA NCEP CFS data in
the cloud, we show (1) an application that can provide interactive
visualization of 3D global atmospheric flow field to students
(Figure 1); (2) a comparative visualization for scientists to analyze
deviations between forecast vs. observed ground truth (Figure 8);
and (3) a public-awareness application that integrates a VCI Service
into the popular D3 [1] library. This last example enables any
citizen to investigate how pollution from nuclear power plants in
the United States, on any particular day in the year, can impact the
entire globe (Figure 9).

All applications use a year-long observation dataset of the
Earth’s 3D atmospheric flow (1,463 timesteps, 720 × 361 × 36
spatial resolution, 150 GB) from the NCEP CFS repository [52].
The second application adds a corresponding forecast dataset of the
same dimensionality from the same CFS repository. In all cases,
the AWS setup required is less than $0.70/hour.

Our results suggest cloud services, like the VCI Service, can
conduct parallel computing using heterogeneous resources and
support flexible, interactive, general use of large datasets on a
desktop. These interactive use cases, coupled with efforts to
quantify the exact cost-performance benefits of the cloud [22],
expand the application potential of cloud hosted data resources.

All software of VCI will be open-sourced upon publication of
the paper. The remainder of this paper is organized as background
in Section 2, system architecture in Section 3, and application
development in Section 4. We show results in Section 5 and discuss
conclusions and future work in Section 6.`;

return $0.split(/\n\n/).map((s) => s.replaceAll(/\s+/g, ' '));
}
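The splitting step in the cell above can be tried on its own. The following is a minimal sketch, not the notebook's exact code: `splitParagraphs` and `sample` are hypothetical names, and the separator `/\n\s*\n/` is an assumed variation that also accepts blank lines containing stray spaces, whereas the cell above splits on `/\n\n/` only.

```javascript
// Split hard-wrapped text into paragraphs on blank lines, then collapse each
// paragraph's internal whitespace (including line breaks) to single spaces.
function splitParagraphs(text) {
  return text
    .split(/\n\s*\n/) // a blank line, possibly containing spaces, ends a paragraph
    .map((s) => s.replace(/\s+/g, ' ').trim())
    .filter((s) => s.length > 0); // drop empty entries from leading or trailing blank lines
}

const sample = 'First line\nwrapped here.\n\nSecond\nparagraph.\n';
const paragraphs = splitParagraphs(sample);
console.log(paragraphs); // → [ 'First line wrapped here.', 'Second paragraph.' ]
```

Normalizing whitespace before sending a paragraph to the model keeps the prompt on one line, so the trailing cue (`1 sentence summary:`) is the only line break the model sees.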
$1 = {
// One summary per paragraph of $0, in order. Each summary is trimmed and its
// internal whitespace collapsed, so every array entry is a single line.
const summaries = [];
for (const paragraph of $0) {
const summary = await summarize(`${paragraph}\n1 sentence summary:`);
summaries.push(STRIP(summary).replaceAll(/\s+/g, ' '));
}

return summaries;

function STRIP(s) {
return s.replace(/^\s+/, '').replace(/\s+$/, '');
}
}
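The per-paragraph summaries can be folded further into summaries of summaries, which is the shape the nested `SUMMARY(...)` calls in the first cell write out by hand. The sketch below is one possible way to build that tree automatically; it is an assumption, not the notebook's method. `summaryTree`, `fakeSummarize`, and the `fanout` parameter are hypothetical names, and `fakeSummarize` is a stand-in that just keeps the first sentence so the example runs without any network access.

```javascript
// Hypothetical stand-in for summarize(): keeps only the first sentence.
async function fakeSummarize(text) {
  const match = text.match(/^.*?[.!?](?=\s|$)/);
  return match ? match[0] : text;
}

// Summarize every leaf text, then repeatedly summarize groups of `fanout`
// adjacent summaries until a single root node remains.
// Each node has the shape { summary, children }.
async function summaryTree(texts, summarizeFn, fanout = 2) {
  if (fanout < 2) throw new Error('fanout must be at least 2');
  if (texts.length === 0) return null;

  let level = await Promise.all(
    texts.map(async (text) => ({ summary: await summarizeFn(text), children: [text] }))
  );

  while (level.length > 1) {
    const next = [];
    for (let i = 0; i < level.length; i += fanout) {
      const group = level.slice(i, i + fanout);
      const joined = group.map((node) => node.summary).join(' ');
      next.push({ summary: await summarizeFn(joined), children: group });
    }
    level = next;
  }
  return level[0];
}

summaryTree(['A one. A two.', 'B one. B two.', 'C one.'], fakeSummarize)
  .then((root) => console.log(root.summary));
```

Each node could then be rendered with a helper like `SUMMARY(node.summary, ...node.children)`, recursing into child nodes.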
summarize(`${$1[0]}\n1 sentence summary:`);
{
// First-pass summary of the opening paragraph; SUMMARIZE then condenses it once more.
const firstPass = await summarize(`SCIENTISTS organize their research around data. When a research community starts to continuously accumulate, curate and share community datasets, community data repositories become a catalyst of future research. They are also a powerful source to engage the public, make science relevant, and deepen societal impact of science. Such continuously growing and open datasets are precious assets of the whole world. As an example, the data used in this work is the Climate Forecast System (CFS) ensemble data repository [52] from NOAA National Centers for Environmental Prediction (NOAA NCEP) that captures annual global atmospheric patterns at spatial 0.5×0.5 degree precision and a temporal resolution of 6 hours.`);
return htl.html`
<p>${await SUMMARIZE(firstPass)}`;

// Render each input as a collapsible <details> whose <summary> line is the
// model's summary of that input and whose body is the input itself.
async function SUMMARIZE(...inputs) {
return Promise.all(inputs.map(async (input) => htl.html.fragment`
<details><summary>${STRIP(await summarize(input))}</summary><p>${STRIP(input)}</details>`));
}

function STRIP(s) {
return s.replace(/^\s+/, '').replace(/\s+$/, '');
}
}
summarize = {
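// NOTE: this early return leaves `summarize` undefined, so no requests are sent
// and every cell that calls summarize() fails. Remove it to enable the cell.
return;
// Cache of prompt -> completion, so a repeated prompt does not trigger a new request.
const history = new Map();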
return async function summarize(prompt) {
if (history.has(prompt)) {
return Promise.resolve(history.get(prompt));
}

const response = await fetch(`https://api.openai.com/v1/completions`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
// Placeholder: supply your own key. Never commit a real key to a public notebook.
'Authorization': 'Bearer YOUR_OPENAI_API_KEY',
},
body: JSON.stringify({
prompt: `${prompt}\nTL;DR:`,
model: 'text-curie-001',
max_tokens: 256,
temperature: 0.2,
}),
});

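const json = await response.json();
console.log({ json });

// Surface API errors instead of failing later on a missing `choices` array.
if (!response.ok) {
throw new Error(`Completion request failed with status ${response.status}`);
}

const completion = json.choices[0].text;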
history.set(prompt, completion);

return completion;
};
}
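The `history` map above caches a completion only after its request has finished, so two identical prompts issued at the same time would both hit the network. A minimal sketch of one alternative follows, assuming it is acceptable to cache the pending promise itself. `memoizeAsync` and `slowUpper` are hypothetical names, and `slowUpper` stands in for the real request so the example runs offline.

```javascript
// Wrap an async function so each distinct key triggers at most one call,
// even when several callers ask for the same key concurrently.
function memoizeAsync(fn) {
  const cache = new Map(); // key -> pending or settled promise
  return function memoized(key) {
    if (!cache.has(key)) {
      const promise = Promise.resolve()
        .then(() => fn(key))
        .catch((error) => {
          cache.delete(key); // do not cache failures, so a later call can retry
          throw error;
        });
      cache.set(key, promise);
    }
    return cache.get(key);
  };
}

// Hypothetical stand-in for a network call; counts how often it really runs.
let calls = 0;
const slowUpper = async (s) => {
  calls += 1;
  return s.toUpperCase();
};

const cachedUpper = memoizeAsync(slowUpper);
Promise.all([cachedUpper('a'), cachedUpper('a'), cachedUpper('b')])
  .then((results) => console.log(results, calls)); // → [ 'A', 'A', 'B' ] 2
```

Wrapping the notebook's function would then be `memoizeAsync(summarize)`, with the prompt string as the key.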