Playing with GPT-3 summarization / Tanner Hobson

Dr. Tanner is a research professor at the University of Tennessee working towards making data science accessible using AI tools.

Edited

Feb 21, 2023

{

return EMBED(SUMMARY(`

We propose to develop a cloud-based interactive visualization service that allows users to explore and analyze data in a more intuitive and interactive way. This would make it easier for them to find insights and make decisions.

`, SUMMARY(`

A community data repository is a repository of datasets that are created, curated, and shared by the community that uses them. We propose to extend cloud-based functionalities of a data repository beyond data services by providing interactive visualization services. This would allow users to explore and analyze data in a more intuitive and interactive way, making it easier for them to find insights and make decisions.

`, SUMMARY(`

A community data repository is a repository of datasets that are created, curated, and shared by the community that uses them.

`, `

SCIENTISTS organize their research around data. When a

research community starts to continuously accumulate, curate

and share community datasets, community data repositories become

a catalyst of future research. They are also a powerful source to

engage the public, make science relevant, and deepen societal

impact of science. Such continuously growing and open datasets

are precious assets of the whole world. As an example, the

data used in this work is the Climate Forecast System (CFS)

ensemble data repository [52] from NOAA National Centers for

Environmental Prediction (NOAA NCEP) that captures annual

global atmospheric patterns at spatial 0.5×0.5 degree precision

and a temporal resolution of 6 hours.

`), SUMMARY(`

We propose to extend cloud-based functionalities of a data repository beyond data services by providing interactive visualization services. This would allow users to explore and analyze data in a more intuitive and interactive way, making it easier for them to find insights and make decisions.

`, `

While cloud has already become the leading solution for

creating distributed systems to support large data repositories,

in this work, we propose to extend cloud-based functionalities

of a data repository beyond data services. Our primary proposal

suggests interactive visualization services can become a component

of cloud-based data repositories too.

`))))

function EMBED(x) {

return htl.html`<style>details > * { margin-left: 2em; }</style><p>${x}`;

}

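function SUMMARY(output, ...inputs) {

// Wrap a summary and its numbered source passages in one collapsible <details> element.
return htl.html.fragment`<details>

<summary>${output}</summary>

${inputs.map((d, i) => htl.html.fragment`<p>${i + 1}: ${d}`)}

</details>`;

}

}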
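The nested `SUMMARY(...)` calls above are written out by hand. As a rough sketch of how the same collapsible outline could be generated recursively, the helper below renders a tree to an HTML string. The node shape (`{ summary, inputs }`, with plain strings as leaves) and the names `renderSummaryTree` and `escapeHtml` are assumptions made for illustration; they are not part of the notebook, and the sketch uses plain strings rather than `htl`.

```javascript
// Escape text so that paragraph content cannot inject markup.
function escapeHtml(text) {
  return text
    .replaceAll('&', '&amp;')
    .replaceAll('<', '&lt;')
    .replaceAll('>', '&gt;');
}

// Hypothetical node shape: either a plain string (a source paragraph)
// or { summary: string, inputs: Array<node> }.
function renderSummaryTree(node) {
  if (typeof node === 'string') {
    return `<p>${escapeHtml(node)}</p>`;
  }
  // Each summary becomes the clickable label; its inputs are nested inside.
  const children = node.inputs.map(renderSummaryTree).join('');
  return `<details><summary>${escapeHtml(node.summary)}</summary>${children}</details>`;
}
```

Expanding a `<details>` element then reveals the more detailed text that its summary was derived from, one level at a time.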

$0 = {

const $0 = `\
SCIENTISTS organize their research around data. When a
research community starts to continuously accumulate, curate
and share community datasets, community data repositories become
a catalyst of future research. They are also a powerful source to
engage the public, make science relevant, and deepen societal
impact of science. Such continuously growing and open datasets
are precious assets of the whole world. As an example, the
data used in this work is the Climate Forecast System (CFS)
ensemble data repository [52] from NOAA National Centers for
Environmental Prediction (NOAA NCEP) that captures annual
global atmospheric patterns at spatial 0.5×0.5 degree precision
and a temporal resolution of 6 hours.

While cloud has already become the leading solution for
creating distributed systems to support large data repositories,
in this work, we propose to extend cloud-based functionalities
of a data repository beyond data services. Our primary proposal
suggests interactive visualization services can become a component
of cloud-based data repositories too.

Such a broadened scope of data repositories can help lower
many barriers of adoption by diverse user communities. For
example, researchers can accelerate their work by being able to
always see the latest data to test and refine their hypotheses without
needing to maintain an up-to-date local replica of the entire dataset.
They can also collaborate with more people, by being able to share
their findings with others who do not have, or cannot afford to
have, an entire local copy of the data.

Recently, interactive volume visualization as a service has
been shown to be feasible [49], [50]. In this work, we focus
on creating interactive parallel flow visualization as a service,
because flow visualization is important to many disciplines,
including atmosphere, ocean, fusion, petroleum, aerodynamics,
and cardiovascular biophysics, where “seeing” the flow is often the
first step of scientific research.

In addition, interactive parallel flow visualization offers a
unique opportunity to study how to use heterogeneous cloud
resources to achieve consistent parallel accelerations in support
of synchronous user interactions. Cloud resources are a promising alternative to traditional HPC computing and visualization
resources that require scientists to have a working relationship with
supercomputing centers. Due to this reason, our work focuses
on using cloud platforms to make leading-edge scientific datasets
interactively usable at an incremental cost.

Our prototype system is called Visualization Cloud Instances
(VCIs). VCIs work collaboratively as a self-organizing swarm for
parallel computing. Each swarm appears as a single cloud service,
i.e. a VCI Service, which can be used locally on an institutional
cluster, or remotely on a public cloud such as Amazon AWS. Our
results show that the VCI approach is able to support large flow
data and ensembles of data and still maintain crucial cloud-based
characteristics: (i) built for large flow data and ensembles of flow
data; (ii) achieves fast interactivity; (iii) instantaneously available;
(iv) serves multiple users concurrently; (v) serves users locally
and remotely; (vi) supports a variety of user devices, and (vii)
lightweight to integrate into applications.

Desktop applications can use VCI Service through a JavaScript
library, vci.js, which transparently manages parallelism, performance, and fault-tolerance. By hosting NOAA NCEP CFS data in
the cloud, we show (1) an application that can provide interactive
visualization of 3D global atmospheric flow field to students
(Figure 1); (2) a comparative visualization for scientists to analyze
deviations between forecast vs. observed ground truth (Figure 8);
and (3) a public-awareness application that integrates a VCI Service
into the popular D3 [1] library. This last example enables any
citizen to investigate how pollution from nuclear power plants in
the United States, on any particular day in the year, can impact the
entire globe (Figure 9).

All applications use a year-long observation dataset of the
Earth’s 3D atmospheric flow (1,463 timesteps, 720 × 361 × 36
spatial resolution, 150 GB) from the NCEP CFS repository [52].
The second application adds a corresponding forecast dataset of the
same dimensionality from the same CFS repository. In all cases,
the AWS setup required is less than $0.70/hour.

Our results suggest cloud services, like the VCI Service, can
conduct parallel computing using heterogeneous resources and
support flexible, interactive, general use of large datasets on a
desktop. These interactive use cases, coupled with efforts to
quantify the exact cost-performance benefits of the cloud [22],
expand the application potential of cloud hosted data resources.

All software of VCI will be open-sourced upon publication of
the paper. The remainder of this paper is organized as background
in Section 2, system architecture in Section 3, and application
development in Section 4. We show results in Section 5 and discuss
conclusion, and future works in Section 6.`;

return $0.split(/\n\n/).map((s) => s.replaceAll(/\s+/g, ' '));

}
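The cell above splits the text on blank lines and collapses whitespace inside each paragraph. A slightly more forgiving variant is sketched below, assuming the input may contain blank lines with stray spaces or leading and trailing whitespace; `splitParagraphs` is a hypothetical name, not something the notebook defines.

```javascript
// Split text into paragraphs and normalize the whitespace inside each one.
function splitParagraphs(text) {
  return text
    .split(/\n\s*\n/) // a paragraph break is a blank line, possibly containing spaces
    .map((s) => s.replace(/\s+/g, ' ').trim()) // join wrapped lines into one line
    .filter((s) => s.length > 0); // drop empty fragments
}
```

For the text in this notebook it should give the same paragraphs as the `split(/\n\n/)` call, since the paragraphs there are separated by exactly one empty line.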

$1 = {
  // One single-sentence summary per paragraph of $0, in order.
  const summaries = [];

  for (const paragraph of $0) {
    const summary = await summarize(`${paragraph}\n1 sentence summary:`);
    // Trim the completion and collapse internal whitespace onto one line.
    summaries.push(STRIP(summary).replaceAll(/\s+/g, ' '));
  }

  return summaries;

  function STRIP(s) {
    return s.replace(/^\s+/, '').replace(/\s+$/, '');
  }
}

summarize(`${$1[0]}\n1 sentence summary:`);

{
  // First-pass summary of the opening paragraph; SUMMARIZE then condenses it once more.
  const firstPass = await summarize(`SCIENTISTS organize their research around data. When a research community starts to continuously accumulate, curate and share community datasets, community data repositories become a catalyst of future research. They are also a powerful source to engage the public, make science relevant, and deepen societal impact of science. Such continuously growing and open datasets are precious assets of the whole world. As an example, the data used in this work is the Climate Forecast System (CFS) ensemble data repository [52] from NOAA National Centers for Environmental Prediction (NOAA NCEP) that captures annual global atmospheric patterns at spatial 0.5×0.5 degree precision and a temporal resolution of 6 hours.`);

  return htl.html`<p>${await SUMMARIZE(firstPass)}`;

  // Render each input under a collapsible summary of itself.
  // Showing the input in the <details> body is an assumption; the original markup was cut off.
  async function SUMMARIZE(...inputs) {
    return Promise.all(inputs.map(async (input) => htl.html.fragment`
      <details><summary>${STRIP(await summarize(input))}</summary><p>${input}</details>`));
  }

  function STRIP(s) {
    return s.replace(/^\s+/, '').replace(/\s+$/, '');
  }
}

summarize = {
  // Cache completions by prompt so that re-running cells does not repeat API calls.
  const history = new Map();

  return async function summarize(prompt) {
    if (history.has(prompt)) {
      return history.get(prompt);
    }

    const response = await fetch(`https://api.openai.com/v1/completions`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Key redacted: substitute your own secret and never publish it in a notebook.
        'Authorization': 'Bearer YOUR_OPENAI_API_KEY',
      },
      body: JSON.stringify({
        prompt: `${prompt}\nTL;DR:`,
        model: 'text-curie-001',
        max_tokens: 256,
        temperature: 0.2,
      }),
    });

    // Fail loudly instead of reading `choices` from an error payload.
    if (!response.ok) {
      throw new Error(`OpenAI request failed with status ${response.status}`);
    }

    const json = await response.json();
    console.log({ json });

    const completion = json.choices[0].text;
    history.set(prompt, completion);
    return completion;
  };
}
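The first cell of this notebook combines paragraph summaries into higher-level summaries by hand. One possible way to automate that is sketched below: summarize each paragraph, then repeatedly summarize groups of adjacent summaries until a single root remains. The function name `summarizeTree`, the `fanout` parameter, and the `{ summary, inputs }` node shape are all hypothetical; `summarizeFn` stands in for any async `(text) => string` function, such as the `summarize` cell above.

```javascript
// Build a tree of summaries bottom-up.
// Returns a root node { summary, inputs }, or undefined for an empty input.
async function summarizeTree(paragraphs, summarizeFn, fanout = 2) {
  if (fanout < 2) {
    throw new RangeError('fanout must be at least 2');
  }

  // Leaves: one node per paragraph, holding that paragraph's summary.
  let level = await Promise.all(
    paragraphs.map(async (text) => ({
      summary: (await summarizeFn(text)).trim(),
      inputs: [text],
    }))
  );

  // Merge groups of `fanout` adjacent nodes until one root remains.
  while (level.length > 1) {
    const next = [];
    for (let i = 0; i < level.length; i += fanout) {
      const group = level.slice(i, i + fanout);
      if (group.length === 1) {
        // A leftover node moves up unchanged, avoiding an extra request.
        next.push(group[0]);
        continue;
      }
      const joined = group.map((node) => node.summary).join(' ');
      next.push({ summary: (await summarizeFn(joined)).trim(), inputs: group });
    }
    level = next;
  }

  return level[0];
}
```

In the notebook this might be called as `summarizeTree($0, summarize)`. Note that every non-leaf node costs one API request, so a long text with a small `fanout` issues many requests.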
