Published
Edited
Dec 6, 2019
group1 = " (Network-adjacent forecasters)"
group2 = " (Online crowdworkers)"
plotAggregateAfterEnd(foretoldPredictors,"Aggregate score at the end of the experiment"+group1, 50, "2019-10-9", "2019-11-10")
plotAggregateAfterEnd(controlGroup,"Aggregate score at the end of the experiment"+group2, 50, "2019-10-9", "2019-11-30")
plotAggregateAfterEndComparison(foretoldPredictors, controlGroup)
plotAggregateEvolution(foretoldPredictors,"Aggregate score across time"+group1, 50, "2019-10-9", "2019-11-30")
plotAggregateEvolution(controlGroup,"Aggregate score across time"+group2, 50, "2019-10-9", "2019-11-20")
plotAggregateWithConfidenceIntervals("Evolution of aggregate score across time (± standard error)", foretoldPredictors, controlGroup,50,"2019-10-9","2019-10-9", "2019-11-20","2019-11-30")
plotMarginalImprovementForetoldAndControl(foretoldPredictors, controlGroup, "Marginal improvement of n:th prediction after prior (± standard error)", 50, "2019-10-9", "2019-11-30", 1)
md `### How to import the plots into another Observable notebook`
getQuestions("f19015f5-55d8-4fd6-8621-df79ac072e15", 50, "2019-10-9", "2019-11-20")
"Aleluya".includes("Al")
md `The functions that follow *could* call getEvolutionOfQuestionsInChannel every time, but that would be too slow. Instead, we save the output of that function in an object and reference the object, with the option to recompute it from scratch if need be. We also save those objects as file attachments, so that we don't have to make the API calls every time.`
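A minimal sketch of this caching pattern in plain JavaScript. The names `getSeries`, `cachedSeries`, and `fetchSeries` are hypothetical stand-ins, not functions from this notebook: `cachedSeries` plays the role of the saved objects, and `fetchSeries` fakes the slow API call with a counter so the example runs without network access. Unlike the notebook's functions, this sketch also falls back to fetching when nothing was saved for a channel.

```javascript
// Hypothetical stand-in for an expensive API call; it counts how often it runs.
let fetchCount = 0;
function fetchSeries(channelId) {
  fetchCount += 1;
  return [{ questionName: "Q1", timestamp: 1, score: -1.5, channelId }];
}

// Hypothetical stand-in for data saved earlier (e.g. loaded from a file attachment).
const cachedSeries = {
  "channel-a": [{ questionName: "Q1", timestamp: 1, score: -1.5, channelId: "channel-a" }]
};

// Return the saved copy by default; recompute only when asked to,
// or when nothing was saved for this channel.
function getSeries(channelId, fromStatic = true) {
  if (fromStatic && channelId in cachedSeries) {
    return cachedSeries[channelId];
  }
  return fetchSeries(channelId);
}

const fromCache = getSeries("channel-a");         // served from the saved object
const recomputed = getSeries("channel-a", false); // forces a fresh computation
```

Defaulting `fromStatic` to `true` keeps the common path fast, while a single flag is enough to refresh the data when the saved copy is suspected to be stale.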
// EV = getEvolutionOfQuestionsInChannel("f19015f5-55d8-4fd6-8621-df79ac072e15", 50, "predictionNumber", "2019-10-9", "2019-11-20")
// EVControl = getEvolutionOfQuestionsInChannel(controlGroup, 50, "predictionNumber", "2019-10-9", "2019-11-30")
// You can see more details about vega-lite here: https://vega.github.io/vega-lite-api/

{
  const line = vl.markLine().data(EV).encode(
    vl.y().field('score').type('quantitative'),
    vl.x().field('timestamp').type('quantitative'),
    vl.color().field('questionName').type('nominal').legend({orient: 'bottom', direction: "vertical", labelLimit: 1000}),
  );

  const point = vl.markCircle().data(EV).encode(
    vl.y().field('score').type('quantitative').scale({type: 'log'}),
    vl.x().field('timestamp').type('quantitative').title("Prediction #"),
    vl.color().field('questionName').type('nominal')
  );
  return vl.layer(line, point)
    .width(500)
    .title("Amplification Experiment: Aggregate score across time")
    .render();
}
md `Wrap this in a function that can be exported.`
async function plotAggregateEvolution(channelId = "f19015f5-55d8-4fd6-8621-df79ac072e15", title, numberOfQuestionsToFetch = 50, dateBeginning = "2019-10-9", dateEnd = "2019-11-20", fromStatic = true){
  // Use the saved data when possible, to avoid fetching from the API every time
  async function getData(fromStatic){
    if(fromStatic){
      return (channelId == "f19015f5-55d8-4fd6-8621-df79ac072e15") ? EV : EVControl;
    }else{
      return getEvolutionOfQuestionsInChannel(channelId, numberOfQuestionsToFetch, "predictionNumber", dateBeginning, dateEnd);
    }
  }
  let df = await getData(fromStatic)
  // Exclude the questions whose name includes "Meta"
  df = df.filter(question => !question.questionName.includes("Meta"))
  // The chart specification is the same as in the cell above
  const line = vl.markLine().data(df).encode(
    vl.y().field('score').type('quantitative'),
    vl.x().field('timestamp').type('quantitative'),
    vl.color().field('questionName').type('nominal').legend({orient: 'bottom', direction: "vertical", labelLimit: 1000}),
  );

  const point = vl.markCircle().data(df).encode(
    vl.y().field('score').type('quantitative').scale({type: 'log'}),
    vl.x().field('timestamp').type('quantitative').title("Prediction #"),
    vl.color().field('questionName').type('nominal')
  );
  return vl.layer(line, point)
    .width(500)
    .title(title)
    .render();
}
// Uncomment this to see what the function will output
// plotAggregateEvolution("f19015f5-55d8-4fd6-8621-df79ac072e15", "Score Across time", 50,"2019-10-9", "2019-11-20", true)
md `We care about how participants did on different questions. Interestingly, the ordering of the questions is preserved between our forecasters and the control group, even though the control group did much worse.`
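One way to put a number on "the ordering is preserved" is a rank correlation. The sketch below computes Spearman's rank correlation in plain JavaScript; this is a swapped-in technique for illustration, not something the notebook computes. The helpers `ranks` and `spearman` are hypothetical names, the scores are made up, and the formula used assumes there are no tied scores.

```javascript
// Hypothetical helper: rank of each value (1 = smallest). Assumes no ties.
function ranks(values) {
  const order = values.map((v, i) => [v, i]).sort((a, b) => a[0] - b[0]);
  const result = new Array(values.length);
  order.forEach(([, originalIndex], position) => { result[originalIndex] = position + 1; });
  return result;
}

// Spearman rank correlation for two equally long lists without ties:
// 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference in ranks.
function spearman(xs, ys) {
  const n = xs.length;
  const rx = ranks(xs);
  const ry = ranks(ys);
  const sumSquaredDiff = rx.reduce((sum, r, i) => sum + (r - ry[i]) ** 2, 0);
  return 1 - (6 * sumSquaredDiff) / (n * (n * n - 1));
}

// Made-up final scores for four questions, one entry per question in the same order.
const groupA = [-0.5, -1.2, -2.0, -3.1];
const groupB = [-1.5, -2.6, -4.0, -6.3];
const rho = spearman(groupA, groupB); // identical ordering in both groups, so rho is 1
```

A value near 1 would mean both groups found the same questions easy and hard, regardless of how far apart their absolute scores are.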
// Uncomment this to see what the function will output
// plotAggregateAfterEnd("f19015f5-55d8-4fd6-8621-df79ac072e15")
// plotAggregateAfterEndComparison("f19015f5-55d8-4fd6-8621-df79ac072e15", controlGroup)
// EV2 = getEvolutionOfQuestionsInChannel("f19015f5-55d8-4fd6-8621-df79ac072e15", 50, "time", "2019-10-9", "2019-11-20")
// EV2Comparison = getEvolutionOfQuestionsInChannel(controlGroup, 50, "time", "2019-10-9", "2019-11-30")
EV2 = FileAttachment("EV2 (1).json").json()
EV2Comparison = FileAttachment("EV2Comparison (1).json").json()
async function getDataAggregateWithConfidenceIntervals(channelId, legend, numQuestionsToGet = 70, dateBeginning = "2019-10-9", dateEnd = "2019-11-20", fromStatic = true){
  // Use the saved data when possible, to avoid making an API call
  let dataframe
  if(fromStatic){
    dataframe = channelId == controlGroup ? EV2Comparison : EV2
  }else{
    dataframe = await getEvolutionOfQuestionsInChannel(channelId, numQuestionsToGet, "time", dateBeginning, dateEnd)
  }
  // Get the initial score of each question, i.e. the score of the prior
  let initialScores = []
  let lastQuestionName = ""
  let numQuestions = 0
  dataframe.forEach(element => {
    if(element.questionName != lastQuestionName && !element.questionName.includes("Meta")){
      initialScores.push(element.score)
      lastQuestionName = element.questionName
      numQuestions = numQuestions + 1
    }
  })
  let initialScore = math.mean(initialScores)
  let initialScoreStd = math.std(initialScores)
  // What happens to a question while other questions are updated?
  // Here, at every event at which a timestamp exists, we take a snapshot of every question at that moment.
  let timestamps = getTimeStamps(dataframe)
  let dfNew = []
  let counter = 0
  for(let i = 0; i < dataframe.length; i++){
    if(!dataframe[i].questionName.includes("Meta")){
      // New rows are built from dataframe[i] without overwriting its timestamp,
      // so the saved data (EV2, EV2Comparison) stays intact across repeated calls.
      if(i != (dataframe.length - 1) && dataframe[i].questionName == dataframe[i+1].questionName){
        let originalTimeStamp = dataframe[i].timestamp
        while(originalTimeStamp >= timestamps[counter]){
          dfNew.push({
            score: dataframe[i].score,
            timestamp: timestamps[counter],
            questionName: dataframe[i].questionName
          })
          counter++
        }
      }else{
        // Last row of this question: carry its score through the remaining timestamps
        while(counter < timestamps.length){
          dfNew.push({
            score: dataframe[i].score,
            timestamp: timestamps[counter],
            questionName: dataframe[i].questionName
          })
          counter++
        }
        counter = 0
      }
    }
  }
  // At every point t, we have a copy of all questions. We use that to get the mean, standard deviation, etc. at every point.
  // We also prepare the prior and its confidence interval for plotting, although they are not returned below.
  let dfMean = []
  let dfPrior = []
  for(let i = 0; i < timestamps.length; i++){
    let scores = []
    for(let j = 0; j < dfNew.length; j++){
      if(dfNew[j].timestamp == timestamps[i]){
        scores.push(dfNew[j].score)
      }
    }
    let mean = math.mean(scores)
    let std = math.std(scores)
    // This condition is a hack to work around a bug in foretold. See the spike which appears if you remove it.
    if(mean + std*1.96/math.sqrt(numQuestions) < 0 || legend == "Online crowdworkers" || legend.includes("Control")){
      dfMean.push({
        score: mean,
        // The interval is ± one standard error; multiply the second term by 1.96 for a 95% confidence interval
        lowerConfidenceInterval: mean - std/math.sqrt(numQuestions),
        upperConfidenceInterval: mean + std/math.sqrt(numQuestions),
        std: std,
        timestamp: i*100/(timestamps.length - 1),
        initialScore: initialScore,
        color: legend
      })
      dfPrior.push({
        timestamp: i*100/timestamps.length,
        initialScore: initialScore,
        lowerConfidenceIntervalPrior: initialScore - 1.96*initialScoreStd/math.sqrt(numQuestions),
        upperConfidenceIntervalPrior: initialScore + 1.96*initialScoreStd/math.sqrt(numQuestions),
        color: legend+"' Prior",
      })
    }
  }
  return {dfMean: dfMean} // If one wishes, one could add dfPrior
}
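The snapshot idea in the comments above ("at every event at which a timestamp exists, we take a snapshot of every question") can be sketched on its own, in plain JavaScript. The helper name `snapshotAtEveryTimestamp` and the toy rows are hypothetical. This sketch carries the most recent score forward, which is an assumption made for illustration: the function above walks the sorted rows with a counter and may assign intermediate timestamps differently. Because `getTimeStamps` is not shown here, the sketch derives the timestamps itself.

```javascript
// Toy rows in the same shape as the notebook's data: {questionName, timestamp, score}.
// The values are made up for illustration.
const rows = [
  { questionName: "A", timestamp: 1, score: -2.0 },
  { questionName: "A", timestamp: 4, score: -1.0 },
  { questionName: "B", timestamp: 2, score: -3.0 },
  { questionName: "B", timestamp: 3, score: -2.5 }
];

// Hypothetical helper: one snapshot row per (question, timestamp) pair,
// using the latest score observed at or before that timestamp.
function snapshotAtEveryTimestamp(rows) {
  const timestamps = [...new Set(rows.map(r => r.timestamp))].sort((a, b) => a - b);
  const byQuestion = new Map();
  for (const r of rows) {
    if (!byQuestion.has(r.questionName)) byQuestion.set(r.questionName, []);
    byQuestion.get(r.questionName).push(r);
  }
  const snapshots = [];
  for (const [questionName, series] of byQuestion) {
    series.sort((a, b) => a.timestamp - b.timestamp);
    let k = 0;
    let current = null; // no score exists before the question's first row
    for (const t of timestamps) {
      while (k < series.length && series[k].timestamp <= t) {
        current = series[k].score;
        k += 1;
      }
      if (current !== null) snapshots.push({ questionName, timestamp: t, score: current });
    }
  }
  return snapshots;
}

const snapshots = snapshotAtEveryTimestamp(rows);
```

With every question present at every timestamp (once it has a first score), averaging across questions at a given timestamp becomes a simple filter over `snapshots`.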
plotAggregateWithConfidenceIntervals("Average Question Across time, with standard error", foretoldPredictors, controlGroup, 50, "2019-10-9","2019-10-9", "2019-11-20", "2019-11-30")
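The band stored by getDataAggregateWithConfidenceIntervals is the mean ± std/sqrt(numQuestions), i.e. one standard error on each side. A minimal sketch of that arithmetic in plain JavaScript, without math.js; the helper name `meanWithStandardError` and the sample scores are hypothetical. It uses the sample standard deviation (n − 1 denominator), which is also the default normalization of `math.std`.

```javascript
// Hypothetical helper: mean and standard error of a list of scores.
function meanWithStandardError(scores, z = 1) {
  const n = scores.length;
  const mean = scores.reduce((sum, x) => sum + x, 0) / n;
  // Sample variance (n - 1 denominator); defined as 0 here for a single score.
  const variance = n > 1
    ? scores.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1)
    : 0;
  const standardError = Math.sqrt(variance) / Math.sqrt(n);
  return {
    mean,
    standardError,
    lower: mean - z * standardError, // z = 1 gives ± one standard error
    upper: mean + z * standardError  // z = 1.96 gives an approximate 95% interval
  };
}

// Made-up scores, for illustration only.
const band = meanWithStandardError([-1, -2, -3, -4]);
const band95 = meanWithStandardError([-1, -2, -3, -4], 1.96);
```

Passing `z = 1.96` widens the band to an approximate 95% confidence interval under a normal approximation, which corresponds to the `1.96` factor left in the comments of the function above.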
async function getMarginalImprovementData(channelId, legend, numQuestionsToGet = 50, dateBeginning = "2019-10-9", dateEnd = "2019-11-20", startAt = 1){
  // Define the data frame
  let data = await getEvolutionOfQuestionsInChannel2(channelId, numQuestionsToGet, "predictionNumber", dateBeginning, dateEnd)
  let maxPredictionNumber = math.max(data.map(object => object.timestamp))
  // One slot per step from prediction n-1 to prediction n, for n >= 2
  let marginalImprovementInScore = Array(maxPredictionNumber - 1)
  // All slots start out sharing one empty array; this is safe only because slots are reassigned with concat below, never mutated in place
  marginalImprovementInScore.fill([])
  // Get the proportional marginal improvement of the aggregate after every prediction.
  let lastTimeStamp = Infinity
  let lastScore = Infinity
  // Exclude the question that only one of the two groups answered (its name includes "Meta")
  data.forEach(element => {
    if(!element.questionName.includes("Meta")){
      if(element.timestamp > lastTimeStamp){
        // Calculate marginal improvement as how much closer to a perfect score each prediction moved the aggregate, as a percentage of the distance between the aggregate and the perfect score.
        let functionOfScore = (element.score - lastScore) / math.abs(lastScore)
        // Push to array for storing the values
        marginalImprovementInScore[element.timestamp - 2] = marginalImprovementInScore[element.timestamp - 2].concat([functionOfScore])
      }

      lastTimeStamp = element.timestamp
      lastScore = element.score
    }
  })
  // Average it and push it to a dataframe suitable for plotting
  let counter = 0
  let dataframe = []
  // Iterate over prediction #
  marginalImprovementInScore.forEach(element => {
    counter++
    if(counter >= startAt){
      // Compute errorbar width
      let mean = math.mean(element)
      let std = math.std(element)
      let length = element.length
      let errorbar = std/math.sqrt(length) // Standard error; multiply by 1.96 to get 95% confidence intervals
      // Push to dataframe
      dataframe.push({
        predictionNumber: counter,
        marginalImprovementInScore: mean,
        lowerConfidenceInterval: mean - errorbar,
        upperConfidenceInterval: mean + errorbar,
        std: std,
        color: legend,
      })
    }
  })
  return dataframe
}
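The core of the calculation above is the relative change (score − lastScore) / |lastScore|, collected per prediction number. A minimal sketch of just that step in plain JavaScript, on made-up rows; the helper name `relativeImprovements` is hypothetical. Unlike the function above, which only compares timestamps, this sketch also checks that consecutive rows belong to the same question, which is an assumption about the intended behavior.

```javascript
// Toy rows: {questionName, timestamp, score}, where timestamp is the prediction number
// and rows are sorted by question and then by prediction number. Values are made up.
const predictions = [
  { questionName: "A", timestamp: 1, score: -4 },
  { questionName: "A", timestamp: 2, score: -2 },
  { questionName: "A", timestamp: 3, score: -1 },
  { questionName: "B", timestamp: 1, score: -10 },
  { questionName: "B", timestamp: 2, score: -8 }
];

// Hypothetical helper: relative change of the score at each prediction number n >= 2,
// grouped by n. Index 0 holds the changes from prediction 1 to prediction 2.
function relativeImprovements(rows) {
  const maxN = Math.max(...rows.map(r => r.timestamp));
  // One fresh array per slot (Array.prototype.fill([]) would share a single array).
  const byStep = Array.from({ length: maxN - 1 }, () => []);
  let previous = null;
  for (const row of rows) {
    const sameQuestion = previous !== null && previous.questionName === row.questionName;
    if (sameQuestion && row.timestamp > previous.timestamp) {
      byStep[row.timestamp - 2].push((row.score - previous.score) / Math.abs(previous.score));
    }
    previous = row;
  }
  return byStep;
}

const improvements = relativeImprovements(predictions);
// improvements[0] holds the 1 => 2 steps of A and B; improvements[1] holds the 2 => 3 step of A
```

Each slot can then be averaged, with a standard error attached, to obtain one plotted point per prediction number.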
// marginalImprovementForecasters = getMarginalImprovementData(foretoldPredictors, "Network-adjacent forecasters", 50, "2019-10-9", "2019-11-20", 1)
marginalImprovementForecasters = FileAttachment("marginalImprovementForecasters.json").json()
// marginalImprovementControl = getMarginalImprovementData(controlGroup, "Online crowdworkers", 50, "2019-10-9", "2019-11-20", 1)
marginalImprovementControl = FileAttachment("marginalImprovementControl (1).json").json()
vl.layer(...F2)
  .width(500)
  .title("Marginal Improvement of going from n => n+1 predictions ÷ Previous Score")
  .render();
F2 = plotMarginalImprovement(foretoldPredictors,"blue","green", "Network-adjacent forecasters", 50,"2019-10-9", "2019-11-20", 1, 0.05)
/* vl.layer(...C2)
.width(500)
.title("Marginal Improvement of going from n=> n+1 predictions ÷ Previous Score")
.render();
*/
C2 = plotMarginalImprovement(controlGroup, "green","blue", "Online crowdworkers",50,"2019-10-9", "2019-11-20", 1, 0.05)
vl.layer(...C2, ...F2)
  .width(500)
  .title("Amplification Experiment: Marginal Improvement of going from n => n+1 predictions ÷ Previous Score")
  .render();
// Marginal improvement for both groups, with the Prediction # starting at 1

plotMarginalImprovementForetoldAndControl(foretoldPredictors, controlGroup, "Amplification Experiment: Marginal improvement of going from n-1 => n predictions", 50, "2019-10-9", "2019-11-20", 1)
// plotMarginalImprovementForetoldAndControl(foretoldPredictors, controlGroup, "Amplification Experiment: Marginal improvement of going from n-1 => n predictions", 50, "2019-10-9", "2019-11-20", 4)

// The difference is that now the Prediction # starts at 4
import {vl} from '@vega/vega-lite-api'