Published
Edited
Nov 25, 2019
md`# Shapley Value Experiments. Part I`
md `# Results`
table(SortedBySV(SV), {
sortable: true,
columns: {
name: {
title: "Name",
},
sv : {
title: "~ helpfulness",
},
numberofquestions: {
title: "Number of questions / 7"
}
}
})
md `## Commentary: What is happening here?

### Setup
Background: We are using the data from [an amplification experiment](https://observablehq.com/@jjj/untitled/2) and cutting the results in a different way.

We are taking a prediction (P_initial), and for every prediction (Pn) that follows, we are asking:
- Is (Pn) closer to the last aggregate before it (AGn), or to (P_initial)?
- How much better, or worse, does (Pn) do in comparison to (AGn)? Call this quantity X.
- If (Pn) is closer to (P_initial) than to (AGn), change the helpfulness of (P_initial)'s user by X: an increase if (Pn) does better than (AGn), a decrease if it does worse
- If there are m (P_initial)s such that (Pn) is closer to (P_initial) than to the (AGn), then instead of X, use X/m

With this, what we want to answer is: Do predictions influenced by someone do better? Is this person a positive influence?

### Comments
- Predicting earlier and wrong is penalized much more strongly than predicting late and wrong
- This set-up penalizes users who make a lot of predictions because of the same issues that the normal scoring system has: people don't outperform the market. But this isn't the only factor; compare Misha Yagudin and JK, and geesh and holomanga
- The following users are particularly interesting:
  - Reprisal
    - Individually, he did pretty badly in:
      - [Unbound Prometheus: Europe had a lower birthrate than China [1]](https://www.foretold.io/c/f19015f5-55d8-4fd6-8621-df79ac072e15?state=closed)
      - [Unbound Prometheus: The Han Dynasty in China established a state monopoly on salt and metals [1]](https://www.foretold.io/c/f19015f5-55d8-4fd6-8621-df79ac072e15/m/050b48c9-022e-4b53-95ec-a8f371292b93)
      - [Unbound Prometheus: pre-Industrial Revolution, average French wage was what percent of the British wage? [2]](https://www.foretold.io/c/f19015f5-55d8-4fd6-8621-df79ac072e15/m/7a2774d8-6b6e-468c-84cd-d91104ebefbb)
      - etc.
    - But he did well in:
      - [Unbound Prometheus: Pre-Industrial Britain had a legal climate more favorable to industrialization than continental Europe [5]](https://www.foretold.io/c/f19015f5-55d8-4fd6-8621-df79ac072e15/m/bbe62da4-e100-4935-a419-30477a167540), where people moved in his direction.
      - [https://www.foretold.io/c/f19015f5-55d8-4fd6-8621-df79ac072e15/m/e827ff9f-7327-4243-9e72-cbf9e6da9376](https://www.foretold.io/c/f19015f5-55d8-4fd6-8621-df79ac072e15/m/e827ff9f-7327-4243-9e72-cbf9e6da9376)
      - etc.
    - Because in the first case he predicted relatively late, and because his errors were not repeated (i.e., he didn't have much influence), he gets a small positive score. Personally, I would be very curious to see how well he does if he participates in the next rounds, and in particular if he learns to use wider intervals.
  - holomanga & geesh. They're the users who earnt the most money, but both had an (unadjusted) negative score. They also tend to predict earlier, and so get hit hard. But holomanga gets hit (much) harder.
  - NunoSempere. This is me. My position depends on the specific distance used (see below), which in turn depends on a number of judgement calls; with other distances I am slightly negative.
  - Elizabeth. Her answer to [Unbound Prometheus: Just before the Industrial Rev…ere less friendly towards science than Europe [4]](https://www.foretold.io/c/f19015f5-55d8-4fd6-8621-df79ac072e15?state=closed) (question id 1f56fcf1-21c8-4149-8abf-4feed882d31c) is substantially different from her prior, and this is interesting because her prior wasn't neutral. I think that Elizabeth's Bayes factor, i.e., the direction of the update, is more interesting in this case than the resolution. In general, I would have wanted there to be more cases in which Elizabeth, as opposed to Priors Bot, had made an initial prediction (or are both the same?).
- Note on degrees of freedom & judgement calls: geesh is in general "more helpful" than holomanga, but the degree to which this is so depends on the distance used. Elizabeth is generally negative. I am sometimes slightly negative. Reprisal is usually near the top, but can go down.
- To compute the distance between cdfs, I am using: Integral(|cdf1 - cdf2|). A next step, perhaps with a better distance (KL divergence?), would be to make the influence of one prediction on another depend on the distance. But if you multiply by a factor of 1/(1 + constant1 × distance), or of (constant2 - constant3 × distance), you already have an infinite number of degrees of freedom. The principled way to do this, Shapley values, involves information which we do not have.
- Relationship to Shapley value: Tenuous. This is Shapley value with a lot of terms missing, or counterfactual value dividing by the number of participants.
- The first order effect is missing.
- I get the impression that getting something similar to Shapley values from here wouldn't be that difficult?
- A very convenient assumption to make would be that later users whose predictions are very similar to someone else's previous prediction would have predicted the aggregate or something closer to the aggregate instead.`
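The rule described under Setup can be sketched on toy data. This is only an illustration under simplifying assumptions: predictions are single numbers rather than cdfs, the distance is an absolute difference, and the score is the negative absolute error. The names `score` and `creditForFollower` are hypothetical and do not appear in the notebook, which uses the MarketScore from the prediction-analysis package instead.

```javascript
// Toy sketch of the attribution rule, with point predictions instead of cdfs.
// score and creditForFollower are hypothetical names, not part of the notebook.

// Assumed score: higher is better; here, the negative absolute error.
function score(prediction, resolution) {
  return -Math.abs(prediction - resolution);
}

// Change in helpfulness credited to the author of pInitial on account of one
// later prediction pn, made when the latest aggregate was agn.
function creditForFollower(pInitial, agn, pn, resolution) {
  const distanceToInitial = Math.abs(pn - pInitial);
  const distanceToAggregate = Math.abs(pn - agn);
  if (distanceToInitial >= distanceToAggregate) {
    return 0; // pn tracks the aggregate, so pInitial gets neither credit nor blame
  }
  // X: how much better (positive) or worse (negative) pn did than the aggregate.
  return score(pn, resolution) - score(agn, resolution);
}

// A follower near a good initial prediction earns its author credit...
console.log(creditForFollower(70, 50, 68, 75)); // 18
// ...and a follower near a bad initial prediction costs its author.
console.log(creditForFollower(20, 50, 25, 75)); // -25
```

When several earlier predictions are each closer to (Pn) than the aggregate is, the notebook divides X among them instead of crediting each in full.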
md `# Helper functions`
md `### Distance function `
function getFunctionFromCdf(cdf){
  let min = cdf.xs[0]
  let max = cdf.xs[cdf.xs.length - 1]
  // Returns a step function p -> cdf(p), defined for every real p.
  function result(p){
    if(p < min){
      return 0;
    }else if(p > max){
      return 1;
    }else{
      // Binary search for the largest index whose x-value is <= p.
      // About 10 iterations cover ~1000 points, because 2^10 = 1024.
      let s1 = 0
      let s2 = cdf.xs.length - 1
      while(s1 < s2){
        let compare = Math.ceil((s1 + s2) / 2)
        if(cdf.xs[compare] <= p){
          s1 = compare
        }else{
          s2 = compare - 1
        }
      }
      return cdf.ys[s1]
    }
  }
  return result
}
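As a possible refinement, a cdf given as `{xs, ys}` can be evaluated with linear interpolation between grid points rather than as a step function. The sketch below is an assumption about what a smoother lookup could look like, not the notebook's method; `interpolateCdf` is a hypothetical name, and it assumes `xs` is sorted in increasing order.

```javascript
// Hypothetical helper: evaluate a cdf {xs, ys} at p with linear interpolation.
// Assumes xs is sorted in increasing order and ys has the same length.
function interpolateCdf(cdf, p) {
  const { xs, ys } = cdf;
  if (p < xs[0]) return 0;
  if (p > xs[xs.length - 1]) return 1;
  // Binary search: keep xs[lo] <= p <= xs[hi] until the two indices are adjacent.
  let lo = 0;
  let hi = xs.length - 1;
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    if (xs[mid] <= p) lo = mid;
    else hi = mid;
  }
  // Single-point cdf or repeated x-values: nothing to interpolate.
  if (hi === lo || xs[hi] === xs[lo]) return ys[lo];
  const t = (p - xs[lo]) / (xs[hi] - xs[lo]);
  return ys[lo] + t * (ys[hi] - ys[lo]);
}

const toyCdf = { xs: [0, 10, 20], ys: [0, 0.5, 1] };
console.log(interpolateCdf(toyCdf, 5));  // 0.25
console.log(interpolateCdf(toyCdf, 15)); // 0.75
```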
function distance(cdf1, cdf2){ // this distance function can be directly replaced by the KL function, as long as it's expressed in terms of cdfs.
  let f1 = getFunctionFromCdf(cdf1)
  let f2 = getFunctionFromCdf(cdf2)
  // Sampling starts at the larger of the two lower bounds...
  //let min = 0
  let min = Math.max(cdf1.xs[0], cdf2.xs[0])
  //let max = 100
  let l1 = cdf1.xs.length
  let l2 = cdf2.xs.length
  // ...and ends at the larger of the two upper bounds.
  let max = Math.max(cdf1.xs[l1-1], cdf2.xs[l2-1])
  let h = (max-min) / 2000
  // Sum |cdf1 - cdf2| at 2000 evenly spaced points. The sum is not multiplied
  // by h, so d ranges from 0 to 2000 rather than approximating the integral directly.
  let d = 0;
  let p = min
  for(let i = 0; i<2000; i++){
    d += Math.abs(f1(p) - f2(p));
    p += h
  }
  return d
}
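For comparison, here is a minimal sketch of the same distance scaled by the step width, so that the result approximates Integral(|cdf1 - cdf2|) in the units of the x-axis. The name `l1Distance` and the use of a midpoint rule are assumptions made for illustration; the notebook's `distance` returns the unscaled sum.

```javascript
// Hypothetical variant: midpoint Riemann sum of |f1 - f2| over [min, max],
// multiplied by the step width h so that it approximates the integral.
function l1Distance(f1, f2, min, max, steps = 2000) {
  const h = (max - min) / steps;
  let sum = 0;
  for (let i = 0; i < steps; i++) {
    const p = min + (i + 0.5) * h; // midpoint of the i-th subinterval
    sum += Math.abs(f1(p) - f2(p));
  }
  return sum * h;
}

// Two step cdfs that jump at 3 and at 5: they differ by 1 on [3, 5),
// so the exact integral of the absolute difference is 2.
const jumpAt3 = p => (p >= 3 ? 1 : 0);
const jumpAt5 = p => (p >= 5 ? 1 : 0);
console.log(l1Distance(jumpAt3, jumpAt5, 0, 10)); // approximately 2
```

Under this scaling, an unscaled value d from the notebook corresponds to roughly d × (max - min) / 2000, so a radius of 100 would amount to about 5% of the width of the range being compared.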
md `### Get the data`
async function getQuestionData(questionID){
let query = `
{
measurements(measurableId:"${questionID}" last:50){
total
edges{
node{
id
description
createdAt
updatedAt
value{
floatCdf{
xs
ys
}
}

valueText
competitorType
agent{
id
name
}
}
}
}
}
`
let req = await request(query)
let total = req.data.measurements.total
let edges = req.data.measurements.edges
function getCreationTimeInitial(){
var result = []
for(let i =0; i<total; i++){
result.push(edges[i].node.createdAt)
}
return result
}
let creationTimeInitial = getCreationTimeInitial()
function getArrayIndicesSortedByCreationTime(){
var len = creationTimeInitial.length;
var indices = new Array(len);
for (var i = 0; i < len; ++i) indices[i] = i;
indices.sort(function (a, b) { return creationTimeInitial[a] < creationTimeInitial[b] ? -1 : creationTimeInitial[a] > creationTimeInitial[b] ? 1 : 0; });
return indices
}
let arrayIndicesSortedByCreationTime = getArrayIndicesSortedByCreationTime()

function sortBy(arrayActedUpon, arrayWithIndices){
if(arrayActedUpon.length != arrayWithIndices.length){
throw new Error("the arrays should be of the same length")
}else{
var result = []
for(let i=0; i<arrayActedUpon.length; i++){
result.push(arrayActedUpon[arrayWithIndices[i]])
}

return result

}
}

let creationTimes = sortBy(creationTimeInitial, arrayIndicesSortedByCreationTime)

function getAgentIds(){
let result = []
for(let i =0; i<total; i++){
result.push(edges[i].node.agent.id)
}
result = sortBy(result, arrayIndicesSortedByCreationTime)
return result
}
let agentIds =getAgentIds()

function getValues() {
let result = []
for(let i =0; i<total; i++){
result.push(edges[i].node.value)
}
result = sortBy(result, arrayIndicesSortedByCreationTime)
return result
}
let values = getValues()

function getCompetitorTypes(){
let result = []
for(let i =0; i<total; i++){
result.push(edges[i].node.competitorType)
}
result = sortBy(result, arrayIndicesSortedByCreationTime)
return result
}
let competitorTypes = getCompetitorTypes()
function getAgentNames(){
let result = []
for(let i =0; i<total; i++){
result.push(edges[i].node.agent.name)
}
result = sortBy(result, arrayIndicesSortedByCreationTime)
return result
}
let agentNames = getAgentNames()

return({
total: total,
creationTimes: creationTimes,
agentIds: agentIds,
agentNames: agentNames,
values: values,
competitorTypes: competitorTypes
})
}
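The sorting done above with an index permutation and one helper per field can also be written by sorting the nodes once and then reading each field off the sorted list. This is a sketch under assumptions: `toSortedColumns` is a hypothetical name, `createdAt` is assumed to be a string that `Date` can parse, and `total` here is the number of edges actually returned, which may differ from the `total` reported by the API when it exceeds the `last:50` limit.

```javascript
// Hypothetical helper: turn GraphQL edges into parallel arrays ordered by creation time.
function toSortedColumns(edges) {
  const nodes = edges.map(edge => edge.node);
  // Copy before sorting so that the input array is left untouched.
  const sorted = [...nodes].sort(
    (a, b) => new Date(a.createdAt) - new Date(b.createdAt)
  );
  return {
    total: sorted.length, // number of edges received, not the API's reported total
    creationTimes: sorted.map(node => node.createdAt),
    agentIds: sorted.map(node => node.agent.id),
    agentNames: sorted.map(node => node.agent.name),
    values: sorted.map(node => node.value),
    competitorTypes: sorted.map(node => node.competitorType)
  };
}

// Made-up edges, listed out of order.
const toyEdges = [
  { node: { createdAt: "2019-11-02T10:00:00.000Z", agent: { id: "b", name: "Bea" }, value: 2, competitorType: "COMPETITIVE" } },
  { node: { createdAt: "2019-11-01T10:00:00.000Z", agent: { id: "a", name: "Ann" }, value: 1, competitorType: "COMPETITIVE" } }
];
console.log(toSortedColumns(toyEdges).agentNames); // [ 'Ann', 'Bea' ]
```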
getQuestionData("1f56fcf1-21c8-4149-8abf-4feed882d31c")
md `### Get the locations where a person predicts`
function findLocations(name, data){
let locations = []
for(let i = 0; i<data.values.length; i++){
if(data.agentNames[i] == name){
locations.push(i)
}
}
return locations
}
md `### Get the counterfactual value of a person in a question`
function getCVofPersonInQuestion(locations, totalpredictions, data, resolution, radius){
let cv = []
let distances = []
let agents = []

for(let i=0; i<locations.length; i++){
// console.log("In getCVofPersonInQuestion, originalData == undefined is", data.values[locations[i]] == undefined)
let originalData =data.values[locations[i]].floatCdf
let loc = locations[i]
let nextLocation = Infinity
if(i <locations.length-1){
nextLocation = locations[i+1]
}
let d0 = Infinity
if(locations[i]-1 >=0){
d0 = distance(originalData, data.values[locations[i]-1].floatCdf)

}

while(loc+2 < totalpredictions-1 && loc+2 < nextLocation){
let previousAggregation = data.values[loc+1].floatCdf
//console.log(data.agentNames[loc+1], "should always be Aggregation Bot")
let nextPrediction = data.values[loc+2].floatCdf
let d1 = distance(originalData, nextPrediction)
let d2 = distance(previousAggregation, nextPrediction)
//console.log("Distance to agent", d1)
//console.log("Distance agent-aggregate;", d2)
// d2-d1 > 0 )
// && (d1<300)
// d2 - d1 > radius
// d2-d1 > 0
if((d2 - d1 > radius) && (d1<d0)){
//if(d2 - d1 > radius){
//console.log(data.agentNames[loc+2],data.creationTimes[loc+2])
//console.log(data.agentNames[loc+1], data.creationTimes[loc+2])
//data.creationTimes[loc+1]
let contextLocation = ({
agentPrediction: {
data: { xs: data.values[loc+2].floatCdf.xs, ys: data.values[loc+2].floatCdf.ys },
dataType: "floatCdf"
},
marketPrediction: {
data: { xs: data.values[loc+1].floatCdf.xs, ys: data.values[loc+1].floatCdf.ys },
dataType: "floatCdf"
},
resolution: resolution.measurement

})
//console.log(contextLocation)
let PA = new predictionAnalysis.PredictionResolutionGroup(contextLocation)
if(PA != undefined && PA.pointScore("MarketScore") != undefined){
//cv.push({ cv: PA.pointScore("MarketScore").data * (1/(1+(d1/Math.min(d0,100)))), location: loc+2})
cv.push({ cv: PA.pointScore("MarketScore").data, location: loc+2})

//console.log(PA.pointScore("MarketScore").data)
}

}else{
cv.push({cv: 0, location: loc+2})
}
loc +=2
}
}
return(cv)

}
{
async function test(){
let data = await getQuestionData("1f56fcf1-21c8-4149-8abf-4feed882d31c")
let locations = findLocations("geesh", data)
let resolution= findResolution(data)
return(getCVofPersonInQuestion(locations, data.total, data, resolution, 100))
}
return test()
}
md `### Do this for all users in a question`
function findResolution(data){
let result = []
let InitialTime = new Date(data.creationTimes[0])
for(let i = 0; i<data.values.length; i++){
if(data.competitorTypes[i] == "OBJECTIVE"){
return({
time: (new Date(data.creationTimes[i]) - InitialTime)/(1000*3600*24), // number of days
measurement: {
data: {
xs: data.values[i].floatCdf.xs,
ys: data.values[i].floatCdf.ys
},
dataType: "floatCdf"
}
})
}
}

}
async function getCVofPeopleInQuestion(questionID, radius){

let data = await getQuestionData(questionID)
// return data;
let result = []

let resolution = findResolution(data);
var uniqueNames = data.agentNames.filter((v, i, a) => a.indexOf(v) === i);
for(let uniqueName of uniqueNames){
if(uniqueName != "Priors Bot" && uniqueName != "Aggregation Bot"){
result.push({
name: uniqueName,
cvarray: (getCVofPersonInQuestion(findLocations(uniqueName, data), data.total, data, resolution, radius)) //.reduce((a, b) => a + b, 0)
// Remove the reduce to find out where they come from.
})
}
}
return {result: result, total: data.total}
}
md `### Get the SVs from the CVs`
async function getSVofPeopleInQuestion(questionID, radius){
let CVs = await getCVofPeopleInQuestion(questionID, radius)
let total = CVs.total
CVs = CVs.result
let arrayLocation = Array(total)
arrayLocation.fill(0,0,total)
for(let i =0; i<CVs.length; i++){
let cvarray = CVs[i].cvarray
for(let j=0; j<cvarray.length; j++){
if(cvarray[j].cv != 0){
arrayLocation[cvarray[j].location] += 1//cvarray[j].cv
}
}
}
for(let i =0; i<CVs.length; i++){
let cvarray = CVs[i].cvarray
for(let j=0; j<cvarray.length; j++){
if(cvarray[j].cv != 0){
cvarray[j] = cvarray[j].cv / (arrayLocation[cvarray[j].location] +1) // * (cvarray[j].cv/ (arrayLocation[cvarray[j].location] +1))
}else{
cvarray[j] = 0
}
}
CVs[i] = {name: CVs[i].name, sv: cvarray.reduce((a, b) => a + b, 0)}
}
return CVs
}
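The sharing step above can be restated compactly on toy data. The sketch below follows the code as written, which divides each nonzero contribution by the number of claimants at that location plus one, whereas the Setup text describes dividing by m. One possible reading is that the extra share is reserved for someone other than the claimants, but that is an interpretation the source does not state. `shareCredit` is a hypothetical name.

```javascript
// Hypothetical restatement of the sharing step: each nonzero contribution at a
// location is divided by (number of people with a nonzero contribution there + 1).
function shareCredit(cvByPerson) {
  const claimants = new Map(); // location -> number of nonzero contributions
  for (const { cvarray } of cvByPerson) {
    for (const { cv, location } of cvarray) {
      if (cv !== 0) claimants.set(location, (claimants.get(location) || 0) + 1);
    }
  }
  return cvByPerson.map(({ name, cvarray }) => ({
    name,
    sv: cvarray.reduce(
      (sum, { cv, location }) =>
        sum + (cv === 0 ? 0 : cv / (claimants.get(location) + 1)),
      0
    )
  }));
}

// Made-up contributions: both users claim location 4, only B claims location 8.
const toyCVs = [
  { name: "A", cvarray: [{ cv: 6, location: 4 }, { cv: 0, location: 6 }] },
  { name: "B", cvarray: [{ cv: 6, location: 4 }, { cv: -3, location: 8 }] }
];
console.log(shareCredit(toyCVs));
// [ { name: 'A', sv: 2 }, { name: 'B', sv: 0.5 } ]
```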
getSVofPeopleInQuestion("5be0fba1-5701-4a83-816b-0ee4ab71073a", 100)
md `### Do this for every question in a channel`
async function getSVofQuestionsInChannel(radius){
let queryChannel = `
{
measurables(channelId: "f19015f5-55d8-4fd6-8621-df79ac072e15" last:100){
total
edges{
node{
id
name
state
min
max
}
}
}
}`
let r2= await request(queryChannel)
function getQuestions(){
let edges2 = r2.data.measurables.edges
let result = []
for(let edge of edges2){
result.push({id: edge.node.id,
name: edge.node.name,
state: edge.node.state,
min: edge.node.min,
max:edge.node.max})
}
return result
}
let questions = getQuestions()

let result = []
for(let question of questions){
if(question.state =="JUDGED"){ // && question.min == 0 && question.max ==100){
var sv = undefined
try {
sv = await getSVofPeopleInQuestion(question.id, radius)
}catch(error){
console.log(error)
console.log(`Error in question ${question.name}`)
sv = undefined
}
if(sv !=undefined){
result.push({
name: question.name,
id: question.id,
sv: sv
})
}
}

}
return result
}
SV = getSVofQuestionsInChannel(100)
md `### Order the result`
function SortedBySV(SV){
function getUniqueNames(SV){
let names= []
for(let a of SV){
let b = a.sv
for(let i=0; i<b.length; i++){
names.push(b[i].name)
}
}
var uniqueNames = names.filter((v, i, a) => a.indexOf(v) === i);
return uniqueNames
}
let result = []
let uniqueNames = getUniqueNames(SV)
for(let uniqueName of uniqueNames){
var sv= 0;
var numberofquestions = 0;
for(let a of SV){
let b = a.sv
for(let i=0; i<b.length; i++){
if(b[i].name == uniqueName){
sv+=b[i].sv
numberofquestions+=1;
}
}
}
result.push({name: uniqueName, sv: sv, numberofquestions: numberofquestions})
}
result.sort((a,b)=>{return b.sv-a.sv})
return(result)

}
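The same aggregation can be done in a single pass with a `Map` keyed by user name, instead of collecting unique names first and then rescanning every question for each name. This is a sketch with a hypothetical name, `totalsByName`; it assumes the input has the shape returned by `getSVofQuestionsInChannel`, i.e. a list of questions that each carry an `sv` array of `{name, sv}` entries.

```javascript
// Hypothetical single-pass version of the aggregation: sum sv per user across
// questions, count the questions each user appears in, and sort by sv descending.
function totalsByName(perQuestion) {
  const totals = new Map();
  for (const question of perQuestion) {
    for (const { name, sv } of question.sv) {
      const entry = totals.get(name) || { name, sv: 0, numberofquestions: 0 };
      entry.sv += sv;
      entry.numberofquestions += 1;
      totals.set(name, entry);
    }
  }
  return [...totals.values()].sort((a, b) => b.sv - a.sv);
}

// Made-up per-question results.
const toyResults = [
  { name: "Q1", sv: [{ name: "a", sv: 1 }, { name: "b", sv: -2 }] },
  { name: "Q2", sv: [{ name: "a", sv: 0.5 }] }
];
console.log(totalsByName(toyResults));
// [ { name: 'a', sv: 1.5, numberofquestions: 2 },
//   { name: 'b', sv: -2, numberofquestions: 1 } ]
```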

md `## Present the result in a table`
import {table} from "@tmcw/tables@513"
table(SortedBySV(SV), {
sortable: true,
columns: {
name: {
title: "Name",
},
sv : {
title: "~ Shapley value ~ helpfulness",
},
numberofquestions: {
title: "Number of questions / 7"
}
}
})
md `## Foretold functions`
questionID = "bbe62da4-e100-4935-a419-30477a167540"
query = `
{
measurements(measurableId:"bbe62da4-e100-4935-a419-30477a167540" last:500){
total
edges{
node{
id
description
createdAt
updatedAt
value{
floatCdf{
xs
ys
}
}

valueText
competitorType
agent{
id
name
}
}
}
}
}
`
main_url = "https://api.foretold.io/graphql?token=";
HEADERS = ({
"Accept-Encoding": "gzip, deflate, br",
"Content-Type": "application/json",
Accept: "application/json",
Connection: "keep-alive",
DNT: "1",
Origin: "https://api.foretold.io"
});


options = query => ({
method: "POST",
headers: HEADERS,
body: JSON.stringify({ query })
});


async function request(query) {
let foretold_token = ""
return await fetch(main_url + foretold_token, options(query)).then(response =>
response.json()
);
}
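A more defensive variant of the request helper might check the HTTP status and the GraphQL `errors` field before handing back the data. The sketch below is an assumption about how that could look, not the notebook's code: `requestChecked` and the `fetchImpl` parameter are hypothetical, the URL must be supplied by the caller, and, unlike `request` above, it returns only the `data` part of the response.

```javascript
// Hypothetical defensive variant of request(): fails loudly on HTTP errors and
// on GraphQL-level errors. fetchImpl can be swapped out for a stub when testing.
async function requestChecked(query, { url, fetchImpl = fetch } = {}) {
  const response = await fetchImpl(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: "application/json" },
    body: JSON.stringify({ query })
  });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`);
  }
  const payload = await response.json();
  // GraphQL servers conventionally report query problems in an "errors" array.
  if (payload.errors && payload.errors.length > 0) {
    throw new Error(payload.errors.map(e => e.message).join("; "));
  }
  return payload.data; // note: only the data field, not the whole response body
}
```

With this variant, callers would read `result.measurements` where the notebook currently reads `req.data.measurements`.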
predictionAnalysis = require('@foretold/prediction-analysis@0.0.7/dist/index.js')
