Edited
Mar 28, 2020
displacement = samplecars.map(obj => obj["Displacement"]) // create an array of just displacement values
horsepower = samplecars.map(obj => obj["Horsepower"]) // create an array of just horsepower values
mean = d => d.reduce((accumulator, currentValue) => accumulator + currentValue,0)/d.length
mean(horsepower) //test it out
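variance = d => { const m = mean(d); return d.reduce((a,c) => a + (c-m)**2,0)/d.length } // population variance (divides by n); the mean is computed once instead of once per element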
variance(horsepower) // test it out
standardDeviation = function(d) {return Math.sqrt(variance(d))}
standardDeviation(horsepower)
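As a cross-check on the cells above, here is a minimal sketch in plain JavaScript (using `const` declarations rather than Observable cell syntax). The names `meanOf`, `varianceOf`, `stdevOf`, the `sample` flag, and the toy array are assumptions added for illustration. The notebook's own `variance` is the population form, which divides by n.

```javascript
// Mean of a numeric array.
const meanOf = d => d.reduce((sum, v) => sum + v, 0) / d.length;

// Variance with the mean computed once. `sample` is a hypothetical flag:
// false divides by n (population), true divides by n - 1 (sample).
const varianceOf = (d, sample = false) => {
  const m = meanOf(d);
  const sumOfSquares = d.reduce((sum, v) => sum + (v - m) ** 2, 0);
  return sumOfSquares / (sample ? d.length - 1 : d.length);
};

const stdevOf = (d, sample = false) => Math.sqrt(varianceOf(d, sample));

// Toy data, not taken from the cars dataset.
const toy = [2, 4, 4, 4, 5, 5, 7, 9];
console.log(meanOf(toy), varianceOf(toy), stdevOf(toy)); // 5 4 2
```

Passing `true` as the second argument switches to the n - 1 divisor, which is worth checking whenever a hand-rolled value is compared against a library.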
md`## Test Against the jStat Library

Let's see if I got all of this right by testing against jStat.`
jstat.variance(horsepower);
jstat.stdev(horsepower) // Yep, get the same value I calculated
regressionOutput = (x,slope,intercept) => slope * x + intercept // gives us a y value for regression lines
regressionLine = (xMin,xMax,slope,intercept) => [
{"x": xMin, "y": regressionOutput(xMin,slope,intercept)},
{"x": xMax, "y": regressionOutput(xMax,slope,intercept)}
]

// gives us the start and end points of a regression line, from the minimum x value of the data to the maximum x value of the data.
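A short usage sketch for the two helpers above, written as plain JavaScript with `const` declarations so it runs outside Observable. The `xs` array, the slope of 2, and the intercept of 1 are made-up values for illustration, not results from the cars data.

```javascript
// y value on the regression line for a given x.
const regressionOutput = (x, slope, intercept) => slope * x + intercept;

// Two endpoints of the regression line, spanning xMin to xMax.
const regressionLine = (xMin, xMax, slope, intercept) => [
  { x: xMin, y: regressionOutput(xMin, slope, intercept) },
  { x: xMax, y: regressionOutput(xMax, slope, intercept) }
];

// Hypothetical x values; the line only needs their minimum and maximum.
const xs = [3, 0, 10, 7];
const endpoints = regressionLine(Math.min(...xs), Math.max(...xs), 2, 1);
console.log(endpoints); // endpoints at (0, 1) and (10, 21)
```

Two points are enough because a chart's line mark only needs the start and end of a straight segment.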
md`## Calculate Regression Line

Let's create the functions that calculate the regression line using the least squares formulas below.

By convention, *b* is the slope and *a* is the intercept.

`
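For reference, the mean-based least-squares formulas that the `regressionSlope` and `regressionIntercept` cells in this section implement can be written as follows, where a bar denotes the mean over the data. This form is reconstructed from the code and may be laid out differently from the notebook's original formula cells.

```latex
b = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^{2}},
\qquad
a = \bar{y} - b\,\bar{x}
```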
xyMean = (xData, yData) => xData.reduce((agg,cx,i) => agg + cx*yData[i], 0)/xData.length // mean of x*y; the initial value 0 makes sure the first pair is multiplied too
x2Mean = xData => xData.reduce((agg,cx) => agg + cx**2, 0)/xData.length // mean of x squared; the initial value 0 makes sure the first value is squared too
regressionSlope = (xData, yData) => ((xyMean(xData,yData) - mean(xData)*mean(yData)) / (x2Mean(xData) - mean(xData)**2))
calculatedSlope = regressionSlope(displacement,horsepower)
regressionIntercept = (xData, yData) => mean(yData) - regressionSlope(xData,yData)*mean(xData)
calculatedIntercept = regressionIntercept(displacement,horsepower)
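To sanity-check the slope and intercept formulas, here is a self-contained sketch in plain JavaScript (with `const` declarations instead of Observable cells). The helper names `avg` and `fitLine` and the toy points are assumptions for illustration. The points lie exactly on y = 2x + 1, so the fit should recover those coefficients. Each `reduce` is given an explicit initial value of 0; without one, `reduce` seeds the accumulator with the raw first element.

```javascript
const avg = d => d.reduce((sum, v) => sum + v, 0) / d.length;

// Least-squares fit via the mean-based formulas:
//   b = (mean(xy) - mean(x) * mean(y)) / (mean(x^2) - mean(x)^2)
//   a = mean(y) - b * mean(x)
const fitLine = (xs, ys) => {
  const mx = avg(xs);
  const my = avg(ys);
  const mxy = avg(xs.map((x, i) => x * ys[i]));
  const mx2 = avg(xs.map(x => x * x));
  const slope = (mxy - mx * my) / (mx2 - mx ** 2);
  return { slope, intercept: my - slope * mx };
};

// Toy points that lie exactly on y = 2x + 1.
const xs = [1, 2, 3, 4];
const ys = [3, 5, 7, 9];
const fit = fitLine(xs, ys);
console.log(fit.slope, fit.intercept); // 2 1

// Why the initial value matters: without it the first element is never squared.
console.log([3, 4].reduce((a, c) => a + c ** 2));    // 19 (3 + 16)
console.log([3, 4].reduce((a, c) => a + c ** 2, 0)); // 25 (9 + 16)
```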
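squaredError = (xData,yData) => {
  const b = regressionSlope(xData,yData), a = regressionIntercept(xData,yData) // fit once, not once per point
  return yData.reduce((agg,y,i) => agg + (y - (b*xData[i] + a))**2, 0) // initial value 0 so the first residual is squared too
}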
testSquaredError = squaredError(allDisplacement,allHorsepower)
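totalError = yData => { const m = mean(yData); return yData.reduce((agg,y) => agg + (y - m)**2, 0) } // total sum of squares; initial value 0 so the first deviation is squared too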
testTotalError = totalError(allHorsepower)
rsquared = (xData, yData) => 1 - squaredError(xData,yData)/totalError(yData)
testrsquared = rsquared(allDisplacement,allHorsepower)
jstatTest = jstat.corrcoeff(allDisplacement,allHorsepower)**2 // cross-check: for simple linear regression, R² should match the squared correlation coefficient, up to rounding
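The check above relies on the fact that, for a least-squares line with one predictor, R² equals the square of the Pearson correlation. A minimal sketch in plain JavaScript, computing both by hand on toy data, illustrates this. The names `avg`, `sumDev`, `rSquared`, and `pearson`, and the data points, are assumptions for illustration and do not come from the notebook or from jStat.

```javascript
const avg = d => d.reduce((sum, v) => sum + v, 0) / d.length;

// Sum of products of deviations from the means.
// sumDev(xs, xs) is the sum of squared deviations of xs.
const sumDev = (us, vs) => {
  const mu = avg(us);
  const mv = avg(vs);
  return us.reduce((sum, u, i) => sum + (u - mu) * (vs[i] - mv), 0);
};

// R² = 1 - (sum of squared residuals) / (total sum of squares).
const rSquared = (xs, ys) => {
  const slope = sumDev(xs, ys) / sumDev(xs, xs);
  const intercept = avg(ys) - slope * avg(xs);
  const sse = ys.reduce((sum, y, i) => sum + (y - (slope * xs[i] + intercept)) ** 2, 0);
  return 1 - sse / sumDev(ys, ys);
};

// Pearson correlation coefficient.
const pearson = (xs, ys) =>
  sumDev(xs, ys) / Math.sqrt(sumDev(xs, xs) * sumDev(ys, ys));

// Toy data, not taken from the cars dataset.
const xs = [1, 2, 3, 4, 5];
const ys = [2, 4, 5, 4, 5];
console.log(rSquared(xs, ys), pearson(xs, ys) ** 2); // both approximately 0.6
```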
covariance = (xData,yData) => xyMean(xData,yData) - mean(yData) * mean(xData)
testCovar = covariance(allDisplacement,allHorsepower)
jstat.covariance(allDisplacement,allHorsepower) // cross-check; jStat's covariance divides by n - 1 (sample covariance), so it will not match the population value above exactly
alternativeSlopeCalc = covariance(allDisplacement,allHorsepower) / (variance(allDisplacement)) // cov(x, y) / var(x) should equal the least-squares slope when both use the same data and the same divisor (n here). The value I got is close but slightly smaller.
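ssxy = (xData,yData) => { const mx = mean(xData), my = mean(yData); return yData.reduce((agg,y,i) => agg + (y - my)*(xData[i] - mx), 0) } // sum of cross-deviations; initial value 0 so the first pair is included correctly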
ssxy(allDisplacement,allHorsepower)
alternativessxy = covariance(allDisplacement,allHorsepower) * allDisplacement.length
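The relationships between the slope, covariance, variance, and the sum of cross-deviations only line up when the divisors match. The sketch below, in plain JavaScript on toy data, shows this. The names `avg`, `sumDev`, and the variables are assumptions for illustration. It is offered as one possible source of small discrepancies in checks like these, not as a diagnosis of any particular cell above.

```javascript
const avg = d => d.reduce((sum, v) => sum + v, 0) / d.length;

// Sum of products of deviations from the means.
const sumDev = (us, vs) => {
  const mu = avg(us);
  const mv = avg(vs);
  return us.reduce((sum, u, i) => sum + (u - mu) * (vs[i] - mv), 0);
};

// Toy data, not taken from the cars dataset.
const xs = [1, 2, 3, 4, 5];
const ys = [2, 4, 5, 4, 5];
const n = xs.length;

const sxy = sumDev(xs, ys); // sum of cross-deviations
const sxx = sumDev(xs, xs); // sum of squared deviations of x

const covPop = sxy / n;          // population covariance
const covSample = sxy / (n - 1); // sample covariance
const varPop = sxx / n;          // population variance
const varSample = sxx / (n - 1); // sample variance

// Matching divisors cancel, so both ratios give the same slope (about 0.6).
console.log(covPop / varPop, covSample / varSample);

// Mixed divisors shrink the result by a factor of (n - 1) / n.
console.log(covPop / varSample);

// Either covariance recovers sxy when multiplied by its own divisor.
console.log(covPop * n, covSample * (n - 1));
```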
d3 = require("d3-format")