Published
Edited
Dec 31, 2021
Dummy data for visualization projects
# Dummy data for visualization projects

_Using dummy (or mock, or fake) data helps bootstrap a visualization project and keeps the focus on design and visual work rather than on data loading and parsing. It also helps clarify the shape of the expected data when that part is not settled yet. Here I report on some ideas I have used to generate such dummy datasets. Please keep in mind that there is no intention to harm or counterfeit anything, only to provide a technical heuristic._

Working on a data(-driven) visualization project requires a plethora of skills: user studies, design, JavaScript, D3, deployment, etc., and most importantly data!

But data storage, collection, preparation, and accessibility (API, remote server) are all distractions that keep you from focusing on early design decisions, which matter all the more because they usually stick for the rest of your project.

From my experience building and mentoring a couple of visualization projects, you may need to create a dummy version of the dataset you work on (unless, of course, it is already available, in which case I suggest copying and pasting a couple of lines).

What are the benefits of faking data for an early project?

- It gets you started _right now_, without waiting for the actual data.

- It helps you shape the data schema, types, and distribution.

- No privacy issues! You can share dummy data without any concern regarding data copyright or anonymity.

- It is useful for testing and benchmarking with larger datasets, different distributions, data types, and possibly missing data.
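As a minimal sketch of the points above (schema, types, dataset size, missing data), the following plain JavaScript builds rows from a small schema of column generators. The helper `makeDummyRows`, its options, and the column names are hypothetical and chosen only for illustration.

```javascript
// Hypothetical helper: builds n rows from a schema of column generators.
function makeDummyRows(n, schema, { missingRate = 0, random = Math.random } = {}) {
  return Array.from({ length: n }, (_, i) => {
    const row = {};
    for (const [column, generate] of Object.entries(schema)) {
      // Blank out a share of the values to mimic missing data.
      row[column] = random() < missingRate ? null : generate(i, random);
    }
    return row;
  });
}

// Made-up schema: one generator per column.
const schema = {
  id: (i) => i + 1,
  height: (i, random) => 150 + 40 * random(), // uniform in [150, 190)
  group: (i, random) => ["a", "b", "c"][Math.floor(random() * 3)]
};

// A small complete dataset to design with, and a larger one with gaps to benchmark with.
const small = makeDummyRows(10, schema);
const large = makeDummyRows(100000, schema, { missingRate: 0.05 });
```

Changing `n`, the generators, or `missingRate` is then enough to try another size, distribution, or share of missing values without touching the visualization code.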

Before getting started, please make sure you stick to the _[tidy data](https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html)_ principles:

<pre>
1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.
</pre>
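To make these principles concrete, here is a small sketch that reshapes a "wide" table (one column per month) into tidy rows (one observation per row). The data and the helper name `toTidy` are made up for illustration.

```javascript
// Hypothetical wide table: one row per symbol, one column per month.
const wide = [
  { symbol: "AAA", "Jan 2000": 10, "Feb 2000": 12 },
  { symbol: "BBB", "Jan 2000": 20, "Feb 2000": 18 }
];

// Turn every (row, value column) pair into its own observation.
function toTidy(rows, idColumn, nameColumn, valueColumn) {
  return rows.flatMap((row) =>
    Object.keys(row)
      .filter((key) => key !== idColumn)
      .map((key) => ({
        [idColumn]: row[idColumn],
        [nameColumn]: key,
        [valueColumn]: row[key]
      }))
  );
}

const tidy = toTidy(wide, "symbol", "date", "price");
// tidy[0] is { symbol: "AAA", date: "Jan 2000", price: 10 }
```

Each variable (`symbol`, `date`, `price`) now forms a column and each observation forms a row.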
iris = d3.csv("https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv")
Array.from({length: 10}).map((d, i) => i)
Array.from({length: 10}).map(d3.randomNormal(0.4, 0.1))
d3 = require("d3")
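The cells above draw new random values on every run. When a reproducible dummy dataset is preferable (for instance to compare two designs on the same values), one option is a seeded generator. The sketch below is plain JavaScript and does not use d3: `mulberry32` is a small, widely circulated seeded generator and `seededNormal` is a hypothetical helper based on the Box-Muller transform, used here as a stand-in for a normal distribution.

```javascript
// Small seeded generator (mulberry32): the same seed gives the same sequence.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Hypothetical helper: normal distribution via the Box-Muller transform.
function seededNormal(mu, sigma, random) {
  return function () {
    const u = 1 - random(); // in (0, 1], keeps Math.log finite
    const v = random();
    return mu + sigma * Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
  };
}

// 1000 reproducible values with mean 0.4 and standard deviation 0.1.
const values = Array.from({ length: 1000 }, seededNormal(0.4, 0.1, mulberry32(42)));
```

Changing the seed gives another dataset with the same distribution.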

**Sample datasets**

_Such datasets are very common in the data science community._

- [Iris Flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set) ([csv](https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv))

<pre>
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5,3.6,1.4,0.2,setosa
</pre>
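Values read from a CSV file arrive as text, and as far as I know `d3.csv` keeps every field as a string unless a row conversion function is given, so numeric columns usually need converting before they are used in scales. A minimal sketch in plain JavaScript, assuming a simple CSV without quoted fields; `parseSimpleCsv` is a hypothetical helper, not a d3 function.

```javascript
// First lines of the iris sample shown above.
const sample = `sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3,1.4,0.2,setosa`;

// Hypothetical helper: parses a simple CSV (no quoted fields, no embedded commas)
// and converts fields that look like numbers into numbers.
function parseSimpleCsv(text) {
  const [header, ...lines] = text.trim().split(/\r?\n/);
  const columns = header.split(",");
  return lines.map((line) => {
    const fields = line.split(",");
    return Object.fromEntries(
      columns.map((column, i) => {
        const raw = fields[i];
        const asNumber = Number(raw);
        const isNumeric = raw !== "" && !Number.isNaN(asNumber);
        return [column, isNumeric ? asNumber : raw];
      })
    );
  });
}

const irisSample = parseSimpleCsv(sample);
// irisSample[0].sepal_length is the number 5.1, irisSample[0].species is the string "setosa"
```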


- [Cars](https://ai.stanford.edu/~jkrause/cars/car_dataset.html.) ([csv](https://gist.githubusercontent.com/noamross/e5d3e859aa0c794be10b/raw/b999fb4425b54c63cab088c0ce2c0d6ce961a563/cars.csv))

<pre>
"","mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb"
"Mazda RX4",21,6,160,110,3.9,2.62,16.46,0,1,4,4
"Mazda RX4 Wag",21,6,160,110,3.9,2.875,17.02,0,1,4,4
"Datsun 710",22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
"Hornet 4 Drive",21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
"Hornet Sportabout",18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
"Valiant",18.1,6,225,105,2.76,3.46,20.22,1,0,3,1
"Duster 360",14.3,8,360,245,3.21,3.57,15.84,0,0,3,4
</pre>

- Individuals, generated with [Mockaroo](https://www.mockaroo.com/)

<pre>
id,first_name,last_name,email,gender,ip_address
1,Gale,Bernardini,gbernardini0@flickr.com,Female,236.165.167.229
2,Ravid,Magnar,rmagnar1@indiegogo.com,Male,127.20.137.234
3,Courtney,Simcox,csimcox2@hp.com,Male,25.250.201.67
4,Farlay,Killeley,fkilleley3@behance.net,Male,211.236.251.254
5,Ambros,Godier,agodier4@i2i.jp,Male,198.226.197.211
6,Melicent,Ahren,mahren5@thetimes.co.uk,Female,128.171.235.98
7,Freedman,Paullin,fpaullin6@posterous.com,Male,10.133.34.122
8,Jabez,Jonsson,jjonsson7@comsenz.com,Male,173.77.112.108
</pre>

- [Stocks data](https://raw.githubusercontent.com/LyonDataViz/MOS5.5-Dataviz/master/data/stocks.csv)

<pre>
symbol,date,price
MSFT,Jan 2000,39.81
MSFT,Feb 2000,36.35
MSFT,Mar 2000,43.22
MSFT,Apr 2000,28.37
MSFT,May 2000,25.45
MSFT,Jun 2000,32.54
MSFT,Jul 2000,28.4
MSFT,Aug 2000,28.4
MSFT,Sep 2000,24.53
</pre>
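A dummy dataset with the same shape as the stocks sample (`symbol,date,price`) can be sketched with a random walk. This is only an illustration under assumptions: the helper `dummyStocks`, its options, the symbols, and the step size of at most 10% per month are made up and do not model real prices.

```javascript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Hypothetical helper: one random walk per symbol, one row per month.
function dummyStocks(symbols, months, { startYear = 2000, startPrice = 30, random = Math.random } = {}) {
  return symbols.flatMap((symbol) => {
    let price = startPrice;
    return Array.from({ length: months }, (_, i) => {
      // Multiplicative step of at most 10% up or down, so the price stays positive.
      price *= 1 + (random() - 0.5) * 0.2;
      return {
        symbol,
        date: `${MONTHS[i % 12]} ${startYear + Math.floor(i / 12)}`,
        price: Math.round(price * 100) / 100
      };
    });
  });
}

// Two made-up symbols over 14 months, in the same tidy shape as the sample above.
const stocks = dummyStocks(["AAA", "BBB"], 14);
```

Because the rows follow the same schema as the real file, the chart code should not need to change when the actual data arrives.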

**What not to do**

- Stick too long with the same dataset: the visualization might become too specific to it.
- Forget to explore the actual dataset being used to find interesting patterns, properties, semantics, etc.

Note that a generated dataset can be aggregated, filtered, reduced, etc. like a regular dataset.
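Since a generated dataset can be filtered and aggregated like any other, here is a minimal sketch in plain JavaScript. The rows, the column names, and the helper `meanBy` are made up for illustration; in a notebook, `d3.group` or `d3.rollup` could play a similar role.

```javascript
// Made-up rows in tidy form.
const rows = [
  { group: "a", value: 1 },
  { group: "a", value: 2 },
  { group: "b", value: 5 },
  { group: "b", value: 6 }
];

// Filter: keep only the rows above a threshold.
const filtered = rows.filter((row) => row.value > 1);

// Aggregate: hypothetical helper computing the mean of `value` per `key`.
function meanBy(data, key, value) {
  const stats = new Map();
  for (const row of data) {
    const entry = stats.get(row[key]) || { total: 0, count: 0 };
    entry.total += row[value];
    entry.count += 1;
    stats.set(row[key], entry);
  }
  return new Map([...stats].map(([k, { total, count }]) => [k, total / count]));
}

const means = meanBy(rows, "group", "value");
// means.get("a") is 1.5 and means.get("b") is 5.5
```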


**Other**

- https://www.kelp.nyc/
- https://tuftsvalt.github.io/snowcat/
- https://amnesia.openaire.eu/
- https://github.com/jiananlu/faked_csv
- API testing https://reqres.in/

md`
TODO
- Add Les Misérables for node/link example
- Nested datasets
- Aggregated datasets
- Temporal data
- De-aggregate data
- Int32Array.of(1, 2, 3, 4),
- Float64Array.of(5, 6, 7, 8)
- d3.group
- d3.rollup
- Time parse/format
`