Published
Edited
Apr 30, 2020
Insert cell
Insert cell
import {query} from '@visnup/baby-names-by-birth-year'
Insert cell
import {printTable} from '@uwdata/data-utilities'
Insert cell
table = "\`bigquery-public-data.usa_names.usa_1910_current\`"
Insert cell
q = `select count(*) as cnt from ${table}`
Insert cell
result = query(q) // do I really want reactive execution for queries?
Insert cell
query(`select * from ${table} limit 1`)
Insert cell
printTable(result)
Insert cell
md`## What names saw the biggest one year drop-off?
I
1. Aggregate by state level to countrywide (missed this step at first)
1. use a window to get the lag of number,
1. calc absolute and % drop
1. get top 10 by gender

https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
OVER (PARTITION BY division ORDER BY finish_time ASC)

Possible iterations:
1. names with biggest 5 year drop.
1. names with biggest 1 year rise
1.

Other good query questions:

1. What name was most popular with both genders in the most recent year of data? (define "most popular with both genders meaningfully")
1. Is the the percentage of girls names ending in vowels increasing or decreasing over the last 30 years?

`
Insert cell
q1 = `
WITH group_by_gender as (
select gender, year, name, sum(number) as number
from ${table}
group by gender, year, name
)
---
select * from
(select *
, lag(number) over w as lag_number
, number-lag(number) over w as net_change_yoy
, -1+number/lag(number) over w as yoy_pct_change
from group_by_gender
window w as (partition by gender,name order by year)) q
where net_change_yoy is not null
-- and year >= 1980 and year <=1989 -- Let's restrict the analysis to my fellow 80's babies
and number >= 1000
order by yoy_pct_change
limit 50
`
Insert cell
printTable(res1)
Insert cell
res1 = query(q1)
Insert cell
Insert cell
q2 = `
WITH q_filter as (
select
*
, REGEXP_CONTAINS(name, r'[aeiouy]$') as ends_with_vowel
FROM ${table}
where gender = 'F' and year >= extract(year from current_date()) - 50
)

select year, sum(cast(ends_with_vowel as int64) * number) / sum(number) as pct_ends_with_e
from q_filter
group by year
order by year desc
`
Insert cell
printTable(res2)
Insert cell
res2 = query(q2)
Insert cell
import {vl} from '@vega/vega-lite-api'
Insert cell
vl.markLine()
.data(res2)
.encode(vl.x().fieldQ('year'),
vl.y({scale:{domain:[.5, 1]}}).fieldQ('pct_ends_with_e'))
.width(width)
.render()
Insert cell
md`# Names that are popular with both genders

I think a useful metric is the max of the ranks by gender. So the we'll find:

1. The most popular names for boys that are more popular for girls
1. The most popular names for girls that are more popular for boys

Not sure converting to ranks will change things much, but it'll facilitate comparison across decades and other groupings where the number of people born is different in each grouping.

---

I ended up using the sum of the boy and girls ranks. This seems to get results that make sense, but it seems sketchy.
`
Insert cell
q3 = `
WITH
q as (
select floor(year/10)*10 as decade, name, gender, sum(number) as number
from ${table}
group by decade, name, gender
),
w_rank as (
select *
, rank() over (partition by cast(decade as string), gender order by number desc) as rank

from q
),

boys as (select * from w_rank where gender = 'M'),
girls as (select * from w_rank where gender = 'F')

select
boys.decade
, boys.name as name
, boys.rank as rank_boy
, girls.rank as rank_girl
, girls.rank+boys.rank as sum_rank
from boys
inner join girls
on boys.name = girls.name and boys.decade = girls.decade
order by sum_rank
limit 100
`
Insert cell
printTable(res3)
Insert cell
res3 = query(q3)
Insert cell
query("EXPLAIN " + q3)
Insert cell

One platform to build and deploy the best data apps

Experiment and prototype by building visualizations in live JavaScript notebooks. Collaborate with your team and decide which concepts to build out.
Use Observable Framework to build data apps locally. Use data loaders to build in any language or library, including Python, SQL, and R.
Seamlessly deploy to Observable. Test before you ship, use automatic deploy-on-commit, and ensure your projects are always up-to-date.
Learn more