Ian Johnson
pixel flipper, data sifter. trying to see what I can while I'm here
The Pile
By Ian Johnson
Edited May 26, 2023
Test set embeddings
00.jsonl token count
Test set token counts
select count(*)::int as total from counts;
SELECT * FROM "counts"
LIMIT 100
select pile_set_name, sum(tokens)::int as sum, count(*)::int as count
from counts
group by pile_set_name
order by sum desc
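
For readers less familiar with SQL, the aggregation above can be sketched in plain JavaScript. This is an illustration only, not code from the notebook: the `rows` array holds made-up sample values, and `summarize` is a hypothetical helper name. It assumes each row carries a `pile_set_name` and a `tokens` field, as the query implies.

```javascript
// Hypothetical sample rows (made-up values); the real `counts` table
// is assumed to hold one row per document.
const rows = [
  { pile_set_name: "Pile-CC", tokens: 1200 },
  { pile_set_name: "Pile-CC", tokens: 800 },
  { pile_set_name: "Github", tokens: 5000 },
  { pile_set_name: "ArXiv", tokens: 3000 },
];

// Group rows by pile_set_name, accumulating total tokens and document count,
// then sort by total tokens, largest first (mirrors "order by sum desc").
function summarize(rows) {
  const groups = new Map();
  for (const { pile_set_name, tokens } of rows) {
    const g = groups.get(pile_set_name) ?? { pile_set_name, sum: 0, count: 0 };
    g.sum += tokens;
    g.count += 1;
    groups.set(pile_set_name, g);
  }
  return [...groups.values()].sort((a, b) => b.sum - a.sum);
}

console.log(summarize(rows));
```

With the sample rows, "Github" sorts first (5000 tokens in 1 document) and "Pile-CC" last (2000 tokens across 2 documents).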
select pile_set_name, count(*)::int as count
from counts
group by pile_set_name
order by count desc
select pile_set_name, count(*)::int as count
from counts
where tokens < 4000
group by pile_set_name
order by count desc
db = DuckDBClient.of({counts: FileAttachment("00-counts-noindex.parquet")})
nformat = d3.format(",d")
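
The `",d"` specifier asks d3 for an integer with comma-grouped thousands. Outside Observable, where `d3` may not be loaded, a rough stand-in can be sketched as below. `nformatSketch` is a hypothetical name, not part of the notebook, and it is only meant to match `d3.format(",d")` for non-negative integers; rounding of fractional values and the minus sign used for negative numbers may differ.

```javascript
// Plain-JavaScript approximation of d3.format(",d") for non-negative integers.
// This is an assumption-based stand-in, not the notebook's own code.
const nformatSketch = (n) => Math.round(n).toLocaleString("en-US");

console.log(nformatSketch(1234567)); // "1,234,567"
console.log(nformatSketch(999));     // "999"
```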
import {colors} from "@enjalot/the-pile-test-set-token-counts"