deck = [
`# On-device LLM with Gemini Nano
Chrome has an experiment that lets a language model run entirely on your device!`,
`## What is it?
- It's an early preview program
- Only in Chrome Dev+ with flags enabled
- It's a ~2GB download (uncompressed to 22GB on disk!)
- Surprisingly powerful`,
`## How do I get it?
- https://observablehq.com/@ryanseddon/chrome-ai
- chrome://flags/#optimization-guide-on-device-model
- chrome://flags/#prompt-api-for-gemini-nano
\`\`\`js
(await ai.languageModel.capabilities()).available
// -> "readily" | "after-download" | "no"
\`\`\`
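
If \`available\` comes back \`"after-download"\`, creating a session kicks off the download; a minimal sketch using the \`monitor\` option from the explainer:
\`\`\`js
const session = await ai.languageModel.create({
  monitor(m) {
    // Progress events fire while the model downloads
    m.addEventListener("downloadprogress", (e) => {
      console.log("Downloaded " + e.loaded + " of " + e.total);
    });
  }
});
\`\`\`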
`,
md`## APIs
\`\`\`js
const session = await ai.languageModel.create();
await session.prompt('Write me a joke about JavaScript')
\`\`\`
${promptJokeButton}
`,
`## APIs
\`prompt()\` will probably never ship :(
\`\`\`js
ai.summarizer()
ai.writer() // Generate text, with tone/context options
ai.rewriter() // Rewrite text in a certain style
translator.translate()
\`\`\`
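
The task APIs follow a create-then-call shape; a rough sketch (assuming the \`create()\`/\`summarize()\` pattern from the explainers, \`longArticle\` is a placeholder):
\`\`\`js
const summarizer = await ai.summarizer.create();
const summary = await summarizer.summarize(longArticle);
\`\`\`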
`,
`<img src="https://lh5.googleusercontent.com/proxy/mk0r4A6KcTIqEHXc1i6dRVCqL4-ayXEE7R_Bx3qFtkKy6Y_Hwyc1X9kViUdhLYXwRDfxpdVLS03KIHMfEbV-cY5IcA_4WGaDSgsX8gKeJEAqQxEQwYkf33kA-sabVLiX5BSPW82DYQPKnDErtgZfuvgJHC-h" style="position: absolute; top: 0; right: 0; width: 100%; height: 100%; object-fit: contain;">`,
`
\`\`\`js
// Every name this API has had so far:
await ai.createTextSession()
await ai.createGenericSession()
await ai.assistant.create()
await ai.languageModel.create() // the current shape, for now
\`\`\`
`,
`## \`prompt()\` focus
- The most interesting of the APIs
- Works the way you're used to working with LLMs
- Intended for discovering use cases
- Has \`promptStreaming()\` for faster perceived response times (sketch below)
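
A minimal sketch of consuming the stream (the return value is async-iterable in Chrome):
\`\`\`js
const session = await ai.languageModel.create();
const stream = session.promptStreaming('Explain closures in one sentence');
for await (const chunk of stream) {
  console.log(chunk); // render progressively as chunks arrive
}
\`\`\`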
`,
`## Why?
- Privacy, it all happens on device
- Compliance, PII, sensitive data
- No network latency, \`promptStreaming()\` has sub-millisecond response times
- Trade-off is device resources
- The model is shared across all origins, so it beats per-origin WebGPU models
`,
`## Resources
From my less-than-scientific investigations:
- Uses ~2GB of memory while actively generating text
- An active session alone uses ~0.8GB of memory
- \`promptStreaming()\` returned a chunk every 0.05ms on average
`,
`## Limitations
- Prompts are capped at 1024 tokens each
- The actual context window is 4096 tokens
- Context is managed as a sliding window (token sketch below)
- The system prompt is always kept in the window
- It's bad at jokes and factual stuff
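
You can keep an eye on the budget yourself; a sketch assuming the token-accounting properties from the explainer (these have been renamed over time):
\`\`\`js
// maxTokens / tokensSoFar / tokensLeft per the prompt-api explainer
const session = await ai.languageModel.create();
console.log(session.maxTokens, session.tokensSoFar, session.tokensLeft);
\`\`\`
`,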
`## Is it just a toy?
- Maybe
- It is quite capable with the task-specific APIs
- Think of it as a layered approach
- This may never ship and was just a fun thing
`,
`## Advanced use-cases
Let's take a look at some more advanced stuff we can do`,
`## N-Shot prompting
\`\`\`js
await ai.languageModel.create({
  initialPrompts: [
    { role: "system", content: "Predict up to 5 emojis as a response to a comment. Output emojis, comma-separated." },
    { role: "user", content: "This is amazing!" },
    { role: "assistant", content: "❤️, ➕" },
    { role: "user", content: "LGTM" },
    { role: "assistant", content: "👍, 🚢" }
  ]
});
\`\`\`
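
Prompting the primed session then looks like this (assuming \`session\` holds the result of the \`create()\` call above; the output is illustrative):
\`\`\`js
await session.prompt("Back to the drawing board"); // e.g. "😅, 🔧"
\`\`\`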
`,
md`## Emojis
${emojiPrediction}
`,
`<img src="https://github.com/user-attachments/assets/0a9f33da-f3b4-47a5-a34c-98ff31c62979" style="position: absolute; top: 0; right: 0; width: 100%; height: 100%; object-fit: contain;">
`,
`## JSON
\`\`\`js
await ai.languageModel.create({
  initialPrompts: [
    { role: "system", content: "Predict the sentiment of the text. Output either neutral, positive or negative." },
    { role: "user", content: "This is amazing!" },
    { role: "assistant", content: '{"sentiment": "positive"}' }
  ]
});
\`\`\`
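
Because the assistant turn primes JSON, the response can usually be parsed directly (\`session\` assumed from the \`create()\` call above; small models do occasionally break format):
\`\`\`js
const { sentiment } = JSON.parse(await session.prompt("Worst update ever."));
\`\`\`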
`,
md`## JSON
${jsonFormat}`,
`## Session cloning
- The trick here is to use \`session.clone()\` (sketch below)
- This lets you avoid accidentally overflowing your guidance
- Without cloning, the JSON examples above could be lost to the sliding window
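
A sketch, reusing the \`initialPrompts\` priming from the earlier slides:
\`\`\`js
const base = await ai.languageModel.create({ initialPrompts });
// Each clone starts from the primed state, unaffected by other conversations
const session = await base.clone();
await session.prompt("This is amazing!");
\`\`\`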
`,
`## Complex JSON
- Open discussion on the [prompt-api explainer repo](https://github.com/explainers-by-googlers/prompt-api/issues/35)
- Existing projects: [Guidance-ts](https://github.com/mmoskal/guidance-ts), [AiBrow](https://github.com/axonzeta/aibrow), [TypeChat](https://github.com/microsoft/TypeChat/tree/main/typescript)
- TypeChat sort of works with Nano now
- Others require JSON Schema or GBNF grammars
`,
`<img src="https://github.com/user-attachments/assets/774a8513-001b-441d-9702-c1edef819d63" style="position: absolute; top: 0; right: 0; width: 100%; height: 100%; object-fit: contain;">`,
gist("ryanseddon", "703ef4eefbe56e3f92c50c5298b2dc18"),
gist("ryanseddon", "8f43d21196b83bf75db5495b7ed891c9"),
`## Curling Nano
A lot of eval tools such as TypeChat assume a server-side LLM
- So I made a node script that spins up Playwright and calls the Prompt API
- Now I can curl it directly, or use tools that expect an endpoint
- Follows the OpenAI \`/v1/chat/completions\` schema (example below)
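
Hitting the endpoint from JS (port and path are whatever the script exposes; placeholders here):
\`\`\`js
const res = await fetch("http://localhost:3000/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [{ role: "user", content: "Write me a haiku about Nano" }]
  })
});
const { choices } = await res.json(); // OpenAI-style response shape
\`\`\`
`,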
`<video controls onclick="event.stopPropagation()">
<source src="https://github.com/user-attachments/assets/32cb4826-f6a3-40b7-b0e3-0e7f44f0f3f3" type="video/mp4">
Your browser does not support the video tag.
</video>`,
md`# Thanks!
${llmPoem}`,
{
theme: ["mytheme", "dark"]
}
]