
Text-to-SVG generation

Glyf

A text-to-SVG glyph generator. You give it a vibe and a letter; a fine-tuned Qwen2.5-Coder-7B streams a glyph token-by-token; the browser renders the path as the tokens arrive, lets you edit anchors by hand, and packs a library of saved letters into a downloadable TTF.

The project has two halves, a custom path editor and a LoRA fine-tune, which share a single server-sent events (SSE) contract.

Demo: type a vibe and a letter, watch the glyph stream in, edit, save, repeat. Compose a word in the preview row and download a TTF.

What it does

You type a vibe and pick a letter. The model was trained on adjective lists such as humanist, calm, stiff (Cambay) or shaded, playful, awkward, rugged (Rubik Doodle Shadow), so vibes phrased that way work best. You hit generate. A path streams into a canvas, drawing the glyph the way a pen would, one segment at a time. When the stroke finishes, you see the anchors. You drag a curve, soften an edge, fit the glyph to cap-height, and save it to the library on the next row.

Generation runs per letter or in a batch: with BATCH checked, one vibe produces glyphs for the full A–Z, a–z, 0–9, and punctuation set in a single run. Batched letters share the prompt, but the model generates each one independently, so consistency across the set can vary. The library holds whatever you save, and the FONT button exports it as a TTF at any point. The preview row composes a word live from the saved set; the screenshot below spells typocraphy! from a 16-glyph library.

Top half of the Glyf interface: vibe and letter input on the left, canvas with the letter R drawn with editable Bezier anchors and typographic rulers (ASC / CAP / X / BL / DESC) on the right, status bar below.

Above the library — the generation surface. VIBE and LETTER on the left, the canvas with editable anchors and typographic rulers on the right.

Design

The output flow treats each generation as a draft that the user finishes by hand. A glyph is small enough that direct manipulation is feasible: a few dozen Bezier anchors, one viewBox, one path. The editor handles the refinement step; the library accumulates the finished glyphs; the TTF exporter packs them as a font.

The interface is laid out as a typographic specimen sheet. Vibe and letter input on the left, canvas with rulers on the right, library and preview rows below. Saved glyphs persist in the library independent of how many regenerations preceded them.

Streaming as a drawing

The model sends tokens, not finished SVG. Token boundaries don’t respect XML boundaries: a chunk can end mid-attribute (for example at d="M10 20 L3, where the next chunk extends the 3 to 37) or mid-tag. If you hand that buffer to a browser SVG parser as-is, the canvas flickers and the pen visibly jumps backward when a half-written number gets extended.

A small sanitizer rewrites the partial buffer into a valid <svg>...</svg> envelope on every chunk: it closes any open attribute string, trims a trailing partial number, and self-closes the last element. As a result, every chunk is renderable. The canvas redraws continuously, so the glyph looks drawn stroke by stroke instead of popping in all at once.
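The three repairs described above can be sketched as follows. This is a minimal illustration, not the project's actual sanitizer: it is written in Python for brevity (the real one would run in the browser), the function name sanitize_partial_svg is hypothetical, and it assumes a flat document (one svg root, no nested groups) with double-quoted attributes and no ">" inside attribute values.

```python
import re
import xml.etree.ElementTree as ET

EMPTY_SVG = '<svg xmlns="http://www.w3.org/2000/svg"/>'
# A number that may still be growing at the end of the buffer, or a lone sign.
_PARTIAL_NUMBER = re.compile(r"(?:[-+]?(?:\d+\.?\d*|\.\d*)(?:[eE][-+]?\d*)?|[-+])$")
# An attribute name (optionally followed by "=") whose value has not started yet.
_DANGLING_ATTR = re.compile(r"\s+[\w:.-]+\s*=?\s*$")
_TAG_NAME = re.compile(r"<[A-Za-z][\w:.-]*")


def sanitize_partial_svg(buf: str) -> str:
    """Return a well-formed SVG document for a possibly truncated buffer."""
    start = buf.find("<svg")
    if start == -1:
        return EMPTY_SVG  # nothing renderable has arrived yet
    buf = buf[start:]

    end = buf.find("</svg>")
    if end != -1:
        return buf[: end + len("</svg>")]  # already complete

    last_lt, last_gt = buf.rfind("<"), buf.rfind(">")
    if last_lt < last_gt:
        return buf + "</svg>"  # between tags: only the envelope is missing

    # The buffer ends inside a tag; repair that tag fragment.
    head, frag = buf[:last_lt], buf[last_lt:]
    if frag.startswith("</") or not _TAG_NAME.match(frag):
        frag = ""  # partial closing tag or a bare "<": drop it
    elif frag.count('"') % 2 == 1:
        # Inside an attribute value. The trailing number may be incomplete, so
        # drop it (it returns on the next chunk), then close the quote and tag.
        frag = _PARTIAL_NUMBER.sub("", frag).rstrip(" ,\t\r\n") + '"/>'
    else:
        # Outside a value: drop a dangling attribute name, then self-close.
        frag = _DANGLING_ATTR.sub("", frag).rstrip().rstrip("/") + "/>"

    if not head:
        return frag or EMPTY_SVG  # the truncated tag was the root itself
    return head + frag + "</svg>"


def is_well_formed(svg: str) -> bool:
    """Sanity check: does the sanitized buffer parse as XML?"""
    try:
        ET.fromstring(svg)
        return True
    except ET.ParseError:
        return False
```

Dropping the trailing number can leave a path command without its arguments (d="M10 20 L"). That is acceptable here because SVG renderers draw path data up to the last complete segment, so the pen stops at the last known point until the next chunk arrives.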

From a glyph to a font

The path editor is a Bezier surface implemented directly on a canvas. Anchors and handles are draggable, and a small set of operations covers the rest: soften (Catmull-Rom resampling, tuned by the SOFTEN slider), simplify (remove redundant points), and fit (align to typographic ratios). The horizontal rulers in the screenshot (ASC 215, CAP 265, X 540, BL 750, DESC 830) are the targets that fit aligns against, so a library of generated glyphs lands on a shared baseline and reads as a coherent type family.
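To make soften and fit concrete, here is a rough sketch under stated assumptions: the contour is a closed list of (x, y) anchors, soften resamples it with a uniform Catmull-Rom spline, and fit scales the result so its top lands on the CAP ruler and its bottom on BL. The function names, the samples parameter standing in for the SOFTEN slider, and the choice to scale uniformly from the left edge are all illustrative guesses, not the editor's actual code.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def catmull_rom_point(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Uniform Catmull-Rom position between p1 and p2 for t in [0, 1]."""
    t2, t3 = t * t, t * t * t
    x, y = (
        0.5 * (2 * b + (c - a) * t
               + (2 * a - 5 * b + 4 * c - d) * t2
               + (-a + 3 * b - 3 * c + d) * t3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )
    return (x, y)


def soften(points: List[Point], samples: int = 4) -> List[Point]:
    """Resample a closed contour; the curve still passes through every anchor."""
    n = len(points)
    out: List[Point] = []
    for i in range(n):
        # Neighbours wrap around because the contour is closed.
        p0, p1 = points[i - 1], points[i]
        p2, p3 = points[(i + 1) % n], points[(i + 2) % n]
        for s in range(samples):
            out.append(catmull_rom_point(p0, p1, p2, p3, s / samples))
    return out


def fit(points: List[Point], top: float, bottom: float) -> List[Point]:
    """Scale uniformly so the bounding box spans top..bottom (canvas y grows down)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    height = max(ys) - min(ys)
    if height == 0:
        return list(points)  # degenerate contour: nothing to scale
    scale = (bottom - top) / height
    return [((x - min(xs)) * scale + min(xs), (y - min(ys)) * scale + top)
            for x, y in points]
```

Running fit after soften matters in this sketch: a Catmull-Rom curve can overshoot its anchors slightly, so fitting last guarantees the final outline sits exactly between the two rulers, for example fit(soften(anchors), top=265, bottom=750) for a capital letter.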

The TTF export goes through opentype.js. Each saved glyph carries its bounding box relative to that baseline; the exporter derives the font’s typographic metrics and emits a real TTF.
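One plausible way to derive those metrics is sketched below. It assumes each saved glyph stores the top and bottom of its bounding box in canvas coordinates (y grows downward, baseline at the BL ruler, 750) and that the font uses 1000 units per em; GlyphBox and derive_metrics are hypothetical names, and the real exporter may choose its metrics differently. The resulting values correspond to the unitsPerEm, ascender, and descender options that opentype.js accepts when constructing a font.

```python
from dataclasses import dataclass
from typing import Dict

BASELINE_Y = 750  # BL ruler from the screenshot, in canvas units


@dataclass
class GlyphBox:
    y_top: float     # top edge of the glyph, canvas coordinates
    y_bottom: float  # bottom edge of the glyph, canvas coordinates


def derive_metrics(boxes: Dict[str, GlyphBox], units_per_em: int = 1000) -> dict:
    """Derive font-wide metrics from per-glyph boxes (assumed scheme)."""
    # Flip to font convention: positive above the baseline, negative below.
    highest = max(BASELINE_Y - b.y_top for b in boxes.values())
    lowest = min(0.0, min(BASELINE_Y - b.y_bottom for b in boxes.values()))
    span = highest - lowest
    if span <= 0:
        raise ValueError("glyph boxes have no vertical extent")
    scale = units_per_em / span  # canvas units -> font units
    return {
        "unitsPerEm": units_per_em,
        "ascender": round(highest * scale),
        "descender": round(lowest * scale),
        "scale": scale,
    }
```

With a capital R spanning CAP to BL (265 to 750) and a lowercase y spanning X to DESC (540 to 830), this yields an ascender of 858 and a descender of -142 font units.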

Bottom of the Glyf interface: the preview row spelling 'typocraphy!' composed live from the saved library, with the same word echoed in monospace below.

Below the library — the preview row. The composed word is rendered with the saved glyphs as a live font.

Backend

The fine-tuned model runs on Modal. A slim FastAPI router on a tiny image (~2 s cold start) fronts a GPU class container (L4, bf16) that holds the base model + LoRA adapter warm; weights live in a Modal volume so a new container re-reads ~14 GB from cache rather than pulling from Hugging Face.
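The SSE contract that connects this backend to the editor is not spelled out here, so the following is only a sketch of what such a contract could look like. The event names token and done and the JSON payload shape are assumptions; the framing itself (event: and data: fields, blank line between events) follows the standard server-sent events wire format.

```python
import json
from typing import Iterator, Tuple


def encode_event(event: str, payload: dict) -> str:
    """Frame one SSE event; JSON keeps the data on a single line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def decode_events(stream: str) -> Iterator[Tuple[str, dict]]:
    """Parse a buffer of complete SSE frames into (event, payload) pairs."""
    for frame in stream.split("\n\n"):
        event, data_lines = "message", []  # "message" is the SSE default type
        for line in frame.split("\n"):
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one optional space after the colon
            if field == "event":
                event = value
            elif field == "data":
                data_lines.append(value)
            # Lines starting with ":" are comments and fall through here.
        if data_lines:
            yield event, json.loads("\n".join(data_lines))


# Hypothetical stream: two token chunks that split a number, then a done marker.
wire = (
    encode_event("token", {"text": '<path d="M10 20 L3'})
    + encode_event("token", {"text": '7 40"/>'})
    + encode_event("done", {})
)
svg_text = "".join(p["text"] for e, p in decode_events(wire) if e == "token")
```

Carrying the token text inside JSON sidesteps the rule that a raw newline ends an SSE data line, which matters if the model ever emits line breaks inside the SVG.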

Scope

Single-glyph generation only. Editing an existing SVG by natural-language instruction was scoped out at the start. The product stays narrow on purpose: one prompt, one letter, finishable in the view that produced it.