
Prereq

Book PDF in `10-sources/`. Know the pack slug (e.g. `visual-identity`) and a short source-slug (e.g. `bokhua-principles-of-logo-design-2022`).

Steps

1. Verify text vs scan — on ≥20 pages

pdftotext -f 1 -l 20 book.pdf - | tr -d '[:space:]' | wc -c

Gotcha: sample 20 pages, NOT 5. Front matter (cover, title page) is often image-only and returns ~0 chars, which falsely flags a text book as a scan. If it is genuinely a scan (<1500 chars across 20 pages), run OCR:

~/Sync/digitalGarden/bin/ocr.sh book.pdf -p 1-18 -l en+de -o /tmp/book.txt

Gotcha: for IA-sourced books prefer `ocr.sh` over the IA `_djvu.txt` (the latter is heavily degraded).
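
The text-versus-scan decision in this step can be sketched as a small helper. This is only a sketch, not part of the existing tooling: `classify_sample` is a hypothetical name, and the 1500-character default is simply the rule of thumb stated above.

```shell
# Hypothetical helper (not part of ocr.sh or any existing script).
# Classifies a 20-page sample from its non-whitespace character count.
classify_sample() {
  # Strip padding: some wc implementations left-pad their output with spaces.
  chars=$(printf '%s' "$1" | tr -d '[:space:]')
  threshold="${2:-1500}"   # assumed default: the <1500 rule of thumb above
  if [ "$chars" -lt "$threshold" ]; then
    echo scan
  else
    echo text
  fi
}

# Usage (assumes pdftotext from poppler-utils is on PATH):
#   n=$(pdftotext -f 1 -l 20 book.pdf - | tr -d '[:space:]' | wc -c)
#   [ "$(classify_sample "$n")" = scan ] && echo "run OCR"
```

Keeping the threshold as an optional second argument makes it easy to tighten the check for books with sparse, image-heavy layouts.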

2. Ground concepts

pdftotext book.pdf /tmp/book.txt
  grep -niE "concept|anchor|terms" /tmp/book.txt | head -20

Read anchors; abstract concepts into your own words (summaries, never verbatim).

3. Bulk-create EU files + register (mkeu pattern)

cd ~/Sync/digitalGarden/10-extracted
  APP=~/.claude/skills/runa-classify/append-event-v13.sh
  SRC=$(md5 -q ~/Sync/digitalGarden/10-sources/book.pdf | cut -c1-32)
  RUN=vi-<book>-2026-05-25   # placeholder: substitute <book> before running (a bare < or > is a shell redirection)
  mkeu(){ id="$1";tier="$2";ttl="$3";b1="$4"
  cat > "$id.podlite" <<EOF

EU statement

$b1

EOF
  snip=$(md5 -q "$id.podlite" | cut -c1-16)
  "$APP" --event=eu_created --eu-id="$id" --details="$ttl" --tier-impact="$tier+1" \
    --run-id="$RUN" --source-file-hash="$SRC" --snippet-hash="$snip" \
    --ingestion-version=L1-manual --plugin=manual.read --pack=visual-identity; }
  mkeu my-eu-slug T2 "Title" "Statement."
  # ... repeat per EU
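
`md5 -q` is the macOS form; on Linux the equivalent tool is usually `md5sum`. If the mkeu pattern has to run on both, a wrapper along these lines may help. `file_md5` is a hypothetical name, and the sketch assumes one of the two tools is installed.

```shell
# Hypothetical wrapper: print only the hex MD5 digest of a file.
file_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    # GNU coreutils: output is "<digest>  -" when reading stdin.
    md5sum < "$1" | cut -d' ' -f1
  else
    # macOS/BSD: -q prints the digest alone.
    md5 -q "$1"
  fi
}

# Usage, mirroring the two hashes used in this step:
#   SRC=$(file_md5 ~/Sync/digitalGarden/10-sources/book.pdf)
#   snip=$(file_md5 "$id.podlite" | cut -c1-16)
```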

4. Close session + verify pyramid

"$APP" --event=runa_session_completed --eu-id="$RUN" --details="N EUs (T1:a T2:b ...)" \
    --tier-impact="T1+a,T2+b,..." --run-id="$RUN" --pack=visual-identity \
    --source-file-hash="$SRC" --ingestion-version=L1-manual --plugin=manual.read
  ~/.claude/skills/runa-pyramid/invariant-check.sh --session "$RUN"   # must be 0 violations

Gotcha: `runa_session_completed` REQUIRES `--ingestion-version` AND `--plugin` (errors without). The declared `--tier-impact` aggregate MUST equal the sum of the run's `eu_created` tier-impacts, or the pyramid fails.
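
Because the declared aggregate has to match the per-EU impacts, it may be safer to compute it than to tally by hand. A minimal sketch, assuming the per-EU impacts are available one per line in the `T2+1` form passed to `--tier-impact`. `aggregate_tier_impacts` is a hypothetical helper, and how the impacts are pulled out of the journal is left open because the journal format is not shown here.

```shell
# Hypothetical helper: sum per-EU tier impacts (one "T<n>+<k>" per line on
# stdin) into the comma-separated aggregate form, e.g. "T1+1,T2+2".
aggregate_tier_impacts() {
  awk -F'+' 'NF == 2 { sum[$1] += $2 }
             END { for (t in sum) print t "+" sum[t] }' |
    LC_ALL=C sort | paste -sd, -
}

# Usage:
#   printf '%s\n' T2+1 T1+1 T2+1 T3+1 | aggregate_tier_impacts
#   # → T1+1,T2+2,T3+1
```

The result can be pasted into `--tier-impact` and the per-tier counts into `--details`.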

Gotchas (cost real time 2026-05-25)

  • Pyramid correction: a mis-declared `session_completed` is NOT fixed by appending a corrected one — `invariant-check` reads the first. Fix the original line via targeted `sed` (unique string) + remove the dup, with a journal backup first.

  • Batch-edit by `:pack`, not filename glob: `sed 10-extracted/muller-*.podlite` collides Jens Müller (visual-identity) with Müller-Brockmann (design-functionalism). Use `grep -l ':pack<slug' *.podlite`. See [[pack-batch-edit-by-pack-not-glob]].

  • index.podlite is high-level, not a per-EU registry — discovery = file-grep by `:pack`.

  • Acquire URLs: IA download `https://archive.org/download/<id>/<urlencoded-filename>`; Are.na full-book PDFs live at `attachments.are.na/*.pdf` (curl-able).
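
The pyramid-correction gotcha (back up, then edit exactly one line identified by a unique string) can be guarded as sketched below. This is an assumption-laden sketch: `fix_unique_line` is a hypothetical helper, the journal path is whatever file the events are appended to, and it uses `awk` with a fixed-string match instead of `sed` so the search string needs no regex escaping. Removing the duplicate line remains a separate manual step.

```shell
# Hypothetical helper: replace the one line that contains a unique string,
# after writing a backup. Refuses to act unless exactly one line matches.
fix_unique_line() {
  journal="$1"; needle="$2"; replacement="$3"
  [ -f "$journal" ] || { echo "no such file: $journal" >&2; return 1; }
  # grep -c exits 1 on zero matches but still prints 0, hence "|| true".
  hits=$(grep -cF -- "$needle" "$journal" || true)
  if [ "$hits" -ne 1 ]; then
    echo "refusing: $hits lines match, expected exactly 1" >&2
    return 1
  fi
  cp "$journal" "$journal.bak"   # backup first
  # Values are passed via the environment so awk does not interpret backslashes.
  NEEDLE="$needle" REPL="$replacement" awk '
    index($0, ENVIRON["NEEDLE"]) { print ENVIRON["REPL"]; next }
    { print }' "$journal.bak" > "$journal"
}

# Usage (path and strings are placeholders):
#   fix_unique_line /path/to/journal 'unique part of the bad line' 'full corrected line'
```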

Verification

grep -lr ":pack<visual-identity" 10-extracted/ | grep -vc "hub-\|index"   # file count
  # should equal the number of journal eu_created events with pack=visual-identity
  # (no retroactive gap when EUs are tagged at creation)
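
The file count can be wrapped so it works for any pack and is not thrown off by directory names. A minimal sketch: `count_pack_files` is a hypothetical helper, and it assumes the same `:pack<slug` marker and the same `hub-`/`index` exclusions as the command above.

```shell
# Hypothetical helper: count EU files tagged with a given pack, excluding
# hub and index files. Runs inside the directory so the exclusion patterns
# only see relative paths.
count_pack_files() {
  pack="$1"; dir="$2"
  ( cd "$dir" &&
    grep -lr -- ":pack<$pack" . |
      grep -v -e 'hub-' -e 'index' |
      wc -l | tr -d '[:space:]' )
}

# Usage:
#   count_pack_files visual-identity ~/Sync/digitalGarden/10-extracted
```

Selecting by the `:pack` marker rather than a filename glob is also what keeps same-named authors in different packs apart.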

Related