<aside>
How to read this doc:
🆕 NEW RULE: added by the customer, not in the previous guidelines
⚠️ CHANGED: overrides a rule from the previous guidelines
</aside>
| Area | Previous | New | Status |
|---|---|---|---|
| Core philosophy | Not stated explicitly | Preserve Everything — when in doubt, keep it | 🆕 NEW |
| Partial-word stutters (e.g., "Wh-wh-what") | Do not transcribe — drop the fragments → "What makes you think that?" | Keep the fragments with hyphens → "Wh-wh-what makes you think that?" | ⚠️ CHANGED |
| Complete-word repetitions ("I I I went") | Keep without hyphens | Keep without hyphens | Unchanged |
| Hesitations (um, uh, er, ah, mm) | Treated as filler/crutch words — keep, comma-offset | Explicit category — keep all, comma-offset where natural | 🆕 Clarified |
| Brand & product names | Concatenate — Xbox360, iPhone13Pro | Use official written form — Xbox 360, iPhone 13 Pro, Windows 11, PlayStation 5 | ⚠️ CHANGED |
| Arbitrary alphanumeric codes | Concatenate — D48AA | Concatenate — D48AA, RX7, B2B, 3DPrinting (regardless of pacing) | Unchanged (expanded examples) |
| Spoken "dash" inside a code | Not specified | Render as a literal dash → A-37B, KL-902 | 🆕 NEW |
| Personal data formats (phone, email, address, SSN, credit card, dates) | General "transcribe all PII" rule, no formatting spec | Use official/standard format regardless of how speaker paces it | 🆕 NEW |
| Spelling out a word (e.g., a name) | Not specified | Capital letters separated by dashes → S-M-I-T-H | 🆕 NEW |
| Unintentional repetitions ("the the problem") | Offset with commas | Keep all; tight repetitions may render without commas | ⚠️ CHANGED |
| Background speech (TV, bystanders, etc.) | Not specified | Insert [BG SPEECH] tag in place of the speech | 🆕 NEW |
| Filler words (like, you know, I mean) | Keep, comma-offset | Keep, comma-offset | Unchanged |
| False starts / self-corrections | Two hyphens (--) + space | Two hyphens (--) + space | Unchanged |
| Informal contractions (tryna, gonna, dunno) | Keep as spoken | Keep as spoken | Unchanged |
| Emphasized / elongated words | Don't change spelling | Don't change spelling | Unchanged |
| Grammar mistakes | Don't correct | Don't correct | Unchanged |
| Pronunciation mistakes | Correct | Correct | Unchanged |
| Speaker identification (Speaker 1, Speaker 2) | Required formatting | Required formatting | Unchanged |
| Speakers' sounds: (laughing), (gasping), etc. | Lower-case present continuous, in round brackets, in-line | Lower-case present continuous, in round brackets, in-line | Unchanged |
| Foreign language tagging: [foreign language 00:00:02] | Italicize uncommon foreign words; tag long stretches | Italicize uncommon foreign words; tag long stretches | Unchanged |
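
As an illustration only, two of the formatting rules in the table (spelled-out words and alphanumeric codes with a spoken "dash") can be sketched in Python. The function names `spell_out` and `join_code` are hypothetical and are not part of any HappyScribe tooling. The sketch assumes the spoken pieces have already been split into tokens with the intended casing.

```python
def spell_out(letters):
    """Spelled-out word: capital letters separated by dashes (S-M-I-T-H)."""
    return "-".join(letter.upper() for letter in letters)


def join_code(tokens):
    """Arbitrary alphanumeric code: concatenate the tokens regardless of pacing.

    A spoken "dash" is rendered as a literal dash. Token casing is left
    untouched (assumption: the caller supplies the intended casing).
    """
    return "".join("-" if token.lower() == "dash" else token for token in tokens)


# Usage
print(spell_out(["s", "m", "i", "t", "h"]))   # S-M-I-T-H
print(join_code(["D", "48", "A", "A"]))       # D48AA
print(join_code(["A", "dash", "37", "B"]))    # A-37B
```

Brand and product names are deliberately left out of the sketch, since the table requires their official written form (for example, Xbox 360), which cannot be derived mechanically.
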
What is verbatim for AI training?
Verbatim for AI training is a customer-specific transcription service level that differs from HappyScribe's standard verbatim guidelines.
<aside> 🧑‍🤝‍🧑
Quality tip! The following guidelines supplement the general guidelines for transcription formatting, sentence structure, symbols, italics, etc.
However, the instructions below must always be followed when working on these verbatim files, unless Admin specifies otherwise.
</aside>
<aside> 🆕
NEW RULE, Core Philosophy: Preserve Everything. The goal is to retain as much of the target speaker's content as possible. Downstream, the customer can always remove disfluencies, fillers, and hesitations, but they cannot recover information that was never transcribed. When in doubt, keep it.
</aside>
Transcriptions will be required in the following languages:
To make sure speakers can be distinguished while remaining anonymous:
Use Speaker 1, Speaker 2, etc. as labels. Do not use the speakers' names.
If the name belongs to a company entity or a celebrity, you can still write it.
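
The labeling rule above can be sketched as a small Python helper. This is a minimal illustration, not an official tool: the function `label_speakers` and its `keep_names` parameter are hypothetical, and treating the company or celebrity exception as "keep the name as the label" is one possible reading of the guideline.

```python
def label_speakers(turns, keep_names=frozenset()):
    """Replace speaker names with Speaker 1, Speaker 2, ... in order of first appearance.

    turns: list of (name, text) pairs.
    keep_names: names that may stay as written (assumption: company
    entities or celebrities, per the exception in the guideline).
    """
    labels = {}
    labeled = []
    for name, text in turns:
        if name in keep_names:
            label = name
        else:
            if name not in labels:
                # Number anonymous speakers in the order they first speak.
                labels[name] = f"Speaker {len(labels) + 1}"
            label = labels[name]
        labeled.append((label, text))
    return labeled


# Usage
turns = [("Anna", "Hi."), ("Ben", "Hello."), ("Anna", "Um, okay.")]
for label, text in label_speakers(turns):
    print(f"{label}: {text}")
# Speaker 1: Hi.
# Speaker 2: Hello.
# Speaker 1: Um, okay.
```
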
If it is a complete word, transcribe it.