<aside>

How to read this doc:

🆕 NEW RULE: added by the customer; not in the previous guidelines.

⚠️ CHANGED: overrides a rule from the previous guidelines.

</aside>

What's changed at a glance

| Area | Previous | New | Status |
|---|---|---|---|
| Core philosophy | Not stated explicitly | Preserve Everything: when in doubt, keep it | 🆕 NEW |
| Partial-word stutters (e.g., "Wh-wh-what") | Do not transcribe; drop the fragments → "What makes you think that?" | Keep the fragments with hyphens → "Wh-wh-what makes you think that?" | ⚠️ CHANGED |
| Complete-word repetitions ("I I I went") | Keep without hyphens | Keep without hyphens | Unchanged |
| Hesitations (um, uh, er, ah, mm) | Treated as filler/crutch words; keep, comma-offset | Explicit category; keep all, comma-offset where natural | 🆕 Clarified |
| Brand & product names | Concatenate: Xbox360, iPhone13Pro | Use the official written form: Xbox 360, iPhone 13 Pro, Windows 11, PlayStation 5 | ⚠️ CHANGED |
| Arbitrary alphanumeric codes | Concatenate: D48AA | Concatenate: D48AA, RX7, B2B, 3DPrinting (regardless of pacing) | Unchanged (expanded examples) |
| Spoken "dash" inside a code | Not specified | Render as a literal dash → A-37B, KL-902 | 🆕 NEW |
| Personal data formats (phone, email, address, SSN, credit card, dates) | General "transcribe all PII" rule; no formatting spec | Use the official/standard format, regardless of how the speaker paces it | 🆕 NEW |
| Spelling out a word (e.g., a name) | Not specified | Capital letters separated by dashes → S-M-I-T-H | 🆕 NEW |
| Unintentional repetitions ("the the problem") | Offset with commas | Keep all; tight repetitions may render without commas | ⚠️ CHANGED |
| Background speech (TV, bystanders, etc.) | Not specified | Insert a [BG SPEECH] tag in place of the speech | 🆕 NEW |
| Filler words (like, you know, I mean) | Keep, comma-offset | Keep, comma-offset | Unchanged |
| False starts / self-corrections | Two hyphens (--) + space | Two hyphens (--) + space | Unchanged |
| Informal contractions (tryna, gonna, dunno) | Keep as spoken | Keep as spoken | Unchanged |
| Emphasized / elongated words | Don't change spelling | Don't change spelling | Unchanged |
| Grammar mistakes | Don't correct | Don't correct | Unchanged |
| Pronunciation mistakes | Correct | Correct | Unchanged |
| Speaker identification (Speaker 1, Speaker 2) | Required formatting | Required formatting | Unchanged |
| Speakers' sounds: (laughing), (gasping), etc. | Lower-case present continuous, in round brackets, in-line | Lower-case present continuous, in round brackets, in-line | Unchanged |
| Foreign language tagging: [foreign language 00:00:02] | Italicize uncommon foreign words; tag long stretches | Italicize uncommon foreign words; tag long stretches | Unchanged |

General Information

What is verbatim for AI training?

Verbatim for AI training is a customer-specific transcription service level that differs from HappyScribe's standard verbatim guidelines.

<aside> 🧑‍🤝‍🧑

Quality tip! The following guidelines supplement the general guidelines on transcription formatting, sentence structure, symbols, italics, etc.

Where the two differ, the instructions below take precedence. Always follow them when working on these verbatim files, unless an Admin specifies otherwise.

</aside>

<aside> 🆕

NEW RULE. Core Philosophy: Preserve Everything. The goal is to retain as much of the target speaker's content as possible. Downstream, the customer can always remove disfluencies, fillers, and hesitations, but they cannot recover information that was never transcribed. When in doubt, keep it.

</aside>

Languages and services

Transcriptions will be required in the following languages:

Speaker identification

To make sure speakers can be distinguished while remaining anonymous:

Use Speaker 1, Speaker 2, etc. as labels. Do not use the speakers' names.

If the speaker is a company entity or a celebrity, you may still write their name.

Transcription Logic

Complete words

If it is a complete word, transcribe it.

Partial words / False starts