§ Widget · 01 Prefix ambiguity: a vocabulary with five plausible next tokens

Line 1 shows how BPE segments the full sequence. Line 2 zooms in on the step immediately after the token <ACGTA>: five different vocabulary entries can each extend this DNA prefix, but only one of them matches the original segmentation. Click a candidate to see how teacher forcing scores it.
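The ambiguity comes down to a prefix test: any vocabulary entry that is a prefix of the upcoming text is a valid next token. The sketch below enumerates those candidates. The vocabulary is hypothetical, chosen only so that five entries happen to fit; it is not the widget's actual vocabulary, and `consistent_candidates` is an illustrative helper, not part of any tokenizer library.

```python
def consistent_candidates(vocab, remaining):
    """Return every vocab entry that is a prefix of the upcoming text, shortest first."""
    return sorted((tok for tok in vocab if remaining.startswith(tok)), key=len)


# Hypothetical DNA vocabulary (illustrative only, not the widget's real vocabulary).
vocab = {"A", "C", "G", "T", "AC", "TA", "GG", "TC", "TCG", "TCGTA", "TCGTATAGG", "ACGTA"}

# Text that follows the <ACGTA> token in this example.
remaining = "TCGTATAGG"

print(consistent_candidates(vocab, remaining))
# ['T', 'TC', 'TCG', 'TCGTA', 'TCGTATAGG']
```

All five returned entries reproduce the same bases. Which one the training target uses depends on how BPE segmented the full sequence, not on the prefix alone.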

Widget panels: Actual tokenization · Equivalent tokenization options (legend: input token, candidate; columns: character level, token level)
Whichever row you pick, BPE teacher forcing credits only <TCG> as correct, even though all five candidates are consistent with the next nine bases, TCGTATAGG, and introduce no character-level error.
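To make the gap concrete, the sketch below compares two scores for one prediction step. The first is the token-level cross-entropy that teacher forcing uses, which counts only the BPE target <TCG>. The second is a character-level alternative that sums the probability of every candidate consistent with the upcoming bases. The next-token probabilities are invented for illustration, and the summed "character-level" score is an assumed way to express the idea, not necessarily how the widget computes its character-level column.

```python
import math

# Hypothetical next-token distribution after <ACGTA> (made-up values for illustration).
probs = {"T": 0.10, "TC": 0.15, "TCG": 0.20, "TCGTA": 0.25, "TCGTATAGG": 0.20, "A": 0.10}

remaining = "TCGTATAGG"  # the next nine bases
target = "TCG"           # the token BPE actually emitted at this step

# Teacher forcing: cross-entropy against the single BPE target token only.
token_level_loss = -math.log(probs[target])

# Character-level view (assumed): any token that is a prefix of the upcoming bases
# makes no error, so all of their probability mass counts as correct.
consistent = [tok for tok in probs if remaining.startswith(tok)]
char_level_mass = sum(probs[tok] for tok in consistent)
char_level_loss = -math.log(char_level_mass)

print(f"token-level loss: {token_level_loss:.3f}")      # -log(0.20)
print(f"character-level loss: {char_level_loss:.3f}")   # -log(0.90)
```

Under teacher forcing, the 0.70 of probability mass placed on <T>, <TC>, <TCGTA>, and <TCGTATAGG> is penalized exactly like mass on a wrong base such as <A>, even though those four candidates reproduce the sequence correctly.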