Why does a Japanese or Chinese string overflow a terminal table?

Most CJK characters are one code point but occupy two columns in a monospace grid — Unicode calls this East Asian Width. Layout code that budgets by character or rune count charges a wide character as one column, so East Asian rows overflow the box. Measure display width instead — runewidth.StringWidth in Go, the unicode-width crate in Rust, wcwidth in Python, string-width in JS.

What is the difference between string length and display width?

Length counts encoding units: UTF-16 code units, or code points (runes, in Go). Display width counts the columns a monospace terminal uses to draw the text. A kanji is one code point but two columns, so the two numbers disagree for any CJK, full-width, or wide-emoji text. Terminals align by display width, not length.

How do I measure the display width of a string?

Use a display-width library — runewidth.StringWidth in Go, the unicode-width crate in Rust, wcwidth in Python, string-width in JS. When you slice, iterate by grapheme cluster so a combining mark stays with its base. Ambiguous-width characters (East Asian Width category A) have no single right answer; they depend on locale and font.

String length is not display width: the CJK bug hiding in tables and terminals

A command-line tool prints a table. The ASCII rows line up. The row with a Japanese name pushes its right border two columns past the rest and the box tears open. Nobody typed anything wrong. The layout measured that name by counting characters, and a kanji counts as one character but fills two columns.

This one keeps coming back. In a corpus of 97 real CJK and Unicode bugs I’ve catalogued from open-source libraries, six are width and normalization bugs, and the display-width variant turns up in table renderers, CLI progress bars, and editor autocomplete alike. Same mistake, different repo.

A character can be one code point and two columns wide

Terminals are grids. A Latin letter fills one cell. A wide character — most kanji, hanzi, hangul, full-width forms — fills two. Unicode calls this East Asian Width. So there are at least four different numbers you can get back when you “measure” a string, and they are not the same:

What you measure	JS	Go	Rust	Python	When it is wrong
UTF-16 code units	`s.length`	n/a	n/a	n/a	splits astral chars (𠮷 counts as 2)
Code points / runes	`[...s].length`	`utf8.RuneCountInString`	`s.chars().count()`	`len(s)`	ignores width (生 counts as 1)
Grapheme clusters	`Intl.Segmenter`	`rivo/uniseg`	`unicode-segmentation`	`regex` module (`\X`)	still not columns
Display columns	`string-width`	`mattn/go-runewidth`	`unicode-width`	`wcwidth`	this is the one terminals use

For alignment and truncation in a fixed-width grid you want the last row. Every row above it counts something other than columns, and for wide text that usually means undercounting.
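To make the table concrete, here is a minimal Python sketch that takes three of those measurements for the same strings. It uses only the standard-library unicodedata module so it runs without dependencies. The names `utf16_units`, `char_width`, and `display_width` are hypothetical helpers, and the width rule (Wide and Fullwidth count 2, combining marks and format characters count 0, everything else 1) is a simplification of what wcwidth or go-runewidth actually do.

```python
import unicodedata


def utf16_units(s: str) -> int:
    """What JS `s.length` reports: the number of UTF-16 code units."""
    return len(s.encode("utf-16-le")) // 2


def char_width(ch: str) -> int:
    """Approximate terminal columns for one code point (simplified rule)."""
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters such as ZWJ
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # East Asian Wide and Fullwidth
    return 1  # everything else, including Ambiguous, treated as narrow here


def display_width(s: str) -> int:
    """Sum of per-code-point widths; an approximation of rendered columns."""
    return sum(char_width(ch) for ch in s)


for s in ("abc", "生命", "𠮷"):
    # columns printed: code units, code points, display columns
    print(repr(s), utf16_units(s), len(s), display_width(s))
# 'abc' 3 3 3
# '生命' 2 2 4
# '𠮷' 2 1 2
```

Only the last number is safe to use as a column budget.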

The bug: budgeting by rune count instead of width

I hit this in go-pretty, the Go table and progress-bar library. Its text.Trim kept maxLen runes regardless of how wide they were. But every caller passes a display-width budget: the table trims a rendered line only after its measured width exceeds WidthMax, and the progress renderer trims to terminal width. So a two-column character was charged as one column, and wide East Asian rows overflowed the box.

The before:

text.Trim("生命生命", 4)          // => "生命生命"   display width 8, budget was 4
text.Snip("生命生命生命", 5, "~") // => "生命生命~"  overflows the 5-column budget

The fix adds RuneWidth per rune and stops once a rune no longer fits, while still copying any trailing escape sequences so color codes stay closed. Twenty-six lines, with wide-character cases added to the existing TestTrim and TestSnip. It merged.
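The shape of that fix can be sketched in a few lines of Python. This is an illustration of the idea, not the go-pretty code: `trim_to_width` and `snip` are hypothetical names, the width rule is a simplified one built on the standard-library unicodedata module, and the escape-sequence handling from the real patch is left out.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate columns for one code point (simplified rule)."""
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def display_width(s: str) -> int:
    return sum(char_width(ch) for ch in s)


def trim_to_width(s: str, max_cols: int) -> str:
    """Keep leading characters while their total width fits in max_cols."""
    out = []
    used = 0
    for ch in s:
        w = char_width(ch)
        if used + w > max_cols:
            break  # this character no longer fits, so stop here
        out.append(ch)
        used += w
    return "".join(out)


def snip(s: str, max_cols: int, indicator: str) -> str:
    """Trim and append an indicator, keeping the result within max_cols."""
    if display_width(s) <= max_cols:
        return s
    budget = max(0, max_cols - display_width(indicator))
    return trim_to_width(s, budget) + indicator


print(trim_to_width("生命生命", 4))   # 生命
print(snip("生命生命生命", 5, "~"))   # 生命~
```

The budget is spent in columns, so a wide character costs 2 and the loop stops one character earlier than a rune counter would.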

Receipt: jedib0t/go-pretty#410 (merged)

The same bug, one repo over

The reason I call this a pattern and not an incident: the identical mistake sits open in the micro editor. Its command-bar autocomplete scrolls by CharacterCountInString while the renderer draws by runewidth, so a suggestion containing full-width CJK gets pushed partly or fully off screen. Different codebase, the same two functions disagreeing about what a “length” is.

Same pattern: micro-editor/micro#4135 (open) — measures scroll by character count, draws by display width.
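The disagreement is easy to reproduce in isolation. The following Python sketch is not micro's code: `fits_by_count` and `fits_by_width` are hypothetical stand-ins for a scroll calculation that counts characters and a renderer that counts columns, and the width rule is a simplified one based on the standard-library unicodedata module.

```python
import unicodedata


def display_width(s: str) -> int:
    """Approximate columns: Wide/Fullwidth are 2, combining marks 0, rest 1."""
    total = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


def fits_by_count(text: str, cols: int) -> bool:
    """What a character-counting scroll calculation believes."""
    return len(text) <= cols


def fits_by_width(text: str, cols: int) -> bool:
    """What a width-based renderer actually needs."""
    return display_width(text) <= cols


suggestion = "ファイル.txt"  # 4 wide katakana + 4 ASCII characters
cols = 8

print(fits_by_count(suggestion, cols))  # True: 8 characters
print(fits_by_width(suggestion, cols))  # False: 12 columns
```

When the two answers differ, the scroll logic decides nothing needs to move and the renderer runs out of room.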

If you maintain anything that lays text out in cells, this is the first place to look.

Where display width gets genuinely hard

I don’t want to oversell the fix, because the easy 90% (kanji is 2, ASCII is 1) hides a messy tail where “width” is not even well defined:

Ambiguous width. East Asian Width category A (some symbols, box-drawing, Greek in a CJK context) is one column or two depending on locale and font. go-runewidth exposes an EastAsianWidth flag for exactly this. I filed a related one in Zed, where the block cursor was misaligned over ambiguous-width characters because it used the glyph’s intrinsic width, not the rendered cell width (zed-industries/zed#60017, open).
Combining marks. A base plus a combining mark is two code points, one grapheme, and the width of the base. Width-aware code that iterates code points instead of grapheme clusters can split a dakuten off its kana. That is what tabled did when wrapping table cells (zhiburt/tabled#585, open).
ZWJ emoji. A family emoji is many code points joined by zero-width joiners, one grapheme, usually width 2. Counting code points here is nonsense.
Terminals disagree. Emulators don’t all implement the same width tables, so a “correct” width can still render one column off in some terminal. This part is genuinely underspecified, not a library bug.

So: use a display-width library for the budget, iterate by grapheme cluster when you slice, and accept that ambiguous width has no single right answer.
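Two of those cases can be shown in a short Python sketch, with loud caveats. The `ambiguous_wide` parameter is a hypothetical flag playing the role that a locale setting plays in a real library, and `clusters` is only an approximation that glues zero-width code points onto the preceding base. It is not a full grapheme segmenter and will not keep a ZWJ emoji sequence together; real code should use a proper segmentation library for that.

```python
import unicodedata


def char_width(ch: str, ambiguous_wide: bool = False) -> int:
    """Approximate columns; Ambiguous (A) is 1 or 2 depending on the flag."""
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F") or (eaw == "A" and ambiguous_wide):
        return 2
    return 1


def clusters(s: str) -> list[str]:
    """Rough approximation: attach zero-width code points to their base."""
    out: list[str] = []
    for ch in s:
        if out and char_width(ch) == 0:
            out[-1] += ch
        else:
            out.append(ch)
    return out


def wrap(s: str, cols: int, ambiguous_wide: bool = False) -> list[str]:
    """Break s into lines of at most cols columns, never inside a cluster."""
    lines, cur, used = [], "", 0
    for cl in clusters(s):
        w = sum(char_width(c, ambiguous_wide) for c in cl)
        if cur and used + w > cols:
            lines.append(cur)
            cur, used = "", 0
        cur += cl
        used += w
    if cur:
        lines.append(cur)
    return lines


# Box-drawing U+2500 is East Asian Width category A.
print(char_width("─"), char_width("─", ambiguous_wide=True))  # 1 2

# か + combining dakuten (U+3099), then き: the mark stays on line one.
print(wrap("か\u3099き", 2) == ["か\u3099", "き"])  # True
```

The flag does not make the ambiguity go away; it only moves the decision to whoever knows the terminal's locale and font.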

How to test it in five minutes

Feed the layout one wide string and assert on the rendered columns, not the character count.

Trim and truncate: assert width(trim("生命生命", 4)) <= 4. Rune-count code returns the whole string here.
Alignment and padding: render a table with one ASCII row and one CJK row of the same character count; the borders should still line up.
Wrapping: wrap a string with a combining mark at the boundary and check the mark stays with its base.
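Those three checks fit in one small Python file. Everything under test here is a hypothetical stand-in: `trim`, `render_row`, and `wrap` represent whatever your layout code exposes, and `width` is a simplified rule built on the standard-library unicodedata module rather than a real width library. `naive_trim` is included only to show what the rune-count version returns.

```python
import unicodedata


def char_width(ch: str) -> int:
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def width(s: str) -> int:
    return sum(char_width(ch) for ch in s)


# --- hypothetical layout functions under test ---
def trim(s: str, cols: int) -> str:
    out, used = "", 0
    for ch in s:
        if used + char_width(ch) > cols:
            break
        out += ch
        used += char_width(ch)
    return out


def naive_trim(s: str, n: int) -> str:
    return s[:n]  # budgets by code point count, the bug being tested for


def render_row(cell: str, cols: int) -> str:
    return "|" + cell + " " * (cols - width(cell)) + "|"


def wrap(s: str, cols: int) -> list[str]:
    lines, cur = [], ""
    for ch in s:
        # Only break before a character that takes up columns.
        if cur and char_width(ch) > 0 and width(cur) + char_width(ch) > cols:
            lines.append(cur)
            cur = ""
        cur += ch
    if cur:
        lines.append(cur)
    return lines


def test_trim():
    assert width(trim("生命生命", 4)) <= 4
    assert width(naive_trim("生命生命", 4)) == 8  # returns the whole string


def test_alignment():
    rows = [render_row("abcd", 10), render_row("生命生命", 10)]
    assert len({width(r) for r in rows}) == 1  # borders line up


def test_wrap():
    lines = wrap("か\u3099き", 2)
    assert not any(unicodedata.combining(line[0]) for line in lines)


for t in (test_trim, test_alignment, test_wrap):
    t()
```

Swapping `trim` for `naive_trim` in the first test is a quick way to confirm the assertion fails on rune-count code.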

None of this needs a Japanese keyboard. 生命 and one width assertion catch the common case.

The full width and normalization set, with repros and the sibling that already did it right, is in the CJK failure corpus. The nearest neighbours are measuring problems too: walking a string by the wrong unit is where a slice cuts a character in half, and a word counter that reads a Japanese paragraph as one word is the same “counted the wrong thing” mistake one level up. When the text is Japanese specifically, a romaji table that drifts one row is the pattern moved into transliteration.
