How this page breaks Japanese lines
Browsers break Japanese between almost any two characters, so words split in half and punctuation lands at the start of a line. Here is how this site avoids that — phrase segmentation at build time plus two lines of CSS.
Open a Japanese sentence in a narrow column and watch where the browser breaks it. It will happily split 特定商取引法 into 特定商取引 / 法, or push a 。 to the start of the next line. An English page that broke “important” as impor / tant would look broken to you.
Most sites ship exactly that. It is the kind of thing you only notice if you read the page in Japanese, which is most of the point of this whole site.
The rule we actually want
Japanese wraps at phrase boundaries: 文節, roughly a content word plus its trailing particles. It also follows 禁則 (kinsoku): a closing bracket or a 。 must never start a line.
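To make the 禁則 half of the rule concrete, here is a minimal sketch of a checker that flags lines starting with a character that should not start one. The character set is a small illustrative subset chosen for this example, not the full set browsers implement, and findKinsokuViolations is a hypothetical helper name.

```javascript
// Characters that should not begin a line. This is a small illustrative
// subset (an assumption for this example), not the complete 禁則 set.
const NO_LINE_START = new Set(["。", "、", "）", "」", "』", "】", "！", "？"]);

// Returns the indexes of the lines whose first character breaks the rule.
function findKinsokuViolations(lines) {
  const bad = [];
  lines.forEach((line, i) => {
    const first = Array.from(line)[0]; // undefined for an empty line
    if (first !== undefined && NO_LINE_START.has(first)) bad.push(i);
  });
  return bad;
}

// The second line starts with 。, so index 1 is flagged.
console.log(findKinsokuViolations(["表示ページが無い", "。次の文"]));
```

A check like this only describes the rule; it does nothing to enforce it while the text is being laid out.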
CSS gives you half of it for free:
.prose {
  line-break: strict; /* keep 。 、 ) off the start of a line */
  word-break: keep-all; /* never break inside a run of characters */
  overflow-wrap: break-word; /* last resort: break mid-run rather than overflow */
}
line-break: strict handles the kinsoku edge. word-break: keep-all tells the browser to stop breaking between characters at all. But now almost nothing breaks: a long sentence either overflows the column or, with the overflow-wrap fallback, gets cut at an arbitrary character. We have to hand the browser the break points back, and the right ones this time.
Finding the phrases
The break points are the phrase boundaries, and finding them means segmenting Japanese, which is the hard part. I use BudouX, Google’s small phrase model. It turns a sentence into chunks:
import { loadDefaultJapaneseParser } from "budoux";
const parser = loadDefaultJapaneseParser();
parser.parse("特定商取引法の表示ページが無い。");
// → ["特定商取引法の", "表示ページが", "無い。"]
Then I join the chunks with <wbr>, the “break here if you must” tag. With word-break: keep-all in force, the browser breaks only at those points:
- <p>特定商取引法の表示ページが無い。</p>
+ <p>特定商取引法の<wbr>表示ページが<wbr>無い。</p>
Notice the 。 stays inside the last chunk, so no break point can strand it at the start of a line.
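The joining step is small enough to sketch in full. The version below is a minimal illustration, not the site's actual code: joinWithWbr, escapeHtml, and wrapAtChunks are hypothetical names, and wrapAtChunks is only a rough model of the browser's behaviour that counts characters instead of measuring rendered width.

```javascript
// Escape text so it is safe to place inside HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Join phrase chunks with <wbr> so every boundary becomes an optional break.
function joinWithWbr(chunks) {
  return chunks.map(escapeHtml).join("<wbr>");
}

// Rough model of the layout that follows: fill each line greedily and
// break only between chunks. Width is counted in characters, which is a
// simplification of real text layout.
function wrapAtChunks(chunks, maxChars) {
  const lines = [];
  let current = "";
  for (const chunk of chunks) {
    if (current !== "" && Array.from(current + chunk).length > maxChars) {
      lines.push(current);
      current = chunk;
    } else {
      current += chunk;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}

const chunks = ["特定商取引法の", "表示ページが", "無い。"];
console.log(`<p>${joinWithWbr(chunks)}</p>`);
// At 10 characters per line the break falls after 特定商取引法の.
console.log(wrapAtChunks(chunks, 10));
```

A chunk longer than the line is left whole in this model; on the page, that is the case the overflow-wrap fallback is there for.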
I run this at build time, not in the browser. A small pass walks the rendered HTML, inserts <wbr> into Japanese text, and skips anything inside <code> or <pre> so code samples are left alone. The model stays on the build machine. The reader downloads a few <wbr> tags and no JavaScript.
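Here is a minimal sketch of what such a pass could look like, under some loud assumptions. insertWbr and toySegment are hypothetical names, the regex tokenizer is a simplification (a real pass would more likely use a proper HTML parser), and the segmenter is passed in as a function so the example runs without the model. In a real build, segment would presumably be something like (text) => parser.parse(text).

```javascript
// Matches HTML comments and tags; everything between matches is text.
// Simplification: an attribute value containing ">" would confuse this.
const TAG_OR_COMMENT = /<!--[\s\S]*?-->|<\/?([a-zA-Z][a-zA-Z0-9-]*)[^>]*>/g;
// Hiragana, katakana, and the common kanji block.
const JAPANESE = /[\u3040-\u30ff\u4e00-\u9fff]/;
// Elements whose contents are left untouched.
const SKIP = new Set(["code", "pre"]);

// Inserts <wbr> between the chunks that segment(text) returns, for every
// Japanese text run outside <code> and <pre>.
function insertWbr(html, segment) {
  let out = "";
  let last = 0;
  let skipDepth = 0; // > 0 while inside <code> or <pre>
  const handleText = (text) => {
    if (skipDepth > 0 || !JAPANESE.test(text)) return text;
    return segment(text).join("<wbr>");
  };
  for (const m of html.matchAll(TAG_OR_COMMENT)) {
    out += handleText(html.slice(last, m.index));
    out += m[0];
    last = m.index + m[0].length;
    const name = m[1] && m[1].toLowerCase();
    if (name && SKIP.has(name)) {
      if (m[0].startsWith("</")) skipDepth = Math.max(0, skipDepth - 1);
      else if (!m[0].endsWith("/>")) skipDepth += 1;
    }
  }
  return out + handleText(html.slice(last));
}

// Toy stand-in for the phrase model: split after の or が. NOT BudouX.
const toySegment = (text) => text.split(/(?<=[のが])/);

const page = "<p>特定商取引法の表示ページが無い。</p><pre>表示ページが無い</pre>";
console.log(insertWbr(page, toySegment));
```

The paragraph gets its two <wbr> tags and the <pre> block comes out unchanged. Because the sketch hands text runs to the segmenter as they appear in the HTML, character entities such as &amp;amp; pass through unsplit only if the segmenter leaves them alone, which is one more reason to treat this as an illustration.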
Where it stops
BudouX is a model, not a rulebook, so it is about right, not exactly right. It occasionally splits a rare compound in a place a typographer wouldn’t, and it has nothing to say about full justification or 約物 spacing. For body text at a normal measure I have not needed to correct it by hand yet. If I do, I will say so here.
The honest limit is the usual one: this fixes the mechanical part. It cannot tell you the Japanese was worth reading. That is still a human call.
Here it is, live. Drag the width down and watch the left column break words in half while the right one holds:
[Interactive demo: left column “browser default”, right column “budoux (this site)”]