
Match Chinese Characters

Feb 25, 2021 · 2min

When you need to detect whether a string contains Chinese characters, you would commonly think of doing it with RegExp, or grabbing a ready-to-use package from npm.

If you Google it, you are likely to end up with this solution:

/[\u4E00-\u9FCC\u3400-\u4DB5\uFA0E\uFA0F\uFA11\uFA13\uFA14\uFA1F\uFA21\uFA23\uFA24\uFA27-\uFA29]|[\uD840-\uD868][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|[\uD86A-\uD86C][\uDC00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D]/

It works, but it's a bit dirty. Fortunately, I found a much simpler solution today:

/\p{Script=Han}/u
!!'你好'.match(/\p{Script=Han}/u) // true
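
To make the one-liner easier to reuse, here is a minimal sketch that wraps it in two small helpers. The names `containsHan` and `extractHan` are made up for this example and are not part of any library. Note that the `u` flag is required for `\p{...}` to work, and it also makes the pattern match characters outside the Basic Multilingual Plane (such as `𠮷`) as a single code point, which is what the long surrogate-pair RegExp was doing by hand.

```javascript
// Hypothetical helpers built on the property escape shown above.
const HAN = /\p{Script=Han}/u;

// True if the string contains at least one Han character.
function containsHan(text) {
  return HAN.test(text);
}

// Returns every run of consecutive Han characters.
// The `g` flag makes `match` return all runs instead of only the first one.
function extractHan(text) {
  return text.match(/\p{Script=Han}+/gu) || [];
}

console.log(containsHan('hello'));         // false
console.log(containsHan('hello 你好'));     // true
console.log(containsHan('𠮷'));            // true (a character outside the BMP)
console.log(extractHan('Hi 你好, 世界!'));  // [ '你好', '世界' ]
```

Using `.test()` instead of `!!str.match()` avoids building a match array when only a boolean is needed.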

It’s called Unicode property escapes, and it’s already available in Chrome 64, Firefox 79, Safari 11.1, and Node.js 10.
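
If you still have to support older engines, one possible approach is to feature-detect the syntax at runtime. The sketch below is only an illustration: `supportsPropertyEscapes` and `hanPattern` are hypothetical names, and the fallback range is my own rough approximation that covers just two BMP blocks (CJK Unified Ideographs and Extension A), not the full Han script.

```javascript
// Build the pattern through the RegExp constructor so that an engine without
// support throws a catchable SyntaxError at runtime, instead of failing to
// parse the whole file because of a RegExp literal it does not understand.
function supportsPropertyEscapes() {
  try {
    new RegExp('\\p{Script=Han}', 'u');
    return true;
  } catch (error) {
    return false;
  }
}

// Assumption: the fallback is a simplified range, not an exact equivalent.
const hanPattern = supportsPropertyEscapes()
  ? new RegExp('\\p{Script=Han}', 'u')
  : /[\u3400-\u4DBF\u4E00-\u9FFF]/;

console.log(hanPattern.test('你好')); // true
console.log(hanPattern.test('abc'));  // false
```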

All available scripts here.
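
The same syntax works for other scripts by swapping the name after `Script=` (the short alias `sc=` is also accepted). As a small sketch, assuming a hand-picked list of script names and a hypothetical `detectScripts` helper:

```javascript
// A hand-picked subset of script names, chosen only for this example.
const SCRIPTS = ['Han', 'Hiragana', 'Katakana', 'Hangul', 'Cyrillic', 'Latin'];

// Returns the scripts from SCRIPTS that appear at least once in the text.
function detectScripts(text) {
  return SCRIPTS.filter((name) =>
    new RegExp(`\\p{Script=${name}}`, 'u').test(text)
  );
}

console.log(detectScripts('Привет, 你好'));          // [ 'Han', 'Cyrillic' ]
console.log(detectScripts('ひらがな and カタカナ'));   // [ 'Hiragana', 'Katakana', 'Latin' ]
```

Punctuation, digits, and spaces belong to the `Common` script, so they do not show up in any of the results above.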

CC BY-NC-SA 4.0 2021-PRESENT © Anthony Fu