Open Chinese Convert 開放中文轉換


Introduction 介紹

Open Chinese Convert (OpenCC, 開放中文轉換) is an open-source project for converting between Traditional Chinese, Simplified Chinese and Japanese Kanji (Shinjitai). It supports character-level and phrase-level conversion, character variant conversion, and conversion of regional idioms between Mainland China, Taiwan and Hong Kong. It is not a translation tool between Mandarin and Cantonese or other languages.

中文簡繁轉換開源項目,支持詞彙級別的轉換、異體字轉換和地區習慣用詞轉換(中國大陸、台灣、香港、日本新字體)。不提供普通話與粵語的轉換。

Discussion (Telegram): https://t.me/open_chinese_convert

Features 特點

Installation 安裝

Package Managers 包管理器

Prebuilt 預編譯

Usage 使用

Online 線上轉換

https://opencc.js.org/converter?config=s2t

Node.js

npm install opencc

The npm package supports Node.js >=20.17 <26. It uses bundled Node-API prebuilds when available and falls back to a local node-gyp build when the current platform does not have a matching prebuild.

To install the npm CLI:

npm install -g opencc
opencc -c s2t.json -i input.txt -o output.txt

The npm CLI supports basic text conversion. Plugins, --inspect, and --segmentation require the native OpenCC CLI.

import { OpenCC } from 'opencc';

async function main() {
  // Load the Simplified-to-Traditional configuration.
  const converter: OpenCC = new OpenCC('s2t.json');
  const result: string = await converter.convertPromise('汉字');
  console.log(result); // 漢字
}

main();

See demo.js and ts-demo.ts.

Python

pip install opencc (Windows, Linux, macOS)

import opencc
converter = opencc.OpenCC('s2t.json')
converter.convert('汉字')  # 漢字

C++

#include "opencc.h"

int main() {
  const opencc::SimpleConverter converter("s2t.json");
  converter.Convert("汉字");  // 漢字
  return 0;
}

Full example with Bazel

C

#include <string.h>  /* strlen */

#include "opencc.h"

int main() {
  opencc_t opencc = opencc_open("s2t.json");
  const char* input = "汉字";
  char* converted = opencc_convert_utf8(opencc, input, strlen(input));  // 漢字
  opencc_convert_utf8_free(converted);
  opencc_close(opencc);
  return 0;
}

Full Document 完整文檔

Command Line

Segmentation and Inspection Modes

OpenCC CLI supports two diagnostic modes that output JSON instead of converted text:

--segmentation — Output segmentation result only (no conversion):

echo "他只看了几行日志,就一叶知秋,猜到整个系统是数据库连接池出了问题" | opencc -c s2twp.json --segmentation

{"input":"他只看了几行日志,就一叶知秋,猜到整个系统是数据库连接池出了问题","segments":["他","只看","了几行","日志",",就","一叶知秋",",猜到","整个","系统","是","数据库","连接池","出了","问题"]}

--inspect — Output full inspection result (segmentation + per-stage conversion + final output):

echo "他只看了几行日志,就一叶知秋,猜到整个系统是数据库连接池出了问题" | opencc -c s2twp.json --inspect

{"input":"他只看了几行日志,就一叶知秋,猜到整个系统是数据库连接池出了问题","segments":["他","只看","了几行","日志",",就","一叶知秋",",猜到","整个","系统","是","数据库","连接池","出了","问题"],"stages":[{"index":1,"segments":["他","只看","了幾行","日誌",",就","一葉知秋",",猜到","整個","系統","是","數據庫","連接池","出了","問題"]},{"index":2,"segments":["他","只看","了幾行","日誌",",就","一葉知秋",",猜到","整個","系統","是","資料庫","連線池","出了","問題"]},{"index":3,"segments":["他","只看","了幾行","日誌",",就","一葉知秋",",猜到","整個","系統","是","資料庫","連線池","出了","問題"]}],"output":"他只看了幾行日誌,就一葉知秋,猜到整個系統是資料庫連線池出了問題"}

Pretty-print with jq:

echo "他只看了几行日志,就一叶知秋,猜到整个系统是数据库连接池出了问题" | opencc -c s2twp.json --inspect | jq .

These modes are useful for diagnosing conversion issues:

  1. Use --segmentation to verify that the input is segmented as expected.
  2. Use --inspect to see which conversion stage produces an unexpected result.
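
The two steps above can be partly automated. The following is a minimal sketch, not part of OpenCC, that reads an --inspect report and lists the segments each stage rewrote. The helper name `changed_segments` is hypothetical, and the sketch assumes every stage keeps the same number of segments as its input, which holds for the sample output above but is not guaranteed here. The sample report is a trimmed-down version of that output.

```python
import json


def changed_segments(report):
    """Return (stage_index, position, before, after) for each rewritten segment."""
    changes = []
    previous = report["segments"]
    for stage in report["stages"]:
        current = stage["segments"]
        # Assumption: segment boundaries do not move between stages,
        # so segments can be compared position by position.
        for pos, (before, after) in enumerate(zip(previous, current)):
            if before != after:
                changes.append((stage["index"], pos, before, after))
        previous = current
    return changes


# Trimmed-down report in the same shape as the --inspect output above.
sample = json.loads("""
{
  "input": "数据库连接池出了",
  "segments": ["数据库", "连接池", "出了"],
  "stages": [
    {"index": 1, "segments": ["數據庫", "連接池", "出了"]},
    {"index": 2, "segments": ["資料庫", "連線池", "出了"]},
    {"index": 3, "segments": ["資料庫", "連線池", "出了"]}
  ],
  "output": "資料庫連線池出了"
}
""")

changes = changed_segments(sample)
for stage_index, pos, before, after in changes:
    print(f"stage {stage_index}, segment {pos}: {before} -> {after}")
```

To use it on real output, replace the sample with `json.load(sys.stdin)` and pipe the result of `opencc -c s2twp.json --inspect` into the script.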

Rules:

Other Ports (Unofficial)

Configurations 配置文件

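Predefined configurations 預設配置文件

Custom configurations 指定配置文件

Set the OPENCC_DATA_DIR environment variable to load configuration files from a specific directory: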

OPENCC_DATA_DIR=/path/to/your/config/dir opencc --help
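
The same environment variable can be set when the CLI is launched from a script. This is a minimal sketch, assuming the `opencc` executable is on PATH and reads its input from standard input, as in the piped examples elsewhere in this README. The helper names `opencc_command` and `convert` are hypothetical.

```python
import os
import subprocess


def opencc_command(config, data_dir):
    """Build the argv and environment for one opencc invocation."""
    env = dict(os.environ)             # copy so the parent environment is untouched
    env["OPENCC_DATA_DIR"] = data_dir  # directory holding the config files
    argv = ["opencc", "-c", config]
    return argv, env


def convert(text, config, data_dir):
    """Run the CLI on `text` and return whatever it writes to stdout."""
    argv, env = opencc_command(config, data_dir)
    result = subprocess.run(
        argv,
        input=text,
        env=env,
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if opencc exits with an error
    )
    return result.stdout


argv, env = opencc_command("s2t.json", "/path/to/your/config/dir")
print(argv, env["OPENCC_DATA_DIR"])
```

Copying `os.environ` rather than mutating it keeps the override local to the child process.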

Experimental Plugins 試驗性插件

OpenCC 現已支援外部 C++ 分詞插件。當前第一個插件為 opencc-jieba,可通過 s2t_jieba.json、s2tw_jieba.json、s2hk_jieba.json、s2twp_jieba.json、tw2sp_jieba.json 等插件配置啓用。

OpenCC now supports external C++ segmentation plugins. The first plugin is opencc-jieba, which can be enabled through plugin-backed configs such as s2t_jieba.json, s2tw_jieba.json, s2hk_jieba.json, s2twp_jieba.json, and tw2sp_jieba.json.

注意:

Notes:

Build 編譯

Build with CMake

Linux & macOS

g++ 4.6+ or clang 3.2+ is required.

make

Windows Visual Studio:

build.cmd

Build with Bazel

bazel build //:opencc

Test 測試

Linux & macOS

make test

Windows Visual Studio:

test.cmd

Test with Bazel

bazel test --test_output=all //src/... //data/... //python/... //test/...

Benchmark 基準測試

make benchmark

Example results (from GitHub CI, commit ID 9e80d5d, 2026-04-16, CMake macos-latest):

-------------------------------------------------------------------------
Benchmark                               Time             CPU   Iterations
-------------------------------------------------------------------------
BM_Initialization/hk2s                868 us          868 us          665
BM_Initialization/hk2t                139 us          139 us         5059
BM_Initialization/jp2t                203 us          203 us         3448
BM_Initialization/s2hk              26201 us        26200 us           27
BM_Initialization/s2t               26385 us        26382 us           27
BM_Initialization/s2tw              27108 us        27108 us           27
BM_Initialization/s2twp             26446 us        26445 us           25
BM_Initialization/s2twp_jieba      142754 us       141974 us            5
BM_Initialization/t2hk               66.7 us         66.7 us        10519
BM_Initialization/t2jp                166 us          166 us         4215
BM_Initialization/t2s                 797 us          797 us          883
BM_Initialization/t2tw               58.1 us         58.1 us        12075
BM_Initialization/tw2s                845 us          845 us          831
BM_Initialization/tw2sp              1004 us         1004 us          697
BM_Initialization/tw2t               93.3 us         93.3 us         7492
BM_ConvertLongText/s2t                327 ms          327 ms            2 bytes_per_second=5.45069M/s
BM_ConvertLongText/s2twp              554 ms          554 ms            1 bytes_per_second=3.21299M/s
BM_ConvertLongText/s2twp_jieba        742 ms          741 ms            1 bytes_per_second=2.40096M/s
BM_Convert/s2t_100                  0.649 ms        0.649 ms         1083 bytes_per_second=6.15628M/s
BM_Convert/s2t_1000                  6.64 ms         6.64 ms          106 bytes_per_second=6.16118M/s
BM_Convert/s2t_10000                 68.1 ms         68.1 ms           10 bytes_per_second=6.14608M/s
BM_Convert/s2t_100000                 718 ms          717 ms            1 bytes_per_second=5.96785M/s
BM_Convert/s2twp_100                 1.20 ms         1.20 ms          552 bytes_per_second=3.32407M/s
BM_Convert/s2twp_1000                12.3 ms         12.3 ms           57 bytes_per_second=3.32311M/s
BM_Convert/s2twp_10000                126 ms          126 ms            6 bytes_per_second=3.31205M/s
BM_Convert/s2twp_100000              1296 ms         1296 ms            1 bytes_per_second=3.3027M/s
BM_Convert/s2twp_jieba_100           1.51 ms         1.49 ms          495 bytes_per_second=2.67698M/s
BM_Convert/s2twp_jieba_1000          15.0 ms         15.0 ms           48 bytes_per_second=2.72292M/s
BM_Convert/s2twp_jieba_10000          153 ms          153 ms            5 bytes_per_second=2.73681M/s
BM_Convert/s2twp_jieba_100000        1728 ms         1728 ms            1 bytes_per_second=2.47784M/s
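
To read relative costs off the table, the rows can be parsed and compared. This is a rough sketch rather than an official tool: the helper `parse_rows` is hypothetical, it only handles the `us` and `ms` units that appear above, and it assumes that benchmarks with the same size suffix process the same input.

```python
UNIT_TO_MS = {"us": 1e-3, "ms": 1.0}


def parse_rows(text):
    """Map each benchmark name to its wall time in milliseconds."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected columns: name, time, unit, cpu, unit, iterations[, counters]
        if len(parts) < 6 or not parts[0].startswith("BM_"):
            continue
        rows[parts[0]] = float(parts[1]) * UNIT_TO_MS[parts[2]]
    return rows


# Four rows copied from the table above.
table = """\
BM_Initialization/s2twp             26446 us        26445 us           25
BM_Initialization/s2twp_jieba      142754 us       141974 us            5
BM_Convert/s2t_100000                 718 ms          717 ms            1 bytes_per_second=5.96785M/s
BM_Convert/s2twp_100000              1296 ms         1296 ms            1 bytes_per_second=3.3027M/s
"""

rows = parse_rows(table)
init_ratio = rows["BM_Initialization/s2twp_jieba"] / rows["BM_Initialization/s2twp"]
convert_ratio = rows["BM_Convert/s2twp_100000"] / rows["BM_Convert/s2t_100000"]
print(f"jieba init is {init_ratio:.1f}x slower; s2twp convert is {convert_ratio:.1f}x slower than s2t")
```

On these figures, loading the jieba-backed config takes roughly 5.4 times as long as plain s2twp, and s2twp conversion takes roughly 1.8 times as long as s2t.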

Projects using OpenCC 使用 OpenCC 的項目

Please add your project to this list if it uses OpenCC.

License 許可協議

Apache License 2.0

Third Party Library 第三方庫

Change History 版本歷史

Contributors 貢獻者

Please feel free to update this list if you have contributed to OpenCC.