GitHub - BYVoid/OpenCC: Library for conversion between Traditional and Simplified Chinese
Open Chinese Convert 開放中文轉換
Introduction 介紹
Open Chinese Convert (OpenCC, 開放中文轉換) is an open source project for conversions between Traditional Chinese, Simplified Chinese and Japanese Kanji (Shinjitai). It supports character-level and phrase-level conversion, character variant handling, and regional vocabulary variants across Mainland China, Taiwan and Hong Kong. This is not a translation tool between Mandarin and Cantonese, etc.
Discussion (Telegram): https://t.me/open_chinese_convert
Features 特點
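- Strictly distinguishes "one Simplified character to many Traditional characters" from "one Simplified character to many variant forms".
- Fully compatible with variant characters, with support for dynamic substitution.
- One-to-many Simplified-to-Traditional entries are rigorously reviewed, following the principle "split where possible rather than merge" (能分則不合).
- Supports conversion of variant characters and regional vocabulary across Mainland China, Taiwan and Hong Kong, for example 「裏」/「裡」 and 「鼠標」/「滑鼠」.
- Dictionaries are fully separated from the library code, so they can be freely modified, imported and extended.
For details, see the OpenCC design rationale and the criteria for including regional vocabulary.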
Installation 安裝
Package Managers 包管理器
- Debian
- Ubuntu
- Fedora
- Arch Linux
- macOS (Homebrew)
- WinGet
  - winget install opencc: installs the opencc.exe application directly, including the Jieba segmentation plugin.
- Bazel
- Node.js
  - npm install -g opencc: installs the OpenCC Node.js CLI.
  - npm install -g opencc opencc-jieba: installs both the OpenCC Node.js CLI and the Jieba segmentation plugin.
- Python
- More (Repology)
Prebuilt binaries 預編譯二進位檔
- Windows (x86_64): OpenCC-1.3.1 (SHA-256)
- This Windows release is available from WinGet. For details, see doc/windows-winget-release.md.
- Requires Microsoft Visual C++ Redistributable for Visual Studio 2015-2026. Download the latest version from Microsoft.
- Debian/Ubuntu (amd64):
Usage 使用
Online 線上轉換
https://opencc.js.org/converter?config=s2t
Node.js
npm install opencc
The npm package supports Node.js >=20.17. It uses bundled Node-API prebuilds when available and falls back to a local node-gyp build when the current platform does not have a matching prebuild.
To install the npm CLI:
npm install -g opencc
opencc -c s2t.json -i input.txt -o output.txt
The npm CLI supports basic text conversion. Plugins, --inspect, and --segmentation require the native OpenCC CLI.
import { OpenCC } from 'opencc';

async function main() {
  // Load the Simplified-to-Traditional configuration.
  const converter: OpenCC = new OpenCC('s2t.json');
  const result: string = await converter.convertPromise('汉字');
  console.log(result);  // 漢字
}

main();
See demo.js and ts-demo.ts.
Python
pip install opencc (Windows, Linux, macOS)
import opencc

converter = opencc.OpenCC('s2t.json')
converter.convert('汉字')  # 漢字
C++
#include "opencc.h"

int main() {
  const opencc::SimpleConverter converter("s2t.json");
  converter.Convert("汉字");  // 漢字
  return 0;
}
When OpenCC is embedded in a server binary or self-contained application, the JSON config can stay small while dictionary resources are loaded from explicit resource directories:
#include <memory>
#include <string>
#include <vector>

#include "SimpleConverter.hpp"

int main() {
  // Directories are listed in the order they should be searched.
  auto resources = std::make_shared<opencc::FilesystemResourceProvider>(
      std::vector<std::string>{
          "/opt/my-app/opencc",
          "/opt/my-app/plugins/opencc-jieba",
          "/usr/share/opencc",
      });
  const opencc::SimpleConverter converter("s2t.json", resources);
  converter.Convert("汉字");
  return 0;
}
FilesystemResourceProvider searches directories in order. Existing SimpleConverter("s2t.json") and CLI behavior continue to use the config file location, current directory, explicit paths, and installed OpenCC data directory as before.
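The "searches directories in order" behaviour can be pictured with a small first-match lookup. The following Python sketch is only a toy model of that idea, not OpenCC code: the function name find_resource and the temporary directories are hypothetical, and it assumes that the first directory containing the requested file wins.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def find_resource(name: str, directories: Iterable[str]) -> Optional[Path]:
    """Return the first directory/name that exists as a file, else None."""
    for directory in directories:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


# Demo: two temporary directories stand in for an app directory and a
# system-wide data directory (both hypothetical).
with tempfile.TemporaryDirectory() as app_dir, \
        tempfile.TemporaryDirectory() as system_dir:
    (Path(app_dir) / "STPhrases.ocd2").write_text("app copy")
    (Path(system_dir) / "STPhrases.ocd2").write_text("system copy")
    (Path(system_dir) / "STCharacters.ocd2").write_text("system only")

    search_path = [app_dir, system_dir]
    phrases = find_resource("STPhrases.ocd2", search_path)
    characters = find_resource("STCharacters.ocd2", search_path)
    missing = find_resource("missing.ocd2", search_path)

    # Read the contents before the temporary directories are removed.
    phrases_text = phrases.read_text()
    characters_text = characters.read_text()

print(phrases_text, characters_text, missing)
```

Under this model, placing an application directory ahead of the system directory lets the application shadow a system-wide dictionary of the same name while still falling back to the system copy for everything else.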
C
#include <string.h>  /* strlen */

#include "opencc.h"

int main() {
  opencc_t opencc = opencc_open("s2t.json");
  const char* input = "汉字";
  char* converted = opencc_convert_utf8(opencc, input, strlen(input));  // 漢字
  opencc_convert_utf8_free(converted);
  opencc_close(opencc);
  return 0;
}
Command Line
opencc --help
opencc_dict --help
Segmentation and Inspection Modes
OpenCC CLI supports two diagnostic modes that output JSON instead of converted text:
--segmentation — Output segmentation result only (no conversion):
echo "他只看了几行日志,就一叶知秋,猜到整个系统是数据库连接池出了问题" | opencc -c s2twp.json --segmentation
{"input":"他只看了几行日志,就一叶知秋,猜到整个系统是数据库连接池出了问题","segments":["他","只看","了几行","日志",",就","一叶知秋",",猜到","整个","系统","是","数据库","连接池","出了","问题"]}
--inspect — Output full inspection result (segmentation + per-stage conversion + final output):
echo "他只看了几行日志,就一叶知秋,猜到整个系统是数据库连接池出了问题" | opencc -c s2twp.json --inspect
{"input":"他只看了几行日志,就一叶知秋,猜到整个系统是数据库连接池出了问题","segments":["他","只看","了几行","日志",",就","一叶知秋",",猜到","整个","系统","是","数据库","连接池","出了","问题"],"stages":[{"index":1,"segments":["他","只看","了幾行","日誌",",就","一葉知秋",",猜到","整個","系統","是","數據庫","連接池","出了","問題"]},{"index":2,"segments":["他","只看","了幾行","日誌",",就","一葉知秋",",猜到","整個","系統","是","資料庫","連線池","出了","問題"]},{"index":3,"segments":["他","只看","了幾行","日誌",",就","一葉知秋",",猜到","整個","系統","是","資料庫","連線池","出了","問題"]}],"output":"他只看了幾行日誌,就一葉知秋,猜到整個系統是資料庫連線池出了問題"}
Pretty-print with jq:
echo "他只看了几行日志,就一叶知秋,猜到整个系统是数据库连接池出了问题" | opencc -c s2twp.json --inspect | jq .
These modes are useful for diagnosing conversion issues:
- Use --segmentation to verify that the input is segmented as expected.
- Use --inspect to see which conversion stage produces an unexpected result.
Rules:
- --segmentation and --inspect are mutually exclusive.
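To find the stage that introduced a particular change, the --inspect JSON can be compared stage by stage. The sketch below embeds the sample output shown above as a string, so it runs without OpenCC installed. It assumes that every stage keeps the same number of segments as the input, which holds for this sample but is not stated as a guarantee. The helper name stage_changes is hypothetical.

```python
import json

# Sample --inspect output for s2twp.json, copied from the example above.
INSPECT_OUTPUT = (
    '{"input":"他只看了几行日志,就一叶知秋,猜到整个系统是数据库连接池出了问题",'
    '"segments":["他","只看","了几行","日志",",就","一叶知秋",",猜到","整个","系统","是","数据库","连接池","出了","问题"],'
    '"stages":['
    '{"index":1,"segments":["他","只看","了幾行","日誌",",就","一葉知秋",",猜到","整個","系統","是","數據庫","連接池","出了","問題"]},'
    '{"index":2,"segments":["他","只看","了幾行","日誌",",就","一葉知秋",",猜到","整個","系統","是","資料庫","連線池","出了","問題"]},'
    '{"index":3,"segments":["他","只看","了幾行","日誌",",就","一葉知秋",",猜到","整個","系統","是","資料庫","連線池","出了","問題"]}'
    '],'
    '"output":"他只看了幾行日誌,就一葉知秋,猜到整個系統是資料庫連線池出了問題"}'
)


def stage_changes(report):
    """Yield (stage_index, before, after) for each segment a stage rewrote."""
    previous = report["segments"]
    for stage in report["stages"]:
        current = stage["segments"]
        # Assumption: segments stay aligned one-to-one between stages.
        for before, after in zip(previous, current):
            if before != after:
                yield stage["index"], before, after
        previous = current


report = json.loads(INSPECT_OUTPUT)
changes = list(stage_changes(report))
for index, before, after in changes:
    print(f"stage {index}: {before} -> {after}")
```

For this sample, the character-level rewrites all appear in stage 1, the Taiwan phrase substitutions (數據庫 to 資料庫, 連接池 to 連線池) appear in stage 2, and stage 3 changes nothing.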
Other Ports (Unofficial)
- Swift (iOS): SwiftyOpenCC
- iOSOpenCC (pod): iOSOpenCC
- Java: opencc4j
- Android: android-opencc
- PHP: opencc4php
- Pure JavaScript: opencc-js
- See notes about different OpenCC NPM packages below.
- WebAssembly:
- Browser Extension: opencc-extension
- Go (Pure): OpenCC for Go
- Dart (native-assets): opencc-dart
Configurations 配置文件
Default configuration files
- s2t.json: Simplified Chinese to Traditional Chinese (OpenCC Standard)
- t2s.json: Traditional Chinese (OpenCC Standard) to Simplified Chinese
- s2tw.json: Simplified Chinese to Traditional Chinese (Taiwan Standard)
- tw2s.json: Traditional Chinese (Taiwan Standard) to Simplified Chinese
- s2hk.json: Simplified Chinese to Traditional Chinese (Hong Kong variant)
- hk2s.json: Traditional Chinese (Hong Kong variant) to Simplified Chinese
- s2twp.json: Simplified Chinese to Traditional Chinese (Taiwan Standard, with Taiwan Phrases)
- tw2sp.json: Traditional Chinese (Taiwan Standard) to Simplified Chinese (Mainland China Phrases)
- t2tw.json: Traditional Chinese (OpenCC Standard) to Traditional Chinese (Taiwan Standard)
- tw2t.json: Traditional Chinese (Taiwan Standard) to Traditional Chinese (OpenCC Standard)
- t2hk.json: Traditional Chinese (OpenCC Standard) to Traditional Chinese (Hong Kong variant)
- hk2t.json: Traditional Chinese (Hong Kong variant) to Traditional Chinese (OpenCC Standard)
The following configuration files are still under development; contributions of new phrases are welcome:
- s2hkp.json: Simplified Chinese to Traditional Chinese (Hong Kong variant, with Hong Kong Phrases)
- hk2sp.json: Traditional Chinese (Hong Kong variant) to Simplified Chinese (Mainland China Phrases)
The following configuration files are for exploratory research only and are not recommended for production use:
- t2jp.json: Old Japanese Kanji (Kyūjitai) to New Japanese Kanji (Shinjitai)
- jp2t.json: New Japanese Kanji (Shinjitai) to Old Japanese Kanji (Kyūjitai); also converts a small number of Japanese phrases to their Chinese equivalents
Specifying configuration files
Set the OPENCC_DATA_DIR environment variable to load configuration files from a specific directory:
OPENCC_DATA_DIR=/path/to/your/config/dir opencc --help
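Inline dictionary
A dictionary in a configuration file can use type: "inline" to define a small custom vocabulary directly in the JSON, without modifying external dictionary files. For example, add override rules at the front of group.dicts: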
{
  "conversion_chain": [
    {
      "dict": {
        "type": "group",
        "dicts": [
          {
            "type": "inline",
            "entries": {
              "麦旋风": "冰炫風",
              "服务器": "伺服器"
            }
          },
          { "type": "ocd2", "file": "STPhrases.ocd2" },
          { "type": "ocd2", "file": "STCharacters.ocd2" }
        ]
      }
    }
  ]
}
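Rules and limitations:
- entries must be a JSON object.
- Keys and values in entries must be non-empty strings.
- Duplicate keys are not supported; if any are present, loading fails immediately (an error is thrown).
- Keys and values are used exactly as parsed, with no trimming, case folding, or Unicode normalization.
- Inline dictionaries behave the same as ordinary dictionaries; priority is determined by the order of group.dicts.
- Output from an inline dictionary still passes through the subsequent conversion_chain steps; there is no way to lock the final output.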
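Note: the OpenCC 1.3.2+ parser supports a limited JSONC syntax (// and /* */ comments, and trailing commas). For compatibility across implementations, use strict JSON and do not rely on the JSONC extensions.
More complete examples are available in examples/config/. That directory is for learning and customization reference only and is not part of the official built-in configuration list.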
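The inline-dictionary rules can be checked before a config is handed to OpenCC. The following Python sketch is a hypothetical pre-flight check and not part of OpenCC. The name check_inline_entries is made up for illustration, the duplicate-key check applies to every JSON object in the file (not only to entries), and because it uses a strict JSON parser it rejects JSONC comments and trailing commas.

```python
import json


def check_inline_entries(config_text):
    """Return a list of problems found in a strict-JSON config string."""
    problems = []

    def record_duplicates(pairs):
        # json.loads silently keeps the last duplicate key, so detect
        # duplicates here while the raw key/value pairs are still visible.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                problems.append(f"duplicate key: {key!r}")
            seen.add(key)
        return dict(pairs)

    def walk(node):
        if isinstance(node, dict):
            if node.get("type") == "inline":
                entries = node.get("entries")
                if not isinstance(entries, dict):
                    problems.append("entries must be a JSON object")
                else:
                    for key, value in entries.items():
                        if not key or not isinstance(value, str) or not value:
                            problems.append(f"empty or non-string entry: {key!r}")
            for child in node.values():
                walk(child)
        elif isinstance(node, list):
            for child in node:
                walk(child)

    walk(json.loads(config_text, object_pairs_hook=record_duplicates))
    return problems


# The inline-dictionary example from this section, in compact form.
GOOD = (
    '{"conversion_chain": [{"dict": {"type": "group", "dicts": ['
    '{"type": "inline", "entries": {"麦旋风": "冰炫風", "服务器": "伺服器"}},'
    '{"type": "ocd2", "file": "STPhrases.ocd2"},'
    '{"type": "ocd2", "file": "STCharacters.ocd2"}]}}]}'
)
# A deliberately broken, made-up dictionary: duplicate key and empty value.
BAD = '{"type": "inline", "entries": {"a": "b", "a": "c", "d": ""}}'

good_problems = check_inline_entries(GOOD)
bad_problems = check_inline_entries(BAD)
print(good_problems)
print(bad_problems)
```

The check does not trim or normalize keys and values, in line with the rule that entries are used exactly as parsed.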
Experimental Plugins 試驗性插件
OpenCC now supports external C++ segmentation plugins. The first plugin is opencc-jieba, which can be enabled through plugin-backed configs such as s2t_jieba.json, s2tw_jieba.json, s2hk_jieba.json, s2twp_jieba.json, and tw2sp_jieba.json.
Notes:
- The plugin mechanism is currently experimental.
- The jieba plugin is optional and is not required for the default OpenCC build, Python package, or Node.js package.
- opencc-jieba additionally depends on cppjieba and its dictionary resources. These dependencies are only needed when building or distributing the plugin itself.
- The plugin ABI may still change before the next formal OpenCC release and should not yet be treated as stable.
- We expect to treat the plugin ABI as stable starting with the next formal OpenCC release.
- On Windows, plugins must be built with a toolchain/runtime that is ABI-compatible with the host OpenCC binary. Mixing MSVC-built hosts with MinGW-built plugins, or the reverse, is unsupported.
Build 編譯
Build with CMake
Linux & macOS
g++ 4.6+ or clang 3.2+ is required.
Windows Visual Studio:
Build with Bazel
Test 測試
Linux & macOS
Windows Visual Studio:
Test with Bazel
bazel test --test_output=all //src/... //data/... //python/... //test/...
Benchmark 基準測試
See doc/benchmark.md for details.
Projects using OpenCC 使用 OpenCC 的項目
Please update if your project is using OpenCC.
- ibus-pinyin
- fcitx
- rimeime
- libgooglepinyin
- ibus-libpinyin
- alfred-chinese-converter
- GoldenDict
- China Biographical Database Project (CBDB)
License 許可協議
Apache License 2.0
Third Party Libraries 第三方庫
- darts-clone BSD License
- marisa-trie BSD License
- tclap MIT License
- rapidjson MIT License
- Google Test BSD License
- cppjieba MIT License
  - Optional dependency used by the experimental opencc-jieba plugin.
Change History 版本歷史
Links 相關連結
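- Publications Using OpenCC: a selection of recent research papers that used OpenCC
- 現代漢語常用繁簡轉換匹配辨析表 (a table of common Traditional-Simplified conversion correspondences in modern Chinese)
- Notes on the differences among the three NPM packages opencc, opencc-js and opencc-wasm: https://github.com/nk2028/opencc-js/blob/HEAD/README-zh-TW.md#%E8%88%87-opencc-npm-package-%E7%9A%84%E5%8D%80%E5%88%A5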
Contributors 貢獻者
- BYVoid
- 佛振
- Peng Huang
- LI Daobing
- Kefu Chai
- Kan-Ru Chen
- Ma Xiaojun
- Jiang Jiang
- Ruey-Cheng Chen
- Paul Meng
- Lawrence Lau
- 瑾昀
- 內木一郎
- Marguerite Su
- Brian White
- Qijiang Fan
- LEOYoon-Tsaw
- Steven Yao
- Pellaeon Lin
- stony
- steelywing
- 吕旭东
- Weng Xuetian
- Ma Tao
- Heinz Wiesinger
- J.W
- Amo Wu
- Mark Tsai
- Zhe Wang
- sgqy
- Qichuan (Sean) ZHANG
- Flandre Scarlet
- 宋辰文
- iwater
- Xpol Wan
- Weihang Lo
- Cychih
- kyleskimo
- Ryuan Choi
- Prcuvu
- Tony Able
- Xiao Liang
- Frank Lin
Please feel free to update this list if you have contributed to OpenCC.
