RFC: Docusaurus v2 i18n · Issue #3317 · facebook/docusaurus
Here is a brain dump of many things to consider for i18n support in v2.
I'll keep this issue updated over time, but feel free to comment if you have anything to say, particularly if you used v1 i18n support and can provide valuable feedback.
Supersedes this older issue (which still has interesting content): #2651
Existing translation systems
Links to get inspiration from.
Git fork based translations
Have an upstream repo (often in English), and one fork per language
A translation strategy first seen with the Vue translations: each language creates a git fork.
We can build tooling on top of that, so that a change made in the upstream repo triggers new PRs on the forked repos, automating the process and keeping translations in sync.
Pros:
- works fine, as seen on the ReactJS and VueJS docs
- stays in sync with the upstream repo
- one repo per language permits per-language git permissions
- developers don't like contributing through yet another SaaS tool they have to learn
- developers like their contributions to appear on their GitHub contribution graph, etc.
Cons:
- developer-centric workflow
- does it work for small communities?
- maintaining many forks can be overwhelming: can we find an owner for each fork?
- need infrastructure to run the sync bots and open the PRs
- no translation automation tools
ReactJS case
Links related to the work of Nat Alison.
Contains some interesting notes on why a SaaS like Crowdin was not a good fit, despite an attempt to use it.
https://reactjs.org/blog/2019/02/23/is-react-translated-yet.html
reactjs/react.dev#1605
https://github.com/reactjs/reactjs.org-translation
https://github.com/reactjs/reactjs.org-translation/blob/master/PROGRESS.template.md
facebook/react#8063
reactjs/react.dev#82
reactjs/react.dev#873
GatsbyJS case
Another translation RFC from Nat Alison, quite close to her work on ReactJS:
https://github.com/gatsbyjs/rfcs/blob/master/text/0010-gatsby-docs-localization.md
I don't think this work is in production.
Also some interesting bits on this thread where she explains her unfortunate situation working at Gatsby.
Git, single repo
You have a repo and you just have a folder per language.
Pros:
- simple approach, easy to contribute
- still git-based, developers will like
- single repo
Cons:
- developer-centric
- difficult to have per-language git permissions
- how to stay in sync with the "upstream language"?
- scalability: on large sites, the number of PRs can be overwhelming
Nuxt case
The nuxt doc is a simple repo with language folders.
It works fine, but the author told me it was hard to keep all languages in sync. It looks like a manual process.
TypeScript case
Quite similar: the TS website has one language folder per package (<packageName>/copy/<lang>), and the translations are handled in the same GitHub monorepo, split by package.
microsoft/TypeScript-Website#100
microsoft/TypeScript-Website#181
Note: Orta found a way to solve the per-language permission problem: he created a bot that lets code owners self-merge a PR through a GitHub comment, despite not having git permissions:
microsoft/TypeScript-Website#130 (comment)
https://github.com/orta/code-owner-self-merge
Additional notes
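I think it's possible to handle the "sync with upstream" problem inside a monorepo by using git patches.
- generate a patch of the upstream docs changes (old ref first, new ref second, so the patch describes what changed upstream):
git diff HEAD~100 origin/master -- ./website/docs > docs.patch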
- apply that patch to each translated language (maybe open one PR per language?)
https://stackoverflow.com/questions/9939952/create-a-patch-including-specific-files-in-git
It's a way to emulate the upstream repo -> language forks pattern
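Here's a minimal sketch of how that patch workflow could be scripted. It only builds the git commands (it doesn't run them), and it assumes the docs live in website/docs with translations in website/i18n/<locale>/docs; buildSyncCommands and the locale list are hypothetical. The `-p`, `--directory` and `--reject` flags are regular `git apply` options: the first two remap website/docs/foo.md onto the translated copy, and since an English hunk usually won't match French context, `--reject` keeps the hunks that do apply and writes the rest to *.rej files for translators to review.

```typescript
// Hypothetical sketch: build (but do not run) the git commands that forward
// upstream doc changes to each translated copy. Paths are assumptions.
type GitCommand = string[];

function buildSyncCommands(
  fromRef: string, // older ref, e.g. "HEAD~100"
  toRef: string, // newer ref, e.g. "origin/master"
  docsDir: string, // relative to the repo root, e.g. "website/docs"
  i18nDir: string, // e.g. "website/i18n"
  locales: readonly string[],
  patchFile = "docs.patch",
): GitCommand[] {
  const docs = docsDir.replace(/^\.\//, ""); // "./website/docs" -> "website/docs"
  // Diff old -> new so the patch describes the upstream changes.
  const diff: GitCommand = ["git", "diff", `--output=${patchFile}`, fromRef, toRef, "--", docs];
  // Patch paths look like "a/website/docs/foo.md": strip "a/" plus every
  // segment of the docs dir so that only "foo.md" remains.
  const strip = 1 + docs.split("/").filter((part) => part.length > 0).length;
  const applies = locales.map((locale): GitCommand => [
    "git",
    "apply",
    `-p${strip}`,
    `--directory=${i18nDir}/${locale}/docs`, // prepend the translated folder
    "--reject", // apply what matches, leave the rest in *.rej files
    patchFile,
  ]);
  return [diff, ...applies];
}
```

Each command could then be run with Node's `child_process.execFileSync(cmd[0], cmd.slice(1))`, for example opening one PR per language afterwards.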
SaaS
Using a SaaS like Crowdin, Transifex or others has benefits, such as advanced translation features: a UI, editors supporting various formats (PO, Markdown, ICU key/values), translation memory, automatic payment of platform translators, translation progress tracking, sync with the upstream language, version management...
Pros:
- Advanced translation features
- Non-developers can use it
- Docusaurus v1 already uses Crowdin
Cons:
- Proprietary
- Often paid services
- Need custom scripts to interface with Docusaurus system
- UI not always easy to use
- Developers don't like it much
- Developers contributions are not "visible", lack of incentives
Crowdin
The solution suggested by Docusaurus v1: it has a free plan for open-source projects and is used by the Docusaurus v1 site, Jest, Yarn, Electron...
We should try to make it easy to migrate from v1.
Not everybody likes this solution, however.
Some drawbacks mentioned here:
https://github.com/gatsbyjs/rfcs/blob/master/text/0010-gatsby-docs-localization.md#saas-platform-crowdin
Note: some questions I asked Crowdin: https://gist.github.com/slorber/30643299196c7efa77084eec10c1c609
Other SaaS
???
Docusaurus 2 translation system
Unlike the use-cases presented above, we are a framework, not a site, and we don't serve a single community.
I think we want to be able to support both developers and non-developers.
We can't expect all Docusaurus translators to be developers, nor git users, yet we know that developers don't necessarily always like the lock-in to a SaaS like Crowdin.
Translation management
I think the translation system should be file-system based, as the file system is probably the common abstraction between git-based and SaaS-based workflows.
Basically, if you build your site for the fr language and you have i18n/fr/docs/myDoc.md, then that file should be used for the French page instead of docs/myDoc.md.
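A minimal sketch of that lookup rule, assuming the i18n/<locale>/docs layout from the example and falling back to the default-language file when no translation exists. resolveDocPath is a hypothetical name, and the file-existence check is injected (in Node, `fs.existsSync` would do) so the logic can be tested without touching disk. The i18n root is a parameter because the path should stay configurable.

```typescript
// Hypothetical resolver: prefer the translated file, else fall back to docs/.
interface ResolvedDoc {
  path: string;
  fallback: boolean; // true when the default-language file is used instead
}

function resolveDocPath(
  relPath: string, // relative to docs/, e.g. "myDoc.md"
  locale: string,
  defaultLocale: string,
  exists: (path: string) => boolean,
  i18nDir = "i18n", // configurable root for translated content
): ResolvedDoc {
  const defaultPath = `docs/${relPath}`;
  if (locale === defaultLocale) {
    return { path: defaultPath, fallback: false };
  }
  const localizedPath = `${i18nDir}/${locale}/docs/${relPath}`;
  if (exists(localizedPath)) {
    return { path: localizedPath, fallback: false };
  }
  // No translation yet: use the default-language doc and flag it.
  return { path: defaultPath, fallback: true };
}
```

Returning a `fallback` flag rather than silently substituting lets the build decide whether to warn about untranslated pages.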
I think ./i18n is a good default path for the translated content, but the paths of such a system should be flexible enough that you can adopt the workflow of your choice, but I think…
- git single repo: you can directly version control the content of ./i18n
- git multiple forks: maybe you'll just fork the main website, but you could as well fork only the content and use git submodules or whatever...
- saas: each saas will require integration scripts to upload/download the translations from/to the correct paths
So, the first step is to support the first case, where you just put the translations in a folder of your site. I'm going to experiment with this on the Docusaurus 2 website and try to provide a French translation.
It's unlikely we'll be able to provide integrations with all the existing translation SaaS, but a 2nd step would be to write integration scripts with Crowdin, so that v1 users can keep using it.
Translation runtime lib
It's likely we'll try to use FBT, a translation tool from Facebook.
I personally have a good experience with React-intl as well, and prefer it over many React alternatives.
Translated URLs
Supposing en is the "main" language:
Does https://myDomain.com/en/myDoc exist?
What should the site do if the URL does not contain a language, like https://myDomain.com/myDoc? Is it the English version? Or do we add code to redirect to the most suitable language?
Is it ok for SEO to have a homepage that just redirects? Or is the homepage in English? Then which page is the canonical one?
Note: v1 redirects docs, but not the homepage: https://docusaurus.io/ & https://docusaurus.io/docs/installation
Interesting comment (point 5): #2651 (comment)
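If we go the redirect route, choosing "the most suitable language" could look like this minimal sketch. It assumes the visitor's preferred languages are already available as BCP 47 tags (navigator.languages in the browser, or a parsed Accept-Language header on a server); pickLocale is a hypothetical name, and it does not settle the SEO questions above.

```typescript
// Hypothetical helper: pick the redirect target for a prefix-less URL.
function pickLocale(
  preferred: readonly string[], // e.g. ["fr-CA", "en-US"], most preferred first
  supported: readonly string[], // locales the site is built for, e.g. ["en", "fr"]
  defaultLocale: string,
): string {
  const lower = supported.map((l) => l.toLowerCase());
  for (const tag of preferred) {
    const t = tag.toLowerCase();
    // Exact match first ("fr-ca"), then the base language ("fr").
    const exact = lower.indexOf(t);
    if (exact !== -1) return supported[exact];
    const base = lower.indexOf(t.split("-")[0]);
    if (base !== -1) return supported[base];
  }
  return defaultLocale; // nothing matched
}
```

For example, `pickLocale(["fr-CA", "en-US"], ["en", "fr"], "en")` returns `"fr"`.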
Let's not forget to add the proper page meta tags such as:
<html lang="en">
<link rel="alternate" href="https://myDomain.com/fr/myDoc" hrefLang="fr-FR"/>
See also #2471
(I think that if pages have these tags, we don't need to add them to sitemaps as well)
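A minimal sketch of generating those alternate tags for one page, assuming the path-prefix scheme (https://myDomain.com/fr/myDoc) and that the default language is served without a prefix; both assumptions, since the URL scheme is still open. alternateLinks is a hypothetical name. The `x-default` entry is the value search engines use for "no better match".

```typescript
// Hypothetical generator of <link rel="alternate"> tags for every locale.
function alternateLinks(
  siteUrl: string, // e.g. "https://myDomain.com"
  pagePath: string, // locale-independent path, e.g. "/myDoc"
  locales: readonly string[],
  defaultLocale: string,
): string[] {
  const urlFor = (locale: string) =>
    locale === defaultLocale ? `${siteUrl}${pagePath}` : `${siteUrl}/${locale}${pagePath}`;
  const links = locales.map(
    (locale) => `<link rel="alternate" href="${urlFor(locale)}" hreflang="${locale}"/>`,
  );
  // Fallback URL for visitors whose language matches none of the above.
  links.push(`<link rel="alternate" href="${urlFor(defaultLocale)}" hreflang="x-default"/>`);
  return links;
}
```

In raw HTML the attribute is spelled `hreflang`; `hrefLang` is the React/JSX prop name.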
Translated URL schemes
There are multiple ways to handle the URLs of translated pages
https://fr.myDomain.com/myDoc
Using a custom subdomain does not seem a very good fit, as it would require one separate deployment per language (or some custom reverse-proxy logic?).
I don't think this is the workflow we'll encourage, but we could still support it if people really want it, maybe with an option like docusaurus build --fr, so that it builds a single-language site.
Note: this can't be done on simple hosting solutions like GitHub Pages
https://myDomain.com/fr/myDoc
I think having a path language prefix is a simpler option, and can be easily done with a single deployment.
There's still a choice to be made here:
- 1: Should we build all languages as a single SPA website?
- 2: Should we build one SPA per language, and append the language prefix to the baseUrl?
Both solutions have cons:
- 1 means the PWA plugin will download the content of all languages for offline support (quite overkill)
- 1 means the SPA routes file and other site globals will grow very large, which can become a performance problem
- 2 means we'll have to be careful about how the "main sitemap" links to the language-specific sitemaps
For now, I think 2 is the better solution.
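A minimal sketch of option 2: one SPA per language, each built with its own baseUrl and output folder so that a single static deployment can serve all of them. Whether the main language also gets a prefix (/en/) is still an open question, so it's a parameter here; planLocaleBuilds, LocaleBuild and the build/<locale> layout are hypothetical.

```typescript
// Hypothetical build plan: one SPA per language, nested under build/.
interface LocaleBuild {
  locale: string;
  baseUrl: string;
  outDir: string;
}

function planLocaleBuilds(
  siteBaseUrl: string, // e.g. "/" or "/docusaurus/"
  locales: readonly string[],
  defaultLocale: string,
  prefixDefaultLocale = false, // should the main language live under /en/ too?
): LocaleBuild[] {
  const base = siteBaseUrl.endsWith("/") ? siteBaseUrl : `${siteBaseUrl}/`;
  return locales.map((locale) => {
    const unprefixed = locale === defaultLocale && !prefixDefaultLocale;
    return {
      locale,
      baseUrl: unprefixed ? base : `${base}${locale}/`,
      outDir: unprefixed ? "build" : `build/${locale}`,
    };
  });
}
```

One consequence of nesting the outputs: if the build step empties its output directory, the unprefixed main-language SPA would have to be built before the others.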
Details and problems to consider
1 SPA per language, dev experience?
As we have seen above, it may be a good idea for performance to split the site into multiple smaller SPAs.
But this also means we'll build the SPAs independently: what would the dev experience be when running docusaurus start?
Do we code something completely different in dev so that the routes of all languages are accessible as a single SPA? Or do we provide a docusaurus start --lang fr to only run the "French SPA"? I think it's an acceptable tradeoff with some advantages, but it can also be annoying for some users.
Anchor links
Auto-generated heading ids are a problem for anchor links.
When a translator changes a heading in a translated markdown file, its id changes, and links from other files pointing to that anchor break. We should provide an easy way to keep anchors stable across translations.
reactjs/react.dev#1605 (comment)
reactjs/react.dev#1605 (comment)
ethereum/ethereum-org-website#272
https://github.com/reactjs/reactjs.org/pull/1636/files
mdx-js/mdx#810
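One way to keep anchors stable is to let authors pin an explicit id on a heading, so translators can change the text while the id stays the same. The `{#custom-id}` suffix syntax below is an assumption for illustration, and parseHeading/slugify are hypothetical names; headings without an explicit id fall back to a simple slug that keeps Unicode letters.

```typescript
// Hypothetical heading parser: an explicit {#id} wins over the generated slug.
const NON_SLUG_CHARS = new RegExp("[^\\p{L}\\p{N}\\s-]", "gu"); // keep letters, digits, spaces, "-"
const EXPLICIT_ID = /^(.*?)\s*\{#([A-Za-z0-9_-]+)\}\s*$/;

function slugify(text: string): string {
  return text.trim().toLowerCase().replace(NON_SLUG_CHARS, "").replace(/\s+/g, "-");
}

function parseHeading(raw: string): { text: string; id: string } {
  const match = EXPLICIT_ID.exec(raw);
  if (match) {
    // "Démarrage rapide {#install}" keeps the id of the English heading.
    return { text: match[1], id: match[2] };
  }
  return { text: raw.trim(), id: slugify(raw) };
}
```

With this, `## Installation {#install}` in English and `## Démarrage rapide {#install}` in French both produce the `#install` anchor, so cross-file links keep working.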
Right-to-Left support
Support RTL in themes?
Plugin integration
TODO
Doc edit button
If the user is browsing a French doc and clicks "edit", it should open the correct URL (git or Crowdin), so we should make this configurable.
Related:
#648
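A minimal sketch of what "configurable" could mean: a function that computes the edit URL from the locale and the doc path. makeEditUrl, the i18n/<locale>/docs layout and the translation-platform callback are hypothetical, and the URLs in the example are placeholders.

```typescript
// Hypothetical editUrl factory: git file for the main language, and either
// the translated git file or a translation platform for other languages.
interface EditUrlParams {
  locale: string;
  docPath: string; // relative to docs/, e.g. "myDoc.md"
}

function makeEditUrl(
  repoEditBase: string, // e.g. "https://github.com/org/repo/edit/master/website"
  defaultLocale: string,
  translationUrl?: (locale: string, docPath: string) => string,
): (params: EditUrlParams) => string {
  return ({ locale, docPath }) => {
    if (locale === defaultLocale) return `${repoEditBase}/docs/${docPath}`;
    // Translations managed in a SaaS: send translators there instead of git.
    if (translationUrl) return translationUrl(locale, docPath);
    return `${repoEditBase}/i18n/${locale}/docs/${docPath}`;
  };
}
```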
Default language
We should not assume English will be the default language, as v1 did.
Scalability
The build time mostly depends on 3 factors:
- number of docs
- number of versions
- number of languages
To decrease build time and keep it sustainable, you can remove older versions from the SPA and make them available as standalone, single-version deployments.
We'll work on a cli feature to "archive" older versions more easily: #3286
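As a rough, assumed model (not a measurement), the number of doc pages to render grows with the product of the three factors, which is why archiving versions helps so much:

```latex
N_{\text{pages}} \approx N_{\text{docs}} \times N_{\text{versions}} \times N_{\text{languages}}
```

With hypothetical numbers, 100 docs × 5 versions × 4 languages gives 2000 pages; keeping only 2 versions in the SPA brings it down to 100 × 2 × 4 = 800.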
Fallback
A missing page/translation should be allowed; in that case, we'd fall back to the default language and could show a warning.
See point 6: #2651 (comment)
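A minimal sketch of the warning side: compare the default-language docs with the translated ones and emit a single message per language instead of one per page. missingTranslations and fallbackWarning are hypothetical names, and doc paths are assumed to be relative to each language's docs folder.

```typescript
// Hypothetical helpers: list untranslated docs and format one warning.
function missingTranslations(
  defaultDocs: readonly string[], // e.g. ["intro.md", "api.md"]
  localeDocs: readonly string[],
): string[] {
  const translated = new Set(localeDocs);
  return defaultDocs.filter((doc) => !translated.has(doc));
}

function fallbackWarning(locale: string, missing: readonly string[]): string | undefined {
  if (missing.length === 0) return undefined; // fully translated, nothing to say
  return `[${locale}] ${missing.length} doc(s) not translated, falling back to the default language: ${missing.join(", ")}`;
}
```

The same list also gives a cheap translation progress indicator (translated docs / total docs).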
Creating a language
We need a CLI to init a language folder based on the current language/versions.
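A minimal sketch of what such a command could plan for the current version: copy every default-language doc into i18n/<locale>/docs, skipping files a translator already created, so re-running it is safe. planLocaleInit and the layout are assumptions; versioned docs would need the same treatment per version.

```typescript
// Hypothetical "create language" plan: which files to copy, and where.
interface CopyOp {
  from: string;
  to: string;
}

function planLocaleInit(
  locale: string,
  docs: readonly string[], // paths relative to docs/
  alreadyTranslated: ReadonlySet<string>, // paths relative to i18n/<locale>/docs/
): CopyOp[] {
  return docs
    .filter((doc) => !alreadyTranslated.has(doc)) // never overwrite translations
    .map((doc) => ({ from: `docs/${doc}`, to: `i18n/${locale}/docs/${doc}` }));
}
```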
Creating a version
See proposal here: #2651 (comment)
We'll have to snapshot each localized folder too
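A minimal sketch of the snapshot step when cutting a version: copy the default docs into versioned_docs/version-<v>, and do the same for each localized docs folder. Mirroring that naming under i18n/<locale>/ is an assumption, and planVersionSnapshot is a hypothetical name.

```typescript
// Hypothetical version snapshot plan: one folder copy per language.
function planVersionSnapshot(
  version: string, // e.g. "2.0.0"
  locales: readonly string[], // translated locales, default language excluded
): { from: string; to: string }[] {
  const ops = [{ from: "docs", to: `versioned_docs/version-${version}` }];
  for (const locale of locales) {
    ops.push({
      from: `i18n/${locale}/docs`,
      to: `i18n/${locale}/versioned_docs/version-${version}`,
    });
  }
  return ops;
}
```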
Asset colocation
It's possible to colocate assets close to the docs, which makes it possible to use a different image per version. What's the story for i18n? These colocated images would likely end up being copied into the language folders too, so they might be duplicated along multiple axes (version/lang). Is that a good thing? At the same time, if an image contains text, that text could be translated, so it still makes sense...
Slugs
Should we allow custom slugs per language?
If we do that, to be able to switch from one language to another without losing context (the doc you are currently reading), each language version would have to know the slugs of all the other language versions, which might be a lot of data. How do we access such data efficiently?
To me, being able to switch language while preserving context does not look so critical. If the user wants to browse docs in French, they can go through the French home page and browse from there, and it's likely Google gives them the docs in the correct language in the first place.
We should still try to find a solution, but this can probably be done later, with some code that, on a language switch request, reads a JSON file emitted by the other language to obtain a mapping from document id to slug in that language.
Note: Yarn 1/classic (Jekyll based?) can switch language and preserve context when doing so, but the slugs are not localized: https://classic.yarnpkg.com/es-ES/docs/usage
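The JSON-mapping idea above could be sketched like this: each language build emits a map from doc id to its (possibly localized) permalink, and the language switcher looks up the current doc id in the target language's map, falling back to that language's home page. The map shape, switchLanguageUrl and the home-page fallback are assumptions.

```typescript
// Hypothetical language switch: preserve context when the doc exists.
type SlugMap = Record<string, string>; // doc id -> permalink in one language

function switchLanguageUrl(
  currentDocId: string | undefined, // undefined on non-doc pages
  targetSlugs: SlugMap, // loaded from the target language's JSON file
  targetHomeUrl: string, // e.g. "/fr/"
): string {
  if (
    currentDocId !== undefined &&
    Object.prototype.hasOwnProperty.call(targetSlugs, currentDocId)
  ) {
    return targetSlugs[currentDocId];
  }
  // Context cannot be preserved: go to the target language home page.
  return targetHomeUrl;
}
```

Loading the target map lazily, only when the user actually clicks the switcher, would avoid shipping every language's slugs with every page.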
Translation mode
If you add the ?translate=true querystring, the UI could be enhanced with in-place translation features.
It could be possible to integrate with the translation API of a SaaS like Crowdin.
This is mostly for key/value translations, as markdown docs will be translated as a whole and there's already the editUrl option on the docs plugin.
TODO ...
Ongoing PR: #3325