RFC: Docusaurus v2 i18n · Issue #3317 · facebook/docusaurus

Docusaurus v2 i18n

Here is a brain dump of many things to consider for i18n support in v2.

I'll keep this issue updated over time, but feel free to comment if you have anything to say, particularly if you used v1 i18n support and can provide valuable feedback.

Supersedes this older issue (which still has interesting content): #2651


Existing translation systems

Links to get inspiration from.

Git fork based translations

Have an upstream repo (often in English), and one fork per language

A translation strategy first seen on Vue translation: each language creates a git fork.

We can build tooling on top of that, so that a translation change made in the upstream repo can trigger new PRs on forked repos, to automate the process and ensure translations stay in sync.

Pros:

Cons:

ReactJS case

Links related to the work of Nat Alison.

Contains some interesting notes on why a SaaS like Crowdin was not a good fit, despite an attempt to use it.

https://reactjs.org/blog/2019/02/23/is-react-translated-yet.html
reactjs/react.dev#1605
https://github.com/reactjs/reactjs.org-translation
https://github.com/reactjs/reactjs.org-translation/blob/master/PROGRESS.template.md
facebook/react#8063
reactjs/react.dev#82
reactjs/react.dev#873

GatsbyJS case

Another translation RFC from Nat Alison, quite close to her work on ReactJS:
https://github.com/gatsbyjs/rfcs/blob/master/text/0010-gatsby-docs-localization.md

I don't think this work is in production.

Also some interesting bits on this thread where she explains her unfortunate situation working at Gatsby.

Git, single repo

You have a repo and you just have a folder per language.

Pros:

Cons:

Nuxt case

The Nuxt docs are a simple repo with language folders.
It works fine, but the author told me it was hard to keep all languages in sync. It looks like a manual process.

TypeScript case

Quite similar: the TS website has one language folder per package, <packageName>/copy/<lang>, and the translations are handled in the same GitHub monorepo, but split by package.

microsoft/TypeScript-Website#100
microsoft/TypeScript-Website#181

Note: Orta found a way to solve the per-language permission problem. He created a bot that lets code owners merge their own PRs through a GitHub PR comment, despite not having write permissions on the repo:

microsoft/TypeScript-Website#130 (comment)
https://github.com/orta/code-owner-self-merge

Additional notes

I think it's possible to handle the "sync with upstream" problem inside a mono repo by using git patch.

https://stackoverflow.com/questions/9939952/create-a-patch-including-specific-files-in-git

It's a way to emulate the upstream repo -> language forks pattern

SaaS

Using a SaaS like Crowdin, Transifex, or others has benefits, such as advanced translation features: a UI, editors supporting various formats (PO, Markdown, ICU key/values), translation memory, automatic payment of platform translators, translation progress tracking, sync with the upstream language, version management...

Pros:

Cons:

Crowdin

Solution suggested by Docusaurus 1, free plan for open-source, used by Docusaurus site v1, Jest, Yarn, Electron...

We should rather try to make it easy to migrate from v1.

Not everybody likes this solution, however.

Some drawbacks mentioned here:
https://github.com/gatsbyjs/rfcs/blob/master/text/0010-gatsby-docs-localization.md#saas-platform-crowdin

Note: some questions I asked Crowdin are here: https://gist.github.com/slorber/30643299196c7efa77084eec10c1c609

Other SaaS

???


Docusaurus 2 translation system

Unlike the use cases presented above, we are a framework, not a site, and we don't serve a single community.

I think we want to be able to support both developers and non-developers.

We can't expect all Docusaurus translators to be developers, nor git users, yet we know that developers don't necessarily always like the lock-in to a SaaS like Crowdin.

Translation management

I think the translation system should be file-system based, as it's probably the common abstraction between git-based workflows and SaaS-based workflows.

Basically, if you build your site for the fr language, and if you have i18n/fr/docs/myDoc.md, then it should be used for the French page instead of the file at docs/myDoc.md.
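The override rule described above could be sketched as follows. This is only an illustration: `localizedPath` and `resolveSource` are hypothetical names, not Docusaurus APIs, and using the original file when no translation exists is an assumption made here to keep the sketch complete.

```typescript
// Hypothetical helper (not a Docusaurus API): maps a source file to the
// location where its translation would live under the i18n root.
function localizedPath(locale: string, sourcePath: string, i18nDir = "i18n"): string {
  return `${i18nDir}/${locale}/${sourcePath}`;
}

// Picks the file to render for a locale: the translated file if it exists,
// otherwise the original one (assumption). `exists` is injected so the sketch
// does not depend on any particular file-system API.
function resolveSource(
  locale: string,
  sourcePath: string,
  exists: (path: string) => boolean,
): string {
  const candidate = localizedPath(locale, sourcePath);
  return exists(candidate) ? candidate : sourcePath;
}

// In-memory stand-in for the site folder.
const files = new Set(["docs/myDoc.md", "docs/other.md", "i18n/fr/docs/myDoc.md"]);
const exists = (p: string): boolean => files.has(p);

console.log(resolveSource("fr", "docs/myDoc.md", exists)); // i18n/fr/docs/myDoc.md
console.log(resolveSource("fr", "docs/other.md", exists)); // docs/other.md
```

Because the lookup only depends on paths, the same rule would apply whether the translated files come from a git workflow or are downloaded from a SaaS.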

I think ./i18n is a good default path for the translated content, but the paths of such a system should be flexible enough that you can adopt the workflow of your choice.

So, the first step is to support the first case where you just put the translations in a folder of your site. I'm going to experiment with this on the Docusaurus 2 website and try to see if I can provide a French translation.

It's unlikely we'll be able to provide integrations with all the existing translation SaaS, but a 2nd step would be to write integration scripts with Crowdin, so that v1 users can keep using it.

Translation runtime lib

It's likely we'll try to use FBT, a translation tool from Facebook.

I personally have good experience with react-intl as well, and prefer it over many React alternatives.

Translated URLs

Supposing en is the "main" language.

Does https://myDomain.com/en/myDoc exist?

What should be the behavior of the site if the URL does not contain a language, like https://myDomain.com/myDoc ? Is it the English language? Or do we add code to redirect to the most suitable language?

Is it OK for SEO to have a homepage that just redirects? Or is the homepage in English? Then which page is the canonical one?

Note: v1 redirects docs, but not the homepage: https://docusaurus.io/ & https://docusaurus.io/docs/installation
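If we went with "redirect to the most suitable language", the selection logic could look roughly like this. It is a minimal sketch under assumptions: `pickLocale` and `redirectTarget` are hypothetical helpers, the preferred languages would come from something like the browser's `navigator.languages`, and the matching is a simple exact-then-base-language comparison rather than full BCP 47 negotiation.

```typescript
// Hypothetical helper: picks the site locale that best matches the visitor's
// preferred languages, falling back to the default locale.
function pickLocale(
  preferred: readonly string[],
  siteLocales: readonly string[],
  defaultLocale: string,
): string {
  for (const tag of preferred) {
    const lower = tag.toLowerCase();
    // Exact match first ("fr-FR" vs "fr-FR").
    const exact = siteLocales.find((l) => l.toLowerCase() === lower);
    if (exact !== undefined) return exact;
    // Then the base language ("fr-CA" -> "fr").
    const base = lower.split("-")[0];
    const partial = siteLocales.find((l) => l.toLowerCase() === base);
    if (partial !== undefined) return partial;
  }
  return defaultLocale;
}

// Path to redirect to when the requested URL has no language prefix.
function redirectTarget(pathname: string, locale: string): string {
  const normalized = pathname.startsWith("/") ? pathname : `/${pathname}`;
  return `/${locale}${normalized}`;
}

const locale = pickLocale(["fr-CA", "en"], ["en", "fr"], "en");
console.log(redirectTarget("/myDoc", locale)); // /fr/myDoc
```

This does not answer the SEO and canonical URL questions above; it only shows that the language choice itself is cheap to compute on the client.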

Interesting comment (point 5): #2651 (comment)

Let's not forget to add the proper page meta tags such as:

<html lang="en">
<link rel="alternate" href="https://myDomain.com/fr/myDoc" hrefLang="fr-FR"/>

See also #2471

(I think that if pages include these tags, there is no need to add them to sitemaps.)
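As an illustration of the alternate links mentioned above, here is a sketch that emits one tag per language for a given page. `alternateLinks` is a hypothetical helper, and it assumes a path-prefix URL scheme where every language, including the main one, lives under `/<locale>/`.

```typescript
// Hypothetical helper: builds one <link rel="alternate"> tag per locale.
// Assumes URLs of the form <siteUrl>/<locale>/<docPath>.
function alternateLinks(
  siteUrl: string,
  docPath: string,
  locales: readonly string[],
): string[] {
  return locales.map(
    (locale) =>
      `<link rel="alternate" href="${siteUrl}/${locale}/${docPath}" hreflang="${locale}"/>`,
  );
}

for (const tag of alternateLinks("https://myDomain.com", "myDoc", ["en", "fr"])) {
  console.log(tag);
}
```

Each page would list all of its language variants this way, including itself.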

Translated URL schemes

There are multiple ways to handle the URLs of translated pages

https://fr.myDomain.com/myDoc

Using a custom subdomain does not seem like a very good fit, as it would require one separate deployment per language (or you'd need some custom reverse proxy logic to handle that?).

I don't think this is the workflow we'll encourage, but we could still support this if people really want it. Maybe with an option like docusaurus build --fr, so that it builds a single language site.

Note: this can't be done on simple hosting solutions like GitHub Pages.

https://myDomain.com/fr/myDoc

I think having a path language prefix is a simpler option, and can be easily done with a single deployment.

There's still a choice to be made here:

Both solutions have cons:

For now I think 2 is a better solution

Details and problems to consider

1 SPA per language, dev experience?

As we have seen above, it may be a good idea for performance to split the site into multiple smaller SPAs.

But this also means that we'll build the SPAs independently. What would the dev experience be when you run docusaurus start?

Do we code something completely different in dev so that the routes of all languages are accessible as a single SPA? Do we instead provide a docusaurus start --lang fr to only run the "French SPA"? I think it's an acceptable tradeoff and has some advantages, but it can also be annoying for some users.

Auto-generated ids are a problem for anchor links.
When a translator changes a heading in a translated markdown file, the id changes, and links from other files have to change as well. We should provide an easy way to keep the anchors stable across translations.

reactjs/react.dev#1605 (comment)
reactjs/react.dev#1605 (comment)
ethereum/ethereum-org-website#272
https://github.com/reactjs/reactjs.org/pull/1636/files
mdx-js/mdx#810
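To make the anchor problem concrete, here is a small sketch. The `slugify` function is a naive stand-in for whatever the markdown pipeline really uses, and the trailing `{#some-id}` marker is a hypothetical syntax used here only to show how an explicit id could stay stable when the heading text is translated.

```typescript
// Naive slugifier, standing in for the real heading id generator (assumption).
function slugify(heading: string): string {
  return heading
    .toLowerCase()
    .trim()
    .replace(/[^\w\u00C0-\u024F\s-]/g, "") // keep ASCII word chars and accented Latin letters
    .replace(/\s+/g, "-");
}

// Hypothetical syntax: a trailing `{#some-id}` pins the anchor explicitly;
// without it, the id is derived from the heading text.
function headingId(rawHeading: string): string {
  const match = /^(.*?)\s*\{#([\w-]+)\}\s*$/.exec(rawHeading);
  return match?.[2] ?? slugify(rawHeading);
}

console.log(headingId("Getting started"));                    // getting-started
console.log(headingId("Démarrage rapide"));                   // démarrage-rapide
console.log(headingId("Démarrage rapide {#getting-started}")); // getting-started
```

With an explicit id, a link such as `myDoc#getting-started` would keep working in every language, whatever the translated heading says.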

Right-to-Left support

Support RTL in themes?

Plugin integration

TODO

Doc edit button

If a user is browsing a French doc and presses "edit", they should land on the correct URL (git or Crowdin), so we should make this configurable.

Related:
#648
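One way to make the edit link configurable would be to let the site provide a function that receives the current language. The sketch below is purely illustrative: the `EditUrlFn` option shape is hypothetical, and both URLs are placeholders, not real project addresses.

```typescript
// Hypothetical option shape (an assumption, not an existing API):
// the site decides where "edit" leads for a given locale and doc.
type EditUrlFn = (params: { locale: string; docPath: string }) => string;

const defaultLocale = "en";

// Placeholder URLs: the default language is edited on git,
// translations are edited on the translation platform.
const editUrl: EditUrlFn = ({ locale, docPath }) =>
  locale === defaultLocale
    ? `https://github.com/my-org/my-site/edit/main/docs/${docPath}`
    : `https://crowdin.com/project/my-site/${locale}`;

console.log(editUrl({ locale: "en", docPath: "myDoc.md" }));
console.log(editUrl({ locale: "fr", docPath: "myDoc.md" }));
```

A site using the folder-per-language git workflow would instead return a git URL pointing into the language folder.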

Default language

We should not assume that English is the default language, as v1 does.

#3317

Scalability

The build time mostly depends on 3 factors:

To decrease build time and make it sustainable, you can remove older versions from the SPA part, and make them available as a standalone, single version deployment.

We'll work on a CLI feature to "archive" older versions more easily: #3286

Fallback

A missing page or translation should be allowed; in that case we'd fall back to the default language and could show a warning.

See 6: #2651 (comment)
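The fallback behavior could be sketched like this: for each doc, use the translation when it exists, otherwise serve the default language and record a warning. `resolveWithFallback` and the `ResolvedDoc` shape are hypothetical names introduced for illustration.

```typescript
interface ResolvedDoc {
  id: string;
  locale: string;      // locale of the content actually served
  isFallback: boolean; // true when the default-language content is used
}

// Hypothetical helper: resolves every doc for `locale`, falling back to
// `defaultLocale` for docs that have no translation, and collects warnings.
function resolveWithFallback(
  docIds: readonly string[],
  translated: ReadonlySet<string>, // ids that have a translation in `locale`
  locale: string,
  defaultLocale: string,
): { docs: ResolvedDoc[]; warnings: string[] } {
  const docs: ResolvedDoc[] = [];
  const warnings: string[] = [];
  for (const id of docIds) {
    const isFallback = locale !== defaultLocale && !translated.has(id);
    docs.push({ id, locale: isFallback ? defaultLocale : locale, isFallback });
    if (isFallback) {
      warnings.push(
        `Doc "${id}" has no "${locale}" translation, falling back to "${defaultLocale}".`,
      );
    }
  }
  return { docs, warnings };
}

const result = resolveWithFallback(["intro", "api"], new Set(["intro"]), "fr", "en");
console.log(result.warnings);
```

The `isFallback` flag could also drive a banner on the page itself, so readers know they are seeing untranslated content.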

Creating a language

We need a CLI command to init a language folder based on the current language/versions.

Creating a version

See proposal here: #2651 (comment)

We'll have to snapshot each localized folder too

Asset colocation

It's possible to colocate assets close to the docs, which makes it possible to use a different image per version. What's the story for i18n? This colocated image would likely end up being copied in the language folders too, so it might be duplicated along multiple axes (version/lang). Is it a good thing? At the same time, if an image contains text, that text could be translated differently, so it still makes sense...

Slugs

Should we allow custom slugs per language?

If we do that, to be able to switch from one language to the other without losing context (the doc you are currently reading), one version would have to be aware of the slugs of all the other language versions, which might be quite a lot of data. How do we access such data in a performant way?

To me, being able to switch language while preserving context does not look so critical. If users want to browse docs in French, they can go through the French home and browse from there, and it's likely Google gives them the docs in the correct language in the first place.

We should try to find a solution though, but this can probably be done later, with some code that would, on language switch request, read some json file emitted by the other language, and then obtain a mapping from document id to slug of the other language.
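The mapping idea above could look like this. It is a sketch under assumptions: the `SlugMap` JSON shape and `switchLocaleUrl` are hypothetical, the path-prefix URL scheme is assumed, and in a real site the JSON would be fetched from the other language's build on demand rather than inlined.

```typescript
// Hypothetical shape of the JSON file each language build would emit:
// document id -> localized slug.
type SlugMap = Record<string, string>;

// Computes the URL to navigate to when switching language on a given doc.
function switchLocaleUrl(
  docId: string,
  targetLocale: string,
  targetSlugs: SlugMap,
): string {
  const slug: string | undefined = targetSlugs[docId];
  // Unknown doc in the target language: go to that language's home instead.
  return slug === undefined ? `/${targetLocale}/` : `/${targetLocale}/${slug}`;
}

// Inlined here for the example; it would normally be loaded only on a
// language switch request, so no language pays for the others' data upfront.
const frSlugs: SlugMap = JSON.parse('{"getting-started": "docs/demarrage"}');

console.log(switchLocaleUrl("getting-started", "fr", frSlugs)); // /fr/docs/demarrage
console.log(switchLocaleUrl("unknown-doc", "fr", frSlugs));     // /fr/
```

Loading the mapping lazily keeps each language's bundle independent of the number of other languages.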

Note: Yarn 1/classic (Jekyll based?) can switch language and preserve context when doing so, but the slugs are not localized: https://classic.yarnpkg.com/es-ES/docs/usage

Translation mode

If you add the ?translate=true querystring, it could enhance the UI with in-place translation features.
It might be possible to integrate with the translation API of a SaaS like Crowdin.
This is mostly for key/value translations, as markdown docs will be translated as a whole and there's already the editUrl on the docs plugin.


TODO ...

Ongoing PR: #3325