Authoring web pages
Characters
Getting started
Background reading
- Character encodings for beginners
What is a character encoding, and why should I care?
- Introducing character sets and encodings
A brief introduction to some of the concepts associated with character sets and encodings and the Web, with pointers to various techniques sections.
- Character encodings: Essential concepts
Basic introductions to concepts related to character encoding. Includes: Unicode, character sets, coded character sets, character encodings, the document character set, character escapes, XHTML & MIME types, and standards vs quirks modes.
- Quick tips: Encoding
One of the 10 quick tips for internationalization.
- Quick tips: Escapes
One of the 10 quick tips for internationalization.
Choosing and applying a character encoding
See also
This section is specifically about how to choose a character encoding for your content and ensure that the content is in that encoding.
For information about how to declare the encoding so that the browser knows how to read your content see Declaring the character encoding for HTML and Declaring the character encoding for your CSS stylesheet.
See also the dedicated section about Changing to UTF-8.
- Choose UTF-8 for all content. more
- If you really can't use a Unicode encoding, use only those legacy encodings listed in the Encoding specification. more
- Avoid the following encodings: UTF-16, UTF-32, JIS_C6226-1983, JIS_X0212-1990, HZ-GB-2312, JOHAB (Windows code page 1361), encodings based on ISO-2022 or EBCDIC, as well as CESU-8, UTF-7, BOCU-1, and SCSU. more
How to's
- Choosing & applying a character encoding
Which character encoding should I use for my content, and how do I apply it to my content?
Useful reference links
- Encoding, 4.2 Names and labels
If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.
Spec links
- HTML5, 8.2.2.2 Character encodings
Recommendations for support of particular encodings for browsers implementing HTML5.
Background reading
- Who uses Unicode?
Are corporate Web sites using Unicode right now? This article is somewhat outdated, now that Unicode accounts for around 97% of pages on the Web.
- Document character set
What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents?
Changing to UTF-8
See also
This section is specifically about how to migrate your content to the UTF-8 (Unicode) encoding. For more general advice see Choosing and applying a character encoding.
For information about how to declare the encoding so that the browser knows how to read your content see Declaring the character encoding for HTML and Declaring the character encoding for your CSS stylesheet.
- Save the data as UTF-8, don't just change the encoding declaration. more
- Declare the encoding in your page. more
- Ensure that your server does the right thing. more
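As a rough illustration of the first point above, the following Python sketch re-encodes bytes from a legacy encoding as UTF-8. The function name transcode_to_utf8 and the windows-1252 sample text are assumptions made for this example; they do not come from the linked articles.

```python
def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8."""
    # Decoding with the encoding the data was actually saved in is the key step.
    # Merely relabelling the bytes as UTF-8 would leave them unchanged.
    text = data.decode(source_encoding)
    return text.encode("utf-8")


# Hypothetical sample: "café" saved as windows-1252 (é is the single byte 0xE9).
legacy_bytes = "café".encode("windows-1252")
utf8_bytes = transcode_to_utf8(legacy_bytes, "windows-1252")
```

If only the declaration were changed, the byte 0xE9 would stay in the file, and a browser reading the page as UTF-8 would not display the intended character.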
How to's
- Changing an HTML page encoding to UTF-8
How do I change the encoding of my HTML pages to UTF-8?
- The byte-order mark (BOM) in HTML
What is the byte-order mark, and what do I need to know about it when creating HTML?
- Migrating to Unicode
Detailed guidelines for the migration of software and data to Unicode.
Background reading
- Document character set
What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents?
Declaring the character encoding for HTML
- Use the HTTP header if it is available. more
- Always use an in-document encoding declaration, even if you are also using the HTTP header. more
- Ensure that the encoding declaration fits within the first 1024 bytes of the page. more
- If you cannot use UTF-8, use the preferred encoding name indicated in the Encoding specification. more
- Do not use the charset attribute on a or link elements. more
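The 1024-byte guideline above can be checked mechanically. The sketch below is a simplified assumption-laden example: it only looks for the meta charset form of the declaration, and the helper name declared_encoding_in_prescan is hypothetical. A browser's real prescan handles more cases than this pattern does.

```python
import re

# Simplified pattern for <meta charset="..."> only. It requires the closing ">"
# so that a declaration cut off at the byte limit is not counted.
META_CHARSET = re.compile(
    rb"""<meta\s+charset\s*=\s*["']?([\w-]+)["']?\s*/?>""", re.IGNORECASE
)


def declared_encoding_in_prescan(page: bytes, limit: int = 1024):
    """Return the declared encoding label if the whole declaration
    fits within the first `limit` bytes of the page, else None."""
    match = META_CHARSET.search(page[:limit])
    return match.group(1).decode("ascii").lower() if match else None


early = b'<!DOCTYPE html><html><head><meta charset="utf-8"><title>Test</title></head></html>'
late = (b"<!DOCTYPE html><html><head><!-- " + b"x" * 2000
        + b' --><meta charset="utf-8"></head></html>')
```

Here early yields a label while late does not, because a long comment pushes the declaration past the limit.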
How to's
- Declaring character encodings in HTML
How should I declare the encoding of my HTML file? This page contains a quick reference section, followed by more detailed information.
Useful reference links
- Encoding, 4.2 Names and labels
If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.
Spec links
- HTML5
- 8.2.2 The input byte stream
Detailed technical information for browser implementers about how pages are parsed for recognition of the character encoding.
- 2.1.6 Character encodings
Information about preferred MIME names, ASCII compatible encodings, and Unicode characters.
- 4.2.5.5 Specifying the document's character encoding
How to use the meta element to declare the encoding.
Background reading
- Serving HTML & XHTML
Introduces doctypes, MIME types, and the influence of standards- vs. quirks-mode on character encoding declarations.
- Handling character encodings in HTML and CSS
Tutorial style article that gathers together and organizes pointers to articles that, taken together, help you understand how to handle the essential aspects of authoring HTML and CSS related to characters and character encodings.
Declaring the character encoding for a CSS style sheet
- If you use UTF-8 as the character encoding for your style sheets and your HTML pages, and declare that encoding in your HTML, there is no need to declare the encoding for your style sheet. more
- If you use @charset, ensure that nothing (except a BOM) comes before it in the style sheet, and use the exact syntax. more
- If you cannot use UTF-8, use the preferred encoding name indicated in the Encoding specification. more
- Do not use the charset attribute on a or link elements. more
How to's
- CSS character encoding declarations
How do I declare the character encoding of a CSS style sheet?
Useful reference links
- Encoding, 4.2 Names and labels
If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.
Spec links
- CSS Syntax Level 3, 3.2. The input byte stream
Character encoding information in the CSS Level 3 spec.
Using escapes to represent characters
- Avoid using escapes whenever possible. When you use UTF-8 it supports all the characters you need. more
- Use escapes for invisible or ambiguous characters. more
- Use CSS escapes for CSS embedded in HTML, rather than HTML escapes. more
- Always use Unicode codepoints for the numeric part of a character escape. Do not use codepoint values of non-Unicode encodings. more
- Use a single escape (representing the Unicode codepoint value) for supplementary characters. Do not escape surrogate character pairs. more
- Ensure that all href attribute values have escaped ampersands in query parameters, ie. &amp; rather than just &. more
- Avoid named character entities in XHTML. more
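The points above about Unicode code points and supplementary characters can be sketched in Python. The helper names html_char_reference and css_char_escape are hypothetical. The example assumes each argument is a single character.

```python
def html_char_reference(ch: str) -> str:
    """Return a hexadecimal HTML character reference for one character."""
    # ord() gives the Unicode code point, so a supplementary character
    # produces a single escape, never a pair of surrogate escapes.
    return f"&#x{ord(ch):X};"


def css_char_escape(ch: str) -> str:
    """Return a CSS escape: backslash, hex code point, terminating space."""
    return f"\\{ord(ch):X} "


rlm = "\u200F"               # RIGHT-TO-LEFT MARK, an invisible character
deseret = "\U00010400"       # a supplementary-plane character

rlm_html = html_char_reference(rlm)
deseret_html = html_char_reference(deseret)
rlm_css = css_char_escape(rlm)
```

The supplementary character is written as one reference to U+10400, not as the two surrogate values D801 and DC00.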
How to's
- Using character escapes in markup and CSS
How can I use character escapes in markup and CSS, and when should I use or not use them?
Spec links
- HTML5, 8.5 Named character references
Character reference names that are supported by HTML, and the code points to which they refer.
Checking the encoding of a document
How to's
- W3C Internationalization Checker
Shows the HTTP header information for a page, and all in-page encoding declarations. Also highlights conflicts.
- Checking HTTP Headers
How can I check the character encoding information sent in the HTTP header of a web document?
- Checking the character encoding using the validator
How can I check that the character encoding of my document is correct using the W3C HTML Validator?
Handling the byte-order mark (BOM)
- If you use the byte-order mark with UTF-8-encoded pages, check that any scripts and back-end processes can handle the BOM. more
- If you ignored the advice above and encoded your page as UTF-16, always ensure that it starts with a BOM. more
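A back-end process that needs to cope with a BOM might start with a check like the following. This is a minimal sketch: the function names are hypothetical, and only the UTF-8 and UTF-16 signatures mentioned above are handled (UTF-32 is deliberately left out).

```python
import codecs

# BOM byte sequences and the encoding each one signals.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
)


def sniff_bom(data: bytes):
    """Return the encoding signalled by a leading BOM, or None if absent."""
    for bom, label in BOMS:
        if data.startswith(bom):
            return label
    return None


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM so later processing does not see it."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

For example, a script that concatenates page fragments could call strip_utf8_bom on every fragment except the first, so that no BOM ends up in the middle of the output.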
How to's
- The byte-order mark (BOM) in HTML
What is the byte-order mark, and what do I need to know about it when creating HTML?
Spec links
- HTML5, 8.2.2 The input byte stream
How HTML5 detects the character encoding of a page, and mentions how browsers should handle BOM detection.
- CSS Syntax Level 3, 3.2. The input byte stream
Character encoding information in CSS. Mentions how browsers should handle the BOM.
- Handling character encodings in HTML and CSS
Tutorial style article that gathers together and organizes pointers to articles that, taken together, help you understand how to handle the essential aspects of authoring HTML and CSS related to characters and character encodings.
- CSS 2.1, 4.4 CSS style sheet representation
Character encoding information in the CSS 2.1 spec. Mentions how browsers should handle the BOM.
Handling character normalization
- Ensure that all HTML class names and CSS selectors are saved using the same Unicode normalization form (NFC is recommended). more
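To see why the normalization form matters, consider a class name containing an accented letter. The Python sketch below uses the standard unicodedata module; the sample strings and the helper name same_after_nfc are assumptions for illustration.

```python
import unicodedata

precomposed = "caf\u00E9"    # é as a single code point (NFC form)
decomposed = "cafe\u0301"    # e followed by a combining acute accent (NFD form)


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after converting both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

The two strings look identical on screen but differ code point by code point, so a selector saved in one form would not match a class name saved in the other unless both are normalized first.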
How to's
- Normalization in HTML and CSS
What are normalization forms, and why do I need to know about them when creating HTML and CSS content?
Handling encoding issues in forms
- Use UTF-8 for the character encoding of your page. more
- Consider checking on the server that form data is arriving in UTF-8. more
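One way to perform the server-side check suggested above is a strict UTF-8 decode of the raw bytes. This is a minimal sketch with a hypothetical function name; what to do with rejected data (re-prompt, log, guess a legacy encoding) is left open.

```python
from typing import Optional


def decode_utf8_or_none(raw: bytes) -> Optional[str]:
    """Return the decoded text if `raw` is valid UTF-8, otherwise None."""
    try:
        # bytes.decode() uses strict error handling by default,
        # so any invalid byte sequence raises UnicodeDecodeError.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


good = decode_utf8_or_none("名前".encode("utf-8"))
bad = decode_utf8_or_none("café".encode("windows-1252"))
```

Note that pure ASCII input always passes this check, since ASCII bytes are also valid UTF-8.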
How to's
- Multilingual form encoding
What is the best way to deal with encoding issues in forms that may use multiple languages and scripts?
Using Unicode control codes
See also
If you represent control codes using character escapes, see also Using escapes to represent characters for more information.
- Don't use Unicode characters if there is markup to do the same job. more
- Use character escapes to represent control codes, so that they are visible. more
How to's
- Characters or markup?
There is a range of control-like Unicode characters, some of which fulfill the same role as markup. Which should I use, and which should I avoid?
- Unicode in XML & Other Markup Languages
Guidelines on the use of the Unicode Standard in conjunction with markup languages such as XML.
- Unicode controls vs. markup for bidi support
To correctly format bidi text in HTML or XML content, should I use Unicode control codes or markup?
- Using Unicode controls for bidi text
If I'm unable to use markup to correctly order bidirectional text, what can I do?
- HTML, XHTML, XML and Control Codes
How do I handle control codes (ie. the 'C0' U+0000-U+001F and 'C1' U+007F-U+009F ranges) in XML, XHTML and HTML?
Working around unavailable characters/glyphs
How to's
- Missing characters and glyphs
What to do if a Unicode character or font glyph is missing.
Using non-ASCII web addresses
Useful reference links
- Internationalized country code top-level domain
Wikipedia article. Contains news about recent developments.
- Internationalized domain name
Wikipedia page.
- mod_fileiri: new Apache module under development
Martin Dürst's fileiri Apache module.
Spec links
- RFC 3987 Internationalized Resource Identifiers (IRIs)
IETF Proposed Standard for handling of IRIs.
- Unicode Technical Report #36 Unicode Security Considerations
Describes security issues related to phishing.
Background reading
- An Introduction to Multilingual Web Addresses
How IDN and IRIs work, aimed at content authors and general users who want to understand the basics without too many gory technical details.
Language
Getting started
Background reading
- Language on the Web
W3C Getting Started article.
- Working with language in HTML
W3C tutorial.
- Language tags in HTML and XML
How to choose the right attribute values. W3C article.
- Quick tip: Language
One of the 10 quick tips for internationalization.
- Why use the language attribute?
Why should I use the language attribute in web pages?
Declaring the overall language of a page
See also
For detailed advice about how to select the right language tags, see Choosing language values.
See also Declaring metadata about the language of the intended audience.
- Always declare the default language for text in the page using attributes on the html tag. more
- Do NOT use the meta element with the http-equiv attribute set to Content-Language. more
- Use language attributes rather than HTTP to declare the default language for 'text processing' (ie. when language needs to be known for things such as font choice, styling, spell-checking, hyphenation, quote mark styling, etc.). more
- Do not declare the default language of a document in the body element; use the html element instead. more
- Where a document contains content aimed at speakers of more than one language, decide whether you want to declare one language in the html tag, or leave the languages undefined until later.
- Where a document contains content aimed at speakers of more than one language, try to divide the document linguistically at the highest possible level, and declare the appropriate language for each of those divisions.
- For HTML use the lang attribute only, for XHTML 1.0 served as text/html use the lang and xml:lang attributes, and for XHTML served as XML use the xml:lang attribute only. more
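The last rule above amounts to a small lookup from serving mode to attribute set. The Python sketch below restates it; the function name and the three mode labels ("html", "xhtml-as-html", "xml") are hypothetical names chosen for this example.

```python
def language_attributes(tag: str, serving: str) -> dict:
    """Return the language attributes to put on the html element.

    `serving` is one of three illustrative labels:
    "html", "xhtml-as-html" (XHTML 1.0 served as text/html), or "xml".
    """
    if serving == "html":
        return {"lang": tag}
    if serving == "xhtml-as-html":
        # Both attributes are used, and they carry the same value.
        return {"lang": tag, "xml:lang": tag}
    if serving == "xml":
        return {"xml:lang": tag}
    raise ValueError(f"unknown serving mode: {serving!r}")
```

For instance, language_attributes("fr", "xhtml-as-html") gives both lang and xml:lang set to fr.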
How to's
- Declaring Language in HTML
How should I set the language of the content in my HTML page?
Background reading
- Types of language declaration
Describes two different types of language information, 'metadata' and 'text-processing', and how they differ.
- Why use the language attribute?
Why should I use the language attribute in web pages? A number of useful reasons.
- HTTP headers, meta elements and language information
For HTML, should we put language declarations in HTTP headers and meta elements, and how are they different from those in language attributes?
Spec links
- HTML
- 3.2.6.2 The lang and xml:lang attributes
The language attributes in HTML.
- 4.2.5.3 Pragma directives
How HTML deals with a meta element with http-equiv set to Content-Language.
Identifying in-document language changes
See also
See also Declaring the overall language of a page.
For detailed advice about how to select the right language tags, see Choosing language values.
- When the page contains content in another language, add a language attribute to an element surrounding that content. more
- For HTML use the lang attribute only, for XHTML 1.0 served as text/html use the lang and xml:lang attributes, and for XHTML served as XML use the xml:lang attribute only. more
- If the text in attribute values and element content is in different languages, consider using a nested approach. more
How to's
- Declaring Language in HTML
How should I set the language of the content in my HTML page?
Choosing language tags
- Use subtags as defined by BCP 47 for language attribute values. more
- Use the shortest possible language tag values. more
- Where possible, use the codes zh-Hans and zh-Hant to refer to Simplified and Traditional Chinese, respectively. more
- Use the subtag zxx when the text is known to be not in any language. more
- When the language is undetermined and you have to label it, use lang="". more
- If you are serving XML, and the format you are using supports it, use xml:lang="", otherwise use xml:lang="und" when the language is undetermined and you have to label it. more
How to's
- Choosing a Language Tag
Which language tag is right for me? How do I choose language and other subtags? Covers all the subtag types in the latest version of BCP 47.
- Language tags in HTML and XML
A simple overview of the syntax for language tags in BCP 47.
- Tagging text with no language
How do I use language markup in HTML or XML content when I don't know the language, or the content is non-linguistic?
- Two-letter or three-letter language codes
Should I use two-letter or three-letter ISO language codes in language tags? W3C article.
- Picking the Right Language Identifier
Describes how to select Unicode language identifiers.
Useful reference links
- IANA Language Subtag Registry
This is the official location where you will find all subtags available for use in language tags.
- Language Subtag Lookup tool
User-friendly interface to IANA's language tag registry. Provides for checking of subtags as well as lookup. Up-to-date with latest version of BCP 47.
- Internet-Draft: BCP 47
Points to a document containing both RFC 5646 (Tags for the Identification of Languages) and RFC 4647 (Matching Language Tags).
- RFC 5646 Tags for the Identification of Languages
The specification that describes language tag syntax.
- RFC 4647 Matching of Language Tags
The specification that describes alternative ways of matching language tags.
- Language Tags
Provides various useful links about language tags and a good place to find up-to-date information.
Declaring metadata about the language(s) of the intended audience
See also
This section is specifically about setting metadata for the document as an object. For information about declaring the language of the document for text-processing purposes, see Declaring the overall language of a page.
For detailed advice about how to select the right language tags, see Choosing language values.
- Consider using a Content-Language HTTP header to declare metadata about the language(s) of the intended audience of a document. more
- Where a document contains content aimed at speakers of more than one language, use the HTTP Content-Language header with a comma-separated list of language tags. more
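As a small illustration of the comma-separated form, the Python sketch below builds a header name and value pair from a list of language tags. The function name is hypothetical, and how the pair is attached to a response depends on the server software in use.

```python
def content_language_header(tags):
    """Build a (name, value) pair for the Content-Language HTTP header."""
    # Multiple audience languages are joined into one comma-separated value.
    return ("Content-Language", ", ".join(tags))


single = content_language_header(["de"])
multiple = content_language_header(["en", "fr"])
```

Here multiple carries the value "en, fr", declaring that the document is intended for both English and French speakers.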
How to's
- HTTP headers, meta elements and language information
For HTML, should we put language declarations in HTTP headers and meta elements, and how are they different from those in language attributes?
- Declaring language in HTML
How should I set the language of the content in my HTML page? Includes:
- Specifying metadata about the audience language
Talks about using HTTP headers to provide metadata.
Background reading
- Types of language declaration
Describes two different types of language information, 'metadata' and 'text-processing', and how they differ.
Spec links
- HTML5, 4.2.5.3 Pragma directives
How HTML5 deals with a meta element with http-equiv set to Content-Language.
- HTTP 1.1, 14.12 Content-Language
The Content-Language HTTP header described in the HTTP 1.1 specification.
Indicating the language of a link destination
Setting & changing browser language preferences
How to's
- Setting language preferences in a browser
How do I check or change the language settings of my browser?
- Internationalization checker
Tells you what your current Accept-Language headers are set to. (See the bottom of the information table.)
Using Accept-Language for locale setting
How to's
- Accept-Language used for locale setting
Is it a good idea to use the HTTP Accept-Language header to determine the locale of the user?
- Date formats, Option Three: Use the Accept-Language HTTP header
How do I prepare my web pages to display varying international date formats?
Markup & text
Getting started
How to's
- Quick tips: Presentation vs. content
One of the 10 quick tips for internationalization.
- Quick tips: Text authoring
One of the 10 quick tips for internationalization.
Using b and i tags
- Use the class attribute on a b or i element to identify why the element is being used. more
- Consider whether other elements might be more applicable than the b or i element because they carry the right semantics. more
How to's
- Using b and i elements
Should I use b and i elements, and if so, what do I need to know?
Spec links
- HTML5, 4.5.17 The i element
The i element in the HTML5 spec.
- HTML5, 4.5.18 The b element
The b element in the HTML5 spec.
Using ruby markup
See also
This section is specifically about how to use markup for ruby annotations. For information about styling ruby see Styling ruby text.
How to's
- Ruby markup
Discusses how to use ruby markup in HTML5, and has pointers to what currently works in browsers.
Background reading
- What is ruby?
What are 'ruby' annotations?
- Bopomofo on the Web
A summary of how bopomofo is used and the implications for support on the Web.
- Use Cases & Exploratory Approaches for Ruby Markup
Discussion about what is needed in the HTML5 specification, and possibly other markup vocabularies, to adequately support ruby markup. It looks at a number of use cases and how well they are supported by the various markup models.
- CJKV Information Processing
Useful information about ruby in general (Ken Lunde's book, CJKV Information Processing, ISBN 1-56592-224-7, especially chapters 6 and 7)
Spec links
- HTML5
- 4.6.21 The ruby element
The ruby element in the HTML5 spec.
- 4.6.22 The rt element
The rt element in the HTML5 spec.
- 4.6.23 The rp element
The rp element in the HTML5 spec.
- HTML Ruby Markup Extensions
Proposed extensions to the HTML5 markup model.
Working with form controls
See also
In the Characters section see Handling encoding issues in forms.
In the section Text Direction see Managing text direction in form controls.
In the section Styling & Layout see Working with names and Working with date formats.
How to's
- Sorting select options
As part of a form, I have a list of terms in a drop-down box. Why are they not correctly sorted when I translate the items in the list?
Working with strings in JavaScript & databases
- Use a topic-comment approach whenever possible. more
- Avoid sentence-like arrangements when they contain substrings that are predefined translatable text or numeric text. more
- Use sentence-like arrangements with care if you have non-numeric and non-translatable text substrings (ie. text created at runtime). more
- Where the parts of a composite message appear in separate locations, provide the translator with contextual information to show how the various parts of a composite message relate to each other. more
- Provide information to the translator, where needed, to clarify what a substring represents. more
- When requested by the localization group, be prepared to provide information about the size of each substring. more
- Strings should be reused where text is always used in exactly the same context, or where the string is a self-contained, independent sentence or phrase. more
- Reused strings must not refer to more than one text, graphic or conceptual context. more
- If in doubt as to whether a string is a good candidate for re-use, don't. more
- If re-used strings will be displayed in fixed-sized displayers of varying sizes, ensure that the translation will all fit in the smallest sized display box. more
How to's
- Working with Composite Messages
Why you need to be very careful about splitting up and reusing text on-screen. The linguistic differences between languages can lead to real headaches for localizers and may in some cases make a reasonable translation impossible to achieve.
- Re-using Strings in Scripted Content
Things to be aware of if you plan to use the same text string in different places on your site or user interface.
Indicating what should and should not be translated
- Use the
translate
attribute on an element to prevent its content being translated by online translation services or by computer-assisted translation tools. more
How to's
- Using HTML's translate attribute
What is the translate attribute for, and how should I use it?
Tests
Styling & layout
Getting started
Preparing for text expansion during translation
- Ensure that your graphic backgrounds can automatically expand with the text they are related to, avoid highly constrained spaces, and anticipate that the box containing your text may grow during translation. more
How to's
- Background images that support localization
How can I ensure that when text expands in translation the background images will still work?
Background reading
- Text size in translation
Overview of text expansion issues.
- Display capabilities
Do I need to worry because display capabilities (screen sizes, number of colors, etc.) of computers vary in other countries?
- Sliding Doors of CSS
Douglas Bowman's article in A List Apart about how to layer background images, allowing them to slide over each other to create certain effects. (A note from the editors: While brilliant for its time, this article no longer reflects modern best practices.)
Styling by language
See also
Related sections include Using attributes to declare language, and Choosing language values.
- Use :lang to set language-specific styling. more
How to's
- Styling using the lang attribute
Compares :lang, lang |= and lang= selectors, for both HTML and XML. Includes:
- The :lang() pseudo-class selector
How to use it.
- Using CSS selectors in XML with xml:lang
Dealing with namespaces in documents served as XML.
- Language tags in HTML and XML
How language tags work and where to find which one to use.
Spec links
- Selectors Level 4, 7.2 The Language Pseudo-class: :lang()
- Selectors Level 4, 6 Attribute selectors
- CSS Namespaces Module
Using logical property styles
- Use CSS logical properties wherever possible, so as to facilitate localization into right-to-left and vertically-set scripts.
How to's
- MDN: CSS Logical Properties and Values
An introduction.
- Basic concepts of Logical Properties and Values
Introduction to the specification, and explanation of flow relative properties and values.
- Logical properties for sizing
Explains the flow-relative mappings between physical dimension properties and logical properties used for sizing elements on our pages.
- Logical properties for margins, borders and padding
A look at flow-relative mappings for the various margin, border, and padding properties and their shorthands.
- Logical properties for floating and positioning
How to use logical mappings for the physical values of float and clear, and also for the positioning properties used with positioned layout.
Styling counters for lists, etc.
- Use the CSS @counter-style rule to define or modify counters used for list markers, figure numbering, chapter headings, etc.
- Don't assume that all writing systems prefer a ragged edge at the line end. Fully-justified text is the default for some scripts/languages.
How to's
- MDN: @counter-style
How to define your own counter styles when the pre-defined styles don't fit your needs.
- Ready-made Counter Styles
Cut-and-paste code snippets for a large number of international counter styles that can be used for ordered lists and other such counters.
Useful reference links
- Counter styles converter
Allows you to create and test your own styles, or tweak and test the many code snippets listed in the Ready-made Counter Styles doc.
- Language enablement index: Lists, counters, etc
Links to information about lists and counter-styles in the language enablement index.
Tests
Managing line breaks
See also
Hyphenation affects line-breaking, but has its own section here. Line-breaking behaviour is also closely associated with justification. For the latter, see Justifying & aligning text.
- Since default line-breaking rules vary by language, always correctly label your content for language. more
How to's
- MDN: word-break
Specifies whether or not the browser should insert line breaks wherever the text would otherwise overflow its content box due to a lack of spaces. Particularly useful for Chinese and Japanese. Values include break-all and keep-all.
- MDN: line-break
For Chinese, Japanese, or Korean (CJK), specifies how (or if) to break lines when working with punctuation and symbols. Values include strict, normal, loose, and anywhere.
Background reading
- Approaches to line breaking
High level summary of various typographic strategies for wrapping text at the end of a line, for a variety of scripts.
Spec links
- CSS Text Module Level 3, 5. Line Breaking and Word Boundaries
- CSS Text Module Level 3, 6. Breaking Within Words
Tests
- CSS3 Text, Line breaking, BA, OP, CL and NS
- CSS3 Text, Non-tailorable line breaking
- CSS3 Text, word-break
- Japanese & Chinese line breaks
Hyphenation
Justifying and aligning text
See also
Justification behaviour is closely associated with line-breaking and hyphenation. For more information on those topics, see Managing line breaks.
- Wherever possible use start and end values for the CSS text-align property, rather than left and right. Only use left and right on the rare occasions when the alignment has to remain as is, regardless of language. more
- Only use text-align when you really need to override the alignment produced by the current base direction. Don't litter your markup or stylesheet with unnecessary alignment calls.
- Avoid using HTML attributes with values of left and right. Instead add selectors to your CSS stylesheet. This not only allows you to use logical properties, but also makes it much easier to change things during localisation.
- Use CSS property names that include the words 'start' and 'end', rather than 'left', 'right', 'top', and 'bottom'. Eg. margin-inline-start and margin-block-start. more
- Don't assume that all writing systems prefer a ragged edge at the line end. Fully-justified text is the preferred default for some scripts/languages.
- Since justification rules vary by language, always correctly label your content for language. more
How to's
- MDN: text-align
Specifies the horizontal alignment of an inline or table-cell box, including the value justify, which is used to turn on justification.
- MDN: text-justify
Defines what type of justification should be applied to text when it is justified (ie. when text-align:justify is set). Values include inter-word and inter-character.
Background reading
- Approaches to full justification
High level summary of various typographic strategies for fully justifying text on a line and in a paragraph for a variety of scripts, and some advice for authors and implementers.
Creating vertical text
How to's
- Styling vertical Chinese, Japanese, Korean and Mongolian text
How to use CSS to create vertical text, and what is currently supported. Includes:
- Basic setup
Use writing-mode to achieve the basic direction.
- Changing the glyph orientation for embedded text
How to make non-native text stand upright, rather than flow down the page.
- Horizontal-in-vertical text
Make numbers and short texts run horizontally within the vertical line.
- Forms, lists and tables
Working with forms, lists and tables.
Other links
- Text direction
Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
Styling ruby text
See also
This section is specifically about styling ruby text. For more information about markup for ruby see Using ruby markup.
How to's
- Ruby Styling
Discusses how to use CSS styling to affect the rendering of ruby content.
- MDN: ruby-align
Defines the distribution of the different ruby elements over the base.
Background reading
- Ruby
What is 'ruby'?
- CJKV Information Processing
Useful information about ruby in general (Ken Lunde's book, CJKV Information Processing, ISBN 1-56592-224-7, especially chapters 6 and 7)
Tests
- CSS3 Ruby
Includes tests for ruby-position, ruby-align, ruby-merge, and ruby autohide.
Applying various script-specific typographic conventions
Other links
- Document grids
Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
- Kumimoji and warichu
Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
- Emphasis
Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
Using fonts & webfonts
How to's
- How to Use Cross Browser Web Fonts
Useful tutorial on how to use webfonts, and some things to look out for.
- Fonts supplied with Windows and macOS, by script
Lists of fonts provided by the Windows 10 and macOS operating systems, as well as Google's Noto fonts and SIL fonts, grouped by script. Useful to set font-family styles for CSS.
Working with date formats
How to's
- Date formats
How do I prepare my web pages to display varying international date formats?
Working with personal names
- Ask yourself whether you really need to have separate fields for given name and family name. more
- Make input fields long enough to enter long names, and ensure that if the name is displayed on a web page later there is enough space for it. more
- Avoid limiting the field size for names in your database. more
- Try to avoid using the labels 'first name' and 'last name' in non-localized forms. more
- Consider whether it would make sense to have one or more extra fields, in addition to the full name field, where you ask the user to enter the part(s) of their name that you need to use for a specific purpose. more
- Ask separately, when setting up a profile for example, how that person would like you to address them. more
- If you have separate fields for parts of a person's name, ensure that you label clearly which parts you want where. more
- Be careful about assumptions built into algorithms that pull out the parts of a name automatically. more
- Be as clear as possible about telling people how to specify their name. more
- Don't assume that a single letter name is an initial. more
- Don't require that people supply a family name. more
- Don't forget to allow people to use punctuation such as hyphens, apostrophes, etc. in names. more
- Don't require names to be entered all in upper case. more
- Allow the user to enter a name with spaces. more
- Don't assume that members of the same family will share the same family name. more
- It may be better for a form to ask for 'Previous name' rather than 'Maiden name' or 'née'. more
- If you hope to get Latin- or ASCII-only, you need to tell the user. more
- You may want to store the name in both Latin and native scripts, in which case you probably need to ask the user to submit their name in both native script and Latin-only form, using separate fields. more
- If you do accept non-ASCII names, you should use a Unicode character encoding (e.g. UTF-8) in your pages, your back end databases and in all the software code in between. more
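Several of the points above can be sketched in a single form. The following is only an illustration, not a recommended design: the action URL, field names, and field sizes are hypothetical. It uses one full-name field, asks separately how to address the person, and sets no length limit or character pattern, so spaces, hyphens, apostrophes, and non-Latin scripts are all accepted.

```html
<!-- Hypothetical endpoint and field names; the page itself is assumed to be UTF-8. -->
<form action="/profile" method="post" accept-charset="utf-8">
  <p>
    <label for="full-name">Full name</label>
    <!-- No maxlength or pattern: long names and punctuation are allowed. -->
    <input id="full-name" name="full-name" type="text" size="50" autocomplete="name">
  </p>
  <p>
    <label for="call-me">What should we call you?</label>
    <input id="call-me" name="call-me" type="text" size="30" autocomplete="nickname">
  </p>
  <p><button type="submit">Save</button></p>
</form>
```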
How to's
- Personal names around the world
How do people's names differ around the world, and what are the implications of those differences on the design of forms, databases, ontologies, etc. for the Web?
Bidirectional text
Getting started
How to's
- Unicode Bidirectional Algorithm basics
A gentle introduction to how the bidi algorithm works.
- Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts
A tutorial that gathers together and organizes pointers to articles that, taken together, help you understand the essential aspects of how to work with languages in right-to-left scripts and bidirectional text when authoring HTML and CSS.
- Quick tips: Right-to-left text
One of the top 10 quick tips for internationalization is about right-to-left text.
- Languages using right-to-left scripts
Lists 12 scripts and over 200 languages using RTL orthographies in the modern day, plus rough Ethnologue data on countries & speaker numbers.
Setting up a right-to-left page
See also
This section is about setting up the default direction for a whole page. For information about working with text direction changes inside the document see Changing the direction of a block element and Mixing text direction inline.
- Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction. more
- Add dir="rtl" to the html tag any time the overall document direction is right-to-left. more
- Don't add dir="rtl" to the body tag. more
- If you need to avoid the scroll bar moving on some browsers, put dir on the head element and a div just inside the body element. more
- Use logical order, not visual ordering, for Hebrew, and choose an appropriate encoding. more
- If you have to use an ISO encoding for a Hebrew page, declare the encoding as ISO-8859-8-i rather than ISO-8859-8. more
- Do not use CSS styling to control directionality in HTML. Use markup. more
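A minimal sketch of a right-to-left page set up along these lines is shown below. The Hebrew sample text and title are illustrative assumptions; the point is that dir="rtl" sits on the html tag, the page is encoded as UTF-8, and no CSS is used to set direction.

```html
<!DOCTYPE html>
<!-- Base direction is set once, on the html tag, not on body. -->
<html dir="rtl" lang="he">
<head>
<meta charset="utf-8">
<title>דף לדוגמה</title>
</head>
<body>
<!-- Text is stored in logical (typing) order; the browser handles display. -->
<p>שלום עולם</p>
</body>
</html>
```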
How to's
- Text direction and structural markup in HTML
How to use the dir attribute and handle alignment. Includes:
  - Setting direction at the document level
    Using dir on the html tag to set the default direction of the document.
  - Working with browsers that change the browser chrome
    Workarounds if you don't want the browser to change the UI when dir is set on the html tag.
- Visual vs. logical ordering of text
What is the difference between visual and logical ordering of text, and which should I use?
Setting direction on block elements
See also
For information about setting up the default direction for a whole page see Setting up a right-to-left page.
See also Managing direction in form controls.
- Add the dir attribute to a block element to change base direction. more
- Do not use CSS styling to control directionality in HTML. Use markup. more
- Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction. more
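As a small illustration, assuming a left-to-right page that quotes a passage in Arabic, the dir attribute goes only on the block whose base direction differs. The sample text is an assumption.

```html
<p>The greeting appears in the original Arabic below.</p>

<!-- Only this block changes base direction; its content inherits it. -->
<blockquote dir="rtl" lang="ar">
  <p>مرحبا بالعالم!</p>
</blockquote>

<!-- No dir needed here: the paragraph inherits the document's direction. -->
<p>The rest of the page continues left-to-right.</p>
```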
How to's
- Text direction and structural markup in HTML
How to use the dir attribute and handle alignment. Includes:
  - Setting direction on block elements
    How to use the dir attribute and handle alignment.
  - Working with tables
    Particular advice for working with tables.
  - Handling content whose direction is not known in advance
    Besides form-related information, how to insert text into a page with the right base direction, using HTML5 features.
  - Displaying bidi text in the textarea and pre elements
    How dir=auto affects elements with multiple paragraphs of plain text.
- Unicode controls vs. markup for bidi support
To correctly format bidi text in HTML or XML content, should I use Unicode control codes or markup?
- CSS vs. markup for bidi support
Should I use CSS or markup to correctly format Unicode-based bidirectional (bidi) text in HTML and XML-based markup languages?
Managing text direction in form controls
See also
See also Setting direction on block elements.
- Add dir="auto" to input tags to automatically align text to the correct side of an input field. more
- Add dir="auto" to textarea and pre tags to make paragraphs align to the left or right according to the initial strong character. more
- Consider using the dirname attribute to pass information to the server about the direction of text in a text or search form control. more
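The three points above might be combined as in the following sketch. The action URL and the field names (q, q.dir, comment, comment.dir) are hypothetical. With dirname set, the browser submits an extra field whose value is ltr or rtl, reflecting the direction of the control at submission time.

```html
<!-- Hypothetical endpoint and field names, for illustration only. -->
<form action="/search" method="get">
  <p>
    <label for="q">Search</label>
    <!-- dir="auto" picks the direction from the first strong character typed. -->
    <input id="q" name="q" type="search" dir="auto" dirname="q.dir">
  </p>
  <p>
    <label for="comment">Comment</label>
    <!-- In a textarea, dir="auto" is resolved per paragraph of plain text. -->
    <textarea id="comment" name="comment" dir="auto" dirname="comment.dir"></textarea>
  </p>
  <p><button type="submit">Send</button></p>
</form>
```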
How to's
- Text direction and structural markup in HTML
How should I use the dir attribute to set text direction on structural elements in HTML? Includes:
  - Correcting display of opposite-direction text in the input element
    HTML5 techniques for getting the cursor and text to the right side of the input element.
  - Displaying bidi text in the textarea and pre elements
    Using dir=auto in HTML5 to assign direction to each paragraph independently.
  - Reporting direction to the server
    Using HTML5's dirname attribute to pass direction information to the server.
  - Setting direction on forms explicitly
    Keystrokes that make browsers set the direction of form entry fields.
- Using Unicode controls for bidi text
If I'm unable to use markup to correctly order bidirectional text, what can I do?
Tests
Mixing text direction inline
- Tightly wrap every opposite-direction phrase in markup that sets its base direction. more
- If you know the phrase's direction, wrap it in an element with a dir attribute. If you don't already have an element around the text, use span or bdi. more
- If you don't know the phrase's direction, i.e. unknown text that will be injected at run time, then either wrap the phrase in bdi (no dir attribute needed), or, if the phrase is already tightly wrapped by an element, just add dir="auto" to that element. more
- To bulletproof the code for Edge or legacy browsers, if the tightly-wrapped phrase is followed inline (possibly after some intervening neutral characters) by a number, or is one of a list of separate phrases with the same direction, then add a directional mark (RLM or LRM) immediately after the markup of that phrase. more
- Only use Unicode control characters for bidirectional control in attribute text or element text that allows no internal markup. more
- Consider using Unicode control characters to set the base direction around bidirectional text that will be displayed as tooltips, page titles, or on JavaScript dialog boxes. more
- Do not leave white space at the end of inline elements that mark a directional boundary. more
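The cases above can be sketched as follows for a left-to-right page. The sample phrases are illustrative assumptions. In the last paragraph the mark is LRM because the surrounding base direction is left-to-right; in a right-to-left context the corresponding mark would be RLM.

```html
<!-- Known direction: wrap the phrase tightly, with no trailing white space inside. -->
<p>The title is <span dir="rtl" lang="ar">مدخل إلى C++</span> in Arabic.</p>

<!-- Unknown direction (e.g. a user name injected at run time): bdi needs no dir. -->
<p>Posted by <bdi>إيان</bdi>: 3 comments</p>

<!-- Phrase already tightly wrapped by an element: add dir="auto" to it. -->
<p>Latest search: <cite dir="auto">פעילות הבינאום</cite></p>

<!-- Safeguard for older browsers: a directional mark immediately after
     the markup when the phrase is followed by a number. -->
<p>Reviews of <span dir="rtl" lang="he">הספר</span>&lrm; 5 in total</p>
```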
How to's
- Inline markup and bidirectional text in HTML
How to use markup in HTML4 and new HTML5 features for inline bidirectional text. Introduces you gently to the bidi algorithm, if you need that. Includes:
  - Handling inline bidirectional text in HTML
    Brief steps for marking up any type of inline bidirectional text. Following sections give worked examples.
  - What if I can't use markup?
    Use Unicode control characters where markup isn't allowed.
- Bidi space loss
Why does my browser collapse spaces between Latin and Arabic/Hebrew text?
- CSS vs. markup for bidi support
Should I use CSS or markup to correctly format Unicode-based bidirectional (bidi) text in HTML and XML-based markup languages?
- Unicode controls vs. markup for bidi support
To correctly format bidi text in HTML or XML content, should I use Unicode control codes or markup?
- Using Unicode controls for bidi text
If I'm unable to use markup to correctly order bidirectional text, what can I do?
- RTL rendering of LTR scripts
Ways to produce runs of right-to-left text for languages such as Chinese, Japanese, Egyptian hieroglyphs, Tifinagh, Old Norse runes, and a good number of other now-archaic scripts.
Handling parentheses and other mirrored characters
- Treat mirrored characters as if the word 'left' in the character name meant 'opening', and 'right' meant 'closing'. more
How to's
- Inline markup and bidirectional text in HTML
How to use markup in HTML4 and new HTML5 features for inline bidirectional text. Introduces you gently to the bidi algorithm, if you need that. Includes:
  - Mirrored characters
    Understanding how parentheses and other mirroring characters work in bidirectional text.
Overriding the Unicode bidirectional algorithm
- Use the bdo element to force the directionality of a sequence of inline characters. more
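A minimal sketch, assuming you want to show Hebrew characters in the order they are stored rather than the order in which they are normally displayed (the sample letters are an assumption):

```html
<!-- Normal display: the bidi algorithm orders the Hebrew letters right-to-left. -->
<p>Displayed order: אבג</p>

<!-- bdo overrides the algorithm: the same letters are laid out left-to-right. -->
<p>Stored order: <bdo dir="ltr">אבג</bdo></p>
```

The dir attribute is required on bdo; without it the element has no overriding effect.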
How to's
- Inline markup and bidirectional text in HTML
How to use markup in HTML4 and new HTML5 features for inline bidirectional text. Introduces you gently to the bidi algorithm, if you need that. Includes:
  - Overriding the bidi algorithm
    How to disable the bidi algorithm, when needed.
Navigation
Getting started
Background reading
- Quick tips: Navigation
One of the top 10 quick tips for internationalization.
- Monolingual vs. multilingual web sites
What are the trade-offs between international sites that are monolingual vs. multilingual?
- International & multilingual web sites
What is an "international" or a "multilingual" web site?
Linking to localized content
See also
See also Indicating the language of a link destination in the Language section.
- Use server-based, language-related content negotiation to point the user to the page that matches their browser preferences, but also add links to each page so that the user can change languages easily if they prefer. more
- Consider how to indicate to the user where the in-page language links are, and if the page is available in a long list of languages, consider whether or not to use something like a select control (and if so, how to make it obvious what its function is). more
- Locate pull-down menus or selection lists at or near the top of the page. more
- Use a recognizable image alongside a pull-down menu to indicate that it is a control which will take the user to localized pages. Do not use text. more
- Consider using the size attribute to display the first set of options in a select control. more
- Translate the links or options into the target language. more
- Encode your page as UTF-8, so that it supports the necessary characters. more
- Decide whether it is a problem that a user won't have fonts for all the list items or menu options. If it is, use JavaScript menus or some other graphic-based approach. more
- Decide whether to add a description alongside each option, using the language of the current page, so that users can tell what the native word means. more
- Find the most appropriate way of ordering the list of options. more
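A select-based language menu along these lines might look like the sketch below. The action URL, field name, and list of languages are hypothetical. Each option is written in its own language and tagged with lang, size shows the first few options without opening the list, and the page is assumed to be encoded as UTF-8 so that all the names can be represented.

```html
<!-- Hypothetical endpoint and field name, for illustration only. -->
<form action="/choose-language" method="get">
  <label for="lang-select">Language</label>
  <!-- size="4" displays the first four options at once. -->
  <select id="lang-select" name="lang" size="4">
    <option value="de" lang="de">Deutsch</option>
    <option value="es" lang="es">Español</option>
    <option value="fr" lang="fr">Français</option>
    <option value="ja" lang="ja">日本語</option>
  </select>
  <button type="submit">OK</button>
</form>
```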
How to's
- Guiding users to translated pages
If my site contains alternative language versions of the same page, what can I do to help the user see the page in their preferred language?
- Using <select> to Link to Localized Content
What are the best practices for using pull-down menus based on the select element to direct visitors to localized content?
- About languages and flags
On some Web pages you’ll find country flags as symbols for languages. This article explains why this approach is problematic, and what you should do instead.
Using content negotiation
See also
See also Indicating the language of a link destination in the Language section.
Also the Language section in the Server Setup page.
- Use server-based, language-related content negotiation to point the user to the page that matches their browser preferences, but also add links to each page so that the user can change languages easily if they prefer. more
- If the user switches to a different language, offer them the opportunity to remember that choice and serve up subsequent pages in that language, overriding their browser settings. more
How to's
- Guiding users to translated pages
If my site contains alternative language versions of the same page, what can I do to help the user see the page in their preferred language?
- When to use language negotiation
Argues that content negotiation is always a good idea, but that it is not sufficient alone.
You can link to this page and open specific items by using the open parameter in the URL. For example, <authoring-html.en?open=language&open=langvalues> will automatically open the sections Language and Choosing language tags. The necessary parameter values are shown to the right of each heading. These are links, to help you create a URL for sharing. The query ?open=all expands all sections.