BCP-47 compliant language tag predicates
1.0
This module provides a predicate that determines whether a given string is a valid language tag as defined by RFC 5646. Such tags are used across HTTP, HTML, XML, RDF, and many other standards.
References
- BCP-47, RFC5646 Tags for Identifying Languages
- IANA Registry of Language Tags (Assigned)
- IANA Registry of Language Subtags
- IANA Registry of Language Tag Extensions (UCD)
(language-tag? val) returns #t if the string val is a valid BCP-47 language tag.
Examples:
> (require langtag)
> (language-tag? "en")
#t
> (language-tag? "en-US")
#t
> (language-tag? "en-US-boont")
#t
> (language-tag? "en-Latn-US")
#t
> (language-tag? "i-klingon")
#t
> (language-tag? "x-private")
#t
1 Components
Returns #t if the string val corresponds to one of the three top-level productions for the rule Language-Tag.
Examples:
Returns #t if the string val corresponds to one of the components of a normal-use language tag.
2 Matching
TBD
Examples:
> (require langtag)
> (language-tag-match "en")
'(lang "en" ((language . "en")))
> (language-tag-match "en-US")
'(lang "en-US" ((language . "en") (region . "US")))
> (language-tag-match "en-US-boont")
'(lang "en-US-boont" ((language . "en") (region . "US") (variant . "boont")))
> (language-tag-match "en-Latn-US")
'(lang "en-Latn-US" ((language . "en") (script . "Latn") (region . "US")))
> (language-tag-match "i-klingon")
'(grandfathered-i "i-klingon")
> (language-tag-match "x-private")
'(private-use "x-private")
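The results shown above are plain S-expressions: a tag kind, the original string, and, for normal tags, an association list of components. The following is a minimal sketch of how such a result might be taken apart, assuming the shapes in these examples hold in general. The helpers match-kind and match-component are hypothetical names, not part of langtag, and the sample value is copied from the output above rather than computed.

```racket
#lang racket/base
(require racket/match)

;; Sample result copied from the documented output of
;; (language-tag-match "en-Latn-US"); it is not computed here.
(define sample
  '(lang "en-Latn-US" ((language . "en") (script . "Latn") (region . "US"))))

;; Hypothetical helper: the kind of tag a match result describes.
(define (match-kind result)
  (car result))

;; Hypothetical helper: look up one component of a normal tag.
;; Returns #f when the component is absent or when the result
;; carries no component list (private-use, grandfathered).
(define (match-component result key)
  (match result
    [(list 'lang _ components)
     (cond [(assq key components) => cdr]
           [else #f])]
    [_ #f]))

(match-kind sample)                                     ; 'lang
(match-component sample 'script)                        ; "Latn"
(match-component sample 'variant)                       ; #f
(match-component '(private-use "x-private") 'language)  ; #f
```

Because assq returns the first matching pair, this sketch would report only the first variant if a result ever lists more than one.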
3 Appendix: Definition
The syntax of the language tag, from [RFC5646], in ABNF [RFC5234] is:
Language-Tag  = langtag             ; normal language tags
              / privateuse          ; private use tag
              / grandfathered       ; grandfathered tags

langtag       = language
                ["-" script]
                ["-" region]
                *("-" variant)
                *("-" extension)
                ["-" privateuse]

language      = 2*3ALPHA            ; shortest ISO 639 code
                ["-" extlang]       ; sometimes followed by
                                    ; extended language subtags
              / 4ALPHA              ; or reserved for future use
              / 5*8ALPHA            ; or registered language subtag

extlang       = 3ALPHA              ; selected ISO 639 codes
                *2("-" 3ALPHA)      ; permanently reserved

script        = 4ALPHA              ; ISO 15924 code

region        = 2ALPHA              ; ISO 3166-1 code
              / 3DIGIT              ; UN M.49 code

variant       = 5*8alphanum         ; registered variants
              / (DIGIT 3alphanum)

extension     = singleton 1*("-" (2*8alphanum))

                                    ; Single alphanumerics
                                    ; "x" reserved for private use
singleton     = DIGIT               ; 0 - 9
              / %x41-57             ; A - W
              / %x59-5A             ; Y - Z
              / %x61-77             ; a - w
              / %x79-7A             ; y - z

privateuse    = "x" 1*("-" (1*8alphanum))

grandfathered = irregular           ; non-redundant tags registered
              / regular             ; during the RFC 3066 era

irregular     = "en-GB-oed"         ; irregular tags do not match
              / "i-ami"             ; the 'langtag' production and
              / "i-bnn"             ; would not otherwise be
              / "i-default"         ; considered 'well-formed'
              / "i-enochian"        ; These tags are all valid,
              / "i-hak"             ; but most are deprecated
              / "i-klingon"         ; in favor of more modern
              / "i-lux"             ; subtags or subtag
              / "i-mingo"           ; combination
              / "i-navajo"
              / "i-pwn"
              / "i-tao"
              / "i-tay"
              / "i-tsu"
              / "sgn-BE-FR"
              / "sgn-BE-NL"
              / "sgn-CH-DE"

regular       = "art-lojban"        ; these tags match the 'langtag'
              / "cel-gaulish"       ; production, but their subtags
              / "no-bok"            ; are not extended language
              / "no-nyn"            ; or variant subtags: their meaning
              / "zh-guoyu"          ; is defined by their registration
              / "zh-hakka"          ; and all of these are deprecated
              / "zh-min"            ; in favor of a more modern
              / "zh-min-nan"        ; subtag or sequence of subtags
              / "zh-xiang"

alphanum      = (ALPHA / DIGIT)     ; letters and numbers
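To illustrate how the langtag production reads in practice, here is a minimal sketch that transcribes it into a Racket pregexp. It is not how this module is implemented, which the documentation does not describe. It checks only the syntax of the langtag rule, so it rejects private-use and irregular grandfathered tags and never consults the IANA registry. The name well-formed-langtag? is hypothetical.

```racket
#lang racket/base

;; Subtag patterns transcribed from the ABNF. ALPHA and DIGIT are written
;; out as character classes, so no backslash escapes are needed.
(define language
  (string-append "(?:[a-zA-Z]{2,3}(?:-[a-zA-Z]{3}(?:-[a-zA-Z]{3}){0,2})?"
                 "|[a-zA-Z]{4}"
                 "|[a-zA-Z]{5,8})"))
(define script     "(?:-[a-zA-Z]{4})?")
(define region     "(?:-(?:[a-zA-Z]{2}|[0-9]{3}))?")
(define variants   "(?:-(?:[a-zA-Z0-9]{5,8}|[0-9][a-zA-Z0-9]{3}))*")
;; A singleton is any single alphanumeric except "x" (either case).
(define extensions "(?:-[0-9a-wyzA-WYZ](?:-[a-zA-Z0-9]{2,8})+)*")
(define privateuse "(?:-[xX](?:-[a-zA-Z0-9]{1,8})+)?")

;; The whole string must match, hence the ^ and $ anchors.
(define langtag-rx
  (pregexp (string-append "^" language script region variants
                          extensions privateuse "$")))

;; Hypothetical helper, not part of langtag: #t when str matches the
;; 'langtag' production syntactically.
(define (well-formed-langtag? str)
  (regexp-match? langtag-rx str))

(well-formed-langtag? "en-Latn-US")   ; #t
(well-formed-langtag? "en-US-boont")  ; #t
(well-formed-langtag? "zh-min-nan")   ; #t, regular tags match 'langtag'
(well-formed-langtag? "en-GB-oed")    ; #f, irregular tags do not
(well-formed-langtag? "x-private")    ; #f, covered by 'privateuse' instead
```

The last three results mirror the comments in the grammar: regular grandfathered tags fit the langtag production, while irregular and private-use tags need the other two alternatives of Language-Tag.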