BCP-47 compliant language tag predicates

1.0

This module provides predicates that determine whether a given string is a valid language tag as defined by RFC 5646 (BCP 47), the tag format used across HTTP, HTML, XML, RDF, and other standards.

References

language-tag? returns #t if the string val is a valid BCP-47 language tag, and #f otherwise.

Examples:

> (require langtag)
> (language-tag? "en")
#t
> (language-tag? "en-US")
#t
> (language-tag? "en-US-boont")
#t
> (language-tag? "en-Latn-US")
#t
> (language-tag? "i-klingon")
#t
> (language-tag? "x-private")
#t

1 Components

Returns #t if the string val corresponds to one of the three top-level productions for the rule Language-Tag (langtag, privateuse, or grandfathered).

Examples:

Returns #t if the string val corresponds to one of the components of a normal-use language tag.
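As an illustration of what a component-level check involves, the following is a minimal sketch that transcribes three subtag rules (script, region, variant) from the ABNF in the appendix into Racket regexps. The names script-subtag?, region-subtag?, and variant-subtag? are hypothetical and are not claimed to be exports of this module.

```racket
#lang racket/base

;; Hypothetical helper names, NOT the module's exported predicates.
;; Each regexp transcribes one subtag rule from the RFC 5646 ABNF.
;; ABNF ALPHA covers both upper- and lower-case ASCII letters.

;; script = 4ALPHA
(define (script-subtag? s)
  (regexp-match? #px"^[A-Za-z]{4}$" s))

;; region = 2ALPHA / 3DIGIT
(define (region-subtag? s)
  (regexp-match? #px"^(?:[A-Za-z]{2}|[0-9]{3})$" s))

;; variant = 5*8alphanum / (DIGIT 3alphanum)
(define (variant-subtag? s)
  (regexp-match? #px"^(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})$" s))
```

These checks are purely syntactic: they say whether a string has the shape of a subtag, not whether it is registered.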

2 Matching

TBD

Examples:

> (require langtag)
> (language-tag-match "en")
'(lang "en" ((language . "en")))
> (language-tag-match "en-US")
'(lang "en-US" ((language . "en") (region . "US")))
> (language-tag-match "en-US-boont")
'(lang "en-US-boont" ((language . "en") (region . "US") (variant . "boont")))
> (language-tag-match "en-Latn-US")
'(lang "en-Latn-US" ((language . "en") (script . "Latn") (region . "US")))
> (language-tag-match "i-klingon")
'(grandfathered-i "i-klingon")
> (language-tag-match "x-private")
'(private-use "x-private")
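The results above are ordinary S-expressions, so they can be taken apart with standard list operations. The sketch below assumes only the result shapes shown in these examples. match-component is a hypothetical helper, not part of this module, and it works on quoted data so that it runs without the library installed.

```racket
#lang racket/base
(require racket/match)

;; Hypothetical helper, not part of langtag: look up one component in a
;; result shaped like the `lang` examples above. Returns #f for any other
;; result shape, or when the component is absent.
(define (match-component result key)
  (match result
    [(list 'lang (? string?) (? list? parts))
     (let ([entry (assq key parts)])   ; first pair whose car is `key`
       (and entry (cdr entry)))]
    [_ #f]))

;; Data copied from the examples above.
(define en-us '(lang "en-US" ((language . "en") (region . "US"))))

(match-component en-us 'region)                         ; "US"
(match-component en-us 'script)                         ; #f
(match-component '(private-use "x-private") 'language)  ; #f
```

Because assq returns only the first matching pair, this sketch would report a single value even if a result carried several entries under the same key.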

3 Appendix: Definition

The syntax of the language tag, from [RFC5646], in ABNF [RFC5234] is:

Language-Tag  = langtag             ; normal language tags
              / privateuse          ; private use tag
              / grandfathered       ; grandfathered tags

langtag       = language
                ["-" script]
                ["-" region]
                *("-" variant)
                *("-" extension)
                ["-" privateuse]

language      = 2*3ALPHA            ; shortest ISO 639 code
                ["-" extlang]       ; sometimes followed by
                                    ; extended language subtags
              / 4ALPHA              ; or reserved for future use
              / 5*8ALPHA            ; or registered language subtag

extlang       = 3ALPHA              ; selected ISO 639 codes
                *2("-" 3ALPHA)      ; permanently reserved

script        = 4ALPHA              ; ISO 15924 code

region        = 2ALPHA              ; ISO 3166-1 code
              / 3DIGIT              ; UN M.49 code

variant       = 5*8alphanum         ; registered variants
              / (DIGIT 3alphanum)

extension     = singleton 1*("-" (2*8alphanum))

                                    ; Single alphanumerics
                                    ; "x" reserved for private use
singleton     = DIGIT               ; 0 - 9
              / %x41-57             ; A - W
              / %x59-5A             ; Y - Z
              / %x61-77             ; a - w
              / %x79-7A             ; y - z

privateuse    = "x" 1*("-" (1*8alphanum))

grandfathered = irregular           ; non-redundant tags registered
              / regular             ; during the RFC 3066 era

irregular     = "en-GB-oed"         ; irregular tags do not match
              / "i-ami"             ; the 'langtag' production and
              / "i-bnn"             ; would not otherwise be
              / "i-default"         ; considered 'well-formed'
              / "i-enochian"        ; These tags are all valid,
              / "i-hak"             ; but most are deprecated
              / "i-klingon"         ; in favor of more modern
              / "i-lux"             ; subtags or subtag
              / "i-mingo"           ; combination
              / "i-navajo"
              / "i-pwn"
              / "i-tao"
              / "i-tay"
              / "i-tsu"
              / "sgn-BE-FR"
              / "sgn-BE-NL"
              / "sgn-CH-DE"

regular       = "art-lojban"        ; these tags match the 'langtag'
              / "cel-gaulish"       ; production, but their subtags
              / "no-bok"            ; are not extended language
              / "no-nyn"            ; or variant subtags: their meaning
              / "zh-guoyu"          ; is defined by their registration
              / "zh-hakka"          ; and all of these are deprecated
              / "zh-min"            ; in favor of a more modern
              / "zh-min-nan"        ; subtag or sequence of subtags
              / "zh-xiang"

alphanum      = (ALPHA / DIGIT)     ; letters and numbers
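
To show how this grammar maps onto code, here is a minimal sketch that transcribes the three top-level productions into Racket regexps and a lookup list. It is not this module's implementation. classify-tag and the other names are hypothetical, and checking the grandfathered list first is a choice made for this sketch, so that regular grandfathered tags such as "zh-min-nan" (which also match langtag) are reported as grandfathered.

```racket
#lang racket/base

;; Sketch only: hypothetical names, not the langtag module's implementation.

;; grandfathered = irregular / regular. Stored lowercase because ABNF
;; string literals are case-insensitive (RFC 5234).
(define grandfathered-tags
  '("en-gb-oed" "i-ami" "i-bnn" "i-default" "i-enochian" "i-hak"
    "i-klingon" "i-lux" "i-mingo" "i-navajo" "i-pwn" "i-tao" "i-tay"
    "i-tsu" "sgn-be-fr" "sgn-be-nl" "sgn-ch-de"
    "art-lojban" "cel-gaulish" "no-bok" "no-nyn" "zh-guoyu" "zh-hakka"
    "zh-min" "zh-min-nan" "zh-xiang"))

;; privateuse = "x" 1*("-" (1*8alphanum))
(define privateuse-src "[xX](?:-[A-Za-z0-9]{1,8})+")
(define privateuse-rx (pregexp (string-append "^" privateuse-src "$")))

;; langtag = language ["-" script] ["-" region] *("-" variant)
;;           *("-" extension) ["-" privateuse]
(define langtag-rx
  (pregexp
   (string-append
    "^"
    ;; language, with the optional extlang after a 2-3 letter code
    "(?:[A-Za-z]{2,3}(?:-[A-Za-z]{3}(?:-[A-Za-z]{3}){0,2})?"
    "|[A-Za-z]{4}|[A-Za-z]{5,8})"
    "(?:-[A-Za-z]{4})?"                               ; script
    "(?:-(?:[A-Za-z]{2}|[0-9]{3}))?"                  ; region
    "(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*"  ; variants
    "(?:-[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+)*"   ; extensions
    "(?:-" privateuse-src ")?"                        ; trailing privateuse
    "$")))

;; Returns 'grandfathered, 'langtag, 'privateuse, or #f.
(define (classify-tag s)
  (cond
    [(member (string-downcase s) grandfathered-tags) 'grandfathered]
    [(regexp-match? langtag-rx s) 'langtag]
    [(regexp-match? privateuse-rx s) 'privateuse]
    [else #f]))

(classify-tag "en-Latn-US")  ; 'langtag
(classify-tag "i-klingon")   ; 'grandfathered
(classify-tag "x-private")   ; 'privateuse
```

Like the grammar itself, this checks only that a tag is well-formed. It does not consult the IANA subtag registry.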