Locale vs language vs region
When creating a service or product, it is important to understand the distinction between locale, language, and region.
Locale
Locale is a set of parameters that define the system's user-interface and features. This could include:
- A language preference for content that is displayed on a website.
- A region setting that restricts content that is shown on a streaming service.
- A date formatting preference in a calendar application.
Locale is multifaceted
It's tempting as a developer to think of locale as a single setting in their app for picking a language or region.
Locale cannot be reduced to a singular property.
For example, if the user's language preference is en-GB (the IETF language tag for British English), this does not mean the user is located in the GB region (United Kingdom). They could be a British citizen living in Germany! See also "Using flags to represent language".
Likewise, if the user is in the US region (United States), this does not automatically mean the user wants to use the US measurement system or the US date and time notation.
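As a rough illustration of the point above, here is a minimal sketch that stores each facet of locale as its own setting instead of deriving everything from one value. The LocalePreferences class, its field names, and the format_date helper are hypothetical names invented for this example, not part of any real library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalePreferences:
    """Hypothetical settings object: each facet of locale is stored separately."""

    ui_language: str         # BCP 47 language tag, e.g. "en-GB"
    region: str              # where the user is located, e.g. "DE"
    measurement_system: str  # "metric" or "us"
    date_order: str          # "DMY", "MDY" or "YMD"


def format_date(year: int, month: int, day: int, prefs: LocalePreferences) -> str:
    """Format a date from the explicit date preference, never from the region."""
    parts = {"D": f"{day:02d}", "M": f"{month:02d}", "Y": f"{year:04d}"}
    return "/".join(parts[key] for key in prefs.date_order)


# A British English speaker living in Germany who prefers day-first dates.
expat = LocalePreferences("en-GB", "DE", "metric", "DMY")
# A user in the United States who prefers metric units and year-first dates.
us_user = LocalePreferences("en", "US", "metric", "YMD")

print(format_date(2024, 10, 5, expat))    # 05/10/2024
print(format_date(2024, 10, 5, us_user))  # 2024/10/05
```

Because the date order is its own field, changing the region never silently changes how dates are written.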
"Any color the customer wants, as long as it’s black"
Wikipedia defines locale in terms of the "user", but it's often something that the user does not have control over.
- Websites often will only come in one language.
- Streaming services decide region based on assumptions.
- Calendar apps won't support systems other than the Gregorian Calendar.
Language
Language is a subset of locale. This configuration affects both text in the user interface and text from the user themselves.
Language is also multifaceted
Similar to locale, language itself should not be considered a singular setting. Take, for example, a word processing application like Microsoft Word. A user might deliberately configure their system with:
- Spanish as the user interface for their operating system
- English as the user interface for Microsoft Word
- French as the Word document's language (so they can use features like spell-check)
- German for quotes within the document (which ideally would not be spell-checked in French)
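One way to model the layers in this example is a lookup that falls back from the most specific setting to the least specific one. This is only a sketch: the effective_language function and the variable names are assumptions made for illustration, and real applications such as Word may resolve languages differently.

```python
from typing import Optional


def effective_language(*layers: Optional[str]) -> str:
    """Return the first language that is set, checking the most specific layer first."""
    for tag in layers:
        if tag:
            return tag
    return "und"  # BCP 47 tag for an undetermined language


# The four settings from the example above.
os_ui = "es"
app_ui = "en"
document_language = "fr"
quote_language = "de"

# The word processor's menus follow the app setting, then the operating system.
print(effective_language(app_ui, os_ui))                   # en
# Spell-check for a quote uses the quote's language, then the document's.
print(effective_language(quote_language, document_language))  # de
# An ordinary paragraph has no language of its own, so the document's applies.
print(effective_language(None, document_language))         # fr
```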
Side note
Many users will use software that is not in their first language or even secondary language.
One reason is that using the software's default language (often English) makes it a lot easier to find answers online using search engines. This is because the default language will be the most popular, and therefore discussion and forum groups will stick to the terminology used by the default language.
Another reason is that the translations for the language are just plain wrong and confusing, because the software developer didn't want to pay for a proper translator. Machine translation can help, but it doesn't understand the context of what is being translated, resulting in mistranslations.
Language has variants
- Language is not the same in every region. For example, British English is slightly different from American English.
- Language can be represented using different scripts. For example, Kazakh can be written using three alphabets: Cyrillic, Latin, and Arabic.
- Language can change over time. For example, German orthography was officially reformed as recently as 1996, and before then in 1901.
- Language can be modified to serve different audiences. Wikipedia has an English edition and a Simple English edition, the latter of which is primarily written in Basic English.
The IETF BCP 47 language tag is one standard for representing these variants as an alphanumeric code. This allows software to distinguish one language from another, such as:
- en-GB, British English, from en-US, American English
- kk-Cyrl, Kazakh (Cyrillic), from kk-Arab, Kazakh (Arabic)
- de, German, from de-1901, German (Traditional German orthography)
- en, English, from en-basiceng, Basic English
Other interesting examples include:
- en-GB-oxendict, British English (Oxford English Dictionary spelling)
- es-419, Latin American Spanish, where 419 is the UN M.49 code for Latin America and the Caribbean
- und, a special tag for an undetermined language
The BCP 47 language subtag lookup tool by r12a is worth checking out, as it can help with looking up language tags.
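To show how the subtags in the examples above fit together, here is a simplified sketch of a tag parser. It only handles the language, script, region, and variant subtags seen in this article. It ignores extensions, private-use subtags, and extended language subtags, and it does not check subtags against the IANA registry. The LanguageTag class and parse_tag function are hypothetical names for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LanguageTag:
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    variants: List[str] = field(default_factory=list)


def parse_tag(tag: str) -> LanguageTag:
    """Split a simple BCP 47 tag into subtags (no extensions or private use)."""
    subtags = tag.split("-")
    result = LanguageTag(language=subtags[0].lower())
    rest = subtags[1:]
    # Script: four letters, conventionally title-cased (e.g. "Cyrl").
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        result.script = rest.pop(0).title()
    # Region: two letters (ISO 3166-1) or three digits (UN M.49).
    if rest and (
        (len(rest[0]) == 2 and rest[0].isalpha())
        or (len(rest[0]) == 3 and rest[0].isdigit())
    ):
        result.region = rest.pop(0).upper()
    # Whatever remains is treated as variant subtags in this sketch.
    result.variants = [subtag.lower() for subtag in rest]
    return result


for example in ["en-GB", "kk-Cyrl", "de-1901", "es-419", "en-gb-oxendict"]:
    print(example, "->", parse_tag(example))
```

Note that "1901" in de-1901 is four characters long but starts with a digit, so it is a variant and not a script. That is why the script check requires letters only.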
Language can affect how text is presented
Language goes well beyond the raw characters in text.
- Text will have unique hyphenation rules in different languages.
- Individual glyphs will display differently depending on the language.
- Uppercase/lowercase text will transform differently depending on the language.
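The casing point can be illustrated with Turkish, where lowercase "i" uppercases to a dotted "İ" and not to "I". Python's built-in str.upper() is not language-aware, so the sketch below adds that one rule by hand. The upper_for_language function is a hypothetical helper covering only this single rule. Real software would normally rely on a library built on Unicode's casing data.

```python
def upper_for_language(text: str, language: str) -> str:
    """Uppercase text, applying the Turkish/Azerbaijani dotted-i rule when needed."""
    primary = language.split("-")[0].lower()
    if primary in ("tr", "az"):
        # In these languages, lowercase "i" pairs with dotted capital "İ" (U+0130),
        # while dotless "ı" (U+0131) pairs with plain "I".
        text = text.replace("i", "\u0130")
    return text.upper()


print(upper_for_language("istanbul", "en"))  # ISTANBUL
print(upper_for_language("istanbul", "tr"))  # İSTANBUL
```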
Text can consist of mixed languages
It's possible for text from one language to be mixed with text of another language.
A common example of this is user-generated content appearing in notifications, like "A comment was added to your post '{{title}}'." Ideally, if the title was in a different language, like German, a developer would use HTML to mark up the text as German using <span lang="de">{{title}}</span>. Unfortunately, in practice, I don't think this is ever done.
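A minimal sketch of what that markup could look like when generated on the server, assuming the platform already knows the language of the post title (for example, because the author selected it). The render_notification function and the TEMPLATE constant are hypothetical names for this example.

```python
from html import escape

# The notification text itself is in English; only the title is tagged.
TEMPLATE = "A comment was added to your post '<span lang=\"{lang}\">{title}</span>'."


def render_notification(title: str, title_lang: str) -> str:
    """Wrap a user-supplied title in a span that declares its language."""
    return TEMPLATE.format(
        lang=escape(title_lang, quote=True),  # safe inside an attribute value
        title=escape(title),                  # safe as element content
    )


print(render_notification("Äpfel & Birnen", "de"))
# A comment was added to your post '<span lang="de">Äpfel &amp; Birnen</span>'.
```

Escaping both values matters here because the title and, potentially, the language tag come from users.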
The content doesn't even need to be interpolated. Social media websites will have content from international users written in different languages. But I can't find any examples of platforms that use the lang HTML attribute to say what language the individual posts are in.
It's rare to even find online publishing platforms that set the language of the page via <html lang="..."> to anything other than en for English. In my quick search of platforms, the only websites I could find that used it were Blogger.com and Bearblog.dev.
Language is not just text
By this I mean language isn't just used in writing systems. It can exist in other forms.
Language can apply to speech, braille, signing, whistling, knots, etc.
Regions
A region setting can affect services in three ways.
Region-independent services
A lot of software does not depend on the region the user is based in.
- Operating Systems like Windows, macOS, and Linux will have the ability to change the language and the region as two different settings.
- Software Applications will have settings to change the language and a lot of them won't have a region setting. For example, a To-Do list app doesn't need to know what region the user is in to function.
Region-influenced services
Many websites offer users the ability to select their own language, but the content that is displayed to them will be influenced by regional factors.
- Wikipedia has several editions in different languages, with a wide variation of content between them. The editions aren't region dependent, but they are very much influenced by the regions where the language is spoken. In other words, a small German town may have an article in the German edition of Wikipedia, but not in the Japanese Wikipedia.
- Search Engines like Google, DuckDuckGo, and Bing have settings for "Display language" and "Region". The user is able to pick their own language, but the selected region will determine the content they see. For example, a user with the language "Ukrainian" and the region "United Kingdom" would have their user interface in Ukrainian but the content would be in English. In this case, the content is influenced by the region that the user has selected.
- BBC World Service is offered in different languages, but each edition will have topics focusing on regions where that language is spoken. For example, articles in Arabic will focus on the Middle East, whereas articles in Ukrainian will focus on Europe.
Region-dependent services
Services will often be restricted by regions. This may be done for various reasons, like content distribution agreements, legal compliance, or simply because a company doesn't want to provide technical support in other languages for everyone in all regions.
- Netflix.com – In the United States only the languages Spanish and English can be selected. In the United Kingdom, the only language is English.
- Apple.com – Apple's website does not have a language picker. Only the region can be specified, with an interesting exception for two regions: Canada (available in French and English) and Latin America and the Caribbean (available in Spanish and English).
 
- GDPR – Several news publications, primarily in the United States, blocked users that they believed to be within regions that have adopted the General Data Protection Regulation (GDPR). These regions include the European Union, the European Economic Area, and the United Kingdom. The block will often be in the form of a blank page with a line of text saying "This content is not available in your region" or "451: Unavailable due to legal reasons". As the latter message suggests, the news publications did not want to comply with the regulations. In 2018, a third of the 100 largest U.S. newspapers opted to block their sites in Europe, according to NiemanLab. I looked at their data to see how things have changed since then. As of October 2024, 8% of those 100 websites are still unavailable (in the United Kingdom at least).
- Since 2018, similar data protection rules have been adopted in other regions of the world. This includes: California, Virginia, Colorado, Turkey, China, Switzerland. It's not clear whether the aforementioned news organizations have blocked access for these regions as well. I'm guessing no, because the hatred of the GDPR in 2018 was a meme, spread by ignorant and careless people who don't care about data protection. The same ignorant and careless people would therefore not bother to read the news about these newer data protection laws.
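Going back to the Netflix example in this list, a region-dependent service effectively picks the language from a per-region allow-list, not from the user's preference alone. Here is a rough sketch of that idea. The AVAILABLE_LANGUAGES table only mirrors the two regions described above and is not real Netflix data, and choose_language is a hypothetical function.

```python
from typing import Dict, List

# Illustrative data only, mirroring the two regions described in this article.
AVAILABLE_LANGUAGES: Dict[str, List[str]] = {
    "US": ["en", "es"],
    "GB": ["en"],
}


def choose_language(region: str, preferred: List[str]) -> str:
    """Pick the first preferred language the region offers, else the region's default."""
    available = AVAILABLE_LANGUAGES.get(region)
    if available is None:
        raise LookupError(f"Service is not offered in region {region!r}")
    for tag in preferred:
        primary = tag.split("-")[0].lower()
        if primary in available:
            return primary
    # None of the user's languages are offered here, so the region decides.
    return available[0]


print(choose_language("US", ["es-419", "en"]))  # es
print(choose_language("GB", ["es", "fr"]))      # en
```

In the second call, the user's preferences are ignored entirely, which is the "any color the customer wants" situation described earlier.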