Vanilla is a fully localized application. That means that all parts of our user interface must be able to be translated. When developing please keep localization in mind.
Considerations you should know when developing a localizable application.
If you've never written a localized application before there are things you should be aware of. Even if you have written a localized application, it's important to know a bit more about the process at Vanilla so that you avoid costly mistakes that create a lot of work for other people.
- We develop our main application in English, using the United States dialect.
- Whenever you add a string to any part of the application it must also be added to the locales repo so that it can be sent off for translation.
- Translations cost between five and twenty cents a word. We currently translate into over 40 languages, so even seemingly small strings add up in cost.
- Translations take from one to several days to complete. Moreover, we can't order a new batch of translations until the previous batch has been delivered or else we'll have to pay to translate everything twice.
- If you make a mistake in your wording then we have to re-pay that translation time and cost. Please take care when writing strings.
- Translations are not necessarily the best quality. Or rather, we don't have a way of confirming the quality of most translations since we don't have native speakers in all of our languages. We have some measure of testing to ensure that our dynamic strings are not malformed, but when they are we often have to make guesses when fixing the translations.
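To get a feel for how the per-word cost multiplies, here is a rough back-of-the-envelope sketch. The function name is hypothetical, the 12-word string is made up for illustration, and 40 languages is simply the lower bound mentioned above.

```typescript
// Hypothetical helper: estimates the cost of translating one new string.
// Works in whole cents to avoid floating-point rounding surprises.
function estimateTranslationCostCents(
    wordCount: number,
    languageCount: number,
    centsPerWord: number,
): number {
    return wordCount * languageCount * centsPerWord;
}

// A 12-word string, 40 languages, at the low and high ends of the quoted range.
const lowCents = estimateTranslationCostCents(12, 40, 5);
const highCents = estimateTranslationCostCents(12, 40, 20);

console.log(`$${lowCents / 100} to $${highCents / 100}`); // prints "$24 to $96"
```

A single reworded sentence pays that amount again, which is why care with wording matters.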
In short, I want you to understand that translation is a complex process, even though it might seem straightforward at first. Understanding this main point should be your guide when writing content to be translated.
Adding Translation Strings
When adding strings for translation, you place the key and string in the corresponding file of the tx-source folder in the locales repo. So, for instance, if the string to be translated appears in the dashboard, you will put it in either dash_core.php or dash_text.php.
Basic String Translation
Every string must be associated with a key in this format:
$Definition['Key'] = 'String';
In many cases, these are the same, as in this example:
$Definition['Add User'] = 'Add User';
The strings should either be in title case, as in the example above, or sentence case, as in this example:
$Definition['Your inbox is empty.'] = 'Your inbox is empty.';
Long Strings
If a string is very long, the key should be an abbreviated version of the string. That way, if we change the string, we won't have to change the source code that references the key. Here's an example:
$Definition['Routes are used to redirect users.'] = 'Routes are used to redirect users depending on the URL requested.';
If that is impractical, you can use a descriptive Pascal Case string:
$Definition['SomeAddonDescription'] = 'This addon does a number of things, including....';
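The relationship between a key and its string can be sketched as a lookup with a fallback. This is a minimal illustration and not Vanilla's actual translation function: the translateSketch name, the Map of definitions, and the French string are all assumptions made for the example.

```typescript
// Hypothetical stand-in for one locale's definitions: key -> translated string.
const frenchDefinitions = new Map<string, string>([
    ["Add User", "Ajouter un utilisateur"],
]);

// Looks up a key. Falls back to the supplied default, which itself defaults to the key.
function translateSketch(
    definitions: Map<string, string>,
    key: string,
    defaultString: string = key,
): string {
    return definitions.get(key) ?? defaultString;
}

// Translated: the key exists in the locale.
console.log(translateSketch(frenchDefinitions, "Add User"));

// Untranslated long string: the user sees the default English string,
// not the short key that the source code references.
console.log(
    translateSketch(
        frenchDefinitions,
        "Routes are used to redirect users.",
        "Routes are used to redirect users depending on the URL requested.",
    ),
);
```

Because the second call displays the default rather than the key, the text on screen can differ from the key you need to search for in the source code.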
String Replacements
Many of our translation strings contain placeholders to put other strings or information from the application. There are several ways to do this.
Strings formatted for sprintf()
For strings with one or two replacements we often use sprintf(). Here is an example:
$Definition['No results for %s.'] = 'No results for %s.';
- Only use sprintf() when you only have to make one or two replacements. Any more and the string becomes too complex, which leads to translation errors. Reviewers should reject pull requests where a string introduces more than two replacements.
- If you do have to make two replacements then use the expanded %1$s, %2$s syntax to allow other languages to control the order of their replacements.
- We usually want to use %s for placeholders even if you want to put a number within the string. It's better to allow the caller to dictate the formatting rather than the very limited sprintf() function.
- You generally want to replace placeholders with data. Avoid constructing sentences with placeholders representing words. Such an approach is rarely possible for every language.
- Currently, we can only use these strings on the server and not the client.
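To see why the positional %1$s, %2$s syntax matters, here is a small sketch of sprintf-style substitution. It is written in TypeScript purely as an illustration and handles only %s and %N$s, so it is not a replacement for PHP's sprintf(). The function name and the example strings are assumptions, and the second string stands in for a translation whose grammar needs the arguments in the opposite order.

```typescript
// Minimal sketch of sprintf-style substitution.
// Supports only "%s" (sequential) and "%1$s", "%2$s", ... (positional).
function sprintfSketch(format: string, ...args: Array<string | number>): string {
    let nextSequential = 0;
    return format.replace(/%(?:(\d+)\$)?s/g, (match, position) => {
        // Positional placeholders are 1-based; sequential ones consume args in order.
        const index = position !== undefined ? Number(position) - 1 : nextSequential++;
        const value = args[index];
        // Leave the placeholder visible if no argument was supplied for it.
        return value === undefined ? match : String(value);
    });
}

// Source string, plus a stand-in for a translation that reverses the order.
const sourceOrder = "%1$s mentioned you in %2$s.";
const reversedOrder = "In %2$s you were mentioned by %1$s.";

console.log(sprintfSketch(sourceOrder, "Alex", "General"));
console.log(sprintfSketch(reversedOrder, "Alex", "General"));
```

The calling code passes the same arguments in the same order both times. Only the string decides where each one lands, which is exactly the control a translator needs.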
Strings formatted with <0>...</0> Placeholders
This is currently our best practice for generating dynamic strings.
For strings that you want to inject HTML into or use on the client, we have a more modern version of sprintf() that has different placeholders. Here are some example strings:
$Definition["<0 /> out of <1 /> people found this helpful"] = "<0 /> out of <1 /> people found this helpful";
$Definition["You need to <0>Sign In</0> to vote on this article"] = "You need to <0>Sign In</0> to vote on this article";
This format lets you add individual placeholders using self closing tags. It also lets you inject HTML tags by wrapping placeholders around strings. This is the best way to add an anchor to a translation string because you can specify the URL outside of the translation string.
- On the server you can use this format with HtmlUtils::formatTags().
- On the client you can use this format with <Translate source="<0/> out of <1/> people." c0={10} c1={50} />
- Although it is acceptable to have more than two replacements with this format it is not recommended.
- You generally want to replace placeholders with tags or data. Avoid constructing sentences with placeholders representing words. Such an approach is rarely possible for every language.
Example 1
// Text inside of placeholder brackets can be used by providing a function instead of a value.
<Translate
source="You need to <0>Sign In</0> to vote."
c0={text => <SmartLink to="/entry/signin">{text}</SmartLink>}
/>
Example 2
// Passing multiple values
<Translate
source="<0 /> out of <1 /> people found this helpful"
c0={10}
c1={18}
/>
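If it helps to see the mechanics, the placeholder format can be approximated with a single regular expression. This is only a sketch under stated assumptions: it is not the implementation behind HtmlUtils::formatTags() or the Translate component, the function and type names are hypothetical, it returns plain strings rather than React nodes, and it does no escaping.

```typescript
// A replacement is either a plain value or a function that wraps the inner text.
type TagReplacement = string | number | ((innerText: string) => string);

// Hypothetical sketch: replaces <N /> and <N>text</N> placeholders by index.
function formatTagsSketch(source: string, replacements: TagReplacement[]): string {
    // First alternative: self-closing tag such as <0 /> or <0/>.
    // Second alternative: wrapping pair such as <0>Sign In</0>.
    const pattern = /<(\d+)\s*\/>|<(\d+)>(.*?)<\/\2>/g;
    return source.replace(pattern, (match, selfClosing, wrapping, innerText) => {
        const index = Number(selfClosing ?? wrapping);
        const replacement = replacements[index];
        if (replacement === undefined) {
            return match; // no replacement supplied, leave the tag visible
        }
        if (typeof replacement === "function") {
            return replacement(innerText ?? "");
        }
        return String(replacement);
    });
}

console.log(
    formatTagsSketch("<0 /> out of <1 /> people found this helpful", [10, 18]),
);

// The URL lives in the calling code, not in the translated string.
console.log(
    formatTagsSketch("You need to <0>Sign In</0> to vote.", [
        (text) => `<a href="/entry/signin">${text}</a>`,
    ]),
);
```

Note how the translator only ever sees numbered tags and the words between them, so there is no URL or attribute for them to break.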
Strings formatted with Vanilla's formatString() function
Vanilla has a quasi-templating string format that supports many more rich options. The strings use named fields in placeholders (ex. {fieldname}, {fieldname,format}).
$Definition['HeadlineFormat.Answer'] = '{ActivityUserID,user} answered your question: <a href="{Url,html}">{Data.Name,text}</a>';
You can see that this is our most flexible format and allows the developer some control over grammar constructs in order to correct for edge cases in each locale.
On its face, this is a great format option. However, in practice we've found this format to be hard to translate for the following reasons:
- Translators are usually unaware of the format, so they almost always make mistakes when translating these strings. Some languages even have different commas or braces that get replaced in translation. Yuck.
- The format syntax contains English reserved words. These words are often translated which corrupts the string.
- Machine translation really really messes with these strings.
- This format is actually based on the intl extension and we thought that it would eventually be replaced by the built in standard. However, the built in functionality remains poorly documented and buggy to this day.
This format is really only a good fit for strings that represent sentences with a high degree of UGC injection. The only places where these strings appear in our application are activity and notification messages.
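The fragility described above is easier to see with a toy version of the named-field format. The sketch below is not formatString() itself: the function names are hypothetical, and the two formatters are stand-ins whose behavior is assumed for the example rather than taken from Vanilla.

```typescript
type FieldFormatter = (value: unknown) => string;

// Stand-in formatters. The real format names and behavior belong to formatString().
const formatters = new Map<string, FieldFormatter>([
    ["text", (value) => String(value)],
    ["html", (value) =>
        String(value)
            .replace(/&/g, "&amp;")
            .replace(/</g, "&lt;")
            .replace(/>/g, "&gt;")
            .replace(/"/g, "&quot;")],
]);

// Resolves a dotted path such as "Data.Name" against a nested object.
function resolvePath(data: Record<string, unknown>, path: string): unknown {
    return path.split(".").reduce<unknown>((current, part) => {
        if (typeof current === "object" && current !== null) {
            return (current as Record<string, unknown>)[part];
        }
        return undefined;
    }, data);
}

// Replaces {field} and {field,format} placeholders.
function formatNamedSketch(template: string, data: Record<string, unknown>): string {
    return template.replace(/\{([\w.]+)(?:,(\w+))?\}/g, (match, field, format) => {
        const value = resolvePath(data, field);
        if (value === undefined) return match; // unknown field stays visible
        if (format === undefined) return String(value);
        const formatter = formatters.get(format);
        // An unrecognized format name leaves the raw placeholder in the output.
        return formatter ? formatter(value) : match;
    });
}

const activity = { Url: "/discussion/1?a=1&b=2", Data: { Name: "Hello" } };

console.log(formatNamedSketch('<a href="{Url,html}">{Data.Name,text}</a>', activity));

// A translator who "translates" the reserved word breaks the placeholder.
console.log(formatNamedSketch("{Data.Name,texte}", activity));
```

The second call prints the placeholder untouched, which is the kind of corruption a user ends up seeing when a reserved word is translated.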
Localization Best Practices
Here are some rules of thumb when introducing strings.
Always try and re-use existing strings.
Because of the complexity and expense of adding new strings, always try and re-use an existing string if it exists. Here are some tips to help with that:
- Keep a copy of the locales repo open so that you can do basic searches when adding new UI components.
- If there is an existing string that is almost exactly what you want then really try and use it. There might be a string with slightly different capitalization or with punctuation. Ask yourself what the material impact on the UX is if you just re-use an existing string.
- Making a string slightly more generic can really help with its re-use. This is a double edged sword as UX can suffer if all of our wording starts to look obtuse. I recommend you think of your audience when going for a more generic wording. For basic members, always try and make things worded as nicely as possible. Moderators and admins can handle a bit more brevity.
Always add new strings to the locales repo
We have to make sure we keep our localized content up to date. And this starts with developers doing their part.
- Add new strings to the locales repo as you go. Just keep the locales repo open in a separate tab or IDE. This is the quickest way to add translations and reduces the chance you will forget.
- Make all of your additions against the transifex branch instead of master. This is the branch that is automatically synchronized with the Transifex service.
- Do not translate to other languages in the locales repo directly. All translation must be done within Transifex. It will synchronize back to the repo automatically. If it doesn't, then ping Todd.
Do not construct strings with concatenation or over-use of placeholders
It might be tempting to make a sentence by mixing, matching, and concatenating other translations together. This is nearly always a bug, as there is almost always a language where it won't work.
- Almost always avoid concatenating strings. PR reviewers should be looking for this and always request a justification if they see it.
- It is acceptable to concatenate paragraphs together as each paragraph is usually an independent thought. However, you will rarely do this in practice as such concatenation is usually done in your template or component itself.
- It is usually acceptable to concatenate sentences together as they are often independent thoughts. When adding strings that consist of multiple sentences you should also think about splitting them into separate strings if that will aid string re-use.
- Similar to sentences, it is often acceptable to concatenate strings that aren't in a sentence format. The best examples of this are strings separated by commas or semi-colons to represent data. Our meta text lines are another example.
- Avoid constructing sentences with placeholders. Our most famous example is our "%s not found" string. We use this everywhere, but there are a lot of replacements that just don't work for every language. If you find yourself coming up with a clever string like this you might want to instead think of a generic string that works for multiple use cases (ex. "Record not found.") rather than having a bespoke string for every different situation.
Really, we should avoid constructing sentences with too many string replacements in general. They often come off as awkward even in English.
Avoid HTML or other formatting within strings.
We currently have many strings that include HTML formatting. We have found this to be problematic for the following reasons:
- Translators often mess up HTML which leads to bugs when viewing our application in other languages.
- As soon as a string has HTML in it then we can't auto-escape it in the output. This incrementally increases our XSS vector. Wouldn't it be nice to auto-escape every string no matter what?
- React doesn't like outputting HTML directly. The React developers tell you this with the name of the property they chose for outputting raw HTML (dangerouslySetInnerHTML). It can't hook into events on dynamic HTML without a huge hassle either.
- We want to be able to use strings for more than just our web app. We also have tooltips, attributes, title tags, plain text emails, and other potential use cases. It's best to put the formatting in the templating language instead.
- HTML is often subjective and more prone to change than translation text. Are you sure you want your string to have a <b> tag or a <strong> tag? It's best not to have that decision overridden by a translation.
- The translation code should not include formatting such as hard returns or plaintext indents either. This kind of formatting often has the same issues as HTML.
If you want HTML in your string then use <0>...</0> style formats and then inject the HTML afterwards. For more complex HTML, use the templating language for HTML (either Twig or React). If neither of these solutions seems to work then maybe the application has a design flaw and there needs to be a deeper conversation within the whole team.
Dates and numbers also need localization
It's not just strings that need to be localized. Dates and numbers need to be localized too.
- For dates make sure you use one of our built in methods to format the date rather than built in PHP functions. The is the current best practice for formatting dates.
- Numbers also have different formats. We don currently have a specific this, but is locale aware.
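To illustrate how much dates and numbers vary, here is a short sketch using the standard ECMAScript Intl API. This is a stand-in for illustration only and is not one of Vanilla's own helpers. The sample number and date are made up.

```typescript
const amount = 1234567.89;

// The same number, grouped and punctuated according to each locale.
const usNumber = new Intl.NumberFormat("en-US").format(amount);
const germanNumber = new Intl.NumberFormat("de-DE").format(amount);

console.log(usNumber);     // 1,234,567.89
console.log(germanNumber); // 1.234.567,89

// The same instant rendered per locale. A fixed time zone keeps the output stable.
const sampleDate = new Date(Date.UTC(2020, 0, 15));
const dateOptions: Intl.DateTimeFormatOptions = {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC",
};

const usDate = new Intl.DateTimeFormat("en-US", dateOptions).format(sampleDate);
const germanDate = new Intl.DateTimeFormat("de-DE", dateOptions).format(sampleDate);

console.log(usDate);     // January 15, 2020
console.log(germanDate); // 15. Januar 2020
```

Neither the separators nor the field order can be guessed from the English output, which is why hand-built date and number strings tend to be wrong somewhere.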
Make sure you have a basic understanding of the requirements of different languages
If you only speak English this isn't necessarily a natural instinct. English is a peculiar language that is complex, but often forgiving and prone to colloquial turns of phrase that aren't shared by other languages. It also doesn't have many advanced grammatical structures that other languages have. Finally, much of the Internet caters to English speakers, which creates many design blind spots for those who speak only English.
If you speak more than one language then you already have an ability to help better our localization effort. Here are some tips that explain some differences that other languages have.
- Verb conjugation. Often verbs in sentences are spelled differently depending on who is the subject of the verb. This means that sentences have to be constructed with this in mind.
- Sentence ordering. Most languages have a different ordering of nouns, verbs, subjects, etc. in sentences. This means that if you construct a sentence with more than one naked %s then you probably have a bug.
- Gendered words. Many non-English languages have gendered words that change the spelling of words around them. This is often the biggest reason why we can't construct sentences with too many placeholders. Often the word you are trying to inject into a sentence affects the surrounding words.
- Right to left. Some languages display right to left instead of left to right. This can have a significant impact on our application's design.
- Special punctuation characters. Different languages might have their own custom punctuation characters. Some look different, but serve the same purpose as ours. Some have a completely different purpose. There are other small differences too (ex. French likes to put a space before a colon).
- Wide characters. Many Asian languages are written with wider fixed-width characters. This makes sites in those languages easier to read because everything lines up better. They will sometimes replace our numbers, punctuation, and even spaces with different characters that look the same, but are padded to take up the same fixed width. It's pretty cool.
- Plurals. Some languages don't change structure from the singular to plural form. On the other hand, did you know some languages have more than just singular and plural forms?
- Different Casing. Some languages don't have capital letters. Some languages do, but use them differently than English. Many languages write titles strictly in sentence case for example.
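The point about plurals is easy to check for yourself. The sketch below uses the standard Intl.PluralRules API to ask which plural category a count falls into for a few locales. The helper name is hypothetical and the locales are just examples.

```typescript
// Returns the plural category ("one", "few", "many", "other", ...) for a count.
function pluralCategory(locale: string, count: number): string {
    return new Intl.PluralRules(locale).select(count);
}

for (const locale of ["en", "ru", "ja"]) {
    const categories = [1, 2, 5].map((n) => `${n}: ${pluralCategory(locale, n)}`);
    console.log(locale, categories.join(", "));
}
// en 1: one, 2: other, 5: other
// ru 1: one, 2: few, 5: many
// ja 1: other, 2: other, 5: other
```

English needs two forms for these counts, Russian needs three, and Japanese needs one, so a string pair like "%s comment" and "%s comments" cannot cover every language.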
Common Mistakes
Most of the advice in this article gives mistakes to avoid. However, here are some common mistakes that are seen over and over again that deserve special attention.
- Not adding strings to the locales repo. This is probably the biggest mistake currently. Missing locale strings starts a chain of events in motion that most developers need to think about: The missing translation is seen by the customer and we look bad. The support agent has to hunt down the string and report it. A developer has to add it. Often we have to back-port a string addition because it's hard to explain why a simple string change will take a month to see. Just think of that cost that could have been avoided by some due diligence up front.
- Not re-using strings. We should re-use strings as much as possible, but often new strings come in that differ only in some trivial way from existing ones. Our translation inventory grows significantly every year. That growth represents cost. We should take care to keep it as small and clean as possible.
- Typos. Just plain typos make it into our application all the time. Here is one that was in our application for a long time:
$Definition['%s New Plural'] = '%s New Plural';
Here is another one: "<1>Learn more about custom fonts.</1>."
- Grammar mistakes. Grammar mistakes make it into our application too often. They should be corrected at the design or development stage rather than the QA stage where I think many people expect them to be corrected. If you are a non-native English speaker I highly recommend you ask for help with wording when adding new strings.
- Odd capitalization. Developers LOVE to capitalize words in sentences. If a word seems to represent a variable then a developer will bold it, Capitalize It, or Both. This often leads to a situation where almost every word in a sentence is capitalized. When writing a sentence give a second thought on those capitalized words and decide whether or not they are really proper nouns.
- Long string keys. Overly long string keys not only feel inefficient, but they represent a maintenance problem too. The longer a translation is the more likely it is to change in the future. Furthermore, what happens if your key has a typo, grammar mistake, or odd capitalization? Fixing a mistake in a key means that you also need to find the key in the source code and fix it there too. The alternative is to just live with an obvious mistake forever. Not much better. If you make your key short then that reduces the risk of it having an issue.
- Translating keys and not strings. When a translation key differs from its string then you can have the case where a customer reports the missing translation. The support agent then creates a ticket with a screenshot of the missing translation. Then a developer adds the key from the screenshot. The problem is that the screenshot is showing the default translation and not the key. Always look through the source code for a missing translation before adding it to the locales repo.
If you look through our existing application you will see many of these mistakes present today. That's a natural side-effect of a long-lived app. It's also the reason why you should really try and absorb best practices from our documentation and not just by looking at our existing code.
Conclusion
Almost all of the information in this article is to serve one purpose: Make our application work well in as many languages as possible. All of our considerations have to do with the experience we've gained over the years of how other languages work.
We want to make the best application possible that does not compromise for other languages. It should be possible to have as good of a user experience in one language as it is in English. We may not get every translation right, but those translations can always be improved over time without impacting our code.
You can see that with all of the considerations laid out in this article, localization represents a significant effort. However, if the entire team, from the designers to the developers to the support agents filing localization issues, understands our localization process better, then we will be able to have a high quality localization process without significantly affecting our development effort.