Adding and Editing Locale Translation Strings

Higher Logic Vanilla (Vanilla) is a fully localized application, which means that every part of our user interface must be translatable into local languages. Keep localization in mind whenever you develop.

📝 NOTE: This article details various things that you should consider when developing a localizable application.

Localizable applications - development considerations

Whether or not you have written a localized application before, there are things you should be aware of. It's also important to know Vanilla's process so that you can avoid mistakes that create a lot of work for others.

Language - Develop the main application in English, using the United States dialect.
Locales repo - Whenever you add a string to any part of the application, you must also add it to the locales repo so that it can be sent for translation.
Review - If your source text has any errors, the translation will have to be redone and we have to pay for it again. Errors also delay the project. Take care when writing strings.
Completion time - Translations can take up to several days to complete. Moreover, we can't order a new batch of translations until the previous batch has been delivered, or else we'll have to pay to translate everything twice.
Cost - Translations cost between five and twenty cents per word. Vanilla translates into more than 40 languages, so even small translations can be costly, which is why your content review is critical.

Translations pitfalls

Translations are not always of the best quality or entirely accurate. The language and wording of the original text greatly influence the integrity of the final product.

Vanilla cannot confirm the quality of translations because we don't have native speakers in the 40+ languages that we localize.
We have some testing to ensure that our dynamic strings are not malformed, but when they are, we often have to guess when fixing the translations.
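A basic malformed-string test of the kind mentioned above can compare the placeholders in a translation with those in its English source. The TypeScript sketch below is hypothetical: the function names and the regular expression are assumptions, not Vanilla's actual test suite. It recognizes the three placeholder formats described later in this article (sprintf-style `%s`/`%1$s`, numbered `<0 />`/`<0>...</0>` tags, and `{Field,format}` fields) and treats a translation as well-formed only if it keeps exactly the same placeholders, in any order.

```typescript
// Extract every placeholder token from a string. Covers sprintf-style (%s, %1$s),
// numbered tags (<0 />, <0>, </0>), and formatString-style fields ({Field,format}).
// Whitespace inside a token is removed so "<0 />" and "<0/>" compare equal.
function extractPlaceholders(text: string): string[] {
  const pattern = /%(?:\d+\$)?s|<\/?\d+\s*\/?>|\{[^{}]+\}/g;
  return (text.match(pattern) ?? []).map((token) => token.replace(/\s+/g, "")).sort();
}

// A translation passes if it contains the same multiset of placeholders as the
// source. Translated format words ({Name,utilisateur}) or broken closing tags fail.
function placeholdersMatch(source: string, translation: string): boolean {
  const expected = extractPlaceholders(source);
  const actual = extractPlaceholders(translation);
  return expected.length === actual.length && expected.every((token, i) => token === actual[i]);
}

// Sample translations below are illustrative only.
console.log(placeholdersMatch("No results for %s.", "Aucun résultat pour %s."));
console.log(placeholdersMatch("You need to <0>Sign In</0> to vote.", "Vous devez <0>vous connecter<0> pour voter."));
```

A check like this cannot tell whether the translation is good, only whether its dynamic parts survived; that is the limited guarantee the paragraph above describes.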

Final thoughts

It's important that you understand that translation into local languages is a complex process. The complexity, time, and cost should be at the forefront of your planning when writing content that has to be translated.

Adding translation strings

When adding strings for translation, place the key and the string in the corresponding file of the tx-source folder in the locales repo.

For example, if the string to be translated appears in the Dashboard, put it in either dash_core.php or dash_text.php.

Key-string notes

Every string must have a key and be structured as follows.

$Definition['Key'] = 'String';

It is possible that the key and the string are identical, as in:

$Definition['Add User'] = 'Add User';

Strings must be in either Title Case, as in the example above, or sentence case, as in:

$Definition['Your inbox is empty.'] = 'Your inbox is empty.';

Long string tips

If a string is very long, the key should be an abbreviated version of it. That way, if we change the string, we won't have to also change the key, as in:

$Definition['Routes are used to redirect users.'] = 'Routes are used to redirect users depending on the URL requested.';

If that is impractical, you can use a descriptive PascalCase key:

$Definition['SomeAddonDescription'] = 'This addon does a number of things, including....';
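One way to see why a short key is safe is to model the definitions as a lookup table: application code only ever refers to the key, and the displayed text is resolved at runtime. The TypeScript sketch below is a hypothetical model, not Vanilla's implementation. The `translate` function, its fallback order (active locale, then English source, then the raw key), and the sample German string are all assumptions made for illustration. Note that with this kind of fallback a missing translation displays the English text, not the key.

```typescript
// Hypothetical in-memory stand-ins for definition files. In Vanilla the data
// lives in PHP files ($Definition['Key'] = 'String';) in the locales repo.
type Definitions = Map<string, string>;

// English source definitions. Long strings use an abbreviated key.
const english: Definitions = new Map([
  ["Add User", "Add User"],
  ["Routes are used to redirect users.", "Routes are used to redirect users depending on the URL requested."],
  ["SomeAddonDescription", "This addon does a number of things, including...."],
]);

// A partially translated locale (sample data for illustration only).
const german: Definitions = new Map([["Add User", "Benutzer hinzufügen"]]);

// Resolve a key: active locale first, then the English source text, and
// finally the raw key as a last resort.
function translate(key: string, locale: Definitions, source: Definitions = english): string {
  return locale.get(key) ?? source.get(key) ?? key;
}

console.log(translate("Add User", german));
console.log(translate("Routes are used to redirect users.", german));
```

Because the code only references "Routes are used to redirect users.", the English text behind it can be reworded without touching any call sites.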

String replacements

Many of our translation strings contain placeholders for other strings or for information from the application. There are several ways to use placeholders.

Strings formatted for sprintf()

For strings with one or two replacements we often use the sprintf() function. Here is an example:

$Definition['No results for %s.'] = 'No results for %s.';

Use sprintf() only when you have to make one or two replacements. More replacements than that are too complex and lead to translation errors. Reviewers should reject pull requests where a string introduces more than two replacements.
If you do have to make two replacements, use the expanded %1$s, %2$s syntax so that other languages can control the order of their replacements.
Use %s for placeholders even when the value is a number. It's better to let the calling code dictate the formatting than to rely on sprintf()'s very limited number formatting.
Placeholders should generally be replaced with data. Avoid constructing sentences with placeholders representing words. Such an approach rarely works in every language.
Currently, we can only use these strings on the server, not on the client.
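The reason for the %1$s, %2$s syntax is easiest to see with a small model of how positional placeholders are filled. The TypeScript sketch below is a hypothetical stand-in for PHP's sprintf(), not a port of it: `formatPositional` is an assumed name, and it supports only `%s`, positional `%1$s`-style placeholders, and `%%` for a literal percent sign. The reordered "translation" in the usage lines is an invented example showing that the translator, not the code, decides the order.

```typescript
// Minimal sprintf-style formatter: %s (consumed left to right), %1$s-style
// positional placeholders (1-based), and %% for a literal percent sign.
function formatPositional(template: string, ...args: string[]): string {
  let next = 0; // next argument index for plain %s placeholders
  return template.replace(/%%|%(\d+)\$s|%s/g, (match: string, position?: string) => {
    if (match === "%%") return "%";
    const index = position !== undefined ? Number(position) - 1 : next++;
    if (index < 0 || index >= args.length) {
      throw new RangeError(`Missing argument for placeholder ${match}`);
    }
    return args[index];
  });
}

// The calling code always passes the arguments in the same order...
const shown = "10";
const total = "50";
console.log(formatPositional("Showing %1$s of %2$s results.", shown, total));
// ...but a (hypothetical) translation can place them in a different order.
console.log(formatPositional("Of %2$s results, %1$s are shown.", shown, total));
```

With two plain %s placeholders instead, the second line would be impossible to express without changing the code.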

Strings formatted with <0>...</0> placeholders

📝 NOTE: This is currently our best practice for generating dynamic strings.

For strings that need injected HTML tags or that must be used on the client, we have an updated version of sprintf() that uses different placeholders. Here are some example strings.

$Definition["<0 /> out of <1 /> people found this helpful"] = "<0 /> out of <1 /> people found this helpful";
$Definition["You need to <0>Sign In</0> to vote on this article"] = "You need to <0>Sign In</0> to vote on this article";

This format lets you add individual placeholders using self-closing tags. It also lets you inject HTML tags by wrapping placeholders around strings. This is the best way to add an anchor to a translation string because you can specify the URL outside of the translation string.

On the server, you can use this format with HtmlUtils::formatTags().
On the client, you can use this format with <Translate source="<0/> out of <1/> people." c0={10} c1={50} />.
Although it is acceptable to have more than two replacements with this format, it is not recommended.
You generally want to replace placeholders with tags or data. Avoid constructing sentences with placeholders representing words. Such an approach is rarely possible for every language.

Example 1

// Text inside of placeholder brackets can be used by providing a function instead of a value.
<Translate
    source="You need to <0>Sign In</0> to vote."
    c0={text => <SmartLink to="/entry/signin">{text}</SmartLink>}
/>

Example 2

// Passing multiple values
<Translate
    source="<0 /> out of <1 /> people found this helpful"
    c0={10}
    c1={18}
/>
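The substitution these examples rely on can be approximated with plain strings. The TypeScript sketch below is hypothetical: `formatNumberedTags` is an assumed name, it returns a string (the real `<Translate>` component renders React nodes, and HtmlUtils::formatTags() runs on the server), it does not support nested tags, and it escapes nothing. It only illustrates the two placeholder shapes: `<n />` takes a value, and `<n>text</n>` takes a function that receives the wrapped text.

```typescript
// A value for <n />, or a function that receives the text wrapped by <n>...</n>.
type TagReplacement = string | number | ((inner: string) => string);

function formatNumberedTags(source: string, replacements: TagReplacement[]): string {
  // Either a paired tag <n>inner</n> (the backreference keeps the numbers equal)
  // or a self-closing tag <n /> / <n/>.
  const pattern = /<(\d+)>(.*?)<\/\1>|<(\d+)\s*\/>/g;
  return source.replace(pattern, (match: string, pairedIndex?: string, inner?: string, selfIndex?: string) => {
    const replacement = replacements[Number(pairedIndex ?? selfIndex)];
    if (replacement === undefined) return match; // leave unknown placeholders untouched
    return typeof replacement === "function" ? replacement(inner ?? "") : String(replacement);
  });
}

console.log(formatNumberedTags("<0 /> out of <1 /> people found this helpful", [10, 18]));
console.log(
  formatNumberedTags("You need to <0>Sign In</0> to vote.", [
    (text: string) => `<a href="/entry/signin">${text}</a>`,
  ]),
);
```

The link URL lives in the calling code, so translators only ever see "<0>Sign In</0>" and never the markup itself.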

Strings formatted with Vanilla's formatString() function

Vanilla has a quasi-templating string format that supports many richer options. The strings use named fields in placeholders, such as {fieldname} or {fieldname,format}.

$Definition['HeadlineFormat.Answer'] = '{ActivityUserID,user} answered your question: <a href="{Url,html}">{Data.Name,text}</a>';

This is our most flexible format, and it gives developers some control over grammatical constructs so that they can correct for edge cases in each locale.
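To make the {Field,format} syntax concrete, here is a small TypeScript model of how such a template could be filled. It is a hypothetical sketch, not Vanilla's formatString(): `formatNamed`, `resolvePath`, and the `user`, `html`, and `text` formatters below are assumptions, and the real format names behave differently (for example, `user` renders an actual user link). It shows the three moving parts: parsing the token, resolving a dotted path such as Data.Name, and dispatching to a format function.

```typescript
type Formatter = (value: unknown) => string;

// Resolve a dotted path such as "Data.Name" against nested data.
function resolvePath(data: unknown, path: string): unknown {
  let current: unknown = data;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Replace {Field} and {Field,format} tokens. A missing field becomes "";
// a field with no known format is inserted with String(value).
function formatNamed(template: string, data: unknown, formatters: Record<string, Formatter>): string {
  return template.replace(/\{([^{},]+)(?:,([^{}]+))?\}/g, (_match: string, path: string, format?: string) => {
    const value = resolvePath(data, path.trim());
    if (value === undefined || value === null) return "";
    const name = format?.trim();
    const formatter =
      name !== undefined && Object.prototype.hasOwnProperty.call(formatters, name) ? formatters[name] : undefined;
    return formatter ? formatter(value) : String(value);
  });
}

// Hypothetical formatters for illustration only.
const escapeHtml = (text: string): string =>
  text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");

const exampleFormatters: Record<string, Formatter> = {
  text: (v) => escapeHtml(String(v)),
  html: (v) => escapeHtml(String(v)),
  user: (id) => `User #${String(id)}`, // stands in for a real user-name lookup
};

console.log(
  formatNamed(
    '{ActivityUserID,user} answered your question: <a href="{Url,html}">{Data.Name,text}</a>',
    { ActivityUserID: 7, Url: "/discussion/1?a=1&b=2", Data: { Name: "Fish & Chips" } },
    exampleFormatters,
  ),
);
```

The sketch also makes the translation risk visible: if a translator changes `{Data.Name,text}` to a translated format word, the lookup silently falls back to unformatted output.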

This is a great formatting option; however, in practice we've found this format to be hard to translate for the following reasons:

Translators are usually unaware of the format, so they often make mistakes when translating these strings. Some languages even use different comma or brace characters, which get substituted during translation.
The format syntax contains reserved English words, such as user and text. These words are often translated, which corrupts the string.
Machine translation frequently corrupts these strings.
This format is actually based on the intl extension, and we thought that it would eventually be replaced by the built-in standard. However, the built-in functionality remains poorly documented and "buggy" to this day.

This format is really only a good fit for strings that represent sentences with a high degree of UGC injection. The only places in our application that use these strings are activity and notification messages.

Localization best practices

Below are some best practices when using strings for localization.

Re-use existing strings

Due to the complexity and expense of adding new strings, try to re-use existing strings. Here are some tips to help with that.

Strings that are slightly generic are more likely to be re-used. However, this practice can be a double-edged sword, because the UX can suffer if your wording becomes too vague.
✔️ TIP: Think of your audience when considering more generic wording. For community members, try to word things as clearly as possible; whereas Moderators and Admins can typically tolerate a bit more brevity.
Keep a copy of the locales repo open so that you can do basic searches when adding new UI components.
If an existing string is almost exactly what you want, use it if at all possible. For example, a string with slight capitalization or punctuation differences could be re-used if the material impact on the UX is minimal.

Add new strings to the locales repo

We have to make sure we keep our localized content up to date. This starts with developers doing their part.

Add new strings to the locales repo as you work. Keep the repo open in a separate tab or IDE to quickly add translations and reduce the likelihood that you'll forget.
Make all of your additions against the Transifex branch instead of Master. This branch is automatically synchronized with the Transifex service.
Do not translate to other languages directly in the locales repo. All translation must be done within Transifex. It will automatically synchronize back to the locales repo. If it doesn't, ping Todd.

Do not construct strings with concatenation or an over-use of placeholders

It might be tempting to make a sentence by mixing, matching, and concatenating other translations. However, this almost always results in a bug, because there is nearly always a language where the construction doesn't work.

Avoid concatenating strings. PR reviewers should be looking for this and should always request a justification if they see it.
It is acceptable to concatenate paragraphs together as each paragraph is usually an independent thought. However, you will rarely do this in practice as such concatenation is usually done in your template or component itself.
It is usually acceptable to concatenate sentences together as they are often independent thoughts. When adding strings that consist of multiple sentences, you should also think about splitting them into separate strings if doing so increases their likelihood of being re-used.
Similar to sentences, it is often acceptable to concatenate strings that aren't in a sentence format. The best example of this is strings that are separated by commas or semi-colons to represent data. Our meta text lines are another example.
Avoid constructing sentences that contain placeholders. An example of this is the "%s not found" string. We use it everywhere, but many of its replacements just don't work in every language. If you find yourself coming up with a clever string like this, consider a generic string that works for multiple use cases (e.g., "Record not found.") instead of a bespoke string for every situation.
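The difference between the two approaches above is structural, so it shows up clearly in code. In the TypeScript sketch below, `t` and the tiny definitions table are hypothetical stand-ins for a translation lookup. The concatenated version hard-codes word order, spacing, and the assumption that "not found" never agrees with the noun before it; the generic version hands translators one complete sentence.

```typescript
// Hypothetical translation lookup over a tiny definitions table.
const definitions = new Map<string, string>([
  ["Discussion", "Discussion"],
  ["not found", "not found"],
  ["Record not found.", "Record not found."],
]);
const t = (key: string): string => definitions.get(key) ?? key;

// Avoid: a sentence assembled from fragments. Word order, gender agreement,
// and punctuation are fixed here in code, where translators cannot fix them.
function notFoundConcatenated(thing: string): string {
  return t(thing) + " " + t("not found");
}

// Prefer: one complete, generic sentence that is translated as a whole.
function notFoundGeneric(): string {
  return t("Record not found.");
}

console.log(notFoundConcatenated("Discussion"));
console.log(notFoundGeneric());
```

Both produce acceptable English, which is exactly why the first pattern slips through review; the bug only appears in languages where the noun changes the words around it.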

Really, we all should avoid constructing sentences with too many string replacements because too often they are "awkward," even in English.

Avoid HTML and other formatting within strings

We currently have many strings that include HTML formatting. We have found this to be problematic for the following reasons.

Translators often break HTML, which leads to bugs when viewing our application in other languages.
As soon as a string contains HTML, we can't auto-escape it in the output. This incrementally increases our XSS attack surface. Wouldn't it be nice to auto-escape every string, no matter what?
React discourages outputting raw HTML; its developers signal this in the name of the property used for it, dangerouslySetInnerHTML. React also can't hook into events on injected HTML without a huge hassle.
We want to be able to use strings for more than just our web app. We also have tooltips, attributes, title tags, plain text emails, and other potential use cases. It's best to put the formatting in the templating language instead.
HTML is often subjective and more prone to change than translation text. Are you sure you want your string to have a <b> tag or a <strong> tag? It's best not to have that decision overridden by a translation.
Translation strings should not include formatting such as hard returns and plain-text indents. This kind of formatting often has the same issues as HTML.

If you want HTML in your string, use <0>...</0> style formats and then inject the HTML afterwards. For more complex HTML, use the templating language for HTML (either Twig or React). If neither of these solutions works, then maybe the application has a design flaw and there needs to be a deeper conversation within the whole team.
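The <0>...</0> approach is what makes "auto-escape every string" achievable: the entire translation can be escaped first, and only afterwards are the numbered placeholders swapped for markup that the template controls. The TypeScript sketch below illustrates that order of operations. `renderWithTags` and `escapeHtml` are hypothetical helpers (not HtmlUtils::formatTags()), and only paired tags are handled.

```typescript
// Escape the HTML-significant characters.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Escape the whole translated string first, so nothing a translator typed can
// become markup, then replace the now-escaped &lt;n&gt;...&lt;/n&gt; placeholders
// with HTML produced by the caller.
function renderWithTags(translation: string, wrappers: Array<(escapedInner: string) => string>): string {
  const escaped = escapeHtml(translation);
  return escaped.replace(/&lt;(\d+)&gt;(.*?)&lt;\/\1&gt;/g, (match: string, index: string, inner: string) => {
    const wrap: ((s: string) => string) | undefined = wrappers[Number(index)];
    return wrap ? wrap(inner) : match;
  });
}

const signIn = (text: string): string => `<a href="/entry/signin">${text}</a>`;
console.log(renderWithTags("You need to <0>Sign In</0> to vote on this article", [signIn]));
// Stray markup in a translation stays inert text instead of becoming HTML.
console.log(renderWithTags("Click <0>here</0> <script>alert(1)</script>", [signIn]));
```

A string with raw HTML embedded in it cannot go through this pipeline, because escaping it would also escape the markup you wanted to keep.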

Localizing dates and numbers

Dates and numbers have to be localized, too.

Dates - make sure you use one of our built-in methods to format dates rather than PHP's built-in functions. This is the current best practice for formatting dates.
Numbers - also have different formats. We don't currently have a specific format for this, but is locale aware.
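How much date and number output varies by locale is easy to demonstrate with the standard ECMAScript Intl API. The TypeScript sketch below is only an illustration of why locale-aware formatting matters; it is not one of Vanilla's built-in date methods, which remain the recommended approach. The time zone is pinned to UTC so the output does not depend on the machine running it.

```typescript
// Format a date with the conventions of the given locale (UTC for stable output).
function formatDate(date: Date, locale: string): string {
  return new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC",
  }).format(date);
}

// Format a number with the locale's grouping and decimal separators.
function formatNumber(value: number, locale: string): string {
  return new Intl.NumberFormat(locale).format(value);
}

const sample = new Date(Date.UTC(2024, 0, 15));
for (const locale of ["en-US", "de-DE", "ja-JP"]) {
  console.log(locale, formatDate(sample, locale), formatNumber(1234567.891, locale));
}
```

For example, en-US renders the number as 1,234,567.891 while de-DE renders it as 1.234.567,891, so a hard-coded format is wrong for a large share of users.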

Have a basic understanding of the requirements of different languages

If you speak only English, this isn't necessarily a natural instinct. Consider, too, that the English language:

is a peculiar language: very complex, but often forgiving, and prone to colloquial turns of phrase that other languages don't share.
lacks many of the advanced grammatical structures that other languages have.

Keep in mind, too, that the vast majority of internet content caters to English speakers, which tends to create design blind spots for those who speak only English.

If you speak more than one language, you can already help improve our localization effort.

Non-English language notables

Below are some of the differences that English-language speakers would encounter in non-English languages.

Verb conjugation. Often, verbs in sentences are spelled differently depending on who is the subject of the verb. This means that sentences have to be constructed with this in mind.
Sentence ordering. Many languages order subjects, verbs, and objects differently than English does. So, if you construct a sentence with more than one naked %s, you'll probably have a bug.
Gendered words. Many non-English languages have gendered words that change the spelling of words around them. This is often the biggest reason why we can't construct sentences with too many placeholders. Often the word you are trying to inject into a sentence affects the surrounding words.
Right to left. Some languages display and are read right-to-left instead of left-to-right. This can have a significant impact on our application's design.
Special punctuation characters. Different languages might have their own punctuation characters. Some look different but serve the same purpose as in English; others have a completely different purpose. There are other small differences, too; for example, French puts a space before a colon.
Wide characters. Many Asian languages are written with wider fixed-width characters. This makes native-language sites easier to read because everything lines up. Translators will sometimes replace our numbers, punctuation, and spaces with different characters that look the same but are padded to take up the same fixed width. It's pretty cool.
Plurals. Some languages don't change structure from the singular to plural form. Also, did you know some languages have more than just singular and plural forms?
Different casing. Some languages don't have capital letters. Some languages do, but use them differently than English. Many languages, for example, write titles strictly in sentence case.
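The plurals point can be checked directly against CLDR data through the standard `Intl.PluralRules` API. The TypeScript sketch below is an illustration only (the helper names are hypothetical and this is not how Vanilla's locale strings handle plurals); it lists how many plural categories a locale defines and which category a given count falls into.

```typescript
// Plural categories CLDR defines for a locale, i.e. how many distinct forms a
// string such as "N comments" could need in that language.
function pluralCategories(locale: string): string[] {
  return new Intl.PluralRules(locale).resolvedOptions().pluralCategories.slice();
}

// The category a particular count falls into for that locale.
function pluralCategory(locale: string, count: number): string {
  return new Intl.PluralRules(locale).select(count);
}

for (const locale of ["en", "ja", "pl", "ar"]) {
  console.log(locale, pluralCategories(locale).join(", "));
}
console.log([1, 2, 5].map((n) => `${n}: ${pluralCategory("pl", n)}`).join("; "));
```

English needs two forms, Japanese needs only one, Polish needs four, and Arabic needs six, which is why a single "%s comments" string cannot be correct everywhere.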

Common mistakes to avoid

Most of the advice in this article describes mistakes to avoid. However, some mistakes come up repeatedly and deserve special attention.

Not adding strings to the locales repo. This is probably the biggest mistake. Missing locale strings start a chain of events that developers need to think about: the customer sees the missing translation and we look bad; a support agent has to hunt down the string and report it; a developer has to add it. Often we have to back-port a string addition because it's hard to explain why a simple string change will take a month to appear. Think of all the cost that some up-front diligence could have avoided.
Not re-using strings. We should re-use strings as much as possible, but often new strings come in that differ only in some trivial way from existing ones. Our translation inventory grows significantly every year. That growth represents cost. We should take care to keep it as small and clean as possible.
Typos. Just plain typos make it into our application all the time. Here is one that was in our application for a long time: $Definition['%s New Plural'] = '%s New Plural'; Here is another one: "<1>Learn more about custom fonts.</1>."
Grammar mistakes. Grammar mistakes make it into our application far too often. They should be caught at the design or development stage rather than at the QA stage, where many people expect them to be caught. If you are a non-native English speaker, I recommend that you ask for help with wording when adding new strings.
Odd capitalization. Developers LOVE to capitalize words in sentences. If a word seems to represent a variable, a developer will bold it, Capitalize It, or Both. This often leads to a situation where almost every word in a sentence is capitalized. When writing a sentence, give those capitalized words a second thought and decide whether they really are proper nouns.
Long string keys. Overly long string keys not only feel inefficient, they represent a maintenance problem, too. The longer a translation is, the more likely it is to change in the future. Furthermore, what happens if your key has a typo, grammar mistake, or odd capitalization? Fixing a mistake in a key means that you also have to find the key in the source code and fix it. The alternative is to just live with an obvious mistake. Not much better. If you make your key short, then that reduces the risk of it having an issue.
Translating keys but not strings. When a translation key differs from its string, a missing translation is easy to fix incorrectly. A customer reports the missing translation, the support agent creates a ticket with a screenshot of it, and a developer adds the key from the screenshot. The problem is that the screenshot shows the default translation, not the key. Always look through the source code for a missing translation before adding it to the locales repo.

If you look through our existing application, you will see many of these mistakes; they are a natural side-effect of a long-lived app. This is why you should try to absorb the best practices in our documentation and not just look at our existing code.

Conclusion

Almost all of the information in this article is designed to serve one purpose: Make our application work well in as many languages as possible.

All of these considerations come from the experience we've gained over the years about how well, or how poorly, English translates into other languages.

We want to make Vanilla an application that is not compromised as the result of being translated into non-English languages. Every community user should have the best possible experience irrespective of language. We may not get every translation right, but translations can be improved over time without impacting our code.

With all of the considerations laid out in this article, localization clearly represents a significant effort. However, if the entire team (from the designers to the developers to the support agents) has a better understanding of our localization process, then we can achieve high-quality localization without significantly affecting our development effort.