Translation Memory Cleanup
Cleanup of translation memories in multiple languages
- Fast and efficient cleanup of your translation memories
- Support from ErrorSpy quality assurance software
- Extraction and build-up of your specialized terminology in all languages
Maintenance of your translation databases
Translation memory maintenance reduces translation costs
In many companies, translation memories are at the heart of translation production. They are part of a company's linguistic assets. Translation memories contain previously translated sentences (called segments) created by human translators or machine translation systems. Once translated, content can be reused, which saves work and costs. At the same time, this reuse ensures consistent translation of texts and thus higher quality.
However, if a translation memory contains errors or is outdated, this can cause major problems with your translations. Erroneous translation memories occur, for example, when special terms have changed or when several translators have provided different translations over time without standardizing their style or terminology. Translation errors resulting from a lack of knowledge about a company’s products or contextual meanings spread like a virus as soon as they enter a translation memory unnoticed.
To avoid erroneous translation memories, it is important to regularly update and maintain your translation memories. This task certainly involves costs. Moreover, this work requires linguistic and special subject knowledge in several languages. That’s exactly what we are here for, and we know how to keep costs down for this type of task. We can assist you with tools and methods that will help you keep your translation memories at a high level of quality.
If you maintain your translation memories regularly, you can be sure that you will save time and money and reuse correct translations of the best possible quality.
Translation into all languages
We translate into all languages
Do you need a translation? We will send you a quote within the shortest possible time. Send us your request using this quote form.
File formats
Which translation memory formats do we process?
Overall, there are many different file formats for translation memories or translated segments. We can handle most of them.
Translation Memories (TMs)
TMs are usually stored in a database, as is the case with common translation memory systems such as Across, Trados and memoQ. These TMs can be exchanged in various formats, primarily TMX or one of its tool-specific variants.
TMX (Translation Memory eXchange)
This is a widely used file format for translation memories based on XML (eXtensible Markup Language). TMX files are easily interchangeable and can be used with a wide range of translation tools and software programs.
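As an illustration of how simple the TMX structure is, the following minimal sketch reads the translation units of a TMX document using only the Python standard library. The sample content and the function name `read_units` are assumptions made for this example; real TMX exports carry more header information and tool-specific properties.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-language sample, not taken from a real memory.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="de" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1.0"
          o-tmf="example"/>
  <body>
    <tu>
      <tuv xml:lang="de"><seg>Nicht unter schwebenden Lasten aufhalten.</seg></tuv>
      <tuv xml:lang="en"><seg>Do not stay under suspended loads.</seg></tuv>
    </tu>
  </body>
</tmx>"""

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_units(tmx_text):
    """Return one dict per translation unit, mapping language code to segment text."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags.
                unit[tuv.get(XML_LANG)] = "".join(seg.itertext())
        units.append(unit)
    return units


units = read_units(SAMPLE_TMX)
```

Each entry in `units` then maps a language code to the segment text, which is a convenient starting point for the checks described further down.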
XLIFF (XML Localization Interchange File Format)
This is another XML-based file format for translated segments that is widely used in the localization industry. There are some variants of XLIFF such as SDLXLIFF, which is generated by Trados.
SDLTM (proprietary format for Trados databases)
SDLTM is Trados Studio’s internal format for translation memories. An SDLTM file is a SQLite database, and its contents can be exported to TMX.
CSV (Comma-Separated Values)
This is a simple file format commonly used for storing data in tabular form: each line is a row, and the columns within a row are separated by commas (or another delimiter such as a semicolon). CSV files are relatively easy to handle and can be used with a wide range of software programs and tools.
PHP
PHP is a widely used open source general-purpose scripting language that is particularly suitable for web development and can be embedded in HTML.
JSON
JSON is a data format used to store data in a structured way. It is often used to store data in a database or to transfer data between different parts of a web application.
Translation memory cleanup challenges
We have the necessary knowledge
Key aspects of translation database cleansing
The top six challenges in cleaning up translation memories are:
Identifying incorrect data: Cleaning up a TM requires identifying, finding, and correcting inaccurate or incomplete data. This can be a time-consuming process and requires knowledge of various error patterns in TMs.
Duplicates: Duplicate entries are not just 100% identical strings. Different phrases for the same content must be identified and either eliminated or merged.
Incomplete segments: Truncated segments occur when a text is not properly segmented before translation begins. This can lead to incorrect translations and requires manual correction.
Contextual errors: Contextual errors occur when the source segment has been translated correctly but does not fit the context of the target language. In this case, an expert translator must review the translation and check it for accuracy.
Wrong terminology: Wrong terminology can lead to errors in translation memories. In this case, manual correction of the terms used in the TM is required.
Translation errors: Incorrect translations can occur for various reasons. Detecting errors in meaning requires translation expertise.
Translation memory cleanup in 6 steps
Data cleansing procedure
This is how the cleanup of your translation memories works:
Request
Offer and project start
Cleanup according to specification
Quality assurance
Adding attributes
Delivery and maintenance
Language combinations
Translation databases in the following languages
We clean translation databases in a large number of language combinations. Very popular combinations for TMs are:
- Translation memory German → English
- Translation memory English → German
- Translation memory German → French
- Translation memory French → German
- Translation memory German → Italian
- Translation memory Italian → German
- Translation memory German → Polish
- Translation memory German → Turkish
- Translation memory German → Portuguese
- Translation memory German → Russian
- Translation memory German → Dutch
- Translation memory German → Chinese
Quality and time savings through clean TMs
The cleaning of translation databases in detail
The analysis of translation memories
Before we start the analysis, we discuss with our clients the goal they want to achieve. Some of the aspects we examine during a translation memory analysis are:
- The size, scope and characteristics of the translation memory. This includes, for example, the total number of translations stored in the memory, their age, frequency of use, and the range of subject areas and topics covered.
- Redundant segments. Cleanup involves work, so it helps to reduce the set of segments to be cleaned up at the outset. For example, the segment attributes may show that some segments have not been used for years. Such segments can be deleted from the TM.
- Common patterns and trends in the data. This may include looking for common words, phrases, or syntactic constructions that occur frequently in the source text or translations. This can help identify inconsistencies or systematic translation deficiencies.
- Terminology used. This is traditionally one of the main causes of erroneous translation memories. We therefore investigate terms that are not translated consistently, or terms that have not been extracted and included in the corporate glossary.
The analysis aims to understand the strengths and weaknesses of the memory and identify areas where it can be improved.
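The idea of setting aside segments that have not been used for years can be sketched as follows. This example assumes the memory is available as TMX and that the tool has written the `lastusagedate` attribute to the translation units; the cutoff date, the sample segments and the function name `split_by_last_use` are hypothetical.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

TMX_DATE = "%Y%m%dT%H%M%SZ"  # date format used by TMX date attributes

# Hypothetical sample body with three translation units.
SAMPLE_BODY = """<body>
  <tu lastusagedate="20150310T090000Z"><tuv xml:lang="en"><seg>Old model A100</seg></tuv></tu>
  <tu lastusagedate="20240115T120000Z"><tuv xml:lang="en"><seg>Press the start button.</seg></tuv></tu>
  <tu><tuv xml:lang="en"><seg>No usage date recorded.</seg></tuv></tu>
</body>"""


def split_by_last_use(body, cutoff):
    """Split translation units into (keep, stale) by their lastusagedate attribute.

    Units without the attribute are kept, because their age is unknown.
    """
    keep, stale = [], []
    for tu in body.findall("tu"):
        stamp = tu.get("lastusagedate")
        if stamp is None:
            keep.append(tu)
            continue
        used = datetime.strptime(stamp, TMX_DATE).replace(tzinfo=timezone.utc)
        (stale if used < cutoff else keep).append(tu)
    return keep, stale


body = ET.fromstring(SAMPLE_BODY)
cutoff = datetime(2020, 1, 1, tzinfo=timezone.utc)  # hypothetical cutoff date
keep, stale = split_by_last_use(body, cutoff)
stale_texts = [tu.find("tuv/seg").text for tu in stale]
```

In practice the stale units would be reviewed or archived rather than deleted blindly, since a missing or outdated usage date does not prove that a segment is worthless.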
Data cleansing: The types of error
Here are some of the typical errors that can be found in a TM:
Duplicates: A sentence with multiple translations. We also look for sentences in the source language that have basically the same meaning, leading to unnecessary translation variants.
False translations: These occur when the translation does not accurately reflect the meaning of the source language. Sometimes translations come from neural machine translation engines and contain so-called “hallucinations”, i.e., words that do not exist in the source text.
Terminology errors: These occur when the terminology used in the translation memory is inconsistent or even incorrect.
Spelling errors: These occur when the translation (or the source text) has not been spell-checked.
Formal errors: These occur when formal aspects are incorrect, such as missing closing parentheses, incorrect numbers, or incorrect encoding of special characters.
Incomplete sentences: These occur when the text to be translated has not been segmented correctly. This results in incomplete segments, which can even lead to larger errors due to the different syntactic and morphological structure of the languages.
Punctuation errors: These occur when the translation contains incorrect punctuation or when punctuation marks are missing.
Outdated translations: These may contain, for example, incorrect references, links, terms or product names.
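Some of the formal errors listed above lend themselves to automatic detection. The following is a rough sketch, not a description of how ErrorSpy works: it compares the numbers in source and target and checks that brackets are balanced. The function names and sample sentences are assumptions, and the number comparison is deliberately crude (it treats decimal comma and decimal point as equivalent).

```python
import re

NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def normalized_numbers(text):
    """Return the numbers in text, sorted, with ',' unified to '.' so 1,5 equals 1.5."""
    return sorted(n.replace(",", ".") for n in NUMBER.findall(text))


def balanced(text, pairs=("()", "[]", "{}")):
    """Return True if every opening bracket is closed in the right order."""
    closers = {p[1]: p[0] for p in pairs}
    openers = set(closers.values())
    stack = []
    for ch in text:
        if ch in openers:
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack


def formal_issues(source, target):
    """List formal problems in the target that are not present in the source."""
    issues = []
    if normalized_numbers(source) != normalized_numbers(target):
        issues.append("number mismatch")
    if balanced(source) and not balanced(target):
        issues.append("unbalanced brackets")
    return issues


bad = formal_issues("Tighten to 25 Nm (see page 12).",
                    "Mit 52 Nm anziehen (siehe Seite 12.")
good = formal_issues("Max. pressure: 1.5 bar", "Max. Druck: 1,5 bar")
```

Checks like these only produce candidates; a reviewer still has to decide whether a flagged segment is actually wrong.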
About the importance of metadata
Metadata and attributes are important in translation memories because they provide information about each translated segment to which they are assigned. TM attributes can contain a variety of information, such as the name of the translated document, the project number, the acquisition or modification date, the frequency of translation reuse, the segment origin (e.g., alignment or MT), or the editing status of the segment. This information can be very useful in various contexts, such as when using TMs to train an MT engine.
Specific benefits of metadata include:
- Improve access to translated segments. By providing information about the translated segments, metadata helps to select or use translated segments more easily.
- Provide context and background information about translated segments. Metadata provides valuable information about the origin and history of translated segments. This can be particularly important for the maintenance or use of translated segments, where context and background information can be critical to the correct use of the segments.
- Support data sharing and collaboration. Providing metadata about translated segments can make it easier for other people or software to find and use translated segments.
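In TMX, such metadata can be stored as `<prop>` elements on a translation unit. The sketch below shows one way to set them with the Python standard library; the property names `x-origin` and `x-status`, their values and the helper `set_prop` are hypothetical, since each translation tool defines its own attribute names.

```python
import xml.etree.ElementTree as ET


def set_prop(tu, prop_type, value):
    """Set a <prop> of the given type on a translation unit, replacing an existing one."""
    for prop in tu.findall("prop"):
        if prop.get("type") == prop_type:
            prop.text = value
            return prop
    prop = ET.Element("prop", {"type": prop_type})
    prop.text = value
    tu.insert(0, prop)  # TMX expects <prop> elements before the <tuv> elements
    return prop


tu = ET.fromstring('<tu><tuv xml:lang="en"><seg>Press the start button.</seg></tuv></tu>')
set_prop(tu, "x-origin", "MT")         # hypothetical property name and value
set_prop(tu, "x-status", "unchecked")
set_prop(tu, "x-status", "checked")    # updates the existing property
props = {p.get("type"): p.text for p in tu.findall("prop")}
```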
Technologies used for data cleansing
There are several tools and methods for cleaning translation memories that can be used depending on the task.
The most important tool we use is ErrorSpy, our translation quality assurance software. We started developing ErrorSpy about 20 years ago, and it has become a Swiss Army knife of quality control. Within seconds, ErrorSpy provides a list of possible errors, such as terminology, number, or consistency errors, for our reviewers to sift through.
We also work with regular expressions that allow us to recognize certain patterns in translation memories and automatically correct some of them. For example, we can recognize and change the spelling of product names, date formats, superfluous spaces, remnants of old spelling, or certain word sequences.
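A few such patterns can be sketched in Python as follows. The rules are illustrative assumptions rather than our production rule set: the target date format, the product-name pattern and the function name `tidy_segment` are chosen for this example only, and the punctuation rule would not be appropriate for French, where a space before certain punctuation marks is correct.

```python
import re


def tidy_segment(text):
    """Apply a few illustrative formal corrections to one segment."""
    # Collapse runs of spaces or tabs and trim both ends.
    text = re.sub(r"[ \t]+", " ", text).strip()
    # Remove a superfluous space before punctuation (not suitable for French).
    text = re.sub(r" +([,.;:!?])", r"\1", text)
    # Rewrite dates from DD.MM.YYYY to YYYY-MM-DD (assumed target format).
    text = re.sub(r"\b(\d{2})\.(\d{2})\.(\d{4})\b", r"\3-\2-\1", text)
    # Standardize the spelling of a product name.
    text = re.sub(r"\berror\s?spy\b", "ErrorSpy", text, flags=re.IGNORECASE)
    return text


cleaned = tidy_segment("  Error Spy  was updated on 05.03.2021 .")
```

Here `cleaned` is "ErrorSpy was updated on 2021-03-05.". Rules of this kind should be tested on a sample of the memory before they are applied to all segments.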
Artificial intelligence methods are used for certain tasks. Among other things, they are very useful for detecting semantic similarities. For example, we can find out that the statements “Do not stay under suspended loads.” and “It is forbidden to stay under a suspended load.” actually have the same meaning and only need a single translation.
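A purely lexical comparison does not reveal this kind of equivalence, which is why semantic methods are needed here. As a rough illustration (a simple word-overlap measure, not the AI method mentioned above), the following sketch computes the Jaccard overlap of the two example sentences; the function names are assumptions.

```python
import re


def tokens(sentence):
    """Return the set of lowercase words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def jaccard(a, b):
    """Word overlap between two sentences: shared words divided by all distinct words."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


s1 = "Do not stay under suspended loads."
s2 = "It is forbidden to stay under a suspended load."
overlap = jaccard(s1, s2)
```

The two sentences share only three of twelve distinct words, so `overlap` is 0.25, far too low for a lexical duplicate check to flag the pair even though the meaning is the same.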
Finally, a number of other linguistic tools or self-written programs help us to detect and solve other typical problems such as incomplete sentences, ambiguities, or spelling errors.
Cleanup service for your language data
Seven reasons to work with us
Why should you use our translation memory cleanup services?
- We guarantee time and cost savings through cleansed TMs.
- We have the right tools and technologies for the job.
- We have more than 20 years of experience with translation memory cleansing.
- Our quality assurance meets and exceeds DIN EN ISO 17100.
- We are familiar with AI-based quality assurance methods.
- We guarantee reliable services and quality.
- You have no fixed costs and pay only for what you need.
Curation of translation memories
Our services at a glance
We offer a wide range of services for cleaning and curating translation memories, such as
- Cleanup of translation memories in multiple languages
- Build-up and standardization of your terminology
- Development and, if necessary, implementation of optimization proposals for your source-language texts
- Regular maintenance service for your translation memories
- Cleanup of texts in other databases, e.g. catalogs, PIM or similar.
- Optimization of your linguistic data for training machine translation engines
Customer testimonials
What our customers say
"We have been working with D.O.G. for many years and appreciate their team as a competent partner. We have our user manuals translated into 25 languages and our new website. No matter if the translations are needed later in InDesign or Typo3, technical requirements are no problem. Even if it is urgent, you can rely on D.O.G. The first time we ordered Japanese translations for a new client, they were highly praised when we asked."
"Very good response to quotations and competence in case of queries regarding translations. Reliable handling of the translations with integration of a TMS as well as fast delivery of the translations."
"High-quality technical translations are essential, especially for our operating instructions and customer documentation for materials testing machines. D.O.G. provides us with all the translations we require in the highest quality and also with absolute adherence to deadlines. We are very satisfied with the translation work and can always recommend D.O.G. GmbH."
FAQ
Frequently asked questions about cleaning up translation memories
First of all, it is important that there is a process to keep translation memories “clean”. This includes selecting the right translation partner, maintaining and using a corporate terminology, and using attributes to make the most of the translated segments. It is recommended that translation memories be cleaned up regularly every 3 to 6 months, with occasional additional reviews in between.
- Quality deficiencies: If a translation memory is not cleaned up, the quality of translations decreases. Over time, translation memories can be filled with incorrect, inconsistent, poorly translated, or inaccurate translations, resulting in low-quality output.
- Unnecessary costs: When a translation memory is not cleaned up, translation costs can increase. Fewer segments can be reused, and quality assurance costs are higher due to errors or inconsistencies in translated segments.
- Safety risk: Over time, translation memories can be filled with serious errors that can lead to incorrect actions by the user of a device or software and cause property damage or personal injury.
- Compatibility issues: If a translation memory is not cleaned up, it may cause compatibility issues when used with other systems.
Translation memories can be cleaned up both manually and automatically. Automatic cleanup can detect and remove duplicate segments or delete segments that are too short to be useful. You can also make formal changes to the content of TMs, e.g. using regular expressions. You can also add metadata to segments.
Manual procedures are required whenever human judgment is needed. This is the case, for example, when deciding whether a translation is incorrect or whether a technical term needs to be changed.
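The automatic part can be sketched in a few lines. The example below removes exact duplicates and segments that are too short to be useful from a list of source/target pairs; the length threshold, the sample data and the function name `auto_clean` are assumptions for illustration.

```python
def auto_clean(pairs, min_chars=3):
    """Drop exact duplicate (source, target) pairs and segments too short to be useful."""
    seen = set()
    kept = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        if len(source) < min_chars or len(target) < min_chars:
            continue  # too short to be a useful match
        key = (source, target)
        if key in seen:
            continue  # exact duplicate of a pair already kept
        seen.add(key)
        kept.append(key)
    return kept


pairs = [
    ("Press the start button.", "Drücken Sie die Starttaste."),
    ("Press the start button. ", "Drücken Sie die Starttaste."),  # duplicate after trimming
    ("OK", "OK"),                                                   # too short
    ("Switch off the machine.", "Schalten Sie die Maschine aus."),
]
cleaned = auto_clean(pairs)
```

Pairs with the same source but different targets are deliberately left untouched here, because deciding which translation to keep is one of the cases that needs human judgment.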
Different options for different budgets
Strategies for cleaning and curating translation memories
Cleaning up translation memories is a complex and sometimes costly and lengthy process. Depending on the severity of the errors in the translation memories, the time budget and the cost budget, different strategies can be developed.
Option #1:
Complete cleanup of all errors: This provides the greatest assurance in terms of the quality of the final tested TMs, but not always the best cost-benefit ratio. For example, there are segments that are never used again or those that are very old and concern products that are no longer being developed.
Option #2:
Cleanup of only part of the translation memories. Some TMs were created 10 or more years ago and contain many segments that are no longer up-to-date. The cleanup action can be limited to the last three or five years, for example. This reduces the effort required.
Option #3:
Restrict the quality criteria and review only certain aspects. For example, you could specify that terminology standardization is limited to 50 or 100 key terms.
Option #4:
Work with attributes and match penalties when reusing segments from a translation memory. Unchecked segments receive an attribute and a deduction of 2-3% from the match value (the degree of agreement between the segment in the memory and the sentence in the text). As a result, even an identical sentence is offered only as a fuzzy match that the translator should check before including it in the translation. After the translation is completed and checked, all segments used are given the attribute “checked” (or similar).
These options can be combined with each other.
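The match deduction described in option #4 amounts to simple arithmetic. The following sketch assumes a status attribute with the values "checked" and "unchecked" and a penalty of 2 percentage points; the function names and values are hypothetical, and in practice the penalty is configured in the translation memory system itself.

```python
def effective_match(raw_match, status, penalty=2):
    """Return the match percentage after deducting a penalty for unchecked segments."""
    if status != "checked":
        return max(raw_match - penalty, 0)
    return raw_match


def match_type(percent):
    """Classify a match percentage as an exact or a fuzzy match."""
    return "exact" if percent >= 100 else "fuzzy"


unchecked = effective_match(100, "unchecked")  # identical sentence, unchecked segment
checked = effective_match(100, "checked")      # identical sentence, checked segment
```

With these assumptions, an identical sentence from an unchecked segment is offered at 98% and therefore treated as a fuzzy match, while the same sentence from a checked segment remains a 100% match.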
Service - Overview
We check these aspects of translation memories
Linguistic aspects:
- Are grammar and spelling correct?
- Has the specified terminology been followed?
- Are new technical terms recognized and translated consistently?
- Are locale-specific conventions such as number formats or currency correct in the translation?
- Is the spelling of products consistent?
Subject-matter and content aspects:
- Are the translations correct in terms of content?
- Are the translations technically correct?
Technical aspects:
- Is the encoding of characters correct?
- Have any tags been deleted or are any incomplete?
- Are segments truncated and incomplete?
- Are there duplicates or inconsistent translations?
Would you like to have your translation memories cleaned up?
Then you should talk to us, because there are many ways and means to save costs. You can benefit from our experience with numerous cleanup projects. Contact us without obligation.