Terminology extraction tools

“Words mean more than what is set down on paper. It takes the human voice to infuse them with shades of deeper meaning.” - Maya Angelou

Terminology can be extracted either manually, by highlighting words in documents and transferring them to a program such as Word or Excel, or automatically, by using terminology extraction tools.

In DejaVu, for example, the Lexicon tool lets you select phrases of 2, 3, 4, or 5 words, which are likely candidates for terms. Lexicon then lets you export those phrases to Excel, where you can filter them and decide which ones are actual terms. That is basically what an extraction tool does, and it is not as hard as you might imagine. The only problem is that long documents produce a lot of “hits”, so you have to be patient and take some time to review the results.
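To make the idea concrete, here is a minimal Python sketch of the same workflow: collect every phrase of 2 to 5 consecutive words, count how often each one occurs, and write the result as CSV so it can be filtered in Excel. This is only an illustration of the general approach, not how Lexicon or any other tool actually works internally. The function names `candidate_phrases` and `to_csv`, and the `min_count` threshold, are made up for this example.

```python
import csv
import io
import re
from collections import Counter


def candidate_phrases(text, min_words=2, max_words=5):
    """Count every run of min_words..max_words consecutive words in a sentence."""
    counts = Counter()
    # Split on punctuation and line breaks so phrases never span two sentences.
    for sentence in re.split(r"[.!?;:\n]+", text.lower()):
        # A word is a run of letters, optionally joined by hyphens or apostrophes.
        words = re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", sentence)
        for n in range(min_words, max_words + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts


def to_csv(counts, min_count=2):
    """Return CSV text (phrase, word count, frequency), most frequent first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["phrase", "words", "frequency"])
    for phrase, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_count:  # drop phrases that occur too rarely to matter
            writer.writerow([phrase, len(phrase.split()), freq])
    return buffer.getvalue()


if __name__ == "__main__":
    sample = (
        "The translation memory stores segments. "
        "A translation memory helps translators. "
        "Translation memory tools vary."
    )
    print(to_csv(candidate_phrases(sample)))
```

Saving the returned text to a `.csv` file gives you a sheet you can sort and filter. As with the real tools, most rows will be noise rather than terms, so the manual review step is still needed.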

The Wiki gives a comprehensive list of terminology extraction tools. Personally, I have only used DejaVu, Trados Extract, and Xbench. Some of the following tools were mentioned in a post by the LinkedIn Terminology Group about what translators and terminologists were using as extraction tools. Please note that the descriptions were taken from the tool owners’ websites and are not a personal assessment.

Acrolinx – (Commercial) The foundation of Acrolinx’s terminology management system is a central database that stores terminology, keywords, trademarks, brands, and other words and phrases that are specific to your organization. The website provides useful webinars on how to use it, as well as a section with an overview of terminology management.
ApSIC Xbench (Commercial) With Xbench you are just a hotkey away from your terminology. Just load your bilingual references on Xbench and press Ctrl+Alt+Insert from any Windows application when you want to find a term.
CrossTerm by Across (Commercial) is the terminology system of Across. It facilitates the maintenance and use of a consistent corporate terminology and of digital dictionaries. You can store preferred terms along with synonyms, definitions, images, usage and grammar information, etc. The storage of so-called do-not-use words prevents the use of terms to be avoided in company texts for either technical or marketing reasons.
FiveFilters (free). Term Extraction from FiveFilters.org is a free software project to help you extract terms (e.g. for use as tags) through a web service. Given some text it will return a list of terms with (hopefully) the most relevant first. Terms can be returned in a variety of formats. The application is intended to be a simple, free alternative to Yahoo’s Term Extraction service.
Lexicon, DejaVu (Atril) (A review here) (Commercial) Atril provides a 5-minute video on how to use it.
MultiCorpora’s Terminology Management System (Commercial), a tool in their MultiTrans Prism software, is easy to use, fully TBX (industry standard format for terminology data exchange) compliant, and has many automated features that save you time and money. When integrated with the MultiTrans Prism suite of tools, it can ensure terminology consistency at the start of a translation project and check completed translations for fidelity to approved vocabulary. This eliminates the factor of human error in maintaining terminology consistency.

Multifultor, by Rolf Keller, is a timesaving program for looking up terminology in dictionaries and almost any data source which you can use to look up a word or phrase. Multifultor can search websites, dictionaries, files and the Windows Index as data sources.

MultiTerm Extract, Trados (SDL) (Commercial). Scribb provides an easy-to-use guide. Read their tips and tricks here.
qTerm by Kilgray (memoQ) (Commercial) is a full-fledged browser-based terminology management system that connects directly to the memoQ server. Using qTerm, companies and organizations can turn their terminology into a corporate asset that facilitates internal and external communication, increases brand awareness, improves the quality of technical communication and cuts the costs of misunderstanding.
Similis (Download here) – Free. Similis is a Translation Memory (TM) program of French origin, supporting English, German, French, Italian, Spanish, Portuguese and Dutch. It includes a linguistic analysis engine to break down segments into chunks and generate corresponding Term Bases (TB) or glossaries.
SynchroTerm (Terminotix) (Commercial) is a powerful tool for extracting terms and efficiently creating terminology records from source and target document pairs, bitexts and translation memories. Its user-friendly windows allow you to perform sophisticated extraction, search and context checking functions. With SynchroTerm, your translation archives become a gold mine from which you can quickly extract terminology.
Syn-Tactic (Free) Syn-Tactic technology helps you reduce your overall translation costs and cycle time by extracting relevant terminology from your technical documentation and using it to feed an advanced machine translation system, which pre-translates your documentation faster, more cost-efficiently, and more consistently.

T-Manager by Rafael Guzman. The tool is made of Excel macros that you can run to validate, diagnose, customize, compare glossaries, do aligned comparisons, leverage and extract terminology, and generate reports.

TaaS (Terminology as a Service) – Free; Beta. Search terminology in various sources. Identify term candidates in your documents and extract them automatically. Look up translation candidates in various sources. Refine and approve terms and their translations. Share your terminology with other users. Collaborate with your friends and colleagues. Use your terminology in other working environments. Watch their 3-minute intro on YouTube.
TerMine (Free) is provided by the National Centre for Text Mining (NaCTeM), the first publicly funded text mining centre in the world. NaCTeM provides text mining services in response to the requirements of the UK academic community and is operated by the University of Manchester.
Webterm by Star TS (Free) gives you the power to manage and update your terminology on a global basis. Teams across the globe can communicate their views and implement terminology revisions, making the process faster, cheaper and more efficient. Changes to terminology are available instantly around the world.
TermStar Transit NXT (Commercial) is a comprehensive system for terminology use and administration. Importing and exporting terminology, printing dictionaries, searching for terms (including their declined and conjugated forms) as well as other functions will fully meet the needs of a terminologist. TermStar is available either as an independent application or as an integrated part of Transit.
UniTerm by Acolada has been designed to create, edit, and professionally manage corporate terminologies and special-language dictionaries. Developed to meet the requirements of today’s dictionary systems and of translation tools’ terminology management systems, UniTerm offers a unique set of functionalities. Read more on the product sheet.

Read the blog WordLo for more useful tools (free). I also recommend that you join LinkedIn’s Terminology Group and read more about tools.

“10 things you should know about automatic terminology extraction”, a guest post by Uwe Muegge on Lingua Greca’s blog.
