From the Terminologist’s toolbox: Datamundi’s term extractor tool for PDFs

This tool was recently shared by Gert Van Assche on Twitter and in The Open Mic.
Read more about this tool on Datamundi’s page and download the executable file here:
http://www.datamundi.be/cms/index.php/tools/extract-terminology-from-pdf
This is the description provided when you download it: “How does this tool work? This tool is a front-end to a term extraction engine running on our server. You’re using this front-end tool to instruct the engine what to collect for you and to extract the text from the PDF. This happens only when the PDF file is not encrypted or protected against text extraction. This extracted text is uploaded via FTP to our server. On this server the term extraction happens according to your wishes (with or without frequency, only multi-word terms or not, with or without generating an interactive term cloud). When done, the terms (up to 500!) are mailed to you.” But read the links to learn more.

Please note that this tool only works with English texts.

So give it a try and let me know how it worked for you.

Happy extracting!

(Please note my posts do not endorse any company or person. I share information in my educational blog that I think my readers would find useful).

2 Comments on “From the Terminologist’s toolbox: Datamundi’s term extractor tool for PDFs”

  1. Thanks for the tip, the tool would have come in really handy for me. However, it doesn’t work, claiming a txt file is missing somewhere.

  2. Thanks for writing Eva. You might want to contact Gert for help. His email is in the link provided.
    Patricia
