From the Terminologist’s toolbox: Datamundi’s term extractor tool for PDFs
This tool was recently shared by Gert Van Assche (@Gert_VA) on Twitter and in The Open Mic (click here).
Read more about this tool on Datamundi’s page and download the executable file here:
This is the description provided when you download it: “How does this tool work? This tool is a front-end to a term extraction engine running on our server. You’re using this front-end tool to instruct the engine what to collect for you and to extract the text from the PDF. This happens only when the PDF file is not encrypted or protected against text extraction. This extracted text is uploaded via FTP to our server. On this server the term extraction happens according to your wishes (with or without frequency, only multi-word terms or not, with or without generating an interactive term cloud). When done, the terms (up to 500!) are mailed to you.” Follow the links above to learn more.
Please note that this tool only works with English texts.
So give it a try and let me know how it worked for you.
(Please note that my posts do not endorse any company or person. I share information on my educational blog that I think my readers will find useful.)