From the terminologist's toolbox: BootCaT Frontend

This corpus builder takes your links and extracts text that can be analyzed in a corpus analysis tool such as AntConc.

There is an easy-to-follow tutorial (see the link below) with screenshots that explain every step of the process. After you name your project and pick the language, you choose one of three ways to capture your text. The first option is the Simple Mode: you give the tool your terms (called seeds), BootCaT generates “tuples” (different combinations of those terms), and it automatically collects URLs related to the topic. To avoid thousands of hits that might not be interesting, you can limit the search to one Internet domain. The final step is to build the corpus, which the tool saves on your computer.
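If you are curious about what “tuples” look like, here is a minimal Python sketch of the idea: it combines a handful of seeds into random three-term groups. This is only an illustration, not BootCaT's actual code. The function name `make_tuples`, its parameters, and the seed list are my own assumptions for the example.

```python
import itertools
import random


def make_tuples(seeds, tuple_size=3, max_tuples=10, rng_seed=0):
    """Return up to max_tuples random combinations of the seed terms.

    Hypothetical helper for illustration; not part of BootCaT.
    """
    # Every possible combination of tuple_size seeds, order ignored.
    all_combos = list(itertools.combinations(seeds, tuple_size))
    # Shuffle with a fixed seed so the example is repeatable.
    rng = random.Random(rng_seed)
    rng.shuffle(all_combos)
    return all_combos[:max_tuples]


# Example seeds for an oil and gas corpus (made up for this sketch).
seeds = ["oil", "gas", "drilling", "pipeline", "refinery"]

for combo in make_tuples(seeds, tuple_size=3, max_tuples=5):
    # Each tuple would be sent to the search engine as one query.
    print(" ".join(combo))
```

Each printed line is one combination of seeds that could serve as a search query. More seeds give you more possible combinations, which is why a short list of well-chosen terms goes a long way.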

You can view the corpus built by BootCaT in Notepad++. Apparently the regular Windows Notepad cannot open the full corpus, but they provide the link to download Notepad++:

Note that to use the tool you first need to get an Account Key, but no worries: the tool gives you the link to get one. It’s a quick step.

There are two other options that let you skip the first steps: Custom Tuples, in which you type the tuples directly on the screen, and Custom URL, in which you load a list of your favorite URLs. It’s that easy! Here is a sample corpus on oil and gas that I built in BootCaT and loaded into AntConc. Note that I didn’t change the file name that BootCaT generated. By default it saves the file as “corpus.txt”, but you can rename it as necessary.
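Before opening the file in AntConc, you can take a quick look at what the corpus contains. The Python sketch below counts the most frequent words in “corpus.txt”. It assumes the file is in the current folder and is UTF-8 encoded, and the helper name `top_words` is mine. It is a rough sanity check, not a replacement for a proper corpus analysis tool.

```python
import re
from collections import Counter
from pathlib import Path


def top_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lowercase the text and keep runs of letters only (accents included).
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words).most_common(n)


# Assumed location of the corpus file; adjust the path if you renamed it.
corpus_path = Path("corpus.txt")

if corpus_path.exists():
    # errors="replace" keeps the script running if a byte is not valid UTF-8.
    text = corpus_path.read_text(encoding="utf-8", errors="replace")
    for word, count in top_words(text):
        print(f"{count:6d}  {word}")
```

If the top of the list is full of menu or cookie-banner words instead of words from your subject field, that is a hint to go back and prune the URL list before building the corpus again.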

Read the full tutorial here:

Watch a tutorial on YouTube prepared by Lexytrad (Universidad de Málaga):

BootCaT also has a page with documentation that will allow you to master this tool:

Make sure you check the comments by Mura Nava below for more useful links.

Just a fun fact: that cute black cat, their mascot, actually has a name, Sbafo, and was designed by Matteo Mazzacurati.

Happy searching!


