A tool that I did not just discover on Bamboo DiRT, but that I find very useful and frequently recommend to our researchers, is AntConc. It is a traditional concordancing program that is easy to use and lets one quickly start exploring the contents of a large file or text collection. Tools of this kind have been around for a long time, but they are still relatively unknown to most of our researchers and hence are worth publicizing in our instruction.
The program downloads quickly to your desktop. It allows you to index multiple files in plain-text format and then to create a word frequency list for the corpus, to create a concordance for one or more words in that corpus, to sort the concordance by the words that occur to the left or right of the word(s) being studied, and to identify the most frequent phrases in the text that use that word.
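For readers who want to see what a frequency list and a sorted concordance amount to, here is a minimal Python sketch. It is not how AntConc itself is implemented; the function names `tokenize` and `kwic`, the context width, and the sample sentence are all assumptions made for illustration, and the tokenizer only handles unaccented lowercase letters.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase word tokens; a deliberate simplification (ASCII letters only).
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, keyword, width=3):
    # Keyword-in-context rows: (left context, keyword, right context).
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

# Hypothetical sample text standing in for a corpus file.
sample = "The white whale rose. The crew saw the white foam and the white whale again."
tokens = tokenize(sample)

freq = Counter(tokens)          # word frequency list
lines = kwic(tokens, "white")   # concordance for one word

# Sort the concordance by the words to the right of the keyword.
lines_sorted = sorted(lines, key=lambda row: row[2])
for left, kw, right in lines_sorted:
    print(f"{left:>20} | {kw} | {right}")
```

Sorting on `row[0]` instead would group the hits by their left-hand context.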
The availability of regular expression syntax in the search provides a powerful way to build a set of terms to study. One can plot the occurrence of the selected terms across the corpus (think of the word “white” in Moby-Dick, for instance) and map the collocations of those terms, i.e., the words that most frequently occur around your hits (a kind of connotative penumbra, if you will).
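The idea of collocates around a regular-expression hit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `collocates`, the two-word window, and the sample phrase are hypothetical, and AntConc offers statistical collocation measures that this raw count does not attempt to reproduce.

```python
import re
from collections import Counter

def collocates(tokens, pattern, span=2):
    # Count the words within `span` tokens of every token that fully
    # matches the regular expression `pattern`.
    rx = re.compile(pattern)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if rx.fullmatch(tok):
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Hypothetical sample; the pattern matches "white", "whiteness", etc.
tokens = "the white whale and the whiteness of the whale".split()
hits = collocates(tokens, r"white\w*")
print(hits.most_common(3))
```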
The tool also enables you to compare the frequencies of terms in your corpus with those in a “reference corpus,” to get a better sense of the most distinctive features of its vocabulary. Hence, it is valuable not only for exploring the contents of a specific text, but also for comparing sets of texts with one another.
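One common way to score how distinctive a word is against a reference corpus is the log-likelihood statistic. The sketch below assumes that measure; whether it matches the settings in your copy of the program is something to check there. The function names and the two tiny token lists are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    # a, b: counts of the word in the target and reference corpora.
    # c, d: total number of tokens in the target and reference corpora.
    e1 = c * (a + b) / (c + d)   # expected count in target
    e2 = d * (a + b) / (c + d)   # expected count in reference
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(target_tokens, reference_tokens):
    t, r = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scores = {}
    for w in t:
        # Keep only words that are relatively more frequent in the target.
        if t[w] / c > r[w] / d:
            scores[w] = log_likelihood(t[w], r[w], c, d)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical miniature corpora.
target = "whale ship whale sea whale captain".split()
reference = "ship sea captain town road sea".split()
ranked = keywords(target, reference)
print(ranked[0])
```

Words that appear at the same rate in both corpora score zero, so only the vocabulary that sets the target apart rises to the top of the list.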
AntConc is also very useful when one is trying to do some quality control on a large text that has been scanned and read using ABBYY FineReader at the DHC. It can help one to identify some of the characteristic errors that the OCR software has made and therefore to correct them globally with search-and-replace commands. I have been using it to good effect during the past few days to clean up some texts of medieval Polish charters downloaded from HathiTrust for use in the CHARTEX project. It takes just a moment to download the program and run it. Since it is an exe file, it does require administrative control of the machine (since running an exe file from an unknown source is a good way to get a virus). Hence, we will need the approval and help of our IT folks to put it on our work machines, but we and our readers can have it up and running on our own machines in just a moment.
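To return to the OCR cleanup mentioned above: once a word list has exposed the characteristic misreadings, the global corrections can be scripted. The following Python sketch assumes three invented error patterns and an invented sample sentence; the real patterns would have to come from inspecting your own texts, and each one should be checked in a concordance before it is applied globally.

```python
import re

# Hypothetical OCR error patterns, for illustration only.
REPLACEMENTS = [
    (r"\btbe\b", "the"),               # "tbe" misread for "the"
    (r"ſ", "s"),                       # long s normalized to s
    (r"(?<=[a-z])1(?=[a-z])", "l"),    # digit 1 between letters read as l
]

def clean(text):
    # Apply every replacement and report how many changes were made.
    total = 0
    for pattern, repl in REPLACEMENTS:
        text, n = re.subn(pattern, repl, text)
        total += n
    return text, total

cleaned, n_changes = clean("In tbe year 1250 tbe bishop confirmed the ſale of the mi1l.")
print(cleaned)
print(n_changes, "corrections")
```

The lookbehind and lookahead in the last pattern leave genuine numbers such as dates untouched.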