AntConc: a classic but very useful tool available on Bamboo DiRT

A tool that I did not just discover on Bamboo DiRT, but that I find very useful and frequently recommend to our researchers, is AntConc. It is a traditional concordancing program that is easy to use and lets one quickly start exploring the contents of a large file or text collection. Tools of this kind have been around for a long time but are still relatively unknown to most of our researchers, and hence are worth publicizing in our instruction.

It downloads quickly to your desktop and lets you index multiple files in plain-text format. From there you can quickly create a word frequency list for the corpus, create a concordance for one or more words in that corpus, sort the concordance by the words that occur to the left or right of the word(s) being studied, and identify the phrases containing that word that occur most frequently in the text.
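For readers who want to see what a frequency list and a sorted concordance amount to, here is a minimal Python sketch. It is not how the program itself is implemented; the function names, the very simple tokenizer, and the sample sentence are all my own assumptions for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed, naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(tokens):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()

def concordance(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for every occurrence of keyword."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, tok, right))
    return hits

def sort_by_right(hits):
    """Sort concordance lines by the words that follow the keyword."""
    return sorted(hits, key=lambda hit: hit[2])

# Hypothetical sample text, invented for this illustration.
sample = "The white whale swam. The white sail rose. A white whale dived."
tokens = tokenize(sample)
freqs = word_frequencies(tokens)
hits = sort_by_right(concordance(tokens, "white"))

for left, word, right in hits:
    print(" ".join(left).rjust(12), f"[{word}]", " ".join(right))
```

Sorting by the left context instead would only require changing the sort key to the reversed left-hand word list.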

The availability of regular expression syntax in the search provides a powerful way to define a set of terms to study. One can plot the occurrence of the selected terms over the corpus (think of the word “white” in Moby Dick, for instance) and map collocations for the words, i.e. the words that most frequently occur around your hit terms (a kind of connotative penumbra, if you will).
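The three ideas in the paragraph above (selecting terms with a regular expression, locating them across the corpus, and counting their neighbors) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper names, the window of two words on each side, and the sample text are hypothetical, and the program's own collocation statistics may well be more sophisticated than raw counts.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed, naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def regex_hits(tokens, pattern):
    """Return the positions of tokens that fully match the regular expression."""
    rx = re.compile(pattern)
    return [i for i, tok in enumerate(tokens) if rx.fullmatch(tok)]

def dispersion(positions, total):
    """Express each hit position as a fraction of the way through the corpus."""
    return [p / total for p in positions]

def collocates(tokens, positions, span=2):
    """Count the words found within `span` tokens on either side of each hit."""
    counts = Counter()
    for p in positions:
        start, end = max(0, p - span), min(len(tokens), p + span + 1)
        for j in range(start, end):
            if j != p:
                counts[tokens[j]] += 1
    return counts

# Hypothetical sample text, invented for this illustration.
sample = "The white whale swam. The whiteness of the whale appalled me."
tokens = tokenize(sample)
positions = regex_hits(tokens, r"white\w*")   # matches "white" and "whiteness"
spread = dispersion(positions, len(tokens))
neighbors = collocates(tokens, positions)

print(positions, neighbors.most_common(3))
```

The values in `spread` are what one would feed to a plotting library to draw a dispersion plot of the hits.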

The tool also enables you to compare the frequencies of terms in your corpus with those in another “reference corpus,” to get a better sense of the most distinctive features of its vocabulary. Hence, it is valuable not only for exploring the contents of a specific text, but also for comparing sets of texts with one another.
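To make the idea of comparing against a reference corpus concrete, here is a small Python sketch that scores words with the log-likelihood statistic, one measure commonly used for this kind of keyword comparison. I am assuming this measure for illustration only and do not claim it matches the program's settings; the function names and the two toy corpora are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood score for a word seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens."""
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_target)
    if b > 0:
        score += b * math.log(b / expected_reference)
    return 2 * score

def keywords(target_tokens, reference_tokens):
    """Rank words that are relatively more frequent in the target than in the reference."""
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        if a / c > b / d:  # keep only words over-represented in the target
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy corpora, invented for this illustration.
target_tokens = "whale whale whale sea ship".split()
reference_tokens = "sea ship sea ship town road".split()
ranked = keywords(target_tokens, reference_tokens)
print(ranked)
```

A word that occurs at the same rate in both corpora scores zero, so the top of the ranked list shows the vocabulary that most distinguishes the target text.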

AntConc is also very useful when one is trying to do some quality control on a large text that has been scanned and read using ABBYY FineReader at the DHC. It can help one to identify some of the characteristic errors that the software has made and then to correct them globally with search and replace commands. I have been using it to good effect during the past few days to clean up some texts of medieval Polish charters downloaded from HathiTrust for use in the CHARTEX project. It takes just a moment to download the program and run it. Because it is an exe file, running it requires administrative control of the machine (running an exe file from an unknown source is a good way to get a virus). Hence, we will need the approval and help of our IT folks to put it on our work machines, but we and our readers can have it up and running on our own machines in just a moment.
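The OCR cleanup workflow mentioned above (spot a characteristic error, then fix it everywhere) can be approximated in Python as follows. This is a rough sketch under my own assumptions: it treats tokens that mix letters and digits as suspicious, which is only one of many kinds of OCR error, and the sample sentence and correction table are invented rather than taken from the charters.

```python
import re
from collections import Counter

def suspicious_tokens(text):
    """Count tokens that mix letters and digits, a frequent sign of OCR misreads."""
    tokens = re.findall(r"\w+", text)
    mixed = [t for t in tokens
             if re.search(r"[^\W\d_]", t) and re.search(r"\d", t)]
    return Counter(mixed)

def apply_corrections(text, corrections):
    """Replace each misread form with its correction, matching whole words only."""
    for wrong, right in corrections.items():
        pattern = rf"\b{re.escape(wrong)}\b"
        # A function replacement avoids any special handling of backslashes.
        text = re.sub(pattern, lambda match, fixed=right: fixed, text)
    return text

# Hypothetical OCR output and correction table, invented for this illustration.
sample = "The k1ng granted the vi11age to the k1ng's brother."
found = suspicious_tokens(sample)
cleaned = apply_corrections(sample, {"k1ng": "king", "vi11age": "village"})

print(found.most_common())
print(cleaned)
```

Reviewing the frequency list of suspicious forms before building the correction table keeps the global replacement from touching legitimate strings such as dates or shelf marks.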


Author: Bob Scott

Now Columbia’s Digital Humanities Librarian and formerly Head of its Electronic Text Service, I am excited by the potential of a new wave of tools and techniques for finally realizing the promise of the digital format for humanities scholarship. My own academic interests lie in the history and culture of Eastern Europe, particularly in the Middle Ages, but I look forward to applying my desire for better tools to share and analyze historical sources in that field to the more immediate focus of the history of Morningside Heights. For the Morningside project, I am bringing together sources illustrating the history of the Bloomingdale Asylum, which from 1821 to 1894 occupied the land that is now the main campus of Columbia.