Digital Tools Uncover Wartime Espionage

IDAH Fellow Arlene Díaz uses digital tools to uncover history of wartime espionage


By Shaun Williams, IDAH Graduate Assistant
April 18th, 2016

Arlene Díaz is an Associate Professor of History at Indiana University, Bloomington and a 2015-2016 faculty fellow at the Institute for Digital Arts and Humanities, through her collaboration with IDAH on a Collaborative Research and Creative Activity Funding (CRCAF) grant. Her current research project, “The Invisible War: Spies and Detectives in the Making of the Spanish-Cuban-American War and the American Empire, 1868-1908,” focuses on the Cuban exile community and how the war of Cuba’s independence from Spain was planned and implemented from within the United States. I sat down with Professor Díaz to find out how digital technologies have shaped her research methodology.


Shaun Williams: How did you begin this ambitious project?

Arlene Díaz: Initially, I wanted to work on the Cuban and Puerto Rican exile community in New York during the nineteenth century. I went to Puerto Rico to do research and I found a letter from the Spanish consulate in New York, which led me to the consular archives in Spain. It was there that I realized the extent to which surveillance played a role in the Spanish-Cuban-American War which, in retrospect, should not be surprising. By the mid-nineteenth century, technological developments like the photographic camera stimulated the creation of new fields of study such as criminology to improve the detection of suspicious bodies and activities. Hence the first American private detective organizations, such as the Pinkerton National Detective Agency, were established at that time, and they, along with other detective agencies, were hired by the Spanish government to follow Cuban insurgents in the United States.

In the Spanish archives, I found a wealth of documents related to surveillance and spies who were working for the Spanish government in the United States and Cuba. The reason Spanish officials followed their Caribbean subjects in the United States is pretty straightforward: from their main headquarters in New York City, these exiles were organizing Cuba’s second war of independence (1895). A critical part of this “invisible war” consisted of collecting money and sending illegal naval expeditions to the island with arms and supplies from ports in the south and east of the United States. However, the U.S. government did not stop these expeditions completely, even when they violated neutrality laws, because of its own longstanding expansionist interests in the Caribbean. The huge role these Cuban insurgents in the United States played in the planning and funding of the armed conflict, as well as the impact of U.S. authorities and American citizens on the outcome of the war, has gone unnoticed in the historiography. This project will use detectives’ reports, most of which were generated by American detectives such as the Pinkertons, to get into the invisible war that Spain, Cuban insurgents, and Americans waged on mostly U.S. soil during the last two decades of the nineteenth century.

S.W.: How have digital technologies helped bring this project to fruition?

A.D.: I’m dealing with data relating to a long period of time (the second half of the 19th century), as well as three countries (Spain, Cuba, and the United States), and the exile communities of Cubans and Puerto Ricans who lived in New York and other U.S. cities. In order to keep track of all these people and their narratives, and the issues of espionage and surveillance, I needed to be able to compare what each person was writing at different points in time. It would have been impossible to do this efficiently without digital technologies.

After hearing about IDAH from Luis A. González (IUB Librarian for Latin America and Caribbean Studies), I decided that I should consult them about applying Digital Humanities research tools and methods to this project. IDAH Associate Director Clara Henderson told me about some of the possibilities for managing this huge amount of data using digital tools and the types of support that I could find for such a project at IDAH, IU Libraries, and OVPR. She suggested I apply for Collaborative Research and Creative Activity Funding (CRCAF) to partner with IDAH to develop protocols and workflows for processing the image files of the archival materials I have collected and converting them to searchable digital text files that may be run through textual analysis and visualization tools. She explained that the suite of tools I could utilize would enable me to conduct research and analysis I would not otherwise have been able to accomplish, and that the outcome of this project would then lay a solid foundation of research for my book project. The IDAH staff and scholars as well as Library staff such as Michelle Dalmau (Head of Digital Collection Services, IUB Libraries) and Kara Alexander (Digital Media Specialist, DCS, IUB Libraries) have been very helpful in moving this project forward. In addition to my involvement with IDAH and HASTAC (Humanities, Arts, Science, and Technology Alliance and Collaboratory), which have provided a very supportive community, I have also received valuable guidance from Kalani Craig, a Clinical Assistant Professor in the Department of History. She understands the work of historians as well as the challenges of working with digitized historical documents. Without this combination of efforts, I would not have been able to launch this project with such speed and energy.

I brought back 5,000 scans of documents from Spain, including handwritten documents, typed and stenographed letters, printed broadsides, and newspaper clippings in Spanish, English and French. In order to use these for textual analysis, we had to clean them up in Photoshop and use OCR [Optical Character Recognition] software like Abbyy FineReader to turn them into text files, or transcribe them by hand. We’re also using some scanned books downloaded from the HathiTrust Digital Library, which means we’ve had to negotiate the use of these materials for text mining. And we had to scan some 19th-century books that had not been digitized. We’ve been using Mechanical Turk for some of the transcription work, and have hired one undergraduate student and two History graduate students to help with the research as well as with more challenging document transcriptions. They transcribe the scanned documents and post the resulting text files in our IU Box folder. In addition, Meredith Laderer, an undergraduate student, has been using some of these materials for a CEWiT internship she is doing with Kalani Craig. Their work has helped us test ways in which we can further use network analysis in this research. Just establishing the protocols for this has been a challenge, but it is working really well; we’ve made huge progress since we started working on this in the fall of 2015.

S.W.: What do you do with the text once it is digitized?

A.D.: First, we used topic modeling [algorithms that uncover the hidden thematic structure in document collections] and input all of the text that we had digitized, organized by type of source (consular letters, private correspondence, personal diaries, newspaper articles, etc.) and date. This gave us a hierarchical list of topics based on words that appeared together frequently, which helped us to identify some of the main concepts that we needed to look at. For example, we noticed that “secret” and “hidden” were often occurring in the same document. From there, we used a corpus linguistics toolkit, AntConc, to get a better look at the individual contexts in which these words were used. When we analyzed different texts using these same processes, differences in word choice helped to reveal information about the texts’ authors. For example, one of the U.S. war correspondents whose newspaper articles we analyzed used many of the same words that Pinkerton detectives were using at the time, and it turned out that he was working as an agent for Admiral Sampson of the U.S. Navy. This was further corroborated through other sources. This combination of digital analyses is helping me identify potential ‘spies’.

Now that we have over 1,000 handwritten documents scanned and transcribed into text files (from 1890 to 1898) and another 2,000 pages of books and printed documents ready, we have been able to analyze them using topic modeling. The results have been fabulous. I would not have been able to get this far in the data analysis for this project in eight months without the tools I am using. For instance, these first experiments with topic modeling and corpus linguistics helped us analyze topics on a year-by-year basis so we could get a better feel for the major issues that were of concern in the detective reports and therefore to the Spanish consuls in the United States. These approaches helped me structure the chapters of my book, and they also sped up that process. Working with the same documents with a more traditional close reading approach would have taken several years and provided very different kinds of results. It’s exciting to think about this work paving the way for future historical research.