Hack to Learn: Two days of data munging in our nation’s capital

In May, I was one of the 50+ fortunate participants in Collections as Data: Hack-to-Learn, a two-day workshop organized by the Library of Congress, George Washington University, and George Mason University. Four datasets and five tools for working with the data were provided in advance. On the first day, we met at the Library of Congress and were introduced to all four datasets, as well as three of the tools. The first of the datasets was the Phyllis Diller Gag File. The Smithsonian’s National Museum of American History and the Smithsonian Transcription Center have been running a crowdsourcing project to transcribe the 52,000 original index cards of the comedian’s jokes; in March of this year the transcriptions were made available. Each card contains a joke, often a date, and sometimes an attribution (if someone gave Diller the joke), and the cards are organized by subjects that appear at the top of each card….