Category Archives: Archives

Posts about archives, archivists, and archiving.

Building Scholarly Online Archives and Exhibits with Omeka

These days, any scholar or organization with a collection of primary sources such as photographs, drawings, paintings, letters, diaries, ledgers, scores, songs, oral histories, or home movies is bound to have some of this material in digital form. Omeka is a simple, free system built by and for cultural heritage professionals that is used by archives, libraries, museums, and individual scholars and teachers all over the world to create searchable online databases and attractive online exhibits of such digital archival collections. In this introduction to Omeka, we’ll look at a few of the many examples of websites built with Omeka, define some key terms and concepts related to Omeka, go over the difference between the hosted version of Omeka and the open source server-side version of Omeka, and learn about the Dublin Core metadata standard for describing digital objects. Participants will also learn to use Omeka themselves through hands-on exercises, so please *bring a laptop* (NOT an iPad or other tablet) if you can (if not, you can follow along with someone else). Learn more about Omeka at omeka.org and omeka.net.

Note: I’ve taught this workshop many times at various THATCamps, and my lesson plan is online for you to use at amandafrench.net/2013/11/12/introduction-to-omeka-lesson-plan/. See also how Virginia Tech’s own Special Collections uses Omeka at omeka.lib.vt.edu/.

Digital Humanities Projects & (Library) Partnerships

[Apologies for posting so late–I forgot my password and never got a reset link, so I’ve asked a colleague to post for me. So, look for me (Kira Dietz) if you want to talk!]

Special Collections at Virginia Tech has been involved in a few digital humanities projects, usually helping to provide original materials, scanning equipment, training, or some combination of those. However, the University Libraries’ staff and faculty here (and at other academic institutions) have a wide variety of skill sets and interests, including some that go far beyond their daily jobs! I’m interested in talking through some questions relating to the idea of partnerships and collaboration. While my particular focus is on libraries and archives, I’m curious to know where else those of you working on DH projects might find collaborators or partners. So, hopefully we can try to answer questions like:

Do you look for/have you worked with collaborators outside the main field of your project? If so, from where?
What do you look for in a partner/collaborator?
What resources, tools, and skills do you need or want access to that you may not have?
Do you know what potential partners (like libraries, for example) might be able to bring to your project?
How can potential partners let you know they are out there and open to opportunities? How can potential partners best share/promote what they can do?
Why should people creating/running DH projects be interested in collaborators? Why should potential partners be interested in DH projects? (In other words, for both sides, what do “we” get out of it?)

I see this primarily as a “talk” session, but it will also be a great opportunity for all of us to “teach” each other what we might bring to a project!

Musiplectics: Digitally Ranking Musical Complexity for Educators

Musiplectics: the study of the complexity of music

We are in the middle of a collaborative research project here at Virginia Tech (in the departments of music and computer science) and ICAT, creating a web-based software program that automatically ranks the complexity of written music (music scores). The idea is that, by scanning a music score or supplying an existing PDF or XML file, users would be able to use our application to determine the skill level required for the performance of a musical work. We have developed a working prototype to explore music written for clarinet, but are planning to expand its utility to include use for all other common musical instruments.
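To make the idea of reading a score as data concrete, here is a toy sketch in Python. It is not the Musiplectics algorithm, which this post does not describe. It only assumes an uncompressed MusicXML file with whole-semitone alterations, and the three features it reports (note count, pitch range, largest leap) are illustrative guesses at the kind of measurements a ranking might start from. The function names and the sample score are hypothetical.

```python
import xml.etree.ElementTree as ET

# Semitone offset of each natural note within an octave.
STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def midi_numbers(musicxml_text):
    """Return a MIDI-style number for each pitched note, in document order."""
    root = ET.fromstring(musicxml_text)
    numbers = []
    for pitch in root.iter("pitch"):
        step = pitch.findtext("step")
        octave = int(pitch.findtext("octave"))
        # Assumption: alterations are whole semitones (no microtones).
        alter = int(pitch.findtext("alter", default="0"))
        numbers.append(12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter)
    return numbers

def toy_features(musicxml_text):
    """Count notes and measure pitch range and largest leap in semitones."""
    notes = midi_numbers(musicxml_text)
    leaps = [abs(b - a) for a, b in zip(notes, notes[1:])]
    return {
        "notes": len(notes),
        "range": max(notes) - min(notes) if notes else 0,
        "largest_leap": max(leaps, default=0),
    }

# A made-up three-note fragment: C4, G4, E4.
SAMPLE = """<score-partwise><part id="P1"><measure number="1">
<note><pitch><step>C</step><octave>4</octave></pitch></note>
<note><pitch><step>G</step><octave>4</octave></pitch></note>
<note><pitch><step>E</step><octave>4</octave></pitch></note>
</measure></part></score-partwise>"""

print(toy_features(SAMPLE))
```

Rests have no pitch element, so they are skipped here. A real ranking would presumably also weigh rhythm, tempo, articulation, and instrument-specific fingering.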

Pedagogical Value:

I. To increase the utility of existing digital archives. With the growing availability of digitized scores (both historical works in the public domain and XML that is being uploaded daily to the web), musicians and music teachers are overwhelmed by the number of musical works newly available to them at the click of a button. We feel that if there were some way to rank and categorize these works by their complexity, these vast digital resources would become less daunting and more widely used by educators.

II. For competitions and festivals. Oftentimes, educators must make highly subjective and time-consuming decisions regarding the difficulty level of the music that is chosen. This ranking system could make these often-debated choices clearer and more objective.

Future Research Projects:

We would like to explore collaborating with existing databases, libraries, or digital reserves to rank as many scores as possible and explore other possible applications of our technology. Further collaborations and suggestions or ideas from our colleagues would be wonderful!

Questions:

– How can we use crowdsourcing, surveys, or other methods as a tool to set the parameters for judging exactly what is difficult on various instruments? Is there a way to get input from a large number of participants so that our parameters could represent a good average for what people believe?

– How can we find the most advanced OCR (optical character recognition) software that will help us to make sense of music, which is sometimes hard to decipher?

– Are there other applications of our software that might be unexplored?

– Who would like to collaborate with us?

Managing Digital Research (Updated w/ Session Notes)

My arrival at THATCamp Virginia on Friday will end a week jaunting around western Virginia doing historical research at a variety of archives and libraries for a new book project (hooray for sabbatical!). Between this and a previous archival trip, just for this project alone, I will have accumulated several thousand .jpg images of archival documents, along with webpage captures, plus article PDFs and notes in various formats (.doc, .xls, and plain old notebooks). I’m already a Zotero user, so this session can go in the direction of getting the most out of that software for research management, but it’s obviously not a very convenient tool for dealing with this giant cache of images. Ideally, I need a way to make images of documents (most of them typewritten) text-searchable without transcribing each one. Since that’s utterly utopian, I’d settle for a good method of converting photo batches into a single PDF file (like that time I had to photograph all 100+ pages of an old government report instead of being able to scan it into a PDF). And/or somehow attaching robust metadata to image files to allow for search and retrieval (currently, I’m making a simple Excel index for my image files, but I’m sure there’s a better way).

I would welcome a session sharing ideas, suggestions, tools and hacks for keeping track of (tagging? coding? databases?), searching across, and wrangling the unwieldy collection of digital ephemera that we create in this new era of web-and-gadget-based research and writing. I’m sure I’m not the only one wrestling this beast!

Update Fri 4/10 8:30pm

This session ran on Friday afternoon 4-5, and I promised to make my notes public. Thanks to everyone who participated and gave suggestions. I sincerely apologize that I didn’t think to send around a sign-in sheet so I could attribute the ideas to the people who offered them! GREAT conversation!

Main Ideas
Historians generate images during research as “slavish reproductions” (the legal term) of original artifacts. Whether the original item is under copyright or not, the historian owns the image, and in particular owns any metadata s/he creates associated with that image and its contents. The key thing to keep in mind when COLLECTING archival images is to be meticulous about documenting where the original item lives, so that you can generate an authoritative citation to it in the future. There are lots of ways to do this well, including keeping a careful list of each image, reproducing the archive’s box-folder-series hierarchy with your own folders, or renaming the images with a standard code to indicate archive-collection-box-folder.
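The renaming option could be sketched in Python along these lines. The naming pattern, the function names, and the "Example Archive" and "Smith Papers" labels are all hypothetical; any consistent code that records archive, collection, box, and folder would serve. The sketch only plans the new names and leaves the actual renaming to you.

```python
import re
from pathlib import Path

def slug(part):
    """Lowercase a label and collapse anything non-alphanumeric to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-")

def coded_name(archive, collection, box, folder, seq, ext=".jpg"):
    """Build a name such as 'example-archive_smith-papers_b03_f12_0007.jpg'."""
    return (f"{slug(archive)}_{slug(collection)}"
            f"_b{int(box):02d}_f{int(folder):02d}_{int(seq):04d}{ext}")

def rename_plan(photos, archive, collection, box, folder):
    """Pair each original path with its coded path, in sorted order.

    Nothing is renamed here; review the plan, then call Path.rename yourself.
    """
    return [
        (p, p.with_name(coded_name(archive, collection, box, folder, i,
                                   p.suffix.lower())))
        for i, p in enumerate(sorted(photos), start=1)
    ]

# Hypothetical camera files from box 3, folder 12.
photos = [Path("IMG_0002.JPG"), Path("IMG_0001.JPG")]
for old, new in rename_plan(photos, "Example Archive", "Smith Papers", 3, 12):
    print(old, "->", new)
```

Because the sequence number follows the sorted order of the original file names, this assumes the camera numbered the shots in the order the pages appear in the folder.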

It’s also critical to distinguish between the CONTENT (which includes the image, and the stuff on the image rendered into text if possible) and the METADATA for each item. Content and Metadata are different, but should be linked with a unique identifier. Dublin Core is the archival standard for metadata fields and categories, but it’s not the only possibility.
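A minimal Python sketch of that separation, assuming a small project kept in memory and exported to a spreadsheet-friendly CSV: content and metadata live in two separate stores that share one identifier. The field list is a subset of the fifteen Dublin Core elements; which subset to use, the identifier format, and the function names are assumptions made for illustration.

```python
import csv
import io

# A subset of the Dublin Core elements; the choice of subset is an assumption.
DC_FIELDS = ["identifier", "title", "creator", "date", "source", "subject"]

content = {}   # identifier -> image file name and any extracted text
metadata = {}  # identifier -> Dublin Core style description

def add_item(identifier, image_file, text="", **dc):
    """Store content and metadata separately under one shared identifier."""
    unknown = set(dc) - set(DC_FIELDS)
    if unknown:
        raise ValueError(f"not in DC_FIELDS: {sorted(unknown)}")
    content[identifier] = {"image": image_file, "text": text}
    metadata[identifier] = {"identifier": identifier, **dc}

def metadata_csv():
    """Return the metadata as CSV text, one row per item, blanks for gaps."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=DC_FIELDS, restval="",
                            lineterminator="\n")
    writer.writeheader()
    for identifier in sorted(metadata):
        writer.writerow(metadata[identifier])
    return out.getvalue()

# Hypothetical item: one photographed letter with a snippet of extracted text.
add_item("item-0001", "img_0001.jpg", text="Dear Sir", title="Letter", date="1959")
print(metadata_csv())
```

Keeping the identifier in both stores means the text can be re-extracted or the description revised without either one disturbing the other.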

Specific Tools & Suggestions
Text contained on images, especially if typescript, can be extracted using OCR. Results vary of course, and multi-column newspapers or pale mimeographs might be problematic (if you’re working with mid-20th century sources like mine), but it can be a start. Recommended: Tesseract, Adobe Acrobat, Evernote.
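As a rough sketch of the Tesseract route, the Python below prepares a batch of page photos and builds the command that, as I understand the Tesseract command line, turns them into one text-searchable PDF: Tesseract accepts a text file listing one image per line as its input, and the trailing "pdf" asks for PDF output. It assumes the tesseract program is installed and on your PATH. The file names and function names are hypothetical, and OCR quality will vary as noted above.

```python
import subprocess
from pathlib import Path

def write_image_list(images, list_path):
    """Write one image path per line, in sorted order, for a batch run."""
    ordered = sorted(str(p) for p in images)
    Path(list_path).write_text("\n".join(ordered) + "\n", encoding="utf-8")
    return ordered

def tesseract_command(list_path, output_base, language="eng"):
    """Build the command; Tesseract adds '.pdf' to output_base itself."""
    return ["tesseract", str(list_path), str(output_base),
            "-l", language, "pdf"]

def make_searchable_pdf(images, output_base, list_path="pages.txt"):
    """Run the batch. Requires the tesseract program to be installed."""
    write_image_list(images, list_path)
    subprocess.run(tesseract_command(list_path, output_base), check=True)
    return Path(f"{output_base}.pdf")

# Shows the command without running it.
print(" ".join(tesseract_command("pages.txt", "government_report")))
```

Pages are ordered by plain string sorting, so zero-padded file names (page_001.jpg rather than page_1.jpg) keep the report in the right sequence.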

For generating robust metadata associated with images, we agreed that for small-scale projects this really does have to be done by hand. There are limited possibilities for automation, but some tools suggested for managing the metadata included: DEVONthink, LibraryThing / Library World, Omeka, Microsoft Access and even Flickr or Tumblr. One good suggestion on tags comes from Shane Landrum @cliotropic (who wasn’t even there, but whose brain I had picked on this yesterday): adapt library MARC-style tags to suit your specific project. You’d just need a metadata or content management program that will accept punctuation in the tagging field. For example, here are some that might work for my project on school closures during Virginia’s massive resistance era:

SU:massive resistance (SUbject)
YR:1959 (YeaR)
ST:VA (STate)
AU:Boyle, Sarah Patton (AUthor)
PL:Warren County (PLace)
SCH:Prince Edward Academy (SCHool)
ORG:NAACP (ORGanization)
DN:SBC (DeNomination)

…etc
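If your tool can export its tags, prefixed tags like these are also easy to search with a few lines of Python. This is only a sketch of one way to do it: the image file names are hypothetical, and matching is case-insensitive by my own choice.

```python
from collections import defaultdict

def parse_tag(tag):
    """Split 'SU:massive resistance' into ('SU', 'massive resistance')."""
    prefix, sep, value = tag.partition(":")
    if not sep or not prefix.strip() or not value.strip():
        raise ValueError(f"expected PREFIX:value, got {tag!r}")
    return prefix.strip().upper(), value.strip()

def build_index(items):
    """Map each (prefix, lowercased value) pair to the items carrying it."""
    index = defaultdict(set)
    for name, tags in items.items():
        for tag in tags:
            prefix, value = parse_tag(tag)
            index[(prefix, value.lower())].add(name)
    return index

def find(index, *tags):
    """Return the items that carry every one of the given tags."""
    sets = [index.get((p, v.lower()), set()) for p, v in map(parse_tag, tags)]
    return set.intersection(*sets) if sets else set()

# Hypothetical image files tagged with the scheme above.
items = {
    "img_0001.jpg": ["SU:massive resistance", "YR:1959", "ST:VA"],
    "img_0002.jpg": ["SU:massive resistance", "YR:1958"],
}
index = build_index(items)
print(find(index, "SU:massive resistance", "YR:1959"))
```

Splitting on the first colon only means values that contain commas or further punctuation, such as "AU:Boyle, Sarah Patton", survive intact.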

Other Issues
–ideas on OCR of other kinds of files, like handwritten sheet music, manuscripts in longhand, or non-English-language sources? For those, see Kurt Luther’s session ideas on crowdsourcing… some of that work might be something MTurk workers could help with

–what is the threshold for automating / writing scripts / crowdsourcing to do these tasks vs. the valuable intellectual work of doing them by hand oneself?

–thinking ahead about whether sharing of one’s scholarly collection of research images might be something to consider – and what that might mean for database construction up front / early on

–the questions you’ll be asking of your data to some extent drive the form your research database will take, but that is a dynamic & evolving thing because you may find that there are some insights you will discover ONLY because your research data is digital, categorized, searchable, and capable of being manipulated with software. That’s a happy thought!

–What did I overlook? Make additions to my notes in the comments!

Digital humanities + design records

Hello everyone! I’m interested in getting together with some like-minded scholars and practitioners to talk about how architecture and design records in archives can be incorporated into the digital humanities. This is just a simple blurb. I’m excited to see where the conversation might take us. Some ideas for discussion:

How do DH folks use architectural and design records?
What keeps them from using architectural and design records?
What prevents archivists and other cultural heritage professionals from making these records accessible?
What can we do about these barriers?
What would our ideal future look like? Would it have emulation environments for CAD drawings? 3D printing? Virtual reality viewers?

We produced a Google Doc to capture the salient points of our discussion. Check it out here:

docs.google.com/document/d/1-T869zvjb0MiYimb-R-EZ4BQxyEQi5dfqgS1XcdgJgA/pub