Kaci Nash

historian

(DayOfDH) Processing Texts: Making Documents Machine Readable


I am participating in Day of DH 2014 this year and maintaining a blog on their website. Cross-posting my entries here, because hey! I’m actually blogging.

As planned, I have spent the greater part of the day organizing the approximately 2,000 freedom petition photographs I took at the National Archives into a coherent filing system, organized by term, category of filing, and case number, and documenting the image numbers in the spreadsheet I maintained as I was photographing. I think I am a little over half done with this process.

Though I still have about an hour and a half left to dedicate to the D.C. courts project today, I am turning my attention to my other project, Locating Lord Greystoke. Right now we are in the process of building two corpora of texts: one that is large and inclusive and will be used in our text analysis efforts, and a second, smaller one of key documents that will be featured on the project’s website. The document that I am working with now has been reviewed by the project leader, historian Jeannette Jones, who has pulled out selected passages from the text and made note of the people, places, and concepts she wants called out on the website. An undergraduate student also working on the project has already run the document through an OCR program, and I will mark up that output in TEI. The notes Dr. Jones prepared on the document indicate what will go into the <profileDesc> element in the TEI header, which items she wants to appear in the site’s Encyclopedia and thus need to be encoded in the text, and which places will appear as mapping points for this particular document. At the moment, the website’s documents are indexed in Solr and transformed by Cocoon, but we are looking into migrating to a different framework in the very near future. You can view a draft of this process in action at the project’s website, where we have set up a proof of concept using minimal documents and our first pass at the project’s mapping interface.
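For readers curious what "encoding" people and places in OCR output can look like, here is a minimal sketch in Python. It is not how I actually work (I encode by hand in an XML editor), and the function name `tag_entities`, the sample sentence, and the sample names are all hypothetical, made up purely for illustration. It assumes you already have a list of names to call out and simply wraps each one in the TEI `persName` or `placeName` element.

```python
import re
from xml.sax.saxutils import escape


def tag_entities(text, entities):
    """Wrap known names in TEI elements (hypothetical helper).

    `entities` maps a surface string to a TEI element name,
    e.g. {"Zanzibar": "placeName"}. Plain text between matches is
    XML-escaped so stray '&' or '<' from OCR do not break the markup.
    """
    if not entities:
        return escape(text)
    # Try longer names first so "Lake Victoria" wins over "Victoria".
    names = sorted(entities, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b"
    )
    out, pos = [], 0
    for match in pattern.finditer(text):
        # Copy the untouched text before this match, escaped.
        out.append(escape(text[pos:match.start()]))
        tag = entities[match.group(1)]
        out.append(f"<{tag}>{escape(match.group(1))}</{tag}>")
        pos = match.end()
    out.append(escape(text[pos:]))
    return "".join(out)


# Made-up sample sentence and names, for illustration only.
sample = "Stanley reached Zanzibar & left."
marked_up = tag_entities(
    sample, {"Stanley": "persName", "Zanzibar": "placeName"}
)
print(marked_up)
```

A pass like this could only ever be a first draft: it matches exact strings, knows nothing about context, and would still need a human to review every tag against the project leader's notes.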

A look at my screen: Dr. Jones’ notes; Oxygen, which I use to encode the XML document; and the Google Spreadsheet that is serving as a working bibliography of our project documents.
