Sherlock Holmes on the Radio Airwaves

A Digital Remediation and Analysis

Editorial Methodology

Corpus Preparation

The radio scripts were sourced from the Generic Radio Workshop, which offers downloadable plain text (.txt) files containing the body of each script. For the purpose of this research, metadata that appeared at the top of the webpage display but was excluded from the plain text files was also added to the files. The metadata followed a regular pattern of "series", "show", "date", and "cast", which led to the decision to have the Relax NG schema require these elements in this specific order. There were some discrepancies among the files, most notably in "Murder in Casbah".
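As an illustration of how a schema can enforce that ordering, the following is a minimal sketch in Relax NG XML syntax. The wrapper element name "metadata" and the text-only content models are assumptions made for the example; only the four element names and their order come from the description above, and the project's actual schema may differ.

```xml
<!-- Hypothetical sketch: the "metadata" wrapper and the text-only
     content of each element are assumptions, not the project's schema. -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="metadata">
      <!-- Patterns listed one after another form a sequence, so a file
           validates only if the four elements appear in this order. -->
      <element name="series"><text/></element>
      <element name="show"><text/></element>
      <element name="date"><text/></element>
      <element name="cast"><text/></element>
    </element>
  </start>
</grammar>
```

Under a schema of this shape, a file that places "date" before "show", or omits "cast", fails validation, which is one way discrepancies between files can surface.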

The content of the radio scripts had to be regularized during markup so that queries could be run on them easily in the future, with care taken that no change would affect the contextual or structural analysis of the project.


The first step of analysis for the corpus was to mark the changes that the radio scripts made to the stories as published by Arthur Conan Doyle. The text of Arthur Conan Doyle's original publication was sourced from Project Gutenberg, which allowed me to copy the text needed for the specific chapters into an XML file. For the initial stage of the analysis, I focused on the story "A Scandal in Bohemia", for which a radio recording, the radio script, and the original publication are all available online for comparison. XSLT was used to add xml:id attributes to the paragraphs of the Arthur Conan Doyle texts.

<xsl:template match="p">
        <p xml:id="{ancestor::xml/@xml:id}-p{count(preceding::p) + 1}">
            <xsl:apply-templates/>
        </p>
</xsl:template>

These xml:id values were used to stitch the <ln> elements in the radio script to the corresponding paragraphs, showing the correlation between the two files. The XSLT tagged the Arthur Conan Doyle texts automatically, which gave me pointers to use when manually tagging segments of specific paragraphs from the text.
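For context, the template excerpt above could sit inside a complete stylesheet along the following lines. This is a sketch rather than the project's actual file: the identity template and the XSLT version are assumptions, added so that everything other than the p elements is copied through unchanged.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Identity template (an assumption): copy any node or attribute
         that no more specific template matches. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Rebuild each p with a generated xml:id made of the root
         element's xml:id plus the paragraph's position in the file. -->
    <xsl:template match="p">
        <p xml:id="{ancestor::xml/@xml:id}-p{count(preceding::p) + 1}">
            <xsl:apply-templates select="@*|node()"/>
        </p>
    </xsl:template>

</xsl:stylesheet>
```

Assuming the paragraphs are not nested and the root element carries a hypothetical xml:id of "scandal", the third paragraph in the file would receive the identifier "scandal-p3". Because the number comes from count(preceding::p), the identifiers follow document order and change if paragraphs are later added or removed before a given point.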

Manual tagging was used for segmented portions of the story where the two source files were not similar enough to be stitched together at the level of the original paragraphs.