Editorial policies
Screen presentation on this site
In the diplomatic version, hovering the pointer over blue text brings up a tooltip clarification; in the normalised version, blue marks the corresponding text (if present). In both screen versions, red marks the other places where the two texts differ, apart from ſ ~ s. In the diplomatic version, hover over an annotation for information on the annotator.
Conventions in screen and file versions are summarised in the table below. The presentation on this (project) website centres on the diplomatic transcription and permits one of three different tabs to be set alongside. The presentation on the MDC site is image-centred, and again different tabs may be placed alongside. Your choice of facing tab is remembered as you move around either site. Transcription presentation in MDC is almost the same as listed below. Editorial notes only appear in the diplomatic transcription. They are endnotes on this site, with hyperlinks between note lemma and note, but in MDC are true footnotes, numbered afresh on each page. Compare HAM/1/2/16 on this site with HAM/1/2/16 in MDC.
handwritten original | TEI/XML file (filename.xml) | diplomatic text on-screen | normalised text on-screen | diplomatic plain text file (filename.txt) | normalised plain text file (filename-n.txt) |
---|---|---|---|---|---|
long s | ſ (UTF-8 long s) | ʃ (UTF-8 esh) | s (normal s) | ſ (long s) | s (normal s) |
any symbol for and | & | & | & | & | & |
other characters | normal print equivalent | normal print equivalent | normal print equivalent | normal print equivalent | normal print equivalent |
&c. = 'et cetera', Dr = 'Doctor' (as title), Mr, Mrs, Messrs, PS, St = 'Saint' | as written 1, untagged | as written 1 | as written 1 (inline) | as written 1 (inline) | as written 1 (inline) |
other abbrev'ns, incl. Dr = 'Dear', 'Doctor' (as common noun), 'Dowager', St = 'Street' | tagged 1 <abbr> ~ <expan> | as written 1 | expanded (if known) | as written 1 (inline) | expanded (if known) |
initial for name | tagged <abbr> ~ <expan> | as written | expanded (if known) | as written | expanded (if known) |
dash as punctuation 2 | -- | -- | -- | -- | -- |
obsolete spelling known at the period (acc. to OED) | tagged <orig> ~ <reg> | as written | normalised | as written | normalised |
idiosyncratic spelling or error | tagged <sic> ~ <corr> | as written | normalised | as written | normalised |
initial capital 3 | as written, untagged | as written | as written | as written | as written |
'd for -ed | as written, (currently) untagged | as written | as written | as written | as written |
(non)use of possessive apostrophe, incl. e.g. gen. sg. any bodies | as written, untagged | as written | as written | as written | as written |
foreign word or phrase 4 | tagged | unmarked | unmarked | unmarked | unmarked |
obsolete morphology (e.g. had wrote, She eat some chicken) | (some) tagged <orig> ~ <reg> [in progress] | as written / as written | as written / normalised | as written | as written / normalised |
text supplied by editors | tagged | [supplied text] + tooltip | supplied text | supplied text, unmarked | supplied text, unmarked |
text added by writer | tagged | added text | added text | added text, unmarked | added text, unmarked |
substitution by writer | tagged | cancelled text + substitute + tooltip | substitute only | substitute only, unmarked | substitute only, unmarked |
cancelled text | tagged | cancelled text + tooltip | text absent | text absent | text absent |
cancelled text, unreadable or uncertain | tagged <del> + <gap> | ------ + tooltip | text absent | text absent | text absent |
unreadable or uncertain text | tagged <gap> | ------ + tooltip | ------ | <GAP: nn units> (characters, words, lines) | <GAP: nn units> (characters, words, lines) |
unclear or damaged but reasonably certain text | tagged | unclear text + tooltip | unclear text | unmarked | unmarked |
superscript, subscript, position above/below line | tagged <hi> or <add> | formatting displayed | formatting absent | formatting absent | formatting absent |
underline (various styles) | tagged <emph> | formatting displayed (single underline) | formatting absent | formatting absent | formatting absent |
boundary stroke or line (≠ word underline) | (some) tagged <milestone> [in progress] | thin horizontal line | thin horizontal line | ignored | ignored |
new line | tagged | as written | as written | as written | as written |
word split across lines 5 | tagged <orig> ~ <reg> | as written 5 | reassembled on first line without internal punctuation | as written 5 | reassembled on first line without internal punctuation |
new paragraph at linebreak ± indent | tagged | as written | as written | no indent | no indent |
centred text 6 | (most) tagged [in progress] | centred, on new line | centred, on new line | left-aligned, on new line | left-aligned, on new line |
right-aligned text 6 | tagged | right-aligned, on new line | right-aligned, on new line | left-aligned, on new line | left-aligned, on new line |
new column or page | tagged | ruled line | ruled line | blank line | blank line |
catchword | tagged <fw> | catchword + tooltip | text absent | <CATCHWORD: word> | text absent |
surplus word | tagged | surplus word + tooltip | word absent | <SURPLUS: word> | word absent |
editorial footnote | tagged <note/@resp> | lemma[numeral] + tooltip | note absent | note absent | note absent |
quoted speech | tagged <q> | unmarked | unmarked | unmarked | unmarked |
literary or biblical quotation | tagged <cit/quote + bibl> | quoted text + tooltip | quoted text | quoted text | quoted text |
line of verse | tagged <l> | unmarked | unmarked | unmarked | unmarked |
change of hand in letter as sent | tagged <handShift> | unmarked unless footnote needed | unmarked | <HANDSHIFT> | <HANDSHIFT> |
annotation not present in letter as sent | tagged <note/@hand> | annotation + tooltip | annotation absent | <ANNOTATION: annotation> | annotation absent |
moved section 7 | original and destination locations tagged <anchor>, <ref> | ▼ at original location + tooltip, footnote at destination | ▼ at original location | no indication at original location, <MOVED> at destination | no indication at original location, <MOVED> at destination |
Notes to table
1 Any punctuation under superscripted letter(s) in abbreviations is placed last, regardless of relative left-right orientation in the original. Thus, Mr. Mr: Mr– Mr may occur (inline versions Mr. Mr: Mr- Mr), but M.r M:r M-r will not. A letter+macron abbreviation (ac̄ept, com̄and, etc.) is generally expanded as doubling of the letter (accept, command), but note Com̄ps, thrō, wc̄h (Compliments, through, which).
2 The dash as punctuation, represented by two hyphens, always has a space on either side. By contrast, a single unspaced hyphen character is used for normal hyphen (well-known) and horizontal stroke under superscript abbreviation (Mrs–). Unspaced double em-dash is used for a dash that suppresses all or part of a name or place (Miſs —— = ‘Miss Goldsworthy’, their —— = ‘their Majesties’, to —— = ‘to Windsor’, Mr. H—— = ‘Mr. Hodges’, Ly– S.—— = ‘Lady Stormont’, the K——g = ‘the King’), shortens a word (by T——w = ‘by Tomorrow’) or euphemistically blanks all or part of a profanity (D——d = ‘Damned’).
3 In some hands it can be difficult to distinguish upper and lower case in word-initial position. Decisions are based on close comparison with other letter-forms in the same hand, but some arbitrariness is inevitable.
4 French and other foreign languages are not normalised – neither corrected nor regularised to present-day grammar and orthography. Place-names are not generally normalised either.
5 Words split across two lines may have a hyphen on the first, the second or both fragments (reco-|ver, imperfect|-ly, satisfacti-|-on); or a double hyphen (pur=|port, dan|=ger, qua=|=litys); or none (respect|ing).
6 Centred text and right alignment are simulated on-screen by extra indentation.
7 Insertions that interrupt the text are moved to their logical point or to the start or end of a letter; address panels are placed at the end.
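The reassembly of split words described in note 5 can be illustrated with a minimal Python sketch. It is not the project's own code: the function name `reassemble` and the use of `|` to stand for the line break (as in the examples of note 5) are assumptions made for illustration. In the project itself, split words are tagged by hand in the XML, which is what allows unmarked splits such as respect|ing to be recognised at all.

```python
def reassemble(marked):
    """Join the two fragments of a word split across lines.

    `marked` uses "|" for the line break, as in note 5 (hypothetical
    input format). Hyphens and double hyphens (=) at the break are
    dropped, giving the word without internal punctuation.
    """
    first, second = marked.split("|", 1)
    return first.rstrip("-=") + second.lstrip("-=")


# The examples are the ones quoted in note 5.
EXAMPLES = [
    "reco-|ver", "imperfect|-ly", "satisfacti-|-on",
    "pur=|port", "dan|=ger", "qua=|=litys", "respect|ing",
]
JOINED = [reassemble(word) for word in EXAMPLES]
```

The sketch removes every hyphen at the break, so a compound that happened to be split at its own hyphen would lose it; telling the two cases apart needs editorial judgement rather than pattern matching.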
Project files
The master copy of each document in the project is an XML file conforming to TEI P5. End-of-line is LF only.
Two TXT files are derived from each XML file: a plain diplomatic text (filename.txt) and a (partially) normalised text (filename-n.txt). The main purpose of normalisation is to facilitate research and to improve future part-of-speech tagging; its coverage is subject to change. End-of-line in the TXT files is CR + LF.
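As a rough illustration of how two text versions can be derived from one TEI file, the following Python sketch flattens a small fragment twice, once keeping the original readings and once the regularised ones. It is a sketch under stated assumptions, not the project's actual conversion: it assumes that each <orig> ~ <reg>, <sic> ~ <corr> and <abbr> ~ <expan> pair is wrapped in a TEI <choice> element, the function names `render` and `to_txt` are hypothetical, the sample sentence is invented, and features such as gaps, catchwords and annotations are ignored.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # TEI P5 namespace
# Assumption: each pair of readings sits inside a <choice> element.
DIPLOMATIC = {TEI + "orig", TEI + "sic", TEI + "abbr"}
NORMALISED = {TEI + "reg", TEI + "corr", TEI + "expan"}


def render(elem, normalised):
    """Flatten elem to text, keeping one alternative from each <choice>."""
    if elem.tag == TEI + "choice":
        wanted = NORMALISED if normalised else DIPLOMATIC
        # Fall back to the first child if the wanted reading is missing.
        chosen = [c for c in elem if c.tag in wanted] or list(elem)[:1]
        return "".join(render(c, normalised) for c in chosen)
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child, normalised))
        parts.append(child.tail or "")
    return "".join(parts)


def to_txt(xml_source, normalised=False):
    """Return plain text with CR + LF line ends, as in the TXT files."""
    text = render(ET.fromstring(xml_source), normalised)
    if normalised:
        text = text.replace("\u017f", "s")  # long s becomes normal s
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


# Invented sample; \u017f is the long s.
SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">I <choice><orig>shew</orig>'
    "<reg>show</reg></choice> Mi\u017fs H.\n"
    "my <choice><sic>leter</sic><corr>letter</corr></choice></p>"
)
diplomatic = to_txt(SAMPLE)
normalised = to_txt(SAMPLE, normalised=True)
```

Here `diplomatic` keeps shew, Miſs and leter, while `normalised` has show, Miss and letter; both use CR + LF at the line break.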
The corpus edited and released to date, with each TXT format in a separate zip file, is freely available for non-profit use to anyone who registers. Just fill in our simple online form here.
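Users of the TXT files who want running text without the angle-bracket markers listed in the table (<GAP: …>, <CATCHWORD: …>, <SURPLUS: …>, <ANNOTATION: …>, <HANDSHIFT>, <MOVED>) could locate or remove them along the following lines. This is a minimal sketch: the marker names come from the table, but their exact form inside the files (for instance "<GAP: 2 words>") and the assumption that marker content never contains angle brackets are guesses, and the function names are hypothetical.

```python
import re

# Marker names as listed in the conventions table; exact in-file form assumed.
MARKER = re.compile(
    r"<(GAP|CATCHWORD|SURPLUS|ANNOTATION|HANDSHIFT|MOVED)(?::\s*([^<>]*))?>"
)


def find_markers(text):
    """Return a (kind, content) pair for each editorial marker in text."""
    return [(m.group(1), m.group(2) or "") for m in MARKER.finditer(text)]


def strip_markers(text):
    """Remove editorial markers, then collapse runs of spaces left behind."""
    without = MARKER.sub("", text)
    return re.sub(r"[ \t]{2,}", " ", without)


# Invented sample line.
LINE = "went to <GAP: 2 words> and <HANDSHIFT> home"
found = find_markers(LINE)
clean = strip_markers(LINE)
```

Collapsing spaces also flattens any deliberate double spacing in the transcription, so the clean-up step is best skipped where spacing matters.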