Editorial schema and documentation for the Unlocking the Mary Hamilton Papers project
Table of contents
- 1. Document history
- 2. Project and edition
- 3. Guidelines for transcription and mark-up
- 3.1. The TEI header
- 3.1.1. Title of item in the edition: <titleStmt>
- 3.1.2. Contributions by project team and others: <editor> and <respStmt> in <editionStmt>
- 3.1.3. Acknowledging funders: <funder> in <editionStmt>
- 3.1.4. Responsibility for sets of files: <authority> in <publicationStmt>
- 3.1.5. Number of words: <measure> in <extent>
- 3.1.6. Title in repository catalogue: <msIdentifier>
- 3.1.7. Title and language in MDC: <msItem>
- 3.1.8. Number of sheets: <measure> in <supportDesc>
- 3.1.9. Hands: <handDesc>
- 3.1.10. Date of writing: <origDate> in <fileDesc>
- 3.1.11. Responsibility for editing: <projectDesc>
- 3.1.12. Sending and receipt of letters: <correspDesc>
- 3.1.13. File history: <revisionDesc>
- 3.2. Structural elements of the text
- 3.2.1. Page begins: <pb>
- 3.2.2. Column begins: <cb>
- 3.2.3. Line begins: <lb>
- 3.2.4. Paragraphs: <p>
- 3.2.5. Divisions: <div>
- 3.2.5.1. Diaries and journal-letters
- 3.2.5.2. Multiple letters or notes in a single item
- 3.2.5.3. Addresses
- 3.2.5.3.1. Frank signatories
- 3.2.5.4. Self-embedded <div>
- 3.3. Opening and closing elements of a letter
- 3.3.1. Openers: <opener>
- 3.3.2. Closers: <closer>
- 3.3.2.1. Signatures: <signed>
- 3.3.3. Salutations in opener and closer: <salute>
- 3.3.3.1. <salute n="opening">
- 3.3.3.2. <salute n="closing">
- 3.3.4. Datelines: <dateline>
- 3.3.5. Dates: <date>
- 3.3.6. Postscripts: <postscript>
- 3.4. Other content-based mark-up
- 3.4.1. Names
- 3.4.1.1. Names of people
- 3.4.1.2. Names of places
- 3.4.1.3. Spelling variation in names
- 3.4.2. Passages marked for research analysis: <seg>
- 3.4.3. Direct speech: <q>
- 3.4.4. Quotations: <quote>
- 3.4.5. Verse line: <l>
- 3.4.6. Foreign words and text: <foreign>
- 3.4.6.1. Accents and spelling in French
- 3.4.7. Indications by author for recipient: <fw>
- 3.5. Manuscript features
- 3.5.1. Underlining: <emph>
- 3.5.2. Superscript, subscript, italics, etc.: <hi>
- 3.5.3. Hyphens, dashes and lines for suppression
- 3.5.4. Authorial lines and flourishes: <milestone>
- 3.5.5. Quotation marks
- 3.5.6. Apostrophes
- 3.5.7. Parentheses
- 3.5.8. Ampersands, etc.
- 3.5.9. Word spacing
- 3.5.10. Additions: <add>
- 3.5.10.1. Addition vs. annotation
- 3.5.10.2. Addition vs. continuation
- 3.5.11. Deletions: <del>
- 3.5.11.1. Substitutions: <subst>
- 3.5.11.2. <del> + <add> ≠ <subst>
- 3.5.12. Censorship: @reason="censored"
- 3.5.13. Change of hand: <handShift>
- 3.5.14. Annotations on manuscript: <note>[@hand]
- 3.5.14.1. Which shelfmarks to transcribe
- 3.6. Editorial intervention
- 3.6.1. Diplomatic vs. normalised: <choice>
- 3.6.1.1. Correction: <sic> + <corr>
- 3.6.1.2. Regularisation: <orig> + <reg>
- 3.6.1.2.1. Word separation at line-end
- 3.6.1.2.2. Self-embedded <choice>
- 3.6.1.3. Expansion of abbreviations: <abbr> + <expan>
- 3.6.1.3.1. Macron
- 3.6.1.3.2. Tho and tho' (‘though’), thro and thro' (‘through’)
- 3.6.2. No transcription: <gap>
- 3.6.3. Cautious transcription: <unclear>
- 3.6.4. Supplied text: <supplied>
- 3.6.5. Redundant words: <surplus>
- 3.6.6. Moved text: <anchor> and <ref>
- 3.6.7. Footnotes: <note>[@resp]
- 3.6.7.1. Links and cross-references: <ref>
- 3.6.7.2. Bibliographic citations: <bibl>
- 3.6.8. XML comments (internal editorial notes)
- 3.7. Overview of dating of letters
- 3.8. Overview of attribute @rend
- 3.8.1. Style
- 3.8.2. Rightwards movement
- 3.8.3. Cancellation of midline linebreak
- 3.9. Overview of person references
- 3.9.1. <salute> vs. <seg>
- 3.9.2. Referring to persons in <opener>
- 3.9.3. Referring to persons in <p>
- 3.9.3.1. The addressee
- 3.9.3.2. The author
- 3.9.3.3. Other people
- 3.9.4. Referring to persons in <closer>
- 3.9.4.1. The addressee
- 3.9.4.2. The author
Unlocking the Mary Hamilton Papers
Transcription manual and coding schema
David Denison, Tino Oudesluijs and Nuria Yáñez-Bouza
1. Document history
This XHTML document (last modified 01-02-2023) is derived from an ODD file created by David Denison. (The alphabet soup is briefly explained below.) That file started life in 2018 as an XML file produced by the Roma TEI customisation tool, together with an RNC schema derived from it. It was used by the editing team in the final couple of years of the project ‘Image to Text’, namely David Denison, Nuria Yáñez-Bouza and a number of research assistants (for names see Acknowledgements).
On 5 April 2020 it was turned into an ODD file which generated an RNG schema for the successor project, ‘Unlocking the Mary Hamilton Papers’. The ODD file and resulting schema have been continuously updated since then for new situations encountered by the editing team and as our knowledge of TEI broadens. The schema constrains the format of transcriptions and provides additional prompts and warnings in the oXygen XML editor.
In parallel, a discursive transcription manual for internal use was created and maintained in Microsoft Word by Tino Oudesluijs, with input from Nuria Yáñez-Bouza. In September-October 2021, the contents of the manual were transferred to the present file by DD and partly reworked with input from NYB and TO, still mainly for use by the editing team.
2. Project and edition
‘Unlocking the Mary Hamilton Papers’ is an AHRC-funded project running from December 2019 to November 2023, which will produce a scholarly edition in online form of the papers of Mary Hamilton (1756-1816), and conduct literary and linguistic research on the edited materials. The project members are Hannah Barker, Sophie Coulombeau, David Denison, Nuria Yáñez-Bouza, Tino Oudesluijs, Cassie Ulph and Christine Wallis. The editing team is David Denison, Nuria Yáñez-Bouza, Tino Oudesluijs, Cassie Ulph and Christine Wallis. For more detail on the project, please see the project website. For a brief formal description of the edition, see the edition DOI.
The bulk of Hamilton's papers are held in the John Rylands Research Institute and Library, and further materials have been sought from nearly a dozen libraries and archives in the UK and USA. The edition will include as much of the available material as can be reliably transcribed during the life of the project. It will be available initially as both diplomatic and normalised transcriptions, together with notes and metadata, plus images of each page transcribed. Text files of diplomatic and normalised transcriptions are available on request. The source XML files will be released after the end of the project. From spring 2022 the online edition appears in Manchester Digital Collections, and from September 2022 transcriptions are no longer available on the project website. Over 1500 items are already available (see here), and about 100 more are expected to be added before spring 2023, plus a large number of untranscribed documents.
The diplomatic transcription represents the original manuscript as closely as practicable, reproducing spellings, capitalisation, morphology, abbreviations and punctuation. It reproduces lineation and approximate position on the page, allowing easy comparison with the original. The exception is when a block of text is moved by the editors to prioritise logical sequence over physical position, such as when an author has squeezed a postscript into a blank space at the side or top of a(nother) page and we put it in the expected position to enhance continuity. The original position of moved text is indicated by ▼ in the transcription, and the new position by a footnote. Authorial errors, deletions and substitutions are shown. Annotations made after the original writing are shown and are distinguished from authorial text. Other features, such as place-names, non-literary speakers/writers directly quoted, foreign words, etc., are tagged in the TEI/XML file but not indicated in the display.
The normalised transcription attempts to represent the final intention of the author at the time of writing, and it also modernises many forms to standard present-day British spelling and morphology, corrects certain authorial slips, expands abbreviations, omits author-deleted text and ignores annotations. However, capitalisation, use of apostrophes in plurals or possessives, separated forms like him self, and punctuation are not adjusted. The normalised text is unformatted.
For a concise tabular summary of how various features are treated in diplomatic and normalised transcriptions, and also in the text files made available separately, please see the page Editorial policies.
3. Guidelines for transcription and mark-up
Examples from project XML files cited in these Guidelines are sometimes shortened as follows:
- Content or mark-up may be simplified if not pertinent.
- The personography attribute @ref is mostly omitted from <persName> and <rs> tags.
- One or more linebreaks may be introduced in long lines.
3.1. The TEI header
3.1.1. Title of item in the edition: <titleStmt>
The content of element <title type="item">, prominent in the XML file, is the title of that item in our edition and is the one displayed on the project website. It should normally follow one of the following formats: ‘Letter/Note/etc. from […] to […]’ or ‘Diary of […] (startdate – enddate)’. Other titles begin with ‘Journal-letter’, ‘Copy/Copies’, or, less frequently, ‘Transcripts’, ‘Catalogue’, ‘Notes’, ‘Anthology’, ‘Commentary’, ‘Account book’, etc.
NB. Dates are always written out as ‘day [as cardinal number] month [as word in full] year’, e.g. 4 November 1783; see origDate below.
For the participants, namely author and (for letters) recipient, use the name by which they were known at the time of the document (with the exception of Mary Hamilton, who is referred to as such throughout the corpus). If they were previously or later known under a different name or title, please use the following formats:
Different personal title:
Different name:
NB. When confronted with multiple forenames, e.g. Charlotte Margaret Gunning or Martha Carolina Goldsworthy, go with the form most often used by that person when signing as author. So if Charlotte Gunning signs most of her letters ‘Charlotte Margaret Gunning’ (or the equivalent in initials), use that for <title type="item">, rather than ‘Charlotte Gunning’.
If part of a document is missing, the catalogue (mostly ELGAR) will usually open its summary with something like ‘Incomplete note/letter from […] to […]’. Use this formula in <title type="item"> as well (see for example HAM/1/13/10, HAM/1/15/2/7).
The title in <titleStmt> is to be repeated in <msItem> but NOT necessarily in a third title element, namely <collection type="title"> in <msIdentifier>, where it is commented out in the XML. The latter differs in some cases, as it consists of the first line of the summary as provided on the website of the catalogue in question. In probably the majority of items, however, all three titles agree.
3.1.1.1. Documents written on behalf of another
Items that are possibly written on behalf of another person (called here, for want of a better word, a ‘beneficiary’) raise questions about who is author, who is sender, and whether the item's title should include an ‘on behalf of’ statement. The main factors to consider are whether the writer wrote from dictation, how the text refers to the beneficiary, and who signed the document if there is a signature. Other factors may come into play, but for now our guidelines are as follows:
1. Text refers to beneficiary in first person singular, but writer is not the same person
- The beneficiary is considered to be the author, since the writer is in effect writing to dictation.
- The writer
- appears in <handDesc> as the main hand, <handNote xml:id="major_hand" scope="major" medium="ink" scribe="secretary" scribeRef="psn:AnA">Hand of writer of the main text, probably Anne Astley.</handNote>
- is one of the persons in <correspAction type="sent">,
- but does not appear in the title.
- The beneficiary
- appears as <author> in <msItem>,
- is one of the persons in <correspAction type="sent">,
- is listed in the title (format ‘Note/Letter on behalf of [beneficiary] to [recipient]’), as in LWL Mss Vol. 75 (item 47):
  <title type="item">Note on behalf of Mary Delany to Mary Hamilton</title>
  <author ref="psn:MD">Delany, Mary</author>
  <correspAction type="sent">
  <persName ref="psn:AnA">Astley, Anne</persName>
  <persName ref="psn:MD">Delany, Mary</persName>
  […]
  </correspAction>
- If in addition the beneficiary has signed the letter, they are also listed as writer of the signature in <handDesc> (minor hand), as in LWL Mss Vol. 75 (item 41): <handNote xml:id="minor_hand_1" scope="minor" medium="ink" scribe="author" scribeRef="psn:MD">Hand of writer of the signature, Mary Delany.</handNote>
2. Text refers to beneficiary in third person, and writer is not the same person
- It is normally the writer who is then considered to be the author, and no mention is made of the item being ‘on behalf of’ someone (format ‘Note/Letter from [ author ] to [recipient]’), whether the item
- is signed by the writer/author, as in GEO/ADD/3/84/1, LWL Mss Vol. 75 (items 57, 65),
- or is not signed at all, as in LWL Mss Vol. 75 (item 35):
<title type="item">Letter from Georgina Mary Anne Port to Mary Hamilton</title>
- In other words, the document is treated just like any other item containing information about a third party.
3. Intermediate between 1 and 2
A special case is an item clearly conveying a third-person message on behalf of someone who could not, for whatever reason, write it themself.
- Then the title may include the names of both writer/author and beneficiary (format ‘Note/Letter from [ author ] on behalf of [beneficiary] to [recipient]’), as in LWL Mss Vol. 75 (item 49): <title type="item">Note from Anne Astley on behalf of Mary Delany to Mary Hamilton</title>
- The beneficiary does not otherwise appear in any header element, apart from the summary.
A decision to include ‘on behalf of’ in the title in such intermediate cases may involve delicate judgements of content and style, always subject to the need for consistency among similar items.
- Examples of inclusion are LWL Mss Vol. 75 (items 38, 39), both of which start along the lines of ‘[Beneficiary] sends their best compliments’, indicating that the item was written by the author because the beneficiary could not.
- Two other examples are LWL Mss Vol. 75 (items 55, 56), where Georgiana Mary Anne Port writes on behalf of Mary Delany, despite Port referring to Delany only in the third person and signing the items herself.
3.1.2. Contributions by project team and others: <editor> and <respStmt> in <editionStmt>
The element <respStmt> is retained for research assistants and volunteer or student transcribers, coders, etc. who worked on the file, and for general contributors to archiving, cataloguing and imaging from outside the Unlocking project team. It is no longer needed for members of our team, all of whom are declared differently in <editionStmt> as <principal> or <editor>. Please see <revisionDesc> for entering and dating file actions by members of the team and others.
3.1.3. Acknowledging funders: <funder> in <editionStmt>
The following is to be added to newly transcribed texts [i.e. in the Unlocking project]:
Files dating from the Image to Text project will have one of these entries:
3.1.4. Responsibility for sets of files: <authority> in <publicationStmt>
The following formula is used for newly transcribed texts:
When checking older files that were previously transcribed in the Image to Text project, please make sure that the text is as follows:
3.1.5. Number of words: <measure> in <extent>
The number of words in each item is computed by a script run on the normalised text files, where split words have been reassembled at a line-end. Punctuation is ignored. The word count for each item is then imported back into the XML file. The figures produced by CQPweb may be higher if tokens (e.g. can't as 2 tokens) and/or punctuation are counted.
3.1.6. Title in repository catalogue: <msIdentifier>
In the element <msIdentifier> there are various items beginning ‘<collection type=…’ enclosed in XML comments <!-- … --> because not used in MDC. As with the <title type="item"> (see Title of item in edition above), copy over the information given in the repository's catalogue.
NB. <!-- <collection type="title"> --> is to be left as it is in the repository's catalogue, without any corrections not yet incorporated there.
An example from HAM/1/12/48:
Make sure to copy the information from ELGAR (or, where relevant, another repository's catalogue) for the correct description of the series name as well as the title.
3.1.7. Title and language in MDC: <msItem>
<msItem> is the element from which MDC takes the title of the item. Make sure that <title> matches <title type="item"> (the element used by the project website). Leave the <textLang> element as shown when the document is in that language.
3.1.8. Number of sheets: <measure> in <supportDesc>
In this <measure> element we count sheets [of paper], like the JRRIL (John Rylands Research Institute and Library), and not pages [of text], like the Royal Archives. According to JRRIL staff, ‘sheets’ is standard practice. For the GEO files, we have to rely on the images to determine how many sheets were used. If the number of sheets is still unclear, the archivist should be consulted if possible.
Make sure there is no period after the number of sheets. Also make sure to have ‘sheet’ rather than ‘sheets’ when there's only one:
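As a minimal sketch of the two wordings, the fragment below shows the singular and plural forms without a trailing period. The nesting of <measure> inside <extent> is an assumption based on the general TEI model for <supportDesc>, and any attributes the project adds to <measure> are left out here.

```xml
<!-- One sheet: singular, no period (illustrative fragment) -->
<supportDesc>
  <extent>
    <measure>1 sheet</measure>
  </extent>
</supportDesc>

<!-- More than one sheet: plural, no period (illustrative fragment) -->
<supportDesc>
  <extent>
    <measure>3 sheets</measure>
  </extent>
</supportDesc>
```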
For <condition> in the HAM papers, please copy what is provided in the catalogue. If no condition is provided, please word as follows:
Whereas for the HAM papers we copy <condition> from the catalogue, the GEO catalogue only specifies the physical description, i.e. ‘loose manuscript paper(s)’. After correspondence with the Head of Digital Services, William & Mary Libraries (24/03/2020), this has been put in the element <collation>, and we omit the element <condition>. Future catalogue information from other libraries can go into <condition> and/or <collation> as appropriate.
3.1.9. Hands: <handDesc>
This element in the header is to be written out as follows. This example has 3 hands: Lady Dartrey as the author of the document, the addressee Mary Hamilton as a minor hand (probably), and a third, unknown writer:
The following conventions have been adopted:
- Do not specify whether a document is a letter or note in @scribeRef. Only mention ‘main text’.
- End the content of each <handNote> with a period.
- In the description of the hand, use ‘author’ for main hands and ‘writer’ both for someone writing on behalf of the author (see also Documents written on behalf of another above) and for minor hands. Do not use ‘scribe’, ‘annotator’, etc.
- Be sure to add the name of the ‘author’ as well as any names of ‘writer’ when known.
- For the name of a hand in @scribeRef, please use the name as given in the first line of the catalogue where the document is from (in this case ‘Lady Dartrey’), not the canonical name we have decided on for the personography, in this case ‘Lady Philadelphia Hannah Dartrey (née Freame)’.
- The optional attribute @cert may be used for major hands that are not entirely certain, with possible values "high | medium | low". Do not use @cert with minor hands.
- When fairly sure of the identity of a minor hand, please enter "addressee | relative | secretary | etc." in @scribe (rather than "unknown"), as well as "psn:xml:id" rather than "psn:unknown" in @scribeRef.
- Please use ‘probably’ (see example), and not ‘possibly/likely/etc.’
- When specifying the location of an annotation by a minor hand on a page, use the preposition ‘at’ instead of ‘in’ for consistency.
- When there is only one minor/major hand, the suffix _1 is unnecessary.
We permit the author to be entered a second time as a minor-hand writer of annotations that are made by them after the original writing (occasionally, as in HAM/2/4, long before) and not considered to be part of the text. See especially Addition vs. annotation below.
NB. When a hand is cross-referenced in the body of the text, remember to add # , e.g.
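The conventions above can be sketched for a three-hand item as follows. This is an illustration only, modelled on the <handNote> patterns quoted in Documents written on behalf of another: the personography ids psn:XX and psn:YY are placeholders, the @hands count and the wording and location of the annotations are assumptions, and the final <note> merely shows the # prefix in a cross-reference from the body.

```xml
<handDesc hands="3">
  <!-- Main hand: 'author', name as in the first line of the catalogue, final period -->
  <handNote xml:id="major_hand" scope="major" medium="ink" scribe="author" scribeRef="psn:XX">Hand of author of the main text, Lady Dartrey.</handNote>
  <!-- Minor hand fairly securely identified: 'probably', and 'at' for the location -->
  <handNote xml:id="minor_hand_1" scope="minor" medium="ink" scribe="addressee" scribeRef="psn:YY">Hand of writer of the annotation at the top of p.1, probably Mary Hamilton.</handNote>
  <!-- Unidentified minor hand -->
  <handNote xml:id="minor_hand_2" scope="minor" medium="ink" scribe="unknown" scribeRef="psn:unknown">Hand of unknown writer of the annotation at the bottom of p.1.</handNote>
</handDesc>

<!-- In the body: note the # before the xml:id of the hand (placeholder content) -->
<note hand="#minor_hand_1">text of the annotation</note>
```

Because there are two minor hands in this sketch, both keep their numeric suffix; the single major hand does not need one.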
3.1.10. Date of writing: <origDate> in <fileDesc>
This element contains the date of writing. It should always have the attribute @calendar="Gregorian" and normally also @when, and the date in text, as shown:
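For orientation, a minimal sketch of a fully dated item, reusing the sample date 4 November 1783 cited under Title of item in the edition. The attribute value follows yyyy-mm-dd, while the element content follows the ‘day month year’ convention.

```xml
<!-- Attribute: yyyy-mm-dd; content: cardinal day, month in full, year -->
<origDate calendar="Gregorian" when="1783-11-04">4 November 1783</origDate>
```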
If the date is unknown, please include only @calendar and not @when or @when-custom, and put ‘n.d.’ as text:
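A minimal sketch of the undated case as described here, with @calendar only and ‘n.d.’ as content:

```xml
<!-- No @when or @when-custom for an undated item -->
<origDate calendar="Gregorian">n.d.</origDate>
```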
This is similar to how we treat undated documents in <correspDesc> (see further Overview of dating of letters). As there, it is permissible instead of @when to use @notBefore and/or @notAfter with a <precision> element (see <correspDesc> below). For an item composed over a period, e.g. a diary or a journal-letter, use @from and @to instead. In the latter case, dates of writing in <origDate> will necessarily differ from the single <date> of sending in <correspDesc>:
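The contrast between a period of writing and a single date of sending might look roughly like this for a journal-letter. The dates are invented for illustration, and the wording of the range in the element content is an assumption modelled on the ‘startdate – enddate’ format used in titles.

```xml
<!-- In the header description of the manuscript: period of composition -->
<origDate calendar="Gregorian" from="1784-06-07" to="1784-06-12">7 June 1784 – 12 June 1784</origDate>

<!-- In correspAction type="sent": the single date of sending -->
<date when="1784-06-12">12 June 1784</date>
```

Note that @from and @to replace @when here; as stated below, @when cannot combine with the other dating attributes.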
Note that we treat date of writing of a journal entry as if written on the day concerned, even when a bunch of entries were apparently composed together at a later date.
Note too that the value of @when, @notBefore, @notAfter, @from or @to is normally yyyy-mm-dd. Incomplete dates have the formulas yyyy-mm or yyyy, though --mm-dd (without year) is now disallowed. The attribute @when cannot combine with any of the other attributes discussed here.
When a date is given in text as content of an element like <origDate>, we always use the format ‘day [as cardinal number] month [as word in full] year’, and in that order. There is no -st, -nd, -rd, -th on the date, no comma, nor do we add a leading zero for dates 1-9:
3.1.11. Responsibility for editing: <projectDesc>
For newly transcribed texts [i.e. in the Unlocking project, not Image to Text] in the GEO papers, word as follows:
For newly transcribed texts, word as follows:
3.1.12. Sending and receipt of letters: <correspDesc>
When the date of receipt is noted in the manuscript (characteristic of MH as recipient), a <date> element should be added to <correspAction type="received">. And if the only explicit date in the letter is the date of receipt (as with many of the GEO letters sent by GPW), the <date> in <correspAction type="sent"> should have the attributes @when OR @notBefore and/or @notAfter, as well as a <precision> element (within <date>!). For example:
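A hedged sketch of the case where only the date of receipt is explicit. The dates, the ids psn:XX and psn:YY, the bracketed names, the ‘n.d.’ content of the inferred date and the precision value are all placeholders or assumptions; the @match and @precision attributes are those of the standard TEI <precision> element.

```xml
<correspDesc>
  <correspAction type="sent">
    <persName ref="psn:XX">[sender, as in the personography]</persName>
    <placeName>unknown</placeName>
    <!-- Sending date inferred from the receipt date; precision sits inside date -->
    <date notAfter="1780-03-14">
      <precision match="@notAfter" precision="medium"/>n.d.</date>
  </correspAction>
  <correspAction type="received">
    <persName ref="psn:YY">[recipient, as in the personography]</persName>
    <placeName>unknown</placeName>
    <!-- Date of receipt as noted on the manuscript -->
    <date when="1780-03-14">14 March 1780</date>
  </correspAction>
</correspDesc>
```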
On the different formats for a date as attribute value and as element content, please see <origDate> above. On the different dates that may be recorded for a letter, see further Overview of dating of letters below.
For the <placeName> elements, fill in the place-name mentioned in the address or dateline of the letter. If only a house or other similar habitation is mentioned (e.g. St. James, The Queen’s Lodge), use the location of the habitation as it was at the time of writing, e.g. London for St. James, Windsor for the Queen’s Lodge. Historic hamlets, e.g. Cowslip Green, are to be considered as places in their own right and so are also tagged <placeName>. Be mindful of places such as Kew that are now part of London but weren’t in the late 18th and early 19th centuries.
If no place-name is mentioned anywhere in the letter/note or address, please use <placeName>unknown</placeName> in both the sent and received entries for <correspDesc>, unless one can be safely inferred, in which case the attribute @cert is added:
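The two outcomes can be sketched as below. The place and the certainty value are invented for illustration; the permitted values are assumed to be the same "high | medium | low" set used for @cert elsewhere in the header.

```xml
<!-- No place-name anywhere and nothing safely inferable -->
<placeName>unknown</placeName>

<!-- Place not stated in the item but safely inferred by the editor -->
<placeName cert="medium">London</placeName>
```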
For personal names in <correspDesc>, we go with that used by the person in question in their earliest piece of correspondence to Mary Hamilton. For example:
and not ‘Dawson/Freame, Philadelphia, Lady’, since Lady Dartrey did not sign her earlier letters to Mary Hamilton as such.
Please consult the personography. Enter the name as follows:
Obviously the optional née parenthesis applies only to a married woman, but a version with masculine né can be used for a man who changes his surname, usually on adoption or on acquiring an inheritance. Only the title(s) used in the canonical form need be included.
Very occasionally, the optional attribute @cert may be added for a personography identification that is uncertain, with possible values "high | medium | low".
3.1.13. File history: <revisionDesc>
Please enter actions in <revisionDesc> as in the following example, in descending order of date:
NB. Remember to include # before an editor's or contributor's xml:id. Keep the default text, for consistency.
For newly transcribed files, the default text is
We specify year, month and day. If the transcription is done over a period of days (or longer!), we only enter the date of completion.
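As a rough sketch of the ordering and referencing rules, the fragment below lists two actions with the most recent first. The ids #ed1 and #ed2 and the dates are hypothetical, and the bracketed content stands in for the project's default texts, which are not reproduced here.

```xml
<revisionDesc>
  <!-- Most recent action first; full date of completion; # before the xml:id -->
  <change when="2022-03-09" who="#ed2">[default text for this action]</change>
  <change when="2021-11-26" who="#ed1">[default text for this action]</change>
</revisionDesc>
```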
We use <respStmt> in <editionStmt> to record transcribing or other activities provided by others (see <respStmt> above). Date of completion of their contribution should be recorded in a <change> statement here in <revisionDesc>, though not usually for regular archivists, etc., whose contribution to a specific file is not easily dated.
3.2. Structural elements of the text
3.2.1. Page begins: <pb>
All transcriptions with any text at all have a <pb n="1"/>. The element <pb> is always preceded by a blank line (as a visual aid in the XML file itself; there is no effect on the displayed transcriptions). It is immediately preceded by <lb/>, except in the case of <pb n="1"/>. It can be followed by <cb> on the same line but nothing else other than a footnote, <note resp=…> (so please put any <note hand=…> on the following line with its own <lb/>). The form is always <pb n="nn"/>, where nn = 1, 2, 3, etc. Sequential page numbering is not handled automatically but is the editor's responsibility. After the first <pb>, an element such as <opener> or <p> must be opened to contain text. See also Divisions below.
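A minimal sketch of these rules over two pages, assuming a plain paragraph with invented text: the first <pb> stands alone after a blank line, and the second is preceded by a blank line and by <lb/> on the same line.

```xml
<body>

<pb n="1"/>
<p>
<lb/>I take up my pen to thank you
<lb/>for your letter, which reached

<lb/><pb n="2"/>
<lb/>me on Tuesday last.
</p>
</body>
```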
3.2.2. Column begins: <cb>
If a single image displays two pages, as is frequently the case for HAM letters from the JRRIL, do not code as separate pages:
Instead code as two columns:
Don't repeat <pb> before the second <cb>. When <cb> is the first element in a line, necessarily therefore at least n="2", it must be preceded by <lb/>, and again before that, a blank line. Such blank lines have no effect in XML but they help an editor to quickly relate the transcription to the image.
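The recommended two-column coding for a single image showing two pages might be sketched as follows. The text is placeholder content, and placing <cb n="1"/> directly after <pb> on the same line is an assumption based on the description of <pb> above.

```xml

<pb n="1"/><cb n="1"/>
<p>
<lb/>text on the left-hand page of the image
<lb/>continues to the foot of that page

<lb/><cb n="2"/>
<lb/>text on the right-hand page of the image
</p>
```

There is only one <pb> here because there is only one image; the second column opens after a blank line and its own <lb/>.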
Although use of ‘column’ is a bit misleading, because for the author they were separate pages folded back to back, we will continue using <cb> for the JRRIL letters (advice in correspondence with the JRRIL Imaging Manager and TEI Project Co-ordinator). It does mean, however, that each <pb> corresponds to one image.
3.2.3. Line begins: <lb>
The element <lb> can have an optional @rend that applies to the following line rather than a whole element (see also Overview of attribute @rend below), with possible values "indent", "center" or "align-right" (but not "inline", which would be redundant at line beginning). With the present XSLT display script, <lb rend="center|align-right"/> does not create an extra blank line after itself, which can be useful, as other elements with such a @rend attribute create a blank line afterwards and also force any items on the same XML line but outside the @rend element onto a separate line in the display.
Represent wide blank vertical spaces in the original with an extra empty <lb/>, e.g. in HAM/1/8/2/22:
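By way of illustration only (the wording is invented and is not the text of the item cited), an empty <lb/> standing for a wide vertical gap, followed by a line carrying its own @rend:

```xml
<lb/>last line before the blank space
<lb/>
<lb/>first line after the blank space
<lb rend="center"/>a single centred line
```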
3.2.4. Paragraphs: <p>
After the opening paragraph, the element <p> is used for new paragraphs as visually indicated by the author of the text, e.g. by an indented new line, a blank line in between two other lines, a previous line that ends well short of the right margin, or an inline mark (e.g. GEO/ADD/3/82/4, end of p.1). Occasionally there is a longer than usual space before a semantic paragraph that starts in midline (e.g. HAM/1/2/31, HAM/1/15/2/3). Such ‘new paragraphs’ that continue on the same line need <p rend="inline"> to prevent the display script from moving to a new line:
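A sketch of a midline paragraph break, with invented wording rather than the text of the items cited:

```xml
<p>
<lb/>and so ends my account of the journey.</p> <p rend="inline">I must now
<lb/>tell you of our visit to Windsor
</p>
```

The first paragraph closes in midline and the second opens on the same XML line, so the display keeps both on one line as in the manuscript.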
The author's desire for a new para has clearly been trumped by the need to save paper. Our website retains the author's layout. We don’t mark a new paragraph inferred only from content and without any visual sign.
When a line starts with a dash, i.e. — , don’t mark it as the start of a new <p rend="indent">, just start the line with  --  (i.e. space hyphen hyphen space) within the current <p>.
In the header, <p> never has an attribute. In the body it may have the attribute @rend (see Overview of attribute @rend), with values "align-right" or "center" (which shift the whole paragraph), or "indent" (which indents the first line only). As noted for the example just above, the value "inline" is needed in some cases; see also the second example in Postscripts below and the fuller explanation in Overview of attribute @rend.
As for the end of a paragraph, we routinely have to end <p> in midline before <closer>, because the current TEI guidelines do not allow <closer> within <p> (cf. Forney et al.’s paper 2020). To avoid a line-break in website displays after the closing </p> tag, rend="inline" must be added to the opening <closer> tag that follows:
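A hedged sketch of a paragraph ending in midline before the closer. The wording and signature are invented, and the inner mark-up of <closer> is simplified (see Closers and Signatures for the full conventions).

```xml
<p>
<lb/>I hope to hear from you soon.</p> <closer rend="inline"><salute n="closing">Believe me
<lb/>ever yours</salute>
<lb/><signed>M. Hamilton</signed>
</closer>
```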
3.2.5. Divisions: <div>
The element <div> is a subdivision of <body> (at least in our project, since we do not use <front> or <back>). It is needed in one or more of three different situations, discussed in separate subsections below.
Note that if an item contains a <div>, the whole of the <body> must be distributed between one or more <div>s, with no residue (such as a footnote or annotation) orphaned outside any <div>.
Note also that if any whole <div> is written by a different hand from any other, then all the <div>s in the item must have the attribute @hand. Furthermore, as div/@hand does not show on-screen, a footnote after an opening <div> tag should explain the change of hand.
Possible values for the obligatory attribute @type are:
- "letter" (for the first <div> of an item that the catalogue describes as a letter, or of an embedded letter)
- "note" (for an item that the catalogue describes as a note)
- "continuation" (when an author indicates resumption of letter-writing)
- "entry" (for a diary/journal entry)
- "address" (for the ‘direction’, which may be just a name, or the postal address in a letter or note)
- "cutting" (for a newspaper cutting)
- "part" (generic type for a <div> that doesn't fit any of the others)
Don't leave a blank line between a closing </div> and the next opening <div>: the 8-space indent is enough to make the div change conspicuous in the XML.
NB. The first of the two examples above is likely to be problematic in MDC without some recoding, as it involves a midline div and para break without rend="inline" on the second para.
If a div break coincides with a page break, please make sure that the div break comes first, so that <pb> and its customary preceding blank line occur at the start of the new <div>:
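A minimal sketch of a division break that coincides with a page break. The division types and text are chosen arbitrarily, and the project's indentation of <div> tags is omitted for brevity.

```xml
<div type="letter">
<p>
<lb/>last line of the first division
</p>
</div>
<div type="continuation">

<lb/><pb n="2"/>
<p>
<lb/>first line of the new division
</p>
</div>
```

There is no blank line between </div> and the following <div>; the blank line belongs before <pb>, inside the new division.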
3.2.5.1. Diaries and journal-letters
In a diary-like structure with explicit, usually sequential indication of days, use one <div> for each day, even if it’s clear that several entries were written together at a later point in time. However, in the routine case of an ordinary letter that mentions recent events without systematic and explicit marking of the different days, there is no need for <div>s on that account.
If a journal-letter was written over several days (e.g. some of John Dickenson’s letters), although the catalogue records the date when the letter was started, we record in <origDate> the period over which it was written, and in <correspAction type="sent"> the date it was sent. See further Overview of dating of letters.
A <div> can include any of the elements discussed in Opening and closing elements of a letter, e.g. openers, paragraphs, closers, etc. However, when a letter has been put aside by an author and later resumed, <div type="continuation"> doesn't need a new <opener>; and likewise with <div type="entry"> in a diary or journal-letter. If for example a day of the week is mentioned, just use <dateline> (containing a <date> tag if the date can be worked out), followed by <p>:
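For instance, a diary entry headed only by a weekday could be sketched as below. The date and wording are invented, and the exact placement of <lb/> relative to <dateline> is an assumption.

```xml
<div type="entry">
<dateline>
<lb/><date when="1784-06-07">Monday</date>
</dateline>
<p>
<lb/>Rose early and walked in the garden
<lb/>before breakfast
</p>
</div>
```

No <opener> is used: the <dateline> stands directly at the start of the entry and is followed by <p>.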
3.2.5.2. Multiple letters or notes in a single item
For multiple letters, whether embedded or sequential, each begins with <div type="letter">; see for example HAM/1/2/21 and HAM/1/11/29, respectively. Likewise for multiple notes, except that they begin with <div type="note">.
NB. If the letters or notes are written by different hands, use the attribute @hand in every <div> (and not <handShift>).
3.2.5.3. Addresses
If a document has a destination address at (or moved to) the end, the preceding content of the letter (including annotations and postscripts) and the address should each be put in their own <div>.
Two <div>s cannot intersect. If, say, the last page has the address sandwiched between postscripts or paragraphs, move the address and its <div> to the end, using <anchor> and <ref> to show the actual physical order on the page; see Moved text below.
An annotation on the same page as the address should be included in the address <div> if it concerns a date, name, address line, place-name, etc. Otherwise include it at the end of the preceding <div>.
The address is coded as follows. First the obligatory element <p>, containing <address> (although annotations, free marks, single sheet remarks, etc. go outside <address> and inside <p>). There will almost always be a <persName> and/or a <placeName>. If there is any other text in the address that does not belong in either of these elements, such as the preposition To (which frequently occurs in the line above the name of the addressee), wrap lines inside <addrLine>. Lines containing only a place- or personal name don't need <addrLine>. After all elements in the address have been coded, close with </address> and </p> before closing the division with </div>. A number of points are illustrated in this elaborate example (albeit simplified a little here):
If both a preposition and a <persName> or <placeName> occur on the same line, wrap them both inside <addrLine>:
If a <placeName> crosses a line, extend <addrLine> around it as well:
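The two cases just described might be sketched as follows. The address is an invented placeholder rather than a transcription, and @ref attributes are omitted:

```xml
<!-- Invented address, not taken from the Papers -->
<div type="address">
<lb/><p><address><addrLine>To <persName>Miss Hamilton</persName></addrLine>
<lb/><addrLine><placeName>Northamp-
<lb/>tonshire</placeName></addrLine></address></p>
</div>
```

The first <addrLine> wraps a preposition and a name on the same line; the second is extended around a place-name that runs over a line break.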
Occasionally, a single address line written over a fold gets broken up by unfolding into fragments scattered higgledy-piggledy around the page, usually angled in different directions, e.g. in HAM/1/6/3/1. Please don't try to recreate the layout of the unfolded image with <anchor>s and <ref>s. Instead, just give the line as if it had not been fragmented and add a footnote with the following format:
3.2.5.3.1. Frank signatories
Someone, usually an MP, may provide the author with a free frank. They write out the destination address, add their own name and usually give the location and date of doing so. Because this must be done before the letter is sent, what they write is not classed as an annotation. Make sure to:
- Add the signatory to <handDesc> as a minor hand with "frank-signatory" as the value of @scribe, and put that hand in the attribute @hand in <div type="address">. (As always, this means that every other <div> in the document must have @hand too.)
- Add a footnote after the <div> tag to note and explain the change of hand from the previous <div>.
- Wrap the signatory's <placeName> and <date> (if given) in a <dateline> element placed before and therefore outside <address>, as the dateline is not part of the actual address, though it does belong inside the <div>: <handNote xml:id="minor_hand_2" scope="minor" medium="ink" scribe="frank-signatory" scribeRef="psn:WDev">Hand of writer of signature on p.2, William Devaynes MP.</handNote> […] […] <div type="address" hand="#minor_hand_2"><note resp="#DD">The address is in the hand of the provider of the frank, William Devaynes, MP for Barnstaple 1784-1796.</note> <lb/><pb n="2"/> <lb/><p rend="center"><placeName>London</placeName> <date when="1786-06-10">Tenth June 1786</date> <lb/><address><persName><choice><abbr>J.</abbr><expan>John</expan></choice> Dickenson Esq</persName><note resp="#CW">FREE frank in red ink.</note> <lb rend="indent"/><addrLine><persName>Sir <choice><abbr>W</abbr><expan>William</expan></choice> Wake</persName>s</addrLine> <lb rend="indent"/><addrLine>Courteen Hall</addrLine> <lb rend="indent"/><placeName>Northamptonshire</placeName> <lb/><addrLine><persName>Wfreedevaynes</persName></addrLine> <note resp="#CU">This seems to be ‘W free devaynes’ written all as one word.</note></address></p> </div>
3.2.5.4. Self-embedded <div>
It is sometimes appropriate to embed one <div> within another, for example if an item is split into <div>s for one of the usual reasons, and there is, say, a quoted letter inside one of those (as in HAM/1/6/8/1). Note, though, that the whole of the outer <div> must then be distributed between one or more embedded <div>s, with no residue (such as a footnote or annotation) orphaned outside any <div>. In other words, <div> behaves just like <body> when it contains a <div>. This means that the embedded <div> forces us to close the outer <div> before it and start a new one (typically with @type="continuation") after it, even if logically we might have preferred simply to embed the inner <div> without closing the outer one.
3.3. Opening and closing elements of a letter
The normal structure of a letter contains the following elements:
- opener
- paragraph(s)
- closer
The relative order is fixed, and these three elements cannot overlap or be embedded in one another. A paragraph core is obligatory, the others not. About three quarters of our letters have openers, and over 90% have closers.
Postscripts are another optional element at this level (found in some 35% of our letters). The element <dateline> can function either at the same level or embedded in one of the high-level elements. All the elements mentioned here that are specific to letters are discussed in the following sections, plus the lower-level elements <salute> and <date>.
3.3.1. Openers: <opener>
The opener ‘groups together dateline, byline, salutation, and similar phrases appearing as a preliminary group at the start of a division, especially of a letter’ (TEI Guidelines). In this edition, <opener> can include the following elements: <dateline>, <date> (inside <dateline>), <placeName> (also inside <dateline>), and <salute n="opening">.
NB. <opener> refers to the time of production (TEI Guidelines). Therefore metadata to do with the date received rather than date sent, as is typical of Hamilton’s annotations, should appear outside <opener>. (On <date> in this context, with or without a <dateline> wrapping, see Dateline.)
A salutation such as ‘My dear X’ at the very start of the letter is tagged with <salute> inside <opener>.
NB. If a letter doesn't begin with a salutation, the first sentence belongs to <p> rather than <opener>. Since <salute> is disallowed in <p>, a salutation later in that sentence, as in e.g. ‘I have so missed you my dear X, […]’, is instead marked with the project-specific tag <seg type="openSalute"> (see Salutations and Overview of person references).
Each <div> of a multi-div document can in principle have its own opener, but when a <div type="continuation"> is used, <opener> is not repeated (as already noted under Diaries and journal-letters above). If a weekday is mentioned to indicate when the letter was continued, use only <dateline> (containing <date> if a date is given or can be inferred). That <dateline> element leads straight into <p>.
<opener> has an optional attribute @rend for rightwards shifting of the content; see also Overview of @rend.
3.3.2. Closers: <closer>
A <p> element needs to be closed before a <closer> is opened; see also Paragraphs.
The closer ‘groups together salutations, datelines, and similar phrases appearing as a final group at the end of a division, especially of a letter’ (TEI Guidelines). In this project, all text in <closer> should be inside one of the elements <salute n="closing">, <seg type="closeSalute">, <seg type="closeFormula">, <dateline>, <note> or <signed>, leaving no untagged text.
If the text leading up to the closer includes words and phrases such as ‘adieu’, ‘God bless you’, ‘Give my love to X’, ‘remember me to X’, etc., they should all be tagged as part of the preceding <p> (but see also <salute n="closing">).
Since <closer> (like <opener>) refers to the time of production, an annotation to do with the time of receiving the letter should sit outside.
It is possible to have more than one <closer> in a document, e.g. one in the main body and a second one
- inside a <postscript> (see Moved text for an example)
- after a <postscript> (e.g. needed to wrap a dateline there)
<closer> has an optional attribute @rend for rightwards shifting of the content; see also Overview of @rend.
3.3.2.1. Signatures: <signed>
When the author signs, the name, wrapped in <persName>, goes in <signed>. The element <signed> goes inside <closer> but after and outside <salute> (if present).
Often a signature is written more quickly than most of the text, and as such does not always clearly contain all of the initials one is looking for. The guideline here is to transcribe the initials as close as possible to the original, and when a doubtful letter is at least possible, to transcribe it. For example, when Harriet Finch signs a letter with a clear capital H plus a mark to the right that could be (part of) an F, then transcribe HF (and code each initial with the appropriate <choice>, <abbr> and <expan> elements). Only when there is clearly nothing more should we transcribe just H.
When initials are written in one movement of the pen and are thus attached, keep the <choice> elements for each initial together, without a space between them, to capture this in the diplomatic transcription, but add a space after the forename(s) inside <expan> for the normalised transcription, for example:
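A sketch of how attached initials might be coded, using the Harriet Finch case mentioned above (an illustration of the rule, not a transcription of a particular item; @ref is omitted):

```xml
<!-- Illustrative only: no space between the two <choice> elements,
     but a space after the forename inside <expan> -->
<closer><signed><persName><choice><abbr>H</abbr><expan>Harriet </expan></choice><choice><abbr>F</abbr><expan>Finch</expan></choice></persName></signed></closer>
```

The diplomatic view then reads HF, while the normalised view reads Harriet Finch.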
3.3.3. Salutations in opener and closer: <salute>
Salutations concern references to the addressee. Those in <opener> or <closer> are enclosed in <salute> with the attribute @n, value "opening" or "closing" as appropriate.
NB. For salutations inside <p>, where <salute> is disallowed, see the use of project-specific tags in Segments and Overview of person references.
The attribute @rend does not need to be specified in <salute> (this can be done in <opener>, <closer>, or indeed <lb>; cf. Overview of @rend).
3.3.3.1. <salute n="opening">
<salute n="opening"> sits inside <opener>, as do the various possible elements pertaining to the creation of the document such as <dateline>, <date>, <placeName>, etc. when situated at the top of the letter:
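A minimal sketch of such an opener, with an invented place, date and addressee (@rend and @ref attributes are omitted):

```xml
<!-- Invented opener: dateline and opening salutation together -->
<lb/><opener><dateline><placeName>London</placeName> <date when="1786-06-10">Tenth June 1786</date></dateline>
<lb/><salute n="opening">My dear <persName>Miss Hamilton</persName>,</salute></opener>
```

Note that ‘dear’ stays outside <persName>, and the final comma sits inside <salute> but outside <persName>.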
3.3.3.2. <salute n="closing">
- The closing <salute> wraps text at the start of <closer> that does not go inside other elements such as <seg type="closeSalute"> or <seg type="closeFormula">. It also contains all subsequent elements in <closer>, including <seg type="closeFormula">, apart from <dateline>, <note> and <signed>.
- ‘Adieu’, which normally stays near the end of <p>, only goes inside <salute> if it comes between or after other elements that go inside <closer>: <lb/>[…] you there, or not.</p> <closer rend="inline"><salute n="closing">Believe me I ever shall be <lb/><seg type="closeFormula"><choice><abbr>yr.</abbr><expan>your</expan></choice> sincere admirer and affectionate Friend.</seg> <lb/>Adieu.</salute></closer>
- The following phrases (and variations thereof), often the first words of <closer>, should be included in <salute>: ‘I have the honour to be …’, ‘I sign myself …’, ‘I remain …’, ‘I believe / Believe me (to be) …’.
- Frequently, these phrases are followed by a call to the reader (usually with the pronoun you, sometimes through the use of an imperative), followed by a verb. Both are to be included in the <salute> element. Verbs include: beg, believe, assure, convince, know, continue, oblige. In the following example, <closer> and <salute> start with ‘Believe’: <closer><salute n="closing">Believe me <seg type="closeSalute">my dear <persName>Miss Hamilton</persName>,</seg> <seg type="closeFormula">your most humble servant</seg></salute> <signed><persName>John Dickenson</persName></signed></closer>
- When other verbs are used to call upon the reader before a document is concluded, these are to go inside <salute> (and therefore in <closer>), not in <p>. In the following example, <closer> and <salute n="closing"> start with ‘you’: <lb/>[…] as long as life remains <lb/>in my fragile frame,</p> <closer rend="inline"><salute n="closing">you never shall want <seg type="author">a <lb/>sincere Friend & a tender Brother,</seg> in <lb/><seg type="closeFormula">Your ever sincerely affectionate</seg></salute> <signed><persName>Palemon</persName></signed></closer>
- Modifiers of verbs and noun phrases should always be included in the salutation, e.g. ‘truly’ in ‘truly believe me that …’ or ‘with the …’ in ‘with the assurance of my esteem & affection, believe me …’.
- When the verb conclude is used in the sense that the author is about/going to conclude the letter, this should not be taken as the start of the salutation and should remain inside <p>, hence outside <closer>.
- Prepositions, relativisers, etc., with the exception of the coordinating conjunction and, should not be included in <salute> or indeed <closer>.
3.3.4. Datelines: <dateline>
Only use this element for dates and/or place names written by the author at the start or end of a letter or of a <div>. Include it as part of either <opener> (except in a continuation <div>, see Openers above) or <closer>:
NB <dateline> is also to be used in the absence of <date> and <placeName> elements if information regarding the creation of the letter is still present, e.g. when only the day of the week is mentioned.
If a <dateline> is present for authorial text, any annotation by a later (minor) hand that adds to or corrects the date or place of sending goes inside it. See e.g. HAM/1/5/2/2:
But if all the metadata concerning a date or place-name of sending is written by a minor hand, there will be no <dateline>. The annotation(s) will contain a <date> and/or <placeName> and will not be part of <opener> or <closer>. See e.g. GEO/ADD/3/82/10:
If <dateline> appears at the end of a letter, it should be included in the <closer>, normally after <salute> and <signed>. If there is a <postscript> intervening between <signed> and <dateline>, wrap <dateline> in a second <closer> element:
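A minimal sketch of the second case, with invented text (@ref attributes omitted):

```xml
<!-- Invented example: a postscript intervenes, so the dateline gets its own closer -->
<lb/><closer><salute n="closing"><seg type="closeFormula">Your ever affectionate</seg></salute>
<lb/><signed><persName>John Dickenson</persName></signed></closer>
<lb/><postscript><p>Pray write soon.</p></postscript>
<lb/><closer><dateline><placeName>London</placeName> <date when="1786-06-10">Tenth June 1786</date></dateline></closer>
```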
3.3.5. Dates: <date>
This element is to be used for the date only, and not for other parts of the dateline or annotation such as time of day (e.g. ‘8 o’clock’) or the name of the day (e.g. ‘Tuesday’). If the date is split by ‘received’ or a time, use separate <date> tags for the fragments. The name of the day can be left inside <date> if adjacent to the numbers but doesn’t require a <date> element if not.
An exception has been made in diaries for <div>s that would otherwise contain no date, if the day can be dated. And in other cases, a date without textual content has been inserted. Both uses of <date> are to ensure that persons mentioned in diaries are only counted once per day for the purposes of social network analysis.
Typically, the month and day fragment will receive a full @when value ("yyyy-mm-dd"), while the year fragment just gets the year ("yyyy"). If the year was clearly added after the creation of the letter (from evidence of continuity, size, position or ink), then it should be placed in a separate <note[@hand]> inside <date>.
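As a sketch of a date split by a time of day, with invented content:

```xml
<!-- Invented dateline: the time of day splits the date into two fragments -->
<lb/><opener><dateline><date when="1786-06-10">June 10th</date> 8 o'clock <date when="1786">1786</date></dateline></opener>
```

The month-and-day fragment carries the full @when value, the year fragment only the year, and the time of day stays outside both <date> elements.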
NB. When a date is given as ‘The […] of […] etc.’, please include ‘The’ inside <date>.
NB. In our project, <date> has paired opening and closing tags around textual content, so the following attribute-only structure is not used: <date when="[…]"/>.
3.3.6. Postscripts: <postscript>
TEI requires text inside <postscript> to be embedded in <p>, e.g.:
Either <postscript> or the preceding <lb> can have an optional @rend. If there is a <label> for explicit use by author of ‘P.S.’ or ‘Postscriptum’, etc. and the text of the postscript continues on the same line, <p> requires rend="inline", whether or not there is an earlier @rend in <lb> or <postscript>, e.g. GEO/ADD/3/82/69:
NB. Remember to leave a space between the closing tag </label> and the opening tag <p rend="inline">.
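Two minimal sketches, with invented text rather than the content of the item cited above: a postscript without a label, and one with a label where the text continues on the same line.

```xml
<!-- Invented examples -->
<lb/><postscript><p>I forgot to say that […]</p></postscript>

<lb/><postscript><label>P.S.</label> <p rend="inline">Pray remember me to […]</p></postscript>
```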
If the writer adds an afterthought to a letter without an explicit ‘P.S.’, use <postscript> regardless, but without <label>.
A postscript normally contains only <p>, but when a writer uses words and phrases that need wrapping in elements normally reserved for <closer> (e.g. <signed>, <salute n="closing">, <seg type="closeFormula">), please add a <closer> element to contain them:
For information on using the elements <anchor> and <ref> with postscripts that need to be moved, see Moved text. If a moved postscript contains elements other than <p>, e.g. <label>, <closer>, <signed>, then <ref> must be repeated inside each such element, as in the example above.
NB. The TEI Guidelines disallow <postscript> as first element in <body> or <div>. If a fragmentary letter consists solely of a postscript, we precede it with a blank line containing an empty <p/>.
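A sketch of this workaround, assuming the fragment is coded as a single <div type="letter">:

```xml
<!-- Sketch: the empty <p/> stops <postscript> from being the first element -->
<body>
<div type="letter">

<p/>
<lb/><postscript><p>[…]</p></postscript>
</div>
</body>
```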
3.4. Other content-based mark-up
3.4.1. Names
The elements <persName>, <placeName> and <rs> are used to tag names.
Personal names are stored in a personography and indexed by means of an xml:id unique to each individual. Place-names, though tagged, are not similarly indexed in this project. Spellings of personal names are not normalised to a standard form, nor are most place-names (but see Spelling in names below), so the differences between diplomatic and normalised transcriptions will concern abbreviated and suppressed names.
An abbreviated name should be tagged with <abbr>. If the full form is known, it is marked up in a <choice> element, as detailed in Expansion of abbreviations below:
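Following the pattern used in the frank-signatory example earlier in this document, a sketch of both cases. The second name is an invented placeholder for an abbreviation whose full form is not known:

```xml
<!-- Full form known: <abbr> and <expan> inside <choice> -->
<persName><choice><abbr>J.</abbr><expan>John</expan></choice> Dickenson</persName>

<!-- Full form not known (invented): <abbr> alone -->
<persName>Mr <abbr>D.</abbr></persName>
```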
If a name is partially suppressed by means of a horizontal line, use two em-dashes for the line and surround the whole form with <abbr>:
The suppressed name, like an abbreviated one, should be expanded if known:
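A sketch of a partially suppressed name, first unexpanded and then expanded. The name is an invented placeholder, and it is an assumption here that ‘the whole form’ means the initial plus the line:

```xml
<!-- Invented name: initial plus two em-dashes for the suppressing line -->
<persName>Lady <abbr>B——</abbr></persName>

<!-- The same, once the suppressed name is known -->
<persName>Lady <choice><abbr>B——</abbr><expan>Blank</expan></choice></persName>
```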
If a name is represented solely as a horizontal line, use two em-dashes wrapped in <abbr>, even though complete suppression of a name is not, strictly speaking, abbreviation. However, if we know or discover what name has been suppressed, the <abbr> can be wrapped in <choice> and the full form given in an <expan> element. Thus ‘a Mr– —— who's name I dont at this moment recollect’ would be coded:
NB. Do not include final punctuation in a name element. Characters under a superscript in an abbreviation are obviously not excluded.
3.4.1.1. Names of people
Use the element <persName> for ‘a proper noun or proper-noun phrase referring to a person, possibly including one or more of the person's forenames, surnames, honorifics, added names, etc.’ (TEI Guidelines).
Use <rs> (referencing string) for a phrase that is not a proper name but nevertheless refers to an individual or group of individuals.
A very few items with unique reference that lack a personal name, such as ‘the King’, ‘the Queen’, nevertheless take <persName> (with the article included).
Where the reference is far more contextually determined, such as ‘his wife’, ‘my cousin’, use <rs> to wrap the whole noun phrase, including any determiners (the, my, etc.) and even other <persName> or <rs> elements. Therefore ‘Lady Hamilton's Brother’ and ‘the Sister of my Dearest Husband’ would be coded as follows (apart from ref="psn:XYZ" attributes on both <rs> and <persName>, omitted for brevity in this and other examples):
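A sketch of the two phrases, applying the rules in this section (with @ref omitted, and with <persName> closed before the possessive):

```xml
<!-- <rs> wraps the whole noun phrase, including an embedded <persName> or <rs> -->
<rs><persName>Lady Hamilton</persName>'s Brother</rs>
<rs>the Sister of <rs>my Dearest Husband</rs></rs>
```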
Where a group of individuals is mentioned, tag each one separately wherever possible.
Family groups like ‘the Glovers’, ‘the Miss Mores’ are to be tagged as <persName>. A preceding the (which is nearly always present) is part of the proper name and so is included in <persName>, as is the plural marker -s.
If, say, several forenames or forenames + surname are given as initials, expand each one separately.
Close a <persName> or <rs> element before these items:
- possessive s (+ apostrophe if present):
<persName>Lady Hamilton</persName>s letters
<rs>the young Lady</rs>'s Spirit was hurt at this Refusal
- final punctuation: Pray give my love to <persName>Miſs Burney</persName>, & tell her […]
Open a <persName> element after these text items — they remain outside the tag:
- ‘dear’
- noun phrases in apposition to the name (including familial relations such as ‘my cousin’, ‘his Uncle’):
The Housekeeper <persName>Betty</persName> has a notion of old English hospitality
[…] my Cousin <persName>Robert Greville</persName> came […]
- but unequivocal use of ‘Aunt’, ‘Uncle’ or ‘Cousin’ as a quasi-title, usually with initial capital, does go inside <persName>, just like ‘General’, ‘Bishop’, etc.:
[My best Compliments to …] <persName>Uncle Frederick</persName>
<lb/>[…] to send some <lb/>little token of remembrance to my sweet <persName>Louisa</persName>, <lb/>to keep for the sake of <persName>Aunt Ippa</persName>
<persName>Bishop More</persName>s Library of Books
A reference to the addressee in the main body of the text (neither opener nor closer) is tagged with <seg type="bodySalute">.
- If it contains a name, e.g. ‘my dear Mary’, then that name is tagged with <persName>: […] and so, <seg type="bodySalute">my dear <persName>Mary</persName>,</seg> I continue to […]
- If not, e.g. ‘dear Sister’, ‘my dear friend’, the whole string (less any final punctuation) is tagged with <rs> inside the <seg>: […] and so, <seg type="bodySalute"><rs>my dear friend</rs>,</seg> I continue to […]
NB. Do not tag a name with <persName> if it is
- part of a building name, e.g. ‘The Queen’s Lodge/House’ or ‘St James Palace’
- (part of) the title of a piece of literature, e.g. ‘Sidney Biddulph’ or ‘Julia’; instead, mark up that section with <seg type="rp">.
3.4.1.2. Names of places
Tag all place-names (but not street or house names) with <placeName>, including ‘Town’ with capital T. If in doubt whether a name refers to a house or estate or village, use <placeName>.
3.4.1.3. Spelling variation in names
Spelling variation in place-names is likewise neither modernised nor made homogeneous. If a modern place is unrecognisable in its old form, an explanatory footnote may be offered. We are not currently planning a place-ography. If future research requires systematic mapping, an id or coordinates could be added to <placeName> in the same way that @ref in <persName> and <rs> (and also @scribeRef in <handNote>) correlate different forms of personal reference via the person's xml:id.
Nor, incidentally, do we have an eventography. Therefore we don't code events with a tag. If a particular event is relevant for the content of the document or for the research strand on Reading Practices, add a footnote about it. If the date of the event is relevant and is known, code <date> in the footnote but not in the text.
3.4.2. Passages marked for research analysis: <seg>
The element <seg> in TEI is an arbitrary segment, a general-purpose tag for stretches of text. In this project, its attribute @type can carry one of seven values: "para" | "rp" | "openSalute" | "bodySalute" | "closeSalute" | "closeFormula" | "author". All but the first are used to mark up text for analysis in the research strands of the project.
- type="para" inside <summary> in the TEI header represents a paragraph.
- This simply allows paragraph breaks in the library catalogue to be taken over into our metadata (following TEI Project Co-ordinator 16/1/20).
- type="rp" marks passages relevant to reading practices.
- See also Quotations below.
- The reading practices research strand investigates the reading, writing and circulation of texts among participants in the Papers, including printed and manuscript material, private letters and other textual media. For the purposes of this research strand, ‘prints’ (i.e. printed images accompanied by textual content, such as engravings) are considered texts.
- NB. <seg> cannot cross a paragraph <p> boundary, so a long passage relevant to reading practices must be broken into separate <seg>s if it extends beyond the end of a paragraph.
- type="openSalute | bodySalute | closeSalute" marks various kinds of reference to the addressee.
- They belong in <opener>, <p> and <closer>, respectively.
- These and the remaining values of @type are part of the mark-up used in the research strand on sociolinguistics and letter-writing to capture non-structural features that lack a standard tag in default TEI mark-up.
- See Salutations and Overview of person references.
- type="closeFormula" marks formulaic references to the author in <closer>, such as ‘yours (most) affectionately’, ‘ever your most humble servant/friend’, etc.
- See Referring to the author for more detail on this segment type.
- type="author" marks the occasional self-reference to the author of a text, e.g. when John Dickenson refers to himself as ‘your dear Husband’.
- This kind of self-reference is generally found in <p>, though there are a handful of examples in <closer>.
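To illustrate the NB on type="rp" in the list above, a passage about reading that runs over a paragraph break might be split as follows (invented text):

```xml
<!-- Invented example: one <seg> per paragraph, since <seg> cannot cross </p> -->
<lb/><p>[…] <seg type="rp">I have begun the book you lent me […]</seg></p>
<lb/><p><seg type="rp">and shall return it as soon as I have finished,</seg> but […]</p>
```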
All references to persons — and there is always one at least — inside the <seg> types "openSalute | bodySalute | closeSalute" (as well as <salute n="opening">) require a <persName> or <rs> tag, the @ref attribute pointing to an addressee's xml:id. Likewise type "author", in this case with @ref pointing to an author's xml:id. Final punctuation goes inside the <seg> or <salute> element, but outside <persName> | <rs>.
NB. If <foreign> and/or <emph> are needed inside <seg>, the preferred relative order from outside to inside would be <seg><rs><foreign><emph>.
3.4.3. Direct speech: <q>
Use <q> to represent direct speech by a real person, as when a letter-writer quotes someone they met and spoke to. If there are quotation marks in the original, reproduce them inside the <q> element. Indirect speech may also be signalled by <q> if the writer uses quotation marks.
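A minimal sketch with invented text, in which the writer's own quotation marks are kept inside the element:

```xml
<!-- Invented example of reported direct speech -->
<lb/><p>[…] he said to me <q>“I shall call upon you tomorrow”</q> and then took his leave.</p>
```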
3.4.4. Quotations: <quote>
Use <quote> where a writer quotes from another letter or diary, and reproduce their (non)use of quotation marks inside the <quote> element (see Quotation marks below). The passage containing the quotation should be tagged with <seg type="rp">.
NB. We use <quote> (with final -e) to wrap and thereby identify a passage of quotation. We do not use the entity &quot; to represent the quotation mark character.
The passage containing such a quotation should obviously be tagged with <seg type="rp"> as well. By default, <seg> and <cit> are co-extensive (<seg> wrapping <cit>), but <seg> can be extended to left (as in the example above) and/or right of the citation if there is adjacent text that belongs inside it.
The bibliographic citation is not part of the author's text and only appears as a tooltip when the quotation is moused over. For bibliographic citations that do appear in the text display, almost always in editorial footnotes, see Bibliographic citations below.
3.4.5. Verse line: <l>
The tag serves to distinguish verse from prose, and it exempts the part-of-speech tagger from operating where register, style, spelling and often date can be quite different from the surrounding context. There is no need to normalise in verse lines.
3.4.6. Foreign words and text: <foreign>
Languages other than English are tagged <foreign>, with the attribute @xml:lang set to the appropriate language, most notably French. Common values are "fr" | "it" | "la". We have needed "pt" [Portuguese] and "nl" a handful of times. To determine what counts as French and what English at the time of writing, please first consult the list of French words and phrases in the GH repository and follow that. If the word/phrase is not mentioned in that list, check the OED and consider the context of the letter, as well as the frequency with which this particular word/phrase occurs in all transcriptions up until this point. After this, a suggested classification (i.e. either English or French) should be put to the team before being entered in the list of French words and phrases in GH, so that future instances can be tagged — or not — consistently.
When a word or phrase originally borrowed from French is considered to be part of English at the time of writing (e.g. Adieu), leave it unmarked for <foreign>. However, when used in a French context, include in <foreign> with the rest of the line/text in French:
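A sketch of the contrast, with invented text:

```xml
<!-- Invented examples: 'Adieu' in an English context is left unmarked -->
<lb/><p>[…] and so Adieu till tomorrow.</p>

<!-- In a French context it goes inside <foreign> with the rest of the French -->
<lb/><p>[…] <foreign xml:lang="fr">Adieu, je vous embrasse de tout mon cœur</foreign> […]</p>
```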
3.4.6.1. Accents and spelling in French
We present French as written, without correction or modernisation.
3.4.7. Indications by author for recipient: <fw>
The element <fw> is normally used in TEI for representing running headers, catchwords, etc. in printed books. We use it for authorial annotations for the benefit of a contemporary reader, especially the recipient of a letter, such as catchwords, headers explaining what follows, pagination, stanza numbers, and ‘PTO’ instructions. They should not form part of the linguistic corpus in the normalised text. Some examples follow:
- True catchword inserted below the line at bottom right (HAM/1/4/4/1): <lb rend="align-right"/><fw type="catch" place="bot-right">Country</fw>
- Probable catchword at end of last line on page (HAM/1/4/4/9): <lb/>[…] for this <fw type="catch" place="inline">picture</fw>
NB. Such inline cases require a decision: was it an intentional catchword, or was the repetition overleaf a mistake? — in which case tag the second occurrence as <surplus> instead of the first as <fw>.
- Where the author has numbered the pages or otherwise indicated order of reading, or given some other explanatory heading (GEO/ADD/3/82/14, 24 and 69): <lb rend="center"/><fw type="pageNum" place="top-center">2<hi rend="superscript">d.</hi>.</fw> <lb rend="center"/><fw type="head" place="top-center">Continue on P.S.</fw> <lb/><opener><dateline><fw type="head" place="top-left">Tuesday Night in answer to Monday Night.</fw></dateline>
- Where the author has numbered the stanzas of a poem (GEO/ADD/3/82/37): <lb rend="center"/><fw type="stanzaNum" place="center">2<hi rend="superscript">d.</hi>.</fw>
- Where the author has given an equivalent of modern P.T.O. (HAM/1/4/7/5): <lb rend="align-right"/><fw type="turnPrompt" place="bot-right">Turn</fw>
NB. The attribute @place in <fw> is purely for information in our mark-up and has no effect on display. Therefore any left-right positioning other than left-aligned or inline requires a @rend attribute in <lb>.
3.5. Manuscript features
3.5.1. Underlining: <emph>
By default, underlining is tagged as <emph rend="underlined">.
When several words are underlined individually, each should be tagged separately, whereas only one tag is necessary if the same continuous line underlines several words. Compare the examples from GEO/83/82/11 and GEO/3/82/13 below:
If underlining extends over two lines, use separate tags for each line. For example, in GEO/3/82/43:
Leave any final punctuation outside the tag, but a continuously underlined passage may include non-final punctuation.
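The three situations above might be sketched as follows, with invented text rather than the content of the items cited:

```xml
<!-- Words underlined individually: one tag each -->
<lb/>[…] <emph rend="underlined">never</emph> <emph rend="underlined">again</emph> […]

<!-- One continuous line under several words: a single tag -->
<lb/>[…] <emph rend="underlined">never again</emph> […]

<!-- Underlining over two lines: one tag per line, final punctuation outside -->
<lb/>[…] <emph rend="underlined">I shall never</emph>
<lb/><emph rend="underlined">forgive him</emph>, […]
```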
NB. Use more specific values for @rend where appropriate ("double-underlined" | "dotted-underlined" | etc.). Although the first part of the value is ignored at present, the last part will still trigger single underlining in our display script, and the more precise mark-up may be helpful to a future researcher.
We have (almost) stopped using <hi rend="underlined">, retaining it in principle only for underlining that couldn't possibly be emphatic. Even if a writer such as CMG consistently underlines foreign words, we can't rule out some degree of emphasis, so @rend goes in <emph> rather than being an attribute of <foreign> (as formerly in the project) or <hi>. Our default in such cases is <foreign> wrapping <emph>, but we do have examples the other way round (46 to 322).
NB. Underlining occasionally gives a clue as to whether a <foreign> tag is needed, as in this extract from HAM/1/15/1/17, where lecture in the fourth line is clearly not the English word but rather the French lecture ‘reading’, while in the second line, underlining indirectly supports the transcription Drame rather than Drama:
3.5.2. Superscript, subscript, italics, etc.: <hi>
The element <hi> (highlighted) ‘marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made’ (TEI Guidelines). The attribute @rend is obligatory. Use the value "superscript" for characters at the end of an abbreviation that are raised above the baseline. The display script presents them in a smaller and slightly raised font. Much less frequently needed is "subscript".
NB. Superscript is to be distinguished from insertions above the line, which are tagged as <add place="above"> and displayed wholly above x-height; similarly subscript vs. <add place="below">. See Additions.
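A minimal sketch of the distinction, using invented wording: raised final letters of an abbreviation take <hi>, while a word inserted above the line takes <add>.

```xml
<!-- Raised final letter of an abbreviation: superscript -->
<lb/>M<hi rend="superscript">r</hi> Hamilton called this Morning
<!-- Word inserted above the line as an afterthought: addition -->
<lb/>I was <add place="above">much</add> pleased to see him
```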
Use <hi rend="italic"> for linguistic forms, book titles, etc. in footnotes. NB. It is a minor, though not uncommon, tag abuse to use @rend in editorial footnotes rather than to indicate ‘how the element in question was rendered or presented in the source text’ (TEI Guidelines).
Some other available values for @rend such as "larger" | "retraced" are there for completeness of transcription, but if used are not visible in the display.
3.5.3. Hyphens, dashes and lines for suppression
For hyphen (a word-internal feature), use a single hyphen in these contexts:
- Normal hyphen (well-known), whether intra-word on one line or at the end of a line where the word has been split by the author across two lines.
NB. Only use hyphen at end of line (no following space) if the author does so; sometimes they use a double hyphen (=) or nothing at all.
NB. The hyphen may also be repeated before the word fragment at the start of the next line, but contemporary practice is rather variable.
- Horizontal mark under superscript abbreviation (Mrs-)
NB. A single hyphen never has a space to its left.
For dash (a piece of word-external sentence punctuation), use two hyphens with a space on either side of the pair, even at the beginning or end of a line:
A horizontal line used to suppress all or part of a name or an oath is represented by two em-dashes not surrounded by spaces. This is not sentence punctuation, merely suppression (‘blanking’), and our transcription is a visual representation of the manuscript form.
- Wrap the suppressed word in <abbr>: Whilst I was at <persName>M<hi rend="superscript">r</hi> <abbr>——</abbr></persName> […]
- In turn, if the unsuppressed form is known, wrap <abbr> in <choice> with <expan>:
<seg type="bodySalute"><rs> you little <choice><abbr>D——l</abbr><expan>Devil</expan></choice></rs></seg>
the <persName><choice><abbr>K——g</abbr><expan>King</expan></choice></persName> &amp; <rs>my <choice><abbr>B——r</abbr><expan>Brother</expan></choice></rs>
<seg type="author">her <persName><choice><abbr>D——</abbr><expan>Dickenson</expan></choice></persName></seg>
Whether complete suppression of a word can fairly be called abbreviation is open to question, but the use of <abbr> even here allows for expansion if the unsuppressed form is known, as well as serving to distinguish further a line of suppression (which if not tagged would often have been ‘ —— ’) from a dash (‘ — ’): it's important to keep them apart.
3.5.4. Authorial lines and flourishes: <milestone>
We use the <milestone> element to represent some kind of line or flourish added by author or annotator to mark the conclusion of one section, e.g. underneath the dateline or a postscript (to the extent that this can be distinguished from word underlining or dashes). If a simple forward slash ‘/’ or other character is a better representation of such a mark, and when it is inline, use that instead (it’s about the visual representation, not the function of the element <milestone>).
The attributes and their possible values are as follows:
- value of @style: "20% align-left | 20% center | 20% align-right | 50% center | 100%".
- A crude indication of width and position.
- value of @type: "line".
- Only "line" at present, even when the actual mark is diagonal, or distinctly bowed, or with curlicues.
- value of @unit: "address | annotation | date | dateline | entry | letter_body | pageNum | paragraph | postscript | salutation | section | signature | time".
- The part of the document that is closed off by the line or flourish.
When a <milestone> is used to mark off a full element (e.g. opener, salute, dateline, note, etc.), place it inside that element, before the closing tag:
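A sketch of this placement, assuming an annotation closed off by a short flourish. The hand identifier "#h2" and the annotation text are hypothetical.

```xml
<!-- The milestone sits inside the note, just before the closing tag -->
<note hand="#h2">N<hi rend="superscript">o</hi> 12
<milestone type="line" unit="annotation" style="20% align-right"/></note>
```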
The annotation in this example has been rounded off with a flourish. By ensuring that the milestone is marked up as part of the annotation, the line will — as it should — disappear from the normalised display.
3.5.5. Quotation marks
Curly quotation marks are available in oXygen from the Symbols menu (Ω button on toolbar and search for quotation in the character map). Recently used symbols remain in the menu for quick re-use.
3.5.5.1. Double
Represent quotation marks in the original by curly double quotation marks (“…”). Quotation marks don't always come in pairs: it was good practice at the time to repeat opening quotes at the start of each line, with closing quotes used just once at the end of the quotation:
NB. For coding attribute values in TEI mark-up, only straight double quotation marks ("…") are permitted.
NB. A very few of our authors do sometimes use single quotation marks, e.g. Mrs Sarah Hamilton in HAM/1/3/1/1. In such cases use curly single quotes.
3.5.5.2. Single
NB. Single curly quotation marks (‘…’) should be used in footnotes for glosses (meanings) and when quoting a passage:
3.5.6. Apostrophes
We distinguish apostrophes from single quotation marks. Always use the straight apostrophe ('), both in contexts where apostrophe is still used (possessives, shortened auxiliary verbs, etc.) and in contexts no longer standard:
NB. The last two examples here would of course be normalised to <reg>distressed</reg> and <reg>would</reg>, respectively.
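The normalisation mentioned in this NB might be encoded along the following lines. The surrounding words are invented, and the apostrophe throughout is the straight keyboard character.

```xml
<!-- Apostrophe in a context where it is still standard: no tagging needed -->
<lb/>my Sister's letter
<!-- Apostrophe in contexts no longer standard: regularised -->
<lb/>I was much <choice><orig>distress'd</orig><reg>distressed</reg></choice>
<lb/>I <choice><orig>wou'd</orig><reg>would</reg></choice> not go
```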
3.5.7. Parentheses
When a pair of characters is used to surround a parenthesis, always use (…) even if both opening and closing brackets look more like straight /…/.
When a bracket-like character seems to be used singly as some sort of punctuation mark (and not merely careless omission of the closing bracket of a parenthesis), code it with whatever character comes closest to it visually, whether /, ( or [.
3.5.8. Ampersands, etc.
A variety of single-character squiggles are used by writers to abbreviate the conjunction and. We represent any of them by an ampersand. However, the character ‘&’ has a special value in XML and must be entered as the XML entity ‘&amp;’ in the text stream.
Other internal entities include angle brackets (aka the less-than and greater-than symbols), used all the time in mark-up but rarely needed in text. If needed, they are to be coded as ‘&lt;’ and ‘&gt;’, respectively. As for straight apostrophe, we use the normal keyboard character ' rather than the entity ‘&apos;’, apparently without ill effects.
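As a brief illustration with invented text, this is how the characters appear in the XML source:

```xml
<!-- Ampersand in running text is entered as an entity -->
<lb/>M<hi rend="superscript">r</hi> &amp; M<hi rend="superscript">rs</hi> Hamilton
<!-- Angle brackets as text characters, should they ever be needed -->
<lb/>&lt; &gt;
<!-- Straight apostrophe typed directly from the keyboard -->
<lb/>my Brother's House
```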
3.5.9. Word spacing
Spacing can be idiosyncratic, and we often give writers the benefit of the doubt or even silently modernise an apparent missing space, e.g. alittle Friend > a little Friend. However, wrong intra-word spacing, e.g. a lone for ‘alone’, is corrected with <sic> + <corr> to produce alone in the normalised edition, except for instances that would have been considered appropriate at the time (e.g. to day, my self, etc.), in which case we don't intervene.
For the special case of a word broken across a line boundary (i.e. hyphenation, with or without an explicit hyphen), see Word separation at line-end below. For initials and acronyms written without spaces, see Expansion of abbreviations.
3.5.9.1. Horizontal spacing between items: <space>
The element <space> ‘indicates the location of a significant space in the text’ (TEI Guidelines). We use it as a convenient way of capturing the layout of a manuscript by displaying extra space between items on the same line. There are two attributes, @unit, with the fixed value "chars", and @quantity, value any number from, say, 3 to 60. Leave one ordinary space to the left of <space>:
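A sketch of a line with extra horizontal space between two items. The place, date and quantity are invented for illustration.

```xml
<!-- One ordinary space precedes <space>; @unit is always "chars" -->
<lb/>Bath <space unit="chars" quantity="20"/>12 June
```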
3.5.10. Additions: <add>
The element <add> (addition) ‘contains letters, words, or phrases inserted in the source text by an author’ (adapted from TEI Guidelines). The obligatory attribute @place indicates where the text to be added comes from in the original document:
- place="above" or "below" [the line]
- place="margin-left" or "margin-right"
when the addition is roughly level with the line
- place="inline"
when spacing, other edits or ink make clear that something was squeezed in on the line as an afterthought, or as part of a substitution
- place="center | top-left | top-center | top-right | bot-left | bot-center | bot-right"
when the addition comes from a location distant from the line where it is to be inserted
The values "above" and "below" of @place affect the display visibly, while all other values suggest that the added text can be inserted at the default level of the line.
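The values of @place listed above might be used as in this sketch, with invented wording. Only the first addition would be raised in the display.

```xml
<!-- Displayed above the line -->
<lb/>we dined <add place="above">early</add> at home
<!-- Written in the left margin, level with the line -->
<lb/>we dined <add place="margin-left">early</add> at home
<!-- Squeezed in on the line as an afterthought -->
<lb/>we dined <add place="inline">early</add> at home
<!-- Written at the foot of the page, far from the insertion point -->
<lb/>we dined <add place="bot-left">early</add> at home
```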
We no longer use the optional attribute @hand.
3.5.10.1. Addition vs. annotation
Sometimes an author adds text not just above/below/on the line or level with the line in the margin, but somewhere else on the page instead. If paired symbols (asterisks, crosses, etc.) are used by the author to indicate where the added text is to be inserted, we can use <add> at the indicated destination, with <ref type="moved"> inside the <add> and an <anchor> at the actual site (see Moved text), plus a short footnote reading ‘Moved added text here from [e.g. bottom of page], the insertion point indicated by paired [crosses/asterisks, etc.]’. Omit the second clause if symbols aren't used; there is no need to transcribe the symbols if they are.
However, an alternative coding sees such material not as an addition to be inserted in the text but as an explanatory, supplementary annotation made some time after the original time of writing. Making the addition~annotation distinction by means of chronology is notoriously uncertain and impractical. The crucial distinction is whether inserting the material in the text would disrupt it, especially to the point of making it ungrammatical. That would be an argument in favour of annotation. Another pointer in the same direction would be material placed at the foot of the page when there would have been room to insert it above the line. In any event, this is a judgement call to be made on a case-by-case basis.
If the annotator, typically MH, is returning to a diary or (copy of/draft for) one of her own letters, then the choice of annotation would require MH to be entered twice in <handDesc>, once as major hand = author and again as minor hand = writer of annotation. See also Annotations below.
Another situation calling for the author to be entered as a minor hand is where the annotation is actually earlier than the main text, because the sheet has been re-used, as in the unused, upside-down datelines for 6-8 December 1781 on HAM/2/4 p.62, a sheet now part of a sequence of entries for September 1783.
If the writer of the extra material is not the author, then there is normally no choice: it is an annotation by a minor hand. There is one possible exception where <add> may in fact be used: see Annotations on manuscript and Supplied text below.
3.5.10.2. Addition vs. continuation
The definition given above in Addition makes clear that <add> is intended to mark afterthoughts inserted in a stretch of text already written. Such additions are to be distinguished from the situation where an author is running out of space at the foot of the page and just continues writing in some available bit of white space; a common expedient is to continue the last few words up the right margin. Such continuations should not be wrapped in <add>. Instead begin a new line in the transcription, with a footnote at the end of the continuation to explain, say, that the last eight words are written vertically in the right margin (or as appropriate). If the location of the continuation is distant from the end of the conventionally written material, we again continue on a new line, this time with the continuation wrapped in a <ref> whose @target points to an <anchor> tag at the original location of the material; for details see Moved text below.
3.5.11. Deletions: <del>
This element is used for text (anything from one character to a long passage) deleted by the author of the document, or otherwise indicated by them as being superfluous or spurious. The deletion is visible in the original.
NB. Not to be used for deletions made by annotators, editors or encoders; see further below.
The obligatory @rend attribute most often has the value "cancelled" (i.e. crossed out), "erased" or "overwritten". You can use up to two values together, including the fourth possible value, "censored". This last can only be used in the relatively infrequent event that the author has censored their own text. It is much more common for an annotator to censor text (dealt with in Censorship below). Because censorship by another is not coded as <del>, <del> does not need an attribute @hand.
3.5.11.1. Substitutions: <subst>
The element <subst> (substitution) groups ‘one or more deletions […] with one or more additions when the combination is to be regarded as a single intervention’ by the writer of the original text (adapted from definition in TEI Guidelines). The <subst> element contains one or more <del> elements and one or more <add>s, in any order, but without any other text intervening, apart from spaces if needed. The most common case is one deletion paired with one addition. The order is <del><add> if the new text is on top of or after the deleted matter:
If the new text appears to the left of the deletion, the order is reversed accordingly:
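Both orders described above can be sketched as follows. The words are invented, and the attribute values are only one plausible choice among those the schema permits.

```xml
<!-- New text written on top of the deleted text: <del> then <add> -->
<lb/>She <subst><del rend="overwritten">was</del><add place="inline">is</add></subst> very well
<!-- New text to the left of the deletion: <add> then <del>,
     with only a space between them -->
<lb/>I <subst><add place="inline">shall</add> <del rend="cancelled">will</del></subst> write soon
```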
More complex cases with interwoven <del>s and <add>s are permitted by the schema, may be fully intelligible as XML code and should produce the correct transcriptions, though the tooltip(s) offered on our website are unlikely to be wholly adequate.
3.5.11.2. <del> + <add> ≠ <subst>
As noted above, <subst> cannot contain any text that is not wrapped in either <del> or <add>. It cannot therefore be used for a deletion + related addition which leaves intervening text intact, as in an author's decision to delete the word only from the beginning of a clause and insert it at the end, or to turn to into at while retaining the t. Without using linking or grouping mechanisms, such revisions must be coded as two unrelated changes:
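The two revisions just described might be coded as in this sketch. The surrounding words and the attribute values are assumptions for illustration.

```xml
<!-- "only" deleted at the start of the clause and added at the end:
     two separate changes, with the intervening text left untagged -->
<lb/><del rend="cancelled">only</del> for a few days <add place="inline">only</add>
<!-- "to" turned into "at", keeping the t -->
<lb/><add place="inline">a</add>t<del rend="cancelled">o</del>
```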
3.5.12. Censorship: @reason="censored"
‘Censorship’ as used here is the deliberate obscuration of information, most often personal names, especially by MH herself in the papers she archived, whether in her own diaries, in copies of letters sent, or in letters she received. Sometimes it involves place-names, notably in her obsessive scrubbing-out of ‘(N)orthampton(shire)’ in letters received from William Napier, where the place is hardly a secret. Sometimes it involves whole passages.
When the censor is the original author, even if carrying out the deletions possibly long after the original composition, we usually ignore the passage of time and use <del rend="… censored">.
Where the censor is an annotator, deletions are not coded by <del>, which is an authorial action. Instead they must be coded as either <gap> or <unclear> (both covered below under No transcription and Cautious transcription, respectively), with "censored" included in the values of @reason:
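A minimal sketch of both options for censorship by an annotator. The extent and the choice of accompanying values are illustrative.

```xml
<!-- Censored text that cannot be recovered: <gap>, not <del> -->
<lb/>I saw <gap reason="censored illegible" extent="1 word"/> yesterday
<!-- Censored text that can still be read with some confidence -->
<lb/>I return to <unclear reason="censored">Northampton</unclear> next week
```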
Censorship by an author of their own earlier writing raises the same questions as authorial insertions, discussed in Addition vs. annotation above. In those few documents where a set of insertions and deletions by the author can be safely distinguished from the original writing, and where we choose to treat such insertions as annotations by a minor hand (using a separate <handNote> entry in <handDesc>), then the author-censor's later ‘deletions’ are likewise those of an annotator, so <del> is not used.
3.5.13. Change of hand: <handShift>
The element <handShift> is used sparingly. It has the obligatory attribute @new, whose value indicates the hand and only indirectly the writer. It is only used when multiple hands have contributed to the same (section of) text, e.g. when a signatory does not write the rest of the letter, or when several people contribute short passages to the same letter.
However, if the contributions of different hands can be neatly divided among separate letters within the same document, or at least into cleanly demarcated sections such as letter content vs. address page, use <div> with the attribute @hand rather than <handShift>, e.g.
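A sketch of hands divided between sections, assuming hypothetical hand identifiers "#h1" and "#h2" and hypothetical @type values. The text is invented.

```xml
<!-- Letter content in one hand, address page in another -->
<div type="letter" hand="#h1">
   <p><lb/>I hope this finds you well</p>
</div>
<div type="address" hand="#h2">
   <p><lb/>Miss Hamilton</p>
</div>
```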
The display scripts do not show a change of hand, whether coded by <handShift> or by <div>, so it is helpful to the reader to insert a footnote after the <pb> tag if the change of hand coincides with a page break, or after the <handShift> or <div> tag if mid-page:
If a hand adds anything after the letter was sent, that is an annotation and clearly distinguished on-screen (see just below), so neither a <handShift> tag nor a footnote is needed.
3.5.14. Annotations on manuscript: <note>[@hand]
We use the term annotation for anything written on a letter after its sending, including notes by the recipient, by members of the Hamilton/Anson family curating the collection, and by archivists employed in a library or archive. Anything written after sending by a different hand from the author(s) of the letter must be an annotation.
NB. Although franked addresses are written by another hand at a different time from the letter itself, they must at least be written before sending and so are not classed as annotations. See Frank signatories above.
With drafts or copies of out-letters, the position is less clear, and with diaries there is no ‘sending’ (though in fact MH often reports sending portions of her diary to friends). Text in hands other than the (current) main hand must be an annotation, but text in the main (author's) hand may be an addition or an annotation. For considerations we can bring to bear on the choice, see Addition vs. annotation above.
An annotation is coded as <note> with the attribute @hand, preferably given as the first attribute for visual checking purposes, to distinguish an annotation from a footnote, which is <note> with attribute @resp. (Order of attributes is actually immaterial in TEI/XML.) The attribute @place indicates a rough placement on the page but has no effect on the display. An optional attribute @rend can be used to adjust L-R placement on the line, though increasingly such a @rend attribute is placed in <lb> instead (see Overview of @rend below).
The display scripts display annotations in the diplomatic transcription in a different font and colour from those of the main text, with a mouseover tooltip attributing the hand, if known. Annotations in letters that concern the letter as a whole, e.g. archivists' numbers or shelfmarks, are often moved outside the main content if not already at the very top. Annotations are not considered part of the textual corpus and are omitted completely from normalised transcriptions.
If an annotation appears above the line and is to be displayed accordingly, the only mechanism we have for raising text above the line is <add place="above">, and so we have occasionally used <add> on the whole contents of <note[@hand]>; strictly speaking, that is not an addition:
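The workaround might look like this sketch. The hand identifier "#h2" and the wording are hypothetical, and @place on <note> is omitted here for brevity.

```xml
<!-- The whole content of the annotation is wrapped in <add place="above">
     purely to raise it above the line in the display -->
<lb/>dined with my Aunt <note hand="#h2"><add place="above">Lady S.</add></note>
```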
3.5.14.1. Which shelfmarks to transcribe
We have never transcribed the shelfmarks of the John Rylands archivist, Lisa Crawley. We transcribe as annotations all other archival writing, including shelfmarks or references added by other libraries:
3.6. Editorial intervention
Processing by XSLT and other scripts allows a single text stream in our TEI/XML files to generate both a diplomatic and a normalised transcription, whether for on-screen display (.html files) or for plain text (.txt files). In broad outline, this is done in two ways:
- different templates in a script, for example
- to retain long-s ‘ſ’ in diplomatic text files but display it as ‘ʃ’ on-screen, and to normalise it to ‘s’ both in text files and on-screen
- to display author-deleted text in a certain way in the diplomatic outputs but to suppress it completely in the normalised ones
- <choice> elements in the text stream, e.g. for original spelling vs. modern
The diplomatic forms aim to represent as faithfully as practicable the usage of the original manuscript, while the normalised forms may be easier for non-specialists to read but also, and crucially, a more efficient input for automatic part-of-speech (POS) tagging.
3.6.1. Diplomatic vs. normalised: <choice>
A <choice> element always wraps a pair of elements together, with no residue. We use three different pairs, discussed below. The first member is always the one selected for diplomatic outputs, the second for normalised.
NB. There are circumstances where (usually) the second element itself contains an embedded <choice> (see Self-embedded <choice> below).
NB. We don't normalise quoted text, poetry or foreign-language text.
3.6.1.1. Correction: <sic> + <corr>
We tag forms which are clearly errors, or else idiosyncratic spellings not attested at the time, with <sic>, tagging the corrected form with <corr> and wrapping the pair in <choice>. To establish contemporary currency, first check the Excel sheet ‘orig-sic_2021july_checks.xslx’ in GH. If the word is not there, add your decision after consulting the list of Forms in OED, or if necessary doing an OED Advanced search of Quotations for Quotation Text, which allows sorting of hits by date. If the form seems to be current at the time, use <orig> instead.
The vast majority of forms with <sic> will get an adjacent <corr> element, the pair in turn wrapped in <choice>. In the following example, one of a tiny number containing <sic> without <choice>, it is unclear whether uncommonly is a slip for uncommon or whether another word has been left out, and so there is no safe correction:
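The two patterns can be sketched as follows. The first reuses the a lone/alone case discussed under Word spacing; the context around uncommonly is invented.

```xml
<!-- Usual pattern: error plus safe correction, wrapped in <choice> -->
<lb/>I was quite <choice><sic>a lone</sic><corr>alone</corr></choice>
<!-- Rare pattern: no safe correction, so <sic> stands on its own -->
<lb/>an <sic>uncommonly</sic> fine Morning
```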
3.6.1.2. Regularisation: <orig> + <reg>
Obsolete morphological forms and spellings which were not anomalous at the time of writing are regularised to standard British English with the pair <orig> + <reg>, including original forms that happen to match present-day American rather than British spelling, e.g. -or rather than -our. However, we do not adjust (non-)use of apostrophes concerning plurals or possessives, nor archaic capitalisation.
NB. We no longer retain long ſ inside <reg>, <corr> or <expan>.
Past tense/past participles of regular verbs spelled with -d and now also -'d for PDE -ed are regularised:
Both stem and ending in forms like carry'd, desyr'd, folowd are regularised at the same time:
No-longer-standard forms of irregular [strong] verbs (e.g. past tense sunk, past participle wrote) are to be regularised:
Forms of lay for lie are not currently tagged:
Regularise wou'd, cou'd, shou'd, woud, coud, shoud, and (NB. change of practice) also regularise missing apostrophes in negative forms like dont or wont, and in pronoun + verb combinations like shes.
The forms 'til, until are still current, so they are not tagged, but 'till is. The frequent variants of o'clock without apostrophe and/or spaced as two words should be regularised, though capitalisation is left as is:
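The regularisation rules in this subsection might be encoded along these lines. The original forms are those named in the text or variants of them, and the regularised forms follow standard British English.

```xml
<!-- Stem and ending regularised together -->
<choice><orig>carry'd</orig><reg>carried</reg></choice>
<!-- No-longer-standard past participle of a strong verb -->
I have <choice><orig>wrote</orig><reg>written</reg></choice>
<!-- Modal with apostrophe; negative form with missing apostrophe -->
<choice><orig>cou'd</orig><reg>could</reg></choice>
<choice><orig>dont</orig><reg>don't</reg></choice>
<!-- Variant of o'clock: apostrophe and spacing regularised, capital kept -->
<choice><orig>o Clock</orig><reg>o'Clock</reg></choice>
```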
3.6.1.2.1. Word separation at line-end
Our diplomatic transcription respects line boundaries in the original, but where a word crosses a line break, our normalised transcription reconstitutes the word at the end of the first line. Code with <choice> as follows, adding the attribute @n to <choice> with the value "hyp":
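A sketch of this coding with an invented word, using the double hyphen (equals sign) on both fragments. The placement of <lb/> inside <reg> is an assumption based on the description of the normalised output.

```xml
<!-- <orig> keeps the manuscript line break inside the word;
     <reg> reconstitutes the word at the end of the first line -->
<lb/>I was much <choice n="hyp"><orig>disap=<lb/>=pointed</orig><reg>disappointed<lb/></reg></choice> to hear it
```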
Reproduce exactly the type of hyphen (or non-use) on each fragment. The illustration above shows a double hyphen (for which we use the equals sign) on both first and second fragments — not uncommon in the corpus — but either fragment may have a single, double or no hyphen.
In order to ensure that the usual presentation is maintained, with diplomatic and normalised transcriptions having the same line-spacing and page or column boundary marking, the diplomatic transcription (generated from <orig>) and the normalised transcription (from <reg>) each need a <lb> element; we no longer include the attribute ‘break="no"’. If the broken word additionally crosses a column and/or page break, then <cb> and/or <pb> and their preceding <lb> will need to be repeated in <reg> as well:
NB. A punctuation mark that follows a word separated at line-end goes outside the <choice> element. This has the unfortunate effect in the normalised transcription that the punctuation mark is no longer attached to the reconstituted word but appears at the start of the following line. However, the script pipeline that produces normalised text files restores the orphan punctuation to its rightful position at the end of the first line.
3.6.1.2.2. Self-embedded <choice>
3.6.1.3. Expansion of abbreviations: <abbr> + <expan>
With some exceptions, we tag an abbreviated form with <abbr>, its expansion with <expan>, and wrap the pair in <choice>. The most common way for abbreviations to be written in the Papers is
- one or more letters representing the start of the full word
- one or more letters, superscripted, ending with the last letter of the full word
- usually but not always, a mark under the superscript, which for consistency we choose to represent as punctuation (most often point, hyphen or colon), never as underline
NB. In transcription, any such punctuation is placed after all the letters, outside the superscript and before the closing </abbr> tag, regardless of relative left-right orientation in the original. (The same is true of abbreviations like Mrs-, Dr. that we don't tag.) That way a string such as Compts (as in the first example below) or the less frequent Comps may be followed by punctuation but will never be interrupted by it, which is useful for searching purposes.
Words from letter-writing formulae such as compliments, affectionate(ly) and servant are obvious candidates for abbreviation in correspondence:
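For illustration, such abbreviations might be coded as below. The choice of punctuation marks is invented; note that each mark follows the superscript and sits inside <abbr>.

```xml
<choice><abbr>Comp<hi rend="superscript">ts</hi>.</abbr><expan>Compliments</expan></choice>
<choice><abbr>Aff<hi rend="superscript">te</hi>-</abbr><expan>Affectionate</expan></choice>
<choice><abbr>Serv<hi rend="superscript">t</hi>:</abbr><expan>Servant</expan></choice>
```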
Particularly common as abbreviations are function words such as the or your or with, personal titles like Lord or Lady, and personal names like William; we can include here initials as well.
We expand Col./Coll/Coll./etc. ‘Colonel’ and Gen:/Genl./etc. ‘General’, but many familiar abbreviations are not tagged at all: Mr, Mrs, Dr (as title, ‘Doctor’), Messrs, Esq, PS, St (‘Saint’), No (‘Number’), all still in use (and listed here in modern style without superscript or a final point). Even though generally superseded now by etc, we don't tag the very frequent &c (‘et cetera’) either.
Two of the same forms are tagged as abbreviations in the normal way, however, when used differently, namely Dr (as the referential noun ‘Doctor’, as ‘Dowager’ or as adjective ‘Dear’) and St (as ‘Street’):
Use separate tags for adjacent abbreviations, such as the Prince of Wales's repeated A. A. A. ‘Adieu Adieu Adieu’, or the signature MCG ‘Martha Carolina Goldsworthy’:
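A sketch of both cases, one <choice> per initial. Any wrapping elements such as <signed> are omitted here.

```xml
<!-- Repeated valediction: initials already separated by spaces -->
<choice><abbr>A.</abbr><expan>Adieu</expan></choice> <choice><abbr>A.</abbr><expan>Adieu</expan></choice> <choice><abbr>A.</abbr><expan>Adieu</expan></choice>
<!-- Closed-up initials in a signature -->
<choice><abbr>M</abbr><expan>Martha </expan></choice><choice><abbr>C</abbr><expan>Carolina </expan></choice><choice><abbr>G</abbr><expan>Goldsworthy</expan></choice>
```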
NB. Notice the small cheat in the last example which leaves the closed-up initials to appear as such in the diplomatic transcription, while the full names are correctly spaced out in the normalised transcription, thanks to an extra space inserted in the first two <expan> elements (also noted in Signatures). The same trick may be needed for HRH ‘His/Her Royal Highness’, etc.
NB. The abbreviations PW ‘Prince of Wales’ and M: P: ‘Member of Parliament’ are best expanded as single units.
Unlike <sic> and <orig>, <abbr> is used reasonably often without its sister element or the enclosing <choice>, particularly for a person's initial when it cannot safely be expanded into a full name:
3.6.1.3.1. Macron
Some writers abbreviate a double consonant by placing a macron over a single consonant: com̄and ‘command’, ac̄ept ‘accept’, etc. The macron is also sometimes found in thrō ‘through’ and wc̄h ‘which’.
To enter such forms, find the appropriate Unicode character in oXygen via Edit | Insert from Character Map and search for ‘macron’. For ō choose Latin Small Letter O with Macron, which is a straightforward accented letter. Other values needed, such as m̄ n̄ c̄ r̄, must be constructed by typing the bare letter and then Combining Macron from the menu. This character may play havoc with cursor position in the oXygen editor, and lines containing it are more easily edited in Notepad++, which displays the combining macron uncombined and otherwise behaves normally. (A letter + combining macron currently displays less well in Firefox than in Chrome, Opera or Edge.)
3.6.1.3.2. Tho and tho' (‘though’), thro and thro' (‘through’)
The forms tho' and thro' with apostrophe are treated as abbreviations and tagged with <abbr>:
Likewise Dorothy Blosset's idiosyncratic form th'o.
However, the forms tho and thro without apostrophe are tagged with <orig>:
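The contrast between the two treatments can be sketched as:

```xml
<!-- With apostrophe: abbreviation -->
<choice><abbr>tho'</abbr><expan>though</expan></choice>
<choice><abbr>thro'</abbr><expan>through</expan></choice>
<!-- Without apostrophe: original spelling, regularised -->
<choice><orig>tho</orig><reg>though</reg></choice>
<choice><orig>thro</orig><reg>through</reg></choice>
```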
3.6.2. No transcription: <gap>
The element <gap> ‘indicates a point where material has been omitted in a transcription […] because the material is illegible [or] invisible’ (adapted from definition in TEI Guidelines). It signals an editorial decision to omit, or an inability to transcribe, a span of text that follows that point. It does not itself extend over a span, so the closing slash / goes at the end of the tag (as with <lb/>) and there is no separate closing tag:
As indicated by these examples, <gap> must contain the attributes @reason and @extent:
- value of @reason: up to 3 from "cancelled | censored | covered | crease | damage | end-of-line | erased | faded | fold | folded-corner | illegible | ink-blot | overinked-pen | page-cut | seal | smudge | tear | unclear".
- value of @extent: "nn character(s) | word(s) | line(s)", where nn = 1 | 2 | 3 | etc.
The value "½ line" is also permissible.
NB. A hedge like "5-6 characters", "2-3 words" is permissible, and the display script reacts to the first number only.
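A few illustrative combinations of @reason and @extent, drawn from the permitted values listed above:

```xml
<!-- Empty element: the closing slash goes inside the tag -->
<gap reason="illegible" extent="1 word"/>
<!-- Two reasons together, with a hedged extent -->
<gap reason="seal tear" extent="5-6 characters"/>
<!-- Half a line lost -->
<gap reason="faded" extent="½ line"/>
```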
If an untranscribed passage crosses line boundaries, it should be tagged as a single <gap>:
When an author has written something and subsequently deleted it, rendering it irrecoverable, <del> and <gap> should be used together as follows:
The method of deletion is stated in @rend for <del>, e.g. "cancelled" or "erased", so it need not be repeated in @reason for <gap>: a sufficient reason for a gap in deleted text is that it is "unclear" or wholly "illegible".
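The combination of <del> and <gap> described above might be sketched as follows, with an invented context:

```xml
<!-- Method of deletion in @rend on <del>; the gap itself is simply illegible -->
<lb/>I met <del rend="cancelled"><gap reason="illegible" extent="2 words"/></del> at Court
```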
NB. If a part of the text has been censored by the recipient and is no longer legible, do not use <del> with <gap> but <gap> alone.
The <gap> element should only be used where text has not been transcribed. If partially legible text has been transcribed, <unclear> should be used instead.
3.6.3. Cautious transcription: <unclear>
If text can be read with a reasonable degree of certainty, though not with complete confidence, use <unclear>:
Permitted values for the obligatory attribute @reason in <unclear> are "cancelled | censored | corrected | covered | crease | damage | end-of-line | erased | faded | fold | folded-corner | indistinct | ink-blot | overinked-pen | overwritten | page-cut | seal | smudge | tear". The default value if there is no specific cause for the unclearness is "indistinct". There is no limit to how many values can be used together:
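A sketch with invented words, showing the default value and several values used together:

```xml
<!-- No specific cause for the unclearness -->
<unclear reason="indistinct">Thursday</unclear>
<!-- Several causes combined -->
<unclear reason="faded crease smudge">Richmond</unclear>
```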
3.6.4. Supplied text: <supplied>
The tag <supplied> is used as sparingly as possible. Words omitted by an author in error are not <supplied> in the transcriptions; a footnote may be offered instead. Where damage leaves no visible trace of a word, <gap> is used. If a word or character is almost readable, but with some uncertainty, it is marked <unclear>.
The tag <supplied> appears in this edition when there is not enough text remaining to justify <unclear>, but where there are sufficient traces of text to corroborate contextual and physical evidence of loss. This evidence may include third-party writings, such as a recipient who adds the words that were once there but are now gone due to damage (see HAM/1/2/20). An example:
As indicated in the example, there are three obligatory attributes, @resp, @reason and @cert:
- @resp as usual indicates who is responsible for the editorial action.
NB. Remember to include # in the value.
- value of @reason: up to 3 from "covered | crease | damage | end-of-line | faded | fold | folded-corner | illegible | ink-blot | missing | page-cut | seal | smudge | tear".
- value of @cert: "low | medium | high", with default "high", since <supplied> is not to be used lightly.
A lost possessive s or 's may be <supplied>, but normally with cert="low", since the use of apostrophes at that period is rather variable. Likewise a lost past tense or past participle (e)d may be <supplied>, but normally with cert="low" or cert="medium", depending on the consistency of that author's selection from among such forms as wished, wish'd, wishd, wisht (all of which occur in the corpus).
Very rarely indeed we have wrapped <supplied> inside <add> in order to have the supplied text placed above the line (HAM/1/2/47):
3.6.5. Redundant words: <surplus>
When a word is mistakenly repeated by an author, e.g. ‘There is a walk near the the Castle called […]’, tag the second occurrence as <surplus> with the attribute @reason and value "repeated". Other possible values for @reason in <surplus> are "anacoluthon" | "inappropriate" | "inserted-material", the last-named for the situation where a change of mind by the author in another part of the sentence makes the tagged word unnecessary in the revised version.
If a word is repeated over a page break, there is the possibility of tagging the first one as a catchword instead, if that would seem consonant with the writer's habits.
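Using the sentence quoted above, the repeated word could be tagged roughly as follows. This is a sketch: any further attributes that the schema may require on <surplus> are not shown.

```xml
<!-- The second occurrence of the repeated word is marked as surplus -->
<lb/>There is a walk near the <surplus reason="repeated">the</surplus> Castle called […]
```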
3.6.6. Moved text: <anchor> and <ref>
An <anchor> ‘attaches an identifier to a point within a text, whether or not it corresponds with a textual element’ (TEI guidelines). A <ref> (reference) ‘defines a reference to another location […]’. When we move a segment of text to a more logical reading position in order to avoid a discontinuity, we link source and destination positions with an <anchor> and a <ref> element, respectively. The chunks of text liable to be moved are addresses, annotations, dates, postscripts and ‘sections’ (a generic term for anything from a few words to several paragraphs).
Suppose a postscript has been written at the very top of the letter (see HAM/1/10/1/25, GEO/ADD/3/82/11 for examples). In the transcription, an <anchor> is placed at the top, marking the original position of the postscript and displaying as ▼. The element is a point, not a span, so the closing / is contained within the tag. The postscript is moved to where it would normally have been located had there been space enough, namely the foot of the letter. The moved text is wrapped in a <ref> whose attribute @target, "#ps", corresponds to the @xml:id of the <anchor>, "ps". However, the attribute @type on <ref> with value "moved" instructs the display script not to create a hyperlink tempting the reader to jump to the anchor position, which would be the more normal use of <ref>[@target] (on which see Links and cross-references below).
Top of the letter (place of the postscript in the ms):
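A sketch of the anchor, assuming a single moved postscript and the placeholder initials XYZ:

```xml
<!-- Point element marking where the postscript stands in the ms; displayed as ▼ -->
<lb/><anchor xml:id="ps" type="postscript" resp="#XYZ"/>
```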
Bottom of the letter (place of the postscript in the transcription):
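A sketch of the moved postscript, with invented wording. The exact position of <ref> relative to <p> follows the Postscripts section, which is not reproduced here, so the nesting shown is an assumption.

```xml
<!-- @target points back to the anchor; type="moved" suppresses the hyperlink -->
<lb/><postscript><p><ref type="moved" target="#ps">Pray remember me to your Sister […]</ref></p></postscript>
```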
For the placement of <ref> inside the postscript, see Postscripts above. Recall that if there is a <label> before the <p> of the postscript, <ref> must be duplicated inside <label>:
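With a label present, the duplication might look like this (invented wording; the use of rend="inline" on <p> assumes the postscript continues on the same line as its label):

```xml
<!-- <ref> appears once inside <label> and once inside <p>, both pointing to the same anchor -->
<lb/><postscript><label><ref type="moved" target="#ps">P.S.</ref></label> <p rend="inline"><ref type="moved" target="#ps">Pray remember me to […]</ref></p></postscript>
```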
Anchor requires three attributes: @xml:id, @type and @resp:
- value of @xml:id: "ps | ann | sec | add | dt".
The one actually used in a particular <anchor> must correspond to the full form given as the value of @type.
- value of @type: "postscript | annotation | section | address | date".
The schema prompts with possible values and ensures that @xml:id and @type belong together.
If there is more than one of any <anchor> type in the document, the values of their @xml:id's are numbered, e.g. "ps1 | ps2 | …", "sec1 | sec2 | …".
- @resp as usual indicates who is responsible for the editorial action.
An <anchor> should not be free-floating but should have a <lb> to its left. Remember to include @resp in <anchor>, with # before your initials.
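Where a document has two moved postscripts, the numbering could be sketched as follows. The initials XYZ and the text are placeholders, and the surrounding <postscript> and <p> tags are omitted for brevity.

```xml
<!-- Original positions: each anchor follows a <lb/> and carries a numbered @xml:id -->
<lb/><anchor xml:id="ps1" type="postscript" resp="#XYZ"/>
[…]
<lb/><anchor xml:id="ps2" type="postscript" resp="#XYZ"/>

<!-- Destination positions: each @target matches its own anchor -->
<ref type="moved" target="#ps1">[…]</ref>
<ref type="moved" target="#ps2">[…]</ref>
```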
Although the original location is displayed as ▼, the moved text with its <ref> wrapping has no special marking on-screen. We generally add a footnote at the end of the moved text to alert the reader, often with this formulaic wording:
The parts about unfolding and direction of writing are each optional, used only as necessary.
NB. For the special case of an address line broken up by unfolding, see Addresses above.
It is quite common that the last page of a letter is divided in three, with two strips of content separated by the address panel in the centre. In such cases our preferred practice is to move the address down rather than the bottom strip up, since the address is the last to be written, and ▼ in the middle of the page is easier for relating image and transcription at a glance.
Although not an objective of this project, the code linking <anchor> and <ref> would permit a hyper-diplomatic version of transcription in which everything was left in situ.
3.6.7. Footnotes: <note>[@resp]
An editorial footnote is inserted where something is to be explained to the reader. The display script moves it to the foot of the text and numbers it sequentially, leaving a lemma number in situ, raised above the line. On the project website, hyperlinks allow the reader to jump from lemma to footnote [actually an endnote] and back; in MDC, the footnotes appear just below the text corresponding to one image, and the numbering restarts with each new page. Footnotes are coded as <note> with the sole attribute @resp, to distinguish them from Annotations for the XSLT script. The value of @resp is "#XYZ", where XYZ is the note-writer's standard identifier, aka the editor's initials; the # means that the xml:id has been declared within the document.
There is hardly any TEI/XML restriction on the position of <note>. The lemma for a footnote usually follows the last character of the span of text referred to. A footnote referring to the whole document, e.g. explaining an editorial change of sequence, will normally go straight after <pb n="1"/>.
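Both placements can be sketched as follows. The note texts are invented and XYZ is a placeholder for the note-writer's initials.

```xml
<!-- Footnote on the whole document: straight after the first page break -->
<pb n="1"/><note resp="#XYZ">The address panel has been moved to the end of the transcription.</note>

<!-- Footnote on a word: directly after its last character, with no space before <note> -->
<lb/>we played at Tredrille<note resp="#XYZ">A card game for three players.</note> till supper
```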
A footnote referring to a particular element usually goes just before that element's closing tag. This avoids the footnote lemma jumping to the next line if the element is part of a centred or right-aligned span. If a footnote concerns an annotation, however, that would cause the raised footnote lemma number to be displayed in green italics, like the rest of the annotation, so the footnote's opening <note> is best placed after the annotation's closing </note> — so long as there is no problem keeping the lemma on the same line; if there is, too bad, and italicising the number is the lesser of two evils.
- A footnote lemma follows a span of text including any closing punctuation, i.e. according to British rather than American publishing practice.
- There is no space before a footnote lemma, except to separate lemmas where there is more than one footnote at a particular point in the transcription.
- A footnote ends with a period.
- References to book titles in a footnote go in italics, not quotation marks.
- Quoted material goes in single curly quotation marks, but when discussing the meaning of a word, distinguish the word itself (in italics) from the meaning(s) or sense(s) attached to it (in single curly quotation marks).
- References to the OED should be laid out in one of the ways illustrated below:
[…] Tredrille<note resp="#TO">‘A card-game played by three persons, usually with thirty cards’ (<hi rend="italic">OED</hi> s.v. <hi rend="italic">tredrille</hi> n. Accessed 02-07-2020).</note>
[…] pugh<note resp="#TO">This reading is uncertain, but <hi rend="italic">pugh</hi> is a rare 18th-century spelling of the interjection <hi rend="italic">pooh</hi> ‘[e]xpressing impatience, contempt, disdain, etc.’ (<hi rend="italic">OED</hi> s.v., B.1. Accessed 19-06-2020).</note>
Please include a footnote when the image of a document only contains covers, in order to clarify the absence of a transcription.
NB. What this document calls ‘footnotes’ are presented as such in MDC transcriptions, but they are displayed as endnotes on the project website.
3.6.7.1. Links and cross-references: <ref>
Hyperlinks to other items are needed most often inside editorial notes, as well as for navigation through a series of items using the ‘Previous Item’ and ‘Next Item’ buttons. In limited circumstances, links may appear in the transcription itself. A hyperlink is stored inside <ref>, with the attributes @type and @target. We prefer to have @type first, as a convenience for visual checking of XML files. Except in an item's summary, where an absolute path is needed, the target of a link within the edition is simply a filename that ends in ‘.xml’. All but the last of the following types are displayed as hyperlinks that the user can click on.
- The previous and next items in a subseries are stored in correspContext/ref in the header, with @type taking the values "prev" or "next": <correspContext> <ref type="prev" target="AR-HAM-00001-00002-00017.xml"/> <ref type="next" target="AR-HAM-00001-00002-00019.xml"/> </correspContext>
For the first or last items in a subseries, simply omit the prev or next link, as appropriate.
- For a hyperlinked mention in text or footnotes of another item in the edition, the value of @type is "HAMedition". Manchester Digital Collections uses the plain filename format here for @target, which can point either to [the top of] an entire document: See <ref type="HAMedition" target="AR-HAM-00001-00017-00070.xml">HAM/1/17/70</ref> for […]
or to a specific page:
[…] cf. <ref type="HAMedition" target="AR-HAM-00001-00002-00017/2.xml">HAM/1/2/17 p.2 col.2</ref> and others in this sequence.
[…] the poem Hamilton refers to a couple of months later in a diary (<ref type="HAMedition" target="AR-HAM-00002-00005/15.xml">HAM/2/5 p.15, entry for 13 November 1783</ref>): ‘I corrected M<hi rend="superscript">rs Vesey</hi>s copy of Miſs H Mores Poem […]’
- For a hyperlinked mention in the <summary> element of the header, the value of @type is again "HAMedition", but Manchester Digital Collections uses absolute references for @target here:
<summary><seg type="para">[…] This was originally catalogued, along with <ref type="HAMedition" target="https://www.digitalcollections.manchester.ac.uk/view/AR-HAM-00001-00001-00002-00007-B">HAM/1/1/2/7b</ref>, as a single item HAM/1/1/2/7.</seg> […]
Notice the different format, with a full URL and no ‘.xml’ at the end.
- For a link to the project website, @type takes the value "HAMproject": See <ref type="HAMproject" target="https://www.maryhamiltonpapers.alc.manchester.ac.uk/a-new-species-of-robbery/">‘A New Species of Robbery’</ref> (blogpost by Cassandra Ulph).
- For all other external sites, whether in header or body, @type="external": ‘“Ferruginea” has dark-green leaves with rust-brown undersides’ (<ref type="external" target="https://en.wikipedia.org/wiki/Magnolia_grandiflora">Wikipedia</ref>).
- Finally, as discussed in Moved text above, a <ref> with @type="moved" links a stretch of moved text to the <anchor> that marks its original position, for completeness of information in the XML file: <ref type="moved" target="#dt">1813</ref>
Links with <anchor> as target are neither displayed in transcriptions nor processed as hyperlinks.
3.6.7.2. Bibliographic citations: <bibl>
A bibliographic citation in an editorial footnote should be tagged with <bibl>. This mark-up does not modify the visual appearance of the citation, but it may prove useful for indexing purposes.
Contrast the use of <bibl> wrapped inside <cit> (see Quotations above), where the bibliographic reference for an authorial quotation is only visible on-screen in a mouseover tooltip.
3.6.8. XML comments (internal editorial notes)
XML comments are for internal project use and will often be provisional or temporary; they should be deleted when no longer needed. They take the following form:
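The form itself is not reproduced here. Judging only from the points listed below, a comment would look roughly like this, with XYZ as placeholder initials and invented wording:

```xml
<!-- resp="#XYZ" check this reading against the image again -->
```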
- The pseudo-attribute @resp assigns the comment to the team member concerned. As usual, # is needed before your initials.
- A colon is not needed before the comment part.
- A double hyphen is not allowed inside an XML comment, only at the boundaries.
XML comments are ignored by the display script and are not visible in transcriptions or metadata on-screen.
3.7. Overview of dating of letters
For most letters, the date we enter in both <origDate> and <correspAction type="sent">/date is the date recorded in ELGAR or the catalogue of another library, reproduced from <date> in <dateline> (if present) or otherwise inferred from the manuscript. However, if a journal-letter was written over several days, there can be one date in the catalogue and two different dates in the XML:
- the catalogue typically records the date the letter was started
- <origDate> records the period over which it was written
- <correspAction type="sent">/date records when it was sent
Furthermore, if it is known when a journal- or other letter was received, the metadata will have yet another <date> element — potentially different again — in <correspAction type="received">.
The appropriate attribute for entering a date is as follows:
- precise single date: @when
- approximate single date: @notBefore and/or @notAfter
- range of specific dates (for <origDate> only): @from + @to
Dates in an attribute are entered in the "1779-09-26" style (see Dates above).
Dates transcribed from an original document are reproduced exactly as written, but dates in plain text form in metadata (always) and footnotes (usually) are given in the style ‘26 September 1779’, etc.
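The attribute choices can be sketched as follows for a hypothetical journal-letter begun on 26 September 1779 and sent four days later. The dates, the plain-text wording inside the elements and the omission of the other children of <correspAction> are all assumptions made for illustration.

```xml
<!-- Period of writing: a range, so @from and @to (allowed on <origDate> only) -->
<origDate from="1779-09-26" to="1779-09-30">26 to 30 September 1779</origDate>

<!-- Date of sending: a precise single date, so @when -->
<correspAction type="sent">
  <date when="1779-09-30">30 September 1779</date>
</correspAction>

<!-- An approximate single date would use @notBefore and/or @notAfter instead -->
<origDate notBefore="1779-09-01" notAfter="1779-09-30">September 1779</origDate>
```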
3.8. Overview of attribute @rend
The attribute @rend (rendition) ‘indicates how the element in question was rendered or presented in the source text’ (TEI Guidelines). We use @rend in three quite distinct ways that cannot be combined:
- @rend adjusts the style of a span of text wrapped by one of the elements <hi>, <emph>, <l>.
- @rend shifts rightwards either what follows <lb> or the contents of an existing element such as <opener>, <closer>, <dateline>, <date>, <salute>, <signed>, <postscript>, <note>[@hand] and others.
- @rend="inline" cancels a midline line-break at start or end of a <p> element, when added to an opening <p> or <closer> tag.
We take them in turn. NB. We no longer permit mixing of the style and rightwards-movement uses in the same element (<signed rend="align-right underlined">, etc.).
3.8.1. Style
<hi> requires @rend, usually with 1 value but with up to 3 allowed, from the list "underlined | superscript | subscript | italic | strikethrough | double-underlined | triple-underlined | dotted-underlined | wavy-underlined | larger | all-caps | over-inked-pen | retraced". The last four are ignored in display, and all the variants of "underlined" are displayed as single underline.
NB. Superscript and subscript are reasonably grouped here among style attributes, since they display text in a smaller font as well as slightly raising or lowering it.
<emph> requires @rend with between 1 and 3 values from "underlined | italic | double-underlined | triple-underlined | dotted-underlined | wavy-underlined | retraced | larger | all-caps", with similar display limitations.
Wherever appropriate, underlining should be done with <emph> rather than <hi>. This needs thorough revision, since certain values of @rend, and indeed the element <emph> itself, had not been available in the Image to Text project.
<l> (verse line) optionally allows @rend with 1 of the following values: "underlined | italic".
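A few hedged examples of the style use of @rend, with invented text. Writing two values as a space-separated list is assumed from the form of the discontinued <signed rend="align-right underlined"> mentioned above.

```xml
<!-- <hi> with one value -->
M<hi rend="superscript">rs</hi> Vesey

<!-- <hi> with two values together -->
the 15<hi rend="superscript underlined">th</hi>

<!-- <emph> is preferred for underlining -->
I <emph rend="underlined">never</emph> said so

<!-- <l> takes at most one value -->
<l rend="italic">Hope springs eternal in the human breast</l>
```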
3.8.2. Rightwards movement
On many content elements, @rend is concerned with left-right positioning, with possible values "indent | center | align-right". On our project website, these labels are somewhat misleading. Value "indent" inserts some hard spaces to indent the first line of the element only. Values "center" and "align-right" shift the left margin inwards for the duration of the element, rather than adjusting each line of text to be genuinely centered or aligned to the right margin, as in a word processor. Good practice generally is to use @rend in this way only on an element that starts at the left margin. If used on an element with text to its left, it will invoke an undesired line-break before the element. The values "center" and "align-right" for @rend always introduce a blank line in the display after the element, which is also often undesirable. For these reasons, unless it is particularly convenient to have @rend in another element, we are moving increasingly towards using @rend in <lb>, where it applies to that line only and does not introduce blank lines before or after it.
The default layout without @rend is left alignment.
NB. It is inadvisable to insert rightwards-moving @rend in more than one element on the same line. For cases where there needs to be more than the usual inter-word spacing between items on the same line, use the <space> element instead (on which see Extra spacing between items above). Leave one normal space to the left of <space>.
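The two recommended devices can be sketched as follows. The text is invented, and the surrounding elements such as <closer> or <dateline> are left out for brevity.

```xml
<!-- @rend on <lb/> shifts this line only and adds no blank line before or after -->
<lb rend="align-right"/>Your affectionate friend

<!-- Extra gap between two items on one line: one normal space, then <space> -->
<lb/>London <space unit="chars" quantity="5"/>Monday Morning
```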
3.8.3. Cancellation of midline linebreak
The last use of @rend is to handle a peculiarity of the interaction between TEI and HTML. In normal use, a paragraph <p> element in XML or HTML is treated as such in on-screen HTML, starting on a line of its own and finishing on a line of its own. However, there are situations where TEI demands an opening <p> or a closing </p> at a point which must be midline if we are to preserve the lineation of the ms. The principal cases are these:
- A postscript begins with ‘P.S.’ or similar and may then continue on the same line with the content of the postscript. TEI requires that the line start with <postscript>, followed by ‘P.S.’ wrapped in <label>, followed by the content of the postscript wrapped in <p>. The tag <p> is thus inevitably midline. We place rend="inline" in <p> to instruct the display script not to jump to a new line: <lb/><postscript><label>P.S.</label> <p rend="inline">I am in daily <lb/>expectation of a letter from […];
- A div break requires a paragraph break as well, and the new paragraph may not be the very first item after the new <div>. In this example, <dateline> is not allowed in <p>: <lb/>we have had a good deal of rain this afternoon — </p> </div> <div type="continuation"> <dateline><date when="1789-07-15">15<hi rend="superscript">th</hi></date>/</dateline> <p rend="inline">This morning I <lb/>got up at 5 O'Clock on purpose to write to you […]
- To eke out the paper, a writer starts a new paragraph not on a new line but after an extra space on the same line: <lb/>[…] I continue to amuse myself very much, with Music. <lb/>I have been once at the Opera, and was very <lb/>well entertained.</p> <space unit="chars" quantity="5"/><p rend="inline"><persName>Lady Stormont</persName> is come <lb/>to <placeName>Town</placeName>. <rs>Her Daughter</rs> is much improved […]
- A closer always requires the preceding paragraph to end just before it, and the changeover point may well be midline: <lb rend="indent"/>God bleſs and preserve You — </p> <closer rend="inline"><seg type="closeSalute">my dear <lb/>dear <persName>Mary</persName> […]
NB. @rend is not permitted in <fw>, nor anywhere in the header (in particular, not in <p> there).
3.9. Overview of person references
This section summarises the different ways of tagging person references, some of which have already been discussed elsewhere under Salutations, Openers, Closers and Segments for research analysis. For brevity, the examples leave out the attribute @ref that links an element to the personography. In actual XML files, of course, this attribute, used with value "psn:XYZ", must never be omitted from <persName> or <rs>.
3.9.1. <salute> vs. <seg>
References to the addressee and the author are tagged with <salute> or <seg> (and <persName> as well, if a proper name is used), depending on the role of the string in the letter:
- <salute> is a structural element containing the greeting at start or end of the letter.
- <seg> is used to tag mentions of the addressee in <p>, as well as of the author.
Final punctuation goes inside the <salute> or <seg> element. NB. This does not apply to <persName> and <rs>, neither of which includes final punctuation, only internal punctuation that is part of the reference, e.g. in an abbreviation.
If an explicit name for the addressee is given, both <salute> and <seg> take <persName>:
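As a sketch with invented wording (and, as elsewhere in this section, without @ref), the two cases might be tagged like this:

```xml
<!-- Structural greeting at the start of the letter -->
<opener><salute n="opening">My dear <persName>Miss Hamilton</persName>,</salute></opener>

<!-- Mention of the addressee within a paragraph; final punctuation stays inside <seg> -->
<lb/>I hope, <seg type="bodySalute">my dear <persName>Miss Hamilton</persName>,</seg> that you are well
```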
Pronouns such as you are not tagged either as <rs> or <seg>.
3.9.2. Referring to persons in <opener>
This will always be a reference to the addressee. If the name is given, use <persName> inside <salute>, if not, <rs> in <salute>:
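The two possibilities could look roughly as follows. The greetings are invented and @ref is again left out.

```xml
<!-- Name given: <persName> wraps the name only -->
<opener><salute n="opening">Dear <persName>Mary</persName>,</salute></opener>

<!-- No name given: <rs> wraps the whole referring string -->
<opener><salute n="opening"><rs>My dear Friend</rs>,</salute></opener>
```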
3.9.3. Referring to persons in <p>
3.9.3.1. The addressee
The element <seg> should be used to tag a reference to the addressee in <p>, with one of two possible values for @type: "openSalute | bodySalute":
- Use <seg type="openSalute"> if there is no <salute> element in <opener> but a salutation in the first sentence of <p> instead (e.g. ‘To be sure there seems, my Dear Miſs Hamilton, to be a spell upon […]’). Remember to include <persName> or <rs> as appropriate.
- Use <seg type="bodySalute"> beyond the opening sentence, as follows:
- Only the explicit name of the addressee: <seg type="bodySalute"><persName>Miss Hamilton</persName></seg>
- More than the explicit name of the addressee: <seg type="bodySalute">my dear <persName>Miss Hamilton</persName></seg>
- No explicit name of the addressee: <seg type="bodySalute"><rs>my dear friend/sister/etc.</rs></seg>
- If <seg type="bodySalute"> is used towards the end of <p> and near <closer> (cf. Referring to persons in closer), indicate its proximity to <closer> with the attribute @n and the value "closing": <seg type="bodySalute" n="closing"><rs>my dear</rs></seg> God bless you.
NB. Pronouns such as you are not to be tagged here, in line with our general practice for <rs>. As with <salute>, keep final punctuation inside <seg>, including dash ( — ):
<lb/>Adieu — <seg type="bodySalute" n="closing"><rs>My very dear friend</rs> — </seg> With best Love to […]
- If the <closer> element is only used for the author's signature and/or a dateline (i.e. there are no <salute n="closing"> or <seg type="closeFormula"> elements present), a closing reference to the addressee which occurs before the signature and/or dateline is to be tagged with <seg type="bodySalute" n="closing"> inside <p>:
<lb/>is my Night in waiting — God Bleſs You, Good Night <lb/><seg type="bodySalute" n="closing"><rs>My Dear Dear friend</rs>.</seg></p> <lb/><closer><signed><persName>Miranda</persName></signed></closer>
<lb/>Adieu <seg type="bodySalute" n="closing"><rs>my Dear Dear Friend</rs></seg></p> <lb/><closer><signed><persName>Miranda</persName></signed></closer>
It can happen that an author refers to the addressee in a quoted piece of text, as William Napier does in HAM/1/19/26. In that case, use <seg type="bodySalute"> rather than, for example, <rs>. It is not a direct salutation towards the addressee, but we will still want to investigate how the author refers to them in the context of a letter/note/etc. directed to them, hence the use of <seg type="bodySalute">.
In principle we do not tag generic references (‘the French’, ‘people who can make a tolerable use of them’), only references to specific individuals, but sometimes the distinction is tricky. For example, Mrs Sarah Dickenson writes ‘but how is it poʃsible you shou'd wish for a Letter from me, who in this little retirement can find nothing, which deserves to be the subject of one, to a young Lady? […] but that which is pleasing to a fond Mother, is too insignificant to be committed to paper’ (HAM/1/3/1/2). The phrases ‘a young Lady’ and ‘a fond Mother’ are technically generic, but clearly Mrs Dickenson is also characterising addressee and author as being, respectively, those kinds of person. It's a matter of judgement which such generic characterisations are usefully tagged with <seg> for research purposes (and perforce also with <rs>).
3.9.3.2. The author
When an author refers to themself in <p> in the third person, generally without using their name (John Fisher is a notable exception), we tag this with <seg type="author"> containing <persName> or <rs>:
- No explicit name of the author: surely no man can love a Woman more than <seg type="author"><rs>your Husband</rs></seg> loves you
- With explicit name of the author: <seg type="author"><persName>Lord Orford</persName></seg> is extremely obliged to […] <seg type="author">your affectionate <persName>Dickenson</persName></seg> is very unhappy to be at so great a distance from you
3.9.3.3. Other people
For other people mentioned in <p>, use <persName> where a name is present, <rs> otherwise. Some <rs> strings must in turn contain an embedded <persName> or <rs>:
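The three cases might be tagged as follows. The first two phrases are taken from an example earlier in this document; the third is invented to show embedding, and @ref is omitted as elsewhere in this section.

```xml
<!-- A name is present -->
<persName>Lady Stormont</persName> is come to <placeName>Town</placeName>

<!-- No name: a referring string -->
<rs>Her Daughter</rs> is much improved

<!-- A referring string that itself contains a name (invented wording) -->
I met <rs>the sister of <persName>Lady Stormont</persName></rs> yesterday
```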
3.9.4. Referring to persons in <closer>
3.9.4.1. The addressee
We use <seg type="closeSalute"> inside <closer> for a vocative reference to the addressee.
- Use <persName> inside <seg type="closeSalute"> where a name is present, <rs> if not (just as with <seg type="openSalute"> and <seg type="bodySalute">).
- For example, ‘Believe me, my dear brother, your most humble servant John Dickenson’ becomes: <closer><salute n="closing">Believe me <seg type="closeSalute"><rs>my dear brother</rs>,</seg> <seg type="closeFormula">your most humble servant</seg></salute> <signed><persName>John Dickenson</persName></signed></closer>
- And ‘Believe me, my dear Miss Hamilton, your most humble servant John Dickenson’ becomes: <closer><salute n="closing">Believe me <seg type="closeSalute">my dear <persName>Miss Hamilton</persName>, </seg> <seg type="closeFormula">your most humble servant</seg></salute> <signed><persName>John Dickenson</persName></signed></closer>
- A parenthetic salutation intervening between a Believe me-type phrase and the closing formula is tagged with <seg type="closeSalute">, e.g. ‘believe me, my dear friend, your most humble servant HF’, and likewise one that occurs at the end of the closer, as ‘toujours chére’ [sic] in the second example below:
<closer><salute n="closing">believe me, <seg type="closeSalute"><rs>my dear friend</rs>,</seg> <seg type="closeFormula">your most humble servant</seg></salute> <signed>HF</signed></closer>
<closer><salute n="closing"><seg type="closeFormula"><foreign xml:lang="fr">Votre</foreign></seg></salute> <signed><persName>Palemon</persName></signed> <seg type="closeSalute"><foreign xml:lang="fr">toujours chére</foreign>.</seg></closer>
- If <seg type="closeSalute"> is the first element in <closer>, however, e.g. ‘my dear friend, I am your most humble servant HF’, it precedes <salute> rather than going inside: <closer><seg type="closeSalute"><rs>my dear friend</rs>,</seg> <salute n="closing">I am <seg type="closeFormula">your most humble servant</seg></salute><signed>HF</signed></closer>
- If the <closer> element is only used for the author's signature and/or a dateline (i.e. there are no <salute n="closing"> or <seg type="closeFormula"> elements present), a closing reference to the addressee which occurs before the signature and/or dateline is considered part of <p> and thus should be tagged with <seg type="bodySalute" n="closing"> inside <p>, not with <seg type="closeSalute"> inside <closer>:
<lb/>is my Night in waiting — God Bleſs You, Good Night <lb/><seg type="bodySalute" n="closing"><rs>My Dear Dear friend</rs>.</seg></p> <lb/><closer><signed><persName>Miranda</persName></signed></closer>
<lb/>Adieu <seg type="bodySalute" n="closing"><rs>my Dear Dear Friend</rs></seg></p> <lb/><closer><signed><persName>Miranda</persName></signed></closer>
3.9.4.2. The author
When the author signs, the name, wrapped in <persName>, goes in <signed> (see Signatures). The element <signed> goes inside <closer> but after and outside <salute> (if present).
Such formulae to do with the author as ‘Yours affectionately’, ‘Your most humble servant’ are always wrapped in <seg type="closeFormula"> inside <salute n="closing">, and indeed <salute n="closing"> nearly always contains one, sometimes as its only contents:
- Include adjacent adverbials such as ‘ever’, ‘always’, ‘with great sincerity’: <closer><salute n="closing">I remain <seg type="closeFormula">ever your most affectionate</seg></salute> <signed><persName>Palemon</persName></signed></closer>
- Sometimes modifiers of the formula can be found after the (pro)noun referring to the author, in which case <seg type="closeFormula"> should be extended to include them: <lb/><closer><salute n="closing"><seg type="closeFormula">Yours with all my heart Errors excepted</seg></salute> <signed><persName ref="psn:HH">Henry Hamilton</persName></signed></closer>
- However, if another element such as <seg type="closeSalute"> intervenes between adverbial and basic formula, the adverbial goes inside <salute> but is not part of <seg type="closeFormula">: <closer><salute n="closing">I remain ever, <seg type="closeSalute"><rs>my dear friend</rs>,</seg> <seg type="closeFormula">your most affectionate</seg></salute> <signed><persName>Palemon</persName></signed></closer>
- Phrases occurring after the <signed> element such as ‘toujours de même’ are to be tagged with a separate <seg type="closeFormula"> element and wrapped inside a new <salute n="closing"> element. Additionally, if such a phrase is underlined, the preferred sequence is as follows: <closer> […] <signed>[…]</signed> <lb/><salute n="closing"><seg type="closeFormula"> <foreign xml:lang="fr"><emph rend="underlined">toujours de même</emph></foreign>.</seg></salute></closer>
- NB. Final punctuation goes outside <foreign> and <emph> but remains inside <seg>.
- Instances such as ‘Your/Vôtre Palemon’ take <seg type="closeFormula"> for Your/Vôtre and <signed> for Palemon: <seg type="closeFormula"><choice><abbr>Yr.</abbr><expan>Your</expan></choice></seg> <signed><persName>Palemon</persName></signed>
Specification
Module: tei
Module: core
Module: header
Module: linking
Module: msdescription
Module: namesdates
Module: textstructure
Module: transcr
Module: tagdocs
Module: corpus
Module: certainty
<salute>
<add>
<unclear>
<del>
<gap>
<hi>
<supplied>
<choice>
<expan>
<q>
<date>
<origDate>
<precision>
<foreign>
<ref>
<note>
<handNote>
<closer>
<opener>
<signed>
<postscript>
<text> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<handShift> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<lb> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<persName> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<rs> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<dateline> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<address> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<addrLine> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<cb> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<pb> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<correspAction> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<correspContext> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<roleName> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<p> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<space> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<bibl> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<milestone> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<metamark> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<seg> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<listTranspose> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<transpose> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<ptr> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<editor> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<title> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<orgName> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<cit> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<l> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<surplus> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<availability> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<licence> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<principal> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<notesStmt> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<relatedItem> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<facsimile> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<surface> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<div> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<emph> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<handDesc> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<measure> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<anchor> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<fw> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<change> |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<teiHeader> |
|