Text Encoding Issues
The British Academic Written English (BAWE) project
Corpus Linguistics, University of Birmingham, July 16th, 2005
Assessed student writing
Which theoretical approach has best helped you ‘make sense’ of The Waste Land and why?
‘Would you agree that subordination was inscribed into the life of a domestic servant?’
“The expenditure of National Lottery funds on the arts in Britain cannot be convincingly defended”. Discuss.
Explore the significance of the chat show genre as contributor to the project of feminist heterosexual politics
Critical Commentary: p180, from "Le jour je m'égarais..." to "le démon de mon coeur".
Information Systems Development
Case Study of the white-throated capuchin monkey (Cebus capucinus)
Assessed student writing
Text Encoding Issues
General issues
• A first stage of BAWE mark-up
• Dimensions of mark-up
• Interactive tagging
Specific questions
• Text hierarchy
• Formulae
A first stage of BAWE mark-up
• shift in document format: DOC → XML (TEI standard)
• formatting: preserve information
• automatic vs. manual steps of annotation
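The DOC → XML step with formatting preserved can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the BAWE conversion pipeline: it assumes the Word document has already been read into a hypothetical list of paragraphs, each a list of `(text, rend)` runs, and shows only how a formatted run such as italics survives as a TEI-style `<hi rend="...">` element inside `<p>`.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical input format: a paragraph is a list of (text, rend) runs;
# rend is None for plain text or a formatting label such as "italic".
Run = Tuple[str, Optional[str]]


def runs_to_tei(paragraphs: List[List[Run]]) -> ET.Element:
    """Wrap paragraphs in a minimal TEI-style <text><body> skeleton,
    keeping formatted runs as <hi rend="..."> elements."""
    text = ET.Element("text")
    body = ET.SubElement(text, "body")
    for runs in paragraphs:
        p = ET.SubElement(body, "p")
        last_hi = None  # plain text after a <hi> belongs in that element's tail
        for chunk, rend in runs:
            if rend is None:
                if last_hi is None:
                    p.text = (p.text or "") + chunk
                else:
                    last_hi.tail = (last_hi.tail or "") + chunk
            else:
                last_hi = ET.SubElement(p, "hi", rend=rend)
                last_hi.text = chunk
    return text


title = [("Case Study of the white-throated capuchin monkey (", None),
         ("Cebus capucinus", "italic"),
         (")", None)]
print(ET.tostring(runs_to_tei([title]), encoding="unicode"))
# <text><body><p>Case Study of the white-throated capuchin monkey (<hi rend="italic">Cebus capucinus</hi>)</p></body></text>
```

Keeping formatting as an attribute value rather than discarding it means later, manual annotation steps can still decide whether a given italic run is a species name, a foreign word or emphasis.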
Dimensions of mark-up
Text hierarchy
• front, body, back
• sections
• paragraphs
• “s-units”
Text flow
• highlighting
• lists
• figures
• tables
• formulae
• block quotes
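The text-hierarchy dimension (front, body and back; sections; paragraphs; s-units) can be illustrated with a small skeleton builder. This is a sketch only: the sentence-splitting rule (break after ".", "?" or "!" followed by whitespace) is a naive assumption, not BAWE's s-unit segmentation, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Naive s-unit boundary (an assumption, not BAWE's segmentation rule):
# split after ".", "?" or "!" when followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.?!])\s+")


def to_s_units(paragraph: str) -> ET.Element:
    """Turn one paragraph string into <p> containing <s> children."""
    p = ET.Element("p")
    for sentence in _BOUNDARY.split(paragraph.strip()):
        if sentence:
            ET.SubElement(p, "s").text = sentence
    return p


def build_text(title: str, sections: List[List[str]]) -> ET.Element:
    """Assemble front / body (sections -> paragraphs -> s-units) / back."""
    text = ET.Element("text")
    doc_title = ET.SubElement(ET.SubElement(text, "front"), "docTitle")
    ET.SubElement(doc_title, "titlePart", type="main").text = title
    body = ET.SubElement(text, "body")
    for paragraphs in sections:
        div = ET.SubElement(body, "div")  # one <div> per section
        for para in paragraphs:
            div.append(to_s_units(para))
    ET.SubElement(text, "back")  # left empty in this sketch
    return text


doc = build_text("Information Systems Development",
                 [["Equations are marked up. Are tables too? Yes!"]])
print(ET.tostring(doc, encoding="unicode"))
```

Each level of the hierarchy maps to one nesting level in the XML, so a query such as "all s-units in the body" becomes a simple tree traversal.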
Interactive tagging
Tagging by clicking:
• graphical interface
• quick tagging
• reduce errors
• impose coherence
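One way a tagging tool can impose coherence is to reject tags that break the expected nesting. The sketch below checks parent/child pairs against a small rule table; the rules are hypothetical and chosen only to match the elements used in this presentation, not BAWE's actual constraints.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical nesting rules for illustration; BAWE's actual tag set may differ.
ALLOWED_CHILDREN = {
    "text": {"front", "body", "back"},
    "body": {"div", "p"},
    "div": {"head", "div", "p"},
    "p": {"s", "formula"},
    "s": {"hi"},
    "hi": {"hi"},
}


def nesting_errors(root: ET.Element) -> List[str]:
    """List every parent/child pair that the rules above do not allow."""
    errors = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            continue  # no rule recorded for this element, so it is not checked
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return errors


doc = ET.fromstring("<text><body><p><s>Fine.</s></p><s>Stray s-unit.</s></body></text>")
print(nesting_errors(doc))  # ['<s> not allowed inside <body>']
```

In a click-based interface the same table could be used the other way round, offering only the tags that are legal at the current position, which both speeds up tagging and prevents the error in the first place.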
Interactive tagging
What goes into <front> vs. <body>? Example of two first pages:
Encoding of example pages
<front><titlePage><docTitle><titlePart type="main">Case Study of the white-throated capuchin monkey (<hi rend="italic">Cebus capucinus</hi>)</titlePart><titlePart>xxx</titlePart></docTitle><figure id="BAWE_3016a-pic1"/></titlePage></front><body>
<front><docTitle><titlePart type="main" rend="underline">Discuss the handling of the discourses of religion and the effects of religious and ethical change in the Victorian period</titlePart></docTitle></front><body>
Anthropology vs. English Studies assignment
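The two `<front>` encodings differ in structure (only the Anthropology page has a `<titlePage>` wrapper and a figure), yet a descendant query still retrieves the main title from both. A minimal sketch using Python's standard ElementTree; the `main_title` helper is hypothetical, and the open `<body>` tags that follow each `<front>` are left out so the fragments parse.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The two <front> encodings from the example pages (trailing <body> omitted).
ANTHROPOLOGY = ('<front><titlePage><docTitle><titlePart type="main">Case Study of the '
                'white-throated capuchin monkey (<hi rend="italic">Cebus capucinus</hi>)'
                '</titlePart><titlePart>xxx</titlePart></docTitle>'
                '<figure id="BAWE_3016a-pic1"/></titlePage></front>')
ENGLISH = ('<front><docTitle><titlePart type="main" rend="underline">Discuss the handling '
           'of the discourses of religion and the effects of religious and ethical change '
           'in the Victorian period</titlePart></docTitle></front>')


def main_title(front_xml: str) -> Optional[str]:
    """Return the plain text of the main title, wherever it sits inside <front>."""
    part = ET.fromstring(front_xml).find(".//titlePart[@type='main']")
    return None if part is None else "".join(part.itertext())


for front in (ANTHROPOLOGY, ENGLISH):
    print(main_title(front))
```

Because `.//` searches at any depth, the presence or absence of `<titlePage>` does not matter for retrieval, and `itertext()` flattens the nested italic species name into the title string.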
Formulae
• equations (and all kinds of variations of =)
• chemical formulae
• arithmetic expressions
• logical expressions
• expressions following some other discipline-specific formalism (e.g. computer code, phonetic transcription, etc.)
• a part ("term") of any of these (if it is not natural language)
Insert an empty <formula> tag for:
• anything that has been inserted with the MS formula editor (appears as a "field");
• any complex formal expression, i.e. one that cannot be represented as a simple sequence of characters (e.g. a fraction or a square root);
• any formal expression separated typographically from running text (on a new paragraph).
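Once a block has been classified as a formula by the rules above, replacing it with an empty placeholder is mechanical. The sketch assumes that classification has already been done (the `"formula"` flag is hypothetical) and imitates the id pattern seen in the example encoding (`EC0001-form2`: document code, `-form`, running number); the numbering scheme is an assumption, not a documented BAWE convention.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def with_formula_placeholders(doc_id: str, blocks: List[Tuple[str, str]]) -> ET.Element:
    """Build <body>: prose blocks become <p> text; each block flagged
    "formula" becomes a <p> holding an empty, numbered <formula/>."""
    body = ET.Element("body")
    count = 0
    for kind, content in blocks:
        p = ET.SubElement(body, "p")
        if kind == "formula":
            count += 1
            # The formula content itself is discarded; only a placeholder remains.
            ET.SubElement(p, "formula", {"notation": "", "id": f"{doc_id}-form{count}"})
        else:
            p.text = content
    return body


blocks = [("text", "The spread can be written as follows:"),
          ("formula", "(equation field)")]
print(ET.tostring(with_formula_placeholders("EC0001", blocks), encoding="unicode"))
# <body><p>The spread can be written as follows:</p><p><formula notation="" id="EC0001-form1" /></p></body>
```

Leaving the element empty keeps formula material out of the running text used for linguistic analysis, while the id still records where each formula occurred.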
Example
... The slope of the yield curve can be analysed by looking at the spread between the long-term and the one-period, short-term interest rate, denoted as $S^n_t = R^n_t - r_t$. If we manipulate equation 1, the yield spread, $S^n_t$, can be written as the expectation of a weighted average of future changes in short-term interest rates as follows:

$$S^n_t = E_t\, S^{n*}_t, \qquad S^{n*}_t = \frac{1}{n}\left[(n-1)\,\Delta r_{t+1} + (n-2)\,\Delta r_{t+2} + \dots + \Delta r_{t+(n-1)}\right] \qquad [2]$$
<p><s>...</s> <s>The slope of the yield curve can be analysed by looking at the spread between the long-term and the one-period, short-term interest rate, denoted as S<hi rend="italic"><hi rend="sup">n</hi><hi rend="sub">t</hi></hi> = R<hi rend="italic"><hi rend="sup">n</hi><hi rend="sub">t</hi></hi> – r<hi rend="italic"><hi rend="sub">t</hi></hi>.</s> <s>If we manipulate equation 1, the yield spread, S<hi rend="italic"><hi rend="sup">n</hi><hi rend="sub">t</hi></hi>, can be written as the expectation of a weighted average of future changes in short-term interest rates as follows:</s></p><p><formula notation="" id="EC0001-form2"/></p>
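What the encoding preserves can be checked by querying it. The sketch takes the last s-unit and the formula paragraph from the encoding above, wrapped in a `<div>` (an addition, so the fragment has a single root), and extracts the s-unit text and the formula ids with Python's standard ElementTree.

```python
import xml.etree.ElementTree as ET

# Last s-unit and the formula paragraph from the example encoding,
# wrapped in <div> (not in the original) so they parse as one fragment.
ENCODED = (
    '<div><p><s>If we manipulate equation 1, the yield spread, '
    'S<hi rend="italic"><hi rend="sup">n</hi><hi rend="sub">t</hi></hi>, '
    'can be written as the expectation of a weighted average of future changes '
    'in short-term interest rates as follows:</s></p>'
    '<p><formula notation="" id="EC0001-form2"/></p></div>'
)

root = ET.fromstring(ENCODED)
s_units = ["".join(s.itertext()) for s in root.iter("s")]
formula_ids = [f.get("id") for f in root.iter("formula")]

print(s_units[0])    # superscript and subscript collapse into "Snt"
print(formula_ids)   # ['EC0001-form2']
```

In plain-text extraction the inline symbol reduces to "Snt", but the nested `rend="sup"`/`rend="sub"` attributes keep the distinction recoverable, while the displayed equation contributes only its id to the text stream.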
Principles of mark-up
1. Keep the structure of the document as close to the original as possible
2. Mark up elements relevant to our research
3. Keep the mark-up cost-effective