Dyslexia Guild Conference 2013 - Online Corpus


  • date post

    21-Jan-2015
  • Category

    Education


description

Online Corpus: a structured set of texts where information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus in the form of tags. Dominik Lukes
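
The description above can be made concrete with a toy example. This is a minimal sketch, not how any real corpus stores its data: the sample sentence, the names `TAGGED_SAMPLE` and `words_with_tag`, and the tag codes (loosely modelled on BNC-style tags such as NN2 for plural nouns) are all assumptions made for illustration.

```python
from typing import List, Tuple

# One token = (word, part-of-speech tag). The tag codes are illustrative only;
# real corpora document their own tagsets.
Token = Tuple[str, str]

TAGGED_SAMPLE: List[Token] = [
    ("the", "AT0"),       # article
    ("dyslexic", "AJ0"),  # adjective
    ("children", "NN2"),  # plural noun
    ("enjoyed", "VVD"),   # verb, past tense
    ("the", "AT0"),       # article
    ("story", "NN1"),     # singular noun
]


def words_with_tag(tokens: List[Token], tag_pattern: str) -> List[str]:
    """Return the words whose tag matches tag_pattern.

    A trailing '*' means "any tag starting with this prefix", so 'n*'
    selects NN1 and NN2 alike. Matching is case-insensitive.
    """
    wanted = tag_pattern.lower()
    if wanted.endswith("*"):
        prefix = wanted[:-1]
        return [word for word, tag in tokens if tag.lower().startswith(prefix)]
    return [word for word, tag in tokens if tag.lower() == wanted]


if __name__ == "__main__":
    print(words_with_tag(TAGGED_SAMPLE, "n*"))   # all nouns
    print(words_with_tag(TAGGED_SAMPLE, "VVD"))  # past-tense verbs only
```

Because the tag travels with each word, a query can ask for "all nouns" without knowing any of the words in advance, which is what tag searches such as `[n*]` in the transcript rely on.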

Transcript of Dyslexia Guild Conference 2013 - Online Corpus

  • 1. Online Corpus: Literacy Teacher's Best Friend. Dominik Lukes, http://dominiklukes.net. Dyslexia Guild Summer Conference 2013. dyslexiaaction.org.uk

  • 2. Outline: What is a corpus; Answering questions with a corpus; The language of corpus searches; The corpus and the classroom; Practice. http://www.flickr.com/photos/adactio/3563832656
  • 3. Corpus / Corpora ????
  • 4. of about language knowledge http://www.flickr.com/photos/missturner/3029700617/
  • 5. Prescriptivism (how language should be used) v Descriptivism (how language is used)
  • 6. Most of the prescriptive rules of the language mavens make no sense on any level. They are bits of folklore that originated for screwball reasons several hundred years ago ... For as long as they have existed, speakers have flouted them
  • 7. Grammarian Geoffrey K Pullum on: intellectual abdication / should be ashamed / current around 1900 / a perversion of grammatical education / blind to textual evidence even when he himself exhibits it / dishonest and stupid / vile little compendium of tripe about style / More passives in Orwell's pompous essay with the warning about how you mustn't use them than in any periodical you can lay your hands on!
  • 8. This usage stuff is not straightforward and easy. If ever someone tells you that the rules of English grammar are simple and logical and you should just learn them and obey them, walk away, because you're getting advice from a fool. http://languagelog.ldc.upenn.edu/nll/?p=2790
  • 9. Corpus: key modern tool for finding out about how language works
  • 10. Corpus: a large database of representative language samples
  • 11. Corpus: 100s of millions of words from (mostly) written language in different genres, in small samples (~2000 words)
  • 12. Corpus: used for linguistic research, making dictionaries, writing grammars, ...
  • 13. (no text transcribed)
  • 14. Corpora available for teachers: http://corpus.byu.edu
  • 15. Access to COCA and related BYU corpora is free, but free registration is required for more than ~10 queries a day
  • 16. (no text transcribed)
  • 17. Brown (the grandfather), COCA, BNC, WebCorp, Google
  • 18. (no text transcribed)
  • 19. (no text transcribed)
  • 20. Searching a corpus early on in the process of making a generalization can save you a lot of unpleasant surprises later. http://www.flickr.com/photos/atoach/3900591006/
  • 21. How do we use the word dyslexia? We speak more often of dyslexic children than adults. We speak more often of dyslexia than any other dys- word.
  • 22. Concordance. BNC: dyslexic [n*] (http://corpus.byu.edu/bnc); COCA: dyslexic [n*] (http://www.americancorpus.org/)
  • 23. COCA: dys*
  • 24. Suffixing rules: *yed, *ied
  • 25. Suffixing rules. *yed: played, stayed, portrayed, enjoyed, unemployed, surveyed. *ied: died, tried, married, worried, identified, applied
  • 26. The Corpus Magic: * ? [ ] [n*]. Different corpora use slightly different codes. Read the manual.
  • 27. The Corpus Magic. * = any number of characters (incl 0); ? = any one character; [ ] = lemma (all inflectional forms of a word); [n*] = part of speech tags (e.g. nouns). Different corpora use slightly different codes. Read the manual.
  • 28. *each: each, reach, beach, teach, outreach, ..., impeach, ... teach*: teachers, teaching, ..., teachable, teacher-librarians, ... t*ch: touch, teach, tech, torch, trench, twitch, ..., three-inch, ... teach *: teach the, teach us, teach students, ...
  • 29. ?each: reach, beach, teach, peach, leach, keach, ... each?: each- (1), each# (1) [i.e. nothing]. ?each?: peachy, bleachy, teacha, reachs (2) [i.e. spelling error], ... t?ch: tech, tach, toch, tuch, tsch, tich. t??ch: touch, teach, torch, tisch, ...
  • 30. [Lemma]
  • 31. Part of speech tags: [run].[n*]; [run] [n*]
  • 32. Common tags. [n*]: noun; [NN2]: plural nouns; [v*]: verb; [VVD]: verb past tense; [aj*] (BNC) / [j*] (COCA): adjective; [av*] (BNC) / [r*] (COCA): adverb
  • 33. Help
  • 34. (no text transcribed)
  • 35. (no text transcribed)
  • 36. You can also: search for idioms (cats and dogs); combine wildcards (?each*s); search for synonyms ([=pretty]); search for alternatives (car|bike|horse); exclude searches (used -car). For more details see:
  • 37. Concordance + KWIC: *ies.[N*]
  • 38. KWIC (Key-Word In Context): *ies.[N*]
  • 39. Limit searches by genre
  • 40. Other questions a corpus can answer. Are there more nouns or verbs ending in -ies? Query: *ies.[V*] vs. *ies.[N*]. Are there four-letter verbs ending in -ed in the present tense? Query: ??ed.[VVB]. What are the most common adjectives describing students vs. pupils? Query: [j*] [student] vs. [j*] [pupil]. What do we say teachers do most often? Query: [teacher] [vvb]
  • 41. Corpus, rules, and regularity: pre*; *ed; *ies.[V*]. http://www.flickr.com/photos/51505078@N00/352492687
  • 42. Collocations: limits on variability. See also Kennedy, p. 80-23
  • 43. Collocations (cont): limits on variability. See also Kennedy, p. 80-23
  • 44. Collocations (cont): [teacher] must [v*]
  • 45. Idioms and set phrases: 275 results; 359 results
  • 46. Google as a Corpus: "put the search text in quotes"; use * for the search item
  • 47. (no text transcribed)
  • 48. Google as a Corpus: Pros & Cons. PRO: rare, low frequency usage; up-to-date usage. CON: no sampling, no frequency sort, no genre limit, no part of speech tags
  • 49. Google result counts are only rough estimates. Different people searching in different geographic locations can get different numbers. Sometimes searching for A gives fewer results than searching for A without B. http://searchengineland.com/why-google-cant-count-results-properly-53559
  • 50. ... but Google fights can be fun
  • 51. WebCorp makes Google search results linguist-friendly
  • 52. Avoid Common Corpus Errors. Be aware of limitations: sampling, coverage, size, presence of typos and errors, bad part of speech tagging. Beware of low frequency results. Beware of homographs. Check that results come from multiple sources. Check KWIC to confirm relevance. Limit search by genre. http://www.flickr.com/photos/andreassolberg/433734311
  • 53. Check examples and sources
  • 54. Always check low frequency results: must [v*] [n*]. Sometimes they come from the same source
  • 55. False roots: corner, silly, preface, cockroach, protest, stable. http://etymonline.com
  • 56. Make your own corpus with TextSTAT: http://neon.niederlandistik.fu-berlin.de/en/textstat
  • 57. Make your own corpus with AntConc: http://www.antlab.sci.waseda.ac.jp/software.html
  • 58. Corpus in the classroom: teacher preparation; student discovery
  • 59. Teacher preparation: find relevant, common examples; prepare worksheets; check for exceptions; find out answers to student questions about rules and usage
  • 60. Student discovery: show search results to students to work out rules or word meanings; teach students how to search for questions; ask students to give each other puzzles for searching
  • 61. For heavy classroom use: register for group access to prevent spam lockout
  • 62. Corpus v dictionary
  • 63. Non-classroom corpus use: supplement dictionary; crossword puzzles; check typical usage when writing
  • 64. Where to go next? http://www.corpora4learning.net
  • 65. Thank you. Contact: http://dominiklukes.net
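
The wildcard searches on slides 26 to 29 and the KWIC display on slides 37 and 38 can be tried on a small text of your own. The sketch below is an illustration only, not how COCA, the BNC, TextSTAT or AntConc work internally. It assumes that `*` means any number of word characters (including none) and `?` means exactly one, as the slides describe. The function names, the sample sentence and the simple tokenizer are invented for the example, and lemma searches and part of speech tags are not handled.

```python
import re
from typing import List, Tuple


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a corpus-style wildcard into a case-insensitive regex."""
    pieces = []
    for ch in pattern:
        if ch == "*":
            pieces.append(r"[\w-]*")   # any number of characters, incl. 0
        elif ch == "?":
            pieces.append(r"[\w-]")    # exactly one character
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces), re.IGNORECASE)


def tokenize(text: str) -> List[str]:
    """Very rough tokenizer: runs of letters, digits, apostrophes, hyphens."""
    return re.findall(r"[\w'-]+", text)


def search(tokens: List[str], pattern: str) -> List[str]:
    """Return every token that matches the wildcard as a whole word."""
    regex = wildcard_to_regex(pattern)
    return [tok for tok in tokens if regex.fullmatch(tok)]


def kwic(tokens: List[str], pattern: str, width: int = 3) -> List[Tuple[str, str, str]]:
    """Key-Word In Context: (left context, keyword, right context) per hit."""
    regex = wildcard_to_regex(pattern)
    rows = []
    for i, tok in enumerate(tokens):
        if regex.fullmatch(tok):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


# Invented sample text standing in for a home-made corpus.
SAMPLE = (
    "The teachers played a game and the children tried to reach each peach. "
    "One teacher stayed; another worried and studied."
)
TOKENS = tokenize(SAMPLE)

if __name__ == "__main__":
    print(search(TOKENS, "*yed"))
    print(search(TOKENS, "*ied"))
    for left, key, right in kwic(TOKENS, "?each", width=2):
        print(f"{left:>20} | {key} | {right}")
```

Matching whole tokens rather than substrings is what keeps `?each` from returning `teachers`. A real corpus interface adds frequency sorting, genre limits and tagging on top of this kind of matching.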