Bibliography

[Abney1991] Steven Abney. 1991. Parsing by chunks. Principle-based parsing. Kluwer Academic Publishers.

[Artstein/Poesio 2008] Ron Artstein and Massimo Poesio. 2008. “Inter-coder agreement for computational linguistics”. Computational Linguistics. 34. 4.

[Beagrie 2001] Neil Beagrie. 2001. Preserving UK digital library collections. Program: electronic library and information systems. 35. 3. 215-226.

[Beißwenger et al. 2012] Michael Beißwenger, Maria Ermakova, Alexander Geyken, Lothar Lemnitzer, and Angelika Storrer. to appear. “A TEI Schema for the Representation of Computer-mediated Communication”. Journal of the TEI. 3.

[Bell Labs 1979] Bell Labs. 1979. UNIX™ time-sharing system. UNIX programmer’s manual. 7th edition.

[Bies et al. 1995] Ann Bies, Mark Ferguson, Karen Katz, and Robert MacIntyre. 1995. Bracketing Guidelines for Treebank II Style Penn Treebank Project.

[Boyd et al. 2008] Adriane Boyd, Markus Dickinson, and Detmar Meurers. 2008. “On detecting errors in dependency treebanks”. Research on Language and Computation. 6. 2. 113-137.

[Brants et al. 2002] Sabine Brants, Stefanie Dipper, Silvia Hansen, Wolfgang Lezius, and George Smith. 2002. The TIGER Treebank.

[Burchardt et al. 2006] A. Burchardt, K. Erk, A. Frank, A. Kowalski, and S. Pado. 2006. SALTO – a versatile multi-level annotation tool. Proceedings of LREC'2006, Genoa (IT).

[Chiarcos 2008] Christian Chiarcos. 2008. An ontology of linguistic annotations. LDV Forum (=Journal for Computational Linguistics and Language Technology). 23. 1-16.

[Chiarcos 2010] Christian Chiarcos. 2010. Grounding an ontology of linguistic annotations in the Data Category Registry. Proceedings of the LREC'2010 workshop on language resource and language technology standards. State of the art, emerging needs, and future developments. Valletta (MT). 37-40.

[Chomsky1965] Noam Chomsky. 1965. Aspects of the theory of syntax. The MIT Press.

[Collins 1997] Michael Collins. 1997. Three generative, lexicalised models for statistical parsing. Proceedings of the 35th annual meeting of the Association for Computational Linguistics (jointly with the 8th conference of the EACL). 16-23.

[Compston1919] Herbert Fuller Bright Compston. 1919. The inscription on the stele of Méša, commonly called the Moabite Stone. Society for Promoting Christian Knowledge.

[Coward/Grimes 2000] David F. Coward and Charles E. Grimes. 2000. Making dictionaries. A guide to lexicography and the Multi-Dictionary Formatter.

[Dipper 2005] Stefanie Dipper. 2005. XML-based stand-off representation and exploitation of multi-level linguistic annotation. Proceedings of Berliner XML Tage 2005 (BXML 2005). Berlin (DE). 39-50.

[Engelberg/Lemnitzer 2009] Stefan Engelberg and Lothar Lemnitzer. 2009. Lexikographie und Wörterbuchbenutzung. 4th edition. Stauffenburg. Tübingen.

[Erk et al. 2003] Katrin Erk, Andrea Kowalski, Sebastian Pado, and Manfred Pinkal. 2003. Towards a resource for lexical semantics. A large German corpus with extensive semantic annotation. Proceedings of ACL 2003. Sapporo (JP). 537–544.

[Erk/Pado 2004] Katrin Erk and Sebastian Pado. 2004. A powerful and versatile XML format for representing role-semantic annotation. Proceedings of LREC 2004.

[Fellbaum 1998] Christiane Fellbaum. 1998. WordNet. An electronic lexical database. MIT Press. Cambridge, MA.

[Fielding 2000] Roy Thomas Fielding. 2000. Architectural styles and the design of network-based software architectures. University of California (dissertation). Irvine.

[Fischer 2005] Steven R. Fischer. 2005. A history of writing. Reaktion Books.

[Furrer/Volk 2011] Lenz Furrer and Martin Volk. 2011. “Reducing OCR errors in Gothic script documents”. Proceedings of Workshop on Language Technologies for Digital Humanities and Cultural Heritage (associated with RANLP 2011), Hissar, Bulgaria. 97-103. http://aclweb.org/anthology-new/W/W11/W11-4115.pdf

[Garside et al. 1997] Roger Garside, Geoffrey Leech, and Anthony McEnery. 1997. Corpus annotation. Linguistic information from computer text corpora. Addison Wesley Longman.

[Geyken 2007] Alexander Geyken. 2007. “The DWDS corpus. A reference corpus for the German language of the 20th century”. Christiane Fellbaum. Collocations and Idioms. Linguistic, lexicographic, and computational aspects. Continuum. London. 23–41.

[Geyken et al. 2011] Alexander Geyken, Susanne Haaf, Bryan Jurish, Matthias Schulz, Christian Thomas, and Frank Wiegand. 2012. “TEI und Textkorpora: Fehlerklassifikation und Qualitätskontrolle vor, während und nach der Texterfassung im Deutschen Textarchiv”. Jahrbuch für Computerphilologie. 9.

[Geyken et al. 2012] Alexander Geyken, Susanne Haaf, and Frank Wiegand. 2012. “The DTA ‘base format’: A TEI-Subset for the Compilation of Interoperable Corpora”. Jeremy Jancsary. 11th Conference on Natural Language Processing (KONVENS) – Empirical Methods in Natural Language Processing, Proceedings of the Conference. Schriftenreihe der Österreichischen Gesellschaft für Artificial Intelligence. 5.

[Grosso et al. 2003] Paul Grosso, Eve Maler, Jonathan Marsh, and Norman Walsh. 2003. XPointer Framework. W3C recommendation.

[König et al. 2003] Esther König, Wolfgang Lezius, and Holger Voormann. 2003. TIGERSearch 2.1 user's manual. IMS (Universität Stuttgart).

[Haaf et al. forthcoming] Susanne Haaf, Frank Wiegand, and Alexander Geyken. forthcoming. “Measuring the correctness of double-keying. Error classification and quality control in a large corpus of TEI-annotated historical text”. Journal of the Text Encoding Initiative.

[Heid et al. 2010] Ulrich Heid, Helmut Schmid, Kerstin Eckart, and Erhard Hinrichs. 2010. A corpus representation format for linguistic web services: the D-SPIN text corpus format and its relationship with ISO standards. Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10).

[Hinrichs 2004] E. Hinrichs, S. Kübler, K. Naumann, H. Telljohann, and J. Trushkina. 2004. Recent Developments in Linguistic Annotations of the TüBa-D/Z Treebank. Proceedings of the Third Workshop on Treebanks and Linguistic Theories (TLT).

[Hinrichs/Vogel 2010] Erhard Hinrichs and Iris Vogel. 2010. Interoperability and standards. CLARIN D5C-3.

[Holley 2009] Rose Holley. 2009. “How good can it get? Analysing and improving OCR accuracy in large scale historic newspaper digitisation programs”. D-Lib Magazine. 15. 3/4.

[Ide 1998] Nancy M. Ide. 1998. Corpus encoding standard. SGML guidelines for encoding linguistic corpora. Proceedings of the First International Language Resources and Evaluation Conference, Granada, Spain. 463-470.

[Ide et al. 2000] Nancy M. Ide, Patrice Bonhomme, and Laurent Romary. 2000. “XCES: An XML-based standard for linguistic corpora”. Proceedings of the Second Language Resources and Evaluation Conference (LREC), Athens, 2000. European Language Resources Association (ELRA). 825-830.

[Ide/Suderman 2007] N. Ide and K. Suderman. 2007. GrAF: A graph-based format for linguistic annotations. Proceedings of the linguistic annotation workshop, held in conjunction with ACL 2007. Praha (CZ). 1-8.

[IPA 1999] International Phonetic Association. 1999. Handbook of the International Phonetic Association. A guide to the use of the international phonetic alphabet. Cambridge University Press.

[ISO 639-3:2007] ISO. 2007. Codes for the representation of names of languages – Part 3: Alpha-3 code for comprehensive coverage of languages.

[ISO 3166-1:2006] ISO. 2006. Codes for the representation of names of countries and their subdivisions – Part 1: Country codes.

[ISO 3166-2:2007] ISO. 2007. Codes for the representation of names of countries and their subdivisions – Part 2: Country subdivision code.

[ISO 12620:2009] ISO. 2009. Terminology and other content and language resources – Specification of data categories and management of a Data Category Registry for language resources.

[ISO 24610-1:2006] ISO. 2006. Language resource management – Feature structures – Part 1: Feature structure representation.

[ISO 24612:2012] ISO. 2012. Language resource management – Linguistic annotation framework (LAF).

[ISO 24613:2008] ISO. 2008. Language resource management – Lexical markup framework (LMF).

[ISO 24615:2010] ISO. 2010. Language resource management – Syntactic annotation framework (SynAF).

[Jurafsky/Martin 2009] Dan Jurafsky and James Martin. 2009. Speech and language processing. Prentice Hall. 2nd edition.

[Kahn/Wilensky 2006] Robert Kahn and Robert Wilensky. 2006. A framework for distributed digital object services. International Journal on Digital Libraries. 6. 2. 115-123.

[Kunze/Lemnitzer 2007] Claudia Kunze and Lothar Lemnitzer. 2007. Computerlexikographie. Eine Einführung. Narr. Tübingen.

[Kupietz et al. 2010] Marc Kupietz, Cyril Belica, Holger Keibel, and Andreas Witt. 2010. “The German Reference Corpus DEREKO. A primordial sample for linguistic research”. Proceedings of the seventh conference on International Language Resources and Evaluation (LREC 2010). European Language Resources Association (ELRA). 1848-1854.

[Leech 1993] Geoffrey Leech. 1993. Corpus annotation schemes. Literary and Linguistic Computing. 8. 4. 275-281.

[Lemnitzer/Zinsmeister 2010] Lothar Lemnitzer and Heike Zinsmeister. 2010. Korpuslinguistik. Eine Einführung. 2nd edition. Narr. Tübingen.

[Lieberman et al. 2005] Henry Lieberman, Alexander Faaborg, Waseem Daher, and José Espinosa. 2005. How to wreck a nice beach you sing calm incense. Proceedings of the International Conference on Intelligent User Interfaces (IUI 2005).

[Lezius 2002] Wolfgang Lezius. 2002. Ein Suchwerkzeug für syntaktisch annotierte Textkorpora. University of Stuttgart Arbeitspapiere des Instituts für Maschinelle Sprachverarbeitung (AIMS).

[Lezius 2002] Wolfgang Lezius. 2002. TIGERSearch – ein Suchwerkzeug für Baumbanken. Proceedings of Konvens 2002. Saarbrücken (DE).

[Lüdeling/Kytö 2009] Anke Lüdeling and Merja Kytö. 2009. Corpus Linguistics. An International Handbook. 2. de Gruyter. Berlin/New York. Handbooks of Linguistics and Communication Science/Handbücher zur Sprach- und Kommunikationswissenschaft. 29.2.

[Lüngen/Sperberg-McQueen 2012] Harald Lüngen and Michael Sperberg-McQueen. 2012. A TEI P5 document grammar for the IDS text model. Journal of the Text Encoding Initiative. 3.

[MacWhinney 2000] Brian MacWhinney. 2000. The CHILDES project. Tools for analyzing talk. Part 1: The CHAT transcription format. 3rd edition (newer editions available online). Lawrence Erlbaum Associates. Mahwah, NJ.

[Magerman 1994] David M. Magerman. 1994. Natural language parsing as statistical pattern recognition. Doctoral dissertation.

[Martens 2011] Scott Martens. 2011. Quantifying linguistic regularity. Centrum voor Computerlinguïstiek, KU Leuven.

[McEnery/Wilson 2001] Tony McEnery and Andrew Wilson. 2001. Corpus linguistics. An introduction. 2nd edition. Edinburgh University Press. Edinburgh. Edinburgh textbooks in empirical linguistics.

[Nartker et al. 2003] Thomas A. Nartker, Kazem Taghva, Ron Young, Julie Borsack, and Allen Condit. 2003. “OCR correction based on document level knowledge”. Proc. IS&T/SPIE 2003 Intl. Symp. on Electronic Imaging Science and Technology. 103-110.

[NISO:2004] NISO. 2004. Understanding Metadata.

[Odijk/Toral 2009] Jan Odijk and Antonio Toral. 2009. Existing evaluation and validation of LRs. FlaReNet D5.1.

[Perkuhn et al. 2012] Rainer Perkuhn, Holger Keibel, and Marc Kupietz. 2012. Korpuslinguistik. Fink. Paderborn.

[Porter 1980] Martin F. Porter. 1980. An algorithm for suffix stripping. Program. 14. 3. 130-137.

[Przepiórkowski 2011] Adam Przepiórkowski. 2011. Integration of language resources into web service infrastructure. CLARIN D5R-3b.

[Pytlik Zillig 2009] Brian L. Pytlik Zillig. 2009. “TEI analytics. Converting documents into a TEI format for cross-collection text analysis”. Literary & Linguistic Computing. 24. 2. 187-192.

[Riester et al. 2010] Arndt Riester, David Lorenz, and Nina Seemann. 2010. A recursive annotation scheme for referential information status. Proceedings of the 7th International Conference of Language Resources and Evaluation (LREC). Valletta (MT). 717-722.

[Ringersma/Drude/Kemp-Snijders 2010] J. Ringersma, S. Drude, and M. Kemps-Snijders. 2010. Lexicon standards: From de facto standard Toolbox MDF to ISO standard LMF. Talk presented at LRT standards workshop, Seventh conference on International Language Resources and Evaluation (LREC'2010).

[Romary et al. 2011] Laurent Romary, Amir Zeldes, and Florian Zipser. 2011. <tiger2/> documentation. Draft version as of May 25, 2011.

[Russel/Norvig 2009] Stuart Russell and Peter Norvig. 2009. Artificial intelligence. A modern approach. Prentice Hall.

[Saenger 1997] Paul Saenger. 1997. Space between words: the origins of silent reading. Stanford University Press.

[Santorini 1990] Beatrice Santorini. 1990. Part-of-speech tagging guidelines for the Penn Treebank project. 3rd revision.

[Schiller et al. 1999] Anne Schiller, Simone Teufel, Christine Stöckert, and Christine Thielen. 1999. Guidelines für das Tagging deutscher Textcorpora mit STTS (Kleines und großes Tagset).

[Schmandt-Besserat 1992] Denise Schmandt-Besserat. 1992. Before writing. University of Texas Press.

[Sinclair 1991] John McHardy Sinclair. 1991. Corpus, concordance, collocation. Oxford University Press. Describing English language.

[Sinclair 2005] John Sinclair. 2005. Corpus and text – basic principles. Martin Wynne. Developing linguistic corpora. A guide to good practice. 1–16. Oxbow Books. Oxford.

[Soria/Monacchini 2008] Claudia Soria and Monica Monachini. 2008. Kyoto-LMF WordNet representation format (version 4). KYOTO working paper WP2/TR2.

[Svensen 2009] Bo Svensen. 2009. A handbook of lexicography. The theory and practice of dictionary-making. Cambridge University Press. Cambridge (UK).

[Tanner et al. 2009] Simon Tanner, Trevor Muñoz, and Pich Hemy Ros. 2009. “Measuring mass text digitization quality and usefulness. Lessons learned from assessing the OCR accuracy of the British Library’s 19th century online newspaper archive”. D-Lib Magazine. 15. 7/8.

[TEI P5] Lou Burnard and Syd Bauman. TEI P5: Guidelines for electronic text encoding and interchange.

[Thompson/McKelvie 1997] H. S. Thompson and D. McKelvie. 1997. Hyperlink semantics for standoff markup of read-only documents. Proceedings of SGML Europe’97. Barcelona (ES).

[Unsworth 2011] John Unsworth. 2011. “Computational work with very large text collections. Interoperability, sustainability, and the TEI”. Journal of the Text Encoding Initiative. 1.

[Windhouwer 2012] Menzo Windhouwer. 2012. RELcat: a Relation Registry for ISOcat data categories. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). European Language Resources Association (ELRA).

[Yamada/Matsumoto 2003] Hiroyasu Yamada and Yuji Matsumoto. 2003. Statistical dependency analysis with support vector machines. Proceedings of IWPT 3.

[Zipser/Romary 2010] Florian Zipser and Laurent Romary. 2010. A model oriented approach to the mapping of annotation formats using standards. Proceedings of the Workshop on Language Resource and Language Technology Standards (LREC'2010). Malta (MT).

[Zinsmeister et al. 2008] Heike Zinsmeister, Andreas Witt, Sandra Kübler, and Erhard Hinrichs. 2008. Linguistically annotated corpora. Quality assurance, reusability and sustainability. Anke Lüdeling and Merja Kytö. Corpus linguistics. An international handbook. 1. 759-776. Mouton de Gruyter. Berlin. Handbücher zur Sprach- und Kommunikationswissenschaft.

[Zinsmeister 2010] Heike Zinsmeister. 2010. Korpora. K.-U. Carstensen, Ch. Ebert, C. Ebert, S. Jekat, R. Klabunde, and H. Langer. Computerlinguistik und Sprachtechnologie. Eine Einführung. 3rd edition. 482-491. Spektrum Akademischer Verlag. Heidelberg.