Digital Linguistics

Volume 4:
Bański, Piotr / Heid, Ulrich / Herzberg, Laura (eds.): Harmonizing language data. Standards for linguistic resources. VIII/462 pp. - Berlin / Boston: de Gruyter, 2025.
ISBN: 978-3-11-914802-3
Alternative medium:
E-book (PDF). Berlin / Boston: de Gruyter. ISBN: 978-3-11-220821-2 → Open Access

Standards function as safeguards to ensure that data remains interpretable, uniformly queryable, and archivable over time – a critical challenge for digital humanists working with complex linguistic resources. This book provides an overview of essential standards for ensuring the sustainability of data in the Digital Humanities (DH). It addresses the selection of data encoding formats, methods of annotating primary data, and approaches to making resources findable and accessible. The focus is on various forms of linguistic data, such as texts, lexicons, or parallel arrangements (e.g., translations or transcribed recordings). The work explains the role of annotations and metadata in structuring and contextualizing data and examines the influence of diverse data formats, shaped by local academic or industrial practices. In contrast to neural language models, which often yield impressive but opaque results, DH projects aim for transparency, reproducibility, and sustainability. Achieving these goals requires interoperability – the seamless interaction between data and tools. The book demonstrates how clear guidelines and best practices help ensure the long-term usability of data. It offers digital humanists practical approaches and well-founded standards to sustainably archive and efficiently utilize their data, making it an indispensable resource for the field.

Table of contents

Bański, Piotr / Heid, Ulrich / Herzberg, Laura:
  Towards an optimum degree of order in the field of language resources p. 1
Wartena, Christian:
  Character encoding and its importance for text resources p. 17
Romary, Laurent:
  International standards for the identification and the description of languages and their varieties p. 35
Ljubešić, Nikola / Erjavec, Tomaž:
  Part-of-speech tagging and related annotation p. 61
Schwarz, Pia:
  Named entity recognition and entity linking p. 89
Ferreira, Vera / Hedeland, Hanna / Neely, Kelsey:
  Annotated audiovisual language data: data quality and data maturity p. 115
Werthmann, Antonina:
  From spoken language data to TEI-based ISO standard p. 145
Bański, Piotr / Diewald, Nils:
  Dealing with multiple annotations p. 169
Pisetta, Ines / Trippel, Thorsten:
  Standards and practices for long-term digital archiving p. 201
Lüngen, Harald / Pisetta, Ines:
  Conversion into the archival format I5 p. 229
Trippel, Thorsten:
  Metadata for research data p. 251
Fahad Khan, Anas:
  Linguistic linked (open) data p. 281
Evert, Stephanie / Weber, Timm / Bothe, Steffen / Heinrich, Philipp / Piperski, Alexander:
  Data exploitation: corpus queries p. 303
Frick, Elena / Schmidt, Thomas:
  Querying Spoken Language Data p. 339
Körner, Erik / Eckart, Thomas:
  Accessing linguistic content in distributed research environments p. 377
Kamocki, Paweł:
  Taxonomy of legal and ethical metadata for language resources p. 401
Preissner, Annette / Heid, Ulrich:
  The life of an ISO standard p. 427
 
Index p. 447
Author index p. 461