Description

This text archive focuses on German political speeches given by top officials, mostly from 1990 onwards, selected according to their political relevance. This is a work in progress; updated and extended versions will follow. The currently included texts come from the following sources:

Reference

If you use the texts, please cite at least one of these references:

Feel free to contact me if you have questions or if you would like to collaborate on this corpus.

Data

Work with the texts

The corpus can be queried online here using a faceted full-text search featuring linguistic annotation:

Appropriate tooling:

Current version (4th release, 2019)

The corpus currently includes a total of 6,685 speeches by 71 speakers, spanning the period from 1984 to 2017 and amounting to about 13 million words. The files below consist of texts with metadata encoded in XML format.
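
As a rough illustration of how texts with XML-encoded metadata can be read, here is a minimal Python sketch using only the standard library. The element and attribute names (`collection`, `text`, `body`, `person`, `date`, `title`) and the sample content are assumptions made for this example, not the documented schema of the corpus; check the actual files and adjust the names before use.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document: the element and attribute names are
# assumptions for illustration, not the corpus's documented schema.
SAMPLE = """<collection>
  <text person="Speaker A" date="1999-05-23" title="First speech">
    <body>Ein kurzer Beispieltext.</body>
  </text>
  <text person="Speaker B" date="2005-11-30" title="Second speech">
    <body>Noch ein Beispiel mit etwas mehr Text.</body>
  </text>
  <text person="Speaker A" date="2010-01-01" title="Third speech">
    <body>Drei Worte hier.</body>
  </text>
</collection>"""


def load_speeches(xml_string):
    """Return one dict per speech with its metadata and a naive token count."""
    root = ET.fromstring(xml_string)
    speeches = []
    for node in root.iter("text"):
        body = node.findtext("body", default="")
        speeches.append({
            "speaker": node.get("person"),
            "date": node.get("date"),
            "title": node.get("title"),
            # Whitespace tokenization only; real word counts need a tokenizer.
            "tokens": len(body.split()),
        })
    return speeches


speeches = load_speeches(SAMPLE)
per_speaker = Counter(s["speaker"] for s in speeches)
total_tokens = sum(s["tokens"] for s in speeches)

for speaker, count in per_speaker.most_common():
    print(f"{speaker}: {count} speech(es)")
print(f"Total tokens: {total_tokens}")
```

For the full files, `ET.parse(path)` or `ET.iterparse(path)` may be preferable to `ET.fromstring`, since the corpus is large and loading it as a single string could be memory-intensive.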

Legacy versions (outdated, for reproducibility only)

Visualizations (beta version from 2018)

For maintenance reasons, the pages are static: word lists for relevant queries, output as web pages (CSS/XHTML).

Mentions

The mentions below are updated on a regular basis.

Corpus and Computational Linguistics

History and Political Science

Miscellaneous

Changelog

2019-06-17 4th release: Augmented text base, deduplication and refined metadata.
2018-09-28 Refined speaker metadata and text base for the Chancellery.
2018-08-30 Refined text base and updated visualizations.
2018-05-09 3rd release, updated text archive.
2012-08-03 First part of the (now outdated) code released: https://github.com/adbar/gps-corpus-builder
2012-03-05 2nd version: POS-tags, lemmas, XML TEI, keywords.
2011-12-06 Readme and CC BY-SA license added.
2011-09-08 Better visualizations of the speeches and better formatting.
2011-08-16 Minor bugs corrected.
2011-07-25 First release.