One of the aims of our project is to compile a small corpus of plague treatises and descriptions of health resorts (Bäderkunden) from the 15th to 17th centuries, which will also be made available for other research interests. The prints have all been digitised by various libraries, but they had to be made machine-readable for further processing. For this, we used the Transkribus software, which enables automatic text recognition even for older stages of the language and for different typefaces. The results then had to be proofread. We completed this important first step in the middle of last year, and we aim to publish the transcriptions soon and make them available for use.
We are now formally annotating the texts for normalisation, lemmatisation and part of speech. As a first pass, we used the automatic tagger DTA-CAB from the German Text Archive (Deutsches Textarchiv). We now need to correct these annotations and adapt them to our requirements. For this we are working with the INCEpTION software, which is designed for corpus annotation (see image). At the moment we are working on normalisation, but we hope to start on the next annotation level soon.
After the formal annotations, we also want to add our own qualitative annotations, which will help us to describe man-usages