Document Layout Analysis

RESEARCH DESCRIPTION

Our research group brings together scholars of Computer Vision and of the History of Text Transmission to study the layout of Latin manuscripts. At the core of the project are experiments with a few-shot learning approach to classification, based on semantic categories such as text, paratext, decoration, summaries, and titles. The goal is to recognize and extract these classes with increasing precision, enabling the analysis of large quantities of paratextual material in the broadest sense.

Our collaborative experimentation began with the analysis of biblical manuscripts, for which a new open-access dataset has been developed. This dataset will be expanded to include printed texts as part of the project PRIN 2022 PNRR ‘DOBiPS – Data Oriented Biblical Paratext Studies’. The ministerial funding will enable collaboration among the members of the research units in Udine (PI Emanuela Colombi; Gian Luca Foresti; Laura Pani; Laura Casella) and at the University of Cassino and Lazio meridionale (Roberta Casavecchia; Alessandra Peri; Laboratorio LIBeR).

Further expansions of the dataset and ongoing development of the few-shot learning algorithm focus on non-Latin scripts (Greek and Arabic) and on Latin manuscripts with irregular and customized layouts, such as sermon collections. This work is being carried out in collaboration with the PASSIM Project (Patristic Sermons in the Middle Ages), based at Radboud Universiteit Nijmegen (PI Dr. Shari Boodts).

Technological Tools and Methods

Few-shot learning algorithms and semantic classification
Automatic analysis of layout and paratext (text, decorations, titles, summaries)
Development and expansion of open-access manuscript datasets
Support from Computer Vision and image preprocessing techniques
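
To make the few-shot idea in the list above concrete, here is a minimal sketch of one common few-shot technique, nearest-prototype classification: each class is represented by the average of a handful of labelled examples, and a new region is assigned to the closest average. This is an illustration only and not the project's actual algorithm. The two-dimensional features (relative area, ink density), the sample values, and the function names build_prototypes and classify are all hypothetical; only the class names come from the description above.

```python
from collections import defaultdict
from math import dist


def build_prototypes(support):
    """Average the feature vectors of each class's few labelled examples."""
    grouped = defaultdict(list)
    for label, vector in support:
        grouped[label].append(vector)
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(vector, prototypes):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: dist(vector, prototypes[label]))


# Hypothetical features for page regions: (relative area, ink density).
# Two labelled examples per class stand in for the "few shots".
support = [
    ("text", (0.60, 0.45)), ("text", (0.55, 0.50)),
    ("title", (0.05, 0.70)), ("title", (0.08, 0.65)),
    ("decoration", (0.15, 0.90)), ("decoration", (0.20, 0.85)),
]

prototypes = build_prototypes(support)
label = classify((0.58, 0.48), prototypes)  # a large, moderately inked region
```

In a real pipeline the feature vectors would presumably come from a learned image embedding rather than two hand-picked measurements; the prototype step itself would stay the same.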