Layout Document Analysis

Research Description

Our research group brings together the expertise of Computer Vision scholars and scholars in the History of Text Transmission to explore the analysis of the layout of Latin manuscripts. At the core of the project are innovative experiments with a few-shot learning classification approach, based on semantic categories such as text, paratext, decoration, summaries, and titles. The goal is to recognize and extract these different classes with increasing precision, enabling the analysis of large quantities of paratextual material in its broadest sense.

Our collaborative experimentation began with the analysis of biblical manuscripts, for which a new open-access dataset has been developed. This dataset will be extended to include printed texts as part of the PRIN 2022 PNRR project ‘DOBiPS – Data Oriented Biblical Paratext Studies’. The ministerial funding will enable collaboration among the members of the research unit in Udine (PI Emanuela Colombi; Gian Luca Foresti; Laura Pani; Laura Casella) and at the University of Cassino and Lazio Meridionale (Roberta Casavecchia; Alessandra Peri; Laboratorio LIBeR).

Further expansions of the dataset and ongoing implementations of the few-shot learning algorithm are dedicated to non-Latin scripts (Greek and Arabic) and Latin manuscripts with irregular and customized layouts, such as sermon collections. This work is being carried out in collaboration with the PASSIM Project (Patristic Sermons in the Middle Ages) based at the Radboud Universiteit Nijmegen (PI Dr. Shari Boodts).

Publications

Silvia Zottin, Axel De Nardin, Emanuela Colombi, Claudio Piciarelli, Filippo Pavan, Gian Luca Foresti: U-DIADS-Bib: a Full and Few-Shot Pixel-Precise Dataset for Document Layout Analysis of Ancient Manuscripts. Neural Computing and Applications (NCAA) 2023.

Axel De Nardin, Silvia Zottin, Claudio Piciarelli, Emanuela Colombi, Gian Luca Foresti: Pixel-Precise Document Layout Segmentation Via Dynamic Instance Generation and Local Thresholding. International Journal of Neural Systems 2023, doi: 10.1142/S0129065723500521.

Axel De Nardin, Silvia Zottin, Matteo Paier, Gian Luca Foresti, Emanuela Colombi, Claudio Piciarelli: Efficient Few-Shot Learning for Pixel-Precise Handwritten Document Layout Analysis. WACV 2023: 3669-3677.

Axel De Nardin, Silvia Zottin, Matteo Paier, Gian Luca Foresti, Emanuela Colombi, Claudio Piciarelli: Dynamic Instance Generation for Few-Shot Handwritten Document Layout Segmentation. AI4CH@AI*IA 2022: 26-34.

Project Team