Data & Tools

Datasets

The dataset used for the competition is the Uniud - Document Image Analysis DataSet - Bible version (U-DIADS-Bib), a proprietary dataset developed through the collaboration of computer scientists and humanities scholars at the University of Udine.

A full description of the dataset is given in: Zottin, S., De Nardin, A., Colombi, E., et al. U-DIADS-Bib: a full and few-shot pixel-precise dataset for document layout analysis of ancient manuscripts. Neural Computing & Applications (2024). https://doi.org/10.1007/s00521-023-09356-5 (also available on arXiv).

U-DIADS-Bib consists of 50 unique color page images for each manuscript, for a total of 200 images, saved in JPEG format with a resolution of 1344×2016 pixels. Each page is paired with its respective Ground Truth (GT) data, stored in a PNG image of identical dimensions to the original.
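
As a minimal sketch, the pairing of pages and GTs can be checked as follows; the folder layout and file names below are illustrative assumptions, not the official structure of the release.

    # Minimal sketch of pairing each page image with its GT.
    # NOTE: the folder layout and file names are hypothetical, not the
    # official U-DIADS-Bib structure; adapt the paths to your copy.
    from pathlib import Path
    from PIL import Image

    img_dir = Path("U-DIADS-Bib/manuscript_A/img")  # hypothetical path
    gt_dir = Path("U-DIADS-Bib/manuscript_A/gt")    # hypothetical path

    for img_path in sorted(img_dir.glob("*.jpg")):
        gt_path = gt_dir / (img_path.stem + ".png")  # GT assumed to share the page name
        page = Image.open(img_path).convert("RGB")
        gt = Image.open(gt_path).convert("RGB")
        # Each GT PNG has the same dimensions as its page image (1344×2016 pixels).
        assert page.size == gt.size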

The GTs encompass six distinct, non-overlapping annotated classes, namely background, paratext, decoration, main text, title, and chapter headings, encoded by RGB value as follows (a color-to-label conversion sketch is given after the list):

  • RGB(0,0,0) Black: Background
  • RGB(255,255,0) Yellow: Paratext
  • RGB(0,255,255) Cyan: Decoration
  • RGB(255,0,255) Magenta: Main Text
  • RGB(255,0,0) Red: Title
  • RGB(0,255,0) Lime: Chapter Headings
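
For training, the GT colors are typically converted into a single-channel label map. Below is a minimal sketch of such a conversion; the class-index order (0–5) is an illustrative assumption, while the RGB encoding itself is the one listed above.

    # Sketch: convert a GT PNG (RGB) into a single-channel label map.
    # NOTE: the index order 0..5 is an illustrative choice, not mandated
    # by the competition; only the RGB values come from the dataset.
    import numpy as np
    from PIL import Image

    COLOR_TO_CLASS = {
        (0, 0, 0): 0,        # background
        (255, 255, 0): 1,    # paratext
        (0, 255, 255): 2,    # decoration
        (255, 0, 255): 3,    # main text
        (255, 0, 0): 4,      # title
        (0, 255, 0): 5,      # chapter headings
    }

    def gt_to_labels(gt_path):
        gt = np.array(Image.open(gt_path).convert("RGB"))  # H x W x 3
        labels = np.zeros(gt.shape[:2], dtype=np.uint8)
        for rgb, cls in COLOR_TO_CLASS.items():
            labels[(gt == rgb).all(axis=-1)] = cls
        return labels  # H x W, values in 0..5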

Task 1: Few-Shot Layout Segmentation

  • Train set: 3 instances per manuscript
  • Validation set: 10 instances per manuscript
  • Private test set: 30 instances per manuscript (available only after the competition is concluded)
  • Download: Dataset will be sent privately after registration

Task 2: Many-Shot Layout Segmentation

  • Train set: 10 instances per manuscript
  • Validation set: 10 instances per manuscript
  • Private test set: 30 instances per manuscript (available only after the competition is concluded)
  • Download: Available on March 12th 2024

Data augmentation and additional datasets

The use of any data augmentation is allowed for this challenge, as is the employment of external, public data for pre-training purposes; however, the use of the latter must be clearly stated by the authors at the time of submission.

The alteration of the train-validation-test splits of the dataset provided for the competition is, however, strictly forbidden under penalty of exclusion from the competition. In particular, in Task 1 it is strictly forbidden to use the validation images as part of the training set!

Evaluation

The evaluation phase involves testing participants' systems on 30 private, unpublished images per manuscript once the competition is finished, and it will be the same for both Tracks. To give participants full flexibility to use their preferred software libraries and to accurately replicate their processing pipelines, the evaluation will take place on Google Colab by running the code submitted by the participants on the instances of our private test set.

Evaluation consists of computing the main semantic segmentation metrics adopted in the literature: Precision, Recall, F1-score, and Intersection over Union (IoU). For each metric, the macro average over the semantic classes is calculated, so that each class is given equal importance irrespective of the frequency with which it appears in the dataset.

Metric definitions are reported in Eqs. (1)–(4), where TP, FP, and FN stand, respectively, for True Positives, False Positives, and False Negatives; an illustrative implementation sketch follows the equations.

Precision = TP / (TP + FP) (1)

Recall = TP / (TP + FN) (2)

F1-Score = (2 × Precision × Recall) / (Precision + Recall) (3)

IoU = TP / (TP + FP + FN) (4)
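
As a reference, the sketch below computes these macro-averaged metrics from a predicted and a ground-truth label map; it is an illustrative re-implementation under our own naming, not the official evaluation_code script.

    # Sketch: macro-averaged Precision, Recall, F1 and IoU (Eqs. (1)-(4))
    # over the 6 classes, computed from single-channel label maps.
    # NOTE: illustrative re-implementation, not the official evaluation_code.
    import numpy as np

    NUM_CLASSES = 6

    def macro_metrics(pred, gt, eps=1e-12):
        precisions, recalls, f1s, ious = [], [], [], []
        for c in range(NUM_CLASSES):
            tp = np.sum((pred == c) & (gt == c))
            fp = np.sum((pred == c) & (gt != c))
            fn = np.sum((pred != c) & (gt == c))
            precision = tp / (tp + fp + eps)                          # Eq. (1)
            recall = tp / (tp + fn + eps)                             # Eq. (2)
            f1 = 2 * precision * recall / (precision + recall + eps)  # Eq. (3)
            iou = tp / (tp + fp + fn + eps)                           # Eq. (4)
            precisions.append(precision)
            recalls.append(recall)
            f1s.append(f1)
            ious.append(iou)
        # Macro average: every class weighs equally, regardless of pixel frequency.
        return {
            "precision": float(np.mean(precisions)),
            "recall": float(np.mean(recalls)),
            "f1": float(np.mean(f1s)),
            "iou": float(np.mean(ious)),
        }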

These evaluation metrics are calculated individually for each manuscript, so you must submit 4 different model trainings, one per manuscript. Please be aware that, while we will report the results of the submitted systems on all the aforementioned metrics, the final leaderboard will be defined exclusively by the IoU metric.

We provide the Python script with the evaluation code, which we will use for our final test. Please keep the image names, folders, and directories unchanged. Download: evaluation_code

For both tracks, in case of suspicious results, we reserve the right to retrain the model provided by the participants to validate the reported results.

For this reason, we invite all participants to write their code with an eye to reproducibility (e.g., explicitly seeding RNG components) so as to avoid inconsistencies with the reported results.
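
As an illustration, a seeding routine for a typical Python pipeline might look like the sketch below; the PyTorch calls are an assumption about the participant's framework and should be adapted to whichever stack you actually use.

    # Sketch: explicit seeding of the common RNG components in a Python pipeline.
    # NOTE: the PyTorch part is an assumption; replace it with the equivalent
    # calls of your own framework if you use something else.
    import os
    import random
    import numpy as np

    def seed_everything(seed=42):
        os.environ["PYTHONHASHSEED"] = str(seed)
        random.seed(seed)
        np.random.seed(seed)
        try:
            import torch
            torch.manual_seed(seed)
            torch.cuda.manual_seed_all(seed)
            torch.backends.cudnn.deterministic = True
            torch.backends.cudnn.benchmark = False
        except ImportError:
            pass  # non-PyTorch pipelines: seed your framework's RNG here instead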

Additionally, a brief report (maximum 3 pages) describing the proposed framework should be submitted along with the code.

The winners will be determined separately for the two Tracks.

Submission

By the submission dates of the respective Tracks, participants must send the following via email to the organizers:

  • The Google Colab notebook containing executable training and evaluation code (including library installation). Note: the code should be set up so that, when the notebook is run, the dataset and the folders indicated for the evaluation code are read automatically.
  • The trained models of your framework and the segmentation maps of the validation images.
  • A short report (maximum 3 pages) describing the method and approach, the results for each manuscript, and the average over all manuscripts for the metrics previously outlined.