Academic year: 2021


Making Sense of Illustrated Handwritten Archives

Andreas Weber*1, Mahya Ameryan*2, Lise Stork*3, Katherine Wolstencroft 3, Eulàlia Gassó Miracle 5, Siegfried Nijssen 3, Marco Wiering 2, Maarten Heerlien 5, Michiel Thijssen, Marti Huetink 4, Fons Verbeek 3, Aske Plaat 3, Joost Kok 3, Lissa Roberts 1, Jaap van den Herik 6, Lambert Schomaker 2.

[Poster panels: INFRASTRUCTURE, PROCESS, QUERY. Diagram label: Lexicon & Ontologies.]

Apply noise removal, binarization and normalization to the page images.
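The preprocessing step above can be illustrated with a minimal sketch. It assumes a grayscale page stored as nested lists of integers in 0..255, uses min-max contrast stretching for normalization and Otsu's method for binarization, and leaves noise removal out for brevity. None of these choices are stated on the poster; they are common defaults picked for illustration, and the function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, pixel values in 0..255


def normalize(image: Image) -> Image:
    """Stretch pixel values linearly so they span the full 0..255 range."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:  # flat image: nothing to stretch
        return [[0 for _ in row] for row in image]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in image]


def otsu_threshold(image: Image) -> int:
    """Return the gray level that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in image:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0  # pixel count and intensity sum at or below t
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: Image) -> Image:
    """Mark dark pixels (ink) as 1 and light pixels (paper) as 0."""
    t = otsu_threshold(image)
    return [[1 if v <= t else 0 for v in row] for row in image]


page = [
    [250, 240, 30],
    [245, 20, 235],
    [25, 250, 240],
]
ink_mask = binarize(normalize(page))
print(ink_mask)
```

Historical paper is often stained or unevenly lit, so a single global threshold like this is only a starting point; locally adaptive thresholds are a common alternative.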

Extract regions of interest (ROI) from document images through geometrical and logical analysis.

[Figure labels: Technology diagram; Process diagram; Ontologies; Image scan & Lexicon collection; www; End users; www for labeling; Graphics and Drawing Recognition; Text Recognition; Monk @ HPC Center; User interface; Front end; Historians; Biologists; partners: RUG, UL, UT, Naturalis, Brill.]
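As a rough illustration of the geometrical part of region-of-interest extraction, the sketch below finds horizontal bands of ink in a binary page mask (1 = ink, 0 = paper) using a projection profile and returns one bounding box per band. This is an assumed, deliberately simple technique, not the project's method; the logical analysis (deciding whether a region is text, a drawing or a table) is not covered, and all names are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]            # binary image: 1 = ink, 0 = paper
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), end-exclusive


def row_bands(mask: Mask, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) row ranges of consecutive rows that contain ink."""
    bands = []
    start = None
    for y, row in enumerate(mask):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                      # a new band begins
        elif not has_ink and start is not None:
            bands.append((start, y))       # the band ended on the previous row
            start = None
    if start is not None:                  # band runs to the bottom edge
        bands.append((start, len(mask)))
    return bands


def extract_rois(mask: Mask) -> List[Box]:
    """Return one tight bounding box per horizontal band of ink."""
    rois = []
    for top, bottom in row_bands(mask):
        columns = [
            x
            for y in range(top, bottom)
            for x, value in enumerate(mask[y])
            if value
        ]
        rois.append((top, min(columns), bottom, max(columns) + 1))
    return rois


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
print(extract_rois(mask))
```

Projection profiles work for cleanly separated lines; pages where drawings and text interleave would need connected-component or learned segmentation instead.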

1 STePS, University of Twente (UT)

2 ALICE, University of Groningen (RUG)

3 LIACS, Leiden University (UL)

4 Brill publishers (Leiden, the Netherlands)

5 Naturalis Biodiversity Center (Naturalis)

6 LCDS, Leiden University (UL)

Publish extracted knowledge as Linked Data. Cross-match enriched results with Naturalis specimen collection databases as well as other cultural heritage resources.
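To make the outreach step concrete, here is a small sketch that serializes a few statements as N-Triples (a line-based Linked Data format) and cross-matches transcribed names against a specimen list by normalized name. The `http://example.org/makingsense/` namespace, the property names, the identifiers and the sample records are all placeholders invented for illustration; the poster does not specify a vocabulary or a matching strategy.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

EX = "http://example.org/makingsense/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslash, quote and newline."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize triples, one statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


def normalize_name(name: str) -> str:
    """Lowercase and collapse whitespace so near-identical names compare equal."""
    return " ".join(name.lower().split())


def cross_match(
    annotations: Dict[str, str], specimens: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Pair annotation IDs with specimen IDs that share a normalized name."""
    by_name: Dict[str, List[str]] = {}
    for specimen_id, name in specimens.items():
        by_name.setdefault(normalize_name(name), []).append(specimen_id)
    pairs = []
    for annotation_id, name in annotations.items():
        for specimen_id in by_name.get(normalize_name(name), []):
            pairs.append((annotation_id, specimen_id))
    return pairs


# Made-up sample data, for illustration only.
annotations = {"page-0001-roi-2": "Pteropus  sp."}
specimens = {"specimen-A": "pteropus sp.", "specimen-B": "Vespertilio sp."}

subject = iri(EX + "page-0001-roi-2")
triples = [
    (subject, iri(RDF_TYPE), iri(EX + "Annotation")),
    (subject, iri(EX + "transcribedName"), literal("Pteropus sp.")),
]
for annotation_id, specimen_id in cross_match(annotations, specimens):
    triples.append(
        (iri(EX + annotation_id), iri(EX + "matchesSpecimen"), iri(EX + specimen_id))
    )
print(to_ntriples(triples), end="")
```

Exact matching on normalized names is the simplest possible strategy; historical spelling variants and synonyms would call for fuzzy matching or a taxonomic name resolver.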

MAKING SENSE realizes a technologically advanced and user-friendly digital infrastructure to open up, enrich and connect illustrated handwritten archives. It combines both image and text recognition, and allows for an integrated study of underexplored digitized scientific collections. This approach is applicable across the cultural heritage domain and is demonstrated using a 17,000-page account of the exploration of the Indonesian Archipelago between 1820 and 1850 (“Natuurkundige Commissie voor Nederlands-Indië”). This poster provides a project overview, presents the infrastructure’s basic layout and sketches its realization in the period 2016-2020. Funding for this project is provided by the Netherlands Organization for Scientific Research (NWO) and BRILL publishers.

Preprocessing

Layout Analysis

Text and Picture Recognition

Integration

Outreach

Recognize page segments and form hypotheses about their content. The historical collection contains text, drawings of animals and plants, and tables with numerical data. The challenge is to extract as much information as possible from a scanned image. We will use layout analysis and segmentation to arrive at text and object classification using (deep) machine learning. Even the low-level problem of segmentation requires knowledge.

Identify and construct vocabularies and ontologies that can serve as background knowledge, and represent these resources formally.

Select background knowledge that can be used to improve the accuracy of the recognition process. Develop algorithms based on probabilistic logic programming to integrate background knowledge, candidate words and candidate images.
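The idea of letting background knowledge sharpen recognition can be sketched with a much simpler stand-in for probabilistic logic programming: a Bayes-style reranking in which each candidate word's recognizer score is multiplied by a lexicon prior and the products are renormalized. The function name, the floor value for out-of-lexicon words and the example scores are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def rerank(
    candidates: Dict[str, float],
    lexicon_prior: Dict[str, float],
    floor: float = 1e-6,
) -> List[Tuple[str, float]]:
    """Combine recognizer scores with lexicon priors and renormalize.

    candidates maps each candidate word to the recognizer's score for it;
    lexicon_prior maps known words to a prior probability. Words missing
    from the lexicon receive the small `floor` prior instead of zero, so a
    strong visual match is penalized but not ruled out.
    """
    scores = {
        word: score * lexicon_prior.get(word, floor)
        for word, score in candidates.items()
    }
    total = sum(scores.values())
    ranked = [(word, value / total) for word, value in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up scores: the recognizer slightly prefers a misreading,
# but only the correct genus name appears in the lexicon.
candidates = {"Pteropas": 0.6, "Pteropus": 0.4}
lexicon_prior = {"Pteropus": 0.01, "Vespertilio": 0.01}

ranking = rerank(candidates, lexicon_prior)
print(ranking[0][0])
```

A full probabilistic logic program could additionally express relations between facts (for example, that a species name must be consistent with the place and date on the same page), which a flat prior like this cannot capture.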


Dataset and challenges:

Size: 17,000 pages documenting the scientific exploration of the Indonesian Archipelago, 1820-1850.

Format: Many of the handwritten pages are enriched with drawings, tables and lists.

Languages: German, Latin, Greek, Dutch, French, Malay.

Authors: The field notes and drawings were composed by 18 different naturalists, so they contain a variety of drawing and writing styles and layout structures.

[Figure: handwriting samples labelled "Intra-word connections" and "Inter-word connections".]

* These authors have made equal contributions to this poster and the accompanying screencast.

[Diagram label: General public (end-user group)]

For more information, see also: www.brill.com/makingsense

[Figure: process diagram. Labels: Preprocessing; Layout Analysis; Tables; Drawings; Text; Segmentation; Candidate images; Recognizer; Hypothesis P(W | Context); Context hints, e.g. “hepa...”: something with liver; Current relevant lexicon; Ontologies.]

WHICH BAT SPECIES WERE COLLECTED AND DRAWN IN JAVA IN THE PERIOD 1820 - 1833?

KINGDOM Animalia

PHYLUM Chordata

CLASS Mammalia

ORDER Chiroptera (bats)

FAMILY Vespertilionidae (Vespertilio), Pteropodidae (Pteropus)

• Date
• Person
• Visual features
• Species names
• Place
• ....
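The example question above combines several facets (taxonomy, place, date, and whether a specimen was both collected and drawn). A minimal sketch of such a faceted query over already-extracted records might look as follows; the `Record` fields and the sample records are invented for illustration and do not come from the archive.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Record:
    """One extracted observation (hypothetical schema)."""
    species: str
    order: str
    place: str
    year: int
    collected: bool
    drawn: bool


def bats_collected_and_drawn(
    records: Iterable[Record], place: str, first_year: int, last_year: int
) -> List[str]:
    """Return sorted unique bat species collected and drawn at a place in a period."""
    hits = {
        r.species
        for r in records
        if r.order == "Chiroptera"
        and r.place == place
        and first_year <= r.year <= last_year
        and r.collected
        and r.drawn
    }
    return sorted(hits)


# Made-up records, for illustration only.
RECORDS = [
    Record("Pteropus sp.", "Chiroptera", "Java", 1825, True, True),
    Record("Vespertilio sp.", "Chiroptera", "Java", 1840, True, True),   # outside period
    Record("Vespertilio sp.", "Chiroptera", "Java", 1830, True, False),  # not drawn
    Record("Pteropus sp.", "Chiroptera", "Sumatra", 1826, True, True),   # other place
]

print(bats_collected_and_drawn(RECORDS, "Java", 1820, 1833))
```

In a Linked Data setting the same filter would typically be written as a SPARQL query over the published triples, with the taxonomy supplying the link from order to species.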
