University of Groningen The snowball principle for handwritten word-image retrieval van Oosten, Jean-Paul

Academic year: 2021

University of Groningen

The snowball principle for handwritten word-image retrieval

van Oosten, Jean-Paul

DOI: 10.33612/diss.160750597

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.

Document Version

Publisher's PDF, also known as Version of record

Publication date: 2021

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):

van Oosten, J-P. (2021). The snowball principle for handwritten word-image retrieval: The importance of labelled data and humans in the loop. University of Groningen. https://doi.org/10.33612/diss.160750597

Copyright

Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

Take-down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum.

The snowball principle for handwritten word-image retrieval

The importance of labelled data and humans in the loop

Cover illustration by Bente van der Graaf

Printed by Proefschriftmaken (www.proefschriftmaken.nl).

This research was made possible thanks to the SNN project Target.

This research was also in part made possible by the University of Groningen, Slimmer AI and the Human Interface Laboratory at Kyushu University.

Special thanks to the National Archive for the use of the collection Kabinet der Koningin.

PhD thesis

to obtain the degree of doctor at the University of Groningen

on the authority of the

Rector Magnificus Prof. Dr. C. Wijmenga and in accordance with the decision of the College voor Promoties.

The public defence will take place on Friday 26 March 2021 at 16:15

by

Jean-Paul van Oosten

born on 2 September 1984 in Westvoorne

Supervisor

Prof. Dr. L.R.B. Schomaker

Assessment committee

Prof. Dr. A.P.J. van den Bosch
Prof. Dr. M. Biehl

Contents

1 Introduction
    1 From monks to the Monk system
    2 Human involvement in the handwriting recognition pipeline and misleading assumptions on
        2.1 Machine learning algorithms
        2.2 Features
        2.3 The origin and availability of labels
    3 Research questions
    4 Outline of the thesis

2 Examining common assumptions about the convergence of the Baum-Welch training algorithm for hidden Markov models
    1 Introduction
    2 Method
    3 Relation between distance to the global optimum and performance in terms of Maximum Likelihood
    4 Relation between initial and trained distance
    5 Training with partially known models
    6 Implications
        6.1 How much do we need to ‘push’ a model in the right direction?
        6.2 Is meta-learning necessary?
    7 Discussion and conclusion

3 A re-evaluation and benchmark of hidden Markov models
    1 Introduction
    2 Benchmark
    3 Learning the topology of a transition matrix
    4 The importance of temporal modelling
    5 Discussion

4 Separability versus prototypicality in handwritten word-image retrieval
    1 Introduction
    2 Separability versus Prototypicality
    3 Methods
    4 Results
    5 Conclusions

5 General discussion
    1 Machine learning and representation
        1.1 Baum-Welch training of HMMs
        1.2 The relative importance of transition vs. observation probabilities
    2 Labels
    3 Loops and snowballs, not pipelines
    4 Deep learning
    5 Conclusion

Appendix

Summary

Samenvatting (summary in Dutch)

Publications by the author

Bibliography
