UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl)
Style characterization of machine printed texts
Bagdanov, A.D.
Publication date
2004
Link to publication
Citation for published version (APA):
Bagdanov, A. D. (2004). Style characterization of machine printed texts.
General rights
It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s)
and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open
content license (like Creative Commons).
Disclaimer/Complaints regulations
If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please
let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material
inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter
to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You
will be contacted as soon as possible.
Bibliography
[1] M. Aiello and A. Smeulders. Thick 2D relations for document understanding. Information Sciences, 2004. To appear.
[2] J. F. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11):832-843, November 1983.
[3] R. Alquézar, A. Sanfeliu, and F. Serratosa. Synthesis of function-described graphs. In Proc. Joint IAPR Int. Workshops SSPR'98 and SPR'98, number 1451 in LNCS, pages 112-121, 1998.
[4] O. Altamura, F. Esposito, and D. Malerba. Transforming paper documents into XML format with WISDOM++. International Journal of Document Analysis and Recognition, 3(2), 2000.
[5] A. Antonacopoulos. Page segmentation using the description of the background. Computer Vision and Image Understanding, 70(3):350-369, June 1998.
[6] A. Appiani, F. Cesarini, A. Colla, M. Diligenti, M. Gori, S. Marinai, and G. Soda. Automatic document classification and indexing in high-volume applications. International Journal on Document Analysis and Recognition, 4(2), 2002.
[7] A. D. Bagdanov and M. Worring. Fine-grained document genre classification using first order random graphs. In Proceedings of ICDAR2001, pages 79-83, September 2001.
[8] A. D. Bagdanov and M. Worring. First order Gaussian graphs for efficient structure classification. Pattern Recognition, 36(6):1311-1324, February 2003.
[9] A. D. Bagdanov and M. Worring. Multi-scale document description using rectangular granulometries. International Journal of Document Analysis and Recognition (IJDAR), 2003. To appear.
[10] D. Bagley. The great computer language shootout. http://www.bagley.org/doug/shootout/.
[11] H. S. Baird. The skew angle of printed documents. In Proceedings of the SPSE 40th Conference Symposium on Hybrid Imaging Systems, pages 21-24, 1986.
[12] H. S. Baird. Model-directed document image analysis. In Proceedings of the Symposium on Document Image Understanding Technology, 1999.
[13] H. S. Baird. State of the art of document image degradation modeling. In Proceedings of the 4th IAPR Workshop on Document Analysis Systems (DAS 2000), December 2000. Invited plenary talk.
[14] F. Bapst and R. Ingold. Using typography in document image analysis. In Proceedings of Electronic Publishing, Artistic Imaging, and Digital Typography, pages 240-251, 1997.
[15] S. Batman, E. R. Dougherty, and F. Sand. Heterogeneous morphological granulometries. Pattern Recognition, 33:1047-1057, 2000.
[16] T. Breuel. Layout analysis by exploring the space of segmentation parameters. In Proceedings of the Fourth International Workshop on Document Analysis Systems (DAS'2000), 2000.
[17] R. Brugger, A. Zramdini, and R. Ingold. Modeling documents for structure recognition using generalized n-grams. In Proceedings of the Fourth International Conference on Document Analysis and Recognition (ICDAR '97), August 1997.
[18] H. Bunke. Recent developments in graph matching. In ICPR00, volume II, pages 117-124, 2000.
[19] R. Cattoni, T. Coianiz, S. Messelodi, and C. M. Modena. Geometric layout analysis techniques for document image understanding: a review. Technical report, IRST, 1998.
[20] F. Cesarini, E. Francesconi, M. Gori, and G. Soda. A two level knowledge approach for understanding documents of a multi-class domain. In The Proceedings of the International Conference on Document Analysis and Recognition, pages 135-138, 1999.
[21] E. Chailloux, P. Manoury, and B. Pagano. Développement d'applications avec Objective Caml. O'Reilly, 2000. (An English translation is available at http://caml.inria.fr/oreilly-book/.)
[22] D. Chandler. Semiotics: The Basics. Routledge, 2001.
[23] G. Cousineau and M. Mauny. The Functional Approach to Programming. Cambridge University Press, 1998.
[24] N. Damera-Venkata, B. L. Evans, and V. Monga. Color error diffusion halftoning. IEEE Signal Processing Magazine, 20(4):51-58, 2003.
[25] M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf. Computational Geometry, Algorithms and Applications. Springer Verlag, 1997.
[26] A. Dengel, R. Bleisinger, F. Hein, R. Hoch, and F. Hones. OfficeMAID: a system for office mail analysis, interpretation and delivery. In Proceedings of First International Workshop on Document Analysis Systems, pages 253-275, 1994.
[27] A. Dengel and F. Dubiel. Clustering and classification of document structure: a machine learning approach. In Proceedings of the Third International Conference on Document Analysis and Recognition, August 1995.
[28] A. Dengel and F. Dubiel. Computer understanding of document structure. International Journal of Imaging Systems and Technology, 7, 1996.
[29] M. Diligenti, P. Frasconi, and M. Gori. Hidden tree Markov models for document image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(4):519-523, April 2003.
[30] D. Doermann, A. Rosenfeld, and E. Rivlin. The function of documents. In Proceedings of ICDAR97, 1997.
[31] D. S. Doermann. The indexing and retrieval of document images: A survey. Computer Vision and Image Understanding, 70(3), June 1998.
[32] E. R. Dougherty. Parallels between granulometric and Fourier transforms. In Proceedings of the 1997 IEEE Workshop on Nonlinear Signal and Image Processing, 1997.
[33] E. R. Dougherty, J. Pelz, F. Sand, and A. Lent. Morphological image segmentation by local granulometric size distributions. J. Electronic Imaging, 1, 1992.
[34] F. Esposito, D. Malerba, and F. Lisi. Machine learning for intelligent processing of printed documents. Journal of Intelligent Information Systems, 14(2/3):175-198, 2000.
[35] J. M. Geusebroek, R. van den Boomgaard, A. W. M. Smeulders, and H. Geerts. Color invariance. IEEE Trans. Pattern Anal. Machine Intell., 23(12):1338-1350, 2001.
[36] M. Gondran and M. Minoux. Graphs and Algorithms. Wiley & Sons, 1984.
[37] C. Gratin, J. Vitria, F. Moreso, and D. Serón. Texture classification using neural networks and local granulometries. In J. Serra and P. Soille, editors, Mathematical morphology and its applications to image processing, pages 309-316. Kluwer Academic Publishers, 1994.
[38] Project Gutenberg. http://www.promo.net/pg/.
[39] P. Haffner, Y. LeCun, L. Bottou, P. Howard, and P. Vincent. Color documents on the web with DjVu. In Proceedings of the International Conference on Image Processing, pages 239-243, 1999.
[40] Halftoning Toolbox for MATLAB. http://www.ece.utexas.edu/~bevans/projects/halftoning/toolbox/index.html.
[41] M. R. Hansen and H. Rischel. Introduction to Programming using SML. Addison-Wesley, 1999.
[42] R. M. Haralick, P. L. Katz, and E. R. Dougherty. Model-based morphology: the opening spectrum. Graphical Models and Image Processing, 57(1):1-12, 1995.
[43] H. Hase, T. Shinokawa, M. Yoneda, M. Sakai, and H. Maruyama. Character string extraction by multi-stage relaxation. In Proceedings of Fourth International Conference on Document Analysis and Recognition, August 1997.
[44] H. J. A. M. Heijmans. Morphological Image Operators. Academic Press, Boston, 1994.
[45] T. K. Ho. Fast identification of stop words for font learning and keyword spotting. In Proceedings of the 5th Int'l Conference on Document Analysis and Recognition, pages 333-336, Bangalore, India, September 1999.
[46] J. Hu, R. Kashi, and G. Wilfong. Comparison and classification of documents based on layout similarity. Information Retrieval, 2(2/3):227-243, 2000.
[47] J. J. Hull. Document image similarity and equivalence detection. International Journal on Document Analysis and Recognition, 1(1):37-42, 1998.
[48] Y. Ishitani. Document layout analysis based on emergent computation. In The Proceedings of the International Conference on Document Analysis and Recognition, 1997.
[49] The Image Understanding Environment (IUE). http://www.aai.com/AAI/IUE/IUE.html.
[50] S. P. Jones. Haskell 98 Language and Libraries. Cambridge University Press, 2003.
[51] J. Kanai and A. D. Bagdanov. Projection profile based skew estimation algorithm.
[52] T. Kanungo, R. M. Haralick, H. S. Baird, W. Stuezle, and D. Madigan. A statistical, nonparametric methodology for document degradation model validation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000.
[53] E. Kavallieratou, N. Fakotakis, and G. Kokkinakis. Skew angle estimation for printed and handwritten documents using the Wigner-Ville distribution. Image and Vision Computing, 20(11):813-824, 2002.
[54] H.-Y. Kim and J. H. Kim. Hierarchical random graph representation of handwritten characters and its application to Hangul recognition. Pattern Recognition, (34):187-201, February 2001.
[55] K. Kise, M. Iwata, and K. Matsumoto. On the application of Voronoi diagrams to page segmentation. In DLIA '99, 1999.
[56] K. Kise, W. Yin, and K. Matsumoto. Document image retrieval based on 2D density distributions of terms with pseudo relevance feedback. In Proc. of the Sixth International Conference on Document Analysis and Recognition (ICDAR 2001), pages 488-492, 2001.
[57] B. Klein, S. Agne, and A. D. Bagdanov. Understanding document analysis and understanding (through modeling). In Proceedings of ICDAR2003, pages 1218-1223, August 2003.
[58] D. Koelma. The Horus C++ reference. http://www.science.uva.nl/~horus/.
[59] D. Koelma. A Software Environment for Image Interpretation. PhD thesis, University of Amsterdam, 1996.
[60] J. J. Koenderink. The structure of images. Biological Cybernetics, (50):363-370, 1984.
[61] J. J. Koenderink and A. J. van Doorn. Representation of local geometry in the visual system. Biological Cybernetics, 30:383-396, 1987.
[62] K. Konstantinides and J. Rasure. The Khoros software development environment for image and signal processing. IEEE Transactions on Image Processing, 3(3):243-252, 1994.
[63] G. E. Kopec and P. A. Chou. Document image decoding using Markov source models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(6), June 1994.
[64] U. Köthe. Reusable software in computer vision. In Handbook of Computer Vision and Applications, volume 3, pages 103-132. Academic Press, 1999.
[65] S. Kullback. Information Theory and Statistics. John Wiley, 1959.
[66] V. Lavrenko, T. M. Rath, and R. Manmatha. Holistic word recognition for handwritten historical documents. In Proceedings of the First International Workshop on Document Image Analysis for Libraries (DIAL'04), pages 328-335, 2004.
[67] D. Lee and J. J. Hull. Detecting duplicates among symbolically compressed images in a large document database. Pattern Recognition Letters, 22(5):545-550, 2001.
[68] K.-H. Lee, Y.-C. Choy, and S.-B. Cho. Logical structure analysis and generation for structured documents: A syntactic approach. IEEE Transactions on Knowledge and
[69] J. Liang and D. S. Doermann. Logical labeling of document images using layout graph matching with adaptive learning. In Proceedings of the 5th IAPR Workshop on Document Analysis Systems (DAS 2002), pages 224-235, 2002.
[70] Y. Lu and C. L. Tan. Word spotting in Chinese document images without layout analysis. In Proceedings of the 16th International Conference on Pattern Recognition (ICPR 2002), volume 3, 2002.
[71] Lush: Lisp Universal Shell. http://lush.sourceforge.net/.
[72] P. Maragos. Pattern spectrum and multiscale shape representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11:701-716, 1989.
[73] G. Matheron. Random Sets and Integral Geometry. John Wiley & Sons, New York, 1975.
[74] L. Miclet. Structural Methods in Pattern Recognition. North Oxford, 1986.
[75] R. Milner, M. Tofte, R. Harper, and D. B. MacQueen. The Standard ML Programming Language (Revised). MIT Press, 1997.
[76] S. K. Murthy, S. Kasif, and S. Salzberg. A system for induction of oblique decision trees. Journal of Artificial Intelligence Research, 2, August 1994.
[77] G. Nagy. Twenty years of document image analysis in PAMI. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), January 2000.
[78] G. Nagy and S. Seth. Hierarchical representation of optically scanned documents. In Proceedings of the Seventh International Conference on Pattern Recognition, pages 347-349, 1983.
[79] G. Nagy and M. Viswanathan. Dual representation of segmented technical documents. In Proceedings of the First International Conference on Document Analysis and Recognition, pages 141-151, 1991.
[80] W. Niblack. An Introduction to Digital Image Processing. Prentice Hall, 1986.
[81] The OCaml Programming Language. http://www.ocaml.org/.
[82] L. O'Gorman. The document spectrum for page layout analysis. IEEE Trans. Pattern Analysis and Machine Intelligence, 15(11):1162-1173, 1993.
[83] N. Otsu. A threshold selection method from gray-level histograms. IEEE Trans. Syst., Man, Cybern., 9(1):62-66, 1979.
[84] P. Palumbo, S. Srihari, J. Soh, R. Sridhar, and V. Demjanenko. Postal address block location in real time. Computer, 25(7):34-42, July 1992.
[85] L. C. Paulson. ML for the Working Programmer (2nd edition). Cambridge University Press, 1996.
[86] T. Pavlidis. Structural Pattern Recognition. Springer-Verlag, 1977.
[87] H. Peng, F. Long, and Z. Chi. Document image recognition based on template matching of component block projections. IEEE Transactions on Pattern Analysis and Machine