
METHODOLOGY FOR RECONSTRUCTION OF 3D BUILDING MODELS USING 3D-CITYJSON AND FLOOR PLANS

R.G. Kippers (Richard)

MSc Computer Science,

Data Science & Technology specialization

SUPERVISORS:

dr. ir. M. Van Keulen (EEMCS)

dr. M.N. Koeva (ITC)


Preface

This thesis is the end product of my master's study in Computer Science, specialization Data Science and Technology, at the University of Twente. During this project, I designed and evaluated a methodology for the reconstruction of 3D buildings using CityJSON and floor plan images. When I started searching for a research subject, I knew I wanted a topic related to the built environment and 3D models/digital twins. Many of my friends work in the construction industry and had already excited me about the possibilities of 3D building models. During my internship I also learned the benefits that 3D models can have for the quality of the built environment. There is already a large amount of unstructured data (sometimes even publicly) available for buildings, and as a data science student, integrating this information seemed like an interesting research project to me.

I would like to thank Sander Oude Elberink, who gave me the opportunity to write a research proposal for this project. I also would like to thank the supervisors Mila Koeva and Maurice van Keulen for taking the time to meet regularly, providing me with valuable feedback, and opening up their network so I could put questions to experts. I am happy with the support of BIM4ALL B.V., which provided me with evaluation data and helped with technical 3D-related questions. Finally, I want to thank the municipality of Rijssen-Holten for providing data so I was able to create a real-life evaluation case.

I hope you will enjoy reading this thesis and are able to gain new insights. Please do not hesitate to contact me if you have further questions or feedback regarding this research.

Richard Kippers

Rijssen, July 14, 2021


Abstract

In the past decade, a lot of effort has been put into applying digital innovations to building life cycles (planning, construction, operation, renovation and demolition) (Ngwepe & Aigbavboa, 2015). 3D models have been proven to be efficient for decision making, scenario simulation and 3D data analysis during this life cycle (Rajat Agarwal & Sridhar, 2016).

Creating such a digital representation of a building can be a labour-intensive task, depending on the desired scale and level of detail (LOD). This research aims at creating a new automatic, deep-learning-based method for building model reconstruction. It combines exterior and interior data sources: 1) 3D BAG¹, the first fully automatically generated 3D building data set with level of detail 2.2, and 2) archived floor plan images (e.g. scanned or exported from CAD software). To reconstruct 3D building models from the two data sources, an innovative combination of methods is proposed.

In order to obtain the information needed from the floor plan images (walls, openings and labels), deep learning techniques have been used. In addition, post-processing techniques are introduced to transform the data into the required format. In order to fuse the extracted 2D data and the 3D exterior, a data fusion process is introduced. This process finds the optimal rigid transformation of the floor plan outline onto the 3D CityJSON exterior and transforms the floor plan objects according to a reference table. The final building object, consisting of floor planes, interior and exterior walls including openings, and roof, is stored as a CityJSON Building object.

The method's output data has been evaluated in two ways. First, the output is compared with a corresponding BIM model, which functions as ground truth. Secondly, the output data is evaluated for a real-life use case, namely calculating living space areas.

It was found that the method proposed in this thesis works well for simpler, smaller buildings. The floor plan data extraction method does not work well on larger and more complex buildings, mostly due to the lack of representative training data. For simpler buildings (terraced or smaller detached houses), the method works well and produces promising output data. The literature review found no prior research on the automatic integration of CityGML/CityJSON and floor plan images; this method is therefore a first approach to this data integration. Having precise, linked 3D data provides opportunities for detailed analyses, such as improved area calculation, facility management, urban planning and energy simulation.

¹ 3D BAG, 3D BK TU Delft - https://3dbag.nl/


Contents

1 Introduction
  1.1 Research Motivation
  1.2 Problem Context
    1.2.1 BIM and GIS
    1.2.2 3D City Models and building analysis
    1.2.3 Scientific Gap
    1.2.4 Case: Calculating living space area for the municipality
  1.3 Problem Statement
  1.4 Research goals
    1.4.1 Sub-objectives
  1.5 Research Questions
  1.6 Conceptual Framework
2 Background
  2.1 CityJSON
  2.2 Input floor plans
  2.3 Floor plan image processing
  2.4 Maximum polygon overlap
  2.5 Predicting floor level height and other structural properties
  2.6 Data quality and standards
3 Research Methodology
  3.1 Methodology
  3.2 Global Design
4 RQ 1A: Floor plan parsing
  4.1 Data sources
  4.2 Walls
    4.2.1 Deep Learning Architectures
    4.2.2 Training
    4.2.3 Evaluation
    4.2.4 Post-processing
  4.3 Openings
    4.3.1 Deep Learning Architectures
    4.3.2 Training
    4.3.3 Evaluation
  4.4 Room Type
  4.5 Merging the extracted data
  4.6 Floor plan parsing results
5 RQ 1B: Consistent merging method
  5.1 Data sources
  5.2 Obtain 3D BAG exterior and NWB Roads
  5.3 Polygon transformation
  5.4 Solving area inconsistencies
  5.5 Resize objects to real-life scale
  5.6 3D CityJSON model generation
6 RQ 2: Method evaluation
  6.1 Data sources
  6.2 RQ 2A: Comparison with ground truth
    6.2.1 BIM Building Comparison 1
    6.2.2 BIM Building Comparison 2
  6.3 RQ 2B: Comparison with the method used in the municipality for living space area calculation
7 Discussion
  7.1 Semantic segmentation for detecting walls on floor plan images validation
  7.2 Object detection for finding openings on floor plan images validation
  7.3 Floor plan parsing
  7.4 Method output and BIM models
  7.5 Method output and municipality
  7.6 Scalability and costs
8 Conclusion
  8.1 Limitations
  8.2 Future work
References


1 Introduction

1.1 Research Motivation

Buildings play a significant role in our daily lives. On average, Europeans spend about 90% of their time indoors (European Union, 2003), and 38% of all CO2 pollution is related to buildings or their construction (United Nations, 2020). Therefore, a lot of effort has been put into improving the building life cycle, which consists of the following stages: planning, construction, operations, renovation and demolition (Ngwepe & Aigbavboa, 2015). 3D building models have been proven to be efficient for decision making, scenario simulation and 3D data analysis during this life cycle (Rajat Agarwal & Sridhar, 2016). Two domains that use 3D building models are 3D city modelling (as part of geographic information systems, GIS) and Building Information Modelling (BIM).

BIM is mainly used during the first two phases of a building life cycle, specifically planning and construction (Volk, Stengel, & Schultmann, 2014). It is able to improve buildings through better collaboration, design and documentation. 3D city models or so-called "digital twins", which are representations of reality, are often used once a building is constructed. They can be of great value for a variety of applications. Research by Biljecki et al. (Biljecki F & A, 2015) documented the use cases of 3D city models across multiple domains. Among the numerous examples of applications of 3D city models are urban studies, energy demand estimation and simulation, risk analysis, urban inventory, facility management, and urban planning.

Figure 1: Levels of Detail in CityGML (J. S. Biljecki H. Ledoux, 2016)

Creating digital representations of a building can be a labour-intensive task, depending on the desired scale and level of detail (LOD). Biljecki (J. S. Biljecki H. Ledoux, 2016) defined five generic LODs for 3D city modelling; example illustrations of these models are visible in Figure 1. The lowest level, LOD 0, is a 2D shape of the building outline, and the highest level, LOD 4, is a detailed indoor and outdoor 3D object with corresponding semantics. The different CityGML LOD levels are divided into sub-levels (LOD x.0 - LOD x.3) (J. S. Biljecki H. Ledoux, 2016), of which the variations for LOD 2 are shown in Figure 2. City models with a higher level of detail can be used for a wider variety of applications, but are more time-consuming to create. Therefore, automatic reconstruction is an important part of the 3D city modelling domain. Researchers have performed numerous experiments, and recent results indicate that automatic LOD 2 reconstruction is feasible. A variety of methods is available for automatic reconstruction, most of them using (publicly available) aerial point cloud data and cadastre data (Amiranti, 2020; Balázs Dukai, 2020; Sander Oude Elberink & Commandeur, 2013). However, LOD 3 and LOD 4 are difficult to obtain automatically and are often semi-automatically modelled or parsed from existing BIM models (Blaauboer et al., 2012). These BIM models are usually created manually by highly qualified BIM engineers during the construction phase, or afterwards in the implementation phase. Very often they use reference data such as point clouds (Tzedaki & Kamara, 2013). Therefore, in practice, BIM models are only created on smaller scales, such as individual buildings, housing blocks, or campuses. This leaves a gap between the two most common applications: large-scale, low-detail GIS data and small-scale, high-detail BIM models. Within this research we aim at filling this gap.

Figure 2: More specific LOD 2 levels (J. S. Biljecki H. Ledoux, 2016)

This research is focused on the Netherlands, where it is mandatory to provide a building specification to the local government before starting the construction or a large renovation of a building. Part of this specification is a 2D floor plan. Therefore, many organisations, such as housing associations, local governments and private real estate owners, already have archived 2D floor plans, often in PDF format. In spring 2021, a countrywide LOD 2.2 3D building registration, called 3D BAG², was published for the whole of the Netherlands. This is the first fully automatically generated 3D building data set with level of detail 2.2. The existence of these two data sources gives the opportunity to extend the research by Boeters (Boeters, 2015) and enrich LOD 2.2 data sets with indoor semantics and geometry. This could be beneficial for several use cases, of which a few are mentioned in this research.

1.2 Problem Context

This section will elaborate on the context in which this research takes place. In the first two subsections, relevant scientific developments in the domain context (BIM, GIS, 3D city models and building analysis) are described. The next subsection highlights the gap in the current scientific work, and the last subsection explains a possible use case for the newly developed and proposed methodology.

² 3D BAG - https://3dbag.nl/

1.2.1 BIM and GIS

3D representations of buildings are receiving more attention (Google, n.d.) and are used for a wide range of applications (Biljecki F & A, 2015). 3D building objects exist in both the BIM and the GIS domain, but are used from different perspectives (Herle, 2020).

From the BIM perspective, the most used file exchange format is Industry Foundation Classes (IFC) (Laat, 2010). This format is an ISO open standard and is designed as an interoperability communication standard between different BIM software applications such as Autodesk Revit, Graphisoft ArchiCAD, and Solibri Model Checker. Buildings are modelled in standard hierarchies, and elements are classified using standard methods such as SfB (SfB Basic Classification Tables, n.d.). BIM models are by default highly detailed and often represent a physical building with a small error margin. This margin depends on the project agreement and can be as small as 1 mm. As in city modelling, BIM models have different levels. However, these are called Levels of Development, and share the abbreviation LOD with Level of Detail. In the case of Level of Development, the number ranges from 100 (conceptual) to 500 (as-built, verified) and relates to the development phase of the project (Trimble, 2018). The IFC format supports positioning objects on a coordinate reference system, but this is not common practice in reality (Gilbert, 2020). Hence, when only looking at an IFC file, its position in the world cannot be determined. IFC data can have relations with other data sources using semantic web technology (BIM linked data). In addition, it is possible to store BIM model data in databases such as BIMserver, although this is not standard practice either (Beetz, 2010). From a GIS perspective, most data is not stored in files but in spatial databases (Laat, 2010), such as the open-source PostGIS database or alternatives. GIS is often used on a larger scale, with a lower granularity, and provides a variety of opportunities for geolocation, analysis, simulations and documentation (Laat, 2010). A standard format for representing urban elements such as buildings is CityGML, an open standard by the Open Geospatial Consortium (Gröger, Kolbe, Nagel, & Häfele, 2012). CityGML enables modelling of semantic and geometric data in a hierarchical way. Buildings can be defined at different levels of detail, of which an example can be seen in Figure 1. Even at LOD 4, a building is not as semantically rich as in BIM models (Gilbert, 2020); this data can be added using the CityGML GeoBIM extension (Laat, 2010). Scientific work on integrating BIM and GIS data is mostly focused on data/file conversion, semantic mapping, or both (et al, 2017). A recent development is CityJSON, a new encoding for the CityGML data model, which is around six times more compact than CityGML (Ledoux et al., 2019).

1.2.2 3D City Models and building analysis

A 3D city model is a virtual spatial and semantic representation of objects in a certain area (Kolbe, 2009). A city model can include buildings, infrastructure, and natural and man-made objects. Objects are often sourced by type and can sometimes be obtained automatically, such as building footprints and boundary properties from data sources like cadastre databases or aerial images (Amiranti, 2020; Balázs Dukai, 2020; Sander Oude Elberink & Commandeur, 2013).

The open database technology for 3D city models used in most scientific literature is 3DCityDB (Yao et al., 2018). This is an extension to the open-source PostGIS database and can be extended with the 3DCityDB-Web-Map client to visualize city objects in a web browser. A few open demo datasets for 3DCityDB are available³. A proprietary alternative is the application CityEngine, developed by the software vendor ESRI. Examples of successful 3D city model applications can be found in the fields of solar irradiation, energy demand estimation, urban planning, facility management, emergency response, climate change simulation, determination of floor space, etc. (Biljecki F & A, 2015). The application of 3D city models involves many stakeholders, such as (local) governments, emergency responders, city planners, and private companies. These applications mostly utilize LOD < 3 building objects.

For BIM models, apart from design and construction, there are also applications for the re-use of information, for example for facility management purposes. Current research (Moretti, Xie, Merino, Brazauskas, & Parlikad, 2020) describes how BIM data, together with IoT integration, can be used to create digital twins. These digital twins can be used for different analyses, such as anomaly detection and other pro-active monitoring applications. BIM data can also be used during the last phase of a building life cycle, demolition: not only for planning the demolition, but also for re-allocating building materials in a circular economy (Druijff, 2019). Several techniques for BIM data analysis exist, e.g. the BIM Big Data System Architecture for Asset Management (Karim, 2017), BIMserver (Beetz, 2010), and IfcOpenShell (IfcOpenShell, 2014). A framework proposed by Hijazi (Hijazi, 2020) combines BIMserver and 3DCityDB into 3DCityDB4BIM, a platform for analysing BIM models in a 3D city model context.

1.2.3 Scientific Gap

Research by Boeters (Boeters, 2015) introduced LOD2+, an extension to CityGML 2 that supports floor levels. In that paper, a method is introduced to estimate heights and add floor surfaces to existing CityGML LOD 2 objects. Other relevant research is the ongoing project iNous⁴ (indoor/outdoor spatial data services), funded by the Korean government, which focuses on areas that overlap with this research. First, they try to reconstruct IndoorGML from building blueprints and LIDAR data; after that, they will combine this data with existing IFC (BIM) files. However, the outcome of this project has not been published yet. No other relevant scientific literature has been published regarding the combination of 3D exterior and 2D floor plan data.

³ Chair of Geoinformatics, TU München - https://www.3Dcitydb.org/3Dcitydb/demos/
⁴ iNous, Pusan National University - http://www.inous.net/sub/overview.html

1.2.4 Case: Calculating living space area for the municipality

In the Netherlands, municipalities determine the value of properties (WOZ waarde) each year. This value influences several taxes, including the property tax and income tax. Part of the WOZ value depends on the size of a building. Historically, this was based on the building volume in m³. However, from 2022 it is mandatory to use the usage area in m² (Kadaster, 2019). Since most municipalities do not have this data, the Waarderingskamer (Valuation Chamber, part of the government) created a conversion table and other guidelines to convert volume data to areas. The Waarderingskamer expects an accuracy of at least 5% (Waarderingskamer, 2019a).

1.3 Problem Statement

As mentioned in the previous section, to the best of our knowledge no method has yet been developed for combining 3D city model data and 2D floor plans. Therefore, this research focuses on the combination of these two sources. This can be beneficial due to the higher level of detail, as mentioned in the previous sections. Due to temporal differences and inaccuracies (originating e.g. from measurement quality or generalizing algorithms) in the data sources, combining the data induces many inconsistencies. Merging this data is a multi-step process for solving this challenge; therefore, this process will be called "data fusion" throughout this research. An abstract process flow of the research problem is shown in Figure 3.

Figure 3: Abstract Process Model

To fuse the two data sources, both need to be available in a compatible format. Therefore, the floor plan images need to be converted into a vector format. After that, the inconsistencies mentioned above need to be eliminated, and a final 3D object needs to be made.


1.4 Research goals

The goal of this study is to create a method that is able to fuse 3D CityJSON exterior models with 2D floor plans. Moreover, the outcome of this method may be used in large-scale 3D city models. Therefore, the method should be computationally efficient enough to convert a large number of buildings. For the real-life use case, this method is optimized for the Netherlands. The research objectives are:

1. To develop a method that is able to create a new CityJSON dataset by fusing a LOD 2 CityJSON object and floor plans

2. To analyse the output of the method for a real-life application

1.4.1 Sub-objectives

1. (a) To determine a vector representation format for 2D floor plans and modify an existing deep learning technique to fit this representation.

(b) To design an algorithm for the fusion of 2D vector data and 3D CityJSON.

2. (a) To compare method output data with the ground truth.

(b) To use the output data for a real-life use case.

1.5 Research Questions

To reach the determined research objectives, the following research questions have been defined:

1. How can existing CityJSON LOD 2 datasets be fused with floor plan images?

(a) How can existing floor plan images be parsed using deep learning techniques?

(b) What is a consistent method for merging 3D city model and 2D floor plan vector data?

2. How does the method perform?

(a) What is the quality of the output of this method in comparison with the ground truth?

(b) How can data from this method be used to calculate living space areas useful for improving the accuracy of the taxable areas?


1.6 Conceptual Framework

Figure 4: Conceptual Framework

The conceptual framework is shown in Figure 4 and gives a broad overview of this research. On the left side, the two main data sources can be found. The first data source, the 2D floor plan, is not publicly available, but the user of the method can supply these files. The CityJSON LOD 2.2 dataset (3D BAG) consists of tiles, each containing multiple buildings. As will be explained in the next section, a road database will be used to find the right orientation for a floor plan. The road database used is the NWB (Nationaal Wegenbestand, the national road database). The parsed floor plan will be fused with the CityJSON LOD 2.2 exterior, resulting in a CityJSON file with indoor information. A sample of each input data source can be found in Figure 5.

Figure 5: Data samples: 2D floor plan, CityGML LOD 2.2, roads (vector)


2 Background

2.1 CityJSON

CityJSON (Ledoux et al., 2019) is a JSON encoding for 3D city models, sometimes also called digital twins. JSON stands for JavaScript Object Notation and is a format for storing and transporting data, derived from JavaScript object notation syntax. The introductory paper claims that, in comparison with CityGML (the current Open Geospatial Consortium standard), it is easier to use and more compact, with a compression factor of around six on real-world data (Ledoux et al., 2019). A CityJSON file consists of metadata and city objects. City objects can represent physical objects and are either 1st-level city objects (objects that can exist on their own, e.g. Building, Bridge, Road) or 2nd-level city objects (objects that need a parent to exist, e.g. BuildingPart, Window or BuildingInstallation). Each city object should have a geometry property, which contains the 3D geometric primitives of the object. Additional information about the object can be stored in the attributes property. A geometry object can have multiple LODs, so the same CityJSON file can be used for 2D GIS and 3D GIS use cases. Geometry objects can contain semantics, which allows for visual properties, templates and surface types. The most recent CityJSON specification can be found at https://www.cityjson.org/specs/.

CityJSON is supported by a few software applications, such as Blender and QGIS. A software compatibility list can be found at https://www.cityjson.org/software/.
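To make this structure concrete, below is a minimal, hand-written sketch of a CityJSON file with a single Building object; the identifier, coordinates and attribute values are illustrative only and are not taken from the thesis data (see the specification above for the authoritative schema).

```python
import json

# Minimal, illustrative CityJSON model with one 1st-level Building object.
# A Solid's "boundaries" nest as: shells -> surfaces -> rings -> vertex indices.
city_model = {
    "type": "CityJSON",
    "version": "1.0",
    "CityObjects": {
        "building-1": {
            "type": "Building",
            "attributes": {"constructionYear": 1995},   # optional extra info
            "geometry": [{
                "type": "Solid",
                "lod": 2.2,
                "boundaries": [[[[0, 1, 2, 3]]]],        # indices into "vertices"
                "semantics": {
                    "surfaces": [{"type": "WallSurface"}],
                    "values": [[0]],                      # surface 0 is a wall
                },
            }],
        }
    },
    "vertices": [[0, 0, 0], [5000, 0, 0], [5000, 5000, 0], [0, 5000, 0]],
}

print(json.dumps(city_model, indent=2))
```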

2.2 Input floor plans

Figure 6 shows a Dutch building specification. A few observations can be made about this document. It contains more information than only the floor plan: it also includes front, side and back views, intersections, and a global overview, as well as specifications for the garage. The document was scanned manually; the original was not placed exactly horizontally on the scanner bed, so the image is rotated by about 5 degrees. The floor plan of the attic is missing, and the floor plan of the garage's attic is shown in the place where the house attic floor plan is expected to be. Typical Dutch building specifications do not share the same layout.

2.3 Floor plan image processing

Previous work by Or et al. (hang Or, 2005) in 2005 introduced deterministic image processing and symbol recognition techniques to interpret floor plan scans. However, as with many computer vision problems, the focus has shifted from feature engineering and deterministic tasks to methods that learn from training data (Ahti Kalervo et al., 2019).


Figure 6: Scan of building specification. The floor plan is highlighted in the red rectangle.

In order to obtain the multiple segmentation maps and labels, e.g. room types and points of interest (walls, icons, openings, etc.), multi-task methods or networks should be used. The performance of multi-task networks depends highly on the relative weighting between each task's loss (Kendall, Gal, & Cipolla, 2018). The deep learning breakthrough for floor plan parsing was presented in research by Chen Liu (C. Liu, 2017), which used deep learning to vectorize rasterized images. It does so by using a discriminative network to obtain junctions, integer programming to obtain primitives, and finally post-processing to obtain a vector format. The method only supports openings and walls along the x- and y-axes; diagonal walls are not supported. The network is trained on floor plans originating from a Japanese real estate website. Yang (Yang, 2018) uses semantic segmentation to convert floor plan images into vectors. Kalervo (Ahti Kalervo et al., 2019) introduces a new dataset, CubiCasa, on which deep learning methods can be trained, along with a method with promising results; the PyTorch implementation is open source. Another dataset is CVC-FP (de las Heras, Terrades, Robles, & Sanchez, 2015), which consists of 122 scanned floor plan images. Recent work (by (H, 2020) and (Wu, 2020)) introduces methods based on deep learning techniques to generate CityGML or IFC files from floor plan images. A few implementations, such as Kalervo's (Ahti Kalervo et al., 2019), start with a partially pre-trained model (e.g. on ImageNet) to have a decently initialized model. A remark on the usage of deep learning is that it currently works best when there are thousands, millions or more training examples (Marcus, 2018). Table 1 shows the different deep learning architectures, targets, loss functions and types used in the recent literature.


Paper | Targets | Method | Architecture | Loss
(Surikov, 2020) | Walls | Semantic segmentation | U-Net | IoU
(Surikov, 2020) | Window/Door/Icons | Object detection | Faster-RCNN | Mean average precision (mAP)
(C. Liu, 2017) | Wall or room type | Semantic segmentation | Modified ResNet-152 | Pixel-wise soft-max cross-entropy loss
(C. Liu, 2017) | Icons | Semantic segmentation | Modified ResNet-152 | Pixel-wise soft-max cross-entropy loss
(Yang, 2018) | Doors | Semantic segmentation | U-Net+DCL | mIoU, mean accuracy
(Yang, 2018) | Walls | Semantic segmentation | U-Net+DCL | mIoU, mean accuracy
(Ahti Kalervo et al., 2019) | Wall or room type | Semantic segmentation | Modified ResNet-152 | Cross-entropy loss
(Ahti Kalervo et al., 2019) | Icons | Semantic segmentation | Modified ResNet-152 | Cross-entropy loss
(Zeng, 2019) | Room boundary (wall, door, window) | Classification and detection | VGG16 | Cross-and-within-task weighted loss
(Zeng, 2019) | Room type | Classification and detection | VGG16 | Cross-and-within-task weighted loss
(H, 2020) | Wall or room type | Object detection | Multiple | Dice loss, WCE loss
(H, 2020) | Doors | Object detection | Multiple | Dice loss, WCE loss
(Wu, 2020) | Walls, doors, icons | Instance segmentation | Mask R-CNN | Multi-task loss function of Mask R-CNN

Table 1: Deep learning methods comparison

2.4 Maximum polygon overlap

To find the appropriate scale, rotation and orientation to project the exterior and floor plan outline, the maximum overlap of the two outlines needs to be found. De Berg (Berg, 2005) defined a method for finding the optimal polygon overlap under translations. However, this method only works for convex polygons; building outlines are not necessarily convex, so it is not applicable here. Milenkovic (Milenkovic, 1998) introduces a method for optimal overlap using rotations and translations that also works for non-convex polygons, but it does not support other operations such as scaling. Har-Peled (Har-Peled, 2016) introduces a method that approximates the maximum overlap of a polygon under translations; for polygons close to convex, this problem can be solved in nearly linear time. Research by Ahn et al. (Ahn, Cheong, Park, Shin, & Vigneron, 2007) calculates the maximum overlap of two polygons under rigid motions (translation, rotation, scale, reflection and glide reflection). This method gives a rigid motion $\varphi_{app}$ whose overlap is at least $1 - \alpha$ times the maximum over all rigid motions. De Berg et al. (De Berg, Cheong, Devillers, Van Kreveld, & Teillaud, 1998) introduce a method for the maximum overlap of two convex polygons under translations in $O((n + m)\log(n + m))$ time, where $n$ and $m$ are the numbers of vertices of the respective polygons. The method performs binary searches for new point locations based on the average centroid (geometric centre) of both polygons.

To calculate the similarity between two polygons, a variety of metrics is available. First, the Hausdorff distance measures the maximum distance from a set to the nearest point in the other set (Equation 1). The Fréchet distance also takes into account the location and ordering of the vertices along the edges; this algorithm traverses all points in one direction and finds the maximum distance over each combination (Equation 2). Lastly, the Jaccard similarity coefficient describes the similarity of two polygons by their intersection and union (simplified in Equation 3). This metric does not consider the vertices or the direction of the edges.


$$H(A, B) = \max_{a \in A} \left\{ \min_{b \in B} d(a, b) \right\} \qquad (1)$$

$$F(A, B) = \inf_{\alpha, \beta} \; \max_{t \in [0, 1]} \; d\big(A(\alpha(t)), B(\beta(t))\big) \qquad (2)$$

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \quad 0 \leq J(A, B) \leq 1 \qquad (3)$$

where $A, B$ are the input polygons and $d(a, b)$ is the distance between points $a$ and $b$.
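As an illustration of Equations 1 and 3, the sketch below computes the Hausdorff distance and the Jaccard coefficient for two toy polygons using the shapely library; the library choice and the coordinates are assumptions made for this example, not necessarily what was used in this research.

```python
from shapely.geometry import Polygon

# Two example rectangles with partial overlap (coordinates are illustrative).
a = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])
b = Polygon([(1, 0), (5, 0), (5, 3), (1, 3)])

# Hausdorff distance (Equation 1): shapely provides it directly.
h = a.hausdorff_distance(b)

# Jaccard similarity coefficient (Equation 3): intersection area over union area.
j = a.intersection(b).area / a.union(b).area

print(f"Hausdorff: {h:.2f}, Jaccard: {j:.2f}")
```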

2.5 Predicting floor level height and other structural properties

The ceiling heights, floor heights, and exterior and interior wall thicknesses depend on the building's year of construction, mostly due to regulations for increased living standards and insulation requirements. Boeters (Boeters, 2015) made a table for the city of Rotterdam to estimate exterior and interior wall thickness based on the building type (stacked, non-stacked or others), construction year, and number of floors. This was done by studying building blueprints. However, with this method it was not possible to determine the roof thickness and the ceiling/floor thickness.

There exist building regulations in the Netherlands that enforce minimum heights. Since 1992, these regulations have been defined in the Bouwbesluit⁵.

⁵ https://rijksoverheid.bouwbesluit.com

2.6 Data quality and standards

A model or a map is an abstraction of reality; it can safely be said that no map stored in a GIS is truly error-free (Heuvelink, 2005). This means that it always deviates from the real world. Constructions often deviate from the original plans, and after a renovation the original plans are no longer accurate. Calculations for areas such as the living space area can also be ambiguous. Therefore, there are several standards for measurements, deviations and definitions.

The NEN 2580 (NEN, 2007) is a Dutch standard that defines terms, definitions and methods for determining and measuring the areas/contents of buildings and terrains with a building destination. For example, it determines which room types may be included when calculating the total living space area of a building. Within the Dutch municipalities and the Dutch real estate and brokerage industry, the Meetinstructie Gebruiksoppervlakte woningen (measurement instruction for living area) (Waarderingskamer, 2019b) is used. This differs from the NEN 2580 in two respects: it separates areas into living space and other indoor areas, and it considers non-external load-bearing walls as living space area. The ISO 286-1:2010 (ISO, 2010) is an international standard for tolerances on measurement errors. Research by Łuczyński (Łuczyński R, 2017) compares a few measuring and calculation standards for buildings.

It can safely be said that no GIS map is error-free (Heuvelink, 2005), and error propagation in GIS has received a lot of attention (F. Biljecki, 2015). In 3D GIS, deviations on the x/y-axes can have a different effect than deviations on the z-axis (F. Biljecki, 2015). Previous research applying simulations to CityGML models found that the positional error has a much higher impact than the LOD (F. Biljecki, 2017); the researchers therefore suggest that it is pointless to acquire geoinformation at a fine LOD if the acquisition method is not accurate. They distinguish two types of errors: acquisition-induced errors and analysis-induced errors. Literature by Olde Scholtenhuis (Scholtenhuis, 2018) focuses on uncertainties in 3D (underground) utility data, where the z-axis is often not stored for liability reasons. Olde Scholtenhuis provides a table comparing uncertainty-capturing methods, ranging from textual attributes and blurred lines/colour representations to probabilistic models. Research by Krämer et al. (Krämer, Haist, & Reitz, 2007) describes a data quality model for CityGML based on the six elements of spatial data quality: positional accuracy, completeness, semantic accuracy, attribute correctness, temporal conformance and logical consistency. Krämer et al. based these elements on multiple spatial ISO data quality standards. Wagner (Wagner, Alam, & Coors, 2013) describes three kinds of accuracy for 3D CityGML models: positional accuracy, thematic accuracy, and temporal accuracy. Thematic and semantic accuracy are synonyms (Hangouët, 2015). Logical consistency can also be found under geometric validity, which is often a requirement for using CityGML buildings in decision-making software. Standards for geometric validity (ISO 19107) ensure that 3D objects are geometrically logical. Automatic validation is possible, for example using the tool val3dity by Ledoux (Ledoux, 2018).


3 Research Methodology

This section first explains design science and the methodology used to achieve the research goals. After that, the global solution design (called a treatment by Wieringa (Wieringa, 2014)) is explained.

3.1 Methodology

Figure 7: The engineering cycle. The question marks indicate knowledge questions, and the exclamation marks indicate design problems (Wieringa, 2014)

The book by Wieringa (Wieringa, 2014) defines design science as the design and investigation of artifacts in context. These artifacts can, for example, be information systems or other engineering results. Wieringa defines three types of questions: design research problems (a.k.a. technical research questions), empirical knowledge questions (questions about the real world), and analytical knowledge questions (questions about logical consequences of definitions). The engineering cycle (Figure 7) is a process that iterates over design science activities; a research project can concentrate on one or more of the four steps. A few steps of the engineering cycle will be followed to accomplish the research goals. Since this project is solution-oriented research (as opposed to problem-oriented research), the treatment design and treatment validation steps are applied. Wieringa does not use the term solution, since the artifact of a design science project might not solve the problem, or only partially; therefore, Wieringa calls it a treatment.

Treatment design: The first step of this research is the treatment design step. This step consists merely of design problems and ties to the first research question. Since this project is solution-oriented, there are no stakeholders from whom to obtain requirements. The data requirements to reach the research objective are shown in Figure 8. As mentioned in the previous section, there are no complete solutions to this problem in the existing scientific literature; therefore, a new method will be designed.


Treatment validation: The second research question relates to the succeeding step in the engineering cycle. This puts the artifact (the dataset) in context and researches its capabilities.

3.2 Global Design

Figure 8 shows the proposed method to merge the two data sources (CityJSON and 2D floor plans). In order to obtain the information needed from the floor plan images, they are parsed into vectors with attached semantic attributes, using deep learning methods. The data extracted from the floor plans comprises walls, openings and room types. This part is elaborated upon in section 4. Obtaining the 3D exterior and NWB roads is explained in section 5.2. Since the floor plans do not indicate an orientation (e.g. to the north), adjacent roads are used to rotate the CityJSON building with respect to the floor plan. For each facade with an adjacent road, the optimal transformation (θ) for an overlap is found. Estimating the floor level heights and solving inconsistencies between the 2D and 3D data is based on predefined rules. The final building, consisting of floor planes, interior and exterior walls, doors, windows, and roof, is stored in a CityJSON Building object. This part is described in section 5.

Figure 8: Proposed Solution Methodology


4 RQ 1A: Floor plan parsing

This section aims at providing a method to answer the first research question: "How can existing floor plan images be parsed using deep learning techniques?". The extracted data will be used in the next research question, and the required data format is therefore shown in Figure 8. In general, three targets need to be obtained from the image: the first is the walls, the second is the openings (doors and windows), and the third is the room types. Since images can be noisy, each target has specific challenges. A few examples: for target 1, distinguishing between reference lines, furniture and walls; for target 2, dealing with multiple variations of icons; and for target 3, characters and annotations that do not represent the room type (e.g. titles, annotations and sizes). Therefore, a different approach has been chosen for each task. As identified in Table 1, a variety of methodologies and techniques has been used by researchers to parse floor plan images.

Sections 4.2 - 4.5 elaborate further on the methods used. These methods are needed for the data extraction (walls, openings and room types) and integration. The last section, 4.6, describes the result analysis method and presents the results.

4.1 Data sources

Table 2 shows all data used for this research question.

Name | Description | Format | Source
CubiCasa5K | 5000 floor plan images and annotations | Image (png) and vector (svg) | (Ahti Kalervo et al., 2019)
Dutch images 1 | 5 floor plan images from houses in the Netherlands | Image (png) | Municipality of Rijssen-Holten
Dutch images 2 | 1 floor plan image from an office building in the Netherlands | Image (png) | BIM4ALL B.V.

Table 2: Data used for RQ 1A: Floor plan parsing

4.2 Walls

4.2.1 Deep Learning Architectures

Semantic segmentation is the most commonly used method in the selected literature. It is a process in which a prediction is made for each pixel; based on these predictions, further processing is possible. There is a variety of semantic segmentation architectures described in recent literature, of which two are compared in this research: U-Net (Ronneberger, Fischer, & Brox, 2015) and Fast-SCNN (Poudel, Liwicki, & Cipolla, 2019). U-Net is chosen since it is the most common architecture in the selected literature. Fast-SCNN is chosen since it is a recent technique, proven to be efficient and to reach high accuracy in implementations, and it works with a small-capacity network size.

The U-Net convolutional neural network (Ronneberger et al., 2015) was originally designed for biomedical image segmentation, but generalizes well to other semantic segmentation tasks. The network has a U-shape, as can be seen in Figure 9, and consists of a left side (encoder) and a right side (decoder). The encoder uses convolution blocks followed by max-pool downsampling to encode the input image into feature representations at multiple levels. The decoder projects the features learned by the encoder onto the pixel space by upsampling and concatenation, using output from the encoder at multiple levels.

Figure 9: U-Net architecture (Ronneberger et al., 2015)
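To make the encoder/decoder idea concrete, the following is a heavily compressed Keras sketch of a U-Net-style network; the depth and filter counts are illustrative and much smaller than the architecture shown in Figure 9.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # Two 3x3 convolutions, as in the original U-Net building block.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

def tiny_unet(size=512):
    inputs = tf.keras.Input((size, size, 1))               # greyscale floor plan
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D()(c1)                         # encoder: downsample
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D()(c2)
    bottleneck = conv_block(p2, 128)
    u2 = layers.UpSampling2D()(bottleneck)                 # decoder: upsample
    c3 = conv_block(layers.Concatenate()([u2, c2]), 64)    # skip connection
    u1 = layers.UpSampling2D()(c3)
    c4 = conv_block(layers.Concatenate()([u1, c1]), 32)    # skip connection
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)  # wall probability per pixel
    return tf.keras.Model(inputs, outputs)

model = tiny_unet()
```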

Fast Segmentation Convolutional Neural Network (Fast-SCNN) (Poudel et al., 2019) is a semantic segmentation network suited to efficient computation. The network consists of four building blocks: down-sampler, global feature extractor, feature fusion and classifier. The first block, the down-sampler, extracts low-level features such as edges and corners from the image using convolutional layers. The second block, the global feature extractor, aims to capture the global context; it uses the output of the first block and applies bottlenecks and a Pyramid Pooling Module (PPM). The latter is used to aggregate different region-based context information. Bottlenecks are layers with fewer weights than the preceding layers, which encourages the network to compress feature representations into the best fit in the available space. The PPM applies average pooling and upscaling at multiple levels (resolutions) of the image, to capture context from global to local. In the third block, feature fusion, the output of the down-sampler and the global feature extractor is combined. The last block uses activation layers to predict a class for each pixel. The architecture overview is shown in Figure 10.


Figure 10: Fast S-CNN architecture (Poudel et al., 2019)

4.2.2 Training

Both networks are implemented in Tensorflow and trained on an Nvidia GeForce GTX 1080 GPU. The training input is an n × m pixel greyscale floor plan image and a binary matrix of the same size, the mask. In this matrix, a cell value is 1 if the corresponding pixel in the floor plan image is part of a wall, and 0 otherwise. Besides resizing the floor plan image, no transformations are applied to the input of the network. Training is done using the Adam optimizer with an initial learning rate of 1e-7. The loss function used is binary cross-entropy (Equation 4). Each epoch consists of 500 training steps and 20 validation steps. Early stopping is applied on the validation loss: training stops when the monitored metric has not improved for the last 5 epochs. The batch size is 8. The input size of 848 × 848 pixels for U-Net and 512 × 512 pixels for Fast-SCNN is chosen because of the maximum available memory on the GPU.

$$\mathcal{L}(y, p) = -\big(y \log(p) + (1 - y) \log(1 - p)\big) \qquad (4)$$
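The training configuration described above could be expressed in Keras roughly as follows; this is a sketch that assumes the `model` from the U-Net sketch above and tf.data datasets `train_ds`/`val_ds` of (image, mask) pairs, which are not part of the thesis code.

```python
import tensorflow as tf

# Sketch of the training setup described in the text; `model`, `train_ds`
# and `val_ds` are assumed to exist.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-7),
    loss="binary_crossentropy",            # Equation 4
    metrics=["accuracy"],
)
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)
model.fit(
    train_ds.batch(8),                     # batch size 8
    validation_data=val_ds.batch(8),
    epochs=100,
    steps_per_epoch=500,                   # 500 training steps per epoch
    validation_steps=20,
    callbacks=[early_stop],
)
```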

4.2.3 Evaluation

The following metrics are calculated:

1. Loss (Equation 4)
2. Accuracy: $\frac{TP + TN}{TP + TN + FP + FN}$
3. Recall: $\frac{TP}{TP + FN}$
4. Precision: $\frac{TP}{TP + FP}$
5. Intersection over Union (IoU): $\frac{TP}{TP + FP + FN}$, with $0 \leq IoU \leq 1$

An accuracy close to 1 indicates a high number of correct predictions. However, since wall-class pixels are less frequent than other-class pixels, an underfitting model can still have a high accuracy. Therefore, recall and precision are also calculated. The recall indicates the ratio between the pixels predicted as wall and all ground-truth wall pixels. The precision is the ratio between the pixels correctly predicted as wall and all pixels predicted as wall. The Intersection over Union (IoU) is used to assess the overlap between the masks. The results of these metrics can be found in Table 3.

Name | Loss | IoU | Accuracy | Recall | Precision | Epochs
U-Net | 0.0693 | 0.8859 | 0.9686 | 0.8399 | 0.8859 | 66
Fast-SCNN | 0.0730 | 0.6528 | 0.9674 | 0.8831 | 0.849 | 72

Table 3: Segmentation training metrics
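For reference, the pixel-wise metrics above can be computed from two binary masks as in the following NumPy sketch; this is an illustration, not the thesis implementation.

```python
import numpy as np

def pixel_metrics(pred, truth):
    """Compute the evaluation metrics above from two binary wall masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)       # wall pixels predicted as wall
    fp = np.sum(pred & ~truth)      # non-wall pixels predicted as wall
    fn = np.sum(~pred & truth)      # missed wall pixels
    tn = np.sum(~pred & ~truth)     # correctly predicted non-wall pixels
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "iou": tp / (tp + fp + fn),
    }
```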

4.2.4 Post-processing

After pixel-wise classification, individual walls need to be identified and converted to polygons. This is done in multiple steps, using the OpenCV library (Bradski, 2000). The first phase consists of multiple morphological transformations. Morphological transformations are a set of image processing operations that process images based on shapes. Each pixel in the image is adjusted based on the values of the neighbouring pixels present in the pixel kernel; by adjusting the kernel size, the transformation can be made more or less sensitive to its neighbourhood. There are two main transformations: erosion and dilation. Erosion changes a pixel value to 0 if not all pixels in the kernel are 1, and dilation changes a pixel value to 1 if at least one pixel in the kernel is 1. The first applied transformation is closing, which applies dilation followed by erosion to each pixel and thereby closes small holes. The second operation is dilation. The third operation is erosion, to thin the walls. Each step uses the same kernel, a 5 × 5 matrix. This results in an image where each wall is represented by two lines, its edges. The output of each step is shown in Figure 11.

Figure 11: Morphological steps
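A minimal OpenCV sketch of the three morphological steps, assuming a binary wall mask in a hypothetical file wall_mask.png:

```python
import cv2
import numpy as np

# The 5x5 kernel from the text; the input file name is hypothetical.
kernel = np.ones((5, 5), np.uint8)
mask = cv2.imread("wall_mask.png", cv2.IMREAD_GRAYSCALE)

closed = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # close small holes
dilated = cv2.dilate(closed, kernel)                      # thicken wall regions
edges = cv2.erode(dilated, kernel)                        # thin the walls again
```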

To obtain individual walls from these pixel-based images, two different approaches have been tested. In the first approach, line detection by the Hough line transform (Bergen & Shvaytser, 1991) is applied. Hough line transforms can be used to detect straight lines in images; the algorithm identifies lines in polar coordinates $(r, \theta)$. OpenCV has two implementations of the Hough transform: standard and probabilistic. The first returns polar coordinates and the latter returns the extremes of the detected lines $(x_{min}, y_{min}, x_{max}, y_{max})$. Since a wall consists of two lines (both edges), these need to be connected. Therefore, lines with similar slopes and a minimal distance less than a set threshold are combined. However, it was not possible to find a good balance between combining too few and too many lines, which resulted either in lines not being combined into a wall or in overgeneralization that ignores small changes in direction. This can be seen in Figure 12.

Figure 12: Combining too many elements

The second approach uses OpenCV to extract wall contours. The contours are converted to polygonal curves by drawing a border around the edges. After that, the contour lines are simplified using the Ramer-Douglas-Peucker algorithm in OpenCV (Bradski, 2000), with a precision of 5 pixels. This algorithm converts line segments to a similar curve with fewer points. The lines in this contour are re-drawn if the angle $\theta$ is approximately $a\frac{\pi}{4}$, $a \in \mathbb{N}$. After that, Algorithm 1 is applied; a code sketch of the contour step follows Figure 13. In Figure 13, an original image, the all-walls polygon and the individual walls can be seen. This approach has, in contrast to previous research (C. Liu, 2017), the ability to support diagonal walls.

Algorithm 1: Create individual walls

while all-walls.area > 0.01 do
    top-left ← top-left point of the all-walls polygon
    highest-angle ← point with connecting edge to top-left with highest θ
    highest-dx ← point with connecting edge from top-left with highest Δx
    wall ← polygon from top-left to highest-angle
    direction ← -1 if highest-angle point x < top-left point x else 1
    while intersection(all-walls, wall).area == wall.area do
        extend wall in direction on x-axis
    end while
    if wall.area == 0 then
        make a bounding box around the top-left, highest-angle and highest-dx points,
        find its intersection with all-walls and select the largest element as wall
    end if
    all-walls ← symmetric difference of all-walls and wall
end while


Figure 13: Left: input, center: all-walls polygon, right: individual walls
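As referenced above, a minimal sketch of the contour step: extracting contours and simplifying them with approxPolyDP at 5-pixel precision. The input file name is hypothetical.

```python
import cv2

# Binary image produced by the morphological steps (hypothetical file name).
edges = cv2.imread("morph_output.png", cv2.IMREAD_GRAYSCALE)

# Extract wall contours and simplify each with Ramer-Douglas-Peucker.
contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
wall_polygons = [cv2.approxPolyDP(c, epsilon=5, closed=True) for c in contours]
```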

4.3 Openings

4.3.1 Deep Learning Architectures

Doors and windows are represented by icons in the floor plan. In the selected literature, a few deep learning object detection algorithms are used. In this research, four different architectures are compared. In order to compare the final results, all networks should produce output in the same format; therefore, all selected networks predict bounding boxes. The deep learning architectures used are briefly introduced below.

Faster R-CNN (Ren, He, Girshick, & Sun, 2015) uses convolutional layers as input for a region proposal network (RPN), which aims to detect bounding boxes for objects; a convolutional network is then applied to these region proposals. The Single Shot Detector (SSD) (W. Liu et al., 2016) uses, unlike Faster R-CNN, only one step (shot) to identify and detect objects. In this research, the SSD network is tested with two encoders, RetinaNet50 and EfficientNet-b0. The fourth network type, CenterNet (Duan et al., 2019), models objects as single points: the centre point of the bounding box. An encoder is a mapping $X \to Y$, where $X$ is the input space ($\mathbb{R}^{m \times n \times 3}$) and $Y$ the code space; during training, it captures only the salient relations by backpropagation.

4.3.2 Training

Training is done using the Tensorflow Object Detection API (Huang et al., 2017), an open-source framework based on Tensorflow that enables the use of pre-trained networks from the TensorFlow 2 Detection Model Garden. During training, the Tensorflow Model Garden default settings have been used. For Faster R-CNN, the Adam optimizer is used with a learning rate of 1e-7. For both the CenterNet and SSD architectures, the momentum optimizer is used with a cosine-decay learning rate, which decreases the learning rate over time based on the cosine curve. All networks are trained on an Nvidia GeForce GTX 1080 GPU. During training, each architecture uses a combination of loss functions:


• Faster R-CNN: classification loss, box localization loss, RPN localization loss, RPN object loss, normalized total loss, regularization loss
• SSD with MobileNet v1 and SSD with EfficientNet-b0: classification loss, localization loss, normalized total loss, regularization loss
• CenterNet: heatmap variant focal loss, L1 norm offset loss, L1 norm dimension size loss

Regularization loss helps the coefficients in the deep learning network remain small, to prevent overfitting. The normalized total loss is a weighted sum of all separate loss functions. L1 loss is the least absolute deviation, $\sum_{i=1}^{n} |y_i - \hat{y}_i|$.
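For illustration, a momentum optimizer with a cosine-decay learning rate can be set up in TensorFlow as below; the initial rate is an assumption, since the thesis only states that the Model Garden defaults were used.

```python
import tensorflow as tf

# Cosine decay over the 14K training steps used in the experiments; the
# initial learning rate here is an assumed placeholder value.
schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=1e-3, decay_steps=14_000
)
optimizer = tf.keras.optimizers.SGD(learning_rate=schedule, momentum=0.9)
```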

4.3.3 Evaluation

During training, the classification and localization losses are calculated, both for the region proposal network and for the box classifier. This, together with regularization (a penalty to discourage model complexity), is combined into a total loss. Depending on the network architecture, different combinations and calculations of losses are implemented. Therefore, to compare these networks, the results are evaluated on consistent metrics. COCO (Common Objects in Context) defines commonly used metrics to compare object detection architectures (Lin et al., 2014); two of these metrics are average precision (AP) and average recall (AR) at different IoU levels.

The IoU is the area of overlap divided by the area of the union of the prediction and the ground truth. This IoU score is used to mark predictions as true positives (TP) or false positives (FP); a false negative (FN) is a missed object. Precision describes the ratio of correct positives, $\frac{TP}{TP + FP}$. Recall (sensitivity) describes the ratio between the TP and all positives (TP + FN), $\frac{TP}{TP + FN}$. Since the prediction mark (TP, FP, FN) depends on the IoU threshold (to what extent the prediction should fit the ground-truth box), the metrics can be calculated for different IoU thresholds; a small sketch of this box-IoU computation is shown below. When ordering predictions by their confidence level, a precision/recall curve can be made. The area under this curve is called the average precision (AP), and the mean average precision (mAP) is the average of the AP. The results can be found in Table 4. Figure 14 shows an example prediction.
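```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))   # intersection width
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))   # intersection height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A prediction counts as a true positive if its IoU with a ground-truth box
# exceeds the chosen threshold, e.g. box_iou(pred, truth) >= 0.5.
```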

Name | Resolution (px) | Seen samples | Avg Precision | Avg Recall
Faster R-CNN with ResNet-50 (v1) | 640x640 | BS = 2, Steps = 14K, Total = 28K | 0.66 | 0.445
SSD with MobileNet v1 FPN (RetinaNet50) | 640x640 | BS = 4, Steps = 14K, Total = 56K | 0.744 | 0.595
SSD with EfficientNet-b0 (EfficientDet-d0) | 512x512 | BS = 4, Steps = 14K, Total = 56K | 0.606 | 0.441
CenterNet with hourglass backbone | 512x512 | BS = 2, Steps = 14K, Total = 28K | 0.704 | 0.531

Table 4: Object detection metrics

Figure 14: Sample Object Detection prediction


4.4 Room Type

Room types are shown as characters in the floor plan image. Optical Character Recognition (OCR) is a technique to obtain machine-encoded text. There is a variety of off-the-shelf OCR techniques, of which a few are compared in research by Tomaschek (Tomaschek, 2018). Tesseract version 4 (Smith, 2019) contains a new neural-network (LSTM) based OCR engine, which focuses on line recognition. This technique has a word-accuracy ratio of 100% in Tomaschek's research.

First, using morphological and colour techniques in OpenCV, the walls and furniture are filtered out of the image. Subsequently, OpenCV is used to find contour boxes. Finally, Tesseract version 4 is used to extract the characters in each box.
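A rough sketch of this pipeline with OpenCV and pytesseract; the file name, contour settings and Tesseract flags are assumptions for illustration.

```python
import cv2
import pytesseract

# Image with walls/furniture already filtered out (hypothetical file name).
image = cv2.imread("filtered_floorplan.png", cv2.IMREAD_GRAYSCALE)

# Find boxes around remaining (text) regions; invert so text is foreground.
contours, _ = cv2.findContours(255 - image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for contour in contours:
    x, y, w, h = cv2.boundingRect(contour)
    roi = image[y:y + h, x:x + w]
    # --oem 1 selects the Tesseract 4 LSTM engine; --psm 7 reads a single line.
    label = pytesseract.image_to_string(roi, config="--oem 1 --psm 7").strip()
    if label:
        print((x, y), label)
```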

4.5 Merging the extracted data

After obtaining the individual walls, openings and labels, all data needs to be combined into a single object. First, the openings and walls are combined: for each opening, the wall with the largest intersection is chosen (a sketch of this step follows Figure 15). After that, the object is projected onto the wall, and for door elements the direction is found by taking the maximum offset distance to the wall-intersection centroid. The process can be seen in Figure 15. Finally, room labels are added at their centre coordinates.

Figure 15: Process
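The first step, choosing the wall with the largest intersection for each opening, can be sketched with shapely as follows; the geometry library is an assumption for illustration, as the thesis does not name it here.

```python
from shapely.geometry import Polygon

def assign_opening(opening: Polygon, walls: list[Polygon]) -> Polygon:
    """Attach an opening to the wall it overlaps most, as described above."""
    return max(walls, key=lambda wall: wall.intersection(opening).area)
```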


4.6 Floor plan parsing results

To evaluate the output of the described methods, 10 random samples from the CubiCasa5K dataset and 6 Dutch samples are inspected. Since the study area of this method is the Netherlands and CubiCasa5K does not contain Dutch floor plans, 6 Dutch samples are added. For each sample, the true positives (TP, correct items), false positives (FP, output that should not be in the result), and false negatives (FN, missing items) are given. The precision and recall are added to each table, and the harmonic mean of precision and recall (F1-score, Equation 5) is calculated. To find the accuracy of the door direction, the ratio between correct directions and true-positive doors is given. An object is considered a true positive if more than 75% of the ground-truth wall area is covered. Table 5 presents the F1 scores for the CubiCasa5K and Dutch samples. The results for the CubiCasa5K dataset can be found in Tables 6-15, and the results for a subset of the samples provided by the municipality of Rijssen-Holten and the company BIM4ALL are shown in Tables 16-21. The results will be discussed and analysed in section 7.3.

$$F_1 = \frac{1}{\frac{1}{2}\left(\frac{1}{\text{recall}} + \frac{1}{\text{precision}}\right)} \qquad (5)$$

Dataset | Wall | Door | Window
CubiCasa5K | 0.9 | 0.93 | 0.85
Dutch floor plan samples | 0.83 | 0.66 | 0.69

Table 5: F1 scores per type

Sample 1 (CubiCasa5K)
Type | TP | FP | FN | Precision | Recall | F1-score
Wall | 29 | 4 | 0 | 0.89 | 1 | 0.94
Door | 12 | 0 | 0 | 1 | 1 | 1
Window | 7 | 0 | 0 | 1 | 1 | 1
Door direction ratio: 6/12
Table 6: CubiCasa Sample 1

Sample 2 (CubiCasa5K)
Type | TP | FP | FN | Precision | Recall | F1-score
Wall | 31 | 3 | 0 | 0.91 | 1 | 0.95
Door | 12 | 0 | 0 | 1 | 1 | 1
Window | 10 | 1 | 1 | 0.91 | 0.91 | 0.91
Door direction ratio: 3/12
Table 7: CubiCasa Sample 2

Sample 3 (CubiCasa5K)
Type | TP | FP | FN | Precision | Recall | F1-score
Wall | 21 | 3 | 0 | 0.88 | 1 | 0.94
Door | 8 | 0 | 0 | 1 | 1 | 1
Window | 6 | 0 | 0 | 1 | 1 | 1
Door direction ratio: 5/8
Table 8: CubiCasa Sample 3

Sample 4 (CubiCasa5K)
Type | TP | FP | FN | Precision | Recall | F1-score
Wall | 55 | 5 | 7 | 0.92 | 0.89 | 0.9
Door | 16 | 0 | 3 | 1 | 0.84 | 0.91
Window | 0 | 0 | 9 | 1 | 0 | 0
Door direction ratio: 4/16
Table 9: CubiCasa Sample 4

Sample 5 (CubiCasa5K)
Type | TP | FP | FN | Precision | Recall | F1-score
Wall | 14 | 2 | 0 | 0.86 | 1 | 0.92
Door | 5 | 0 | 0 | 1 | 1 | 1
Window | 6 | 0 | 0 | 1 | 1 | 1
Door direction ratio: 2/5
Table 10: CubiCasa Sample 5

Sample 6 (CubiCasa5K)
Type | TP | FP | FN | Precision | Recall | F1-score
Wall | 22 | 2 | 4 | 0.92 | 0.85 | 0.88
Door | 9 | 0 | 0 | 1 | 1 | 1
Window | 8 | 0 | 1 | 1 | 0.89 | 0.94
Door direction ratio: 3/9
Table 11: CubiCasa Sample 6

Sample 7 (CubiCasa5K)
Type | TP | FP | FN | Precision | Recall | F1-score
Wall | 16 | 3 | 0 | 0.84 | 1 | 0.91
Door | 3 | 2 | 0 | 0.6 | 1 | 0.75
Window | 5 | 0 | 0 | 1 | 1 | 1
Door direction ratio: 2/5
Table 12: CubiCasa Sample 7

Sample 8 (CubiCasa5K)
Type | TP | FP | FN | Precision | Recall | F1-score
Wall | 16 | 2 | 1 | 0.89 | 0.94 | 0.91
Door | 4 | 0 | 0 | 1 | 1 | 1
Window | 3 | 1 | 0 | 0.75 | 1 | 0.86
Door direction ratio: 2/4
Table 13: CubiCasa Sample 8

Sample 9 (CubiCasa5K)
Type | TP | FP | FN | Precision | Recall | F1-score
Wall | 7 | 2 | 0 | 0.77 | 1 | 0.87
Door | 1 | 1 | 0 | 0.5 | 1 | 0.67
Window | 5 | 3 | 0 | 0.63 | 1 | 0.77
Door direction ratio: 0
Table 14: CubiCasa Sample 9

Sample 10 (CubiCasa5K)
Type | TP | FP | FN | Precision | Recall | F1-score
Wall | 18 | 1 | 2 | 0.95 | 0.9 | 0.92
Door | 5 | 0 | 0 | 1 | 1 | 1
Window | 5 | 0 | 0 | 1 | 1 | 1
Door direction ratio: 3/5
Table 15: CubiCasa Sample 10

Sample 11 (Dutch images 1)
Type | TP | FP | FN | Precision | Recall | F1-score
Wall | 13 | 1 | 3 | 0.93 | 0.81 | 0.87
Door | 6 | 0 | 1 | 1 | 0.86 | 0.92
Window | 5 | 0 | 0 | 1 | 1 | 1
Door direction ratio: 3/6
Table 16: Dutch Sample 1

Sample 12 (Dutch images 1)
Type | TP | FP | FN | Precision | Recall | F1-score
Wall | 12 | 1 | 0 | 0.92 | 1 | 0.96
Door | 4 | 0 | 0 | 1 | 1 | 1
Window | 7 | 0 | 0 | 1 | 1 | 1
Door direction ratio: 2/4
Table 17: Dutch Sample 2

Sample 13 (Dutch images 1)
Type | TP | FP | FN | Precision | Recall | F1-score
Wall | 10 | 0 | 1 | 1 | 0.91 | 0.95
Door | 6 | 0 | 0 | 1 | 1 | 1
Window | 6 | 1 | 0 | 0.87 | 1 | 0.93
Door direction ratio: 3/5
Table 18: Dutch Sample 3

Sample 14 (Dutch images 2)
Type | TP | FP | FN | Precision | Recall | F1-score
Wall | 19 | 5 | 12 | 0.8 | 0.61 | 0.69
Door | 2 | 1 | 5 | 0.67 | 0.29 | 0.4
Window | 0 | 1 | 16 | 0 | 0 | 0
Door direction ratio: 1
Table 19: Dutch Sample 4

Sample 15 (Dutch images 1)
Type | TP | FP | FN | Precision | Recall | F1-score
Wall | 16 | 1 | 15 | 0.94 | 0.52 | 0.67
Door | 4 | 2 | 2 | 0.66 | 0.66 | 0.66
Window | 3 | 0 | 5 | 1 | 0.38 | 0.55
Door direction ratio: 0
Table 20: Dutch Sample 5

Sample 16 (Dutch images 1)
Type | TP | FP | FN | Precision | Recall | F1-score
Wall | 16 | 1 | 4 | 0.94 | 0.8 | 0.86
Door | 0 | 0 | 5 | 0 | 0 | 0
Window | 7 | 4 | 3 | 0.64 | 0.7 | 0.66
Door direction ratio: 0
Table 21: Dutch Sample 6

5 RQ 1B: Consistent merging method

This section aims at providing a method to answer research question 1b: "What is a consistent method for merging 3D CityJSON and 2D floor plan vector data?". This research question has two challenges: mapping the processed floor plan orientation onto real-world coordinates using rigid transformations, and combining the information into a 3D model.

5.1 Data sources

Table 22 shows all data used for this research question.

Name | Description | Format | Source
3D BAG | LOD 2.2 3D building data set for the Netherlands | CityJSON | 3D geoinformation group, TU Delft
NWB Roads | Web Feature Service (WFS) consisting of 159,398 km of roads | LineString | Nationaal Georegister

Table 22: Data used for RQ 1B: Consistent merging method

5.2 Obtain 3D BAG exterior and NWB Roads

CityJSON objects in the 3D BAG are grouped by location into tiles. All tiles are available in a GeoJSON feature collection, which describes the bounding box of each tile and the tile ID that can be used to download the corresponding CityJSON file. If the building coordinates are known, the CityJSON tile is downloaded and the building object geometry is extracted from the CityJSON file. By creating a mesh of the building geometry, intersections, translations, and other manipulations become possible. The NWB road database is available through a Web Feature Service (WFS), which can be accessed via HTTP requests. In order to obtain the roads within a certain distance of the building, a bounding box parameter needs to be added to this request. The WFS returns a list of linestrings, which can be empty if no roads are close to the building.
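A sketch of such a bounding-box-filtered WFS request with the requests library; the endpoint URL and layer name are placeholders, not the real service parameters.

```python
import requests

# Query the roads WFS with a bounding box around the building. The URL and
# typeNames value below are hypothetical placeholders for illustration.
params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeNames": "nwb:roads",                 # hypothetical layer name
    "bbox": "231000,472000,231200,472200",    # illustrative RD coordinates
    "outputFormat": "application/json",
}
response = requests.get("https://example.org/nwb/wfs", params=params)
roads = response.json().get("features", [])   # empty if no roads are nearby
```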

5.3 Polygon transformation

In contrast with the 3D city model, the floor plans do not have an orientation (e.g. to the north). It is likely that the front facade is the horizontal line at the bottom of the floor plan. To find the front facade in the city model, it is assumed to be parallel to a road within 15 metres of the building centroid. Thus, for each parallel road, polygon mapping is applied to find the best fit of the floor plan onto the 3D city model building. As mentioned in the background (Section 2.4), there are a variety of methods for polygon
