S. GONG

Towards Adaptive Trajectory Data Management: Modelling, Accessing, Distributing, and Query Optimization in Distributed Database

Master thesis (2024) - S. GONG (author), BM Meijers (mentor), C.W. Quak (graduation committee member)

With the proliferation of IoT infrastructure, trajectory data is emerging rapidly. This data originates from a variety of moving objects and contains large volumes of multi-dimensional information such as space, time, and semantics. This information can potentially create added value through scientific research, decision-making, emergency management, etc.

However, because of the special properties of trajectory data, namely high frequency, cardinality, dimensionality, and heterogeneity, traditional data management systems have difficulty handling it. Although distributed or big data solutions exist in other fields, they are not designed around the modelling, accessing, distributing, and querying characteristics of this kind of spatio-temporal data.

To address spatial data management problems, a series of studies at the GDMC, TU Delft has developed, advocated, and validated a clustering and indexing solution for high-dimensional point clouds. The solution uses a space-filling curve and takes the heterogeneous spatial distribution of the data into account.

However, it is uncertain whether this framework can be extended to other types of space-related phenomena. Furthermore, it remains to be explored whether distributed database techniques can be used and which adjustments they would require. To some extent, this thesis extends the point cloud research mentioned above.

To address these data management challenges, this thesis focuses on trajectory data modelling and compression, indexing and clustering, and partitioning and distributing. Querying strategies are also studied. More specifically, the three main results of this thesis are:

Model the trajectory as a sequence split by semantic attributes and by spatio-temporal cube. This modelling preserves proximity (locality) and the trajectory itself at the same time, giving a balanced level of flexibility and aggregation and reducing the storage burden through row-wise compression. Depending on the subdivision resolution (the depth of the Octree used for space partitioning), the compression ratio can reach up to 10.

Access the trajectory data by Space-filling Curve. Space-filling Curve indexing maps 3D (high-dimensional) indices to 1D (low-dimensional) indices, resolving the tension between high dimensionality and high cardinality. An adaptive Octree is used to mitigate the heterogeneity of the trajectory data. In the experiments, the optimal tree depth is 4 or 5. Query optimization (specifically, the range merging technique) is also explored in a preliminary way.

Distribute the data across multiple machines. The distributed deployment improves scalability (horizontal expansion of disk, memory, and CPU resources) and speed-up, nearly linearly in the experiment. The Space-filling Curve distribution strategy yields better load balancing. However, because the distributed database platform used (Greenplum) lacks flexibility, the localization of data and computation (such as local aggregation) is limited.
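
The indexing, range merging, and distribution ideas above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not name the curve, so a Morton (Z-order) curve is assumed here, and the function names `morton3d`, `merge_ranges`, and `partition_of`, the `max_gap` parameter, and the equal-width key intervals are all hypothetical choices made for illustration.

```python
def morton3d(x: int, y: int, t: int, bits: int = 5) -> int:
    """Interleave the low `bits` bits of x, y, t into one 1D key.

    Assumption: Morton (Z-order) curve; cell indices come from an
    octree-like grid of depth `bits` over space and time.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((t >> i) & 1) << (3 * i + 2)
    return key


def merge_ranges(ranges, max_gap: int = 0):
    """Merge inclusive (lo, hi) key ranges separated by at most `max_gap` keys.

    Fewer, larger ranges mean fewer index scans, at the cost of reading
    some keys that fall outside the query window.
    """
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo - merged[-1][1] - 1 <= max_gap:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(r) for r in merged]


def partition_of(key: int, bits: int, n_segments: int) -> int:
    """Assign a key to one of `n_segments` contiguous, equal-width key intervals.

    Assumption: a simple range partitioning over the key space; a real
    deployment would pick interval boundaries from the data distribution.
    """
    return key * n_segments // (1 << (3 * bits))


if __name__ == "__main__":
    print(morton3d(3, 3, 3, bits=2))
    print(merge_ranges([(0, 3), (4, 7), (10, 12)]))
    print(partition_of(morton3d(3, 3, 3, bits=2), bits=2, n_segments=4))
```

Because neighbouring cells tend to receive nearby keys, contiguous key intervals keep spatially and temporally close points on the same machine, which is the intuition behind distributing by curve key.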

(Co-)development of an open-data-based tool to perform preliminary environmental analyses at district scale in different European countries

Student report (2023) - B.S. Tsai (author), L.C. Huizer (author), M. Giampaolo (author), S. Monté (author), S. GONG (author), G. Agugiaro (mentor), Gabriel Garcia (mentor)

This report details the development process of an open-data-based tool, an extension of the original interface created by Royal HaskoningDHV. The objective was to bridge the gap between geographical data and Architectural, Engineering, and Construction (AEC) industry applications. The tool aimed to transform spatial data for architects, facilitating contextual analysis in Rhinoceros and Grasshopper, ultimately aiding architects and engineers in enhancing designs based on environmental impact.

The initial tool focused on data for the Netherlands, but the ultimate goal was to make it applicable to other countries/regions. The research involved evaluating data availability for different regions, acquiring and aligning relevant data for Grasshopper, and implementing these data workflows in wind and solar analyses.

The data evaluation stage revealed challenges due to varying data availability and accessibility across countries. For example, Germany's fragmented data required navigating different portals, while Hong Kong's centralized data via API was more accessible. The lack of standardization hindered automation, necessitating manual data retrieval strategies that could be challenging for non-geomatics experts.

Data alignment methods varied, introducing complexities. For instance, Italy required 3D extrusion from 2D shapefiles, leading to unavoidable errors. Spain used a different method, showcasing the difficulty of a universal solution due to data standardization and interoperability issues.
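
As a rough illustration of what extruding a 2D footprint into 3D involves, the following Python sketch builds a flat-roofed prism from a polygon ring and a single height value. It is an assumption-laden example rather than the report's workflow: the function name `extrude_footprint`, the counter-clockwise ring without a repeated closing vertex, and the single `height` attribute per building are all hypothetical.

```python
def extrude_footprint(footprint, base_z: float, height: float):
    """Extrude a 2D polygon ring into a flat-roofed prism.

    footprint: list of (x, y) vertices, counter-clockwise, not closed.
    Returns (vertices, faces), where each face is a tuple of vertex indices.
    """
    n = len(footprint)
    bottom = [(x, y, base_z) for x, y in footprint]
    top = [(x, y, base_z + height) for x, y in footprint]
    vertices = bottom + top

    # One quad per footprint edge, joining the bottom ring to the top ring.
    walls = [(i, (i + 1) % n, n + (i + 1) % n, n + i) for i in range(n)]
    # Floor is reversed so that its normal points downwards.
    floor = tuple(reversed(range(n)))
    roof = tuple(range(n, 2 * n))
    return vertices, walls + [floor, roof]


if __name__ == "__main__":
    square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
    verts, faces = extrude_footprint(square, base_z=0.0, height=6.0)
    print(len(verts), len(faces))
```

A block model of this kind ignores roof shape and any height variation within a footprint, which is one plausible way in which extrusion introduces error.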

Two techniques were envisioned for the open-data tool: TIN-based and Voxel-based methods, each with distinct qualities and limitations. The TIN-based method offered high-quality analyses but required rigorous data alignment, while the Voxel-based method allowed flexibility but risked issues with resolution.

Limitations of the exploratory analysis included its focus on five countries/regions and the inherent constraints of Rhinoceros, which limited tool accessibility and required alternative approaches. Additionally, language barriers and the permeability of data platforms might have led to overlooked datasets.

In conclusion, the report acknowledges the need for future work. Optimization of code for readability and performance is suggested, and the inclusion of additional data types (vegetation, land use, transport) in data workflows is proposed. Input from AEC professionals through methods like questionnaires or testing is recommended for further improvement. This report emphasizes the evolving nature of the tool and the importance of ongoing refinement to meet the needs of diverse AEC professionals.