In Spring 2025 I will be on sabbatical and hence will not supervise theses - sorry!
This page lists my topics offered for CS Bachelor and DE Master Theses.
Below is an overview list; strikethrough means the topic is taken.
All work follows the procedures, so you may want to study these first.
Programming prerequisites should be taken seriously - in all cases a non-trivial implementation in one of several languages is involved.
Code will regularly add functionality to our rasdaman system and, as such, be used by our project partners and the general scientific and technical community;
hence code quality (including, e.g., concise tests and documentation) is an integral evaluation criterion.
Generally, I appreciate not only the result, but also the way towards it - therefore, showing continuous progress, initiative, and well-planned work is definitely an asset.
Knowledge characterized as "advantageous" means that it is not mandatory, but not bringing it along will increase the workload significantly and make deadlines tight.
We reserve the right not to give a topic to a student if, for the student's sake, there is too much risk that a good result will not be achieved.
Make use of the official report template.
If your report is of sufficient quality to be successfully submitted to a conference or journal for publication, this will be considered a strong plus.
Note that:
only the topics below will be accepted for supervision, due to resource constraints.
thesis topics must be agreed before end of the drop/add period.
A Study of ML Model APIs in the Earth Sciences
topic:
Machine Learning (ML) is of ever-growing importance in industry and research.
This is no different in the Earth Sciences where ML is applied to massive spatio-temporal data, such as satellite image timeseries and weather data.
In the AI-Cube and FAIRiCUBE projects, ML models have experimentally been coupled with spatio-temporal datacubes with the goal of establishing smooth, automated ML model integration into a datacube query language.
Datacubes are a human-centric and more efficient way of structuring and presenting multi-dimensional data and are thus particularly suitable for space/time Earth data.
With the system used, rasdaman, the integration was possible via User-Defined Functions (UDFs) and an adapter allowing the datacube server to dynamically feed pytorch with the model and the satellite images selected.
However, several problems were encountered which are now being investigated. It already starts with the observation that, even within a single tool such as pytorch, models expect quite individual data preprocessing, formatting, etc.; this is an obstacle to a smooth, uncomplicated use of ML.
Task at hand is:
select, in agreement with the supervisor, 5 ML models using satellite and weather data from the Huggingface AI repository;
build a prototype UDF for each model (based on the existing ML UDF code) and demonstrate invocation via WCPS on Earth datacubes existing in rasdaman, disregarding model output accuracy (a minimal UDF sketch is given after this list);
summarize commonalities and differences from the API perspective.
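To make the second step concrete, here is a minimal sketch of what a per-model UDF body could look like, assuming rasdaman hands the selected datacube region to Python as a NumPy array; the model file, array layout, and normalization are placeholder assumptions, not code from the AI-Cube/FAIRiCUBE projects.

```python
# Minimal sketch of a per-model UDF body. Assumption: rasdaman passes the
# selected datacube region as a NumPy array of shape (bands, height, width).
# Model file name and normalization are placeholders.
import numpy as np
import torch

def udf_predict(pixels: np.ndarray) -> np.ndarray:
    x = torch.from_numpy(pixels.astype(np.float32)).unsqueeze(0)  # -> (1, C, H, W)
    x = (x - x.mean()) / (x.std() + 1e-6)    # model-specific preprocessing varies per model

    model = torch.jit.load("model.pt")       # hypothetical TorchScript export of the HF model
    model.eval()
    with torch.no_grad():
        scores = model(x)                    # e.g. per-pixel class scores, (1, classes, H, W)

    return scores.squeeze(0).argmax(dim=0).numpy()  # per-pixel class map, shape (H, W)
```

The preprocessing lines are exactly where the per-model differences appear that the final task item asks to catalogue.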
team size: 1
prerequisites: python
classification: ML
particularities: none
Penetration Testing
topic:
A wide field of website attack methods is known today, and effectively we live in a continuous global cyber war.
As our research group operates several servers, there is a danger of attack here, too.
Task at hand is to perform systematic research on attack vectors, run penetration tests on a small set of websites given by the supervisor, evaluate the outcome, and suggest improvements where insufficiently secured sites have been found.
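As one illustration of the kind of non-destructive check such a test might start with, the sketch below reports which common HTTP security headers a site sends; the target URL is a placeholder, and real tests must only be run against the sites agreed with the supervisor.

```python
# Non-destructive starting point: report which common HTTP security headers a
# site sends. The target URL is a placeholder; only test sites agreed with the supervisor.
import requests

EXPECTED = [
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Content-Type-Options",
    "X-Frame-Options",
]

def check_security_headers(url: str) -> None:
    response = requests.get(url, timeout=10)
    for header in EXPECTED:
        status = "present" if header in response.headers else "MISSING"
        print(f"{header}: {status}")

check_security_headers("https://example.org")   # placeholder target
```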
team size: 1
prerequisites: software engineering
classification: little implementation, more assessment
particularities: none
Human Brain Datacube
topic:
One of the largest and most widely accessed connectomic datasets is the
human cortex “h01” dataset,
a 3D nanometer-resolution image of human brain tissue.
The raw imaging data is 1.4 petabytes (roughly 500,000 * 350,000 * 5,000 pixels)
and is further associated with additional content, such as 3D segmentations and annotations, that resides in the same coordinate system, based on a human brain atlas.
The “Neuroglancer precomputed” representation of the dataset is more compact and hence considerably smaller in volume.
Google has built an optimized web-based interactive viewer for it,
and the data can also be read and manipulated via TensorStore.
Task at hand is to reproduce the Google demo on rasdaman. This involves:
fetch and understand the brain dataset (a TensorStore data-access sketch follows after this list);
establish a datacube with rasdaman;
create an interactive 3D visualization demo.
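As a possible starting point for the first step, the sketch below reads a small block of a “Neuroglancer precomputed” volume via TensorStore; the bucket and path are placeholders and have to be replaced by the actual h01 location given in the dataset documentation.

```python
# Sketch: read a small block of a "Neuroglancer precomputed" volume with
# TensorStore before ingesting it into rasdaman. Bucket and path are placeholders.
import tensorstore as ts

spec = {
    "driver": "neuroglancer_precomputed",
    "kvstore": {"driver": "gcs", "bucket": "example-bucket", "path": "h01/volume"},
}
volume = ts.open(spec, read=True).result()

# h01 is far too large to read at once -- fetch only a tiny x/y/z block.
block = volume[0:512, 0:512, 0:16].read().result()
print(block.shape, block.dtype)
```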
team size: 1
prerequisites: programming skills (preferably Python), data wrangling
4D Visualization with Fluid Earth
topic:
While visualization of 2D map data is commonplace today, this is not the case for 4D x/y/z/t data such as atmosphere and ocean data.
However, it is desirable to have 4D visualization capabilities in many services.
Fluid Earth is a tool that accomplishes 4D visualization of atmospheric data.
Goal is to combine Fluid Earth with rasdaman datacubes.
Weather timeseries are available in rasdaman datacubes, such as in the Cube4EnvSec Weather service.
Concretely, the task is to (i) establish a Fluid Earth service and (ii) extend the open-source Fluid Earth code so that it reads not from files, but from the rasdaman database (via standardized APIs which rasdaman already supports).
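To sketch the kind of standardized API call the extension would have to issue (shown in Python only to illustrate the request, the actual extension being JavaScript), a WCS GetCoverage request slicing a weather datacube could look as follows; endpoint, coverage name, axis names, and subsets are placeholders, not the actual Cube4EnvSec configuration.

```python
# Sketch of the standardized datacube access needed: a WCS 2.0 GetCoverage
# request slicing a weather datacube in time and space. Endpoint, coverage id,
# axis names, and subsets are placeholders.
import requests

ENDPOINT = "https://example.org/rasdaman/ows"   # placeholder petascope endpoint

params = {
    "service": "WCS",
    "version": "2.0.1",
    "request": "GetCoverage",
    "coverageId": "WeatherCube",                # placeholder coverage name
    "subset": [
        'ansi("2024-01-01T00:00:00Z")',         # time slice (placeholder axis name)
        "Lat(40,60)",                           # spatial subset
        "Long(0,20)",
    ],
    "format": "application/netcdf",
}

response = requests.get(ENDPOINT, params=params, timeout=60)
response.raise_for_status()
open("slice.nc", "wb").write(response.content)  # hand this slice to the visualization layer
```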
team size: 1
prerequisites: JavaScript
classification: demo service setup & coding
particularities: the interfacing between the components has to be worked out
Integrated Vector/Datacube Service
topic:
Geographic analysis rarely applies to a rectangular area, but typically to some polygon-bounded area, such as "bush fire statistics over Greece", "crop yield in Lower Saxony", etc.
This requires algorithmic combination of vector and raster data, such as cadastral information and satellite imagery.
Goal is to implement a demonstration service which allows interactive selection of some place by name:
a vector polygon is retrieved from a vector database (to be established) and subsequently pasted
into a predefined datacube analytics query (a few examples to be prepared) expressed in the WCPS geo datacube query language;
the query is sent to the datacube server (existing), and the result (typically an image) is displayed on a simple map tool like Leaflet (integration code available).
Technically, the work consists of: deploying a PostGIS database (on a cloud VM provided),
feeding it with some sample data to be found (simple table schema consisting of 2 columns for place name and polygon),
and writing the frontend code for interaction + PostGIS access + datacube service access + result display; a sketch of this data flow follows below.
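The following sketch illustrates the flow behind the frontend, written in Python for brevity even though the frontend itself would be JavaScript; table and column names, credentials, the coverage name, and the endpoint are placeholder assumptions.

```python
# Sketch of the data flow: look up a place polygon in PostGIS, paste it into a
# WCPS clip() query, send the query to the datacube server, store the result.
# Table/column names, credentials, coverage name, and endpoint are placeholders.
import psycopg2
import requests

conn = psycopg2.connect(dbname="places", user="demo", password="demo", host="localhost")
cur = conn.cursor()
cur.execute("SELECT ST_AsText(polygon) FROM places WHERE name = %s", ("Greece",))
wkt_polygon = cur.fetchone()[0]                      # e.g. 'POLYGON((...))'

wcps_query = (
    "for $c in (Sentinel2_NDVI) "                    # placeholder coverage name
    f'return encode( clip( $c, {wkt_polygon} ), "image/png" )'
)

response = requests.post(
    "https://example.org/rasdaman/ows",              # placeholder WCPS endpoint
    data={"service": "WCS", "version": "2.0.1",
          "request": "ProcessCoverages", "query": wcps_query},
    timeout=60,
)
open("result.png", "wb").write(response.content)     # image to be shown on the Leaflet map
```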
team size: 1
prerequisites: Linux, JavaScript
classification: Geo Web service
particularities: -
Vector Files as Datacube Query Parameter
topic: OGC Web Coverage Processing Service (WCPS) is a geo datacube query language
with integrated spatio-temporal semantics based on the notion of a multi-dimensional coverage which may represent a datacube.
Queries can be parametrized, among others, with vector polygons allowing arbitrary regions to be "cut out".
Currently, these vectors have to be provided in an ASCII representation called Well-Known Text (WKT).
However, the most widely used format in the geo universe is not WKT, but ESRI Shapefiles, a binary format.
Goal is to add support for the Shapefile format for vector upload in the petascope component of rasdaman, next to the existing WKT decoder.
Open-source libraries for decoding exist, for example GeoTools and shapelib; one of those should be used.
Appropriate tests should be established to demonstrate that the Shapefile decoder works properly.
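The conversion the new decoder has to perform is conceptually simple: read Shapefile geometries and hand them on in the WKT form petascope already understands. The sketch below illustrates this in Python with fiona/shapely for brevity; the actual implementation is to be done in Java with one of the libraries named above, and the file name is a placeholder.

```python
# Conceptual sketch of what the new decoder must do: turn Shapefile geometries
# into WKT. Illustrated with fiona/shapely; the thesis implementation is Java
# (e.g. GeoTools). The file name is a placeholder.
import fiona
from shapely.geometry import shape

with fiona.open("regions.shp") as source:        # placeholder Shapefile
    for feature in source:
        geometry = shape(feature["geometry"])    # build a shapely geometry from the record
        print(geometry.wkt)                      # the WKT form petascope consumes today
```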
team size: 1
prerequisites: Java, Linux
classification: query language enhancement
particularities: -
Location-Centric Spatial Analytics
topic: Web Coverage Processing Service (WCPS) is an internationally standardized geo datacube query language.
Built-in region extraction is confined to rectangular boxes given by the corner coordinates.
However, in many cases a user would wish to confine selection and analysis to some irregular region, such as one delimited by a polygon ring.
In the rasdaman implementation of WCPS, therefore,
a non-standard polygon clipping function has been added
which receives as input a vector polygon in some simple ASCII syntax.
However, this is still not convenient as users typically do not have these polygons available, but would rather like to address regions by location name, such as "Spain" or "Bremen".
This is currently unavailable.
Goal is to develop a Web frontend which allows users to type in a place name (with autocompletion support), obtains the corresponding coordinates, and pastes them into a WCPS query
which the user can then submit to see the (2D map) result on screen; a WCPS server with suitable datacubes is readily available.
To this end, the place names should be stored in a simple PostGIS geocoding database (3 columns: id, name, polygon string)
which is to be built from a sample like this.
A few WCPS template queries should be prepared which do various practically interesting things, such as obtaining a vegetation map or performing some aggregation.
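As an illustration of the template idea, the sketch below keeps a couple of predefined WCPS queries with a $polygon slot that gets filled with the WKT string fetched from the geocoding database; the coverage names, band names, and the queries themselves are placeholder assumptions, not the actual datacubes on the server.

```python
# Sketch of the template idea: predefined WCPS queries with a $polygon slot that
# the frontend fills with the WKT string retrieved for the typed-in place name.
# Coverage and band names are placeholders; real queries will likely also need
# type casts, null handling, etc.
TEMPLATES = {
    "vegetation_map": (
        'for $c in (Sentinel2_L2A) return encode( '
        'clip( ($c.nir - $c.red) / ($c.nir + $c.red), $polygon ), "image/png" )'
    ),
    "mean_value": (
        "for $c in (Sentinel2_L2A) return avg( clip( $c.red, $polygon ) )"
    ),
}

def build_query(template_name: str, wkt_polygon: str) -> str:
    """Paste the polygon retrieved for a place name into the chosen template."""
    return TEMPLATES[template_name].replace("$polygon", wkt_polygon)

print(build_query("vegetation_map", "POLYGON((10 50, 12 50, 12 52, 10 52, 10 50))"))
```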