
Thesis / Project Topics 2024

In Spring 2025 I will be on sabbatical and hence will not supervise theses - sorry!

This page lists the topics I offer for CS Bachelor and DE Master theses. Below is an overview list; strikethrough means the topic is taken.

A Study of ML Model APIs in the Earth Sciences | Penetration Testing | Human Brain Datacube | Fluid Earth Demo | Integrated Vector/Datacube Service | Vector files as Datacube Query Parameter | Location-Centric Spatial Analytics

All work follows the procedures, so you may want to study these first. Programming prerequisites should be taken seriously - in all cases a non-trivial implementation in one of several languages is involved. Code will regularly add functionality to our rasdaman system and, as such, be used by our project partners and the general scientific and technical community; hence code quality (including, e.g., concise tests and documentation) is an integral evaluation criterion. Generally, I appreciate not only the result, but also the way towards it - therefore, showing continuous progress, initiative, and methodical work is definitely an asset. Knowledge characterized as "advantageous" means that it is not mandatory, but not bringing it along will increase the workload significantly and make deadlines tight. We reserve the right not to give a topic to a student if there is too much risk that a good result will not be achieved, for the student's sake.

For the topics where you need deeper knowledge of the rasdaman datacube engine, there is some initial information about the open-source rasdaman community, including documentation, as well as an intro to geo datacube standards, in particular to WCPS.

Make use of the official report template. If your report is of sufficient quality to be submitted successfully to a conference or journal for publication, this will be considered a strong plus.

Note that:

  • only the topics below will be accepted for supervision, due to resource constraints.
  • thesis topics must be agreed upon before the end of the drop/add period.

A Study of ML Model APIs in the Earth Sciences

  • topic: Machine Learning (ML) is of ever-growing importance in industry and research. This is no different in the Earth Sciences, where ML is applied to massive spatio-temporal data such as satellite image timeseries and weather data. In the AI-Cube and FAIRiCUBE projects, ML models have experimentally been coupled with spatio-temporal datacubes with the goal of establishing smooth, automated ML model integration into a datacube query language. Datacubes are a human-centric and more efficient way of structuring and presenting multi-dimensional data and so are particularly suitable for space/time Earth data. With the system used, rasdaman, the integration was possible via User-Defined Functions (UDFs) and an adapter allowing the datacube server to dynamically feed PyTorch with the model and the selected satellite images.
    However, several problems were encountered which are now being investigated. They start already with the observation that even within a single tool, PyTorch, models expect quite individual data preprocessing, formatting, etc. This is an obstacle to smooth, uncomplicated use of ML.
    The task at hand is to:
    • Select, in agreement with the supervisor, 5 ML models using satellite and weather data from the Hugging Face AI repository
    • Build a prototype UDF for each model (based on the existing ML UDF code) and demonstrate invocation via WCPS on Earth datacubes existing in rasdaman, disregarding model output accuracy (a minimal invocation sketch follows this list)
    • Summarize commonalities and differences from the API perspective
  • team size: 1
  • prerequisites: python
  • classification: ML
  • particularities: none
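
To give a feeling for what "invocation via WCPS" means, here is a minimal Python sketch (using the requests library) of posting a WCPS query that feeds a datacube subset into a UDF. The endpoint URL, the coverage name S2_L2A_B04, and the UDF name udf.predict are placeholders, not the actual names used in the AI-Cube/FAIRiCUBE code.

    import requests

    RASDAMAN_OWS = "https://example.org/rasdaman/ows"   # assumed petascope endpoint, not a real one

    # WCPS query feeding a spatio-temporal subset of a datacube into a (hypothetical) UDF
    wcps_query = """
    for $c in (S2_L2A_B04)
    return encode(
      udf.predict( $c[ansi("2023-07-01"), E(600000:610000), N(5090000:5100000)] ),
      "image/tiff")
    """

    response = requests.post(
        RASDAMAN_OWS,
        data={"service": "WCS", "version": "2.0.1",
              "request": "ProcessCoverages", "query": wcps_query},
        timeout=300,
    )
    response.raise_for_status()
    with open("prediction.tif", "wb") as f:
        f.write(response.content)   # model output encoded as GeoTIFF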

Penetration Testing

  • topic: A wide field of website attack methods is known today, and effectively we live in a continuous global cyber war. As our research group operates several servers, there is a danger of attack, too.
    The task at hand is to perform systematic research on attack vectors, perform penetration tests on a small set of websites given by the supervisor, evaluate the outcome, and suggest improvements where insufficiently secured sites have been found (a small illustrative check follows this list).
  • team size: 1
  • prerequisites: software engineering
  • classification: little implementation, more assessment
  • particularities: none
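
As a tiny illustration of the assessment character of this topic, the Python sketch below checks a handful of common HTTP security headers on a target site. This is merely one small building block of a penetration test, the header list and target URL are illustrative placeholders, and of course only systems explicitly released by the supervisor may be probed.

    import requests

    # Illustrative selection of headers, not an exhaustive checklist.
    EXPECTED_HEADERS = [
        "Strict-Transport-Security",
        "Content-Security-Policy",
        "X-Content-Type-Options",
        "X-Frame-Options",
    ]

    def check_headers(url: str) -> None:
        """Report which common security headers a site does or does not send."""
        resp = requests.get(url, timeout=10)
        print(f"{url} -> HTTP {resp.status_code}")
        for header in EXPECTED_HEADERS:
            status = "present" if header in resp.headers else "MISSING"
            print(f"  {header}: {status}")

    check_headers("https://example.org")   # placeholder; use only authorized targets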

Human Brain Datacube

  • topic: One of the largest and most widely accessed connectomic datasets is the human cortex “h01” dataset, a 3D nanometer-resolution image of human brain tissue. The raw imaging data is 1.4 petabytes (roughly 500,000 * 350,000 * 5,000 pixels) and is further associated with additional content such as 3D segmentations and annotations that reside in the same coordinate system, based on a human brain atlas. The “Neuroglancer precomputed” representation of the data is more compact, with less volume. Google provides an optimized web-based interactive viewer, and the data can be manipulated programmatically via TensorStore.
    The task at hand is to reproduce the Google demo on rasdaman. This involves: fetching and understanding the brain dataset (see the access sketch after this list); establishing a datacube with rasdaman; creating an interactive 3D visualization demo.
  • team size: 1
  • prerequisites: programming skills (preferably Python), data wrangling
  • classification: database setup, querying, application programming, visualization
  • particularities: none
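
For orientation, fetching a small subvolume of the “Neuroglancer precomputed” data via TensorStore might look roughly like the Python sketch below. The Google Cloud Storage bucket and path are placeholder assumptions to be replaced with the actual h01 release location, and anonymous read access to the public bucket is assumed.

    import numpy as np
    import tensorstore as ts

    # Open the volume lazily; nothing is downloaded yet.
    dataset = ts.open({
        "driver": "neuroglancer_precomputed",
        "kvstore": {"driver": "gcs",
                    "bucket": "h01-release",          # assumed public bucket name
                    "path": "PATH-TO-RAW-VOLUME/"},   # placeholder, see the h01 release notes
    }).result()

    print(dataset.domain)   # x/y/z extent plus a channel dimension
    print(dataset.dtype)

    # Read a small x/y/z block; only the shards actually touched are fetched.
    block = dataset[15000:15256, 20000:20256, 3000:3001].read().result()
    print(np.asarray(block).shape)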

Fluid Earth Demo

  • topic: While visualization of 2D map data is commonplace today, this is not the case for 4D x/y/z/t data such as atmosphere and ocean data. However, 4D visualization capabilities are desirable in many services. Fluid Earth is a tool that accomplishes 4D visualization of atmospheric data.
    The goal is to combine Fluid Earth with rasdaman datacubes. Weather timeseries are available in rasdaman datacubes, for example in the Cube4EnvSec Weather service. Concretely, the task is to (i) establish a Fluid Earth service and (ii) extend open-source Fluid Earth so that it reads not from files, but from the rasdaman database, via standardized APIs which rasdaman already supports (see the data-access sketch after this list).
  • team size: 1
  • prerequisites: JavaScript
  • classification: demo service setup & coding
  • particularities: the interfacing between the components has to be worked out
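
The data-access side of step (ii) boils down to issuing standard WCS GetCoverage requests against rasdaman instead of opening local files. The Python sketch below shows such a request; the endpoint, the coverage name ERA5_wind_u, and the axis labels are placeholder assumptions, and the extended Fluid Earth loader would issue the equivalent request from JavaScript.

    import requests

    RASDAMAN_OWS = "https://example.org/rasdaman/ows"   # assumed petascope endpoint

    params = {
        "service": "WCS",
        "version": "2.0.1",
        "request": "GetCoverage",
        "coverageId": "ERA5_wind_u",                     # placeholder coverage name
        "subset": ['ansi("2024-01-01T00:00:00Z")',       # fix one time step ...
                   "Lat(30,60)", "Long(-10,40)"],        # ... and crop a lon/lat box
        "format": "application/netcdf",
    }
    resp = requests.get(RASDAMAN_OWS, params=params, timeout=120)
    resp.raise_for_status()
    with open("wind_u_slice.nc", "wb") as f:
        f.write(resp.content)   # one weather field, ready for the visualization layer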

Integrated Vector/Datacube Service

  • topic: Geographic analysis rarely applies to a rectangular area, but typically to some polygon-bounded area, such as "bush fire statistics over Greece" or "crop yield in Lower Saxony". This requires the algorithmic combination of vector and raster data, such as cadastral information and satellite imagery.
    The goal is to implement a demonstration service which allows interactive selection of some place by name from a vector database (to be established), retrieving a vector polygon which subsequently gets pasted into a predefined datacube analytics query (a few examples to be prepared) expressed in the WCPS geo datacube query language, sent to the datacube server (existing), with the result (typically an image) displayed on a simple map tool like Leaflet (integration code available).
    Technically, the work consists of: deploying a PostGIS database (on a cloud VM provided), feeding it with some sample data to be found (simple table schema consisting of 2 columns for place name and polygon), and writing the frontend code for interaction + PostGIS access + datacube service access + result display (a backend flow sketch follows this list).
  • team size: 1
  • prerequisites: Linux, JavaScript
  • classification: Geo Web service
  • particularities: -
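
A rough Python sketch of the backend flow, assuming a placeholder places table, a placeholder coverage S2_NDVI, and example connection/endpoint strings (the real frontend would do the same in JavaScript):

    import psycopg2
    import requests

    RASDAMAN_OWS = "https://example.org/rasdaman/ows"      # assumed endpoint
    PG_DSN = "dbname=geocode user=demo password=demo"      # assumed connection string

    def polygon_wkt_for(place: str) -> str:
        """Look up the boundary polygon of a place name as WKT."""
        with psycopg2.connect(PG_DSN) as conn, conn.cursor() as cur:
            cur.execute("SELECT ST_AsText(geom) FROM places WHERE name = %s", (place,))
            row = cur.fetchone()
        if row is None:
            raise KeyError(f"unknown place: {place}")
        return row[0]

    def ndvi_map(place: str) -> bytes:
        """Paste the polygon into a WCPS clip query and return the resulting PNG."""
        wcps = f"""
        for $c in (S2_NDVI)
        return encode( clip( $c[ansi("2024-06-01")], {polygon_wkt_for(place)} ), "image/png")
        """
        r = requests.post(RASDAMAN_OWS,
                          data={"service": "WCS", "version": "2.0.1",
                                "request": "ProcessCoverages", "query": wcps},
                          timeout=120)
        r.raise_for_status()
        return r.content    # ready to be overlaid on a Leaflet map

    with open("greece_ndvi.png", "wb") as f:
        f.write(ndvi_map("Greece"))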

Vector Files as Datacube Query Parameter

  • topic: The OGC Web Coverage Processing Service (WCPS) is a geo datacube query language with integrated spatio-temporal semantics, based on the notion of a multi-dimensional coverage which may represent a datacube. Queries can be parametrized, among others, with vector polygons allowing arbitrary regions to be "cut out". Currently, these vectors have to be provided in an ASCII representation called Well-Known Text (WKT). However, the most widely used format in the geo universe is not WKT, but the ESRI Shapefile, a binary format.
    The goal is to add support for the Shapefile format for vector upload in the petascope component of rasdaman, next to the existing WKT decoder. Open-source libraries for decoding exist, for example GeoTools and shapelib; one of those should be used. Appropriate tests should be established to demonstrate that the Shapefile decoder works properly (the conversion step is sketched after this list).
  • team size: 1
  • prerequisites: Java, Linux
  • classification: query language enhancement
  • particularities: -
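
The decoder itself is to be written in Java inside petascope, using GeoTools or shapelib. Purely to illustrate the conversion step the decoder has to perform (Shapefile geometry to WKT, which the existing WKT path already understands), here is a Python sketch using the pyshp and shapely libraries; the file name is a placeholder.

    import shapefile                      # pyshp
    from shapely.geometry import shape

    def shapefile_to_wkt(path: str) -> list[str]:
        """Return one WKT string per geometry in the Shapefile."""
        with shapefile.Reader(path) as reader:
            return [shape(geom.__geo_interface__).wkt for geom in reader.shapes()]

    for wkt in shapefile_to_wkt("regions.shp"):   # placeholder file name
        print(wkt[:80], "...")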

Location-Centric Spatial Analytics

  • topic: The Web Coverage Processing Service (WCPS) is an internationally standardized geo datacube query language. Built-in region extraction is confined to rectangular boxes given by their corner coordinates. However, in many cases a user would wish to confine selection and analysis to some irregular region, such as one delimited by a polygon ring. The rasdaman implementation of WCPS therefore adds a non-standard polygon clipping function which receives as input a vector polygon in a simple ASCII syntax. However, this is still not convenient, as users typically do not have these polygons available but would rather address regions by location names, such as "Spain" or "Bremen". This is currently unavailable.
    The goal is to develop a Web frontend which allows users to type in a place name (with autocompletion support), obtains the coordinates, and pastes them into a WCPS query which the user can then submit and see the (2D map) result on screen; a WCPS server with suitable datacubes is readily available. To this end, the place names should be stored in a simple PostGIS geocoding database (3 columns: id, name, polygon string) which is to be built from a sample like this. A few WCPS template queries should be prepared which do various practically interesting things, such as obtaining a vegetation map or performing some aggregation (template examples are sketched after this list).
  • team size: 1
  • prerequisites: databases, JavaScript
  • classification: geo Web application
  • particularities: -
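
The WCPS template idea can be as simple as a set of query strings with a polygon placeholder that the frontend fills from the geocoding database. In the Python sketch below the coverage names are assumptions, and clip() is rasdaman's non-standard clipping function mentioned above.

    # WCPS templates with a {polygon} placeholder; coverage names are placeholders.
    WCPS_TEMPLATES = {
        "vegetation_map": """
            for $c in (S2_NDVI)
            return encode( clip( $c[ansi("2024-06-01")], {polygon} ), "image/png")
        """,
        "mean_temperature": """
            for $c in (ERA5_t2m)
            return avg( clip( $c[ansi("2024-06-01")], {polygon} ) )
        """,
    }

    def build_query(name: str, polygon_wkt: str) -> str:
        """Fill a template with the WKT polygon retrieved from the geocoding database."""
        return WCPS_TEMPLATES[name].format(polygon=polygon_wkt)

    print(build_query("mean_temperature",
                      "POLYGON((8.4 53.0, 9.0 53.0, 9.0 53.3, 8.4 53.3, 8.4 53.0))"))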