In Spring 2025 I will be on sabbatical and hence will not supervise theses - sorry!
This page lists my topics offered for CS Bachelor and DE Master Theses.
Below is an overview list; strikethrough means the topic is taken.
All work follows the procedures, so you may want to study these first.
Programming prerequisites should be taken seriously - in all cases a non-trivial implementation in one of several languages is involved.
Code will regularly add functionality to our rasdaman system and, as such, be used by our project partners and the general scientific and technical community;
hence code quality (including, e.g., concise tests and documentation) is an integral evaluation criterion.
Generally, I appreciate not only the result, but also the way towards it - therefore, showing continuous progress, initiative, and well-planned work is definitely an asset.
Knowledge characterized as "advantageous" means that it is not mandatory, but not bringing it along will increase the workload significantly and make deadlines tight.
We reserve the right not to give a topic to a student if, for the student's sake, there is too much risk that a good result will not be achieved.
Make use of the official report template.
If your report is of sufficient quality to be successfully submitted to a conference or journal for publication, this will be considered a strong plus.
Note that:
only the topics below will be accepted for supervision, due to resource constraints.
thesis topics must be agreed before end of the drop/add period.
A Study of ML Model APIs in the Earth Sciences
topic:
Machine Learning (ML) is of ever-growing importance in industry and research.
This is no different in the Earth Sciences where ML is applied to massive spatio-temporal data, such as satellite image timeseries and weather data.
In the AI-Cube and FAIRiCUBE projects, ML models have experimentally been coupled with spatio-temporal datacubes with the goal of establishing smooth, automated ML model integration into a datacube query language.
Datacubes are a human-centric and more efficient way of structuring and presenting multi-dimensional data and are thus particularly suitable for space/time Earth data.
With the system used, rasdaman, the integration was possible via User-Defined Functions (UDFs) and an adapter allowing the datacube server to dynamically feed pytorch with the model and the satellite images selected.
However, several problems were encountered which are now being investigated. It already starts with the observation that, even within a single tool such as pytorch, models expect quite individual data preprocessing, formatting, etc.; this is an obstacle to a smooth, uncomplicated use of ML.
Task at hand is:
select, in agreement with the supervisor, 5 ML models using satellite and weather data from the Huggingface AI repository;
build a prototype UDF for each model (based on the existing ML UDF code) and demonstrate invocation via WCPS on Earth datacubes existing in rasdaman, disregarding model output accuracy (a minimal UDF sketch is given after this list);
summarize commonalities and differences from the API perspective.
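To make the second step concrete, here is a minimal sketch of what a per-model UDF body could look like, assuming rasdaman hands the selected datacube region to Python as a NumPy array; the model file, array layout, and normalization are placeholder assumptions, not code from the AI-Cube/FAIRiCUBE projects.

```python
# Minimal sketch of a per-model UDF body. Assumption: rasdaman passes the
# selected datacube region as a NumPy array of shape (bands, height, width).
# Model file name and normalization are placeholders.
import numpy as np
import torch

def udf_predict(pixels: np.ndarray) -> np.ndarray:
    x = torch.from_numpy(pixels.astype(np.float32)).unsqueeze(0)  # -> (1, C, H, W)
    x = (x - x.mean()) / (x.std() + 1e-6)    # model-specific preprocessing varies per model

    model = torch.jit.load("model.pt")       # hypothetical TorchScript export of the HF model
    model.eval()
    with torch.no_grad():
        scores = model(x)                    # e.g. per-pixel class scores, (1, classes, H, W)

    return scores.squeeze(0).argmax(dim=0).numpy()  # per-pixel class map, shape (H, W)
```

The preprocessing lines are exactly where the per-model differences appear that the final task item asks to catalogue.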
team size: 1
prerequisites: python
classification: ML
particularities: none
Penetration Testing
topic:
A wide field of website attack methods is known today, and effectively we live in a continuous global cyber war.
As our research group operates several servers, there is a danger of attack here, too.
Task at hand is to perform systematic research on attack vectors, run penetration tests on a small set of websites given by the supervisor, evaluate the outcome, and suggest improvements where insufficiently secured sites have been found.
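As one illustration of the kind of non-destructive check such a test might start with, the sketch below reports which common HTTP security headers a site sends; the target URL is a placeholder, and real tests must only be run against the sites agreed with the supervisor.

```python
# Non-destructive starting point: report which common HTTP security headers a
# site sends. The target URL is a placeholder; only test sites agreed with the supervisor.
import requests

EXPECTED = [
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Content-Type-Options",
    "X-Frame-Options",
]

def check_security_headers(url: str) -> None:
    response = requests.get(url, timeout=10)
    for header in EXPECTED:
        status = "present" if header in response.headers else "MISSING"
        print(f"{header}: {status}")

check_security_headers("https://example.org")   # placeholder target
```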
team size: 1
prerequisites: software engineering
classification: little implementation, more assessment
particularities: none
Human Brain Datacube
topic:
One of the largest and most widely accessed connectomic datasets is the
human cortex “h01” dataset,
a 3D nanometer-resolution image of human brain tissue.
The raw imaging data is 1.4 petabytes (roughly 500,000 * 350,000 * 5,000 pixels)
and is further associated with additional content, such as 3D segmentations and annotations, that resides in the same coordinate system, based on a human brain atlas.
The “Neuroglancer precomputed” representation of the dataset is more compact and hence considerably smaller in volume.
Google has built an optimized web-based interactive viewer for it,
and the data can also be read and manipulated via TensorStore.
Task at hand is to reproduce the Google demo on rasdaman. This involves:
fetch and understand the brain dataset (a TensorStore data-access sketch follows after this list);
establish a datacube with rasdaman;
create an interactive 3D visualization demo.
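As a possible starting point for the first step, the sketch below reads a small block of a “Neuroglancer precomputed” volume via TensorStore; the bucket and path are placeholders and have to be replaced by the actual h01 location given in the dataset documentation.

```python
# Sketch: read a small block of a "Neuroglancer precomputed" volume with
# TensorStore before ingesting it into rasdaman. Bucket and path are placeholders.
import tensorstore as ts

spec = {
    "driver": "neuroglancer_precomputed",
    "kvstore": {"driver": "gcs", "bucket": "example-bucket", "path": "h01/volume"},
}
volume = ts.open(spec, read=True).result()

# h01 is far too large to read at once -- fetch only a tiny x/y/z block.
block = volume[0:512, 0:512, 0:16].read().result()
print(block.shape, block.dtype)
```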
team size: 1
prerequisites: programming skills (preferably Python), data wrangling
4D Visualization with Fluid Earth
topic:
While visualization of 2D map data is commonplace today, this is not the case for 4D x/y/z/t data such as atmosphere and ocean data.
However, it is desirable to have 4D visualization capabilities in many services.
Fluid Earth is a tool that accomplishes 4D visualization of atmospheric data.
Goal is to combine Fluid Earth with rasdaman datacubes.
Weather timeseries are available in rasdaman datacubes, such as in the Cube4EnvSec Weather service.
Concretely, the task is to (i) establish a Fluid Earth service and (ii) extend the open-source Fluid Earth code so that it reads not from files, but from the rasdaman database (via standardized APIs which rasdaman already supports).
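To sketch the kind of standardized API call the extension would have to issue (shown in Python only to illustrate the request, the actual extension being JavaScript), a WCS GetCoverage request slicing a weather datacube could look as follows; endpoint, coverage name, axis names, and subsets are placeholders, not the actual Cube4EnvSec configuration.

```python
# Sketch of the standardized datacube access needed: a WCS 2.0 GetCoverage
# request slicing a weather datacube in time and space. Endpoint, coverage id,
# axis names, and subsets are placeholders.
import requests

ENDPOINT = "https://example.org/rasdaman/ows"   # placeholder petascope endpoint

params = {
    "service": "WCS",
    "version": "2.0.1",
    "request": "GetCoverage",
    "coverageId": "WeatherCube",                # placeholder coverage name
    "subset": [
        'ansi("2024-01-01T00:00:00Z")',         # time slice (placeholder axis name)
        "Lat(40,60)",                           # spatial subset
        "Long(0,20)",
    ],
    "format": "application/netcdf",
}

response = requests.get(ENDPOINT, params=params, timeout=60)
response.raise_for_status()
open("slice.nc", "wb").write(response.content)  # hand this slice to the visualization layer
```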
team size: 1
prerequisites: JavaScript
classification: demo service setup & coding
particularities: the interfacing between the components has to be worked out
Integrated Vector/Datacube Service
topic:
Geographic analysis rarely applies to a rectangular area, but typically to some polygon-bounded area, such as "bush fire statistics over Greece", "crop yield in Lower Saxony", etc.
This requires algorithmic combination of vector and raster data, such as cadastral information and satellite imagery.
Goal is to implement a demonstration service which allows interactive selection of some place by name:
a vector polygon is retrieved from a vector database (to be established) and subsequently pasted
into a predefined datacube analytics query (a few examples to be prepared) expressed in the WCPS geo datacube query language;
the query is sent to the datacube server (existing), and the result (typically an image) is displayed on a simple map tool like Leaflet (integration code available).
Technically, the work consists of: deploying a PostGIS database (on a cloud VM provided),
feeding it with some sample data to be found (simple table schema consisting of 2 columns for place name and polygon),
and writing the frontend code for interaction + PostGIS access + datacube service access + result display; a sketch of this data flow follows below.
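The following sketch illustrates the flow behind the frontend, written in Python for brevity even though the frontend itself would be JavaScript; table and column names, credentials, the coverage name, and the endpoint are placeholder assumptions.

```python
# Sketch of the data flow: look up a place polygon in PostGIS, paste it into a
# WCPS clip() query, send the query to the datacube server, store the result.
# Table/column names, credentials, coverage name, and endpoint are placeholders.
import psycopg2
import requests

conn = psycopg2.connect(dbname="places", user="demo", password="demo", host="localhost")
cur = conn.cursor()
cur.execute("SELECT ST_AsText(polygon) FROM places WHERE name = %s", ("Greece",))
wkt_polygon = cur.fetchone()[0]                      # e.g. 'POLYGON((...))'

wcps_query = (
    "for $c in (Sentinel2_NDVI) "                    # placeholder coverage name
    f'return encode( clip( $c, {wkt_polygon} ), "image/png" )'
)

response = requests.post(
    "https://example.org/rasdaman/ows",              # placeholder WCPS endpoint
    data={"service": "WCS", "version": "2.0.1",
          "request": "ProcessCoverages", "query": wcps_query},
    timeout=60,
)
open("result.png", "wb").write(response.content)     # image to be shown on the Leaflet map
```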
team size: 1
prerequisites: Linux, JavaScript
classification: Geo Web service
particularities: -
Vector Files as Datacube Query Parameter
topic: OGC Web Coverage Processing Service (WCPS) is a geo datacube query language
with integrated spatio-temporal semantics based on the notion of a multi-dimensional coverage which may represent a datacube.
Queries can be parametrized, among others, with vector polygons allowing arbitrary regions to be "cut out".
Currently, these vectors have to be provided in an ASCII representation called Well-Known Text (WKT).
However, the most widely used format in the geo universe is not WKT, but ESRI Shapefiles, a binary format.
Goal is to add support for the Shapefile format for vector upload in the petascope component of rasdaman, next to the existing WKT decoder.
Open-source libraries for decoding exist, for example GeoTools and shapelib; one of those should be used.
Appropriate tests should be established to demonstrate that the Shapefile decoder works properly.
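The conversion the new decoder has to perform is conceptually simple: read Shapefile geometries and hand them on in the WKT form petascope already understands. The sketch below illustrates this in Python with fiona/shapely for brevity; the actual implementation is to be done in Java with one of the libraries named above, and the file name is a placeholder.

```python
# Conceptual sketch of what the new decoder must do: turn Shapefile geometries
# into WKT. Illustrated with fiona/shapely; the thesis implementation is Java
# (e.g. GeoTools). The file name is a placeholder.
import fiona
from shapely.geometry import shape

with fiona.open("regions.shp") as source:        # placeholder Shapefile
    for feature in source:
        geometry = shape(feature["geometry"])    # build a shapely geometry from the record
        print(geometry.wkt)                      # the WKT form petascope consumes today
```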
team size: 1
prerequisites: Java, Linux
classification: query language enhancement
particularities: -
Location-Centric Spatial Analytics
topic: Web Coverage Processing Service (WCPS) is an internationally standardized geo datacube query language.
Built-in region extraction is confined to rectangular boxes given by the corner coordinates.
However, in many cases a user would wish to confine selection and analysis to some irregular region, such as one delimited by a polygon ring.
In the rasdaman implementation of WCPS, therefore,
a non-standard polygon clipping function has been added
which receives as input a vector polygon in some simple ASCII syntax.
However, this is still not convenient as users typically do not have these polygons available, but would rather like to address regions by location name, such as "Spain" or "Bremen".
This is currently unavailable.
Goal is to develop a Web frontend which allows users to type in a place name (with autocompletion support), obtains the corresponding coordinates, and pastes them into a WCPS query
which the user can then submit to see the (2D map) result on screen; a WCPS server with suitable datacubes is readily available.
To this end, the place names should be stored in a simple PostGIS geocoding database (3 columns: id, name, polygon string)
which is to be built from a sample like this.
A few WCPS template queries should be prepared which do various practically interesting things, such as obtaining a vegetation map or performing some aggregation.
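As an illustration of the template idea, the sketch below keeps a couple of predefined WCPS queries with a $polygon slot that gets filled with the WKT string fetched from the geocoding database; the coverage names, band names, and the queries themselves are placeholder assumptions, not the actual datacubes on the server.

```python
# Sketch of the template idea: predefined WCPS queries with a $polygon slot that
# the frontend fills with the WKT string retrieved for the typed-in place name.
# Coverage and band names are placeholders; real queries will likely also need
# type casts, null handling, etc.
TEMPLATES = {
    "vegetation_map": (
        'for $c in (Sentinel2_L2A) return encode( '
        'clip( ($c.nir - $c.red) / ($c.nir + $c.red), $polygon ), "image/png" )'
    ),
    "mean_value": (
        "for $c in (Sentinel2_L2A) return avg( clip( $c.red, $polygon ) )"
    ),
}

def build_query(template_name: str, wkt_polygon: str) -> str:
    """Paste the polygon retrieved for a place name into the chosen template."""
    return TEMPLATES[template_name].replace("$polygon", wkt_polygon)

print(build_query("vegetation_map", "POLYGON((10 50, 12 50, 12 52, 10 52, 10 50))"))
```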