Research
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' (I've found it!), but 'That's funny'."
-- Isaac Asimov
My research interests focus on scientific data management, in particular on large multi-dimensional array databases.
In the Large-Scale Information Services group we have two main outcomes so far:
- The rasdaman array DBMS, which allows querying multidimensional arrays of unlimited size stored in relational databases and currently defines the global state of the art in scalable raster databases.
- The Open Geospatial Consortium (OGC) Web Coverage Processing Service (WCPS) standard, which defines a request language for navigation, extraction, aggregation, and analysis of large, multi-dimensional sensor, image, and statistics data (and further standards, see Standards).
Rasdaman is the open-source reference implementation of WCPS and, with work in progress, of WCS 2.0.
The idea of array (or "raster") databases is to provide database support for objects consisting of a set of homogeneous items ("pixels", "voxels"), each of which is associated with a coordinate in n-D Euclidean space. Essentially, this category comprises sensor, image, and statistics data; among the prime application areas are online satellite image services, Grid data services, and life science image mining. The core characteristics of array objects are:
- multi-dimensional data with a clearly defined spatial neighbourhood,
- discretised (which imposes algorithmic problems, e.g., for scaling), and
- high-volume objects, frequently in large numbers (one seamless map object is usually multi-terabyte, and NASA satellite archives occupy several petabytes of imagery).
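The object model described above can be sketched in a few lines of Python. This is only an illustration of the concept, not how rasdaman stores or addresses arrays; the class `ArrayObject` and its methods `trim` and `neighbours` are hypothetical names introduced here.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Tuple

Coord = Tuple[int, ...]


class ArrayObject:
    """Toy n-D array: homogeneous cells addressed by integer coordinates."""

    def __init__(self, lower: Coord, upper: Coord, fill: Callable[[Coord], float]):
        if len(lower) != len(upper):
            raise ValueError("lower and upper bounds must have the same dimension")
        self.lower, self.upper = lower, upper  # inclusive bounds per axis
        axes = (range(lo, hi + 1) for lo, hi in zip(lower, upper))
        self.cells: Dict[Coord, float] = {c: fill(c) for c in product(*axes)}

    def trim(self, lower: Coord, upper: Coord) -> "ArrayObject":
        """Return the sub-array inside the given inclusive bounding box."""
        if len(lower) != len(self.lower) or len(upper) != len(self.upper):
            raise ValueError("trim box must have the array's dimension")
        for lo, hi, a, b in zip(lower, upper, self.lower, self.upper):
            if lo < a or hi > b or lo > hi:
                raise ValueError("trim box must lie inside the array domain")
        return ArrayObject(lower, upper, lambda c: self.cells[c])

    def neighbours(self, coord: Coord) -> Iterator[Coord]:
        """Yield the axis-aligned neighbours of coord that exist in the domain."""
        for axis in range(len(coord)):
            for step in (-1, 1):
                n = coord[:axis] + (coord[axis] + step,) + coord[axis + 1:]
                if n in self.cells:
                    yield n


# A 4x4 "image" whose cell value encodes its own coordinate.
image = ArrayObject((0, 0), (3, 3), lambda c: float(c[0] * 10 + c[1]))
patch = image.trim((1, 1), (2, 2))
```

A dictionary keyed by coordinates keeps the sketch dimension-independent; a real system would use a dense, tiled layout instead.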
Database support for such multidimensional discrete data (MDD) differs from multimedia databases in that MDD operations do not perform image understanding with subsequent queries on the derived image contents, but operate on the conceptual level of "pixel" arrays on the original data themselves. MDD techniques differ from image processing in that operations are not constrained by main memory size; rather, queries usually operate on multi-terabyte and, in future, multi-petabyte objects.
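To illustrate what "not constrained by main memory" can mean in practice, the following sketch computes an aggregate tile by tile, so that only one tile is held in memory at a time. It is a simplified illustration under stated assumptions, not rasdaman's evaluation strategy: the functions `tiles` and `tiled_mean` are hypothetical, and the tile data are generated on the fly as a stand-in for reads from disk or a database.

```python
from typing import Iterator, List, Tuple

Tile = List[List[float]]


def tiles(height: int, width: int, tile: int) -> Iterator[Tuple[int, int, Tile]]:
    """Yield (row_offset, col_offset, data) for each tile of a 2-D array.

    The cell value r + c is synthetic; a real system would read the
    tile from storage here.
    """
    for r0 in range(0, height, tile):
        for c0 in range(0, width, tile):
            rows = range(r0, min(r0 + tile, height))
            cols = range(c0, min(c0 + tile, width))
            yield r0, c0, [[float(r + c) for c in cols] for r in rows]


def tiled_mean(height: int, width: int, tile: int) -> float:
    """Average over all cells, touching one tile at a time."""
    total, count = 0.0, 0
    for _, _, data in tiles(height, width, tile):
        for row in data:
            total += sum(row)
            count += len(row)
    return total / count


mean_value = tiled_mean(4, 6, 4)
```

The result does not depend on the tile size, only peak memory use does; this is why an aggregate over an array far larger than main memory remains feasible.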
MDD support has been largely neglected by database research, although it has gained significant practical impact recently. Meanwhile we have understood core operations to a degree that, e.g., queries in our rasdaman system are usually no longer I/O-bound. Technological advance, however, has given rise to manifold new challenges on the conceptual, architectural, and application levels. The following list indicates my primary research interests, but is certainly not exhaustive:
- Conceptual work: formal expressiveness of array algebras; extending optimization from currently heuristic to cost-based models; unified modeling of image/signal processing and statistics operations to derive generic operation sets and database optimization techniques for them; generalized dimension hierarchies with preaggregation with the goal of achieving a unifying paradigm for supporting fast statistical queries on spatio-temporal (e.g., geo) and non-spatial (e.g., OLAP) data.
- Database architecture: design and evaluation of further algebraic optimization operators for complex, practically relevant classes of array functions (e.g., filter kernels), taking into account relevant work in supercomputing (such as halo computing); parallel query evaluation; load balancing in multi-CPU / cluster environments (inter-query parallelism); storage hierarchies; implementation of array operators as object-relational extensions; thorough analysis of the influence of RDBMS clustering (and other physical optimization) mechanisms on array query performance.
- Application studies: distance learning / eLearning; GIS / remote sensing; geophysics / exploration / earth system research (i.e., atmosphere, hydrosphere, lithosphere); computational fluid dynamics (industrial, climate modeling, LES, ...); (human) brain imaging; genetics.
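As a small illustration of the preaggregation idea mentioned under conceptual work, the sketch below precomputes a summed-area table so that any rectangular range sum over a 2-D array needs only four lookups. This is one textbook preaggregation technique chosen for illustration, not the method used in rasdaman or in the theses listed on this page; the function names `preaggregate` and `range_sum` are hypothetical.

```python
from typing import List

Grid = List[List[int]]


def preaggregate(grid: Grid) -> Grid:
    """Build a summed-area table with one extra zero row and column.

    sat[r][c] holds the sum of all cells grid[i][j] with i < r and j < c.
    """
    rows, cols = len(grid), len(grid[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            sat[r + 1][c + 1] = (
                grid[r][c] + sat[r][c + 1] + sat[r + 1][c] - sat[r][c]
            )
    return sat


def range_sum(sat: Grid, r0: int, c0: int, r1: int, c1: int) -> int:
    """Sum of the inclusive box [r0..r1] x [c0..c1] using four lookups."""
    return sat[r1 + 1][c1 + 1] - sat[r0][c1 + 1] - sat[r1 + 1][c0] + sat[r0][c0]


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
sat = preaggregate(grid)
lower_right = range_sum(sat, 1, 1, 2, 2)  # cells 5, 6, 8, 9
```

The table costs one pass over the data to build and the same storage as the original array, in exchange for constant-time range sums afterwards.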
All practical work is based on rasdaman serving as the research vehicle; new findings are incorporated into the system, yielding a continuously growing research platform. Additionally, if contributed code is of sufficient quality, it is incorporated into the open-source code base to make it available to the worldwide community.
PhD Theses Supervised
- Jacobs University Bremen
- Dimitar Misev: On the Integration of Array and Relational Models in Databases
- Alireza Rezaei Mahdiraji: A Query Language for Scientific Meshes
- Michael Owonibi: Dynamic Resource-Aware Decomposition of Geoprocessing Services Based on Declarative Request Languages (Helmholtz ESSReS Research School scholarship holder)
- Jinsongdi Yu: Towards a Specification-based Quality Guarantee for Geo Raster Web Services (Chinese scholarship holder)
- Angelica Garcia Gutierrez: Using OLAP Preaggregation Methods to Speed Up Raster Queries (DAAD scholarship holder)
- Technische Universität München
- Karl Hahn: Parallele Anfrageverarbeitung in multidimensionalen Array-Datenbanksystemen
- Bernd Reiner: HEAVEN -- Eine hierarchische Speicher- und Archivierungsumgebung für multidimensionale Array Datenbankmanagement Systeme
- Andreas Dehmel: A Compression Engine for Multidimensional Array Database Systems
- Norbert Widmann: Efficient Operation Execution on Multidimensional Array Data
- Paula Furtado: Storage Management of Multidimensional Arrays in Database Management Systems
- Roland Ritsch: Optimization and Evaluation of Array Queries in Database Management Systems
Standards
Since around 2000, I have been engaged in various standardization bodies as well as further international bodies.
I am chair/co-chair of the Open Geospatial Consortium Web Coverage Service Standards Working Group, the Web Coverage Processing Service group, and the Coverages Domain Working Group.
I am also editor of the following international standards; OGC specifications are publicly available once adopted:
- ISO 9075 SQL Part 15: MDA (Multi-Dimensional Arrays) (adopted standard)
- ISO 19123-1 Schema for Coverage Geometry and Functions - Fundamentals (adopted standard)
- ISO 19123-2 Coverage Implementation Schema (adopted standard)
- ISO 19123-3 Schema for Coverage Geometry and Functions - Processing Fundamentals (adopted standard)
- OGC 08-068: Web Coverage Processing Service (WCPS) Language (adopted standard)
- OGC 09-146: GML 3.2.1 Application Schema - Coverages (adopted standard)
- OGC 09-146r6: Coverage Implementation Schema (adopted standard)
- OGC 09-110: WCS Interface Standard - Core (adopted standard)
- OGC 13-057: WCS Interface Standard - Transaction Extension (adopted standard)
- OGC 12-040: WCS Interface Standard - Range Subsetting Extension (adopted standard)
- OGC 08-059: WCS Interface Standard - Processing Extension (adopted standard)
- OGC 12-039: WCS Interface Standard - Scaling Extension (adopted standard)
- OGC 11-053: WCS Interface Standard - CRS Extension (adopted standard)
- OGC 12-049: WCS Interface Standard - Interpolation Extension (adopted standard)
- OGC 09-149: WCS Interface Standard - XML/SOAP protocol extension (adopted standard)
- OGC 09-148: WCS Interface Standard - XML/POST protocol extension (adopted standard)
- OGC 09-147: WCS Interface Standard - KVP protocol extension (adopted standard)
- OGC 10-140: WCS Application Profile - Earth Observation (adopted standard)
- OGC 14-052: WCS Application Profile - MetOcean (candidate standard)
- OGC 12-174: WCS Interface Standard - REST protocol extension (candidate standard)
Earlier work on the ISO Computer Graphics Reference Model:
- Peter Baumann: A Description Technique for the Computer Graphics Reference Model. ISO/IEC SC24/WG1 RM/4, July 1990