Over my time as a PhD student in TĂĽbingen I've been steadily working on compilation methods that target database
systems as runtimes. I may not get around to updating this list as soon as new work is published, so if you are
interested you can look at our chairs webpage (db@tue
) or my orcid (0000-0002-6625-9627
).
To Iterate Is Human, to Recurse Is Divine — Mapping Iterative Python
to Recursive SQL
BTW2023 -
Datenbanksysteme fĂĽr Business, Technologie und Web
Student Paper (Best Student Contribution Award) •
doi:10.18420/BTW2023-73
•
db@tue
Abstract: Writing complex algorithms and iterative computations in SQL is difficult at best, commonly
leading to code that intermingles looping control flow with database access. This yields programs with control
flow that rapidly hops in and out of the database, with each roundtrip incurring significant overhead. We
present the ByePy compiler, which can compile entire Python functions directly to plain recursive SQL:1999
queries. By doing so, the compilation eliminates all but a single roundtrip, leading to runtime speedups of up
to an order of magnitude.
Snakes on a Plan: Compiling Python Functions into Plain SQL Queries
SIGMOD/PODS '22:
International Conference on Mangement of Data
Demo Paper •
doi:10.1145/3514221.3520175
•
db@tue
Abstract: "Move your computation close to the data" is decades-old advice that is hard to follow if
your code exhibits complex control flow. The runtime of such applications suffers from a continual back and
forth between database-external code execution and plan-based SQL evaluation. We demonstrate the ByePy
compiler which translates entire Python functions with arbitrary control flow-including deeply nested
iteration-into plain recursive SQL:1999 queries. The invocation of a ByePy-compiled function enters the
database engine once to execute the plan of a single query. Computation does not get much closer to the data
than this. The system rewards this translation effort from Python to SQL with runtime improvements of up to an
order of magnitude.
ByePy: Compilation of Python to SQL
M.Sc. Thesis •
db@tue
Abstract: Databases have become ubiquitous in the world of application development and
have found their way into the toolbelt of many python developers. Sadly, many
of these developers still struggle with the database world’s de facto lingua franca
SQL. This is no fault of their own; SQL differs drastically from imperative Python
programming in its declarative nature. The struggle with SQL has led to many writing
code interweaving complex control flow and database access. Such code exhibits
poor performance compared to even the most complicated queries. The constant
context switches between the Python interpreter and the underlying database add
up quickly, resulting in a significant amount of time being wasted on overhead that
does not further the computation itself. To address this issue, we present ByePy —
a Python to SQL compiler, which can compile entire Python functions with arbitrary
control flow into plain recursive SQL:1999 queries. Compilation with ByePy is rewarded
by runtime speedups of up to an order of magnitude.
Utilizing parallelism in the two-phase approach of translated fsUDFs
B.Sc. Thesis •
db@tue
Abstract: When analyzing recursive functions it is common to look at the function calls by building a
call graph for an example input. Doing so a lot of the time we face the fact that we should be able to
parallelize the parts of said call graph, that are mostly independent of each other. But some languages and
systems make this process harder for us than it should be. One such system is the DBMS PostgreSQL, even though
it allows for recursive UDFs they are poorly optimized and only viable for inputs yielding minimal amounts of
recursive steps. To parallelize this directly is a chore in and of itself, but we can make use of a method
introduced by Christian Duta and Torsten Grust in their 2020 paper “Functional-Style SQL UDFs With a Capital
'F'". They propose a method to translate a given 'functional-style' UDF into a query that frst builds the call
graph and then walks said call graph to calculate the result. Injecting parallelization into this two phase
approach is what this work focuses on. We propose a simple method of extending the two phases such that the
building of the call graph also partitions said call graph into disjointed, or at least mostly disjointed,
sub-problems and that the evaluation uses said information to actually run these sub-problems in parallel. We
found during our testing, that there can be major benefts for certain problems, but also that not all
recursive problems lend themselves to being parallelized. We found that in general the more tree-like the call
graph is the more benefts can be gained from parallelization.
Most of my projects can be found either on github or on codeberg . The following are some of the ones I see
as
highlights:
befunge-sql
A graded project from university, the gist of it was to flex SQL's muscles a bit and implement some complex
computational problem entirely inside of a single query. To that end, I opted to implement an interpreter for
the esoteric programm language Befunge.
dfb_predict
A graded group project from university, the gist of it was that we needed to develop an application to predict
game in the DFB 1. Liga, the German football league. It encompassed everything from data collection and
processing, match prediction using stochastic methods, and UI development for an easy to use interface.
pycells
An implementation of simple cellular automata in python, but with a few twists. Said twists pertain to the
some
implementation details as—to make it more challenging—I decided to setup some limitations before actually
starting out. The four core limitations I set myself were the following:
- The simulation functions should be able to handle n-dimensional rules and states.
- The rules should be given as integers. (Though this limits the rules I need to be able to simulate.)
- The state should be represented as an integer, making use of the fact that the whole state consists of a
n-dimensional array of booleans that can be flattened.
- The simulations functions should make use of the integer-nature of the state representation and should
minimize the use of iteration.
bf.py
A fully featured interpreter for Brainfuck in a
single Python expression. I decided that just golfing it wasn't enough and as an extra hurdle I should
only use a single expression, i.e. a "true oneliner".