Compliant Geo-distributed Query Processing

SIGMOD 2021
Publication Date: 1.12.2020

Abstract

In this paper, we address the problem of compliant geo-distributed query processing. In particular, we focus on dataflow policies that impose restrictions on the movement of data across geographical or institutional borders. Traditional approaches to distributed query processing do not consider such restrictions and may therefore produce non-compliant query execution plans in geo-distributed environments. For example, an execution plan for a query over data sources in Europe, North America, and Asia may otherwise be optimal yet fail to comply with dataflow policies because it ships some restricted (intermediate) data. We pose this problem of compliance in the setting of geo-distributed query processing. We propose a compliance-based query optimizer that takes dataflow policies into account, declaratively specified using our policy expressions, to generate compliant geo-distributed execution plans. Our experimental study, using a geo-distributed adaptation of the TPC-H benchmark data, indicates that our optimization techniques are effective in generating efficient compliant plans and incur low overhead on top of traditional query optimizers.
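
To make the notion of a compliant plan concrete, the following is a minimal illustrative sketch, not the optimizer proposed in the paper. It assumes a deliberately simplified policy model in which each table has a set of regions its data may be shipped to, and it filters already-enumerated candidate plans instead of integrating compliance into plan generation. All names (`Shipment`, `is_compliant`, `cheapest_compliant`, the table and region names, and the costs) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shipment:
    """One data transfer in a plan (hypothetical model)."""
    table: str  # table the shipped (intermediate) data derives from
    src: str    # region the data is shipped from
    dst: str    # region the data is shipped to


def is_compliant(shipments, allowed):
    """Return True if every cross-region shipment is permitted.

    `allowed` maps a table name to the set of regions its data may be
    shipped to. Tables missing from `allowed` may not leave their region.
    """
    for s in shipments:
        if s.src != s.dst and s.dst not in allowed.get(s.table, set()):
            return False
    return True


def cheapest_compliant(plans, allowed):
    """Pick the lowest-cost compliant plan from (cost, shipments) pairs.

    Returns None when no candidate plan is compliant.
    """
    compliant = [p for p in plans if is_compliant(p[1], allowed)]
    return min(compliant, key=lambda p: p[0]) if compliant else None


# Assumed policy: customer data must stay in Europe; orders data may
# additionally be shipped to North America.
allowed = {
    "customer": {"Europe"},
    "orders": {"Europe", "NorthAmerica"},
}

# Two candidate plans with made-up costs.
plan_a = (10, [Shipment("customer", "Europe", "Asia")])         # cheaper, ships restricted data
plan_b = (15, [Shipment("orders", "NorthAmerica", "Europe")])   # costlier, permitted

best = cheapest_compliant([plan_a, plan_b], allowed)
```

In this sketch the cheaper plan is rejected because it ships customer data out of Europe, so the costlier plan is selected. This mirrors the abstract's point that a plan which is otherwise optimal may not be compliant.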