Submodularity of Distributed Join Computation.

Authors:
Rundong Li
Rundong Li
Shenyang Aerospace University
China
Xinyan Deng
Xinyan Deng
School of Mechanical Engineering
United States

Proc ACM SIGMOD Int Conf Manag Data 2018 Jun;2018:1237-1252

Northeastern University, Boston, USA.

We study distributed equi-join computation in the presence of join-attribute skew, which causes load imbalance. Skew can be addressed by more fine-grained partitioning, at the cost of input duplication. For random load assignment, e.g., using a hash function, fine-grained partitioning creates a tradeoff between load expectation and variance. We show that minimizing load variance subject to a constraint on expectation is a monotone submodular maximization problem with Knapsack constraints, hence admitting provably near-optimal greedy solutions. In contrast to previous work on formal optimality guarantees, we can prove this result also for self-joins and more general load functions defined as weighted sum of input and output. We further demonstrate through experiments that this theoretical result leads to an effective algorithm for the problem of minimizing running time, even when load is assigned deterministically.

Download full-text PDF

Source
http://dx.doi.org/10.1145/3183713.3183728DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6404978PMC
June 2018

Publication Analysis

Top Keywords

fine-grained partitioning
8
load
6
result self-joins
4
load expectation
4
admitting provably
4
expectation variance
4
prove result
4
load variance
4
minimizing load
4
variance minimizing
4
self-joins general
4
tradeoff load
4
hash function
4
functions defined
4
assignment hash
4
function fine-grained
4
load functions
4
creates tradeoff
4
general load
4
partitioning creates
4

References

(Supplied by CrossRef)

Similar Publications