Xinyan Deng School of Mechanical Engineering United States
Proc ACM SIGMOD Int Conf Manag Data 2018 Jun;2018:1237-1252
Northeastern University, Boston, USA.
We study distributed equi-join computation in the presence of join-attribute skew, which causes load imbalance. Skew can be addressed by more fine-grained partitioning, at the cost of input duplication. For random load assignment, e.g., using a hash function, fine-grained partitioning creates a tradeoff between load expectation and variance. We show that minimizing load variance subject to a constraint on expectation is a monotone submodular maximization problem with Knapsack constraints, hence admitting provably near-optimal greedy solutions. In contrast to previous work on formal optimality guarantees, we can prove this result also for self-joins and more general load functions defined as weighted sum of input and output. We further demonstrate through experiments that this theoretical result leads to an effective algorithm for the problem of minimizing running time, even when load is assigned deterministically.
We have submitted your request - we will update you on status within the next 24 hours.
Sign up for further access to Scientific Publications and Authors!
What are PubFacts Points?
PubFacts points are rewards to PubFacts members, which allow you to better promote your profile and articles throughout PubFacts.com
How do I earn PubFacts Points?
Each member is given 50 PubFacts points upon signing up. You can earn additional points by completing 100% of your profile, creating and participating in discussions, and sharing other members research.
What can I do with PubFacts Points?
Currently, you can use PubFacts Points to promote and increase readership of your articles.