We consider the problem of 2-way interval join, where we want to find all pairs of overlapping intervals, i.e., intervals that share at least one point in common. We present lower and upper bounds on the replication rate for this problem when it is implemented in MapReduce. We study three cases, where intervals in the input are: (i) unit-length and equally-spaced, (ii) variable-length and equally-spaced, and (iii) equally-spaced with specific distribution of the various lengths. Our algorithms offer intuition as how to build algorithms for other cases, especially when we have some statistical knowledge about the distribution of the lengths of the intervals. E.g., if mostly large intervals interact with small intervals and not within themselves, then we believe our techniques can be extended to achieve better replication rate.
Bibtex: Afrati et al. (2015)
Foto N Afrati, Shlomi Dolev, Shantanu Sharma, and Jeffrey D Ullman. Bounds for overlapping interval join on mapreduce. In EDBT/ICDT Workshops, 3–6. 2015. ↩