Implementing recursive algorithms on computing clusters presents a number of new challenges. In particular, we consider the endgame problem: later rounds of a recursion often transfer only small amounts of data, causing high overhead for interprocessor communication. One way to deal with the endgame problem is to use an algorithm that reduces the number of rounds of the recursion. In particular, for an application such as transitive closure (“TC”), there are several recursive-doubling algorithms that use a logarithmic, rather than linear, number of rounds. Unfortunately, recursive-doubling algorithms can deduce many more facts than the linear TC algorithms, which could negate the cost savings from eliminating the overhead due to the proliferation of small files. We are thus led to consider TC algorithms that, like the linear algorithms, have the unique decomposition property, which ensures that each path is discovered only once. We find that many such algorithms exist, and we show that they are incomparable, in that any of them could prove best on some data, in some cases costing even less than the linear algorithms. The recursive-doubling approach to TC extends to other recursions as well. However, it is not acceptable to reduce the number of rounds at the expense of a major increase in the number of facts that are deduced. In this paper, we prove it is possible to implement any Datalog program of right-linear chain rules with a logarithmic number of rounds and no order-of-magnitude increase in the number of facts deduced. On the other hand, there are linear recursions for which the two goals of reducing the number of rounds and maintaining the total number of deduced facts cannot be met simultaneously. We show that the reachability problem cannot be solved in logarithmic rounds without using a binary predicate, thus squaring the number of potential facts to be deduced. We also show that the same-generation recursion cannot be solved in logarithmic rounds without using a predicate of arity three.
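The trade-off between rounds and deduced facts can be illustrated with a minimal single-machine sketch. It is not one of the paper's algorithms: `linear_tc` evaluates the right-linear rule `path(x,z) :- edge(x,y), path(y,z)` and `doubling_tc` evaluates the textbook nonlinear rule `path(x,z) :- path(x,y), path(y,z)`, both semi-naively. The function names and the `derivations` counter (the number of rule firings, including rediscoveries of known paths) are assumptions made for illustration only.

```python
from collections import defaultdict


def linear_tc(edges):
    """Semi-naive evaluation of path(x,z) :- edge(x,y), path(y,z)."""
    edges = set(edges)
    preds = defaultdict(set)  # y -> {x : edge(x, y)}
    for x, y in edges:
        preds[y].add(x)
    path, delta = set(edges), set(edges)
    rounds = derivations = 0
    while delta:
        rounds += 1
        derived = set()
        # Extend only the paths found in the previous round by one edge.
        for y, z in delta:
            for x in preds.get(y, ()):
                derivations += 1
                derived.add((x, z))
        delta = derived - path
        path |= delta
    return path, rounds, derivations


def doubling_tc(edges):
    """Semi-naive evaluation of path(x,z) :- path(x,y), path(y,z)."""
    path, delta = set(edges), set(edges)
    rounds = derivations = 0
    while delta:
        rounds += 1
        succ = defaultdict(set)  # x -> {y : path(x, y)}
        for x, y in path:
            succ[x].add(y)
        derived = set()
        # Join path with path, firing only when at least one side is new.
        for x, y in path:
            for z in succ.get(y, ()):
                if (x, y) in delta or (y, z) in delta:
                    derivations += 1
                    derived.add((x, z))
        delta = derived - path
        path |= delta
    return path, rounds, derivations


if __name__ == "__main__":
    chain = [(i, i + 1) for i in range(8)]  # 0 -> 1 -> ... -> 8
    for name, fn in (("linear", linear_tc), ("doubling", doubling_tc)):
        closure, rounds, derivations = fn(chain)
        print(name, len(closure), rounds, derivations)
```

On a simple chain, the doubling version finishes in fewer rounds but fires its rule more often, because a path of length k can be split into two shorter paths in k - 1 ways. The linear rule decomposes each path in exactly one way on this input.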
Bibtex: Afrati and Ullman (2012)
Foto N. Afrati and Jeffrey D. Ullman. Transitive closure and recursive Datalog implemented on clusters. In Proceedings of the 15th International Conference on Extending Database Technology, 132–143. ACM, 2012.