Optimal distributed execution of join queries

It is proposed that the execution of a chain query in a distributed system can usefully be modeled as an integer linear program. In response to a user request, relational tables scattered across the network are to be combined and the result made available to the user. The formulation obtained from the behavior of the distributed system while processing such a query is then reduced, by removing redundant linear constraints, to a model of minimum-transmission-cost execution. Because this problem may have many optima with differing properties, further attention is devoted to discriminating among them. Perturbing the objective function favors solutions that require fewer network transmissions over equal-cost but more complicated strategies. These include strategies that transmit a relation around a cycle; when the transmission costs between the sites forming the cycle are zero, such a solution might otherwise be optimal. Many methods have been devised for solving programs in which some variables are restricted to integer values in a given interval, and virtually any of them could be used to solve the join query model. One such method, based on tree search, is discussed here.
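The abstract does not give the formulation itself, so the following is only a hypothetical toy, not the paper's model. It writes the shipping of a single relation from site A to site C over a made-up network as a 0-1 integer program with flow-conservation constraints, then solves it by depth-first tree search (branch and bound). The example illustrates two ideas from the abstract: several strategies can tie at minimum cost, and adding a small ε to the cost of every transmission selects the tied strategy with the fewest transmissions. The site names, costs, and the functions `flow_constraints`, `solve_01_ilp` and `plan` are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical toy network: (from_site, to_site, transmission_cost).
# Links A->B, B->D and D->B are free, so B->D->B is a zero-cost cycle.
SITES = ["A", "B", "C", "D"]
EDGES = [("A", "C", 2), ("A", "B", 0), ("B", "C", 2), ("B", "D", 0), ("D", "B", 0)]


def flow_constraints(sites, edges, source, dest):
    """Rows of A x == b: (transmissions out of a site) - (transmissions into it)."""
    A, b = [], []
    for s in sites:
        A.append([int(u == s) - int(v == s) for u, v, _ in edges])
        b.append(1 if s == source else -1 if s == dest else 0)
    return A, b


def solve_01_ilp(costs, A, b):
    """Minimize sum(costs[j] * x[j]) s.t. A x == b, x in {0,1}^n.

    Depth-first tree search: each level fixes one variable to 0 or 1.
    Assumes nonnegative costs, so the cost of the fixed part is a lower bound.
    """
    n = len(costs)
    best = {"value": None, "x": None}
    x = [0] * n

    def can_still_satisfy(k):
        # x[0..k-1] are fixed; each row must stay reachable by the free variables.
        for row, rhs in zip(A, b):
            fixed = sum(row[j] * x[j] for j in range(k))
            lo = fixed + sum(min(row[j], 0) for j in range(k, n))
            hi = fixed + sum(max(row[j], 0) for j in range(k, n))
            if not lo <= rhs <= hi:
                return False
        return True

    def search(k, partial_cost):
        if best["value"] is not None and partial_cost >= best["value"]:
            return  # bound: this subtree cannot beat the incumbent
        if not can_still_satisfy(k):
            return  # infeasible subtree
        if k == n:
            best["value"], best["x"] = partial_cost, x.copy()
            return
        for v in (0, 1):
            x[k] = v
            search(k + 1, partial_cost + costs[k] * v)
        x[k] = 0

    search(0, Fraction(0))
    return best["value"], best["x"]


def plan(perturb):
    costs = [c for _, _, c in EDGES]
    if perturb:
        # Costs are integers and at most len(EDGES) links are used, so the total
        # perturbation is below 1: it breaks ties but never reorders distinct costs.
        eps = Fraction(1, len(EDGES) + 1)
        costs = [c + eps for c in costs]
    A, b = flow_constraints(SITES, EDGES, "A", "C")
    _, x = solve_01_ilp(costs, A, b)
    chosen = [(u, v) for (u, v, _), xj in zip(EDGES, x) if xj]
    true_cost = sum(c for (_, _, c), xj in zip(EDGES, x) if xj)
    return chosen, true_cost


print(plan(perturb=False))  # ([('A', 'B'), ('B', 'C')], 2)
print(plan(perturb=True))   # ([('A', 'C')], 2)
```

With this variable order, the unperturbed search returns the two-hop route through B, which ties with the direct link at cost 2. Routes that also traverse the free B→D→B cycle tie at cost 2 as well. With ε = 1/6 the unique optimum is the single direct transmission, because every extra transmission adds ε to the objective. Exact `Fraction` arithmetic keeps these small tie-breaking terms from being lost to floating-point rounding.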