Partial match queries in relaxed multidimensional search trees

\begin{abstract} Partial match queries arise frequently in the context of large databases, where each record contains a distinct multidimensional key, that is, each key is a $K$-tuple whose components are traditionally called coordinates or attributes. In a partial match query we specify the value of $s$ attributes, $0 < s < K$, and leave the remaining $K-s$ attributes unspecified. The goal is to retrieve all the records in the database that match the specified attributes. In this paper we present several results about the average performance and variance of partial matches in relaxed $K$-dimensional trees (search trees and digital tries). These data structures are variants of the well known $K$d-trees and $K$d-tries where the sequence of attributes used to guide a query is explicitly stored at the nodes of the tree and randomly generated and, in general, will be different for different search paths. In the standard variants, the sequence is fixed and identical for all search paths: $1, 2, 3, \ldots, K - 1, K, 1, 2, 3, \ldots$. We show that the probabilistic analysis of the relaxed multidimensional trees is very similar to that of standard $K$d-trees and $K$d-tries, and also to the analysis of quadtrees. In fact, besides the average cost and variance of partial match in relaxed $K$d-trees and $K$d-tries, we also obtain the variance of partial matches in 2-dimensional quadtrees. We also compute the average cost of partial matches in other relaxed multidimensional digital tries, namely, relaxed $K$d-Patricia and relaxed $K$d-digital search trees. \end{abstract}
This paper is available in the PostScript format.
(Back to List of Papers)