Abstract
The decentralization envisioned for the current centralized web requires querying approaches capable of accessing multiple small data sources while complying with legal constraints related to personal data, such as licenses and the GDPR. Link Traversal-based Query Processing (LTQP) is a querying approach designed for highly decentralized environments that can satisfy these legal requirements. An important optimization avenue in LTQP is the order in which links are dereferenced, which involves prioritizing links to query-relevant documents. However, assessing and comparing the algorithmic performance of these prioritization approaches is challenging due to various compounding factors during query execution. Therefore, researchers need an implementation-agnostic and deterministic metric that accurately measures the marginal effectiveness of link prioritization algorithms in LTQP engines. In this paper, we motivate the need for accurately measuring link prioritization performance, define and test such a metric, and outline its challenges and potential extensions. Our findings show that the proposed metric highlights differences in link prioritization performance depending on the fragmentation strategy of the queried data. The metric enables the evaluation of current link prioritization performance and makes it easy to assess the effectiveness of future link prioritization algorithms.
Introduction
Currently, web user data is stored in centralized data silos, which restricts innovation and poses privacy concerns [1]. Decentralization efforts like Solid distribute user data across multiple permissioned personal data stores, requiring a querying approach that can effectively handle decentralized and permissioned data.
Link Traversal-based Query Processing (LTQP) is an integrated querying approach designed to meet these requirements [2]. Starting from seed data sources, the LTQP query engine dynamically discovers new data sources by following hyperlinks found in previously discovered sources.
However, LTQP is currently slower than centralized alternatives. One optimization avenue is link prioritization, where the engine dynamically determines the order in which links should be dereferenced based on their expected relevance to the query [3, 4].
To optimize LTQP prioritization strategies, researchers need insights into the marginal performance of proposed algorithms and the theoretically optimal performance these strategies can achieve.
Previous studies have used metrics such as result arrival times, total execution time, and other arrival-time-based metrics [5] to assess prioritization strategies [3, 4, 6]. These measures fail to capture how close implemented prioritization approaches come to the theoretically optimal strategy. Additionally, these metrics fall short when comparing different engine implementations and environments, because unrelated differences may skew link prioritization performance results:
- Programming Languages
- Different languages can substantially affect engine performance [7], hindering purely time-based comparisons of algorithms.
- Dereferencing Overhead
- Network communication times vary widely [8, 9], introducing significant noise into execution times, even for comparisons of the same engine.
- Orthogonal Optimization Strategies
- Different query optimization strategies [10] might create performance differences independent of the prioritization strategy.
This leads to our research questions: (i) how can we compare the marginal algorithmic performance of prioritization algorithms in an implementation-agnostic manner, and (ii) how can we define an oracle baseline that represents optimal prioritization performance?
In this paper, we introduce the Relevant Retrieval Ratio ($R^3$, pronounced r-cubed). This metric allows for straightforward observation of the algorithmic performance of prioritization approaches, independent of their implementation, and enables comparison against the theoretically optimal prioritization strategy.
Existing Metrics
Existing LTQP and federated querying metrics [11] are insufficient for measuring the marginal algorithmic performance of link prioritization algorithms:
- Query execution time
- Prioritization algorithms are unlikely to improve total query execution time [3, 4]. However, link prioritization can improve the arrival times of the first results [3, 4]. These arrival times are influenced by both prioritization strategies and other, unrelated engine optimizations, obscuring the prioritization algorithm's effect.
- Diefficiency Metrics
- Diefficiency metrics measure the continuous efficiency of query engines by reflecting the density of produced results over a given time interval [5] (see the sketch after this list). This contrasts with first-result arrival times, which measure engine performance only at discrete points. However, the same limitation remains, as these metrics are also based on result arrival times.
- Number of HTTP Requests
- For federated query engines, the number of HTTP requests represents the efficiency of source selection and join planning during query execution [12, 13]. Federated engines often query SPARQL endpoints, allowing them to delegate parts of the query execution to the server [14]. In contrast, in LTQP, the engine uses HTTP requests to download documents, so the number of requests corresponds to the number of dereferenced documents. Thus, the number of HTTP requests measures the engine's ability to prune irrelevant links. However, it does not indicate the effectiveness of prioritization algorithms, which change the order in which links are dereferenced rather than pruning them.
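To make the diefficiency notion above concrete, the following is a minimal sketch in the spirit of dief@t [5], computing the area under the answer-trace curve up to a time cutoff. The function name and the step-function treatment of the answer count are our own simplifications, not the reference implementation.

```python
def dief_at_t(answer_times: list[float], t: float) -> float:
    """Area under the answer-trace curve up to time t, treating the number
    of produced answers as a step function of time. Higher values mean
    results were produced earlier and more densely."""
    area, prev, count = 0.0, 0.0, 0
    for ts in sorted(x for x in answer_times if x <= t):
        area += count * (ts - prev)   # `count` answers held since `prev`
        prev, count = ts, count + 1
    return area + count * (t - prev)  # tail from last answer until cutoff t

# Example: three answers arriving at 1s, 2s, and 4s, evaluated at t = 5s.
print(dief_at_t([1.0, 2.0, 4.0], 5.0))  # 0*1 + 1*1 + 2*2 + 3*1 = 8.0
```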
The metric proposed in this paper complements existing metrics by measuring the marginal performance of link prioritization algorithms rather than the overall performance of the LTQP system.
Relevant Retrieval Ratio
The subweb [15] traversed during LTQP can be represented as a directed graph, where an edge from document A to document B indicates that a URI pointing to document B was first discovered in the data obtained from document A. Fig. 1 shows an example of such a directed graph.
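As an illustration, below is a minimal sketch of how an engine could record such a discovery graph during traversal; the `TraversalGraph` class and its method names are hypothetical, not part of Comunica or any existing engine.

```python
class TraversalGraph:
    """Records the traversed subweb as a directed graph: an edge (a, b) is
    added only when the URI of document b is *first* discovered in the data
    obtained from document a, matching the definition above."""

    def __init__(self) -> None:
        self.edges: list[tuple[str, str]] = []  # first-discovery edges
        self.seen: set[str] = set()             # URIs already discovered
        self.dereferenced: list[str] = []       # engine's dereference order

    def record_discovery(self, source_doc: str, uri: str) -> None:
        # Only the first discovery of a URI creates an edge.
        if uri not in self.seen:
            self.seen.add(uri)
            self.edges.append((source_doc, uri))

    def record_dereference(self, uri: str) -> None:
        # Called whenever the engine actually downloads a document.
        self.dereferenced.append(uri)
```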
During LTQP query execution, there exists an optimal link dereference order that finds all documents needed to answer a query, as shown in the right-most directed graph in Fig. 1. This theoretical optimum can be considered an oracle link prioritization algorithm with perfect information on the location of query-relevant documents. The optimal path is always achievable, as the oracle only uses links between documents that an engine could also use. An engine should strive for a traversal path close to this optimal path. As such, we use the ratio between the optimal traversal path length and the actual traversal path length during query execution to evaluate the performance of link prioritization algorithms. The optimal traversal path is defined as the shortest path needed to dereference all documents required for the complete query result, and it is determined retrospectively.
Given the length of the optimal traversal path and the actual traversal path, we define the metric as

$$R^3 = \frac{l_o}{l_e}$$

where $l_e$ is the length of the engine traversal path and $l_o$ is the length of the optimal path. In this definition, a higher value is better, with $R^3 = 1$ indicating optimal algorithmic link prioritization performance. Note that when $l_e = 0$ or $l_o = 0$, the query has no results and the notion of optimal link prioritization does not exist. When an engine performs poorly, resulting in smaller differences in the metric's values, taking $\log_2(R^3)$ can help visualize relative differences.
Using the $R^3$ metric, we can compare the algorithmic efficiency of the two traversal algorithms in our example. Using breadth-first search, the path to dereference the query-relevant document contains six nodes, while the optimal traversal order contains two. Thus, the metric for breadth-first traversal equals $\frac{2}{6} \approx 0.33$. Depth-first traversal visits only three nodes, yielding an $R^3$ value of $\frac{2}{3} \approx 0.67$. From this analysis, we conclude that depth-first traversal outperforms breadth-first traversal on this subweb.
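This definition translates directly into code; the following sketch (function and variable names are our own) reproduces the worked example above.

```python
import math

def r_cubed(optimal_length: int, engine_length: int) -> float:
    """Relevant Retrieval Ratio: optimal over actual traversal path length."""
    if optimal_length == 0 or engine_length == 0:
        raise ValueError("The query has no results; R^3 is undefined.")
    return optimal_length / engine_length

# Worked example from Fig. 1: the optimal path contains two documents.
r3_bf = r_cubed(2, 6)  # breadth-first dereferences six documents -> ~0.33
r3_df = r_cubed(2, 3)  # depth-first dereferences three documents -> ~0.67
print(math.log2(r3_df) - math.log2(r3_bf))  # delta log2(R^3) = 1.0
```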
Finding the optimal traversal path that dereferences all query-relevant documents is equivalent to solving the directed Steiner tree problem [16] on our directed graph, with the query-relevant documents serving as terminals. As such, we can use existing Steiner tree solvers to efficiently find the optimal traversal path.
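For illustration, the sketch below approximates the rooted directed Steiner tree with a simple nearest-terminal heuristic using networkx. This is our own simplification for exposition, not the solver [18] used in our experiments.

```python
import networkx as nx

def approx_steiner_nodes(g: nx.DiGraph, seed: str, terminals: set[str]) -> set[str]:
    """Nearest-terminal heuristic for the rooted directed Steiner tree:
    repeatedly attach the unreached terminal that is closest (in hops)
    to any node already in the tree. Returns the tree's node set, whose
    size approximates the optimal traversal path length."""
    tree = {seed}
    remaining = set(terminals) - tree
    while remaining:
        best = None
        for terminal in remaining:
            for src in tree:
                try:
                    path = nx.shortest_path(g, src, terminal)
                except nx.NetworkXNoPath:
                    continue
                if best is None or len(path) < len(best):
                    best = path
        if best is None:
            raise ValueError("A query-relevant document is unreachable.")
        tree.update(best)  # add the connecting path to the tree
        remaining -= tree  # a path may pass through other terminals
    return tree

# Toy subweb: the optimal traversal reaches the relevant document via 3 nodes.
g = nx.DiGraph([("seed", "a"), ("a", "b"), ("seed", "c"), ("c", "relevant")])
print(len(approx_steiner_nodes(g, "seed", {"relevant"})))  # 3
```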
Results
Our experiment uses the Discover queries from the SolidBench benchmark [3]. To track the information required for computing the metric during query execution, we use the Comunica query engine [17]. To compute the optimal path, we use a heuristic Steiner tree solver [18] to speed up computations. Although an exact solver would be ideal, it significantly increases computational complexity for large subwebs [19, 20].
Our experiments compare depth-first and breadth-first prioritization [4] using $R^3$ and the time until the final query result. We aim to validate the substantiated assumption that Discover queries do not benefit from improved link prioritization [21].
Table 1 shows the computed metric per template instantiation, together with $\Delta \log_2(R^3)$, the difference in the $\log_2$ of the metric values. We find that either method can outperform the other depending on the template used. Furthermore, there is a significant difference in performance across template instantiations, likely due to the different fragmentation strategies used in SolidBench [3]: some pods store query-relevant data in a single file, while others fragment it into directories, complicating optimal traversal.
These differences in prioritization performance are not reflected in the time until the last result; the Pearson correlation coefficients between $R^3$ and the time until the last result are -0.042 and 0.201 for depth-first and breadth-first, respectively. These low correlations suggest a weak link between $R^3$ and query execution time, possibly caused by noise in the execution times.
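Such a correlation is straightforward to reproduce given per-query measurements. The sketch below uses a handful of depth-first $R^3$ values from Table 1 together with placeholder execution times, since the measured times are not listed in this paper.

```python
import numpy as np

# R^3 values for depth-first prioritization, taken from Table 1 (D1.0-D1.4).
r3_values = np.array([0.23, 0.50, 0.50, 0.75, 0.18])
# Placeholder times until the last result, in seconds (illustrative only).
last_result_times = np.array([4.2, 1.8, 2.1, 1.5, 5.0])

# Pearson correlation coefficient between the metric and execution time.
print(np.corrcoef(r3_values, last_result_times)[0, 1])
```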
Using $R^3$, we confirm the findings of our previous paper [21] by computing a single metric value instead of performing an engine-specific, in-depth analysis of the query execution progress.
| Query | DF | BF | $\Delta \log_2(R^3)$ | Query | DF | BF | $\Delta \log_2(R^3)$ |
|-------|------|------|--------|-------|------|------|--------|
| D1.0 | 0.23 | 0.08 | 1.617 | D5.0 | 0.13 | 0.15 | -0.134 |
| D1.1 | 0.50 | 0.50 | 0.000 | D5.1 | 0.50 | 0.50 | 0.000 |
| D1.2 | 0.50 | 0.50 | 0.000 | D5.2 | 0.33 | 0.67 | -1.000 |
| D1.3 | 0.75 | 0.75 | 0.000 | D5.3 | 0.04 | 0.03 | 0.250 |
| D1.4 | 0.18 | 0.18 | 0.000 | D5.4 | 0.18 | 0.16 | 0.196 |
| D2.0 | 0.50 | 0.51 | -0.031 | D6.0 | 0.53 | 0.53 | 0.000 |
| D2.1 | 0.50 | 0.67 | -0.415 | D6.1 | 0.40 | 0.40 | 0.000 |
| D2.2 | 0.50 | 0.67 | -0.415 | D6.2 | 0.40 | 0.40 | 0.000 |
| D2.3 | 0.90 | 0.90 | 0.000 | D6.3 | 0.13 | 0.13 | 0.000 |
| D2.4 | 0.44 | 0.56 | -0.349 | D6.4 | 0.73 | 0.46 | 0.690 |
| D3.0 | 0.50 | 0.48 | 0.074 | D7.0 | 0.05 | 0.05 | 0.000 |
| D3.1 | 0.91 | 0.95 | -0.058 | D7.1 | 0.40 | 0.40 | 0.000 |
| D3.2 | 0.93 | 0.93 | -0.000 | D7.2 | 0.40 | 0.40 | 0.000 |
| D3.3 | 0.62 | 0.62 | 0.000 | D7.3 | 0.15 | 0.15 | 0.000 |
| D3.4 | 0.61 | 0.61 | 0.000 | D7.4 | 0.06 | 0.03 | 0.955 |
| D4.0 | 0.09 | 0.06 | 0.520 | D8.0 | 0.45 | 0.64 | -0.500 |
| D4.1 | 0.44 | 0.44 | 0.000 | D8.1 | - | - | - |
| D4.2 | 0.50 | 0.44 | 0.170 | D8.2 | 0.47 | 1.00 | -1.100 |
| D4.3 | 0.03 | 0.03 | 0.000 | D8.3 | - | - | - |
| D4.4 | 0.11 | 0.11 | 0.000 | D8.4 | 0.36 | 0.43 | -0.263 |
Table 1: The $R^3$ values for depth-first (DF) and breadth-first (BF) prioritization, separated by query template and instantiation; D1.2 denotes template one, instantiation two. Dashes mark queries without results, for which the metric is undefined.
Conclusion
The current definition and implementation of the Relevant Retrieval Ratio ($R^3$) have several limitations. First, due to the computational complexity of the Steiner tree problem and the lack of readily available exact solvers for directed graphs, our implementation uses heuristics, potentially yielding suboptimal estimates of the optimal traversal path length. Second, the metric cannot be computed when a query produces no results, either due to a timeout or because no results exist for the query. Finally, our metric uses theoretically optimal paths; in practice, document dereference times vary, making the theoretically optimal path potentially suboptimal in wall-clock terms. In future work, more extensive benchmarking of the metric is required to validate its effectiveness in measuring prioritization performance. Furthermore, a new metric that includes a penalty term for HTTP request time would account for real-world uncertainties in LTQP scenarios. Finally, an $R^3$ metric for the first results can be defined.
Acknowledgements
This research was supported by SolidLab Vlaanderen (Flemish Government, EWI and RRF project VV023/10). Ruben Taelman is a postdoctoral researcher at the Research Foundation – Flanders (FWO).