The $R^3$ Metric: Measuring Performance of Link Prioritization during Traversal-based Query Processing

Abstract

The decentralization envisioned for the current centralized web requires querying approaches capable of accessing multiple small data sources while complying with legal constraints related to personal data, such as licenses and the GDPR. Link Traversal-based Query Processing (LTQP) is a querying approach designed for highly decentralized environments that satisfies these legal requirements. An important optimization avenue in LTQP is the order in which links are dereferenced, which involves prioritizing links to query-relevant documents. However, assessing and comparing the algorithmic performance of these systems is challenging due to various compounding factors during query execution. Therefore, researchers need an implementation-agnostic and deterministic metric that accurately measures the marginal effectiveness of link prioritization algorithms in LTQP engines. In this paper, we motivate the need for accurately measuring link prioritization performance, define and test such a metric, and outline its challenges and potential extensions. Our findings show that the proposed metric highlights differences in link prioritization performance depending on the fragmentation strategy of the queried data. The proposed metric enables evaluating link prioritization performance and makes it easy to assess the effectiveness of future link prioritization algorithms.


Introduction

Currently, web user data is stored in centralized data silos, which restricts innovation and poses privacy concerns [1]. Decentralization efforts like Solid distribute user data across multiple small, permissioned personal data stores, requiring a querying approach that effectively handles decentralized and permissioned data.

Link Traversal-based Query Processing (LTQP) is an integrated querying approach designed to meet these requirements [2]. Starting from seed data sources, the LTQP query engine dynamically discovers new data sources by following hyperlinks found in previously discovered sources.

However, LTQP is currently slower than centralized alternatives. One optimization avenue is link prioritization, where the engine dynamically determines the order in which links are dereferenced based on their expected relevance to the query [3, 4].

To optimize LTQP prioritization strategies, researchers need insights into the marginal performance of proposed algorithms and the theoretically optimal performance these strategies can achieve.

Previous studies have used metrics like result arrival times, total execution time, and other arrival time-based metrics [5] to assess prioritization strategies [3, 4, 6]. These measures fail to capture how close implemented prioritization approaches are to the theoretically optimal strategy. Additionally, these metrics fall short when comparing different engine implementations and environments due to unrelated differences that may skew link prioritization performance results:

Programming Languages
Different languages can substantially affect engine performance [7], hindering purely time-based comparisons of algorithms.
Dereferencing Overhead
Network communication times vary widely [8, 9], introducing significant noise into execution times, even for comparisons of the same engine.
Orthogonal Optimization Strategies
Different query optimization strategies [10] might create performance differences independent of the prioritization strategy.

This leads to our research questions: (1) how can we compare the marginal algorithmic performance of prioritization algorithms in an implementation-agnostic manner, and (2) how can we define an oracle baseline that represents optimal prioritization performance?

In this paper, we introduce the Relevant Retrieval Ratio ($R^3$, pronounced "r-cubed") metric. This metric allows for straightforward observation of the algorithmic performance of prioritization approaches, independent of their implementation, and provides a benchmark against the theoretically optimal prioritization strategy.

Existing Metrics

Existing LTQP and federated querying metrics [11] are insufficient for measuring the marginal algorithmic performance of link prioritization algorithms:

Query execution time
Prioritization algorithms are unlikely to improve total query execution time [3, 4]. However, link prioritization can improve the arrival times of the first $k$ results [3, 4]. These arrival times are influenced by both the prioritization strategy and other, unrelated engine optimizations, obscuring the prioritization algorithm's effect.
Diefficiency Metrics
Diefficiency metrics measure the continuous efficiency of query engines by reflecting the density of produced results over a given time interval [5]. This contrasts with measuring the first $k$ arrival times, which only captures engine performance at discrete points. However, the same limitations remain, as these metrics are also based on result arrival times.
Number of HTTP Requests
For federated query engines, the number of HTTP requests represents the efficiency of source selection and join planning during query execution [12, 13]. Federated engines often query SPARQL endpoints, allowing them to delegate parts of the query execution to the server [14]. In contrast, an LTQP engine uses HTTP requests to download documents, so the number of requests corresponds to the number of dereferenced documents. Thus, the number of HTTP requests measures the engine's ability to prune irrelevant links. However, it does not indicate the effectiveness of prioritization algorithms, which reorder links rather than prune them.

The metric proposed in this paper complements existing metrics by measuring the marginal performance of link prioritization algorithms rather than the overall performance of the LTQP system.

Relevant Retrieval Ratio

The subweb [15] traversed during LTQP can be represented as a directed graph, where an edge from document A to document B indicates that a URI pointing to document B was first discovered in the data obtained from document A. Fig. 1 shows an example of such a directed graph.
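
Such a discovery graph can be recorded with a few lines of bookkeeping during traversal. The sketch below is our own illustration (Python with networkx; `record_discovery` is a hypothetical engine hook, not part of any engine API) and adds an edge only on the first discovery of a document, matching the definition above:

```python
import networkx as nx

# The subweb as a directed discovery graph: an edge A -> B means a URI
# for document B was *first* found in the data obtained from document A.
subweb = nx.DiGraph()

def record_discovery(source_doc: str, discovered_doc: str) -> None:
    """Hypothetical hook, called whenever the engine finds a URI in a document."""
    if discovered_doc not in subweb:  # only the first discovery creates an edge
        subweb.add_edge(source_doc, discovered_doc)
```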

During LTQP query execution, there exists an optimal link dereference order that discovers all documents needed to answer a query, as shown in the right-most directed graph in Fig. 1. This theoretical optimum can be considered an oracle link prioritization algorithm with perfect information on the location of query-relevant documents. The optimal path is always achievable, as the oracle only uses links between documents that an engine could also follow. An engine should strive for a traversal path close to this optimum. We therefore use the ratio between the optimal traversal path length and the actual traversal path length during query execution to evaluate the performance of link prioritization algorithms. The optimal traversal path is defined as the shortest path needed to dereference all documents required for the complete query result; it is determined retrospectively.

Given the length of the optimal traversal path and the actual traversal path, we define the $R^3$ metric as

$$R^{3} = \frac{|T_{O}|}{|T_{E}|}, \qquad |T_{O}|,\ |T_{E}| > 0$$

where $|T_{E}|$ is the length of the engine's traversal path and $|T_{O}|$ is the length of the optimal path. Under this definition, a higher value is better, with $R^3 = 1$ indicating optimal algorithmic link prioritization performance. Note that when $|T_{O}| = 0$ or $|T_{E}| = 0$, the query has no results and the notion of optimal link prioritization does not apply. When an engine performs poorly, differences in metric values become small; taking $\log_2$ of the metric can help visualize such relative differences.
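
The metric itself is a one-line computation. A minimal sketch (with hypothetical helper names) that also includes the $\log_2$ difference reported later in Table 1:

```python
import math

def r3(optimal_len: int, engine_len: int) -> float:
    """R^3 = |T_O| / |T_E|; undefined when either path length is zero."""
    if optimal_len <= 0 or engine_len <= 0:
        raise ValueError("R^3 is undefined for queries without results")
    return optimal_len / engine_len

def delta_log2(r3_a: float, r3_b: float) -> float:
    """log2(R^3_a) - log2(R^3_b): the comparison column used in Table 1."""
    return math.log2(r3_a) - math.log2(r3_b)
```

For the example in the next paragraph, `delta_log2(r3(2, 3), r3(2, 6))` evaluates to exactly 1.0.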

Using the $R^3$ metric, we can compare the algorithmic efficiency of the two traversal algorithms in our example. Using breadth-first search, the path to dereference the query-relevant document contains six nodes, while the optimal traversal order contains two. Thus, the $R^3$ metric for breadth-first traversal equals $2/6 \approx 0.33$. Depth-first search visits only three nodes, yielding an $R^3$ value of $2/3 \approx 0.67$. From this analysis, we conclude that depth-first traversal outperforms breadth-first traversal on this subweb.
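
How $|T_{E}|$ could be measured for the two disciplines is sketched below; this is illustrative only (the actual instrumentation lives inside the query engine) and counts dereferences until all query-relevant documents have been fetched:

```python
from collections import deque

import networkx as nx

def traversal_length(subweb: nx.DiGraph, seed: str, relevant: set,
                     breadth_first: bool = True) -> int:
    """Number of dereferences until every query-relevant document is fetched."""
    frontier = deque([seed])
    visited, remaining = set(), set(relevant)
    length = 0
    while frontier and remaining:
        # FIFO gives breadth-first traversal, LIFO gives depth-first.
        doc = frontier.popleft() if breadth_first else frontier.pop()
        if doc in visited:
            continue
        visited.add(doc)
        length += 1  # one document dereferenced
        remaining.discard(doc)
        frontier.extend(subweb.successors(doc))
    return length
```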

To find the optimal traversal path that dereferences all query-relevant documents, we observe that this is equivalent to solving the directed Steiner tree problem [16] on our directed graph, with the query-relevant documents serving as terminals. As such, we can use existing Steiner tree solvers to efficiently find the optimal traversal path.
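
networkx only ships an undirected Steiner tree approximation, so as a stand-in we sketch the classic shortest-path heuristic for the directed case: take the union of the shortest seed-to-terminal paths, which upper-bounds the true $|T_{O}|$. The experiments in this paper use a dedicated heuristic solver [18] instead.

```python
import networkx as nx

def optimal_length_upper_bound(subweb: nx.DiGraph, seed: str,
                               terminals: set) -> int:
    """Shortest-path heuristic for a directed Steiner tree rooted at the seed."""
    tree_nodes = {seed}
    for terminal in terminals:
        # Union of shortest paths: an upper bound on the optimal tree size.
        tree_nodes.update(nx.shortest_path(subweb, source=seed, target=terminal))
    return len(tree_nodes)
```

A tighter variant grows the tree incrementally, connecting each terminal to the nearest node already in the tree rather than always to the seed.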

Fig. 1: An example subweb [15] showing the differences in traversal paths between breadth-first and depth-first methods, as well as the optimal traversal path.

Results

Our experiment uses the Discover queries from the SolidBench benchmark [3]. To track the information required for computing the metric during query execution, we use the Comunica query engine [17]. To compute the optimal path, we use a heuristic Steiner tree solver [18] to speed up computation. Although an exact solver would be ideal, it significantly increases computational complexity for large subwebs [19, 20].

Our experiments compare depth-first and breadth-first prioritization [4] using $R^3$ and the time until the final query result. We aim to validate the substantiated assumption that Discover queries do not benefit from improved link prioritization [21].

Table 1 shows the computed metric per template instantiation and the difference in $\log_2$ of the metric values. We find that either method can outperform the other depending on the query template. Furthermore, there is a significant difference in performance depending on the template instantiation, likely due to the different fragmentation strategies used in SolidBench [3]. Some pods store query-relevant data in a single file, while others fragment it into directories, complicating optimal traversal.

These differences in prioritization performance are not reflected in the time until the last result: the Pearson correlation coefficients between $R^3$ and the time until the last result are -0.042 and 0.201 for depth-first and breadth-first, respectively. These low correlations suggest a weak link between $R^3$ and query execution time, possibly caused by noise in execution time measurements.
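
These correlations can be recomputed from the paired per-query measurements, e.g. with scipy (a sketch; the raw timings are not included in this paper):

```python
from scipy.stats import pearsonr

def r3_time_correlation(r3_values: list, last_result_times: list) -> float:
    """Pearson correlation between per-query R^3 and time until the last result."""
    r, _p_value = pearsonr(r3_values, last_result_times)
    return r
```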

Using $R^3$, we confirm the findings of our previous paper [21] by computing a single metric value instead of performing an engine-specific, in-depth analysis of query execution progress.

Query DF BF Δ log2(R^3) Query DF BF Δ log2(R^3)
D1.0 0.23 0.08 1.617 D5.0 0.13 0.15 -0.134
D1.1 0.50 0.50 0.000 D5.1 0.50 0.50 0.000
D1.2 0.50 0.50 0.000 D5.2 0.33 0.67 -1.000
D1.3 0.75 0.75 0.000 D5.3 0.04 0.03 0.250
D1.4 0.18 0.18 0.000 D5.4 0.18 0.16 0.196
D2.0 0.50 0.51 -0.031 D6.0 0.53 0.53 0.000
D2.1 0.50 0.67 -0.415 D6.1 0.40 0.40 0.000
D2.2 0.50 0.67 -0.415 D6.2 0.40 0.40 0.000
D2.3 0.90 0.90 0.000 D6.3 0.13 0.13 0.000
D2.4 0.44 0.56 -0.349 D6.4 0.73 0.46 0.690
D3.0 0.50 0.48 0.074 D7.0 0.05 0.05 0.000
D3.1 0.91 0.95 -0.058 D7.1 0.40 0.40 0.000
D3.2 0.93 0.93 -0.000 D7.2 0.40 0.40 0.000
D3.3 0.62 0.62 0.000 D7.3 0.15 0.15 0.000
D3.4 0.61 0.61 0.000 D7.4 0.06 0.03 0.955
D4.0 0.09 0.06 0.520 D8.0 0.45 0.64 -0.500
D4.1 0.44 0.44 0.000 D8.1 - - -
D4.2 0.50 0.44 0.170 D8.2 0.47 1.00 -1.100
D4.3 0.03 0.03 0.000 D8.3 - - -
D4.4 0.11 0.11 0.000 D8.4 0.36 0.43 -0.263

Table 1: $R^3$ for depth-first and breadth-first prioritization (DF, BF), separated by query template and instantiation; e.g., D1.2 denotes template 1, instantiation 2.

Conclusion

The current definition and implementation of the Relevant Retrieval Ratio ($R^3$) has several limitations. First, due to the computational complexity of the Steiner tree problem on graphs and the lack of readily available exact solvers for directed graphs, our implementation uses a heuristic, potentially leading to suboptimal traversal path lengths. Second, the metric cannot be computed when a query produces no results, either due to a timeout or because no results exist for the query. Finally, our metric uses theoretically optimal paths; in practice, document dereference times vary, making the theoretically optimal path potentially suboptimal. In future work, more extensive benchmarking of the metric is required to validate its effectiveness in measuring prioritization performance. Furthermore, a new metric that includes a penalty term for HTTP request time would account for real-world uncertainties in LTQP scenarios. Finally, an $R^3$ metric for the first $k$ results could be defined.

Acknowledgements

This research was supported by SolidLab Vlaanderen (Flemish Government, EWI and RRF project VV023/10). Ruben Taelman is a postdoctoral researcher at the Research Foundation – Flanders (FWO).