What is QaldGen?
QaldGen is a natural language benchmark selection framework for Linked Data that selects customised question answering (QA) benchmarks from existing QA repositories. The framework is flexible enough to generate benchmarks of varying sizes, according to user-defined criteria on the features considered most important for QA benchmarking. This is achieved using different clustering algorithms.
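The clustering-based selection can be pictured with a minimal sketch. It assumes each question has already been turned into a numeric feature vector, for example its number of triple patterns and answer count. The sketch runs KMeans++ and takes the question closest to each centroid as a benchmark exemplar. The helper names (`select_benchmark`, `kmeans_pp_init`) and the exemplar rule are hypothetical illustrations, not QaldGen's actual code. `k` and `iterations` play the roles of the benchmark size and the KMeans++ iteration count mentioned in the parameter selection step.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """Pick k initial centroids with the KMeans++ seeding rule."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Weight each point by its squared distance to the nearest centroid so far.
        weights = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:
            centroids.append(list(rng.choice(points)))  # all points already covered
        else:
            centroids.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return centroids


def select_benchmark(features, k, iterations=10, seed=0):
    """Return sorted indices of k questions, one nearest each KMeans++ centroid.

    Hypothetical helper; assumes len(features) >= k.
    """
    rng = random.Random(seed)
    centroids = kmeans_pp_init(features, k, rng)
    dim = len(features[0])
    for _ in range(iterations):
        # Assignment step: attach every question to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for i, p in enumerate(features):
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(i)
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(features[i][d] for i in members) / len(members)
                                for d in range(dim)]
    # Exemplar step: for each centroid, take the closest question not yet chosen.
    selected = []
    for c in centroids:
        candidates = [i for i in range(len(features)) if i not in selected]
        selected.append(min(candidates, key=lambda i: squared_distance(features[i], c)))
    return sorted(selected)


# Invented 2-feature vectors, e.g. (triple patterns, answer count).
features = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(select_benchmark(features, k=2))  # → [0, 3]
```

Picking an actual question per cluster keeps every benchmark entry a real question from the repository. Choosing the closest *unselected* question guarantees `k` distinct entries even if a cluster ends up empty.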
How to generate a benchmark?
The benchmark is generated using the following steps:
- Selection of benchmark generation method: The first step is to select one or more benchmark generation methods. Currently, the framework supports six methods, namely DBSCAN+KMeans++, KMeans++, Agglomerative clustering, Random selection, FEASIBLE and FEASIBLE-Exemplars.
- Parameter selection: The second step is to set parameters such as the number of queries in the resulting benchmark or the number of iterations for KMeans++ clustering.
- Benchmark personalisation: The third step allows the user to further customise the resulting benchmark. By default, the SPARQL query selects all queries along with the features required for benchmarking. The user can modify this query to generate customised benchmarks, as explained below.
- Results: The diversity score and similarity error of each selected method are shown as bar graphs.
- Benchmark download: The resulting benchmarks can finally be downloaded and used to evaluate QA systems.
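The text does not define the diversity score reported in the results step. The following is therefore only a hedged sketch under a hypothetical definition: the mean coefficient of variation (standard deviation divided by mean) over all feature columns of the selected questions. Under this definition, a higher value means the features vary more across the benchmark. The function name `diversity_score` is an illustrative assumption, and the similarity error is not covered here.

```python
from statistics import mean, pstdev


def diversity_score(features):
    """Mean coefficient of variation (std / mean) over all feature columns.

    Hypothetical definition for illustration; QaldGen's own formula may differ.
    Columns whose mean is zero are skipped to avoid division by zero.
    """
    columns = list(zip(*features))
    ratios = [pstdev(col) / mean(col) for col in columns if mean(col) != 0]
    return mean(ratios) if ratios else 0.0


# Two invented questions: the first feature varies, the second is constant.
print(diversity_score([[1, 10], [3, 10]]))  # → 0.25
```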
How to generate a customised benchmark?
For example, suppose the user wants to generate a customised benchmark with the following features:
- The benchmark should only be generated from QALD9, hence skipping LC-QuAD questions.
- The benchmark should only contain questions of type "what".
- The number of triple patterns should be greater than 1, and each question should have at least one answer.
The customised SPARQL query for selecting such a benchmark should be used in place of the default benchmark selection query.
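The three constraints can be sketched as a plain in-memory filter. QaldGen itself applies them through the SPARQL selection query, which this sketch does not reproduce. The `Question` record, its field names and the sample questions are hypothetical. Detecting "what" questions by their first word is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Question:
    # Hypothetical fields standing in for the features QaldGen stores.
    source: str           # e.g. "QALD9" or "LC-QuAD"
    text: str
    triple_patterns: int
    answer_count: int


def customise(questions):
    """Keep QALD9 'what' questions with more than 1 triple pattern and at least 1 answer."""
    return [
        q for q in questions
        if q.source == "QALD9"
        and q.text.strip().lower().startswith("what")  # crude question-type check
        and q.triple_patterns > 1
        and q.answer_count >= 1
    ]


# Invented records: only the first one satisfies every constraint.
questions = [
    Question("QALD9", "What is X?", 2, 1),
    Question("LC-QuAD", "What is Y?", 2, 1),   # wrong source
    Question("QALD9", "Who is Z?", 2, 1),      # not a "what" question
    Question("QALD9", "What is W?", 1, 1),     # too few triple patterns
    Question("QALD9", "What is V?", 3, 0),     # no answers
]
print([q.text for q in customise(questions)])  # → ['What is X?']
```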
Project home page, source code, complete evaluation results, and issues
QaldGen is open source and available from https://github.com/dice-group/qald-generator.