md`
A recent developer mailing list inquiry caught our attention a few days ago.
The author was wondering, _"[can] anyone ... give me a recommendation for 'how big a sane query can get'?"_
His use case was to apply a [SPARQL](http://www.w3.org/TR/sparql11-query/)
query against the
[LOD dataset](http://lod-cloud.net) in order to test for
the presence of "patterns for human associations".
There was mention of subgraph sizes in excess of 400K triples, but his efforts appeared
to be thwarted by much lower limits which his query processor
imposed on the number of statement patterns in a [BGP](http://www.w3.org/TR/sparql11-query/#BasicGraphPatterns).
The case made me wonder where our limits lie and what accounts for them.
To this end, I used a generator for simple count queries:
\`\`\`lisp
* (defun generate-bgp (count)
    \`(spocq.a:|select| (spocq.a:|bgp| ,@(loop for i from 0 below count
                                               collect '(spocq.a:|triple| ?::|s| ?::|p| ?::|o|)))
       ((?::|count| (spocq:|count| SPOCQ.S:*)))))
GENERATE-BGP
\`\`\`
which generates trivial queries, such as:
\`\`\`lisp
* (generate-bgp 4)
(spocq:|select|
 (spocq:|bgp|
  (spocq:|triple| ?::|s| ?::|p| ?::|o|)
  (spocq:|triple| ?::|s| ?::|p| ?::|o|)
  (spocq:|triple| ?::|s| ?::|p| ?::|o|)
  (spocq:|triple| ?::|s| ?::|p| ?::|o|))
 ((?::|count| (spocq:|count| SPOCQ.S:*))))
\`\`\`
\`\`\`lisp
* (defun run-test-bgp (count)
    (run-sparql (generate-bgp count) :repository-id "james/bgp-tests"))
RUN-TEST-BGP
\`\`\`
with a single-statement repository, just to exercise the query invocation, without
any concern for solution field size,
\`\`\`lisp
* (run-test-bgp 4)
((1))
(?::|count|)
#<QUERY [E7DA6E00-E215-11E4-90B0-000000000000/NIL, select@COMPLETE, james/bgp-tests] {100AFD9199}>
\`\`\`
In order to gauge the resource use in terms of execution time, it sufficed to
simply profile the significant operators:
\`\`\`lisp
* (sb-profile:profile rdfcache:match rdfcache:count spocq-compile run-agp-thread)
* (loop for count = 2 then (* count 2) until (> count (* 32 1024))
        do (let ((start (get-internal-run-time)))
             (sb-profile:reset)
             (run-test-bgp count)
             (format *trace-output* "~%~%~a ~ams" count (- (get-internal-run-time) start))
             (sb-profile:report)
             (format *trace-output* "~%")))
\`\`\`
The result is a transcript of the elapsed times, function call counts, and cumulative run times for
the respective operators, in this case on a six-core CPU.
For example:
\`\`\`
2 101ms
  seconds  |     gc     |   consed   | calls |  sec/call  |  name
-------------------------------------------------------------------
    0.091  |    0.000   | 11,168,640 |     3 |  0.030333  | SPOCQ-COMPILE
    0.001  |    0.000   |     32,576 |     1 |  0.001000  | RDFCACHE:MATCH
    0.000  |    0.000   |          0 |     1 |  0.000000  | RDFCACHE:COUNT
    0.000  |    0.000   |          0 |     1 |  0.000000  | RUN-AGP-THREAD
-------------------------------------------------------------------
    0.092  |    0.000   | 11,201,216 |     6 |            | Total
\`\`\`
For this query form, that is, a single BGP compiled into one
monolithic pattern-matching function, the results are as follows:
<table style="text-align: right">
<tr><th>pattern count </th><th>milliseconds </th><th>compile </th><th>count </th><th>match</th></tr>
<tr><td> 2 </td><td> 101 </td><td> 91 </td><td> 1 </td><td> 0 </td></tr>
<tr><td> 4 </td><td> 37 </td><td> 23 </td><td> 1 </td><td> 1 </td></tr>
<tr><td> 8 </td><td> 125 </td><td> 108 </td><td> 1 </td><td> 1 </td></tr>
<tr><td> 16 </td><td> 147 </td><td> 137 </td><td> 5 </td><td> 0 </td></tr>
<tr><td> 32 </td><td> 228 </td><td> 212 </td><td> 2 </td><td> 2 </td></tr>
<tr><td> 64 </td><td> 447 </td><td> 420 </td><td> 7 </td><td> 1 </td></tr>
<tr><td> 128 </td><td> 1043 </td><td> 1018 </td><td> 5 </td><td> 2 </td></tr>
<tr><td> 256 </td><td> 3726 </td><td> 3678 </td><td> 15 </td><td> 1 </td></tr>
<tr><td> 512 </td><td> 27592 </td><td> 27496 </td><td> 17 </td><td> 4 </td></tr>
<tr><td> 1024 </td><td> 153774 </td><td> 153615 </td><td> 37 </td><td> 5 </td></tr>
<tr><td> 1152 </td><td> 216749 </td><td> 216714 </td><td> 26 </td><td> 7 </td></tr>
</table>
Above 1K patterns, however, the run time reaches a compilation limit.
The compiler exhausts its dynamic binding stack either when precompiling the BGP
or, if the query processor is set to interpret rather than compile queries, when interpreting
the pattern's source form.
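To make that growth concrete, here is a minimal, generic sketch in Python of what a monolithic BGP matcher does: it recurses over the statement patterns, binding variables against a toy in-memory triple list, so each additional pattern contributes another level of nesting. The triple encoding and names here are illustrative, not spocq's.

\`\`\`python
def match_bgp(patterns, triples, bindings=None):
    """Yield every variable binding that satisfies all statement patterns."""
    if bindings is None:
        bindings = {}
    if not patterns:
        yield dict(bindings)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(bindings)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):                   # variable: bind or check
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:                        # constant: must match
                ok = False
                break
        if ok:
            yield from match_bgp(rest, triples, new)

triples = [("a", "knows", "b"), ("b", "knows", "c")]
# two chained patterns: ?x knows ?y . ?y knows ?z
solutions = list(match_bgp([("?x", "knows", "?y"), ("?y", "knows", "?z")], triples))
print(solutions)  # [{'?x': 'a', '?y': 'b', '?z': 'c'}]
\`\`\`

A compiler that emits one such function per BGP, with one nesting level per pattern, will hit stack limits as the pattern count climbs into the thousands.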
As an alternative, one could limit each BGP to a single statement.
\`\`\`lisp
* (defun generate-singleton-bgp (count)
    (let ((body (loop with bgp = '(spocq.a:|bgp| (spocq.a:|triple| ?::|s| ?::|p| ?::|o|))
                      for form = bgp then \`(spocq.a:|join| ,bgp ,form)
                      for i from 0 below count
                      finally (return form))))
      \`(spocq.a:|select| ,body ((?::|count| (spocq:|count| SPOCQ.S:*))))))
GENERATE-SINGLETON-BGP
\`\`\`
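For illustration, the shape of the operator tree this generator yields can be mirrored in Python; the tuple encoding is hypothetical, but the structure, count single-pattern BGPs strung along count - 1 nested JOINs, is what the construction produces.

\`\`\`python
# Illustrative encoding only: a BGP is ("bgp", pattern), a join is
# ("join", left, right); spocq's actual s-expressions differ.
PATTERN = ("?s", "?p", "?o")

def singleton_join_tree(count):
    """Fold count single-pattern BGPs into a right-nested JOIN tree."""
    form = ("bgp", PATTERN)
    for _ in range(count - 1):
        form = ("join", ("bgp", PATTERN), form)
    return form

def join_count(form):
    """Count JOIN operators along the right spine (iterative, no stack limit)."""
    n = 0
    while form[0] == "join":
        n += 1
        form = form[2]
    return n

print(join_count(singleton_join_tree(16384)))  # 16383
\`\`\`

A tree of 16K BGPs thus carries just short of 16K joins, which is why the thread count during execution is roughly double the BGP count.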
An attempt with queries of this form progressed up to 16K:
<table style="text-align: right">
<tr><th>pattern count</th><th>milliseconds</th></tr>
<tr><td> 1 </td><td> 42 </td></tr>
<tr><td> 2 </td><td> 36 </td></tr>
<tr><td> 4 </td><td> 49 </td></tr>
<tr><td> 8 </td><td> 60 </td></tr>
<tr><td> 16 </td><td> 93 </td></tr>
<tr><td> 32 </td><td> 136 </td></tr>
<tr><td> 64 </td><td> 286 </td></tr>
<tr><td> 128 </td><td> 540 </td></tr>
<tr><td> 256 </td><td> 1237 </td></tr>
<tr><td> 512 </td><td> 2191 </td></tr>
<tr><td> 1024</td><td> 5670 </td></tr>
<tr><td> 2048</td><td> 12538 </td></tr>
<tr><td> 4096</td><td> 64937 </td></tr>
<tr><td> 8192</td><td> 138365 </td></tr>
<tr><td> 16384</td><td> 567760 </td></tr>
</table>
but then also ran into limits.
This time, the obstacle was system limits on connections to the store: the
16K single-pattern BGPs required that many connections, all active in parallel,
with twice that many active threads, one for each BGP and one for each
consequent JOIN operator.
Even reaching this limit required some reconfiguration: once to raise the page map limit to permit that many simultaneous
store connections, and once to raise the compiler's stack space to permit it to compile a query with
that degree of join nesting.
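The arithmetic behind those limits is easy to check; this is just the accounting stated above, one connection and one thread per BGP plus one thread per join, not a measurement.

\`\`\`python
def resources(n_bgps):
    """Resource demand for a query of n single-pattern BGPs joined pairwise."""
    connections = n_bgps              # one store connection per BGP
    joins = n_bgps - 1                # pairwise joins over n BGPs
    threads = n_bgps + joins          # one thread per BGP, one per JOIN
    return connections, threads

print(resources(16 * 1024))  # (16384, 32767): ~16K connections, ~32K threads
\`\`\`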
In any event, given these two limits (16K parallel BGPs and 1K statements per BGP), it seemed possible
to achieve rather large statement counts.
As indeed it was.
With a configuration option to limit the BGP statement pattern count to 256, the effective limits were
quite high:
<table style="text-align: right">
<tr><th>count </th><th> milliseconds </th><th> compile </th><th> count </th><th> match (net gc)</th></tr>
<tr><td> 2 </td><td>39 </td><td>30 </td><td>1 </td><td>0 </td></tr>
<tr><td> 4 </td><td> 109 </td><td> 97 </td><td> 0 </td><td> 0 </td></tr>
<tr><td> 8 </td><td> 114 </td><td> 108 </td><td> 1 </td><td> 1 </td></tr>
<tr><td> 16 </td><td> 144 </td><td> 136 </td><td> 1 </td><td> 1 </td></tr>
<tr><td> 32 </td><td> 231 </td><td> 218 </td><td> 8 </td><td> 0 </td></tr>
<tr><td> 64 </td><td> 432 </td><td> 412 </td><td> 6 </td><td> 0 </td></tr>
<tr><td> 128 </td><td> 41 </td><td> 32 </td><td> 20 </td><td> 0 </td></tr>
<tr><td> 256 </td><td> 47 </td><td> 32 </td><td> 23 </td><td> 1 </td></tr>
<tr><td> 512 </td><td> 59 </td><td> 41 </td><td> 45 </td><td> 0 </td></tr>
<tr><td> 1024 </td><td> 99 </td><td> 61 </td><td> 133 </td><td> 2 </td></tr>
<tr><td> 2048 </td><td> 155 </td><td> 70 </td><td> 419 </td><td> 2 </td></tr>
<tr><td> 4096 </td><td> 313 </td><td> 95 </td><td> 1636 </td><td> 2 </td></tr>
<tr><td> 8192 </td><td> 672 </td><td> 405 </td><td> 6645 </td><td> 73 </td></tr>
<tr><td> 16384 </td><td> 1965 </td><td> 216 </td><td> 43239 </td><td> 524 </td></tr>
<tr><td> 32768 </td><td> 8125 </td><td> 369 </td><td> 366529 </td><td> 25771 </td></tr>
<tr><td> 65536 </td><td> 7217 </td><td> 709 </td><td> 2077912 </td><td> 49093 </td></tr>
<tr><td> 131072 </td><td> 20522 </td><td> 2066 </td><td> 23031994 </td><td> 62938 </td></tr>
<tr><td> 262144 </td><td> 62569 </td><td> 4128 </td><td> 98516780 </td><td> 665346 </td></tr>
<tr><td>524288 </td><td> 291660 </td><td> 13022 </td><td> 1380142800 </td><td> 2563251 </td></tr>
</table>
<br />
The degenerate case of a single BGP with thousands of statement patterns turns out not to be
all that interesting.
On one hand, if it is implemented with value propagation,
only one thread is involved and, depending on the dataset statistics, it would likely execute
half of the statement patterns, one match at a time.
On the other, for larger sizes it fails early due to stack limits in the compiler.
The path to testing large pattern counts is to force the partitioning into joined subexpressions.
With that, even within the bounds set by
- stack space limits in the query processor run time, which constrain operator depth,
- memory mapping limits, which constrain the number of simultaneous connections to the store,
- stack space limits within the compiler, which limit both the BGP statement count and the operator depth,
- file descriptor limits, which constrain connections to the store, and
- eventual thread count limits, which constrain the total operator and BGP count,

it is possible to evaluate queries with 512K statement patterns.
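That partitioning strategy can be sketched in Python with an illustrative operator encoding: split the statement patterns into BGPs of at most 256 patterns, the configured limit above, and fold the chunks into a join tree, so that each BGP stays within the compiler's bounds and the BGP count stays within the connection and thread bounds.

\`\`\`python
MAX_BGP_SIZE = 256  # the configured per-BGP statement pattern limit

def partition_query(patterns):
    """Split patterns into BGPs of at most MAX_BGP_SIZE and join them."""
    chunks = [patterns[i:i + MAX_BGP_SIZE]
              for i in range(0, len(patterns), MAX_BGP_SIZE)]
    form = ("bgp", chunks[0])
    for chunk in chunks[1:]:
        form = ("join", ("bgp", chunk), form)
    return form, len(chunks)

# 512K statement patterns yield 2048 BGPs, well under the 16K parallel bound.
patterns = [("?s", "?p", "?o")] * (512 * 1024)
form, bgp_count = partition_query(patterns)
print(bgp_count)  # 2048
\`\`\`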
The [progression of elapsed times](https://docs.google.com/spreadsheets/d/1Gup83gVu4w72xqaJd7HE4Az-uacto-Or9oU7fuvcVo0/pubhtml?gid=626408870&single=true)
does indicate, however, that significant contention occurs
as the count increases,
which suggests the results would improve if the processor were to limit the number
of BGPs which it executes in parallel.
`