-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Consider query params when detecting duplicate queries #995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #995 +/- ##
==========================================
+ Coverage 84.29% 84.37% +0.08%
==========================================
Files 24 24
Lines 1318 1325 +7
Branches 178 181 +3
==========================================
+ Hits 1111 1118 +7
Misses 157 157
Partials 50 50
Continue to review full report at Codecov.
|
The point of this feature is to detect N + 1 problems, where the same query is made N times with N different parameters, but could be optimized with an appropriate select_related or prefetch_related. The current behavior is the intended behavior. |
Thanks for the feedback, @aaugustin! Alright, I understand that use case, but that's not the use case I was facing: the code I was optimising involved a tree of queries through some disparate code paths where some of the leaves overlap, so The duplicate detection by SQL query alone was not useful for this: it detects all queries for the same type as duplicates, even when they're from completely different places in the code, and are intended that way. What I was interested in was only the relatively small proportion of duplicate queries for the same objects: these are the code paths where a little bit of transient caching would avoid a lot of redundant database traffic. Adjusting the duplicate detection made a big difference in usability: it highlights exactly the 2 or 3 relevant code paths, versus having to comb through 10 or 20 detected duplicates that are unrelated. I would guess I'm not the only one who'd benefit from this view: is there a sensible way to provide for both use cases? I can think of the following options:
I'm leaning toward option 3: it shouldn't be too hard to implement. In the report, the wording that would make sense for me would be to report the current number (ignoring params) as "similar queries", and exact duplicates (including params) as "duplicate queries". So, a query that currently reports 20 duplicates may instead report "20 similar, 3 duplicates". Do you have thoughts or feedback on this? |
Yes, option 3 would be an improvement. |
@pjdelport could you please elaborate on option 3 ? |
@camilonova: Rather than making a choice between including or excluding the
|
(Also, I know this PR is old… I have to find time to pick it up again!) |
@pjdelport Please do, it looks like a great contribution to me. |
bed5578
to
cb9546d
Compare
The duplicate query detection needs this, because `params` is not always JSON-serialisable.
Alright, I finally got to needing this again! I rebased the branch, and updated the code to track and display both metrics. @aaugustin, @camilonova, how does this look to you? |
In terms of wording, this PR so far just keeps the existing "duplicates" wording, for ease of review, and adds "duplicates with params". However, we could maybe make this clearer and simpler to understand by adopting the "similar" versus "duplicate" wording proposed earlier? Current wording:
Proposed wording:
We could improve the developer accessibility of this further by adding
How does this sound? |
@pjdelport sounds great please do it. Thank you. |
@camilonova: Cool, added! As a side question, should the |
The idea sounds good, I don't have time to look at the code, go ahead if it works for you :-) |
Cool, thanks! |
Small follow-up bugfix: #1084 |
Currently, the SQL panel's duplicate query detection considers all queries with the same
raw_sql
to be duplicates, even when theirparams
are different. This leads to some confusing reports.This PR changes the key used to group duplicate queries to include the
params
too, so that queries with distinct params are counted as distinct.