Skip to content

Consider query params when detecting duplicate queries #995

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Sep 1, 2018

Conversation

PiDelport
Copy link
Contributor

Currently, the SQL panel's duplicate query detection considers all queries with the same raw_sql to be duplicates, even when their params are different. This leads to some confusing reports.

This PR changes the key used to group duplicate queries to include the params too, so that queries with distinct params are counted as distinct.

@codecov
Copy link

codecov bot commented Sep 26, 2017

Codecov Report

Merging #995 into master will increase coverage by 0.08%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #995      +/-   ##
==========================================
+ Coverage   84.29%   84.37%   +0.08%     
==========================================
  Files          24       24              
  Lines        1318     1325       +7     
  Branches      178      181       +3     
==========================================
+ Hits         1111     1118       +7     
  Misses        157      157              
  Partials       50       50
Impacted Files Coverage Δ
debug_toolbar/panels/sql/tracking.py 83.83% <ø> (ø) ⬆️
debug_toolbar/panels/sql/panel.py 66.45% <100%> (+1.52%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 72d2d43...5026736. Read the comment docs.

@aaugustin
Copy link
Contributor

The point of this feature is to detect N + 1 problems, where the same query is made N times with N different parameters, but could be optimized with an appropriate select_related or prefetch_related. The current behavior is the intended behavior.

@PiDelport
Copy link
Contributor Author

Thanks for the feedback, @aaugustin!

Alright, I understand that use case, but that's not the use case I was facing: the code I was optimising involved a tree of queries through some disparate code paths where some of the leaves overlap, so select_related / prefetch_related was not the applicable optimisation.

The duplicate detection by SQL query alone was not useful for this: it detects all queries for the same type as duplicates, even when they're from completely different places in the code, and are intended that way. What I was interested in was only the relatively small proportion of duplicate queries for the same objects: these are the code paths where a little bit of transient caching would avoid a lot of redundant database traffic. Adjusting the duplicate detection made a big difference in usability: it highlights exactly the 2 or 3 relevant code paths, versus having to comb through 10 or 20 detected duplicates that are unrelated.

I would guess I'm not the only one who'd benefit from this view: is there a sensible way to provide for both use cases? I can think of the following options:

  1. Allow some kind of setting to control this behaviour. (Inconvenient, not very discoverable.)
  2. Allow an arbitrary developer-specified duplicate_key function. (Probably overkill, and too exposing of internals?)
  3. Track and report both by default. (Most developer-friendly?)

I'm leaning toward option 3: it shouldn't be too hard to implement.

In the report, the wording that would make sense for me would be to report the current number (ignoring params) as "similar queries", and exact duplicates (including params) as "duplicate queries". So, a query that currently reports 20 duplicates may instead report "20 similar, 3 duplicates".

Do you have thoughts or feedback on this?

@aaugustin
Copy link
Contributor

Yes, option 3 would be an improvement.

@camilonova
Copy link
Contributor

@pjdelport could you please elaborate on option 3 ?

@PiDelport
Copy link
Contributor Author

@camilonova: Rather than making a choice between including or excluding the params from the query uniqueness, Option 3 would be to simply track and report both metrics:

  1. Duplicate queries ignoring params (what's implemented, for helping with e.g. N+1 problems )
  2. Duplicate queries including params (the metric I was interested in, for helping with e.g. caching problems)

@PiDelport
Copy link
Contributor Author

(Also, I know this PR is old… I have to find time to pick it up again!)

@camilonova
Copy link
Contributor

@pjdelport Please do, it looks like a great contribution to me.

@PiDelport PiDelport force-pushed the duplicate-queries-consider-params branch from bed5578 to cb9546d Compare August 29, 2018 11:01
@PiDelport
Copy link
Contributor Author

Alright, I finally got to needing this again!

I rebased the branch, and updated the code to track and display both metrics.

@aaugustin, @camilonova, how does this look to you?

@PiDelport
Copy link
Contributor Author

PiDelport commented Aug 29, 2018

In terms of wording, this PR so far just keeps the existing "duplicates" wording, for ease of review, and adds "duplicates with params".

However, we could maybe make this clearer and simpler to understand by adopting the "similar" versus "duplicate" wording proposed earlier?

Current wording:

  • Per DB alias: "(29 queries including 20 duplicates and 4 duplicates with params )"
  • Per query: "Duplicated 5 times. Duplicated 2 times with params."

Proposed wording:

  • Per DB alias: "(29 queries including 20 similar and 4 duplicates )"
  • Per query "5 similar queries. Duplicated 2 times."

We could improve the developer accessibility of this further by adding <abbr> tooltips with explanations:

  • "Similar queries are queries with the same SQL, but potentially different parameters."
  • "Duplicate queries are identical to each other: they execute exactly the same SQL and parameters."

How does this sound?

@camilonova
Copy link
Contributor

@pjdelport sounds great please do it. Thank you.

@PiDelport
Copy link
Contributor Author

@camilonova: Cool, added!

As a side question, should the po translation files be re-generated as part of this PR, or is that a separate process?

@aaugustin
Copy link
Contributor

The idea sounds good, I don't have time to look at the code, go ahead if it works for you :-)

@camilonova camilonova merged commit 844353e into master Sep 1, 2018
@camilonova camilonova deleted the duplicate-queries-consider-params branch September 1, 2018 20:50
@PiDelport
Copy link
Contributor Author

Cool, thanks!

@PiDelport
Copy link
Contributor Author

Small follow-up bugfix: #1084

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants