Consider query params when detecting duplicate queries #995

PiDelport · 2017-09-26T14:46:34Z

Currently, the SQL panel's duplicate query detection considers all queries with the same raw_sql to be duplicates, even when their params are different. This leads to some confusing reports.

This PR changes the key used to group duplicate queries to include the params too, so that queries with distinct params are counted as distinct.
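A minimal sketch of the change in grouping key, assuming each recorded query is a dict with `raw_sql` and `params` entries. The function names `sql_only_key` and `sql_and_params_key` are hypothetical and are not taken from the toolbar's code:

```python
from collections import Counter

# Hypothetical recorded queries; the real panel records more fields.
queries = [
    {"raw_sql": "SELECT * FROM book WHERE id = %s", "params": (1,)},
    {"raw_sql": "SELECT * FROM book WHERE id = %s", "params": (2,)},
    {"raw_sql": "SELECT * FROM book WHERE id = %s", "params": (1,)},
]


def sql_only_key(query):
    # Existing behaviour: group on the SQL text alone.
    return query["raw_sql"]


def sql_and_params_key(query):
    # Proposed behaviour: queries with distinct params form distinct groups.
    return (query["raw_sql"], query["params"])


by_sql = Counter(sql_only_key(q) for q in queries)
by_sql_and_params = Counter(sql_and_params_key(q) for q in queries)

print(max(by_sql.values()))             # 3
print(max(by_sql_and_params.values()))  # 2
```

With the SQL-only key all three queries land in one group; with params included, only the two `id = 1` queries are grouped together.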

codecov · 2017-09-26T14:46:40Z

Codecov Report

Merging #995 into master will increase coverage by 0.08%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #995      +/-   ##
==========================================
+ Coverage   84.29%   84.37%   +0.08%     
==========================================
  Files          24       24              
  Lines        1318     1325       +7     
  Branches      178      181       +3     
==========================================
+ Hits         1111     1118       +7     
  Misses        157      157              
  Partials       50       50

Impacted Files	Coverage Δ
debug_toolbar/panels/sql/tracking.py	`83.83% <ø> (ø)`	⬆️
debug_toolbar/panels/sql/panel.py	`66.45% <100%> (+1.52%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 72d2d43...5026736.

aaugustin · 2017-09-26T15:56:24Z

The point of this feature is to detect N + 1 problems, where the same query is made N times with N different parameters, but could be optimized with an appropriate select_related or prefetch_related. The current behavior is the intended behavior.
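To make the N + 1 pattern concrete, here is a small sketch using the standard-library `sqlite3` module rather than the Django ORM. The `author` and `book` tables and the `run` helper are hypothetical, and the final JOIN stands in for what `select_related` would do in Django:

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO book VALUES (1, 'A', 1), (2, 'B', 2), (3, 'C', 3);
    """
)

log = []  # (sql, params) pairs, recorded in execution order


def run(sql, params=()):
    # Record every query before executing it, like a tiny SQL panel.
    log.append((sql, params))
    return conn.execute(sql, params).fetchall()


# N + 1 pattern: one query for the books, then one author query per book.
books = run("SELECT id, title, author_id FROM book")
for _id, _title, author_id in books:
    run("SELECT name FROM author WHERE id = ?", (author_id,))

per_sql = Counter(sql for sql, _params in log)
per_sql_and_params = Counter(log)

# Same SQL three times, but no two executions share the same params.
print(per_sql["SELECT name FROM author WHERE id = ?"])  # 3
print(max(per_sql_and_params.values()))                 # 1

# A single JOIN replaces the N per-row lookups.
joined = conn.execute(
    "SELECT book.title, author.name FROM book "
    "JOIN author ON author.id = book.author_id ORDER BY book.id"
).fetchall()
```

Grouping by SQL alone surfaces the repeated author lookup, while grouping by SQL plus params would not flag it at all, which is why the params-insensitive count matters for this use case.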

PiDelport · 2017-09-28T10:57:22Z

Thanks for the feedback, @aaugustin!

Alright, I understand that use case, but that's not the use case I was facing: the code I was optimising involved a tree of queries through some disparate code paths where some of the leaves overlap, so select_related / prefetch_related was not the applicable optimisation.

The duplicate detection by SQL query alone was not useful for this: it flags all queries of the same type as duplicates, even when they come from completely different places in the code and are intended that way. What I was interested in was only the relatively small proportion of duplicate queries for the same objects: these are the code paths where a little bit of transient caching would avoid a lot of redundant database traffic. Adjusting the duplicate detection made a big difference in usability: it highlights exactly the 2 or 3 relevant code paths, versus having to comb through 10 or 20 detected duplicates that are unrelated.

I would guess I'm not the only one who'd benefit from this view: is there a sensible way to provide for both use cases? I can think of the following options:

1. Allow some kind of setting to control this behaviour. (Inconvenient, not very discoverable.)
2. Allow an arbitrary developer-specified duplicate_key function. (Probably overkill, and too exposing of internals?)
3. Track and report both by default. (Most developer-friendly?)

I'm leaning toward option 3: it shouldn't be too hard to implement.

In the report, the wording that would make sense for me would be to report the current number (ignoring params) as "similar queries", and exact duplicates (including params) as "duplicate queries". So, a query that currently reports 20 duplicates may instead report "20 similar, 3 duplicates".

Do you have thoughts or feedback on this?

aaugustin · 2017-09-28T13:13:08Z

Yes, option 3 would be an improvement.

camilonova · 2018-07-21T15:11:49Z

@pjdelport could you please elaborate on option 3?

PiDelport · 2018-07-24T14:07:47Z

@camilonova: Rather than making a choice between including or excluding the params from the query uniqueness, Option 3 would be to simply track and report both metrics:

- Duplicate queries ignoring params (what's implemented, for helping with e.g. N+1 problems)
- Duplicate queries including params (the metric I was interested in, for helping with e.g. caching problems)
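Tracking both metrics side by side could look roughly like the following sketch. It uses the "similar" / "duplicate" wording suggested earlier in the thread. The `(raw_sql, params)` pair layout and the `repeated_total` helper are assumptions for illustration, as is the convention that only groups seen more than once count towards the totals:

```python
from collections import Counter

# Hypothetical recorded queries as (raw_sql, params) pairs.
queries = [
    ("SELECT * FROM book WHERE id = %s", (1,)),
    ("SELECT * FROM book WHERE id = %s", (2,)),
    ("SELECT * FROM book WHERE id = %s", (1,)),
    ("SELECT * FROM author", ()),
]

similar = Counter(sql for sql, _params in queries)  # ignoring params
duplicate = Counter(queries)                        # including params


def repeated_total(counter):
    # Assumed convention: only groups seen more than once count as repeats.
    return sum(count for count in counter.values() if count > 1)


summary = "{} queries including {} similar and {} duplicates".format(
    len(queries), repeated_total(similar), repeated_total(duplicate)
)
print(summary)  # 4 queries including 3 similar and 2 duplicates
```

Keeping two counters means neither use case has to be chosen over the other: the first number points at N+1 candidates, the second at candidates for caching.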

PiDelport · 2018-07-24T14:08:42Z

(Also, I know this PR is old… I have to find time to pick it up again!)

camilonova · 2018-07-25T14:32:46Z

@pjdelport Please do, it looks like a great contribution to me.

PiDelport · 2018-08-29T11:48:50Z

Alright, I finally got to needing this again!

I rebased the branch, and updated the code to track and display both metrics.

@aaugustin, @camilonova, how does this look to you?

PiDelport · 2018-08-29T12:20:31Z

In terms of wording, this PR so far just keeps the existing "duplicates" wording, for ease of review, and adds "duplicates with params".

However, we could maybe make this clearer and simpler to understand by adopting the "similar" versus "duplicate" wording proposed earlier?

Current wording:

- Per DB alias: "(29 queries including 20 duplicates and 4 duplicates with params)"
- Per query: "Duplicated 5 times. Duplicated 2 times with params."

Proposed wording:

- Per DB alias: "(29 queries including 20 similar and 4 duplicates)"
- Per query: "5 similar queries. Duplicated 2 times."

We could improve the developer accessibility of this further by adding <abbr> tooltips with explanations:

- "Similar queries are queries with the same SQL, but potentially different parameters."
- "Duplicate queries are identical to each other: they execute exactly the same SQL and parameters."

How does this sound?

camilonova · 2018-08-29T15:40:44Z

@pjdelport sounds great, please do it. Thank you.

PiDelport · 2018-08-31T12:48:57Z

@camilonova: Cool, added!

As a side question, should the po translation files be re-generated as part of this PR, or is that a separate process?

aaugustin · 2018-08-31T15:49:58Z

The idea sounds good, I don't have time to look at the code, go ahead if it works for you :-)

PiDelport · 2018-09-02T09:14:36Z

Cool, thanks!

PiDelport · 2018-09-03T11:13:36Z

Small follow-up bugfix: #1084

PiDelport added 2 commits August 29, 2018 12:26

(Factor out key function for duplicate queries)

2593bf5

Consider the query params when detecting duplicate queries

cb9546d

PiDelport force-pushed the duplicate-queries-consider-params branch from bed5578 to cb9546d Compare August 29, 2018 11:01

PiDelport added the Improvement label Aug 29, 2018

PiDelport added 4 commits August 29, 2018 13:10

SQL query recording: Track the raw_params too

a0cc2b5

The duplicate query detection needs this, because `params` is not always JSON-serialisable.

(Rename variable to avoid shadowing)

9e38b6a

SQLPanel: Track duplicated queries both with and without params

054d55f

SQL panel template: Include counts of duplicates with params too

4653452

PiDelport added 3 commits August 29, 2018 15:55

Convert raw_params lists to hashable tuples

3f57676

Wording change: duplicate -> similar, for queries with differing params

46e8a33

Wording: "duplicate with params" -> duplicate, for identical queries

ddb5fc9

SQL panel: Add explanatory tooltip text to similar / duplicate query counts

5026736

camilonova merged commit 844353e into master Sep 1, 2018

camilonova deleted the duplicate-queries-consider-params branch September 1, 2018 20:50

PiDelport mentioned this pull request Sep 3, 2018

duplicate_key: Handle raw_params being None sometimes #1084

Merged
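Two of the commit titles above, converting `raw_params` lists to hashable tuples and handling `raw_params` being `None`, can be illustrated with a short sketch. The name `duplicate_key` comes from the follow-up PR title, but the signature, the `to_hashable` helper, and the choice to treat `None` like an empty parameter list are assumptions, not the toolbar's actual code. Only list and tuple parameters are handled here:

```python
def to_hashable(value):
    # Recursively turn lists/tuples into tuples so the result can be a dict key.
    if isinstance(value, (list, tuple)):
        return tuple(to_hashable(item) for item in value)
    return value


def duplicate_key(raw_sql, raw_params):
    # Assumption: missing params (None) are treated like an empty sequence.
    params = () if raw_params is None else to_hashable(raw_params)
    return (raw_sql, params)


counts = {}
for sql, params in [
    ("SELECT 1", None),
    ("SELECT %s", [1, [2, 3]]),
    ("SELECT %s", [1, [2, 3]]),
]:
    key = duplicate_key(sql, params)
    counts[key] = counts.get(key, 0) + 1

print(counts[("SELECT %s", (1, (2, 3)))])  # 2
```

Without the conversion, a list-valued `raw_params` would raise `TypeError: unhashable type: 'list'` as soon as the key is used in a dict or `Counter`.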
