
Make category search non case-sensitive and more user friendly #3179


Open
misaochan opened this issue Oct 17, 2019 · 68 comments · Fixed by #3326

Comments

@misaochan
Member

misaochan commented Oct 17, 2019

Received a report from a 2.11 user on our FB page that category search is case-sensitive for her, which means that sometimes she'll type the right category in the search field but nothing will show up.

AFAIK the MW API that we use is inherently case-sensitive, but the upload wizard seems to be able to find a way around that and produces the same category suggestions regardless of case.


Edit: Apart from the case sensitivity, the allcategories API also has a problem of doing a prefix match. This does not give a great UX. We should explore ways to fix this too.

@nicolas-raoul
Member

Maybe if we convert everything to lowercase then the server performs a non-case-sensitive search? That's just a hypothesis, I have not tried it.

@misaochan
Member Author

@nicolas-raoul Possible! We'll try it out with a direct query first.

@ankit-kumar-dwivedi

Can I take this issue?

@misaochan
Member Author

@ankit-kumar-dwivedi please feel free!

@kbhardwaj123
Contributor

@misaochan Is this issue free to be worked on? If so, can I take it?

@ankit-kumar-dwivedi

ankit-kumar-dwivedi commented Jan 12, 2020 via email

@kbhardwaj123
Contributor

Thank you:)

maskaravivek pushed a commit that referenced this issue Jan 29, 2020
* CategoryClient: fix category search case-sensitivity by converting to lower case as MW api is inherently case-sensitive, the results obtained will be same

* CategoryItem: reverting javadoc changes

* CategoriesModel: make category search case-insensitive

* CategoryItem: fix whitespaces

* Add tests for case-insensitivity

* CategoryClientTest: add more test cases

* CategoryClientTest: fix travis ci test

* CategoriesModelTest: changes made to CategoriesModel and tested
@sivaraam
Member

sivaraam commented Mar 28, 2020

I'm re-opening this as I believe there's a problem with how this issue was fixed.

@kbhardwaj123 Can you clarify a doubt that I have regarding your PR #3326? In the description you say:

Tested from the MW api fuzzy search url that the category suggestions would deliver the desired results no matter what case you sent in the api call so to fix the issue api call has been converted to lower case.

Are you sure the API really doesn't care about the case of the category name given to it? I'm doubtful about that for several reasons. Here are a couple:

  1. The logical one: If the API really doesn't care about the case of the search text sent to it, this issue shouldn't exist to begin with. Right? IOW, if the API is returning us all categories that match a search text despite the case in which we send the query, then there's no point in just lower-casing the search text we send to the API. Got my point? But the mere existence of this issue indicates otherwise. Correct me if I'm missing something.
  2. The practical one: I just checked with a couple of API calls and I get different results based on the case of the search text I send to the query. Here are a couple of queries which return different results despite only the case of the search text differing:

In case you're wondering why the test case didn't fail. Here's the catch:

The page title is case-sensitive except the first character.

From Manual:Page title - MediaWiki

I think the quote speaks for itself. I'll share the actual problem w.r.t. the app in the next comment.

@sivaraam sivaraam reopened this Mar 28, 2020
@kbhardwaj123
Contributor

kbhardwaj123 commented Mar 28, 2020

@sivaraam While I was working on this, I went with what @nicolas-raoul suggested: I ensured that all the category strings passed to the OkHttpClient are converted to lower case, and I wrote new tests for that, which worked fine. I guess I must have missed something; I will take a look at it again.

@sivaraam
Member

sivaraam commented Mar 28, 2020

Ok. Here's the issue with respect to the app: category search doesn't return any categories whose prefix contains an upper-case character (other than the first one, of course). See #3582 for proof.

In case the issue is not clear to you from #3582, here's another example.

Here's what I get when I search for categories with "COVID" (mind the case) in the app (version: 2.12.3.629~a63a358):
Screenshot_2020-03-28-21-27-39

Now, consider the linked example query, which returns 25 categories that have "COVID" as their prefix. Here are the categories that the query returns:

Category:COVID-19 guidelines in Brazil
Category:COVID-19 guidelines in Argentina
Category:COVID-19 guidelines in Albania
Category:COVID-19
Category:COVID-19 guidelines by country
Category:COVID-19 guidelines in Czechia
Category:COVID-19 guidelines
Category:COVID-19 guidelines in Denmark
Category:COVID-19 guidelines in Esperanto
Category:COVID-19 Clinical Cohort Research Conference, March 18, 2019, National Medical Center, Republic of Korea
Category:COVID-19 coronavirus
Category:COVID-19 guidelines by language
Category:COVID-19 guidelines in Arabic
Category:COVID-19 guidelines in English
Category:COVID-19 guidelines in Basque
Category:COVID-19 guidelines in China
Category:COVID-19 guidelines in Estonian
Category:COVID-19 guidelines in Bengali
Category:COVID-19 guidelines in Bangladesh
Category:COVID-19 guideline cartoons by Anika Nawar Eeha and Abdullah Al Mamun in Bengali
Category:COVID-19 guidelines in Bengali by Anika Nawar Eeha and Abdullah Al Mamun
Category:COVID-19 guidelines in Catalan
Category:COVID-19 guidelines in East Timor
Category:COVID-19 DIY
Category:COVID-19 guidelines in Breton

As you can see, none of the above categories are shown in the category suggestions.
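
The failure mode described above can be reproduced offline. The following is a minimal sketch, assuming the server upper-cases only the first character of the prefix and then does a case-sensitive prefix match (as the quoted MediaWiki manual suggests). The helper names `normalize_title` and `prefix_search` are hypothetical and are not part of the app or of MediaWiki.

```python
def normalize_title(text: str) -> str:
    """Upper-case only the first character; the rest stays case-sensitive."""
    return text[:1].upper() + text[1:]


def prefix_search(categories: list[str], query: str) -> list[str]:
    """Case-sensitive prefix match against the normalized query (assumed server behavior)."""
    prefix = normalize_title(query)
    return [name for name in categories if name.startswith(prefix)]


# Three of the category names listed above, without the "Category:" prefix.
SAMPLE = ["COVID-19", "COVID-19 DIY", "COVID-19 guidelines"]

as_typed = prefix_search(SAMPLE, "COVID")
lower_cased = prefix_search(SAMPLE, "COVID".lower())

print(as_typed)     # all three names
print(lower_cased)  # empty list: "Covid" is not a prefix of "COVID-19"
```

Lower-casing the query before sending it therefore turns "COVID" into "Covid", which no longer matches any of the categories above.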

@sivaraam
Member

sivaraam commented Mar 29, 2020

@misaochan Given that we've now accidentally reduced the category search space rather than increasing it, you might want to ensure we fix this before releasing the next version.

@misaochan misaochan mentioned this issue Mar 29, 2020
30 tasks
@misaochan
Member Author

Added to the release list, thanks for the heads up!

@misaochan
Member Author

Hi @kbhardwaj123 , are you currently still working on this? Please do keep us updated, thanks!

@kbhardwaj123
Contributor

kbhardwaj123 commented Mar 30, 2020

@misaochan sure I'm on it, will update ASAP

@misaochan
Member Author

misaochan commented Mar 31, 2020

Thanks @kbhardwaj123 ! As we are planning to include this in v2.13, when you submit your PR could you please rebase and submit it on the 2.13-release branch?

@kbhardwaj123
Contributor

kbhardwaj123 commented Apr 1, 2020

I investigated the problem and here are my findings.

  • Thanks @sivaraam for such in-depth details. You are right, the API is indeed case-sensitive.
  • We have two search options at our disposal: we can search by prefix, or we can use the normal search (which also matches in the middle of a name and is case-insensitive).

So suppose I want to find the category Temple of Ishtar at Mari by entering temple of ishtar.
These are the results:

  • using generator=allcategories in the URL: API result, which apparently is a prefix search and is case-sensitive
  • using generator=search in the URL: API result, which is case-insensitive and gives us the required Temple of Ishtar at Mari category.

Now, on reading the logs, I realized that the method searchAll() in CategoriesModel was calling the prefix search, and that is where the problem lies. When I fix that by calling both the prefix and search APIs and combining the results, we finally get a case-insensitive search.
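
For illustration, the two kinds of request compared above could be built and combined roughly as follows. This is a sketch under assumptions: the parameter names (`gacprefix`, `gaclimit`, `gsrsearch`, `gsrnamespace`, `gsrlimit`) are the standard MediaWiki generator parameters, but the exact parameters the app sends are not shown in this thread, and the function names are hypothetical.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"
CATEGORY_NAMESPACE = 14  # MediaWiki namespace number for "Category:"


def prefix_query_url(text: str, limit: int = 25) -> str:
    """URL for the case-sensitive prefix search (generator=allcategories)."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "allcategories",
        "gacprefix": text,
        "gaclimit": limit,
    }
    return f"{API}?{urlencode(params)}"


def fulltext_query_url(text: str, limit: int = 25) -> str:
    """URL for the case-insensitive search (generator=search), limited to categories."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "search",
        "gsrsearch": text,
        "gsrnamespace": CATEGORY_NAMESPACE,
        "gsrlimit": limit,
    }
    return f"{API}?{urlencode(params)}"


def merge_titles(prefix_titles: list[str], search_titles: list[str]) -> list[str]:
    """Combine both result lists, keeping the first occurrence of each name in order."""
    return list(dict.fromkeys([*prefix_titles, *search_titles]))


print(prefix_query_url("temple of ishtar"))
print(fulltext_query_url("temple of ishtar"))
```

The sketch only builds URLs and merges already-parsed title lists; it makes no network calls.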

But there's a catch
We are using the beta flavor of the APIs which give the following results

  • prefix result API Link which doesn't give the required result category (this was expected)
  • search result API result, now here we expected Temple of Ishtar at Mari right? but it seems like the beta flavor of the API is unable to give the required result

Possible Solution
AFAIK there are two ways

  • (efficient) Use the production version of the Commons API, which is capable of delivering case-insensitive results by itself
  • Or make multiple API calls to the existing API by manipulating the search string

@kbhardwaj123
Contributor

@misaochan @sivaraam @nicolas-raoul @maskaravivek I need your opinions on my investigation so that I can fix this for v2.13. In particular, are we going to use the production flavor of the APIs in v2.13?

@sivaraam
Member

sivaraam commented Apr 1, 2020

@kbhardwaj123 Thanks for the analysis. I'll look into it and share my comments soon. I have a quick doubt about one particular thing:

We are using the beta flavor of the APIs which give the following results

What do you mean by beta flavor of API? Do you mean the API hosted in the beta server (https://commons.wikimedia.beta.wmflabs.org/w/api.php) as opposed to the production server (https://commons.wikimedia.org/w/api.php)?

@nicolas-raoul
Member

For category search (and really any testing that does not involve actually uploading), please use the prodDebug flavor of the app. The beta server is unusable for most testing.

@kbhardwaj123
Contributor

@sivaraam Yes, that's exactly what I meant: the beta server https://commons.wikimedia.beta.wmflabs.org/w/api.php has the API, but it is unable to give the required result, whereas the production API https://commons.wikimedia.org/w/api.php gives the expected result, as shown by the links in my previous comment.
@nicolas-raoul does prodDebug flavor use https://commons.wikimedia.org/w/api.php APIs ?

@sivaraam
Member

sivaraam commented May 13, 2024

To continue the discussion from #5712, the only idea I could think of to improve our category search such that it behaves in a case-insensitive and fuzzy way is to possibly consider augmenting the results of the allcategories API with that of the API that the Special:UploadWizard uses (I suppose it is API:Opensearch as per @mnalis's finding).

This has the caveat that we would be starting to get hidden categories in our result again as API:opensearch does not know about hidden categories. Is that a fine trade off?
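
To make the trade-off concrete, here is a rough sketch of what augmenting with API:Opensearch could look like. It assumes the standard opensearch response shape (a JSON array of query, titles, descriptions, URLs). The sample payload is hand-written for illustration and is not a recorded API response, and the function names are hypothetical.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"
PREFIX = "Category:"


def opensearch_url(text: str, limit: int = 10) -> str:
    """URL for an opensearch request restricted to the category namespace (14)."""
    params = {
        "action": "opensearch",
        "format": "json",
        "namespace": 14,
        "search": text,
        "limit": limit,
    }
    return f"{API}?{urlencode(params)}"


def titles_from_opensearch(payload: list) -> list[str]:
    """Extract category names from [query, titles, descriptions, urls]."""
    titles = payload[1] if len(payload) > 1 else []
    return [t[len(PREFIX):] if t.startswith(PREFIX) else t for t in titles]


def augment(primary: list[str], extra: list[str]) -> tuple[list[str], list[str]]:
    """Append names found only by opensearch; return (merged, opensearch_only)."""
    seen = set(primary)
    only_extra = [name for name in extra if name not in seen]
    return primary + only_extra, only_extra


# Hand-written sample payload, for illustration only.
sample = ["temple of ishtar", ["Category:Temple of Ishtar at Mari"], [""], [""]]
merged, unfiltered = augment([], titles_from_opensearch(sample))
print(merged)
print(unfiltered)
```

Everything in the second list returned by `augment` came only from opensearch, so it is exactly the part of the result that may contain hidden categories.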

@nicolas-raoul
Member

Sounds like a negative tradeoff to me. 🤔

@sivaraam
Member

sivaraam commented May 13, 2024

Yeah. That's kind of the problem. We don't have an API as far as I know that gives us the best of both worlds 🙁

I'll be more than glad to be proved wrong.

@nicolas-raoul
Member

@mnalis Does https://commons.wikimedia.org/wiki/Special:UploadWizard give you the best of both worlds, or not? If yes we should investigate further their API calls and processing. :-)

@mnalis
Contributor

mnalis commented May 14, 2024

(@nicolas-raoul notice: unfortunately I have my hands full this week, but I'll try to read up and make a comparison table of current app API vs Special:UploadWizard with examples and make some suggestion next week)

@mnalis
Contributor

mnalis commented Jun 2, 2024

Sorry it took some time, in the end I found it worth writing a script to generate it... Some notes:

Text "no" or "match" means whether it matched the "Expected Result" in that API call for "Search Term". Its HREF is API call itself, and if you hover over "match" it will show what string(s) it actually matched. ✔️ means that is what we want, and a ❌ that we didn't want that match (note that those differ from "no/matched", for e.g. hidden or redirected categories where we don't want to find a match)

I've increased the search space to 90 on some searches (from their default 10/30 matches); otherwise the results do not show how good the API is at matching, but instead how many other results it returns, and they may miss the one we search for because it's in place 60 or so. I did the same for Special:UW (the regular Special:UploadWizard has a smaller limit).

TL;DR: the more green checkboxes API has, the better it is.

commons-app code seems to contain 4 different APIs, although I don't know which ones are used? Those are labeled app1, app2, app3, app4. I've added Category: before search in app3, otherwise it never found anything.

Let me know if I should add/remove search terms or APIs (like are those app3 / app4 really used?) from the list to make it more readable.
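
The cell labels described in the notes above can be expressed as a small function. This is only a sketch of the labelling scheme, not the actual script: it assumes a "match" means the expected name occurs as a substring of at least one returned name, and the name `classify` and the dated category used in the demo are hypothetical.

```python
def classify(expected: str, returned: list[str], want_match: bool) -> tuple[str, list[str]]:
    """Return the table cell label and the names that matched (assumed: substring match)."""
    hits = [name for name in returned if expected in name]
    verdict = "match" if hits else "no"
    mark = "✔️" if bool(hits) == want_match else "❌"
    return f"{mark} {verdict}", hits


# A wanted category that the API returned.
print(classify("Temple of Ishtar at Mari", ["Temple of Ishtar at Mari"], want_match=True))
# A hidden category that we would prefer not to see, but that was returned.
print(classify("Photographs taken on", ["Photographs taken on 2024-06-02"], want_match=False))
# A wanted category that the API did not return.
print(classify("Nature of Skole Raion", [], want_match=True))
```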

@mnalis

This comment was marked as outdated.

@sivaraam
Member

sivaraam commented Jun 3, 2024

That's a rather exhaustive analysis. Thank you for your efforts on this, @mnalis !

like are those app3 / app4 really used?

Regarding app3 and app4: they are the APIs that query categorymembers and categories respectively. They are used by the app, though not for searching categories but for other purposes. Based on the purpose of those APIs, I believe:

  • app3 (categorymembers API) is used to list pages which are members of a given category. This should be used in the screen you see when clicking on the name of the category on an image (the "Subcategories" tab). Strangely, it doesn't seem to be showing any results now.
  • app4 (categories module API) is used to identify the list of categories associated with a given page. It would be used to show the categories associated with a given image in the media details screen.

So, I think neither of these serves the category search use case, and we can leave them out of consideration.

Overall, I think your results confirm that the allcategories API performs a case-sensitive prefix search, while the search API behaves case-insensitively but has trouble filtering out hidden categories.

Let me know if I should add/remove search terms or APIs

Does your test set include a category for which a category page does not yet exist? 🤔
That might be a good addition to the already exhaustive list.

@mnalis
Contributor

mnalis commented Jun 3, 2024

Does your test set include a category for which a category page does not yet exist? 🤔
That might be a good addition to the already exhaustive list.

Yes, Category:"Rose" factory from #3179 (comment) does not have category page.

What is additionally interesting about it (perhaps because of `"` chars in the name?) it is only matched by `app2` (allcategories) and no other search methods, **even when exact string with correct case** is supplied.

So I've now also added simpler Category:Nature of Skole Raion test case, but it also happens there.

Update: Ah, I see, that is because only the "allcategories" search actually matches categories without a Category:* page...

In next message is new version (make category name clickable with hover-over tooltip about the use case, removed app3/app4 and gsr_intitle / HotCat which didn't seem to bring advantages to the table, updated test cases). Hopefully it makes the table more readable.

@mnalis
Contributor

mnalis commented Jun 3, 2024

| Expected Result | Search Term | Special:UW | app1 | app2 |
|---|---|---|---|---|
| Alojzije Stepinac | Alojzije Stepinac | ✔️ match | ✔️ match | ✔️ match |
| Alojzije Stepinac | Alojzije Step | ✔️ match | no | ✔️ match |
| Alojzije Stepinac | alojzije Step | ✔️ match | no | ✔️ match |
| Alojzije Stepinac | alojzije step | ✔️ match | no | no |
| Alojzije Stepinac | alojzije stepinac | ✔️ match | ✔️ match | no |
| Franjo Tuđman | franjo | ✔️ match | ✔️ match | ✔️ match |
| Franjo Tuđman | Franjo Tu | ✔️ match | no | ✔️ match |
| Franjo Tuđman | franjo Tu | ✔️ match | no | ✔️ match |
| Franjo Tuđman | franjo tu | ✔️ match | no | no |
| Franjo Tuđman | Tuđman | no | ✔️ match | no |
| Tuđman (surname) | Tuđman | ✔️ match | ✔️ match | ✔️ match |
| Holsteiner Ufer (Berlin-Hansaviertel) | Hansaviertel | no | ✔️ match | no |
| Olea (ship, 1981) | Olea | no | ✔️ match | ✔️ match |
| Olea (ship, 1981) | olea | no | ✔️ match | ✔️ match |
| Olea (ship, 1981) | Olea ( | ✔️ match | ✔️ match | ✔️ match |
| Olea (ship, 1981) | Olea (ship | ✔️ match | ✔️ match | ✔️ match |
| Olea (ship, 1981) | Olea ship | no | ✔️ match | no |
| Archeology in Serbia | Archeology in Serbia | match | match | match |
| Archaeology in Serbia | Archaeology in Serbia | ✔️ match | ✔️ match | ✔️ match |
| Archaeology in Serbia | archaeology in serbia | ✔️ match | ✔️ match | no |
| Archaeology in Serbia | Archaeology Serbia | no | ✔️ match | no |
| Archaeology in Serbia | archaeology serbia | no | ✔️ match | no |
| Photographs taken on | Photographs taken | match | match | ✔️ no |
| Photographs taken on | Photographs ta | match | ✔️ no | ✔️ no |
| Photographs taken on | taken | ✔️ no | match | ✔️ no |
| Requested moves | move | ✔️ no | match | ✔️ no |
| Flapjacks | flapja | ✔️ match | no | ✔️ match |
| Azabujuban summer festival | Azabujuban | match | match | match |
| Temple of Ishtar at Mari | temple of ishtar | ✔️ match | ✔️ match | no |
| Temple of Ishtar at Mari | Ishtar at Mari | no | ✔️ match | no |
| Temple of Ishtar at Mari | ishtar at mari | no | ✔️ match | no |
| Nature of Skole Raion | Nature of Skole Raion | no | no | ✔️ match |
| Nature of Skole Raion | nature of skole raion | no | no | no |
| Nature of Skole Raion | Nature of Skole R | no | no | ✔️ match |
| Nature of Skole Raion | nature of skole r | no | no | no |
| "Rose" factory | "Rose" factory | no | no | ✔️ match |
| "Rose" factory | "rose" factory | no | no | no |
| "Rose" factory | Rose factory | no | no | no |
| "Rose" factory | rose factory | no | no | no |
| "Home Alone" house | "Home Alone" house | ✔️ match | ✔️ match | ✔️ match |
| "Home Alone" house | Home Alone | no | ✔️ match | no |
| "Home Alone" house | "home alone" house | ✔️ match | ✔️ match | no |
| "Home Alone" house | home alone house | no | ✔️ match | no |
| COVID-19 pandemic in Africa | COVID-19 pandemic in Af | ✔️ match | no | ✔️ match |
| COVID-19 pandemic in Africa | covid-19 pandemic in Af | ✔️ match | no | no |
| COVID-19 pandemic in Africa | covid-19 pandemic in af | ✔️ match | no | no |

mnalis added a commit to mnalis/commons_check_search_api that referenced this issue Jun 3, 2024
@mnalis
Contributor

mnalis commented Jun 3, 2024

This is the summary of the results as I read it, please verify if I got it right and if I forgot something:

| API | case-insensitive | categories without page | ignores missing non-alphanumerics | matches partial words | matches in the middle of category name / missing words | skips hidden | skips renamed / moved |
|---|---|---|---|---|---|---|---|
| UW | ✔️ | | | ✔️ | | | |
| app1 | ✔️ | | ✔️ | | ✔️ | | |
| app2 | | ✔️ | | ✔️ | | ✔️ | |
Legend: API shortnames (used for display formatting reasons)

@mnalis
Contributor

mnalis commented Jun 3, 2024

@sivaraam can you give insight when is app1 (generator=search) being used, and when app2 (generator=allcategories)? Are their results simply merged when user searches for category of their picture, or something else?

It would seem to me that each of the 3 APIs have their own distinct advantages.

The best results for the end user would be merging all three outputs (while removing duplicates), were it not for that skips hidden column. Obviously unions does not work well with exclusions... I might have some ideas, but would like your explanation about current usage of app1 vs. app2 above.

@sivaraam
Member

sivaraam commented Jun 3, 2024

@sivaraam can you give insight when is app1 (generator=search) being used,

From my reading of the code, the app1 is used for category suggestions that show up in "Step 3" of the upload step. Specifically, that API is used for suggesting categories based on the caption entered by the user.

and when app2 (generator=allcategories)?

This is the one that we use when a user types in text to search for categories in "Step 3" of the upload step. That's why the search behaves case-sensitively and only returns when the exact prefix is searched for.

Are their results simply merged when user searches for category of their picture, or something else?

Nope. The results aren't merged now. They are used for different cases as my answer above should've hopefully clarified.

The best results for the end user would be merging all three outputs (while removing duplicates), were it not for that skips hidden column. Obviously unions does not work well with exclusions...

You're spot on. Actually, we don't even need to merge the three outputs. If we observe closely, just combining the results of app1 and app2 would address the troubles we have. The only thing stopping us from doing a union of those API results is the "skips hidden" part that you mention.

I might have some ideas, but would like your explanation about current usage of app1 vs. app2 above.

I hope my explanation gave you an idea. I'm all ears on your ideas 🙂

@mnalis
Contributor

mnalis commented Jun 3, 2024

I hope my explanation gave you an idea. I'm all ears on your ideas 🙂

Thanks @sivaraam indeed it has been quite informative! Here is my idea:

  • when the user types a text to search for a category in "Step 3" of the upload, we first invoke app2 as we do now, giving user the list without hidden categories. However, if the user scrolls down to the end of that list, we immediately show a loading spinner animation, and then do an automatic request to fetch app1 results, which we then append (after removing duplicates) to the end of the existing list and return control to the user to select category.

That way, users who find a match in the original app2 results and select it would have the same workflow as now. However, if they scroll down to the bottom of the list (in most cases because they didn't find what they were looking for), they would get the missing results. The additional benefit is that we don't double the load on the APIs; the additional query is made only when it is needed.

  • a slight alternative to that idea is to have a manual Load more... button at the end of the original app2 list, which would do the additional app1 fetch when pressed, instead of doing that automatically as suggested above. But I think users would get quite bored if they needed to click that too often, so perhaps the version above is more user friendly. A small benefit of a fully manual button would be that it would produce even less load on the APIs (e.g. in cases where the user scrolled to the bottom but did not want to find more results) and save the user's bandwidth when it is not needed.

  • of course, if the user does load more categories via that second-step app1 API call, it will (sometimes) contain some hidden categories. That does not sound like a big problem to me, so I think it can be ignored, especially as it only happens on a hopefully much smaller subset of searches (after all, that is what every user of Special:UploadWizard is exposed to!), and after the user has run out of options; but if we do think it is a big deal, it can be dealt with in two ways:

    • having a hardcoded lists of popular groups of hidden categories to skip (e.g. Photographs taken on..., Requested moves... etc.) and removing them from the list before showing it to the user
    • after user selects the category, check if it is a hidden one, and if so, warn the user / refuse the category (and possibly add it to the "ignore list" above to reduce problem in the future)

    But as noted, I do not think such extra complexities would be needed, but I'd be interested to know what @nicolas-raoul and others think (especially if they have found hidden categories to be quite a hindrance before in older app version or Special:UploadWizard webpage)

  • another suggestion I have (which I perhaps should open separate issue for, although it is tightly related to how category search/selection work; but it can be implemented independently), is after the user does select a category, we should do some API lookup (I haven't yet looked up which one, but I assume it exists) to check whether:

    • category has been moved/renamed. If so, we should select that destination category instead automatically.
    • we might also check if category has zero members (or maybe even zero files -- e.g. if it has only subcategories), and warn the user if that is the case so they might go back and choose other category.
      That would reduce issues with users selecting misspelled or otherwise discouraged categories

So what do you people think?
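
A minimal sketch of the first proposal (app2 first, app1 only when the user reaches the end of the list) is given here, with the two APIs injected as plain functions so that no network access is involved. The class name, method names, and fake fetchers are all hypothetical; the real app is written in Kotlin/Java and this is only meant to pin down the control flow.

```python
from typing import Callable, Optional

Fetcher = Callable[[str], list[str]]


class TwoStageCategorySearch:
    """Show primary (app2) results first; fetch fallback (app1) results only on demand."""

    def __init__(self, primary: Fetcher, fallback: Fetcher) -> None:
        self._primary = primary
        self._fallback = fallback
        self._query: Optional[str] = None
        self._results: list[str] = []
        self._fallback_loaded = False

    def search(self, query: str) -> list[str]:
        """Called when the user types: only the primary API is queried."""
        self._query = query
        self._results = list(dict.fromkeys(self._primary(query)))
        self._fallback_loaded = False
        return list(self._results)

    def on_scrolled_to_end(self) -> list[str]:
        """Called when the list end is reached: append de-duplicated fallback results once."""
        if self._query is None or self._fallback_loaded:
            return list(self._results)
        self._fallback_loaded = True
        seen = set(self._results)
        for name in self._fallback(self._query):
            if name not in seen:
                seen.add(name)
                self._results.append(name)
        return list(self._results)


calls: list[str] = []


def fake_app2(query: str) -> list[str]:
    calls.append("app2")
    return []  # the case-sensitive prefix search finds nothing for lower-case input


def fake_app1(query: str) -> list[str]:
    calls.append("app1")
    return ["Temple of Ishtar at Mari"]


ui = TwoStageCategorySearch(fake_app2, fake_app1)
first = ui.search("temple of ishtar")
more = ui.on_scrolled_to_end()
again = ui.on_scrolled_to_end()  # no second fallback request
print(first, more, calls)
```

The "Load more..." button variant would call the same `on_scrolled_to_end` method from a click handler instead of a scroll listener.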

@sivaraam
Member

sivaraam commented Jun 4, 2024

  • when the user types a text to search for a category in "Step 3" of the upload, we first invoke app2 as we do now, giving user the list without hidden categories. However, if the user scrolls down to the end of that list, we immediately show a loading spinner animation, and then do an automatic request to fetch app1 results, which we then append (after removing duplicates) to the end of the existing list and return control to the user to select category.

That sounds like a good idea that would help us get to a situation better than the status quo. But I'm skeptical of appending the app1 results after the whole app2 results are shown. This might be unhelpful if the user has entered only a few characters in the search box. Maybe we could go fancy and show two scrollable lists, each containing the results of one of the two APIs, so that things are better distinguished? I don't know how bad a UX that may be 🤔

If two scrollable lists sound bad, we should think about whether there's a better strategy to append results from app2 and app1, like showing 15 results from app2, then appending 15 results from app1, and so on.

  • slight alternative to that idea is to have manual button Load more... at the end of the original app2 list, which would do the additional app1 fetch when pressed, instead of doing that automatically as suggested above. But I think the users would get quite bored if they needed to do click that too often, so perhaps version above is more user friendly.

Yeah. I too agree that automatically loading might be better UX.

Small benefit of fully manual button would be that it would produce even less load on the APIs (e.g. in the cases user scrolled to the bottom but did not want to find more results) and save user's bandwidth if it is not needed.

This is indeed a good point. We could consider implementing the "Load more" button as an enhancement for users who have the "Limited connection" mode enabled.

  • having a hardcoded lists of popular groups of hidden categories to skip (e.g. Photographs taken on..., Requested moves... etc.) and removing them from the list before showing it to the user
  • after user selects the category, check if it is a hidden one, and if so, warn the user / refuse the category (and possibly add it to the "ignore list" above to reduce problem in the future)

These are good ideas that could help improve the UX 👍🏼

  • another suggestion I have (which I perhaps should open separate issue for, although it is tightly related to how category search/selection work; but it can be implemented independently), is after the user does select a category, we should do some API lookup (I haven't yet looked up which one, but I assume it exists) to check whether:

    • category has been moved/renamed. If so, we should select that destination category instead automatically.
    • we might also check if category has zero members (or maybe even zero files -- e.g. if it has only subcategories), and warn the user if that is the case so they might go back and choose other category.
      That would reduce issues with users selecting misspelled or otherwise discouraged categories

Yeah. This is a good suggestion but let us discuss it in a separate issue as you mention. 👍🏼

@mnalis
Contributor

mnalis commented Jun 5, 2024

That sounds like a good idea that would help us get to a situation better than the status quo. But I'm skeptical of appending the app1 results after the whole app2 results are shown. This might be unhelpful if user only has entered few characters in the search box.

In what way do you think it would be unhelpful? Could you elaborate?

I don't seem to see the problem - if user has only entered few characters in the search box they will usually get maximum results (25?) from app2 anyway. I suppose they will then either continue typing more (e.g. if they were just too slow in typing and didn't actually want list of results yet), or they will browse that list until they find a match (in which case problem is solved!), or they will reach the end of the list with no match found -- and if they reach end of list, we'd fetch additional results from app1 too as specified.

My idea was to try to use only app2 as much as possible as it preserves user experience (i.e. we won't have an inrush of "previously I had those results and now I have different ones!"), and while it has smaller result set, it also has less "false positives" -- by using app2 as a primary source we avoid hidden categories and other confusing and unwanted results that app1 might return, so we should only fall back to app1 if user was unable to find their category in app2.

Maybe we could go fancy and show two scrollable lists each containing results of the two APIs so that things are better distinguished? I don't know how bad a UX that may be 🤔

Uh, it seems quite bad 😅 - even on desktop, but especially on small mobile screen where screen estate is at premium.

Displaying both lists at once, or mixing them 15+15, would also emphasize two problems:

  • we'd basically double API load (and data usage) overnight (instead of just slightly increasing it in cases when user didn't find their category in the first try).
  • issues with unfiltered hidden categories would be much more visible and annoying as they would happen always (instead of just when we fallback to app1 because no match was found).

(Sure we can try to hardcode list of popular hidden categories as an additional help as suggested earlier, but we likely won't be 100% successful in removing all of them, so they'll still remain an issue)

We could consider implementing the "Load more" button as an enhancement for users who have the "Limited connection" mode enabled.

Good idea, we could (and there are also other bandwidth-saving opportunities, like not requesting thumbnails downloading when in "Limited connection" mode, or caching results for same string searches etc). But I think we better try to first do simple variant which solves the core problem of this issue (user not finding categories they want); and only when we have that working, we can then suggest additional improvements.

Yeah. This is a good suggestion but let us discuss it in a separate issue as you mention

OK, moved that suggestion to #5751

@nicolas-raoul
Member

Huge thanks @mnalis for this table, very interesting and some cells surprised me. I am sure the table will be key to a better category search implementation.

when the user types a text to search for a category in "Step 3" of the upload, we first invoke app2 as we do now, giving user the list without hidden categories. However, if the user scrolls down to the end of that list, we immediately show a loading spinner animation, and then do an automatic request to fetch app1 results, which we then append (after removing duplicates) to the end of the existing list and return control to the user to select category.

This sounds like a good idea to me. I suggest alternating between app2 and app1 if the user scrolls further and further down.
Removing hidden categories from app1 results (maybe using a hardcoded list then checking each remaining category individually via another API call) can be left as a future enhancement.
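
The two-step filtering mentioned above (hardcoded list first, then an individual check for each remaining category) might look roughly like this. It is a sketch only: the prefix list holds just the two examples named in this thread, the per-category check is passed in as a callable rather than implemented, and which API call would back it (possibly `prop=categoryinfo`) is an assumption that would need to be verified. The dated category name in the demo is hypothetical.

```python
from typing import Callable, Optional

# Only the two example groups mentioned in this thread; a real list would be longer.
HIDDEN_PREFIXES = ("Photographs taken on", "Requested moves")


def drop_hidden(
    names: list[str],
    is_hidden: Optional[Callable[[str], bool]] = None,
) -> list[str]:
    """Remove names matching a hardcoded prefix, then apply the optional per-category check."""
    kept = []
    for name in names:
        if name.startswith(HIDDEN_PREFIXES):
            continue  # cheap local filter, no request needed
        if is_hidden is not None and is_hidden(name):
            continue  # stand-in for an individual API lookup
        kept.append(name)
    return kept


app1_results = ["Photographs taken on 2024-06-03", "Flapjacks", "Requested moves"]
print(drop_hidden(app1_results))
```

Running the cheap prefix filter first keeps the number of individual lookups, and therefore the extra API load, as small as possible.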

@sivaraam
Member

In what way do you think it would be unhelpful? Could you elaborate?

Sorry for the confusion. I was wrongly assuming that the current UX permits infinite scrolling and loads categories until the API results are exhausted. It seems that's not the case: we only load up to 25 results, as you mention.

My idea was to try to use only app2 as much as possible as it preserves user experience (i.e. we won't have an inrush of "previously I had those results and now I have different ones!"), and while it has smaller result set, it also has less "false positives" -- by using app2 as a primary source we avoid hidden categories and other confusing and unwanted results that app1 might return, so we should only fall back to app1 if user was unable to find their category in app2.

Yeah. I like this idea. This would be a great improvement to the current situation 👍🏼

Maybe we could go fancy and show two scrollable lists each containing results of the two APIs so that things are better distinguished? I don't know how bad a UX that may be 🤔

Uh, it seems quite bad 😅 - even on desktop, but especially on small mobile screen where screen estate is at premium.

Yeah. Let's just drop that as a bad idea 🙂

Displaying both lists at once, or mixing them 15+15, would also emphasize two problems:

  • we'd basically double API load (and data usage) overnight (instead of just slightly increasing it in cases when user didn't find their category in the first try).
  • issues with unfiltered hidden categories would be much more visible and annoying as they would happen always (instead of just when we fallback to app1 because no match was found).

Makes sense.

Good idea, we could (and there are also other bandwidth-saving opportunities, like not requesting thumbnails downloading when in "Limited connection" mode, or caching results for same string searches etc). But I think we better try to first do simple variant which solves the core problem of this issue (user not finding categories they want); and only when we have that working, we can then suggest additional improvements.

Yeah. That's true. We could first do the alternating requests to app2 and app1 as that seems to be a mutually agreed solution that would help improve the current situation well 👍🏼
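
As a rough illustration of the agreed approach, alternating requests can be modelled as a lazy generator that takes one page from app2, then one from app1, and so on, dropping names that were already shown. The function name and the sample pages are hypothetical; in the app, each page would come from a paginated API request made only when the user scrolls further.

```python
from typing import Iterable, Iterator


def alternate_pages(
    app2_pages: Iterable[list[str]],
    app1_pages: Iterable[list[str]],
) -> Iterator[list[str]]:
    """Yield de-duplicated pages, alternating app2, app1, app2, ... until both run out."""
    sources = [iter(app2_pages), iter(app1_pages)]
    seen: set[str] = set()
    while sources:
        for source in list(sources):
            page = next(source, None)  # a page is requested only when it is needed
            if page is None:
                sources.remove(source)
                continue
            fresh = []
            for name in page:
                if name not in seen:
                    seen.add(name)
                    fresh.append(name)
            if fresh:
                yield fresh


# Fake pages for illustration; real pages would come from the two APIs.
app2 = [["COVID-19", "COVID-19 DIY"], ["COVID-19 guidelines"]]
app1 = [["COVID-19", "COVID-19 pandemic in Africa"]]
shown = list(alternate_pages(app2, app1))
print(shown)
```

Because the generator is lazy, a user who picks a category from the first app2 page never triggers an app1 request.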

@mnalis
Contributor

mnalis commented Jun 23, 2024

Yeah. That's true. We could first do the alternating requests to app2 and app1 as that seems to be a mutually agreed solution that would help improve the current situation well 👍🏼

Sounds good to me @sivaraam !
Now if only someone good with Kotlin and commons app codebase would volunteer to implement it! 😉

@OpenGreenStreet

In the current version (5.1.0~707997) the search for categories ignores upper and lower case at the beginning. This is very nice. For categories with several words, however, capitalization is mandatory from the second word onwards. (not so nice).
https://commons.wikimedia.org/wiki/File:20241228_xl_1507-Upper-lower_case_of_category_names.jpg
