-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Make category search non case-sensitive and more user friendly #3179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Maybe if we convert everything to lowercase then the server performs a non-case-sensitive search? That's just an hypothesis, I have not tried. |
@nicolas-raoul Possible! We'll try it out with a direct query first. |
Can I take this issue? |
@ankit-kumar-dwivedi please feel free! |
@misaochan Is this issue free to be worked upon? if so can i take it? |
Hey! Yes sure you should start working on it as I'm not working on it right
now.
…On Sun, Jan 12, 2020, 7:21 PM Kshitij Bhardwaj ***@***.***> wrote:
@misaochan <https://github.com/misaochan> Is this issue free to be worked
upon? if so can i take it?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3179?email_source=notifications&email_token=AI7ACH2SIIJZ5DQVUZWKGVDQ5MN6ZA5CNFSM4JBW444KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEIW2OWQ#issuecomment-573417306>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AI7ACHYZUMAZFWYYVHDK77LQ5MN6ZANCNFSM4JBW444A>
.
|
Thank you:) |
* CategoryClient: fix category search case-sensitivity by converting to lower case as MW api is inherently case-sensitive, the results obtained will be same * CategoryItem: reverting javadoc changes * CategoriesModel: make category search case-insensitive * CategoryItem: fix whitespaces * Add tests for case-insensitivity * CategoryClientTest: add more test cases * CategoryClientTest: fix travis ci test * CategoriesModelTest: changes mage to CategoriesModel and tested
I'm re-opening this as I believe there's a problem with how this issue was fixed. @kbhardwaj123 Can you clarify a doubt that I have regarding your PR #3326? In the description you say:
Are you sure the API really doesn't care about the case of the category name given to it? I'm doubtful about that for several reasons. Here are a couple:
In case you're wondering why the test case didn't fail. Here's the catch:
From Manual:Page title - MediaWiki I think the quote speaks for itself. I'll share the actual problem w.r.t to the app in the next comment. |
@sivaraam while I was working on this I went with what @nicolas-raoul suggested so I ensured that all the category strings being passed to the OkHttpClient are converted to lower case and I wrote new tests regarding that and they worked fine but I guess I must have missed something I will take a look at at again |
Ok. Here's the issue with the respect to the app: category search doesn't return any categories with a prefix that has a upper case character in it (other than the first one, of course). See #3582 for proof. In case the issue is not clear to you from #3582, here's another example. Here's what I get when I search for categories with "COVID" (mind the case) in the app (version: 2.12.3.629~a63a358): Now, consider the linked example query which returns 25 categories which have "COVID" as it's prefix. Here are the categories that the query returns:
As you can see, none of the above categories are shown in the category suggestions. |
@misaochan Given that we've now accidentally reduced the category search space rather than increasing it, you might want to ensure we fix this before releasing the next version. |
Added to the release list, thanks for the heads up! |
Hi @kbhardwaj123 , are you currently still working on this? Please do keep us updated, thanks! |
@misaochan sure I'm on it, will update ASAP |
Thanks @kbhardwaj123 ! As we are planning to include this in v2.13, when you submit your PR could you please rebase and submit it on the |
I investigated about the problem and here are my findings.
So Suppose I want to find the category Temple of Ishtar at Mari by entering temple of ishtar
Now on reading the logs i realized that the method But there's a catch
Possible Solution
|
@misaochan @sivaraam @nicolas-raoul @maskaravivek I need your opinions on my investigation on this to fix it for v2.13 i mean are we going to use the production flavor of the APIs in v2.13 |
@kbhardwaj123 Thanks for the analysis. I'll look into it and share my comments soon. I have a quick doubt about one particular thing:
What do you mean by beta flavor of API? Do you mean the API hosted in the beta server (https://commons.wikimedia.beta.wmflabs.org/w/api.php) as opposed to the production server (https://commons.wikimedia.org/w/api.php)? |
For category search (and really any testing that does not involve actually uploading), please use the |
@sivaraam yes that's exactly what i meant the API hosted on beta server https://commons.wikimedia.beta.wmflabs.org/w/api.php has server API but it is unable to give the required result where as the production API https://commons.wikimedia.org/w/api.php gives the expected result as shown bu the links in by previous comment. |
To continue the discussion from #5712, the only idea I could think of to improve our category search such that it behaves in a case-insensitive and fuzzy way is to possibly consider augmenting the results of the This has the caveat that we would be starting to get hidden categories in our result again as API:opensearch does not know about hidden categories. Is that a fine trade off? |
Sounds like a negative tradeoff to me. 🤔 |
Yeah. That's kind of the problem. We don't have an API as far as I know that gives us the best of both worlds 🙁 I'll be more than glad to be proved wrong. |
@mnalis Does https://commons.wikimedia.org/wiki/Special:UploadWizard give you the best of both worlds, or not? If yes we should investigate further their API calls and processing. :-) |
(@nicolas-raoul notice: unfortunately I have my hands full this week, but I'll try to read up and make a comparison table of current app API vs |
Sorry it took some time, in the end I found it worth writing a script to generate it... Some notes: Text "no" or "match" means whether it matched the "Expected Result" in that API call for "Search Term". Its I've increased search space to 90 on some searches (from their default 10/30 matches, otherwise the results do not show how good the API is at matching, but instead how many other results is returns, and they may miss the one we search for because it's on place 60 or something). Like on TL;DR: the more green checkboxes API has, the better it is. commons-app code seems to contain 4 different APIs, although I don't know which ones are used? Those are labeled Let me know if I should add/remove search terms or APIs (like are those |
This comment was marked as outdated.
This comment was marked as outdated.
That's a rather exhaustive analysis. Thank you for your efforts on this, @mnalis !
Reg. the app3 and app4, they are the APIs that query
So, I think both of these don't server the category search use-case. So, we can leave them out during our consideration. Overall, I think your results confirm that the
Does your test set include a category for which a category page does not yet exist? 🤔 |
Yes, Category:"Rose" factory from #3179 (comment) does not have category page.
Update: Ah, I see, that is because only "allcategories" search actually match categories without In next message is new version (make category name clickable with hover-over tooltip about the use case, removed |
This is the summary of the results as I read it, please verify if I got it right and if I forgot something:
Legend: API shortnames (used for display formatting reasons)
|
@sivaraam can you give insight when is It would seem to me that each of the 3 APIs have their own distinct advantages. The best results for the end user would be merging all three outputs (while removing duplicates), were it not for that |
From my reading of the code, the
This is the one that we use when a user types in text to search for categories in "Step 3" of the upload step. That's why the search behaves case-sensitively and only returns when the exact prefix is searched for.
Nope. The results aren't merged now. They are used for different cases as my answer above should've hopefully clarified.
You're spot on. Actually, we don't even need to merge the three outputs. If we observe closely, just combining the results of
I hope my explanation gave you an idea. I'm all ears on your ideas 🙂 |
Thanks @sivaraam indeed it has been quite informative! Here is my idea:
That way, users who had found a match in original
So what do you people think? |
That sounds like a good idea that would help us get to a situation better than the status quo. But I'm skeptical of appending the If two scrollable liists sound bad, we should think if there's a better strategy to append results from
Yeah. I too agree that automatically loading might be better UX.
This is indeed a good point. We could consider implementing the "Load more" button as an enhancement for users who have the "Limited connection" mode enabled.
These are good ideas that could help improve the UX 👍🏼
Yeah. This is a good suggestion but let us discuss it in a separate issue as you mention. 👍🏼 |
In what way do you think it would be unhelpful? Could you elaborate? I don't seem to see the problem - if user has only entered few characters in the search box they will usually get maximum results (25?) from My idea was to try to use only
Uh, it seems quite bad 😅 - even on desktop, but especially on small mobile screen where screen estate is at premium. Displaying both lists at once, or mixing them 15+15, would also emphasize two problems:
(Sure we can try to hardcode list of popular hidden categories as an additional help as suggested earlier, but we likely won't be 100% successful in removing all of them, so they'll still remain an issue)
Good idea, we could (and there are also other bandwidth-saving opportunities, like not requesting thumbnails downloading when in "Limited connection" mode, or caching results for same string searches etc). But I think we better try to first do simple variant which solves the core problem of this issue (user not finding categories they want); and only when we have that working, we can then suggest additional improvements.
OK, moved that suggestion to #5751 |
Huge thanks @mnalis for this table, very interesting and some cells surprised me. I am sure the table will be key to a better category search implementation.
This sounds like a good idea to me. I suggest alternating between app2 and app1 if the user scrolls further and further down. |
Sorry for the confusion. I was misassuming that the current UX actually permits infinite scrolling and loads the categories up until the API results are exhausted. Seems like that's not the case. We only load up to 25 results as you mention.
Yeah. I like this idea. This would be a great improvement to the current situation 👍🏼
Yeah. Let's just drop that as a bad idea 🙂
Makes sense.
Yeah. That's true. We could first do the alternating requests to app2 and app1 as that seems to be a mutually agreed solution that would help improve the current situation well 👍🏼 |
Sounds good to me @sivaraam ! |
In the current version (5.1.0~707997) the search for categories ignores upper and lower case at the beginning. This is very nice. For categories with several words, however, capitalization is mandatory from the second word onwards. (not so nice). |
Received a report from a 2.11 user on our FB page that category search is case-sensitive for her, which means that sometimes she'll type the right category in the search field but nothing will show up.
AFAIK the MW API that we use is inherently case-sensitive, but the upload wizard seems to be able to find a way around that and produces the same category suggestions regardless of case.
Edit: Apart from the case sensitivity, the
allcategories
API also has a problem of doing a prefix match. This does not give a great UX. We should explore ways to fix this too.The text was updated successfully, but these errors were encountered: