Skip to content

Improve Unique File Name Search #5877

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

AlexG-2003
Copy link
Contributor

@AlexG-2003 AlexG-2003 commented Oct 20, 2024

Changed from increment to alphanumeric hash to append to a file name when file name already exists. Implemented in findUniqueFileName() under UploadWorker.kt.

Fixes #2506

Read comments under issue #2506 for a more in-depth discussion of why this change is necessary and why this is a good solution. findUniqueFileName() was using an incremeanting counter to find unique file names. For example if "Test " already existed as a file title then it would return the title "Test 1" or if "Test 13" exited it would return "Test 14". This made uploading files with popular names very slow as it would sometimes result in many collisions. This implementation has been changed to a random hash, the file name is now appended with a 5-digit numeric hash which will generally result in only 1 collision, making uploading much faster. The updated title now follows the format <existing_title>_#XXX.

Tests performed (required)

Tested testBetaDebugUnitTest on Pixel 9 Pro XL with API level 35.

AlexG-2003 and others added 2 commits October 16, 2024 14:55
…git alphanumeric hash to append to a file name to make it unique. This improves speed over using an incrementing number to append as there are fewer collisions.
Modified findUniqueFileName() in UploadWorker.kt to use a random 3-di…
// Generate a random 5-character alphanumeric string
val randomHash = (1..3)
.map { chars[random.nextInt(chars.length)] }
.joinToString("")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow you are fast!

Would you mind keeping the number sequence strategy?

We are open to a hash strategy, but could you create a GitHub issue to discuss its pros and cons?

Thanks a lot! 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue I’m addressing #2506 does have discussion of the different strategies for speeding up this search and from what I could see a hash was voted for as the fastest and easiest to implement. I could revert to the number sequence but the code would remain virtually unchanged. The other option discussed was binary search to maintain the number sequence format of titles but improve search efficiency.

Copy link
Member

@nicolas-raoul nicolas-raoul Oct 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah understood thanks! :-)
Sorry for the confusion I was on mobile and did not check the issue properly, I incorrectly assumed it was about #5876 that I had created an hour before.

AlexG-2003 and others added 2 commits October 22, 2024 12:17
…git numeric hash rather than the previous 3-digit alphanumeric hash
Modified findUniqueFileName() in UploadWorker.kt to use a random 5-di…
@whym
Copy link
Collaborator

whym commented Oct 29, 2024

It would be nice to add a unit test to verify that the new code works as expected. Can you do that? (I think you will need to mock mediaClient for the unit test, as in app/src/test/kotlin/fr/free/nrw/commons/MediaDataExtractorTest.kt.)

If not that's okay - let's leave it as a new issue for later (or for someone else).


Is .idea/inspectionProfiles/Project_Default.xml accidantally committed into this patch?

@AlexG-2003
Copy link
Contributor Author

Sorry, I am hitting exam period so I won't have time to write tests for this. I will open a new issue for it. Also yes, .idea/inspectionProfiles/Project_Default.xml wasn't meant to be changed.

Copy link
Member

@nicolas-raoul nicolas-raoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested with two multi-uploads all using the same caption. It works great!

The # character is not allowed in filenames and thus gets transformed into - but that's not an issue as it will be modified anyway in the context of #5876

Thanks a lot! :-)

@nicolas-raoul nicolas-raoul merged commit 183e84c into commons-app:main Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Speed up the search process for available file names in findUniqueFilename
5 participants