feat: add length unit support in FileSystem limits #781

ppkarwasz · 2025-09-05T22:48:53Z

Different filesystems and operating systems measure file and path lengths in different units:

~~macOS and~~ Windows filesystems typically count UTF-16 code units.
Linux and other UNIX filesystems typically count bytes.

This change introduces explicit unit support so these limits can be interpreted consistently.
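To make the difference between the two units concrete, here is a small JDK-only sketch. It is not Commons IO code, and the class name `NameLengthUnits` and the sample name are arbitrary. `String.length()` counts UTF-16 code units, while encoding to UTF-8 gives the byte count that a typical Linux filesystem compares against its limit.

```java
import java.nio.charset.StandardCharsets;

class NameLengthUnits {

    /** Length as Windows-style filesystems count it: UTF-16 code units. */
    static int utf16CodeUnits(String name) {
        return name.length(); // String.length() counts UTF-16 code units, not code points
    }

    /** Length as typical Linux/UNIX filesystems count it: bytes (UTF-8 assumed here). */
    static int utf8Bytes(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "caf" + e-acute (2 UTF-8 bytes), two CJK ideographs (3 bytes each),
        // one emoji (a surrogate pair: 2 code units, 4 bytes) and ".txt"
        String name = "caf\u00e9-\u65e5\u672c-\uD83D\uDE00.txt";
        System.out.println("UTF-16 code units: " + utf16CodeUnits(name)); // 14
        System.out.println("UTF-8 bytes:       " + utf8Bytes(name));      // 21
    }
}
```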

Key changes

New API
- ~~Added a LengthUnit enum and FileSystem.getLengthUnit() to expose the unit of measure used by getMaxFileNameLength() and getMaxPathLength()~~.
- Added new overloads for isLegalFileName and toLegalFileName that accept a Charset, making conversions between bytes and UTF-16 explicit.
Adjusted defaults
- Reduced the GENERIC filesystem defaults:
  - File name length → 1020 bytes (covers 255 UTF-16 characters encoded as up to 3 UTF-8 bytes).
  - Path length → 1 MiB (covers 32,767 UTF-16 code units, again at 3 UTF-8 bytes each).
Testing
- Added unit tests to validate the new API and updated limits.
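Both new `GENERIC` defaults leave headroom over the worst case. A UTF-16 code unit needs at most 3 UTF-8 bytes, because a surrogate pair (2 code units) takes 4 bytes. So 255 code units need at most 765 bytes, and 1020 is exactly 255 × 4. Likewise, 32,767 code units need at most 98,301 bytes, far below 1 MiB. The sketch below checks that arithmetic and shows what a Charset-aware byte check might look like. `fitsInBytes` is a hypothetical helper; it is not the signature of the new Commons IO overloads.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteLimitCheck {

    // Worst case per UTF-16 code unit: BMP characters take 1-3 UTF-8 bytes,
    // a surrogate pair (2 code units) takes 4 bytes, i.e. 2 per code unit.
    static final int MAX_UTF8_BYTES_PER_UTF16_UNIT = 3;

    /** Hypothetical helper: true if {@code name} encodes to at most {@code maxBytes} bytes in {@code charset}. */
    static boolean fitsInBytes(String name, Charset charset, int maxBytes) {
        return name.getBytes(charset).length <= maxBytes;
    }

    public static void main(String[] args) {
        // Worst-case UTF-8 size of the UTF-16 limits quoted in the description.
        System.out.println(255 * MAX_UTF8_BYTES_PER_UTF16_UNIT);    // 765, within 1020
        System.out.println(32_767 * MAX_UTF8_BYTES_PER_UTF16_UNIT); // 98301, within 1 MiB (1048576)

        String cjk340 = "\u65e5".repeat(340); // 340 CJK ideographs, 3 UTF-8 bytes each
        System.out.println(fitsInBytes(cjk340, StandardCharsets.UTF_8, 1020));       // true (1020 bytes)
        System.out.println(fitsInBytes(cjk340 + "x", StandardCharsets.UTF_8, 1020)); // false (1021 bytes)
    }
}
```

The same name can pass or fail depending on the charset. For example, in UTF-16BE each of these ideographs takes 2 bytes instead of 3.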


garydgregory

Hi @ppkarwasz

I have a question on the LengthUnit class and a few comments.

TY!

src/main/java/org/apache/commons/io/FileSystem.java

- Refactors comparison and truncation logic into `LengthUnit`, renamed to `NameLengthStrategy`.
- Makes the `NameLengthStrategy` value internal-only.
- Improves Javadoc for `getMaxFileNameLength` and `getMaxPathLength` to clarify that staying within the reported limit is necessary but not sufficient for a name or path to be valid on all filesystems.
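The idea of moving comparison and truncation into a per-filesystem strategy can be illustrated with a hypothetical sketch. The enum `LengthStrategySketch` and its methods are invented for this example and are not the Commons IO internals. The sketch counts either encoded bytes or UTF-16 code units, keeps the extension intact (in the spirit of the "do not truncate extension" commit), and drops whole code points so that a surrogate pair is never split. It does not handle grapheme clusters.

```java
import java.nio.charset.Charset;

/** Hypothetical length strategy; illustrative only, not the Commons IO implementation. */
enum LengthStrategySketch {
    BYTES {
        @Override
        int length(CharSequence s, Charset charset) {
            return s.toString().getBytes(charset).length; // encoded size
        }
    },
    UTF16_CODE_UNITS {
        @Override
        int length(CharSequence s, Charset charset) {
            return s.length(); // charset is irrelevant here
        }
    };

    abstract int length(CharSequence s, Charset charset);

    /** Shortens the base name (keeping the extension) until the whole name fits within {@code limit}. */
    String truncate(String name, int limit, Charset charset) {
        if (length(name, charset) <= limit) {
            return name;
        }
        int dot = name.lastIndexOf('.');
        String ext = dot > 0 ? name.substring(dot) : "";
        String base = dot > 0 ? name.substring(0, dot) : name;
        if (length(ext, charset) > limit) {
            throw new IllegalArgumentException("Extension alone exceeds the limit: " + ext);
        }
        // Remove one whole code point at a time from the end of the base name.
        while (!base.isEmpty() && length(base + ext, charset) > limit) {
            base = base.substring(0, base.offsetByCodePoints(base.length(), -1));
        }
        return base + ext;
    }
}
```

For example, `BYTES.truncate("\u65e5".repeat(10) + ".txt", 20, StandardCharsets.UTF_8)` keeps five ideographs (15 bytes) plus `.txt`, while `UTF16_CODE_UNITS` applied to three emoji plus `.md` with a limit of 6 keeps one complete emoji rather than half of a surrogate pair.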

ecki

Just a few comments, feel free to ignore them (they are more about the overall concept of this API than about your improvement).

src/main/java/org/apache/commons/io/FileSystem.java

garydgregory

Hi @ppkarwasz

I have one small comment. Have @ecki's issues been resolved?

src/main/java/org/apache/commons/io/FileSystem.java

ecki · 2025-09-08T15:27:14Z

BTW: here is the place in the JDK I was referring to, but I'm not sure under which conditions it applies:

https://github.com/openjdk/jdk/blob/166ef5e7b1c6d6a9f0f1f29fedb7f65b94f53119/src/java.base/windows/classes/java/io/WinNTFileSystem.java#L42

I think it only applies to new Path APIs.

garydgregory · 2025-09-08T15:47:27Z

Hi @ppkarwasz
Could you please rebase on Git master?

ecki · 2025-09-08T16:23:12Z

> Have @ecki's issues been resolved?

I think I had trouble implementing a similar function that is compatible with the OS; that is the basis of my comments, but I think this might be a good attempt. I'm not sure it is a good candidate for a tool (or rather, it would be a good candidate if it works), but the new code seems to improve the current API, so why not.

src/test/java/org/apache/commons/io/FileSystemTest.java

ppkarwasz · 2025-09-11T09:39:05Z

> Could you please rebase on Git master?

I merged master in b99664a instead; the result should be equivalent once this PR is squashed, and it is easier to follow during review.

garydgregory · 2025-09-11T11:37:36Z

Hi @ppkarwasz
All macOS tests fail in GH CI.

src/main/java/org/apache/commons/io/FileSystem.java

ppkarwasz · 2025-09-11T12:45:56Z

> All macOS tests fail in GH CI.

Yes, look at this comment: #781 (comment)

Basically, I am testing that the UTF-8 byte limit is sharp on macOS, i.e. that you cannot create files whose names are longer than NAME_MAX. Beyond that limit, calling POSIX functions such as open() or readdir() is no longer safe, but apparently macOS handles such names as long as the underlying filesystem supports them. Now:

- HFS+ supports up to 255 UTF-16 code units, which is probably why all macOS tests fail.
- APFS supports only 255 UTF-8 bytes, so the test should succeed on macOS systems that use APFS. Could you check on your hardware?
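Taking the two limits exactly as stated above (HFS+: 255 UTF-16 code units; APFS: 255 UTF-8 bytes), a JDK-only sketch shows how they diverge. The class and method names are hypothetical. HFS+ stores names in a decomposed form, which is approximated here with `Normalizer.Form.NFD`.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

class MacNameLimits {

    static final int LIMIT = 255;

    /** APFS as described above: at most 255 UTF-8 bytes. */
    static boolean fitsApfs(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length <= LIMIT;
    }

    /** HFS+ as described above: at most 255 UTF-16 code units, counted after (approximate) NFD decomposition. */
    static boolean fitsHfsPlus(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFD).length() <= LIMIT;
    }

    public static void main(String[] args) {
        String ascii = "a".repeat(256);    // 256 bytes, 256 code units
        String cjk = "\u65e5".repeat(255); // 765 bytes, 255 code units (no decomposition)
        System.out.println(fitsApfs(ascii) + " " + fitsHfsPlus(ascii)); // false false
        System.out.println(fitsApfs(cjk) + " " + fitsHfsPlus(cjk));     // false true
    }
}
```

The pure ASCII name is the one variant rejected by both, which is why it is a useful probe regardless of the underlying filesystem.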

If I can confirm that the APFS limit is strict, I can disable the test on macOS or remove it entirely.

ecki · 2025-09-11T13:06:34Z

Uh, wait: if the purpose of this utility is to truly report the file name length on the system, shouldn't the utility be corrected to handle that case instead of skipping the test?

ppkarwasz · 2025-09-11T21:30:22Z

@ecki,

You’re right: the test itself shouldn’t be disabled; only the part that makes assumptions that don’t hold across all macOS filesystems should be.

After looking more closely at macOS name length limits:

- NAME_MAX is 255 bytes (see `getconf NAME_MAX /`). This is the minimum limit that POSIX guarantees every filesystem must support.
- The dirent struct (man dir(5)) has reserved space for 1023 bytes since Leopard. That defines the largest name the POSIX API is prepared to handle, but does not mean that filesystems actually allow names that long.
- HFS+ enforces a maximum of 255 UTF-16 code units per file name.
- APFS (the default since macOS High Sierra) enforces a maximum of 255 UTF-8 bytes per file name.

In 087654e I updated the test logic:

- Always fail if creating a name of ≤ 255 UTF-8 bytes does not work.
- For names of ≥ 256 UTF-8 bytes, only fail if all variants succeed. At least one (the pure ASCII one, where UTF-8 bytes = UTF-16 code units) should fail on HFS+ as well.

This way the test still enforces the guaranteed minimum (255 bytes), but tolerates differences between APFS and HFS+.
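The revised rule can be expressed without touching the disk. In this hypothetical sketch (not the actual `FileSystemTest` code), a `Predicate<String>` stands in for an attempt to create a file, for example with `Files.createFile` in a temporary directory. Simulated APFS-like and HFS+-like predicates both satisfy the rule, while a filesystem with no limit does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Predicate;

class NameLimitCheckSketch {

    // One sample character per UTF-8 width (1, 2, 3 and 4 bytes); none of them decompose under NFD.
    static final List<String> UNITS = List.of("a", "\u0436", "\u65e5", "\uD83D\uDE00");

    /** Repeats {@code unit} as often as it fits into {@code targetBytes} UTF-8 bytes, then pads with ASCII 'a'. */
    static String nameOfBytes(String unit, int targetBytes) {
        int unitBytes = unit.getBytes(StandardCharsets.UTF_8).length;
        int count = targetBytes / unitBytes;
        return unit.repeat(count) + "a".repeat(targetBytes - count * unitBytes);
    }

    /** Every 255-byte variant must be creatable; at least one 256-byte variant must be rejected. */
    static boolean limitLooksSharp(Predicate<String> canCreate) {
        for (String unit : UNITS) {
            if (!canCreate.test(nameOfBytes(unit, 255))) {
                return false;
            }
        }
        return !UNITS.stream().allMatch(unit -> canCreate.test(nameOfBytes(unit, 256)));
    }

    public static void main(String[] args) {
        Predicate<String> apfsLike = n -> n.getBytes(StandardCharsets.UTF_8).length <= 255;
        Predicate<String> hfsPlusLike = n -> n.length() <= 255; // NFD ignored: these samples do not decompose
        Predicate<String> unlimited = n -> true;
        System.out.println(limitLooksSharp(apfsLike));    // true
        System.out.println(limitLooksSharp(hfsPlusLike)); // true
        System.out.println(limitLooksSharp(unlimited));   // false
    }
}
```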

garydgregory requested changes Sep 6, 2025


ecki reviewed Sep 6, 2025


garydgregory requested changes Sep 7, 2025


src/main/java/org/apache/commons/io/FileSystem.java (outdated)

ppkarwasz and others added 6 commits September 8, 2025 12:01

fix: Javadoc of getMaxPathLength

b1cdfdb

fix: do not truncate extension

a0b99bf

fix: checkstyle violations

57482fc

fix: make nameLengthStrategy

ff205f8

fix: make nameLengthStrategy private (2)

26879d9

Fix PMD

9bad974

ppkarwasz added 4 commits September 10, 2025 19:58

fix: rename UTF16_CHARS -> UTF16_CODE_UNITS

bf25189

fix: simplify truncate

cc96ba1

fix: switch MacOS to bytes

04621bf

fix: testMaxNameLength_MatchesRealSystem test

26dd573

ppkarwasz commented Sep 10, 2025


src/test/java/org/apache/commons/io/FileSystemTest.java

Merge remote-tracking branch 'apache/master' into feat/file-systems

b99664a

ppkarwasz requested a review from garydgregory September 11, 2025 09:39

garydgregory reviewed Sep 11, 2025


src/main/java/org/apache/commons/io/FileSystem.java

ppkarwasz added 4 commits September 11, 2025 22:02

fix: testMaxNameLength_MatchesRealSystem

087654e

fix: improve truncate tests

9eb8e28

fix: try fix macOS tests

fa57c3d

Merge branch 'master' into feat/file-systems

f58e111

ppkarwasz and others added 3 commits September 12, 2025 08:34

fix: add support for grapheme clusters

81a33ae

fix: tests on JDK 19 or earlier

3fb62d5

Merge branch 'master' into feat/file-systems

b13cd59

garydgregory merged commit 7810325 into master Sep 13, 2025
20 of 21 checks passed

garydgregory deleted the feat/file-systems branch September 13, 2025 02:44

ppkarwasz mentioned this pull request Sep 18, 2025

Add StringUtils.truncateToByteLength apache/commons-lang#1392

Open
