@ppkarwasz
Contributor

@ppkarwasz ppkarwasz commented Sep 5, 2025

Different filesystems and operating systems measure file and path lengths in different units:

  • macOS and Windows filesystems typically count UTF-16 code units.
  • Linux and other UNIX filesystems typically count bytes.

This change introduces explicit unit support so these limits can be interpreted consistently.
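
To make the difference between the two units concrete, here is a minimal sketch that measures the same names both ways. The class and method names are illustrative only and are not part of this PR; it relies solely on `String.length()` (UTF-16 code units) and `String.getBytes(Charset)`.

```java
import java.nio.charset.StandardCharsets;

public class LengthUnits {

    /** Length in UTF-16 code units, which is what String.length() reports. */
    static int utf16Units(String s) {
        return s.length();
    }

    /** Length in bytes once the string is encoded as UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] names = {
            "report.txt",               // ASCII only: both units agree
            "r\u00e9sum\u00e9.txt",     // two 2-byte characters
            "\u65e5\u672c\u8a9e.txt",   // three 3-byte characters
            "\uD83D\uDE00.txt"          // one surrogate pair: 2 code units, 4 bytes
        };
        for (String name : names) {
            System.out.println(name + ": " + utf16Units(name)
                    + " UTF-16 code units, " + utf8Bytes(name) + " UTF-8 bytes");
        }
    }
}
```

A limit of 255 therefore admits different names depending on whether it counts code units or bytes.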

Key changes

  • New API

    • Added a LengthUnit enum and FileSystem.getLengthUnit() to expose the unit of measure used by getMaxFileNameLength() and getMaxPathLength().
    • Added new overloads for isLegalFileName and toLegalFileName that accept a Charset, making conversions between bytes and UTF-16 explicit.
  • Adjusted defaults

    • Reduced the GENERIC filesystem defaults:

      • File name length → 1020 bytes (255 × 4; enough for 255 UTF-16 code units, each of which encodes to at most 3 UTF-8 bytes).
      • Path length → 1 MiB (enough for 32,767 UTF-16 code units, again at up to 3 UTF-8 bytes each).
  • Testing

    • Added unit tests to validate the new API and updated limits.
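
The arithmetic behind the new GENERIC defaults can be checked with a short sketch. The constant and class names below are made up for illustration; only the values 1020 bytes, 1 MiB, 255 and 32,767 come from the description above. The factor 3 holds because a BMP character (one UTF-16 code unit) needs at most 3 UTF-8 bytes, while a supplementary character uses 2 code units for 4 bytes.

```java
import java.nio.charset.StandardCharsets;

public class GenericDefaultsCheck {

    /** Worst case: one UTF-16 code unit becomes 3 UTF-8 bytes. */
    static final int MAX_UTF8_BYTES_PER_UTF16_UNIT = 3;

    /** Values quoted in the PR description (names here are illustrative). */
    static final int GENERIC_MAX_NAME_BYTES = 1020;
    static final int GENERIC_MAX_PATH_BYTES = 1024 * 1024; // 1 MiB

    static int worstCaseUtf8Bytes(int utf16Units) {
        return utf16Units * MAX_UTF8_BYTES_PER_UTF16_UNIT;
    }

    /** Encodes {@code count} copies of U+65E5 (3 UTF-8 bytes each) and returns the byte length. */
    static int encodedLengthOfThreeByteChars(int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append('\u65e5');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int nameBytes = worstCaseUtf8Bytes(255);    // 765
        int pathBytes = worstCaseUtf8Bytes(32767);  // 98301
        System.out.println("255 code units    -> at most " + nameBytes + " bytes, fits: "
                + (nameBytes <= GENERIC_MAX_NAME_BYTES));
        System.out.println("32,767 code units -> at most " + pathBytes + " bytes, fits: "
                + (pathBytes <= GENERIC_MAX_PATH_BYTES));
        System.out.println("measured for 255 x U+65E5: " + encodedLengthOfThreeByteChars(255));
    }
}
```

Both byte limits leave headroom over the worst case, so a name or path that is legal on a filesystem counting UTF-16 code units is not rejected by the GENERIC byte limits.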

Member

@garydgregory garydgregory left a comment

Hi @ppkarwasz

I have a question on the LengthUnit class and a few comments.

TY!

* Refactors comparison and truncation logic into `LengthUnit`, renamed to `NameLengthStrategy`.
* Makes the `NameLengthStrategy` value internal-only.
* Improves Javadoc for `getMaxFileNameLength` and `getMaxPathLength` to clarify that staying within the reported limit is necessary but not sufficient for a name or path to be valid on all filesystems.
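
As an illustration of the kind of truncation logic involved when the limit is counted in bytes, here is a rough sketch. It is not the `NameLengthStrategy` code from this PR; the class and method names are hypothetical, and it assumes a charset such as UTF-8 that encodes each code point independently and without a byte-order mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteTruncation {

    /**
     * Returns the longest prefix of {@code name} whose encoding in {@code charset}
     * fits in {@code maxBytes}, cutting only between code points so that no
     * surrogate pair or multi-byte sequence is split.
     */
    static String truncateToBytes(String name, int maxBytes, Charset charset) {
        int usedBytes = 0;
        int index = 0;
        while (index < name.length()) {
            int codePoint = name.codePointAt(index);
            int units = Character.charCount(codePoint); // 1, or 2 for a surrogate pair
            int cost = name.substring(index, index + units).getBytes(charset).length;
            if (usedBytes + cost > maxBytes) {
                break; // the next code point would exceed the limit
            }
            usedBytes += cost;
            index += units;
        }
        return name.substring(0, index);
    }

    public static void main(String[] args) {
        // Each CJK character costs 3 UTF-8 bytes, so only two fit in 7 bytes.
        System.out.println(truncateToBytes("\u65e5\u672c\u8a9e.txt", 7, StandardCharsets.UTF_8));
        // Pure ASCII: one byte per character.
        System.out.println(truncateToBytes("report.txt", 6, StandardCharsets.UTF_8));
    }
}
```

When the limit is counted in UTF-16 code units instead, a `substring` at the limit is nearly enough, as long as the cut does not land between the two halves of a surrogate pair.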

@ecki ecki left a comment

Just a few comments; feel free to ignore them (they are more about the overall concept of this API than about your improvement).

Member

@garydgregory garydgregory left a comment

Hi @ppkarwasz

I have one small comment. Have @ecki's issues been resolved?

@ecki

ecki commented Sep 8, 2025

BTW: here is the place in the JDK I was referring to, but I'm not sure under which conditions it applies:

https://github.com/openjdk/jdk/blob/166ef5e7b1c6d6a9f0f1f29fedb7f65b94f53119/src/java.base/windows/classes/java/io/WinNTFileSystem.java#L42

I think it only applies to new Path APIs.

@garydgregory
Member

Hi @ppkarwasz
Could you please rebase on Git master?

@ecki

ecki commented Sep 8, 2025

> Have @ecki's issues been resolved?

I had trouble implementing a similar function that stays compatible with the OS, and that is the basis of my comments, but I think this might be a good attempt. I'm not sure it is a good candidate for a tool (or rather, it would be a good candidate if it works), but the new code seems to improve the current API, so why not.

@ppkarwasz
Contributor Author

> Could you please rebase on Git master?

I merged with master in b99664a, which should be equivalent once this PR is squashed and is easier to follow during review.

@garydgregory
Member

Hi @ppkarwasz
All macOS tests fail in GH CI.

@ppkarwasz
Contributor Author

ppkarwasz commented Sep 11, 2025

> All macOS tests fail in GH CI.

Yes, look at this comment: #781 (comment)

Basically, I am testing that the limit in UTF-8 bytes is sharp on macOS, i.e. that you cannot create files with names longer than NAME_MAX. Beyond that limit, calling POSIX functions such as open() and readdir() is no longer safe, but apparently macOS handles longer names as long as the underlying filesystem supports them. Now:

  • HFS+ supports up to 255 UTF-16 code units and this is probably the reason all macOS tests fail.
  • APFS supports only 255 UTF-8 bytes, so the test should succeed on a macOS that uses APFS. Could you check on your hardware?

If I can confirm that the APFS limit is strict, I can disable the test on macOS or remove it entirely.

@ecki

ecki commented Sep 11, 2025

Uh, wait: if the purpose of this utility is to truly report the file name length limit of each system, then shouldn't the utility be corrected to handle that case, rather than the test being skipped?

@ppkarwasz
Contributor Author

@ecki,

You’re right: the test itself shouldn’t be disabled, but only the part that makes assumptions which don’t hold across all macOS filesystems.

After looking more closely at macOS name length limits:

  • NAME_MAX is 255 bytes (see getconf NAME_MAX /). This is the minimum limit that POSIX guarantees every filesystem must support.
  • The dirent struct (man dir(5)) reserves space for 1023 bytes since Leopard. That defines the largest name the POSIX API is prepared to handle, but does not mean filesystems can actually allow that length.
  • HFS+ enforces a maximum of 255 UTF-16 code units per file name.
  • APFS (the default since macOS High Sierra) enforces a maximum of 255 UTF-8 bytes per file name.

In 087654e I updated the test logic:

  • Always fail if creating a name ≤ 255 UTF-8 bytes does not work.
  • For names ≥ 256 UTF-8 bytes, only fail if all variants succeed. At least one (the pure ASCII one, where UTF-8 bytes = UTF-16 units) should fail on HFS+ as well.

This way the test still enforces the guaranteed minimum (255 bytes), but tolerates differences between APFS and HFS+.
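
The pass/fail rule described above can be sketched as a small pure function. This is not the test code from 087654e: the `Attempt` type, the method names and the simulated outcomes in `main` are all hypothetical, and no files are actually created.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class MacNameLimitCheck {

    /** Hypothetical record of one attempt to create a file with the given name. */
    static final class Attempt {
        final String name;
        final boolean created;

        Attempt(String name, boolean created) {
            this.name = name;
            this.created = created;
        }

        int utf8Bytes() {
            return name.getBytes(StandardCharsets.UTF_8).length;
        }
    }

    /**
     * Applies the rule: every name within the byte limit must be creatable, and
     * the over-limit names must not all succeed.
     */
    static boolean acceptable(List<Attempt> attempts, int limitBytes) {
        boolean sawOverLimit = false;
        boolean allOverLimitCreated = true;
        for (Attempt attempt : attempts) {
            if (attempt.utf8Bytes() <= limitBytes) {
                if (!attempt.created) {
                    return false; // the guaranteed minimum was not honoured
                }
            } else {
                sawOverLimit = true;
                allOverLimitCreated &= attempt.created;
            }
        }
        return !(sawOverLimit && allOverLimitCreated);
    }

    static String repeat(String s, int count) {
        StringBuilder sb = new StringBuilder(s.length() * count);
        for (int i = 0; i < count; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ascii255 = repeat("a", 255);    // 255 bytes, 255 UTF-16 code units
        String ascii256 = repeat("a", 256);    // 256 bytes, 256 UTF-16 code units
        String cjk258 = repeat("\u65e5", 86);  // 258 bytes, but only 86 UTF-16 code units

        // Simulated outcome on a filesystem that counts UTF-16 code units (assumption).
        List<Attempt> unitCounting = Arrays.asList(
                new Attempt(ascii255, true), new Attempt(ascii256, false), new Attempt(cjk258, true));
        // Simulated outcome on a filesystem that counts UTF-8 bytes (assumption).
        List<Attempt> byteCounting = Arrays.asList(
                new Attempt(ascii255, true), new Attempt(ascii256, false), new Attempt(cjk258, false));

        System.out.println("code-unit filesystem acceptable: " + acceptable(unitCounting, 255));
        System.out.println("byte filesystem acceptable:      " + acceptable(byteCounting, 255));
    }
}
```

The pure ASCII over-limit name is the one that separates the cases: it exceeds 255 in both units, so it is expected to be rejected whichever way the filesystem counts.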

@garydgregory garydgregory merged commit 7810325 into master Sep 13, 2025
20 of 21 checks passed
@garydgregory garydgregory deleted the feat/file-systems branch September 13, 2025 02:44