Skip to content

CODEC-314: Fix possible IndexOutOfBoundsException thrown by PercentCodec.insertAlwaysEncodeChars() method#222

Closed
arthurscchan wants to merge 2 commits into
apache:masterfrom
arthurscchan:CODEC-314-IndexOutOfBound
Closed

CODEC-314: Fix possible IndexOutOfBoundsException thrown by PercentCodec.insertAlwaysEncodeChars() method#222
arthurscchan wants to merge 2 commits into
apache:masterfrom
arthurscchan:CODEC-314-IndexOutOfBound

Conversation

@arthurscchan

@arthurscchan arthurscchan commented Nov 22, 2023

Copy link
Copy Markdown
Contributor

This fixes a possible IndexOutOfBoundsException in src/main/java/org/apache/commons/codec/net/PercentCodec.java thrown by PercentCodec.insertAlwaysEncodeChars() method if the initial byte array contains negative bytes which are used as index for the BitSet.

This PR adds a conditional check to ensure only valid bytes (positive or zero) are processed.

We found this bug using fuzzing by way of OSS-Fuzz. It is reported at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=64362.

@garydgregory

Copy link
Copy Markdown
Member

Let's just have ONE PR instead of one-liner PRs. Especially since these all claim to fix the exact same kind of issue. Also, you MUST provide a matching test.

@arthurscchan

Copy link
Copy Markdown
Contributor Author

Hi, I have added unit test. I am not sure about combining the PRs since they are opened as 4 different issues in the JIRA.

@garydgregory garydgregory left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @arthurscchan
Please see my comment.

Comment thread src/test/java/org/apache/commons/codec/net/PercentCodecTest.java Outdated
@arthurscchan arthurscchan force-pushed the CODEC-314-IndexOutOfBound branch from 0891e19 to 9845bcc Compare November 22, 2023 20:02
@garydgregory

Copy link
Copy Markdown
Member

Hi, I have added unit test. I am not sure about combining the PRs since they are opened as 4 different issues in the JIRA.

In this case, I don't care about mapping PRs to JIRA 1-1. YMMV ;-) I care about making my life easier ;-)

@garydgregory

Copy link
Copy Markdown
Member

Hi @arthurscchan,

Please help me understand why we need this PR because at first glance this PR could be making the use case worse IMO:

  • Before PR: Grabage in -> exception, albeit not IllegalArgumentException.
  • After PR: Garbage in -> garbage out

Or am I missing something?
TY!

@arthurscchan

arthurscchan commented Nov 23, 2023

Copy link
Copy Markdown
Contributor Author

Hi @garydgregory,

In my understanding, the constructor of PecentCodec calls the insertAlwaysEncodeChars(byte[]) method in Line 83. The use of the insertAlwaysEncodeChars(byte[]) method is to record all bytes in the provided byte array to be some character that requires encoding. Thus if garbage bytes do exist in the byte array, it should be skipped and all other "valid" bytes should still be considered. For example, if someone thinks that the letter A should always be encoded, and accidentally provides byte[] encode = {(byte)'A', (byte) -1};, the logic could still work, skipping the -1 bytes and change the 'A' byte in the BitSet to true. Also, the byte array itself is never stored in the object. Thus the PR could make the running smoother without crashing even if some invalid byte does exist. The new logic simply skips them and continues. The most important part is that "valid" bytes are still processed and interpreted.
This is my understanding, am I missing something here?? Do let me know if I interpret the functionality wrongly. Thanks.

@arthurscchan

arthurscchan commented Nov 23, 2023

Copy link
Copy Markdown
Contributor Author

Hi, I have added unit test. I am not sure about combining the PRs since they are opened as 4 different issues in the JIRA.

In this case, I don't care about mapping PRs to JIRA 1-1. YMMV ;-) I care about making my life easier ;-)

If you want, I don't mind combining all the PRs together, not sure if it is more messy to handle. Although they are throwing similar exceptions, I would say the reason is not too similar IMO.

@garydgregory

Copy link
Copy Markdown
Member

@arthurscchan
Please use a better description in PRs and JIRA: Specify the class and method where the exception occurs.

@garydgregory

Copy link
Copy Markdown
Member

@arthurscchan Please use a better description in PRs and JIRA: Specify the class and method where the exception occurs.

Hi, I have added unit test. I am not sure about combining the PRs since they are opened as 4 different issues in the JIRA.

In this case, I don't care about mapping PRs to JIRA 1-1. YMMV ;-) I care about making my life easier ;-)

If you want, I don't mind combining all the PRs together, not sure if it is more messy to handle. Although they are throwing similar exceptions, I would say the reason is not too similar IMO.

Well, at this point, I commented on all PRs, so we might as well leave it as is.
You'll have to make sure to review all comments from all PRs as they apply to all PRs but I did not want to repeat myself all over the place ;-)

@arthurscchan arthurscchan changed the title CODEC-314: Fix possible IndexOutOfBoundsException CODEC-314: Fix possible IndexOutOfBoundsException thrown by PercentCodec.insertAlwaysEncodeChars() method Nov 24, 2023
@arthurscchan arthurscchan force-pushed the CODEC-314-IndexOutOfBound branch from 9845bcc to 46991d0 Compare November 24, 2023 20:34
@codecov-commenter

codecov-commenter commented Nov 25, 2023

Copy link
Copy Markdown

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.23%. Comparing base (44e4c4d) to head (e193cd6).
Report is 558 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master     #222      +/-   ##
============================================
- Coverage     92.27%   92.23%   -0.05%     
- Complexity     1742     1747       +5     
============================================
  Files            67       67              
  Lines          4584     4595      +11     
  Branches        709      712       +3     
============================================
+ Hits           4230     4238       +8     
- Misses          242      243       +1     
- Partials        112      114       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@garydgregory

garydgregory commented Nov 25, 2023

Copy link
Copy Markdown
Member

Hi @garydgregory,

In my understanding, the constructor of PecentCodec calls the insertAlwaysEncodeChars(byte[]) method in Line 83. The use of the insertAlwaysEncodeChars(byte[]) method is to record all bytes in the provided byte array to be some character that requires encoding. Thus if garbage bytes do exist in the byte array, it should be skipped and all other "valid" bytes should still be considered. For example, if someone thinks that the letter A should always be encoded, and accidentally provides byte[] encode = {(byte)'A', (byte) -1};, the logic could still work, skipping the -1 bytes and change the 'A' byte in the BitSet to true. Also, the byte array itself is never stored in the object. Thus the PR could make the running smoother without crashing even if some invalid byte does exist. The new logic simply skips them and continues. The most important part is that "valid" bytes are still processed and interpreted. This is my understanding, am I missing something here?? Do let me know if I interpret the functionality wrongly. Thanks.

Hi @arthurscchan
Hmm, so that means that one can roundtrip encode/decode, but when you start with bad data, you end up with nothing (if all the input is bad) or good encoded data. But when you decode that encoded data, you don't end up with what you started. If we really want all of that, I think we should update the Javadoc and describe that encoding also performs filtering and can never throw an exception for illegal input. WDYT?
Does that mean encode never throws EncoderException?

@arthurscchan

Copy link
Copy Markdown
Contributor Author

Hi @garydgregory,

I think it is a different case. The invalid bytes are provided through constructor, it is only used for the initial setting of the PercentCodec object to set which bytes are to be encoded during the encode or decode method. Thus if we initialise the PercentCodec with a byte array contains invalid bytes, it just mean that the encode method will not encode that byte. It does not affect the encode and decode process at all. Thus the encode / decode method round trip won't be affected and the result is the same as the original input. Maybe I could add a round trip call and ensure the result are the same in the test case.
Alternatively, instead of checking the length and skip the invalid byte during constructor phase, I could simply add a try catch block in the code, when any IndexOutOfBound (or more general, and kind of RuntimeException) is thrown, wrap it with EncoderException. This could at least make the method not throwing "unexpected" RuntimeException. It maybe a better approach. WDYT?

@garydgregory

Copy link
Copy Markdown
Member

Hi @arthurscchan
I'm not a fan of making the ctor overly clever in a way that was never documented and is not expected or obvious (IMO). This violates the principle of least surprise (IMO). We then risk some feature-creep, and setting expectations that other constructors should somehow sanitize their inputs. That is not the best approach for a lower-level library like Commons Codec I would offer. I'd rather the ctor throws an exception on garbage input.

@arthurscchan

arthurscchan commented Nov 27, 2023

Copy link
Copy Markdown
Contributor Author

Hi @garydgregory, OK. Sorry for not considering in that direction. I will change the code in this PR to wrap around the possible IndexOutOfBoundException in an EncoderException, to at least decrease the change of "unpexected" input thrown directly from the constructor. I will also change a bit on the javadoc comments, mentioning that EncoderException will be thrown by the ctor if invalid bytes exist in the provided byte array. That maybe a better approach. Thanks for your comments.

@garydgregory

Copy link
Copy Markdown
Member

TY @arthurscchan
Let's keep in mind that this is a library that needs to make sense (or try to) for users. As opposed to just addressing fuzzing issues in the most expedient manner ;-) TY again for your work :-)

Signed-off-by: Arthur Chan <arthur.chan@adalogics.com>
Signed-off-by: Arthur Chan <arthur.chan@adalogics.com>
@arthurscchan arthurscchan force-pushed the CODEC-314-IndexOutOfBound branch from f0859ae to e193cd6 Compare November 27, 2023 15:15
@arthurscchan

Copy link
Copy Markdown
Contributor Author

Hi @garydgregory, thanks for your comments and suggestions. I will surely keep that in mind.

@garydgregory garydgregory left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @arthurscchan
Rethought this a bit, please see my comment.

* @throws EncoderException if the alwaysEncodeChars byte array contains invalid bytes
*/
public PercentCodec(final byte[] alwaysEncodeChars, final boolean plusForSpace) {
public PercentCodec(final byte[] alwaysEncodeChars, final boolean plusForSpace) throws EncoderException {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @arthurscchan
Now that I look at this again, it doesn't make sense to me, this seems like the exact use case for an IllegalArgumentException. In addition, the constructor does not encode anything, so an EncoderException really does not make sense IMO.

I propose:

diff --git a/src/main/java/org/apache/commons/codec/net/PercentCodec.java b/src/main/java/org/apache/commons/codec/net/PercentCodec.java
index 8e51538..df623df 100644
--- a/src/main/java/org/apache/commons/codec/net/PercentCodec.java
+++ b/src/main/java/org/apache/commons/codec/net/PercentCodec.java
@@ -231,6 +231,9 @@
      * @param b the byte that is candidate for min and max limit
      */
     private void insertAlwaysEncodeChar(final byte b) {
+        if (b < 0) {
+            throw new IllegalArgumentException("byte must be >= 0");
+        }
         this.alwaysEncodeChars.set(b);
         if (b < alwaysEncodeCharsMin) {
             alwaysEncodeCharsMin = b;
diff --git a/src/test/java/org/apache/commons/codec/net/PercentCodecTest.java b/src/test/java/org/apache/commons/codec/net/PercentCodecTest.java
index 34f8ecb..a537efb 100644
--- a/src/test/java/org/apache/commons/codec/net/PercentCodecTest.java
+++ b/src/test/java/org/apache/commons/codec/net/PercentCodecTest.java
@@ -106,6 +106,12 @@
     }
 
     @Test
+    public void testInvalidByte() throws Exception {
+        final byte[] invalid = { (byte) -1, (byte) 'A' };
+        assertThrows(IllegalArgumentException.class, () -> new PercentCodec(invalid, true));
+    }
+
+    @Test
     public void testPercentEncoderDecoderWithNullOrEmptyInput() throws Exception {
         final PercentCodec percentCodec = new PercentCodec(null, true);
         assertNull(percentCodec.encode(null), "Null input value encoding test");

WDYT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it is a better changes and interpretation of the exceptions. I will change it accordingly.
@garydgregory

@garydgregory

Copy link
Copy Markdown
Member

Closing: Done.

asfgit pushed a commit that referenced this pull request Dec 3, 2023
PercentCodec.insertAlwaysEncodeChars() method #222
@arthurscchan

Copy link
Copy Markdown
Contributor Author

@garydgregory Sorry I am bit busy and did not handle that last week. Thanks for fixing that for me.

omosteven pushed a commit to omosteven/commons-codec-lab-work that referenced this pull request Jan 8, 2025
PercentCodec.insertAlwaysEncodeChars() method apache#222
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants