-
Notifications
You must be signed in to change notification settings - Fork 596
Dockerfile: Various bug fixes, clean ups, and an integration test #79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
TLDR: * Fix all the headers for correctness (mime + charset), performance, and security. * Use much simpler redirect logic and apply the same approach as we use today. * Verify it all with an integration test. == Differences in file headers In summary: * Missing GZIP support and no "Vary: Accept-Encoding" to avoid cache poisoning. * Missing CORS support (Access-Control-Allow-Origin). * Missing caching allowance (Cache-Control public 10 years, max Expires). * Missing disablement of server_tokens. * Missing charset utf-8. * Unstable E-Tag and Last-Modified which would cause serious cache churn after every rebuild. Status quo: ``` krinkle@wp-03:~$ curl -H 'Host: codeorigin.jquery.com' -i 'http://localhost/jquery-3.0.0.js' | head -n20 Server: nginx Date: Mon, 16 Aug 2021 23:12:52 GMT Connection: keep-alive Access-Control-Allow-Origin: * Cache-Control: max-age=315360000 Cache-Control: public Content-Length: 263268 Content-Type: application/javascript; charset=utf-8 ETag: "28feccc0-40464" Expires: Thu, 31 Dec 2037 23:55:55 GMT Last-Modified: Fri, 18 Oct 1991 12:00:00 GMT Vary: Accept-Encoding Accept-Ranges: bytes ``` Previously, without this patch: ``` dockerize$ curl -i 'http://localhost:4000/jquery-3.0.0.js' | head -n20 Server: nginx/1.21.1 Date: Mon, 16 Aug 2021 23:13:30 GMT Connection: keep-alive Content-Length: 263268 Content-Type: application/javascript ETag: "5fdbec22-40464" Last-Modified: Thu, 17 Dec 2020 23:39:14 GMT Accept-Ranges: bytes ``` == Match existing Nginx config Most of the above was addressed by simply using the same Nginx config as we have in the private infrastructure.git repo for provisioning legacy codeorigin (copied from wordpress-header.conf.erb and wp/jquery.pp, respectively). However, the last two points (charset, and unstable etag), require additional changes. == Nginx: charset The default charset from Nginx differs between Debian and Alpine, possibly due to changes in some of the shared libraries. Fix this by explicitly setting `charset utf-8`. Note that this does (and should) only apply to files with a MIME type in `charset_types` list, which by default contains HTML, JS, and CSS. More importantly, it does not (and should not) cause PNG files and other binary assets to be served with a charset, which would be wrong. == Nginx: E-Tag and Last-Modified This was a tricky one. Git does not store file modification timestamps, which means the on-disk file mtimes are somewhat arbitrarily set to when the file was created by the local Git client. For most files, this means the time you cloned the repository, and for files added later, the time you ran `git pull`. This was de-facto "stable" on the legacy server because it is a persistent server with a persistent git directory (we always use the same clone and just update it to fetch new files after a commit happens). The timestamps themselves don't matter as long as they remain constant for any given file, and they were. This is important because E-Tag is computed in Nginx based on file mtime and size and (unlike Apache) this behaviour is not configurable in Nginx (e.g. to make it compute a content hash instead). And Last-Modified naturally uses the mtime as well. As a workaround, I've added a `touch` command to assign all CDN files a fixed timestamp far in the past. This is safe for us because all files are expected to remain static. Even if we would change a file, it wouldn't propagate to the CDN without a manual CDN API purge (given the all-important high values for Cache-Control max-age and Expires), and after that would not propagate to any browser that has previously fetched it unless the user clears their own cache (given the all-important Cache-Control telling the browser to re-use the file blindly, which is critical for performance). As such, whether the timestamp remains constant or changes after the file is changed, would make no difference for how it propagates. == New state After, with this patch (and verified by a simple test suite). ``` dockerize$ curl -i 'http://localhost:4000/jquery-3.0.0.js' | head -n20 Server: nginx Date: Mon, 16 Aug 2021 23:49:43 GMT Connection: keep-alive Access-Control-Allow-Origin: * Cache-Control: max-age=315360000 Cache-Control: public Content-Length: 263268 Content-Type: application/javascript; charset=utf-8 ETag: "28feccc0-40464" Expires: Thu, 31 Dec 2037 23:55:55 GMT Last-Modified: Fri, 18 Oct 1991 12:00:00 GMT Vary: Accept-Encoding Accept-Ranges: bytes ``` == Misc changes * Remove left-over Nginx config for purge.php. * Remove left-over php-fpm script. * Use our own document root at /var/www/cdn instead of mixing our files into the default /usr/share/nginx/html/ which contains files from the "docker-nginx" project that we may not want to serve blindly. This mainly affects the odd index.html file that could otherwise be served. * Pin the Nginx base image version at 1.x. This is mostly a no-op since they've been on 1.x for 10 years, but if it does rollover we wouldn't want an unmanaged upgrade to a new major version. We could pun this more narrowly at some point, but 1.x should be fine for now. * Invert the config file (let Dockerfile remove code instead of comment out) so that the most critical lines can be easily linted. * Ensure all files listed are redirected correctly, as per https://github.com/jquery/infrastructure/issues/474#issuecomment-844582865 and verify this with an integration test. The expected values are populated based on running `curl` against localhost from a shell on the legacy codeorigin. I specifically did not compare against live code.jquery.com, since we care about matching what we feed to the CDN (although the above problems were surfacing through there as well). Some of the CDN behaviours can't and don't need to be mimicked from the origin. Prior to launch we will (also) compare the live code.jquery.com against the new staging CDN endpoint, but that's separate from the test for the container. Ref https://github.com/jquery/infrastructure/issues/474.
Krinkle
commented
Aug 17, 2021
Krinkle
commented
Aug 17, 2021
brianwarner
reviewed
Aug 17, 2021
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
TLDR:
Differences in file headers
In summary:
after every rebuild.
Status quo:
Previously, without this patch:
Match existing Nginx config
Most of the above was addressed by simply using the same Nginx config as we have in the private infrastructure.git repo for provisioning legacy codeorigin (copied from wordpress-header.conf.erb and wp/jquery.pp, respectively).
However, the last two points (charset, and unstable etag), require additional changes.
Nginx: charset
The default charset from Nginx differs between Debian and Alpine, possibly due to changes in some of the shared libraries. Fix this by explicitly setting
charset utf-8
. Note that this does (and should) only apply to files with a MIME type incharset_types
list, which by default contains HTML, JS, and CSS.More importantly, it does not (and should not) cause PNG files and other binary assets to be served with a charset, which would be wrong.
Nginx: E-Tag and Last-Modified
This was a tricky one. Git does not store file modification timestamps, which means the on-disk file mtimes are somewhat arbitrarily set to when the file was created by the local Git client. For most files, this means the time you cloned the repository, and for files added later, the time you ran
git pull
.This was de-facto "stable" on the legacy server because it is a persistent server with a persistent git directory (we always use the same clone and just update it to fetch new files after a commit happens). The timestamps themselves don't matter
as long as they remain constant for any given file, and they were.
This is important because E-Tag is computed in Nginx based on file mtime and size and (unlike Apache) this behaviour is not configurable in Nginx (e.g. to make it compute a content hash instead). And Last-Modified naturally uses the
mtime as well.
As a workaround, I've added a
touch
command to assign all CDN files a fixed timestamp far in the past. This is safe for us because all files are expected to remain static. Even if we would change a file, it wouldn't propagate to the CDN without a manual CDN API purge (given the all-important high values for Cache-Control max-age and Expires), and after that would not propagate to any browser that has previously fetched it unless the user clears their own cache (given the all-important Cache-Control telling the browser to re-use the file blindly, which is critical for performance). As such, whether the timestamp remains constant or changes after the file is changed, would make no difference for how it propagates.New state
After, with this patch (and verified by a simple test suite).
Misc changes
Remove left-over Nginx config for purge.php.
Remove left-over php-fpm script.
Use our own document root at /var/www/cdn instead of mixing our files into the default /usr/share/nginx/html/ which contains files from the "docker-nginx" project that we may not want to serve blindly.
This mainly affects the odd index.html file that could otherwise be served.
Pin the Nginx base image version at 1.x. This is mostly a no-op since they've been on 1.x for 10 years, but if it does rollover
we wouldn't want an unmanaged upgrade to a new major version. We could pun this more narrowly at some point, but 1.x should be fine for now.
Invert the config file (let Dockerfile remove code instead of comment out) so that the most critical lines can be easily
linted.
Ensure all files listed are redirected correctly, as per https://github.com/jquery/infrastructure/issues/554 and verify this with an integration test. The expected values are populated based on running
curl
against localhost from a shell on the legacy codeorigin. I specifically did not compare against live code.jquery.com, since we care about matching what we feed to the CDN (although the above problems were surfacing through there as well). Some of the CDN behaviours can't and don't need to be mimicked from the origin. Prior to launch we will (also) compare the live code.jquery.com against the new staging CDN endpoint, but that's separate from the test for the container.Ref https://github.com/jquery/infrastructure/issues/554.