fix: stream artifact downloads to disk to avoid OOM on large artifacts#1
Conversation
Agent-Logs-Url: https://github.com/yeicor/action-download-artifact/sessions/2d0911e4-0a25-46eb-8f15-c19ce03b1645 Co-authored-by: yeicor <4929005+yeicor@users.noreply.github.com>
…e copy Agent-Logs-Url: https://github.com/yeicor/action-download-artifact/sessions/b0b20368-9d39-4096-8933-115de55a7208 Co-authored-by: yeicor <4929005+yeicor@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Updates the artifact download path in the GitHub Action to avoid buffering entire artifact ZIPs in Node.js heap memory by switching to a redirect-URL + streaming-to-disk approach.
Changes:
- Adds
getArtifactDownloadUrl()to resolve the pre-signed artifact URL without downloading the ZIP into memory. - Adds
streamUrlToFile()to stream the ZIP directly to a temp file and then extract from disk. - Refactors the artifact download/extract loop to use the temp ZIP path for all unpacking modes and ensures temp cleanup via
finally.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (res.statusCode !== 200) { | ||
| reject(new Error(`Unexpected HTTP ${res.statusCode} while downloading artifact from ${url}`)) | ||
| res.resume() |
There was a problem hiding this comment.
The error message includes the full url (Unexpected HTTP ... from ${url}), but this download URL is pre-signed and typically contains sensitive query parameters (SAS token). Emitting it to action logs can leak credentials. Prefer omitting the URL entirely or logging only a scrubbed form (e.g., origin + pathname without query).
| async function getArtifactDownloadUrl(client, owner, repo, artifactId) { | ||
| try { | ||
| // Ask Octokit NOT to follow the redirect so we get the Location header. | ||
| await client.rest.actions.downloadArtifact({ | ||
| owner, | ||
| repo, | ||
| artifact_id: artifactId, | ||
| archive_format: "zip", | ||
| request: { redirect: "manual" }, | ||
| }) | ||
| } catch (error) { | ||
| // Octokit throws on 302 when redirects are disabled; grab the URL. | ||
| if (error.status === 302 && error.response?.headers?.location) { | ||
| return error.response.headers.location | ||
| } | ||
| // Some Octokit versions expose it differently. | ||
| if (error.response?.headers?.location) { | ||
| return error.response.headers.location | ||
| } | ||
| throw error | ||
| } | ||
| throw new Error("Expected a redirect response from downloadArtifact but got none") |
There was a problem hiding this comment.
getArtifactDownloadUrl assumes downloadArtifact(..., request: { redirect: "manual" }) will throw and expose the 302 Location on error.response. In the Octokit client used elsewhere (and in @actions/artifact), this call typically returns { status: 302, headers } without throwing; in that case this function will always hit throw new Error("Expected a redirect...") and break artifact downloads. Consider reading status/headers.location from the successful response and only throwing when status is not 302 or the Location header is missing.
|
|
||
| const handleResponse = (res) => { | ||
| // Follow a single redirect (the pre-signed Azure Blob URL may itself redirect). | ||
| if (res.statusCode >= 300 && res.statusCode < 400 && res.headers.location) { | ||
| res.resume() // drain the redirect body | ||
| const redirectUrl = new URL(res.headers.location) | ||
| const mod = redirectUrl.protocol === 'https:' ? https : http | ||
| mod.get(res.headers.location, handleResponse).on('error', reject) | ||
| return | ||
| } | ||
| if (res.statusCode !== 200) { | ||
| reject(new Error(`Unexpected HTTP ${res.statusCode} while downloading artifact from ${url}`)) | ||
| res.resume() | ||
| return | ||
| } | ||
| res.on('error', reject) | ||
| res.pipe(dest) | ||
| dest.on('finish', resolve) | ||
| dest.on('error', reject) | ||
| } | ||
|
|
||
| const parsedUrl = new URL(url) | ||
| const mod = parsedUrl.protocol === 'https:' ? https : http | ||
| mod.get(url, handleResponse).on('error', reject) |
There was a problem hiding this comment.
streamUrlToFile claims to follow "up to one redirect", but the implementation will follow an unbounded number of redirects because the redirect branch recursively calls handleResponse with no counter/limit. This can hang on redirect loops and makes the behavior diverge from the comment. Add an explicit redirect limit (e.g., max 1 as documented) and fail with a clear error once exceeded.
| const handleResponse = (res) => { | |
| // Follow a single redirect (the pre-signed Azure Blob URL may itself redirect). | |
| if (res.statusCode >= 300 && res.statusCode < 400 && res.headers.location) { | |
| res.resume() // drain the redirect body | |
| const redirectUrl = new URL(res.headers.location) | |
| const mod = redirectUrl.protocol === 'https:' ? https : http | |
| mod.get(res.headers.location, handleResponse).on('error', reject) | |
| return | |
| } | |
| if (res.statusCode !== 200) { | |
| reject(new Error(`Unexpected HTTP ${res.statusCode} while downloading artifact from ${url}`)) | |
| res.resume() | |
| return | |
| } | |
| res.on('error', reject) | |
| res.pipe(dest) | |
| dest.on('finish', resolve) | |
| dest.on('error', reject) | |
| } | |
| const parsedUrl = new URL(url) | |
| const mod = parsedUrl.protocol === 'https:' ? https : http | |
| mod.get(url, handleResponse).on('error', reject) | |
| const maxRedirects = 1 | |
| const makeRequest = (currentUrl, redirectsRemaining) => { | |
| const parsedUrl = new URL(currentUrl) | |
| const mod = parsedUrl.protocol === 'https:' ? https : http | |
| mod.get(currentUrl, (res) => { | |
| // Follow a single redirect (the pre-signed Azure Blob URL may itself redirect). | |
| if (res.statusCode >= 300 && res.statusCode < 400 && res.headers.location) { | |
| if (redirectsRemaining <= 0) { | |
| res.resume() | |
| reject(new Error(`Too many redirects while downloading artifact from ${url}; exceeded maximum of ${maxRedirects} redirect`)) | |
| return | |
| } | |
| res.resume() // drain the redirect body | |
| const redirectUrl = new URL(res.headers.location, currentUrl) | |
| makeRequest(redirectUrl.toString(), redirectsRemaining - 1) | |
| return | |
| } | |
| if (res.statusCode !== 200) { | |
| reject(new Error(`Unexpected HTTP ${res.statusCode} while downloading artifact from ${url}`)) | |
| res.resume() | |
| return | |
| } | |
| res.on('error', reject) | |
| res.pipe(dest) | |
| dest.on('finish', resolve) | |
| dest.on('error', reject) | |
| }).on('error', reject) | |
| } | |
| makeRequest(url, maxRedirects) |
|
|
||
| const handleResponse = (res) => { | ||
| // Follow a single redirect (the pre-signed Azure Blob URL may itself redirect). | ||
| if (res.statusCode >= 300 && res.statusCode < 400 && res.headers.location) { | ||
| res.resume() // drain the redirect body | ||
| const redirectUrl = new URL(res.headers.location) | ||
| const mod = redirectUrl.protocol === 'https:' ? https : http | ||
| mod.get(res.headers.location, handleResponse).on('error', reject) | ||
| return | ||
| } | ||
| if (res.statusCode !== 200) { | ||
| reject(new Error(`Unexpected HTTP ${res.statusCode} while downloading artifact from ${url}`)) | ||
| res.resume() | ||
| return | ||
| } | ||
| res.on('error', reject) | ||
| res.pipe(dest) | ||
| dest.on('finish', resolve) | ||
| dest.on('error', reject) | ||
| } | ||
|
|
||
| const parsedUrl = new URL(url) | ||
| const mod = parsedUrl.protocol === 'https:' ? https : http | ||
| mod.get(url, handleResponse).on('error', reject) |
There was a problem hiding this comment.
On non-200 responses or request/response errors, streamUrlToFile rejects but leaves the destination write stream open and may leave a partial file on disk. To avoid FD leaks and confusing subsequent runs, ensure you destroy()/close() the write stream on any error path and remove the partially written destPath before rejecting.
| const handleResponse = (res) => { | |
| // Follow a single redirect (the pre-signed Azure Blob URL may itself redirect). | |
| if (res.statusCode >= 300 && res.statusCode < 400 && res.headers.location) { | |
| res.resume() // drain the redirect body | |
| const redirectUrl = new URL(res.headers.location) | |
| const mod = redirectUrl.protocol === 'https:' ? https : http | |
| mod.get(res.headers.location, handleResponse).on('error', reject) | |
| return | |
| } | |
| if (res.statusCode !== 200) { | |
| reject(new Error(`Unexpected HTTP ${res.statusCode} while downloading artifact from ${url}`)) | |
| res.resume() | |
| return | |
| } | |
| res.on('error', reject) | |
| res.pipe(dest) | |
| dest.on('finish', resolve) | |
| dest.on('error', reject) | |
| } | |
| const parsedUrl = new URL(url) | |
| const mod = parsedUrl.protocol === 'https:' ? https : http | |
| mod.get(url, handleResponse).on('error', reject) | |
| let settled = false | |
| const fail = (error, res) => { | |
| if (settled) { | |
| return | |
| } | |
| settled = true | |
| if (res && !res.destroyed) { | |
| res.destroy() | |
| } | |
| if (!dest.destroyed) { | |
| dest.destroy() | |
| } | |
| fs.unlink(destPath, (unlinkError) => { | |
| if (unlinkError && unlinkError.code !== 'ENOENT') { | |
| reject(unlinkError) | |
| return | |
| } | |
| reject(error) | |
| }) | |
| } | |
| const succeed = () => { | |
| if (settled) { | |
| return | |
| } | |
| settled = true | |
| dest.close((error) => { | |
| if (error) { | |
| reject(error) | |
| return | |
| } | |
| resolve() | |
| }) | |
| } | |
| const handleResponse = (res) => { | |
| // Follow a single redirect (the pre-signed Azure Blob URL may itself redirect). | |
| if (res.statusCode >= 300 && res.statusCode < 400 && res.headers.location) { | |
| res.resume() // drain the redirect body | |
| const redirectUrl = new URL(res.headers.location) | |
| const mod = redirectUrl.protocol === 'https:' ? https : http | |
| mod.get(res.headers.location, handleResponse).on('error', (error) => fail(error)) | |
| return | |
| } | |
| if (res.statusCode !== 200) { | |
| res.resume() | |
| fail(new Error(`Unexpected HTTP ${res.statusCode} while downloading artifact from ${url}`), res) | |
| return | |
| } | |
| res.on('error', (error) => fail(error, res)) | |
| dest.on('finish', succeed) | |
| dest.on('error', (error) => fail(error, res)) | |
| res.pipe(dest) | |
| } | |
| const parsedUrl = new URL(url) | |
| const mod = parsedUrl.protocol === 'https:' ? https : http | |
| mod.get(url, handleResponse).on('error', (error) => fail(error)) |
Agent-Logs-Url: https://github.com/yeicor/action-download-artifact/sessions/58991d77-6515-4700-8ff1-421d449e566d Co-authored-by: yeicor <4929005+yeicor@users.noreply.github.com>
getArtifactDownloadUrl(broken: relies on Octokit throwing on 302 which doesn't happen with native fetch in @actions/github v9)streamUrlToFile(no longer needed)node:httpsandnode:httpimports (only used by the removed functions)Readablefromnode:streamandpipelinefromnode:stream/promisesdownloadArtifactToFile(client, owner, repo, artifactId, destPath)using Octokitrequestwith customfetchthat pipes response body directly to diskdownloadArtifactToFilefor both skip_unpack and extraction pathsOriginal prompt
Problem
In
main.js, every artifact ZIP is currently downloaded by calling Octokit'sdownloadArtifact, which follows the redirect and fetches the entire binary response body into azip.dataArrayBufferthat lives in Node.js heap memory. ThatBuffer.from(zip.data)is then passed either toAdmZip(which also keeps everything in RAM) or written to a temp file forunzip.For large artifacts this can easily exhaust the runner's memory and kill the process with an OOM error.
Root-cause locations in
main.jsclient.rest.actions.downloadArtifactbuffers the whole ZIP inzip.dataBuffer.from(zip.data)copies the in-memory buffer to disk (skip_unpackpath)Buffer.from(zip.data)copies the in-memory buffer to a temp file beforeunzipnew AdmZip(Buffer.from(zip.data))loads the whole ZIP into AdmZip in-memoryRequired changes to
main.jsReplace all four in-memory paths with a streaming download written directly to disk. Here is the complete target implementation: