PEP 694 (PyPI upload API 2.0)
January 6, 2025, 11:36pm 1
Back in 2022, @dstufft started a conversation about PEP 694, proposing a PyPI Upload 2.0 API. That discussion petered out in October, but since it’s something I’m keenly interested in, I’m restarting the discussion here in a new thread.
I just merged a significant update to PEP 694, which also makes me a co-author. I think that, plus the 70+ messages in the other thread, is enough to warrant a new discussion thread.
While the spirit and definition of the API proposed in the previous incarnation of the PEP is largely retained, some highlights of the changes I made[1] include:
- Added myself as a co-author.
- Changed some terminology to use “stage” rather than “draft” to describe where you upload non-public wheels during a session.
- Proposed the root URL for PyPI to be https://upload.pypi.org/2.0
- Added an optional “nonce” string to the session creation request, which allows clients to decide whether a staged preview is easily guessable or not. Both use cases, as well as the justification for “guessability” rather than “privacy”, are described in more detail in the PEP.
- Renamed many of the JSON keys to better align with the change in terminology.
- Added or modified APIs for getting the session status, extending a session, canceling a session, and publishing a session.
- Updated the file upload protocol to better align with the active Internet-Draft that is the successor to the older tus.io protocol.
- Several other smaller changes which hopefully fill any gaps in the previous protocol.
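To illustrate why a nonce affects guessability, here is a minimal sketch. It assumes, purely for illustration, that the server derives a stage token by hashing the project name, version, and nonce; the stage_token function and this derivation are hypothetical, and the PEP itself is authoritative on the real algorithm. Without a nonce, anyone who knows the name and version can recompute the token; with a random client-chosen nonce, they cannot.

```python
import hashlib
import secrets


def stage_token(name: str, version: str, nonce: str = "") -> str:
    """Hypothetical stage-token derivation: SHA-256 over name, version, nonce.

    This is an illustrative assumption, not the PEP's normative algorithm.
    Parts are joined with NUL, which cannot appear in a project name or
    version, so different (name, version) splits cannot collide.
    """
    material = "\0".join((name, version, nonce)).encode("utf-8")
    return hashlib.sha256(material).hexdigest()


# No nonce: the token is a pure function of public information (guessable).
guessable = stage_token("example-pkg", "1.0")

# Random nonce: the token cannot be recomputed without knowing the nonce.
nonce = secrets.token_urlsafe(16)
unguessable = stage_token("example-pkg", "1.0", nonce)
```

Omitting the nonce is treated here as equivalent to passing an empty string, which is also an assumption of this sketch.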
Rather than posting the full text of the updated PEP 694 here, I’ll ask you to head on over to the published version.
Cheers!
[1] Besides copyediting and formatting.
mgorny (Michał Górny) March 25, 2025, 6:15am 2
Overall, it is very solid and I like it.
A few minor points:
If the upload completes successfully, the server MUST respond with a 201 Created status.
[…]
For each successful chunk, the server MUST respond with a 202 Accepted header, except for the final chunk, which MUST be a 201 Created, and as with non-chunked uploads, the body of these responses has no content.
This seems to assume that processing of the final chunk will always be synchronous, which is a bit inconsistent with how the session is handled asynchronously. I can imagine there could be some delay at this point, e.g. because the server verifies the checksum of the uploaded file in one step. However, I’m not sure whether making this potentially asynchronous (i.e. returning 202 and expecting the client to recheck the status later) is worth the added complexity.
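To make the quoted chunking rules concrete, here is a minimal client-side sketch. Chunk and plan_chunks are hypothetical names; the sketch splits a file into in-order chunks, attaches the Upload-Offset and Upload-Complete headers, and records the status the quoted text expects for each request. Marking intermediate chunks with "Upload-Complete: ?0" is an assumption based on the structured-field boolean convention.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    offset: int  # sent as the Upload-Offset header
    data: bytes
    final: bool  # True only for the last chunk of the file

    def headers(self) -> dict[str, str]:
        # Assumption: intermediate chunks carry "Upload-Complete: ?0".
        return {
            "Upload-Offset": str(self.offset),
            "Upload-Complete": "?1" if self.final else "?0",
        }

    def expected_status(self) -> int:
        # Quoted rule: 202 Accepted for each intermediate chunk,
        # 201 Created for the final one.
        return 201 if self.final else 202


def plan_chunks(data: bytes, chunk_size: int) -> list[Chunk]:
    """Split data into in-order chunks; each offset is that chunk's byte position."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not data:
        # Assumption: an empty file is sent as a single empty final request.
        return [Chunk(0, b"", True)]
    return [
        Chunk(offset, data[offset:offset + chunk_size], offset + chunk_size >= len(data))
        for offset in range(0, len(data), chunk_size)
    ]
```

Because every offset is derived from the previous chunk's end, a client built this way never sends chunks out of order.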
If the offset provided in Upload-Offset is not 0 or correctly specifies the byte offset of the next chunk in an incomplete upload, then the server MUST respond with a 409 Conflict. This means that a client MAY NOT upload chunks out of order.
I find this sentence really hard to parse. It sounds like it means “if the offset … correctly specifies the byte offset of the next chunk in an incomplete upload, then the server…” — but wasn’t it supposed to mean the opposite of that? Also, shouldn’t that be “MUST NOT”? “MAY NOT” is not in RFC 2119, and I honestly find it confusing.
Perhaps:
If the offset provided in Upload-Offset is not 0 and does not correctly specify the byte offset of the next chunk in an incomplete upload, then the server MUST respond with a 409 Conflict. This means that a client MUST NOT upload chunks out of order.
Canceling an In-Progress Upload
[…]
Delete a Partial or Fully Uploaded File
Aren’t “deleting a partial file” and “canceling an upload” essentially the same thing? Also, the latter section only covers fully uploaded files, so perhaps it should either be renamed, or both sections should be merged into one.
Complete the in-progress upload by uploading a zero-length chunk providing the Upload-Complete: ?1 header. This effectively truncates and completes the in-progress upload, after which point the new upload can commence. In this case, clients SHOULD reuse the previous upload resource URL and do not need to begin the entire file upload sequence over again.
What is the use case for this? FWIU it implies that the file must have the same size and hashes, so I don’t understand under what circumstances the client would suddenly need to start uploading the same file over again.
I think error responses are underspecified. In particular, HTTP error codes are listed in only a handful of cases. I think it would be worthwhile to at least list some common errors that can be expected from issuing different requests.
barry (Barry Warsaw) March 25, 2025, 8:45pm 3
Thanks for the vote of confidence!
I think I took this from the IETF draft, but at this point, PEP 694 uses that draft more as inspiration than as a faithful implementation[1].
I think you have a point: the API for the final chunk should probably allow either a 201 or a 202 response here, so we could relax this to accept either code. If a 202 is returned, the client would have to do a HEAD request, as when resuming an upload, and the server would respond with an Upload-Offset indicating that the full file has been received, along with an Upload-Complete: ?0 header. The client would continue to issue HEAD requests until it sees a response with Upload-Complete: ?1.
WDYT? If this is workable, it doesn’t seem like a huge additional complication.
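The client side of this proposal is a simple polling loop. In this minimal sketch, wait_for_completion is a hypothetical helper and head is a hypothetical callable that issues a HEAD request to the upload resource URL and returns the response headers; the interval and attempt limit are arbitrary illustrative defaults, not values from the PEP.

```python
import time
from typing import Callable, Mapping


def wait_for_completion(
    head: Callable[[], Mapping[str, str]],
    expected_size: int,
    interval: float = 1.0,
    max_attempts: int = 60,
) -> None:
    """Poll with HEAD after a 202 on the final chunk until Upload-Complete is ?1.

    Real HTTP clients usually return case-insensitive header mappings; with a
    plain dict, the keys must match exactly.
    """
    for _ in range(max_attempts):
        headers = head()
        # The server should already report the full file as received.
        if int(headers["Upload-Offset"]) != expected_size:
            raise RuntimeError("server has not received the full file")
        if headers.get("Upload-Complete") == "?1":
            return  # processing finished
        time.sleep(interval)  # still ?0: server is processing, try again later
    raise TimeoutError("upload processing did not complete in time")
```

If the final chunk returns 201 instead, the client skips this loop entirely.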
[1] Plus, the draft expires on 24 April 2025 and I see no status updates to it.
barry (Barry Warsaw) March 25, 2025, 9:14pm 4
Thanks for calling this out. I think I misspelled “incorrectly” there. Your suggestion LGTM!
Another good point. I think merging these sections is the right way to go.
Upon re-reading this, I agree. It’s superfluous, and there should be OOWTDI.
Good call. I’ll do a pass to flesh out more of the error responses, and then post a PR to update the PEP.
Thanks for the in-depth reading and excellent feedback!
barry (Barry Warsaw) March 25, 2025, 11:26pm 5
mgorny (Michał Górny) March 26, 2025, 5:31am 6
Well, my immediate thought is that you’d do a session status request to see the status of the file. Not saying I dislike this solution, just saying what immediately came to my mind as the “obvious” next step.
I suppose this also raises the question of whether you can start uploading the next file before processing of the current one finishes. Logically, I’d say not, but this would imply that either we make the status check obligatory, or clients that skip it would quasi-randomly succeed or fail at uploading the next file, depending on whether they issue the request before or after the processing happens to finish.
barry (Barry Warsaw) April 1, 2025, 5:49pm 7
This is a great perspective, thanks! Thinking it over more, I agree. I’m going to
- Update the spec to emphasize this procedure.
- Rewrite file status values to better support this use case
- Explicitly allow either a 201 or 202 response to the last chunk upload (i.e. Upload-Complete: ?1)
I thought about whether non-chunked uploads (which also contain the Upload-Complete: ?1 header) should also allow either a 201 or 202, but for now, I’m not changing that.
That’s also a good point. The PEP explicitly disallows parallel uploads of chunks, but is actually silent on whether multiple files can be uploaded in parallel. The PEP needs to be explicit, but it can also waffle, leaving it to server implementations to allow parallel file uploads or not.
barry (Barry Warsaw) April 14, 2025, 10:04pm 8
And I’m finally getting around to updating the PEP, to read:
If the server determines the upload cannot proceed, it MUST return a 409 Conflict. The server MAY allow parallel uploads of files, but is not required to.
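A server that chooses not to allow parallel file uploads needs some per-session bookkeeping to decide when to answer 409 Conflict. The following is one hypothetical way to sketch it; FileUploadGate, begin, finish, and the allow_parallel flag are illustrative names, not part of the PEP, and a real server would likely persist this state rather than keep it in memory.

```python
import threading
from http import HTTPStatus
from typing import Optional


class FileUploadGate:
    """Hypothetical in-memory tracking of file uploads in progress for one session."""

    def __init__(self, allow_parallel: bool) -> None:
        # Server policy: the quoted text leaves this choice to each server.
        self.allow_parallel = allow_parallel
        self._in_progress: set[str] = set()
        self._lock = threading.Lock()

    def begin(self, filename: str) -> Optional[HTTPStatus]:
        """Return 409 Conflict if this upload cannot proceed, else None."""
        with self._lock:
            if self._in_progress and not self.allow_parallel:
                return HTTPStatus.CONFLICT
            self._in_progress.add(filename)
            return None

    def finish(self, filename: str) -> None:
        """Mark a file upload as no longer in progress (completed or canceled)."""
        with self._lock:
            self._in_progress.discard(filename)
```

Under this sketch, a client that starts the next file too early gets a deterministic 409 rather than an unpredictable outcome, which speaks to the timing concern raised above.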
barry (Barry Warsaw) April 14, 2025, 10:11pm 9
I am just about to merge an update to PEP 694 which includes all the changes discussed here and on the PR. Please give it a moment to get published.