PEP 694 (PyPI upload API 2.0)

January 6, 2025, 11:36pm 1

Back in 2022, @dstufft started a conversation about PEP 694, proposing a PyPI Upload 2.0 API. That discussion petered out back in October, but as it’s something I’m keenly interested in, I am restarting the discussion here in a new thread.

I just merged a significant update to PEP 694, also making myself co-author. I think that, and the 70+ messages in the other thread, are enough to warrant a new discussion thread.

While the spirit and definition of the API proposed in the previous incarnation of the PEP is largely retained, some highlights of the changes I made[1] include:

Rather than post the full text of the updated PEP 694 here, please head on over to the published version.

Cheers!


  1. besides copyediting and formatting

mgorny (Michał Górny) March 25, 2025, 6:15am 2

Overall, it is very solid and I like it.

A few minor points:

If the upload completes successfully, the server MUST respond with a 201 Created status.
[…]
For each successful chunk, the server MUST respond with a 202 Accepted header, except for the final chunk, which MUST be a 201 Created, and as with non-chunked uploads, the body of these responses has no content.

This seems to assume that processing of the final chunk is always synchronous, which is a bit inconsistent with the session being handled asynchronously. I can imagine some delay at this point, e.g. because the server verifies the checksum of the uploaded file in one step. However, I'm not sure whether making this potentially asynchronous (i.e. returning 202 and expecting the client to recheck the status later) is worth the added complexity.
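To make the quoted rule concrete, here is a minimal client-side sketch of a sequential chunked upload that follows the text as currently written: 202 for every chunk except the last, which must be 201. The `send` transport, `iter_chunks`, and `upload_chunked` are hypothetical names. The HTTP method, URL, and any other required headers are deliberately left out, and it assumes intermediate chunks carry `Upload-Complete: ?0` while the final one carries `?1`.

```python
from typing import Callable, Dict, Iterator, Tuple

# Hypothetical transport: sends one chunk (headers + body) to the upload
# resource URL and returns the HTTP status code of the response.
SendChunk = Callable[[Dict[str, str], bytes], int]


def iter_chunks(data: bytes, chunk_size: int) -> Iterator[Tuple[int, bytes, bool]]:
    """Yield (offset, chunk, is_final) for consecutive chunks of data."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while offset < len(data):
        chunk = data[offset:offset + chunk_size]
        is_final = offset + len(chunk) == len(data)
        yield offset, chunk, is_final
        offset += len(chunk)


def upload_chunked(data: bytes, chunk_size: int, send: SendChunk) -> int:
    """Upload chunks strictly in order; return the final chunk's status."""
    if not data:
        raise ValueError("nothing to upload")
    status = 0
    for offset, chunk, is_final in iter_chunks(data, chunk_size):
        headers = {
            "Upload-Offset": str(offset),
            # Assumption: intermediate chunks carry ?0, the last one ?1.
            "Upload-Complete": "?1" if is_final else "?0",
        }
        status = send(headers, chunk)
        # As quoted: 202 Accepted for each chunk, 201 Created for the last.
        expected = 201 if is_final else 202
        if status != expected:
            raise RuntimeError(
                f"chunk at offset {offset}: got {status}, expected {expected}"
            )
    return status
```

A client this strict treats a 202 on the final chunk as a failure, which is exactly the situation the paragraph above questions.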


If the offset provided in Upload-Offset is not 0 or correctly specifies the byte offset of the next chunk in an incomplete upload, then the server MUST respond with a 409 Conflict. This means that a client MAY NOT upload chunks out of order.

I find this sentence really hard to parse. It sounds like it means “if the offset … correctly specifies the byte offset of the next chunk in an incomplete upload, then the server…” — but wasn’t it supposed to mean the opposite of that? Also, shouldn’t that be “MUST NOT”? “MAY NOT” is not in RFC 2119, and I honestly find it confusing.

Perhaps:

If the offset provided in Upload-Offset is not 0 and does not correctly specify the byte offset of the next chunk in an incomplete upload, then the server MUST respond with a 409 Conflict. This means that a client MUST NOT upload chunks out of order.
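The suggested rule can be sketched as a server-side check. `check_upload_offset` is a hypothetical helper, not something from the PEP; it returns `None` when the chunk may be stored and `HTTPStatus.CONFLICT` (409) otherwise. It follows the suggested wording literally, so an offset of 0 is always accepted; what a 0 offset should mean for an upload that already has bytes stored is not settled by this wording.

```python
from http import HTTPStatus
from typing import Optional


def check_upload_offset(upload_offset: int, bytes_received: int) -> Optional[HTTPStatus]:
    """Apply the suggested Upload-Offset rule to one incoming chunk.

    upload_offset: value of the client's Upload-Offset header.
    bytes_received: bytes the server has already stored for this upload.
    Returns None if the chunk may be appended, else the error status to send.
    """
    # Literal reading of the suggested text: 0 is always accepted, as is
    # the exact byte offset of the next chunk.
    if upload_offset == 0 or upload_offset == bytes_received:
        return None
    # Any other offset (a gap, an overlap, a replay) means the chunk is
    # out of order, so the server MUST answer 409 Conflict.
    return HTTPStatus.CONFLICT
```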


Canceling an In-Progress Upload
[…]
Delete a Partial or Fully Uploaded File

Aren't “deleting a partial file” and “canceling an upload” essentially the same thing? Also, the latter section only covers fully uploaded files, so perhaps it should either be renamed or the two sections merged into one.


Complete the in-progress upload by uploading a zero-length chunk providing the Upload-Complete: ?1 header. This effectively truncates and completes the in-progress upload, after which point the new upload can commence. In this case, clients SHOULD reuse the previous upload resource URL and do not need to begin the entire file upload sequence over again.

What is the use case for this? FWIU it implies that the file must have the same size and hashes, so I don't really understand under what circumstances the client would suddenly need to start uploading the same file over again.


I think error responses are underspecified. In particular, HTTP error codes are listed in only a handful of cases. I think it would be worthwhile to at least list some common errors that can be expected from issuing different requests.

barry (Barry Warsaw) March 25, 2025, 8:45pm 3

Thanks for the vote of confidence!

I think I took this from the IETF draft, but at this point, PEP 694 uses that more as inspiration rather than a faithful implementation[1].

I think you have a point: the API for the final chunk should probably allow either a 201 or a 202 response here, so we could relax this to accept either code. If a 202 is returned, the client would have to do a HEAD request, as when resuming an upload, and the server would respond with an Upload-Offset indicating that the full file has been received, along with an Upload-Complete: ?0 header. The client would continue to do HEAD requests until it sees a response with Upload-Complete: ?1.
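Under this proposal, a client that gets a 202 for the final chunk would poll with HEAD requests. A minimal sketch, assuming a hypothetical `head` callable that issues the HEAD request to the upload resource URL and returns the response headers as a plain dict with the exact casing shown (real HTTP header names are case-insensitive). `wait_for_completion`, the polling interval, and the attempt limit are illustrative choices, not values from the PEP.

```python
import time
from typing import Callable, Dict

# Hypothetical transport: issues a HEAD request to the upload resource URL
# and returns the response headers (exact-case keys, for simplicity).
HeadRequest = Callable[[], Dict[str, str]]


def wait_for_completion(
    head: HeadRequest,
    file_size: int,
    interval: float = 1.0,
    max_attempts: int = 30,
) -> int:
    """Poll until the server reports Upload-Complete: ?1.

    Returns the number of HEAD requests made.
    """
    for attempt in range(1, max_attempts + 1):
        headers = head()
        # While processing, the server should already report the full size.
        offset = int(headers["Upload-Offset"])
        if offset != file_size:
            raise RuntimeError(f"server reports offset {offset}, expected {file_size}")
        if headers.get("Upload-Complete") == "?1":
            return attempt
        if attempt < max_attempts:
            time.sleep(interval)
    raise TimeoutError(f"upload not complete after {max_attempts} attempts")
```

If the client instead polls the session status, as suggested later in the thread, the loop has the same shape; only the request and the field it inspects change.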

WDYT? If this is workable, it doesn’t seem like a huge additional complication.


  1. plus, the draft expires on 24 April 2025 and I see no status updates to it

barry (Barry Warsaw) March 25, 2025, 9:14pm 4

Thanks for calling this out. I think I misspelled “incorrectly” there. Your suggestion LGTM!

Another good point. I think merging these sections is the right way to go.

Upon re-reading this, I agree. It’s superfluous, and there should be OOWTDI.

Good call. I’ll do a pass to flesh out more of the error responses, and then post a PR to update the PEP.

Thanks for the in-depth reading and excellent feedback!

barry (Barry Warsaw) March 25, 2025, 11:26pm 5

mgorny (Michał Górny) March 26, 2025, 5:31am 6

Well, my immediate thought is that you’d do a session status request to see the status of the file. Not saying I dislike this solution, just saying what immediately came to my mind as the “obvious” next step.

I suppose this also raises the question of whether you can start uploading the next file before processing of the current one finishes. Logically, I'd say not, but this would imply that either we make the status check obligatory, or clients skipping it would quasi-randomly succeed or fail at uploading the next file, depending on whether they issue the request before or after processing happens to finish.

barry (Barry Warsaw) April 1, 2025, 5:49pm 7

This is a great perspective, thanks! Thinking it over more, I agree. I’m going to

I considered whether non-chunked uploads (which also carry the Upload-Complete: ?1 header) should also allow either a 201 or a 202, but for now I'm not changing that.

That's also a good point. The PEP explicitly disallows parallel uploads of chunks, but is actually silent on whether multiple files can be uploaded in parallel. The PEP needs to be explicit here, but it can still waffle by leaving it to server implementations whether to allow parallel file uploads.

barry (Barry Warsaw) April 14, 2025, 10:04pm 8

And I’m finally getting around to updating the PEP, to read:

If the server determines the upload cannot proceed, it MUST return a 409 Conflict. The server MAY allow parallel uploads of files, but is not required to.
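One way a server might apply this rule is sketched below. `UploadSession`, `begin_file`, and `finish_file` are hypothetical names, and treating a second in-progress upload of the same filename as a conflict is an assumption for illustration. A real server would keep this state in shared storage with proper locking rather than in a per-process object.

```python
from dataclasses import dataclass, field
from http import HTTPStatus
from typing import Optional, Set


@dataclass
class UploadSession:
    """Hypothetical per-session bookkeeping for file uploads."""

    # Server policy: the PEP text leaves this choice to each server.
    allow_parallel: bool = False
    in_progress: Set[str] = field(default_factory=set)

    def begin_file(self, filename: str) -> Optional[HTTPStatus]:
        """Return None if the upload may start, else the status to send."""
        if filename in self.in_progress:
            # Assumption: a second concurrent upload of the same file is
            # one case where the upload "cannot proceed".
            return HTTPStatus.CONFLICT
        if self.in_progress and not self.allow_parallel:
            # Another file is still uploading or being processed.
            return HTTPStatus.CONFLICT
        self.in_progress.add(filename)
        return None

    def finish_file(self, filename: str) -> None:
        """Call once server-side processing of the file has finished."""
        self.in_progress.discard(filename)
```

Releasing the file only after processing finishes, rather than when the last byte arrives, avoids the quasi-random success or failure of the next upload described earlier in the thread.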

barry (Barry Warsaw) April 14, 2025, 10:11pm 9

I am just about to merge an update to PEP 694 which includes all the changes discussed here and on the PR. Please give it a moment to get published.