Bundle local images in rustdoc output by GuillaumeGomez · Pull Request #3397 · rust-lang/rfcs

Thanks for working on this. I really support the goals: consistent, reliable images and scripts across local, self-hosted, and docs.rs builds.

But I'm dissatisfied with the implementation plan for docs.rs: embedding images in the .crate file uploaded to crates.io. I think it unnecessarily bloats .crate files, which wastes bandwidth for crates.io, for end-users, and for CI platforms. Docs.rs downloads each crate once; the various other users of crates download them millions of times. We shouldn't include bytes that are only useful for documentation. Can we find a different solution?

When we first started talking about this issue I was not worried about crate size because I figured this would be mostly used for project icons, and they would be ~5kB of SVG. But thinking about it more I realize some crates will want larger images to illustrate complex concepts. Most doc authors are not experts in minimizing image files, and rustdoc won't have any way to automatically minimize them. And I'd really like to have a rigorous way to use scripts, too; and useful scripts like KaTeX are in the ~200kB range.

IIRC crates.io's limit on crate size is 50MB. In some sense that's good since it means there's plenty of room for doc resources. But on the other hand it's bad since it would allow making very big crate files filled up with doc resources. Of course it is currently allowed to make very big crates anyhow, but this feature would create more incentive to do so.
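To put rough numbers on the bandwidth concern, here is a back-of-the-envelope sketch. It assumes a ~200kB script (the KaTeX ballpark above) bundled into a crate that gets downloaded one million times; the download count is an illustrative assumption, not a measured figure, and `extra_transfer_bytes` is just a hypothetical helper name.

```rust
/// Total extra bytes transferred when every download of a .crate file
/// carries `resource_bytes` of documentation-only resources.
/// Uses saturating multiplication so absurd inputs cannot overflow.
fn extra_transfer_bytes(resource_bytes: u64, downloads: u64) -> u64 {
    resource_bytes.saturating_mul(downloads)
}
```

With `extra_transfer_bytes(200_000, 1_000_000)` that comes to 200,000,000,000 bytes, i.e. about 200GB of transfer for files that only the doc build needs.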

@rfcbot concern Crate size

The main constraints that are challenging:

1. For docs.rs, we would like the doc build process to not have network access. It should receive a .crate file as input on the filesystem, and output documentation on the filesystem.
2. Local / self-hosted doc builds sometimes include docs for dependencies. When building dependencies, rustdoc only "knows" about the .crate file for that dependency.

For the first problem, docs.rs would have to introduce another input. We could make that input predictable, and pre-fetch it to the filesystem before running the build rather than having the doc build process fetch it. I don't love having another input in docs.rs but I think it would be better than making packages bigger for all non-doc uses. We could reduce the operational problems by only fetching that alternate input from one specific host - like crates.io, or another host created for the purpose.

For the second problem, there would have to be a well-known place to fetch additional resources for a given .crate file.
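As a rough illustration of what "predictable" and "well-known" could mean here, a minimal sketch follows. Everything in it is hypothetical: the host `doc-resources.example`, the `.docres` file extension, and both function names are assumptions made up for this example, not anything crates.io or docs.rs provides today. The point is only that the location is derived purely from the crate name and version, so docs.rs could pre-fetch it to the filesystem before the sandboxed build starts, and a local rustdoc run could compute the same location for a dependency.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical single host that serves doc resources (an assumption;
/// no such host exists).
const DOC_RESOURCES_BASE: &str = "https://doc-resources.example";

/// Well-known URL for the doc resources of one crate version,
/// computed from nothing but the crate name and version.
fn doc_resources_url(crate_name: &str, version: &str) -> String {
    format!(
        "{}/{}/{}-{}.docres",
        DOC_RESOURCES_BASE, crate_name, crate_name, version
    )
}

/// Where a pre-fetch step would place the archive on disk so the doc
/// build itself never needs network access.
fn prefetch_path(cache_dir: &Path, crate_name: &str, version: &str) -> PathBuf {
    cache_dir.join(format!("{}-{}.docres", crate_name, version))
}
```

For example, `doc_resources_url("foo", "1.2.3")` yields `https://doc-resources.example/foo/foo-1.2.3.docres`, and the build would then read the pre-fetched file from `prefetch_path` alongside the .crate file.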

For user experience, we would want to avoid doc authors needing to create separate credentials for uploading doc resources. That argues in favor of having a way to upload additional files to crates.io, specifically linked to a crate version and using the same credentials as the crate itself. I know this makes for a much more complex project. I'd love to find a better way but am having trouble thinking of one!