Implement pep 503 for simple repository. by Carreau · Pull Request #506 · pypi/legacy

There are two ways to handle this. One is to treat it as denormalizing release-centric data onto release_files (if we do that, it should be done via a database trigger; I can show you an example in Warehouse). The other is to decide that requires_python is not a release-centric piece of data but a file-centric one, and just move the column.
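For the denormalization option, here is a minimal sketch of the trigger idea. It is an assumption-laden stand-in, not the Warehouse example: it uses Python's built-in sqlite3 instead of Postgres, the trimmed-down schema is hypothetical, and the trigger names (`release_files_copy_requires_python`, `releases_propagate_requires_python`) are made up for illustration. A real Postgres version would use a PL/pgSQL trigger function instead.

```python
import sqlite3

# Hypothetical, trimmed-down stand-in for the legacy PyPI schema.
# The real tables live in Postgres and have many more columns.
SCHEMA = """
CREATE TABLE releases (
    name TEXT NOT NULL,
    version TEXT NOT NULL,
    requires_python TEXT,
    PRIMARY KEY (name, version)
);
CREATE TABLE release_files (
    name TEXT NOT NULL,
    version TEXT NOT NULL,
    filename TEXT PRIMARY KEY,
    md5_digest TEXT,
    requires_python TEXT  -- denormalized copy of releases.requires_python
);

-- When a file is added, copy requires_python from its release.
CREATE TRIGGER release_files_copy_requires_python
AFTER INSERT ON release_files
BEGIN
    UPDATE release_files
    SET requires_python = (
        SELECT requires_python FROM releases
        WHERE releases.name = NEW.name AND releases.version = NEW.version
    )
    WHERE filename = NEW.filename;
END;

-- When a release's requires_python changes, push the new value to its files.
CREATE TRIGGER releases_propagate_requires_python
AFTER UPDATE OF requires_python ON releases
BEGIN
    UPDATE release_files
    SET requires_python = NEW.requires_python
    WHERE name = NEW.name AND version = NEW.version;
END;
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical schema and triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = make_db()
    conn.execute("INSERT INTO releases VALUES ('setuptools', '30.0.0', '>=2.6')")
    conn.execute(
        "INSERT INTO release_files (name, version, filename, md5_digest) "
        "VALUES ('setuptools', '30.0.0', 'setuptools-30.0.0.tar.gz', 'abc')"
    )
    conn.execute(
        "UPDATE releases SET requires_python = '>=2.7' "
        "WHERE name = 'setuptools' AND version = '30.0.0'"
    )
    print(conn.execute("SELECT filename, requires_python FROM release_files").fetchall())
```

The point of the trigger approach is that the simple index can read requires_python straight from release_files without a join, while releases stays the single source of truth.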

I'll trust you on this one. I honestly doubt people would (or should) bother to upload files with different requires-python values within a single release. I think per-release makes sense; at the very least it is easier to migrate to per-file later, if we decide to, than to decide per-file was a mistake and try to gather everything back at the release level.

Changes to the DB have to go through Warehouse.

Sure. Just tell me whether you want to make the changes, or whether the performance hit on Postgres from the join is acceptable.

Can you also benchmark the following?

```sql
explain analyze
select filename, requires_python, md5_digest
from (select requires_python, version from releases where name = 'setuptools') as s1,
     (select filename, version, md5_digest from release_files where name = 'setuptools') as s2
where s2.version = s1.version;
```

I can't reproduce the ~200 nested loops I see in your benchmark because I don't have the full DB, but this query might be quicker: the cross product should be smaller if we filter by name first. I don't know how much the query planner already optimizes this, though...
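A small sanity check of the filtered-subquery form, as a sketch on hypothetical toy data. It uses sqlite3 rather than Postgres, so it says nothing about real timings; it only confirms the query returns each setuptools file with its release's requires_python, and counts the naive cross-product sizes to illustrate the "filter by name first" argument. A real planner may push the name predicates down on its own, in which case both forms end up equivalent.

```python
import sqlite3

# Hypothetical toy data: "pip" deliberately shares a version string with
# setuptools, which is why both sides of the join are filtered by name.
ROWS_RELEASES = [
    ("setuptools", "30.0.0", ">=2.6"),
    ("setuptools", "31.0.0", ">=2.7"),
    ("pip", "30.0.0", ">=3.3"),
]
ROWS_FILES = [
    ("setuptools", "30.0.0", "setuptools-30.0.0.tar.gz", "aaa"),
    ("setuptools", "31.0.0", "setuptools-31.0.0.tar.gz", "bbb"),
    ("setuptools", "31.0.0", "setuptools-31.0.0-py2.py3-none-any.whl", "ccc"),
    ("pip", "30.0.0", "pip-30.0.0.tar.gz", "ddd"),
]

# The filtered-subquery query from the comment, parameterized on the name.
FILTERED_SUBQUERY = """
SELECT filename, requires_python, md5_digest
FROM (SELECT requires_python, version FROM releases WHERE name = ?) AS s1,
     (SELECT filename, version, md5_digest FROM release_files WHERE name = ?) AS s2
WHERE s2.version = s1.version
ORDER BY filename
"""


def load() -> sqlite3.Connection:
    """Build an in-memory database holding the toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE releases (name TEXT, version TEXT, requires_python TEXT)")
    conn.execute(
        "CREATE TABLE release_files (name TEXT, version TEXT, filename TEXT, md5_digest TEXT)"
    )
    conn.executemany("INSERT INTO releases VALUES (?, ?, ?)", ROWS_RELEASES)
    conn.executemany("INSERT INTO release_files VALUES (?, ?, ?, ?)", ROWS_FILES)
    return conn


def files_for(conn: sqlite3.Connection, name: str) -> list:
    """Run the filtered-subquery join for one project name."""
    return conn.execute(FILTERED_SUBQUERY, (name, name)).fetchall()


def cross_product_sizes(conn: sqlite3.Connection, name: str) -> tuple:
    """Return (unfiltered, filtered) sizes of the naive cross product."""
    (n_rel,) = conn.execute("SELECT COUNT(*) FROM releases").fetchone()
    (n_files,) = conn.execute("SELECT COUNT(*) FROM release_files").fetchone()
    (n_rel_f,) = conn.execute(
        "SELECT COUNT(*) FROM releases WHERE name = ?", (name,)
    ).fetchone()
    (n_files_f,) = conn.execute(
        "SELECT COUNT(*) FROM release_files WHERE name = ?", (name,)
    ).fetchone()
    return n_rel * n_files, n_rel_f * n_files_f


if __name__ == "__main__":
    conn = load()
    for row in files_for(conn, "setuptools"):
        print(row)
    print(cross_product_sizes(conn, "setuptools"))
```

On the toy data the naive cross product shrinks from 3 × 4 = 12 row pairs to 2 × 3 = 6 once both sides are filtered to setuptools.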

Or maybe even the following, which doesn't make the join explicit:

```sql
explain analyse
select filename, releases.requires_python, md5_digest
from release_files, releases
where release_files.version = releases.version
  and release_files.name = 'setuptools'
  and releases.name = 'setuptools';
```
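The implicit (comma) join above is an ordinary inner join, so it should return exactly the same rows as an explicit `JOIN ... ON` written against `(name, version)`. The sketch below checks that on hypothetical toy data, again with sqlite3 standing in for Postgres; only `EXPLAIN ANALYZE` on the real database can say which form is faster.

```python
import sqlite3

# The implicit-join query from the comment, parameterized on the name.
IMPLICIT_JOIN = """
SELECT filename, releases.requires_python, md5_digest
FROM release_files, releases
WHERE release_files.version = releases.version
  AND release_files.name = ?
  AND releases.name = ?
ORDER BY filename
"""

# The same inner join with the condition in an ON clause. Joining on
# (name, version) instead of version alone keeps another package's release
# with the same version string from matching.
EXPLICIT_JOIN = """
SELECT f.filename, r.requires_python, f.md5_digest
FROM release_files AS f
JOIN releases AS r ON r.name = f.name AND r.version = f.version
WHERE f.name = ?
ORDER BY f.filename
"""


def load() -> sqlite3.Connection:
    """Build an in-memory database with hypothetical toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE releases (name TEXT, version TEXT, requires_python TEXT)")
    conn.execute(
        "CREATE TABLE release_files (name TEXT, version TEXT, filename TEXT, md5_digest TEXT)"
    )
    conn.executemany(
        "INSERT INTO releases VALUES (?, ?, ?)",
        [("setuptools", "30.0.0", ">=2.6"), ("pip", "30.0.0", ">=3.3")],
    )
    conn.executemany(
        "INSERT INTO release_files VALUES (?, ?, ?, ?)",
        [
            ("setuptools", "30.0.0", "setuptools-30.0.0.tar.gz", "aaa"),
            ("pip", "30.0.0", "pip-30.0.0.tar.gz", "bbb"),
        ],
    )
    return conn


def both_forms(conn: sqlite3.Connection, name: str) -> tuple:
    """Run both join spellings for one project name."""
    implicit = conn.execute(IMPLICIT_JOIN, (name, name)).fetchall()
    explicit = conn.execute(EXPLICIT_JOIN, (name,)).fetchall()
    return implicit, explicit


if __name__ == "__main__":
    implicit, explicit = both_forms(load(), "setuptools")
    print(implicit == explicit, implicit)
```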

Thanks!