bpo-27645: Supporting native backup facility of SQLite by lelit · Pull Request #377 · python/cpython (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation62 Commits21 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

lelit

@lelit

@lelit

@lelit

@lelit

Suggested by Aviv Palivoda.

@lelit

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA. This is necessary for legal reasons before we can look at your contribution. Please follow these steps to help rectify the issue:

  1. If you don't have an account on b.p.o, please create one
  2. Make sure your GitHub username is listed in "Your Details" at b.p.o
  3. If you have not already done so, please sign the PSF contributor agreement. The "bugs.python.org username " requested by the form is the "Login name" field under "Your Details".
  4. If you just signed the CLA, please wait at least one US business day and then check "Your Details" on bugs.python.org to see if your account has been marked as having signed the CLA (the delay is due to a person having to manually check your signed CLA)
  5. Reply here saying you have completed the above steps

Thanks again to your contribution and we look forward to looking at it!

@mention-bot

@lelit

I configured my GitHub username on b.p.o.

berkerpeksag

@@ -521,6 +521,34 @@ Connection Objects
f.write('%s\n' % line)
.. method:: backup(filename[, pages, progress])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to make pages and progress keyword-only.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@@ -521,6 +521,34 @@ Connection Objects
f.write('%s\n' % line)
.. method:: backup(filename[, pages, progress])
This method exposes the `API`__ that allows to make a backup of a SQLite

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to add a link to SQLite docs.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

con = sqlite3.connect('existing_db.db')
con.backup('copy_of_existing_db.db', 1, progress)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

database into the mandatory argument *filename*, even while it's being accessed
by other clients, or concurrently by the same connection.
By default, or when *pages* is either ``0`` or a negative integer, the entire

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we really need to expose this. The default value should cover most of the use cases.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that I should remove the pages argument? But if I do that, also the progress one becomes pointless, because with the default value there will be at most one callback, when the engine completes the backup. What am I missing?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to keep pages for huge databases.

@@ -34,6 +36,9 @@ Core and Builtins
- bpo-29546: Improve from-import error message with location
- Issue #7063: Remove dead code from the "array" module's slice handling.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be deleted.

}
Py_BEGIN_ALLOW_THREADS
rc = sqlite3_open(filename, &bckconn);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#359 could be useful here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you are suggesting: should I take the same approach of conditionally using sqlite3_open_v2() instead?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to say that this should be merged after #359.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This don't actually have any effect. sqlite3_open(filename, db) is the same as sqlite3_open_v2(filename, db, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE, NULL)

@@ -10,6 +10,8 @@ What's New in Python 3.7.0 alpha 1?
Core and Builtins
-----------------
- bpo-27645: sqlite3.Connection now exposes a backup() method.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add "Patch by Your Name." to the next line.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@@ -18,7 +18,7 @@ def load_tests(*args):
userfunctions.suite(),
factory.suite(), transactions.suite(),
hooks.suite(), regression.suite(),
dump.suite()])
dump.suite(), backup.suite()])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Add this to its own line so we won't clutter git blame next time we add something.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

import sqlite3
def progress(remaining, total):
print("Copied %d of %d pages..." % (total-remaining, total))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use f-strings here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@lelit

palaviv

/* Sleep for 250ms if there are still further pages to copy */
if (rc == SQLITE_OK | rc == SQLITE_BUSY
Py_BEGIN_ALLOW_THREADS
sqlite3_sleep(250);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we need to sleep when SQLITE_OK is returned? I know this is what is done in the example but we do allow other threads to take the lock between calls to sqlite3_backup_step.
In addition we might want to have some kind of timeout here. We can block forever if someone holds the db lock.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we should: the step function returns OK when it “successfully copies N pages and there are still more pages to be copied”.
Not sure about the timeout: maybe we could check whether there is no progress in, say, X calls to the step function?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reconsidering, I think you're right here: there's no reason to delay next iteration when the result was SQLITE_OK.

rc = sqlite3_backup_step(bckhandle, pages);
Py_END_ALLOW_THREADS
if (progress != Py_None) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should call progress only when sqlite3_backup_step return SQLITE_OK

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about that: maybe a better option would be pass the result status to the progress callback, so that it can eventually signal the error and do some cleanup (think about closing a progress dialog).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the progress callback receives three arguments.

Py_INCREF(Py_None);
retval = Py_None;
} else {
/* TODO: should the (probably incomplete/invalid) backup be removed here? */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should delete the backup

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will do that.

Thanks a lot for the review!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need some help here: I expected to find an abstract/portable PyOS_UnlinkFile(), but couldn't find it.
What is the recommended idiom to remove a file from C? Is the only way to "import" the os module and then PyObject_CallFunction() on its unlink member?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the simplest way for now, using unlink(2) from unistd.h. Any hint for a better way is welcome.

sqlite3_backup_pagecount(bckhandle))) {
/* User's callback raised an error: interrupt the loop and
propagate it. */
rc = -1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add a test for the case that progress raise an error

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will do!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
if (rc != -1) {
rc = _pysqlite_seterror(bckconn, NULL);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this but from the documentation sqlite3_backup_step will not set an error code on error but will return one. I can see that in the example they do the same as here but I am not sure if this is the right behavior. Would we receive SQLITE_NOMEM error from sqlite3_backup_step?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will try to understand the point.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to address this concern in my latest patches.

}
Py_BEGIN_ALLOW_THREADS
bckhandle = sqlite3_backup_init(bckconn, "main", self->db, "main");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't database name be a parameter?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not really the database name: accordingly to the documentation that name denotes the target, that can either be "main" for the main storage, "temp" for the temporary storage or the name of the attached database.
Of course we can easily expose that as a (default) argument, but I did not see as necessary when I coded the feature.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a source_name and a dest_name parameters.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm, further investigation seems to reveal that the dest_name is pointless: attached databases are not a persistent property of a database, so that we cannot allow the following:

bck = sqlite.connect('/tmp/thebackup') bck.execute(f"ATTACH DATABASE '/tmp/theattachment' AS attached_db") bck.commit()

db = sqlite.connect('/tmp/database') db.backup('/tmp/thebackup', dest_name='attached_db')

Changed back to a single name argument to specify the source database...

@lelit

Suggested by Aviv Palivoda.

@lelit

@lelit

@lelit

When sqlite3_backup_step() returns OK it means that it was able to do it's work but there are further pages to be copied: in such case it's better to immediately start the next iteration, without opening a window that allows other threads to go on and possibly locking the database.

Suggested by Aviv Palivoda.

berkerpeksag

@@ -1478,6 +1478,94 @@ pysqlite_connection_iterdump(pysqlite_Connection* self, PyObject* args)
}
static PyObject *
pysqlite_connection_backup(pysqlite_Connection* self, PyObject* args, PyObject* kwds)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read SQLite release notes correctly, sqlite3_backup_init has been added in 3.6.11 so we need to expose this only if SQLite version is 3.6.11 or newer.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, is it better to omit the whole function or rather raise an NotImplementedError?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's omit the whole function. This is what we do for other version specific features in the stdlib. Please also update the documentation to note that this feature is available from SQLite 3.6.11.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done. Thank you!

from tempfile import NamedTemporaryFile
import unittest
class BackupTests(unittest.TestCase):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a @unittest.skipIf(...) decorator to skip the whole test case if SQLite version is not 3.6.11 or newer.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests to make sure that keyword-only parameters are really keyword-only:

self.cx.backup(bckfn.name, 1, lambda x, y: None) # should raise TypeError

@lelit

@lelit

palaviv

database is copied in a single step; otherwise the method performs a loop
copying up to the specified *pages* at a time.
If *progress* is specified, it must either ``None`` or a callable object that

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either be ``None``

@lelit

…llback

Add new SQLITE_DONE integer constant to the module, that could be used by the callback to determine whether the backup is terminated.

@lelit

This should address Aviv's concerns about how the sqlite3_backup_step() errors are handled.

@lelit

@lelit

Is there anything else I could/should do to see this accepted?

dgilman

This method makes a backup of a SQLite database into the mandatory argument
*filename*, even while it's being accessed by other clients, or concurrently by
the same connection.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention that backups work for on-disk and in-memory databases like the sqlite3 C api docs mention?

@lelit

@lelit

@Mariatta

@lelit We can only review PRs in CPython repo. Currently this PR has conflict. If the conflict is resolved in a different branch, maybe you can create the PR for the other branch?
Although I would prefer that the conflict is resolved in the same branch so we get to keep the past discussions in the same PR.

@lelit

Does that mean you'd prefer me to rebase the branch I opened the PR with?

@brettcannon

@brettcannon brettcannon changed the titlebpo-27645: Supporting native backup facility of SQLite bpo-27645: Supporting native backup facility of SQLite.

Nov 2, 2017

@brettcannon brettcannon changed the titlebpo-27645: Supporting native backup facility of SQLite. bpo-27645: Supporting native backup facility of SQLite

Nov 2, 2017

@lelit

@brettcannon, I rebased the branch into https://github.com/lelit/cpython/compare/sqlite-backup-api-v3
rewriting the NEWS change using the new tool.

As said in this comment I'm sorry about the fuss, but it's not clear to me whether I should have closed this PR and reopened a new one on the v3 branch, or if instead I should have rewritten the same v2 branch.

As always, I'm willing to do whatever is easier/more correct for you!

@brettcannon

@lelit I'm not sure what 'v2' and 'v3' are referring to (some personal branches in your fork you have?), but if you look at https://github.com/python/cpython/pull/377/files you will see that the branch your PR is based on has not been updated in regards to the NEWS file.

If it's easier, you can just close this PR, open a new one if you have some other branch that's in better shape, and then refer to this old PR in the new PR's opening comment.

@lelit

Ok, doing that immediately.

@lelit lelit mentioned this pull request

Nov 2, 2017

@lelit

As suggested by Brett, I'm closing this PR, replaced by #4238.