Issue 29021: Custom functions in sqlite receive None on invalid UTF-8 (original) (raw)

Created on 2016-12-20 09:24 by Ingo Ruhnke, last changed 2022-04-11 14:58 by admin.

Messages (7)

msg283674 - (view)

Author: Ingo Ruhnke (Ingo Ruhnke)

Date: 2016-12-20 09:24

When a sqlite database contains invalid UTF-8 code in a TEXT column, Python can query that data normally when .text_factory is set appropriately. However when a custom function is created with .create_function() and applied to that column the custom function will receive 'None' as argument instead of the value of the column.

The following example demonstrate the issue:

Example:

import sqlite3 import sys import os

con = sqlite3.connect(":memory:") con.text_factory = os.fsdecode

con.create_function("py_identity", 1, lambda x: x)

cur = con.cursor() cur.execute("CREATE TABLE foo(bar TEXT)")

insert some invalid UTF-8 into the database

cur.execute("INSERT INTO foo(bar) VALUES(cast(? AS TEXT))", [b"\xff"])

try to call a custom function on the invalid UTF-8

cur.execute("SELECT " " typeof(bar), " " bar, " # this works " py_identity(bar), " # this returns None instead of the content of 'bar' " cast(py_identity(cast(bar as BLOB)) AS TEXT) " # this works around the issue "FROM foo")

for row in cur: print(row)

Output:

('text', '\udcff', None, '\udcff')

Expected:

('text', '\udcff', '\udcff', '\udcff')

msg284673 - (view)

Author: Aviv Palivoda (palaviv) *

Date: 2017-01-04 22:07

The problem is in _pysqlite_build_py_params function at connection.c. In case we have text we do the following code:

case SQLITE_TEXT: val_str = (const char*)sqlite3_value_text(cur_value); cur_py_value = PyUnicode_FromString(val_str); /* TODO: have a way to show errors here */ if (!cur_py_value) { PyErr_Clear(); Py_INCREF(Py_None); cur_py_value = Py_None; } break;

As you can see we call PyUnicode_FromString instead of text_factory.

I started making a patch to fix this by passing the text_factory to _pysqlite_build_py_params function but I currently have a problem with setting the result to the sqlite. User text_factory may return any type of object and I can't see how to handle that...

msg284700 - (view)

Author: Berker Peksag (berker.peksag) * (Python committer)

Date: 2017-01-05 03:53

We can just fallback to the current behavior (i.e. return None) if the return type of text_factory(val_str) is not str, bytes or bytearray. I think this is also somewhat similar to what we do for the return values of sqlite3_column_text().

msg284776 - (view)

Author: Aviv Palivoda (palaviv) *

Date: 2017-01-05 22:22

I actually was wrong and there is no problem with the type that is returned from the text_factory as it passes directly as argument for the create_function function.

Attached is a patch that change create_function and create_aggregate to use the text_factory when a TEXT data type is received. I am now using sqlite3_create_function_v2 but this is fine because we use sqlite3_stmt_readonly and it is from a newer version.

As for Ingo example code it still don't work with this current fix but this is due to a different problem. Now the _pysqlite_set_result function fail when we do PyUnicode_AsUTF8 on the result from py_identity. As this is a different problem I will fix this in a different patch.

msg284783 - (view)

Author: Aviv Palivoda (palaviv) *

Date: 2017-01-05 22:51

Actually had a small mistake in the patch I uploaded. Uploading a fixed one.

msg284924 - (view)

Author: Aviv Palivoda (palaviv) *

Date: 2017-01-07 16:12

After looking more into the _pysqlite_set_result function fail in Ingo example I think this is the expected behavior. The PyUnicode_AsUTF8 fails but this is expected as the value is an invalid UTF-8.

msg291726 - (view)

Author: Aviv Palivoda (palaviv) *

Date: 2017-04-15 18:18

In my patch I use sqlite3_create_function_v2 which was added in sqlite 3.7.3 (2010-10-08). There were a few problems after adding sqlite3_stmt_readonly in 284676cf2ac8 as can be seen in issue #29355. sqlite3_stmt_readonly was added in 3.7.4 (2010-12-07) so I guess using sqlite3_create_function_v2 will cause the same problems.

Do you think I should find another way to fix this issue? I can follow all the user functions and free on my own but that will make the code more complex. On the other hand I can add this fix in ifdef and the behavior will be different when using sqlite in versions smaller then 3.7.3.

History

Date

User

Action

Args

2022-04-11 14:58:40

admin

set

github: 73207

2017-04-15 18🔞47

palaviv

set

messages: +

2017-01-07 16:12:13

palaviv

set

messages: +

2017-01-05 22:51:39

palaviv

set

files: + 29021-fixed.patch

messages: +

2017-01-05 22:22:16

palaviv

set

files: + 29021.patch
keywords: + patch
messages: +

2017-01-05 03:53:03

berker.peksag

set

versions: + Python 3.7
nosy: + berker.peksag

messages: +

stage: needs patch

2017-01-04 22:07:23

palaviv

set

nosy: + palaviv
messages: +

2016-12-20 22:06:46

ned.deily

set

nosy: + ghaering

2016-12-20 09:24:51

Ingo Ruhnke

create