Issue 29021: Custom functions in sqlite receive None on invalid UTF-8 (original) (raw)
Created on 2016-12-20 09:24 by Ingo Ruhnke, last changed 2022-04-11 14:58 by admin.
Messages (7)
Author: Ingo Ruhnke (Ingo Ruhnke)
Date: 2016-12-20 09:24
When a sqlite database contains invalid UTF-8 code in a TEXT column, Python can query that data normally when .text_factory is set appropriately. However when a custom function is created with .create_function() and applied to that column the custom function will receive 'None' as argument instead of the value of the column.
The following example demonstrate the issue:
Example:
import sqlite3 import sys import os
con = sqlite3.connect(":memory:") con.text_factory = os.fsdecode
con.create_function("py_identity", 1, lambda x: x)
cur = con.cursor() cur.execute("CREATE TABLE foo(bar TEXT)")
insert some invalid UTF-8 into the database
cur.execute("INSERT INTO foo(bar) VALUES(cast(? AS TEXT))", [b"\xff"])
try to call a custom function on the invalid UTF-8
cur.execute("SELECT " " typeof(bar), " " bar, " # this works " py_identity(bar), " # this returns None instead of the content of 'bar' " cast(py_identity(cast(bar as BLOB)) AS TEXT) " # this works around the issue "FROM foo")
for row in cur: print(row)
Output:
('text', '\udcff', None, '\udcff')
Expected:
('text', '\udcff', '\udcff', '\udcff')
Author: Aviv Palivoda (palaviv) *
Date: 2017-01-04 22:07
The problem is in _pysqlite_build_py_params function at connection.c. In case we have text we do the following code:
case SQLITE_TEXT: val_str = (const char*)sqlite3_value_text(cur_value); cur_py_value = PyUnicode_FromString(val_str); /* TODO: have a way to show errors here */ if (!cur_py_value) { PyErr_Clear(); Py_INCREF(Py_None); cur_py_value = Py_None; } break;
As you can see we call PyUnicode_FromString instead of text_factory.
I started making a patch to fix this by passing the text_factory to _pysqlite_build_py_params function but I currently have a problem with setting the result to the sqlite. User text_factory may return any type of object and I can't see how to handle that...
Author: Berker Peksag (berker.peksag) *
Date: 2017-01-05 03:53
We can just fallback to the current behavior (i.e. return None) if the return type of text_factory(val_str) is not str, bytes or bytearray. I think this is also somewhat similar to what we do for the return values of sqlite3_column_text().
Author: Aviv Palivoda (palaviv) *
Date: 2017-01-05 22:22
I actually was wrong and there is no problem with the type that is returned from the text_factory as it passes directly as argument for the create_function function.
Attached is a patch that change create_function and create_aggregate to use the text_factory when a TEXT data type is received. I am now using sqlite3_create_function_v2 but this is fine because we use sqlite3_stmt_readonly and it is from a newer version.
As for Ingo example code it still don't work with this current fix but this is due to a different problem. Now the _pysqlite_set_result function fail when we do PyUnicode_AsUTF8 on the result from py_identity. As this is a different problem I will fix this in a different patch.
Author: Aviv Palivoda (palaviv) *
Date: 2017-01-05 22:51
Actually had a small mistake in the patch I uploaded. Uploading a fixed one.
Author: Aviv Palivoda (palaviv) *
Date: 2017-01-07 16:12
After looking more into the _pysqlite_set_result function fail in Ingo example I think this is the expected behavior. The PyUnicode_AsUTF8 fails but this is expected as the value is an invalid UTF-8.
Author: Aviv Palivoda (palaviv) *
Date: 2017-04-15 18:18
In my patch I use sqlite3_create_function_v2 which was added in sqlite 3.7.3 (2010-10-08). There were a few problems after adding sqlite3_stmt_readonly in 284676cf2ac8 as can be seen in issue #29355. sqlite3_stmt_readonly was added in 3.7.4 (2010-12-07) so I guess using sqlite3_create_function_v2 will cause the same problems.
Do you think I should find another way to fix this issue? I can follow all the user functions and free on my own but that will make the code more complex. On the other hand I can add this fix in ifdef and the behavior will be different when using sqlite in versions smaller then 3.7.3.
History
Date
User
Action
Args
2022-04-11 14:58:40
admin
set
github: 73207
2017-04-15 18🔞47
palaviv
set
messages: +
2017-01-07 16:12:13
palaviv
set
messages: +
2017-01-05 22:51:39
palaviv
set
files: + 29021-fixed.patch
messages: +
2017-01-05 22:22:16
palaviv
set
files: + 29021.patch
keywords: + patch
messages: +
2017-01-05 03:53:03
berker.peksag
set
versions: + Python 3.7
nosy: + berker.peksag
messages: +
stage: needs patch
2017-01-04 22:07:23
palaviv
set
nosy: + palaviv
messages: +
2016-12-20 22:06:46
ned.deily
set
nosy: + ghaering
2016-12-20 09:24:51
Ingo Ruhnke
create