The Problem
I am currently running into an "incorrect string value" error when attempting to run Wagtail's update_index management command.
The stack trace looks like this:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.10/site-packages/django/db/backends/mysql/base.py", line 73, in execute
return self.cursor.execute(query, args)
File "/usr/local/lib/python3.10/site-packages/MySQLdb/cursors.py", line 209, in execute
res = self._query(query)
File "/usr/local/lib/python3.10/site-packages/MySQLdb/cursors.py", line 315, in _query
db.query(q)
File "/usr/local/lib/python3.10/site-packages/MySQLdb/connections.py", line 239, in query
_mysql.connection.query(self, query)
MySQLdb._exceptions.OperationalError: (1366, "Incorrect string value: '\\xCE\\x93\\xC3\\xBB\\xC3\\xAA...' for column 'body' at row 12")
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/manage.py", line 22, in <module>
execute_from_command_line(sys.argv)
File "/usr/local/lib/python3.10/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
utility.execute()
File "/usr/local/lib/python3.10/site-packages/django/core/management/__init__.py", line 413, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/usr/local/lib/python3.10/site-packages/django/core/management/base.py", line 354, in run_from_argv
self.execute(*args, **cmd_options)
File "/usr/local/lib/python3.10/site-packages/django/core/management/base.py", line 398, in execute
output = self.handle(*args, **options)
File "/usr/local/lib/python3.10/site-packages/wagtail/search/management/commands/update_index.py", line 124, in handle
self.update_backend(
File "/usr/local/lib/python3.10/site-packages/wagtail/search/management/commands/update_index.py", line 88, in update_backend
index.add_items(model, chunk)
File "/usr/local/lib/python3.10/site-packages/wagtail/search/backends/database/mysql/mysql.py", line 241, in add_items
update_method(content_type_pk, indexers)
File "/usr/local/lib/python3.10/site-packages/wagtail/search/backends/database/mysql/mysql.py", line 224, in add_items_update_then_create
self.entries.bulk_create(to_be_created)
File "/usr/local/lib/python3.10/site-packages/django/db/models/query.py", line 514, in bulk_create
returned_columns = self._batched_insert(
File "/usr/local/lib/python3.10/site-packages/django/db/models/query.py", line 1293, in _batched_insert
self._insert(item, fields=fields, using=self.db, ignore_conflicts=ignore_conflicts)
File "/usr/local/lib/python3.10/site-packages/django/db/models/query.py", line 1270, in _insert
return query.get_compiler(using=using).execute_sql(returning_fields)
File "/usr/local/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1416, in execute_sql
cursor.execute(sql, params)
File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 66, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 79, in _execute
with self.db.wrap_database_errors:
File "/usr/local/lib/python3.10/site-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.10/site-packages/django/db/backends/mysql/base.py", line 73, in execute
return self.cursor.execute(query, args)
File "/usr/local/lib/python3.10/site-packages/MySQLdb/cursors.py", line 209, in execute
res = self._query(query)
File "/usr/local/lib/python3.10/site-packages/MySQLdb/cursors.py", line 315, in _query
db.query(q)
File "/usr/local/lib/python3.10/site-packages/MySQLdb/connections.py", line 239, in query
_mysql.connection.query(self, query)
django.db.utils.OperationalError: (1366, "Incorrect string value: '\\xCE\\x93\\xC3\\xBB\\xC3\\xAA...' for column 'body' at row 12")
This error seems to be related to the database using incorrect character encoding. The database for this website was originally created long ago and while I don't recall the exact MySQL version the database was originally created under, it was definitely before utf8mb4 became the default character encoding.
The Fix
I attempted to resolve the issue by issuing a series of ALTER DATABASE & ALTER TABLE commands but still received the same error. At this point I think the best way to handle it is to take advantage of the dockerized environment the database is running it since it makes it so easy to restore a database from backup. So rather than making changes to the running production database I should be able to edit my latest SQL backup file to change all mentions of encoding to utf8mb4.
I originally planned to set COLLATION to utf8mb4_unicode_ci but ultimately found it preferable to just remove all mentions of COLLATION from the backup file so everything falls back to the default of utf8mb4_0900_ai_ci which suits me just fine.
I also made sure the MySQL config file was setup to use utf8mb4.
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
Strangely though, this fix only seems to work in my local dev environment but not here in the production environment. Since everything is a docker container there aren't many differences between the two environments so it's possible I'm missing something obvious somewhere or otherwise making a silly mistake.
I'll take a break and come back to this issue with fresh eyes since updating the search index isn't at all critical.
Update: I finally got things working here in the production environment using the exact steps as above. In my first attempt I think I must have grabbed the wrong backup file to restore. It didn't help that the server taking the automated backups names the files using the date command in UTC time, while I had created a few manual backups in MST time.