MySQL Doesn't Like My Strings

Tags: database django mysql wagtail

The Problem

I am currently running into an "incorrect string value" error when attempting to run Wagtail's update_index management command.

The stack trace looks like this:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/usr/local/lib/python3.10/site-packages/MySQLdb/cursors.py", line 209, in execute
    res = self._query(query)
  File "/usr/local/lib/python3.10/site-packages/MySQLdb/cursors.py", line 315, in _query
    db.query(q)
  File "/usr/local/lib/python3.10/site-packages/MySQLdb/connections.py", line 239, in query
    _mysql.connection.query(self, query)
MySQLdb._exceptions.OperationalError: (1366, "Incorrect string value: '\\xCE\\x93\\xC3\\xBB\\xC3\\xAA...' for column 'body' at row 12")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/manage.py", line 22, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.10/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.10/site-packages/django/core/management/__init__.py", line 413, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.10/site-packages/django/core/management/base.py", line 354, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.10/site-packages/django/core/management/base.py", line 398, in execute
    output = self.handle(*args, **options)
  File "/usr/local/lib/python3.10/site-packages/wagtail/search/management/commands/update_index.py", line 124, in handle
    self.update_backend(
  File "/usr/local/lib/python3.10/site-packages/wagtail/search/management/commands/update_index.py", line 88, in update_backend
    index.add_items(model, chunk)
  File "/usr/local/lib/python3.10/site-packages/wagtail/search/backends/database/mysql/mysql.py", line 241, in add_items
    update_method(content_type_pk, indexers)
  File "/usr/local/lib/python3.10/site-packages/wagtail/search/backends/database/mysql/mysql.py", line 224, in add_items_update_then_create
    self.entries.bulk_create(to_be_created)
  File "/usr/local/lib/python3.10/site-packages/django/db/models/query.py", line 514, in bulk_create
    returned_columns = self._batched_insert(
  File "/usr/local/lib/python3.10/site-packages/django/db/models/query.py", line 1293, in _batched_insert
    self._insert(item, fields=fields, using=self.db, ignore_conflicts=ignore_conflicts)
  File "/usr/local/lib/python3.10/site-packages/django/db/models/query.py", line 1270, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/usr/local/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1416, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 79, in _execute
    with self.db.wrap_database_errors:
  File "/usr/local/lib/python3.10/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/usr/local/lib/python3.10/site-packages/MySQLdb/cursors.py", line 209, in execute
    res = self._query(query)
  File "/usr/local/lib/python3.10/site-packages/MySQLdb/cursors.py", line 315, in _query
    db.query(q)
  File "/usr/local/lib/python3.10/site-packages/MySQLdb/connections.py", line 239, in query
    _mysql.connection.query(self, query)
django.db.utils.OperationalError: (1366, "Incorrect string value: '\\xCE\\x93\\xC3\\xBB\\xC3\\xAA...' for column 'body' at row 12")

This error seems to be related to the database using incorrect character encoding. The database for this website was originally created long ago and while I don't recall the exact MySQL version the database was originally created under, it was definitely before utf8mb4 became the default character encoding.

The Fix

I attempted to resolve the issue by issuing a series of ALTER DATABASE & ALTER TABLE commands but still received the same error. At this point I think the best way to handle it is to take advantage of the dockerized environment the database is running it since it makes it so easy to restore a database from backup. So rather than making changes to the running production database I should be able to edit my latest SQL backup file to change all mentions of encoding to utf8mb4.

I originally planned to set COLLATION to utf8mb4_unicode_ci but ultimately found it preferable to just remove all mentions of COLLATION from the backup file so everything falls back to the default of utf8mb4_0900_ai_ci which suits me just fine.

I also made sure the MySQL config file was setup to use utf8mb4.

[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4

Strangely though, this fix only seems to work in my local dev environment but not here in the production environment. Since everything is a docker container there aren't many differences between the two environments so it's possible I'm missing something obvious somewhere or otherwise making a silly mistake.

I'll take a break and come back to this issue with fresh eyes since updating the search index isn't at all critical.

Update: I finally got things working here in the production environment using the exact steps as above. In my first attempt I think I must have grabbed the wrong backup file to restore. It didn't help that the server taking the automated backups names the files using the date command in UTC time, while I had created a few manual backups in MST time.