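I'm trying to track down the root cause of a database connection error in Django. At this point I'm looking for debugging tips, because I think the symptoms are too non-specific.
Some background: I have been using this stack, deployed in AWS, for many years without any problems:
- Ubuntu (20.04 LTS in this case)
- Nginx
- uWSGI
- PostgreSQL (v12 in RDS; I tried v13 but got the same errors)
An AWS load balancer sends traffic to the Ubuntu instance, where it is handled by Nginx and then forwarded to Django (3.2.6) running in uWSGI. Django connects to the database using psycopg2 (2.9.1). Normally this setup works very well for me.
The problem I'm having is that the database connection seems to be closed at random. Django reports errors like this: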
Traceback (most recent call last):
[my code...]
for answer in q.select_related('entry__session__player'):
File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 280, in __iter__
self._fetch_all()
File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 1324, in _fetch_all
self._result_cache = list(self._iterable_class(self))
File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 51, in __iter__
results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
File "/usr/local/lib/python3.8/dist-packages/django/db/models/sql/compiler.py", line 1173, in execute_sql
cursor = self.connection.cursor()
File "/usr/local/lib/python3.8/dist-packages/django/utils/asyncio.py", line 26, in inner
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/django/db/backends/base/base.py", line 259, in cursor
return self._cursor()
File "/usr/local/lib/python3.8/dist-packages/django/db/backends/base/base.py", line 237, in _cursor
return self._prepare_cursor(self.create_cursor(name))
File "/usr/local/lib/python3.8/dist-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/usr/local/lib/python3.8/dist-packages/django/db/backends/base/base.py", line 237, in _cursor
return self._prepare_cursor(self.create_cursor(name))
File "/usr/local/lib/python3.8/dist-packages/django/utils/asyncio.py", line 26, in inner
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/django/db/backends/postgresql/base.py", line 236, in create_cursor
cursor = self.connection.cursor()
django.db.utils.InterfaceError: connection already closed
The location in my code varies. Sometimes (less frequently) I also get this:
Traceback (most recent call last):
[my code...]
group = contest.groups.create(restaurant = restaurant, supergroup = supergroup)
File "/usr/local/lib/python3.8/dist-packages/django/db/models/fields/related_descriptors.py", line 677, in create
return super(RelatedManager, self.db_manager(db)).create(**kwargs)
File "/usr/local/lib/python3.8/dist-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 453, in create
obj.save(force_insert=True, using=self.db)
File "/usr/local/lib/python3.8/dist-packages/django/db/models/base.py", line 726, in save
self.save_base(using=using, force_insert=force_insert,
File "/usr/local/lib/python3.8/dist-packages/django/db/models/base.py", line 763, in save_base
updated = self._save_table(
File "/usr/local/lib/python3.8/dist-packages/django/db/models/base.py", line 868, in _save_table
results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
File "/usr/local/lib/python3.8/dist-packages/django/db/models/base.py", line 906, in _do_insert
return manager._insert(
File "/usr/local/lib/python3.8/dist-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 1270, in _insert
return query.get_compiler(using=using).execute_sql(returning_fields)
File "/usr/local/lib/python3.8/dist-packages/django/db/models/sql/compiler.py", line 1416, in execute_sql
cursor.execute(sql, params)
File "/usr/local/lib/python3.8/dist-packages/django/db/backends/utils.py", line 66, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/usr/local/lib/python3.8/dist-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/usr/local/lib/python3.8/dist-packages/django/db/backends/utils.py", line 78, in _execute
self.db.validate_no_broken_transaction()
File "/usr/local/lib/python3.8/dist-packages/django/db/backends/base/base.py", line 447, in validate_no_broken_transaction
raise TransactionManagementError(
django.db.transaction.TransactionManagementError: An error occurred in the current transaction. You can't execute queries until the end of the 'atomic' block.
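Again, the location in my code varies, and it is not always a simple create call; sometimes it is bulk_create, sometimes get_or_create. I'm guessing the root cause is the same as for the "connection already closed" error, but I'm not sure.
That is what Django tells me. The PostgreSQL logs contain no errors that coincide in time with the errors Django reports. The only errors in the log have this form: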
LOG: could not receive data from client: Connection reset by peer
These coincide with uWSGI killing worker processes (I set a limit of 1000 requests per worker to guard against potential memory leaks), so they are not related.
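To make that correlation concrete, here is a minimal sketch of the kind of check involved: given the timestamps of the PostgreSQL log lines and of the uWSGI worker respawns, list any PostgreSQL errors that have no respawn nearby. The function name `unmatched_events`, the two-second tolerance, and the sample timestamps are all assumptions for illustration, not values from the actual logs.

```python
from datetime import datetime, timedelta


def unmatched_events(pg_errors, worker_restarts, tolerance=timedelta(seconds=2)):
    """Return the PostgreSQL error timestamps with no worker restart within `tolerance`."""
    return [
        err
        for err in pg_errors
        if not any(abs(err - restart) <= tolerance for restart in worker_restarts)
    ]


# Hypothetical timestamps, for illustration only (not taken from real logs).
pg_errors = [datetime(2021, 9, 1, 12, 0, 1), datetime(2021, 9, 1, 12, 30, 0)]
worker_restarts = [datetime(2021, 9, 1, 12, 0, 0)]

leftover = unmatched_events(pg_errors, worker_restarts)
print(leftover)
```

An empty result on real data would support the conclusion that every connection reset is explained by a worker being recycled; any leftover timestamps would be worth a closer look.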
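So PostgreSQL reports no related errors; I can only assume the connection was closed cleanly and Django was not expecting that. There are no errors at all in the systemd journal on the Ubuntu instance.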
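One way to test that assumption would be to record the state of the underlying psycopg2 connection around the failure. The sketch below is hypothetical instrumentation, not part of the setup described here: `describe_connection` is an invented helper name, and it relies only on Django's `DatabaseWrapper.connection` attribute and on psycopg2's `closed` attribute and `get_backend_pid()` method. The stand-in objects at the bottom exist only so the snippet runs without a database.

```python
import os
from types import SimpleNamespace


def describe_connection(wrapper):
    """Summarise the state of a Django DatabaseWrapper-like object.

    `wrapper.connection` is the raw DB-API connection, or None if no
    connection has been opened yet. On psycopg2 connections, `closed` is 0
    while the connection is open and nonzero once it is closed or broken.
    """
    raw = getattr(wrapper, "connection", None)
    info = {"worker_pid": os.getpid()}
    if raw is None:
        info["state"] = "not opened"
    elif raw.closed:
        info["state"] = "closed"
    else:
        info["state"] = "open"
        # PID of the PostgreSQL backend process serving this connection.
        info["backend_pid"] = raw.get_backend_pid()
    return info


# Stand-ins for illustration only; real code would pass django.db.connection.
class FakeOpenConnection:
    closed = 0

    def get_backend_pid(self):
        return 4242  # made-up PID


open_info = describe_connection(SimpleNamespace(connection=FakeOpenConnection()))
closed_info = describe_connection(SimpleNamespace(connection=SimpleNamespace(closed=1)))
print(open_info["state"], closed_info["state"])
```

Logging this at the start of each request and again in an exception handler would show whether the connection was already closed before the request began. The backend PID can be matched against the PostgreSQL log if `log_line_prefix` includes `%p`.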
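I'm not sure how to proceed. I doubt this is a bug in Django, but no other component in the system is complaining, so it must be a low-level problem. It happens rarely, but often enough to be a concern: roughly one request in a thousand.
Any insight or suggestions on how to investigate this further would be much appreciated :)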