If you follow me on Twitter, you might have seen this recent thread, in which I spent a few days metaphorically banging my head against a wall trying to figure out why my Python program would occasionally segfault (with no apparent trigger).
This post is an attempt to write up the cause (and my workaround) in a clearer form, and hopefully make it Google-able for the next poor soul to encounter this (if it's not fixed before then).
Dramatis Personæ
The program in question here is a Python web application, written in Flask, and run using gunicorn (using a few sync workers). It also uses SQLAlchemy (via Flask-SQLAlchemy) to access a sqlite3 database. It’s all running on a Google Compute Engine VM running Debian Linux.
Most of those details aren’t relevant, but I mention them just to give a sense of the scope: each of them was a suspect at some stage or another during my investigation.
The Evidence
Aside from seeing 5xx responses from my web server, the only information I had was in syslog:
kernel: [1234.567890] gunicorn[1234]: segfault at ad0 ip 0000000000000ad0 sp 00007fffc0401448 error 14 in python3.5[55f0889ee000+3ef000]
The gunicorn logs noted that it had started a new worker to replace the segfaulted one, but nothing more. The syslog entry is really quite odd: the at ad0 and ip 0000000000000ad0 parts indicate that the program tried to jump to (and hence execute) code at the address 0xad0. That's suspicious: I don't know offhand what lives at that address, but it's certainly not where program code normally sits.
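As an aside, the error 14 part is a hex bitfield describing the page fault. Decoding it (this is my reading of the x86 fault flags, so treat the decoding as an assumption rather than gospel) backs up that interpretation:

# Assumed meaning of the x86 page-fault error bits (the kernel prints the value in hex).
FAULT_FLAGS = [
    (0x01, 'protection violation (unset: page not present)'),
    (0x02, 'write access (unset: read or fetch)'),
    (0x04, 'fault happened in user mode'),
    (0x08, 'reserved page-table bit was set'),
    (0x10, 'instruction fetch'),
]

error = 0x14
print([description for bit, description in FAULT_FLAGS if error & bit])
# ['fault happened in user mode', 'instruction fetch']
# i.e. user-space code tried to execute a page that isn't mapped.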
Clearly I needed more information to debug this, so I tried to get a core dump. For reasons I'm still not entirely sure of, getting a core dump didn't just work out of the box: I believe Linux is meant to write a file named core to the process's working directory (provided the core-size ulimit is set appropriately).
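For reference, the two knobs involved are the process's RLIMIT_CORE and the kernel's core_pattern, both of which you can inspect from Python (Linux-specific, and purely illustrative):

import resource

# How big a core file is this process allowed to write?  (0, 0) means none at all.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
print('RLIMIT_CORE:', soft, hard)

# Where does the kernel put core dumps?  A pattern starting with '|' means they
# are piped to a user-space helper rather than written to a file named core.
with open('/proc/sys/kernel/core_pattern') as f:
    print('core_pattern:', f.read().strip())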
Some quick Googling revealed that systemd-coredump is a thing, so I installed that. My program then went an hour without segfaulting, and I tweeted thinking this was a Heisenbug. It turned out not to be the case: the segfault only happened a few times an hour, and I was just unlucky.
Anyhow, I later came back and used coredumpctl list to see when the program had segfaulted, and coredumpctl gdb to launch gdb with the core dump loaded. For all the hate systemd seems to get for reinventing the wheel, I will say that systemd-coredump seems quite good at its job.
$ coredumpctl gdb
...
Core was generated by `/usr/bin/python3'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000ad0 in ?? ()
(gdb)
Great: there's the 0xad0 address from my syslog entry, so the crash is the same as the one I've been seeing! This might seem like a minor point, but continually verifying that the world is as you expect it to be is critically important in debugging. Let's look at a backtrace to see what the program was doing when it segfaulted:
(gdb) bt
#0  0x0000000000000ad0 in ?? ()
#1  0x0000556dce8116df in PyCFunction_Call () at ../Objects/methodobject.c:109
#2  0x0000556dce7d16e9 in call_function (oparg=<optimized out>, pp_stack=0x7ffd2fc7e1e0) at ../Python/ceval.c:4720
#3  PyEval_EvalFrameEx () at ../Python/ceval.c:3251
#4  0x0000556dce7d193f in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7ffd2fc7e310, func=<optimized out>) at ../Python/ceval.c:4818
#5  call_function (oparg=<optimized out>, pp_stack=0x7ffd2fc7e310) at ../Python/ceval.c:4745
#6  PyEval_EvalFrameEx () at ../Python/ceval.c:3251
#7  0x0000556dce7d193f in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7ffd2fc7e440, func=<optimized out>) at ../Python/ceval.c:4818
#8  call_function (oparg=<optimized out>, pp_stack=0x7ffd2fc7e440) at ../Python/ceval.c:4745
#9  PyEval_EvalFrameEx () at ../Python/ceval.c:3251
#10 0x0000556dce7d6286 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4033
#11 0x0000556dce7d24c9 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7ffd2fc7e650, func=<optimized out>) at ../Python/ceval.c:4828
#12 call_function (oparg=<optimized out>, pp_stack=0x7ffd2fc7e650) at ../Python/ceval.c:4745
#13 PyEval_EvalFrameEx () at ../Python/ceval.c:3251
#14 0x0000556dce7d193f in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7ffd2fc7e780, func=<optimized out>) at ../Python/ceval.c:4818
#15 call_function (oparg=<optimized out>, pp_stack=0x7ffd2fc7e780) at ../Python/ceval.c:4745
#16 PyEval_EvalFrameEx () at ../Python/ceval.c:3251
#17 0x0000556dce7d70df in _PyEval_EvalCodeWithName (qualname=0x0, name=<optimized out>, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, globals=<optimized out>, _co=<optimized out>) at ../Python/ceval.c:4033
#18 PyEval_EvalCodeEx () at ../Python/ceval.c:4054
#19 0x0000556dce8135d3 in function_call.lto_priv () at ../Objects/funcobject.c:627
#20 0x0000556dce85a647 in PyObject_Call () at ../Objects/abstract.c:2166
#21 0x0000556dce77a94e in method_call.lto_priv () at ../Objects/classobject.c:330
#22 0x0000556dce85a647 in PyObject_Call () at ../Objects/abstract.c:2166
#23 0x0000556dce81aebf in slot_tp_iter () at ../Objects/typeobject.c:6214
#24 0x0000556dce85a4a3 in PyObject_GetIter () at ../Objects/abstract.c:2755
#25 0x0000556dce84cd56 in listextend.lto_priv.683 (
    b=<BaseQuery(_group_by=False, _params={<_anonymous_label at remote 0x7f26c3adbd48>: 'flowblok.id.au'}, _entities=[<_MapperEntity(entity_zero=<Mapper(self_and_descendants=<WeakSequence(_storage=[<weakref at remote 0x7f26c3a9bf98>]) at remote 0x7f26c3aae898>, inherits=None, version_id_generator=<function at remote 0x7f26c3e837b8>, confirm_deleted_rows=True, _readonly_props=set(), local_table=<Table(name=<quoted_name at remote 0x7f26c3e8c1c8>, description=<...>, _cloned_set={<...>}, constraints={<ForeignKeyConstraint(name='domain_admin_group_constraint', initially=None, dialect_kwargs=<_DialectArgView(obj=<...>) at remote 0x7f26c82b5160>, parent=<...>, _referred_table=<Table(name=<quoted_name at remote 0x7f26c3e8cbc8>, description=<...>, _cloned_set={<...>}, constraints={<ForeignKeyConstraint(name=None, initially=None, parent=<...>, _referred_table=<...>, columns=<ColumnCollection at remote 0x7f26c3e22cc0>, ondelete=None, elements=[<ForeignKey(name=None, column=<Column(_cloned_set={<...>}, constraints=set(), forei...(truncated),
    self=0x7f26c3433608) at ../Objects/listobject.c:831
... (truncated for your sanity)
This isn't terribly useful: it's full of Python interpreter internals. Given the title of this post, you might notice that the calling function is PyCFunction_Call and start to get ideas, but that call could have come from anywhere. Fortunately, gdb has amazing Python integration, so I can do this instead:
(gdb) py-bt
Traceback (most recent call first):
  <error reading variable: Cannot access memory at address 0xe90>
  File "/…/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1024, in _execute_clauseelement
    distilled_params = _distill_params(multiparams, params)
  File "/…/lib/python3.5/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/…/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/…/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 2958, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/…/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 2935, in __iter__
    return self._execute_and_instances(context)
  File "/…/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 2864, in one_or_none
    ret = list(self)
  File "/…/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 2894, in one
    ret = self.one_or_none()
  File "/…/lib/python3.5/site-packages/sqlalchemy/orm/loading.py", line 250, in load_on_pk_identity
    return q.one()
  File "/…/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 962, in _get_impl
    return db_load_fn(self, primary_key_identity)
  File "/…/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 887, in get
    ident, loading.load_on_pk_identity)
  File "/…/app.py", line 47, in setup_context
... (truncated for your sanity)
The last frame of that backtrace is my application code, trying to run a SQLAlchemy query, and the rest of the backtrace is SQLAlchemy executing it. But the first frame is the most interesting: SQLAlchemy is calling a function named _distill_params, yet we don't get any source for that call (just an error about reading memory at address 0xe90).
If you look at the source for sqlalchemy/engine/base.py, you’ll see that _distill_params gets imported from sqlalchemy/engine/util.py, which has this:
try:
    from sqlalchemy.cutils import _distill_params
except ImportError:
    globals().update(py_fallback())
That is, _distill_params comes from a C extension module, unless it’s not importable, in which case there’s a pure Python version as a fallback.
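As an aside, you can check which of the two implementations a given process actually picked up just by looking at the repr (illustrative output; it will vary with your build):

>>> from sqlalchemy.engine import util
>>> util._distill_params
<built-in function _distill_params>

If the C extension had failed to import, the pure-Python fallback would show up as an ordinary <function _distill_params at 0x…> instead.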
C Extension Modules
At this point, I’ve figured out the first part of what’s gone wrong: something is causing Python to think that the address of the _distill_params function is 0xad0, instead of whatever it’s meant to actually be.
For the next part, we need to delve a little into how C extension modules work. Here’s part of the source for sqlalchemy/cextension/utils.c, which provides the sqlalchemy.cutils module imported above.
static PyMethodDef module_methods[] = {
    {"_distill_params", distill_params, METH_VARARGS,
     "Distill an execute() parameter structure."},
    {NULL, NULL, 0, NULL}        /* Sentinel */
};
This C file gets compiled into a shared library: on my system it’s a file named cutils.cpython-35m-x86_64-linux-gnu.so. The CPython interpreter loads extension modules in Python/importdl.c, which uses the _PyImport_FindSharedFuncptr function in Python/dynload_shlib.c, which uses dlopen to load that .so file into memory and get a function pointer.
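You can actually see that mapping from inside the process; here's a Linux-specific sketch, assuming the C extension built and imported successfully:

import sqlalchemy.cutils

# A loaded shared library typically shows up as a handful of mappings: code,
# read-only data, and the writable data segment where module_methods lives.
with open('/proc/self/maps') as maps:
    for line in maps:
        if 'cutils.cpython' in line:
            print(line.rstrip())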
Most of those details aren't important: the key part is that we have some C code being loaded into the process's memory at runtime. So, let's attach GDB to a running process and have a look at that module_methods array:
$ gdb -p 1234
...
(gdb) p 'sqlalchemy/cextension/utils.c'::module_methods
$1 = {{ml_name = 0x7f66abdfde90 "_distill_params",
    ml_meth = 0x7f66abdfdad0 <distill_params>, ml_flags = 1,
    ml_doc = 0x7f66abdfdef8 "Distill an execute() parameter structure."},
  {ml_name = 0x0, ml_meth = 0x0, ml_flags = 0, ml_doc = 0x0}}
That’s more or less what I’d expect to see: ml_name has the Python name of the function, and ml_meth is a pointer to the distill_params C function. What happens if we try the same thing in our core dump?
$ coredumpctl gdb
...
(gdb) p 'sqlalchemy/cextension/utils.c'::module_methods
$1 = {{ml_name = 0xe90 <error: Cannot access memory at address 0xe90>,
    ml_meth = 0xad0, ml_flags = 1,
    ml_doc = 0xef8 <error: Cannot access memory at address 0xef8>},
  {ml_name = 0x0, ml_meth = 0x0, ml_flags = 0, ml_doc = 0x0}}
Well. That’s not right.
But wait, those hex numbers look kinda similar to the correct numbers…
>>> print('{:48b}'.format(0x7f66abdfde90))
11111110110011010101011110111111101111010010000
>>> print('{:48b}'.format(0xe90))
111010010000
>>> print('{:48b}'.format(0x7f66abdfdad0))
11111110110011010101011110111111101101011010000
>>> print('{:48b}'.format(0xad0))
101011010000
Huh? Sometime between being a perfectly happy process chugging along, and it segfaulting, the module_methods struct in a C extension module gets all its numbers truncated to 12 [*] bits‽
Time and Space
Thinking that the problem was something writing to that memory and corrupting it, I spent quite some time looking through strace and ltrace output, trying to find something touching the relevant addresses, but to no avail.
My breakthrough came when I compared /proc/$PID/smaps from a running process and an almost-segfaulted process. To capture the latter, I attached GDB, typed in shell cat /proc/1234/smaps > 1234.smaps and hit enter, so that it would run as soon as the program segfaulted (but before Linux forgot the process state). I also attempted using a breakpoint, but in hindsight I don't think that worked. Regardless, I found the section of memory in which the module_methods struct resided, and found this difference:
Before:
7f26c44e2000-7f26c44e3000 rw-p 00002000 08:10 666070    /…/cutils.cpython-35m-x86_64-linux-gnu.so
Size:                  4 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac sd
After:
7f26c44e2000-7f26c44e3000 rw-p 00002000 08:10 666070    /…/cutils.cpython-35m-x86_64-linux-gnu.so
Size:                  4 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         4 kB
Private_Dirty:         0 kB
Referenced:            4 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
The key difference here is that this section of memory went from having 4 kB of Private_Dirty data to having 4 kB of Private_Clean data.
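Those fields are easy enough to pull out programmatically if you want to watch a live process; here's a deliberately naive sketch (the PID and path fragment are placeholders):

def private_page_stats(pid, path_fragment):
    # Scan /proc/<pid>/smaps for the first mapping whose path mentions
    # path_fragment, and report its Private_Clean / Private_Dirty sizes in kB.
    stats = {}
    in_wanted_mapping = False
    with open('/proc/%d/smaps' % pid) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue
            if not fields[0].endswith(':'):
                # A line without a "Field:" prefix starts a new mapping.
                if in_wanted_mapping:
                    break
                in_wanted_mapping = path_fragment in line
            elif in_wanted_mapping and fields[0] in ('Private_Clean:', 'Private_Dirty:'):
                stats[fields[0].rstrip(':')] = int(fields[1])
    return stats

print(private_page_stats(1234, 'cutils.cpython'))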
Combined with the knowledge that the numbers inside that section of memory seemed to have been “truncated”, I realised what was happening: this section of memory was being reloaded from the shared library file. I'm glossing over some details here: on disk, the shared library contains pointers to code and data within itself, but when it gets loaded into a process's memory it lands at some arbitrary base address, so all of those internal pointers have to be adjusted (relocated) before they'll work.
So on disk, inside the module_methods struct, the ml_name pointer is actually 0xe90, but when it initially gets loaded into memory, it gets adjusted to 0x7f66abdfde90 so it’s pointing at the location where the string "_distill_params" got loaded into memory.
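You can check that story against the numbers from earlier: subtracting each on-disk value from its in-memory counterpart gives the same page-aligned address for both pointers, which is exactly what you'd expect if they'd simply had the library's load address added to them.

>>> hex(0x7f66abdfde90 - 0xe90)
'0x7f66abdfd000'
>>> hex(0x7f66abdfdad0 - 0xad0)
'0x7f66abdfd000'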
The Piping Gun
Knowing that I was looking for something which was causing the shared library file to be reloaded from disk, I remembered that I had a bug in my automation which was unnecessarily reinstalling all Python libraries using pip. I didn’t think it was a huge problem: the downloads are cached locally, and reinstalling a package with the same version should just be a no-op, right?
Turns out not so much. When pip installs a wheel, it replaces all the files using shutil.copyfile. This opens the file for writing, truncates it, and then copies the content in.
Because this rewrites the very file that is memory-mapped into a running process, Linux notices that the underlying data has been rewritten (even with exactly the same bytes) and invalidates each process's private, unwritten changes to those pages (this is what makes the mapping go from Private_Dirty to Private_Clean), thus undoing the pointer adjustments the process made when loading the shared library.
Honestly, I'm not sure where the bug is here. Probably, after loading a shared library, Python should disassociate the data section from the file it was loaded from, though I'm not sure whether the C functions to do that even exist. I'd also say pip should do atomic file replacements to avoid the problem, but that's more of a band-aid than an actual fix.
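To make that distinction concrete, here's roughly what the two approaches look like (a sketch with made-up helper names, not pip's actual code):

import os
import shutil
import tempfile

def overwrite_in_place(src, dst):
    # An in-place copy: dst is opened, truncated, and rewritten.  The inode
    # stays the same, so any existing memory mapping of dst is still backed
    # by the file being rewritten underneath it.
    shutil.copyfile(src, dst)

def replace_atomically(src, dst):
    # Write a complete copy next to dst, then rename it over the top.
    # Processes that already mapped the old file keep the old inode;
    # anything opening dst afterwards sees the new file.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(dst) or '.')
    os.close(fd)
    shutil.copy2(src, tmp_path)
    os.rename(tmp_path, dst)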
My Workaround
After figuring out what was happening, I had a realisation that this problem could be mitigated in the application, with a pure Python solution.
Instead of trying to disassociate the memory mapping, I can atomically replace the file it loaded with a copy of itself, thus making the memory mapping in the process point at data which no longer exists in the filesystem.
So here is a Python module which you can import, which will periodically find C extension modules loaded into the Python process and perform shenanigans on them to make them resilient against this failure mode. This potentially costs memory, but I doubt it’ll ever be an amount you care about.
# Copyright 2018 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
'''
Module sopin implements "pinning" of shared libraries.
This is a workaround for the problem where rewriting a Python C extension module
on disk corrupts the state of any Python process which has previously imported
it: for more details, see
https://blog.flowblok.id.au/2018-06/python-c-extension-module-corruption.html
Usage:
import sopin
sopin.start()
'''
import os
import shutil
import sys
import tempfile
import threading
import warnings
SUFFIXES = ('.so',)
# Modules previously pinned.
_pinned = set()
# Modules which we couldn't pin (so shouldn't reattempt).
_blacklisted = set()
def pin(filename):
    if filename in _pinned or filename in _blacklisted:
        return
    try:
        # Atomically create a unique tempfile next to the original file, so
        # the rename below stays on one filesystem (and is therefore atomic).
        # Don't delete it, because we'll move it over the old file.
        with tempfile.NamedTemporaryFile(
                dir=os.path.dirname(filename), delete=False) as f:
            # Copy the content+permissions of the old file into our new file.
            shutil.copy2(filename, f.name)
        # Move the new file over the old file.
        os.rename(f.name, filename)
    except IOError:
        warnings.warn('could not pin %s' % filename)
        _blacklisted.add(filename)
    else:
        _pinned.add(filename)


def pin_all():
    # Copy the module list: other threads may import modules while we iterate.
    for name, module in list(sys.modules.items()):
        try:
            filename = module.__file__
        except AttributeError:
            continue
        if filename and filename.endswith(SUFFIXES):
            pin(filename)


def start(period_seconds=10):
    # Re-run pin_all periodically (on a daemon timer thread), so that
    # extension modules imported later also get pinned.
    def _tick():
        pin_all()
        timer = threading.Timer(period_seconds, _tick)
        timer.daemon = True
        timer.start()
    _tick()
[*] Twitter followers may note this is not the 13 or 14 that I tweeted: I was actually debugging processors.c at that point, so these details are slightly different.