<html>
|
|
<head>
|
|
<title>WAD: A Module for Converting Fatal Extension Errors into Python Exceptions</title>
|
|
</head>
|
|
<body bgcolor="#ffffff">
|
|
<center>
|
|
|
|
<h2>WAD: A Module for Converting Fatal Extension Errors into Python Exceptions</h2>
|
|
<h6>David M. Beazley <br>
|
|
Department of Computer Science<br>
|
|
University of Chicago<br>
|
|
Chicago, IL 60637<br>
|
|
beazley@cs.uchicago.edu<br>
|
|
</h6>
|
|
</center>
|
|
|
|
<h3>Abstract</h3>
|
|
<em>
|
|
One of the more popular uses of Python is as an extension language for
|
|
applications written in compiled languages such as C, C++, and
|
|
Fortran. Unfortunately, one of the biggest drawbacks of this approach
|
|
is the lack of a useful debugging and error handling facility for
|
|
identifying problems in extension code. In part, this limitation is
|
|
due to the fact that Python does not know anything about the internal
|
|
implementation of an extension module. A more difficult problem is
|
|
that compiled extensions sometimes fail with catastrophic errors such
|
|
as memory access violations, failed assertions, and floating point
|
|
exceptions. These types of errors fall outside the realm of normal
|
|
Python exception handling and are particularly difficult to identify
|
|
and debug. Although traditional debuggers can find the location of a
|
|
fatal error, they are unable to report the context in which such an
|
|
error has occurred with respect to a Python script. This paper describes
|
|
an experimental system that converts fatal extension errors
|
|
into Python exceptions. In particular, a dynamically
|
|
loadable module, WAD (Wrapped Application Debugger), has been developed which catches
|
|
fatal errors, unwinds the call stack, and generates Python exceptions
|
|
with debugging information. WAD requires no modifications to Python,
|
|
works with all extension modules, and introduces no performance
|
|
overhead. An initial implementation of the system is currently
|
|
available for Sun SPARC Solaris and i386-Linux.
|
|
|
|
</em>
|
|
|
|
<h3>1. Introduction</h3>
|
|
|
|
One of the primary reasons C, C++, and Fortran programmers are
|
|
attracted to Python is its ability to serve as an extension language
|
|
for compiled programs. Furthermore, tools such as SIP, CXX, Pyfort, FPIG,
|
|
and SWIG make it extremely easy for a programmer to ``wrap'' existing
|
|
software into an extension module [1,2,3,4,5]. Although this approach is
|
|
extremely attractive in terms of providing a highly usable and
|
|
flexible environment for users, extension modules suffer from
|
|
problems not normally associated with Python
|
|
scripts---especially when they don't work.
|
|
|
|
<p>
|
|
Normally, Python programming errors result in an exception like this:
|
|
|
|
<blockquote><pre>
|
|
% python foo.py
|
|
Traceback (innermost last):
|
|
File "foo.py", line 11, in ?
|
|
foo()
|
|
File "foo.py", line 8, in foo
|
|
bar()
|
|
File "foo.py", line 5, in bar
|
|
spam()
|
|
File "foo.py", line 2, in spam
|
|
doh()
|
|
NameError: doh
|
|
%
|
|
</pre></blockquote>
|
|
|
|
Unfortunately for compiled extensions, the following situation sometimes occurs:
|
|
|
|
<blockquote><pre>
|
|
% python foo.py
|
|
Segmentation Fault (core dumped)
|
|
%
|
|
</pre></blockquote>
|
|
|
|
Needless to say, this isn't very informative--well,
|
|
other than indicating that something ``very bad'' happened.
|
|
|
|
<p>
|
|
In order to identify the source of a fatal error, a programmer can run a
|
|
debugger on the Python executable or on a core file like this:
|
|
|
|
<blockquote><pre>
|
|
% gdb /usr/local/bin/python
|
|
(gdb) run foo.py
|
|
Starting program: /usr/local/bin/python foo.py
|
|
|
|
Program received signal SIGSEGV, Segmentation fault.
|
|
0xff060568 in doh () from /u0/beazley/Projects/WAD/Python/./libfoo.so
|
|
(gdb) where
|
|
#0 0xff060568 in doh () from /u0/beazley/Projects/WAD/Python/./libfoo.so
|
|
#1 0xff082f34 in _wrap_doh ()
|
|
from /u0/beazley/Projects/WAD/Python/./dohmodule.so
|
|
#2 0x2777c in call_builtin (func=0x1984b8, arg=0x1a1ccc, kw=0x0)
|
|
at ceval.c:2650
|
|
#3 0x27648 in PyEval_CallObjectWithKeywords (func=0x1984b8, arg=0x1a1ccc,
|
|
kw=0x0) at ceval.c:2618
|
|
#4 0x25d18 in eval_code2 (co=0x19acf8, globals=0x0, locals=0x1c7844,
|
|
args=0x1984b8, argcount=1625472, kws=0x0, kwcount=0, defs=0x0, defcount=0,
|
|
owner=0x0) at ceval.c:1951
|
|
#5 0x25954 in eval_code2 (co=0x199620, globals=0x0, locals=0x1984b8,
|
|
args=0x196654, argcount=1862720, kws=0x197788, kwcount=0, defs=0x0,
|
|
#6 0x25954 in eval_code2 (co=0x19ad38, globals=0x0, locals=0x196654,
|
|
args=0x1962fc, argcount=1862800, kws=0x198e90, kwcount=0, defs=0x0,
|
|
defcount=0, owner=0x0) at ceval.c:1850
|
|
#7 0x25954 in eval_code2 (co=0x1b6c60, globals=0x0, locals=0x1962fc,
|
|
args=0x1a1eb4, argcount=1862920, kws=0x0, kwcount=0, defs=0x0, defcount=0,
|
|
owner=0x0) at ceval.c:1850
|
|
#8 0x22da4 in PyEval_EvalCode (co=0x1b6c60, globals=0x1962c4, locals=0x1962c4)
|
|
at ceval.c:319
|
|
#9 0x3adb4 in run_node (n=0x18abf8, filename=0x1b6c60 "", globals=0x1962c4,
|
|
locals=0x1962c4) at pythonrun.c:886
|
|
#10 0x3ad64 in run_err_node (n=0x18abf8, filename=0x1b6c60 "",
|
|
globals=0x1962c4, locals=0x1962c4) at pythonrun.c:874
|
|
#11 0x3ad38 in PyRun_FileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py",
|
|
start=1616888, globals=0x1962c4, locals=0x1962c4, closeit=1)
|
|
at pythonrun.c:866
|
|
#12 0x3a1d8 in PyRun_SimpleFileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py",
|
|
closeit=1) at pythonrun.c:579
|
|
#13 0x39d84 in PyRun_AnyFileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py",
|
|
closeit=1) at pythonrun.c:459
|
|
#14 0x1f498 in Py_Main (argc=2, argv=0xffbefc84) at main.c:289
|
|
#15 0x1eec0 in main (argc=2, argv=0xffbefc84) at python.c:10
|
|
</pre></blockquote>
|
|
|
|
Unfortunately, even though the debugger identifies the location where the fault occurred, it
|
|
mostly provides information about the internals of the
|
|
interpreter. The debugger certainly doesn't reveal anything about the Python
|
|
program that led to the error (i.e., it doesn't reveal the
|
|
same information that would be contained in a Python traceback). As a result,
|
|
the debugger is of limited use when it comes to debugging an application that
|
|
consists of both compiled and Python code.
|
|
|
|
<p>
|
|
Normally, extension developers try to avoid catastrophic errors by
|
|
adding error handling. If
|
|
an application is small or customized for use with Python, it can be
|
|
modified to raise Python exceptions.
|
|
Automated tools such as SWIG can also convert C++
|
|
exceptions and C-related error handling mechanisms into Python
|
|
exceptions. However, no matter how much error checking is added,
|
|
there is always a chance that an extension will fail in an unexpected
|
|
manner. This is especially true for large applications that have been wrapped
|
|
into an extension module. In addition, certain types of errors such as floating
|
|
point exceptions (e.g., division by zero) are especially difficult to find
|
|
and eliminate. Finally, rigorous error checking may be omitted to improve
|
|
performance.
|
|
|
|
<p>
|
|
To address these problems, an experimental module known as WAD (Wrapped
|
|
Application Debugger) has been developed.
|
|
WAD is able to
|
|
convert fatal errors into Python exceptions that include information
|
|
from the call stack as well as debugging
|
|
information. By turning such errors into Python exceptions, fatal
|
|
errors now result in a traceback that crosses the boundary between
|
|
Python code and compiled extension code. This makes it much
|
|
easier to identify and correct extension-related programming errors.
|
|
WAD requires no modifications to Python and is compatible with all
|
|
extension modules. However, it is also highly platform specific
|
|
and currently only runs on Sun SPARC
|
|
Solaris and i386-Linux. The primary goal of this paper is to motivate the problem
|
|
and to describe one possible solution. In addition, many of the
|
|
implementation issues
|
|
associated with providing an integrated error reporting mechanism are described.
|
|
|
|
<h3>2. An Example</h3>
|
|
|
|
WAD can either be imported as a Python extension module or linked to an
|
|
extension module. To illustrate, consider the earlier example:
|
|
|
|
<blockquote><pre>
|
|
% python foo.py
|
|
Segmentation Fault (core dumped)
|
|
%
|
|
</pre></blockquote>
|
|
|
|
To identify the problem, a programmer can run Python interactively and import WAD as follows:
|
|
|
|
<blockquote><pre>
|
|
% python
|
|
Python 2.0 (#1, Oct 27 2000, 14:34:45)
|
|
[GCC 2.95.2 19991024 (release)] on sunos5
|
|
Type "copyright", "credits" or "license" for more information.
|
|
>>> import libwadpy
|
|
WAD Enabled
|
|
>>> execfile("foo.py")
|
|
Traceback (most recent call last):
|
|
File "&lt;stdin&gt;", line 1, in ?
|
|
File "foo.py", line 16, in ?
|
|
foo()
|
|
File "foo.py", line 13, in foo
|
|
bar()
|
|
File "foo.py", line 10, in bar
|
|
spam()
|
|
File "foo.py", line 7, in spam
|
|
doh.doh(a,b,c)
|
|
SegFault: [ C stack trace ]
|
|
|
|
#2 0x00027774 in call_builtin(func=0x1c74f0,arg=0x1a1ccc,kw=0x0)
|
|
#1 0xff022f7c in _wrap_doh(0x0,0x1a1ccc,0x160ef4,0x9c,0x56b44,0x1aa3d8)
|
|
#0 0xfe7e0568 in doh(a=0x3,b=0x4,c=0x0) in 'foo.c', line 28
|
|
|
|
/u0/beazley/Projects/WAD/Python/foo.c, line 28
|
|
|
|
int doh(int a, int b, int *c) {
|
|
=> *c = a + b;
|
|
return *c;
|
|
}
|
|
|
|
>>>
|
|
</pre></blockquote>
|
|
|
|
In this case, we can
|
|
see that the program has tried to assign a value to a
|
|
NULL pointer (indicated by the value "c=0x0" in the last function call). Furthermore, we obtain a Python traceback that shows the
|
|
entire sequence of functions leading to the problem. Finally, since
|
|
control returned to the interpreter, it is possible to interactively
|
|
inspect various aspects of the application or to continue with the computation
|
|
(although this clearly depends on the severity of the error and the nature of the application).
|
|
|
|
<p>
|
|
In certain applications, it may be difficult to run Python
|
|
interactively or to modify the code to explicitly import a special
|
|
debugging module. In these cases, WAD can be attached to an extension module with the
|
|
linker. For example:
|
|
|
|
<blockquote><pre>
|
|
% ld -G $(OBJS) -o dohmodule.so -lwadpy
|
|
</pre></blockquote>
|
|
|
|
This requires no recompilation of any source code--only a relinking of the
|
|
extension module. When Python loads the relinked extension module, WAD is automatically
|
|
initialized before Python invokes the module initialization function.
|
|
|
|
<h3>3. Design Considerations for Embedded Error Recovery</h3>
|
|
|
|
The primary design goal of WAD is to provide an error reporting mechanism
|
|
for extension modules that is a natural extension of normal Python
|
|
exception handling. There are two primary motivations for
|
|
handling fatal errors in this manner: first, in the context of Python
|
|
programming, it is simply unnatural to run a separate debugging
|
|
application to identify a problem in an extension module when no such
|
|
requirement exists for scripts. Thus, an embedded error reporting
|
|
mechanism is simply more convenient. Second, the target users
|
|
of an extension module may not know how to use a debugger or even have
|
|
a development environment installed on their machine. Therefore,
|
|
the ability to produce an informative traceback within the
|
|
confines of the Python interpreter can be of tremendous value to an
|
|
extension developer. This is because users who report a problem will
|
|
be able to include an informative traceback as opposed to simply
|
|
saying ``the code crashed.''
|
|
|
|
<p>
|
|
A secondary design goal is to provide a system that is as non-invasive
|
|
as possible. The system should not require modifications to Python or
|
|
any extension modules and it should be easy to integrate
|
|
into the runtime environment of an application. In addition, it shouldn't
|
|
introduce any performance overhead.
|
|
|
|
<p>
|
|
Finally, since WAD co-exists with the Python interpreter (i.e., in the same
|
|
process), there are a number of technical issues that have to be
|
|
addressed. First, fatal errors can theoretically occur anywhere in
|
|
the interpreter as well as in extension modules. Therefore, WAD needs
|
|
to know about Python's internal organization if it is going to provide
|
|
a graceful recovery back to the interpreter. Second, in order to
|
|
implement this recovery scheme, the system has to perform direct
|
|
manipulation of the CPU context and call stack. Last, but not least,
|
|
since the recovery code lives in the same address space as the
|
|
interpreter and extension modules it should not depend on the process
|
|
stack and heap (since both could have been corrupted by the faulting
|
|
application).
|
|
|
|
<h3>4. Catching Fatal Errors</h3>
|
|
|
|
WAD catches catastrophic errors by installing a
|
|
reliable signal handler for SIGSEGV, SIGBUS, SIGABRT, SIGILL, and SIGFPE [9]. Unlike the
|
|
more familiar BSD-style signal interface (as provided by the Python
|
|
signal module), reliable signal handlers are installed using the <tt>sigaction()</tt> system call and have a few notable properties:
|
|
|
|
<ul>
|
|
<li> The signal handler can be configured to run on its own dedicated stack.
|
|
|
|
<p>
|
|
<li> Handler functions can receive a structure containing the CPU context
|
|
including the CPU registers, program counter, and stack pointer.
|
|
|
|
<p>
|
|
<li> Changes to the CPU context take effect immediately after the signal handler returns.
|
|
</ul>
|
|
|
|
Therefore, the high level implementation of WAD is relatively straightforward: when a fatal signal occurs,
|
|
a handler function runs on an isolated signal handling stack.
|
|
The CPU context is then used to unwind the call stack and to inspect the process state. Finally,
|
|
if possible, the CPU context is modified in a manner that allows the signal handler to
|
|
return to Python with a raised exception.
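<p>
The installation step described above can be sketched in C as follows. This is a minimal
illustration under stated assumptions, not WAD's actual source: the names
<tt>handler_stack</tt>, <tt>fatal_handler</tt>, <tt>install_fatal_handlers</tt>, and
<tt>last_signal</tt> are hypothetical, and the handler only records the signal number
instead of unwinding the stack.

```c
#define _XOPEN_SOURCE 700   /* request sigaltstack(), SA_ONSTACK, SA_SIGINFO */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; none of these identifiers are taken from WAD. */

/* Statically allocated stack so the handler never touches the
   (possibly corrupted) process stack. */
static char handler_stack[64 * 1024];

/* Set by the handler; a real handler would inspect the CPU context. */
volatile sig_atomic_t last_signal = 0;

/* Three-argument handler selected by SA_SIGINFO. The third argument
   points to a ucontext_t describing the interrupted CPU context. */
void fatal_handler(int signo, siginfo_t *info, void *context) {
    (void) info;
    (void) context;
    last_signal = signo;
}

/* Install the handler for the five fatal signals named in the text.
   Returns 0 on success and -1 on failure. */
int install_fatal_handlers(void) {
    static const int fatal_signals[] = {
        SIGSEGV, SIGBUS, SIGABRT, SIGILL, SIGFPE
    };
    stack_t ss;
    struct sigaction sa;
    size_t i;

    assert(sizeof(handler_stack) >= (size_t) MINSIGSTKSZ);

    /* Register the dedicated signal stack. */
    ss.ss_sp = handler_stack;
    ss.ss_size = sizeof(handler_stack);
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    /* SA_ONSTACK: run on the alternate stack.
       SA_SIGINFO: pass siginfo and CPU context to the handler. */
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = fatal_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);

    for (i = 0; i < sizeof(fatal_signals) / sizeof(fatal_signals[0]); i++) {
        if (sigaction(fatal_signals[i], &sa, NULL) != 0)
            return -1;
    }
    return 0;
}
```

<p>
Returning normally from such a handler is only safe for signals delivered with
<tt>raise()</tt> or <tt>kill()</tt>. For a genuine hardware fault, the handler must first
alter the saved context, which is the subject of the next section.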
|
|
|
|
<h3>5. A Detailed Description of the Recovery Mechanism</h3>
|
|
|
|
In this section, a more detailed description of the error recovery
|
|
scheme is presented. The precise implementation details of this are
|
|
highly platform specific and involve a number of advanced topics including
|
|
the Unix process file system (/proc), the ELF object file format, and the
|
|
Stabs compiler debugging format [6,7,8]. The details of these topics are
|
|
beyond the scope of this paper. However, this section is intended to
|
|
give the reader a small taste of the steps involved in implementing the recovery mechanism.
|
|
|
|
<P>
|
|
The services of WAD are only invoked upon the reception of a fatal
|
|
signal. This triggers a signal handling function that results in a return to Python
|
|
as illustrated in the following figure:
|
|
|
|
<center>
|
|
<img src="fig1.png">
|
|
<h6>Control flow of the error recovery mechanism</h6>
|
|
</center>
|
|
|
|
<p>
|
|
The steps required to implement this recovery are as follows:
|
|
|
|
<ol>
|
|
<li> The values of the program counter and stack pointer are obtained from the CPU
|
|
context structure passed to the WAD signal handler.
|
|
|
|
<p>
|
|
<li> The virtual memory map of the process is inspected to identify all of
|
|
the shared libraries, dynamically loaded modules, and valid memory regions.
|
|
This information is obtained by reading from the Unix /proc filesystem.
|
|
The following table illustrates the nature of this data:
|
|
|
|
|
|
<blockquote><pre>
|
|
Address Size Permissions File
|
|
---------- ----- ----------------- ---------------------------------
|
|
00010000 1264K read/exec /usr/local/bin/python
|
|
0015A000 184K read/write/exec /usr/local/bin/python
|
|
00188000 296K read/write/exec [ heap ]
|
|
FE7C0000 32K read/exec /u0/beazley/Projects/dohmodule.so
|
|
FE7D6000 8K read/write/exec /u0/beazley/Projects/dohmodule.so
|
|
...
|
|
FF100000 664K read/exec /usr/lib/libc.so.1
|
|
FF1B6000 24K read/write/exec /usr/lib/libc.so.1
|
|
FF1BC000 8K read/write/exec /usr/lib/libc.so.1
|
|
FF2C0000 120K read/exec /usr/lib/libthread.so.1
|
|
FF2EE000 8K read/write/exec /usr/lib/libthread.so.1
|
|
FF2F0000 48K read/write/exec /usr/lib/libthread.so.1
|
|
FF310000 40K read/exec /usr/lib/libsocket.so.1
|
|
FF32A000 8K read/write/exec /usr/lib/libsocket.so.1
|
|
FF330000 24K read/exec /usr/lib/libpthread.so.1
|
|
FF346000 8K read/write/exec /usr/lib/libpthread.so.1
|
|
FF350000 8K read/write/exec [ anon ]
|
|
FF3B0000 8K read/exec /usr/lib/libdl.so.1
|
|
FF3C0000 128K read/exec /usr/lib/ld.so.1
|
|
FF3E0000 8K read/write/exec /usr/lib/ld.so.1
|
|
FFBEA000 24K read/write/exec [ stack ]
|
|
</pre></blockquote>
|
|
|
|
<p>
|
|
<li> The call stack is unwound to produce a traceback of the
|
|
calling sequence that led to the error. The unwinding process is just a simple
|
|
loop that is similar to the following:
|
|
|
|
<blockquote><pre>
|
|
long *pc = get_pc(context);
|
|
long *sp = get_sp(context);
|
|
while (sp) {
|
|
/* Move to previous stack frame */
|
|
pc = (long *) sp[15]; /* %i7 register on SPARC */
|
|
sp = (long *) sp[14]; /* %i6 register on SPARC */
|
|
}
|
|
</pre></blockquote>
|
|
|
|
<li> For each stack frame, symbol table and debugging information
|
|
is gathered and stored in a WAD exception frame object.
|
|
Obtaining this information is the most complicated part of WAD and involves
|
|
the following steps: first, the current program counter is mapped to an object file
|
|
using the virtual memory map obtained in step 2. Next, the object file is loaded
|
|
using mmap(). Once loaded, the ELF symbol table
|
|
is searched for an address match. The symbol table contains a collection of records
|
|
containing memory offsets, sizes, and names such as this:
|
|
|
|
<blockquote><pre>
|
|
Offset Size Name
|
|
-------- ------ ---------
|
|
0x1280 324 wrap_foo
|
|
0x1600 128 foo
|
|
0x2408 192 bar
|
|
...
|
|
</pre></blockquote>
|
|
|
|
To find a match for a virtual memory address <em>addr</em>, WAD simply
|
|
searches for a symbol <em>s</em> such that <em>base</em> +
|
|
<em>s</em>.offset &lt;= <em>addr</em> &lt; <em>base</em> +
|
|
<em>s</em>.offset + <em>s</em>.size, where <em>base</em> is the base
|
|
virtual address of the object file in the virtual memory map.
|
|
|
|
<p>
|
|
Debugging information, if available, is scanned to identify a source
|
|
file, function name, and line number. This involves scanning object files for a
|
|
table of debugging information stored in a format
|
|
known as ``stabs.'' Stabs is a relatively simple but highly extensible format that
|
|
is language independent and capable of encoding almost every aspect of the
|
|
original source code. For the purposes of WAD, only a small subset of this
|
|
data is actually used.
|
|
|
|
<p>
|
|
The following table shows a small fragment of relevant stabs data:
|
|
<blockquote><pre>
|
|
type desc value string description
|
|
------ ----- --------- --------------------------- -----------
|
|
0x64 0 0 /u0/beazley/Projects/foo/ Pathname
|
|
0x64 0 0 foo.c Filename
|
|
...
|
|
0x24 0 0 foo:F(0,3);(0,3) Function
|
|
0xa0 4 68 n:p(0,3) Parameter
|
|
...
|
|
0x44 6 8 Line number
|
|
0x44 7 12 Line number
|
|
0x44 8 44 Line number
|
|
0x44 9 56 Line number
|
|
...
|
|
</pre></blockquote>
|
|
|
|
In the table, the type field indicates the type of debugging information. For
|
|
example, 0x64 specifies the source file, 0x24 is a function
|
|
definition, 0xa0 is a function parameter, and 0x44 is line number
|
|
information. Associated with each stab is a collection of parameters
|
|
and an optional string. The string usually contains symbol names and
|
|
other information. The <tt>desc</tt> and <tt>value</tt> fields are numbers
|
|
that usually contain byte offsets and line number data.
|
|
Therefore, to collect debugging information, WAD simply walks through the debugging
|
|
tables until it finds the function of interest. Once found, parameter and line
|
|
number specifiers are inspected to determine the location and values of the function
|
|
arguments as well as the source line at which the error occurred.
|
|
|
|
<p>
|
|
<li> After the complete traceback has been obtained, it is examined to see if
|
|
there are any ``safe'' return points to which control can be returned.
|
|
This is accomplished by maintaining an internal table of predefined symbolic return
|
|
points as shown in the following table:
|
|
|
|
<blockquote><pre>
|
|
Python symbol Return value
|
|
----------------------------- ------------------
|
|
call_builtin NULL
|
|
_PyImport_LoadDynamicModule NULL
|
|
PyObject_Repr NULL
|
|
PyObject_Print -1
|
|
PyObject_CallFunction NULL
|
|
PyObject_CallMethod NULL
|
|
PyObject_CallObject NULL
|
|
PyObject_Cmp -1
|
|
PyObject_Compare -1
|
|
PyObject_DelAttrString -1
|
|
PyObject_DelItem -1
|
|
PyObject_GetAttrString NULL
|
|
PyObject_GetItem NULL
|
|
PyObject_HasAttrString -1
|
|
PyObject_Hash -1
|
|
PyObject_Length -1
|
|
PyObject_SetAttrString -1
|
|
PyObject_SetItem -1
|
|
PyObject_Str NULL
|
|
PyObject_Type NULL
|
|
...
|
|
PyEval_EvalCode NULL
|
|
</pre></blockquote>
|
|
|
|
The symbols in this table correspond to functions within the Python interpreter that
|
|
might execute extension code and include the parts of the interpreter that invoke builtin functions
|
|
as well as the functions from the abstract object interface.
|
|
If any of these symbols appear on the call stack,
|
|
a handler function is invoked to raise a Python exception.
|
|
This handler function
|
|
is given a WAD-specific traceback object that contains a copy of the
|
|
call stack and CPU registers as well as any symbolic and debugging
|
|
information that was obtained. If none of the symbolic return points
|
|
are encountered, WAD invokes a default handler that simply prints the
|
|
full C stack trace and generates a core file.
|
|
|
|
<P>
|
|
<li> If a return point is found, the CPU context is modified in a manner that allows the signal handler to return
|
|
with a suitable Python error.
|
|
This last step is the trickiest part of the recovery process, but the general
|
|
idea is that the CPU context is modified in a way that makes Python think that
|
|
an extension function simply raised an exception and returned an error. Currently, this
|
|
is implemented by having the signal handler return to a small
|
|
handler function written in assembly language which arranges to return the
|
|
desired value back to the specified return point.
|
|
|
|
<p>
|
|
The most complicated part of modifying the CPU context is that of restoring
|
|
previously saved CPU registers. By manually unwinding the call stack, the
|
|
WAD exception handler effectively performs the same operation as a longjmp() call in C.
|
|
However, unlike longjmp(), no previously saved set of CPU registers is available from which to resume
|
|
execution in the Python interpreter. The solution to this problem depends entirely on the
|
|
underlying architecture. On the SPARC, register values are saved in register windows
|
|
which WAD manually unwinds to restore the proper state. On the Intel, the solution is much
|
|
more interesting. To restore the register values, WAD must manually inspect the
|
|
machine instructions of each function on the call stack in order to find out where the
|
|
registers might have been saved. This information is then used to restore the registers from their
|
|
saved locations before returning to the Python interpreter.
|
|
|
|
<p>
|
|
<li> Python receives the exception and produces a traceback.
|
|
</ol>
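<p>
The address-matching rule from step 4 can be sketched as a linear search. The record
layout and the names <tt>struct symbol</tt>, <tt>example_symbols</tt>, and
<tt>find_symbol</tt> are assumptions made for illustration; a real ELF symbol table has
a different layout and would be read from the mapped object file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record mirroring the Offset/Size/Name table in step 4. */
struct symbol {
    unsigned long offset;   /* offset from the object file's base address */
    unsigned long size;     /* size of the symbol in bytes */
    const char *name;
};

/* The three entries shown in the symbol table example. */
const struct symbol example_symbols[] = {
    { 0x1280, 324, "wrap_foo" },
    { 0x1600, 128, "foo" },
    { 0x2408, 192, "bar" },
};

/* Return the symbol s satisfying
       base + s.offset <= addr < base + s.offset + s.size
   or NULL if no symbol covers addr. */
const struct symbol *find_symbol(const struct symbol *table, size_t count,
                                 unsigned long base, unsigned long addr) {
    size_t i;

    assert(table != NULL || count == 0);
    if (addr < base)
        return NULL;
    for (i = 0; i < count; i++) {
        unsigned long start = base + table[i].offset;
        /* Comparing (addr - start) with size avoids overflow in start + size. */
        if (addr >= start && addr - start < table[i].size)
            return &table[i];
    }
    return NULL;
}
```

<p>
With a base address of 0xFE7C0000, the address <em>base</em> + 0x1604 resolves to
<tt>foo</tt>, whereas <em>base</em> + 0x1680 lies in the gap between <tt>foo</tt> and
<tt>bar</tt> and produces no match.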
|
|
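<p>
The walk through the stabs table in step 4 can be sketched in the same spirit. The code
assumes, consistent with the sample table, that a line-number stab stores the source line
in <tt>desc</tt> and the byte offset from the start of the function in <tt>value</tt>.
The names <tt>struct stab</tt>, <tt>example_stabs</tt>, and <tt>find_line</tt> are
hypothetical, and the in-memory layout is simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stab type codes as listed in the stabs table. */
enum {
    STAB_FUN   = 0x24,   /* function definition */
    STAB_SLINE = 0x44,   /* line number */
    STAB_SO    = 0x64,   /* source path or file name */
    STAB_PSYM  = 0xa0    /* function parameter */
};

/* Simplified, hypothetical in-memory form of one stab record. */
struct stab {
    unsigned char type;
    unsigned short desc;
    unsigned long value;
    const char *string;    /* NULL when the stab carries no string */
};

/* The fragment shown in the stabs table. */
const struct stab example_stabs[] = {
    { STAB_SO,    0,  0, "/u0/beazley/Projects/foo/" },
    { STAB_SO,    0,  0, "foo.c" },
    { STAB_FUN,   0,  0, "foo:F(0,3);(0,3)" },
    { STAB_PSYM,  4, 68, "n:p(0,3)" },
    { STAB_SLINE, 6,  8, NULL },
    { STAB_SLINE, 7, 12, NULL },
    { STAB_SLINE, 8, 44, NULL },
    { STAB_SLINE, 9, 56, NULL },
};

/* Find the source line for a byte offset inside function func.
   Returns the line number, or 0 if nothing matches. */
int find_line(const struct stab *stabs, size_t count,
              const char *func, unsigned long offset) {
    size_t i, len;
    int in_func = 0, found = 0, line = 0;
    unsigned long best = 0;

    assert(stabs != NULL && func != NULL);
    len = strlen(func);
    for (i = 0; i < count; i++) {
        const struct stab *s = &stabs[i];
        if (s->type == STAB_FUN) {
            if (in_func)
                break;                      /* reached the next function */
            /* Function stabs look like "name:F..." */
            in_func = s->string != NULL &&
                      strncmp(s->string, func, len) == 0 &&
                      s->string[len] == ':';
        } else if (in_func && s->type == STAB_SLINE &&
                   s->value <= offset && (!found || s->value >= best)) {
            /* Keep the line whose starting offset is closest below offset. */
            best = s->value;
            line = s->desc;
            found = 1;
        }
    }
    return line;
}
```

<p>
For the sample data, an offset of 20 bytes into <tt>foo</tt> maps to line 7, since line 7
begins at offset 12 and line 8 does not begin until offset 44.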
|
|
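<p>
The search for a safe return point in step 5 amounts to matching the symbol names on the
unwound stack against the table of predefined return points. The sketch below uses a
subset of that table. The names <tt>struct return_point</tt> and
<tt>find_return_point</tt> are hypothetical, and the choice of the innermost matching
frame is an assumption rather than documented WAD behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One predefined return point: a symbol name and the value the
   interrupted function should appear to return (0 stands for NULL). */
struct return_point {
    const char *symbol;
    long value;
};

/* A subset of the return point table from step 5. */
static const struct return_point return_points[] = {
    { "call_builtin",     0 },
    { "PyObject_Repr",    0 },
    { "PyObject_Print",  -1 },
    { "PyObject_GetItem", 0 },
    { "PyObject_SetItem", -1 },
    { "PyEval_EvalCode",  0 },
};

#define NUM_RETURN_POINTS (sizeof(return_points) / sizeof(return_points[0]))

/* frames[0] is the innermost frame (where the fault occurred).
   Returns the index of the first frame that is a registered return
   point and stores its return value in *value, or returns -1 if the
   stack contains no return point. */
int find_return_point(const char *const *frames, size_t nframes, long *value) {
    size_t i, j;

    assert(frames != NULL && value != NULL);
    for (i = 0; i < nframes; i++) {
        for (j = 0; j < NUM_RETURN_POINTS; j++) {
            if (strcmp(frames[i], return_points[j].symbol) == 0) {
                *value = return_points[j].value;
                return (int) i;
            }
        }
    }
    return -1;
}
```

<p>
For the stack in the earlier example (<tt>doh</tt>, <tt>_wrap_doh</tt>,
<tt>call_builtin</tt>, ...), the search stops at <tt>call_builtin</tt> with a return
value of NULL. A result of -1 corresponds to the case in which the default handler
prints the C stack trace and generates a core file.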
<h3>6. Initialization and Loading</h3>
|
|
|
|
In the earlier example, it was shown that WAD could be either
|
|
loaded as an extension module or simply attached to an existing module
|
|
with the linker. This latter case is implemented by
|
|
wrapping the WAD initialization function inside the constructor of a
|
|
statically allocated C++ object like this:
|
|
|
|
<blockquote>
|
|
<pre>
|
|
class WadInit {
|
|
public:
|
|
WadInit() {
|
|
wad_init(); /* Call the real initialization function */
|
|
}
|
|
};
|
|
static WadInit wad_initializer;
|
|
</pre></blockquote>
|
|
|
|
When the dynamic loader brings WAD into memory, it automatically
|
|
executes the constructors of all statically allocated C++ objects.
|
|
Therefore, this initialization code executes immediately after
|
|
loading, but before Python actually calls the module initialization
|
|
function. As a result, when an extension module is linked with WAD,
|
|
the debugging capability is enabled before any other operations occur---this
|
|
allows WAD to respond to fatal errors that might occur during module
|
|
initialization.
|
|
|
|
The rest of the initialization process consists of the following:
|
|
<ul>
|
|
<li> The WAD signal handler is installed.
|
|
<li> A collection of return symbols is registered with the signal handler (see the previous section).
|
|
<li> Four new Python exception objects <tt>SegFault</tt>, <tt>BusError</tt>, <tt>AbortError</tt>,
|
|
and <tt>IllegalInstruction</tt> are added
|
|
to the <tt>__builtin__</tt> module.
|
|
</ul>
|
|
|
|
Although the use of a C++ static constructor has the potential to
|
|
conflict with C++ extension code that also uses static constructors,
|
|
it is always possible to enable WAD prior to loading a C++ extension
|
|
(e.g., WAD could be loaded separately).
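<p>
For comparison, load-time initialization can also be expressed in plain C with a
compiler extension. The sketch below is not how WAD is implemented (WAD uses the C++
static constructor shown above); it relies on the GCC/Clang <tt>constructor</tt>
function attribute, and <tt>wad_enabled</tt> and <tt>wad_auto_init</tt> are hypothetical
names. The body of <tt>wad_init()</tt> is a stand-in for the real initialization.

```c
#include <assert.h>

/* Hypothetical flag standing in for WAD's internal state. */
int wad_enabled = 0;

/* Stand-in for the real initialization function. */
void wad_init(void) {
    wad_enabled = 1;
}

/* GCC/Clang extension: functions marked with the constructor attribute
   run when the program or shared object is loaded, before main() or
   before the loader returns control to the code that requested it. */
__attribute__((constructor))
static void wad_auto_init(void) {
    wad_init();
    assert(wad_enabled);   /* initialization must have taken effect */
}
```

<p>
Because this avoids C++ entirely, it would not interact with the static constructors of
a C++ extension, at the cost of depending on a non-standard compiler feature.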
|
|
|
|
<h3>7. Implementation Details</h3>
|
|
|
|
Currently, WAD is written in ANSI C with a small amount of C++,
|
|
and a small amount of assembly code (to assist in the return to the interpreter).
|
|
The entire implementation contains approximately 2000 semicolons and most of the code
|
|
relates to the gathering of source code information (symbol tables,
|
|
debugging information, etc.).
|
|
|
|
<p>
|
|
Although there are libraries such as GNU bfd that can assist with the
|
|
reading of object files, none of these are used in the implementation [10].
|
|
First, these libraries tend to be quite large
|
|
and are oriented more towards stand-alone tools such as debuggers,
|
|
linkers, and compilers. Second, due to the unusual nature of the runtime
|
|
environment and the restrictions on memory utilization (no heap, no
|
|
stack), the behavior of these libraries is somewhat unclear and
|
|
would require further study.
|
|
Finally, given the small size of the prototype implementation, it didn't seem necessary to rely on a
|
|
large general purpose library.
|
|
|
|
<h3>8. Discussion</h3>
|
|
|
|
The primary focus of this work is to provide a more useful error
|
|
reporting mechanism to extension developers.
|
|
However, this does not imply that
|
|
WAD is appropriate as a general purpose exception
|
|
handling mechanism. First, let's focus
|
|
on the recovery mechanism:
|
|
|
|
<ul>
|
|
<li> When WAD unwinds the call stack, objects allocated on the stack
|
|
are lost. This may interact poorly with C++ extensions since the
|
|
unwinding process does not invoke C++ destructors. It may be possible to fix
|
|
this problem, but doing so would require coordination with the C++ runtime library.
|
|
|
|
<p>
|
|
<li> Similarly, if a procedure allocates objects on the heap, stack unwinding
|
|
may cause those objects to never be reclaimed.
|
|
|
|
<p>
|
|
<li> Closely related to heap management, stack unwinding may leave
files, sockets, and other system resources open. Furthermore, in a multithreaded
|
|
environment, deadlock may occur if a procedure is holding a lock when an error occurs.
|
|
|
|
<p>
|
|
<li> An application may fail by overwriting the process heap and corrupting
|
|
memory. Although WAD can produce internal diagnostics even when the heap has been
|
|
destroyed, Python may fail immediately upon return from the
|
|
WAD signal handler or shortly thereafter.
|
|
|
|
<p>
|
|
<li> If an application destroys the call stack (via buffer overflow), WAD will
|
|
be unable to complete a stack trace and will be unable to return to
|
|
Python.
|
|
|
|
<p>
|
|
<li> Memory management problems such as double-freeing of memory are particularly
|
|
difficult to identify. If an extension module corrupts the memory allocator
|
|
in some manner, this may cause Python to fail in a completely unexpected location.
|
|
WAD is usually able to produce a traceback in this situation, but
|
|
it may not correspond to the real source of the problem.
|
|
|
|
</ul>
|
|
|
|
In addition, there are a number of issues that pertain to WAD's interaction with the
|
|
Python interpreter:
|
|
|
|
<ul>
|
|
<li> The recovery mechanism is entirely based on symbolic information stored
|
|
in the Python executable. Therefore, the return points are simply specified
|
|
as strings such as ``call_builtin'' as opposed to real memory addresses.
|
|
Because of this, WAD is compatible with essentially any version of Python (provided
|
|
it supports class-based exceptions).
|
|
|
|
<P>
|
|
<li> WAD is unable to manage multiple return values to the same procedure.
|
|
For example, Python's <tt>eval_code2()</tt> procedure contains a huge
|
|
case statement for executing byte codes. Within this procedure, certain
|
|
function calls return NULL to indicate an error and others return -1. Since WAD
|
|
is unable to determine which value to return, this particular procedure does not make a very
|
|
good return point for error recovery.
|
|
|
|
<P>
|
|
<li> An alternative approach to the symbolic recovery scheme would be to
|
|
instrument Python with a collection of safe return points using setjmp()/longjmp().
|
|
This approach is not used because it would require a significant number of changes to
|
|
the interpreter and it would introduce an unacceptable amount of performance overhead.
|
|
|
|
<p>
|
|
<li> WAD is generally safe to use with Python threads. However, if a
|
|
compiled extension function manually releases the Python interpreter
|
|
lock and subsequently faults, the return behavior is unspecified. In
|
|
the future, it may be possible to use the interpreter lock to provide coordination
|
|
between the interpreter and the error recovery mechanism.
|
|
|
|
<p>
|
|
<li> Compiled extension code may perform an eval operation in which Python code is executed
|
|
in the interpreter. This results in a situation where the complete call-stack of an
|
|
application crosses the boundary between Python and C several times. WAD can
|
|
still handle faults in this setting as long as an application is doing a reasonable amount of
|
|
error checking. For example, a fatal error that occurs inside an eval operation could
|
|
be caught by the extension code and propagated further up the call stack.
|
|
|
|
<p>
|
|
<li> In certain cases, Python may be configured to handle the SIGFPE signal for floating point
|
|
exceptions. The default Python handling of this error is to abort and dump core. However,
|
|
with WAD, a complete stack traceback will be obtained when a SIGFPE occurs.
|
|
|
|
<p>
|
|
<li> WAD is extremely inefficient. Due to restrictions on the heap and stack,
|
|
WAD relies heavily on mmap() and a variety of other file
|
|
operations as it handles errors. It also performs linear searches of symbol and
|
|
debugging tables. As a result, WAD's generation of a
|
|
Python exception is several orders of magnitude slower than an ordinary
|
|
exception.
|
|
</ul>
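<p>
To make the setjmp()/longjmp() alternative mentioned in the list above concrete, the
following sketch guards a single call with an explicit return point. It is an
illustration of the rejected approach, not part of WAD. The names <tt>call_protected</tt>,
<tt>on_fault</tt>, <tt>well_behaved</tt>, and <tt>faulty</tt> are hypothetical, and the
fault is simulated with <tt>raise()</tt>.

```c
#define _XOPEN_SOURCE 700   /* request sigsetjmp(), siglongjmp(), sigaction() */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf return_point;
static volatile sig_atomic_t return_point_armed = 0;

/* Jump back to the saved return point if one is active; otherwise
   restore the default action and re-raise so the process still dies. */
static void on_fault(int signo) {
    if (return_point_armed)
        siglongjmp(return_point, 1);
    signal(signo, SIG_DFL);
    raise(signo);
}

/* Call func() with SIGSEGV redirected to a saved return point.
   Returns 0 if func() completed and -1 if it faulted. */
int call_protected(void (*func)(void)) {
    struct sigaction sa, old;
    int status;

    assert(func != NULL);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    /* Second argument 1: also save and restore the signal mask, so
       SIGSEGV is unblocked again after jumping out of the handler. */
    if (sigsetjmp(return_point, 1) == 0) {
        return_point_armed = 1;
        func();
        status = 0;
    } else {
        status = -1;        /* arrived here via siglongjmp() */
    }
    return_point_armed = 0;
    sigaction(SIGSEGV, &old, NULL);
    return status;
}

/* Two example callees: one that returns normally, one that "faults". */
void well_behaved(void) { }
void faulty(void) { raise(SIGSEGV); }
```

<p>
Every protected call pays for a <tt>sigsetjmp()</tt> and two <tt>sigaction()</tt> calls
even when nothing goes wrong, which illustrates the overhead that the symbolic scheme
avoids.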
|
|
|
|
Finally, there are a number of application specific issues to note:
|
|
|
|
<ul>
|
|
<li> Aggressive compiler optimization techniques may prevent WAD from
|
|
accurately reporting locations within the original source code.
|
|
This is particularly problematic with numerical applications where
|
|
techniques such as procedure inlining can make it impossible to obtain accurate
|
|
debugging information. Since these types of problems also arise in
|
|
full-featured debuggers, it is unlikely that they can be easily fixed in WAD (at least not
|
|
without a considerable amount of work).
|
|
|
|
<p>
<li> If an application implements its own exception handling,
it may provide Python with less information than what would be obtained with WAD.
For example, a programmer might implement a function like this:

<blockquote><pre>
void *Malloc(int size) {
    void *ptr;
    ptr = malloc(size);
    if (!ptr) throw("Out of memory");
    return ptr;
}
</pre></blockquote>

In this case, the <tt>throw()</tt> function may initiate an internal
exception handling mechanism that relies upon setjmp/longjmp or C++ exceptions.
When the error eventually makes it back to the interpreter, the user will get an "out
of memory" exception, but no additional information will be
provided. In contrast, if the programmer simply used an <tt>assert()</tt> statement,
WAD would produce a full stack trace leading to the error.
</ul>
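<p>
The <tt>assert()</tt> alternative mentioned above might look like the following minimal
sketch. It assumes the standard C <tt>assert()</tt> macro, which calls abort() on failure
and thereby raises SIGABRT at the point of the failed check. The use of size_t for the
argument is a choice made for this example and is not taken from the original function.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the allocation wrapper using assert() instead of a private
   exception mechanism. A failed allocation stops the program at this exact
   call site rather than being translated into an application-level error. */
void *Malloc(size_t size) {
    void *ptr = malloc(size);
    assert(ptr != NULL);   /* compiled out if NDEBUG is defined */
    return ptr;
}
```

Note that the check disappears entirely when the code is compiled with NDEBUG defined, so
this style only helps in builds that keep assertions enabled.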

Despite its various limitations, WAD is applicable to a wide range of
extension-related errors. Furthermore, most of the errors that are
likely to occur are of a more benign variety. For example, a
segmentation fault may simply be the result of an uninitialized
pointer (perhaps the user forgot to call an initialization procedure).
Likewise, bus errors, failed assertions, and floating point exceptions
rarely result in a situation where the WAD recovery mechanism would be
unable to produce a meaningful Python traceback.

<h3>9. Related Work</h3>

There is a huge body of literature concerning the implementation of
exception handling in various programming languages and environments.
A detailed discussion of this work is clearly not possible here, but
a general overview of various exception handling issues can be found in [11].
In general, there are a few themes that seem to prevail.
First, considerable attention has been given to exception handling mechanisms
in specific languages, for example efficient exception handling for C++.
Second, a great deal of work has been devoted to the semantic aspects of
exception handling such as exception hierarchies, finalization, and
whether or not code is restartable after an exception has occurred.
Finally, a fair amount of exception work has been done in the context
of component frameworks and distributed systems. Most of this work
tends to concentrate on explicit exception handling mechanisms. Very little
work appears to have been done in the area of converting hardware-generated errors
into exceptions.

<p>
With respect to debuggers, quite a lot of work has been done in
creating advanced debugging support for specific languages and
integrated development environments. However, very little of this work
has concentrated on the problem of extensible systems and
compiled-interpreted language integration. For instance, debuggers
for Python are currently unable to cross over into C extensions whereas C
debuggers aren't able to easily extract useful information from the
internals of the Python interpreter.

<p>
One system of possible interest is Rn, which was developed in the
mid-1980s at Rice University [12]. This system, primarily
designed for working with large scientific applications written in
Fortran, provided an execution monitor that consisted of a special
debugging process with an embedded interpreter. When attached to
compiled Fortran code, this monitor could dynamically patch
the executable in a manner that allowed parts of the code to be executed in the
interpreter. This was used to provide a debugging environment in which
essentially any part of the compiled application could be modified at
run-time by simply compiling the modified code (Fortran) to an
interpreted form and inserting a breakpoint in the original executable
that transferred control to the interpreter. Although this
particular scheme is not directly related to the functionality
of WAD, it is one of the few systems in which
interpreted and compiled code have been tightly coupled within
a debugging framework. Several aspects of the interpreted/compiled
interface are closely related to the way in which WAD operates. In addition,
various aspects of this work may be useful should WAD be extended with
new capabilities.

<h3>10. Future Directions</h3>

WAD is currently an experimental prototype. Although this paper has
described its use with Python, the core of the system is generic and
is easily extended to other programming environments. For example, when
linked to C/C++ code, WAD will automatically produce stack
traces for fatal errors. A module for generating Tcl exceptions has
also been developed. Plans are underway to provide support for other
extensible systems including Perl, Ruby, and Guile.

<p>
Finally, a number of extensions to the WAD approach may be possible.
For example, even though the current implementation only returns a
traceback string to the Python interpreter, the WAD signal handler
actually generates a full traceback of the C call stack including all
of the CPU registers and a copy of the stack data. Therefore, with a
little work, it may be possible to implement a diagnostic tool that
allows the state of the C stack to be inspected from the Python
interpreter after a crash has occurred. Similarly, it may be possible
to integrate the capabilities of WAD with those provided by the Python
debugger.

<h3>11. Conclusions and Availability</h3>

WAD provides a simple mechanism for converting fatal errors into
Python exceptions that provide useful information to extension
writers. In doing so, it solves one of the most frustrating aspects
of working with compiled Python extensions--that of identifying program errors.
Furthermore, the system requires no code modifications to Python and introduces
no performance overhead.
Although the system is necessarily platform-specific, it does not involve a
significant amount of code. As a result, it may be relatively
straightforward to port to other Unix systems.

<p>
As of this writing, WAD is still undergoing active development. However,
the software is available for experimentation and download
at <tt>http://systems.cs.uchicago.edu/wad</tt>.

<h3>References</h3>

[1] D.M. Beazley, <em>Using SWIG to Control, Prototype, and Debug C Programs with Python</em>,
4th International Python Conference, Livermore, CA. (1996).

<p>
[2] P.F. Dubois, <em>Climate Data Analysis Software</em>, 8th International Python Conference,
Arlington, VA. (2000).

<p>
[3] P.F. Dubois, <em>A Facility for Creating Python Extensions in C++</em>, 7th International Python
Conference, Houston, TX. (1998).

<p>
[4] SIP. <tt>http://www.thekompany.com/projects/pykde/</tt>.

<p>
[5] FPIG. <tt>http://cens.ioc.ee/projects/f2py2e/</tt>.

<p>
[6] R. Faulkner and R. Gomes, <em>The Process File System and Process Model in UNIX System V</em>, USENIX Conference Proceedings,
January 1991.

<p>
[7] J.R. Levine, <em>Linkers &amp; Loaders.</em> Morgan Kaufmann Publishers, 2000.

<p>
[8] Free Software Foundation, <em>The "stabs" debugging format</em>. GNU info document.

<p>
[9] W. Richard Stevens, <em>UNIX Network Programming: Interprocess Communication, Volume 2</em>. PTR
Prentice-Hall, 1998.

<p>
[10] S. Chamberlain. <em>libbfd: The Binary File Descriptor Library</em>. Cygnus Support, bfd version 3.0 edition, April 1991.

<p>
[11] M.L. Scott. <em>Programming Language Pragmatics</em>. Morgan Kaufmann Publishers, 2000.

<p>
[12] A. Carle, D. Cooper, R. Hood, K. Kennedy, L. Torczon, S. Warren, <em>A Practical Environment for Scientific Programming.</em>
IEEE Computer, Vol 20, No. 11, (1987). p. 75-89.

</body>
</html>