
<html>
<head>
<title>WAD: A Module for Converting Fatal Extension Errors into Python Exceptions</title>
</head>
<body bgcolor="#ffffff">
<center>

<h2>WAD: A Module for Converting Fatal Extension Errors into Python Exceptions</h2>
<h6>David M. Beazley <br>
Department of Computer Science<br>
University of Chicago<br>
Chicago, IL  60637<br>
beazley@cs.uchicago.edu<br>
</h6>
</center>

<h3>Abstract</h3>
<em>
One of the more popular uses of Python is as an extension language for
applications written in compiled languages such as C, C++, and
Fortran.  Unfortunately, one of the biggest drawbacks of this approach
is the lack of a useful debugging and error handling facility for
identifying problems in extension code. In part, this limitation is
due to the fact that Python does not know anything about the internal
implementation of an extension module.  A more difficult problem is
that compiled extensions sometimes fail with catastrophic errors such
as memory access violations, failed assertions, and floating point
exceptions.  These types of errors fall outside the realm of normal
Python exception handling and are particularly difficult to identify
and debug.  Although traditional debuggers can find the location of a
fatal error, they are unable to report the context in which such an
error has occurred with respect to a Python script.  This paper describes
an experimental system that converts fatal extension errors
into Python exceptions.  In particular, a dynamically
loadable module, WAD (Wrapped Application Debugger), has been developed which catches
fatal errors, unwinds the call stack, and generates Python exceptions
with debugging information.  WAD requires no modifications to Python,
works with all extension modules, and introduces no performance
overhead.  An initial implementation of the system is currently
available for Sun SPARC Solaris and i386-Linux.

</em>

<h3>1. Introduction</h3>

One of the primary reasons C, C++, and Fortran programmers are
attracted to Python is its ability to serve as an extension language
for compiled programs.  Furthermore, tools such as SIP, CXX, Pyfort, FPIG,
and SWIG make it extremely easy for a programmer to ``wrap'' existing
software into an extension module [1,2,3,4,5]. Although this approach is
extremely attractive in terms of providing a highly usable and
flexible environment for users, extension modules suffer from
problems not normally associated with Python
scripts---especially when they don't work.

<p>
Normally, Python programming errors result in an exception like this:

<blockquote><pre>
% python foo.py
Traceback (innermost last):
  File "foo.py", line 11, in ?
    foo()
  File "foo.py", line 8, in foo
    bar()
  File "foo.py", line 5, in bar
    spam()
  File "foo.py", line 2, in spam
    doh()
NameError: doh
%
</pre></blockquote>

Unfortunately for compiled extensions, the following situation sometimes occurs:

<blockquote><pre>
% python foo.py
Segmentation Fault (core dumped)
%
</pre></blockquote>

Needless to say, this isn't very informative--well,
other than indicating that something ``very bad'' happened.

<p>
In order to identify the source of a fatal error, a programmer can run a
debugger on the Python executable or on a core file like this:

<blockquote><pre>
% gdb /usr/local/bin/python
(gdb) run foo.py
Starting program: /usr/local/bin/python foo.py

Program received signal SIGSEGV, Segmentation fault.
0xff060568 in doh () from /u0/beazley/Projects/WAD/Python/./libfoo.so
(gdb) where
#0  0xff060568 in doh () from /u0/beazley/Projects/WAD/Python/./libfoo.so
#1  0xff082f34 in _wrap_doh ()
   from /u0/beazley/Projects/WAD/Python/./dohmodule.so
#2  0x2777c in call_builtin (func=0x1984b8, arg=0x1a1ccc, kw=0x0)
    at ceval.c:2650
#3  0x27648 in PyEval_CallObjectWithKeywords (func=0x1984b8, arg=0x1a1ccc,
    kw=0x0) at ceval.c:2618
#4  0x25d18 in eval_code2 (co=0x19acf8, globals=0x0, locals=0x1c7844,
    args=0x1984b8, argcount=1625472, kws=0x0, kwcount=0, defs=0x0, defcount=0,
    owner=0x0) at ceval.c:1951
#5  0x25954 in eval_code2 (co=0x199620, globals=0x0, locals=0x1984b8,
    args=0x196654, argcount=1862720, kws=0x197788, kwcount=0, defs=0x0,
#6  0x25954 in eval_code2 (co=0x19ad38, globals=0x0, locals=0x196654,
    args=0x1962fc, argcount=1862800, kws=0x198e90, kwcount=0, defs=0x0,
    defcount=0, owner=0x0) at ceval.c:1850
#7  0x25954 in eval_code2 (co=0x1b6c60, globals=0x0, locals=0x1962fc,
    args=0x1a1eb4, argcount=1862920, kws=0x0, kwcount=0, defs=0x0, defcount=0,
    owner=0x0) at ceval.c:1850
#8  0x22da4 in PyEval_EvalCode (co=0x1b6c60, globals=0x1962c4, locals=0x1962c4)
    at ceval.c:319
#9  0x3adb4 in run_node (n=0x18abf8, filename=0x1b6c60 "", globals=0x1962c4,
    locals=0x1962c4) at pythonrun.c:886
#10 0x3ad64 in run_err_node (n=0x18abf8, filename=0x1b6c60 "",
    globals=0x1962c4, locals=0x1962c4) at pythonrun.c:874
#11 0x3ad38 in PyRun_FileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py",
    start=1616888, globals=0x1962c4, locals=0x1962c4, closeit=1)
    at pythonrun.c:866
#12 0x3a1d8 in PyRun_SimpleFileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py",
    closeit=1) at pythonrun.c:579
#13 0x39d84 in PyRun_AnyFileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py",
    closeit=1) at pythonrun.c:459
#14 0x1f498 in Py_Main (argc=2, argv=0xffbefc84) at main.c:289
#15 0x1eec0 in main (argc=2, argv=0xffbefc84) at python.c:10
</pre></blockquote>

Unfortunately, even though the debugger identifies the location where the fault occurred, it
mostly provides information about the internals of the
interpreter.  The debugger certainly doesn't reveal anything about the Python
program that led to the error (i.e., it doesn't reveal the
same information that would be contained in a Python traceback).  As a result,
the debugger is of limited use when it comes to debugging an application that
consists of both compiled and Python code.

<p>
Normally, extension developers try to avoid catastrophic errors by
adding error handling. If
an application is small or customized for use with Python, it can be
modified to raise Python exceptions.
Automated tools such as SWIG can also convert C++
exceptions and C-related error handling mechanisms into Python
exceptions. However, no matter how much error checking is added,
there is always a chance that an extension will fail in an unexpected
manner.  This is especially true for large applications that have been wrapped
into an extension module. In addition, certain types of errors such as floating
point exceptions (e.g., division by zero) are especially difficult to find
and eliminate. Finally, rigorous error checking may be omitted to improve
performance.

<p>
To address these problems, an experimental module known as WAD (Wrapped
Application Debugger) has been developed.
WAD is able to
convert fatal errors into Python exceptions that include information
from the call stack as well as debugging
information.  By turning such errors into Python exceptions, fatal
errors now result in a traceback that crosses the boundary between
Python code and compiled extension code.  This makes it much
easier to identify and correct extension-related programming errors.
WAD requires no modifications to Python and is compatible with all
extension modules.  However, it is also highly platform specific
and currently runs only on Sun SPARC
Solaris and i386-Linux.  The primary goal of this paper is to motivate the problem
and to describe one possible solution.  In addition, many of the
implementation issues
associated with providing an integrated error reporting mechanism are described.

<h3>2. An Example</h3>

WAD can either be imported as a Python extension module or linked to an
extension module.  To illustrate, consider the earlier example:

<blockquote><pre>
% python foo.py
Segmentation Fault (core dumped)
%
</pre></blockquote>

To identify the problem, a programmer can run Python interactively and import WAD as follows:

<blockquote><pre>
% python
Python 2.0 (#1, Oct 27 2000, 14:34:45)
[GCC 2.95.2 19991024 (release)] on sunos5
Type "copyright", "credits" or "license" for more information.
>>> import libwadpy
WAD Enabled
>>> execfile("foo.py")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "foo.py", line 16, in ?
    foo()
  File "foo.py", line 13, in foo
    bar()
  File "foo.py", line 10, in bar
    spam()
  File "foo.py", line 7, in spam
    doh.doh(a,b,c)
SegFault: [ C stack trace ]

#2   0x00027774 in call_builtin(func=0x1c74f0,arg=0x1a1ccc,kw=0x0)
#1   0xff022f7c in _wrap_doh(0x0,0x1a1ccc,0x160ef4,0x9c,0x56b44,0x1aa3d8)
#0   0xfe7e0568 in doh(a=0x3,b=0x4,c=0x0) in 'foo.c', line 28

/u0/beazley/Projects/WAD/Python/foo.c, line 28

    int doh(int a, int b, int *c) {
 =>   *c = a + b;
      return *c;
    }

>>>
</pre></blockquote>

In this case, we can
see that the program has tried to assign a value to a
NULL pointer (indicated by the value "c=0x0" in the last function call). Furthermore, we obtain a Python traceback that shows the
entire sequence of functions leading to the problem.  Finally, since
control returned to the interpreter, it is possible to interactively
inspect various aspects of the application or to continue with the computation
(although this clearly depends on the severity of the error and the nature of the application).

<p>
In certain applications, it may be difficult to run Python
interactively or to modify the code to explicitly import a special
debugging module.  In these cases, WAD can be attached to an extension module with the
linker.  For example:

<blockquote><pre>
% ld -G $(OBJS) -o dohmodule.so -lwadpy
</pre></blockquote>

This requires no recompilation of any source code--only a relinking of the
extension module.  When Python loads the relinked extension module, WAD is automatically
initialized before Python invokes the module initialization function.

<h3>3. Design Considerations for Embedded Error Recovery</h3>

The primary design goal of WAD is to provide an error reporting mechanism
for extension modules that is a natural extension of normal Python
exception handling.  There are two primary motivations for
handling fatal errors in this manner: first, in the context of Python
programming, it is simply unnatural to run a separate debugging
application to identify a problem in an extension module when no such
requirement exists for scripts.  Thus, an embedded error reporting
mechanism is simply more convenient.  Second, the target users
of an extension module may not know how to use a debugger or even have
a development environment installed on their machine.  Therefore,
the ability to produce an informative traceback within the
confines of the Python interpreter can be of tremendous value to an
extension developer.  This is because users who report a problem will
be able to include an informative traceback as opposed to simply
saying ``the code crashed.''

<p>
A secondary design goal is to provide a system that is as non-invasive
as possible.  The system should not require modifications to Python or
any extension modules and it should be easy to integrate
into the runtime environment of an application. In addition, it shouldn't
introduce any performance overhead.

<p>
Finally, since WAD co-exists with the Python interpreter (i.e., in the same
process), there are a number of technical issues that have to be
addressed.  First, fatal errors can theoretically occur anywhere in
the interpreter as well as in extension modules.  Therefore, WAD needs
to know about Python's internal organization if it is going to provide
a graceful recovery back to the interpreter.  Second, in order to
implement this recovery scheme, the system has to perform direct
manipulation of the CPU context and call stack.  Last, but not least,
since the recovery code lives in the same address space as the
interpreter and extension modules it should not depend on the process
stack and heap (since both could have been corrupted by the faulting
application).

<h3>4. Catching Fatal Errors</h3>

WAD catches catastrophic errors by installing a
reliable signal handler for SIGSEGV, SIGBUS, SIGABRT, SIGILL, and SIGFPE [9].  Unlike the
more familiar BSD-style signal interface (as provided by the Python
signal module), reliable signal handlers are installed using the <tt>sigaction()</tt> system call and have a few notable properties:

<ul>
<li> The signal handler can be configured to run on its own dedicated stack.

<p>
<li> Handler functions can receive a structure containing the CPU context
including the CPU registers, program counter, and stack pointer.

<p>
<li> Changes to the CPU context take effect immediately after the signal handler returns.
</ul>

Therefore, the high level implementation of WAD is relatively straightforward:  when a fatal signal occurs,
a handler function runs on an isolated signal handling stack.
The CPU context is then used to unwind the call stack and to inspect the process state.  Finally,
if possible, the CPU context is modified in a manner that allows the signal handler to
return to Python with a raised exception.

<h3>5. A Detailed Description of the Recovery Mechanism</h3>

In this section, a more detailed description of the error recovery
scheme is presented.  The precise implementation details of this are
highly platform specific and involve a number of advanced topics including
the Unix process file system (/proc), the ELF object file format, and the
Stabs compiler debugging format [6,7,8]. The details of these topics are
beyond the scope of this paper.  However, this section hopes to
give the reader a small taste of the steps involved in implementing the recovery mechanism.

<P>
The services of WAD are only invoked upon the reception of a fatal
signal. This triggers a signal handling function that results in a return to Python
as illustrated in the following figure:

<center>
<img src="fig1.png">
<h6>Control flow of the error recovery mechanism</h6>
</center>

<p>
The steps required to implement this recovery are as follows:

<ol>
<li>  The values of the program counter and stack pointer are obtained from the CPU
       context structure passed to the WAD signal handler.

<p>
<li> The virtual memory map of the process is inspected to identify all of
the shared libraries, dynamically loaded modules, and valid memory regions.
This information is obtained by reading from the Unix /proc filesystem.
The following table illustrates the nature of this data:


<blockquote><pre>
Address     Size    Permissions        File
----------  -----   -----------------  ---------------------------------
00010000    1264K   read/exec         /usr/local/bin/python
0015A000     184K   read/write/exec   /usr/local/bin/python
00188000     296K   read/write/exec     [ heap ]
FE7C0000      32K   read/exec         /u0/beazley/Projects/dohmodule.so
FE7D6000       8K   read/write/exec   /u0/beazley/Projects/dohmodule.so
...
FF100000     664K   read/exec         /usr/lib/libc.so.1
FF1B6000      24K   read/write/exec   /usr/lib/libc.so.1
FF1BC000       8K   read/write/exec   /usr/lib/libc.so.1
FF2C0000     120K   read/exec         /usr/lib/libthread.so.1
FF2EE000       8K   read/write/exec   /usr/lib/libthread.so.1
FF2F0000      48K   read/write/exec   /usr/lib/libthread.so.1
FF310000      40K   read/exec         /usr/lib/libsocket.so.1
FF32A000       8K   read/write/exec   /usr/lib/libsocket.so.1
FF330000      24K   read/exec         /usr/lib/libpthread.so.1
FF346000       8K   read/write/exec   /usr/lib/libpthread.so.1
FF350000       8K   read/write/exec    [ anon ]
FF3B0000       8K   read/exec         /usr/lib/libdl.so.1
FF3C0000     128K   read/exec         /usr/lib/ld.so.1
FF3E0000       8K   read/write/exec   /usr/lib/ld.so.1
FFBEA000      24K   read/write/exec    [ stack ]
</pre></blockquote>

<p>
<li>  The call stack is unwound to produce a traceback of the
calling sequence that led to the error.  The unwinding process is just a simple
loop that is similar to the following:

<blockquote><pre>
long *pc = get_pc(context);
long *sp = get_sp(context);
while (sp) {
    /* pc and sp describe the current frame; its details are gathered in step 4 */
    /* Move to previous stack frame */
    pc = (long *) sp[15];      /* %i7 register on SPARC */
    sp = (long *) sp[14];      /* %i6 register on SPARC */
}
</pre></blockquote>

<li> For each stack frame, symbol table and debugging information
is gathered and stored in a WAD exception frame object.
Obtaining this information is the most complicated part of WAD and involves
the following steps: first, the current program counter is mapped to an object file
using the virtual memory map obtained in step 2.  Next, the object file is loaded
using mmap().  Once loaded, the ELF symbol table
is searched for an address match.  The symbol table contains a collection of records
containing memory offsets, sizes, and names such as this:

<blockquote><pre>
Offset    Size    Name
--------  ------  ---------
0x1280    324     wrap_foo
0x1600    128     foo
0x2408    192     bar
...
</pre></blockquote>

To find a match for a virtual memory address <em>addr</em>, WAD simply
searches for a symbol <em>s</em> such that <em>base</em> +
<em>s</em>.offset &lt;= <em>addr</em> &lt; <em>base</em> +
<em>s</em>.offset + <em>s</em>.size, where <em>base</em> is the base
virtual address of the object file in the virtual memory map.

<p>
Debugging information, if available, is scanned to identify a source
file, function name, and line number.  This involves scanning object files for a
table of debugging information stored in a format
known as ``stabs.'' Stabs is a relatively simple but highly extensible format that
is language independent and capable of encoding almost every aspect of the
original source code.   For the purposes of WAD, only a small subset of this
data is actually used.

<p>
The following table shows a small fragment of relevant stabs data:
<blockquote><pre>
type    desc   value        string                        description
------  -----  ---------    ---------------------------   -----------
0x64      0        0         /u0/beazley/Projects/foo/    Pathname
0x64      0        0        foo.c                         Filename
...
0x24      0        0        foo:F(0,3);(0,3)              Function
0xa0      4        68       n:p(0,3)                      Parameter
...
0x44      6        8                                      Line number
0x44      7        12                                     Line number
0x44      8        44                                     Line number
0x44      9        56                                     Line number
...
</pre></blockquote>

In the table, the type field indicates the type of debugging information.  For
example, 0x64 specifies the source file, 0x24 is a function
definition, 0xa0 is a function parameter, and 0x44 is line number
information. Associated with each stab is a collection of parameters
and an optional string.  The string usually contains symbol names and
other information.  The <tt>desc</tt> and <tt>value</tt> fields are numbers
that usually contain byte offsets and line number data.
Therefore, to collect debugging information, WAD simply walks through the debugging
tables until it finds the function of interest.  Once found, parameter and line
number specifiers are inspected to determine the location and values of the function
arguments as well as the source line at which the error occurred.

<p>
<li> After the complete traceback has been obtained, it is examined to see if
there are any ``safe'' return points to which control can be returned.
This is accomplished by maintaining an internal table of predefined symbolic return
points as shown in the following table:

<blockquote><pre>
Python symbol                     Return value
-----------------------------     ------------------
call_builtin                      NULL
_PyImport_LoadDynamicModule       NULL
PyObject_Repr                     NULL
PyObject_Print                    -1
PyObject_CallFunction             NULL
PyObject_CallMethod               NULL
PyObject_CallObject               NULL
PyObject_Cmp                      -1
PyObject_Compare                  -1
PyObject_DelAttrString            -1
PyObject_DelItem                  -1
PyObject_GetAttrString            NULL
PyObject_GetItem                  NULL
PyObject_HasAttrString            -1
PyObject_Hash                     -1
PyObject_Length                   -1
PyObject_SetAttrString            -1
PyObject_SetItem                  -1
PyObject_Str                      NULL
PyObject_Type                     NULL
...
PyEval_EvalCode                   NULL
</pre></blockquote>

The symbols in this table correspond to functions within the Python interpreter that
might execute extension code and include the parts of the interpreter that invoke builtin functions
as well as the functions from the abstract object interface.
If any of these symbols appear on the call stack,
a handler function is invoked to raise a Python exception.
This handler function
is given a WAD-specific traceback object that contains a copy of the
call stack and CPU registers as well as any symbolic and debugging
information that was obtained.  If none of the symbolic return points
are encountered, WAD invokes a default handler that simply prints the
full C stack trace and generates a core file.

<P>
<li>  If a return point is found, the CPU context is modified in a manner that allows the signal handler to return
       with a suitable Python error.
       This last step is the trickiest part of the recovery process, but the general
       idea is that the CPU context is modified in a way that makes Python think that
       an extension function simply raised an exception and returned an error.  Currently, this
       is implemented by having the signal handler return to a small
       handler function written in assembly language which arranges to return the
       desired value back to the specified return point.

<p>
       The most complicated part of modifying the CPU context is that of restoring
       previously saved CPU registers.   By manually unwinding the call stack, the
       WAD exception handler effectively performs the same operation as a longjmp() call in C.
       However, unlike longjmp(), no previously saved set of CPU registers are available from which to resume
       execution in the Python interpreter.  The solution to this problem depends entirely on the
       underlying architecture.  On the SPARC, register values are saved in register windows
       which WAD manually unwinds to restore the proper state. On the Intel, the solution is much
       more interesting.  To restore the register values, WAD must manually inspect the
       machine instructions of each function on the call stack in order to find out where the
       registers might have been saved. This information is then used to restore the registers from their
       saved locations before returning to the Python interpreter.

<p>
<li> Python receives the exception and produces a traceback.
</ol>
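
<p>
The address lookups in steps 2 and 4 can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not WAD's actual C
implementation: the names <tt>Region</tt>, <tt>Symbol</tt>, <tt>find_region</tt>,
<tt>find_symbol</tt>, and <tt>find_line</tt> are hypothetical, the sample data is
loosely modeled on the tables shown above, the faulting address is made up, and
the line lookup assumes that each 0x44 stab pairs a line number (<tt>desc</tt>)
with a byte offset from the start of the function (<tt>value</tt>).

<blockquote><pre>
```python
from collections import namedtuple

# Hypothetical records; WAD itself reads this data from /proc, ELF, and stabs.
Region = namedtuple("Region", "base size path")
Symbol = namedtuple("Symbol", "offset size name")


def find_region(regions, addr):
    """Return the memory-map region that contains addr, or None."""
    for r in regions:
        if r.base <= addr < r.base + r.size:
            return r
    return None


def find_symbol(symbols, base, addr):
    """Linear search: base + s.offset <= addr < base + s.offset + s.size."""
    for s in symbols:
        start = base + s.offset
        if start <= addr < start + s.size:
            return s
    return None


def find_line(line_stabs, func_offset):
    """Return the last line whose code offset does not exceed func_offset.

    line_stabs is a list of (line, offset) pairs sorted by offset
    (an assumed layout for the 0x44 entries).
    """
    line = None
    for number, offset in line_stabs:
        if offset > func_offset:
            break
        line = number
    return line


regions = [
    Region(0x00010000, 1264 * 1024, "/usr/local/bin/python"),
    Region(0xFE7C0000, 32 * 1024, "/u0/beazley/Projects/dohmodule.so"),
]
symbols = [
    Symbol(0x1280, 324, "wrap_foo"),
    Symbol(0x1600, 128, "foo"),
    Symbol(0x2408, 192, "bar"),
]
line_stabs = [(6, 8), (7, 12), (8, 44), (9, 56)]

pc = 0xFE7C1620  # made-up faulting address, for illustration only
region = find_region(regions, pc)
symbol = find_symbol(symbols, region.base, pc)
line = find_line(line_stabs, pc - (region.base + symbol.offset))
print(region.path, symbol.name, line)
```
</pre></blockquote>

The searches are deliberately linear, which mirrors the simple table walks
described above; a production tool would more likely sort the tables once and
use binary search.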

<h3>6. Initialization and Loading</h3>

In the earlier example, it was shown that WAD could either be
loaded as an extension module or simply attached to an existing module
with the linker.  This latter case is implemented by
wrapping the WAD initialization function inside the constructor of a
statically allocated C++ object like this:

<blockquote>
<pre>
class WadInit {
public:
    WadInit() {
        wad_init();   /* Call the real initialization function */
    }
};
static WadInit wad_initializer;
</pre></blockquote>

When the dynamic loader brings WAD into memory, it automatically
executes the constructors of all statically allocated C++ objects.
Therefore, this initialization code executes immediately after
loading, but before Python actually calls the module initialization
function.  As a result, when an extension module is linked with WAD,
the debugging capability is enabled before any other operations occur---this
allows WAD to respond to fatal errors that might occur during module
initialization.

<p>
The rest of the initialization process consists of the following:
<ul>
<li> The WAD signal handler is installed.
<li> A collection of return symbols are registered with the signal handler (see the previous section).
<li> Four new Python exception objects <tt>SegFault</tt>, <tt>BusError</tt>, <tt>AbortError</tt>,
and <tt>IllegalInstruction</tt> are added
to the <tt>__builtin__</tt> module.
</ul>
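
<p>
The effect of the last step can be imitated from pure Python. The sketch below
is an assumption-laden illustration rather than WAD's code: WAD creates these
objects from C, the common base class <tt>WadError</tt> and the helper
<tt>register_exceptions</tt> are hypothetical, and the module is spelled
<tt>builtins</tt> here because that is the Python 3 name for
<tt>__builtin__</tt>.

<blockquote><pre>
```python
import builtins  # called __builtin__ in the Python 2 releases the paper targets


class WadError(Exception):
    """Hypothetical common base class; not part of WAD."""


NAMES = ("SegFault", "BusError", "AbortError", "IllegalInstruction")


def register_exceptions():
    """Create one exception class per name and publish it in builtins."""
    for name in NAMES:
        setattr(builtins, name, type(name, (WadError,), {}))


register_exceptions()

# Once registered, the names resolve without an import, like any
# built-in exception.
try:
    raise SegFault("[ C stack trace ]")
except SegFault as err:
    print("caught:", type(err).__name__, err)
```
</pre></blockquote>

Publishing the classes in the built-in namespace is what lets a script catch
<tt>SegFault</tt> without importing anything from the debugging module.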

Although the use of a C++ static constructor has the potential to
conflict with C++ extension code that also uses static constructors,
it is always possible to enable WAD prior to loading a C++ extension
(e.g., WAD could be loaded separately).

<h3>7. Implementation Details</h3>

Currently, WAD is written in ANSI C with a small amount of C++,
and a small amount of assembly code (to assist in the return to the interpreter).
The entire implementation contains approximately 2000 semicolons and most of the code
relates to the gathering of source code information (symbol tables,
debugging information, etc.).

<p>
Although there are libraries such as GNU bfd that can assist with the
reading of object files, none of these are used in the implementation [10].
First, these libraries tend to be quite large
and are oriented more towards stand-alone tools such as debuggers,
linkers, and compilers.  Second, due to the unusual nature of the runtime
environment and the restrictions on memory utilization (no heap, no
stack), the behavior of these libraries is somewhat unclear and
would require further study.
Finally, given the small size of the prototype implementation, it didn't seem necessary to rely on a
large general purpose library.

<h3>8. Discussion</h3>

The primary focus of this work is to provide a more useful error
reporting mechanism to extension developers.
However, this does not imply that
WAD is appropriate as a general purpose exception
handling mechanism.  First, let's focus
on the recovery mechanism:

<ul>
<li> When WAD unwinds the call stack, objects allocated on the stack
are lost.  This may interact poorly with C++ extensions since the
unwinding process does not invoke C++ destructors.  It may be possible to fix
this problem, but doing so would require coordination with the C++ runtime library.

<p>
<li> Similarly, if a procedure allocates objects on the heap, stack unwinding
may cause those objects to never be reclaimed.

<p>
<li> Closely related to heap management, stack unwinding may leave
files, sockets, and other system resources open.  Furthermore, in a multithreaded
environment, deadlock may occur if a procedure is holding a lock when an error occurs.

<p>
<li> An application may fail by overwriting the process heap and corrupting
memory.  Although WAD can produce internal diagnostics even when the heap has been
destroyed, Python may fail immediately upon return from the
WAD signal handler or shortly thereafter.

<p>
<li> If an application destroys the call stack (via buffer overflow), WAD will
be unable to complete a stack trace and will be unable to return to
Python.

<p>
<li> Memory management problems such as double-freeing of memory are particularly
difficult to identify.  If an extension module corrupts the memory allocator
in some manner, this may cause Python to fail in a completely unexpected location.
WAD is usually able to produce a traceback in this situation, but
it may not correspond to the real source of the problem.

</ul>

In addition, there are a number of issues that pertain to WAD's interaction with the
Python interpreter:

<ul>
<li> The recovery mechanism is entirely based on symbolic information stored
in the Python executable.  Therefore, the return points are simply specified
as strings such as ``call_builtin'' as opposed to real memory addresses.
Because of this, WAD is compatible with essentially any version of Python (provided
it supports class-based exceptions).

<P>
<li> WAD is unable to manage multiple return values for the same procedure.
For example, Python's <tt>eval_code2()</tt> procedure contains a huge
case statement for executing byte codes.  Within this procedure, certain
function calls return NULL to indicate an error and others return -1.  Since WAD
is unable to determine which value to return, this particular procedure does not make a very
good return point for error recovery.

<P>
<li> An alternative approach to the symbolic recovery scheme would be to
instrument Python with a collection of safe return points using setjmp()/longjmp().
This approach is not used because it would require a significant number of changes to
the interpreter and it would introduce an unacceptable amount of performance overhead.

<p>
<li> WAD is generally safe to use with Python threads.  However, if a
compiled extension function manually releases the Python interpreter
lock and subsequently faults, the return behavior is unspecified.  In
the future, it may be possible to use the interpreter lock to provide coordination
between the interpreter and the error recovery mechanism.

<p>
<li> Compiled extension code may perform an eval operation in which Python code is executed
in the interpreter.  This results in a situation where the complete call-stack of an
application crosses the boundary between Python and C several times.  WAD can
still handle faults in this setting as long as an application is doing a reasonable amount of
error checking.  For example, a fatal error that occurs inside an eval operation could
be caught by the extension code and propagated further up the call stack.

<p>
<li> In certain cases, Python may be configured to handle the SIGFPE signal for floating point
exceptions. The default Python handling of this error is to abort and dump core. However,
with WAD, a complete stack traceback will be obtained when a SIGFPE occurs.

<p>
<li> WAD is extremely inefficient.  Due to restrictions on the heap and stack,
WAD relies heavily on mmap() and a variety of other file
operations as it handles errors.  It also performs linear searches of symbol and
debugging tables. As a result, WAD's generation of a
Python exception is several orders of magnitude slower than an ordinary
exception.
</ul>
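
<p>
The symbolic return-point scheme discussed in the first two items can be
sketched as follows. This is a hedged illustration, not WAD's implementation:
<tt>RETURN_POINTS</tt> and <tt>find_return_point</tt> are hypothetical names,
the entries are a small subset copied from the table in Section 5, Python's
<tt>None</tt> stands in for a C <tt>NULL</tt>, and the sketch assumes that the
matching frame nearest to the fault is the one chosen.

<blockquote><pre>
```python
# A few entries copied from the return-point table in Section 5.
# None stands in for a C NULL return value.
RETURN_POINTS = {
    "call_builtin": None,
    "PyObject_Print": -1,
    "PyObject_GetItem": None,
    "PyObject_SetItem": -1,
    "PyEval_EvalCode": None,
}


def find_return_point(stack):
    """Scan the stack from the innermost frame outward.

    Returns (index, symbol, return_value) for the first frame whose symbol
    is a registered return point, or None if no frame matches.
    """
    for index, symbol in enumerate(stack):
        if symbol in RETURN_POINTS:
            return index, symbol, RETURN_POINTS[symbol]
    return None


# Frames from the example in Section 2, innermost first.
stack = ["doh", "_wrap_doh", "call_builtin", "PyEval_CallObjectWithKeywords"]
print(find_return_point(stack))

# No registered symbol on the stack: the default handler would apply.
print(find_return_point(["doh", "main"]))
```
</pre></blockquote>

Because the table maps each name to exactly one value, a procedure such as
<tt>eval_code2()</tt>, whose callers expect different error values, cannot be
represented in it.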

Finally, there are a number of application specific issues to note:

<ul>
<li> Aggressive compiler optimization techniques may prevent WAD from
accurately reporting locations within the original source code.
This is particularly problematic with numerical applications where
techniques such as procedure inlining can make it impossible to obtain accurate
debugging information.  Since these types of problems also arise in
full-featured debuggers, it is unlikely that they can be easily fixed in WAD (at least not
without a considerable amount of work).

<p>
<li> If an application implements its own exception handling,
it may provide Python with less information than what would be obtained with WAD.
For example, a programmer might implement a function like this:

<blockquote><pre>
void *Malloc(int size) {
   void *ptr;
   ptr = malloc(size);
   if (!ptr) throw("Out of memory");
   return ptr;
}
</pre></blockquote>

In this case, the ``throw'' function may initiate an internal
exception handling mechanism that relies upon setjmp/longjmp or C++ exceptions.
When the error eventually makes it back to the interpreter, the user will get an ``out
of memory'' exception, but no additional information will be
provided.  In contrast, if the programmer simply used an <tt>assert()</tt> statement, WAD would produce a full stack trace leading to
the error.
</ul>


Despite its various limitations, WAD is applicable to a wide range of
extension-related errors.  Furthermore, most of the errors that are
likely to occur are of a more benign variety.  For example, a
segmentation fault may simply be the result of an uninitialized
pointer (perhaps the user forgot to call an initialization procedure).
Likewise, bus errors, failed assertions, and floating point exceptions
rarely result in a situation where the WAD recovery mechanism would be
unable to produce a meaningful Python traceback.
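
<p>
The basic idea of turning a fatal signal into an ordinary error return can be
sketched in a few lines of POSIX C.  This is a minimal sketch and not WAD's
implementation: it performs no stack unwinding and collects no debugging information,
and the names <tt>run_protected()</tt>, <tt>fatal_handler()</tt>, <tt>simulated_fpe()</tt>,
and <tt>no_fault()</tt> are hypothetical.  The fault is simulated with <tt>raise()</tt>
so that the example is deterministic.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Signals treated as "fatal" in this sketch (an illustrative choice). */
static const int fatal_signals[] = { SIGSEGV, SIGBUS, SIGFPE, SIGABRT };
#define NUM_FATAL (sizeof fatal_signals / sizeof fatal_signals[0])

static sigjmp_buf recover_point;                 /* where to resume */
static volatile sig_atomic_t caught_signal = 0;  /* which signal fired */

/* Record the signal and jump back to the recovery point instead of
   letting the process terminate. */
static void fatal_handler(int signo) {
    caught_signal = signo;
    siglongjmp(recover_point, 1);
}

/* Run fn() with the handlers installed.  Returns 0 if fn() completed
   normally, or the number of the fatal signal that interrupted it. */
int run_protected(void (*fn)(void)) {
    struct sigaction sa;
    struct sigaction old[NUM_FATAL];
    size_t i;
    int result = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fatal_handler;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < NUM_FATAL; i++)
        sigaction(fatal_signals[i], &sa, &old[i]);

    caught_signal = 0;
    /* savemask = 1 so the signal mask is restored after the jump. */
    if (sigsetjmp(recover_point, 1) != 0)
        result = caught_signal;   /* arrived here via siglongjmp() */
    else
        fn();

    /* Put the previous handlers back. */
    for (i = 0; i < NUM_FATAL; i++)
        sigaction(fatal_signals[i], &old[i], NULL);
    return result;
}

/* Stand-ins for extension functions: one faults, one does not. */
void simulated_fpe(void) { raise(SIGFPE); }
void no_fault(void) { }
```

A caller can test the return value of <tt>run_protected()</tt> and raise an
interpreter-level error when it is nonzero.  Whether it is safe to continue after a
real fault depends on what the interrupted code was doing, which is why the benign
cases described above matter.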

<h3>9. Related Work</h3>

There is a huge body of literature concerning the implementation of
exception handling in various programming languages and environments.
A detailed discussion of this work is clearly not possible here, but
a general overview of various exception handling issues can be found in [11].
In general, there are a few themes that seem to prevail.
First, considerable attention has been given to exception handling mechanisms
in specific languages, such as the efficient implementation of exceptions in C++.
Second, a great deal of work has been devoted to the semantic aspects of
exception handling such as exception hierarchies, finalization, and
whether or not code is restartable after an exception has occurred.
Finally, a fair amount of exception work has been done in the context
of component frameworks and distributed systems.  Most of this work
tends to concentrate on explicit exception handling mechanisms.  Very little
work appears to have been done in the area of converting hardware-generated errors
into exceptions.

<p>
With respect to debuggers, quite a lot of work has been done in
creating advanced debugging support for specific languages and
integrated development environments.  However, very little of this work
has concentrated on the problem of extensible systems and
compiled-interpreted language integration.  For instance, debuggers
for Python are currently unable to cross over into C extensions whereas C
debuggers aren't able to easily extract useful information from the
internals of the Python interpreter.

<p>
One system of possible interest is Rn, which was developed in the
mid-1980s at Rice University [12]. This system, primarily
designed for working with large scientific applications written in
Fortran, provided an execution monitor that consisted of a special
debugging process with an embedded interpreter. When attached to
compiled Fortran code, this monitor could dynamically patch
the executable in a manner that allowed parts of the code to be executed in the
interpreter. This was used to provide a debugging environment in which
essentially any part of the compiled application could be modified at
run-time by simply compiling the modified code (Fortran) to an
interpreted form and inserting a breakpoint in the original executable
that transferred control to the interpreter.  Although this
particular scheme is not directly related to the functionality
of WAD, it is one of the few systems in which
interpreted and compiled code have been tightly coupled within
a debugging framework.  Several aspects of the interpreted/compiled
interface are closely related to the way in which WAD operates.  In addition,
various aspects of this work may be useful should WAD be extended with
new capabilities.

<h3>10. Future Directions</h3>

WAD is currently an experimental prototype.  Although this paper has
described its use with Python, the core of the system is generic and
is easily extended to other programming environments.  For example, when
linked to C/C++ code, WAD will automatically produce stack
traces for fatal errors.  A module for generating Tcl exceptions has
also been developed.  Plans are underway to provide support for other
extensible systems including Perl, Ruby, and Guile.

<p>
Finally, a number of extensions to the WAD approach may be possible.
For example, even though the current implementation only returns a
traceback string to the Python interpreter, the WAD signal handler
actually generates a full traceback of the C call stack including all
of the CPU registers and a copy of the stack data.  Therefore, with a
little work, it may be possible to implement a diagnostic tool that
allows the state of the C stack to be inspected from the Python
interpreter after a crash has occurred.  Similarly, it may be possible
to integrate the capabilities of WAD with those provided by the Python
debugger.

<h3>11. Conclusions and Availability</h3>

WAD provides a simple mechanism for converting fatal errors into
Python exceptions that provide useful information to extension
writers.  In doing so, it solves one of the most frustrating aspects
of working with compiled Python extensions: identifying program errors.
Furthermore, the system requires no code modifications to Python and introduces
no performance overhead.
Although the system is
necessarily platform specific, it does not involve a
significant amount of code.  As a result, it may be relatively
straightforward to port to other Unix systems.

<p>
As of this writing, WAD is still undergoing active development.   However,
the software is available for experimentation and download
at <tt>http://systems.cs.uchicago.edu/wad</tt>.

<h3>References</h3>

[1] D.M. Beazley, <em>Using SWIG to Control, Prototype, and Debug C Programs with Python</em>,
4th International Python Conference, Livermore, CA. (1996).

<p>
[2] P.F. Dubois, <em>Climate Data Analysis Software</em>, 8th International Python Conference,
Arlington, VA. (2000).

<p>
[3] P.F. Dubois, <em>A Facility for Creating Python Extensions in C++</em>, 7th International Python
Conference, Houston, TX. (1998).

<p>
[4] SIP. <tt>http://www.thekompany.com/projects/pykde/</tt>.

<p>
[5] FPIG. <tt>http://cens.ioc.ee/projects/f2py2e/</tt>.

<p>
[6] R. Faulkner and R. Gomes, <em>The Process File System and Process Model in UNIX System V</em>, USENIX Conference Proceedings,
January 1991.

<p>
[7] J.R. Levine, <em>Linkers &amp; Loaders.</em> Morgan Kaufmann Publishers, 2000.

<p>
[8] Free Software Foundation, <em>The "stabs" debugging format</em>. GNU info document.

<p>
[9] W. Richard Stevens, <em>UNIX Network Programming: Interprocess Communication, Volume 2</em>. PTR
Prentice-Hall, 1998.

<p>
[10] S. Chamberlain. <em>libbfd: The Binary File Descriptor Library</em>. Cygnus Support, bfd version 3.0 edition, April 1991.

<p>
[11] M.L. Scott. <em>Programming Language Pragmatics</em>. Morgan Kaufmann Publishers, 2000.

<p>
[12] A. Carle, D. Cooper, R. Hood, K. Kennedy, L. Torczon, S. Warren, <em>A Practical Environment for Scientific Programming.</em>
IEEE Computer, Vol 20, No. 11, (1987). p. 75-89.


</body>
</html>