*** empty log message ***
git-svn-id: https://swig.svn.sourceforge.net/svnroot/swig/trunk/SWIG@1053 626c5289-ae23-0410-ae9c-e8d60b6d4f22
This commit is contained in:
parent
9579a7f3a3
commit
b249cc5e26
5 changed files with 1861 additions and 0 deletions
8
Tools/WAD/Papers/README
Normal file
|
|
@ -0,0 +1,8 @@
|
|||
This directory contains papers and information about WAD.
|
||||
|
||||
python.html - WAD paper from Python9.
|
||||
usenix2001.tex - USENIX 2001 Technical conference submission.
|
||||
This paper was accepted, but the text has not yet
|
||||
been updated to final copy.
|
||||
WADTalk.pdf - Slides from the WAD Talk at Python9.
|
||||
|
||||
BIN
Tools/WAD/Papers/WADTalk.pdf
Normal file
Binary file not shown.
BIN
Tools/WAD/Papers/fig1.png
Normal file
Binary file not shown.
860
Tools/WAD/Papers/python.html
Normal file
|
|
@ -0,0 +1,860 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>WAD: A Module for Converting Fatal Extension Errors into Python Exceptions</title>
|
||||
</head>
|
||||
<body bgcolor="#ffffff">
|
||||
<center>
|
||||
|
||||
<h2>WAD: A Module for Converting Fatal Extension Errors into Python Exceptions</h2>
|
||||
<h6>David M. Beazley <br>
|
||||
Department of Computer Science<br>
|
||||
University of Chicago<br>
|
||||
Chicago, IL 60637<br>
|
||||
beazley@cs.uchicago.edu<br>
|
||||
</h6>
|
||||
</center>
|
||||
|
||||
<h3>Abstract</h3>
|
||||
<em>
|
||||
One of the more popular uses of Python is as an extension language for
|
||||
applications written in compiled languages such as C, C++, and
|
||||
Fortran. Unfortunately, one of the biggest drawbacks of this approach
|
||||
is the lack of a useful debugging and error handling facility for
|
||||
identifying problems in extension code. In part, this limitation is
|
||||
due to the fact that Python does not know anything about the internal
|
||||
implementation of an extension module. A more difficult problem is
|
||||
that compiled extensions sometimes fail with catastrophic errors such
|
||||
as memory access violations, failed assertions, and floating point
|
||||
exceptions. These types of errors fall outside the realm of normal
|
||||
Python exception handling and are particularly difficult to identify
|
||||
and debug. Although traditional debuggers can find the location of a
|
||||
fatal error, they are unable to report the context in which such an
|
||||
error has occurred with respect to a Python script. This paper describes
|
||||
an experimental system that converts fatal extension errors
|
||||
into Python exceptions. In particular, a dynamically
|
||||
loadable module, WAD (Wrapped Application Debugger), has been developed which catches
|
||||
fatal errors, unwinds the call stack, and generates Python exceptions
|
||||
with debugging information. WAD requires no modifications to Python,
|
||||
works with all extension modules, and introduces no performance
|
||||
overhead. An initial implementation of the system is currently
|
||||
available for Sun SPARC Solaris and i386-Linux.
|
||||
|
||||
</em>
|
||||
|
||||
<h3>1. Introduction</h3>
|
||||
|
||||
One of the primary reasons C, C++, and Fortran programmers are
|
||||
attracted to Python is its ability to serve as an extension language
|
||||
for compiled programs. Furthermore, tools such as SIP, CXX, Pyfort, FPIG,
|
||||
and SWIG make it extremely easy for a programmer to ``wrap'' existing
|
||||
software into an extension module [1,2,3,4,5]. Although this approach is
|
||||
extremely attractive in terms of providing a highly usable and
|
||||
flexible environment for users, extension modules suffer from
|
||||
problems not normally associated with Python
|
||||
scripts---especially when they don't work.
|
||||
|
||||
<p>
|
||||
Normally, Python programming errors result in an exception like this:
|
||||
|
||||
<blockquote><pre>
|
||||
% python foo.py
|
||||
Traceback (innermost last):
|
||||
File "foo.py", line 11, in ?
|
||||
foo()
|
||||
File "foo.py", line 8, in foo
|
||||
bar()
|
||||
File "foo.py", line 5, in bar
|
||||
spam()
|
||||
File "foo.py", line 2, in spam
|
||||
doh()
|
||||
NameError: doh
|
||||
%
|
||||
</pre></blockquote>
|
||||
|
||||
Unfortunately for compiled extensions, the following situation sometimes occurs:
|
||||
|
||||
<blockquote><pre>
|
||||
% python foo.py
|
||||
Segmentation Fault (core dumped)
|
||||
%
|
||||
</pre></blockquote>
|
||||
|
||||
Needless to say, this isn't very informative--well,
|
||||
other than indicating that something ``very bad'' happened.
|
||||
|
||||
<p>
|
||||
In order to identify the source of a fatal error, a programmer can run a
|
||||
debugger on the Python executable or on a core file like this:
|
||||
|
||||
<blockquote><pre>
|
||||
% gdb /usr/local/bin/python
|
||||
(gdb) run foo.py
|
||||
Starting program: /usr/local/bin/python foo.py
|
||||
|
||||
Program received signal SIGSEGV, Segmentation fault.
|
||||
0xff060568 in doh () from /u0/beazley/Projects/WAD/Python/./libfoo.so
|
||||
(gdb) where
|
||||
#0 0xff060568 in doh () from /u0/beazley/Projects/WAD/Python/./libfoo.so
|
||||
#1 0xff082f34 in _wrap_doh ()
|
||||
from /u0/beazley/Projects/WAD/Python/./dohmodule.so
|
||||
#2 0x2777c in call_builtin (func=0x1984b8, arg=0x1a1ccc, kw=0x0)
|
||||
at ceval.c:2650
|
||||
#3 0x27648 in PyEval_CallObjectWithKeywords (func=0x1984b8, arg=0x1a1ccc,
|
||||
kw=0x0) at ceval.c:2618
|
||||
#4 0x25d18 in eval_code2 (co=0x19acf8, globals=0x0, locals=0x1c7844,
|
||||
args=0x1984b8, argcount=1625472, kws=0x0, kwcount=0, defs=0x0, defcount=0,
|
||||
owner=0x0) at ceval.c:1951
|
||||
#5 0x25954 in eval_code2 (co=0x199620, globals=0x0, locals=0x1984b8,
|
||||
args=0x196654, argcount=1862720, kws=0x197788, kwcount=0, defs=0x0,
|
||||
#6 0x25954 in eval_code2 (co=0x19ad38, globals=0x0, locals=0x196654,
|
||||
args=0x1962fc, argcount=1862800, kws=0x198e90, kwcount=0, defs=0x0,
|
||||
defcount=0, owner=0x0) at ceval.c:1850
|
||||
#7 0x25954 in eval_code2 (co=0x1b6c60, globals=0x0, locals=0x1962fc,
|
||||
args=0x1a1eb4, argcount=1862920, kws=0x0, kwcount=0, defs=0x0, defcount=0,
|
||||
owner=0x0) at ceval.c:1850
|
||||
#8 0x22da4 in PyEval_EvalCode (co=0x1b6c60, globals=0x1962c4, locals=0x1962c4)
|
||||
at ceval.c:319
|
||||
#9 0x3adb4 in run_node (n=0x18abf8, filename=0x1b6c60 "", globals=0x1962c4,
|
||||
locals=0x1962c4) at pythonrun.c:886
|
||||
#10 0x3ad64 in run_err_node (n=0x18abf8, filename=0x1b6c60 "",
|
||||
globals=0x1962c4, locals=0x1962c4) at pythonrun.c:874
|
||||
#11 0x3ad38 in PyRun_FileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py",
|
||||
start=1616888, globals=0x1962c4, locals=0x1962c4, closeit=1)
|
||||
at pythonrun.c:866
|
||||
#12 0x3a1d8 in PyRun_SimpleFileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py",
|
||||
closeit=1) at pythonrun.c:579
|
||||
#13 0x39d84 in PyRun_AnyFileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py",
|
||||
closeit=1) at pythonrun.c:459
|
||||
#14 0x1f498 in Py_Main (argc=2, argv=0xffbefc84) at main.c:289
|
||||
#15 0x1eec0 in main (argc=2, argv=0xffbefc84) at python.c:10
|
||||
</pre></blockquote>
|
||||
|
||||
Unfortunately, even though the debugger identifies the location where the fault occurred, it
|
||||
mostly provides information about the internals of the
|
||||
interpreter. The debugger certainly doesn't reveal anything about the Python
|
||||
program that led to the error (i.e., it doesn't reveal the
|
||||
same information that would be contained in a Python traceback). As a result,
|
||||
the debugger is of limited use when it comes to debugging an application that
|
||||
consists of both compiled and Python code.
|
||||
|
||||
<p>
|
||||
Normally, extension developers try to avoid catastrophic errors by
|
||||
adding error handling. If
|
||||
an application is small or customized for use with Python, it can be
|
||||
modified to raise Python exceptions.
|
||||
Automated tools such as SWIG can also convert C++
|
||||
exceptions and C-related error handling mechanisms into Python
|
||||
exceptions. However, no matter how much error checking is added,
|
||||
there is always a chance that an extension will fail in an unexpected
|
||||
manner. This is especially true for large applications that have been wrapped
|
||||
into an extension module. In addition, certain types of errors such as floating
|
||||
point exceptions (e.g., division by zero) are especially difficult to find
|
||||
and eliminate. Finally, rigorous error checking may be omitted to improve
|
||||
performance.
|
||||
|
||||
<p>
|
||||
To address these problems, an experimental module known as WAD (Wrapped
|
||||
Application Debugger) has been developed.
|
||||
WAD is able to
|
||||
convert fatal errors into Python exceptions that include information
|
||||
from the call stack as well as debugging
|
||||
information. By turning such errors into Python exceptions, fatal
|
||||
errors now result in a traceback that crosses the boundary between
|
||||
Python code and compiled extension code. This makes it much
|
||||
easier to identify and correct extension-related programming errors.
|
||||
WAD requires no modifications to Python and is compatible with all
|
||||
extension modules. However, it is also highly platform specific
|
||||
and currently runs only on Sun SPARC
|
||||
Solaris and i386-Linux. The primary goal of this paper is to motivate the problem
|
||||
and to describe one possible solution. In addition, many of the
|
||||
implementation issues
|
||||
associated with providing an integrated error reporting mechanism are described.
|
||||
|
||||
<h3>2. An Example</h3>
|
||||
|
||||
WAD can either be imported as a Python extension module or linked to an
|
||||
extension module. To illustrate, consider the earlier example:
|
||||
|
||||
<blockquote><pre>
|
||||
% python foo.py
|
||||
Segmentation Fault (core dumped)
|
||||
%
|
||||
</pre></blockquote>
|
||||
|
||||
To identify the problem, a programmer can run Python interactively and import WAD as follows:
|
||||
|
||||
<blockquote><pre>
|
||||
% python
|
||||
Python 2.0 (#1, Oct 27 2000, 14:34:45)
|
||||
[GCC 2.95.2 19991024 (release)] on sunos5
|
||||
Type "copyright", "credits" or "license" for more information.
|
||||
>>> import libwadpy
|
||||
WAD Enabled
|
||||
>>> execfile("foo.py")
|
||||
Traceback (most recent call last):
|
||||
File "&lt;stdin&gt;", line 1, in ?
|
||||
File "foo.py", line 16, in ?
|
||||
foo()
|
||||
File "foo.py", line 13, in foo
|
||||
bar()
|
||||
File "foo.py", line 10, in bar
|
||||
spam()
|
||||
File "foo.py", line 7, in spam
|
||||
doh.doh(a,b,c)
|
||||
SegFault: [ C stack trace ]
|
||||
|
||||
#2 0x00027774 in call_builtin(func=0x1c74f0,arg=0x1a1ccc,kw=0x0)
|
||||
#1 0xff022f7c in _wrap_doh(0x0,0x1a1ccc,0x160ef4,0x9c,0x56b44,0x1aa3d8)
|
||||
#0 0xfe7e0568 in doh(a=0x3,b=0x4,c=0x0) in 'foo.c', line 28
|
||||
|
||||
/u0/beazley/Projects/WAD/Python/foo.c, line 28
|
||||
|
||||
int doh(int a, int b, int *c) {
|
||||
=> *c = a + b;
|
||||
return *c;
|
||||
}
|
||||
|
||||
>>>
|
||||
</pre></blockquote>
|
||||
|
||||
In this case, we can
|
||||
see that the program has tried to assign a value to a
|
||||
NULL pointer (indicated by the value "c=0x0" in the last function call). Furthermore, we obtain a Python traceback that shows the
|
||||
entire sequence of functions leading to the problem. Finally, since
|
||||
control returned to the interpreter, it is possible to interactively
|
||||
inspect various aspects of the application or to continue with the computation
|
||||
(although this clearly depends on the severity of the error and the nature of the application).
|
||||
|
||||
<p>
|
||||
In certain applications, it may be difficult to run Python
|
||||
interactively or to modify the code to explicitly import a special
|
||||
debugging module. In these cases, WAD can be attached to an extension module with the
|
||||
linker. For example:
|
||||
|
||||
<blockquote><pre>
|
||||
% ld -G $(OBJS) -o dohmodule.so -lwadpy
|
||||
</pre></blockquote>
|
||||
|
||||
This requires no recompilation of any source code--only a relinking of the
|
||||
extension module. When Python loads the relinked extension module, WAD is automatically
|
||||
initialized before Python invokes the module initialization function.
|
||||
|
||||
<h3>3. Design Considerations for Embedded Error Recovery</h3>
|
||||
|
||||
The primary design goal of WAD is to provide an error reporting mechanism
|
||||
for extension modules that is a natural extension of normal Python
|
||||
exception handling. There are two primary motivations for
|
||||
handling fatal errors in this manner: first, in the context of Python
|
||||
programming, it is simply unnatural to run a separate debugging
|
||||
application to identify a problem in an extension module when no such
|
||||
requirement exists for scripts. Thus, an embedded error reporting
|
||||
mechanism is simply more convenient. Second, the target users
|
||||
of an extension module may not know how to use a debugger or even have
|
||||
a development environment installed on their machine. Therefore,
|
||||
the ability to produce an informative traceback within the
|
||||
confines of the Python interpreter can be of tremendous value to an
|
||||
extension developer. This is because users who report a problem will
|
||||
be able to include an informative traceback as opposed to simply
|
||||
saying ``the code crashed.''
|
||||
|
||||
<p>
|
||||
A secondary design goal is to provide a system that is as non-invasive
|
||||
as possible. The system should not require modifications to Python or
|
||||
any extension modules and it should be easy to integrate
|
||||
into the runtime environment of an application. In addition, it shouldn't
|
||||
introduce any performance overhead.
|
||||
|
||||
<p>
|
||||
Finally, since WAD co-exists with the Python interpreter (i.e., in the same
|
||||
process), there are a number of technical issues that have to be
|
||||
addressed. First, fatal errors can theoretically occur anywhere in
|
||||
the interpreter as well as in extension modules. Therefore, WAD needs
|
||||
to know about Python's internal organization if it is going to provide
|
||||
a graceful recovery back to the interpreter. Second, in order to
|
||||
implement this recovery scheme, the system has to perform direct
|
||||
manipulation of the CPU context and call stack. Last, but not least,
|
||||
since the recovery code lives in the same address space as the
|
||||
interpreter and extension modules it should not depend on the process
|
||||
stack and heap (since both could have been corrupted by the faulting
|
||||
application).
|
||||
|
||||
<h3>4. Catching Fatal Errors</h3>
|
||||
|
||||
WAD catches catastrophic errors by installing a
|
||||
reliable signal handler for SIGSEGV, SIGBUS, SIGABRT, SIGILL, and SIGFPE [9]. Unlike the
|
||||
more familiar BSD-style signal interface (as provided by the Python
|
||||
signal module), reliable signal handlers are installed using the <tt>sigaction()</tt> system call and have a few notable properties:
|
||||
|
||||
<ul>
|
||||
<li> The signal handler can be configured to run on its own dedicated stack.
|
||||
|
||||
<p>
|
||||
<li> Handler functions can receive a structure containing the CPU context
|
||||
including the CPU registers, program counter, and stack pointer.
|
||||
|
||||
<p>
|
||||
<li> Changes to the CPU context take effect immediately after the signal handler returns.
|
||||
</ul>
|
||||
|
||||
Therefore, the high-level implementation of WAD is relatively straightforward: when a fatal signal occurs,
|
||||
a handler function runs on an isolated signal handling stack.
|
||||
The CPU context is then used to unwind the call stack and to inspect the process state. Finally,
|
||||
if possible, the CPU context is modified in a manner that allows the signal handler to
|
||||
return to Python with a raised exception.
|
||||
|
||||
<h3>5. A Detailed Description of the Recovery Mechanism</h3>
|
||||
|
||||
In this section, a more detailed description of the error recovery
|
||||
scheme is presented. The precise implementation details of this are
|
||||
highly platform specific and involve a number of advanced topics including
|
||||
the Unix process file system (/proc), the ELF object file format, and the
|
||||
Stabs compiler debugging format [6,7,8]. The details of these topics are
|
||||
beyond the scope of this paper. However, this section aims to
|
||||
give the reader a small taste of the steps involved in implementing the recovery mechanism.
|
||||
|
||||
<P>
|
||||
The services of WAD are only invoked upon the reception of a fatal
|
||||
signal. This triggers a signal handling function that results in a return to Python
|
||||
as illustrated in the following figure:
|
||||
|
||||
<center>
|
||||
<img src="fig1.png">
|
||||
<h6>Control flow of the error recovery mechanism</h6>
|
||||
</center>
|
||||
|
||||
<p>
|
||||
The steps required to implement this recovery are as follows:
|
||||
|
||||
<ol>
|
||||
<li> The values of the program counter and stack pointer are obtained from the CPU
|
||||
context structure passed to the WAD signal handler.
|
||||
|
||||
<p>
|
||||
<li> The virtual memory map of the process is inspected to identify all of
|
||||
the shared libraries, dynamically loaded modules, and valid memory regions.
|
||||
This information is obtained by reading from the Unix /proc filesystem.
|
||||
The following table illustrates the nature of this data:
|
||||
|
||||
|
||||
<blockquote><pre>
|
||||
Address Size Permissions File
|
||||
---------- ----- ----------------- ---------------------------------
|
||||
00010000 1264K read/exec /usr/local/bin/python
|
||||
0015A000 184K read/write/exec /usr/local/bin/python
|
||||
00188000 296K read/write/exec [ heap ]
|
||||
FE7C0000 32K read/exec /u0/beazley/Projects/dohmodule.so
|
||||
FE7D6000 8K read/write/exec /u0/beazley/Projects/dohmodule.so
|
||||
...
|
||||
FF100000 664K read/exec /usr/lib/libc.so.1
|
||||
FF1B6000 24K read/write/exec /usr/lib/libc.so.1
|
||||
FF1BC000 8K read/write/exec /usr/lib/libc.so.1
|
||||
FF2C0000 120K read/exec /usr/lib/libthread.so.1
|
||||
FF2EE000 8K read/write/exec /usr/lib/libthread.so.1
|
||||
FF2F0000 48K read/write/exec /usr/lib/libthread.so.1
|
||||
FF310000 40K read/exec /usr/lib/libsocket.so.1
|
||||
FF32A000 8K read/write/exec /usr/lib/libsocket.so.1
|
||||
FF330000 24K read/exec /usr/lib/libpthread.so.1
|
||||
FF346000 8K read/write/exec /usr/lib/libpthread.so.1
|
||||
FF350000 8K read/write/exec [ anon ]
|
||||
FF3B0000 8K read/exec /usr/lib/libdl.so.1
|
||||
FF3C0000 128K read/exec /usr/lib/ld.so.1
|
||||
FF3E0000 8K read/write/exec /usr/lib/ld.so.1
|
||||
FFBEA000 24K read/write/exec [ stack ]
|
||||
</pre></blockquote>
|
||||
|
||||
<p>
|
||||
<li> The call stack is unwound to produce a traceback of the
|
||||
calling sequence that led to the error. The unwinding process is just a simple
|
||||
loop that is similar to the following:
|
||||
|
||||
<blockquote><pre>
|
||||
long *pc = get_pc(context);
long *sp = get_sp(context);
while (sp) {
    /* Move to previous stack frame */
    pc = (long *) sp[15];   /* %i7 register on SPARC */
    sp = (long *) sp[14];   /* %i6 register on SPARC */
}
|
||||
</pre></blockquote>
|
||||
|
||||
<li> For each stack frame, symbol table and debugging information
|
||||
is gathered and stored in a WAD exception frame object.
|
||||
Obtaining this information is the most complicated part of WAD and involves
|
||||
the following steps: first, the current program counter is mapped to an object file
|
||||
using the virtual memory map obtained in step 2. Next, the object file is loaded
|
||||
using mmap(). Once loaded, the ELF symbol table
|
||||
is searched for an address match. The symbol table contains a collection of records
|
||||
containing memory offsets, sizes, and names such as this:
|
||||
|
||||
<blockquote><pre>
|
||||
Offset Size Name
|
||||
-------- ------ ---------
|
||||
0x1280 324 wrap_foo
|
||||
0x1600 128 foo
|
||||
0x2408 192 bar
|
||||
...
|
||||
</pre></blockquote>
|
||||
|
||||
To find a match for a virtual memory address <em>addr</em>, WAD simply
|
||||
searches for a symbol <em>s</em> such that <em>base</em> +
|
||||
<em>s</em>.offset &lt;= <em>addr</em> &lt; <em>base</em> +
|
||||
<em>s</em>.offset + <em>s</em>.size, where <em>base</em> is the base
|
||||
virtual address of the object file in the virtual memory map.
|
||||
|
||||
<p>
|
||||
Debugging information, if available, is scanned to identify a source
|
||||
file, function name, and line number. This involves scanning object files for a
|
||||
table of debugging information stored in a format
|
||||
known as ``stabs.'' Stabs is a relatively simple, but highly extensible format that
|
||||
is language independent and capable of encoding almost every aspect of the
|
||||
original source code. For the purposes of WAD, only a small subset of this
|
||||
data is actually used.
|
||||
|
||||
<p>
|
||||
The following table shows a small fragment of relevant stabs data:
|
||||
<blockquote><pre>
|
||||
type desc value string description
|
||||
------ ----- --------- --------------------------- -----------
|
||||
0x64 0 0 /u0/beazley/Projects/foo/ Pathname
|
||||
0x64 0 0 foo.c Filename
|
||||
...
|
||||
0x24 0 0 foo:F(0,3);(0,3) Function
|
||||
0xa0 4 68 n:p(0,3) Parameter
|
||||
...
|
||||
0x44 6 8 Line number
|
||||
0x44 7 12 Line number
|
||||
0x44 8 44 Line number
|
||||
0x44 9 56 Line number
|
||||
...
|
||||
</pre></blockquote>
|
||||
|
||||
In the table, the type field indicates the type of debugging information. For
|
||||
example, 0x64 specifies the source file, 0x24 is a function
|
||||
definition, 0xa0 is a function parameter, and 0x44 is line number
|
||||
information. Associated with each stab is a collection of parameters
|
||||
and an optional string. The string usually contains symbol names and
|
||||
other information. The <tt>desc</tt> and <tt>value</tt> fields are numbers
|
||||
that usually contain byte offsets and line number data.
|
||||
Therefore, to collect debugging information, WAD simply walks through the debugging
|
||||
tables until it finds the function of interest. Once found, parameter and line
|
||||
number specifiers are inspected to determine the location and values of the function
|
||||
arguments as well as the source line at which the error occurred.
|
||||
|
||||
<p>
|
||||
<li> After the complete traceback has been obtained, it is examined to see if
|
||||
there are any ``safe'' return points to which control can be returned.
|
||||
This is accomplished by maintaining an internal table of predefined symbolic return
|
||||
points as shown in the following table:
|
||||
|
||||
<blockquote><pre>
|
||||
Python symbol Return value
|
||||
----------------------------- ------------------
|
||||
call_builtin NULL
|
||||
_PyImport_LoadDynamicModule NULL
|
||||
PyObject_Repr NULL
|
||||
PyObject_Print -1
|
||||
PyObject_CallFunction NULL
|
||||
PyObject_CallMethod NULL
|
||||
PyObject_CallObject NULL
|
||||
PyObject_Cmp -1
|
||||
PyObject_Compare -1
|
||||
PyObject_DelAttrString -1
|
||||
PyObject_DelItem -1
|
||||
PyObject_GetAttrString NULL
|
||||
PyObject_GetItem NULL
|
||||
PyObject_HasAttrString -1
|
||||
PyObject_Hash -1
|
||||
PyObject_Length -1
|
||||
PyObject_SetAttrString -1
|
||||
PyObject_SetItem -1
|
||||
PyObject_Str NULL
|
||||
PyObject_Type NULL
|
||||
...
|
||||
PyEval_EvalCode NULL
|
||||
</pre></blockquote>
|
||||
|
||||
The symbols in this table correspond to functions within the Python interpreter that
|
||||
might execute extension code and include the parts of the interpreter that invoke builtin functions
|
||||
as well as the functions from the abstract object interface.
|
||||
If any of these symbols appear on the call stack,
|
||||
a handler function is invoked to raise a Python exception.
|
||||
This handler function
|
||||
is given a WAD-specific traceback object that contains a copy of the
|
||||
call stack and CPU registers as well as any symbolic and debugging
|
||||
information that was obtained. If none of the symbolic return points
|
||||
are encountered, WAD invokes a default handler that simply prints the
|
||||
full C stack trace and generates a core file.
|
||||
|
||||
<P>
|
||||
<li> If a return point is found, the CPU context is modified in a manner that allows the signal handler to return
|
||||
with a suitable Python error.
|
||||
This last step is the trickiest part of the recovery process, but the general
|
||||
idea is that the CPU context is modified in a way that makes Python think that
|
||||
an extension function simply raised an exception and returned an error. Currently, this
|
||||
is implemented by having the signal handler return to a small
|
||||
handler function written in assembly language which arranges to return the
|
||||
desired value back to the specified return point.
|
||||
|
||||
<p>
|
||||
The most complicated part of modifying the CPU context is that of restoring
|
||||
previously saved CPU registers. By manually unwinding the call stack, the
|
||||
WAD exception handler effectively performs the same operation as a longjmp() call in C.
|
||||
However, unlike longjmp(), no previously saved set of CPU registers is available from which to resume
|
||||
execution in the Python interpreter. The solution to this problem depends entirely on the
|
||||
underlying architecture. On the SPARC, register values are saved in register windows
|
||||
which WAD manually unwinds to restore the proper state. On Intel, the solution is much
|
||||
more interesting. To restore the register values, WAD must manually inspect the
|
||||
machine instructions of each function on the call stack in order to find out where the
|
||||
registers might have been saved. This information is then used to restore the registers from their
|
||||
saved locations before returning to the Python interpreter.
|
||||
|
||||
<p>
|
||||
<li> Python receives the exception and produces a traceback.
|
||||
</ol>
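<p>
The address lookups in steps 2 and 4 and the return-point search in step 5 can be sketched in a few lines of Python. This is only an illustration of the logic, not WAD's implementation (which is written in C and reads /proc and ELF data directly). The <tt>Region</tt> and <tt>Symbol</tt> types, the function names, and the sample addresses are hypothetical, with values loosely modeled on the tables above. The choice to take the innermost matching return point is also an assumption.

```python
from typing import NamedTuple, Optional, Tuple

class Region(NamedTuple):
    base: int   # starting virtual address of the mapping
    size: int   # length of the mapping in bytes
    path: str   # object file backing the mapping

class Symbol(NamedTuple):
    offset: int  # offset of the symbol from the object's base address
    size: int    # size of the symbol in bytes
    name: str

# Hypothetical data, loosely modeled on the tables shown earlier.
MEMORY_MAP = [
    Region(0x00010000, 1264 * 1024, "/usr/local/bin/python"),
    Region(0xFE7C0000, 32 * 1024, "/u0/beazley/Projects/dohmodule.so"),
    Region(0xFF100000, 664 * 1024, "/usr/lib/libc.so.1"),
]
SYMBOLS = {
    "/u0/beazley/Projects/dohmodule.so": [
        Symbol(0x1280, 324, "wrap_foo"),
        Symbol(0x1600, 128, "foo"),
        Symbol(0x2408, 192, "bar"),
    ],
}
# Subset of the return-point table: symbol name -> value handed back to Python.
RETURN_POINTS = {
    "call_builtin": "NULL",
    "PyObject_Print": "-1",
    "PyEval_EvalCode": "NULL",
}

def find_region(pc: int) -> Optional[Region]:
    """Step 2: map a program counter to the memory region that contains it."""
    for region in MEMORY_MAP:
        if region.base <= pc < region.base + region.size:
            return region
    return None

def find_symbol(pc: int) -> Optional[str]:
    """Step 4: find s with base + s.offset <= pc < base + s.offset + s.size."""
    region = find_region(pc)
    if region is None:
        return None
    for s in SYMBOLS.get(region.path, []):
        start = region.base + s.offset
        if start <= pc < start + s.size:
            return s.name
    return None

def find_return_point(stack) -> Optional[Tuple[str, str]]:
    """Step 5: scan the call stack (innermost frame first) for a return point."""
    for name in stack:
        if name in RETURN_POINTS:
            return name, RETURN_POINTS[name]
    return None

print(find_symbol(0xFE7C1610))
print(find_return_point(["doh", "_wrap_doh", "call_builtin", "eval_code2"]))
```

If <tt>find_return_point()</tt> returns nothing, the situation corresponds to the default handler described in step 5, which prints the C stack trace and generates a core file.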
|
||||
|
||||
<h3>6. Initialization and Loading</h3>
|
||||
|
||||
In the earlier example, it was shown that WAD could be either
|
||||
loaded as an extension module or simply attached to an existing module
|
||||
with the linker. This latter case is implemented by
|
||||
wrapping the WAD initialization function inside the constructor of a
|
||||
statically allocated C++ object like this:
|
||||
|
||||
<blockquote>
|
||||
<pre>
|
||||
class WadInit {
public:
    WadInit() {
        wad_init();   /* Call the real initialization function */
    }
};
static WadInit wad_initializer;
|
||||
</pre></blockquote>
|
||||
|
||||
When the dynamic loader brings WAD into memory, it automatically
|
||||
executes the constructors of all statically allocated C++ objects.
|
||||
Therefore, this initialization code executes immediately after
|
||||
loading, but before Python actually calls the module initialization
|
||||
function. As a result, when an extension module is linked with WAD,
|
||||
the debugging capability is enabled before any other operations occur---this
|
||||
allows WAD to respond to fatal errors that might occur during module
|
||||
initialization.
|
||||
|
||||
The rest of the initialization process consists of the following:
|
||||
<ul>
|
||||
<li> The WAD signal handler is installed.
|
||||
<li> A collection of return symbols is registered with the signal handler (see the previous section).
|
||||
<li> Four new Python exception objects <tt>SegFault</tt>, <tt>BusError</tt>, <tt>AbortError</tt>,
|
||||
and <tt>IllegalInstruction</tt> are added
|
||||
to the <tt>__builtin__</tt> module.
|
||||
</ul>
|
||||
|
||||
Although the use of a C++ static constructor has the potential to
|
||||
conflict with C++ extension code that also uses static constructors,
|
||||
it is always possible to enable WAD prior to loading a C++ extension
|
||||
(e.g., WAD could be loaded separately).
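<p>
The effect of the last initialization step, adding exception objects to the builtin namespace, can be illustrated from the Python side. The sketch below is an assumption-laden stand-in: WAD performs this registration from C, and the code here uses the Python 3 name of the module, <tt>builtins</tt>, in place of <tt>__builtin__</tt>. The helper <tt>register_exceptions()</tt>, the function <tt>faulty_call()</tt>, and the choice of <tt>Exception</tt> as the base class are hypothetical.

```python
import builtins

# Exception names taken from the list above; the base class is an assumption.
WAD_EXCEPTIONS = ("SegFault", "BusError", "AbortError", "IllegalInstruction")

def register_exceptions():
    """Create one exception class per name and publish it in builtins."""
    for name in WAD_EXCEPTIONS:
        if not hasattr(builtins, name):
            setattr(builtins, name, type(name, (Exception,), {}))

register_exceptions()

def faulty_call():
    # Stand-in for an extension function whose fatal error was converted
    # into an exception; the real conversion happens in the signal handler.
    raise builtins.SegFault("[ C stack trace ]")

try:
    faulty_call()
except SegFault as exc:  # found through builtins, so no import is required
    caught = str(exc)

print(caught)
```

Because the names live in the builtin namespace, any script can catch them without importing a WAD-specific module, which is what allows the debugging capability to be attached purely by relinking.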
|
||||
|
||||
<h3>7. Implementation Details</h3>
|
||||
|
||||
Currently, WAD is written in ANSI C with a small amount of C++,
|
||||
and a small amount of assembly code (to assist in the return to the interpreter).
|
||||
The entire implementation contains approximately 2000 semicolons and most of the code
|
||||
relates to the gathering of source code information (symbol tables,
|
||||
debugging information, etc.).
|
||||
|
||||
<p>
|
||||
Although there are libraries such as GNU bfd that can assist with the
|
||||
reading of object files, none of these are used in the implementation [10].
|
||||
First, these libraries tend to be quite large
|
||||
and are oriented more towards stand-alone tools such as debuggers,
|
||||
linkers, and compilers. Second, due to the unusual nature of the runtime
|
||||
environment and the restrictions on memory utilization (no heap, no
|
||||
stack), the behavior of these libraries is somewhat unclear and
|
||||
would require further study.
|
||||
Finally, given the small size of the prototype implementation, it didn't seem necessary to rely on a
|
||||
large general purpose library.
|
||||
|
||||
<h3>8. Discussion</h3>
|
||||
|
||||
The primary focus of this work is to provide a more useful error
|
||||
reporting mechanism to extension developers.
|
||||
However, this does not imply that
|
||||
WAD is appropriate as a general purpose exception
|
||||
handling mechanism. First, let's focus
|
||||
on the recovery mechanism:
|
||||
|
||||
<ul>
|
||||
<li> When WAD unwinds the call stack, objects allocated on the stack
|
||||
are lost. This may interact poorly with C++ extensions since the
|
||||
unwinding process does not invoke C++ destructors. It may be possible to fix
|
||||
this problem, but doing so would require coordination with the C++ runtime library.
|
||||
|
||||
<p>
|
||||
<li> Similarly, if a procedure allocates objects on the heap, stack unwinding
|
||||
may cause those objects to never be reclaimed.
|
||||
|
||||
<p>
|
||||
<li> Closely related to heap management, stack unwinding may leave
open files, sockets, and other system resources unreleased. Furthermore, in a multithreaded
|
||||
environment, deadlock may occur if a procedure is holding a lock when an error occurs.
|
||||
|
||||
<p>
|
||||
<li> An application may fail by overwriting the process heap and corrupting
|
||||
memory. Although WAD can produce internal diagnostics even when the heap has been
|
||||
destroyed, Python may fail immediately upon return from the
|
||||
WAD signal handler or shortly thereafter.
|
||||
|
||||
<p>
|
||||
<li> If an application destroys the call stack (via buffer overflow), WAD will
|
||||
be unable to complete a stack trace and will be unable to return to
|
||||
Python.
|
||||
|
||||
<p>
|
||||
<li> Memory management problems such as double-freeing of memory are particularly
|
||||
difficult to identify. If an extension module corrupts the memory allocator
|
||||
in some manner, this may cause Python to fail in a completely unexpected location.
|
||||
WAD is usually able to produce a traceback in this situation, but
|
||||
it may not correspond to the real source of the problem.
|
||||
|
||||
</ul>
|
||||
|
||||
In addition, there are a number of issues that pertain to WAD's interaction with the
|
||||
Python interpreter:
|
||||
|
||||
<ul>
|
||||
<li> The recovery mechanism is entirely based on symbolic information stored
|
||||
in the Python executable. Therefore, the return points are simply specified
|
||||
as strings such as ``call_builtin'' as opposed to real memory addresses.
|
||||
Because of this, WAD is compatible with essentially any version of Python (provided
|
||||
it supports class-based exceptions).
|
||||
|
||||
<P>
|
||||
<li> WAD is unable to manage multiple return values to the same procedure.
|
||||
For example, Python's <tt>eval_code2()</tt> procedure contains a huge
|
||||
case statement for executing byte codes. Within this procedure, certain
|
||||
function calls return NULL to indicate an error and others return -1. Since WAD
|
||||
is unable to determine which value to return, this particular procedure does not make a very
|
||||
good return point for error recovery.
|
||||
|
||||
<P>
|
||||
<li> An alternative approach to the symbolic recovery scheme would be to
|
||||
instrument Python with a collection of safe return points using setjmp()/longjmp().
|
||||
This approach is not used because it would require a significant number of changes to
|
||||
the interpreter and it would introduce an unacceptable amount of performance overhead.
|
||||
|
||||
<p>
|
||||
<li> WAD is generally safe to use with Python threads. However, if a
|
||||
compiled extension function manually releases the Python interpreter
|
||||
lock and subsequently faults, the return behavior is unspecified. In
|
||||
the future, it may be possible to use the interpreter lock to provide coordination
|
||||
between the interpreter and the error recovery mechanism.
|
||||
|
||||
<p>
|
||||
<li> Compiled extension code may perform an eval operation in which Python code is executed
|
||||
in the interpreter. This results in a situation where the complete call-stack of an
|
||||
application crosses the boundary between Python and C several times. WAD can
|
||||
still handle faults in this setting as long as an application is doing a reasonable amount of
|
||||
error checking. For example, a fatal error that occurs inside an eval operation could
|
||||
be caught by the extension code and propagated further up the call stack.
|
||||
|
||||
<p>
|
||||
<li> In certain cases, Python may be configured to handle the SIGFPE signal for floating point
|
||||
exceptions. The default Python handling of this error is to abort and dump core. However,
|
||||
with WAD, a complete stack traceback will be obtained when a SIGFPE occurs.
|
||||
|
||||
<p>
|
||||
<li> WAD is extremely inefficient. Due to restrictions on the heap and stack,
|
||||
WAD relies heavily on mmap() and a variety of other file
|
||||
operations as it handles errors. It also performs linear searches of symbol and
|
||||
debugging tables. As a result, WAD's generation of a
|
||||
Python exception is several orders of magnitude slower than an ordinary
|
||||
exception.
|
||||
</ul>
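<p>
The symbolic recovery scheme described in the first item above can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not WAD's actual code: the table name <tt>RETURN_POINTS</tt>, the function <tt>find_return_point()</tt>, the innermost-first search order, and the "NULL" return value are all hypothetical. Only the procedure name ``call_builtin'' comes from the text.

```python
# Hypothetical table of symbolic return points: procedure name -> value that
# should be returned to signal an error. Only "call_builtin" is named in the
# text; the "NULL" entry is an assumption made for illustration.
RETURN_POINTS = {"call_builtin": "NULL"}


def find_return_point(stack, table=RETURN_POINTS):
    """Scan a stack trace (innermost frame first) for a known return point.

    Returns (depth, name, return_value) for the first frame whose procedure
    name appears in the table, or None if no frame qualifies.
    """
    for depth, name in enumerate(stack):
        if name in table:
            return depth, name, table[name]
    return None


if __name__ == "__main__":
    # Frame names taken from a C stack trace, innermost first.
    trace = ["doh", "_wrap_doh", "call_builtin"]
    print(find_return_point(trace))
```

Because the lookup uses names rather than addresses, the same table would apply to any interpreter build that keeps those symbols. A procedure such as <tt>eval_code2()</tt> would simply be left out of the table, since no single return value is correct for it.
<p>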
|
||||
|
||||
Finally, there are a number of application specific issues to note:
|
||||
|
||||
<ul>
|
||||
<li> Aggressive compiler optimization techniques may prevent WAD from
|
||||
accurately reporting locations within the original source code.
|
||||
This is particularly problematic with numerical applications where
|
||||
techniques such as procedure inlining can make it impossible to obtain accurate
|
||||
debugging information. Since these types of problems also arise in
|
||||
full-featured debuggers, it is unlikely that they can be easily fixed in WAD (at least not
|
||||
without a considerable amount of work).
|
||||
|
||||
<p>
|
||||
<li> If an application implements its own exception handling,
|
||||
it may provide Python with less information than what would be obtained with WAD.
|
||||
For example, a programmer might implement a function like this:
|
||||
|
||||
<blockquote><pre>
|
||||
void *Malloc(int size) {
    void *ptr;
    ptr = malloc(size);
    if (!ptr) throw("Out of memory");
    return ptr;
}
|
||||
</pre></blockquote>
|
||||
|
||||
In this case, the ``throw'' function may initiate an internal
|
||||
exception handling mechanism that relies upon setjmp/longjmp or C++ exceptions.
|
||||
When the error eventually makes it back to the interpreter, the user will get an ``out
|
||||
of memory'' exception, but no additional information will be
|
||||
provided. In contrast, if the programmer simply used an <tt>assert()</tt> statement, WAD would produce a full stack trace leading to
|
||||
the error.
|
||||
</ul>
|
||||
|
||||
|
||||
Despite its various limitations, WAD is applicable to a wide range of
|
||||
extension-related errors. Furthermore, most of the errors that are
|
||||
likely to occur are of a more benign variety. For example, a
|
||||
segmentation fault may simply be the result of an uninitialized
|
||||
pointer (perhaps the user forgot to call an initialization procedure).
|
||||
Likewise, bus errors, failed assertions, and floating point exceptions
|
||||
rarely result in a situation where the WAD recovery mechanism would be
|
||||
unable to produce a meaningful Python traceback.
|
||||
|
||||
<h3>9. Related Work</h3>
|
||||
|
||||
There is a huge body of literature concerning the implementation of
|
||||
exception handling in various programming languages and environments.
|
||||
A detailed discussion of this work is clearly not possible here, but
|
||||
a general overview of various exception handling issues can be found in [11].
|
||||
In general, there are a few themes that seem to prevail.
|
||||
First,
|
||||
considerable attention has been given to exception handling mechanisms
|
||||
in specific languages such as efficient exception handling for C++.
|
||||
Second, a great deal of work has been devoted to the semantic aspects of
|
||||
exception handling such as exception hierarchies, finalization, and
|
||||
whether or not code is restartable after an exception has occurred.
|
||||
Finally, a fair amount of exception work has been done in the context
|
||||
of component frameworks and distributed systems. Most of this work
|
||||
tends to concentrate on explicit exception handling mechanisms. Very little
|
||||
work appears to have been done in the area of converting hardware generated errors
|
||||
into exceptions.
|
||||
|
||||
<p>
|
||||
With respect to debuggers, quite a lot of work has been done in
|
||||
creating advanced debugging support for specific languages and
|
||||
integrated development environments. However, very little of this work
|
||||
has concentrated on the problem of extensible systems and
|
||||
compiled-interpreted language integration. For instance, debuggers
|
||||
for Python are currently unable to cross over into C extensions whereas C
|
||||
debuggers aren't able to easily extract useful information from the
|
||||
internals of the Python interpreter.
|
||||
|
||||
<p>
|
||||
One system of possible interest is Rn which was developed in the
|
||||
mid-1980s at Rice University [12]. This system, primarily
|
||||
designed for working with large scientific applications written in
|
||||
Fortran, provided an execution monitor that consisted of a special
|
||||
debugging process with an embedded interpreter. When attached to
|
||||
compiled Fortran code, this monitor could dynamically patch
|
||||
the executable in a manner that allowed parts of the code to be executed in the
|
||||
interpreter. This was used to provide a debugging environment in which
|
||||
essentially any part of the compiled application could be modified at
|
||||
run-time by simply compiling the modified code (Fortran) to an
|
||||
interpreted form and inserting a breakpoint in the original executable
|
||||
that transferred control to the interpreter. Although this
|
||||
particular scheme is not directly related to the functionality
|
||||
of WAD, it is one of the few systems in which
|
||||
interpreted and compiled code have been tightly coupled within
|
||||
a debugging framework. Several aspects of the interpreted/compiled
|
||||
interface are closely related to the way in which WAD operates. In addition,
|
||||
various aspects of this work may be useful should WAD be extended with
|
||||
new capabilities.
|
||||
|
||||
<h3>10. Future Directions</h3>
|
||||
|
||||
WAD is currently an experimental prototype. Although this paper has
|
||||
described its use with Python, the core of the system is generic and
|
||||
is easily extended to other programming environments. For example, when
|
||||
linked to C/C++ code, WAD will automatically produce stack
|
||||
traces for fatal errors. A module for generating Tcl exceptions has
|
||||
also been developed. Plans are underway to provide support for other
|
||||
extensible systems including Perl, Ruby, and Guile.
|
||||
|
||||
<p>
|
||||
Finally, a number of extensions to the WAD approach may be possible.
|
||||
For example, even though the current implementation only returns a
|
||||
traceback string to the Python interpreter, the WAD signal handler
|
||||
actually generates a full traceback of the C call stack including all
|
||||
of the CPU registers and a copy of the stack data. Therefore, with a
|
||||
little work, it may be possible to implement a diagnostic tool that
|
||||
allows the state of the C stack to be inspected from the Python
|
||||
interpreter after a crash has occurred. Similarly, it may be possible
|
||||
to integrate the capabilities of WAD with those provided by the Python
|
||||
debugger.
|
||||
|
||||
<h3>11. Conclusions and Availability</h3>
|
||||
|
||||
WAD provides a simple mechanism for converting fatal errors into
|
||||
Python exceptions that provide useful information to extension
|
||||
writers. In doing so, it solves one of the most frustrating aspects
|
||||
of working with compiled Python extensions--that of identifying program errors.
|
||||
Furthermore, the system requires no code modifications to Python and introduces
|
||||
no performance overhead.
|
||||
Although the system is
|
||||
necessarily platform specific, the system does not involve a
|
||||
significant amount of code. As a result, it may be relatively
|
||||
straightforward to port to other Unix systems.
|
||||
|
||||
<p>
|
||||
As of this writing, WAD is still undergoing active development. However,
|
||||
the software is available for experimentation and download at
|
||||
<tt>http://systems.cs.uchicago.edu/wad</tt>.
|
||||
|
||||
<h3>References</h3>
|
||||
|
||||
[1] D.M. Beazley, <em>Using SWIG to Control, Prototype, and Debug C Programs with Python</em>,
|
||||
4th International Python Conference, Livermore, CA. (1996).
|
||||
|
||||
<p>
|
||||
[2] P.F. Dubois, <em>Climate Data Analysis Software</em>, 8th International Python Conference,
|
||||
Arlington, VA. (2000).
|
||||
|
||||
<p>
|
||||
[3] P.F. Dubois, <em>A Facility for Creating Python Extensions in C++</em>, 7th International Python
|
||||
Conference, Houston, TX. (1998).
|
||||
|
||||
<p>
|
||||
[4] SIP. <tt>http://www.thekompany.com/projects/pykde/</tt>.
|
||||
|
||||
<p>
|
||||
[5] FPIG. <tt>http://cens.ioc.ee/projects/f2py2e/</tt>.
|
||||
|
||||
<p>
|
||||
[6] R. Faulkner and R. Gomes, <em>The Process File System and Process Model in UNIX System V</em>, USENIX Conference Proceedings,
|
||||
January 1991.
|
||||
|
||||
<p>
|
||||
[7] J.R. Levine, <em>Linkers &amp; Loaders.</em> Morgan Kaufmann Publishers, 2000.
|
||||
|
||||
<p>
|
||||
[8] Free Software Foundation, <em>The "stabs" debugging format</em>. GNU info document.
|
||||
|
||||
<p>
|
||||
[9] W. Richard Stevens, <em>UNIX Network Programming: Interprocess Communication, Volume 2</em>. PTR
|
||||
Prentice-Hall, 1998.
|
||||
|
||||
<p>
|
||||
[10] S. Chamberlain. <em>libbfd: The Binary File Descriptor Library</em>. Cygnus Support, bfd version 3.0 edition, April 1991.
|
||||
|
||||
<p>
|
||||
[11] M.L. Scott. <em>Programming Language Pragmatics</em>. Morgan Kaufmann Publishers, 2000.
|
||||
|
||||
<p>
|
||||
[12] A. Carle, D. Cooper, R. Hood, K. Kennedy, L. Torczon, S. Warren, <em>A Practical Environment for Scientific Programming.</em>
|
||||
IEEE Computer, Vol 20, No. 11, (1987). p. 75-89.
|
||||
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
993
Tools/WAD/Papers/usenix2001.tex
Normal file
|
|
@ -0,0 +1,993 @@
|
|||
%template for producing IEEE-format articles using LaTeX.
|
||||
%written by Matthew Ward, CS Department, Worcester Polytechnic Institute.
|
||||
%use at your own risk. Complaints to /dev/null.
|
||||
%make two column with no page numbering, default is 10 point
|
||||
%\documentstyle{article}
|
||||
\documentstyle[twocolumn]{article}
|
||||
%\pagestyle{empty}
|
||||
|
||||
%set dimensions of columns, gap between columns, and space between paragraphs
|
||||
%\setlength{\textheight}{8.75in}
|
||||
\setlength{\textheight}{9.0in}
|
||||
\setlength{\columnsep}{0.25in}
|
||||
\setlength{\textwidth}{6.45in}
|
||||
\setlength{\footheight}{0.0in}
|
||||
\setlength{\topmargin}{0.0in}
|
||||
\setlength{\headheight}{0.0in}
|
||||
\setlength{\headsep}{0.0in}
|
||||
\setlength{\oddsidemargin}{0in}
|
||||
%\setlength{\oddsidemargin}{-.065in}
|
||||
%\setlength{\oddsidemargin}{-.17in}
|
||||
%\setlength{\parindent}{0pc}
|
||||
|
||||
%I copied stuff out of art10.sty and modified them to conform to IEEE format
|
||||
|
||||
\makeatletter
|
||||
%as Latex considers descenders in its calculation of interline spacing,
|
||||
%to get 12 point spacing for normalsize text, must set it to 10 points
|
||||
\def\@normalsize{\@setsize\normalsize{12pt}\xpt\@xpt
|
||||
\abovedisplayskip 10pt plus2pt minus5pt\belowdisplayskip \abovedisplayskip
|
||||
\abovedisplayshortskip \z@ plus3pt\belowdisplayshortskip 6pt plus3pt
|
||||
minus3pt\let\@listi\@listI}
|
||||
|
||||
%need an 11 pt font size for subsection and abstract headings
|
||||
\def\subsize{\@setsize\subsize{12pt}\xipt\@xipt}
|
||||
|
||||
%make section titles bold and 12 point, 2 blank lines before, 1 after
|
||||
\def\section{\@startsection {section}{1}{\z@}{24pt plus 2pt minus 2pt}
|
||||
{12pt plus 2pt minus 2pt}{\large\bf}}
|
||||
|
||||
%make subsection titles bold and 11 point, 1 blank line before, 1 after
|
||||
\def\subsection{\@startsection {subsection}{2}{\z@}{12pt plus 2pt minus 2pt}
|
||||
{12pt plus 2pt minus 2pt}{\subsize\bf}}
|
||||
\makeatother
|
||||
|
||||
\newcommand{\ignore}[1]{}
|
||||
%\renewcommand{\thesubsection}{\arabic{subsection}.}
|
||||
|
||||
\begin{document}
|
||||
|
||||
%don't want date printed
|
||||
\date{}
|
||||
|
||||
%make title bold and 14 pt font (Latex default is non-bold, 16 pt)
|
||||
\title{\Large \bf An Embedded Error Recovery and Debugging Mechanism for Scripting Language Extensions}
|
||||
|
||||
%for single author (just remove % characters)
|
||||
\author{{David M.\ Beazley} \\
|
||||
{\em Department of Computer Science} \\
|
||||
{\em University of Chicago }\\
|
||||
{\em Chicago, Illinois 60637 }\\
|
||||
{\em beazley@cs.uchicago.edu }}
|
||||
|
||||
% My Department \\
|
||||
% My Institute \\
|
||||
% My City, ST, zip}
|
||||
|
||||
%for two authors (this is what is printed)
|
||||
%\author{\begin{tabular}[t]{c@{\extracolsep{8em}}c}
|
||||
% Roscoe Giles & Pablo Tamayo \\
|
||||
% \\
|
||||
% Department of Electrical, Computer, & Thinking Machines Corp. \\
|
||||
% and Systems Engineering & Cambridge, MA~~02142. \\
|
||||
% and & \\
|
||||
% Center for Computational Science & \\
|
||||
% Boston University, Boston, MA~~02215. &
|
||||
%\end{tabular}}
|
||||
|
||||
\maketitle
|
||||
|
||||
%I don't know why I have to reset thispagestyle, but otherwise get page numbers
|
||||
\thispagestyle{empty}
|
||||
|
||||
|
||||
\subsection*{Abstract}
|
||||
{\em
|
||||
In recent years, scripting languages such as Perl, Python, and Tcl
|
||||
have become popular development tools for the creation of
|
||||
sophisticated application software. One of the most useful features
|
||||
of these languages is their ability to easily interact with compiled
|
||||
languages such as C and C++. Although this mixed language approach
|
||||
has many benefits, one of the greatest drawbacks is the complexity of
|
||||
debugging that results from using interpreted and compiled code in
|
||||
the same application. In part, this is due to the fact that scripting
|
||||
language interpreters are unable to recover from catastrophic errors in
|
||||
compiled extension code. Furthermore, traditional C/C++ debuggers do
|
||||
not provide a satisfactory degree of integration with interpreted
|
||||
languages. This paper describes an experimental system in which fatal
|
||||
extension errors such as segmentation faults, bus errors, and failed
|
||||
assertions are handled as scripting language exceptions. This system,
|
||||
which has been implemented as a general purpose shared library,
|
||||
requires no modifications to the target scripting language, introduces
|
||||
no performance overhead, and simplifies the debugging of mixed
|
||||
interpreted-compiled application software.
|
||||
}
|
||||
|
||||
\section{Introduction}
|
||||
|
||||
Slightly more than ten years have passed since John Ousterhout
|
||||
introduced the Tcl scripting language at the 1990 USENIX technical
|
||||
conference \cite{ousterhout}. Since then, scripting languages have
|
||||
been gaining in popularity as evidenced by the widespread use of
|
||||
systems such as Tcl, Perl, Python, Guile, PHP, and Ruby
|
||||
\cite{ousterhout,perl,python,guile,php,ruby}.
|
||||
|
||||
In part, the success of modern scripting languages is due to their
|
||||
ability to be easily integrated with software written in compiled
|
||||
languages such as C, C++, and Fortran. In addition, a wide variety of wrapper
|
||||
generation tools can be used
|
||||
to automatically produce bindings between existing code and a
|
||||
variety of scripting language environments
|
||||
\cite{swig,sip,pyfort,f2py,advperl,heidrich,vtk,gwrap,wrappy}. As a result, a large number of
|
||||
programmers are using scripting languages to control
|
||||
complex C/C++ programs or as a tool for re-engineering legacy
|
||||
software. This approach is attractive because it allows programmers
|
||||
to benefit from the flexibility and rapid development of
|
||||
scripting while retaining the best features of compiled code such as high
|
||||
performance \cite{ouster1}.
|
||||
|
||||
A critical aspect of scripting-compiled code integration is the way in
|
||||
which it departs from traditional C/C++ development. Rather than
|
||||
building large monolithic stand-alone applications, scripting
|
||||
languages strongly encourage the creation of modular software
|
||||
components. As a result, scripted software tends to be constructed as
|
||||
a mix of dynamically loadable libraries, scripts, and third-party
|
||||
extension modules. In this sense, one might argue that the benefits of
|
||||
scripting are achieved at the expense of creating a somewhat more
|
||||
complicated development environment.
|
||||
|
||||
A consequence of this complexity is an increased degree of difficulty
|
||||
associated with debugging programs that utilize multiple languages,
|
||||
dynamically loadable modules, and a sophisticated runtime environment.
|
||||
To address this problem, this paper describes an experimental system
|
||||
known as WAD (Wrapped Application Debugger) in which an embedded error
|
||||
recovery and debugging mechanism is added to common scripting
|
||||
languages. This system converts catastrophic signals such as
|
||||
segmentation faults and failed assertions to exceptions that can be
|
||||
handled by the scripting language interpreter. In doing so, it
|
||||
provides more seamless integration between error handling in
|
||||
scripting language interpreters and compiled extensions.
|
||||
|
||||
\section{The Debugging Problem}
|
||||
|
||||
Normally, a programming error in a scripted application
|
||||
results in an exception that describes the problem and the context in
|
||||
which it occurred. For example, an error in a Python script might
|
||||
produce a traceback similar to the following:
|
||||
|
||||
\begin{verbatim}
|
||||
% python foo.py
|
||||
Traceback (innermost last):
|
||||
File "foo.py", line 11, in ?
|
||||
foo()
|
||||
File "foo.py", line 8, in foo
|
||||
bar()
|
||||
File "foo.py", line 5, in bar
|
||||
spam()
|
||||
File "foo.py", line 2, in spam
|
||||
doh()
|
||||
NameError: doh
|
||||
\end{verbatim}
|
||||
|
||||
In this case, a programmer might be able to apply a fix simply based
|
||||
on information in the traceback. Alternatively, if the problem is
|
||||
more complicated, a script-level debugger can be used to provide more information. In contrast,
|
||||
a failure in compiled extension code might produce the following result:
|
||||
|
||||
\begin{verbatim}
|
||||
% python foo.py
|
||||
Segmentation Fault (core dumped)
|
||||
\end{verbatim}
|
||||
|
||||
In this case, the user has no idea of what has happened other
|
||||
than it appears to be ``very bad.'' Furthermore, script-level
|
||||
debuggers are unable to identify the problem since they also crash
|
||||
when the error occurs (they usually run in the same process as
|
||||
the interpreter). A user might be able to narrow the source of the
|
||||
problem through trial-and-error techniques such as inserting print
|
||||
statements or commenting out sections of script code. Unfortunately,
|
||||
neither of these techniques is very attractive for obvious reasons.
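
To make this failure mode concrete, the following Python sketch reproduces it in a child process so that the crash can be observed from a surviving parent. It rests on assumptions that are not part of the system described in this paper: the standard {\tt ctypes} module stands in for a faulty compiled extension, reading from address 0 is assumed to raise SIGSEGV on the host platform, and the helper names {\tt run\_crashing\_child} and {\tt describe} are hypothetical.

```python
import signal
import subprocess
import sys

# ctypes.string_at(0) reads from address 0 inside C code. It stands in for a
# bug in a compiled extension function (an assumption made for illustration).
CRASHING_SCRIPT = "import ctypes; ctypes.string_at(0)"


def run_crashing_child():
    """Run the faulty script in a separate interpreter and return the result."""
    return subprocess.run(
        [sys.executable, "-c", CRASHING_SCRIPT],
        capture_output=True,
        text=True,
    )


def describe(returncode):
    """Translate a subprocess return code into a signal name or exit status."""
    if returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        return signal.Signals(-returncode).name
    return "exit status %d" % returncode


if __name__ == "__main__":
    result = run_crashing_child()
    print("child terminated with:", describe(result.returncode))
```

The parent learns only which signal killed the child. Nothing identifies the script line or the extension function involved, which is exactly the information a script-level traceback would normally supply.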
|
||||
|
||||
Alternatively, a user could run the application under the control of a
|
||||
traditional debugger such as gdb \cite{gdb}. Unfortunately, this also has
|
||||
drawbacks. First, even though the debugger provides information about the error,
|
||||
it mostly provides information about the internal
|
||||
implementation of the scripting language interpreter. Needless
|
||||
to say, this isn't very useful nor does it provide much insight as to
|
||||
where the error might have occurred within a script. Second,
|
||||
the structure of a scripted application tends to be much more complex
|
||||
than a traditional stand-alone program. As a result, a user may not
|
||||
have a good sense of how to actually attach a C/C++ debugger to their
|
||||
script. In addition, execution may occur within a
|
||||
complex run-time environment involving events, threads, and network
|
||||
connections. Because of this, it can be difficult to reproduce
|
||||
and identify certain types of catastrophic errors (especially if they
|
||||
depend on timing or peculiar sequences of events). Finally, this approach
|
||||
assumes that a programmer has a C/C++ development environment installed on
|
||||
their machine and that they know how to use a low-level
|
||||
debugger. Unfortunately, neither of these assumptions may hold in practice.
|
||||
This is because scripting languages are often used to provide programmability to
|
||||
applications in which end-users might write scripts, yet would not be expected
|
||||
to write low-level C code.
|
||||
|
||||
Even if a traditional debugger such as gdb were modified to
|
||||
provide better integration with scripting languages, it is not clear
|
||||
that this would be the most natural solution to the problem.
|
||||
For one, the whole notion of having to run a separate debugging process to debug
|
||||
extension code is unnatural when no such requirement exists for
|
||||
a script. Furthermore, even if such a debugger existed, an inexperienced user may not
|
||||
have the expertise or inclination to use it. Finally,
|
||||
obscure fatal errors may occur long after an application has been deployed.
|
||||
Unless the debugger is distributed along with the application in some manner, it will be
|
||||
extraordinarily difficult to obtain useful diagnostics when such errors occur.
|
||||
|
||||
\begin{figure*}[t]
|
||||
{\small
|
||||
\begin{verbatim}
|
||||
% python foo.py
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in ?
|
||||
File "foo.py", line 16, in ?
|
||||
foo()
|
||||
File "foo.py", line 13, in foo
|
||||
bar()
|
||||
File "foo.py", line 10, in bar
|
||||
spam()
|
||||
File "foo.py", line 7, in spam
|
||||
doh.doh(a,b,c)
|
||||
|
||||
SegFault: [ C stack trace ]
|
||||
|
||||
#2 0x00027774 in call_builtin(func=0x1c74f0,arg=0x1a1ccc,kw=0x0) in 'ceval.c', line 2650
|
||||
#1 0xff083544 in _wrap_doh(self=0x0,args=0x1a1ccc) in 'foo_wrap.c', line 745
|
||||
#0 0xfe7e0568 in doh(a=0x3,b=0x4,c=0x0) in 'foo.c', line 28
|
||||
|
||||
/u0/beazley/Projects/WAD/Python/foo.c, line 28
|
||||
|
||||
int doh(int a, int b, int *c) {
|
||||
=> *c = a + b;
|
||||
return *c;
|
||||
}
|
||||
\end{verbatim}
|
||||
}
|
||||
\caption{Cross language traceback generated for a segmentation fault in a Python extension}
|
||||
\end{figure*}
|
||||
|
||||
The easiest solution to the debugging problem is
|
||||
to simply add as much error checking as possible. Although this is never
|
||||
a bad thing to do, it's usually not enough to completely eliminate the problem.
|
||||
For one, scripting languages are sometimes used to control hundreds
|
||||
of thousands to millions of lines of compiled code. In this case, it is improbable
|
||||
that a programmer will be able to foresee every conceivable error.
|
||||
Second, scripting languages are often used to put new user interfaces on legacy software. In this
|
||||
case, scripting may introduce new modes of execution that cause a formerly ``bug-free''
|
||||
application to fail in an unexpected manner. Finally, certain types
|
||||
of errors such as floating-point exceptions can be particularly
|
||||
difficult to eliminate because they might be generated algorithmically (e.g.,
|
||||
as the result of a numerical method). Therefore, even when a programmer has worked hard to eliminate
|
||||
crashes, there is always a small probability that a complex application
|
||||
will fail.
|
||||
|
||||
\section{Embedded Error Recovery}
|
||||
|
||||
Rather than modifying an existing debugger to support scripting
|
||||
languages, an alternative approach is to add a more powerful error
|
||||
handling and recovery mechanism to the scripting language interpreter.
|
||||
This approach has been implemented in the form of an
|
||||
experimental system known as WAD. WAD
|
||||
is packaged as a dynamically loadable shared library that can either be
|
||||
loaded as a scripting language extension or linked to existing
|
||||
extension modules as a library. The core of the system is generic and
|
||||
requires no modifications to the scripting interpreter or existing
|
||||
extension modules. Furthermore, the system does not introduce a performance penalty as it
|
||||
does not rely upon program instrumentation or tracing.
|
||||
|
||||
WAD works by converting fatal signals such as SIGSEGV,
|
||||
SIGBUS, SIGFPE, and SIGABRT into scripting language exceptions that contain
|
||||
debugging information collected from the call-stack of compiled
|
||||
extension code. By handling errors in this manner, the scripting
|
||||
language interpreter is able to produce a cross-language stack trace that
|
||||
contains information from both the script code and extension code as
|
||||
shown for Python and Tcl/Tk in Figures 1 and 2. In this case, the user
|
||||
is given a very clear idea of what has happened without having
|
||||
to launch a separate debugger.
|
||||
|
||||
The advantage to this approach is that it provides
|
||||
more seamless integration between error handling
|
||||
in scripts and error handling in extensions. In addition, it eliminates
|
||||
the most common debugging step that a developer is likely to perform
|
||||
in the event of a fatal error--running a separate debugger on a core
|
||||
file and typing 'where' to get a stack trace. Finally, this allows
|
||||
end-users to provide extension writers with useful debugging
|
||||
information since they can supply a stack trace as opposed to a vague
|
||||
complaint that the program ``crashed.''
|
||||
|
||||
\begin{figure*}[t]
|
||||
\begin{picture}(400,250)(0,0)
|
||||
\put(50,-110){\special{psfile = tcl.ps hscale = 60 vscale = 60}}
|
||||
\end{picture}
|
||||
\caption{Dialogue box with traceback information for a failed assertion in a Tcl/Tk extension}
|
||||
\end{figure*}
|
||||
|
||||
\section{Scripting Language Internals}
|
||||
|
||||
In order to provide embedded error recovery, it is critical to understand how
|
||||
scripting language interpreters interface with extension code. Despite the wide variety
|
||||
of scripting languages, essentially every implementation uses a similar
|
||||
technique for accessing foreign code.
|
||||
|
||||
The most widely used extension mechanism is a foreign function
|
||||
interface in which compiled procedures can be called from the scripting language
|
||||
interpreter. This is accomplished by writing a collection of wrapper functions that conform
|
||||
to a specified calling convention. The primary purpose of the wrappers is to
|
||||
marshal arguments and return values between the two languages and to handle errors.
|
||||
For example, in Tcl, every wrapper
|
||||
function must conform to the following prototype:
|
||||
|
||||
\begin{verbatim}
|
||||
int
wrap_foo(ClientData clientData,
         Tcl_Interp *interp,
         int objc,
         Tcl_Obj *CONST objv[])
{
    /* Convert arguments */
    ...
    /* Call a function */

    result = foo(args);

    /* Set result */
    ...
    if (success) {
        return TCL_OK;
    } else {
        return TCL_ERROR;
    }
}
|
||||
\end{verbatim}
|
||||
|
||||
The other extension mechanism is an object/type interface that allows programmers to create new
|
||||
kinds of fundamental types or attach special properties to objects in
|
||||
the interpreter. This usually involves setting up tables of function
|
||||
pointers that define various properties of an object. For example, if
|
||||
you wanted to add complex numbers to an interpreter, you might fill in a special
|
||||
data structure with pointers to various methods like this:
|
||||
|
||||
\begin{verbatim}
|
||||
NumberMethods ComplexMethods = {
    complex_add,
    complex_sub,
    complex_mul,
    complex_div,
    ...
};
|
||||
\end{verbatim}
|
||||
|
||||
\noindent
|
||||
Once registered with the interpreter, the methods in this structure
|
||||
would be invoked by various interpreter operators such as $+$,
|
||||
$-$, $*$, and $/$.
|
||||
|
||||
Most interpreters handle errors as a two-step process in which
|
||||
detailed error information is first registered with the interpreter
|
||||
and then a special error code is returned. For example, in Tcl, errors
|
||||
are handled by setting error information in the interpreter and
|
||||
returning a value of TCL\_ERROR. Similarly in Python, errors are
|
||||
handled by raising an exception and returning NULL. In both cases,
|
||||
this triggers the interpreter's error handler---possibly resulting in
|
||||
a stack trace of the running script. In some cases, an interpreter
|
||||
might handle errors using a form of the C {\tt longjmp} function.
|
||||
For example, Perl provides a special function {\tt die} that jumps back
|
||||
to the interpreter with a fatal error \cite{advperl}.
|
||||
|
||||
The precise implementation details of these mechanisms aren't so
|
||||
important for our discussion. The critical point is that scripting
|
||||
languages always access extension code through a well-defined interface
|
||||
that precisely defines how arguments are to be passed, values are to be
|
||||
returned, and errors are to be handled.
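
The two-step error convention can be modeled in a few lines of Python. This is a toy model under stated assumptions, not the implementation of any real interpreter: the class {\tt ToyInterp}, the exception {\tt ScriptError}, and the functions {\tt divide} and {\tt wrap\_divide} are hypothetical names introduced for illustration. Only the status codes follow Tcl, where {\tt TCL\_OK} is 0 and {\tt TCL\_ERROR} is 1.

```python
TCL_OK, TCL_ERROR = 0, 1  # status codes with the same values Tcl uses


class ScriptError(Exception):
    """Raised by the toy interpreter when a wrapper reports failure."""


class ToyInterp:
    """A minimal stand-in for an interpreter that dispatches to wrappers."""

    def __init__(self):
        self.commands = {}
        self.result = None  # holds either a return value or an error message

    def register(self, name, wrapper):
        self.commands[name] = wrapper

    def call(self, name, *args):
        status = self.commands[name](self, args)
        if status != TCL_OK:
            # The error code triggers the interpreter's error handler, which
            # picks up the message that the wrapper registered beforehand.
            raise ScriptError("%s: %s" % (name, self.result))
        return self.result


def divide(a, b):
    """Stand-in for a compiled procedure."""
    return a // b


def wrap_divide(interp, args):
    # Convert arguments
    try:
        a, b = (int(x) for x in args)
    except (TypeError, ValueError):
        interp.result = "expected two integers"   # step 1: register details
        return TCL_ERROR                          # step 2: return error code
    if b == 0:
        interp.result = "divide by zero"
        return TCL_ERROR
    # Call the function and set the result
    interp.result = divide(a, b)
    return TCL_OK


if __name__ == "__main__":
    interp = ToyInterp()
    interp.register("divide", wrap_divide)
    print(interp.call("divide", "12", "4"))
```

The model also shows what the convention cannot cover. If {\tt divide} were real compiled code that faulted, control would never reach either {\tt return} statement in the wrapper, so the interpreter would receive neither an error message nor a status code.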
|
||||
|
||||
\section{Scripting Languages and Signals}

Under normal circumstances, errors in extension code are handled
through the error-handling API provided by the scripting language
interpreter. For example, if an invalid function parameter is passed,
a program can simply set an error message and return to the
interpreter. Similarly, automatic wrapper generators such as SWIG can produce
code to convert C++ exceptions and other C-related error handling
schemes to scripting language errors \cite{swigexcept}. On the other
hand, segmentation faults, failed assertions, and similar problems
produce signals that cause the interpreter to crash.

Most scripting languages provide limited support for Unix signal
handling \cite{stevens}. However, this support is not sufficiently advanced to
recover from fatal signals produced by extension code.
First, unlike signals generated for asynchronous events such as I/O,
execution can {\em not} be resumed at the point of a fatal signal.
Therefore, even if such a signal could be caught and handled by a script,
there isn't much that it can do except to print a diagnostic
message and abort before the signal handler returns. Second,
some interpreters block signal delivery while executing
extension code, opting to handle signals at a more convenient time.
In this case, a signal such as SIGSEGV would simply cause the whole application
to freeze since there is no way for execution to continue to a point where
the signal could be delivered. Because of these issues, scripting languages
either ignore the problem or label it as a ``limitation.''

\section{Overview of WAD}

WAD installs a reliable signal handler for
SIGSEGV, SIGBUS, SIGABRT, SIGILL, and SIGFPE using {\tt sigaction}
\cite{stevens}. Since none of these signals are normally used in the implementation
of the scripting interpreter or by any user scripts, this typically does not override any previous
signal handling. Afterwards, when one of these signals occurs, a two-phase
recovery process executes. First,
information is collected about the execution context including a
full stack-trace, symbol table entries, and debugging information.
Second, the current stream of execution is aborted and an error is
returned to the interpreter. This process is illustrated in Figure~3.

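The installation step can be sketched as follows. This is a minimal
example of registering one handler for the five signals listed above with
{\tt sigaction}, not the actual WAD source; the names
{\tt fatal\_handler} and {\tt install\_fatal\_handlers} are assumptions
made for illustration, and the handler body is left empty.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* The signals the text lists as handled by WAD. */
static const int fatal_signals[] = { SIGSEGV, SIGBUS, SIGABRT, SIGILL, SIGFPE };

/* Placeholder handler: a real one would collect the stack trace here. */
static void fatal_handler(int signo, siginfo_t *info, void *context) {
    (void) signo; (void) info; (void) context;
}

/* Install the handler for every fatal signal; 0 on success, -1 on failure. */
static int install_fatal_handlers(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fatal_handler;
    sa.sa_flags = SA_SIGINFO;   /* request siginfo_t and the CPU context */
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof fatal_signals / sizeof fatal_signals[0]; i++) {
        if (sigaction(fatal_signals[i], &sa, NULL) != 0)
            return -1;
    }
    return 0;
}
```

The {\tt SA\_SIGINFO} flag matters here because the third handler
argument carries the saved CPU context, which is where the program
counter and stack pointer mentioned in the next paragraph come from.
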
The collection of context and debugging information is a relatively
straightforward process involving the following steps:

\begin{itemize}
\item The program counter and stack pointer are obtained from
context information passed to the WAD signal handler.

\item The virtual memory map of the process is obtained from /proc
and used to associate virtual memory addresses with executable files,
shared libraries, and dynamically loaded extension modules \cite{proc}.

\item The call stack is unwound to collect traceback information.
At each step of the stack traceback, symbol table and debugging
information is gathered and stored in a generic data structure for later use
in the recovery process. This data is obtained by memory-mapping
the ELF format object files associated with the process and extracting
symbol table and stabs debugging information \cite{elf,stabs}.
\end{itemize}

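The second step above, associating an address with a mapped file, can be
sketched as follows. The example assumes the textual
{\tt /proc/<pid>/maps} format found on Linux (other systems expose the
memory map differently), and the names {\tt MapRegion},
{\tt parse\_maps\_line}, and {\tt region\_contains} are hypothetical
rather than taken from WAD.

```c
#include <assert.h>
#include <stdio.h>

/* One region of the virtual memory map (field names are illustrative). */
typedef struct {
    unsigned long start;   /* first address of the region */
    unsigned long end;     /* one past the last address */
    char perms[5];         /* e.g. "r-xp" */
    char path[256];        /* backing file, or "" for anonymous memory */
} MapRegion;

/* Parse one line in Linux /proc/<pid>/maps format; returns 1 on success. */
static int parse_maps_line(const char *line, MapRegion *out) {
    unsigned long offset, inode;
    unsigned int dev_major, dev_minor;
    int n;
    assert(line != NULL && out != NULL);
    out->path[0] = '\0';
    n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255s",
               &out->start, &out->end, out->perms,
               &offset, &dev_major, &dev_minor, &inode, out->path);
    return n >= 7;   /* the path field is absent for anonymous regions */
}

/* Does the region contain an address such as a saved program counter? */
static int region_contains(const MapRegion *r, unsigned long addr) {
    return addr >= r->start && addr < r->end;
}
```

A handler built along these lines would read the map line by line and
stop at the first region that contains the program counter of the frame
being examined.
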
Once debugging information has been collected, the signal handler
enters an error-recovery phase that
attempts to raise an exception and return to a suitable location in the
interpreter. To do this, the following steps are performed:

\begin{itemize}

\item The stack trace is examined to see if there are any locations to which
control can be returned.

\item If a suitable return location is found, the CPU context is modified in
a manner that makes the signal handler return to the interpreter
with an error. This return process is assisted by a small
trampoline function (partially written in assembly language) that arranges a proper
return to the interpreter after the signal handler returns.
\end{itemize}

\noindent
Of the two phases, the return to the interpreter is of greater interest. Therefore, it
is now described in greater detail.

\begin{figure*}[t]
\begin{picture}(480,340)(5,60)

\put(50,330){\framebox(200,70){}}
\put(60,388){\tt >>> {\bf foo()}}
\put(60,376){\tt Traceback (most recent call last):}
\put(70,364){\tt File "<stdin>", line 1, in ?}
\put(60,352){\tt SegFault: [ C stack trace ]}
\put(60,340){\tt ...}

\put(55,392){\line(-1,0){25}}
\put(30,392){\line(0,-1){80}}
\put(30,312){\line(1,0){95}}
\put(125,312){\vector(0,-1){10}}
\put(175,302){\line(0,1){10}}
\put(175,312){\line(1,0){95}}
\put(270,312){\line(0,1){65}}
\put(270,377){\vector(-1,0){30}}

\put(50,285){\framebox(200,15)[c]{[Python internals]}}
\put(125,285){\vector(0,-1){10}}
\put(175,275){\vector(0,1){10}}
\put(50,260){\framebox(200,15)[c]{call\_builtin()}}
\put(125,260){\vector(0,-1){10}}
%\put(175,250){\vector(0,1){10}}
\put(50,235){\framebox(200,15)[c]{wrap\_foo()}}
\put(125,235){\vector(0,-1){10}}
\put(50,210){\framebox(200,15)[c]{foo()}}
\put(125,210){\vector(0,-1){10}}
\put(50,185){\framebox(200,15)[c]{doh()}}
\put(125,185){\vector(0,-1){20}}
\put(110,148){SIGSEGV}
\put(160,152){\vector(1,0){100}}
\put(260,70){\framebox(200,100){}}
\put(310,155){WAD signal handler}
\put(265,140){1. Unwind C stack}
\put(265,125){2. Gather symbols and debugging info}
\put(265,110){3. Find safe return location}
\put(265,95){4. Raise Python exception}
\put(265,80){5. Modify CPU context and return}

\put(260,185){\framebox(200,15)[c]{return assist}}
\put(365,174){Return from signal}
\put(360,170){\vector(0,1){15}}
\put(360,200){\line(0,1){65}}

%\put(360,70){\line(0,-1){10}}
%\put(360,60){\line(1,0){110}}
%\put(470,60){\line(0,1){130}}
%\put(470,190){\vector(-1,0){10}}

\put(360,265){\vector(-1,0){105}}
\put(255,250){NULL}
\put(255,270){Return to interpreter}

\end{picture}

\caption{Control Flow of the Error Recovery Mechanism for Python}
\label{wad}
\end{figure*}

\section{Returning to the Interpreter}

To return to the interpreter, WAD maintains a table of symbolic names
and return values that correspond to locations within the interpreter responsible for invoking
wrapper functions and object/type methods. For example, Table 1 shows a partial list of
return locations used in the Python implementation. When an error
occurs, the call stack is scanned for the first occurrence of any
symbol in this table. If a match is found, control is returned to that location
by emulating the return of a wrapper function with the error code from the table. If
no match is found, the error handler simply prints a stack trace to
standard output and aborts.

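The table scan described above can be sketched as follows. This is an
illustrative model rather than the WAD implementation: the stack trace is
simplified to an array of symbol names ordered from the innermost frame
outward, and {\tt ReturnPoint} and {\tt find\_return\_point} are
hypothetical names. The table entries are a subset of those in Table 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A symbolic return point and the error value its caller expects. */
typedef struct {
    const char *symbol;
    long        error_value;   /* 0 stands for NULL, -1 for an integer error */
} ReturnPoint;

/* A few entries taken from Table 1. */
static const ReturnPoint return_points[] = {
    { "call_builtin",           0 },
    { "PyObject_Print",        -1 },
    { "PyObject_CallFunction",  0 },
    { "PyObject_Cmp",          -1 },
};

/*
 * Scan a stack trace (innermost frame first) for the first frame whose
 * symbol appears in the table. Returns the frame index, or -1 if no
 * frame matches; *match receives the table entry or NULL.
 */
static int find_return_point(const char *const frames[], size_t nframes,
                             const ReturnPoint **match) {
    const size_t npoints = sizeof return_points / sizeof return_points[0];
    assert(frames != NULL && match != NULL);
    for (size_t f = 0; f < nframes; f++) {
        for (size_t p = 0; p < npoints; p++) {
            if (strcmp(frames[f], return_points[p].symbol) == 0) {
                *match = &return_points[p];
                return (int) f;
            }
        }
    }
    *match = NULL;
    return -1;
}
```

For the call chain shown in the control-flow figure ({\tt doh},
{\tt foo}, {\tt wrap\_foo}, {\tt call\_builtin}), the scan would stop at
{\tt call\_builtin} and select NULL as the value to return.
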
When a symbolic match is found, WAD invokes a special user-defined
handler function that is written for a specific scripting language.
The primary role of this handler is to take debugging information
gathered from the call stack and generate an appropriate scripting language error.
One peculiar problem of this step is that the generation
of an error may require the use of parameters passed to a
wrapper function. For example, in the Tcl wrapper shown earlier, one
of the arguments was an object of type ``{\tt Tcl\_Interp *}''.
This object contains information specific to the state of the
interpreter (and multiple interpreter objects may exist in a single
application). Unfortunately, no reference to the interpreter object is
available in the signal handler. Furthermore, the interpreter
object may not be available in the context of a function that generated the error.

\begin{table}[t]
\begin{center}
\begin{tabular}{ll}
Python symbol & Return value \\ \hline
call\_builtin & NULL \\
PyObject\_Print & -1 \\
PyObject\_CallFunction & NULL \\
PyObject\_CallMethod & NULL \\
PyObject\_CallObject & NULL \\
PyObject\_Cmp & -1 \\
PyObject\_DelAttrString & -1 \\
PyObject\_DelItem & -1 \\
PyObject\_GetAttrString & NULL \\
\end{tabular}
\end{center}
\caption{A partial list of symbolic return locations in the Python interpreter}
\label{returnpoints}
\end{table}

To work around this problem, WAD implements a feature
known as argument stealing. When examining the call-stack, the signal
handler has full access to all function arguments and local variables.
Therefore, if the handler knows that an error was generated while
calling a wrapper function (as determined by looking at the symbol names),
it can grab the interpreter object from the stack frame of the wrapper and
use it to set an appropriate error code before returning to the interpreter.
Currently, this is managed by allowing the signal handler to steal
arguments from the caller using positional information.
For example, to grab the {\tt Tcl\_Interp *} object from a Tcl wrapper function,
code similar to the following is written:

\begin{verbatim}
Tcl_Interp *interp;
int         err;

interp = (Tcl_Interp *) wad_steal_outarg(
               stack,
               "TclExecuteByteCode",
               1,
               &err);
if (!err) {
    Tcl_SetResult(interp,errtype,TCL_STATIC);
    Tcl_AddErrorInfo(interp,errdetails);
}
\end{verbatim}

In this case, the 2nd argument passed to a wrapper function
(position 1, counting from zero) is stolen and used to generate an error.
Also, the name {\tt TclExecuteByteCode}
refers to the calling function, not the wrapper function itself.
At this time, argument stealing is only applicable to simple types
such as integers and pointers. However, this is adequate for generating
scripting language errors.

\section{Register Management}

A final issue concerning the return mechanism has to do with the
precise behavior of the non-local return to the interpreter. Roughly
speaking, this emulates the behavior of the C {\tt longjmp}
library call. However, this is done without the use of a matching
{\tt setjmp} in the interpreter.

The primary problem with aborting execution and returning to the
interpreter in this manner is that most compilers use a register management technique
known as callee-save \cite{prag}. In this case, it is the responsibility of
the called function to save the state of the registers and to restore
them before returning to the caller. By making a non-local jump,
registers may be left in an inconsistent state due to the fact that
they are not restored to their original values. The {\tt longjmp} function
in the C library avoids this problem by relying upon {\tt setjmp} to save
the registers. Unfortunately, WAD does not have this
luxury. As a result, a return from the signal handler may produce a
corrupted set of registers at the point of return in the interpreter.

The severity of this problem depends greatly on the architecture and
compiler. For example, on the SPARC, register windows effectively
solve the callee-save problem \cite{sparc}. In this case, each stack frame has its own
register window and the windows are flushed to the stack whenever a
signal occurs. Therefore, the recovery mechanism can examine the stack and
arrange to restore the registers to their proper values when control
is returned. Furthermore, certain conventions of the SPARC ABI resolve several related
issues. For example, floating point registers are caller-saved
and the contents of the SPARC global registers are not guaranteed to be preserved
across procedure calls (in fact, they are not even saved by {\tt setjmp}).

On other platforms, the problem of register management becomes much
more interesting. One approach is to simply ignore the problem
altogether and return to the interpreter with the registers in an
essentially random state. Surprisingly, this approach actually seems to work (although a considerable degree of
caution might be in order).
This is because the return of an error code tends to trigger
a cascade of procedure returns within the implementation of the interpreter.
As a result, the values of the registers are simply discarded and
overwritten with restored values as the interpreter unwinds itself and prepares to handle an
exception. A better solution to this problem is to modify the recovery mechanism to discover and
restore saved registers from the stack. Unfortunately, there is
no standardized way to know exactly where the registers might have been saved.
Therefore, a heuristic scheme that examines the machine code for each procedure would
have to be used to try to identify stack locations. This approach is used by gdb
and other debuggers when they allow users to inspect register values
within arbitrary stack frames \cite{gdb}. However, this technique has
not yet been implemented in WAD due to its obvious implementation difficulty and the
fact that the WAD prototype has primarily been developed for the SPARC.

As a fall-back, WAD can be configured to return control to a location
previously specified with {\tt setjmp}. Unfortunately, this
requires modifications to either the interpreter or its extension modules.
Although this kind of instrumentation can be facilitated by automatic
wrapper code generators, it is not a preferred solution and is
not discussed further.

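The {\tt setjmp}-based fall-back can be sketched as follows. This is a
generic illustration of the technique using the POSIX
{\tt sigsetjmp}/{\tt siglongjmp} pair, not the way WAD itself is
configured; the names {\tt call\_protected}, {\tt jump\_back\_handler},
and {\tt faulty\_extension} are hypothetical, and the fault is simulated
with {\tt raise}.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recovery_point;                  /* saved by sigsetjmp */
static volatile sig_atomic_t recovery_armed = 0;
static volatile sig_atomic_t caught_signal = 0;

/* On a fatal signal, jump back to the saved location if one is armed. */
static void jump_back_handler(int signo) {
    if (recovery_armed) {
        caught_signal = signo;
        siglongjmp(recovery_point, 1);
    }
}

/*
 * Call fn() with a recovery point in place. Returns 0 if fn() returned
 * normally, or the signal number if it was aborted by SIGSEGV or SIGFPE.
 */
static int call_protected(void (*fn)(void)) {
    struct sigaction sa, old_segv, old_fpe;

    assert(fn != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = jump_back_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGFPE, &sa, &old_fpe);

    caught_signal = 0;
    if (sigsetjmp(recovery_point, 1) == 0) {   /* 1: also save the signal mask */
        recovery_armed = 1;
        fn();
    }
    recovery_armed = 0;
    sigaction(SIGSEGV, &old_segv, NULL);       /* restore previous handlers */
    sigaction(SIGFPE, &old_fpe, NULL);
    return (int) caught_signal;
}

/* Stand-in for faulty extension code: raises SIGSEGV on itself. */
static void faulty_extension(void) { raise(SIGSEGV); }

/* Stand-in for extension code that completes normally. */
static void healthy_extension(void) { }
```

Because {\tt sigsetjmp} saves the callee-saved registers itself, this
variant avoids the register problem discussed above, at the cost of
instrumenting every call site that should be recoverable.
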
\section{Implementation Details}

Currently, WAD is implemented in ANSI C and a small amount of assembly
code to assist in the return to the interpreter. The current
implementation supports Python, Tcl, and Perl extensions on SPARC Solaris. An
i386-Linux port has also been developed. The entire implementation contains
approximately 1500 semicolons and most of this code is related to the gathering of debugging
information. Furthermore, due to the hostile environment in which the
recovery process must run, the implementation takes great care not to utilize the
process heap. This allows the signal handler to collect information in situations
where the heap allocator has been corrupted or destroyed in some manner.

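One common way to avoid the process heap is to serve allocations from a
statically reserved buffer. The sketch below illustrates that general
idea only; the text does not say how WAD obtains its memory, and
{\tt arena\_alloc}, {\tt arena\_reset}, and {\tt ARENA\_SIZE} are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SIZE 4096

/* Statically reserved memory, so malloc() is never called. */
static union {
    unsigned char bytes[ARENA_SIZE];
    long double   align;   /* gives the buffer the alignment of long double */
} arena;
static size_t arena_used = 0;

/* Hand out nbytes, rounded up to a multiple of 16; NULL when exhausted. */
static void *arena_alloc(size_t nbytes) {
    size_t rounded = (nbytes + 15u) & ~(size_t) 15u;
    void *p;
    assert(arena_used <= ARENA_SIZE);
    if (rounded < nbytes || rounded > ARENA_SIZE - arena_used)
        return NULL;       /* size overflowed or the arena is full */
    p = arena.bytes + arena_used;
    arena_used += rounded;
    return p;
}

/* Release everything at once, e.g. after the error has been reported. */
static void arena_reset(void) {
    arena_used = 0;
}
```

Since nothing here depends on allocator metadata, such a scheme keeps
working even when the fault was caused by heap corruption.
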
Although there are libraries such as the GNU Binary File Descriptor
(BFD) library that can assist with the manipulation of object files,
these are not used in the implementation \cite{bfd}. First, these
libraries tend to be quite large and are oriented more towards
stand-alone tools such as debuggers, linkers, and loaders. Second,
the behavior of these libraries with respect to memory management
would need to be carefully studied before they could be safely used in
an embedded environment. Finally, given the small size of the
implementation, it didn't seem necessary to rely upon such a
heavyweight solution.

\section{Discussion}

The primary goal of embedded error recovery is to provide an
alternative approach for debugging scripting language extensions.
Although this approach has many benefits, there are a number of
drawbacks and issues that must be discussed.

First, like the C {\tt longjmp} function, the error recovery mechanism
does not cleanly unwind the call stack. For C++, this means that
objects allocated on the stack will not be finalized (destructors will not
be invoked) and that memory allocated on the heap may be
leaked. Similarly, open files, sockets, and other
system resources may be left unreleased. Furthermore, in a multi-threaded environment,
deadlock may occur if a procedure holds a lock when an error occurs.

Second, the use of signals may interact adversely with both scripting
language signal handling and signal handling in thread libraries.
Since scripting languages ordinarily do not catch signals such as
SIGSEGV, SIGBUS, and SIGABRT, the use of WAD is unlikely to conflict
with any existing signal handling. However, this does not prevent a
module from overriding the error recovery mechanism with its own
signal handler. Threads present a different sort of signal handling problem
due to the fact that thread libraries tend to override default signal handling \cite{thread}.
In this case, the thread library directs fatal signals to the thread in which the problem occurred.
However, first-hand experience has shown that certain implementations
of user threads do not reliably pass signal context information nor do
they universally support advanced signal operations such as {\tt
sigaltstack}. Because of this, the WAD recovery mechanism may not be
compatible with a crippled implementation of user threads on certain
platforms. To further complicate matters, the recovery process itself is
not thread-safe (i.e., it is not possible to concurrently handle fatal errors
occurring in different threads).

Third, certain types of errors may result in an unrecoverable crash.
For example, if an application overwrites the heap, it may destroy
critical data structures within the interpreter.
Similarly,
destruction of the call stack (via buffer overflow) makes it
impossible for the recovery mechanism to create a stack-trace and
return to the interpreter. Although it might be possible to add a heuristic scheme for
recovering a partial stack trace such as backward stack tracing, no such feature has been implemented
\cite{debug}. Finally, memory management problems such as
double-freeing of heap allocated memory can cause a system to fail in
a way that bears little resemblance to the actual source of the
problem.

Finally, there are a number of issues that pertain
to the interaction of the recovery mechanism with the interpreter.
First, the recovery scheme is unable to return to procedures
that might invoke wrapper functions with conflicting return codes.
This problem manifests itself when the interpreter's virtual
machine is built around a large {\tt switch} statement from which different
types of wrapper functions are called. For example, in Python, certain
internal procedures call a mix of functions where both NULL and -1 are
returned to indicate errors (depending on the function). In this case, there
is no way for WAD to easily determine which return value to use. Second,
the recovery process is extremely inefficient. This is because the
data collection process relies heavily upon {\tt mmap}, file I/O, and linear search
algorithms for finding symbols and debugging information. Therefore, it would
probably not be suitable as a general purpose exception handling mechanism.
Finally, even when an error is successfully returned to the interpreter
and presented to the user, it may not be possible to resume execution of
the application (e.g., even though the interpreter is operational, the extension
module may be corrupted in some manner).

Despite these limitations, embedded error recovery is applicable to a
wide range of extension-related errors. This is because errors such as
failed assertions, bus errors, and floating point exceptions rarely
result in a situation where the recovery process would be unable to run or the
interpreter would crash. Furthermore, more serious errors such as segmentation faults are more
likely to be caused by an uninitialized pointer than a blatant
destruction of the heap or stack.

\section{Related Work}

A huge body of literature is devoted to the topic of exception
handling in various languages and systems. Furthermore, the topic
remains one of active interest in the software community. For
instance, IEEE Transactions on Software Engineering recently devoted
two entire issues to current trends in exception handling
\cite{except1,except2}. Unfortunately, very little of this work seems
to be directly related to mixed compiled-interpreted exception
handling, recovery from fatal signals, and problems pertaining to
mixed-language debugging.

Perhaps the most directly relevant work is that of advanced programming
environments for Common Lisp \cite{lisp}. Not only does CL have a foreign function interface,
but debuggers such as gdb have also previously been modified to walk the Lisp stack
\cite{ffi,wcl}. Furthermore, certain Lisp development environments have
provided a high degree of integration between compiled code and
the Lisp interpreter\footnote{Note to program committee: I
have been unable to find a suitable reference describing this capability. However,
discussions with Richard Gabriel and other people in the Lisp community seem to indicate that
such work has been done. Please advise.}.

In certain cases, a scripting language module has been used to provide
partial information for fatal signals. For example, the Perl {\tt
sigtrap} module can be used to produce a Perl stack trace when a
problem occurs \cite{perl}. Unfortunately, this module does not
provide any information from the C stack. Similarly, advanced software development
environments such as Microsoft's Visual Studio can automatically launch a C/C++
debugger when an error occurs. Unfortunately, this doesn't provide any information
about the script that was running.

In the area of programming languages, a number of efforts have been made to
map signals to exceptions in the form of asynchronous exception handling
\cite{buhr,ml,haskell}. Unfortunately, this work tends to
concentrate on the problem of handling asynchronous signals related to I/O as opposed
to synchronously generated signals caused by software faults.

With respect to debugging, little work appears to have been done in the area of
mixed compiled-interpreted debugging. Although modern debuggers
certainly try to provide advanced capabilities for debugging within a
single language, they tend to ignore the boundary between languages.
As previously mentioned, debuggers have occasionally been modified to
support other languages such as Common Lisp \cite{wcl}. However, no such work appears
to have been done in the context of modern scripting languages. One system of possible interest
in the context of mixed compiled-interpreted debugging is the R$^{n}$
system developed at Rice University in the mid-1980s \cite{carle}. This
system, primarily developed for scientific computing, allowed control
to transparently pass between compiled code and an interpreter.
Furthermore, the system allowed dynamic patching of an executable in
which compiled procedures could be replaced by interpreted
replacements. Although this system does not directly pertain to the problem of
debugging of scripting language extensions, it is one of the few
examples of a system in which compiled and interpreted code have been
tightly integrated within a debugger.

\section{Future Directions}

As of this writing, WAD is only an experimental prototype. Because of
this, there are certainly a wide variety of incremental improvements
that could be made to support additional platforms and scripting
languages. In addition, there are a variety of improvements that could be made
to provide better integration with threads and C++.

A more interesting extension of this work would be to expose a broader
range of debugging capabilities to the scripting interpreter. For example,
rather than simply raising an exception with limited diagnostic
information, the recovery mechanism might be able to provide the
interpreter with a detailed snapshot of the entire call stack
including symbolic debugging information. Using this information, it
might be possible to implement an interactive post-mortem debugger
that allows a programmer to inspect the values of local
variables and other aspects of the application without leaving the
interpreter. Alternatively, it may be possible to integrate this information
into an existing script-level debugger.

\section{Conclusions and Availability}

This paper has presented a mechanism by which fatal errors such as
segmentation faults and failed assertions can be handled as scripting
language exceptions. This approach, which relies upon advanced
features of Unix signal handling, allows fatal signals to be caught
and transformed into errors from which interpreters can produce an
informative cross-language stack trace. In doing so, it provides more
seamless integration between scripting languages and compiled
extensions. Furthermore, this has the potential to greatly simplify the
frustrating task of debugging complicated mixed scripted-compiled
software.

The prototype implementation of this system is available at:

\begin{center}
{\tt http://systems.cs.uchicago.edu/wad}.
\end{center}

\noindent
Currently, WAD supports Python,
Tcl, and Perl on SPARC Solaris and i386-Linux systems. Work to
support additional scripting languages and platforms is ongoing.

\section{Acknowledgments}

Richard Gabriel and Harlan Sexton provided interesting insights concerning similar capabilities
in Common Lisp.

\begin{thebibliography}{99}

\bibitem{ousterhout} J. K. Ousterhout, {\em Tcl: An Embeddable Command Language},
Proceedings of the USENIX Association Winter Conference, 1990.

\bibitem{ouster1} J. K. Ousterhout, {\em Scripting: Higher-Level Programming for the 21st Century},
IEEE Computer, Vol 31, No. 3, p. 23-30, 1998.

\bibitem{perl} L. Wall, T. Christiansen, and R. Schwartz, {\em Programming Perl}, 2nd. Ed.
O'Reilly \& Associates, 1996.

\bibitem{python} M. Lutz, {\em Programming Python}, O'Reilly \& Associates, 1996.

\bibitem{guile} Thomas Lord, {\em An Anatomy of Guile, The Interface to
Tcl/Tk}, USENIX 3rd Annual Tcl/Tk Workshop 1995.

\bibitem{php} T. Ratschiller and T. Gerken, {\em Web Application Development with PHP 4.0},
New Riders, 2000.

\bibitem{ruby} D. Thomas, A. Hunt, {\em Programming Ruby}, Addison-Wesley, 2001.

\bibitem{swig} D.M. Beazley, {\em SWIG : An Easy to Use Tool for Integrating Scripting Languages with C and C++}, Proceedings of the 4th USENIX Tcl/Tk Workshop, p. 129-139, July 1996.

\bibitem{sip} P. Thompson, {\em SIP},\\
{\tt http://www.thekompany.com/projects/pykde}.

\bibitem{pyfort} P.~F.~Dubois, {\em Climate Data Analysis Software}, 8th International Python Conference,
Arlington, VA., 2000.

\bibitem{f2py} P. Peterson, J. Martins, and J. Alonso,
{\em Fortran to Python Interface Generator with an application to Aerospace
Engineering}, 9th International Python Conference, submitted, 2000.

\bibitem{advperl} S. Srinivasan, {\em Advanced Perl Programming}, O'Reilly \& Associates, 1997.

\bibitem{heidrich} Wolfgang Heidrich and Philipp Slusallek, {\em Automatic Generation of Tcl Bindings for C and C++ Libraries.},
USENIX 3rd Tcl/Tk Workshop, 1995.

\bibitem{vtk} K. Martin, {\em Automated Wrapping of a C++ Class Library into Tcl},
USENIX 4th Tcl/Tk Workshop, p. 141-148, 1996.

\bibitem{gwrap} C. Lee, {\em G-Wrap: A tool for exporting C libraries into Scheme Interpreters},\\
{\tt http://www.cs.cmu.edu/\~{ }chrislee/
Software/g-wrap}.

\bibitem{wrappy} G. Couch, C. Huang, and T. Ferrin, {\em Wrappy :A Python Wrapper
Generator for C++ Classes}, O'Reilly Open Source Software Convention, 1999.

\bibitem{gdb} R. Stallman and R. Pesch, {\em Using GDB: A Guide to the GNU Source-Level Debugger}.
Free Software Foundation and Cygnus Support, Cambridge, MA, 1991.

\bibitem{swigexcept} D.M. Beazley and P.S. Lomdahl, {\em Feeding a
Large-scale Physics Application to Python}, 6th International Python
Conference, co-sponsored by USENIX, p. 21-28, 1997.

\bibitem{stevens} W. Richard Stevens, {\em UNIX Network Programming: Interprocess Communication, Volume 2}. PTR
Prentice-Hall, 1998.

\bibitem{proc} R. Faulkner and R. Gomes, {\em The Process File System and Process Model in UNIX System V}, USENIX Conference Proceedings,
January 1991.

\bibitem{elf} J.~R.~Levine, {\em Linkers \& Loaders.} Morgan Kaufmann Publishers, 2000.

\bibitem{stabs} Free Software Foundation, {\em The "stabs" debugging format}. GNU info document.

\bibitem{prag} M.L. Scott. {\em Programming Language Pragmatics}, Morgan Kaufmann Publishers, 2000.

\bibitem{sparc} D. Weaver and T. Germond, {\em SPARC Architecture Manual Version 9},
Prentice-Hall, 1993.

\bibitem{bfd} S. Chamberlain. {\em libbfd: The Binary File Descriptor Library}. Cygnus Support, bfd version 3.0 edition, April 1991.

\bibitem{thread} F. Mueller, {\em A Library Implementation of POSIX Threads Under Unix},
USENIX Winter Technical Conference, San Diego, CA., p. 29-42, 1993.

\bibitem{debug} J. B. Rosenberg, {\em How Debuggers Work: Algorithms, Data Structures, and
Architecture}, John Wiley \& Sons, 1996.

\bibitem{except1} D.E. Perry, A. Romanovsky, and A. Tripathi, {\em
Current Trends in Exception Handling-Part I},
IEEE Transactions on Software Engineering, Vol 26, No. 9, p. 817-819, 2000.

\bibitem{except2} D.E. Perry, A. Romanovsky, and A. Tripathi, {\em
Current Trends in Exception Handling-Part II},
IEEE Transactions on Software Engineering, Vol 26, No. 10, p. 921-922, 2000.

\bibitem{lisp} G.L. Steele Jr., {\em Common Lisp: The Language, Second Edition}, Digital Press,
Bedford, MA. 1990.

\bibitem{ffi} H. Sexton, {\em Foreign Functions and Common Lisp}, in Lisp Pointers, Vol 1, No. 5, 1988.

\bibitem{wcl} W. Hennessey, {\em WCL: Delivering Efficient Common Lisp Applications Under Unix},
ACM Conference on Lisp and Functional Languages, p. 260-269, 1992.

\bibitem{buhr} P.A. Buhr and W.Y.R. Mok, {\em Advanced Exception Handling Mechanisms}, IEEE Transactions on Software Engineering,
Vol. 26, No. 9, p. 820-836, 2000.

\bibitem{haskell} S. Marlow, S. P. Jones, and A. Moran. {\em
Asynchronous Exceptions in Haskell.} In 4th International Workshop on
High-Level Concurrent Languages, September 2000.

\bibitem{ml} J. H. Reppy, {\em Asynchronous Signals in Standard ML}. Technical Report TR90-1144,
Cornell University, Computer Science Department, 1990.

\bibitem{carle} A. Carle, D. Cooper, R. Hood, K. Kennedy, L. Torczon, S. Warren,
{\em A Practical Environment for Scientific Programming.}
IEEE Computer, Vol 20, No. 11, p. 75-89, 1987.

\end{thebibliography}

\end{document}