diff --git a/Tools/WAD/Papers/README b/Tools/WAD/Papers/README
new file mode 100644
index 000000000..8cac04da5
--- /dev/null
+++ b/Tools/WAD/Papers/README
@@ -0,0 +1,8 @@
+This directory contains papers and information about WAD.
+
+python.html - WAD paper from Python9.
+usenix2001.tex - USENIX 2001 Technical conference submission.
+ This paper was accepted, but the text has not yet
+ been updated to final copy.
+WADTalk.pdf - Slides from the WAD Talk at Python9.
+
diff --git a/Tools/WAD/Papers/WADTalk.pdf b/Tools/WAD/Papers/WADTalk.pdf
new file mode 100644
index 000000000..8ca120d93
Binary files /dev/null and b/Tools/WAD/Papers/WADTalk.pdf differ
diff --git a/Tools/WAD/Papers/fig1.png b/Tools/WAD/Papers/fig1.png
new file mode 100644
index 000000000..dbbfb1dd9
Binary files /dev/null and b/Tools/WAD/Papers/fig1.png differ
diff --git a/Tools/WAD/Papers/python.html b/Tools/WAD/Papers/python.html
new file mode 100644
index 000000000..90282bcea
--- /dev/null
+++ b/Tools/WAD/Papers/python.html
@@ -0,0 +1,860 @@
+
+
+WAD: A Module for Converting Fatal Extension Errors into Python Exceptions
+
+
+
+ +

WAD: A Module for Converting Fatal Extension Errors into Python Exceptions

+
David M. Beazley
+Department of Computer Science
+University of Chicago
+Chicago, IL 60637
+beazley@cs.uchicago.edu
+
+
+ +

Abstract

+One of the more popular uses of Python is as an extension language for applications written in compiled languages such as C, C++, and Fortran. Unfortunately, one of the biggest drawbacks of this approach is the lack of a useful debugging and error handling facility for identifying problems in extension code. In part, this limitation arises because Python knows nothing about the internal implementation of an extension module. A more difficult problem is that compiled extensions sometimes fail with catastrophic errors such as memory access violations, failed assertions, and floating point exceptions. These types of errors fall outside the realm of normal Python exception handling and are particularly difficult to identify and debug. Although traditional debuggers can find the location of a fatal error, they are unable to report the context in which such an error has occurred with respect to a Python script. This paper describes an experimental system that converts fatal extension errors into Python exceptions. In particular, a dynamically loadable module, WAD (Wrapped Application Debugger), has been developed that catches fatal errors, unwinds the call stack, and generates Python exceptions with debugging information. WAD requires no modifications to Python, works with all extension modules, and introduces no performance overhead. An initial implementation of the system is currently available for Sun SPARC Solaris and i386-Linux.

1. Introduction

+One of the primary reasons C, C++, and Fortran programmers are attracted to Python is its ability to serve as an extension language for compiled programs. Furthermore, tools such as SIP, CXX, Pyfort, FPIG, and SWIG make it extremely easy for a programmer to "wrap" existing software into an extension module [1,2,3,4,5]. Although this approach is attractive because it provides a highly usable and flexible environment for users, extension modules suffer from problems not normally associated with Python scripts, especially when they don't work.

+Normally, Python programming errors result in an exception like this:

+% python foo.py
+Traceback (innermost last):
+  File "foo.py", line 11, in ?
+    foo()
+  File "foo.py", line 8, in foo
+    bar()
+  File "foo.py", line 5, in bar
+    spam()
+  File "foo.py", line 2, in spam
+    doh()
+NameError: doh
+% 
+
+Unfortunately, for compiled extensions, the following situation sometimes occurs:
+% python foo.py
+Segmentation Fault (core dumped)
+%
+
+Needless to say, this isn't very informative, other than indicating that something "very bad" happened.

+In order to identify the source of a fatal error, a programmer can run a debugger on the Python executable or on a core file like this:

+% gdb /usr/local/bin/python
+(gdb) run foo.py
+Starting program: /usr/local/bin/python foo.py
+
+Program received signal SIGSEGV, Segmentation fault.
+0xff060568 in doh () from /u0/beazley/Projects/WAD/Python/./libfoo.so
+(gdb) where
+#0  0xff060568 in doh () from /u0/beazley/Projects/WAD/Python/./libfoo.so
+#1  0xff082f34 in _wrap_doh ()
+   from /u0/beazley/Projects/WAD/Python/./dohmodule.so
+#2  0x2777c in call_builtin (func=0x1984b8, arg=0x1a1ccc, kw=0x0)
+    at ceval.c:2650
+#3  0x27648 in PyEval_CallObjectWithKeywords (func=0x1984b8, arg=0x1a1ccc,
+    kw=0x0) at ceval.c:2618
+#4  0x25d18 in eval_code2 (co=0x19acf8, globals=0x0, locals=0x1c7844,
+    args=0x1984b8, argcount=1625472, kws=0x0, kwcount=0, defs=0x0, defcount=0,
+    owner=0x0) at ceval.c:1951
+#5  0x25954 in eval_code2 (co=0x199620, globals=0x0, locals=0x1984b8,
+    args=0x196654, argcount=1862720, kws=0x197788, kwcount=0, defs=0x0,
+#6  0x25954 in eval_code2 (co=0x19ad38, globals=0x0, locals=0x196654,
+    args=0x1962fc, argcount=1862800, kws=0x198e90, kwcount=0, defs=0x0,
+    defcount=0, owner=0x0) at ceval.c:1850
+#7  0x25954 in eval_code2 (co=0x1b6c60, globals=0x0, locals=0x1962fc,
+    args=0x1a1eb4, argcount=1862920, kws=0x0, kwcount=0, defs=0x0, defcount=0,
+    owner=0x0) at ceval.c:1850
+#8  0x22da4 in PyEval_EvalCode (co=0x1b6c60, globals=0x1962c4, locals=0x1962c4)
+    at ceval.c:319
+#9  0x3adb4 in run_node (n=0x18abf8, filename=0x1b6c60 "", globals=0x1962c4,
+    locals=0x1962c4) at pythonrun.c:886
+#10 0x3ad64 in run_err_node (n=0x18abf8, filename=0x1b6c60 "",
+    globals=0x1962c4, locals=0x1962c4) at pythonrun.c:874
+#11 0x3ad38 in PyRun_FileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py",
+    start=1616888, globals=0x1962c4, locals=0x1962c4, closeit=1)
+    at pythonrun.c:866
+#12 0x3a1d8 in PyRun_SimpleFileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py",
+    closeit=1) at pythonrun.c:579
+#13 0x39d84 in PyRun_AnyFileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py",
+    closeit=1) at pythonrun.c:459
+#14 0x1f498 in Py_Main (argc=2, argv=0xffbefc84) at main.c:289
+#15 0x1eec0 in main (argc=2, argv=0xffbefc84) at python.c:10
+
+Unfortunately, even though the debugger identifies the location where the fault occurred, it mostly provides information about the internals of the interpreter. The debugger reveals nothing about the Python program that led to the error (i.e., it does not show the information that a Python traceback would contain). As a result, the debugger is of limited use when it comes to debugging an application that consists of both compiled and Python code.

+Normally, extension developers try to avoid catastrophic errors by adding error handling. If an application is small or customized for use with Python, it can be modified to raise Python exceptions. Automated tools such as SWIG can also convert C++ exceptions and C-related error handling mechanisms into Python exceptions. However, no matter how much error checking is added, there is always a chance that an extension will fail in an unexpected manner. This is especially true for large applications that have been wrapped into an extension module. In addition, certain types of errors, such as floating point exceptions (e.g., division by zero), are especially difficult to find and eliminate. Finally, rigorous error checking may be omitted to improve performance.

+To address these problems, an experimental module known as WAD (Wrapped Application Debugger) has been developed. WAD is able to convert fatal errors into Python exceptions that include information from the call stack as well as debugging information. Because these errors become Python exceptions, they produce a traceback that crosses the boundary between Python code and compiled extension code. This makes it much easier to identify and correct extension-related programming errors. WAD requires no modifications to Python and is compatible with all extension modules. However, it is also highly platform-specific and currently runs only on Sun SPARC Solaris and i386-Linux. The primary goal of this paper is to motivate the problem and to describe one possible solution. In addition, many of the implementation issues associated with providing an integrated error reporting mechanism are described.

2. An Example

+WAD can either be imported as a Python extension module or linked to an extension module. To illustrate, consider the earlier example:
+% python foo.py
+Segmentation Fault (core dumped)
+%
+
+To identify the problem, a programmer can run Python interactively and import WAD as follows:
+% python
+Python 2.0 (#1, Oct 27 2000, 14:34:45) 
+[GCC 2.95.2 19991024 (release)] on sunos5
+Type "copyright", "credits" or "license" for more information.
+>>> import libwadpy
+WAD Enabled
+>>> execfile("foo.py")
+Traceback (most recent call last):
+  File "<stdin>", line 1, in ?
+  File "foo.py", line 16, in ?
+    foo()
+  File "foo.py", line 13, in foo
+    bar()
+  File "foo.py", line 10, in bar
+    spam()
+  File "foo.py", line 7, in spam
+    doh.doh(a,b,c)
+SegFault: [ C stack trace ]
+
+#2   0x00027774 in call_builtin(func=0x1c74f0,arg=0x1a1ccc,kw=0x0)
+#1   0xff022f7c in _wrap_doh(0x0,0x1a1ccc,0x160ef4,0x9c,0x56b44,0x1aa3d8)
+#0   0xfe7e0568 in doh(a=0x3,b=0x4,c=0x0) in 'foo.c', line 28
+
+/u0/beazley/Projects/WAD/Python/foo.c, line 28
+
+    int doh(int a, int b, int *c) {
+ =>   *c = a + b;
+      return *c;
+    }
+
+>>>
+
+In this case, we can see that the program has tried to assign a value to a NULL pointer (indicated by the value "c=0x0" in the last function call). Furthermore, we obtain a Python traceback that shows the entire sequence of functions leading to the problem. Finally, since control returned to the interpreter, it is possible to interactively inspect various aspects of the application or to continue with the computation (although this clearly depends on the severity of the error and the nature of the application).

+In certain applications, it may be difficult to run Python interactively or to modify the code to explicitly import a special debugging module. In these cases, WAD can be attached to an extension module with the linker. For example:

+% ld -G $(OBJS) -o dohmodule.so -lwadpy
+
+This requires no recompilation of any source code, only a relinking of the extension module. When Python loads the relinked extension module, WAD is automatically initialized before Python invokes the module initialization function.

3. Design Considerations for Embedded Error Recovery

+The primary design goal of WAD is to provide an error reporting mechanism for extension modules that is a natural extension of normal Python exception handling. There are two primary motivations for handling fatal errors in this manner. First, in the context of Python programming, it is simply unnatural to run a separate debugging application to identify a problem in an extension module when no such requirement exists for scripts. Thus, an embedded error reporting mechanism is simply more convenient. Second, the target users of an extension module may not know how to use a debugger or even have a development environment installed on their machine. Therefore, the ability to produce an informative traceback within the confines of the Python interpreter can be of tremendous value to an extension developer, because users who report a problem will be able to include an informative traceback as opposed to simply saying "the code crashed."

+A secondary design goal is to provide a system that is as non-invasive as possible. The system should not require modifications to Python or any extension modules, and it should be easy to integrate into the runtime environment of an application. In addition, it shouldn't introduce any performance overhead.

+Finally, since WAD co-exists with the Python interpreter (i.e., in the same process), there are a number of technical issues that have to be addressed. First, fatal errors can theoretically occur anywhere in the interpreter as well as in extension modules. Therefore, WAD needs to know about Python's internal organization if it is going to provide a graceful recovery back to the interpreter. Second, in order to implement this recovery scheme, the system has to perform direct manipulation of the CPU context and call stack. Last but not least, since the recovery code lives in the same address space as the interpreter and extension modules, it should not depend on the process stack and heap (since both could have been corrupted by the faulting application).

4. Catching Fatal Errors

+WAD catches catastrophic errors by installing a reliable signal handler for SIGSEGV, SIGBUS, SIGABRT, SIGILL, and SIGFPE [9]. Unlike the more familiar BSD-style signal interface (as provided by the Python signal module), reliable signal handlers are installed using the sigaction() system call and have a few notable properties:
+
+Therefore, the high-level implementation of WAD is relatively straightforward: when a fatal signal occurs, a handler function runs on an isolated signal handling stack. The CPU context is then used to unwind the call stack and to inspect the process state. Finally, if possible, the CPU context is modified in a manner that allows the signal handler to return to Python with a raised exception.
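The handler installation described above can be sketched in C as follows. This is a minimal illustration, assuming Linux with glibc; `install_fatal_handler()`, `run_protected()`, `write_through_null()`, and `struct fault_record` are hypothetical names, not part of WAD. The handler is registered with `sigaction()` using `SA_SIGINFO` (so it receives the faulting address and the CPU context) and `SA_ONSTACK` (so it runs on a separate stack registered with `sigaltstack()`), and it reads the program counter from the context where the platform exposes it. One deliberate simplification: instead of editing the saved CPU context so that the handler returns directly into the interpreter, as WAD does, this sketch escapes with `siglongjmp()` to a recovery point recorded in advance by `sigsetjmp()`.

```c
#define _GNU_SOURCE            /* exposes REG_RIP/REG_EIP on glibc */
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <ucontext.h>

#define ALT_STACK_SIZE (64 * 1024)   /* assumed to exceed MINSIGSTKSZ */

/* Hypothetical record of the most recent fatal signal. */
struct fault_record {
    int       signo;   /* SIGSEGV, SIGBUS, ... */
    void     *addr;    /* faulting address reported in siginfo_t */
    uintptr_t pc;      /* program counter from the CPU context, 0 if unknown */
};

static struct fault_record last_fault;
static sigjmp_buf recovery_point;
static volatile sig_atomic_t recovery_armed = 0;

static void fatal_handler(int signo, siginfo_t *info, void *ctx) {
    ucontext_t *uc = (ucontext_t *) ctx;
    last_fault.signo = signo;
    last_fault.addr = info->si_addr;
#if defined(__x86_64__) && defined(REG_RIP)
    last_fault.pc = (uintptr_t) uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__i386__) && defined(REG_EIP)
    last_fault.pc = (uintptr_t) uc->uc_mcontext.gregs[REG_EIP];
#else
    (void) uc;
    last_fault.pc = 0;
#endif
    if (recovery_armed) {
        recovery_armed = 0;
        siglongjmp(recovery_point, 1);   /* leave the handler and its stack */
    }
    /* No recovery point: fall back to the default action (core dump). */
    signal(signo, SIG_DFL);
    raise(signo);
}

/* Install the handler for the fatal signals on an alternate stack. */
int install_fatal_handler(void) {
    static char *altstack_mem;
    const int sigs[] = { SIGSEGV, SIGBUS, SIGABRT, SIGILL, SIGFPE };
    stack_t ss;
    struct sigaction sa;

    if (!altstack_mem && !(altstack_mem = malloc(ALT_STACK_SIZE)))
        return -1;
    ss.ss_sp = altstack_mem;
    ss.ss_size = ALT_STACK_SIZE;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fatal_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;   /* pass context; use alt stack */
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (sigaction(sigs[i], &sa, NULL) != 0)
            return -1;
    return 0;
}

/* Call fn(); return 0 normally, or -1 if it raised a fatal signal. */
int run_protected(void (*fn)(void)) {
    if (sigsetjmp(recovery_point, 1) != 0)
        return -1;                    /* arrived here from the handler */
    recovery_armed = 1;
    fn();
    recovery_armed = 0;
    return 0;
}

/* Example fault: a store through a NULL pointer the compiler cannot see. */
void write_through_null(void) {
    volatile int *volatile p = NULL;
    *p = 42;
}
```

After `install_fatal_handler()` succeeds, `run_protected(write_through_null)` returns -1 and leaves `SIGSEGV` and the faulting address in `last_fault` instead of terminating the process. Jumping out of a handler for a synchronous fault skips any cleanup in the abandoned C frames, so this style of recovery is better suited to error reporting than to general-purpose exception handling.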

5. A Detailed Description of the Recovery Mechanism

+In this section, a more detailed description of the error recovery scheme is presented. The precise implementation details are highly platform-specific and involve a number of advanced topics, including the Unix process file system (/proc), the ELF object file format, and the stabs compiler debugging format [6,7,8]. The details of these topics are beyond the scope of this paper. However, this section aims to give the reader a small taste of the steps involved in implementing the recovery mechanism.

+WAD's services are invoked only when a fatal signal is received. This triggers a signal handling function that results in a return to Python, as illustrated in the following figure:

+ +
Control flow of the error recovery mechanism
+
+ +

+The steps required to implement this recovery are as follows:

    +
  1. The values of the program counter and stack pointer are obtained from the CPU context structure passed to the WAD signal handler.

    +

  2. The virtual memory map of the process is inspected to identify all of the shared libraries, dynamically loaded modules, and valid memory regions. This information is obtained by reading from the Unix /proc filesystem. The following table illustrates the nature of this data:
    +Address     Size    Permissions        File
    +----------  -----   -----------------  ---------------------------------
    +00010000    1264K   read/exec         /usr/local/bin/python 
    +0015A000     184K   read/write/exec   /usr/local/bin/python 
    +00188000     296K   read/write/exec     [ heap ] 
    +FE7C0000      32K   read/exec         /u0/beazley/Projects/dohmodule.so 
    +FE7D6000       8K   read/write/exec   /u0/beazley/Projects/dohmodule.so 
    +...
    +FF100000     664K   read/exec         /usr/lib/libc.so.1 
    +FF1B6000      24K   read/write/exec   /usr/lib/libc.so.1 
    +FF1BC000       8K   read/write/exec   /usr/lib/libc.so.1 
    +FF2C0000     120K   read/exec         /usr/lib/libthread.so.1 
    +FF2EE000       8K   read/write/exec   /usr/lib/libthread.so.1 
    +FF2F0000      48K   read/write/exec   /usr/lib/libthread.so.1 
    +FF310000      40K   read/exec         /usr/lib/libsocket.so.1 
    +FF32A000       8K   read/write/exec   /usr/lib/libsocket.so.1 
    +FF330000      24K   read/exec         /usr/lib/libpthread.so.1 
    +FF346000       8K   read/write/exec   /usr/lib/libpthread.so.1 
    +FF350000       8K   read/write/exec    [ anon ] 
    +FF3B0000       8K   read/exec         /usr/lib/libdl.so.1 
    +FF3C0000     128K   read/exec         /usr/lib/ld.so.1 
    +FF3E0000       8K   read/write/exec   /usr/lib/ld.so.1 
    +FFBEA000      24K   read/write/exec    [ stack ] 
    +
    + +

    +

  3. The call stack is unwound to produce a traceback of the calling sequence that led to the error. On SPARC, the unwinding process is just a simple loop similar to the following:
    +long *pc = get_pc(context);
    +long *sp = get_sp(context);
    +while (sp) {
    +    /* Move to previous stack frame */
    +    pc = (long *) sp[15];      /* %i7 register on SPARC */
    +    sp = (long *) sp[14];      /* %i6 register on SPARC */
    +}
    +
    + +
  4. For each stack frame, symbol table and debugging information is gathered and stored in a WAD exception frame object. Obtaining this information is the most complicated part of WAD and involves the following steps. First, the current program counter is mapped to an object file using the virtual memory map obtained in step 2. Next, the object file is loaded using mmap(). Once it is loaded, the ELF symbol table is searched for an address match. The symbol table contains a collection of records containing memory offsets, sizes, and names such as this:
    +Offset    Size    Name 
    +--------  ------  ---------
    +0x1280    324     wrap_foo
    +0x1600    128     foo
    +0x2408    192     bar
    +...
    +
    +To find a match for a virtual memory address addr, WAD simply searches for a symbol s such that base + s.offset <= addr < base + s.offset + s.size, where base is the base virtual address of the object file in the virtual memory map.

    +Debugging information, if available, is scanned to identify a source file, function name, and line number. This involves scanning object files for a table of debugging information stored in a format known as "stabs." Stabs is a relatively simple but highly extensible format that is language-independent and capable of encoding almost every aspect of the original source code. For the purposes of WAD, only a small subset of this data is actually used.

    +The following table shows a small fragment of relevant stabs data: +

    +type    desc   value        string                        description
    +------  -----  ---------    ---------------------------   -----------
    +0x64      0        0         /u0/beazley/Projects/foo/    Pathname
    +0x64      0        0        foo.c                         Filename
    +...                             
    +0x24      0        0        foo:F(0,3);(0,3)              Function
    +0xa0      4        68       n:p(0,3)                      Parameter
    +...                                                      
    +0x44      6        8                                      Line number
    +0x44      7        12                                     Line number
    +0x44      8        44                                     Line number
    +0x44      9        56                                     Line number
    +...                                                      
    +
    +In the table, the type field indicates the type of debugging information. For example, 0x64 specifies the source file, 0x24 is a function definition, 0xa0 is a function parameter, and 0x44 is line number information. Associated with each stab is a collection of parameters and an optional string. The string usually contains symbol names and other information. The desc and value fields are numbers that usually contain byte offsets and line number data; for a 0x44 stab, desc holds the source line number and value holds the byte offset of that line from the start of the enclosing function. Therefore, to collect debugging information, WAD simply walks through the debugging tables until it finds the function of interest. Once found, parameter and line number specifiers are inspected to determine the location and values of the function arguments as well as the source line at which the error occurred.

    +

  5. After the complete traceback has been obtained, it is examined to see if there are any "safe" return points to which control can be returned. This is accomplished by maintaining an internal table of predefined symbolic return points, as shown in the following table:
    +Python symbol                     Return value
    +-----------------------------     ------------------
    +call_builtin                      NULL
    +_PyImport_LoadDynamicModule       NULL
    +PyObject_Repr                     NULL
    +PyObject_Print                    -1
    +PyObject_CallFunction             NULL
    +PyObject_CallMethod               NULL
    +PyObject_CallObject               NULL
    +PyObject_Cmp                      -1
    +PyObject_Compare                  -1
    +PyObject_DelAttrString            -1
    +PyObject_DelItem                  -1
    +PyObject_GetAttrString            NULL
    +PyObject_GetItem                  NULL
    +PyObject_HasAttrString            -1
    +PyObject_Hash                     -1
    +PyObject_Length                   -1
    +PyObject_SetAttrString            -1
    +PyObject_SetItem                  -1
    +PyObject_Str                      NULL
    +PyObject_Type                     NULL
    +...
    +PyEval_EvalCode                   NULL
    +
    +The symbols in this table correspond to functions within the Python interpreter that might execute extension code; they include the parts of the interpreter that invoke builtin functions as well as the functions from the abstract object interface. If any of these symbols appears on the call stack, a handler function is invoked to raise a Python exception. This handler function is given a WAD-specific traceback object that contains a copy of the call stack and CPU registers as well as any symbolic and debugging information that was obtained. If none of the symbolic return points are encountered, WAD invokes a default handler that simply prints the full C stack trace and generates a core file.

    +

  6. If a return point is found, the CPU context is modified in a manner that allows the signal handler to return with a suitable Python error. This last step is the trickiest part of the recovery process, but the general idea is that the CPU context is modified in a way that makes Python think that an extension function simply raised an exception and returned an error. Currently, this is implemented by having the signal handler return to a small handler function written in assembly language, which arranges to return the desired value to the specified return point.

    +The most complicated part of modifying the CPU context is restoring previously saved CPU registers. By manually unwinding the call stack, the WAD exception handler effectively performs the same operation as a longjmp() call in C. However, unlike longjmp(), no previously saved set of CPU registers is available from which to resume execution in the Python interpreter. The solution to this problem depends entirely on the underlying architecture. On the SPARC, register values are saved in register windows, which WAD manually unwinds to restore the proper state. On Intel x86, the solution is much more interesting. To restore the register values, WAD must manually inspect the machine instructions of each function on the call stack in order to find out where the registers might have been saved. This information is then used to restore the registers from their saved locations before returning to the Python interpreter.

    +

  7. Python receives the exception and produces a traceback.
+ +
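Step 2 of the recovery procedure locates the memory region that contains a given address. Solaris exposes the map as binary records under /proc; the minimal sketch below instead assumes the textual `/proc/self/maps` format used by Linux, where each line reads `start-end perms offset dev inode [path]`. The `struct mapping` type and the `find_mapping()` function are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One region of the process address space (hypothetical type). */
struct mapping {
    uintptr_t start, end;   /* region covers [start, end) */
    char perms[5];          /* e.g. "r-xp" */
    char path[256];         /* backing file, "[stack]", or "" if anonymous */
};

/* Find the region of /proc/self/maps that contains addr.
 * Returns 0 and fills *out on success, or -1 if no region matches. */
int find_mapping(uintptr_t addr, struct mapping *out) {
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];
    int found = -1;

    if (!f)
        return -1;
    while (found != 0 && fgets(line, sizeof line, f)) {
        unsigned long start, end;
        char perms[5];
        int consumed = 0;

        /* Parse "start-end perms offset dev inode" and note where it ends. */
        if (sscanf(line, "%lx-%lx %4s %*s %*s %*s%n",
                   &start, &end, perms, &consumed) < 3)
            continue;
        if (addr < start || addr >= end)
            continue;

        out->start = start;
        out->end = end;
        strcpy(out->perms, perms);

        /* The optional path follows the inode after padding spaces. */
        const char *p = line + consumed;
        while (*p == ' ' || *p == '\t')
            p++;
        size_t n = strcspn(p, "\n");
        if (n >= sizeof out->path)
            n = sizeof out->path - 1;
        memcpy(out->path, p, n);
        out->path[n] = '\0';
        found = 0;
    }
    fclose(f);
    return found;
}
```

Because stdio functions such as `fopen()` are not async-signal-safe, code that runs inside a fatal-signal handler would more plausibly use `open()` and `read()` into a preallocated buffer; stdio simply keeps the sketch short.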
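The unwinding loop in step 3 relies on the SPARC convention that each frame's register window save area sits at the frame's stack pointer, with the saved `%i6` (the caller's stack pointer) in word 14 and the saved `%i7` (the return address) in word 15. Since that loop only makes sense on a real SPARC stack, the sketch below runs the same walk over a simulated chain of save areas held in ordinary arrays. The names `unwind()` and `SAVED_FP`/`SAVED_RA` are illustrative, and the `max` bound is an added guard against a corrupted, cyclic frame chain.

```c
#include <stddef.h>
#include <stdint.h>

#define SAVE_AREA_WORDS 16   /* %l0-%l7 followed by %i0-%i7 */
#define SAVED_FP 14          /* saved %i6: the caller's stack pointer */
#define SAVED_RA 15          /* saved %i7: the return address */

/* Walk the chain of register-window save areas starting at sp and store
 * up to max return addresses in pcs. Returns the number of frames found. */
size_t unwind(const intptr_t *sp, intptr_t *pcs, size_t max) {
    size_t n = 0;
    while (sp && n < max) {
        pcs[n++] = sp[SAVED_RA];                  /* where this frame returns */
        sp = (const intptr_t *) sp[SAVED_FP];     /* move to the caller */
    }
    return n;
}
```

A production unwinder would also check each stack pointer against the valid regions from the memory map before dereferencing it, since the stack itself may have been damaged.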
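The address match in step 4 amounts to a search of the symbol table for the entry satisfying `base + s.offset <= addr < base + s.offset + s.size`. The sketch below applies that condition to the example symbol table; the simplified `struct symbol` and `lookup_symbol()` are hypothetical and stand in for the `st_value`, `st_size`, and `st_name` fields of an ELF `Elf32_Sym` or `Elf64_Sym` entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified symbol record (hypothetical stand-in for an ELF symbol). */
struct symbol {
    uintptr_t   offset;   /* offset of the function from the load base */
    size_t      size;     /* size of the function in bytes */
    const char *name;
};

/* Return the symbol s with base + s.offset <= addr < base + s.offset + s.size,
 * or NULL if the address is not covered by any symbol. */
const struct symbol *lookup_symbol(const struct symbol *tab, size_t n,
                                   uintptr_t base, uintptr_t addr) {
    for (size_t i = 0; i < n; i++) {
        uintptr_t lo = base + tab[i].offset;
        if (lo <= addr && addr < lo + tab[i].size)
            return &tab[i];
    }
    return NULL;
}
```

With a base of `0xFE7C0000`, the address `0xFE7C1610` resolves to `foo`, while `0xFE7C13C4` (one byte past the end of `wrap_foo`) matches nothing.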
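Once the function stab is found in step 4, the source line can be recovered from the line number stabs that follow it: the faulting line is the last one whose offset does not exceed the program counter's offset within the function. The sketch below assumes a hypothetical, already-decoded `struct stab` array mirroring the type, desc, value, and string columns of the example stabs table; real stabs are stored as fixed-size `nlist` records that index into a separate string table. `stab_line()` is likewise an illustrative name.

```c
#include <stddef.h>
#include <string.h>

#define N_FUN   0x24   /* function definition */
#define N_SLINE 0x44   /* source line: desc = line, value = offset in function */

/* Simplified, already-decoded stab record (hypothetical layout). */
struct stab {
    unsigned char type;
    int           desc;
    unsigned long value;
    const char   *string;
};

/* Return the source line for byte offset off inside function func,
 * or -1 if the function or a suitable line entry is not found. */
int stab_line(const struct stab *tab, size_t n, const char *func,
              unsigned long off) {
    size_t len = strlen(func);
    int in_func = 0;
    int line = -1;

    for (size_t i = 0; i < n; i++) {
        if (tab[i].type == N_FUN) {
            if (in_func)
                break;                /* reached the next function */
            /* Function stabs look like "foo:F(0,3);(0,3)". */
            in_func = strncmp(tab[i].string, func, len) == 0 &&
                      tab[i].string[len] == ':';
        } else if (in_func && tab[i].type == N_SLINE) {
            if (tab[i].value > off)
                break;                /* this line starts past the fault */
            line = tab[i].desc;
        }
    }
    return line;
}
```

Applied to the example table, an offset of 50 bytes into `foo` falls between the entries for offsets 44 and 56 and therefore maps to line 8.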
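Step 5 can be pictured as a scan of the recovered traceback against the table of return points. The sketch below is a hypothetical version with an abbreviated table that encodes NULL as 0. It assumes that the innermost matching frame (the interpreter function closest to the fault) is the one to return to; that is a plausible choice, but the text does not spell out WAD's actual rule. `find_return_point()` and `struct return_point` are illustrative names.

```c
#include <stddef.h>
#include <string.h>

/* A symbolic return point and the error value its caller expects. */
struct return_point {
    const char *symbol;
    long        value;   /* 0 encodes NULL; -1 is an error status */
};

static const struct return_point return_points[] = {
    { "call_builtin",                 0 },
    { "_PyImport_LoadDynamicModule",  0 },
    { "PyObject_Repr",                0 },
    { "PyObject_Print",              -1 },
    { "PyObject_Compare",            -1 },
    { "PyEval_EvalCode",              0 },
};

/* Scan a traceback given innermost frame first. Returns the index of the
 * first frame whose function is a known return point and stores the entry
 * in *rp, or returns -1 if there is no safe place to return to. */
int find_return_point(const char *const *frames, int nframes,
                      const struct return_point **rp) {
    size_t ntab = sizeof return_points / sizeof return_points[0];
    for (int i = 0; i < nframes; i++) {
        for (size_t j = 0; j < ntab; j++) {
            if (frames[i] && strcmp(frames[i], return_points[j].symbol) == 0) {
                *rp = &return_points[j];
                return i;
            }
        }
    }
    return -1;
}
```

For the traceback in the example of section 2 (`doh`, `_wrap_doh`, `call_builtin`, ...), the search stops at `call_builtin`, whose expected error value is NULL. When it returns -1, the fallback is the default handler that prints the C stack trace and dumps core.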
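On Intel, step 6 requires finding where each function saved the callee-saved registers `%ebx`, `%esi`, and `%edi`. The sketch below shows the flavor of that instruction inspection under a strong simplifying assumption: it recognizes only the classic prologue `push %ebp; mov %esp,%ebp` (bytes `55 89 e5` or `55 8b ec`) followed by single-byte register pushes, and reports each save slot as an offset from `%ebp`. The function `scan_prologue()` is hypothetical, and real compilers emit many other prologue shapes (for example, `sub`/`mov` sequences instead of pushes).

```c
#include <stddef.h>

enum { SAVED_EBX, SAVED_ESI, SAVED_EDI, NSAVED };

/* Scan an i386 function prologue. On success, return 0 and set off[r] to
 * the %ebp-relative offset of each saved register r (0 if not saved).
 * Return -1 if the code does not begin with push %ebp; mov %esp,%ebp. */
int scan_prologue(const unsigned char *code, size_t len, int off[NSAVED]) {
    int slot = 0;   /* bytes pushed below the saved %ebp so far */

    for (int r = 0; r < NSAVED; r++)
        off[r] = 0;

    /* push %ebp (55), then mov %esp,%ebp encoded as 89 e5 or 8b ec. */
    if (len < 3 || code[0] != 0x55)
        return -1;
    if (!((code[1] == 0x89 && code[2] == 0xe5) ||
          (code[1] == 0x8b && code[2] == 0xec)))
        return -1;

    /* Each push lowers %esp by 4, so the n-th push lands at %ebp - 4n. */
    for (size_t i = 3; i < len; i++) {
        int r;
        switch (code[i]) {
        case 0x53: r = SAVED_EBX; break;   /* push %ebx */
        case 0x56: r = SAVED_ESI; break;   /* push %esi */
        case 0x57: r = SAVED_EDI; break;   /* push %edi */
        default:   return 0;               /* end of the register pushes */
        }
        slot += 4;
        off[r] = -slot;
    }
    return 0;
}
```

Given a frame's `%ebp`, the saved `%ebx` would then be read from address `ebp + off[SAVED_EBX]`. For the prologue bytes `55 89 e5 57 56 53`, the sketch reports `%edi`, `%esi`, and `%ebx` at offsets -4, -8, and -12.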

6. Initialization and Loading

+In the earlier example, it was shown that WAD can either be loaded as an extension module or simply attached to an existing module with the linker. The latter case is implemented by wrapping the WAD initialization function inside the constructor of a statically allocated C++ object like this:
+
+class WadInit {
+public:
+    WadInit() {
+        wad_init();   /* Call the real initialization function */
+    }
+};
+static WadInit wad_initializer;
+
+When the dynamic loader brings WAD into memory, it automatically executes the constructors of all statically allocated C++ objects. Therefore, this initialization code executes immediately after loading but before Python actually calls the module initialization function. As a result, when an extension module is linked with WAD, the debugging capability is enabled before any other operations occur; this allows WAD to respond to fatal errors that might occur during module initialization.
+
+The rest of the initialization process consists of the following:
+
+Although the use of a C++ static constructor has the potential to conflict with C++ extension code that also uses static constructors, it is always possible to enable WAD prior to loading a C++ extension (e.g., WAD could be loaded separately).
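In plain C, a comparable load-time hook can be written with the GCC/Clang `constructor` function attribute, which runs a function when the containing executable or shared library is loaded, before `main()` runs or before `dlopen()` returns to its caller. The sketch below is an alternative to the C++ static-object technique, not what WAD itself uses; `wad_init_hook()` and `wad_initialized` are hypothetical names.

```c
/* Hypothetical flag recording that the load-time hook has run. */
int wad_initialized = 0;

/* Runs automatically when this object is loaded: before main() for an
 * executable, or before dlopen() returns for a shared library, and thus
 * before Python would call the module's initialization function. */
__attribute__((constructor))
static void wad_init_hook(void) {
    wad_initialized = 1;   /* a real hook would install signal handlers here */
}
```

The attribute avoids any dependence on the C++ runtime, although it shares the ordering caveat of static constructors: the relative order of hooks in different shared libraries is up to the dynamic loader.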

7. Implementation Details

+Currently, WAD is written in ANSI C with a small amount of C++ and a small amount of assembly code (to assist in the return to the interpreter). The entire implementation contains approximately 2000 semicolons, and most of the code relates to gathering source code information (symbol tables, debugging information, etc.).

+Although there are libraries such as GNU bfd that can assist with reading object files, none of these are used in the implementation [10]. First, these libraries tend to be quite large and are oriented more towards stand-alone tools such as debuggers, linkers, and compilers. Second, due to the unusual nature of the runtime environment and the restrictions on memory utilization (no heap, no stack), the behavior of these libraries is somewhat unclear and would require further study. Finally, given the small size of the prototype implementation, it didn't seem necessary to rely on a large general-purpose library.

8. Discussion

+The primary focus of this work is to provide a more useful error reporting mechanism to extension developers. However, this does not imply that WAD is appropriate as a general-purpose exception handling mechanism. First, let's focus on the recovery mechanism:
+
+In addition, there are a number of issues that pertain to WAD's interaction with the Python interpreter:
+
+Finally, there are a number of application-specific issues to note:
+
+Despite its various limitations, WAD is applicable to a wide range of extension-related errors. Furthermore, most of the errors that are likely to occur are of a more benign variety. For example, a segmentation fault may simply be the result of an uninitialized pointer (perhaps the user forgot to call an initialization procedure). Likewise, bus errors, failed assertions, and floating point exceptions rarely result in a situation where the WAD recovery mechanism would be unable to produce a meaningful Python traceback.

9. Related Work

+There is a huge body of literature concerning the implementation of exception handling in various programming languages and environments. A detailed discussion of this work is clearly not possible here, but a general overview of various exception handling issues can be found in [11]. In general, a few themes seem to prevail. First, considerable attention has been given to exception handling mechanisms in specific languages, such as efficient exception handling for C++. Second, a great deal of work has been devoted to the semantic aspects of exception handling, such as exception hierarchies, finalization, and whether or not code is restartable after an exception has occurred. Finally, a fair amount of exception work has been done in the context of component frameworks and distributed systems. Most of this work tends to concentrate on explicit exception handling mechanisms. Very little work appears to have been done in the area of converting hardware-generated errors into exceptions.

+With respect to debuggers, quite a lot of work has been done in creating advanced debugging support for specific languages and integrated development environments. However, very little of this work has concentrated on the problem of extensible systems and compiled-interpreted language integration. For instance, debuggers for Python are currently unable to cross over into C extensions, whereas C debuggers aren't able to easily extract useful information from the internals of the Python interpreter.

+One system of possible interest is Rn, which was developed in the mid-1980s at Rice University [12]. This system, primarily designed for working with large scientific applications written in Fortran, provided an execution monitor that consisted of a special debugging process with an embedded interpreter. When attached to compiled Fortran code, this monitor could dynamically patch the executable in a manner that allowed parts of the code to be executed in the interpreter. This was used to provide a debugging environment in which essentially any part of the compiled application could be modified at run-time by simply compiling the modified code (Fortran) to an interpreted form and inserting a breakpoint in the original executable that transferred control to the interpreter. Although this particular scheme is not directly related to the functionality of WAD, it is one of the few systems in which interpreted and compiled code have been tightly coupled within a debugging framework. Several aspects of the interpreted/compiled interface are closely related to the way in which WAD operates. In addition, various aspects of this work may be useful should WAD be extended with new capabilities.

10. Future Directions

+WAD is currently an experimental prototype. Although this paper has described its use with Python, the core of the system is generic and can easily be extended to other programming environments. For example, when linked to C/C++ code, WAD will automatically produce stack traces for fatal errors. A module for generating Tcl exceptions has also been developed. Plans are underway to provide support for other extensible systems, including Perl, Ruby, and Guile.

+Finally, a number of extensions to the WAD approach may be possible. For example, even though the current implementation only returns a traceback string to the Python interpreter, the WAD signal handler actually generates a full traceback of the C call stack, including all of the CPU registers and a copy of the stack data. Therefore, with a little work, it may be possible to implement a diagnostic tool that allows the state of the C stack to be inspected from the Python interpreter after a crash has occurred. Similarly, it may be possible to integrate the capabilities of WAD with those provided by the Python debugger.

11. Conclusions and Availability

+WAD provides a simple mechanism for converting fatal errors into Python exceptions that provide useful information to extension writers. In doing so, it solves one of the most frustrating aspects of working with compiled Python extensions: identifying program errors. Furthermore, the system requires no code modifications to Python and introduces no performance overhead. Although the system is necessarily platform-specific, it does not involve a significant amount of code. As a result, it may be relatively straightforward to port to other Unix systems.

+As of this writing, WAD is still undergoing active development. However, the software is available for experimentation and download at http://systems.cs.uchicago.edu/wad.

References

+ +[1] D.M. Beazley, Using SWIG to Control, Prototype, and Debug C Programs with Python, +4th International Python Conference, Livermore, CA. (1996). + +

+[2] P.F. Dubois, Climate Data Analysis Software, 8th International Python Conference, +Arlington, VA. (2000). + +

+[3] P.F. Dubois, A Facility for Creating Python Extensions in C++, 7th International Python +Conference, Houston, TX. (1998). + +

+[4] SIP. http://www.thekompany.com/projects/pykde/. + +

+[5] FPIG. http://cens.ioc.ee/projects/f2py2e/. + +

+[6] R. Faulkner and R. Gomes, The Process File System and Process Model in UNIX System V, USENIX Conference Proceedings, +January 1991. + +

+[7] J.R. Levine, Linkers & Loaders, Morgan Kaufmann Publishers, 2000. + +

+[8] Free Software Foundation, The "stabs" debugging format. GNU info document. + +

+[9] W. Richard Stevens, UNIX Network Programming: Interprocess Communication, Volume 2. PTR +Prentice-Hall, 1998. + +

+[10] S. Chamberlain, libbfd: The Binary File Descriptor Library, Cygnus Support, bfd version 3.0 edition, April 1991. + +

+[11] M.L. Scott, Programming Language Pragmatics, Morgan Kaufmann Publishers, 2000. + +

+[12] A. Carle, D. Cooper, R. Hood, K. Kennedy, L. Torczon, S. Warren, A Practical Environment for Scientific Programming. +IEEE Computer, Vol 20, No. 11, (1987). p. 75-89. + + + + + + + + + + + + diff --git a/Tools/WAD/Papers/usenix2001.tex b/Tools/WAD/Papers/usenix2001.tex new file mode 100644 index 000000000..6fa05fcec --- /dev/null +++ b/Tools/WAD/Papers/usenix2001.tex @@ -0,0 +1,993 @@ +%template for producing IEEE-format articles using LaTeX. +%written by Matthew Ward, CS Department, Worcester Polytechnic Institute. +%use at your own risk. Complaints to /dev/null. +%make two column with no page numbering, default is 10 point +%\documentstyle{article} +\documentstyle[twocolumn]{article} +%\pagestyle{empty} + +%set dimensions of columns, gap between columns, and space between paragraphs +%\setlength{\textheight}{8.75in} +\setlength{\textheight}{9.0in} +\setlength{\columnsep}{0.25in} +\setlength{\textwidth}{6.45in} +\setlength{\footheight}{0.0in} +\setlength{\topmargin}{0.0in} +\setlength{\headheight}{0.0in} +\setlength{\headsep}{0.0in} +\setlength{\oddsidemargin}{0in} +%\setlength{\oddsidemargin}{-.065in} +%\setlength{\oddsidemargin}{-.17in} +%\setlength{\parindent}{0pc} + +%I copied stuff out of art10.sty and modified them to conform to IEEE format + +\makeatletter +%as Latex considers descenders in its calculation of interline spacing, +%to get 12 point spacing for normalsize text, must set it to 10 points +\def\@normalsize{\@setsize\normalsize{12pt}\xpt\@xpt +\abovedisplayskip 10pt plus2pt minus5pt\belowdisplayskip \abovedisplayskip +\abovedisplayshortskip \z@ plus3pt\belowdisplayshortskip 6pt plus3pt +minus3pt\let\@listi\@listI} + +%need an 11 pt font size for subsection and abstract headings +\def\subsize{\@setsize\subsize{12pt}\xipt\@xipt} + +%make section titles bold and 12 point, 2 blank lines before, 1 after +\def\section{\@startsection {section}{1}{\z@}{24pt plus 2pt minus 2pt} +{12pt plus 2pt minus 2pt}{\large\bf}} + +%make subsection titles bold and 
11 point, 1 blank line before, 1 after +\def\subsection{\@startsection {subsection}{2}{\z@}{12pt plus 2pt minus 2pt} +{12pt plus 2pt minus 2pt}{\subsize\bf}} +\makeatother + +\newcommand{\ignore}[1]{} +%\renewcommand{\thesubsection}{\arabic{subsection}.} + +\begin{document} + +%don't want date printed +\date{} + +%make title bold and 14 pt font (Latex default is non-bold, 16 pt) +\title{\Large \bf An Embedded Error Recovery and Debugging Mechanism for Scripting Language Extensions} + +%for single author (just remove % characters) +\author{{David M.\ Beazley} \\ +{\em Department of Computer Science} \\ +{\em University of Chicago }\\ +{\em Chicago, Illinois 60637 }\\ +{\em beazley@cs.uchicago.edu }} + +% My Department \\ +% My Institute \\ +% My City, ST, zip} + +%for two authors (this is what is printed) +%\author{\begin{tabular}[t]{c@{\extracolsep{8em}}c} +% Roscoe Giles & Pablo Tamayo \\ +% \\ +% Department of Electrical, Computer, & Thinking Machines Corp. \\ +% and Systems Engineering & Cambridge, MA~~02142. \\ +% and & \\ +% Center for Computational Science & \\ +% Boston University, Boston, MA~~02215. & +%\end{tabular}} + +\maketitle + +%I don't know why I have to reset thispagesyle, but otherwise get page numbers +\thispagestyle{empty} + + +\subsection*{Abstract} +{\em +In recent years, scripting languages such as Perl, Python, and Tcl +have become popular development tools for the creation of +sophisticated application software. One of the most useful features +of these languages is their ability to easily interact with compiled +languages such as C and C++. Although this mixed language approach +has many benefits, one of the greatest drawbacks is the complexity of +debugging that results from using interpreted and compiled code in +the same application. In part, this is due to the fact that scripting +language interpreters are unable to recover from catastrophic errors in +compiled extension code. 
Furthermore, traditional C/C++ debuggers do +not provide a satisfactory degree of integration with interpreted +languages. This paper describes an experimental system in which fatal +extension errors such as segmentation faults, bus errors, and failed +assertions are handled as scripting language exceptions. This system, +which has been implemented as a general purpose shared library, +requires no modifications to the target scripting language, introduces +no performance overhead, and simplifies the debugging of mixed +interpreted-compiled application software. +} + +\section{Introduction} + +Slightly more than ten years have passed since John Ousterhout +introduced the Tcl scripting language at the 1990 USENIX technical +conference \cite{ousterhout}. Since then, scripting languages have +been gaining in popularity as evidenced by the wide-spread use of +systems such as Tcl, Perl, Python, Guile, PHP, and Ruby +\cite{ousterhout,perl,python,guile,php,ruby}. + +In part, the success of modern scripting languages is due to their +ability to be easily integrated with software written in compiled +languages such as C, C++, and Fortran. In addition, a wide variety of wrapper +generation tools can be used +to automatically produce bindings between existing code and a +variety of scripting language environments +\cite{swig,sip,pyfort,f2py,advperl,heidrich,vtk,gwrap,wrappy}. As a result, a large number of +programmers are using scripting languages to control +complex C/C++ programs or as a tool for re-engineering legacy +software. This approach is attractive because it allows programmers +to benefit from the flexibility and rapid development of +scripting while retaining the best features of compiled code such as high +performance \cite{ouster1}. + +A critical aspect of scripting-compiled code integration is the way in +which it departs from traditional C/C++ development. 
Rather than +building large monolithic stand-alone applications, scripting +languages strongly encourage the creation of modular software +components. As a result, scripted software tends to be constructed as +a mix of dynamically loadable libraries, scripts, and third-party +extension modules. In this sense, one might argue that the benefits of +scripting are achieved at the expense of creating a somewhat more +complicated development environment. + +A consequence of this complexity is an increased degree of difficulty +associated with debugging programs that utilize multiple languages, +dynamically loadable modules, and a sophisticated runtime environment. +To address this problem, this paper describes an experimental system +known as WAD (Wrapped Application Debugger) in which an embedded error +recovery and debugging mechanism is added to common scripting +languages. This system converts catastrophic signals such as +segmentation faults and failed assertions to exceptions that can be +handled by the scripting language interpreter. In doing so, it +provides more seamless integration between error handling in +scripting language interpreters and compiled extensions. + +\section{The Debugging Problem} + +Normally, a programming error in a scripted application +results in an exception that describes the problem and the context in +which it occurred. For example, an error in a Python script might +produce a traceback similar to the following: + +\begin{verbatim} +% python foo.py +Traceback (innermost last): + File "foo.py", line 11, in ? + foo() + File "foo.py", line 8, in foo + bar() + File "foo.py", line 5, in bar + spam() + File "foo.py", line 2, in spam + doh() +NameError: doh +\end{verbatim} + +In this case, a programmer might be able to apply a fix simply based +on information in the traceback. Alternatively, if the problem is +more complicated, a script-level debugger can be used to provide more information. 
In contrast, +a failure in compiled extension code might produce the following result: + +\begin{verbatim} +% python foo.py +Segmentation Fault (core dumped) +\end{verbatim} + +In this case, the user has no idea of what has happened other +than that it appears to be ``very bad.'' Furthermore, script-level +debuggers are unable to identify the problem since they also crash +when the error occurs (they usually run in the same process as +the interpreter). A user might be able to narrow the source of the +problem through trial-and-error techniques such as inserting print +statements or commenting out sections of script code. Unfortunately, +neither of these techniques is very attractive for obvious reasons. + +Alternatively, a user could run the application under the control of a +traditional debugger such as gdb \cite{gdb}. Unfortunately, this also has +drawbacks. First, even though the debugger reports the error, +the information it provides mostly concerns the internal +implementation of the scripting language interpreter. Needless +to say, this isn't very useful, nor does it provide much insight as to +where the error might have occurred within a script. Second, +the structure of a scripted application tends to be much more complex +than a traditional stand-alone program. As a result, a user may not +have a good sense of how to actually attach a C/C++ debugger to their +script. In addition, execution may occur within a +complex run-time environment involving events, threads, and network +connections. Because of this, it can be difficult to reproduce +and identify certain types of catastrophic errors (especially if they +depend on timing or peculiar sequences of events). Finally, this approach +assumes that a programmer has a C/C++ development environment installed on +their machine and that they know how to use a low-level +debugger. Unfortunately, neither of these assumptions may hold in practice.
+This is because scripting languages are often used to provide programmability to +applications in which end-users might write scripts, yet would not be expected +to write low-level C code. + +Even if a traditional debugger such as gdb were modified to +provide better integration with scripting languages, it is not clear +that this would be the most natural solution to the problem. +For one, the whole notion of having to run a separate debugging process to debug +extension code is unnatural when no such requirement exists for +a script. Furthermore, even if such a debugger existed, an inexperienced user may not +have the expertise or inclination to use it. Finally, +obscure fatal errors may occur long after an application has been deployed. +Unless the debugger is distributed along with the application in some manner, it will be +extraordinarily difficult to obtain useful diagnostics when such errors occur. + +\begin{figure*}[t] +{\small +\begin{verbatim} +% python foo.py +Traceback (most recent call last): + File "", line 1, in ? + File "foo.py", line 16, in ? + foo() + File "foo.py", line 13, in foo + bar() + File "foo.py", line 10, in bar + spam() + File "foo.py", line 7, in spam + doh.doh(a,b,c) + +SegFault: [ C stack trace ] + +#2 0x00027774 in call_builtin(func=0x1c74f0,arg=0x1a1ccc,kw=0x0) in 'ceval.c', line 2650 +#1 0xff083544 in _wrap_doh(self=0x0,args=0x1a1ccc) in 'foo_wrap.c', line 745 +#0 0xfe7e0568 in doh(a=0x3,b=0x4,c=0x0) in 'foo.c', line 28 + +/u0/beazley/Projects/WAD/Python/foo.c, line 28 + + int doh(int a, int b, int *c) { + => *c = a + b; + return *c; + } +\end{verbatim} +} +\caption{Cross-language traceback generated for a segmentation fault in a Python extension} +\end{figure*} + +The easiest solution to the debugging problem is +to simply add as much error checking as possible. Although this is never +a bad thing to do, it's usually not enough to completely eliminate the problem.
+For one, scripting languages are sometimes used to control hundreds +of thousands to millions of lines of compiled code. In this case, it is improbable +that a programmer will be able to foresee every conceivable error. +Second, scripting languages are often used to put new user interfaces on legacy software. In this +case, scripting may introduce new modes of execution that cause a formerly ``bug-free'' +application to fail in an unexpected manner. Finally, certain types +of errors such as floating-point exceptions can be particularly +difficult to eliminate because they might be generated algorithmically (e.g., +as the result of a numerical method). Therefore, even when a programmer has worked hard to eliminate +crashes, there is always a small probability that a complex application +will fail. + +\section{Embedded Error Recovery} + +Rather than modifying an existing debugger to support scripting +languages, an alternative approach is to add a more powerful error +handling and recovery mechanism to the scripting language interpreter. +This approach has been implemented in the form of an +experimental system known as WAD. WAD +is packaged as a dynamically loadable shared library that can either be +loaded as a scripting language extension or linked to existing +extension modules as a library. The core of the system is generic and +requires no modifications to the scripting interpreter or existing +extension modules. Furthermore, the system does not introduce a performance penalty as it +does not rely upon program instrumentation or tracing. + +WAD works by converting fatal signals such as SIGSEGV, +SIGBUS, SIGFPE, and SIGABRT into scripting language exceptions that contain +debugging information collected from the call-stack of compiled +extension code.
By handling errors in this manner, the scripting +language interpreter is able to produce a cross-language stack trace that +contains information from both the script code and extension code as +shown for Python and Tcl/Tk in Figures 1 and 2. In this case, the user +is given a very clear idea of what has happened without having +to launch a separate debugger. + +The advantage of this approach is that it provides +more seamless integration between error handling +in scripts and error handling in extensions. In addition, it eliminates +the most common debugging step that a developer is likely to perform +in the event of a fatal error--running a separate debugger on a core +file and typing `where' to get a stack trace. Finally, this allows +end-users to provide extension writers with useful debugging +information since they can supply a stack trace as opposed to a vague +complaint that the program ``crashed.'' + +\begin{figure*}[t] +\begin{picture}(400,250)(0,0) +\put(50,-110){\special{psfile = tcl.ps hscale = 60 vscale = 60}} +\end{picture} +\caption{Dialogue box with traceback information for a failed assertion in a Tcl/Tk extension} +\end{figure*} + +\section{Scripting Language Internals} + +In order to provide embedded error recovery, it is critical to understand how +scripting language interpreters interface with extension code. Despite the wide variety +of scripting languages, essentially every implementation uses a similar +technique for accessing foreign code. + +The most widely used extension mechanism is a foreign function +interface in which compiled procedures can be called from the scripting language +interpreter. This is accomplished by writing a collection of wrapper functions that conform +to a specified calling convention. The primary purpose of the wrappers is to +marshal arguments and return values between the two languages and to handle errors.
+For example, in Tcl, every wrapper +function must conform to the following prototype: + +\begin{verbatim} +int +wrap_foo(ClientData clientData, + Tcl_Interp *interp, + int objc, + Tcl_Obj *CONST objv[]) +{ + /* Convert arguments */ + ... + /* Call a function */ + + result = foo(args); + + /* Set result */ + ... + if (success) { + return TCL_OK; + } else { + return TCL_ERROR; + } +} +\end{verbatim} + +The other extension mechanism is an object/type interface that allows programmers to create new +kinds of fundamental types or attach special properties to objects in +the interpreter. This usually involves setting up tables of function +pointers that define various properties of an object. For example, if +you wanted to add complex numbers to an interpreter, you might fill in a special +data structure with pointers to various methods like this: + +\begin{verbatim} +NumberMethods ComplexMethods = { + complex_add, + complex_sub, + complex_mul, + complex_div, + ... +}; +\end{verbatim} + +\noindent +Once registered with the interpreter, the methods in this structure +would be invoked by various interpreter operators such as $+$, +$-$, $*$, and $/$. + +Most interpreters handle errors as a two-step process in which +detailed error information is first registered with the interpreter +and then a special error code is returned. For example, in Tcl, errors +are handled by setting error information in the interpreter and +returning a value of TCL\_ERROR. Similarly, in Python, errors are +handled by raising an exception and returning NULL. In both cases, +this triggers the interpreter's error handler---possibly resulting in +a stack trace of the running script. In some cases, an interpreter +might handle errors using a form of the C {\tt longjmp} function. +For example, Perl provides a special function {\tt die} that jumps back +to the interpreter with a fatal error \cite{advperl}. + +The precise implementation details of these mechanisms aren't so +important for our discussion.
The critical point is that scripting +languages always access extension code through a well-defined interface +that precisely defines how arguments are to be passed, values are to be +returned, and errors are to be handled. + +\section{Scripting Languages and Signals} + +Under normal circumstances, errors in extension code are handled +through the error-handling API provided by the scripting language +interpreter. For example, if an invalid function parameter is passed, +a program can simply set an error message and return to the +interpreter. Similarly, automatic wrapper generators such as SWIG can produce +code to convert C++ exceptions and other C-related error handling +schemes to scripting language errors \cite{swigexcept}. On the other +hand, segmentation faults, failed assertions, and similar problems +produce signals that cause the interpreter to crash. + +Most scripting languages provide limited support for Unix signal +handling \cite{stevens}. However, this support is not sufficiently advanced to +recover from fatal signals produced by extension code. +First, unlike signals generated for asynchronous events such as I/O, +execution can {\em not} be resumed at the point of a fatal signal. +Therefore, even if such a signal could be caught and handled by a script, +there isn't much that it can do except to print a diagnostic +message and abort before the signal handler returns. Second, +some interpreters block signal delivery while executing +extension code--opting to handle signals at a time when it is more convenient. +In this case, a signal such as SIGSEGV would simply cause the whole application +to freeze since there is no way for execution to continue to a point where +the signal could be delivered.
Because of these issues, scripting languages +either ignore the problem or label it as a ``limitation.'' + +\section{Overview of WAD} + +WAD installs a reliable signal handler for +SIGSEGV, SIGBUS, SIGABRT, SIGILL, and SIGFPE using {\tt sigaction} +\cite{stevens}. Since none of these signals are normally used in the implementation +of the scripting interpreter or by any user scripts, this typically does not override any previous +signal handling. Afterwards, when one of these signals occurs, a two-phase +recovery process executes. First, +information is collected about the execution context including a +full stack-trace, symbol table entries, and debugging information. +Second, the current stream of execution is aborted and an error is +returned to the interpreter. This process is illustrated in Figure~3. + +The collection of context and debugging information is a relatively +straightforward process involving the following steps: + +\begin{itemize} +\item The program counter and stack pointer are obtained from +context information passed to the WAD signal handler. + +\item The virtual memory map of the process is obtained from /proc +and used to associate virtual memory addresses with executable files, +shared libraries, and dynamically loaded extension modules \cite{proc}. + +\item The call stack is unwound to collect traceback information. +At each step of the stack traceback, symbol table and debugging +information is gathered and stored in a generic data structure for later use +in the recovery process. This data is obtained by memory-mapping +the ELF format object files associated with the process and extracting +symbol table and stabs debugging information \cite{elf,stabs}. +\end{itemize} + +Once debugging information has been collected, the signal handler +enters an error-recovery phase that +attempts to raise an exception and return to a suitable location in the +interpreter.
To do this, the following steps are performed: + +\begin{itemize} + +\item The stack trace is examined to see if there are any locations to which +control can be returned. + +\item If a suitable return location is found, the CPU context is modified in +a manner that makes the signal handler return to the interpreter +with an error. This return process is assisted by a small +trampoline function (partially written in assembly language) that arranges a proper +return to the interpreter after the signal handler returns. +\end{itemize} + +\noindent +Of the two phases, the return to the interpreter is of greater interest. Therefore, it +is now described in greater detail. + +\begin{figure*}[t] +\begin{picture}(480,340)(5,60) + +\put(50,330){\framebox(200,70){}} +\put(60,388){\tt >>> {\bf foo()}} +\put(60,376){\tt Traceback (most recent call last):} +\put(70,364){\tt File "<stdin>", line 1, in ?} +\put(60,352){\tt SegFault: [ C stack trace ]} +\put(60,340){\tt ...} + +\put(55,392){\line(-1,0){25}} +\put(30,392){\line(0,-1){80}} +\put(30,312){\line(1,0){95}} +\put(125,312){\vector(0,-1){10}} +\put(175,302){\line(0,1){10}} +\put(175,312){\line(1,0){95}} +\put(270,312){\line(0,1){65}} +\put(270,377){\vector(-1,0){30}} + +\put(50,285){\framebox(200,15)[c]{[Python internals]}} +\put(125,285){\vector(0,-1){10}} +\put(175,275){\vector(0,1){10}} +\put(50,260){\framebox(200,15)[c]{call\_builtin()}} +\put(125,260){\vector(0,-1){10}} +%\put(175,250){\vector(0,1){10}} +\put(50,235){\framebox(200,15)[c]{wrap\_foo()}} +\put(125,235){\vector(0,-1){10}} +\put(50,210){\framebox(200,15)[c]{foo()}} +\put(125,210){\vector(0,-1){10}} +\put(50,185){\framebox(200,15)[c]{doh()}} +\put(125,185){\vector(0,-1){20}} +\put(110,148){SIGSEGV} +\put(160,152){\vector(1,0){100}} +\put(260,70){\framebox(200,100){}} +\put(310,155){WAD signal handler} +\put(265,140){1. Unwind C stack} +\put(265,125){2. Gather symbols and debugging info} +\put(265,110){3. Find safe return location} +\put(265,95){4.
Raise Python exception} +\put(265,80){5. Modify CPU context and return} + +\put(260,185){\framebox(200,15)[c]{return assist}} +\put(365,174){Return from signal} +\put(360,170){\vector(0,1){15}} +\put(360,200){\line(0,1){65}} + +%\put(360,70){\line(0,-1){10}} +%\put(360,60){\line(1,0){110}} +%\put(470,60){\line(0,1){130}} +%\put(470,190){\vector(-1,0){10}} + +\put(360,265){\vector(-1,0){105}} +\put(255,250){NULL} +\put(255,270){Return to interpreter} + +\end{picture} + +\caption{Control Flow of the Error Recovery Mechanism for Python} +\label{wad} +\end{figure*} + +\section{Returning to the Interpreter} + +To return to the interpreter, WAD maintains a table of symbolic names +and return values that correspond to locations within the interpreter responsible for invoking +wrapper functions and object/type methods. For example, Table 1 shows a partial list of +return locations used in the Python implementation. When an error +occurs, the call stack is scanned for the first occurrence of any +symbol in this table. If a match is found, control is returned to that location +by emulating the return of a wrapper function with the error code from the table. If +no match is found, the error handler simply prints a stack trace to +standard output and aborts. + +When a symbolic match is found, WAD invokes a special user-defined +handler function that is written for a specific scripting language. +The primary role of this handler is to take debugging information +gathered from the call stack and generate an appropriate scripting language error. +One peculiar problem of this step is that the generation +of an error may require the use of parameters passed to a +wrapper function. For example, in the Tcl wrapper shown earlier, one +of the arguments was an object of type ``{\tt Tcl\_Interp *}''. +This object contains information specific to the state of the +interpreter (and multiple interpreter objects may exist in a single +application). 
Unfortunately, no reference to the interpreter object is +available in the signal handler. Furthermore, the interpreter +object may not be available in the context of the function that generated the error. + + +\begin{table}[t] +\begin{center} +\begin{tabular}{ll} +Python symbol & Return value \\ \hline +call\_builtin & NULL \\ +PyObject\_Print & $-1$ \\ +PyObject\_CallFunction & NULL \\ +PyObject\_CallMethod & NULL \\ +PyObject\_CallObject & NULL \\ +PyObject\_Cmp & $-1$ \\ +PyObject\_DelAttrString & $-1$ \\ +PyObject\_DelItem & $-1$ \\ +PyObject\_GetAttrString & NULL \\ +\end{tabular} +\end{center} +\caption{A partial list of symbolic return locations in the Python interpreter} +\label{returnpoints} +\end{table} + +To work around this problem, WAD implements a feature +known as argument stealing. When examining the call-stack, the signal +handler has full access to all function arguments and local variables. +Therefore, if the handler knows that an error was generated while +calling a wrapper function (as determined by looking at the symbol names), +it can grab the interpreter object from the stack frame of the wrapper and +use it to set an appropriate error code before returning to the interpreter. +Currently, this is managed by allowing the signal handler to steal +arguments from the caller using positional information. +For example, to grab the {\tt Tcl\_Interp *} object from a Tcl wrapper function, +code similar to the following is written: + +\begin{verbatim} +Tcl_Interp *interp; +int err; + +interp = (Tcl_Interp *) wad_steal_outarg( + stack, + "TclExecuteByteCode", + 1, + &err); +if (!err) { + Tcl_SetResult(interp,errtype,TCL_STATIC); + Tcl_AddErrorInfo(interp,errdetails); +} +\end{verbatim} + +In this case, the second argument passed to the wrapper function +(position 1, counting from zero) is stolen and used to generate an error. Also, the name {\tt TclExecuteByteCode} +refers to the calling function, not the wrapper function itself.
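The positional lookup behind argument stealing can be illustrated with a small, self-contained C model. This is a hypothetical sketch, not WAD's code. The names struct frame and steal_outarg are invented. The stack is also an array filled in by hand, not one recovered from a signal context. As in the Tcl example, the caller is named and the argument index counts from zero, so index 1 selects the interpreter pointer passed to the wrapper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified model of an unwound stack: one record per
   frame, innermost frame first, holding the function name and the raw
   argument words it was called with. A real unwinder would recover
   these from registers and stack memory; here they are filled in by hand. */
#define MAX_ARGS 4

struct frame {
    const char *name;
    uintptr_t   args[MAX_ARGS];
    int         nargs;
};

/* Sketch of stealing an outgoing argument: find the frame of `caller`
   and return argument `argno` (counted from zero) of the call it made,
   i.e. an argument of the frame just inside it. Sets *err to 0 on
   success and to 1 if the caller or the argument is not found.
   Only integer- and pointer-sized values can be returned this way. */
static uintptr_t steal_outarg(const struct frame *stack, int nframes,
                              const char *caller, int argno, int *err) {
    for (int i = 1; i < nframes; i++) {
        if (strcmp(stack[i].name, caller) == 0) {
            const struct frame *callee = &stack[i - 1];
            if (argno >= 0 && argno < callee->nargs) {
                *err = 0;
                return callee->args[argno];
            }
            break;
        }
    }
    *err = 1;
    return 0;
}
```

Returning a plain machine word mirrors the text's remark that only simple types such as integers and pointers can be stolen.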
+At this time, argument stealing is only applicable to simple types +such as integers and pointers. However, this is adequate for generating +scripting language errors. + +\section{Register Management} + +A final issue concerning the return mechanism has to do with the +precise behavior of the non-local return to the interpreter. Roughly +speaking, this emulates the behavior of the C {\tt longjmp} +library call. However, this is done without the use of a matching +{\tt setjmp} in the interpreter. + +The primary problem with aborting execution and returning to the +interpreter in this manner is that most compilers use a register management technique +known as callee-save \cite{prag}. In this case, it is the responsibility of +the called function to save the state of the registers and to restore +them before returning to the caller. By making a non-local jump, +registers may be left in an inconsistent state due to the fact that +they are not restored to their original values. The {\tt longjmp} function +in the C library avoids this problem by relying upon {\tt setjmp} to save +the registers. Unfortunately, WAD does not have this +luxury. As a result, a return from the signal handler may produce a +corrupted set of registers at the point of return in the interpreter. + +The severity of this problem depends greatly on the architecture and +compiler. For example, on the SPARC, register windows effectively +solve the callee-save problem \cite{sparc}. In this case, each stack frame has its own +register window and the windows are flushed to the stack whenever a +signal occurs. Therefore, the recovery mechanism can examine the stack and +arrange to restore the registers to their proper values when control +is returned. Furthermore, certain conventions of the SPARC ABI resolve several related +issues. 
For example, floating point registers are caller-saved +and the contents of the SPARC global registers are not guaranteed to be preserved +across procedure calls (in fact, they are not even saved by {\tt setjmp}). + +On other platforms, the problem of register management becomes much +more interesting. One approach is to simply ignore the problem +altogether and return to the interpreter with the registers in an +essentially random state. Surprisingly, this approach actually seems to work (although a considerable degree of +caution might be in order). +This is because the return of an error code tends to trigger +a cascade of procedure returns within the implementation of the interpreter. +As a result, the values of the registers are simply discarded and +overwritten with restored values as the interpreter unwinds itself and prepares to handle an +exception. A better solution to this problem is to modify the recovery mechanism to discover and +restore saved registers from the stack. Unfortunately, there is +no standardized way to know exactly where the registers might have been saved. +Therefore, a heuristic scheme that examines the machine code for each procedure would +have to be used to try to identify stack locations. This approach is used by gdb +and other debuggers when they allow users to inspect register values +within arbitrary stack frames \cite{gdb}. However, this technique has +not yet been implemented in WAD due to its obvious implementation difficulty and the +fact that the WAD prototype has primarily been developed for the SPARC. + +As a fall-back, WAD can be configured to return control to a location +previously specified with {\tt setjmp}. Unfortunately, this +requires modifications to either the interpreter or its extension modules. +Although this kind of instrumentation can be facilitated by automatic +wrapper code generators, it is not a preferred solution and is +not discussed further. +
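The setjmp fall-back mentioned above can be sketched in POSIX C. This is a minimal illustration under stated assumptions, not WAD's implementation. The names install_fatal_handlers, guarded_doh, and recover_point are hypothetical, and doh() is modeled on the function in Figure 1. A handler installed with sigaction jumps back with siglongjmp to a point marked by sigsetjmp, which restores the saved registers and signal mask. The cost is the instrumentation the text describes: every guarded call needs its own sigsetjmp. Jumping out of a handler for a hardware fault is common practice on Unix systems, but POSIX leaves much of this behavior undefined, so treat it as a sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names throughout: an illustration of the technique,
   not WAD's own code. */
static sigjmp_buf recover_point;             /* marked by sigsetjmp() */
static volatile sig_atomic_t caught_signal;  /* which signal fired */

/* Fatal-signal handler: record the signal and jump back to the location
   saved by sigsetjmp(). Because sigsetjmp() saved the registers and the
   signal mask, siglongjmp() restores them, so the callee-save problem
   does not arise. */
static void fatal_handler(int sig) {
    caught_signal = sig;
    siglongjmp(recover_point, 1);
}

/* Install the handler for the signals WAD is described as catching. */
static int install_fatal_handlers(void) {
    static const int sigs[] = { SIGSEGV, SIGBUS, SIGABRT, SIGILL, SIGFPE };
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fatal_handler;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (sigaction(sigs[i], &sa, NULL) != 0)
            return -1;
    return 0;
}

/* A buggy "extension" function modeled on doh() from Figure 1. */
static int doh(int a, int b, int *c) {
    *c = a + b;
    return *c;
}

/* Instrumented wrapper: returns -1 as an error code instead of crashing. */
static int guarded_doh(int a, int b, int *c) {
    if (sigsetjmp(recover_point, 1) != 0) {
        fprintf(stderr, "recovered from signal %d\n", (int)caught_signal);
        return -1;
    }
    return doh(a, b, c);
}
```

After install_fatal_handlers() succeeds, guarded_doh(3, 4, &x) returns 7. Passing a null pointer for c instead returns -1 once the handler has fired.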
+ +\section{Implementation Details} + +Currently, WAD is implemented in ANSI C and a small amount of assembly +code to assist in the return to the interpreter. The current +implementation supports Python, Tcl, and Perl extensions on SPARC Solaris. An +i386-Linux port has also been developed. The entire implementation contains +approximately 1500 semicolons, and most of this code is related to the gathering of debugging +information. Furthermore, due to the hostile environment in which the +recovery process must run, the implementation takes great care not to utilize the +process heap. This allows the signal handler to collect information in situations +where the heap allocator has been corrupted or destroyed in some manner. + +Although there are libraries such as the GNU Binary File Descriptor +(BFD) library that can assist with the manipulation of object files, +these are not used in the implementation \cite{bfd}. First, these +libraries tend to be quite large and are oriented more towards +stand-alone tools such as debuggers, linkers, and loaders. Second, +the behavior of these libraries with respect to memory management +would need to be carefully studied before they could be safely used in +an embedded environment. Finally, given the small size of the +implementation, it didn't seem necessary to rely upon such a +heavyweight solution. + +\section{Discussion} + +The primary goal of embedded error recovery is to provide an +alternative approach for debugging scripting language extensions. +Although this approach has many benefits, there are a number of +drawbacks and issues that must be discussed. + +First, like the C {\tt longjmp} function, the error recovery mechanism +does not cleanly unwind the call stack. For C++, this means that +objects allocated on the stack will not be finalized (destructors will not +be invoked) and that memory allocated on the heap may be +leaked. Similarly, open files, sockets, and other +system resources may never be released.
Furthermore, in a multi-threaded environment, +deadlock may occur if a procedure holds a lock when an error occurs. + +Second, the use of signals may interact adversely with both scripting +language signal handling and signal handling in thread libraries. +Since scripting languages ordinarily do not catch signals such as +SIGSEGV, SIGBUS, and SIGABRT, the use of WAD is unlikely to conflict +with any existing signal handling. However, nothing prevents a +module from overriding the error recovery mechanism with its own +signal handler. Threads present a different sort of signal handling problem +because thread libraries tend to override default signal handling \cite{thread}. +In this case, the thread library directs fatal signals to the thread in which the problem occurred. +However, first-hand experience has shown that certain implementations +of user threads do not reliably pass signal context information, nor do +they universally support advanced signal operations such as {\tt +sigaltstack}. Because of this, the WAD recovery mechanism may not be +compatible with incomplete implementations of user threads on certain +platforms. To further complicate matters, the recovery process itself is +not thread-safe (i.e., it is not possible to concurrently handle fatal errors +occurring in different threads). + +Third, certain types of errors may result in an unrecoverable crash. +For example, if an application overwrites the heap, it may destroy +critical data structures within the interpreter. +Similarly, +destruction of the call stack (via buffer overflow) makes it +impossible for the recovery mechanism to create a stack trace and +return to the interpreter. Although a heuristic scheme such as backward stack tracing might be able to +recover a partial stack trace \cite{debug}, no such feature has been implemented. 
In addition, memory management problems such as +double-freeing of heap-allocated memory can cause a system to fail in +a way that bears little resemblance to the actual source of the +problem. + +Fourth, there are a number of issues that pertain +to the interaction of the recovery mechanism with the interpreter. +First, the recovery scheme is unable to return to procedures +that might invoke wrapper functions with conflicting return codes. +This problem manifests itself when the interpreter's virtual +machine is built around a large {\tt switch} statement from which different +types of wrapper functions are called. For example, in Python, certain +internal procedures call a mix of functions where both NULL and -1 are +returned to indicate errors (depending on the function). In this case, there +is no easy way for WAD to determine which return value to use. Second, +the recovery process is extremely inefficient because its +data collection relies heavily upon {\tt mmap}, file I/O, and linear search +algorithms for finding symbols and debugging information. Therefore, WAD would +probably not be suitable as a general-purpose exception handling mechanism. +Finally, even when an error is successfully returned to the interpreter +and presented to the user, it may not be possible to resume execution of +the application (e.g., even though the interpreter is operational, the extension +module may be corrupted in some manner). + +Despite these limitations, embedded error recovery is applicable to a +wide range of extension-related errors. This is because errors such as +failed assertions, bus errors, and floating point exceptions rarely +result in a situation where the recovery process would be unable to run or the +interpreter would crash. Furthermore, more serious errors such as segmentation faults are more +likely to be caused by an uninitialized pointer than by outright +destruction of the heap or stack. 
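Several points raised in this section, namely the reliance on {\tt sigaltstack} and on signal context information, and the need to gather data without touching the process heap, can be made concrete with a short C sketch. It is not WAD's code: it assumes a POSIX system with the XSI extensions, handles only SIGSEGV, and the names {\tt install\_fault\_handler}, {\tt fault\_handler}, {\tt put\_str}, {\tt put\_uint}, and {\tt report} are hypothetical. The handler runs on a statically allocated alternate stack, so it can still execute if the faulting thread's own stack is exhausted, and it formats its report into a fixed buffer instead of calling {\tt printf} or {\tt malloc}.

```c
/* Sketch only (not WAD's code): a fault handler that runs on an alternate
 * stack and records what happened using only static storage, no heap. */
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

static char alt_stack[64 * 1024];           /* alternate signal stack */
static char report[128];                    /* preallocated report buffer */
static volatile sig_atomic_t on_alt_stack;  /* 1 if handler ran on alt_stack */

/* Append string s to report at pos; never writes past the buffer. */
static size_t put_str(size_t pos, const char *s) {
    while (*s != '\0' && pos + 1 < sizeof report)
        report[pos++] = *s++;
    report[pos] = '\0';
    return pos;
}

/* Append v in decimal without printf (which may call malloc). */
static size_t put_uint(size_t pos, unsigned long v) {
    char digits[24];
    size_t n = 0;
    do {
        digits[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    while (n > 0 && pos + 1 < sizeof report)
        report[pos++] = digits[--n];
    report[pos] = '\0';
    return pos;
}

/* SA_SIGINFO handler: siginfo_t carries the signal number and, for a real
 * hardware fault, the cause (si_code) and faulting address (si_addr). */
static void fault_handler(int signo, siginfo_t *info, void *context) {
    char probe;  /* its address tells us which stack we are running on */
    uintptr_t here = (uintptr_t)&probe;
    uintptr_t base = (uintptr_t)alt_stack;
    size_t pos;

    (void)signo;
    (void)context;
    on_alt_stack = (here >= base && here < base + sizeof alt_stack);
    pos = put_str(0, "signal ");
    put_uint(pos, (unsigned long)info->si_signo);
    /* Returning is safe here only because the signal came from raise();
     * after a real fault, the faulting instruction would simply re-execute. */
}

/* Hypothetical installer: register alt_stack and the handler for SIGSEGV. */
int install_fault_handler(void) {
    stack_t ss;
    struct sigaction sa;

    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;  /* get siginfo_t; use alt stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

After {\tt install\_fault\_handler()}, a call to {\tt raise(SIGSEGV)} leaves a string such as {\tt "signal 11"} (the number is platform-dependent) in {\tt report} and sets {\tt on\_alt\_stack} to 1. A recovery mechanism such as WAD's must additionally arrange for control to return to the interpreter rather than to the faulting instruction, which this sketch deliberately does not attempt.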
+ +\section{Related Work} + +A large body of literature is devoted to the topic of exception +handling in various languages and systems. Furthermore, the topic +remains one of active interest in the software community. For +instance, IEEE Transactions on Software Engineering recently devoted +two entire issues to current trends in exception handling +\cite{except1,except2}. Unfortunately, very little of this work seems +to be directly related to mixed compiled-interpreted exception +handling, recovery from fatal signals, and problems pertaining to +mixed-language debugging. + +Perhaps the most directly relevant work is that of advanced programming +environments for Common Lisp \cite{lisp}. Not only does CL have a foreign function interface, but +debuggers such as gdb have also been modified to walk the Lisp stack +\cite{ffi,wcl}. Furthermore, certain Lisp development environments have +provided a high degree of integration between compiled code and +the Lisp interpreter.\footnote{Note to program committee: I +have been unable to find a suitable reference describing this capability. However, +discussions with Richard Gabriel and other people in the Lisp community seem to indicate that +such work has been done. Please advise.} + +In certain cases, a scripting language module has been used to provide +partial information for fatal signals. For example, the Perl {\tt +sigtrap} module can be used to produce a Perl stack trace when a +problem occurs \cite{perl}. Unfortunately, this module does not +provide any information from the C stack. Similarly, advanced software development +environments such as Microsoft's Visual Studio can automatically launch a C/C++ +debugger when an error occurs. However, this does not provide any information +about the script that was running. + +In the area of programming languages, a number of efforts have been made to +map signals to exceptions in the form of asynchronous exception handling +\cite{buhr,ml,haskell}. 
Unfortunately, this work tends to +concentrate on the problem of handling asynchronous signals related to I/O rather than +synchronously generated signals caused by software faults. + +With respect to debugging, little work appears to have been done on +mixed compiled-interpreted debugging. Although modern debuggers +certainly try to provide advanced capabilities for debugging within a +single language, they tend to ignore the boundary between languages. +As previously mentioned, debuggers have occasionally been modified to +support other languages such as Common Lisp \cite{wcl}. However, no such work appears +to have been done in the context of modern scripting languages. One system of possible interest +in the context of mixed compiled-interpreted debugging is the R$^{n}$ +system developed at Rice University in the mid-1980s \cite{carle}. This +system, primarily developed for scientific computing, allowed control +to pass transparently between compiled code and an interpreter. +Furthermore, the system allowed dynamic patching of an executable in +which compiled procedures could be replaced by interpreted +versions. Although this system does not directly pertain to the problem of +debugging scripting language extensions, it is one of the few +examples of a system in which compiled and interpreted code have been +tightly integrated within a debugger. + +\section{Future Directions} + +As of this writing, WAD is only an experimental prototype. Because of +this, there is a wide variety of incremental improvements +that could be made to support additional platforms and scripting +languages. In addition, integration with threads and C++ +could be improved. + +A more interesting extension of this work would be to expose a broader +range of debugging capabilities to the scripting interpreter. 
For example, +rather than simply raising an exception with limited diagnostic +information, the recovery mechanism might be able to provide the +interpreter with a detailed snapshot of the entire call stack, +including symbolic debugging information. Using this information, it +might be possible to implement an interactive post-mortem debugger +that allows a programmer to inspect the values of local +variables and other aspects of the application without leaving the +interpreter. Alternatively, it may be possible to integrate this information +into an existing script-level debugger. + +\section{Conclusions and Availability} + +This paper has presented a mechanism by which fatal errors such as +segmentation faults and failed assertions can be handled as scripting +language exceptions. This approach, which relies upon advanced +features of Unix signal handling, allows fatal signals to be caught +and transformed into errors from which interpreters can produce an +informative cross-language stack trace. In doing so, it provides more +seamless integration between scripting languages and compiled +extensions. Furthermore, this has the potential to greatly simplify the +frustrating task of debugging complicated mixed scripted-compiled +software. + +The prototype implementation of this system is available at: + +\begin{center} +{\tt http://systems.cs.uchicago.edu/wad} +\end{center} + +\noindent +Currently, WAD supports Python, +Tcl, and Perl on SPARC Solaris and i386-Linux systems. Work to +support additional scripting languages and platforms is ongoing. + +\section{Acknowledgments} + +Richard Gabriel and Harlan Sexton provided interesting insights concerning similar capabilities +in Common Lisp. + +\begin{thebibliography}{99} + + +\bibitem{ousterhout} J. K. Ousterhout, {\em Tcl: An Embeddable Command Language}, +Proceedings of the USENIX Association Winter Conference, 1990. + +\bibitem{ouster1} J. K. 
Ousterhout, {\em Scripting: Higher-Level Programming for the 21st Century}, +IEEE Computer, Vol 31, No. 3, p. 23-30, 1998. + +\bibitem{perl} L. Wall, T. Christiansen, and R. Schwartz, {\em Programming Perl}, 2nd Ed., +O'Reilly \& Associates, 1996. + +\bibitem{python} M. Lutz, {\em Programming Python}, O'Reilly \& Associates, 1996. + +\bibitem{guile} Thomas Lord, {\em An Anatomy of Guile, The Interface to +Tcl/Tk}, USENIX 3rd Annual Tcl/Tk Workshop, 1995. + +\bibitem{php} T. Ratschiller and T. Gerken, {\em Web Application Development with PHP 4.0}, +New Riders, 2000. + +\bibitem{ruby} D. Thomas and A. Hunt, {\em Programming Ruby}, Addison-Wesley, 2001. + +\bibitem{swig} D.M. Beazley, {\em SWIG: An Easy to Use Tool for Integrating Scripting Languages with C and C++}, Proceedings of the 4th USENIX Tcl/Tk Workshop, p. 129-139, July 1996. + +\bibitem{sip} P. Thompson, {\em SIP},\\ +{\tt http://www.thekompany.com/projects/pykde}. + +\bibitem{pyfort} P.~F.~Dubois, {\em Climate Data Analysis Software}, 8th International Python Conference, +Arlington, VA, 2000. + +\bibitem{f2py} P. Peterson, J. Martins, and J. Alonso, +{\em Fortran to Python Interface Generator with an application to Aerospace +Engineering}, 9th International Python Conference, submitted, 2000. + +\bibitem{advperl} S. Srinivasan, {\em Advanced Perl Programming}, O'Reilly \& Associates, 1997. + +\bibitem{heidrich} Wolfgang Heidrich and Philipp Slusallek, {\em Automatic Generation of Tcl Bindings for C and C++ Libraries}, +USENIX 3rd Tcl/Tk Workshop, 1995. + +\bibitem{vtk} K. Martin, {\em Automated Wrapping of a C++ Class Library into Tcl}, +USENIX 4th Tcl/Tk Workshop, p. 141-148, 1996. + +\bibitem{gwrap} C. Lee, {\em G-Wrap: A tool for exporting C libraries into Scheme Interpreters},\\ +{\tt http://www.cs.cmu.edu/\~{ }chrislee/Software/g-wrap}. + +\bibitem{wrappy} G. Couch, C. Huang, and T. Ferrin, {\em Wrappy: A Python Wrapper +Generator for C++ Classes}, O'Reilly Open Source Software Convention, 1999. 
+ +\bibitem{gdb} R. Stallman and R. Pesch, {\em Using GDB: A Guide to the GNU Source-Level Debugger}, +Free Software Foundation and Cygnus Support, Cambridge, MA, 1991. + +\bibitem{swigexcept} D.M. Beazley and P.S. Lomdahl, {\em Feeding a +Large-scale Physics Application to Python}, 6th International Python +Conference, co-sponsored by USENIX, p. 21-28, 1997. + +\bibitem{stevens} W. Richard Stevens, {\em UNIX Network Programming: Interprocess Communication, Volume 2}, PTR +Prentice-Hall, 1998. + +\bibitem{proc} R. Faulkner and R. Gomes, {\em The Process File System and Process Model in UNIX System V}, USENIX Conference Proceedings, +January 1991. + +\bibitem{elf} J.~R.~Levine, {\em Linkers \& Loaders}, Morgan Kaufmann Publishers, 2000. + +\bibitem{stabs} Free Software Foundation, {\em The ``stabs'' debugging format}, GNU info document. + +\bibitem{prag} M.L. Scott, {\em Programming Language Pragmatics}, Morgan Kaufmann Publishers, 2000. + +\bibitem{sparc} D. Weaver and T. Germond, {\em SPARC Architecture Manual Version 9}, +Prentice-Hall, 1993. + +\bibitem{bfd} S. Chamberlain, {\em libbfd: The Binary File Descriptor Library}, Cygnus Support, bfd version 3.0 edition, April 1991. + +\bibitem{thread} F. Mueller, {\em A Library Implementation of POSIX Threads Under Unix}, +USENIX Winter Technical Conference, San Diego, CA, p. 29-42, 1993. + +\bibitem{debug} J. B. Rosenberg, {\em How Debuggers Work: Algorithms, Data Structures, and +Architecture}, John Wiley \& Sons, 1996. + +\bibitem{except1} D.E. Perry, A. Romanovsky, and A. Tripathi, {\em +Current Trends in Exception Handling-Part I}, +IEEE Transactions on Software Engineering, Vol 26, No. 9, p. 817-819, 2000. + +\bibitem{except2} D.E. Perry, A. Romanovsky, and A. Tripathi, {\em +Current Trends in Exception Handling-Part II}, +IEEE Transactions on Software Engineering, Vol 26, No. 10, p. 921-922, 2000. + + +\bibitem{lisp} G.L. Steele Jr., {\em Common Lisp: The Language, Second Edition}, Digital Press, +Bedford, MA, 
1990. + +\bibitem{ffi} H. Sexton, {\em Foreign Functions and Common Lisp}, in Lisp Pointers, Vol 1, No. 5, 1988. + +\bibitem{wcl} W. Hennessey, {\em WCL: Delivering Efficient Common Lisp Applications Under Unix}, +ACM Conference on Lisp and Functional Programming, p. 260-269, 1992. + +\bibitem{buhr} P.A. Buhr and W.Y.R. Mok, {\em Advanced Exception Handling Mechanisms}, IEEE Transactions on Software Engineering, +Vol. 26, No. 9, p. 820-836, 2000. + +\bibitem{haskell} S. Marlow, S. P. Jones, and A. Moran, {\em +Asynchronous Exceptions in Haskell}, 4th International Workshop on +High-Level Concurrent Languages, September 2000. + +\bibitem{ml} J. H. Reppy, {\em Asynchronous Signals in Standard ML}, Technical Report TR90-1144, +Cornell University, Computer Science Department, 1990. + +\bibitem{carle} A. Carle, D. Cooper, R. Hood, K. Kennedy, L. Torczon, and S. Warren, +{\em A Practical Environment for Scientific Programming}, +IEEE Computer, Vol 20, No. 11, p. 75-89, 1987. + + + + + + +\end{thebibliography} + +\end{document} + + + + + + + +