
<html>
<head>
<title>SWIG Engineering Manual</title>
</head>
<body bgcolor="#ffffff">
<center>
<h1>SWIG Engineering Manual</h1>

<b>David Beazley <br>
Department of Computer Science <br>
University of Chicago <br>
Chicago, IL  60637 <br>
beazley@cs.uchicago.edu <br>
</b>
</center>

<p>
<b>$Header$</b>

<p>
(Note: This is a work in progress.)

<h2>1. Introduction</h2>

The purpose of this document is to describe various coding conventions
and organizational aspects for SWIG developers. The idea for this
document is largely borrowed from John Ousterhout's Tcl/Tk Engineering
Manual.  It is not my intent to be overly managerial about matters--rather I'm
hoping to make life a little less chaotic for everyone.

<p>
First, a little background: SWIG was started in 1995 as a one-person
project and continued in this mode of operation until about 1998.
Most of this development was driven by ideas submitted by early SWIG
users as opposed to being motivated by a grand design.  As a result,
the code ended up being a pretty horrible C++ coding disaster.  A
mostly working disaster perhaps, but a disaster nonetheless.

<p>
With that said, the primary goal of future SWIG development is to
reengineer the original system, fix most of its inherent design flaws,
and to produce what I hope will become a highly extensible and modular
interface compiler framework.  To do this, there are a few
critical areas of work.  First, I want to restructure SWIG as a
collection of loosely coupled modules written in either ANSI C or a
scripting language.  Second, I want the system to be minimalistic in
its use of data structures and interconnections.  The primary reason
for this is that the fewer data structures there are, the less users
will have to remember.  This will also make the system more accessible
to non-experts.  Finally, I want to reevaluate the whole idea of what a
SWIG module is and expand the definition to include just about
anything: parsers, preprocessors, optimizers, interface editors,
and code generators.

<p>
The rest of this document outlines a few general rules of how code
should be developed within the SWIG project.  These rules are
primarily drawn from my own experience developing software and
observing the practices of other successful projects.

<h2>2. Programming Languages and Libraries </h2>

All SWIG modules must be written in either ANSI C or one of the
scripting languages for which SWIG can generate an interface (e.g.,
Perl, Python, or Tcl).  <B>C++ is NOT an acceptable alternative and
will not be utilized for any future development due to the fact that
it is too complicated, too dogmatic, too problematic, and that Dave
would rather take a bullet to the head than write one more line of
code in this most decidedly unpleasant language. </B> Rare exceptions
to this rule may be made if there is a justifiable need to interface
an existing piece of software written in C++ into the SWIG module
system.  Anyone who finds this rule to be unreasonable is more than
welcome to go write their own wrapper generator--so there.

<p>
Module writers should make every attempt to use only those functions
described in the POSIX.1 standard.  This includes most of the
functions contained in the Kernighan and Ritchie C programming book.  Use
of operating system dependent functionality such as socket libraries
should always be included inside a conditional compilation block so
that it can be omitted on problematic platforms.  If you are unsure
about a library call, check the man page or contact Dave.
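<p>
As a rough sketch of the conditional compilation rule, the fragment below
guards a socket-dependent function behind a macro.  The macro
<code>SWIG_HAVE_SOCKETS</code> and the function
<code>Example_socket_open()</code> are hypothetical names used only for
illustration; they are not part of SWIG.  The assumption is that the build
defines the macro only on platforms that provide the socket headers.

<blockquote>
<pre>
/* SWIG_HAVE_SOCKETS is a hypothetical configuration macro. */
#ifdef SWIG_HAVE_SOCKETS
#include &lt;sys/types.h&gt;
#include &lt;sys/socket.h&gt;
#endif

/* -------------------------------------------------------------------------
 * Example_socket_open()
 *
 * Returns a socket descriptor, or -1 if sockets are unavailable or the
 * call fails.
 * ------------------------------------------------------------------------- */

int
Example_socket_open(void) {
#ifdef SWIG_HAVE_SOCKETS
  return socket(AF_INET, SOCK_STREAM, 0);
#else
  return -1;   /* Feature compiled out on this platform */
#endif
}
</pre>
</blockquote>

Callers see the same function on every platform and only need to check for
the -1 return value.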

<h2>3. The Source Directory and Module Names</h2>

All SWIG modules are contained within the "Source" directory.  Within
this directory, each module is placed into its own subdirectory.  The
name of this subdirectory should exactly match the name of the module.
For example, if you are creating a module called "Tcl", all of your
files should be placed in a directory "Tcl".

<p>
When choosing a module name, please pick a name that is not
currently in use.  As a general convention, the first letter of a
module name is capitalized such as "Perl".  Alternatives such as
"perl" or "PERL" should be avoided.  In certain instances, the first
two letters may be capitalized as in "CParse."  The exact usage of
this is somewhat inconsistent and isn't terribly important--just make
sure the first letter is capitalized.  Also, module names should not
start with a number or include underscores or any other
non-alphanumeric characters.

<h2>4. Include files </h2>

All modules should include a header file that defines the public interface.
The name of this header file should be of the form "swigmodule.h" where
"module" is the name of your module.  For example, if you created a
module "Perl", the header file should be named "swigperl.h".   This scheme
should prevent header-file naming conflicts both within SWIG and when linking
parts of SWIG to the outside world.

<p>
All header files should include a short description, author information, a copyright message,
the CVS version, and include guards, and should be C++ aware. For example:

<blockquote>
<pre>
/* -------------------------------------------------------------------------
 * swigperl.h
 *
 *     All of the externally visible functions in the Perl module.
 *
 * Author(s) : David Beazley (beazley@cs.uchicago.edu)
 *
 * Copyright (C) 1999-2000, The University of Chicago.
 * See the file LICENSE for information on usage and redistribution.
 *
 * $Header$
 * ------------------------------------------------------------------------- */

#ifndef _SWIGPERL_H
#define _SWIGPERL_H   1

#ifdef __cplusplus
extern "C" {
#endif

/* Your declarations here */
...

#ifdef __cplusplus
}
#endif

#endif  /* _SWIGPERL_H */
</pre>
</blockquote>


<p>
To minimize compilation time, please include as few other header files as possible.

<h2>5. File Structure </h2>

Each file in a module should be given a filename that is all lowercase letters
such as "parser.c", not "Parser.c" or "PARSER.c".   Please note that filenames
are case-insensitive on Windows, so this convention will prevent you from inadvertently
creating two files that differ only in case.

<p>
Each file should include a short abstract, author information, copyright information, and
a CVS revision tag like this:

<blockquote>
<pre>
/* -----------------------------------------------------------------------------
 * include.c
 *
 *     This file implements the functions used to locate and include files in
 *     the SWIG library.  Functions for maintaining the library search path are
 *     also located here.
 *
 * Author(s) : David Beazley (beazley@cs.uchicago.edu)
 *
 * Copyright (C) 1999-2000, The University of Chicago.
 * See the file LICENSE for information on usage and redistribution.
 * ----------------------------------------------------------------------------- */

static char cvsroot[] = "$Header$";

#include "swig.h"

/* Declarations */
typedef struct {
   int x, y;
} Foo;

...

/* Private Declarations (used only in this file) */
static int  avariable;

...

/* Functions */
...

</pre>
</blockquote>

The CVS revision tag should be placed into a static string as shown
above.  This adds the revision information to the SWIG executable and
makes it possible to extract version information from a raw binary
(sometimes useful in debugging).

<p>
As a general rule, files start to get unmanageable once they exceed
about 2000 lines.  Files larger than this should be broken up into
multiple files.  At the same time, you should avoid the temptation to create
many small files as this increases compilation time and makes the
directory structure too complicated.

<h2>6. Bottom-Up Design </h2>

Within each source file, the preferred organization is to use what is
known as "bottom-up" design.  Under this scheme, lower-level functions
appear first and the highest-level function appears last.  An easy
way to remember this is that the "main" function of your module should
always appear last in the source file.  For example:

<blockquote>
<pre>
/* Simple bottom-up program */
#include &lt;stdio.h&gt;

int foo(int x, int y) {
    /* Implement foo */
    ...
}

int bar() {
    ...
    foo(i,j);
    ...
}

...
int main(int argc, char **argv) {
    ...
    bar();
    ...
}
</pre>
</blockquote>

This choice of design is somewhat arbitrary; however, it has a number of
benefits particular to C. In particular, a bottom-up design generally
eliminates the need to include forward references--resulting in
cleaner code and fewer compilation errors.

<h2>7. Functions</h2>

All functions should have a function header that gives the function name
and a short description like this:

<blockquote>
<pre>
/* -------------------------------------------------------------------------
 * Swig_add_directory()
 *
 * Adds a directory to the SWIG search path.
 * ------------------------------------------------------------------------- */

void
Swig_add_directory(DOH *dirname) {
...

}
</pre>
</blockquote>

In the function declaration, the return type and any specifiers
(extern or static) should appear on a separate line followed by the
function name and arguments as shown above.  The left curly brace
should appear on the same line as the function name.

<p>
Function declarations should <b>NOT</b> use the pre-ANSI function
declaration syntax.   The ANSI standard has been around long enough for
this to be a non-issue.
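<p>
For readers who have not seen the older syntax, the sketch below contrasts
the two forms.  <code>Example_add()</code> is a hypothetical function used
only for illustration.  The pre-ANSI form is shown inside a comment so that
the fragment compiles cleanly.

<blockquote>
<pre>
/* Pre-ANSI (K&amp;R) style.  Do NOT use:
 *
 *   int
 *   Example_add(x, y)
 *   int x;
 *   int y;
 *   { return x + y; }
 */

/* -------------------------------------------------------------------------
 * Example_add()
 *
 * Returns the sum of x and y.  ANSI style: the parameter types appear
 * inside the parameter list.
 * ------------------------------------------------------------------------- */

int
Example_add(int x, int y) {
  return x + y;
}
</pre>
</blockquote>

With the ANSI form the compiler can check the number and types of arguments
at every call site.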

<h2>8. Naming Conventions</h2>

The following conventions are used to name various objects throughout SWIG.

<h4>Functions</h4>

Function names should consist of the module name and the function name separated by an underscore like this:

<blockquote>
<pre>
Preprocessor_define()
Swig_add_directory()
</pre>
</blockquote>

In general, the module name should match the name of the module
subdirectory and the function name should be in all lowercase with
words separated by underscores.

<h4>Structures and Types</h4>

If your module defines new structures, the structure name should include the name of the
module and the name of the structure appended together like this:

<blockquote>
<pre>
typedef struct SwigScanner {
   ...
} SwigScanner;

typedef struct LParseType {
   ...
} LParseType;
</pre>
</blockquote>

In this case, both the name of the module and the type should be capitalized.  Also, whenever
possible, you should use the "typedef struct Name { ... } Name" form when defining new
data structures.

<h4>Global Variables</h4>

Global variables should be avoided if at all possible.  However, if you must use a global
variable, please prepend the module name and use the same naming scheme as for functions.

<h4>Constants</h4>

Constants should be created using #define and should be in all caps like this:

<blockquote>
<pre>
#define   SWIG_TOKEN_LPAREN  1
</pre>
</blockquote>

Words in a constant name should be separated by underscores, as with functions.

<h4>Structure members</h4>

Structure members should be in all lower-case and follow the same word-separation convention
as for function names.  However, the module name does not have to be included.
For example:

<blockquote>
<pre>
typedef struct SwigScanner {
  DOH           *text;           /* Current token value */
  DOH           *scanobjs;       /* Objects being scanned */
  DOH           *str;            /* Current object being scanned */
  char          *idstart;        /* Optional identifier start characters */
  int            next_token;     /* Next token to be returned */
  int            start_line;     /* Starting line of certain declarations */
  int            yylen;          /* Length of text pushed into text */
  DOH           *file;           /* Current file name */
} SwigScanner;
</pre>
</blockquote>

<h4>Static Functions and Variables </h4>

Static declarations are free to use any naming convention that is appropriate. However, most
existing parts of SWIG use lower-case names and follow the same convention as described for functions.

<h2>9. Visibility</h2>

Modules should keep the following rules in mind when exposing their internals:

<ul>
<li>Only publicly accessible functions should be included in the module header file.
<li>All non-static declarations must be prepended with some form of the module name
to avoid potential linker namespace conflicts with other modules.
<li>Modules should not expose global variables or use global variables in their
public interface.
<li>Similarly, modules should discourage the direct manipulation of data contained
within data structures in favor of using function calls instead.  For example,
instead of providing a user with a structure like this:

<blockquote>
<pre>
typedef struct Foo {
   int line;
} Foo;
</pre>
</blockquote>

It is better to hide the implementation of Foo and provide a
function-call interface like this:

<blockquote>
<pre>
typedef struct Foo Foo;
extern int  Foo_getline(Foo *f);
extern void Foo_setline(Foo *f, int line);
</pre>
</blockquote>

Although this results in worse performance, there are many practical
reasons for doing this.  The most important reason is that it allows
you to change the internal representation of Foo without breaking all
of the other modules or having to recompile the entire universe after
making your changes.

</ul>
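<p>
One way the implementation side of the <code>Foo</code> interface above
might look is sketched here.  The header would carry only the declarations;
the structure body lives in the .c file.  <code>Foo_new()</code> and
<code>Foo_delete()</code> are hypothetical additions, needed because callers
cannot allocate an incomplete type themselves.

<blockquote>
<pre>
#include &lt;stdlib.h&gt;

/* Public part (would live in the module header) */
typedef struct Foo Foo;
extern Foo *Foo_new(void);                 /* hypothetical constructor */
extern void Foo_delete(Foo *f);            /* hypothetical destructor */
extern int  Foo_getline(Foo *f);
extern void Foo_setline(Foo *f, int line);

/* Private part (would live in the module's .c file) */
struct Foo {
  int line;
};

/* Allocates a Foo with line set to 0.  Returns NULL on failure. */
Foo *
Foo_new(void) {
  Foo *f = (Foo *) malloc(sizeof(Foo));
  if (f) {
    f-&gt;line = 0;
  }
  return f;
}

/* Releases a Foo created by Foo_new(). */
void
Foo_delete(Foo *f) {
  free(f);
}

int
Foo_getline(Foo *f) {
  return f-&gt;line;
}

void
Foo_setline(Foo *f, int line) {
  f-&gt;line = line;
}
</pre>
</blockquote>

Because other modules only ever hold a <code>Foo *</code>, fields can be
added to or removed from <code>struct Foo</code> without recompiling them.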

<h2>10. Guile Support Internals</h2>

Please direct questions about this section to
<a href="mailto:ttn@glug.org">ttn@glug.org</a>.
Last update: 2000-04-03 05:27:34-0700.

<h3>Meaning of "Module"</h3>

<p>
There are three different concepts of "module" involved, defined separately
for SWIG, Guile, and Libtool.  To avoid horrible confusion, we explicitly
prefix the context, e.g., "guile-module".

<h3>Linkage</h3>

<p>
Guile support is complicated by a lack of user community cohesiveness, which
manifests in multiple shared-library usage conventions.  A set of policies
implementing a usage convention is called a <b>linkage</b>.  The default
linkage is the simplest; nothing special is done.  In this case
<code>SWIG_init()</code> is provided and users must do something like this:

<blockquote>
<pre>
(define my-so (dynamic-link "./example.so"))
(dynamic-call "SWIG_init" my-so)
</pre>
</blockquote>

At this time, the name <code>SWIG_init</code> is hardcoded; this approach does
not work with multiple swig-modules.  NOTE: The "simple" and "matrix" examples
under Examples/guile include guilemain.i; the resulting standalone interpreter
does not require calls to <code>dynamic-link</code> and
<code>dynamic-call</code>, as shown here.

<p>
A second linkage creates "libtool dl module" wrappers, and is currently
broken.  Whoever fixes this needs to track Guile's libtool dl module
convention, since that is not finalized.

<p>
The only other linkage supported at this time creates shared object libraries
suitable for use by hobbit's <code>(hobbit4d link)</code> guile module.  This
is called the "hobbit" linkage, and it also requires the "-package" command
line option to set the part of the module name before the last symbol.  For
example, both of these command lines: [checkme:ttn]

<blockquote>
<pre>
swig -guile -package my/lib foo.i
swig -guile -package my/lib -module foo foo.i
</pre>
</blockquote>

would create module <code>(my lib foo)</code> (assuming in the first case
foo.i declares the module to be "foo").  The installed files are
my/lib/libfoo.so.X.Y.Z and friends.  This scheme is still very experimental;
the (hobbit4d link) conventions are not well understood.

<p>
There are no other linkage types planned, but that could change...  To add a
new type, add the name to the enum in guile.h and add the case to
<code>GUILE::emit_linkage()</code>.

<h3>Underscore Folding</h3>

<p>
Underscores are converted to dashes in identifiers.  Guile support may grow an
option to inhibit this folding in the future, but no one has complained so
far.
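<p>
The folding itself amounts to a character substitution.  The sketch below is
an illustration only and is not the code used by the Guile module;
<code>Example_fold_underscores()</code> is a hypothetical name.

<blockquote>
<pre>
/* -------------------------------------------------------------------------
 * Example_fold_underscores()
 *
 * Replaces every '_' in name with '-', in place.  Returns name.
 * ------------------------------------------------------------------------- */

char *
Example_fold_underscores(char *name) {
  char *c;
  for (c = name; *c; c++) {
    if (*c == '_') {
      *c = '-';
    }
  }
  return name;
}
</pre>
</blockquote>

Under this sketch a C identifier such as <code>get_time_of_day</code> would
appear in Scheme as <code>get-time-of-day</code>.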

<h3>Typemaps</h3>

<p>
It used to be that the mappings for "native" types were included in
guile.cxx.  This information is now in Lib/guile/typemaps.i, which presents a
new challenge: how to have SWIG include typemaps.i before processing the
user's foo.i.  At this time, we must say:

<blockquote>
<pre>
%include guile/typemaps.i
</pre>
</blockquote>

in foo.i.  This may change in the future.

<h3>Smobs</h3>

<p>
For pointer types, SWIG can use Guile smobs if given the command-line
option "-with-smobs".  Ultimately this will be the default (and only)
behavior and the command-line option will no longer be supported.
Ideally, "-with-smobs" will not even make it to beta.

<p>
Currently, one wrapper module must be generated without
<code>-c</code> and compiled with <code>-DSWIG_GLOBAL</code>; all the
other wrapper modules must be generated with <code>-c</code>.  Maybe
one should move all the global helper functions that come from
<code>guile.swg</code> into a library, which is built by <code>make
runtime</code>.

<p>
In earlier versions of SWIG, C pointers were represented as Scheme
strings containing a hexadecimal rendering of the pointer value and a
mangled type name.  Because Guile allows user types to be registered as
so-called "smobs" (small objects), a much cleaner representation has
now been implemented.  The details are described below.

<p>
A smob is a cons cell where the lower half of the CAR contains the
smob type tag, while the upper half of the CAR and the whole CDR are
available.  <code>SWIG_Guile_Init()</code> registers a smob type named
"swig" with Guile; its type tag is stored in the variable
<code>swig_tag</code>.  The upper half of the CAR stores an index into
a table of all C pointer types seen so far, to which newly seen types
are appended.  The CDR stores the pointer value.  SWIG smobs print
like this: <code>#&lt;swig struct xyzzy * 0x1234affe&gt;</code>.  Two of
them are <code>equal?</code> if and only if they have the same type
and value.
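<p>
The split of the CAR into two halves can be pictured with the sketch below.
It assumes a 32-bit word divided into two 16-bit halves purely for
illustration; the actual widths and accessor macros are defined by Guile and
are not reproduced here.  All of the <code>Example_</code> and
<code>EXAMPLE_</code> names are hypothetical.

<blockquote>
<pre>
/* Illustration only: these are not Guile or SWIG definitions. */
#define EXAMPLE_HALF_BITS  16
#define EXAMPLE_HALF_MASK  0xFFFFUL

/* Packs a type-table index (upper half) with a smob type tag (lower half). */
unsigned long
Example_pack_car(unsigned long type_index, unsigned long tag) {
  return ((type_index &amp; EXAMPLE_HALF_MASK) &lt;&lt; EXAMPLE_HALF_BITS)
         | (tag &amp; EXAMPLE_HALF_MASK);
}

/* Extracts the smob type tag from the lower half. */
unsigned long
Example_car_tag(unsigned long car) {
  return car &amp; EXAMPLE_HALF_MASK;
}

/* Extracts the type-table index from the upper half. */
unsigned long
Example_car_type_index(unsigned long car) {
  return (car &gt;&gt; EXAMPLE_HALF_BITS) &amp; EXAMPLE_HALF_MASK;
}
</pre>
</blockquote>

In this picture, two smobs with the same tag can still be told apart by
pointer type through the index stored in the upper half.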

<p>
To construct a Scheme object from a C pointer, the wrapper code calls
the function <code>SWIG_Guile_MakePtr_Str()</code>, passing both a
mangled type string and a pretty type string.  The former is looked up
in the type table to get the type index to store in the upper half of
the CAR.  If the type is new, it is appended to the type table.
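<p>
The lookup-or-append step can be sketched as follows.  This is not the code
in <code>SWIG_Guile_MakePtr_Str()</code>; it is a simplified illustration
that uses a fixed-size table, and every name in it is hypothetical.

<blockquote>
<pre>
#include &lt;string.h&gt;

#define EXAMPLE_MAX_TYPES  64      /* hypothetical fixed capacity */

static const char *example_types[EXAMPLE_MAX_TYPES];
static int         example_ntypes = 0;

/* -------------------------------------------------------------------------
 * Example_type_index()
 *
 * Returns the index of a mangled type name in the type table, appending
 * the name if it has not been seen before.  Returns -1 if the table is
 * full.  The table keeps the caller's pointer; it does not copy the string.
 * ------------------------------------------------------------------------- */

int
Example_type_index(const char *mangled) {
  int i;
  for (i = 0; i &lt; example_ntypes; i++) {
    if (strcmp(example_types[i], mangled) == 0) {
      return i;
    }
  }
  if (example_ntypes == EXAMPLE_MAX_TYPES) {
    return -1;
  }
  example_types[example_ntypes] = mangled;
  return example_ntypes++;
}
</pre>
</blockquote>

Since entries are only ever appended, an index handed out once stays valid
for the life of the program.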

<p>
To get the pointer represented by a smob, the wrapper code calls the
function <code>SWIG_Guile_GetPtr_Str</code>, passing the mangled name
of the expected pointer type, which is used for looking up the type in
the type table and accessing the list of compatible types.  If the
Scheme object passed was not a SWIG smob representing a compatible
pointer, a <code>wrong-type-arg</code> exception is raised.

<h3>Exception Handling</h3>

<p>
SWIG code calls <code>scm_error</code> on exception, using the following
mapping:

<pre>
      MAP(SWIG_MemoryError,	"swig-memory-error");
      MAP(SWIG_IOError,		"swig-io-error");
      MAP(SWIG_RuntimeError,	"swig-runtime-error");
      MAP(SWIG_IndexError,	"swig-index-error");
      MAP(SWIG_TypeError,	"swig-type-error");
      MAP(SWIG_DivisionByZero,	"swig-division-by-zero");
      MAP(SWIG_OverflowError,	"swig-overflow-error");
      MAP(SWIG_SyntaxError,	"swig-syntax-error");
      MAP(SWIG_ValueError,	"swig-value-error");
      MAP(SWIG_SystemError,	"swig-system-error");
</pre>

<p>
The default when not specified here is to use "swig-error".
See Lib/exception.i for details.
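<p>
A <code>MAP</code> macro of this shape would typically expand to a case in a
switch statement.  The sketch below shows one way that could look, with the
"swig-error" default.  The <code>EXAMPLE_</code> codes are hypothetical
stand-ins for the <code>SWIG_</code> error codes, whose real values are not
reproduced here, and the macro definition is an assumption rather than the
one SWIG uses.

<blockquote>
<pre>
/* Hypothetical stand-ins for a few of the SWIG_* error codes. */
#define EXAMPLE_MEMORY_ERROR  1
#define EXAMPLE_IO_ERROR      2
#define EXAMPLE_TYPE_ERROR    3

/* Assumed shape of the mapping macro: one switch case per error code. */
#define MAP(c, k)  case c: return k

/* -------------------------------------------------------------------------
 * Example_error_key()
 *
 * Returns the Scheme error key for code, or "swig-error" if code has no
 * specific mapping.
 * ------------------------------------------------------------------------- */

const char *
Example_error_key(int code) {
  switch (code) {
    MAP(EXAMPLE_MEMORY_ERROR, "swig-memory-error");
    MAP(EXAMPLE_IO_ERROR,     "swig-io-error");
    MAP(EXAMPLE_TYPE_ERROR,   "swig-type-error");
  default:
    return "swig-error";
  }
}
</pre>
</blockquote>

The returned string is what would then be turned into a Scheme symbol and
handed to <code>scm_error</code>.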


<h2>11. Miscellaneous </h2>

<ul>
<li> Do not use the ternary ?: operator.  It is unnecessarily error prone,
hard for people to read, and makes code that uses it hard to maintain.
</ul>
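<p>
As a small illustration of the rule about the ternary operator, the sketch
below shows the discouraged form in a comment and the preferred explicit
form as code.  <code>Example_max()</code> is a hypothetical function.

<blockquote>
<pre>
/* Discouraged:
 *
 *   return (a &gt; b) ? a : b;
 */

/* -------------------------------------------------------------------------
 * Example_max()
 *
 * Returns the larger of a and b using an explicit if statement.
 * ------------------------------------------------------------------------- */

int
Example_max(int a, int b) {
  if (a &gt; b) {
    return a;
  }
  return b;
}
</pre>
</blockquote>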

</body>
</html>