Table of Contents
Compiler-Base
libraryDylan-Dump ModuleCompile-Time-Values ModuleSource ModuleTokens ModuleHeader ModulePlatform ModuleErrors ModuleSignature-Interface and
Signature Modules
Names ModuleDefinitions ModuleVariables ModulePolicy ModuleCType ModuleTransformers ModuleRepresentation ModuleClasses ModuleType-Dump ModuleC-Representation ModuleCompile-Time-Functions ModuleFlow ModuleCompiler-Parser
libraryCompiler-Front
libraryCompiler-Convert
libraryLexEnv ModuleCompile-Time-Eval ModuleExpanders ModuleFER-Convert ModuleDefine-Macros ModuleDefine-Libraries-and-Modules Module
Define-Functions ModuleDefine-Constants-and-Variables Module
Define-Classes ModuleTop-Level-Expressions ModuleCompiler-Optimize LibraryCompiler-CBack LibraryCompiler-Main LibraryDylan Runtime LibraryThe d2c compiler was developed by the Gwydion Project at CMU. This chapter explains the structure of d2c and gives a brief overview of how the various components interact.
d2c compiles one library at a time. Each library contains one or more modules. A module may be implemented by one or more files. Each file contains one or more source records, the logical unit of Dylan ™ compilation.
By compiling many source records at once, d2c produces more efficient code at the cost of using more memory. In addition to processing an entire library's worth of source records, d2c also performs interlibrary optimizations by looking at certain information from previously compiled libraries.
In its current configuration, d2c is not well-suited to developing new Dylan ™ code. It is, however, well-suited to building production versions of libraries. Obvious future projects include reducing the granularity of compilation and implementing a selectively-incremental compiler that can modify the object code of a running program. The basic architecture was designed with these projects in mind.
The actual compilation process procedes roughly as follows:
Load any libraries used by the current library.
The compiler begins by reading in the
*.lib.du file for the
Dylan library. As other libraries are
used, it loads the appropriate
*.lib.du files. These contain most of the
compiler data structures from the original versions of those
libraries.
Parse and macro-expand each of the input files.
The parser is implemented by the Compiler-Parser
library. As each
top-level form gets parsed, it is passed to
process-top-level-form, which is implemented by
the Compiler-Convert
library. At the end of this stage, the input files have
been completely parsed and macro-expanded.
Finalize the top-level forms.
Top-level forms may contain forward dependencies within a
module. In most cases, these dependencies aren't necessitated by
the design of the
Dylan
™
language, but they are needed for for
efficient compilation. We need to study these dependencies if we
want to make incremental compilation work. The implementation of
finalize-top-level-form lives in
Compiler-Convert
library.
Process the class hierarchy.
Optimizing
Dylan
™
code requires careful attention to the
object model. Unfortunately, d2c uses algorithms which require
the entire class hiearchy to be computed before link time. (The
biggest problem here appear to be assigning unique IDs to
individual classes.) The code to analyze the class hierachy lives
in the
Classes module of the Compiler-Base
library.
Begin processing top-level definitions individually.
Up until this point, d2c performed each step on the
entire library before continuing. The next several steps,
however, are completed for each top-level form before moving on
to the next top-level form. This process is controlled by
compile-1-tlf in the Compiler-Main
library
Convert the parse tree into the front-end representation
.
For various unfortunate reasons, d2c uses the term
front-end representation
(FER) for what other compilers call the
intermediate representation. The
conversion process is controlled by methods on
convert-top-level-form provided by the
Compiler-Convert
library. These take the parsed representation of a
top-level form and convert it to the FER
using the the
Builder-Interface module.
Optimize the FER of each top-level form.
The compiler performs two kinds of optimizations:
simplications and required
optimizations. The simplications should produce output
equivalent to their input. The required optimizations,
however, clean up certain artifacts in the code and insert
type checks. It is an error to pass a top-level form to the
back end before performing the required optimizations. See
the section called “The Compiler-Optimize Library” for more details.
Emit C code for the FER of each top-level form. The Compiler-CBack
library emits the actual C code required to
implement each top-level form.
Dump the local and global heaps.
Each library has a local heap which gets linked into the
library itself. This contains as much static data as
possible. However, not all data can safely live here; some must
live in the executable application. The per-library heap is
called the local heap and the
per-application heap is called the global
heap. The latter is only dumped if the current
library will become an executable application. Both dumpers are
implemented by the
Heap module.
Save a *.lib.du file for the current
library if necessary
.
If d2c is not compiling an executable application, it
dumps the data structures used by this library into a
*.lib.du file. If another
Dylan
™
library
includes this library, the dumped data structures will be used to
perform inter-library optimizations. The dumped data is stored as
persistent objects in Object Description
Format (ODF). The dumping process
is controlled by the Compiler-Main
library. ODF is
implemented by the OD-Format
module.
Generate a Makefile and run the C compiler.
The compilation driver in the Compiler-Main
library creates a Makefile
and C source files as needed during the compilation process. Once
all the necessary code has been generated and the heaps are
built, d2c invokes GNU make on the Makefile.
The Compiler-Base
library exports the basic data structures used by the compiler.
It also provides a number of miscellaneous utility modules. Every
other library relies on Compiler-Base
library.
Several modules in Compiler-Base
library appear to contain assorted utility
routines used elsewhere in the compiler.
Common
This module imports a number of standard modules and
re-exports some or all of their constants. Essentially, this
module is used to specify a local
Dylan
™
dialect for use by
the rest of the compiler. It includes
Dylan, some of
Extensions, a few names from
Table-Extensions and several of the
I/O-related modules.
UtilsThis appears to be the “grab bag” module. It holds a mix of completely unrelated minor classes and functions.
The OD-Format module was designed by
RobMacLachlan and implements the Object
Description Format, a binary format for
storing persistent objects. This is used by the
*.lib.du files emitted by d2c,
though it might be potentially useful elsewhere.
The file
gd/src/d2c/compiler/base/od-format.dylan
contains an extensive discussion the module's goals and
data format.
The basic requirement of the "Object Description Format" (ODF) is to allow an open-ended variety of objects to be dumped into a file, where:
instances may refer to each other in an arbitrary graph structure, and
some of the references may be to objects that are defined in distinct object description units (files, etc.)
In order to make the design interesting, the ODF attempts to satisfy various incompatible requirements:
To support efficient parsing and unparsing on a variety of architectures, but also to potentially serve as a cross-platform interchange format.
To work well on sequential media (like a socket), but also to potentially support random-access demand-loading of objects when possible (e.g. on files.)
We should consider making this into a standalone library at some point in the future.
<data-unit-name> | [Constant] |
Things that can be the name of a data unit.Value
<symbol>Description
<data-unit-name>is a<symbol>because is must be comparible with\==.
$library-summary-unit-type | [Constant] |
A constant for
begin-dumpingValue0Description
This is currently the only type of OD format allowed. To build another format (e.g. DOOD), a new exported constant must be added to
OD-Formatand the file extension must be added to$unit-type-strings(an internal constant toOD-Format).
begin-dumping | [Function] |
Returns a dump state used by other OD functions.Parameters
Return Values
name An instance of <data-unit-name>. The thing to dump.
type An instance of <integer>. The "unit type code"
where:An instance of false-or(<string>). Dump file name.
Description
dump-state An instance of <dump-state>.
Creates a state (which acts very much like a buffer) for other dumping functions to use to build the dump file in memory. The
end-dumpingfunction actually writes the built information to file.
end-dumping | [Function] |
Flushes the dump state to file.Parameters
Return Values
state An instance of <dump-state>.
NoneDescription
Actually output the file.
dump-od | [Open Generic] |
Write an object to the dump buffer.Parameters
Return Values
object An instance of <object>. The object to dump.
buffer An instance of <dump-buffer>. The in-memory buffer into which the object should be dumped.
NoneDescription
This routine is overloaded to dump individual types objects. Methods for primitive Dylan ™ types can be found in the section called “The
Dylan-DumpModule”.
<dump-state> | [Class] |
Holder of the in-progress object dump.Superclasses
<dump-buffer>Initialization KeywordsNoneDescription
Contains all information about the current object-dump in progress. It is illegal to use a dump-state after it has been passed to
end-dumping. All instances of<dump-state>are created by calls tobegin-dumping.
<dispatcher> | [Class] |
Binary to in-memory instance transformer.Superclasses
<object>Initialization KeywordsDescription
initialize-from:An instance of <object>. Unless#f, initialize instance's dispatch information from this dispatcher.
A dispatcher is an object that tells the loader how to convert binary data into Dylan ™ objects.
*default-dispatcher* | [Variable] |
Default dispatcher used for initialization.Type
<dispatcher>Description
Dylan-dumploads this variable with dispatching information for several commonly-used Dylan ™ types (see the section called “TheDylan-DumpModule”).
*data-unit-search-path* | [Variable] |
Paths to du files.Type
<sequence>Value
#[]Description
This is used by
find-data-unitto locate the du file to load.
<location-hint> | [Constant] |
A hint of where to find a data unit.Value
<byte-string>Description
This is currently an absolute(?) pathname.
<word> | [Constant] |
A portable, full-sized <integer>.
Value<integer>Description
It seems mindy has a limited size for
<integer>, so, for mindy,<word>is a<general-integer>to allow for proper bit manipulations that do not clobber the value via overflow.
find-data-unit | [Method] |
Locates then loads a du file.Parameters
Return Values
name An instance of <data-unit-name>.
type An instance of <integer>. (Currently only$library-summary-unit-typeallowed).
location-hint:An instance of false-or(<location-hint>). File name for the dump file; used if the data unit cannot be found in the current*data-unit-search-path*.
check-hash:An instance of false-or(<word>). Can be used to detect version mismatches between data units (used internally asexpected-hash)
dispatcher:An instance of <dispatcher>. Provides the dispatcher to be used for the loading process.
Description
res An instance of <data-unit>. Contains the resulting transformed objects from the du file.
Find-data-unit, when given a data unit name and type, returns the corresponding data unit. This is the main entry for causing loading to happen. Also used recursively.
To write a dumper for a class, one needs to think of a
name for the class (usually just take the <>'s off the
class name), register the name with
register-object-id, and define a dumper. If
your data structure contains any object sharing or circular
usages, also see
the section called “Circular Usage” and
the section called “Object Sharing”. For
automatic generation of object dumpers and loaders, see
the section called “Autodump”.
register-object-id | [Method] |
Associates a number for a type for the OD dumping functions.Parameters
Return Values
name An instance of <symbol>. The name of the type.
id An instance of <integer>. The associated integral value.
NoneDescription
Equate the
namewith theid. This function is not exported; rather, allregister-object-idcalls must occur in theOD-Formatmodule.Of course, this requirement makes it really tough for extending
OD-Formatfor user defined types ...
add-od-loader | [Function] |
Register a method for reading a dumped object back into memory.Parameters
Return Values
dispatcher An instance of <dispatcher>. When loading .lib.du files, this is the*default-dispatcher*
name An instance of <symbol>. The method will be activated when the loader encounters the type registered withregister-object-idin the du file.
func An instance of <function>. The loader method.
NoneDescription
This function is the “other half” of
dump-od. It is called to register a function for reloading dumped objects, which has the signature:method(state :: <load-state>) => res :: <object-type-read-in>If you use
add-od-loader, you will need to define a corresponding method ondump-od(see the section called “Dumping Objects”).
add-make-dumper | [Method] |
Convenience method to add both a loader and dumper.Parameters
Return Values
name An instance of <symbol>. Type name.
dispatcher An instance of <dispatcher>. Where the loader and dumper are being added.
obj-class An instance of <class>. Must be the direct class of the instances we are to dump (and of course, must also be instantiable via make.)
slots An instance of <list>. The slots' descriptions of the class.
load-external:An instance of <boolean>.
dumper-only:An instance of <boolean>. If#t, do not create the loader.
load-side-effect:An instance of false-or(<function>).
dump-side-effect:An instance of false-or(<function>).
NoneDescription
A convenient way to define both a loader and a dumper for objects of class
obj-class.Slotsis a list which is in this format:#(slot-accessor-function-1, init-keyword-1, setter-function-1, slot-accessor-function-2, init-keyword-2, setter-function-2, ...)Note that the stuff is implicitly grouped in clumps of 3. Either of the init-keyword or the setter may be
#f, but not both. The "make dumper" requests backpatching when it encounters an unresolved forward reference (due to circularities). Any slot which might have a forward reference must have a setter function (it can also have an init-keyword, which will be used if possible).A "make dumper" may be used when an object can be reconstructed from some accessor values by reinstatiating the object with
makeand backpatching some slots. This is pretty general, since the accessor and setter function are not required to be slot accessors and setters.
In the dump files, objects contain the following
parts: a header, raw data, sub-objects, and an
end-of-object marker. The header is a single machine word
which contains the class of the object. (Actually, it
contains the object-id, which is found by registry look-up.)
The header also indicates if there is any raw data and
sub-objects. The raw data section is zero or more machine
words which only have meaning to the object's loader. If
raw data is present, it is preceded by a byte-count (which
is a single machine word). If there are sub-objects, they
come next. The last object in a sub-object list is a
magical $end-object, which denotes both the
end of the sub-object list and the last word of the current
object.
For more detail on the binary representation and on issues such as endian-ness and machine word size, see the section called “Binary Representation”.
When an object is loaded, the loading code reads in the header. The object class id is extracted from the header. The dispatcher then provides a load function based on the class id, which reads the remainder of the data associated with the object and returns the reconstructed object.
When adding a method to dump-od use the
methods described here to push out the objects correctly.
dump-definition-header | [Method] |
Dump an object definition header.Parameters
Return Values
name An instance of <symbol>.
buf An instance of <dump-buffer>. Probably best to use the<dump-state>instance passed intodump-od
subobjects:An instance of <object>.#tmeans this object has subobjects
raw-data:An instance of <object>. The Raw-Format field in an object-definition header encodes information needed to byte/word format translation, and in the degenerate case, indicates there is no raw data at all. Allowable format codes are$odf-no-raw-data-format,$odf-byte-raw-data-format,$odf-16bit-raw-data-format,$odf-32bit-raw-data-format,$odf-64bit-raw-data-format,$odf-untranslatable-raw-data-format, and$odf-word-raw-data-format(which is 32 or 64 as appropriate, used for<word>in format pseudo-objects).
NoneDescription
The first dump call in
dump-opfor simple objects, this function dumps the object's header to the buffer. If one is dumping compound objects, first save the position of the buf before calling this method with some code like this:define method dump-od(obj :: <ratio>, buf :: <dump-state>) => (); let start-pos = buf.current-pos; dump-definition-header(#"ratio", buf, subobjects: #t); dump-od(obj.numerator, buf); dump-od(obj.denominator, buf); dump-end-entry(start-pos, buf); end method dump-od;This is necessary so that
dump-end-entry(see the next documented method) writes out the terminal correctly.
dump-end-entry | [Method] |
Dump an "end of object" marker. This is only necessary if there are subobjects.Parameters
Return Values
start-posn An instance of <integer>. Where the object began.
buf An instance of <dump-buffer>. Usually the<dump-state>instance passed intodump-od.
NoneDescription
Before dumping the compound object, be sure to save the buffer's current position (as in the sample code for
dump-definition-header) because this method needs that information.
dump-simple-object | [Method] |
Dump an object that has only a constant number of subobjects (no raw data).Parameters
Return Values
name An instance of <symbol>. The instance's class name.
buf An instance of <dump-buffer>. Usually the<dump-state>instance passed intodump-od.
slots An instance of <object>. The values of the slots of this object.
NoneDescription
This method provides a simple way to dump objects of known size, so using this method, writing
dump-odfor<ratio>would be:define method dump-od(obj :: <ratio>, buf :: <dump-state>) => (); dump-simple-object(#"ratio", buf, obj.numerator, obj.denominator); end method dump-od;
dump-word | [Method] |
Dump an integer into the buffer as one word.Parameters
Return Values
obj An instance of <word>. The integer to dump.
buf An instance of <dump-buffer>.
NoneDescription
This dumps an
<integer>as part of of an object. If you want to dump an<integer>as an<integer>, use thedump-odon<integer>in the section called “TheDylan-DumpModule”.
dump-raw-data | [Method] |
Dumps a collection into the buffer.Parameters
Return Values
obj An instance of <collection>. The collection to dump.
bsize An instance of <integer>. The number of elements in the collection.
buf An instance of <dump-buffer>.
NoneDescription
This take a collection, obj, and dumps its contents to buf. This method is usually called in connection with dumping a complex object, for example,
dump-odfor<byte-string>in the section called “TheDylan-DumpModule” dumps strings thusly:define method dump-od(obj :: <byte-string>, buf :: <dump-buffer>) => (); dump-definition-header(#"byte-string", buf, raw-data: $odf-byte-raw-data-format); dump-raw-data(obj, obj.size, buf); end method;
The basic concept of writing a loader is to read in the
information you wrote out with the method on
dump-od. This process (writing loaders) is a
bit different than writing dumpers, though. Here, you provide
a method to add-od-loader (described in
the section called “Defining Simple Dumpers and Loaders”)
That is activated by a <dispatcher> instance
when it encounters the object in the dump file to load.
A loader on <byte-string>
(the corresponding dump-od method is found in
the section called “Functions for Writing Dumpers”) would
look like this:
add-od-loader(*default-dispatcher*, #"byte-string",
method (state :: <load-state>)
=> res :: <byte-string>;
let next = state.od-next;
let bsize = buffer-word(state.od-buffer, next);
state.od-next := next + $word-bytes;
let res = make(<byte-string>, size: bsize);
load-raw-data(res, bsize, state);
res;
end method
);
(In this case, the call to add-od-loader
is made at the top-level in dylan-dump.dylan.)
The functions used to build the dispatch method are described below.
<load-state> | [Class] |
A load-state is an object that describes the current state of the loading process. Among other things, it contains an input <stream> and the buffer to the stream.Superclasses
<object>
Initialization KeywordsDescription
stream:An instance of <stream>. The stream we're reading from. Only used for refilling the buffer. This slot is accessed via theod-streammethod.
buffer:An instance of <buffer>. State of the stream buffer, which we hold for the duration of the load. It is accessed via theod-buffermethod.
next:An instance of <buffer-index>. The index of the state's stream; it is accessed via theod-nextmethod and can be set (changed) using theod-next-settermethod.
end:An instance of <buffer-index>. The end-index for the buffer and stream. This slot is accessed via theod-endmethod.
position-offset:An instance of <integer>. When added to the od-next pointer, this offset yields the byte offset of our current position from the data-unit start. This must be updated whenever we refill the buffer.
dispatcher:An instance of <dispatcher>.
A
<load-state>object is passed into the dispatching method to reconstruct a dumped object (so, you do not need tomakea<load-state>instance). Use that object in conjuction with the functions described below.
$end-object | [Constant] |
An end-of-subobjects marker.Type
<end-object>Description
A magical object which is always the last sub-object in an object.
load-object-dispatch | [Method] |
Read an object header and invoke the appropriate loader. Return the object that the loader returns.Parameters
Return Values
state An instance of <load-state>.
Description
res An instance of <object>. The reconstructed object.
Start loading some objects from a load-state. Dispatches to an appropriate loader method depending on the dispatcher and the entry type. Returns the object.
For example, to load a
<ratio>instance, build a dispatching method like this:add-od-loader(*default-dispatcher*, #"ratio", method (state :: <load-state>) => res :: <ratio>; let npart = load-object-dispatch(state); let dpart = load-object-dispatch(state); assert-end-object(state); ratio(npart, dpart); end method );
load-raw-data | [Method] |
Loads data into a parameter.Parameters
Return Values
res An instance of <byte-string>. A pre-allocated buffer where the raw data gets loaded.
elsize An instance of <integer>. Size of the buffer (or the maximum bytes to load).
state An instance of <load-state>.
Description
nnext An instance of <buffer-index>. The position in the<load-state>'s buffer after loading the raw data.
Loads raw data into res. It is up to the dispatching method to interpret that loaded raw data in order to reconstruct the object's state.
Usually a call to
load-raw-datais to load a string, so reconstruction is easy (see the example at the top of this section). One thing of note is that this method is not functional:resis not returned, but is a destructively modified parameter.
load-subobjects-vector | [Method] |
Returns a vector of an object's subobjects.Parameters
Return Values
state An instance of <load-state>.
size-hint An instance of false-or(<integer>). If a number, then the number of subobjects MUST match.
Description
res An instance of <simple-object-vector>. The subobjects of the object.
Loads the dumped subobjects into a vector. Note that the subobjects may be unresolved when loaded with this function ... it is the responsibility of the user of this function to resolve those objects (see the section called “Circular Usage” for more information on functions for object resolution such as
obj-resolved?,actual-obj, andrequest-backpatch).So, getting a
<vector>from this function still requires programming effort, e.g.:add-od-loader(*default-dispatcher*, #"simple-object-vector", method (state :: <load-state>) => res :: <simple-object-vector>; let contents = load-subobjects-vector(state, #f); let rsize = contents.size; let res = make(<simple-object-vector>, size: rsize); for (i from 0 below rsize) let el = contents[i]; if (el.obj-resolved?) res[i] := el.actual-obj; else res[i] := el; request-backpatch(el, method (actual) res[i] := actual end); end; end; res; end method );Fortunately, the
Dylan-dumpmodule provides this dispatching method. See the section called “TheDylan-DumpModule”.
load-sole-subobject | [Method] |
Loads the only subobject of this object.Parameters
Return Values
state An instance of <load-state>.
Description
res An instance of <object>. The resolved subobject.
Load a one-slot object. The value is returned, and we check that there was in fact only one subobject.
assert-end-object | [Method] |
Reads from the buffer, and signals an error if the next thing read isn't an end-of-object marker.Parameters
Return Values
state An instance of <load-state>.
NoneDescription
Ensure the subobject list ends with the end-of-object token.
To deal with circular object usage graphs, several mechanisms are employed. Objects can be dumped as forward references and defined later. The loader for the object containing the forward reference has the option of delaying object creation until the forward referenced object is loaded, or requesting that the object be backpatched when the forward referenced object is finally loaded.
The following descriptions TBD:
new-local-id
label-next-object
dump-local-reference
<forward-ref>
obj-resolved?
actual-obj
request-backpatch
resolve-forward-ref
By default, it is assumed that the object being dumped has no object sharing, i.e., for all sub-objects the ref-count is one. For any object that has a ref-count greater than one, that object must be a sub-class of <identity-preserving-mixin>. (If it is ok to dump the object multiple times and load it back as separate objects, then this superclass may be omitted.)
The following descriptions TBD:
<identity-preserving-mixin>
maybe-dump-reference
To allow efficiency as well as potential interchange,
objects are described by a word sequence which vaguely
resembles the in-core representation. The word size is
implementation defined, and dumper/loader methods dump/load
data as words having the native byte ordering. The format is
defined so that any necessary byte-swapping and word-size
adjustment can be done as a pre-pass filter to the actual
loader. The format is described in detail at the top of
od-format.dylan.
The implication of the above is that data-unit files
produced from OD-Format are not portable.
These examples show the bits for some literals, assuming with 32bit word size and big-endian order (which, by the way i386 is not big-endian), and assigning convenient values for the object class-IDs. See the actual code for the real definitions and class IDs.
For example, we could describe a byte-string like this:
0: <Header? = #b1, Etype = #b00, Subobjects? = #b0, Raw-Format = #x1,
Class-ID = byte-string-odf-id>
1: string byte-count
2[ceiling(byte-count, word-bytes)]:
<string-chars: *>
Assuming byte-string-$odf-id is
#x666, the actual bits for "Foo"
would be:
#x81000666
#x00000003
#x466F6F00
A list has subobjects, but no raw data, so it would look like:
0: <Header? = #b1, Etype = #b00, Subobjects? = #b1, Raw-Format = #x0,
Raw-Size = #x00, Class-ID = list-odf-id>
1[N]: <header of subobject 0, subobject data>
...
M: <Header? = #b1, Etype = #b01, Start-Offset: * = M>
If list-odf-id were #x667,
then #("Foo", "Bar") would be:
0: #x90000667
1: #x81000666
2: #x00000003
4: #x466F6F00
5: #x81000666
6: #x00000003
9: #x42617200
7: #xA0000007
This demonstrates the nesting of objects, and shows how the end is marked. Note that:
"Foo" is represented by exactly
the same bits here (it is position-independent), and
that
The entire sequence is position-independent, since the end-entry has a relative offset, and that
The basic subobject protocol doesn't say in advance how many subobjects there's going to be; you have to wait for the end header. This could be avoided for e.g. vectors by considering subobject 0 to be the length.
This module provides ODF dumping and loading routines for all of the primitive types.
This module exports <ct-value>, which
represents a value known at compile time. It also supplies
<literal> and a number of subclasses to represent
literal strings, integers, lists and other primitive types.
Other compile time values can be found elsewhere. Literal
<type> objects are handled by classes in the
CType
module. Literal <class> objects are
handled by classes in the Classes
module. Literal <function> objects are
handled by classes in the Compile
Time Functions module.
d2c was designed to fetch source records from several sources, including flat text files and eventaully code databases. To permit this flexibility, d2c uses a very abstract interface for representing locations in source code.
The <source-location> class is an open,
abstract class with no slots. All instances must provide a method
for describe-source-location.
The source-location generic takes an arbitrary
object as an argument and returns a source location. The mix-in
class <source-location-mixin> provides the
init-keyword source-location: and a method on
source-location which returns the value of the keyword
or <unknown-source-location>.
The <source-file> class reads an entire
input file into memory and provides access to the contents. It's
tightly-coupled to the class
<file-source-location>, which represents a
location within a source file. Separating these two classes is not
feasible. Among other things, this code generates the attractive,
multiline displays of exactly where an error occured.
This module provides classes and constants for representing tokens and syntax tables.
Dylan ™ source files have RFC 822 headers, similar to those used by Internet e-mail and many other protocols. This module provides support for parsing headers at the top of a file and extracting the values of keywords. White space at the begining and ends of lines is removed automatically.
This module contains code to parse platforms.descr
, which contains descriptions of the various host
platforms supported by d2c. A description of each of the
parameters appears in platforms.descr itself.
This module exports a number of classes and functions related to compiler errors and warnings.
The Signature-Interface module creates a
group of names which are actually implemented in the
Signature module. These represent the formal
parameter and result lists of a
Dylan
™
function.
When reporting errors, d2c often needs to provide names for various functions, anonymous local methods and other things found in Dylan ™ programs. This module knows about several different kinds of names and how they should be displayed to the user.
It does not appear that these classes are used in as part of the compiler's own computations. A quick look suggests that this is merely interface and debugging code.
I'm not quite sure what this module does, so take this description with a grain of salt. Better yet, figure out what this module does and fix this documentation.
The top-of-file comment for this module reads:
Abstract class that all definitions inherit from. A definition is the compilers handle on some run-time value. In addition to the obvious things, definitions exist for things like the type of a variable (when it isn't a compile-time constant).
It appears that named, top-level forms each get a definition. From the comment, it sounds as if the following code produces two definitions:
define variable foo :: run-time-expression() = #f;
One definition refers to the variable itself, and the other refers to the type value computed at initialization time. Is this a correct assumption?
This module is in change of libraries, modules and
namespaces. It maintains the global table of libraries, processes
the various clauses in define library and define
module, and keeps track of where variable definitions
live. The magic Dylan-User modules get set up
here as well.
Each <variable> object also provides slots
for the associated definition, transformers and constant
evaluator. It would appear that this module defines some of the
more important data structures in the compiler.
The Policy represents an
optimization policy: a set of parameters
used for making tradeoffs between speed, safety and code size. A
quick inspection of the code shows that a
<policy> object will not survive the dumping and
loading processes intact. Do these actually get used?
Does ctype stand for “constant
type”, “compile-time type”, or something else
entirely?
This module defines the compiler's type model. This is used
by the Flow
module to represent the types of expressions, and by the
Compiler-Optimize
library during several phases of optimization.
Most of these types can be represented as
Dylan
™
objects.
Such types also inherit from <ct-value>, so they
can be represented as compile-time values (see
the section called “The Compile-Time-Values Module”).
All ctypes inherit from <values-ctype>.
Single-value types (e.g., the types of variables) are represented
by subclasses of <ctype>. Multiple-value types
(e.g., the return values of functions) are represented as instances
of <multi-value-ctype>.
Various subclasses of <ctype>
represent class types, union types and all the standard (and
non-standard) limited types supported by d2c. The class
<unknown-ctype> represents a type we won't know
until runtime. The function empty-type returns a
type union with no members. No object can ever be an instance of
this type. (Empty types are used extensively when computing the
intersections, unions and differences of types.)
Class types are defined in the Classes
module.
<multi-value-ctype> is a bit tricky. It
represents zero or more required values of known types, zero or
more optional values of known types, and an optional
#rest list—which may also be typed. (See the
source for more information.) The function
wild-ctype returns a multi-value type representing
zero or more values of any type.
This module supports type intersections, type unions and type differences, primarily for use by the optimizer. All of the operations may be imprecise; if so, the second return value will typically be false.
The intersection of two types is described as:
Return as restrictive a type as we can discover that is no more restrictive than the intersection of Type1 and Type2. The second value is true if the result is exact. At worst, we arbitrarily return one of the arguments as the first value (trying not to return an unknown type).
The union of two types is described as:
Find a type which includes both types. The result is an unknown type if either of the arguments are unknown; otherwise the result is precise.
The difference of two types is described as:
Return our best guess at the type that describes all objects that are in Type1 but not in Type2. If we can't precisely determine this type, then return something more inclusive than it, but never more inclusive than Type1.
Other things in this module include subtype tests, type
extent computation and type specifiers. The various
“-dispatch” functions have nothing to do
with the compiler's support for generic functions—they're
just hooks for extending various generic functions defined in this
module.
Transformers are functions that are attached to function
definitions. They serve as a sort of procedural macro in the
optimizer. For instance, the do() function is transformed into
an ordinary loop construct so the optimizer can bang on it
some more. See the cheese module for usage
of this.
This module provides a very abstract interface for choosing
data representations in generated code. It currently appears to be
implemented by the
C-Representation module.
This module contains the guts of the class system, which is
implemented with classes as a subtype of
<ctype>. This module also computes class
precedence lists, unique class IDs and slot layouts.
This is an implementation module which contains ODF support
for various subclasses of <ctype>.
This modules knows about the different C data types. It also knows about heap representations, direct representations and general representations.
From the top-of-file comment:
A ct-function is a kind of compile-time value used to represent a function. ct-functions contain various linkage-related information needed to call the function, but don't reference the FER for the function (e.g. the <function-literal>.) This information is used both by the backend and by the heap builder.
This is also where Peter Housel added the support for callback functions. This module includes support for general entries and generic entries, which are presumably the type-checked and dispatched entry points for functions, respectively.
These classes define the flow model
used by the compiler. Together with the classes defined in the
Front
module, they make up the intermediate
representation used to communicate between the
Compiler-Convert
library, the Compiler-Front
library, the Compiler-Optimize
library and the
Compiler-CBack
library.
Unfortunately, the front-end and the back-end of the compiler
use a slightly different version of the intermediate
representation. To confuse matters further, both versions are
called the “front-end representation” (see
the section called “The Builder-Interface Module” for more
details).
The actual flow model is fairly straightforward. Data-flow
classes are defined in data-flow.dylan, and
control-flow classes are defined in
control-flow.dylan.
The data-flow classes represent variables, constants, operations and assignments. They use a generalization of the traditional three-address form. Expressions are broken down into a series of step-by-step assignments to temporary variables. The right-hand side of an assignment may contain a constant, a variable or a single operation (nesting is not allowed). The operation may take any number of arguments, each of which must be another variable or a constant.
When the flow classes are first created, a particular variable may be assigned to more than once. The optimizer, however, converts all variables to single static assignment (SSA) form. In SSA form, each variable is given a value at creation time and never modified. This restriction simplifies many optimization algorithms considerably.
Prior to SSA conversion, variables are represented by the
class <intial-variable>. Each individual
assignment to a variable is represented as an
<initial-definition>. After SSA conversion,
variables are represented by the class
<ssa-variable>.
The control-flow model represents groups of assignments as
regions. The simplest kind of region is
<simple-region>, which represents a linear series
of assignments. A <compound-region> holds several
regions which are executed in sequence. The various subclasses of
<join-region> represent control constructs with
non-linear flow patterns. An <exit-region>
transfers control to an enclosing block; various subclasses are
used to return from functions, exit loops and perform other kinds
of funky control flow.
This part of the compiler is well-commented, so you may want
to read the source for further details. You'll also want to look at
the classes defined in Front
module, most of which extend the control- and data-flow
model in various useful ways.
The classes <join-operation>,
<join-assignment> and
<join-region> are currently ignored by the rest
of the compiler. In each case the word “join” refers
to the Phi function used in traditional SSA form. These classes
will be used if anybody ever fixes the optimizer's SSA
conversion.