Global Abstraction­Safe Marshalling with Hash Types
James J. Leifer + Gilles Peskine + Peter Sewell # Keith Wansbrough #
+ INRIA Rocquencourt # University of Cambridge
{First.Last}@inria.fr {First.Last}@cl.cam.ac.uk
Abstract
Type abstraction is a key feature of ML­like languages for writ­
ing large programs. Marshalling is necessary for writing dis­
tributed programs, exchanging values via network byte­streams or
persistent stores. In this paper we combine the two, developing
compile­time and run­time semantics for marshalling, that guaran­
tee abstraction­safety between separately­built programs.
We obtain a namespace for abstract types that is global, i.e.
meaningful between programs, by hashing module declarations.
We examine the scenarios in which values of abstract types are
communicated from one program to another, and ensure, by con­
structing hashes appropriately, that the dynamic and static notions
of type equality mirror each other. We use singleton kinds to ex­
press abstraction in the static semantics; abstraction is tracked in
the dynamic semantics by coloured brackets. These allow us to
prove preservation, erasure, and coincidence results. We argue that
our proposal is a good basis for extensions to existing ML­like lan­
guages, pragmatically straightforward for language users and for
implementors.
Categories and Subject Descriptors
D.3.3 [Programming Languages]: Language Constructs and
Features
General Terms Languages, Theory, Verification
Keywords programming languages, ML, type theory, abstract
types, marshalling, serialisation, modules, singleton kinds, hash­
ing, distributed programming, lambda calculus
1. Introduction
Problem Type abstraction is a basic tool for modular program­
ming, allowing the programmer to separate the interface and the
implementation of an abstract data type, and to limit the scope in
which the implementation details are visible. Work on ML­style
module systems, including [19, 21, 11, 17], has led to expres­
sive language constructs for controlling abstraction, with modules
(structures) that can export abstract types, and also parameterised
modules (functors); they have rich notions of type equality to deal
with generativity and sharing. This work has largely been in the
non­distributed context, concerned only with isolated executions of
single programs. There, build­time type checking suffices to guar­
antee both type­safety and abstraction­safety --- the property that
values of an abstract type can only be inspected or constructed by
the code of its definition, and hence that any invariants of this code
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ICFP'03, August 25--29, 2003, Uppsala, Sweden.
Copyright 2003 ACM 1­58113­756­7/03/0008 ...$5.00.
hold of all values. At run­time, type information can be erased.
In the distributed setting, abstraction­safety is more subtle. One
may need to exchange values between multiple executions of the
same build, between executions of different builds of the same
sources, and between executions of builds of different sources
(sharing some modules, perhaps, but not all). This interaction
might be by network communication or via a persistent store; in
either case, some run­time check is clearly needed to guarantee
safety. For abstraction­safety, it does not suffice to check only the
underlying representation type; intuitively, we need also that the
sender and receiver have compatible invariants. This can be en­
forced by requiring that they have the same code, but in general,
where there is only partial sharing, we shall see that the design of
an appropriate check is delicate.
We focus in particular on language support for marshalling a
value to a byte string and unmarshalling such strings back to val­
ues. With these, one can implement a variety of useful mechanisms
above the standard (byte­string) primitives for network communi­
cation and persistence. For example: (1) In the existing distributed
languages JoCaml [14] and Nomadic Pict [27] a single program
can dynamically distribute computations, which can then interact
via typed channels, but unsafe ``name servers'' are required to boot­
strap connections between programs. Type­ and abstraction­safe
marshalling would enable such name servers to be expressed in a
safe way. (2) More generally, safe marshalling would enable one
to code up a variety of communication abstractions, such as typed
channels with differing behaviour (asynchronous, unicast, multi­
cast,. . . ), within a high­level language; they would then be auto­
matically guaranteed to be safe.
Contribution We present a type system and semantics (both
compile­time and run­time) for marshalling and unmarshalling val­
ues between separate programs. Our solution:
. covers modules that declare abstract types, and ancillary type­
sharing constraints;
. involves a dynamic type­check at unmarshal time that guaran­
tees both type­safety and abstraction­safety;
. ensures the resulting dynamic notion of type equality coincides
with the usual static notion, so that distributed programming is a
smooth extension of local programming;
. ``just works'' in standard cases for interaction between pro­
grams that share some modules, without requiring any shared data
beyond the source code for these modules;
. supports controlled abstraction­breaking, where required; and
. is efficiently implementable.
It is therefore a good basis for extensions to existing ML­like lan­
guages, pragmatically straightforward for language users and for
implementors.
Approach The basic idea of our solution is to construct a
global namespace for abstract types, meaningful across all pro­
grams, by hashing module declarations. Hash types do not

appear in source programs, but are constructed at compile­
time. For example, consider a module called N with the body
struct type t=Trep let x=... end and the published inter­
face sig type t val x:... end. The hash h =
hash(module N=struct type t=Trep let x= ... end
: sig type t val x: ... end, t)
would be constructed to give a run­time analogue of the compile­
time abstract type name N.t. By constructing hashes carefully,
we ensure that a simple run­time syntactic type equality check,
at unmarshal­time, corresponds to the compile­time notion of type
equivalence used in type­checking.
The standard operational semantics for existentials forgets ab­
straction. In contrast, we give a run­time semantics that records
which subterms can see through which abstractions using coloured
brackets, adapting a device of [10]. For example, within the code of
N.x, even after it has been substituted into its usage sites, the type
equality h ==Trep can be used. This enables us to prove type­ and
abstraction­preservation, progress, and the coincidence result men­
tioned above. We prove also that an implementation may safely
erase all coloured brackets outside hashes at run­time.
Non­goals Our focus in this paper is on what mechanisms are
required to guarantee abstraction­safety. We do not address the
full ML language; instead, we focus on a core language based on
simply­typed #­calculus with abstract and manifest modules, al­
though we argue that our formal system may be cleanly extended.
Dynamic rebinding of identifiers within marshalled values is con­
sidered in [4]. Moreover, we are not here concerned with low­
level representations of marshalled values; we assume some fixed
scheme for marshalling simply­typed values. Finally, we protect
against confusion, not malice.
Outline We begin in Sec. 2 by examining scenarios in which val­
ues of abstract types are communicated between programs, identi­
fying the desired constructs and behaviour from the programmer's
point of view. Sec. 3 outlines our solution informally, shows why
it provides the desired behaviour, and shows it can be implemented
efficiently. In Sec. 4 we present a formal calculus, # hash , that covers
the novel aspects of our solution. It describes networks of interact­
ing separately­built modular programs. In Sec. 5 and Sec. 6 we
discuss related and future work and conclude.
2. Abstraction and interaction: the desired behaviour
In this section we discuss the desired behaviour of marshalling in
a distributed setting, with a series of informal examples in an ML­
like language. Our solution, in the following section, shows how
this ideal can be achieved.
We consider an ML­like language in which a program consists
of two parts: first a sequence of module declarations, each of which
can introduce abstract types; then an expression (the main body of
the program). We are concerned with interaction between whole
programs, usually built separately. This interaction is via network
communication, though it could equally be via a persistent store;
in either case, the underlying mechanism simply transmits byte
strings. For concreteness, most of our examples involve networks
consisting of two machines, pauillac and glia, running pro­
grams, say Pa and Pb . These network configurations are written
pauillac[Pa ] | glia[Pb ]. It then suffices to consider a single
communication channel (such as a TCP connection between fixed
ports); the language has communication primitives
send : string­>unit receive : unit­>string
We can now explore the desired behaviour of marshal(e :T ) and
unmarshal(e :T ), which marshal to and from string.
2.1 Communication
The simplest example is that of sending a value of a non­abstract
type between separately­built programs. Consider the two pro­
grams
P1a = send (marshal (5 : int))
P1b = print_int (unmarshal (receive ():int))
If these are built and then executed on the two machines the com­
munication and unmarshal should succeed:
pauillac[P1a ] | glia[P1b ] #
2.2 Respecting types
On the other hand, if one machine sends a string that the other
attempts to unmarshal as an int there should obviously be a run­
time failure.
P2a = send (marshal ("five":string))
P2b = print_int (unmarshal (receive ():int))
pauillac[P2a ] | glia[P2b ] ×
To ease debugging, it is desirable for that failure to occur as early
as possible (at unmarshal­time rather than when the string is used
later) and to be trapped cleanly, raising an exception rather than giv­
ing unpredictable behaviour. The implementation must therefore
send some form of type representation. The following examples
explore the constraints on what this must be.
2.3 Respecting abstractions
Now consider an example with an abstract type. Here the
EvenCounter module declares a type EvenCounter.t which has a
representation type of int but externally is abstract, as declared in
its signature. The operations of EvenCounter enforce the invari­
ant that values of EvenCounter.t are always represented by even
integers. If we allowed an arbitrary integer to be unmarshalled as
an EvenCounter.t then the abstraction, and this invariant, would
be broken; the unmarshal should therefore fail.
P3a = send (marshal(5:int))
P3b = module EvenCounter =
struct sig
type t=int type t
let start=0 : val start:t
let get x = x val get:t­>int
let up x = x+2 val up:t­>t
end end
print_int (EvenCounter.get
(unmarshal (receive ()):EvenCounter.t))
pauillac[P3a ] | glia[P3b ] ×
Marshalling from a different abstract type --- say a
TripleCounter.t --- to EvenCounter.t should fail similarly.
2.4 Communication between completely­shared sources
For communication between two instances of the same build, which
therefore have identical source code, the problem is relatively sim­
ple. Below, P4 declares an abstract type IntSet.t of sets of inte­
gers, representing them as binary search trees. It makes a run­time
determination of which machine it is on and then sends or receives
an IntSet.t; the unmarshal should succeed. We will develop this
example later --- suppose this first implementation orders subtrees

by <, and has a union operation that does not remove duplicate
entries.
P4 =
module IntSet =
struct
type t = int tree
let singleton = singleton­code
9 > > > > = > > > > ;
IntSetStruct
let mem = mem­code
...
end : sig
type t
val singleton : int ­> t
val mem : int ­> t ­> bool
9 > > > > > > > > = > > > > > > > > ;
IntSetSig
val empty : t
val add : int ­> t ­> t
val union : t ­> t ­> t
end
if ...on­machine­pauillac... then
send (marshal (IntSet.singleton 17 : IntSet.t))
else
if IntSet.mem 17 (unmarshal(receive():IntSet.t))
then print "y" else print "n"
pauillac[P4 ] | glia[P4 ] #
By default this should still succeed even if the two machines exe­
cute different builds of the same source.
2.5 Communication between partially­shared sources
More generally, one may need communication between programs
which share only some modules (perhaps ubiquitous standard li­
braries, or application­specific libraries). Here P5a and P5b share
the IntSet module from before, but otherwise have different mod­
ule declarations and main body expressions; their communication
of an IntSet.t should succeed.
P5a =
module IntSet = IntSetStruct :IntSetSig
send (marshal (IntSet.singleton 17 : IntSet.t))
P5b =
module IntSet = IntSetStruct :IntSetSig
module M =
struct let haszero x = IntSet.mem 0 x end
: sig val haszero : IntSet.t ­> bool end
if M.haszero (unmarshal (receive () : IntSet.t))
then print "y" else print "n"
pauillac[P5a ] | glia[P5b ] #
2.6 Guaranteeing compatible invariants
In the previous example the two programs had syntactically iden­
tical IntSet implementations. Since IntSet does not depend
on any other modules, this is a sufficient condition to guarantee
that the two abstract types have compatible invariants, i.e. that any
value of either will be correctly acted upon by the operations of
the other. Moreover, it can be automatically checked, whereas
compatibility of invariants cannot even be stated without specifing
the behaviour of the two modules, and would then require general
theorem­proving to verify. Note that it would not be sufficient to
require that the two implementations use the same representation
type, or even to require that the implementations are (in the absence
of marshalling) observationally equivalent.
For example, suppose that IntSetStructGt is similar to
IntSetStruct but orders subtrees with > rather than <.
P6a =
module IntSet = IntSetStructGt :IntSetSig
send (marshal (IntSet.add 0 (IntSet.add 1
(IntSet.add 2 IntSet.empty)) : IntSet.t))
When communicating with P5b , which contains the original
IntSetStruct , the unmarshal should fail, as otherwise an er­
roneous result could be produced.
pauillac[P6a ] | glia[P5b ] ×
Later we shall see that a mechanism for intentionally circumventing
this restriction, in a controlled way, is sometimes desirable.
2.7 Respecting names (when necessary)
In some cases one has modules with identical implementations
that nonetheless provide conceptually different abstract types, for
example in the Euro and Pound modules below. Unmarshalling
should respect this difference, so the example should fail (just as,
within a single ML program, types Euro.t and Pound.t would be
incompatible).
P7a =
module Euro =
struct type t=int let of_int x = x ... end
: sig type t val of_int : int ­> t ... end
send (marshal (Euro.of_int 17 : Euro.t))
P7b =
module Pound =
struct type t=int let of_int x = x ... end
: sig type t val of_int : int ­> t ... end
unmarshal (receive (): Pound.t)
pauillac[P7a ] | glia[P7b ] ×
This restriction is not always useful (e.g. whether an integer set
module is called IntSet or Set Int is likely irrelevant), so the
language should support some syntactic way of indicating whether
a module name is significant or not.
2.8 Module dependencies
Consider now modules that depend on abstract types declared by
other modules. In P8a below there is a module IntSet, providing
an abstract type IntSet.t, followed by a module SummedIntSet,
providing an abstract type of sets of integers augmented with a run­
ning sum. The expression part constructs, marshals and sends a
value of the SummedIntSet.t abstract type. This SummedIntSet
depends on IntSet in three ways: (1) IntSet.t occurs in its rep­
resentation type IntSet.t # int; (2) IntSet.t occurs in the type
of an operation in its signature; and (3) operations from IntSet oc­
cur in the definitions of its operations. Any such dependency means
that substantive changes to the definition of IntSet must propagate
through to give distinct SummedIntSet.t types. On the other hand,
any module declarations that are not (transitively) depended upon
should have no effect on SummedIntSet.t.
For example, consider also P8b below. It has exactly the same
text as SummedIntSet but a different implementation of IntSet
--- suppose IntSetStruct' has a different representation type
from IntSetStruct , or the same representation but incompati­
ble invariants, or different externally­observable behaviour. The
P8a and P8b SummedIntSet.t types should be incompatible, so
the unmarshal should fail.

P8a =
module IntSet = IntSetStruct :IntSetSig
module SummedIntSet =
struct
type t = IntSet.t * int
let empty = (IntSet.empty,0)
let sum (x,y) = y
...
end : sig
type t
val empty : t
val singleton : int ­> t
val sum : t ­> int
val to_intset : t ­> IntSet.t
...
end
send(marshal((SummedIntSet.singleton 2)
: SummedIntSet.t ))
P8b =
module IntSet = IntSetStruct ':IntSetSig
module SummedIntSet = ...same text as above...
SummedIntSet.sum
(unmarshal (receive () : SummedIntSet.t))
pauillac[P8a ] | glia[P8b ] ×
2.9 Mirroring local type sharing: manifest types, functors
The examples in this and subsequent subsections are not covered
by the formal calculus of Sec. 4. Nonetheless we argue informally
in Sec. 6.2 how they can be treated by straightforward extensions
of our main techniques and earlier work.
ML module systems include parametric modules, known as func­
tors, for large­scale software structuring and code reuse. In the
single­program world there are a number of subtle type­equality
issues, related to how generative functors are, and how one can ex­
press type sharing constraints [21, 17, 11, 28]. Our marshalling
primitives should correctly reflect these subtleties in inter­program
communication.
For example, the module SummedIntSet above, which explic­
itly references IntSet, might be re­expressed in terms of a functor
F which takes any argument structure U with interface IntSetSig
and builds a SummedIntSet:
module IntSet = IntSetStruct :IntSetSig
module F = functor (U:IntSetSig ) ­>
struct type t=U.t*int ... end
: sig type t ... end
module SummedIntSet = F(IntSet)
The functor F generates an abstract type, so we must consider when
that type should be compatible with others. If two separate pro­
grams contain this preamble, they should be able to exchange val­
ues of their respective SummedIntSet.t types. This mirrors the
behaviour of OCaml's applicative functors [18], in which another
instance of the application F(IntSet) within the same program
would have a type compatible with SummedIntSet.t.
Should the functorised and non­functorised (P8a )
SummedIntSet.t be compatible? Again following existing
module systems, we should make them incompatible, as otherwise
static type equality would depend on module substitution.
Type sharing allows functors to express type equalities between
their argument and result; unmarshalling should respect these static
type equalities. The example F' below constructs a type t but, in
contrast to F, does not make that type abstract; instead it makes it
manifestly equal to the product of its argument type U.t and int.
module F' = functor (U:IntSetSig ) ­>
struct type t=U.t*int ... end
: sig type t=U.t*int ... end
The application of F' to a module IntSet creates a static type
equality F'(IntSet).t==IntSet.t*int, which should also be
admitted at run­time.
2.10 Breaking abstractions (simple bidirectional case)
In ongoing software evolution, implementations of an abstract type
may need to be changed, to fix bugs or add functionality, while
values of that type exist on other machines or in a persistent store.
It is often impractical to simultaneously upgrade all machines to a
new implementation version.
A simple case is that in which the representation of the abstract
type is unchanged and where the programmer asserts that the two
versions have compatible invariants, so it is legitimate to exchange
values in both directions. This may be the case even if the two are
not identical, e.g. for an efficiency improvement or bug fix. Here
there should be some mechanism for forcing the old and new types
to be identical, breaking the Sec. 2.6 restriction.
For example, consider the improved IntSetStructDeDup
implementation below, in which the operations are similar to
IntSetStruct , the only difference being that union removes du­
plicates. The compiler cannot verify that IntSetStructDeDup
has all the semantic properties that the programmer requires of
IntSetStruct . Hence we provide a way of explicitly declar­
ing that these modules provide compatible types. In P10a be­
low, IntSet'.t is made equal to IntSet.t by the strong coercion
...with t =! IntSet.t. The compiler checks only that the old
and new types have compatible representations (here int tree),
but should respect further abstractions within those representation
types. This is based on our earlier work of [26].
P10a =
module IntSet = IntSetStruct :IntSetSig
module IntSet' =
struct
type t = int tree
9 > > = > > ;
IntSetStructDeDup
...improved operations...
end
: IntSetSig with t =! IntSet.t
send (marshal (IntSet'.singleton 17 : IntSet'.t))
P10b =
module IntSet = IntSetStruct :IntSetSig
if IntSet.mem 0 (unmarshal(receive():IntSet.t))
then print "y" else print "n"
pauillac[P10a ] | glia[P10b ] #
2.11 Breaking abstractions (directed case)
In the more complex case where the old and new invariants are not
compatible, or where the two representation types differ, the pro­
grammer will have to write an upgrade function. The same strong
coercion can be used to make this possible.
For example, suppose we have a program that uses stored values
of IntSetStruct and we wish to upgrade both the implementa­
tion and the stored values, changing the representation type from
binary search trees to red­black trees. The new implementation
would have a module declaration:

module IntSet2 =
struct
type t = int rbtree
9 > > = > > ;
IntSetStructRBT
...
end
: IntSetSig
A program to upgrade the stored values can be expressed as below,
with an Upgrade module that has both types, coerced respectively
to be equal to the old and new abstract types. (We are not proposing
machinery to automatically apply the upgrade function.)
module IntSet = IntSetStruct :IntSetSig
module IntSet2 = IntSetStructRBT : IntSetSig
module Upgrade =
struct
type t1 = int tree
type t2 = int rbtree
let upgrade = ...
end : sig
type t1
type t2
val upgrade : t1 ­> t2
end
with t1 =! Intset.t and t2 =! Intset2.t
...map Upgrade.upgrade over the stored values...
Note that the coercion does not require the signature of Upgrade
to coincide with those of IntSet and IntSet2. The compiler only
checks that IntSet.t is represented by int tree and IntSet2.t
by int rbtree.
2.12 Forcing generativity
Dually, sometimes it is desirable to force a type change between
builds even when the code remains identical, to prevent confusion
between old and new communicated values. For example, one may
have several distributed deployments of the same application which
should be kept logically isolated.
2.13 E#ectful module initialisation
In our previous examples the components of modules are all values.
Generalising this to arbitrary expressions (as ML does), an abstract
type definition can be dependent on some computation with side
effects.
For example, consider an NCounter module that reads its step
value from standard input when initialised; the invariant of any in­
stance is then that any value of its NCounter.t is a multiple of
this step. Two instances of the module can obviously have differ­
ent invariants, and so marshalling from one to another should fail.
Thus each run of a program containing NCounter should have an
incompatible type NCounter.t.
2.14 Marshalling functions and rebinding
In this paper we deal only with marshalling of closed values; the
semantics ensures that all module and expression declarations are
substituted in before a marshal operation takes place. Marshalling
of functions is therefore semantically straightforward.
A full language should, however, provide some form of dy­
namic rebinding of identifiers when they are unmarshalled, both
to achieve the desired semantics where local resources have dif­
ferent behaviour in different contexts, and for performance reasons
where much code is shared (and so should not be communicated).
The paper [4] addresses dynamic rebinding, in the absence of type
abstraction.
3. Solution: hash types as global names
This section introduces our solution informally, from both imple­
mentation and semantic viewpoints.
As we have seen, type­safe and abstraction­safe unmarshalling
requires some run­time type representation in marshalled values, to
permit a dynamic type comparison.
Our solution is based on the observation that hashing module
definitions provides a global namespace for abstract types: if an
identical module is hashed during builds of two different programs
at different sites, the same hash will be obtained. Thus the pro­
grams share names for any abstract type provided they share the
source code of the module that declares the type (and of its de­
pendencies); no communication (e.g. of GUIDs) is needed at build
time.
We regard hashes literally as types --- hashes appear as a clause
in the type grammar. They do not appear in source programs, but
are inserted during compilation; as we shall see in more detail, the
compiler replaces occurrences of an abstract type such as IntSet.t
by the hash of the definition of IntSet that is in scope. Semanti­
cally, we work with ideal hashing, with a formal syntactic con­
struction hash(...). Implementations would realise this with an
actual hash function; we discuss the low­level properties of hashes
in Sec. 3.5.
At run­time, after this compile­time type substitution, the types
in marshal(e :T ) and unmarshal(e :T' ) are closed, without
free module identifiers or type variables. They can therefore be eas­
ily represented as byte strings, communicated across the network or
stored in a persistent store, and can be compared with simple string
equality.
We ensure that this dynamic equality precisely mirrors the static
notion of provable type equality by carefully tuning the way in
which hashes are generated and used; we show below that our sys­
tem achieves this. Unmarshalling is therefore not only type­safe,
but also abstraction­safe.
The standard operational semantics for abstract types forgets
about abstraction as computation proceeds, substituting in repre­
sentation types and operations. Here, in contrast, we need a run­
time semantics that maintains abstraction throughout, both (1) so
that our type preservation theorems tell us that abstractions are not
broken; and (2) to support the proof that static and dynamic type
equality coincide. After a module is reduced away, module code
(which may see through the abstract type of that module) is in­
termixed with body code (which must treat the type as abstract).
We therefore use a syntactic construct, coloured brackets, adapted
from the work of [10], to delimit the regions in which different type
equivalences hold. This is not purely a proof technique, however:
in some subtle cases the coloured brackets within hashes are needed
in compile­time hash generation to correctly distinguish abstract
types that would otherwise be aliased. We show that implementa­
tions can erase coloured brackets outside hashes after compilation.
3.1 Simple examples
We illustrate the use of hashes in a simple case by referring back to
the example of Sec. 2.4, in which a single program, P4 , was built
and run on the two machines. The build process is modelled in
our semantics by type­checking, as usual, followed by reductions
that substitute out module definitions, inserting hashes as required.
Hash generation is deterministic, and hence the result of building
P4 on the two machines is identical. The program has a single
module definition. It has a compile­time reduction as below, to
an `executable' P4' . (Note that for clarity of exposition, we omit
coloured brackets from all reductions until Sec. 3.3; the example
reductions as stated are not all type­preserving without them.)

P4 =
module IntSet = IntSetStruct :IntSetSig
if ...on­machine­pauillac... then
send (marshal (IntSet.singleton 17 : IntSet.t))
else
if IntSet.mem 17 (unmarshal(receive():IntSet.t))
then print "y" else print "n"
-# c (compilation)
if ...on­machine­pauillac... then
send (marshal (singleton­code 17 : h ))
else
if mem­code 17 (unmarshal (receive () : h ))
then print "y" else print "n"
= P4'
where
h = hash(module IntSet=IntSetStruct :IntSetSig,t).
Here the definitions of IntSet.singleton and IntSet.mem
have been substituted for their occurrences, and the global name
h has been substituted for type IntSet.t. Notice that h is con­
structed from the entire definition of IntSet, including the textual
name IntSet, the implementation structure IntSetStruct , the
interface IntSetSig , and the type field name t. In this simple
example IntSet has no dependencies, so one can think of hashing
its source text; we will discuss later the more interesting case of
modules with dependencies, and also the question of exactly what
form the hash function takes. Our liberal use of substitution is,
of course, a semantic device --- in practice compilation would use
other representations.
At run­time, the two machines pauillac and glia execute their
independently­compiled copies of P4' . Their shared knowledge of
the hash h acts as a certificate that they may safely share values of
their respective abstract types IntSet.t and IntSet.t.
pauillac[P4' ] | glia[P4' ]
-# # (local computation on pauillac and glia)
pauillac[send(marshal(singleton­code 17:h ))]
| glia[if mem­code 17 (unmarshal(receive():h ))
then print "y" else print "n"]
-# # (local computation on pauillac, to get v )
pauillac[send(marshalled( v :h ))]
| glia[if mem­code 17 (unmarshal(receive():h ))
then print "y" else print "n"]
-# (communication)
pauillac[ () ]
| glia[if mem­code 17 (unmarshal(marshalled(v :h ):h ))
then print "y" else print "n"]
-# (on glia: dynamic type check h =h , succeeds)
pauillac[ () ]
| glia[if mem­code 17 v
then print "y" else print "n"]
-# # (on glia: computation, prints "y")
pauillac[ () ] | glia[ () ]
Ultimately, only strings may be communicated across a network.
The notation marshalled (v :T ) denotes a string literal contain­
ing representations of value v and its type T . This is only mean­
ingful, and only used, where v and T are both closed.
Notice that the dynamic check is simple: just that the type h
sent from pauillac is identical to the type h written into the
unmarshal on glia at compile time. Yet, by virtue of the con­
struction of these hashes, this is sufficient to guarantee both type­
safety and abstraction­safety.
Consider now the programs of Sec. 2.1--2.7. How do hashes
of modules provide the desired behaviour? In the case of con­
crete types, the comparison is obvious. For P1 , int=int; for P2 ,
string #=int; for P3 , for no hash h do we have int=h . As we
have already seen, in the P4 case the two programs share an iden­
tical hash h . For P5 , in P5a and P5b the computed hash h for
IntSet.t is identical (in fact the same h as above). Thus the h
substituted for IntSet.t in P5a 's call to marshal will be identi­
cal to that in P5b 's call to unmarshal, and the communication will
again succeed, exactly as we desire.
Although P6a (Sec. 2.6) contains a type IntSet.t, it is clear
that the hash h' of the modified module IntSet differs from the
hash h of the original module, correctly reflecting the difference
in the modules' behaviour. The two programs will, correctly, be
unable to communicate; an exception will be raised at the point of
the unmarshal.
The example of Sec. 2.7 shows why one might wish the textual
name (in general, the path) of a module to be included in its hash,
along with the module body. The two modules Euro and Pound are
identical in all but name, and so a hash that did not include the name
would treat them as interchangeable, clearly leading to dangerous
economic confusion, and furthermore differing from the usual se­
mantics of ML­like languages. On the other hand, the programmer
should also be able to specify that a name is not to be considered
part of the module's identity. This can be done simply by having an
additional form of module declaration, module* N = ... , for
which hashing uses a canonical name *, not admissible in source
programs, instead of the actual name N. In this simple scheme both
sender and receiver must use the *'d form, of course.
3.2 Module dependencies
The example of Sec. 2.8 shows that the same module text defines
a different abstract type if its dependencies change, which means
that the hash of a module must depend on the hashes of its depen­
dencies. In our substitutive reduction semantics, type dependen­
cies are handled automatically: we have substituted hashes for any
types of earlier modules before constructing the hash of a module
that depends on them. We shall see how term dependencies are
also automatically taken into account. Consider the following (a
simplification of P8a ):
module A=struct type t=bool let x=true end
: sig type t val x:t end
module B=struct type t=A.t*int let x=(A.x,3) end
: sig type t val x:t end
send (marshal (B.x : B.t))
-# c (compilation)
module B=struct type t=h *int let x=(true,3) end
: sig type t val x:t end
send (marshal (B.x : B.t))
-# c (compilation)
send (marshal ((true,3) : h' ))
where
h = hash (
module A=struct type t=bool let x=true end
: sig type t val x:t end,t)
h' = hash (
module B=struct type t=h *int let x=(true,3) end
: sig type t val x:t end,t)
Here the hash h' for B is constructed after the hash h for A has
been substituted for A.t, and after the term part true has been sub­

stituted for A.x. It is clear that if A changed, h would change, and
so h' would change. This would still be true in the (unlikely) case
that B mentions A.t but not A.x.
We must also ensure h' depends on h in the (common) case
that B mentioned A.x but not A.t, i.e. where A is used in B only to
implement an internal computation. The coloured brackets of the
following section will conveniently suffice for this.
3.3 Abstraction­preserving reduction
Some reductions in Sec. 3.1, 3.2 require non­standard type equali­
ties to make them type­preserving. For example, to type the inter­
mediate state in Sec. 3.2 we must have (true,3) of type h *int,
hence we need a type equality identifying h with its representation
type bool.
We could allow this type equality to be used anywhere, but
instead prefer to delimit more precisely which subterms can see
through any particular abstraction. We introduce coloured brack­
ets, adapted from the work of [10], during module reduction. In the
previous example, the first reduction will actually replace (A.x, 3)
by ([true] h
h ,3) instead of just (true,3). The brackets serve two
purposes. First, the lower annotation (the colour) is a hash h , indi­
cating that the additional type equivalence h == bool is available
when typing the inside of the bracketed expression. This equiva­
lence is drawn from the structure of h , viz. h =hash(module A=
struct type t=bool...end:...,t). Thus, inside the brack­
ets we have true:h . Second, the upper h annotation is the type of
the bracketed expression as seen from the outside, thus reduction
is type preserving. (One would often have a more complex type in
the upper annotation, not just a hash, e.g. [(true,3)] h *int
h .)
The reduction semantics of our formal system moves brackets
around as required to ensure that abstraction is preserved through­
out reduction, and so our type preservation result (Thm. 4.1) covers
abstraction. If we did not use brackets but allowed hash type equal­
ities to be used freely, abstraction would become invisible after re­
duction. The use of brackets also simplifies the statement of our
result relating static and dynamic type equality (Thm. 4.7). More­
over, when compiling a module that refers to a term field of a previ­
ous one, the presence of brackets ensures that the hash of the later
module does indeed depend on the hash of the earlier module.
3.4 Modest implementation demands
Few changes are required in an ML­like language to support the
strategy outlined above.
Thm. 4.5 shows that type checking is decidable and that hashes
play no role in compile­time type­checking of source code. In par­
ticular, we can use traditional type checking and inference algo­
rithms essentially unchanged. Compile­time reduction only builds
hashes, without ever looking inside one. Run­time reduction only
ever compares hashes by string equality.
Thm. 4.6 shows that almost all coloured brackets and type infor­
mation can be erased before run­time, with the exception of course
of marshal and unmarshal type annotations, and brackets within
hashes.
ML­like languages usually support separate compilation of mod­
ules. Typically, a compilation phase takes a module and the signa­
tures of the modules it imports and generates code parameterised
by these dependencies. For # hash , the compilation phase would
also generate a hash parameterised by the hashes of the imported
modules, in other words a hash­to­hash function. An appropriate
compositional implementation of hashing must be used to make
these efficiently representable. Typically, linking instantiates the
parameterised code with jumps to the code of previous modules.
For # hash , the linking phase would do two further things. First, it
would patch the type annotations for marshal and unmarshal in
the code by replacing references to module types by their hashes.
Second, it would calculate the hash of the module by applying the
hash­to­hash function (generated by compilation) to the hashes of
previous modules.
3.5 Low­level details of hashes
In our semantics, we work with ideal hashing, taking a free con­
structor hash(...) which can be applied to elements of the ab­
stract syntax. We can think of hash as a function whose injectivity
guarantees abstraction­safety. To avoid communicating large quan­
tities of source code, an implementation would reify hash with a
fixed­length hash function, giving a safety guarantee that is only as
strong as the probability of the absense of collisions.
This must be chosen so that (1) collisions are rare, and (2) hashes
are not too costly to compute.
Both MD5 (RFC1321, 128­bit) and SHA­1 (RFC3174, 160­bit)
are sufficiently cheap, and may be considered random functions
for this application [24]. Let us consider the likelihood of colli­
sions. For n abstract types and N possible hash values, the prob­
ability of a collision is approximately n 2 /2N . Pessimistically as­
suming 10 10 programmers in the world, writing 300 lines of code
per day with one abstract type per 100 loc, the probability of a col­
lision in a century of abstract types (using MD5) would then be
(10 15 ) 2 /2 129
# 10 -9 . This is much less than the probability of a
cosmic­ray­induced processor error in this period.
It may be desirable to have an absolute guarantee of type­safety,
while accepting probabilistic abstraction­safety. To achieve this,
one could pair hashes with the corresponding underlying represen­
tation types. At the other extreme, one could accept a probabilistic
guarantee even at simple types, by sending only a hash of the mar­
shalled type. These choices must depend on a risk assessment.
Note that our proposal is aiming to protect only against acci­
dental errors during programming and software deployment, not
against malicious attack, and so we are not concerned with de­
liberate searches for collisions. Protecting against spoofed mes­
sages requires largely orthogonal techniques, e.g. message signa­
tures and/or encryption, that are not in the scope of this paper.
Moreover, we do not address the problem of communication be­
tween untrusting peers, where one must check not just that the type
advertised by the peer is compatible with the local type, but also
the validity of the byte string's claim to represent a value of the
advertised type (see, e.g., [23]).
We hash elements of the abstract syntax, not concrete syntax, for
two reasons. Firstly, it ensures hashes are not dependent on, e.g.,
the choice of newline or newline/CR, or on comments. Secondly, it
fits well with the rest of the semantics --- recall we must calculate
hashes of modules that are the results of module substitutions. In
practice optimised calculations would be possible, without requir­
ing the explicit construction of canonical representatives of abstract
syntax elements.
Hashing abstract syntax, which we take up to alpha­equivalence,
has the (benign) consequence that abstract type equality is not de­
pendent on the names of function parameters. We have both inter­
nal (alpha­convertible) and external module names in the seman­
tics; external names must be meaningful between programs.
4. Formal system
Our calculus describes networks of machines. Each machine exe­
cutes a program; a program consists of a sequence of module dec­
larations followed by an expression. The expression language con­
sists of a simply­typed call­by­value #­calculus with module field
references, marshalling, and communication of strings.

Consider a program containing a module declaring an abstract
type. There is an abstraction boundary between the module's body
and the rest of the program. Inside the boundary, the type's repre­
sentation is visible; thus the type is said to be transparent. Outside,
the type's representation is not visible, thus the type is opaque.
Our calculus tracks this abstraction boundary as reduction pro­
ceeds. Compilation replaces the abstract type by a hash h and
wraps the code e that comes from inside the module definition with
coloured brackets decorated by h , as in [e] T
h . The distinction be­
tween opaque and transparent views is therefore witnessed by the
brackets: inside the brackets, we view h as transparent; when out­
side h is opaque.
In order to express this distinction in our inference rules, we dec­
orate each judgement with a colour hm , as in E #hm e:T . The
colour has one of two forms: it can be a hash h , in which case h is
transparent and all other hashes are opaque; or it can be the empty
colour ., in which case all hashes are opaque.
The reader is referred to the companion technical report [16] for
the full semantics of the calculus and proofs of results.
4.1 Relation to the informal discussion
For brevity, we take a module language in which structures are
type/term pairs, rather than general dependent records from the
earlier informal development. The following table summarises the
correspondence between the informal and formal module syntax.
struct type t = T 0 let y = v . end # [T 0 , v . ]
sig type t val y : T end # [X :Type, T ]
sig type t = T 1 val y : T end # [X :Eq(T 1 ), T ]
We split module names into two parts, an external name N and
an alpha­convertible name U . We write module declarations as
moduleNU = M :S in m , where U binds in m and N nei­
ther binds nor is subject to binding. The user would write only one
identifier, which would be used for both. External names play no
role in the static type system; they are used in hash construction
and hence in dynamic type checks.
The formal system omits =! coercions and run­time generativ­
ity, which should be straightforward extensions. Functors are also
omitted, though we include most of the technical machinery they
require, expressing abstract and manifest types in signatures us­
ing singleton kinds. In Sec. 6.2 we propose extensions for treating
these omissions.
4.2 Syntax
We let x , X and U range over expression, type and module vari­
ables.
Networks:
n ::= 0 | m | n|n
Machines (whole programs):
m ::= e | moduleNU = M :S in m (U binds in m)
Modules:
M ::= [T , v . ] structure (v . is a value)
S ::= [X :K , T ] signature (X binds in T )
Types:
T ::= UNIT | INT | STRING base types
| X | T#T | T # ... # T variable, function, product
| U .TYPE type part of a module
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
| h hash
Hashes:
h ::= hash(N , M :[X :Type, T ]) hash
hm ::= h | . colour (``hash maybe'')
Kinds:
K ::= Type kind of all types
| Eq(T ) kind of types statically equal to T
Expressions:
e ::= () | n unit, integers
| (e, ..., e) | proj i e tuple, projection
| x | #x :T .e | e e lambda calculus (x binds in e)
| U .term value part of a module
| mar (e:T ) marshalling primitive
| unmar e:T unmarshalling primitive
| ! e | ? send and receive
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
| marshalled (e:T ) result of marshalling
| UnmarFailure exception caused by unmar
| [e] T
hm coloured bracket
User source programs are closed terms of the m grammar which do
not contain any of the constructs below the dotted lines.
Values v hm are indexed by a colour. They are defined formally
below; they include usual #­calculus values, marshalled (v . :T )
and ``necessary'' brackets around values. For closed v . , the value
marshalled (v . :T ) is a string, the sequence of bits that repre­
sents the value v . and the type T .
We work up to alpha­conversion. We write substitutions as fol­
lows: {x#e}A replaces x by e in A; we also define substitutions
on module components, as in {U .TYPE#T , U .term#e}A.
4.3 Static and dynamic semantics
The static type system for programs has judgements for subkind­
ing, type equality, and subsignaturing relations. Module structures
M and names U have signatures S , expressions and machines have
types T , and types have kinds K . The system also defines correct­
ness of colours hm , environments E , kinds K , and signatures S .
E #hm K <: K # E #hm T == T # E #hm S <: S #
E #hm M :S E #hm U :S E #hm e:T
E #. m:T E #hm T :K # hm ok
E #hm ok E #hm K ok E #hm S ok
The typing rules are largely standard; the novel rules will be ex­
plained below. Recall that judgements are annotated by a colour
hm , i.e. an optional hash --- the idea being that derivations of judge­
ments annotated by a hash h can make use of the equality between
the abstract type h and its implementation.
Type environments may contain bindings for module, type and
expression variables. Earlier variables bind in later types, kinds and
signatures.
E ::= nil | E , x :T | E , X :K | E , U :S
Static typing of networks, # n ok, simply means that all machines
are well­formed.
We define compile­time reductions m -# c m # of machines
(performed after type checking), and run­time reductions e -#hm
e # and n -# n # for expressions and networks.
4.3.1 Singleton kinds
Following [17, 11, 28], we use singleton kinds to handle abstract
and concrete signatures in a uniform way. We have two families of
kinds: Type is the kind of all types; and, for any type T , Eq(T )
is the singleton kind of all types that are provably equal to T .
A module consists of a structure [T0 , v . ] and a signature
[X :K , T ]. The structure has a representation type T0 and a value
v . --- think of a tuple of operations. This v . must have the type
{X T0}T , and the implementation type T0 must have the kind
K . This is made precise by the following rule:

E #hm T0 :K E , X :K #hm T :Type
E #hm v . :T # E , X :Eq(T0 ) #hm T # == T
E #hm [T0 , v . ]:[X :K , T ]
(MS.struct)
For an abstract module we have K = Type, revealing no in­
formation about the representation type, whereas for a concrete
module, commonly K = Eq(T0 ), revealing it. This is captured
with the type equality relation: in the context of a module dec­
laration moduleNU = M :[X :K , T ] in , one can use the path
U .TYPE to refer to the type part of the module. If it is con­
crete, with K = Eq(T0 ), one can further use the type equality
U .TYPE == T0 , whereas if it is abstract U .TYPE is typically not
equal to any other type.
The subkinding relation K <: K # places Type above all sin­
gleton kinds. This is used to define subsignaturing and hence, using
subsumption, allows a concrete module can be used as if it had an
abstract signature.
4.3.2 Hash formation, type equality of hashes
At run­time, we need globally meaningful type names for abstract
types, corresponding to the U .TYPE paths used in compile­time
type checking. We construct these global names by hashing (well­
typed) closed abstract modules, together with the associated exter­
nal name.
#. [T0 , v . ]:[X :Type, T ]
# hash(N , [T0 , v . ]:[X :Type, T ]) ok
(hmok.hash)
As explained informally earlier, judgements annotated by a hash
permit an additional type equality: under the colour h =
hash(N , [T0 , v . ]:[X :Type, T ]), h is equal to its implementa­
tion T0 :
E #h ok
E #h h == T0
(Teq.hash)
These two rules examine the internal structure of hashes, which
might be thought to be computationally problematic. However,
while they are semantically necessary, they play no role in user
program type­checking (Thm. 4.5) or in execution.
4.3.3 Compile­time reduction and coloured brackets
Module reduction constructs the type representations that will be
used at run­time in marshalling and unmarshalling. Reducing a
concrete module is simple: we replace references to its type com­
ponent by its manifest type, and references to its term component
by the value inside the module.
moduleNU = [T0 , v . ]:[X :Eq(T1 ), T ] in m
-# c {U .TYPE T1 , U .term v .
}m
When it comes to abstract types, things are more interest­
ing. Given an abstract module declaration moduleNU =
[T0 , v . ]:[X :Type, T ], we normally have no way of referring to
its type other than by name, i.e. U .TYPE. However U is not mean­
ingful on other machines, which motivates the introduction of the
hash of the module, i.e. h = hash(N , [T0 , v . ]:[X :Type, T ]).
Then module reduction replaces references to the type component
by h . References to the term component are replaced by the value
suitably protected by h­coloured brackets, which embody the ab­
straction boundary around the module's body as discussed above.
moduleNU = [T0 , v . ]:[X :Type, T ] in m
-# c {U .TYPE h, U .term [v . ] {X h}T
h }m
In general, in a bracket expression [e] T
hm , the lower annota­
tion hm is a colour that indicates what type equalities may be
used to type e . If hm = hash(N , [T0 , v . ]:[X :Type, T ]), then
the equality hm == T0 is available when typing e, through
(Teq.hash) (Sec. 4.3.2). If hm = ., e is typable without any extra
equalities. The upper annotation T is the externally visible type of
e. The following rule (the only typing rule that mentions brackets)
shows this colour change formally.
E # hm # T :Type E #hm e:T
E #hm # [e] T
hm :T
(eT.col)
4.3.4 Expression reduction
Expression reduction is based on a standard call­by­value #­
calculus semantics. In this subsection, we give the function ap­
plication rule and bracket­pushing rules. In later subsections, we
show the rules for marshalling and communication.
As we show in Thm. 4.6, brackets can be erased before run­
time reduction. However, the brackets' presence is necessary for
type preservation (Thm. 4.1). Given their presence, we need reduc­
tion rules to ``push'' them inwards so that the brackets do not in­
terfere with computationally significant reductions (Thm. 4.3). To
describe the bracket pushing rules, and to achieve type preserva­
tion, it is necessary to index the reduction relation, class of values,
and reduction contexts by colours.
We write v hm for a value of colour hm . Brackets may appear
in a value when used to build a value of an abstract type out of a
value of the corresponding implementation type, for example [3] h
h ,
where the implementation type of h is INT.
v hm ::= n | () | (v hm , ..., v hm ) | #x :T .e
| marshalled (v . :T ) | [v h 1 ] h 1
h 1
where h1 #= hm
The following bookkeeping rules push brackets with manifestly
decomposable types inside expressions, and remove them where
not necessary.
[n] INT
hm # -#hm n
[()] UNIT
hm # -#hm ()
[(v hm
1 , ..., v hm
j )] T 1 #...#T j
hm # -#hm ([v hm
1 ] T 1
hm # , ..., [v hm
j ] T j
hm # )
[#x :T .e] T # #T ##
hm # -#hm #x :T # .[{x #[x ] T #
hm }e] T ##
hm #
[marshalled (v . :T )] STRING
hm # -#hm marshalled (v . :T )
[[v h 1 ] h 1
h 1 ] h 1
h 2 -#hm [v h 1 ] h 1
h 1
if h1 #= h2 # h2 #= hm
[v hm 1 ] h 2
hm 1 -#hm v hm 1 if hm 1 = hm # hm 1 = .
Function application introduces brackets to protect the argument,
since the formal parameter may itself be used under a bracket in
the body of the function. This is a variant of [10], where the formal
parameter has to be used at the colour of the function itself.
(#x :T .e) v hm
-#hm {x#[v hm ] T
hm }e
4.3.5 Marshalling
As in [1], mar (e:T ) ``tags'' the value of e with a type annota­
tion T , producing a result of type STRING. The dual construct
unmar e:T produces a value of type T , which the type tag in e
must (dynamically) match.
E #hm e:T
E #hm mar (e:T ):STRING
(eT.mar)
E #hm T :Type E #hm e:STRING
E #hm (unmar e:T ):T
(eT.unmar)
There is a subtlety here: in the conclusion of (eT.mar), the
fact that e has the type T may require the extra type equal­
ity provided by hm . Hence we introduce marshalled (e # :T ),

which requires the argument to be not only closed but typable
in ., i.e. everywhere. Reduction transforms mar (v hm :T ) into
marshalled ([v hm ] T
hm :T ), where the brackets serve to ensure
that any type equality provided by hm is always available to type
v hm (even after sending the marshalled value to another machine).
Note that before reducing mar (v hm :T ), both v hm and T will
have been closed by substitution.
mar (v hm :T ) -#hm marshalled ([v hm ] T
hm :T )
E #hm ok #. e:T
E #hm marshalled (e:T ):STRING
(eT.marshalled)
The unmarshalling of a string first extracts the type tag T from
the string and compares it with the tag for the expected type T # .
Since T is a valid type for v . in ., it is also one in hm . The type
tags T and T # are compared by syntactic equality: if the types
match, the original value is extracted from the string; otherwise
an exception is raised. This dynamic type equivalence is closely
related to static equivalence (Thm. 4.7).
unmar (marshalled (v . :T ):T # )
-#hm v . if T = T #
-#hm UnmarFailure otherwise
4.3.6 Programs and networks
A machine consists of a series of module declarations followed by
an expression. Each module declaration may refer to the previous
ones.
E #. T :Type E #. M :S E , U :S #. m:T
E #. (moduleNU = M :S in m):T
(mT.let)
A network is a parallel juxtaposition of machines. Note that each
machine has its own environment: there is no explicit scope that
encompasses more than one machine.
# n1 ok # n2 ok
# n1 | n2 ok
(nok.par) #. m:UNIT
# m ok (nok.mach)
We assume that there is a single channel, which carries values
of type STRING. The expression ! e sends the value of e over that
channel, and ? reads a value from that channel. Communication is
straightforward as all the work required to make values and types
intercomprehensible is done by the marshalling apparatus; for suit­
able evaluation contexts CC . hm 1
and CC . hm 2
we have just the rule
below, writing context application with a dot.
CC . hm 1 .! v hm 1
| CC . hm 2 .? -# CC . hm 1 .() | CC . hm 2 .v hm 1
4.4 Results
First, our calculus enjoys type preservation and progress properties.
Theorem 4.1 (type preservation for compile­time, expression,
and network reduction)
. if m -# c m # and #. m:T then #. m # :T ;
. if e -#hm e # and #hm e:T then #hm e # :T ; and
. if n -# n # and # n ok then # n # ok.
Theorem 4.2 (progress for compile­time reduction) If #.
m:UNIT then either
. m is an expression; or
. m reduces, i.e. there exists m # such that m -# c m # .
Moreover, compile­time reduction is terminating.
Theorem 4.3 (progress for expressions) If #hm e:T then one of
the following holds:
. e is a value, i.e. there exists v hm such that e = v hm ;
. e reduces, i.e. there exists e # such that e -#hm e # ;
. e is blocked waiting for I/O, i.e. there exists CC hm
hm 2 and e #
such that e = CC hm
hm 2 .! e # or e = CC hm
hm 2 .?; or
. e has thrown an exception, i.e. there exists CC hm
hm 2 such that
e = CC hm
hm 2 .UnmarFailure.
In addition, we have proved a normalisation result for expres­
sions, showing that the rules for coloured brackets do not introduce
any divergencies.
Both compile­time machine reduction and run­time expression
reduction are deterministic (network reduction is not, of course):
Theorem 4.4 (determinacy for compile­time and expression re­
duction)
. If m -# c m # and m -# c m ## then m # = m ## ; and
. if e -#hm e # and e -#hm e ## then e # = e ## .
For static type checking:
Theorem 4.5 (decidability of type checking) Type checking
is decidable. Furthermore, user source programs can be typed by
derivations involving no hashes or coloured brackets.
At run­time, all type annotations except those on mar ,
marshalled , and unmar can be erased. Moreover, all coloured
brackets can be erased except for those that occur within a hash
within one of those remaining annotations. More precisely, we
define erase(e) to be e with all type annotations and brackets
erased except that the type annotations on mar , marshalled ,
and unmar are left unchanged. We define ---# erase
to be like
-#hm by taking the erase­image of the left­ and right­hand sides
of each rule (and removing rules that would become e ---# erase
e).
Theorem 4.6 (erasure preserves reduction outcomes)
Assume #. e:T . We have that e -#. e # implies
erase(e) ---# erase
61 erase(e # ). Conversely, erase(e) ---# erase
e0 im­
plies that there exists e # such that erase(e # ) = e0 and e -# >1
. e # .
Note that brackets are needed in module reduction, to keep track of
a module's ancestors as we build its hash.
Finally we show that, under reasonable conditions, static and dy­
namic type equality coincide. Let D be a module declaration con­
text:
moduleN0U 0 = M0 :S0 in ...module N j U j
= M j :S j in
in the user source language (with no brackets or hashes). Consider
a machine D .C .e for some expression context C and an expression
e = (unmar (mar (e0 :T0 ):T1 )). One would like this dynamic
type check to succeed if and only if T0 and T1 are statically prov­
ably equal, i.e. iff U0 :S0 , ..., U j :S j #. T0 == T1 .
Write #D for the accumulated substitution defined by the module
reduction rules for D (we omit an explicit definition for lack of
space). The dynamic check is then #DT0 = #DT1 . We have:
Theorem 4.7 (coincidence between dynamic and static type
checking) Suppose that D .C .e is well formed (i.e. #.
D .C .e:UNIT), that it contains no hashes, and that its external
names N0 , ..., N j are distinct. Let E = U0 :S0 , ..., U j :S j be the
associated environment. Assume that T0 and T1 contain no hashes
and E #. T i :Type for i = 0, 1. Then E #. T0 == T1 iff
#D T0 = #D T1 .
The requirement that the external names N0 , ..., N j be distinct
rules out the rather pathological programs in which there are two
module definitions with the same name, one shadowing the other,
which have identical structures, signatures, and dependencies. The
exclusion of hashes is automatic for user source programs.

One can imagine stronger theorems, relating type equality be­
tween two programs that share a common (DAG­)prefix of module
definitions, but their statements become rather elaborate.
5. Related work
Modules and generativity There is an extensive literature on
ML­style modules, including [19, 21, 11, 17, 28, 7], much of it
discussing subtle questions of generativity versus applicativity. To
our knowledge, however, none deals with the inter­program case.
In [26], fresh type names are generated during call­by­value mod­
ule reduction, with #­binders that can extrude across distributed
scope. This allows inter­program sharing, and also a with! coer­
cion, but at the pragmatically­awkward cost of requiring particular
object files to be shared.
Type dynamic Our marshal and unmarshal operations are
essentially constructors and destructors for values of dynamic
type; mar is just dynamic, and unmar is a restricted form of
typecase. Our dynamic values have type STRING, emphasising
that they may be communicated readily. Type Dynamic was first
formalised by Abadi et al. [1, 2], who also gives a historical survey.
Intensional polymorphism [12, 31] permits run­time type analysis
of all values.
Marshalling abstract types The problem of marshalling val­
ues of abstract (existential) type has not been satisfactorily ad­
dressed theoretically before. In several systems, abstract types are
run­ or build­time generative, so that two executions or builds of the
same source will yield distinct types. While communication within
such a program can be abstraction­safe, successful communication
between builds can only be at the representation type, and hence
abstraction­unsafe. This is true, for instance, of [2], TMAL [8],
Modula­3 [6, 5], Alice [3], and the typed­channel languages listed
below.
Weirich [32] exposes an existential's representation type to type
analysis, permitting a type­safe polytypic marshalling function to
be written. As future work we hope to expose our global type
names at term level (cf. [13]), permitting an abstraction­safe poly­
typic marshalling function to be written. Furuse and Weis [9] argue
for ignoring abstraction, checking representation types only.
A number of programming languages feature some form of built­
in marshalling (pickling, serialisation, etc.): for example Modula­3,
Alice, Java, .NET, and OCaml. Most of these languages serialise
the type along with the value in order to permit a check at unmar­
shal time, and represent the type by a hash. Languages differ, how­
ever, in exactly what is hashed --- i.e., in what is considered when
deciding type equality.
In Modula­3, abstract types are made opaque by branding, which
may be either by a literal string (analogous to an external name) or a
compiler­generated unique identifier. The latter are unique within a
program but not necessarily related between programs, so explicit
brands must be used for inter­program communication; however,
they do not guarantee abstraction­safety for that case. Revelation
can be used to make an abstraction transparent.
In Alice, abstract type creation is run­time generative, meaning
that abstract types from different executions are always distinct.
This vacuous abstraction­safety forces the use of representation
types for pickling between different programs.
In Java serialisation [29], class equivalence is on fully­qualified
class name, the representation type of all fields, and the types of
all non­private methods; the implementation is not considered in
type equality. A strong coercion (Sec. 2.10) is provided (although
compatibility of representation types is not checked until unmar­
shal time).
In .NET serialisation [20], class equivalence is on the textual
name along with the implementation of the entire assembly in
which it is defined (a single DLL or EXE, which may comprise
many source files). This guarantees data structure invariants are
maintained, as in our approach; however, we work on the much
finer scale of individual modules, and furthermore we require only
source code to be shared, not object files.
OCaml [22] does no typechecking for marshalling at all, and
hence is not even type­safe. When unmarshalling a function, it
verifies (by a hash) that the communicating builds are identical,
thus allowing the code pointers of all closures to be communicated
literally.
Coloured brackets Coloured brackets were introduced in [33,
10]; we differ in that we permit a variable to occur in a colour
other than the one where it is defined. Our proofs are harder, our
#­rule has to introduce extra brackets, but our brackets carry only
a single optional hash, rather than a list of hashes. Rossberg [25],
like us, is concerned to preserve the opacity of abstract types under
reduction due to the presence of typecase. His coercions serve the
same purpose as our brackets, but his use of the closed­scope open
construct instead of dot notation prevents any possibility of sharing
values of abstract type between instances.
Typed channels Several languages, e.g. JoCaml [14], Nomadic
Pict [27], Facile [30, 15], implement typed channels. These per­
mit type­ and abstraction­safe communication once the channel is
established. Establishing a channel at an abstract type, however, re­
quires the endpoints somehow to share the type already; in the case
that the endpoints reside in different programs or instances, this re­
quires an unsafe cast, usually performed (outside the language) by
a name server.
6. Conclusions and future work
6.1 Summary
We have proposed a novel and expressive design for guarantee­
ing type­ and abstraction­safe marshalling of data sent between
distributed ML programs, that can uniformly treat manifest, ab­
stract, and generative types. The key technical idea is to use hashes
of module declarations as globally­meaningful type names, which
are inserted at compile­time and then compared dynamically when
unmarshalling. We add coloured brackets to delimit the ``abstrac­
tion boundary'' within which hashes are transparent, tracking these
brackets through the reductions so as to achieve type and abstrac­
tion preservation. Our proposal is a smooth extension of existing
ML­like languages: type checking is unchanged, most type infor­
mation can be erased before run­time, and the dynamic type check
closely mirrors static ML type equivalence.
6.2 Future work
In the future, we aim to broaden our solution to be applicable to
full­scale languages.
The following extensions will be required to cope with the ex­
amples in Sec. 2.9--2.13. The strong coercion (Sec. 2.10 and 2.11)
used for forcing an abstract type to have the same hash as an earlier
module, has a simple compile­time implementation: check the rep­
resentation types of the two are provably equal, then simply reuse
the hash of the earlier module as the type name for the new. This
requires the compiler to keep a mapping from hashes to representa­
tion types, which is straightforward. Programmer­requested gener­
ativity (Sec. 2.12) can be dealt with in an implementation by gen­
erating a fresh global name (say a random bit string of the same
length as hashes) at compile time; its semantics can be modelled

by #­binding. Both this and the strong coercion are very similar to
the constructs in our earlier work [26]. Side­effect­induced gener­
ativity (Sec. 2.13) requires a way to identify simple pure compu­
tations in structure bodies that the programmer can easily under­
stand; abstract types of structures with pure computations should
be hashes, whereas those of structures with effectful computations
should have freshly­generated names. Functors (Sec. 2.9) are a
more substantial extension, but, at least for a restricted but useful
class, should be straightforward. Consider first­order applicative
functors [18] and module expressions that are either (i) an explicit
structure, possibly multiply­abstracted, or (ii) pure, i.e. constructed
from module identifiers, abstraction and application. These give
rise to functions from hashes to hashes; applying these functions
gives run­time representations of the compile­time path­based type
names.
Other substantial extensions also need to be considered. Depen­
dent record structures, i.e. module structures with multiple fields
also appear in this paper's informal examples; they should be con­
ceptually straightforward. Parametric and substructuring polymor­
phism within the dynamic check would allow receivers to accept
a more general type than that offered by the sender. This is a
more substantial extension; it will be a challenge to minimise the
transmitted type information required for these dynamic ``subtype''
checks. One may want to rebind (Sec. 2.14) identifiers within a
transmitted value to avoid the overhead of sending code already
available at the other end, or to obtain location­specific behaviour;
here we aim to integrate hash types with [4]. Marshalling reference
cells exhibits related problems: should the reference be rebound,
made remote, or duplicated? More generally, one must consider
values mentioning other machine resources: screens, files. . .
We wish to integrate our work with existing systems for dis­
tributed programming which have statically typed channels for nor­
mal operation but no safe way of initiating communication, such as
JoCaml [14] and Nomadic Pict [27]. We also wish to test the ex­
pressiveness of our marshalling primitives by using them to write
libraries for safe distributed communication and persistence.
Acknowledgments We acknowledge support from a Royal
Society University Research Fellowship (Sewell), EPSRC grant
GRN24872 (Wansbrough), EC FET­GC project IST­2001­33234
PEPITO, and APPSEM 2. The authors thank Georges Gonthier,
Jean­Jacques L’ evy, Luc Maranget, and the anonymous referees for
their suggestions.
7. REFERENCES
[1] M. Abadi, L. Cardelli, B. Pierce, and G. Plotkin. Dynamic typing in a
statically typed language. ACM TOPLAS, 13(2):237--268, 1991.
[2] M. Abadi, L. Cardelli, B. Pierce, and D. R’ emy. Dynamic typing in
polymorphic languages. J. Functional Programming, 5(1):111--130,
1995.
[3] T. Alice Project. Alice manual: Pickling.
http://www.ps.uni­sb.de/alice/manual/pickling.html,
2003.
[4] G. Bierman, M. Hicks, P. Sewell, G. Stoyle, and K. Wansbrough.
Dynamic rebinding for marshalling and update, with destruct­time #.
In Proc. ICFP 2003, 2003. Full version available as
UCAM­CL­TR­568. http://www.cl.cam.ac.uk/~pes20/.
[5] A. Birrell, G. Nelson, S. Owicki, and E. Wobber. Network objects.
Software -- Practice & Experience, 25(S4):87--130, 1995. Available
in slightly different form as SRC­115 revised.
[6] A. Z. Broder et al. Fingerprint.i3.
http://research.compaq.com/SRC/m3sources/html/
fingerprint/src/Fingerprint.i3.html, 1994.
[7] D. Dreyer, K. Crary, and R. Harper. A type system for higher­order
modules. In Proc. 30th POPL, New Orleans, pages 236--249, 2003.
[8] D. Duggan. Type­safe linking with recursive DLLs and shared
libraries. ACM TOPLAS, 24(6):711--804, 2002.
[9] J. Furuse and P. Weis. Entr’ ees/sorties de valeurs en Caml. In J.
Francophones des Langages Applicatifs, 2000.
[10] D. Grossman, G. Morrisett, and S. Zdancewic. Syntactic type
abstraction. ACM TOPLAS, 22(6):1037--1080, 2000.
[11] R. Harper and M. Lillibridge. A type­theoretic approach to
higher­order modules with sharing. In Proc. 21st POPL, 1994.
[12] R. Harper and G. Morrisett. Compiling polymorphism using
intensional type analysis. In Proc. 22nd POPL, pages 130--141, 1995.
[13] M. Hicks, S. Weirich, and K. Crary. Safe and flexible dynamic
linking of native code. In Proc. 3rd Workshop on Types in
Compilation, pages 147--176, 2000.
[14] JoCaml. http://pauillac.inria.fr/jocaml/.
[15] F. Knabe. Language Support for Mobile Agents. PhD thesis, Carnegie
Mellon University, Dec. 1995.
[16] J. J. Leifer, G. Peskine, P. Sewell, and K. Wansbrough. Global
abstraction­safe marshalling with hash types. Technical Report
RR­4851, INRIA Rocquencourt, 2003. Available from
http://pauillac.inria.fr/~leifer/research.html. Also
published as UCAM­CL­TR­569.
[17] X. Leroy. Manifest types, modules, and separate compilation. In
Proc. 21st POPL, pages 109--122, 1994.
[18] X. Leroy. Applicative functors and fully transparent higher­order
modules. In Proc. 22nd POPL, pages 142--153, 1995.
[19] D. MacQueen. Modules for Standard ML. In Proc. 1984 ACM Symp.
LISP and Func. Prog., pages 198--207, 1984.
[20] Microsoft Corporation. .NET Framework developer's guide:
Serializing objects. http://msdn.microsoft.com/library/
en­us/cpguide/html/cpovrserializingobjects.asp, 2001.
[21] R. Milner, M. Tofte, and R. Harper. The Definition of Standard ML.
MIT Press, 1990.
[22] Objective Caml. http://caml.inria.fr.
[23] B. Pierce and E. Sumii. Relating cryptography and polymorphism.
http://web.yl.is.s.u­tokyo.ac.jp/~sumii/pub/, July 16
2000. Substantially revised version to appear in J. Comp. Security.
[24] M. J. B. Robshaw. On recent results for MD2, MD4 and MD5. RSA
Laboratories' Bulletin, (4), Nov. 12 1996.
[25] A. Rossberg. Dynamic opacity for abstract types. Technical report,
Programming Systems Lab, Universit˜ at des Saarlandes, 2002. http:
//www.ps.uni­sb.de/Papers/abstracts/opaque.html.
[26] P. Sewell. Modules, abstract types, and distributed versioning. In
Proc. 28th POPL, pages 236--247, 2001.
[27] P. Sewell, P. T. Wojciechowski, and B. C. Pierce.
Location­independent communication for mobile agents: a two­level
architecture. In Internet Programming Languages, LNCS 1686, pages
1--31, 1999.
[28] C. A. Stone and R. Harper. Deciding type equivalence in a language
with singleton kinds. In Proc. 27th POPL, pages 214--227, 2000.
[29] Sun Microsystems. Java object serialization specification 1.4.4.
http:
//java.sun.com/j2se/1.4.1/docs/guide/serialization/,
2002.
[30] B. Thomsen, L. Leth, and T.­M. Kuo. A Facile tutorial. In
CONCUR'96, LNCS 1119, pages 278--298, 1996.
[31] S. Weirich. Type­safe cast: Functional pearl. In Proc. ICFP,
Montreal, pages 58--67, 2000.
[32] S. Weirich. Higher­order intensional type analysis. In Proc. 11th
ESOP, LNCS 2305, Grenoble, France, 2002.
[33] S. Zdancewic, D. Grossman, and G. Morrisett. Principals in
programming languages: A syntactic proof technique. In Proc. ICFP,
Paris, pages 197--207, Sep 1999.
$Revision: 1.212 $ $Date: 2003/06/25 14:38:47 $