The JoCaml language
beta release
Documentation and user's manual
Cédric Fournet, Fabrice Le Fessant, Luc Maranget and Alan Schmitt
July 25, 2002
Copyright © 2000 Institut National de
Recherche en Informatique et Automatique
Foreword
The JoCaml system is an attempt to provide all join-calculus
constructs for concurrency, communication, synchronization and process
mobility directly as a syntactic extension of the Objective Caml
language (version 1.07). Consequently, this manual does not describe
the Objective Caml part of the system, which is covered in a
separate manual, but only how to use the JoCaml system, assuming some
knowledge of the Objective Caml system.
Basically, the join-calculus support comes as a special library with which
Objective Caml programs must be linked in order to use join constructs. Even though
the native-code compiler is included in the distribution, the
``join'' library is only available in bytecode. However, the
JoCaml system enables programs to run both native code and
bytecode in the same runtime, with some limitations.
Finally, any question or bug report for any part of the distribution
should be sent to jocaml-dev@inria.fr, and not to any
Objective-Caml mailing list. For general discussion on JoCaml, you can
subscribe to the JoCaml mailing list: send a message with
``subscribe'' in the body to jocaml-request@inria.fr. The mailing list address is jocaml@inria.fr and is archived on the website, at http://pauillac.inria.fr/jocaml/.
Part I
An introduction to JoCaml
Chapter 1 Concurrent programming
This part of the manual is a tutorial introduction to JoCaml. This
chapter presents small, local examples.
Chapter 2 deals with the distributed features. It
is assumed that the reader has some previous knowledge of OCaml.
1.1 Conventions
Examples are given as JoCaml source, followed by the output of the
top-level (or of the compiler when prompted to print types). The
JoCaml top-level provides an interactive environment, much like the
OCaml top-level.
In order to try the examples, you can either type them in a top-level,
launched by the command joctop, or concatenate the source chunks in
some file a.ml, compile a.ml with the command joc -i a.ml, and
finally run the produced code with the command ./a.out. (Option -i
enables the output of inferred types.)
1.2 Basics
JoCaml programs are made of processes and expressions. Roughly, processes are executed asynchronously and
produce no result, whereas expressions are evaluated synchronously and
their evaluation produces values. For instance, OCaml expressions are
JoCaml expressions. Processes communicate by sending
messages on channels (a.k.a. port names). Messages carried by channels are
made of zero or more values, and channels are values themselves. In
contrast with other process calculi (such as the pi-calculus and its
derived programming language Pict), channels and the processes that
listen on them are defined in a single language construct. This
makes it possible to consider (and implement) channels as functions when they
are used as such.
JoCaml programs are first organized as a list of top-level
statements. A top-level statement is a declaration (such as an OCaml
value binding let x = 1 or a channel binding) or an expression.
Top-level statements are terminated by an optional ;; that triggers
evaluation in interactive mode.
1.2.1 Simple channel declarations
Channels, or port names, are the main new primitive values of JoCaml.
There are two important categories of port names: asynchronous and
synchronous port names. Synchronous names return values, whereas
asynchronous channels do not.
Users can create new channels with a new kind of let def binding,
which should not be confused with the ordinary value let binding.
The right hand-side of the definition of a channel a is a
process that will be spawned whenever some message is sent on a.
Additionally, the contents of messages received on a are bound
to formal parameters.
For instance, here is the definition of an asynchronous echo channel:
# let def echo! x = print_int x;
# ;;
val echo : <<int>>
The new channel echo has type <<int>>, which is the type of
asynchronous channels carrying values of type int. The presence of
! in the definition of the channel name indicates that this channel
is asynchronous. This indication is present only in the channel
definition, not when the channel is used. Sending an integer i on
echo fires an instance of the guarded process print_int i; which
prints the integer on the console. Since the OCaml expression
print_int i returns the value (), it is necessary to append a ;
that discards this value. Since echo is asynchronous, it is not possible to know
when the actual printing takes place.
The definition of a synchronous print is as follows:
# let def print x = print_int x; reply
# ;;
val print : int -> unit
The type of print is the functional type int -> unit: it takes one
integer as argument and returns an empty result. However, print is
not a function, because it is introduced by a let def binding. Since
there is no ! at the end of the defined name, print is
synchronous and thus must return a value. The mechanism to return
values for synchronous channels is different from the one for
functions: it uses a reply construct whose semantics is to
send back some (here zero) values as result. This is the first
difference with plain OCaml functions, which implicitly return the
value of the guarded expression, instead of using the explicit
reply. Message sending on print is synchronous, in the
sense that one knows that console output has occurred when
print returns the answer ().
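For comparison, the closest plain OCaml (not JoCaml) counterpart of print is an ordinary function; this sketch shows how the explicit reply of the channel definition corresponds to the implicit return of a function body:

```ocaml
(* Plain OCaml counterpart of the synchronous [print] channel:
   an ordinary function returns the value of its body implicitly,
   with no explicit [reply]. *)
let print x = print_int x

(* As with the channel, each call returns () once printing has occurred. *)
let () = print 1; print 2   (* prints 12, in textual order *)
```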
As we just saw, synchronous names return values, whereas asynchronous
channels do not. Therefore, message sending on synchronous channels
occurs inside expressions, as if they were functions, whereas
message sending on asynchronous channels occurs inside processes.
1.2.2 Processes
Processes are the new core syntactic class of JoCaml. The most basic
process sends a message on an asynchronous channel, such as the
channel echo just introduced. Since only declarations and
expressions are allowed at top-level, processes are turned into
expressions by ``spawning'' them: they are introduced by the keyword
``spawn'' followed by a process in curly braces ``{'' ``}''.
# spawn {echo 1}
# ;;
#
# spawn {echo 2}
# ;;
-> 12
Processes introduced by ``spawn'' are executed concurrently.
The program above may either echo 1 then 2, or echo 2 then 1.
Thus, the output above may be 12 or 21, depending on the implementation.
Concurrent execution also occurs inside processes, using
the parallel composition operator ``|''.
This provides a more concise, semantically equivalent, alternative to
the previous example:
# spawn {echo 1 | echo 2}
# ;;
-> 21
Composite processes also include conditionals (if's), matching
(match's) and local binding (let...in's and let def...in's). Process grouping is done by using curly braces ``{'' and
``}''.
# spawn {
# let x = 1 in
# {let y = x+1 in echo y | echo (y+1)} | echo x }
# ;;
-> 132
Once again, the program above may echo the integers 1, 2 and 3
in any order. Grouping is necessary around the process let y = ... in ...
to restrict the scope of y, so that its evaluation
occurs independently of the process echo x.
1.2.3 Expressions
The other important syntactic class of JoCaml is the class
of expressions. By contrast with processes,
expressions evaluate to some results.
Expressions can occur at top-level. Expressions also occur in the
right-hand side of value bindings or as arguments to message sending.
Apart from OCaml expressions, the most basic expression sends some
values on a synchronous channel, which then behaves like an OCaml
function. Synchronous channels, such as the channel print introduced
above, return an answer made of zero or more results.
# let x = 1
# ;;
# print x
# ;;
# print (x+1)
# ;;
val x : int
-> 12
In the program above, 1, x and x+1 are expressions whose
evaluation returns a single value (here an integer). The expressions
print x and print (x+1) return empty results, the value ().
This makes sense with respect to synchronization: as top-level phrases
are evaluated in textual order, the program above first
outputs 1 and then outputs 2.
Synchronous channels can be considered as functions, and used as such,
for instance in a sequence:
# let x = 1 in
# print x; print (x+1)
# ;;
-> 12
Sequences may also occur inside processes. The general form of a
sequence inside a process is expression ; process, where
the result of expression is discarded. Since expression can itself
be a sequence, one may write:
# spawn
# { print 1 ; print 2 ; echo 3 }
# ;;
-> 123
A sequence may be terminated by an empty process that does nothing and
is denoted by ``'', the empty sequence of characters.
Thus, an alternative to the previous example is as follows:
# spawn
# { print 1 ; print 2 ; print 3 ; }
# ;;
-> 123
This is why print_int x; in the definition of the echo channel is
considered as a process.
The concrete syntax for processes and expressions is purposely similar,
when not identical. A noticeable exception is grouping, which is
expressed by curly braces ``{'' and ``}'' in the case of processes
and by ordinary parentheses ``('' and ``)'' in the case of
expressions. Since grouping is necessary when there is a sequence or
a parallel composition in a branch of an if instruction, the
grouping is either if expression then { process 1 } else
{ process 2 } or if expression then ( expression 1 )
else ( expression 2 ), depending on whether the whole if is a
process or an expression. The same rule applies to matching.
1.2.4 More on channels
The guarded process in a channel definition can spawn several
messages, as in a stuttering echo channel:
# let def echo_twice! x = echo x | echo x
# ;;
val echo_twice : <<int>>
It is also possible to define such a channel directly, without
referring to the channel echo, by using the OCaml function
print_int. In this case, it is necessary to enclose each use of
print_int in ``{'' and ``}'', as in this new
definition of echo_twice:
# let def echo_twice! x = {print_int x;} | {print_int x;}
# ;;
val echo_twice : <<int>>
This ``grouping'' is necessary because | binds more tightly than
;, as in:
# let def echo3! x = print_int x; echo x | echo x
# ;;
val echo3 : <<int>>
Now mixing synchronous and asynchronous calls:
# spawn {echo3 1}
# ;;
#
# print 2; print 2
# ;;
#
# spawn {echo_twice 3}
# ;;
-> 2213113
Observe that, since processes execute concurrently, 1221331 would
also be a perfectly valid output. Printing both 2's before any 3
is the only constraint here, due to the synchronous character of print.
Since synchronous and asynchronous channels have different types, the
type-checker flags an error whenever a channel is used in the wrong
context.
# spawn {print 1}
# ;;
File "ex14.ml", line 9, characters 7-14:
Expecting an asynchronous channel, but receive int -> unit
Channels are polyadic with respect to both arguments and results (when
they exist).
For instance, print accepts one argument and returns an empty result.
The following channel f has arity two, both for argument and
result, as shown by its type.
# let def f (x,y) = reply x+y,y-x
# ;;
val f : int * int -> int * int
As in OCaml, polyadic results are exploited by using polyadic value bindings.
For instance the following program should print 3 on the console:
# let x,y = f (1,2)
# ;;
#
# print_int x
# ;;
val x : int
val y : int
-> 3
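Since this definition uses only synchronous features, the same polyadic behavior can be sketched as a plain OCaml function over pairs (an analogy, not JoCaml syntax):

```ocaml
(* OCaml analogue of the arity-two channel [f]: a function from a
   pair of arguments to a pair of results. *)
let f (x, y) = (x + y, y - x)

let x, y = f (1, 2)   (* x = 3, y = 1 *)

let () = print_int x   (* prints 3 *)
```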
Since they have the same type and behave like functions, synchronous
names can be used to support a functional programming style. A
traditional example is the Fibonacci function.
# let def fib n =
# if n <= 1 then reply 1
# else reply fib (n-1) + fib (n-2)
# ;;
#
# print_int (fib 10)
# ;;
val fib : int -> int
-> 89
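The same computation as a plain OCaml function reads almost identically; note that OCaml needs the rec keyword and returns its result implicitly instead of using reply (a sketch for comparison):

```ocaml
(* Ordinary recursive OCaml version of the Fibonacci function. *)
let rec fib n =
  if n <= 1 then 1
  else fib (n - 1) + fib (n - 2)

let () = print_int (fib 10)   (* prints 89 *)
```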
In contrast with value bindings, channel definitions are
always potentially recursive.
Port names are first-class values in JoCaml. They can be sent as
messages on other port names and returned as results. Consequently,
higher-order ``ports'' can be written, such as
# let def twice f =
# let def r x = reply f (f x) in
# reply r
# ;;
val twice : ('a -> 'a) -> 'a -> 'a
The type for twice is polymorphic: it includes a type variable 'a
that can be replaced by any type. Thus twice is a synchronous
channel that takes a synchronous channel (or a function) of type 'a -> 'a as
argument and returns one result of the same type.
For instance, 'a can be the type of integers or the type of strings
(^ is OCaml string concatenation):
# let def succ x = reply x+1
# ;;
#
# let def double s = reply s^s
# ;;
#
# let f = twice succ in
# let g = twice double in
# print_int (f 0) ; print_string (g "X")
# ;;
val succ : int -> int
val double : string -> string
-> 2XXXX
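In plain OCaml, the same higher-order behavior is obtained with nested functions; this sketch mirrors the channel definition, with the inner function r playing the role of the local channel of the same name:

```ocaml
(* OCaml analogue of the higher-order [twice] channel. *)
let twice f =
  let r x = f (f x) in
  r

let succ x = x + 1
let double s = s ^ s

let () =
  print_int (twice succ 0);        (* prints 2 *)
  print_string (twice double "X")  (* prints XXXX *)
```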
Threads are not part of the JoCaml language, but they are part
of the implementation. Threads are execution units at
the implementation level: they are created, may suspend then
resume, and finally die. Threads match the intuition of a ``direct''
causality. For instance, all the printing actions in a sequence are
executed one after another and clearly belong to a single thread.
By contrast, the parallel composition operator ``|'' separates two
threads that execute independently.
More precisely, the following program creates three threads:
# spawn {
# {print_int 1 ; print_int 2 ; print_int 3 ;} |
# { {print_int 4;}
# | {print_int 5 ; print_int 6 ;}} }
# ;;
-> 123456
The output of the program is a mix of the output of its three
threads: 123, 4 and 56.
An output such as 12, then 4, then 356 would reveal that the thread
printing 1, 2 and 3 has been preempted before printing 3.
Sending messages on channels ultimately fires a new process and
should thus create a new thread. However, a new thread is not always
created.
For instance,
compare the following definitions that create infinitely many threads
and one, never dying, thread:
# let def rabbit! () = print_string "+" ; rabbit () | rabbit ()
# ;;
#
# let def forever () = print_string "-" ; forever() ; forever() ; reply
# ;;
val rabbit : <<unit>>
val forever : unit -> 'a
Sending a message on the asynchronous rabbit fires a small
thread that terminates after spawning two new instances of rabbit, thus
creating two other threads.
By contrast, sending a message on the synchronous forever blocks the sender
thread until the thread fired by the message completes.
The implementation takes
advantage of this: it does not fire a new thread and
executes the process guarded by forever () on the sender thread.
Here, message sending results in an ordinary function call.
This implementation behavior is exposed by the following program:
# spawn {
# rabbit () |
# {print_newline () ; exit 0;} |
# {forever () ;} }
# ;;
->
-> +----------------------------
The program concurrently starts three threads: first, an
exit 0 thread that flushes pending output and kills the
system; then, two crazy threads, rabbit () and forever ().
Considering processes only, the program may terminate after an
unspecified number of -'s and +'s has been printed.
Considering threads, more -'s than +'s should appear on the
console, since all -'s are printed by the same thread, whereas +'s
are printed by different threads. All these printing threads compete
for scheduling with the fatal thread and the forever() thread has to
be preempted to stop outputting.
As a conclusion, informal reasoning about threads helps predict
program output, taking the implementation into account. What is
predicted is the likely output, not the only correct output.
1.3 Modules
The current implementation of JoCaml relies on the same module
system as OCaml.
Users can create their own modules, compile them
separately, and link them together into an executable program. For
instance, users may write a module stutter that exports two
channels echo and print. Ideally, users first specify the names
exported by module stutter and their types, by writing an interface
file stutter.mli.
# val echo : <<int>>
# val print : int -> unit
The interface stutter.mli
is compiled by issuing the command joc stutter.mli. This produces an
object interface file stutter.cmi.
Then, the implementation file stutter.ml contains the actual
definitions for Stutter.echo and Stutter.print.
# let def echo! x = {print_int x ;} | {print_int x ;}
#
# let def print x = print_int x ; print_int x ; reply
The implementation file stutter.ml is compiled by issuing the
command joc -c stutter.ml (-c is the compile-only, do-not-link
option). This produces an object implementation file stutter.cmo.
Now that the module stutter is properly compiled, some other
implementation file user.ml can use it.
# Stutter.print 1
# ;;
#
# spawn {Stutter.echo 2 | Stutter.echo 3}
# ;;
The implementation file user.ml can be compiled into user.cmo by
issuing the command joc -c user.ml. This compilation uses the
compiled interface stutter.cmi. An executable a.out is produced by
the command joc stutter.cmo user.cmo that links the modules
stutter and user together. Alternatively, a.out can be produced
in one step by the command joc stutter.cmo user.ml.
Running a.out may produce the following output:
-> 113232
1.4 Join-patterns
Join-patterns significantly extend port name definitions.
A join-pattern defines several ports simultaneously
and specifies a synchronization pattern between these co-defined
ports. For instance, the following source fragment defines two
synchronizing port names fruit and cake:
# let def fruit! f | cake! c =
# print_string (f^" "^c) ; print_newline () ;
# ;;
val cake : <<string>>
val fruit : <<string>>
To trigger the guarded process
print_string (f^" "^c) ; print_newline () ;,
messages must be sent on both fruit and cake.
# spawn {fruit "apple" | cake "pie"}
# ;;
-> apple pie
The parallel composition operator ``|'' appears both in
join-patterns and in processes. This
highlights the kind of synchronization that the pattern matches.
Join-definitions such as the one for fruit and cake provide a
simple means to express non-determinism.
# spawn {fruit "apple" | fruit "raspberry" | cake "pie" | cake "crumble"}
# ;;
-> raspberry pie
-> apple crumble
The two cake names must both appear on the console, but either
combination of fruits and cakes is correct.
Composite join-definitions can specify several synchronization
patterns.
# let def apple! () | pie! () = print_string "apple pie" ;
# or raspberry! () | pie! () = print_string "raspberry pie" ;
# ;;
val pie : <<unit>>
val apple : <<unit>>
val raspberry : <<unit>>
Observe that the name pie is defined only once. Thus, pie
potentially takes part in two synchronizations. This co-definition is
expressed by the keyword or.
Again, internal choice is performed when only one invocation
of pie is present:
# spawn {apple () | raspberry () | pie ()}
# ;;
-> raspberry pie
Join-patterns are the programming paradigm for concurrency in JoCaml.
They allow the encoding of many concurrent data structures. For instance, the following
code defines a counter:
# let def count! n | inc () = count (n+1) | reply to inc
# or count! n | get () = count n | reply n to get
# ;;
#
# spawn {count 0}
# ;;
val inc : unit -> unit
val count : <<int>>
val get : unit -> int
This definition calls for two remarks. First, join-patterns may mix
synchronous and asynchronous messages, but when there are several
synchronous messages, each reply construct must specify the name to
which it replies, using the new reply ... to name construct.
When there is a single synchronous name in the pattern,
the to construct is optional; for instance, it was not necessary in the
previous examples.
Second, the usage of the name count above is a typical way of ensuring
mutual exclusion. For the moment, assume that there is at most one
active invocation on count. When one invocation is active,
count holds the counter value as a message and the counter is ready to
be incremented or examined. Otherwise, some operation is being
performed on the counter and pending operations are postponed until
the operation being performed has left the counter in a consistent
state. As a consequence, the counter may be used consistently by
several threads.
# spawn {{inc () ; inc () ;} | {inc() ;}}
# ;;
#
# let def wait! () =
# let x = get () in
# if x < 3 then wait () else {
# print_string "three is enough !!!" ; print_newline () ;
# }
# ;;
#
# spawn {wait ()}
# ;;
val wait : <<unit>>
-> three is enough !!!
Ensuring the correct counter behavior in the example above requires some
programming discipline: exactly one initial invocation on count has to
be made.
If there is more than one simultaneous invocation on count, then mutual
exclusion is lost. If there is no initial invocation on count, then
the counter will not work at all.
This can be avoided by making the count, inc and get names local
to a create_counter definition and then by exporting inc and get
while hiding count, taking advantage of lexical
scoping rules.
# let def create_counter () =
# let def count! n | inc0 () = count (n+1) | reply
# or count! n | get0 () = count n | reply n in
# count 0 | reply inc0, get0
# ;;
#
# let inc,get = create_counter ()
# ;;
val create_counter : unit -> (unit -> unit) * (unit -> int)
val inc : unit -> unit
val get : unit -> int
This programming style is reminiscent of ``object-oriented'' programming:
a counter is a thing called an object, it has some internal state
(count and its argument), and it exports some methods to the
external world (here, inc and get). The constructor
create_counter creates a new object, initializes its internal state,
and returns the exported methods.
As a consequence, several counters may be allocated and used independently.
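For comparison, the same object-like structure can be sketched in plain OCaml with a mutable reference hidden in a closure. Unlike the join-pattern version, this sketch provides no mutual exclusion, so it is only safe for sequential use:

```ocaml
(* Sequential OCaml sketch of create_counter: the internal state is a
   hidden reference rather than a message on a private channel.
   NOT safe against concurrent access, unlike the join-pattern version. *)
let create_counter () =
  let count = ref 0 in
  let inc () = incr count in
  let get () = !count in
  (inc, get)

let inc, get = create_counter ()
```

As with the JoCaml definition, several independent counters can be allocated by repeated calls to create_counter.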
1.5 Control structures
Join-pattern synchronization can express many common programming
paradigms, either concurrent or sequential.
1.5.1 Control structures for concurrency
Locks
Join-pattern synchronization can be used to emulate simple locks:
# let def new_lock () =
# let def free! () | lock () = reply
# and unlock () = free () | reply in
# free () | reply lock, unlock
# ;;
val new_lock : unit -> (unit -> unit) * (unit -> unit)
Threads try to acquire the lock by performing a synchronous call on
channel lock. By the definition of lock, this consumes the
name free, so only one thread can get a response at a time. Any other
thread that attempts to acquire the lock is blocked until the thread
that holds the lock releases it by the synchronous call unlock, which
fires another invocation of free.
As in OCaml, it is possible to introduce several bindings with the
and keyword. These bindings are recursive.
To give an example of lock usage, we introduce channels that
output their string arguments several times:
#
# let def double p =
# let def r s = p s ; p s ; reply in
# reply r
# ;;
#
# let def print_port s = print_string s ; Thread.delay 0.001; reply
# ;;
#
# let print16 = double(double(double(double(print_port))))
# ;;
val double : ('a -> 'b) -> 'a -> 'b
val print_port : string -> unit
val print16 : string -> unit
The Thread.delay calls prevent the same thread from running long
enough to print all its strings.
Now consider two threads, one printing -'s, the other printing
+'s.
# spawn {{print16 "-" ;} | {print16 "+" ;}}
# ;;
-> -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
As threads execute concurrently, their outputs may mix, depending upon
scheduling.
However, one can use a lock to delimit a critical section and prevent
the interleaving of -'s and +'s.
# let lock, unlock = new_lock ()
# ;;
#
# spawn {
# {lock () ; print16 "-" ; unlock () ;} |
# {lock () ; print16 "+" ; unlock () ;} }
# ;;
#
val lock : unit -> unit
val unlock : unit -> unit
-> ----------------++++++++++++++++
Barriers
A barrier is another common synchronization mechanism. Basically, barriers
define synchronization points in the execution of parallel tasks.
Here is a simple barrier that synchronizes two threads:
# let def join1 () | join2 () = reply to join1 | reply to join2
# ;;
val join2 : unit -> unit
val join1 : unit -> unit
The definition above includes two reply constructs, which makes the
mention of a port mandatory.
The following two threads print ba or ab between matching parentheses:
# spawn {
# {print_string "(" ; join1 () ; print_string "a" ; join1() ;
# print_string ")" ;}
# |
# {join2 () ; print_string "b" ; join2 () ;} }
# ;;
-> (ab)
Bi-directional channels
Bi-directional channels appear in most process calculi. In the
asynchronous pi-calculus, for instance, and for a given channel
c, a value v can be sent asynchronously on c
(written c![v]) or received from c and bound to
some variable x in some guarded process P (written
c?x.P). Any process can send and receive on the
channels it knows. In contrast, a JoCaml process that knows a channel
can only send messages on it, whereas a unique channel definition
receives all messages. Finally, the scope of a pi-calculus channel
name c is defined by the new c in P operator.
Such an operator does not exist in JoCaml, since join-definitions are
binding constructs.
Nonetheless, bi-directional channels can be defined in JoCaml as
follows:
# let def new_pi_channel () =
# let def send! x | receive () = reply x in
# reply send, receive
# ;;
val new_pi_channel : unit -> <<'a>> * (unit -> 'a)
A pi-calculus channel is implemented by a join definition with two
port names. The port name send is asynchronous and is used to send
messages on the channel. Such messages can be received by making a
synchronous call to the other port name receive. Let us now
``translate'' the pi-calculus process
new c,d in c![1] | c![2] | c?x.d![x+x] | d?y.print(y)
We get:
# spawn {
# let sc, rc = new_pi_channel ()
# and sd, rd = new_pi_channel () in
# sc 1 | sc 2 | {let x = rc () in sd (x+x)} |
# {let y = rd () in print_int y ;} }
# ;;
-> 2
Synchronous pi-calculus channels are encoded just as easily as
asynchronous ones: it suffices to make send synchronous:
# let def new_pi_sync_channel () =
# let def send x | receive () = reply x to receive | reply to send in
# reply send, receive
# ;;
val new_pi_sync_channel : unit -> ('a -> unit) * (unit -> 'a)
1.5.2 Loops
Join-patterns are also useful for expressing various programming
control structures. Our first examples deal with iterations on an
integer interval.
Simple loops
Asynchronous loops can be used when the execution order for the
iterated actions is irrelevant, e.g., when these actions are
asynchronous.
# let def loop! (a,x) = if x > 0 then {a () | loop (a,x-1)}
# ;;
#
# let def echo_star! () = print_string "*" ;
# ;;
#
# spawn {loop (echo_star,5)}
# ;;
val loop : <<(<<unit>> * int)>>
val echo_star : <<unit>>
-> *****
When execution order matters, a sequential loop is preferable:
# let def loop (a,x) = if x > 0 then {a x ; loop (a,x-1) ; reply} else reply
# ;;
#
# let def print x = print_int(x) ; print_string " " ; reply
# ;;
#
# loop (print,5)
# ;;
val loop : (int -> 'a) * int -> unit
val print : int -> unit
-> 5 4 3 2 1
When the loop produces a result, this result can be
computed inside reply constructs and accumulated. The following
example computes the sum of the squares of the integers between 1
and 32:
# let def sum (i0,f) =
# let def iter i =
# if i > 0 then reply (f i) + iter (i-1) else reply 0 in
# reply iter i0
# ;;
#
# let def square x = reply x*x
# ;;
val sum : int * (int -> int) -> int
val square : int -> int
# print_int (sum (32,square))
# ;;
-> 11440
Port name definitions such as the one for iter above belong to a
functional programming style. In particular, the various iterations
of the loop body (computing f i here) never execute concurrently,
since iterations are performed one after the other.
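Indeed, the sequential sum can be written as an equivalent plain OCaml function (a sketch for comparison):

```ocaml
(* Sequential OCaml version of [sum]: iterations run strictly one
   after the other, as with the synchronous [iter] channel. *)
let sum (i0, f) =
  let rec iter i = if i > 0 then f i + iter (i - 1) else 0 in
  iter i0

let square x = x * x

let () = print_int (sum (32, square))   (* prints 11440 *)
```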
However, since integer addition is associative and commutative, the
summing order of the various f i does not matter. Thus,
asynchronous iteration can be used here, leading to a program with
many more opportunities for concurrent execution:
# let def sum (i0,f) =
#
# let def add! dr | total (r,i) =
# let r' = r+dr in
# if i > 1 then reply total (r',i-1)
# else reply r' in
#
# let def loop! i = if i > 0 then {add (f i) | loop (i-1)} in
# loop i0 | reply total (0,i0)
# ;;
#
# print_int( sum (32,square))
# ;;
val sum : int * (int -> int) -> int
-> 11440
Observe how the loop result is accumulated using the synchronous name
total.
The argument in the reply to sum consists of one call to total.
This trick enables the synchronous sum to return its result when
the asynchronous loop is over. In fact, the current JoCaml language places
strong restrictions on the positioning of reply constructs in
definitions, and a direct reply to sum
from within the sub-definition for add! dr | total (r,i) is rejected
by the compiler:
# let def sum (i0,f) =
#
# let def add! dr | total! (r,i) =
# let r' = r+dr in
# if i > 1 then total (r',i-1)
# else reply r' to sum in
#
# let def loop! i = if i > 0 then {add(f(i)) | loop(i-1)} in
# loop(i0)
# ;;
File "ex45.ml", line 6, characters 10-25:
Reply to channel external from def sum
Distributed loops
Sharing a loop between several ``agents'' requires more work. Let us
informally define an agent as some computing unit. In this section, an
agent is represented by a synchronous channel. In a more realistic
setting, different agents would reside on different computers. The
agent paradigm then serves to allocate computing resources (see
section 2.3.3 in the next chapter).
For instance, here are two agents square1 and square2.
The agent square1 models a fast machine, whereas square2
models a slow machine by computing squares in an inefficient way.
Additionally, square1 and square2 differ marginally in their
console output: square1 outputs a +, while
square2 outputs a * when it starts and a - just before answering.
# let def square1 i = print_string "+" ; reply i*i
# ;;
#
# let def square2 i =
# print_string "*" ;
# let def total! r | wait () = reply r in
# let def mult! (r,j) =
# if j <= 0 then total r else mult (r+i,j-1) in
# mult (0,i) |
# let r = wait () in
# print_string "-" ; reply r
# ;;
val square1 : int -> int
val square2 : int -> int
Sharing a loop
between several agents means allocating the iterations to be performed
among them. The following channel make_sum returns a register
channel and a wait channel. An agent registers by sending its
computing channel on register. The final loop result is returned on wait.
# let def make_sum i0 =
#
# let def add! dr | total i =
# if i > 1 then reply dr+total(i-1)
# else reply dr
# and wait () = reply total i0 in
#
# let def loop! i | register! f =
# if i > 0 then {add (f i) | register f | loop (i-1) } in
#
# loop i0 | reply register, wait
# ;;
val make_sum : int -> <<(int -> int)>> * (unit -> int)
The only difference with the asynchronous sum loop from the previous section
resides in the replacement of the definition let def loop! i = ... by
the join-pattern definition let def loop! i | register! f = .... As a
consequence, the agents square1 and square2 may now
compete for loop iterations, provided two invocations
register square1 and register square2 are active.
# let register, wait = make_sum 32
# ;;
#
# spawn {register square1 | register square2}
# ;;
#
# print_int (wait ())
# ;;
val register : <<(int -> int)>>
val wait : unit -> int
-> *+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+----------------11440
The distributed loop above is not satisfactory, since it does not take
the relative computing speed of square1 and square2 into account
while allocating iterations. The add(f i) jobs are spawned
asynchronously, so that the different iterations performed by a given
agent are executed concurrently. As a result, the iteration space is
partitioned evenly between square1 and square2, as illustrated
by the output above. This leads to a poor load balance, since the fast
square1 stands idle at the end of the loop, while the slow square2
is overwhelmed.
A better solution is for an agent to execute its share of work
in sequence rather than
concurrently. This is achieved by the slightly modified definition
for loop! i | register! f below:
# let def make_sum i0 =
#
# let def add! dr | total i =
# if i > 1 then reply dr+total(i-1)
# else reply dr in
#
# let def wait () = reply total i0 in
#
# let def loop! i | register! f =
# if i > 0 then
# {loop(i-1) |
# let r = f i in
# add r | register f} in
#
# loop i0 | reply register, wait
# ;;
val make_sum : int -> <<(int -> int)>> * (unit -> int)
In the new definitions, register f is launched again only once f i is
computed. By contrast, loop (i-1) is launched immediately, for another
agent to grab the next iteration as soon as possible.
# let register,wait = make_sum 32
# ;;
#
# spawn {register square1 | register square2}
# ;;
#
# print_int (wait ())
# ;;
val register : <<(int -> int)>>
val wait : unit -> int
-> *+++++++++++++++++++++++++++++++-11440
1.6 Data structures
To explore the expressive power of message-passing in JoCaml, we now
consider the encoding of some data structures. In practice, however,
one would use the state-of-the-art built-in data structures inherited
from OCaml, rather than their JoCaml internal encodings.
Polymorphic pairs can be encoded quite easily in JoCaml by taking
advantage of port name arity.
# let def create (x,y) =
# let def get () = reply x,y in
# reply get
# ;;
val create : 'a * 'b -> unit -> 'a * 'b
As exposed by the type above, a pair is a synchronous port
name that takes zero arguments and returns two values.
The synchronous name create returns such a name when given the
values to store in the pair.
The content of a pair can be retrieved by sending a message of arity
zero on it:
# let def fst p = let x,y = p () in reply x
# and snd p = let x,y = p () in reply y
# ;;
val fst : (unit -> 'a * 'b) -> 'a
val snd : (unit -> 'c * 'd) -> 'd
Pairs can now be used in an ``abstract'' fashion, by considering only
the constructor create and the two destructors, fst and snd:
# let p = create (1,"four")
# ;;
# print_int (fst p + String.length (snd p))
# ;;
val p : unit -> int * string
-> 5
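For comparison, the same closure-based encoding can be written in plain OCaml, where create returns a nullary function packing the two components (this is an illustrative sketch, not part of the JoCaml library; fst' and snd' are primed to avoid shadowing the Pervasives names):

```ocaml
(* A pair is encoded as a thunk that returns both components at once. *)
let create (x, y) = fun () -> (x, y)

(* Destructors retrieve one component by invoking the thunk. *)
let fst' p = let (x, _) = p () in x
let snd' p = let (_, y) = p () in y

let () =
  let p = create (1, "four") in
  print_int (fst' p + String.length (snd' p))  (* prints 5 *)
```

As in the JoCaml version, the components are captured by lexical scoping and only reachable through the thunk.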
1.6.2 Encoding data structures in object-oriented style
A convenient programming style for expressing encodings of data
structures in JoCaml is the so-called object-oriented style. An object
has an internal, hidden state and exports some methods (here, some
synchronous names) that operate on its internal state. Consider for
instance the following encoding for lists:
# let def cons (h,t) =
# let def head () = reply h
# and tail () = reply t
# and self () = reply false, head, tail in
# reply self
# and nil () =
# let def head () = reply failwith "head of nil"
# and tail () = reply failwith "tail of nil"
# and self () = reply true, head, tail in
# reply self
# ;;
val cons : 'a * 'b -> unit -> bool * (unit -> 'a) * (unit -> 'b)
val nil : unit -> unit -> bool * (unit -> 'c) * (unit -> 'd)
The internal state of a list cell is an emptiness status (true or
false) plus, when appropriate, two subcomponents (h and t).
The emptiness status is exposed directly to the external world. The two
methods head and tail give access to the subcomponents of a
non-empty list cell or fail. Observe
how the name self is introduced to pack methods together
in the reply to cons and nil. Finally, the types for the
results of nil and cons are the same. Hence, both kinds of list
cells are values of the same type.
Lists can now be used by directly retrieving methods:
# let def list_concat l =
# let n,h,t = l () in
# if n then reply ""
# else reply h()^list_concat(t ())
# ;;
val list_concat :
(unit -> bool * (unit -> string) * (unit -> 'a) as 'a) -> string
The above type is recursive. This reflects the fact that lists are
recursive data structures. By contrast, the type of the cons-cell
creator cons is not recursive. The recursive type for
lists of strings appears naturally when writing a function that
traverses lists.
Finally, the following source fragment shows how to create and use
string lists:
# let def replicate (elem,n) =
# if n <= 0 then reply nil ()
# else reply cons (elem, replicate (elem,n-1))
# ;;
#
# print_string (list_concat (replicate ("X",16)))
# ;;
val replicate :
'a * int -> (unit -> bool * (unit -> 'a) * (unit -> 'b) as 'b)
-> XXXXXXXXXXXXXXXX
1.6.3 Mutable data structures
Object states, represented as messages matched by join-patterns,
can be altered by invoking the appropriate methods.
Here is a definition for a reference cell. One method (get) examines
the content of the cell, while another (put) alters it.
# let def create_ref y0 =
# let def state! y | get () = state y | reply y
# or state! y | put new_y = state new_y | reply in
# state y0 | reply get, put
# ;;
val create_ref : 'a -> (unit -> 'a) * ('a -> unit)
Here, the internal state of a cell is its content; it is stored
as a message y on the channel state. Lexical scoping is used to keep the
state internal to a given cell.
# let gi, pi = create_ref 0
# and gs, ps = create_ref ""
# ;;
val gi : unit -> int
val pi : int -> unit
val gs : unit -> string
val ps : string -> unit
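In plain OCaml, an equivalent get/put pair can be built by closing over a mutable reference; this sketch mirrors the interface above, but note that, unlike the join-pattern version, it performs no synchronization between concurrent callers:

```ocaml
(* create_ref returns a (get, put) pair of closures sharing a hidden ref. *)
let create_ref y0 =
  let state = ref y0 in
  ((fun () -> !state),             (* get: read the current content *)
   (fun new_y -> state := new_y))  (* put: replace the content *)

let () =
  let get, put = create_ref 0 in
  put 42;
  print_int (get ())  (* prints 42 *)
```

The hidden ref plays the role of the internal state message on channel state.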
1.6.4 A concurrent FIFO
We are now ready for a more sophisticated example of data structure
encoding in JoCaml. First we define a new kind of list
cells. Such a cell always holds an element (x below). When created,
it stands in the first position of a list, and a name (see first
below) reflecting that status is activated. Then, when another
element is cons-ed in front of the list, a cell additionally holds a
pointer to this previous element (see the pattern first! () | set_prev prev below). In the end, a cell can be destroyed (see
kill below), it then returns both its content, a pointer to the
previous cell (when it exists), and a boolean, which is set to true,
when the destroyed cell is the only one in the list.
# let def new_cell x =
# let def first! () | set_prev prev = inside prev | reply
# or inside! prev | kill () = reply x, prev, false
# or first! () | kill () = reply x, self, true
# or self () = reply set_prev, kill in
# first () | reply self
# ;;
val new_cell : 'a -> (unit -> ('b -> unit) * (unit -> 'a * 'b * bool) as 'b)
A fifo is a data structure that provides two methods put and get.
The method put stores its argument in the fifo, while the method get
retrieves one element from the fifo. The internal state of the fifo
is either empty (and then the name empty is activated), or it
contains some elements (internally, non-empty fifos have the name state
activated). The name state holds two arguments: fst is a pointer to
the first cell of the element list, while lst is a pointer to the
last cell of the element list. Thus, elements stored in the fifo are
cons-ed in front of fst (see put below), while retrieved elements
are taken from the end of the element list (see get below).
There is no empty! () | get () pattern below. As a consequence, an
attempt to retrieve an element from an empty fifo is not an error:
answering to get is simply postponed until the fifo fills up.
# let def fifo () =
# let def empty! () | put x =
# let fst = new_cell x in
# state (fst,fst) | reply
# or state! (fst,lst) | put x =
# let new_fst = new_cell x in
# let set, rem = fst () in
# set new_fst ;
# state (new_fst,lst) | reply
# or state! (fst,lst) | get () =
# let set, rem = lst () in
# let x, prev, last_cell = rem () in
# if last_cell then empty () else state (fst,prev) |
# reply x in
# empty () | reply put, get
# ;;
val fifo : unit -> ('a -> unit) * (unit -> 'a)
From the fifo point of view, elements are retrieved in the order they
are stored. In a concurrent setting this means that when a thread
performs several get in a row, the retrieved elements come in an
order that is compatible with the order in which another thread feeds
the fifo.
# spawn {
# let put, get = fifo () in
# {print_int (get ()) ; print_int (get ()) ; print_newline () ;} |
# {let x = get () and y = get () in 0;} |
# {put(1) ; put(2) ; put(3) ; put(4);} }
# ;;
-> 24
Therefore, the program above prints two integers from the set {1, 2, 3, 4} in increasing order.
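The ordering property can be checked against plain OCaml's Queue module, which provides the same first-in first-out discipline (without the blocking get of the JoCaml version; this is only an illustrative sketch with a single consumer):

```ocaml
(* A FIFO preserves insertion order: elements come out as they went in. *)
let () =
  let q = Queue.create () in
  List.iter (fun x -> Queue.add x q) [1; 2; 3; 4];
  (* Consume two elements, as one of the threads above does. *)
  let a = Queue.take q in
  let b = Queue.take q in
  Printf.printf "%d%d\n" a b  (* prints 12 *)
```

With a second consumer taking elements concurrently, as in the JoCaml program, each consumer still sees its own elements in increasing order.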
1.7 A word on typing
The JoCaml type system is derived from the ML type system and
it should be no surprise to ML programmers. The key point in typing
à la ML is parametric polymorphism. For instance, here is a
polymorphic identity function:
# let def id x = reply x
# ;;
val id : 'a -> 'a
The type for id contains a type variable ``'a'' that can be
instantiated to any type each time id is actually used.
Such a type variable is a generalized type variable.
For instance, in the following program, variable ``'a'' is
instantiated successively to int and string:
# let i = id 1 and s = id "coucou"
# ;;
#
# print_int i ; print_string s
# ;;
val i : int
val s : string
-> 1coucou
In other words, the first occurrence of id above has type int -> int, while the second has type string -> string.
Experienced ML programmers may wonder how the JoCaml
type system achieves mixing parametric polymorphism and mutable data
structures. There is no miracle here. Consider, again, the
JoCaml encoding of a reference cell:
# let def state! x | get () = state x | reply x
# or state! x | set new_x = state new_x | reply
# ;;
val get : unit -> '_a
val state : <<'_a>>
val set : '_a -> unit
The type variable ``'_a'' that appears inside the types for state,
get and set is prefixed by an underscore ``_''. Such type
variables are non-generalized type variables that are instantiated only once.
That is, all the occurrences of state must have the same type. Moreover,
once ``'_a'' is instantiated with some type, this type replaces ``'_a'' in
all the types where ``'_a'' appears (here, the types for get and set).
This wide-scope
instantiation guarantees that the various port names whose type
contains ``'_a'' (state, get and
set here) are used consistently.
More specifically, suppose ``'_a'' is instantiated to the type int
by sending the message 0 on state. Then, the type for
get is unit -> int in the rest of the program, so x below has type
int. As a consequence, the following program does not
type-check, and a runtime type error (printing an integer while
believing it is a string) is avoided:
# let def state! x | get () = state x | reply x
# or state! x | set new_x = state new_x | reply
# ;;
#
# spawn {state 0}
# ;;
#
# let x = get ()
# ;;
#
# print_string x
# ;;
File "ex65.ml", line 13, characters 13-14:
This expression has type int but is here used with type string
Non-generalized type variables appear when
the types of several co-defined port names share a type variable.
Such a type variable is not generalized.
# let def port! p | arg! x = p x
# ;;
val arg : <<'_a>>
val port : <<<<'_a>>>>
A workaround is to encapsulate the faulty names into another port name
definition that defines only one name. This restores polymorphism.
# let def make_it () =
# let def port! p | arg! x = p x in reply port,arg
# ;;
val make_it : unit -> <<<<'a>>>> * <<'a>>
Non-generalized type variables also appear in the types of the
identifiers defined by a value binding.
# let p1, a1 = make_it ()
# ;;
#
# let p2, a2 = make_it ()
# ;;
#
# let def echo! x = print_int x;
# and echo_string! x = print_string x;
# ;;
#
# spawn {p1 echo | p2 echo_string | a1 1 | a2 "coucou"}
# ;;
val p1 : <<<<int>>>>
val a1 : <<int>>
val p2 : <<<<string>>>>
val a2 : <<string>>
val echo : <<int>>
val echo_string : <<string>>
-> coucou1
It is interesting to notice that invoking make_it () twice produces
two different sets of port and arg port names, whose types contain
different type variables. Thereby, programmers make explicit the
different type instantiations that are performed silently by the
compiler in the case of generalized type variables.
1.8 Exceptions
Since processes are mapped to several threads at run-time, it is
important to specify their behaviours in the presence of exceptions.
Exceptions behave as in OCaml for OCaml expressions. If the exception
is not caught in the expression, the behaviour depends on the
synchrony of the process.
If the process is asynchronous, the exception is printed on the
standard output and the asynchronous process terminates. No other
process is affected.
# spawn {
# {failwith "Bye bye";}
# | {for i = 1 to 10 do print_string "-" done;}
# }
# ;;
-> Uncaught exception: Failure("Bye bye")
-> ----------
However, if the process was synchronous, the process waiting for
the result of this process will receive the exception, which will be
propagated as in an OCaml function.
# let def die () = failwith "die"; reply
# ;;
#
# try
# die ()
# with _ -> print_string "dead\n"
# ;;
val die : unit -> 'a
-> dead
Several processes may be waiting for a result as an exception is
raised---this is the case for instance when their reply constructs are
syntactically guarded by a shared expression that raises the
exception. In such cases, the exception is duplicated and thrown at
all threads, reversing joins into forks.
# let def a () | b () = failwith "die"; reply to a | reply to b
# ;;
#
# spawn {
# { (try a () with _ -> print_string "hello a\n"); }
# | { (try b () with _ -> print_string "hello b\n"); }
# } ;;
val b : unit -> unit
val a : unit -> unit
-> hello a
-> hello b
Chapter 2 Distributed programming
This chapter presents the distributed and mobile features of
JoCaml.
JoCaml is specifically designed to provide a simple and well-defined
model of distributed programming. Since the language entirely relies
on asynchronous message-passing, programs can either be used on a
single machine (as described in the previous chapter), or they can be
executed in a distributed manner on several machines.
In this chapter, we describe support for execution on several machines
and new primitives that control locality, migration, and failure. To
this end, we interleave a description of the model with a series of
examples that illustrate the use of these primitives.
2.1 The Distributed Model
The execution of JoCaml programs can be distributed among numerous
machines, possibly running different systems; new machines may join or
quit the computation.
At any time, every process or expression is running on a given
machine. However, they may migrate from one machine to another, under
the control of the language.
In this implementation, the runtime support consists of several system-level
processes that communicate using TCP/IP over the network.
In JoCaml, the execution of a process (or an expression)
does not usually depend on its localization. Indeed, it is equivalent
to run processes P and Q on two different machines, or to run
the compound process { P | Q } on a single machine.
In particular, the scope for defined names and values does not depend
on their localization: whenever a port name appears in a process, it
can be used to form messages (using the name as the address, or as the
message contents) without knowing whether this port name is locally-
or remotely-defined. So far, locality is transparent, and programs can
be written independently of their run-time distribution.
Of course, locality matters in some circumstances: side-effects such
as printing values on the local terminal depend on the current
machine; besides, efficiency can be affected because message-sending
over the network takes much longer than local calls; finally, the
termination of some underlying runtime will affect all its local
processes.
For all these reasons, locality is explicitly controlled within
JoCaml; it can be adjusted using migration. In contrast,
resources such as definitions and processes are never silently
relocated by the system.
An important issue when passing messages in a distributed system is
whether the message contents is replicated or passed by reference.
This is the essential difference between functions and synchronous
channels.
When a function is sent to a remote machine, its code and the values
for its local variables are also sent there, and any invocation will
be executed locally on the remote machine.
When a synchronous port name is sent to a remote machine, only the
name is sent, and invocations on this name will forward the invocation
to the machine where the name is defined, much as in a remote procedure
call.
The name-server
Since JoCaml has lexical scoping, programs being executed on different
machines do not initially share any port name; therefore, they would
normally not be able to interact with one another. To bootstrap a
distributed computation, it is necessary to exchange a few names, and
this is achieved using a built-in library called the name server. Once
this is done, these first names can be used to communicate some more
names and to build more complex communication patterns.
The interface of the name server mostly consists of two functions to
register and look up arbitrary values in a ``global table'' indexed by
plain strings (see 3.2.3 for reference). To
use the name server, do not forget to first launch it, using the
command jocns in a shell.
For instance, the following program contains two processes running in
parallel. One of them locally defines some resource (a function f
that squares integers) and registers it under the string ``square''.
The other process is not within the scope of f; it looks up the
value registered under the same string, locally binds it to sqr,
then uses it to print something.
# spawn{ let def f x = reply x*x in Ns.register "square" f vartype; }
# ;;
# spawn{ let sqr = Ns.lookup "square" vartype in
#        print_int (sqr 2); exit 0; }
# ;;
File "ex1.ml", line 3, characters 36-43:
Warning: VARTYPE replaced by type
( int -> int) metatype
File "ex1.ml", line 1, characters 57-64:
Warning: VARTYPE replaced by type
( int -> int) metatype
-> 4
The vartype keyword stands for the type of the value that is being
registered or looked up, and it is supplied by the compiler. When a
value is registered, its type is explicitly stored with it. When a
value is looked up, the stored type is compared with the inferred type
in the receiving context; if these types do not match, an exception
TypeMismatch is raised. This limited form of dynamic typing is
necessary to ensure type safety; to prevent run-time TypeMismatch
exceptions, the compiler also provides the inferred vartype at both
ends of the name server, here int -> int.
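The register/lookup mechanism and its dynamic type check can be sketched in plain OCaml with a string-indexed table that stores a type descriptor alongside each marshalled value. The names Type_mismatch, register and lookup, and the string descriptors below are illustrative assumptions, not the actual Ns interface:

```ocaml
(* Minimal sketch of a name-server table with a dynamic type check. *)
exception Type_mismatch

let table : (string, string * string) Hashtbl.t = Hashtbl.create 17

(* Store the value together with a string describing its type. *)
let register key value ty =
  Hashtbl.replace table key (ty, Marshal.to_string value [])

(* Compare the stored descriptor with the expected one before unmarshalling. *)
let lookup key ty =
  let stored_ty, data = Hashtbl.find table key in
  if stored_ty <> ty then raise Type_mismatch;
  Marshal.from_string data 0

let () =
  register "answer" 42 "int";
  let n : int = lookup "answer" "int" in
  print_int n  (* prints 42 *)
```

In the real system the descriptor is the compiler-supplied vartype, and values travel over the network between separate runtimes rather than within one process.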
Of course, using the name server makes sense only when the two
processes are running as part of stand-alone programs on different
machines, and when these processes use the same conventional strings
to access the name server. To avoid name clashes when using the same
name server for unrelated computations, a local identifier Ns.user
is appended to the index string; by default, Ns.user contains the
local user name.
Running several programs in concert
The runtimes that participate in a distributed computation are
launched as independent executables, e.g. bytecode executables
generated by the compiler and linked with the distributed runtime. Each
runtime is given its own JoCaml program.
When lookup or register is first used, the runtime attempts to
communicate with its name server, using by default the IP address and
port number provided in two environment variables JNSNAME and
JNSPORT. If no name server is running, the lookup or register
call fails. The IP address for the server (or its port number) can
also be supplied as command-line parameters when the runtime is
started.
The following example illustrates this with two machines. Let us
assume that we have a single machine ``here.inria.fr'' that is
particularly good at computing squares of integers; on this machine,
we define a square function that also prints something when it is
called (so that we can keep track of what is happening), and we
register this function with key ``square'':
# let def f x =
# print_string ("["^string_of_int(x)^"] "); flush stdout;
# reply x*x in
# Ns.register "square" f vartype
# ;;
#
# Join.server ()
The ``Join.server'' primitive tells the program to keep running
after the completion of all local statements, so that it can serve
remote calls.
On machine here.inria.fr, we compile the previous program (p.ml)
and we execute it:
here> joc p.ml -o p.out
here> ./p.out
We also write a program that relies on the previous machine to compute
squares; this program first looks up the name registered by
here.inria.fr, then performs some computations and reports their
results.
# let sqr = Ns.lookup "square" vartype
# ;;
#
# let def log (s,x) =
# print_string ("q: "^s^"= "^string_of_int(x)^"\n"); flush stdout; reply
# ;;
#
# let def sum (s,n) = reply (if n = 0 then s else sum (s+sqr(n),n-1))
# ;;
#
# log ("sqr 3",sqr 3);
# log ("sum 5",sum (0,5));
# exit 0
On another machine there.inria.fr, we compile and run our second
program (q.ml), after setting the address of our name server (we
here use bash syntax to set the environment variable):
there> joc q.ml -o q.out
there> export JNSNAME=here.inria.fr
there> ./q.out
What is the outcome of this computation? Whenever a process defines new
port names, this is done locally, that is, their guarded processes
will be executed at the same place as the defining process. Here,
every call to square in sqr 3 and within sum 5 will be evaluated
as a remote function call to here.inria.fr. The actual localization
of processes is revealed by the print statements: f (aliased to
sqr on there.inria.fr) always prints on machine here, and log
always prints on machine there, no matter where the messages are
posted.
The result on machine here.inria.fr is:
while the result on machine there.inria.fr is:
Polymorphism and the name server
Typing the previous programs in the toplevel joctop gives an
interesting error. Indeed, everything goes well until the following
line:
# let sqr = Ns.lookup "square" vartype
# ;;
which results in the error:
-> Failure: Failure("vartype: no polymorphism is allowed")
The current implementation of the name server only deals with
monomorphic types. Since there is no indication on the type of sqr
when it is looked up, the lookup fails. A solution would be to give
the type of sqr, as in:
# let sqr : int -> int = Ns.lookup "square" vartype
# ;;
This error does not occur when compiling the file q.ml because later
uses of sqr make its type precise enough.
2.2 Locations and Mobility
So far, the localization of processes and expressions is entirely
static. In some cases, however, a finer control is called for. To
compute the sum of squares in the previous example, each call to sqr
within the loop resulted in two messages on the network (one for the
request, and another one for the answer). It would be better to run
the whole loop on the machine that actually computes squares. Yet, we
would prefer not to modify the program running on the server every
time we need to run a different kind of loop that involves numerous
squares.
To this end, we introduce a unit of locality called ``location''.
A location contains a bunch of definitions and running processes ``at
the same place''. Every location is given a name, and these location
names are first-class values. They can be communicated as contents of
messages, registered to the name server, ...much as port names.
These location names can also be used as arguments to primitives that
dynamically control the relations between locations.
2.2.1 Basic examples
Locations can be declared either locally or as a top-level statement.
For instance, we create a new location named this_location:
# let loc this_location
# def square x = reply x*x
# and cubic x = reply (square x)*x
# do {
# print_int (square 2);
# }
# ;;
# print_int (cubic 2)
# ;;
val this_location : Join.location
val cubic : int -> int
val square : int -> int
-> 48
The let loc declaration binds a location name this_location, and
two port names square and cubic whose scope extends to the
location and to the following statements. Here, the declaration also
has an init part that starts a process {print_int (square 2);}.
This process runs within the location, in parallel with the remaining part
of the program. As a result, we can obtain either 84 or 48.
Distributed computations are organized as trees of nested locations;
every definition and every process is permanently attached to the
location where it appears in the source program; a process can create
new sublocations with an initial content (bindings and processes) and
a fresh location name. Once created, there is no way to place new
bindings and processes in the location from outside the location.
For instance, the following program defines three locations such that
the locations named kitchen and living_room are sublocations of
house. As regards the scopes of names, the locations kitchen,
living_room and the ports cook, switch, on, off all have the
same scope, which extends to the whole house location (between the
first do { and the last }); the location name house has a larger
scope that would include whatever follows in the source file.
# let loc house
# do {
# let loc kitchen
# def cook () = print_string "cooking...\n"; reply
# do {}
# and living_room
# def switch () | off! () = print_string "music on\n"; on () | reply
# or switch () | on! () = print_string "music off\n"; off() | reply
# do { off () }
# in
# switch (); cook (); switch ();
# }
# ;;
val house : Join.location
-> music on
-> cooking...
-> music off
2.2.2 Mobile Agents
While processes and definitions are statically attached to their
location, locations can move from one enclosing location to another.
Such migrations are triggered by a process inside of the moving
location. As a result of the migration, the moving location becomes a
sublocation of its destination location.
Notice that locations can be used for several purposes: as
destination addresses, as mobile agents, or as a combination of the
two.
Our next example is an agent-based variant of the square example
above. On the squaring side, we create a new empty location
``here'', and we register it on the name-server; its name will be
used as the destination address for our mobile agent.
# let loc here do {}
# ;;
#
# Ns.register "here" here vartype;
# Join.server ()
On the client side, we create another location ``mobile'' that wraps
the loop computation that should be executed on the square side; the
process within mobile first gets the name here, then migrates its
location inside of ``here''. Once this is done, it performs the
actual computation.
# let loc mobile
# do {
# let here = Ns.lookup "here" vartype in
# go here;
# let sqr = Ns.lookup "square" vartype in
# let def sum (s,n) =
# reply (if n = 0 then s else sum (s+sqr n, n-1)) in
# let result = sum (0,5) in
# print_string ("q: sum 5= "^string_of_int result^"\n"); flush stdout;
# }
The go here primitive migrates the mobile location with its
current contents to the machine here.inria.fr, as a sub-location of
location here, then it returns. Afterwards, the whole computation
(calls to the name server, to sqr and to sum) is local to
here.inria.fr. There are only three messages exchanged between the
machines: one for the lookup here request, one for the answer,
and one for the migration.
Let us consider a variant of mobile that combines migration and
remote communication:
# let sqr = Ns.lookup "square" vartype
# let here = Ns.lookup "here" vartype
#
# let def done1! () | done2! () = exit 0;
# ;;
#
# let def log (s,x) =
# let r = string_of_int x in
# print_string ("agent: "^s^" is "^r^"\n"); flush stdout;
# reply
# ;;
# let loc mobile
# def quadric x = reply sqr(sqr x)
# and sum (s,n,f) = reply (if n = 0 then s else sum(s+f n, n-1, f))
# do {
# go here;
# log ("sum ( i^2 , i= 1..10 )",sum (0,10,sqr));
# done1 () }
# ;;
# spawn {log ("sum ( i^4 , i= 1..10 )", sum (0,10,quadric)); done2 ()}
As before, the mobile agent contains a process that first controls the
migration, then performs some computation. Here, the location mobile
is also used as a container for the definitions of quadric and
sum. The scoping rules are never affected by migrations, so
quadric and sum can be used in the following expressions that
remain on machine there.
Once the agent arrives on the square machine here, the calls to
sum and quadric become remote calls, as if both functions had been
defined and registered on machine here. Conversely, messages sent to
log from the body of the mobile agent arrive on machine
there where the result is printed. As we run this program on machine
there, we thus obtain the local output:
As regards locality, every repeated use of sqr is now performed on
machine here. In the example, the computation of the two sums is
entirely local once the agent arrives on machine here (one network
datagram), which is much more efficient than the equivalent RPC-based
program, which would send over sixty network datagrams. (The done1
and done2 messages are signals used to ensure the termination of the
program in the script that generates the tutorial.)
Remember that localization and scopes are independent in
JoCaml: an agent can perform exactly the same actions no matter
where it is actually positioned in the location tree. If we forget
about the difference between what is local and what is remote, our
program produces the same result as a plain program where the
locations boundaries and the migration have been removed:
# let sqr = Ns.lookup "square" vartype
#
# let def done1! () | done2! () = exit 0;
# and log (s,x) =
# let r = string_of_int x in
# print_string ("agent: "^s^" is "^r^"\n"); flush stdout;
# reply
# and quadric x = reply sqr(sqr x)
# and sum (s,n,f) = reply (if n = 0 then s else sum(s+f n, n-1, f)) ;;
#
# spawn { log ("sum ( i^2 , i= 1..10 )", sum (0,10,sqr)); done1 () } ;;
# spawn { log ("sum ( i^4 , i= 1..10 )", sum (0,10,quadric)); done2 () } ;;
Apart from performance, both styles are equivalent. In
particular, we can first write and test programs, then refine them to
get a better tuning of locality.
Applets
The next example shows how to define ``applets''. An applet is a
program that is downloaded from a remote server, then used locally. As
compared to the previous examples, this migration operates the other
way round. Here, the applet defines a reference cell with destructive
reading:
# let def cell there =
# let def log s = print_string ("cell "^s^"\n"); flush stdout; reply in
#
# let loc applet
# def get () | some! x = log ("is empty"); none () | reply x
# and put x | none! () = log ("contains "^x); some x | reply
# do { go there; none () } in
#
# reply get, put
# ;;
#
# Ns.register "cell" cell vartype;
# Join.server ()
Our applet has two states: either none () or some s where s is a
string, and two methods get and put. Each time cell is called,
it creates a new applet in its own location. Thus, numerous
independent cells can be created and shipped to callers.
The name cell takes as argument the location (there) where the new
cell should reside. The relocation is controlled by the process
go there; none () that first performs the migration, then sends an
internal message to activate the cell. Besides, cell defines a log
function outside of the applet. The latter therefore remains on the
server and, when called from within the applet on the client machine,
keeps track of the usage of its cell. This is in contrast with
applets à la Java: the location migrates with its code, while
its communication capabilities remain unaffected.
We complement our example with a simplistic user that allocates and
uses a local cell:
# let cell = Ns.lookup "cell" vartype
#
# let loc user
# do {
# let get, (put : string -> unit) = cell user in
# put "world";
# put ("hello, "^get ());
# print_string (get ());
# exit 0;
# }
-> hello, world
On the server side, we get the trace:
-> cell contains world
-> cell is empty
-> cell contains hello, world
-> cell is empty
On the client side, there are no more go primitives in the applet
after its arrival, and this instance of the location name applet
does not appear anywhere. As a result, the contents of the applet can
be considered part of the host location, as if they had been
defined locally in the beginning. (Some other host location may still
move, but then it would carry the cell applet as a sublocation.)
Data-driven Migration
In the following examples, we consider ``large'' data structures
distributed between several machines.
We are interested in defining a general iterator that takes a
distributed data structure and applies a function to each of its basic
components.
Because of their relative sizes, it is better to have agents move from
site to site as they use the data, rather than move the data or, even
worse, access the data one piece at a time.
In practice, we use arrays as the building blocks of our data
structure; the basic functions to allocate and fill arrays
(make), and to create a general function that applies a function
to every value in an array (iter), could be defined by the
following module table.ml:
# let def make (n,f) =
# let a = Array.create n 0 in
# for m = 0 to n - 1 do Array.set a m (f m) done;
# reply a
# ;;
# let def iter a =
# let def i f = Array.iter f a; reply in
# reply i
# ;;
We now need to ``glue'' together several arrays. More precisely, we
define an iterator that is consistent with locality: for each array,
we move an agent that contains the function to apply inside of the
array location, then we apply it, then we move to the next
array, ...
Now, each part of the data structure is a pair (host location,
iterator), and the mobility protocol consists in (1) migrating the
function to apply inside of the host location, and (2) calling the
iterator with this function as argument.
For instance, the module statistics.ml collects data using this
protocol and keeps its partial results as an internal message:
# let def collect! (loc_data, iter_data) =
#
# let loc agent
# def state! (n,s,s2) | f x = state (n+1,s+x,s2+x*x) | reply
# or state! (n,s,s2) | finished! () | result () = reply n,s,s2
# do {
# go loc_data;
# { state (0,0,0) | iter_data f; finished () }
# } in
#
# let n,s,s2 = result () in
# print_string "the size is "; print_int n;
# print_string ", the average is "; print_int (s/n);
# print_string ", the variance is "; print_int ((n*s2-s*s)/(n*n));
# print_newline ();
# exit 0;
Here is the definition of a basic data structure that consists of one array:
# let loc here ;;
#
# let iter =
# let def f x = reply 2*x+5 in
# Table.iter (Table.make (100,f)) ;;
#
# Ns.register "loc_a" here vartype;
# Ns.register "iter_a" (iter : (int -> unit) -> unit) vartype;
# Join.server ()
In order to build our structure in a compositional way, we use a
merge function. This function takes two (location, iterator) pairs
and returns a new such pair standing for the compound data structure
(module merge.ml):
# let def merge (loc1,iter1,loc2,iter2) =
# let loc mobile
# def iter f = go loc1; iter1 f; go loc2 ; iter2 f; reply in
# reply mobile,iter
Thereby, we can assemble data by repeatedly calling the merge
function on miscellaneous chunks; this defines a structural tree
whose leaves are basic arrays. At the same time, merge sets up the
locations that are needed to traverse it locally.
In our example, the locations of arrays are stationary, but this is
not the case for the locations of compound structures: these locations
move to each of the subcomponents in turn before applying their
iterator to them.
For instance, if we consider the data structure built from three
arrays in the following program,
# let (itr_a : (int -> unit) -> unit) = Ns.lookup "iter_a" vartype
# let loc_a = Ns.lookup "loc_a" vartype
# let (itr_b : (int -> unit) -> unit) = Ns.lookup "iter_b" vartype
# let loc_b = Ns.lookup "loc_b" vartype
# let (itr_c : (int -> unit) -> unit) = Ns.lookup "iter_c" vartype
# let loc_c = Ns.lookup "loc_c" vartype
#
# open Merge
# let loc_ab, itr_ab = merge (loc_a ,itr_a ,loc_b, itr_b)
# let loc_ab_c,itr_ab_c = merge (loc_ab,itr_ab,loc_c, itr_c)
# ;;
#
# spawn {Statistics.collect (loc_ab_c, itr_ab_c)}
We obtain the results on the machine that runs the program:
-> the size is 230, the average is 77, the variance is 3109
The successive nestings of locations during the computation are:
iter on a ----> iter on b ----> iter on c
-----------------------------------------------------
a b c | a b c | a b c
ab --> | ab | ab
ab_c | ab_c --> | ab_c
agent | agent | agent
Notice that migrations are delegated at each level of nesting: as we
apply a function f, we put it in its own location (agent)
and migrate it inside the data structure location (ab_c), which is a
leaf of the location tree. Then, we repeatedly apply the function and
migrate some location whose branch contains the location of f. At
any stage, the branch of the location tree that contains the leaf
agent is the inverse of a branch in the structural tree, and the
subpart locations are the successive superlocations of the compound
location.
2.2.3 Mobile objects
In JoCaml, all objects are also locations. Thus they can migrate to a
location or be the target of a migration. Object methods are also
channels; thus a method call is run on the machine where the
object is, which is not necessarily the machine where the call is made.
An example of a mobile object
For instance we can create an object that has methods to make it
migrate, and print some information to see where the object is:
# let home = Join.here
# ;;
#
# class migrant () =
# method hello () =
# print_string "hello\n"
# method go_home () =
# Join.go home;
# print_string "I'm home\n"
# method go_there l =
# Join.go l;
# print_string "I'm not home\n"
# end
# ;;
#
# let obj = new migrant () in
# let def finished! () = { exit 0; } in
# Ns.user := "mobile_object";
# Ns.register "migrant" obj vartype;
# Ns.register "finished" finished vartype;
# Join.server ()
To use this object, we get it from the name server (it could also be
sent in a message). In both cases, we only get a reference to the
object, not the object itself.
# let here = Join.here
# ;;
#
# Ns.user := "mobile_object"
# ;;
#
# let (obj : < hello : unit -> unit;
# go_home : unit -> unit;
# go_there : Join.location -> unit > ) =
# Ns.lookup "migrant" vartype
# ;;
#
# let (finished : << unit >> ) = Ns.lookup "finished" vartype
# ;;
#
# obj#hello ();
# obj#go_there here;
# obj#go_home ();
# spawn{finished ()};
# exit 0
The output on the first machine is:
-> hello
-> I'm home
Even though the ``hello'' call was made from the second machine, the
message is printed on the machine where the object is.
The output on the second machine is:
-> I'm not home
It is clear here that the object has migrated to the second machine
before printing that it is there.
Objects as migration targets
Since objects can be considered as locations, it is legitimate to use
an object as a migration target. However, the ``Join.go'' primitive
expects a location:
# let f a = Join.go a
# ;;
val f : Join.location -> unit
This is why there is another primitive to migrate to an object, called
``Join.goo'':
# let g a = Join.goo a
# ;;
val g : < > -> unit
There is however a small issue because of explicit subtyping and
objects. Consider for instance the following object:
# class foo () =
# method bar () = print_string "hello\n"
# end
# ;;
#
# let obj = new foo()
# ;;
class foo (unit) = method bar : unit -> unit end
val obj : foo
Let us now create a location that migrates to this object:
# let loc test do {
# Join.goo obj;
# }
# ;;
File "ex7.ml", line 4, characters 11-14:
This expression has type Ex6.foo = < bar : unit -> unit >
but is here used with type < >
There is a typing problem, because subtyping between objects is
explicit in Objective Caml. Thus we need to constrain the type of the
object the location is migrating to:
# let loc test do {
# Join.goo (obj :> < >);
# }
# ;;
val test : Join.location
Mobility and object creation
For the time being, a missing feature in JoCaml is the migration of
the capability to create objects. For instance, consider the following
code:
# let home = Join.here
# ;;
#
# class migrant () =
# method hello () =
# print_string "hello\n"
# method go_home () =
# Join.go home;
# print_string "I'm home\n"
# method go_there l =
# Join.go l;
# print_string "I'm not home\n"
# end
# ;;
#
# let f () = new migrant ()
# ;;
#
# Ns.register "f" f vartype
# ;;
val home : Join.location
class migrant (unit) =
method go_home : unit -> unit
method go_there : Join.location -> unit
method hello : unit -> unit
end
val f : unit -> migrant
File "ex9.ml", line 19, characters 18-25:
Warning: VARTYPE replaced by type
( unit -> <go_home:( unit -> unit) ; go_there:( Join.location -> unit) ; hello:( unit -> unit) ; >) metatype
If another runtime looks the function ``f'' up and uses it (or gets
it by other means, as in a message), the object returned may not
work. Similarly, if a location or an object containing a function that
creates some object migrates to another runtime, the object returned
by the call to such a function may not be correct. This is a
limitation of the current implementation.
2.3 Termination, Failures and Failure Recovery
As a matter of fact, some parts of a distributed computation may fail
(e.g., because a machine is abruptly switched off). The simplest
solution would be to abort the whole computation whenever this is
detected, but this is not realistic in case numerous machines are
involved. Rather, we would like our programs to detect such failures
and take adequate measures, such as cleanly reporting the problem,
aborting related parts of the computation, or making another attempt
on a different machine. To this end, JoCaml provides an abstract model of
failure and failure detection expressed in terms of locations:
-
a location can run a primitive process halt () that, when
executed, atomically halts every process inside of this
location (and recursively every sublocation);
- a location can detect that another location with name there
has halted, using the primitive call fail there; P. When the
process P is triggered, it is guaranteed that location there is
halted.
In the current implementation, halting is detected only when the
halt () primitive is issued in the same runtime, or when the runtime
containing the location actually stops. Thus, simply issuing a
halt () will not trigger a matching fail in another runtime; this
fail will be triggered only when the runtime hosting the watched
location terminates or becomes unreachable.
The semantics of JoCaml guarantees that this is the only reason why
parts of the computation may stop. Now, an executable running the
program P on a fallible machine can be thought of as a system-defined
location let loc machine do { P | {crash (); exit 0;} } where
crash may return at any time, with exit 0 terminating the runtime.
In the model, a machine can in particular remotely detect that another
machine has stopped, once it knows the name of a location there. In
practice, it is difficult to provide reliable failure detection, as
this requires further assumptions on the network.
In the prototype implementation, error detection is only partially
implemented, hence there is no guarantee that when a runtime terminates
abnormally, the failure of its locations is detected. (Still, detected
failures provide the expected negative guarantees: the failed location
is no longer visible to any part of the computation.)
Since locations fail only as a whole, the programmer can define
suitable units of failure, and even use the halt/fail primitives
to control the computation. In the current implementation, the control
is not this fine-grained when spanning several runtimes, and
requires the use of the exit 0/fail primitives. Notice that no silent
recovery mechanism is provided; the programmer must figure out what to
do in case of problems.
2.3.1 Basic examples
To begin with, we use simple examples that attempt to use a port name
say inside of a fallible location to get messages printed. Because
these calls may never return if the location has stopped, we spawn
them instead of waiting for their completion.
In this first example, location agent can stop at any time. After
the failure has occurred, we print a report message. We know for sure
that the first say can only print something before the failure
report, and that the second say cannot print anything.
# let loc agent
# def say s = print_string s; reply
# do { halt (); } ;;
#
# spawn { say "it may work before.\n"; }
# ;;
# spawn { fail agent;
# print_string "the location stopped\n";
# say "it never works after\n"; }
# ;;
val agent : Join.location
val say : string -> unit
-> the location stopped
The following example is more tricky. First, the agent does not halt
itself; however, it migrates within a location that stops and this is
a deadly move. Second, the halt () process can be triggered only from
the outside by a normal message kill (). Thus we know that the first
say always prints its message. Finally, as there is no halt in
location agent, it can only stop because location fallible halted,
so that the triggering of fail agent implies that fallible also stopped.
# let loc fallible
# def kill! () = halt ();
# ;;
#
# let loc agent
# def say s = print_string s; reply
# do { go fallible; }
# ;;
#
# spawn { say "it always works.\n"; kill () }
# ;;
# spawn { say "it may work before.\n"; }
# ;;
# spawn { fail agent;
# print_string "both locations stopped.\n";
# say "it never works after.\n"; }
# ;;
val fallible : Join.location
val kill : <<unit>>
val agent : Join.location
val say : string -> unit
-> it always works.
-> it may work before.
-> both locations stopped.
2.3.2 Watching for Failures
We now move on to some more realistic use of failure-detection; we
first consider a function that encapsulates a session with mobility
and potential failures.
There is usually no need to explicitly halt locations that have
completed their task (the garbage collector should take care of them).
However, in some cases we would like to be sure that no immigrant
location is still running locally.
Let us assume that job is a remote function within location there
that may create mobile sublocations and migrate them to the caller's
site. To this end, the caller should supply a host location, as in
the previous examples. How can we make sure that job is not using
this location to run other agents after the call completes? This is
handled by using a new temporary location box for each call, and
halting it once the function call has completed.
# let def safe! (job,arg,success,failure) =
#
# let loc box
# def kill! () = halt();
# and start () = reply job (box,arg) in
#
# let def finished! x | running! () = finished x | kill()
# or finished! x | failed! () = success x
# or running! () | failed! () = failure () in
#
# finished (start ()) | running () | fail box; failed ()
# ;;
val safe : <<((Join.location * 'a -> 'b) * 'a * <<'b>> * <<unit>>)>>
Our supervising protocol either returns a result on success, or a
signal on failure. In either case, the message guarantees that no
alien computation may take place afterward.
Initially, there is a message running (), and the control definition
waits for either some result on finished x or some failure detection on
failed (). Whatever its definition is, the job process can create
and move locations inside of the box, and eventually return some value
to the start call within the box. Once this occurs, the result is
forwarded on finished to the control process, and the first
join-pattern is triggered. In this case, the running () message is
consumed and eventually replaced by a failed () message (once the
kill () message is handled, the box is halted, and the fail guard
in the control process is triggered, releasing a message on failed).
At this stage, we know for sure that no binding or computation
introduced by job remains on the caller's machine, and we can return
the value as if a plain RPC had occurred.
This ``wrapper'' is quite general. Once a location-passing convention
is chosen, the safe function does not depend on the actual
computation performed by job (its arguments, its results, and even
the way it uses locations are parametric here).
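As an illustration, a call through this wrapper could look as follows (a hypothetical sketch: job and arg stand for some remote function and its argument, as discussed above; the printed messages are invented):

```
# let def success! x =
#     print_string "the job returned\n"; flush stdout;
# and failure! () =
#     print_string "the box failed\n"; flush stdout;
# ;;
#
# spawn { safe (job, arg, success, failure) }
```

Exactly one of the two messages is eventually received, and in both cases the temporary box has already been halted.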
We could refine this example further to transform unduly long calls to
job into failure (by sending a failed () message after an external
timeout), to give some more control to the caller (adding an abort
message),...
2.3.3 Recovering from partial failures
We finally come back to distributed loops. We give an example of a
program that uses the CPU of whatever machine is available to compute
the sum 1 + 2 + ... + 999. Basically, we only assume that the
iteration could be computed in any order, we cut the loop in small
chunks, and we distribute a chunk to every available machine. The
program takes care of potential failures. When a machine fails, any
chunk that was distributed to that machine is taken back and given
to another machine.
# let size = 1000
# let chunk = 200
#
# let def join! (name,there) =
# let loc mobile
# do {
# let def start! (i,finished) =
# let def loop! (u,s) = if u<(i+1)*chunk then loop (u+1,s+u) else finished s in
# loop (i*chunk,0) in
# go there;
# worker (name,mobile,start) } in
# print_string (name^" joins the party\n"); flush stdout;
#
# and job! i | worker! (name,there,start) =
# print_string (name^","^string_of_int(i*chunk)^"\n"); flush stdout;
# let def once! () | finished! s = add s | worker (name,there,start)
# or once! () | failed! () = print_string (name^" went down\n"); job i in
# once () | start (i,finished) | fail there;failed()
#
# and result! (n,s) | add! ds =
# let s' = s + ds in
# if n > 0 then result (n-1,s')
# else {print_string("The sum is "^string_of_int s'^"\n"); exit 0;} ;;
#
# spawn { result (size/chunk-1,0)
# | let def jobs! n = job n | if n>0 then jobs (n-1) in jobs (size/chunk-1) };;
#
# Ns.register "join" join vartype ; Join.server ()
The actual work is performed in the mobile locations, once they
reach the locations there provided by joining machines. Messages
job i partition the work to be done. Each remote computation
concludes either with a finished s or a failed () message; in the latter
case, the aborted job is re-issued. The resulting sum is accumulated
as a message on result.
The client is not specific to our computation at all; indeed, its only
contribution is a location where others may place their sublocations.
# let loc worker do {
# let join = Ns.lookup "join" vartype in
# join ("reliable",worker) }
#
In the following, we explicitly model an unreliable task force, as a
variant of the previous program that also has a ``time bomb'' which
eventually stops the joining location:
# let delay = int_of_string(Sys.getenv "DELAY")
# let name = Sys.getenv "NAME"
#
# let loc unreliable_worker do {
# let join = Ns.lookup "join" vartype in
# let def tictac! n = if n = 0 then {exit 0;} else tictac (n-1) in
# join (name,unreliable_worker) | tictac delay }
We start three of them with various parameters, for instance with the
command lines:
DELAY=1000 NAME=fallible ./work.out &
DELAY=2000 NAME=dubious ./work.out &
DELAY=30000 NAME=reliable ./work.out &
./distributed_loop.out
and we observe the output of the last command (the main loop program):
Chapter 3 The JoCaml language and system
3.1 Jocaml Tools
Most jocaml tools are Objective-Caml tools with an initial
``j''. For example, ocamlc is called jocamlc, ocaml
is called jocaml, ocamlrun is called jocamlrun, and
so on. There is no compatibility between Objective-Caml compiled files
and Jocaml compiled files, even for the same source file.
In this section, we first describe new options for old Objective-Caml
tools, and then new tools designed for Join programmers.
New options:
-
-join
Link the executable with a ``thread-safe'' version of the standard
library, needed for join programs.
- -nodyncheck
Modules containing location definitions are dynamically linked
during execution. Module and primitive dependencies are resolved
during the first execution of each dependency. However, by default,
these dependencies are also checked by the compiler before linking.
This option prevents the compiler from checking these dependencies.
This is useful if you create a location with a dependency on a
module or primitive which is not immediately linked, but which will
be available where the location migrates.
- -make_vm
This option enables you to create new runtimes (like jocamlrun)
to execute bytecode programs using non-standard C and bytecode
libraries, without using the -custom option. Executables for these
new runtimes will only contain their own bytecode, and will find
library bytecode and external primitives in the new runtime. The
name of the runtime is specified with the -o runtime
option. The compiler creates two files, runtime and runtime.cmc: runtime is the runtime executable, and
runtime.cmc is used by the compiler to create programs
executing on runtime.
- -use_vm absolute-path/runtime
This option is used to create programs executing on runtime.
The associated file runtime.cmc must be available in the
compiler search path (the standard directory, the current one, and
those specified with the -I option). The created program will
always search for its runtime at the absolute position fixed by the
path specified during the compilation. However, the runtime need not
be at this position at compile time.
- -cca cclib-option
Creates a bytecode library (same as the -a option), and
specifies an argument which will be passed to the C-linker when
linking a program with this library. For example, the graphics
library is created with:
jocamlc -cca -lX11 -lgraphics -o graphics.cma
graphics.cmo
- -noautolink
This option prevents the compiler from using arguments given with
the -cca option. It is most useful when, for some reason,
these arguments must be temporarily overridden.
- -l library
Equivalent to ``library.cma''
Mixing native code and bytecode
As an experimental feature, the Jocaml system enables you to mix
native code and bytecode in a same executable. This feature is useful
since join constructs are only available in bytecode, whereas good
performance is achieved using native code (as an example, see
mandel.opt in the distribution).
In a mixed runtime, the native code program is started first. At the
end of the native code, the bytecode program is executed. The bytecode
may call native functions through dedicated ``call-back'' modules.
These modules are Objective-Caml source files with a ``.mlx'' suffix
which, when compiled with jocamlopt, generate two object files: one in
native code, with the real functions, and one in bytecode, whose
functions are only wrappers to the native code functions.
New options:
-
-join
Same as for jocamlc.
- -make_vm
Same as for jocamlc. However, the runtime may
also contain native code.
- -byte source.ml
Compile file source.ml in bytecode instead of native code.
- -cca cclib-option
Same as for jocamlc.
- -l library
Equivalent to ``library.cmxa''
- -lbyte library
Equivalent to ``library.cma''
3.1.3 jocrun, joc, jocl and joctop
The jocaml system includes three runtimes: jocamlrun, jocrun and jogrun. jocrun is created using the -make_vm option, and contains the thread-safe standard library,
the Unix library, the bytecode thread library and the join
library. Thus, it can be used to execute most standard Join
programs. To compile programs using this runtime, you can use the
command jocamlc -use_vm jocaml-bin-directory/jocrun other-args, or use the dedicated program joc: joc other-args.
If you need to link your program with other libraries which are not
contained in the jocrun runtime, you can use the command jocamlc -join -l unix -l join other-args, or the dedicated
program jocl: jocl other-args.
Both joc and jocl programs add a -linkall option to
include all modules in the program created. This is safer, since join
programs may receive locations from other programs which have more
dependencies than the original program.
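For instance, a typical compilation could look as follows (a hypothetical session, assuming a join program hello.ml and the jocaml binaries on the path):

```
joc -o hello.out hello.ml     # compiled against the shared jocrun runtime
jocl -o hello.out hello.ml    # alternatively, link the unix and join libraries explicitly
```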
The jocaml system also includes two toplevels: jocaml and joctop. jocaml is the Objective-Caml toplevel, without join
constructs, whereas joctop is the Objective-Caml toplevel
compiled on the jocrun runtime, and thus extended with join
constructs (join definitions and locations).
3.1.4 jocns, jocclient
jocns is a name server used by the Ns module. It must be
started explicitly: it is not started automatically by a program calling
Ns.register or Ns.lookup. The default port is 20001, but it can be
modified with the JNSPORT environment variable. The JNSNAME
environment variable can be used to specify the name server machine
if it is not the localhost. The user name is appended to the resource
name that is registered or looked up. If no user name is specified
(for instance in the environment variable USER), the default
user name pub is used. To change the user name in a jocaml
program, simply set the Ns.user string reference:
Ns.user := "toto"
jocclient is a generic client for JoCaml applications: It
queries the default name server (jocns) for a generic channel
name ``newclient'' of type Join.location * string * string list
-> unit, and applies this channel to its location, the user login
name, and its list of arguments. The Genserver module enables
programmers to easily create servers for generic clients. The name
server host and port can be set either by environment variables
(JNSNAME and JNSPORT) or by options (-ns_host and -ns_port). The queried name can be modified with the -ns_name
option, and the user name with the -ns_user option.
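For instance, a minimal server for such generic clients could register a ``newclient'' channel of the expected type by hand (a hedged sketch that bypasses the Genserver module; the greeting message is invented for illustration):

```
# let def newclient (loc, login, args) =
#     print_string ("client "^login^" connected\n");
#     flush stdout;
#     reply
# ;;
# Ns.register "newclient"
#   (newclient : Join.location * string * string list -> unit) vartype;
# Join.server ()
```

Once this server is running, jocclient will look up ``newclient'' and apply it to its own location, login name and arguments.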
3.1.5 jogrun, jogc and jogclient
jogrun is a specialized runtime including the Join library and
the Graphics library. jogc is the equivalent of joc for
building programs using this runtime, and jogclient is a generic
client built on this runtime (thus, this client must be used to
receive locations needing the Graphics library). It can be used for most
examples of the distribution (Tron, Bomberman, Bataille, Pong, Mandel,...).
3.2 Syntax and Examples
Expressions:

declaration ::=
      Ocaml-declaration
   |  let def automata-definition
   |  let loc join-locations

automata-definition ::=
      automaton [and automata-definition]

automaton ::=
      join-pattern = process [or automaton]

join-pattern ::=
      channel-decl [ | join-pattern ]

channel-decl ::=
      synchronous-name arguments
   |  asynchronous-name arguments

synchronous-name ::=
      Ocaml-lower-ident

asynchronous-name ::=
      Ocaml-lower-ident !

arguments ::=
      Ocaml-pattern

process ::=
      final-process
   |  final-process | process
   |  declaration in process
   |  if expression then final-process
   |  expression ; process

final-process ::=
      reply [expression] [to Ocaml-lower-ident]
   |  asynchronous-send
   |  if expression then final-process else final-process
   |  match expression with Ocaml-pattern -> final-process
        [ | Ocaml-pattern -> final-process [...]]
   |  loc join-locations
   |  { process }

expression ::=
      Ocaml-expression
   |  spawn { process }

asynchronous-send ::=
      expression expression

join-locations ::=
      location-definition [and join-locations]

location-definition ::=
      location-name [def automata-definition] [do final-process]

Types:

channel-type ::=
      function-type
   |  asynchronous-channel

asynchronous-channel ::=
      << core-type >>
3.2.1 Join definitions
Join channels are defined with the let def construct.
For example, the definition of a memory cell follows (these lines have
been typed in the joctop toplevel):
# let def new_cell s =
let def get () | state! s = reply s | state s
or set s | state! _ = state s | reply ()
in
state s | reply (get,set)
;;
val new_cell : 'a -> (unit -> 'a) * ('a -> unit) = chan
In this definition, we create a synchronous name new_cell,
with one parameter. When this name is called, it creates three names,
two synchronous (get and set) and one asynchronous (note
the ! notation) (state). Notice that all names are lowercase standard Objective-Caml
identifiers. Multiple join patterns in the same definition are
introduced with the or keyword. Continuations for reply
constructs are not specified, since there is only one synchronous name
per join pattern in this example. With join patterns containing
several synchronous names, continuations must be specified
with the reply value to name construct.
The type of new_cell is a polymorphic synchronous channel,
returning a pair of synchronous channels. Asynchronous sends use
the functional application syntax (state s), but only inside
a process. Indeed, state s; would raise a type error,
since state is an asynchronous channel used in a functional
application inside an expression (the context before a semicolon (;)
is always an expression, not a process). Likewise, reply constructs and parallel compositions (|) can only appear
in processes. However, the spawn { process }
construct can be used to fork a process from within an expression.
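For instance, an asynchronous send can be forked from within an ordinary function (a small hypothetical sketch in the style of the examples above; the channel trace and the printed strings are invented):

```
# let def trace! s = print_string s; flush stdout;
# ;;
#
# let f () =
#   spawn { trace "forked from an expression\n" };
#   print_string "still in the expression\n"
# ;;
```

Here trace "…" alone would be rejected inside the body of f, which is an expression; wrapping it in spawn { … } makes it a legal forked process.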
# let (get_a,set_a) = new_cell "hello";;
val get_a : unit -> string = <fun>
val set_a : string -> unit = <fun>
Finally:
# let _ =
print_string (get_a());
print_char ' ';
set_a "world";
print_string (get_a());
print_newline ()
;;
hello world
- : unit = ()
Other more complicated examples can be found in the examples/
directory of the distribution.
3.2.2 Location definitions and migration
Here is the definition of a location mobile, with two included
join definitions, one with one synchronous name go, and another
close to the previous memory cell:
# let loc mobile def
goto location = go location; set_position location; reply
and set_position location | position! _ = position location | reply
or get_position () | position! location = position location | reply location
do { position here }
;;
val mobile : Join.location = <abstr>
val goto : Join.location -> unit = chan
val position : <<Join.location>> = chan
val set_position : Join.location -> unit = chan
val get_position : unit -> Join.location = chan
First, we can notice that the type of locations is Join.location, i.e. the type location from the Join
module. The Join module packages the basic primitives of the
join-calculus language. Here, we also use the Join.go function,
which performs location migration, and Join.here, which is the
predefined toplevel location of the program.
Second, the join-calculus init process end construct
is replaced by the struct location-items end
construct. This construct enables you to define local values inside
the location, which are not available from outside the location.
During location migration, join automata (definitions) and
Objective-Caml values behave differently. Join automata are always
unique. Consequently, when they migrate with their location to another
runtime, local pointers to these automata from the old runtime are
replaced by remote pointers to the new runtime. On the contrary,
Objective-Caml values are always copied. For Objective-Caml primitives
and functions, this means that the next call to one of them will
trigger an attempt to find the function code or the primitive C code
locally in the new runtime. This will succeed if the module where the
function was defined is contained in the new runtime, or if the module
code has been copied during the migration (indeed, the smallest unit
of bytecode is the module; thus, all the module code migrates when the
code of a location defined in this module migrates). Otherwise, a
Reloc.ModuleNotAvailable exception will be raised if the
function code is not present. However, since the default runtime
contains the standard library, the Unix module, the Thread
module and the Join module, most needed functions are always
available, and other location dependencies should be defined in the
same module as the locations.
3.2.3 Name server
The join library contains a Ns module used to make requests to
the jocns name server. This module contains two functions and a string reference:
val lookup : string -> 'a metatype -> 'a
val register : string -> 'b -> 'b metatype -> unit
val user : string ref
Ns.lookup is used to look up the value associated with a
particular name on the name server, whereas Ns.register is used
to associate a value with a name on the name server. Ns.user is
appended to every name for a lookup or a register; it defaults either
to the environment variable USER, or to pub
if this variable is not set. Since it is a reference, its value can be
modified to access a particular resource registered by another user.
If no name server is running during these requests, Ns.lookup
and Ns.register raise an exception. The name server address is
hostname:20001 by default, and can be overridden with the JNSNAME and JNSPORT environment variables.
Ns.lookup and Ns.register require an extra argument of
type 'a metatype, which is used to check the type of the requested
value against the actual type of the value in the name server. This extra
argument is built with the vartype keyword and a type constraint
on the 'a metatype parameter.
For example,
# Ns.register "mobile" mobile (vartype:Join.location metatype);;
will register the previously defined location mobile with the
name mobile and the type Join.location.
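On another runtime, a matching lookup might then retrieve the registered value, as in this sketch (it assumes the registration above has already taken place):

```ocaml
(* Retrieve the location registered under "mobile"; the vartype     *)
(* constraint lets the name server check the type dynamically.      *)
let mobile = Ns.lookup "mobile" (vartype : Join.location metatype)
```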
3.2.4 Distributed Garbage collection
A distributed garbage collector is implemented to collect unreachable
objects. It is based on the SSPC garbage collector. This garbage
collector can only collect acyclic distributed garbage. Distributed
garbage collection is triggered by major garbage collections.
To suppress distributed garbage collection, set the environment variable
JNODGC before starting the runtime. To avoid
distributed garbage collection only during critical periods, use the
two functions Join.dgc_stop and Join.dgc_restart.
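A critical section can be bracketed with these two functions, as in the following sketch (critical_work stands for any hypothetical latency-sensitive computation):

```ocaml
(* Suspend distributed GC around a latency-sensitive phase, making  *)
(* sure it is restarted even if the computation raises.             *)
let with_dgc_paused critical_work =
  Join.dgc_stop ();
  (try critical_work () with e -> Join.dgc_restart (); raise e);
  Join.dgc_restart ()
```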
3.2.5 Failures
The Join module contains two functions to handle failures:
val halt : unit -> 'a
val fail : location -> unit
Join.halt is used to halt a location. Currently, it does not
work for the toplevel location. Join.fail is used to detect
location failures: a call to Join.fail only returns when
the location given as argument has failed. Currently, only locations halted
with Join.halt are detected. Moreover, this detection is
immediate for locations in the same runtime, and is lazily propagated by
the distributed garbage collector between different runtimes. As a
consequence, only active runtimes may detect location failures.
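This blocking behaviour makes it easy to build a simple failure monitor, as in this sketch (the function name is hypothetical):

```ocaml
(* Print a message once the given location has failed.  The call    *)
(* to Join.fail blocks until the failure is detected.               *)
let monitor loc name =
  spawn {
    Join.fail loc;
    print_string (name ^ " has failed");
    print_newline ()
  }
```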
3.2.6 Distributed objects
The Objective-Caml object system has been extended with distributed
objects: Objective-Caml objects are also locations. Thus, they can
migrate using the Join.go function, together with all the automata, locations
and objects they contain. More precisely, any object or location that
has migrated into this object will be carried along, as well as any channel
definition, location, or object created inside the migrating object.
The function Join.goo can be used to migrate a location into an
object, or an object into another object.
Moreover, all method calls are implemented as a new process started
in the object (seen as a location). Thus, running method threads migrate with
the object.
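A minimal sketch of migrating a location into an object might look as follows (all names are hypothetical; the coercion matches the < > argument type of Join.goo):

```ocaml
(* A hypothetical container object; since objects are locations,    *)
(* other locations can migrate into it with Join.goo.               *)
let container = object
  method ping = print_string "alive"; print_newline ()
end

let loc worker
  do {
    (* worker moves inside container, and from now on travels       *)
    (* with it whenever container itself migrates.                  *)
    Join.goo (container :> < >)
  }
```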
3.2.7 Exceptions
Exceptions raised in automata processes are propagated back to all
synchronous names that will not receive a reply because of the raised
exception:
# let def a() | b() | c() = reply to a |
    raise Not_found; { reply to b | reply to c }
  ;;
val c : unit -> unit = chan
val b : unit -> unit = chan
val a : unit -> unit = chan
In this example, the call to a will receive a reply, whereas the
calls to b and c will raise the Not_found exception.
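On the caller's side, the propagated exception can then be caught as usual, as in this sketch building on the definition above:

```ocaml
(* The exception raised inside the joint definition resurfaces at   *)
(* the synchronous call that never got its reply.                   *)
let _ =
  try b () with Not_found ->
    print_string "b raised Not_found"; print_newline ()
```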
3.2.8 Join module
Most join-calculus primitives are available from the Join
module:
type location
type space_id
val new_port : int -> unit
val server : unit -> unit
val here : location
val go : location -> unit
val goo : < > -> unit
val halt : unit -> 'a
val fail : location -> unit
val getRemoteService : space_id -> string -> 'a metatype -> 'a
val getLocalService : string -> 'a metatype -> 'a
val setLocalService : string -> 'a -> 'a metatype -> unit
val getSpaceService : Unix.inet_addr -> int -> string -> 'a metatype -> 'a
val dgc_stop : unit -> unit
val dgc_restart : unit -> unit
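Taken together, these primitives allow a minimal two-runtime setup, sketched below (the port number, host address, service name and channel definition are all illustrative assumptions):

```ocaml
(* Server side: listen on a port and export a synchronous channel.  *)
let def echo (s) = print_string s; print_newline (); reply to echo

let start_server () =
  Join.new_port 12345;
  Join.setLocalService "echo" echo (vartype : (string -> unit) metatype);
  Join.server ()                       (* blocks, serving requests *)

(* Client side, in another runtime: fetch and call the service.     *)
let client () =
  let echo =
    Join.getSpaceService (Unix.inet_addr_of_string "127.0.0.1") 12345
      "echo" (vartype : (string -> unit) metatype)
  in
  echo "hello from a remote runtime"
```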