Contents:
%
% THE \simulan LOGO IS DEFINED HERE.
%
\def\simulan{{\rm s\kern-.06em\raise-.5ex\hbox{i}\kern-.1em\raise-.1ex
\hbox{m}\raise-.3ex\hbox{u}\kern-.10emL\kern-.1667em\lower-.6ex
\hbox{a}\kern-.10emn}}
%% the \PBeam Logo is defined here
\def\PBeam{{\sc\kern.15emP\kern-.9em\raise.125ex\hbox{$\leftarrow$}\sc\kern-.25emB\sc\kern-.1eme\kern-.1ema\kern-.1emm}}
You might also want to look at the companion thesis about modelling
with simuLan:
[Schmida91]
Ralf Schmidt-Dannert:
\simulan: Modellierung und Simulation lokaler Netzwerke.
Diplomarbeit, TU Braunschweig, 1991 (in German).
However, they exhibit problems with unbalanced load, conflicts with interactive users, and an increased failure probability. Process migration and checkpoint/restart are adequate means to solve these problems. Several such mechanisms are described in the literature, but they all have certain weaknesses and impose restrictions on the applications they can handle.
In this thesis, a new concept for an application-transparent migration and checkpointing mechanism is developed, which overcomes some of the more substantial of these restrictions. It supports migration and fault transparency for parallel and distributed applications, i.e. groups of communicating processes, on clusters of workstations. Neither the system kernel nor the application programs need to be modified, and applications are not required to be written for a specific runtime environment. However, for better performance, the applications can be linked with a modified system library. From this concept, the architecture of the example implementation is derived, and first measurement results and experiences with the implementation are reported.
A BibTeX file from my dissertation.
The system uses a global virtual name space to provide migration and rollback transparency in user space for distributed groups of processes on workstations. Applications always use the same virtual names for operating system objects, independent of the objects' current real location. System calls are interposed and their parameters translated between the name spaces. Unlike other migration mechanisms, it does not require the applications to be written for a specific programming model or communication library.
The first approach to executing applications in the virtual name space was to link the programs with a modified system library. In this paper, we describe the design and implementation of a separate system call interposition process that accesses the application via the debugging interface. The main advantage of this approach is that it can handle even unmodified (e.g. commercially bought) application programs. We compare measured performance figures with those of previous similar approaches and of the modified system library.
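The name translation idea described above can be illustrated with a small sketch. It is not the system's implementation: the class `VirtualNameSpace`, its method names, the starting virtual number 1000, and `simulate_migration` are all hypothetical, and the "migration" is only simulated by closing and reopening a file within one process. The point is that the application keeps using the same virtual descriptor while the real descriptor behind it is replaced.

```python
import os
import tempfile


class VirtualNameSpace:
    """Hypothetical sketch: stable virtual descriptors mapped to real ones."""

    def __init__(self):
        self._real = {}    # virtual fd -> current real fd
        self._reopen = {}  # virtual fd -> (path, flags) needed to re-create it
        self._next = 1000  # arbitrary first virtual name (assumption)

    def open(self, path, flags, mode=0o600):
        real = os.open(path, flags, mode)
        virt = self._next
        self._next += 1
        self._real[virt] = real
        # Drop creation-only flags so a later reopen does not truncate the file.
        keep = flags & ~(os.O_CREAT | os.O_TRUNC | os.O_EXCL)
        self._reopen[virt] = (path, keep)
        return virt

    # Each wrapper translates the virtual name, then issues the real call.
    def write(self, virt, data):
        return os.write(self._real[virt], data)

    def read(self, virt, count):
        return os.read(self._real[virt], count)

    def lseek(self, virt, pos, how):
        return os.lseek(self._real[virt], pos, how)

    def close(self, virt):
        os.close(self._real.pop(virt))
        del self._reopen[virt]

    def simulate_migration(self):
        """Stand-in for a migration: re-create each real object and rebind it."""
        for virt, old in list(self._real.items()):
            offset = os.lseek(old, 0, os.SEEK_CUR)  # remember the file position
            os.close(old)
            path, flags = self._reopen[virt]
            new = os.open(path, flags)
            os.lseek(new, offset, os.SEEK_SET)      # restore the file position
            self._real[virt] = new


def demo():
    ns = VirtualNameSpace()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        vfd = ns.open(path, os.O_RDWR | os.O_CREAT)
        ns.write(vfd, b"before ")
        ns.simulate_migration()  # the real descriptor changes underneath
        ns.write(vfd, b"after")  # the application still uses the same name
        ns.lseek(vfd, 0, os.SEEK_SET)
        content = ns.read(vfd, 100)
        ns.close(vfd)
    return vfd, content
```

In this sketch the translation sits in ordinary wrapper methods, which roughly corresponds to the modified-library variant; an interposition process attached through the debugging interface would perform the same mapping from outside the application.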
Some amount of data is kept distributed or replicated on some or all nodes of a distributed system. At every moment, each instance that accesses this data must see the same information. Updates must be delivered ordered, reliably, and efficiently.
Our prototype software implements ordered, reliable multicasts on top of the unreliable IP broad- or multicast with three different methods (Master-Slave, Token Exchange on Demand, Totem Single Ring). This paper shows measurement results for the efficiency and scalability of the three methods in different topologies.
The measurements confirm earlier analytical results. Totem behaves well in large networks with many concurrent senders. The overhead of Token Exchange on Demand and of the Master-Slave algorithm is almost the same. We also found no evidence for the frequently stated opinion that the Master-Slave approach scales worse because of its central bottleneck.
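To make the Master-Slave idea concrete, here is a minimal simulation sketch of sequencer-based ordered, reliable delivery over a lossy, reordering broadcast. It is an illustration under assumptions, not the prototype described above: the names `Master`, `Receiver`, `run`, and `repair` are hypothetical, the network is simulated in memory with a seeded random generator, and gap repair is triggered explicitly at the end rather than by timeouts or negative acknowledgements.

```python
import random


class Master:
    """Central sequencer: assigns each message the next global sequence number."""

    def __init__(self):
        self.log = []  # log[i] holds the message with sequence number i

    def order(self, sender, payload):
        seq = len(self.log)
        self.log.append((sender, payload))
        return seq

    def retransmit(self, seq):
        return self.log[seq]


class Receiver:
    """Delivers messages strictly in sequence order, holding back early ones."""

    def __init__(self, master):
        self.master = master
        self.next_seq = 0    # next sequence number to deliver
        self.held = {}       # out-of-order messages waiting for a gap to close
        self.delivered = []

    def on_packet(self, seq, msg):
        if seq < self.next_seq:
            return  # duplicate of an already delivered message
        self.held[seq] = msg
        self._deliver_ready()

    def _deliver_ready(self):
        while self.next_seq in self.held:
            self.delivered.append(self.held.pop(self.next_seq))
            self.next_seq += 1

    def repair(self, highest_seq):
        """Fetch every still-missing message up to highest_seq from the master."""
        for seq in range(self.next_seq, highest_seq + 1):
            if seq not in self.held:
                self.held[seq] = self.master.retransmit(seq)
        self._deliver_ready()


def run(num_receivers=3, num_messages=20, loss=0.3, seed=1):
    rng = random.Random(seed)
    master = Master()
    receivers = [Receiver(master) for _ in range(num_receivers)]
    packets = []
    for i in range(num_messages):
        seq = master.order(f"node{i % num_receivers}", f"update-{i}")
        packets.append((seq, master.log[seq]))
    for receiver in receivers:
        # Unreliable broadcast: each receiver gets its own lossy, shuffled copy.
        seen = [p for p in packets if rng.random() >= loss]
        rng.shuffle(seen)
        for seq, msg in seen:
            receiver.on_packet(seq, msg)
        receiver.repair(num_messages - 1)
    return master.log, [r.delivered for r in receivers]
```

Calling `run()` returns the master's log and each receiver's delivery list; in this sketch every receiver ends up with the same sequence as the master regardless of which packets were dropped or reordered. The single `Master` is the central point that the scalability question in the text refers to.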