We have collected and mirrored some GB of tech reports, papers, etc. about distributed systems in general and put them up for anonymous FTP, with emphasis on load balancing, scheduling, mapping, and checkpointing.
Slightly outdated information about our research in load
balancing and process migration (P-Beam) as well as the list of
published papers can be found in our load
balancing project home page.
From: shap@cobra.cis.upenn.edu (Jonathan Shapiro)
Newsgroups: comp.os.research
Subject: Re: Checkpointing
Date: 16 Dec 1995 20:15:02 GMT
Organization: University of Pennsylvania
Message-ID: <4av9c6$mi9@darkstar.UCSC.EDU>
References: <4anocr$3dl@darkstar.ucsc.edu> <4aq11q$dmo@darkstar.UCSC.EDU>

You might also want to check out the KeyKOS home page:

http://www.cis.upenn.edu/~KeyKOS

They have an extremely light-weight global checkpoint mechanism. A paper describing it and various others on the system can be found from that home page.
From: pstephan+@RUBIX.MC.CS.CMU.EDU (Peter Stephan)
Newsgroups: comp.parallel.pvm
Subject: ANNOUNCE: Release of Dome (version 1.0)
Date: 23 May 1996 17:08:58 GMT
Organization: Carnegie Mellon University
Message-ID: <4o263a$foj@cantaloupe.srv.cs.cmu.edu>

-------------------------------------------------------------------
Announcing the release of Dome version 1.0
(Distributed object migration environment)
-------------------------------------------------------------------

Overview
--------

Dome, the Distributed object migration environment, provides a C++ library of distributed objects for parallel programming. These objects perform dynamic load balancing and support fault tolerance. Programmers using Dome can, with modest effort, write parallel programs that are automatically distributed over a heterogeneous network, dynamically load balanced as the program runs, and able to survive compute node and network failures. Thus, Dome provides a means for writing simple and efficient distributed programs.

The focus of the Dome system is to support parallel programming over networks of workstations. Dome's load balancing and fault tolerance play an integral role in producing efficient and survivable parallel programs in such an environment. Dome uses a single program multiple data (SPMD) model to perform the parallelization of programs which use the Dome library, and Dome uses PVM to provide its underlying process control and message passing.

The Dome system is available in a package via anonymous ftp. The package includes the Dome source code, makefiles, related build scripts, documentation, and example programs.

To obtain the Dome package, log in via anonymous ftp to ftp.cs.cmu.edu. The directory project/dome will contain the file dome1.0.tar.Z and a README file. The dome1.0.tar.Z file contains the Dome system in compressed, tar format.

More information on the Dome project is available at

http://www.cs.cmu.edu/~Dome

The authors of Dome can be contacted at dome-help@cs.cmu.edu.

-------------------------------------------------------------------
* Dome version 1.0: Distributed object migration environment      *
* Carnegie Mellon University                                      *
* Authors: J. Arabe, A. Beguelin, B. Lowekamp, E. Seligman,       *
*          S. Simon, M. Starkey, P. Stephan, and K. Walker        *
* (C) 1996 All Rights Reserved                                    *
-------------------------------------------------------------------
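The announcement above mentions dynamic load balancing over a heterogeneous network. As a rough illustration of the general idea only (this is not Dome's API or its actual algorithm, and the function name `rebalance` is hypothetical), the following Python sketch reassigns work items to nodes in proportion to the processing rate each node showed in the previous phase:

```python
from fractions import Fraction

def rebalance(counts, times):
    """Return new per-node item counts proportional to measured speed.

    counts[k] is the number of items node k processed in the last phase,
    times[k] is how long it took (seconds, must be > 0). The total number
    of items is preserved. Illustrative sketch, not Dome's algorithm.
    """
    total = sum(counts)
    # Items per second for each node, kept exact to avoid rounding surprises.
    rates = [Fraction(c) / Fraction(t) for c, t in zip(counts, times)]
    rate_sum = sum(rates)
    if rate_sum == 0:
        return list(counts)  # nothing was measured; keep the old split
    shares = [total * r / rate_sum for r in rates]
    new_counts = [s.numerator // s.denominator for s in shares]  # floor
    # Rounding down may leave a few items unassigned; give them to the
    # nodes with the largest fractional remainders.
    leftover = total - sum(new_counts)
    by_remainder = sorted(range(len(shares)),
                          key=lambda k: shares[k] - new_counts[k],
                          reverse=True)
    for k in by_remainder[:leftover]:
        new_counts[k] += 1
    return new_counts

if __name__ == "__main__":
    # Two nodes each held 100 items; the second took four times as long.
    print(rebalance([100, 100], [1.0, 4.0]))
```

A real system would also have to weigh the cost of moving data between nodes against the expected gain, which this sketch ignores.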
One of the papers is:
James S. Plank and Micah Beck and Gerry Kingsley and Kai Li:
Libckpt: Transparent Checkpointing under Unix.
In Usenix Conference Proceedings, New Orleans, January 1995.
plank.html
"Checkpointing is a simple technique for rollback recovery: the state
of an executing program is periodically saved to a disk file from
which it can be recovered after a failure. While recent research has
developed a collection of powerful techniques for minimizing the
overhead of writing checkpoint files, checkpointing remains
unavailable to most application developers. In this paper we describe
libckpt, a portable checkpointing tool for Unix that implements all
applicable performance optimizations which are reported in the
literature. While libckpt can be used in a mode which is almost
totally transparent to the programmer, it also supports the
incorporation of user directives into the creation of checkpoints.
This ``user-directed'' checkpointing is an innovation which is unique
to our work."
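The basic idea in the abstract above (periodically save program state to a disk file, then recover from it after a failure) can be sketched at the application level in Python. This is only an illustration under stated assumptions: it is not libckpt, which checkpoints Unix processes transparently, and the names `save_checkpoint`, `load_checkpoint`, `run`, and the file name `sum.ckpt` are hypothetical.

```python
import os
import pickle
import tempfile

CHECKPOINT_FILE = "sum.ckpt"  # hypothetical file name

def save_checkpoint(path, state):
    """Write state to a temp file, then atomically rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # a crash never leaves a half-written checkpoint

def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return None

def run(n, path=CHECKPOINT_FILE, interval=1000, fail_at=None):
    """Sum 0..n-1, saving a checkpoint every `interval` iterations.

    If a checkpoint exists, the computation resumes from it. `fail_at`
    simulates a crash at a given iteration (for demonstration only).
    """
    state = load_checkpoint(path) or {"i": 0, "total": 0}
    while state["i"] < n:
        if fail_at is not None and state["i"] == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["i"]
        state["i"] += 1
        if state["i"] % interval == 0:
            save_checkpoint(path, state)
    return state["total"]
```

After a failure, calling `run` again with the same path loses only the work done since the last checkpoint. The checkpoint interval trades overhead during normal execution against the amount of work repeated on recovery.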
Resource Management
Also available as Technical Report TR94-1468, Department of Computer Science, Cornell University, USA, November 1994.
Newsgroups: comp.parallel
From: itf@mcs.anl.gov (Ian Foster)
Subject: Mirror Sites for DESIGNING & BUILDING PARALLEL PROGRAMS
Message-ID: <81687650327264@dalek.mcs.anl.gov>
Organization: Math and Computer Science, Argonne National Laboratory
Date: Mon, 20 Nov 1995 14:08:23 GMT

Many of you have seen the text, "Designing and Building Parallel Programs", available both from Addison-Wesley and (thanks to A-W's enlightened publishing policies) on the Web.

I'm glad to announce that the online version is now also available at two mirror sites:

http://www.cs.rdg.ac.uk/dbpp/
http://www.qpsf.edu.au/mirrors/dbpp/

Thanks a lot to Jonathan Chin and Paul Pritchard for making these available.

Additional mirror sites will probably be added in the future; these will be listed at:

http://www.mcs.anl.gov/dbpp/mirror_sites.html

Happy reading!

Ian Foster.

Designing and Building Parallel Programs