The goal of this paper is to address the selection of an
efficient checkpoint interval that reduces the total overhead
cost of checkpointing and restarting applications in
a distributed system environment.
A coordinated checkpointing and rollback recovery protocol is used
to make application programs fault tolerant on a stand-alone
system under no-load conditions, using BLCR and Open MPI at the
system level.
We present an experimental study in which we determine the
optimum checkpoint interval and use it to compare the
performance of the coordinated checkpointing protocol under two
types of checkpoint interval, namely fixed and incremental.
We measure the checkpoint cost, the rollback cost, and the total
overhead cost incurred by each of the two checkpoint interval
methods.
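The three measured quantities can be illustrated with a minimal sketch. It is not the measurement procedure used in the study. It assumes that an "incremental" interval grows by a constant step after every checkpoint, that every checkpoint has the same cost, and that the rollback cost is the work lost since the last completed checkpoint (restart time is ignored). The names `checkpoint_times`, `overheads`, and all numbers are hypothetical.

```python
def checkpoint_times(first_interval, increment, horizon):
    """Return the instants at which checkpoints are taken up to `horizon`.

    increment == 0 gives a fixed interval; increment > 0 is an assumed
    model of an incremental interval that grows after each checkpoint.
    """
    times = []
    interval = first_interval
    t = first_interval
    while t <= horizon:
        times.append(t)
        interval += increment  # assumption: constant growth per checkpoint
        t += interval
    return times


def overheads(ckpt_times, failure_time, ckpt_cost):
    """Return (checkpoint cost, rollback cost, total cost) for one failure.

    Assumption: rollback cost is the work done since the last checkpoint
    completed before the failure; restart time is not modelled.
    """
    taken = [t for t in ckpt_times if t <= failure_time]
    checkpoint_cost = len(taken) * ckpt_cost
    last_checkpoint = taken[-1] if taken else 0
    rollback_cost = failure_time - last_checkpoint
    return checkpoint_cost, rollback_cost, checkpoint_cost + rollback_cost


# Illustrative values only (minutes); not data from the experiments.
fixed = checkpoint_times(first_interval=10, increment=0, horizon=60)
incremental = checkpoint_times(first_interval=10, increment=5, horizon=60)
fixed_costs = overheads(fixed, failure_time=44, ckpt_cost=1.0)
incremental_costs = overheads(incremental, failure_time=44, ckpt_cost=1.0)
```

The sketch only shows how the three cost terms relate for a single failure; the comparison reported here rests on measured runs, not on this simplified accounting.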
Failures are simulated as a Poisson process with a mean rate of
one failure per hour, so that the inter-arrival times between
failures follow an exponential distribution.
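A failure model of this kind can be sketched as follows. This is a minimal illustration, not the failure injector used in the experiments: the function name `sample_failure_times`, the simulated horizon, and the seed are assumptions made for the example. Only the rate of one failure per hour comes from the text.

```python
import random


def sample_failure_times(rate_per_hour, horizon_hours, seed=None):
    """Draw failure instants of a Poisson process on [0, horizon_hours).

    Inter-arrival times are exponential with mean 1 / rate_per_hour,
    so the number of failures per hour is Poisson distributed.
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate_per_hour)
    while t < horizon_hours:
        times.append(t)
        t += rng.expovariate(rate_per_hour)
    return times


# One failure per hour on average, over a hypothetical 1000-hour horizon.
failures = sample_failure_times(rate_per_hour=1.0, horizon_hours=1000.0, seed=42)
gaps = [b - a for a, b in zip(failures, failures[1:])]
mean_gap_hours = sum(gaps) / len(gaps)
```

Over a long horizon the mean gap should come out close to one hour, which is a quick sanity check on the chosen rate.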
The results show that the rollback overhead and the total
overhead cost of checkpointing the application are much higher
with the incremental checkpoint interval method than with the
fixed checkpoint interval method.
Hence, we conclude that the fixed checkpoint interval method
is more efficient, as it considerably reduces both the rollback
overhead and the total overhead cost.