Abstract:
In this paper we introduce a new algorithm for American Monte Carlo that can be used either for American-style options, callable structured products or for computing counterparty credit risk (e.g. CVA or PFE computation). Leveraging least squares regressions, the main novel feature of our algorithm is that it can be fully parallelized. Moreover, there is no need to store the paths and the payoff computation can be done forwards: this allows to price structured products with complex path and exercise dependencies. The key idea of our algorithm is to split the set of paths in several subsets which are used iteratively. We give the convergence rate of the algorithm. We illustrate our method on an American put option and compare the results with the Longstaff-Schwartz algorithm.

Abstract:
An efficient approach of measuring the absolute free energy in parallel tempering Monte Carlo using the exponential averaging method is discussed and the results are compared with those of population annealing Monte Carlo using the three-dimensional Edwards-Anderson Ising spin glass model as benchmark tests. Numerical results show that parallel tempering, even though uses a much less number of temperatures than population annealing, can nevertheless equally efficiently measure the absolute free energy by simulating each temperature for longer times.

Abstract:
Markov chain Monte Carlo is an inherently serial algorithm. Although likelihood calculations for individual steps can sometimes be parallelized, the serial evolution of the process is widely viewed as incompatible with parallelization, offering no speedup for samplers which require large numbers of iterations to converge to equilibrium. We provide a methodology for parallelizing Markov chain Monte Carlo across large numbers of independent, asynchronous processors. Our approach uses a partitioning and weight estimation scheme to combine independent simulations run on separate processors into rigorous Monte Carlo estimates. The method is originally motivated by sampling multimodal target distributions, where we see an exponential speedup in running time. However we show that the approach is general-purpose and applicable to all Markov chain Monte Carlo simulations, and demonstrate speedups proportional to the number of available processors on slowly mixing chains with unimodal target distributions. The approach is simple and easy to implement, and suggests additional directions for further research.

Abstract:
The standard kinetic Monte Carlo algorithm is an extremely efficient method to carry out serial simulations of dynamical processes such as thin-film growth. However, in some cases it is necessary to study systems over extended time and length scales, and therefore a parallel algorithm is desired. Here we describe an efficient, semi-rigorous synchronous sublattice algorithm for parallel kinetic Monte Carlo simulations. The accuracy and parallel efficiency are studied as a function of diffusion rate, processor size, and number of processors for a variety of simple models of epitaxial growth. The effects of fluctuations on the parallel efficiency are also studied. Since only local communications are required, linear scaling behavior is observed, e.g. the parallel efficiency is independent of the number of processors for fixed processor size.

Abstract:
Sequential Monte Carlo is a family of algorithms for sampling from a sequence of distributions. Some of these algorithms, such as particle filters, are widely used in the physics and signal processing researches. More recent developments have established their application in more general inference problems such as Bayesian modeling. These algorithms have attracted considerable attentions in recent years as they admit natural and scalable parallelizations. However, these algorithms are perceived to be difficult to implement. In addition, parallel programming is often unfamiliar to many researchers though conceptually appealing, especially for sequential Monte Carlo related fields. A C++ template library is presented for the purpose of implementing general sequential Monte Carlo algorithms on parallel hardware. Two examples are presented: a simple particle filter and a classic Bayesian modeling problem.

Abstract:
We develop a parallel rejection algorithm to tackle the problem of low acceptance in Monte Carlo methods, and apply it to the simulation of the hopping conduction in Coulomb glasses using Graphics Processing Units, for which we also parallelize the update of local energies. In two dimensions, our parallel code achieves speedups of up to two orders of magnitude in computing time over an equivalent serial code. We find numerical evidence of a scaling relation for the relaxation of the conductivity at different temperatures.

Abstract:
We investigate the applicability of the synchronous relaxation (SR) algorithm to parallel kinetic Monte Carlo simulations of simple models of thin-film growth. A variety of techniques for optimizing the parallel efficiency are also presented. We find that the parallel efficiency is determined by three main factors $-$ the calculation overhead due to relaxation iterations to correct boundary events in neighboring processors, the (extreme) fluctuations in the number of events per cycle in each processor, and the overhead due to interprocessor communications. Due to the existence of fluctuations and the requirement of global synchronization, the SR algorithm does not scale, i.e. the parallel efficiency decreases logarithmically as the number of processors increases. The dependence of the parallel efficiency on simulation parameters such as the processor size, domain decomposition geometry, and the ratio $D/F$ of the monomer hopping rate $D$ to the deposition rate $F$ is also discussed.

Abstract:
We introduce an algorithm to systematically improve the efficiency of parallel tempering Monte Carlo simulations by optimizing the simulated temperature set. Our approach is closely related to a recently introduced adaptive algorithm that optimizes the simulated statistical ensemble in generalized broad-histogram Monte Carlo simulations. Conventionally, a temperature set is chosen in such a way that the acceptance rates for replica swaps between adjacent temperatures are independent of the temperature and large enough to ensure frequent swaps. In this paper, we show that by choosing the temperatures with a modified version of the optimized ensemble feedback method we can minimize the round-trip times between the lowest and highest temperatures which effectively increases the efficiency of the parallel tempering algorithm. In particular, the density of temperatures in the optimized temperature set increases at the "bottlenecks'' of the simulation, such as phase transitions. In turn, the acceptance rates are now temperature dependent in the optimized temperature ensemble. We illustrate the feedback-optimized parallel tempering algorithm by studying the two-dimensional Ising ferromagnet and the two-dimensional fully-frustrated Ising model, and briefly discuss possible feedback schemes for systems that require configurational averages, such as spin glasses.

Abstract:
In many situations it is important to be able to propose $N$ independent realizations of a given distribution law. We propose a strategy for making $N$ parallel Monte Carlo Markov Chains (MCMC) interact in order to get an approximation of an independent $N$-sample of a given target law. In this method each individual chain proposes candidates for all other chains. We prove that the set of interacting chains is itself a MCMC method for the product of $N$ target measures. Compared to independent parallel chains this method is more time consuming, but we show through concrete examples that it possesses many advantages: it can speed up convergence toward the target law as well as handle the multi-modal case.