In this chapter we reviewed direct, linear, and iterative linear inverse methods in geophysics. Most direct methods are not robust when the data are incomplete and noisy, and are consequently not very useful; model-based inversion methods have therefore gained much popularity. Based on the nature of the forward problem, we followed the commonly accepted definition [e.g., Tarantola, 1987] to classify inverse problems into linear, weakly nonlinear, quasi-linear, and highly nonlinear problems. These definitions may at times be somewhat misleading, in that it is very difficult to say under what conditions a problem can be classified as weakly nonlinear or quasi-linear. In practice, most forward modeling operators are nonlinear, and the success of an inverse method is usually judged by how well it explains the observed data given their uncertainty. Thus a probabilistic theory, as developed by Tarantola or based on Bayes' rule, seems adequate to describe the inverse problem. Obtaining its solution, however, is the most difficult aspect. If a problem can be linearized, i.e., if under some restrictive set of conditions a forward modeling operator can be reduced to a linear operator, the solution becomes fairly straightforward. Otherwise, one can at least attempt gradient-based methods. These methods suffer from the following limitations:

• Gradient-based methods often find a minimum in the close neighborhood of the starting solution. When the error function has several secondary minima, one must either start very close to the global minimum or try several starting points, hoping that one of them leads to the best solution.

• Most of these methods require that a gradient, and sometimes a curvature (second-derivative) matrix, be computed at each point during the iteration process. Although analytical expressions for the gradient and curvature can be derived for some geophysical applications, this is not possible in general. For example, for wave propagation problems in laterally varying media, the forward calculation is done most accurately with methods such as finite differences; in such cases the gradient can only be computed numerically, a computationally formidable task.

• Almost all of these methods require computing the inverse of a large matrix. Fortunately, in many situations well-known methods such as Cholesky decomposition can be used for this purpose, but inverting a large matrix even with such techniques is not necessarily an easy computational task.

In a statistical framework, the solution of an inverse problem is described by the PPD. If the forward modeling is nonlinear, the PPD is clearly not Gaussian and the analysis of the solution is not straightforward. Even for a highly nonlinear problem, if one can start at a point very close to the peak of the PPD, the MAP point can be reached by gradient methods. This approach, however, still only approximates the PPD by a Gaussian around the MAP point, even though the true PPD may be highly complicated in shape. The only way to estimate the PPD fully is to evaluate the error function at every point in model space, which is in general an impossible task. Practical methods must therefore be sought that sample points near the peak of the PPD, so that the most significant part of the PPD can be constructed approximately by direct sampling rather than by fitting a presumed shape around the MAP point.
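The dependence of gradient-based methods on the starting solution can be illustrated with a small numerical sketch. The 1-D error function below is a hypothetical example, not one from the text; it has a global minimum near m = -1.30 and a secondary minimum near m = 1.13, and plain gradient descent converges to whichever minimum lies near its starting point.

```python
def energy(m):
    # Hypothetical 1-D error function with two minima:
    # global near m = -1.30, secondary near m = 1.13.
    return m**4 - 3.0 * m**2 + m

def gradient(m):
    # Analytic derivative of energy(m).
    return 4.0 * m**3 - 6.0 * m + 1.0

def gradient_descent(m0, step=0.01, iters=2000):
    # Fixed-step steepest descent from starting model m0.
    m = m0
    for _ in range(iters):
        m -= step * gradient(m)
    return m

# Starting on different sides of the barrier yields different minima.
m_left = gradient_descent(-2.0)   # converges near the global minimum
m_right = gradient_descent(2.0)   # trapped in the secondary minimum
```

Both runs terminate at a stationary point of the error function, but only the first finds the global minimum; this is exactly why multiple starting points (or a global method) are needed for multimodal error functions.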

Since the work of Kirkpatrick et al. [1983], the optimization method now known as SA has been applied to many problems in science and engineering. Following the work of Rothman [1985, 1986], the method gained popularity in the geophysical community and has been applied to a wide variety of geophysical problems. Many geophysical inverse problems are nonlinear, and with the availability of high-speed computers, SA is now being applied to them. In this chapter, starting with classical SA, we reviewed several variants of SA aimed at making the algorithm computationally efficient. Most of these algorithms are statistically guaranteed to attain an equilibrium distribution and possibly reach the global minimum of an error or energy function, and are thus suitable for MAP estimation. They can also be used as importance sampling algorithms to estimate the uncertainties and the marginal posterior probability density function for each parameter in model space. In practice, however, these algorithms cannot be expected to perform flawlessly, since care must be taken in parameter selection, which is problem dependent. For example, it is important to choose the starting temperature and cooling schedule properly. One obvious omission in this chapter is a detailed discussion of cooling schedules. There has been some theoretical work on choosing a cooling schedule, but theoretical cooling schedules are so slow that the computational budget is quickly exceeded. For this reason, we have found VFSA very useful in many applications. SA algorithms that use fast cooling schedules are often called simulated quenching (SQ) [Ingber, 1993]. One variant of SQ is to find the critical temperature and either run the algorithm at that constant temperature [Basu and Frazer, 1990] or anneal slowly, starting slightly above the critical temperature [Sen and Stoffa, 1991]. However, a substantial amount of computer time can be spent finding the critical temperature; the MFA algorithm appears promising in that an approximate value of the critical temperature can be found analytically. SA at a constant temperature of one avoids the issues of selecting a suitable starting temperature and a cooling schedule, and enables us to use SA as an importance sampling technique for the efficient evaluation of multidimensional integrals; these concepts will be illustrated in Chapter 7. In MAP estimation, however, the choice of a proper cooling schedule is critical to the design of an efficient algorithm. In practice, an SQ cooling schedule faster than the theoretical one can be determined by trial and error.
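As a concrete illustration, the sketch below runs Metropolis-rule SA with a VFSA-style cooling schedule of the form T_k = T0 exp(-c k^(1/D)). The starting temperature T0, decay constant c, move size, and 1-D test function are all hypothetical choices for illustration, not values from the text.

```python
import math
import random

def energy(m):
    # Hypothetical 1-D error function with a global minimum near m = -1.30
    # and a secondary minimum near m = 1.13.
    return m**4 - 3.0 * m**2 + m

def vfsa_temperature(T0, c, k, dim=1):
    # VFSA-style schedule T_k = T0 * exp(-c * k**(1/dim)); the small floor
    # simply avoids numerical underflow at late iterations.
    return max(T0 * math.exp(-c * k ** (1.0 / dim)), 1e-6)

def anneal(T0=10.0, c=1.0, iters=3000, seed=1):
    rng = random.Random(seed)
    m = rng.uniform(-2.0, 2.0)            # random starting model
    e = energy(m)
    best_m, best_e = m, e
    for k in range(1, iters + 1):
        T = vfsa_temperature(T0, c, k)
        trial = max(-2.0, min(2.0, m + rng.gauss(0.0, 0.5)))  # bounded move
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill moves, and uphill
        # moves with probability exp(-dE/T).
        if e_trial < e or rng.random() < math.exp(-(e_trial - e) / T):
            m, e = trial, e_trial
            if e < best_e:
                best_m, best_e = m, e
    return best_m, best_e

best_m, best_e = anneal()
```

With this fast schedule the temperature collapses after only a few iterations, so the run behaves as simulated quenching: the Metropolis rule quickly reduces to accepting only downhill moves, which is efficient but can leave the chain in a secondary minimum.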

GA is a novel and interesting approach to optimization problems. Unlike SA, the commonly used GA still lacks a rigorous mathematical basis. Its appeal lies in its analogy with natural biological systems and in the fact that even a simple GA can find many good solutions quickly. We have shown how some aspects of GA, e.g., stretching the objective function, can be replaced by incorporating the concept of temperature into the probabilities used in the selection process. Also, the genetic concept of heredity can be replaced with the Metropolis criterion to decide whether each new model should replace its predecessor. These modifications to a classical GA allow the algorithm to be viewed either as an analog to biological reproduction or as an approximation to parallel Metropolis SA. In either case, the principal distinction is the crossover operator, which allows the models in the distribution to share information. This is an advantage, since information common to good models propagates to the next generation more efficiently, and a disadvantage, since premature convergence can occur. Each model of a given generation can be evaluated independently, and each run can be made independently; this makes GAs ideally suited for implementation on parallel computer architectures.
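A minimal sketch of such a modified GA follows; the bit coding, population size, temperature, mutation rate, and toy error function are all hypothetical choices for illustration. It combines temperature-dependent (Boltzmann-weighted) selection, single-point crossover so that good models share information, bit-flip mutation, and a Metropolis test that decides whether each offspring replaces its parent.

```python
import math
import random

NBITS = 16          # bits per model parameter (hypothetical coding)
LO, HI = -2.0, 2.0  # model bounds

def energy(m):
    # Toy error function with two minima (global near m = -1.30).
    return m**4 - 3.0 * m**2 + m

def decode(bits):
    # Map a bit string to a model value in [LO, HI].
    x = int("".join(map(str, bits)), 2) / (2**NBITS - 1)
    return LO + x * (HI - LO)

def boltzmann_select(pop, energies, T, rng):
    # Temperature-dependent selection: p_i proportional to exp(-E_i / T).
    w = [math.exp(-(e - min(energies)) / T) for e in energies]
    return pop[rng.choices(range(len(pop)), weights=w)[0]]

def crossover(a, b, rng):
    # Single-point crossover lets good models share information.
    cut = rng.randrange(1, NBITS)
    return a[:cut] + b[cut:]

def mutate(bits, rng, p=0.01):
    return [b ^ 1 if rng.random() < p else b for b in bits]

def run_ga(npop=40, ngen=60, T=0.5, seed=7):
    rng = random.Random(seed)
    pop = [[rng.randrange(2) for _ in range(NBITS)] for _ in range(npop)]
    for _ in range(ngen):
        energies = [energy(decode(ind)) for ind in pop]
        new_pop = []
        for parent, e_par in zip(pop, energies):
            mate = boltzmann_select(pop, energies, T, rng)
            child = mutate(crossover(parent, mate, rng), rng)
            e_child = energy(decode(child))
            # Metropolis criterion replaces strict heredity: the child
            # replaces its parent if better, or with probability
            # exp(-dE/T) otherwise.
            if e_child < e_par or rng.random() < math.exp(-(e_child - e_par) / T):
                new_pop.append(child)
            else:
                new_pop.append(parent)
        pop = new_pop
    best = min(pop, key=lambda ind: energy(decode(ind)))
    return decode(best)

best_m = run_ga()
```

Each parent-child comparison in the inner loop is independent of the others, which is what makes this formulation naturally parallel.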

It is now well recognized that the results of geophysical inversion may have uncertainties for several reasons. Bayes' rule offers an elegant description of the solution of the geophysical inverse problem in terms of the posterior probability density function in the model space. Because the model space is highly multidimensional, a complete display of the PPD is not possible; we therefore seek quantities such as the posterior marginal density functions, the mean, and the covariance to describe the solution of our inverse problems. Deriving these quantities is not a trivial task, because it involves numerical evaluation of integrals over a large number of variables. An additional difficulty is that for most geophysical inverse problems the data have a nonlinear relationship with the model parameters, resulting in error functions that may have multiple extrema. The PPD may therefore be highly complicated, and methods that require prior assumptions about its shape may not be adequate. Such methods attempt to locate the maximum of the PPD; the Hessian computed at the MAP point is then used to estimate the covariance, so that a Gaussian is completely described. We described several computationally intensive sampling-based approaches to deriving quantities such as the posterior marginal PPD, mean, and covariance. Six different methods were tested and compared by application to synthetic resistivity sounding data for which the number of model parameters was only five, so that the forward modeling was computationally tractable. Methods such as GS and PGS agree well with grid-search integration. The results from the multiple MAP algorithms are quite encouraging: even though the estimated variances were lower than those estimated by GS, PGS, and grid search, the correlations were in very good agreement.
Of course, for very complicated PPDs the multiple MAP algorithms may not perform equally well, but for many applications they will help in obtaining approximate estimates of these quantities quite rapidly. The SA method is guaranteed to converge asymptotically to a Gibbs distribution, and we showed numerically that a steady-state distribution can be achieved in a finite number of iterations. Thus a GS requires much less computation time than a complete grid search in the evaluation of multidimensional integrals. Running SA at a constant temperature of 1.0 appears to be feasible for many practical problems. Although our PGS is not based on as rigorous a mathematical model as SA, we showed numerically that the results from GS and PGS are nearly identical. The PGS, however, offers the advantage of running on parallel computer architectures. The use of stochastic inversion methods based on statistical models has been criticized primarily because of the requirement that the prior distributions of noise in the data and of theory error be known. Often the prior noise pdf is assumed Gaussian. In practice, it is never possible to estimate the true noise distribution from a small, finite number of repeated measurements, since the classical definition of probability requires that an experiment be repeated an infinite number of times under identical conditions. The use and specification of the prior pdf have long been a subject of dispute, and the statistical approach has often been criticized by classical inverse theorists. There exist several definitions of probability: besides the classical definition, which is based on the relative frequency of occurrence, the Bayesian interpretation defines probability as a degree of belief [Bayes, 1763]. The only requirement is that the axioms of probability theory not be violated. In the subjective Bayesian interpretation, this degree of belief is a personal degree of belief; thus the prior distribution may simply reflect an expert's opinion. We therefore recommend that a simple and practical approach be taken to characterize the prior distribution of noise in the data: for any practical problem, we need to analyze the data carefully and explore the sources of error in order to arrive at some way of characterizing the uncertainty.
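The sampling-based characterization of the PPD discussed above can be sketched as follows. A Metropolis sampler run at a constant temperature of 1.0 draws models from a PPD proportional to exp(-E(m)), and the posterior mean and variance are then estimated directly from the samples rather than by fitting a Gaussian at the MAP point. The 1-D quadratic error function here is a toy choice so that the exact answers are known (a Gaussian PPD with mean 0.5 and variance 0.25); it stands in for a real, expensive forward model.

```python
import math
import random

def error(m):
    # Toy quadratic error so that the PPD exp(-error(m)) is exactly
    # Gaussian with mean 0.5 and variance 0.25.
    return (m - 0.5) ** 2 / (2.0 * 0.25)

def metropolis_samples(nsteps=50000, burn=5000, seed=3):
    rng = random.Random(seed)
    m = 0.0
    e = error(m)
    samples = []
    for k in range(nsteps):
        trial = m + rng.uniform(-1.0, 1.0)
        e_trial = error(trial)
        # Metropolis rule at constant temperature T = 1.
        if e_trial < e or rng.random() < math.exp(-(e_trial - e)):
            m, e = trial, e_trial
        if k >= burn:
            samples.append(m)   # keep every post-burn-in state
    return samples

samples = metropolis_samples()
post_mean = sum(samples) / len(samples)
post_var = sum((s - post_mean) ** 2 for s in samples) / len(samples)
```

For this toy problem the sample mean and variance closely recover the known posterior values; in a realistic problem the same samples would also yield marginal densities by histogramming each parameter.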