


By Amit Kumar, August 2002

I certify that I have read this report and that in my opinion it is fully adequate, in scope and in quality, as partial fulfillment of the degree of Master of Science in Petroleum Engineering.

_____________________________ Dr. Khalid Aziz (Principal Advisor)


Acknowledgments

I wish to express my sincere thanks to Dr. Khalid Aziz for his guidance, patience, and encouragement. His insights were invaluable during the research. I would like to express my gratitude to SUPRI-B for funding this research. I also wish to thank BHP Petroleum for providing the field data for the Skua field. I am grateful to Peter Behrenbruch for his encouragement. I also received useful pointers and suggestions from Dr. Christopher D. White (LSU) and Dr. Adwaita Chawathe (ChevronTexaco), for which I am thankful. I must also mention the help I received from my friends and colleagues at Stanford; I would especially like to thank Rakesh Kumar, Ashish Dabral, and Sunderrajan Krishnan.





List of Figures

Figure 1. Pareto Chart of the Standardized Effects
Figure 2. Main Effects for the Full Factorial Case
Figure 3. Main Effects for Box-Behnken Design
Figure 4. Main Effects for D-optimal Design with 49 Runs
Figure 5. Main Effects for D-optimal Design with 100 Runs
Figure 6. Interaction Effects for Full Factorial Design
Figure 7. Contour Plot of Response Surface for Horizontal Permeability and Relative Gas Cap Size
Figure 8. Contour Plot of Response Surface for Horizontal Permeability and Gas Rate
Figure 9. Contour Plot of Response Surface for Horizontal Permeability and Relative Water Zone Size
Figure 10. Contour Plot of Response Surface for Horizontal Permeability and Horizontal Location of Well
Figure 11. Contour Plot of Response Surface for Gas Rate and Relative Water Zone Size
Figure 12. Contour Plot of Response Surface for Gas Rate and Horizontal Location of Well
Figure 13. Contour Plot of Response Surface for Relative Gas Cap Size and Horizontal Location of Well
Figure 14. Normal Probability Plot of the Residuals (Full Factorial Design)
Figure 15. Standardized Residuals versus the Fitted Values (Full Factorial Design)
Figure 16. Standardized Residuals versus Observation Order (Full Factorial Design)
Figure 17. Analysis of a Half-Fraction of the Full 2^5 Design; a 2^(5-1) Fractional Factorial Design

List of Tables

Table 1. Reservoir Properties, Dimensions and Initial Conditions
Table 2. Parameters Varied in Simulation Study
Table 3. Factors and Their Levels
Table 4. Comparison of Regressions
Table 5. Coefficients and P-Values for Regression Terms


Abstract

Numerical reservoir simulation is often too expensive to allow exhaustive investigation of sensitivities to multiple parameters. Experimental design and the associated technique of response surface analysis provide tools to reduce the number of simulations required. Moreover, the methodology provides a statistical framework to estimate the impact of various parameters and their interactions. It yields not only accurate estimates of the response but also a quantification of errors. In the present work, second-order polynomial response surface models were used to approximate the relationship between recovery and six reservoir and production parameters in the case of gas cap blowdown, with the objective of identifying the key parameters that govern recovery from gas cap blowdown and quantifying their impact. A high gas rate in the presence of high horizontal permeability, a small gas cap, large water support, and a centrally located well were found to be the conditions for high recovery. The effects of many factors were found to depend on other factors; the model also captured these interaction effects. Different experimental designs were employed and compared with the exhaustive case. It was found that a D-optimal design with the same number of runs as a Box-Behnken design provided a better fit to the observed reservoir behavior, and there was only a slight increase in the goodness of fit when a D-optimal design with double the number of runs was used. Comparisons with the exhaustive case confirmed that all the designs used in this study give consistent results about statistically significant effects. Uncertainty analysis of reservoir performance can be carried out using the regressions developed, and the response can be optimized using the regression model as an inexpensive proxy for the flow simulator.


Chapter 1

Introduction

Experimental design and the associated technique of response surface modeling can be a powerful tool for making effective use of reservoir simulation in sensitivity and optimization studies. In order to optimize data acquisition and field development, reservoir modeling teams require knowledge of sensitivities to key parameters and of uncertainty in the outcome. This is generally addressed by running a large number of simulations spanning the range of uncertainty in geologic, engineering, and even financial parameters. Even though the values of model parameters remain uncertain, for a given set of parameters the outcome of reservoir simulation is deterministic and reasonably accurate. The complexity of reservoir simulation software and the long running time of complex simulation models limit the number of simulations actually performed. Experimental design is used to select a moderate number of simulation runs and analyze them to estimate the sensitivity of reservoir behavior to various factors. Moreover, accurate polynomial models called response surfaces can be fitted to the simulator response and used as a computationally efficient proxy for the reservoir simulator in uncertainty analysis, parameter estimation, and optimization.

1.1 Historical Development of Experimental Design

Experimental design has been widely used in agriculture, process control, quality control, and optimization, primarily to select a set of experiments and analyze them to ascertain the effects of variables, the interactions of variables, and estimates of error. Unfortunately, this technique has not been as widely used in the petroleum industry as it might have been. A few applications in the petroleum area are cited: Chu (1990), to optimize the choice of the completion interval in steamflooding; Damsleth et al. (1991), to a North Sea field development study; Narayanan et al. (1999), to generate pseudofunctions;

Dejean and Blanc (1999), to build a simplified model for a process and estimate the uncertainties in the response predictions by combining experimental design with Monte Carlo simulations; White et al. (2000), for parameter estimation and uncertainty analysis; Wang (2001), for simulation of turbidite reservoirs using outcrop data; Friedmann et al. (2001), to assess uncertainty in recovery predictions in channelized reservoirs for both primary and waterflood processes; and Manceau et al. (2001), to quantify the impact of important reservoir uncertainties on cumulative oil production and optimize future field development.

Chu (1990) used two-level factorial design (see Section 2.2.1) to predict steamflood performance. He studied the effects of various rock and fluid properties as well as design and operating variables. However, Chu did not include uncertainty assessment of the response. Damsleth et al. (1991) performed a simulation study based on models constructed using an optimal design. The resulting response surface was used to identify and assess important factors affecting the recovery as well as the interactions among factors of interest. Uncertainty assessment was done using Monte Carlo simulations on the response surface obtained. Narayanan et al. (1999) applied the response surface approach to waterflooding studies for models based on an outcrop. Response surface methodology was used to generate pseudofunctions for different reservoir descriptions without expensive fine-grid simulations. As the response surface technique was found to be simpler and cheaper than fine-grid simulation, it allowed easy generation of pseudofunctions for different scenarios. The study also quantified the effects of different parameters of the reservoir model on the recovery responses and upscaled properties. Dejean and Blanc (1999) used the same set of tools for a synthetic field, but they first screened a large number of factors using a cheaper two-level design and retained for further analysis with three-level designs only those parameters seen to have the most significant effect on the response. In a similar work, Wang (2001) used the statistical tool of principal component analysis to reduce a large number of geological parameters to a more manageable set and performed designed simulations on turbidite reservoirs using outcrop data. White et al. (2000) employed the same techniques for estimating parameters and assessing the associated uncertainty.
They also examined the effects of varying geostatistical parameters and compared geostatistical and quasi-deterministic models of geologic variability. They used the response surface models to test differences between scenarios, assess sensitivities to factors, and estimate the effects of measurements on response uncertainty. They also used Bayesian statistics to calculate maximum-likelihood estimates of factors conditional to sets of responses. Friedmann et al. (2001) performed response surface analysis on simulation runs generated using experimental design to obtain a simplified (polynomial) analog to the simulation model. Moreover, they used neural networks trained on the simulation models

generated using experimental design to construct recovery type curves. The limitation of the type curve approach is that it is suitable only for immature projects where sufficient data are not available to build reliable simulation models. Moreover, since the type curves are based on synthetic geologic models, their use is limited to cases where the reservoir is similar to the synthetic model. Manceau et al. (2001) combined experimental design methodology with gradual deformation methods and optimization techniques into a Joint Modeling Method. In this method, the production response is modeled with a mean model and a variance model. The mean model, fitted using experimental design methodology, describes the production response as a function of deterministic parameters such as petrophysical properties and production parameters. The variance model describes the dispersion of the production response due to non-continuous stochastic parameters such as geostatistical realizations. Together, these models provide a framework for quantifying the risk associated with both deterministic and stochastic uncertainties. A persistent problem with reservoir simulation studies is the large number of simulations that must be performed in order to estimate the impact of various parameters on fluid flow. Experimental design makes it possible to select, on statistical grounds, only a small subset of the total number of runs otherwise required, and the response surface technique facilitates the analysis of the effect of each parameter along with its interactions with other parameters. The general strategy employed in this work is applicable to all types of simulation studies that involve estimating the impact of a large number of parameters.

1.2 Organization of Report

The present chapter introduces the topic and outlines previous applications of experimental design in petroleum engineering. Chapter 2 explains the methodology of experimental design and response surface analysis, and Chapter 3 describes the problem of gas cap blowdown to which these techniques are applied. Chapter 4 details the procedure adopted in this application and presents the results obtained; it also contains the conclusions drawn from the present study and suggests some future work that might be done in this area. Appendix A illustrates some of the basic concepts behind the generation of economical designs through the example of half-fraction designs. Appendix B outlines the use of algorithms to generate optimal designs. Appendix C contains an Eclipse data file with all factors at their base values. Terms used in equations are defined just after they are used, but for reference they are also listed in the Nomenclature section.

Chapter 2

Experimental Design and Response Surfaces

Myers and Montgomery (1995) describe design of experiments (DOE) as a method for selecting experiments so as to maximize the information gained from each experiment and to evaluate statistically the significance of different factors. An experimental design study aiming to generate response surfaces requires identifying the various factors that cause changes in the response and expressing these variations in a mathematical form. The choice of the mathematical form (the model) and the selection of the set of experimental conditions (the design) influence each other. For example, a quadratic model requires a design with at least three levels. One particular design can be used to generate different models; at the same time, different designs may be used to estimate the parameters of a particular model. The design provides a set of input values for the factors that are used to run the experiments or simulations (numerical experiments). The results obtained from the experiments are fitted to the model using standard regression techniques. Statistical tools allow the effects of the factors to be ascertained, as well as the significance of these effects. The model fitted to the data can also be used as a cheap proxy for the actual process being studied in order to perform optimization, risk analysis, etc.

2.1 Terms and Definitions

It is convenient to use the standard terminology of experimental design in order to follow the later discussion. Hence, important terms are defined below based on Box et al. (1978).

2.1.1 Definitions

Designs are lists of different experimental conditions (or combinations of factors) at which experiments (or simulations) are performed. With two factors, a complete three-level design requires 3^2 = 9 runs. In matrix notation, each row of the design matrix indicates a run, whereas each column contains the settings of one factor. For example,
a design with two factors A and B and two runs may be given as:

Run   Factor A   Factor B
 1       +1         -1
 2        0         +1

Conventionally, in matrix notation, the same is expressed as follows:

[ +1  -1 ]
[  0  +1 ]
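As an illustration (not part of the original study), a complete design matrix of this kind can be enumerated in a few lines; the choice of two factors on a three-level grid here is arbitrary:

```python
import itertools
import numpy as np

# Enumerate a complete three-level design in two coded factors: 3^2 = 9 runs.
# Each row of the design matrix is a run; each column holds the coded
# settings (-1, 0, +1) of one factor.
levels = [-1, 0, +1]
design = np.array(list(itertools.product(levels, repeat=2)))

n_runs, n_factors = design.shape   # 9 runs, 2 factors
```

Each row would then be decoded back to natural units and passed to the simulator as one set of factor settings.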

Factors are the input variables. They may be control variables in an experiment, or stochastic variables. They are systematically varied in the simulation study or experiment to assess their effects.

Levels. The values that a factor takes constitute its levels. A two-level factor, for example, will take a high value and a low value. Sometimes the number of levels each factor takes identifies a design; e.g., in a two-level design, all factors have a high and a low value. The levels are customarily denoted by + or -, referring to the higher and lower values respectively. In a three-level design the levels are denoted by +1, 0, and -1, referring to the high, base, and low settings of the factors respectively.

Responses are the system outputs. We might consider oil in place, recovery efficiency, breakthrough time, initial rate, etc. as responses.

Main Effect. The main effect of a factor is the difference between the average response at the higher settings of the factor and the average response at its lower settings.

Interaction Effect. If the effect of a factor depends on the level of another factor, the two factors are said to interact. For example, the performance of a dual multilateral well in a reservoir with anisotropy in horizontal permeability is affected by well orientation much more severely than that of a multilateral of quad configuration (Rivera et al., 2002). Hence, orientation and configuration of multilateral wells interact in their effect on reservoir performance.

Confounding. When the factors in a design are so arranged that some effects are indistinguishable from other effects, those effects are said to be confounded. Only certain confounding patterns are considered when generating designs, in order that the effects of interest do not confound with each other. The terms to be confounded with the mean term in the model determine the entire confounding pattern by implication.
(Appendix A presents this concept with an example.)

Resolution. A design of resolution R is one in which no l-way interaction is confounded with any other interaction of order less than R - l. (By convention, resolution is expressed in Roman numerals.) A design of resolution R = III (3) does not confound main effects with one another but does confound main effects with two-factor interactions, whereas a design of resolution R = IV (4) does not confound main effects with two-factor interactions but does confound two-factor interactions with other two-factor interactions. In the same vein, a design of the higher resolution R = V (5) does not confound main effects and two-factor interactions with each other, but does confound two-factor interactions with three-factor interactions.
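The main-effect and interaction definitions above can be applied directly; as a sketch with made-up response values (not from the report), for a 2^2 factorial:

```python
import numpy as np

# Hypothetical 2^2 factorial in coded factors A and B with made-up responses.
X = np.array([[-1, -1],
              [+1, -1],
              [-1, +1],
              [+1, +1]])
y = np.array([60.0, 72.0, 54.0, 68.0])

def main_effect(j):
    """Average response at the high level of factor j minus that at its low level."""
    return y[X[:, j] == +1].mean() - y[X[:, j] == -1].mean()

effect_A = main_effect(0)    # (72 + 68)/2 - (60 + 54)/2 = 70 - 57 = 13
effect_B = main_effect(1)    # (54 + 68)/2 - (60 + 72)/2 = 61 - 66 = -5

# AB interaction: half the difference between the effect of A at high B
# and the effect of A at low B.
interaction_AB = ((68.0 - 54.0) - (72.0 - 60.0)) / 2    # = 1
```

A nonzero interaction term signals that the effect of A cannot be reported without stating the level of B.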

2.2 Types of Design

Broadly, there are two kinds of designs in use, namely:

1. Classical experimental designs
2. Optimal experimental designs

Response surfaces are the widely used tools for analyzing the results (or responses) of experiments that may have been performed using an experimental design.

2.2.1 Classical Experimental Designs

Factorial designs are the simplest, requiring L^k experiments, where L is the number of levels and k is the number of factors. Since the factors are set at either their maximum or minimum value in a two-level factorial experiment, these designs cannot go beyond estimating first-order effects and interactions. Hence, they are valid only over a limited range of values, or have low accuracy. Scaling the factors and response can help alleviate both limitations in some cases. Quadratic effects can be estimated only with a three-level or higher design. A three-level factorial, requiring 3^k experiments, may be employed; in this case, a factor is set to its maximum, center, or minimum value. As the number of factors grows, a full factorial design raises the required number of experiments to unacceptably high values. To reduce the high cost of these designs for large numbers of factors, partial (fractional) factorial designs are formulated that select only a subset of the full factorial design. The accuracy of higher-order interactions is traded for a lower number of experiments. Higher-order interactions that are considered insignificant are mixed together by a process known as confounding (i.e., confusing) so that these effects can no longer be separated. A notation such as 2^(k-p) with resolution R can describe fractional factorials, where k is the number of factors, p defines the fraction of the factorial, and R is the characteristic resolution of the design. Take, for example, a 2^(11-7) design of resolution III (three). This means that the two-level design accommodates k = 11 factors in all; however, p = 7 of those factors were generated from the interactions of a full 2^4 factorial design. As a result, the design does not give full resolution; that is, certain interaction effects are confounded with (identical to) other effects. Box-Behnken designs (Section 2.3.2) are a popular example of partial three-level designs. Generation of a half-fraction design is discussed in Appendix A.

Central composite designs are another family of classical experimental designs. The design is constructed by adding axial points to a smaller inner design that may be factorial

or fractional factorial. This approach combines the advantages of a two-level design with those of a three-level design. However, these designs also suffer from some shortcomings. Reusing points is difficult if additional points are added to a central composite design; hence, if some experiments have already been performed, their results cannot easily be added to the existing design. Also, with a large number of factors, large factor ranges are required in order to maintain desirable numerical properties.
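A central composite design can be sketched as the union of a two-level factorial core, axial (star) points, and a center point; the axial distance alpha = 1 below is an arbitrary illustrative choice:

```python
import itertools
import numpy as np

def central_composite(k, alpha=1.0):
    """Central composite design: 2^k factorial core, 2k axial points, 1 center point."""
    core = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
    axial = np.zeros((2 * k, k))
    for j in range(k):
        axial[2 * j, j] = -alpha       # star point at the low end of factor j
        axial[2 * j + 1, j] = +alpha   # star point at the high end of factor j
    center = np.zeros((1, k))
    return np.vstack([core, axial, center])

ccd = central_composite(3)   # 2^3 + 2*3 + 1 = 15 runs for 3 factors
```

The axial points give each factor a third level, which is what allows quadratic terms to be estimated on top of the two-level core.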

2.2.2 Optimal Experimental Designs

Experimental designs may also be defined with respect to an optimality condition. These optimal designs are quite flexible and allow new data to be incorporated easily; for example, with a D-optimal design it is easy to add a few more results to an already existing data set (augmentation). However, optimal designs have limitations as well. The criterion for optimality may not be clearly related to the phenomenon being modeled. Also, there are different types of optimality conditions, and it may not be entirely clear which of these abstract conditions is more suitable to the process being modeled. Moreover, the computation of optimality depends on the response model selected, so a design that is optimal for a quadratic model need not be optimal for a linear model.
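A minimal sketch of the D-optimality idea, assuming a quadratic model in two factors: from a 3^2 candidate grid, greedily pick the run that most increases det(X'X) at each step. (Production software uses more sophisticated exchange algorithms; this greedy loop only illustrates the criterion.)

```python
import itertools
import numpy as np

def model_matrix(d):
    """Quadratic model in two coded factors: constant, linear, interaction, squares."""
    x1, x2 = d[:, 0], d[:, 1]
    return np.column_stack([np.ones(len(d)), x1, x2, x1 * x2, x1**2, x2**2])

candidates = np.array(list(itertools.product([-1, 0, 1], repeat=2)), dtype=float)
chosen = []
for _ in range(6):                       # 6 runs for a 6-term model
    best_i, best_det = -1, -1.0
    for i in range(len(candidates)):
        if i in chosen:
            continue
        trial = model_matrix(candidates[chosen + [i]])
        det = np.linalg.det(trial.T @ trial)
        if det > best_det:               # keep the candidate maximizing det(X'X)
            best_i, best_det = i, det
    chosen.append(best_i)

X = model_matrix(candidates[chosen])     # det(X'X) > 0: quadratic model estimable
```

Maximizing det(X'X) shrinks the volume of the confidence ellipsoid of the coefficient estimates, which is the rationale behind the D-criterion.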

2.2.3 Response Surface Models

Response surface models are functions that are empirically fitted to data observed in experiments or simulations. Usually, the data being fitted were obtained at factor values specified by an experimental design, and the model being fitted is a polynomial in those factors. If a response y in the process (or system) being modeled depends on input variables ξ1, ξ2, ..., ξk, the model that describes the process may be written as (Myers and Montgomery, 1995):

y = f(ξ1, ξ2, ..., ξk) + ε    (2.1)

where the true response function f is unknown and can be very complicated in real applications. ε is the error term that represents sources of variability not accounted for in f, including both pure error and error due to lack of fit; ε may include measurement error on the response, other sources of variability inherent in the process, the effects of variables not included in the model, etc. (Wang, 2001). The expected value of y, denoted by η, is given as (Myers and Montgomery, 1995):

E(y) = η = E[f(ξ1, ξ2, ..., ξk)] + E(ε)    (2.2)


Further, assuming ε to have a normal distribution with zero mean (and variance σ²):

η = f(ξ1, ξ2, ..., ξk)    (2.3)


The variables ξ1, ξ2, ..., ξk are the natural variables, expressed in their natural units of measurement. They are, however, converted (coded) to [-1, 1], with the mean value at zero, before performing regression. In terms of the coded variables, the response function is written as (Myers and Montgomery, 1995):

η = f(x1, x2, ..., xk)    (2.4)


A very simple first-order model, the main-effects model, in two variables x1 and x2 is:

η = β0 + β1x1 + β2x2    (2.5)


The constant β0 is an estimate of the mean of y over the experimental domain; it corresponds to the value of y when all coded variables are at 0. The coefficient β1 is an estimate of the gradient of y with respect to x1, and similarly the coefficient β2 is an estimate of the gradient of y with respect to x2. A more complicated first-order model, shown below, includes the two-factor interaction term:

η = β0 + β1x1 + β2x2 + β12x1x2    (2.6)


The interaction term accounts for the variation in y due to x1 depending on the value of x2, or equivalently the variation in y due to x2 depending on the value of x1. However, first-order models are likely to be useful only in a relatively small region of the independent-variable space, and they cannot account for curvature in f. Hence, for most modeling studies, second-order models are employed. The second-order model in general form is given as (Myers and Montgomery, 1995):

η = β0 + Σ(j=1..k) βj xj + Σ(j=1..k) βjj xj² + ΣΣ(i<j) βij xi xj    (2.7)


It includes a constant, linear effects, single-term quadratic effects, and two-term interactions (from left to right).
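As a sketch with made-up coefficients, the columns of this second-order model can be assembled explicitly and fitted by least squares; on a noise-free synthetic response over a 3^2 design, the coefficients are recovered exactly:

```python
import itertools
import numpy as np

# Second-order model matrix for two coded factors over a 3^2 design:
# constant, linear, pure quadratic, and two-factor interaction columns.
runs = np.array(list(itertools.product([-1, 0, 1], repeat=2)), dtype=float)
x1, x2 = runs[:, 0], runs[:, 1]
X = np.column_stack([np.ones(len(runs)), x1, x2, x1**2, x2**2, x1 * x2])

# Synthetic noise-free response from known (made-up) coefficients.
beta_true = np.array([5.0, 2.0, -1.0, 0.5, 0.0, 1.5])
y = X @ beta_true

# Least-squares fit recovers the coefficients exactly in the absence of noise.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With simulator output in place of the synthetic y, the same fit yields the response surface coefficients, with any misfit absorbed by the error term.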

Selection of the model is a subjective decision; the model should be representative of the underlying process. For example, a very complicated process is unlikely to be approximated well by the main-effects model, but the general second-order model might be appropriate. Once the process is approximated by a simple analytical model such as the above that fits the true response surface well, the parameters that actually influence the response, as well as their possible interactions, can be identified by a process called screening. Plackett-Burman designs (Section 2.3.1) are a good example of a screening design.

Fitting a Model

In a general matrix notation, the model may be expressed as (Dejean and Blanc, 1999):
y = Xβ + e    (2.8)



where:

X = the (n x p) model matrix, or regression matrix; it depends both on the regression model and on the design of the experiments,
y = the (n x 1) vector of observations of the response,
β = the (p x 1) vector of the coefficients (or parameters) of the model,
n = the number of experiments, and
p = the number of terms in the model (including the constant).

The true coefficients β remain unknown and have to be estimated. The estimate of the response at all observation points may be given as:
ŷ = Xb    (2.9)

where:

ŷ = the n-vector of estimated responses, based on the observations y,
X = the regression matrix, and
b = the vector of model coefficients obtained from a least-squares fit to the observations.
Under the assumption that the error term e is normally distributed with zero mean and variance σ², only the mean of the response has to be modeled:

E(y) = η = E(Xβ) = Xb    (2.10)

where b = an unbiased estimate of β.
Further, if we assume a function L, representing the loss resulting from an incorrect estimate of response, the loss function may be written as (Dejean and Blanc, 1999):
L = Σ(i=1..n) ei² = e'e = (y - Xβ)'(y - Xβ)    (2.11)


The estimate of β that minimizes L in the above form is known as its least-squares estimate. The least-squares method can be used to fit a polynomial to observed data (Montgomery and Peck, 1982). The name is appropriate for this particular form of the loss function, which penalizes overestimation as severely as underestimation. This is the most commonly used estimate; but if, for a particular problem, another function models the loss incurred from an error in the estimate better than the square form does, it can be used as the loss function. b is given as (Dejean and Blanc, 1999):

b = (X'X)^(-1) X'y    (2.12)


Prime (') indicates the transpose of a matrix unless noted otherwise. b is a random variable with the following properties:
E(b) = β    (2.13)

Cov(b) = σ² (X'X)^(-1)    (2.14)

The covariance of b is directly related to the quality of the fit of the model and depends on the regression matrix X and the error variance. An estimate of error variance is given by (Dejean and Blanc, 1999):

σ̂² = Σ(i=1..n) ei² / (n - p)    (2.15)

The summation term in the above equation, which is the sum of squares of error terms, is sometimes denoted as SSE. All other things being equal, a model with minimum error variance is desirable since it explains better the variability of the response. Details of procedures of calculating b are available in numerous texts (including Montgomery and Peck, 1982).
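These formulas can be checked numerically on a small hypothetical data set (a straight-line model with n = 4 runs and p = 2 terms; all values are made up):

```python
import numpy as np

# Least-squares estimate b = (X'X)^(-1) X'y and error-variance estimate SSE/(n - p).
X = np.array([[1.0, -1.0],
              [1.0,  0.0],
              [1.0,  1.0],
              [1.0,  2.0]])          # columns: constant term, coded factor x
y = np.array([1.1, 2.0, 2.9, 4.2])  # hypothetical observed responses

b = np.linalg.solve(X.T @ X, X.T @ y)   # solve the normal equations for b
e = y - X @ b                           # residual vector
SSE = e @ e                             # sum of squares of the errors
n, p = X.shape
sigma2_hat = SSE / (n - p)              # estimate of the error variance
```

Solving the normal equations directly is fine at this scale; for larger or ill-conditioned problems a QR-based solver such as numpy.linalg.lstsq is numerically safer.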

Significance Testing for the Fitted Model and Individual Coefficients
The process of statistical testing involves advancing a hypothesis (called the null hypothesis) and an alternate hypothesis that is expected to be true if the null hypothesis is false, given a dataset. The alternate hypothesis may not be proved true, but its plausibility can be quantified by means of a probability (the P-value). The null hypothesis is tested against the alternate hypothesis by means of an appropriate statistic (called a test). The distribution of the test statistic, which is a real-valued function of the data, in the event of the null hypothesis being true is known as the null distribution. Tables for several null distributions (t-distribution, F-distribution, chi-square distribution, etc.) are widely available, and the distributions can be readily computed. The value of the test statistic computed from the data is compared with the null distribution in order to quantify the plausibility of either hypothesis. To ascertain the predictive power of the model in the design range of the factors, significance testing is necessary. The test for significance of regression determines whether a linear relationship exists between the response variable and a subset of the regressor variables (or factors). The appropriate hypotheses are (Myers and Montgomery, 1995):
H0: β1 = β2 = ... = β(p-1) = 0    (2.16)

H1: βj ≠ 0 for at least one j    (2.17)

The test statistic for H0 is an F-statistic:

F0 = MSR / MSE = [SSR / (p - 1)] / [SSE / (n - p)]    (2.18)

where SSR = the sum of squares due to the model, SSE = the sum of squares due to the error, MSR = the mean square of the regression, and MSE = the mean square error. The total sum of squares is SST = SSR + SSE. If F0 > Fα(p - 1, n - p), hypothesis H0 is rejected. This implies that at least one of the variables contributes significantly to the model.
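A numerical sketch of this overall F-test, using hypothetical sums of squares (the 0.05 significance level is a conventional choice):

```python
from scipy import stats

# Hypothetical regression: n = 20 runs, p = 4 model terms (incl. constant).
n, p = 20, 4
SSR, SSE = 300.0, 60.0     # made-up model and error sums of squares

MSR = SSR / (p - 1)                      # mean square of the regression
MSE = SSE / (n - p)                      # mean square error
F0 = MSR / MSE                           # = 100 / 3.75
p_value = stats.f.sf(F0, p - 1, n - p)   # upper-tail probability under H0
reject_H0 = p_value < 0.05               # regression is significant here
```

stats.f.sf gives the upper-tail (survival) probability directly, so no F-table lookup is needed.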


The coefficient of multiple determination, R², is defined as (Myers and Montgomery, 1995):

R² = SSR / SST = 1 - SSE / SST    (2.19)

R² is a measure of the amount of reduction in the variability of y obtained by using the regressor variables in the model. However, a large value of R² does not necessarily imply that the regression model is good (Myers and Montgomery, 1995). Since R² always increases as terms are added to the model, it is sometimes preferable to use an R² statistic adjusted for the number of terms, R²adj, defined as (Myers and Montgomery, 1995):
R²adj = 1 - [SSE / (n - p)] / [SST / (n - 1)] = 1 - [(n - 1) / (n - p)] (1 - R²)    (2.20)
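As a small numerical illustration (all sums of squares are made up): adding seven marginal terms raises R² but lowers the adjusted statistic.

```python
# Two hypothetical fits to the same data: n = 30 observations, SST = 500.
n, SST = 30, 500.0
SSE_small, p_small = 100.0, 5     # compact 5-term model
SSE_big, p_big = 98.0, 12         # 12-term model with barely smaller SSE

def r2(SSE):
    return 1.0 - SSE / SST

def r2_adj(SSE, p):
    return 1.0 - (SSE / (n - p)) / (SST / (n - 1))

gain_r2 = r2(SSE_big) - r2(SSE_small)                            # positive
gain_adj = r2_adj(SSE_big, p_big) - r2_adj(SSE_small, p_small)   # negative
```

The bigger model "wins" on plain R² but loses on R²adj, flagging the extra terms as unjustified.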


The adjusted R² statistic does not always increase as variables are added to the model. If the adjusted R² statistic differs significantly from the (ordinary) R² statistic, it may be a strong indication that unnecessary terms have been added. Both R² and adjusted R² estimate the quality of the fit, but a good fit does not guarantee good predictive value of the fitted model (Dejean and Blanc, 1999). Sometimes it is necessary to test the significance of an individual regression coefficient. When a variable is added to the regression model, the sum of squares for the regression increases while the error sum of squares decreases. Significance testing for an individual coefficient helps in deciding whether the increase in the regression sum of squares is sufficient to justify the use of an additional variable. Moreover, adding an unimportant variable may increase the mean square error, thus making the model less useful. The appropriate hypotheses for any coefficient βj are (Myers and Montgomery, 1995):
H0: βj = 0    (2.21)
H1: βj ≠ 0    (2.22)

The test statistic for this hypothesis is (Myers and Montgomery, 1995):

t0 = bj / sqrt(MSE · Cjj)

Where bj = the least squares estimate of βj, MSE = the mean square error, and Cjj = the jj-th diagonal element of the matrix (X'X)-1. The hypothesis H0 is rejected if |t0| > tα/2, n−k−1. If the hypothesis H0 is not rejected, the term related to βj can be deleted from the model.

The concept of the P-value is also used to determine the statistical significance of the effects in the model. The P-value quantifies the strength of evidence against the null hypothesis and in favor of the alternative. It may be defined as the probability that a variate would assume a value greater than or equal to the observed value strictly by chance. The α-value is a number, 0 < α < 1, such that a P-value satisfying P(z ≥ zobserved) < α is considered significant. A commonly used α-value is 0.05. Given the degrees of freedom in the regression and the value of a test statistic, such as the F- or t-statistic, the P-value can be calculated or looked up from standard tables. If the P-value of an effect is less than the α-value, hypothesis H0 is rejected, implying that the effect is statistically significant. It may be noted, however, that a high P-value does not necessarily support the null hypothesis.
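The t-test on individual coefficients can be sketched as follows; the data are again synthetic, with one regressor that truly matters and one that does not:

```python
import numpy as np
from scipy import stats

# Synthetic data: test H0: beta_j = 0 for each coefficient.
rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 3.0, 0.0]) + rng.normal(scale=1.0, size=n)

p = X.shape[1]
b, *_ = np.linalg.lstsq(X, y, rcond=None)
mse = np.sum((y - X @ b) ** 2) / (n - p)  # mean square error, MSE
C = np.linalg.inv(X.T @ X)                # (X'X)^-1; C_jj on its diagonal

t0 = b / np.sqrt(mse * np.diag(C))        # t0 = b_j / sqrt(MSE * C_jj)
p_values = 2 * stats.t.sf(np.abs(t0), df=n - p)

# The coefficient with true beta = 3 should come out highly significant.
print(p_values)
```

A coefficient whose P-value exceeds the chosen α is a candidate for deletion from the model, exactly as described above.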

2.3 Some Popular Designs

Some designs are applied frequently because they offer certain advantages over the less popular ones. Three of them are discussed in the following account.

2.3.1 Plackett-Burman Design

In order to screen a large number of factors to identify those that may be important, a desirable design is one that tests the largest number of factor main effects with the least number of observations. One way to design such experiments is to confound all interactions with "new" main effects. Such designs are sometimes called saturated designs, because all information in them is used to estimate the parameters, leaving no degrees of freedom to estimate the error. Since the added factors are created by equating the "new" factors with the interactions of a full factorial design (see Appendix A), the number of runs in these designs is always a power of 2 (e.g., 4, 8, 16, 32, and so on). Plackett and Burman (1946) showed how a full factorial design can be fractionalized in a different manner, to yield saturated designs where the number of runs is a multiple of 4 rather than a power of 2. These two-level fractional designs can be used for k = N−1 variables in N runs, where N is a multiple of 4. Such highly fractionalized designs, which screen the maximum number of (main) effects in the least number of experimental runs, are known as Plackett-Burman designs; they are widely used to screen a large number of possible factors and retain only those with larger main effects (Friedmann et al., 2001).
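The cyclic construction of Plackett and Burman can be illustrated for N = 12 runs. The generating row below is the published 12-run sequence; the script builds the design from its cyclic shifts and verifies the balance and orthogonality properties described above:

```python
import numpy as np

# 12-run Plackett-Burman design for up to 11 two-level factors.
# The first row is the published generating sequence for N = 12; the
# remaining runs are its cyclic shifts plus a final row of all -1.
gen = np.array([+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1])
rows = [np.roll(gen, i) for i in range(11)]
design = np.vstack(rows + [-np.ones(11, dtype=int)])

# Every column is balanced (six +1, six -1) and any two columns are
# orthogonal, so all 11 main effects can be estimated in just 12 runs.
print(design.shape)  # (12, 11)
assert (design.sum(axis=0) == 0).all()
assert np.allclose(design.T @ design, 12 * np.eye(11))
```

With a 2^k fractionalization, screening 11 factors would require 16 runs; the Plackett-Burman construction does it in 12.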


2.3.2 D-optimal Design

The D-optimal design procedures provide various options to select, from a list of valid (candidate) points (i.e., combinations of factor settings), those points that will extract the maximum amount of information from the experimental region, given the respective model that is expected to be fitted to the data. Details of the algorithms for generating optimal designs are provided in Appendix B.

When the factor level settings for two factors in an experiment are uncorrelated, that is, when they are varied independently of each other, they are said to be orthogonal to each other. Two column vectors X1 and X2 in the design matrix are orthogonal if X1'X2 = 0. The more redundant the vectors (columns) of the design matrix, the closer to zero is the determinant of the correlation matrix for those vectors; the more independent the columns, the larger is the determinant of that matrix. Thus, finding a design matrix that maximizes the determinant D of this matrix means finding a design where the factor effects are maximally independent of each other. This criterion for selecting a design is called the D-optimality criterion (Poland et al., 2001).

A D-optimal design seeks to minimize the overall size of the variance matrix of the parameter estimates. In a least squares analysis, the variance matrix of the coefficient vector β, given a design matrix X, is proportional to (X'X)-1. The determinant of (X'X)-1 equals the product of its eigenvalues. Thus, the search for D-optimal designs aims to minimize |(X'X)-1|, where the vertical bars denote the determinant. In practice, however, maximizing |X'X| is preferred to minimizing |(X'X)-1|, since it avoids computation of the inverse of a matrix (see Kuhfeld, 1997).

A number of standard measures have been proposed to summarize the efficiency of a design. D-efficiency is such a measure related to the D-optimality criterion. It is defined as follows (Narayanan, 1998):

D-efficiency = 100 × |X'X|^(1/p) / n

Where p = the number of factor effects in the design (columns in X) and n = the number of requested runs, i.e., the number of points in the optimal design. Since the p-th root of the determinant is the geometric mean of its eigenvalues, it is a measure of the goodness of the design. D-efficiency can be interpreted as the relative number of runs (in percent) that would be required by an orthogonal design to achieve the same value of the determinant |X'X|. However, an orthogonal design may not be possible in many cases; it is only a theoretical "yard-stick." Therefore, this measure is best used as a relative indicator of efficiency, to compare other designs of the same size constructed from the same set of candidate points (Kuhfeld, 1997).
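The D-efficiency formula can be sketched in a few lines. The 2^2 factorial with an intercept column used as the example is illustrative only, chosen because an orthogonal design should score exactly 100%:

```python
import numpy as np

def d_efficiency(X: np.ndarray) -> float:
    """D-efficiency = 100 * |X'X|^(1/p) / n, for an n x p model matrix X."""
    n, p = X.shape
    det = np.linalg.det(X.T @ X)
    return 100.0 * det ** (1.0 / p) / n

# A 2^2 full factorial with an intercept column is orthogonal:
# X'X = 4*I, |X'X| = 4^3, so the efficiency is exactly 100%.
X = np.array([
    [1, -1, -1],
    [1, -1, +1],
    [1, +1, -1],
    [1, +1, +1],
], dtype=float)
print(d_efficiency(X))  # approximately 100.0
```

A non-orthogonal candidate design of the same size would return a value below 100, which is how designs drawn from the same candidate set can be compared.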


2.3.3 Box-Behnken Design

The equivalents of Plackett-Burman designs in the case of 3^(k-p) designs are the so-called Box-Behnken designs (Box and Behnken, 1960; also Box and Draper, 1987). These designs do not have simple design generators and have complex confounding of interactions. However, the designs are economical and therefore particularly useful when it is expensive to perform the necessary experimental runs.

Box-Behnken designs were used in this study. The design has several advantages relative to the alternatives. It reduces the number of required experiments by confounding higher-order interactions, and this reduction becomes more significant as the number of factors increases; Box-Behnken designs do not require many more experiments than two-level designs. Also, unlike D-optimal designs, Box-Behnken designs do not require or depend on prior specification of the model. Further, unlike first-order (e.g., two-level factorial) designs, Box-Behnken designs allow estimation of quadratic terms and do not yield a constant sensitivity of the response to the factors. Moreover, including the center point in the design reduces the estimation error for the most likely responses; two-level designs do not include experiments at the design center point and thus may be inaccurate in the most likely factor ranges. However, as the Box-Behnken design does not use the extreme values of all the factors simultaneously, accuracy at the extremes suffers.
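The Box-Behnken construction can be sketched for the simplest case of three factors (this study used six factors; the three-factor version is shown only to keep the example short). For each pair of factors, a 2^2 factorial is run at ±1 with the remaining factor held at its center value, and a center point is added:

```python
from itertools import combinations, product

# Box-Behnken construction for k = 3 factors: for each pair of factors,
# run a 2^2 factorial at +/-1 with the remaining factor held at 0,
# then add a center point. This gives 3*4 + 1 = 13 runs.
k = 3
runs = []
for i, j in combinations(range(k), 2):
    for a, b in product((-1, 1), repeat=2):
        run = [0] * k
        run[i], run[j] = a, b
        runs.append(tuple(run))
runs.append((0,) * k)

print(len(runs))  # 13
# No run sets all factors to their extremes simultaneously, which is
# why accuracy at the corners of the design space suffers.
assert all(0 in r for r in runs)
```

Note that every run keeps at least one factor at its center level, which is the property discussed above: the design never visits the corners of the factor space.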


Chapter 3

Gas Cap Blowdown

Development of an oil field usually aims to maximize ultimate recovery as well as to minimize capital and operating expenditure. This may be achieved using a traditional development plan or, if need be, a less conventional approach. It is the general practice to make use of all available natural reservoir drive mechanisms: gas in solution, primary gas cap, aquifer, and compaction. This allows the intrinsic energy of the reservoir to drive out the maximum oil. However, sometimes one natural drive may be so dominant that the benefits of the others are not needed. Specifically, a strong aquifer may obviate the need for a small gas cap as a source of reservoir energy, and blowing down the primary gas cap may then maximize the ultimate recovery from the field.
3.1 Early and Delayed Blowdown

A regulatory requirement often imposed on blowdown is that the primary gas cap may not be produced until the pool is in the final stages of depletion (more than, say, 90% of the ultimate recovery). This constitutes Delayed Blowdown. If the gas from the gas cap is produced earlier than that stage, it is called Early Blowdown (Kuppe et al., 1998). It was found that both forms of blowdown may result in a comparable recovery of hydrocarbons (Kuppe et al., 1998), but an early blowdown approach can conserve capital and reduce the operating life of the reservoir by many years. It may be noted that in that particular case, a water fence was created by water injection between the gas cap and the oil leg in order to allow concurrent production of oil and gas while still maintaining a separation between the two.

The advantages of early blowdown can be attributed to several factors. As voidage replacement is maintained, the argument that the gas cap is required as a source of energy no longer holds. A high recovery in early blowdown is obtained by depleting the gas cap before water influx, as the superior gas mobility can outrun the waterfront. The reduction in gas recovery due to liquid invasion is smaller in the case of early blowdown because less time is available for water influx. Hence, higher gas saturations lead to higher gas mobilities and better hydrocarbon recovery. In the present study, only early blowdown is studied, since few results are available on this form of blowdown even though it has been applied successfully in some cases (Behrenbruch and Mason, 1993).

3.2 Important Factors in Gas Cap Blowdown

Two kinds of considerations are important when assessing the relative importance of different drive mechanisms: Reservoir Energy and Fluid Displacement. To compare the relative benefits of water drive and gas cap drive, we can therefore discuss them in these terms. The following is based largely on Behrenbruch and Mason (1993):

Reservoir Energy. The intrinsic energy of different substances accounts for their ability to compensate reservoir voidage; the compression-volume product maintains the reservoir pressure. Considering typical values, gas has much less instantaneous expansion capacity than water (Behrenbruch and Mason, 1993). However, the closer proximity of the gas cap to the oil leg, compared to the aquifer, makes the gas cap more effective initially. The aquifer, due to its larger distance from the oil leg, is slow to respond to pressure depletion in the oil leg.

Fluid Displacement. Aquifer strength has to be sufficiently high, in terms of both size and connectivity, to sweep the oil at high pressure, but it is usually unknown in real cases. Availability and use of make-up gas and the cost of re-injection also need to be considered: barring a regulatory requirement to inject gas, the value of the gas and the cost of injection have to be weighed against the value of any additional oil recovery. Well placement is another factor. Due to updip movement of fluid contacts, it may be desirable to place wells near the crest of the reservoir initially; horizontal wells are known to greatly enhance and accelerate the recovery of attic oil in some cases (Vo et al., 1996). The size of the gas cap is important because of its expansive energy and possible commercial value. As the aquifer sweeps the oil into the gas cap, pore space in the gas cap zone becomes saturated with oil, which may be reduced again when water invasion starts.
Part of the oil originally in the oil zone may be spread over the initial gas zone and be lost as residual oil saturation after displacement by water. A smaller volume of abandoned oil and a shorter operating life can balance this potential loss. For a strong water drive reservoir, gas cap blowdown from crestally located wells causes the aquifer to push up the oil column and re-saturate most of the original gas cap volume. This oil is potentially unrecoverable. The volume of oil lost due to re-saturation of the gas cap can be given in stock tank barrels (stb) as:
Vφ,g So,g / Bo = G Bgi So,g / [(1 − Swc) Bo] = m N Boi So,g / [(1 − Swc) Bo]    (3.1)

Assuming that an alternative production policy will leave behind an abandonment oil column of pore volume (Vab), the amount of abandoned oil in stb will be:


Vab (1 − Swc) / Bo,ab    (3.2)


The abandoned oil volume in case of gas cap blowdown will be, similarly,


Vab Sorw / Bo    (3.3)


For gas cap blowdown to be favorable, the oil losses in case of gas cap blowdown must be less than those in case of an alternative strategy, i.e.
m N Boi So,g / [(1 − Swc) Bo] + Vab Sorw / Bo < Vab (1 − Swc) / Bo,ab    (3.4)

It may be noted that for the same width of abandonment oil column, it will tend to be closer to the original gas-oil contact in the case of a strong water drive and closer to the oil-water contact in the case of a weak water drive. Water injection may be needed to make this feasible if the aquifer support is not sufficient.
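The blowdown criterion above can be evaluated numerically. The values below are assumed round numbers chosen for illustration only, not data from the Skua model:

```python
# Illustrative check of the blowdown criterion; all values are assumed.
m = 0.5        # gas cap size relative to the oil zone
N = 50e6       # oil initially in place, stb
Boi, Bo, Bo_ab = 1.40, 1.30, 1.25  # formation volume factors, rb/stb
Swc = 0.20     # connate water saturation
So_g = 0.20    # oil saturation left in the re-saturated gas cap
Sorw = 0.30    # residual oil saturation to water
Vab = 20e6     # abandonment oil-column pore volume, rb

# Left side: oil lost to gas cap re-saturation plus residual oil
# after blowdown. Right side: oil abandoned under the alternative policy.
loss_blowdown = m * N * Boi * So_g / ((1 - Swc) * Bo) + Vab * Sorw / Bo
loss_alternative = Vab * (1 - Swc) / Bo_ab

# Blowdown is favorable when its losses are the smaller of the two.
print(loss_blowdown < loss_alternative)  # True
```

With these particular numbers the criterion favors blowdown; increasing So_g or m quickly tips the balance the other way, which is why the gas cap size appears as an important factor in the study.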

3.3 Significance of Gas Cap Blowdown

The significance of gas cap blowdown lies in its ability to provide an alternative reservoir management strategy in certain cases. It has been successfully applied a number of times and can be extremely attractive with proper reservoir monitoring and control. Some case studies can be seen in Lee (1993), Behrenbruch and Mason (1993), and Starzer et al. (1995).

A pitfall is the uncertainty in the drive mechanisms, particularly in the aquifer size and the gas cap size. If the predicted strengths of the drive mechanisms are not accurate, wells may have to be recompleted in order to drain a shrinking oil rim, or unplanned fluid injection may be needed as another expensive remedy. Gas coning and cusping can be severe in these cases due to the greater gas mobility. Constraining the oil rate and reconsidering completion intervals are ways to check this. However, these are not desirable solutions, since lower initial completions will force recompletions later as the fluid contacts move up.


3.4 Simulation Model

The simulation model used in the present study is a regular-grid, shoe-box model with constant permeability (though vertical anisotropy is considered) and uniform porosity. Three-phase flow is considered, along with Carter-Tracy analytic aquifer support at the bottom. Table 1 shows the important parameters for the base case of the model. The parameters are based on the Skua field in the East Timor Sea (Behrenbruch, 2000), but the reservoir model is synthetic. Six factors, each with three levels, were considered, as shown in Table 2. The factors were assumed to be independent. Using three levels instead of two ensures that curvature in the relation between a factor and the response can be captured. All the cases have a gas cap, an oil zone, and a water zone connected to a Carter-Tracy aquifer. The well is always completed just above the gas-oil contact. As seen in Table 2, each factor is varied over a possible range of values. The details of the implementation can be seen in Appendix C, which contains a sample Eclipse input data file.
Table 1 -- Reservoir Properties, Dimensions and Initial Conditions

Property                          Value
Porosity φ, %                     21
Swi, %                            20
kx, md                            50
ky, md                            50
kz, md                            0.5
Nx                                49
Ny                                9
Nz                                27
Δx, m                             55.56
Δy, m                             97.96
Δz, m                             2.8
Datum depth, m                    2286.5
pi at datum depth, bars           228.6
Depth to top of reservoir, m      2258.5
Reservoir thickness, m            75.6


The full set of runs in this case requires 3^6 (= 729) simulations. We manage to reduce this requirement drastically with the help of experimental design, as explained in Chapter 4. Moreover, more information can be extracted from the reduced number of runs than is usually obtained by changing one factor at a time from its base value.
Table 2 -- Parameters Varied in Simulation Study

Horizontal          Aquifer Influx   Gas Rate          Oil-Water     Gas-Oil       Well Location
Permeability (md)   Coefficient      (10^5 sm3/day)    Contact (m)   Contact (m)   in Grid
5                   0.1              3                 2297.7        2261.3        (1,1)
50                  1                6                 2314.5        2278.1        (13,5)
500                 10               12                2331.3        2294.9        (25,9)


Chapter 4

Application of Experimental Design

The simulation runs were designed using Box-Behnken and D-optimal designs, and analyzed using response surface methods. The regressions so obtained were analyzed for relative importance of the factors, their main effects and interaction effects. Both designs give comparable results and are discussed in detail in the following sections.
4.1 Setup of the Regression

Six factors were chosen for their expected impact on oil recovery using gas cap blowdown. The factors are denoted by KH (horizontal permeability), AQ (Aquifer strength coefficient for the Carter-Tracy aquifer attached to the bottom blocks), GR (specified Gas Rate for the well), WI (the ratio of volume of the oil zone to that of the water zone), M (the ratio of volume of the free gas cap to that of the oil zone), and LH (Horizontal Location of the well, measured along the diagonal from the top right corner to the center of the model, from 0 to 1). A vertical anisotropy of 0.01 is assumed for the purpose of this study. M and WI are varied by varying the fluid contacts. M is referred to as relative gas cap size, and WI as relative water zone size.It should be noted that in hereafter KH would refer to natural logarithm of horizontal permeability and AQ to that of the aquifer influx coefficient, since their levels vary logarithmically, rather than linearly. The factors have the values at different levels as shown in Table 3. The response variable is the cumulative oil production after 8 years divided by the initial pore volume of oil in 10-3 sm3/rm3. It represents the oil recovery, and is denoted by R.
Table 3 -- Factors and Their Levels

          KH      AQ      GR      WI      M       LH
Level 1   1.61   -2.30    3.00    1.00    0.08    0.00
Level 2   3.91    0.00    6.00    1.86    0.54    0.50
Level 3   6.21    2.30   12.00   13.00    1.00    1.00


The coefficient of each term is calculated by first converting the factors to coded units using the following relation:

xi = [ξi − (ξi,max + ξi,min)/2] / [(ξi,max − ξi,min)/2]

Where x = the coded value and ξ = the raw, uncoded value of the natural variable. The subscripts max and min refer to the maximum and minimum values of the factors, respectively, and the subscript i refers to the factor number. This coding is done to eliminate any spurious statistical results due to different measurement scales for the factors. For example, the gas rate is of the order of 10^6 sm3/day whereas the relative gas cap size is of the order of 0.1 to 1, so a regression with uncoded values would give coefficients of very different magnitudes for the gas rate and the relative gas cap size. (It is noted that the regressors are all rendered dimensionless by the coding scheme.)

Four different cases were considered to compare the Box-Behnken and D-optimal designs. For 6 factors, the Box-Behnken design requires 49 runs, including one run with all the factors at their base (or center) value. This design is compared to a D-optimal design with the same number of runs (49). The candidates for the D-optimal design were all the runs possible with 3 levels and 6 factors, i.e., 3^6 (= 729) runs. The design was optimal with respect to the full quadratic response surface model. The same method was employed to generate another D-optimal design with 100 runs. To compare these designs, the full factorial case consisting of all 3^6 runs was also considered.

The response surface model in all the cases was the full quadratic model, which can be constructed from Equation 2.7 with the number of factors K = 6. Hence, 28 coefficients were to be estimated from the regression: 1 constant, 6 for the linear terms, 6 for the square terms, and 15 for the two-factor interaction terms. Simulations were performed using a commercial flow simulator (ECLIPSE) for these four cases. Values for the factors were input to the simulator as the four design cases prescribe, and the responses so obtained as output were recorded.
For each case, regression in the six factors is performed with respect to the response (R) using a standard statistical software (MINITAB) to obtain the response surface models.
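The coding relation used above can be sketched in a few lines; the gas rate levels below are those given for GR (3, 6, and 12, in 10^5 sm3/day):

```python
def code(xi: float, xmin: float, xmax: float) -> float:
    """Map a raw factor value onto the coded interval [-1, +1]."""
    return (xi - (xmax + xmin) / 2.0) / ((xmax - xmin) / 2.0)

# Gas rate levels 3, 6, and 12 (in 1e5 sm3/day):
levels = [code(g, 3.0, 12.0) for g in (3.0, 6.0, 12.0)]
print(levels)  # [-1.0, -0.33..., 1.0]
```

Note that because the gas rate levels vary geometrically rather than linearly, the middle level codes to −1/3 rather than 0; for factors such as KH and AQ, taking logarithms first (as done in this study) restores an evenly spaced coding.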
4.2 Comparison of Designs

Regressions were significant for all the cases, since the P-values were found to be 0.00. This establishes that the models fitted to the data account for the variability in the process irrespective of the design chosen in this study. Table 4 shows the P-values for different terms (linear, quadratic, and interaction) for the four cases discussed in Section 4.1. The linear terms are found to be significant in all designs. Only the Box-Behnken design fails to find the interaction terms significant for α = 0.05. The square terms are significant only according to the full factorial design for the same α-value. (The concept of the α-value is explained in Chapter 2.)
Table 4 -- Comparison of Regressions

                        P-values for Regression Terms        Goodness of Fit
Design                  Linear    Square    Interaction      R-Sq     Adjusted R-Sq
Box-Behnken             0.000     0.109     0.075            0.884    0.735
D-optimal (49 runs)     0.000     0.247     0.002            0.922    0.822
D-optimal (100 runs)    0.000     0.169     0.000            0.896    0.856
Full Factorial          0.000     0.000     0.000            0.868    0.863

For the same number of runs, the D-optimal design shows a much better R2 value than the Box-Behnken design. Since a D-optimal design is optimal in the sense of decreased variance of the parameter estimates (see Section 2.3.2), it is not surprising that it shows a better fit. It may be recalled that R2 indicates the percentage of variability in the process explained by the fitted model. The other two cases have a larger sample size available for regression and thus may be expected to show a better fit. However, they suffer from the presence of outliers (see Fig. 15 for the full factorial), which prevents them from achieving a better fit than the cases with fewer runs. In fact, as Table 4 indicates, the full factorial case has the smallest value of R2. It may be concluded that the goodness of fit is high in all the cases, but more available data does not necessarily lead to a better fit.

The R2adj values show the goodness of fit for the regressions adjusted for the number of terms. It is found to be lowest for the Box-Behnken design. Since the full factorial design is the exhaustive case, it is not surprising that it has the highest value of adjusted R2. The D-optimal design with 49 runs shows a better fit than the Box-Behnken design even though it requires the same number of runs. The D-optimal design with 100 runs shows a slight improvement in fit over the D-optimal design with 49 runs. The difference between R2adj and R2 is highest for the Box-Behnken case and then decreases with increasing number of runs. This difference indicates the presence of unnecessary terms in the model (Myers and Montgomery, 1995). Clearly, the Box-Behnken case failed to attach significance to some terms that were found to be necessary by the cases with a larger dataset.
4.3 Effects of Factors

A Pareto chart (Friedmann et al., 1999) is a bar chart that ranks the effects of factors on the response so that the most important ones can be identified. It is generally used with screening designs, since it can be constructed from 2-level designs and thus requires a relatively small number of runs (Friedmann et al., 1999). Since the 3-level full factorial case is a superset of the 2-level full factorial case, the Pareto chart could easily be compiled from existing results. The Pareto chart in Fig. 1 is used to compare the relative magnitude and the statistical significance of both main and interaction effects.

Figure 1. Pareto Chart of the Standardized Effects

As seen in Fig. 1, by far the most important factor is the main effect of KH. It is followed by the interaction effect of KH and M, which has only a slightly higher significance than the main effect of M. The interaction of KH and GR, the main effect of WI, the interaction of KH and WI, and the main effect of GR follow in order of decreasing significance, but are almost equal. The main effect of LH is slightly more important than the interaction effect of KH and LH, but less than the main effect of GR. The effects that fall below the dotted vertical line (α = 0.05) in Fig. 1 are insignificant. Even though the Pareto chart is informative in ranking the impact of effects, as mentioned earlier it is based only on 2-level information. It also ranks some three-factor interactions as significant, but the quadratic model cannot capture these effects. Hence, even in the full factorial case, the model imposes limitations on the accuracy of the regression relative to the true response function.

4.3.1 Main Effects

Figs. 2, 3, 4, and 5 show the main effects of the factors for the full factorial design, the Box-Behnken design, the D-optimal design with 49 runs, and the D-optimal design with 100 runs, respectively. The main effects plots are used to compare the relative strength of the effects across factors. In these figures, the means at each level of a factor are plotted and connected with a line. Center points and factorial points are represented by different symbols. A reference line at the grand mean of the response data is also shown. All designs predict similar behavior for KH, though they differ in their estimation of the main effects of other factors. It is instructive to look at the P-values of the individual factors in Table 5. The factors whose estimates differ across designs also have high P-values, or in other words, low significance. For example, AQ has very high P-values, and its estimate differs in every design. On the other hand, KH and GR have low P-values in all designs and their estimates are very similar. It can thus be concluded that the important main effects are captured well by all designs, even though the full factorial case is more accurate. The significance of the effects is also consistent with the information in the Pareto chart of Fig. 1.

Figure 2. Main Effects for the Full Factorial Case

As KH increases, so does recovery. This is to be expected, since a high horizontal permeability causes a low near-wellbore pressure drop (drawdown), thus reducing gas coning and aiding recovery. AQ is not very significant, since the aquifer is very strong even at the lowest value of the aquifer influx coefficient. From Fig. 2 it is observed that an increased specified gas rate increases the oil recovery R: producing the gas at a higher rate leads to a faster blowdown of the gas cap, so more oil can be recovered. All designs show that a smaller M leads to a higher R.

Figure 3. Main Effects for Box-Behnken Design

4.3.2 Interaction Effects

Fig. 6 shows the plots of two-factor interactions as obtained from the full factorial case. An interaction plot is a plot of means for each level of a factor with the level of a second factor held constant. These plots are used to compare the relative strength of the effects across factors. However, the interpretation, as with the main effects, is meaningful only if the interaction effects are statistically significant.


Figure 4. Main Effects for D-optimal Design with 49 Runs

Figure 5. Main Effects for D-optimal Design with 100 Runs

The most significant interaction effects are those of KH with M, GR, WI, and LH, as seen from Table 5. The contours of the response surfaces for these pairs of factors are shown in Figs. 7, 8, 9, and 10, respectively. The other factors are held constant at their base values in these figures. Higher values of R occur at high permeability (KH) and low M: since the gas cap is small, it does not contribute significantly to the recovery compared to the aquifer, and at higher permeability producing the gas helps to recover more oil.

Figure 6. Interaction Effects for Full Factorial Design


Table 5 -- Coefficients and P-Values for Regression Terms

            Box-Behnken         D-optimal (49 runs)   D-optimal (100 runs)   Full Factorial
Term        Coeff.    P-value   Coeff.    P-value     Coeff.    P-value      Coeff.    P-value
Constant     34.84    0.548      13.84    0.723        47.23    0.102         52.84    0.000
KH           60.08    0.000      39.68    0.000        43.34    0.000         47.65    0.000
AQ            0.00    1.000      -2.79    0.427         3.49    0.182          0.58    0.559
GR           12.25    0.082       6.86    0.065         8.56    0.002         10.35    0.000
WI           -7.61    0.181      -8.56    0.020        -9.21    0.001         -9.09    0.000
M           -15.01    0.048     -22.63    0.000       -25.36    0.000        -20.31    0.000
LH            4.53    0.512       1.34    0.704         5.95    0.024          5.11    0.000
KH*KH        16.05    0.169      23.05    0.026        20.42    0.006         20.31    0.000
AQ*AQ         4.14    0.716      -0.90    0.929        -0.19    0.978          0.16    0.921
GR*GR        -6.36    0.624      -6.42    0.568        -1.50    0.850         -7.69    0.000
WI*WI        16.92    0.715      13.30    0.747        -8.07    0.779         -5.46    0.427
M*M          -8.22    0.472      12.11    0.229        -6.60    0.345         -6.96    0.000
LH*LH        -9.02    0.431     -11.02    0.273        -5.11    0.470         -8.76    0.000
KH*AQ         0.00    1.000      -1.97    0.600         3.93    0.157          0.58    0.615
KH*GR        32.37    0.001       7.02    0.071         9.69    0.001         14.40    0.000
KH*WI        -5.06    0.394      -8.34    0.032       -10.43    0.000        -10.27    0.000
KH*M        -29.94    0.004     -25.12    0.000       -29.34    0.000        -29.21    0.000
KH*LH         5.95    0.521       1.28    0.733         5.44    0.052          5.68    0.000
AQ*GR         0.00    1.000       1.04    0.778         2.25    0.415          0.56    0.619
AQ*WI         0.00    1.000      -2.53    0.482         1.88    0.477          0.59    0.571
AQ*M          0.00    1.000      -1.94    0.609        -1.88    0.496         -0.53    0.648
AQ*LH         0.00    1.000       7.20    0.064         2.89    0.298          0.53    0.648
GR*WI        -4.90    0.509      -3.05    0.393        -0.64    0.808         -2.86    0.005
GR*M          5.12    0.568       0.06    0.988        -1.55    0.575          0.97    0.394
GR*LH         3.86    0.549       0.50    0.894         5.38    0.053          2.73    0.016
WI*M        -10.42    0.178       2.10    0.560        -0.93    0.724         -0.61    0.558
WI*LH        -0.29    0.969      -0.99    0.782         2.70    0.312         -0.12    0.908
M*LH          0.56    0.951      -4.39    0.245        -4.82    0.085         -2.40    0.038


Figure 7. Contour Plot of Response Surface for Horizontal Permeability and Relative Gas Cap Size

Figure 8. Contour Plot of Response Surface for Horizontal Permeability and Gas Rate

It can be seen from Figs. 7-10 that higher values of R occur at higher horizontal permeability, consistent with the main effect of horizontal permeability. Similarly, the other factors affect the recovery as manifest in their main effects. For example, Fig. 8 shows that a higher GR and a higher horizontal permeability lead to higher recovery, as is to be expected from their main effects (discussed in Section 4.3.1). A lower WI (Fig. 9) indicates a bigger water zone relative to the oil leg; as more water is available to provide pressure support, especially in the face of a depleting gas zone, more oil recovery is obtained. A centrally located well provides more recovery at higher horizontal permeabilities, as seen in Fig. 10. Since this is a homogeneous reservoir without dip, the result is reasonable.

The interaction effects of GR with WI and LH are also significant. The effect of the specified gas rate on recovery is highly dependent on the location of the well (Fig. 12), as well as on the size of the water zone relative to that of the oil zone (Fig. 11). A high gas rate is more beneficial in the case of a large water zone, since a larger water zone is able to provide more pressure support to aid recovery. Also, producing the well at a higher gas rate yields more recovery in the case of a centrally located well. Fig. 13 indicates that with a smaller gas cap, a centrally located well can afford better recovery.

Figure 9. Contour Plot of Response Surface for horizontal permeability and Relative Water Zone Size

4.3.3 Nature of Residuals

Residuals are assumed to be normally distributed and independent. We need to verify that this indeed is the case for the regression to be considered useful. In Figs. 14-16, the residuals are standardized by dividing each residual by its standard deviation. Standardization eliminates the effect of the location of the data points in the predictor space (Myers and Montgomery, 1995).

In Fig. 14, the normal probability plot of the residuals for the full factorial case is almost straight, showing that the distribution is close to normal in the central portion of the data. Hence, transformation of the variables as a remedial measure is not necessary (Myers and Montgomery, 1995).
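The diagnostics behind Figs. 14-16 can be sketched with scipy; the residuals below are synthetic stand-ins for the regression output, used only to show the mechanics:

```python
import numpy as np
from scipy import stats

# Standardize residuals and check normality, as done for Fig. 14.
# These residuals are synthetic stand-ins, not values from the study.
rng = np.random.default_rng(2)
residuals = rng.normal(scale=2.0, size=100)

standardized = residuals / residuals.std(ddof=1)

# probplot returns the ordered residuals paired with normal quantiles,
# plus a least-squares fit; r close to 1 means a nearly straight line,
# i.e., an approximately normal distribution.
(osm, osr), (slope, intercept, r) = stats.probplot(standardized)
print(round(r, 3))
```

A markedly lower r, or visible curvature in the plotted points, would signal the kind of departure from normality that calls for a variable transformation.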

Figure 10. Contour Plot of Response Surface for Horizontal Permeability and Horizontal Location of Well

Figure 11. Contour Plot of Response Surface for Gas Rate and Relative Water Zone Size


Figure 12. Contour Plot of Response Surface for Gas Rate and Horizontal Location of Well

Figure 13. Contour Plot of Response Surface for Relative Gas Cap Size and Horizontal Location of Well


Figure 14. Normal Probability Plot of the Residuals (Full Factorial Design)

Figure 15. Standardized Residuals versus the Fitted Values (Full Factorial Design)

Fig. 15 contains the plot of residuals versus the fitted values for the full factorial case. There is a clear indication of systematic error. Since the residuals are not scattered randomly, the variance of the original observations is not constant for all values of the response (Myers and Montgomery, 1995). Outliers are also present, indicating that the fitted model is not a very good approximation to the true response surface in some regions of the regressor space. This points to the need for higher-order polynomials, or for alternative regression tools such as neural networks; the training data for a neural network may come from runs designed using experimental design. Inferring the cause of the observed artifact errors could be part of future work.

Fig. 16 shows the residuals versus the observation order for the full factorial case. The observations were not arranged at random, so the pattern of arrangement of the factor values is reflected in the plot. It can be concluded that some factors have a strong influence on the response. In fact, a closer inspection of the data indicated that large excursions of the residuals from zero occur when the horizontal permeability is at its high level. Given a high horizontal permeability, the excursions are positive when the specified gas rate for the well is at its high level and negative when it is at its low level. It may be suggested that factors with insignificant main effects be removed and that factors with a large influence on the response possibly be transformed.

Figure 16. Standardized Residuals versus Observation Order (Full Factorial Design)

4.4 Conclusions and Future Work

It was found that experimental designs are very useful, not only to reduce the number of runs required in a simulation study, but also to maximize the information gained from the study. The important factors can be ascertained and ranked, and their main effects and interactions can be estimated accurately. The designs compared provided consistent results, and comparison with the exhaustive case confirmed their usefulness. In the specific case of gas cap blowdown, the process was well captured by the designs employed. A high gas rate in the presence of high horizontal permeability, a small gas cap, large water support, and a central location of the well were found to be the conditions for high recovery.

The regression obtained is strictly valid only for this case within the ranges of the factor values chosen, but its application may be tested in more general cases. The relative ranking of the factors and their interactions is intuitively satisfying and lends weight to future applications of this tool for gas cap blowdown studies. In this study only one response was considered, but the same methodology can easily be extended to include more responses, providing an even more useful approximation to the flow simulator.

Choosing an optimum time for gas cap blowdown is a challenging task; a future study can address this issue. During gas cap blowdown, oil rises into the original gas cap zone as gas is produced from the reservoir, and hence the hysteresis of the relative permeability curves becomes important. The impact of this hysteresis is an important aspect of this problem.

An immediate application could be choosing appropriate probability distributions for the factors used in this study and using the regressions obtained to perform uncertainty analysis on the response through Monte Carlo simulation (Wang, 2001). The key requirement of this approach is the choice of physically meaningful distributions for the factors in order to achieve practical applicability. The regressions obtained through the different designs in this study can also be used for an optimization study, and the validity of the regression should be checked on a real field case.
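Such a Monte Carlo step is straightforward once a regression is in hand. The sketch below (Python/numpy; the polynomial coefficients and the factor distributions are purely illustrative placeholders, not the regression fitted in this study) propagates sampled factor values through a response surface to obtain percentiles of the recovery response:

```python
import numpy as np

rng = np.random.default_rng(1)

def recovery(kh, gr, wi, lh):
    # Illustrative response surface in coded (-1..+1) units; these
    # coefficients are placeholders, not the thesis regression.
    return (10.0 + 2.5 * kh + 1.2 * gr - 1.8 * wi - 0.9 * lh
            + 0.6 * kh * gr - 0.5 * gr * lh)

n = 10_000
kh = rng.uniform(-1, 1, n)        # log horizontal permeability (coded)
gr = rng.uniform(-1, 1, n)        # specified gas rate (coded)
wi = rng.triangular(-1, 0, 1, n)  # oil/water volume ratio (coded)
lh = rng.uniform(-1, 1, n)        # horizontal well location (coded)

samples = recovery(kh, gr, wi, lh)
p10, p50, p90 = np.percentile(samples, [10, 50, 90])
print(round(p10, 2), round(p50, 2), round(p90, 2))
```

Because each evaluation of the regression is effectively free compared with a flow simulation, tens of thousands of samples can be drawn, which is what makes the response-surface proxy attractive for uncertainty analysis.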


Nomenclature

AQ = regression factor corresponding to aquifer activity coefficient
b = unbiased estimate of β
B = volumetric factor, rb/stb
C = covariance matrix
E = expected value
f = true response function
F0 = F-test statistic for the null hypothesis
Fα = F-test statistic for a given α-value
G = gas in place (standard conditions), scf
GR = regression factor corresponding to gas rate specified for the well
H0 = null hypothesis
H1 = alternative to the null hypothesis
KH = regression factor corresponding to natural logarithm of horizontal permeability
LH = regression factor corresponding to horizontal location of the well
m = initial relative reservoir gas cap volume (compared to reservoir oil volume), ratio
M = regression factor corresponding to m
MS = mean square value
n = number of simulation runs (or experiments)
N = oil in place (standard conditions), stb
p = number of (independent) parameters or factors
P = P-value
R = ratio of cumulative oil production in 8 years to initial oil pore-volume, 10^-4 sm3/rm3
R2 = coefficient of multiple determination
R2adj = R2 adjusted for number of terms in the model
S = reservoir saturation by fluid (oil, gas or water), fraction
S̄ = average saturation, fraction
SS = sum of squares
t0 = t-test statistic for the null hypothesis
V = volume
W = water (or aquifer) volume
WI = regression factor corresponding to oil volume compared to water volume, ratio
X = regression matrix, or model matrix
y = response (of the process)

Greek Symbols
α = cut-off value for significance testing
β = vector of size equal to the number of runs, containing the coefficients of the model
ε = error
ξ = natural variables in natural units of measurement
μ = expected value of response
σ2 = variance of error
σ̂2 = estimate of variance of error
φ = porosity

Subscripts
ab = pore space of abandonment oil rim
E = due to error (SS or MS)
i = index
g = gas
gi = gas, initial, scf
j = index
k = index
o = oil
oi = oil, initial, stb
or = oil, residual, fraction
org = oil, residual after displacement by gas, fraction
orw = oil, residual after displacement by water, fraction
o,g = oil in gas cap
o,ab = oil at abandonment
w = water
max = maximum value
min = minimum value
T = total

Superscripts
-1 = inverse of a matrix
′ = transpose of a matrix

References

Azancot, M., Ward, R., and Gibson, S., "The Role of Subsea Systems as an Integral Part of Occidental's North Sea Development Strategy," paper SPE 17622, SPE International Meeting on Petroleum Engineering, Tianjin, China, November 1-4, 1988.

Behrenbruch, P. and Mason, L.T., "Optimal Oilfield Development of Fields With a Small Gas Cap and Strong Aquifer," paper SPE 25353, SPE Asia-Pacific Oil and Gas Conference and Exhibition, Singapore, February 1993.

Behrenbruch, P., BHP Petroleum, Personal Communication, 2000.

Box, G.E.P. and Behnken, D.W., "Some New Three Level Designs for the Study of Quantitative Variables," Technometrics, November 1960.

Box, G.E.P., Hunter, W.G., and Hunter, J.S., Statistics for Experimenters, John Wiley & Sons, 1978.

Box, G.E.P. and Draper, N.R., Empirical Model-Building and Response Surfaces, John Wiley & Sons, 1987.

Chu, C., "Optimal Choice of Completion Intervals for Injectors and Producers in Steamfloods," paper SPE 25787, International Thermal Operations Symposium, Bakersfield, California, February 1993.

Cook, R.D. and Nachtsheim, C.J., "A Comparison of Algorithms for Constructing Exact D-Optimal Designs," Technometrics, Vol. 22, No. 3, August 1980.

Damsleth, E., Hage, A., and Volden, R., "Maximum Information at Minimum Cost: A North Sea Development Study With Experimental Design," JPT, December 1992.

Dejean, J.P. and Blanc, G., "Managing Uncertainties on Production Predictions Using Integrated Statistical Methods," paper SPE 56696, SPE Annual Technical Conference and Exhibition, Houston, Texas, October 1999.

Friedmann, F., Chawathe, A., and Larue, D., "Assessing Uncertainty in Channelized Reservoirs Using Experimental Designs," paper SPE 71622, SPE Annual Technical Conference and Exhibition, New Orleans, Louisiana, September-October 2001.

Galil, Z. and Kiefer, J., "Time- and Space-Saving Computer Methods, Related to Mitchell's DETMAX, for Finding D-Optimum Designs," Technometrics, Vol. 22, No. 3, August 1980.

Kuppe, F.C., Chugh, S., and Kyles, J.D., "Gas Cap Blowdown of the Virginia Hills Belloy Reservoir," presented at the 49th Annual Technical Meeting of the Petroleum Society, Calgary, Alberta, Canada, June 8-10, 1996.

Lee, B., "Samarang K5/7 Reservoir Simulation Study," paper SPE 25351, SPE Asia Pacific Oil and Gas Conference and Exhibition, Singapore, February 8-10, 1993.

Myers, R.H. and Montgomery, D.C., Response Surface Methodology, John Wiley & Sons, 1995.

Manceau, E., Mezghani, M., Zabalza-Mezghani, I., and Roggero, F., "Combination of Experimental Design and Joint Modeling Methods for Quantifying the Risk Associated With Deterministic and Stochastic Uncertainties - An Integrated Test Study," paper SPE 71620, SPE Annual Technical Conference and Exhibition, New Orleans, Louisiana, September-October 2001.

Montgomery, D.C. and Peck, E.A., Introduction to Linear Regression Analysis, John Wiley & Sons, 1982.

Narayanan, K., White, C.D., Lake, L.W., and Willis, B.J., "Response Surface Methods for Upscaling Heterogeneous Geologic Models," paper SPE 51923, SPE Reservoir Simulation Symposium, Houston, Texas, February 1999.

Narayanan, K., Applications for Response Surfaces in Reservoir Engineering, MS Thesis, University of Texas at Austin, 1999.

Plackett, R.L. and Burman, J.P., "The Design of Optimum Multifactorial Experiments," Biometrika, Vol. 33, Issue 4, June 1946.

Poland, J., Mitterer, A., Knodler, K., and Zell, A., "Genetic Algorithms Can Improve the Construction of D-Optimal Experimental Designs," in N. Mastorakis (Ed.), Advances in Fuzzy Systems and Evolutionary Computation (Proceedings of WSES EC 2001), pp. 227-231.

Rivera, N., Kumar, A., Kumar, A., and Jalali, Y., "Application of Multilateral Wells in Solution Gas-Drive Reservoirs," paper SPE 74377, SPE International Petroleum Conference and Exhibition, Villahermosa, Mexico, February 2002.

Starzer, M.R., Tenzer, J.R., Larson, J.W., Bunch, B.C., and Boehm, M.C., "Blowdown Optimization for the East Coalinga Extension Field, Coalinga Nose Unit, Fresno County, California," paper SPE 29667, SPE Western Regional Meeting, Bakersfield, California, March 1995.

Vo, D.T., Marsh, E.L., Sienkiewicz, L.J., and Mueller, M.D., "Gulf of Mexico Horizontal Well Improves Attic Oil Recovery in Active Water Drive Reservoir," paper SPE 35437, SPE/DOE Tenth Symposium on Improved Oil Recovery, Tulsa, Oklahoma, April 1996.

Wang, F., Designed Simulation for Turbidite Reservoirs Using the Bell Canyon 3D Data Set, MS Thesis, University of Texas at Austin, 2001.

White, C.D., Willis, B.J., Narayanan, K., and Dutton, S.P., "Identifying Controls on Reservoir Behavior Using Designed Simulations," paper SPE 62971, SPE Annual Technical Conference and Exhibition, Dallas, Texas, October 2000.


Appendix A

Generating a Half-Fraction Design

Generation of a half-fraction design is considered here in order to convey how fractional designs are generated in general, and also to give a better understanding of the reasons that make them economical as well as informative. The description in this section is based on Box et al. (1978).

The number of runs required by a full 2^k factorial design increases geometrically as k is increased. When k is not small, the desired information can often be obtained by performing only a fraction of the full factorial design. In other words, there tends to be redundancy in terms of an excess number of interactions that can be estimated, and sometimes an excess number of variables that can be studied. Fractional factorial designs exploit this redundancy. The example below illustrates what effects can be estimated using only a half-fraction of a 2^5 factorial design, thus reducing the task to performing only 2^4 = 16 experiments.

Let there be five variables denoted by 1 through 5, each set at a high and a low value denoted by + and -. The procedure for the half-fraction design is as follows:
1. The full 2^4 factorial design is written for variables 1, 2, 3, and 4.
2. The column of signs for the 1234 interaction is written, and these signs are used to define the levels of variable 5. This is denoted as 5=1234, which is said to be the generator of the design.
The interaction columns are obtained by multiplying the columns of the corresponding variables; the 12 interaction, for instance, is obtained by multiplying the columns under 1 and 2, and similarly for higher-factor interactions. Figure 17 presents the generation of the half-fraction design, 2^(5-1), from the full 2^4 design.


          Variables
Run    1    2    3    4    5=1234
  1    -    -    -    -      +
  2    +    -    -    -      -
  3    -    +    -    -      -
  4    +    +    -    -      +
  5    -    -    +    -      -
  6    +    -    +    -      +
  7    -    +    +    -      +
  8    +    +    +    -      -
  9    -    -    -    +      -
 10    +    -    -    +      +
 11    -    +    -    +      +
 12    +    +    -    +      -
 13    -    -    +    +      +
 14    +    -    +    +      -
 15    -    +    +    +      -
 16    +    +    +    +      +

Figure 17. Analysis of a Half-Fraction of the Full 2^4 Design; a 2^(5-1) Fractional Factorial Design

To estimate the two-way interaction 12, the signs of columns 1 and 2 are multiplied to obtain the signs for 12, treating + as +1 and - as -1. This column is multiplied by the column containing the measurements for the corresponding runs, and the sum of the values in the resulting column, divided by 8, gives the estimate of the 12 interaction effect. (The divisor is 8 because the 12 interaction effect is the difference between two averages of eight results.) Of course, this estimate is limited by the approximation inherent in the design, namely neglecting three-factor and higher-order interactions.

With this half-fraction design, 16 runs are performed and 16 quantities estimated: the mean, the 5 main effects, and the 10 two-way interactions. Since the full factorial design calls for 32 runs, the remaining 16 effects, consisting of the 10 three-way interactions, the 5 four-factor interactions, and the 1 five-factor interaction, are not estimated. An attempt to measure the 123 interaction by multiplying the signs in columns 1, 2, and 3 provides (written row-wise to save space)
123= -++-+--+-++-+--+

This is found to be identical to the 45 interaction. Hence,

45= -++-+--+-++-+--+

Thus, 123=45. This means that the three-way interaction 123 is indistinguishable from the two-way interaction 45 in this design; in other words, these two effects are confounded. Equivalently, 123 and 45 are said to be aliases. In fact, on inspection it may be found that this design lumps together two-factor interactions with three-factor interactions, but does not confound main effects with two-way interactions. Methods for arriving at the confounding pattern of a design, other than the tedious method of inspection, are described in detail in standard texts (Box et al., 1978). The half-fraction design is the simplest fractional design, but the same basic principles hold for the construction of other fractional designs, e.g. 2^(k-p) or 3^(k-p).
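The confounding pattern is easy to verify numerically. A small sketch (Python/numpy, assuming the standard run order with variable 1 varying fastest) builds the 2^(5-1) design from the generator 5=1234 and confirms that the 123 and 45 sign columns coincide:

```python
import numpy as np
from itertools import product

# Full 2^4 factorial in standard order (variable 1 varies fastest), signs -1/+1
runs = np.array(list(product([-1, 1], repeat=4)))[:, ::-1]
one, two, three, four = runs.T
five = one * two * three * four  # generator: 5 = 1234

# The 123 and 45 columns are identical, i.e. the two effects are aliased
col_123 = one * two * three
col_45 = four * five
print(''.join('+' if s > 0 else '-' for s in col_123))  # prints -++-+--+-++-+--+
print(np.array_equal(col_123, col_45))
```

Algebraically this is immediate: 45 = 4(1234) = 123(4·4) = 123, since any column multiplied by itself gives the identity column of all +'s.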


Appendix B

Algorithms for Constructing D-Optimal Designs

Most algorithms for generating D-optimal designs are heuristics that sequentially build a good set of design points from a set of candidate points, the final set representing the optimal design. The set of candidate points consists of all the factor-level combinations that may potentially be included in the design. The candidate set need not comprise the full factorial design; for larger problems, a fractional factorial design may be a good set of candidate points. The algorithm searches the candidate points for a set of N design points that is optimal according to a given efficiency criterion. N has to be specified by the user, but it must be greater than the number of parameters (Kuhfeld, 1997).

Dykstra (1971) presented a sequential search algorithm that starts with an empty design and adds candidate points so as to maximize the chosen efficiency criterion at each step. Since it starts with an empty set, it always finds the same design for a given problem. Moreover, even though it is fast, it is not very reliable in finding a globally optimal design (Kuhfeld, 1997). Cook and Nachtsheim (1980) compared the relative performance of several algorithms. The Wynn-Mitchell algorithm (Mitchell and Miller, 1970; Wynn, 1972) and the Van Schalkwyk algorithm (Van Schalkwyk, 1971) were found to be the fastest methods for generating designs of acceptable efficiency, whereas the Fedorov algorithm (Fedorov, 1969, 1972) produced the most efficient designs. Mitchell (1974) generalized the Wynn-Mitchell procedure to the DETMAX algorithm. Cook and Nachtsheim (1980) also proposed a modified Fedorov algorithm that was found to be almost as efficient as the Fedorov algorithm but nearly twice as fast.

If a candidate xi is defined by a point ui in the input space, such that i = 1, 2, ..., n, the n candidate points are defined by the d-dimensional Euclidean input space and a regression model.
For a choice of p < n candidates, let the set of chosen candidates be ζ = (j1, ..., jp) ∈ {1, ..., n}^p; we write |ζ| = p and define the design matrix X_ζ = (x_j1 ... x_jp)′, the matrix composed of the chosen candidates j1, j2, ..., jp. The braces denote the input space. It may be noted that this notation follows Poland et al. (2001).


The Fedorov algorithm starts with an N-point nonsingular design, say X^(1). During the ith iteration, a design point is removed from X^(i-1) and a candidate point is added to it, so that the resulting increase in the determinant of the information matrix X^(i)′X^(i) is maximal (Cook and Nachtsheim, 1980). This method requires that all candidate-point and design-point pairs be checked before making the swap, which makes it slower than DETMAX. The DETMAX algorithm starts with a random design of size p (the actual number of candidates). The value of p changes during the process as additions to and deletions from the design are made, depending on whether they improve the design (based on the criterion to be maximized). The algorithm stops when there is no further improvement.

In a recent development, genetic algorithms were found to improve the construction of D-optimal experimental designs (Poland et al., 2001). They can provide a way to obtain practical designs that are good with respect to more than one optimality criterion. Commercial statistical software packages usually implement more than one algorithm; the user can specify one algorithm to initialize the design, which can then be improved further by applying another method.
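As a concrete sketch of the exchange idea (Python/numpy; a simplified exchange that accepts any determinant-improving swap, not the full Fedorov or DETMAX procedure), consider selecting N = 6 points from a 3^3 candidate grid for a first-order model with intercept:

```python
import numpy as np

rng = np.random.default_rng(2)

def d_criterion(X):
    # D-optimality criterion: determinant of the information matrix X'X
    return np.linalg.det(X.T @ X)

# Candidate set: all 27 points of a 3-level grid in 3 factors,
# expanded for a first-order model with intercept (4 parameters)
levels = np.array([-1.0, 0.0, 1.0])
grid = np.array(np.meshgrid(levels, levels, levels)).T.reshape(-1, 3)
candidates = np.column_stack([np.ones(len(grid)), grid])

N = 6  # design size chosen by the user; must exceed the parameter count

# Start from a random nonsingular N-point design, as Fedorov's method requires
idx = list(rng.choice(len(candidates), N, replace=False))
while np.linalg.matrix_rank(candidates[idx]) < candidates.shape[1]:
    idx = list(rng.choice(len(candidates), N, replace=False))
d_start = d_criterion(candidates[idx])

# Exchange loop: accept any design-point/candidate swap that increases
# det(X'X); stop when no swap improves the design
improved = True
while improved:
    improved = False
    for i in range(N):
        for j in range(len(candidates)):
            trial = idx.copy()
            trial[i] = j
            if d_criterion(candidates[trial]) > d_criterion(candidates[idx]) + 1e-9:
                idx = trial
                improved = True

print(d_criterion(candidates[idx]) >= d_start)
```

The true Fedorov algorithm differs in that it evaluates all removal/addition pairs and makes only the single best swap per iteration; the greedy variant above merely illustrates why each exchange can only increase det(X′X).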


Appendix C

Example ECLIPSE Data File

RUNSPEC
TITLE
 skua (regular grids, non-dipping)
DIMENS
-- dimensions of the model: NX NY NZ
 49 9 27 /
-- specify the phases present
OIL
GAS
WATER
DISGAS
METRIC   -- unit specification

START
-- starting date for simulation run
 1 'JAN' 2001 /
-- some other sizes and dimensions necessary for memory allocation:
EQLDIMS
-- equilibration table size (not needed yet)
 1 100 10 1 20 /
TABDIMS
-- size of saturation and pvt tables
 1 1 40 40 /
WELLDIMS
-- max numb of WELLS/CONN per WELL/GROUPS/WELLperGROUP
 2 5 1 2 /
NSTACK
-- usually 10
 25 /
AQUDIMS
-- mxnaqn mxnaqc niftbl nriftb nanaqu ncamax
 0 0 1 36 1 441 /
UNIFOUT

GRID
DXV
 49*55.56 /
DYV
 9*97.96 /

DZ
 11907*2.8 /
PERMX
 11907*50 /
PERMY
 11907*50 /
PERMZ
 11907*50 /
MULTIPLY
 PERMZ 0.01 /
/
PORO
 11907*0.21 /
BOX
 1 49 1 9 1 1 /
TOPS
 441*2258.5 /
ENDBOX
-- request init and grid file, necessary for post processing of the simulation
--INIT
--GRIDFILE
-- 2 /
NOGGF


-- == pvt and relperm tables ==

-- pvt data
GRAVITY
 42.0 1.03 0.72 /
-- SK.PROPS11 data:
-- PVT Properties from Sk-3 PVT Report
-- From Table 16 Diff Vapn at 205 degF Field Adjusted
-- Converted to Metric

-- Live Oil Properties (With Dissolved Gas)
PVTO
-- Rs      Pbub     Bo      Visc
  28.60    39.97   1.114   0.607 /
  53.01    76.86   1.176   0.500 /
  80.40   116.36   1.244   0.421 /
 108.40   154.22   1.315   0.370 /
 138.93   190.90   1.390   0.332 /
 173.48   228.89   1.480   0.300
          231.78   1.479   0.301
          245.09   1.474   0.303 /
/

-- Dead Gas Properties (No Vapourized Oil)
PVDG
-- Pg      Bg        Visc
  76.86   0.01513   0.01420
 116.36   0.00977   0.01560
 154.22   0.00730   0.01710
 190.90   0.00590   0.01865
 228.89   0.00496   0.02056
 245.09   0.00465   0.02150 /

-- Water PVT Properties (from HP Fluids PAC)
-- Assumptions: 9 scf/bbl gas, salinity (NaCl) 90,000 ppm
PVTW
-- Pressure  FVF   Compress   Viscosity  Viscosibility
 227.586  1.04  43.5E-06  0.0 /

-- Rock Properties (from HP Fluids PAC)
ROCK
-- Pressure  Compressibility
 227.586  50.75E-06 /

-- satn function tables
SWFN
-- Sw    krw    Pcw
 0.15   0.00   0.0
 0.80   0.21   0.0
 1.00   1.00   0.0 /

SOF3
-- So    krow   krogcw
 0.00   0.00   0.00
 0.20   0.00   0.00
 0.85   0.85   0.85 /

SGFN
-- Sg    krg    Pcg
 0.00   0.00   0.0
 0.15   0.00   0.0
 0.85   0.90   0.0 /

SOLUTION
EQUIL
-- DATA FOR INITIALISING FLUIDS TO POTENTIAL EQUILIBRIUM
-- DATUM   DATUM   OWC     OWC   GOC     GOC   RSVD   RVVD   SOLN
-- DEPTH   PRESS   DEPTH   PCOW  DEPTH   PCOG  TABLE  TABLE  METH
 2286.5   228.6   2314.5  0     2278.1  0     1      0      0 /

RSVD
-- variation of initial rs with depth
-- depth    rs
 2286.5   178.1
 2500     178.1 /

-- aquifer: carter-tracy, non-dip, irregular skua
AQUCT
-- No  depth   prs  perm  poro  compress.  Rin   thick  angle  PVT  infTab  salt
 1  2286.5  1*  600.  0.11  4.E-5  3.E3  340.  1*  1  1 /

AQUANCON
-- aquifer no  I1 I2 J1 J2 K1 K2  connecting face  influx coeff  influx coeff multiplier
 1  1 49  1 9  27 27  'K+'  1 /



--RPTRST
-- 'BASIC=4'
--/

-- == WELL ==
WELSPECS
-- WELL     GROUP    LOCATION   BHP     PI
-- NAME     NAME     I    J     DEPTH   DEFN
 'PROD1'  'DUMMY'   13   5   1*  'GAS'  2*  'STOP' /
/

 7 7 'OPEN' 0 1* 0.216 3* /
/


-- timesteps can be refined by entering multiple TSTEP keywords
TSTEP
-- and run it for 8 yrs
 8*365. /

END