Quantitative Genetic Epidemiology

Estimating Equations

Version 1.0

Quantitative Genetic Epidemiology Group
Fred Hutchinson Cancer Research Center

Contents

CURRENT VERSION
  1. INTRODUCTION
  2. OVERVIEW OF EE
  3. BACKGROUND
  4. HOW TO USE EE
    1. Preparing the input data
    2. Mean,Variance, and Correlation link functions
      1. Mean link functions
      2. Variance link functions
      3. Correlation link functions>
    3. Specifying options in control files
      1. Required arguments
      2. Optional arguments
    4. Interpreting the output
  5. EXAMPLES
    1. Binary outcome
      1. A univariate example
      2. A multivariate example
    2. Poisson outcome
    3. Continuous outcome
  6. USING EE METHODS IN SPLUS
    1. Introduction
    2. Installation and use of the EE function in Splus
  7. TECHNICAL INFORMATION
    1. Program details
    2. Methodological development of estimating equation
    3. References for EE
  8. LITERATURE

CURRENT VERSION

The current version of EE is 1.0 dated Oct. 25, 1994. This is the first version of the program.

1. INTRODUCTION

EE, standing for estimating equations, consists of several programs that are useful for regression analysis of univariate or multivariate responses with multiple covariates. Regression analysis is one of the most commonly used statistical techniques because a vast number of different applications can be structured and analyzed as a regression problem, such as simple comparisons of means between two groups by t-test or between multiple groups by analysis of variance, somewhat more complex problems arising from longitudinal studies, and clustered sampling survey studies or family studies. This generality is the essential motivation for developing EE as a general package of regression analysis.

Traditionally, regression techniques have also been categorized by the specific link functions that depict the relationship between outcomes and covariates. For example, linear regression assumes that a linear combination of covariates is linearly associated with or determines the average of the outcomes. Logistic regression assumes that the average of the outcomes is the logit of a linear combination of covariates, and is often used for analyzing binary outcomes.

While many regression techniques, and hence relevant computer packages, have been developed for regression analysis, EE has a number of unique features. First of all, EE is developed based on an estimating equation technique, which differs from the method of maximum likelihood, or quasi-likelihood. The method of estimating equations requires only assumptions on relevant moments rather than knowledge of the entire distribution required for constructing likelihood functions or unneccessary assumptions of moments in constructing quasilikelihood. Hence, the estimates obtained by this technique are valid under much more general condition than those by the other techniques, i.e., they are more robust. Secondly, EE permits simultaneous regression analysis of means, variances, and covariances on covariates, the generality of which sets it apart from two other programs, GEE and GEE2, currently available.

Historically, the idea of robust regression may be traced back to Godambe (1960), Huber (1967) and White (1982). However, it was not until 1986 when Liang and Zeger (1986) and Zeger and Liang (1986) re-invented the estimating equation approach for longitudinal studies that the concept of estimating equations became widely accepted and the technique frequently used with implemented GEE programs. Extending Liang and Zeger’s idea, Prentice (1988) proposed the joint estimation of parameters in both means and covariances. Soon after, Prentice and his colleagues identified a partly exponential family and a quadratic exponential family under which the proposed estimating equations are simply score estimating equations, bridging the gap between the likelihood principle and the method of estimating equations (Zhao and Prentice 1990, Prentice and Zhao 1992, Zhao and Prentice 1992, Zhao, Prentice and Self 1993). While bridging the theoretical gap, Zhao and Prentice have extended the usual regression of the marginal means to the joint regression of means, variances and covariances on covariates. The implementation of EE is largely based on Prentice and Zhao’s generalization
Back to Top.

2. OVERVIEW OF EE

EE consists of several programs, all of which can be used to perform univariate and multivariate analysis. Specifically, when a single response is identified, EE automatically chooses the programs for univariate analysis. In performing the univariate analysis, one has the option of regressing only the mean of the response on covariates or both the mean and variance of the response on covariates, with a choice of a number of link functions, including linear, logistic and exponential. When a vector of multivariate responses is supplied, EE automatically selects programs for multivariate analysis. In the multivariate analysis, one has the option of focusing on the marginal means of the outcomes, or both the marginal means and covariances of the outcomes. In either cases, one has a choice of link functions for the marginal means, variances and covariances, as well as a choice of covariates to be included in their respective mean link functions. The flexible choice of covariates into these first two moments permits us to explore problems ranging from longitudinal studies, studies with repeated measurements, or studies with multiple outcomes.
Back to Top

3. BACKGROUND

In contrast to the method of maximum likelihood, the method of estimating equations requires only assumptions of relevant moments such as means, variances, and correlations. Let y¢ =(y1,...,ym) denote a vector of m outcomes, which may be a single outcome in the univariate analysis, or may be a vector of any length in a multivariate analysis. Also, for i=1,...,m, let xi¢ =(xi1,...,xip), denote a vector of p covariates associated with the expected value of yi, zi¢ =(zi1,...,ziq) a vector of q covariates associated with the variance of yi, and zzk¢ =(zzk1..,zzkr), k=1,...,m(m-1)/2, a vector of r covariates associated with correlations between yi and other outcomes yj. The vectors xi, zi , and zzk may consist of some of the same covariates. The objective of the regression analysis is to study the dependence of the outcomes, y, on these covariates. Most applications focus on the dependence of the averaged outcomes, i.e., their expectation, on the covariates, and this relationship may be quantified via

where a ¢ =(a 1,...,a p) is a vector of p regression coefficients, ¢ is the transpose operator, and g(× ) is a specified link function.

Similarly, the variance and correlation of outcomes can be expressed as functions of covariates as follows:

where f(.) and h(.) are specified link functions for variances and corrlelations and =(b1,...,bq) and =(g1,...,gr) are vectors of q and r regression coefficients, respectively.

Using this general framework, one can model the dependencies of the outcomes on covariates in term of the means, variances and covariances. In statistical modeling, one needs to rationalize link functions on the basis of the subject matter. For example, in choosing a link function for means, a logistic regression model is often considered for binary outcomes or binomial counts. For Poisson counts, an exponential regression model can be considered, while a linear regression model can be considered for continuous outcomes. Selection of covariates follows the usual principle.

For a specified model, the goal is to estimate the regression coefficients. Let yk =(yk,1,...,yk,nk), k=1,...,K, be a sample of K independant outcomes with associated covariates (xki, zki, zzki), i=1,...mk, and let (a , b , g ) denote the estimated regression coefficients. The estimates satisfy the equations

where the summation is over all K independent observations, Wk is a weight matrix, m k=E(yk),

sk=(sk,12, sk,1sk,2 , ... , sk,1sk,nk , sk,22,.., sk,nk2), sk,i=(yki-m ki), s k = E(sk), and Dk is a derivative matrix of (m k,s k) with respect to (a , b , g ). Solving this set of estimating equations, one obtains an estimate of the regression coefficients.

In general, there is no explicit solution. The iterative Newton-Raphson algorithm may be used to solve this set of estimating equations. Starting from an initial value, (a 0, b 0, g 0), a new value, (a 1, b 1, g 1), is obtained via

where u is the function described above, I is its derivative with respect to the regression

coefficients, and u0 and I0 are values of these functions evaluated at (a 0, b 0, g 0). The above algorithm iterates until either |u0| < e or |a1-a0| + |b1-b0| + |g1-g0| < e , where e is some prespecified tolerance.
Back to Top

4. HOW TO USE EE

  • To use the EE program, follow these steps:
    1. Prepare your input data. The EE program can accept ASCII or SAS xport data files in addition to pre-processed system files. ASCII and SAS xport files are copied and converted in a temporary system formatted file internally each time the EE program is run. The user can pre-process ASCII files using the asc2eed program as follows:
           
               asc2eed ASCII-file-name

      This will automatically generate a data file in the system file format , which is required by the ee program, in the current working directory. The data file name will be ASCII-file-name.eed.
    2. Create a control file containing the model specifications and input options you desire.
    3. Run the program with the command

             ee file.eec

      where file.eec is the name of the control file. A SAS macro is also available which facilitates the use of the ee program inside the SAS system. This is described in detail below.
  • Inputs to the program are provided via a control file, which specifies the name of the file containing the data in the system file format. Other input options include specification of the mean, variance and correlation links to be used, tolerance values for convergence criteria for mean, variance and correlation parameters, and initial values for the parameters to be estimated. The output of the program is written to a file.

    Specifications for the control file and input data are detailed in the sections below. See the next chapter for examples of how to perform specific regression analyses.
    Back to Top

    4.1 PREPARING THE INPUT DATA

    The input data suitable for analysis with EE includes a response (univariate or multivariate), a group identification variable if the response is multivariate, and covariates which possibly explain the variability of the response and the working variances. The response can be continuous or discrete. If any transformation of the response or covariates is required, it must be done beforehand and the resulting values included as part of the input data.

    Only one outcome variable can be used per analysis. The input data file can include more variables than will be used in any one model. The same data file can thus be used to explore different models, with appropriate specification of variables to be used in different runs of the program. This program cannot process missing observations. If there are any missing values, they must be removed from the data set before converting it to the system file format.

    The EE program can read data from a data file in the system file format, with the .eed suffix, as well as ASCII or SAS xport files. If desired, the practitioners can use asc2eed program to translate ASCII data to data in the system file format as described in above section. The ASCII data must be represented in the ASCII file with one observation per line and fields separated by spaces. An example of multivariate ASCII data file format is as follows:

    Figure 1.     Sample ASCII data format 
    
      groupid   outcome(response)  var1  var2 ....  varp (covariates)
    
        1          y11             x111  x112 ....  x11p
        1          y12             x121  x122 ....  x12p
        .           .                .           .     .
        .           .                .           .         
        .           .                .           .     .                                                 ......     
        1                        ....  2 
        2          y21             x211  x212  ....  x21p  
        2          y22             x221  x222  ...   x22p
        
        .           .                                     
        .           .                                    
        .           .                               
        i          yij             xij1  xij2  ...   xijp
    

    where i = 1, 2, ..., K and j = 1, 2, ..., mi

    Univariate ASCII data files are similar to multivariate ones, but there is no groupid variable in the file. Nor is there a group concept for response and covariate variables. Each row in file is just considered as an independent observation.
    Back to Top

    4.2 MEAN, VARIANCE AND CORRELATION LINK FUNCTIONS

    The mean, variance and correlation link functions together specify the model to be fit for the data. The mean link function specifies how the mean of the data depends on a linear combination of a set of predictor variables. The variance link function can specify the relationship between the working variances used in the estimating equations and a set of covariates. The correlation link function sets the functional form between the working correlation and a set of covariates.

    To simplify the presentation, mean and variance link functions are described separately for the univariate and multivariate situations.

    4.2.1 Mean link functions

    4.2.1.1 Univariate situation:

    Let m i signify the mean for each observation i=1,...,K, and (xi1,...,xip) be the set of covariates on which ui depends. The functions available for the mean link functions are:

    Linear link function: m i = a 0+a 1xi1 + ··· + a pxip

    Exponential link function: m i= exp(a 0+a 1xi1 + ··· + a pxip)

    Logistic link function: m i = 1/[1 + exp(a 0+a 1xi1 + ··· + a pxip)]

    Note that the intercept term is included in all three link functions. To fit models without an intercept term, there is an option which can be included in the command file which indicates that no intercept should be included in the linear predictor.
    Back to Top

    4.2.1.2 Multivariate situation:

    Let m ij, i=1,...,K and j=1,...,mi, where K is the group number in the data file and mi is the observations in ith group, signify the jth mean element in the ith group, and (xij1, ..., xijp) be the set of covariates on which m ij depends. The functions available for the mean link functions are:

    Linear link function: m ij = a j0+a j1xij1 + ··· + a jpxijp

    Exponential link function: m ij= exp(a j0+a j1xij1 + ··· + a jpxijp)

    Logistic link function: m ij = 1/[1 + exp(a j0+a j1xi1 + ··· + a jpxijp)]

    As in the univariate case, the intercept is included in the linear predictor. The option of fitting a model without an intercept term is available through a command which can be included in the control file. All or some of a jk for j=1,...,mi may be set equal.
    Back to Top

    4.2.2 Variance link functions

    4.2.2.1 Univariate situation:

    For each observation i=1,...,K, a variance s i is calculated and used in the estimation. The variances are calculated according to the specified variance function. Some variance functions allow for dependence on an additional specified set of covariates (zi1, 1/4 , ziq). The functions available for variance functions are:

    Constant variance function: s i = s 0

    Fixed variance function: s i = b 1zi1 + b 2zi2 + ··· + b qziq

    Binomial variance function: s

    where mi is the binomial denominator
    Poisson variance function: s i = m i

    Linear variance function: s i =l m i + b 0 + b 1zi1 + ··· +b qziq

    Exponential variance function: s i = m il exp(b 0 + b 1zi1 + ··· + b qziq)

    Logistic variance functions: s i =

    where
           Linear intra-correlation coefficient is g i = b 0 + b 1zi1 + ··· + b qziq

    Hypergeometric intra-correlation coefficient is

    When you specify fixed variance, you must supply the values to be used as the b coefficients in addition to the z covariates. In this special case, if you wish to include an intercept you must explicitly include an intercept variable of all ones in the list of covariates, along with the desired value, as no intercept will be assumed in the model unless specified. No parameters are estimated in this variance link function. Instead, the specified coefficients (indicated as the initial values in the control file) and covariates are used to calculate the values used as variance in the algorithm.
    Back to Top

    4.2.2.2 Multivariate situation:

    For ith group of observations i=1,...,K, a set of variances, s ij, j=1,...,mi, where mi is the number of observations in ith group, are calculated and used in the estimation. The variances are calculated according to the specified variance function. Some variance functions allow for dependence on an additional specified set of covariates (zij1, 1/4 , zijq). The functions available for variance functions are:

    Constant variance function: s ij = s 0

    Fixed variance function: s ij = b j1zij1 + b j2zij2 + ··· + b jqzijq

    Binomial variance function:sij  

    where mij is the binomial denominator
    Poisson variance function: s ij = m ij

    Linear variance function: s ij = l m ij + b j0 + b j1zij1 + ··· + b jqzijq

    Exponential variance function: s ij = m il exp(b j0 + b j1zij1 + ··· + b jqzijq)

     

    Logistic variance function:sij =

    where:

    Linear intra-correlation coefficient is g ij = b j0 + b j1zij1 + ··· + b jqzijq

    Hypergeometric intra-correlation coefficient is

    Note that like the mean link functions, the variance link functions that involve covariate terms all include an intercept term, except for fixed variance link function, where a column of ones must be included in the data if an intercept is desired.
    Back to Top

    4.2.3 Correlation link functions

    If a correlation link function has been specified, it should be in a multivariate situation. Let i=1,...,K signify the ith group of observations. For each pair of observations in this group, j,k=1,...,mi, j¹ k, a correlation s ijk is calculated and used in the weight matrix of the EE algorithm. The correlation are calculated according to the specified correlation function, which can allow for dependence on additional specified sets of covariates(zzij1,...,zzijr) and (zzik1,...,zzikr). The functions available for the correlation functions are as follows:

    Linear correlation function:

    s ijk = (d 0 + d 1|zzij1-zzik1| + ... + d r|zzijr-zzikr|)

    Exponential correlation function:

    s ijk = exp(d 0 + d 1|zzij1-zzik1| + ... + d r|zzijr-zzikr|)

    Hypergeometric correlation function:

    s ijk =

    Back to Top

    4.3 SPECIFYING OPTIONS IN A CONTROL FILE

    The EE program reads all input options from a control file. Inputs to the program include the name of the data file, variable names of covariates and response, and specifications of mean link function, variance link function, and correlation link function. Specify these and other input options in a control file using the format described below. Some arguments are required to run the program, while others need only be supplied if you wish to specify additional options. Figure 2 shows a sample control file. The control file must be a simple text file. If you use a word processor to create and edit your control file, be sure it is saved in ASCII format and not as a word processed document, as the EE program cannot interpret any additional formatting commands placed in a document by a word processing program.

    Figure 2. Sample control file

    Data filename					:  mvn2.eed		
    Output filename					:
    
    Identification variable of cluster		:  groupid
    Outcome variable				:  y
    Denominator variable				:
    
    Covariates in
       Mean	 					:  x1 x2 x3 x4 x5
       Variance					:  z1 z2 z3
       Correlation					:		
    
    Mean  link					:  LIN
    Variance link					:  EXP
    Correlation link				:  
    
    Initial values of parameters in		
          mean function				:
          variance function				:
          correlation function			:
     
    Fixed parameters about dependence of
          variances on means			:		
     
    Convergence criteria
          mean parameters				:  
          variance parameters			:  
          correlation parameters			:
    	
    Maximum number of iterations			: 
    Joint						:
    
    Intercept	
    Back to Top

    4.3.1 Required arguments

    Data filename

    Outcome variable

  • Name of the variable in the data file to be used as the outcome variable.
  • Identification variable of cluster

    Covariates in Mean

  • Name(s) of the variable(s) in the data file to be used as covariate(s) for estimating the mean. The intercept term is automatically included unless the optional arguement described below is specified.
  • Mean link

  • Specification of the mean link function. Users must specify one of the following abbreviations corresponding to the set of available link functions:

    LIN                   linear link
    EXP                  exponential link
    LOGIT              logistic link

  • Back to Top

    4.3.2 Optional arguments

    Output filename

  • Name of the output data file to be created by the program in the current directory. If not specified, output is written to a file with the same name as the input data file with a .eeo extension.
  • Identification variable of cluster

    Denominator variable

  • Name of the variable in the data file to be used as the denominator variable (cluster size) for logistic links. If not specified, default is one for all observations.
  • Variance covariates

  • Name(s) of the variable(s) in the data file to be used as covariate(s) for estimating the variance, for variance functions that include covariates. These can include variables also used in estimating the mean. The intercept term is fit automatically in applicable variance functions, with the exception of fixed variance. This option can be omitted if no additional covariates are desired in the variance function. Note that this argument is required if fixed variance is specified.
  • Correlation covariates

  • Name(s) of the variable(s) in the data file to be used as covariate(s) for estimating the correlation. These can include variables also used in estimating the mean and/or variance. The intercept term is fit automatically in the correlation functions. This option can be omitted if no additional covariates are desired in the correlation function.
  • Variance function

  • Specification of the working variance link function. Users must specify one of the following abbreviations corresponding to the set of available variance functions:
     
    	CONST		constant variance
    	FIXED		fixed variance
    	BIN		binomial variance
    	POIS		Poisson variance
    	LIN		linear variance function
    	EXP		exponential variance function
    	LOGITLIN	logistic variance function with linear intra-correlation coefficient
    	LOGITHG		logistic variance function with hypergeometric intra-correlation
    

    If variance function is not specified, then the default variance function, constant variance, will be used.

  • Correlation function

  • Specification of the correlation link function. Users may specify one of the following abbreviations corresponding to the set of available correlation functions. If the correlation link is not specified, hypergeometric correlation is used.
               LIN                linear correlation
               EXP               exponential correlation
               HYPER          hypergeometric correlation
  • Initial values

          Starting values for the parameters to be estimated in the mean link, variance link, and correlation link functions. If not specified, the defaults are zeros.

         In the univariate situation, this argument is also used to specify the coefficients for terms in fixed variance. This argument is required if the fixed variance link function is specified.

    Dependence of variances on means

         Value for parameter lambda in the linear and exponential variance functions, if a contribution of mean is desired for the working variances. Default is 0.

    Convergence criteria

         Tolerance value for convergence of the parameters in the mean link, variance link, and correlation link. Default values are 0.0001 for mean parameters and 0.001 for variance and correlation      parameters.

    Maximum number of iterations

  • Maximum number of iterations to be performed in the algorithm for updating the parameter estimates. Default value is 20.
  • Joint

  • Specification the working structure of the working matrix to be used for regression. Users may specify one of the following numbers corresponding to the set of available working matrices.

    0            plain univariate or multivariate working matrix
    1            independence working matrix
    2            Gaussian working matrix
    3            extended Gaussian working matrix

    Default value is 0.

  • Intercept

    Back to Top

    4.4 INTERPRETING THE OUTPUT

    The output of the program is written to a file. The output file includes information about the specification of the model, the parameter estimates with standard errors, and covariates for the estimates. Figure 3 shows a sample output file. In the tables for marginal means, marginal variances, and marginal correlations, both model-based and robust estimates of the standard errors of the parameter estimates are given. The SE is the model-based estimate, which is valid if the mean, variance and correlation are correctly specified. The robust SE is from the "sandwich" estimate for the asymptotic variance of the estimated parameters. The z-values are the ratios of the parameter estimates and their robust standard errors, and the p-values for these z-values are from the normal distribution. Under the null hypothesis, the EE parameter estimates follow an asymptotically normal distribution.

    Figure 3.     Sample output file
    
    Data				:  mvn2.eed
    
    Outcome variable		:  y
    Covariates in Mean		:  intercept	x1	x2	x3	x4	x5
    Covariates in Var		:  z1	z2	z3
    
    Mean link			:  Linear link function
    Variance link			:  Exponential variance function
    Correlation link		:  Hypergeometric correlation function
    
    Dependence of Variances on Means	:  0.000000
    epsilon1				:  0.001000
    epsilon2				:  0.010000
    Number of observations			:  500
    Number of groups			:  81
    Maximum group size			:  10
    Minimum group size			:  2
    Number of Fisher Scoring Iterations	:  6
    
    
    =====================================================================
    Marginal Means
    =====================================================================
    
    		Estimate	SE		robust 		SE		z-value		p-value
    intercept       -2.06193	0.10062		0.06410         -32.16744	0.00000
    x1	        -0.92815	0.07406		0.04827         -19.22953	0.00000
    x2		0.07829		0.07766		0.05674		1.37982		0.16764
    x3		1.00097		0.07247		0.04886		20.48720	0.00000
    x4		1.99449		0.06911		0.04271         46.69948	0.00000
    x5		2.93977		0.07052		0.04744		61.97361	0.00000 
    
    			
    Correlation of Coefficients 
    		
    		intercept	x1		x2		x3		x4
    x1		0.109781	
    x2	       -0.078542       -0.013310
    x3		0.096431	0.063342	0.073256
    x4	       -0.031132       -0.087433       -0.189933	0.031581
    x5		0.064436	0.090774	0.193858	0.032531       -0.241995 
    
    
    Covariance of Coefficients 
    		
    		Intercept	x1		x2		x3		x4		x5
    Intercept	0.004109
    x1		0.000340	0.002330
    x2	       -0.000286       -0.000036	0.003219
    x3		0.000302	0.000149	0.000203	0.002387
    x4	       -0.000085       -0.000180       -0.000460	0.000066	0.001824
    x5		0.000196	0.000208	0.000522	0.000075       -0.000490	0.002250
    
          Marginal Variance
    =====================================================================
    
    		Estimate	SE		robust 		SE		z-value		p-value
    intercept  	1.41014		0.01119		0.15355		9.18380		0.00000
    z1		0.13608		0.00655		0.11333		1.20071		0.22986
    z2		0.63862		0.00628		0.16495		3.87168		0.00011
    z3	        -0.46481	0.00574		0.13552         -3.42981	0.00060
    
    
    
    Correlation of Coefficients 
    		
    	intercept		z1		z2
    z1	-0.098094
    z2	-0.687131		0.267283
    z3	0.175155		0.077508	0.310760
    
    Covariance of Coefficients
    		
    		intercept	z1		z2		z3
    intercept	0.023577
    z1	       -0.001707	0.012844
    z2	       -0.017403	0.004997	0.027208
    z3		0.003645	0.001190	0.006947	0.018366
    
    =====================================================================      
    Marginal Correlation
    ===================================================================== 
    		Estimate	SE		robust  	SE		z-value		p-value
    Intercept	0.38540		0.00993		0.19635		1.96281		0.04967
    
    Back to Top

    5. EXAMPLES

    5.1 Binary outcome

    The most common statistical analysis performed on data with a binary outcome is logistic regression. It is also possible to perform a linear regression with binary data, for a different interpretation of the coefficients in the model. The EE approach using the logistic link function and binomial variance function is equivalent to standard logistic regression, while the linear link function together with the constant variance function gives standard linear regression.

    5.1.1 A univariate example

    First we use univariate estimating equations to illustrate logistic regression with the EE program. The data we used are from Meyer and White (1993) in a study on alcohol and nutrients in relation to colon cancer. The covariates include several dietary variables, and the outcome is a binary variable indicating whether or not a subject has colon cancer. A subset of the complete data is stored in an ASCII file called colon.dat whose five beginning and ending lines look like this:

    status sex alcohol fiber kcal fat

    	status   sex  	alcohol  	fiber  	   kcal          fat
    	1         0    10.93611    	41.38175    3108.84    178.109
    	1         1    29.46          	70.24778    4338.851   218.1827
    	1         0     4.131667    	28.56525    2802.585   126.0111
    	1         0     2.451667    	11.8891     1526.369    78.6589
    	1         0     0.130556    	46.08092    2085.224    94.13641
            .         .       .                  .         .         .
            .         .       .                  .         .         .
            .         .       .                  .         .         .
            0         0      0.333333    	21.88417    843.4617   33.04477
            0         0     25.805        	31.12233   2150.008    86.51898
            0         1     10.60667    	26.56195   2893.791   122.3937
            0         1     35.33167    	37.09152   2789.703    97.97728
            0         1     79.62083    	22.94439   2522.076    67.47237
    

    The variables in this file include a variable for the status of colon cancer coded as 1 or 0, sex coded 1 for male and 0 for female, daily alcohol intake in grams (alcohol), daily total dietary fiber intake in grams (fiber), total energy in kilocalories (kcal), and dietary fat in grams (fat). We use the command

          asc2eed colon.dat

    to create a binary data file named colon.eed containing this data in the EE system file format. To perform a logistic regression on this data set, we create a control file named colon.eec that contains these specifications:

    Data filename: 			colon.eed		
    Outcome variable:		status
    Mean covariates:		intercept sex kcal fat fiber alcohol
    Mean  link:			LOGIT
    Variance link: 			BIN
    

    Then the command

    ee colon.eec

    runs the EE program and produces an output file named colon.eeo shown below. The model-based and robust standard errors for the parameter estimates in the table for Marginal Means are very similar in this case because we use the binomial variance function which is specified by the mean. The model-based SEs are valid if the assumed model is correct, while the robust SEs do not require correct specification of the variance.

    According to this analysis, total dietary fiber intake is the only significant contributor to the status of colon cancer from this set of variables, with a p-value of about .03. The negative

    coefficient for this variable indicates that it has a protective effect against colon cancer

    
    Data			: colon.eed
    
    Outcome variable	: status
    Covariates in Mean	: intercept sex       kcal      fat       fiber     alcohol    
    
    
    Mean link		: Logistic mean link function
    Variance link		: Binomial variance link function
    
    Dependence of Variances on Means		: 0.000000
    epsilon1					: 0.000100
    epsilon2					: 0.001000
    Number of observations				: 838
    Number of Fisher Scoring Iterations		: 4
    
    
    =====================================================================
    Marginal Means
    =====================================================================
    		 Estimate     SE         robust        SE        z-value      p-value
    intercept 	-0.01795     0.20616     0.20459    -0.08773     0.93009  
    sex       	-0.06050     0.14816     0.15131    -0.39984     0.68927  
    kcal       	 0.00035     0.00033     0.00033     1.05891     0.28964  
    fat       	-0.00279     0.00581     0.00582    -0.47924     0.63177  
    fiber     	-0.01805     0.00814     0.00828    -2.17914     0.02932  
    alcohol    	 0.00067     0.00517     0.00546     0.12278     0.90228  
    
    Correlation of Coefficients 
    
    		 intercept      sex       kcal       fat        fiber             
    sex       	-0.186219 
    kcal      	-0.292919   -0.093585 
    fat        	 0.155901    0.027950   -0.929452 
    fiber     	-0.249781    0.080118   -0.464397   0.248175 
    alcohol    	 0.165314   -0.096348   -0.834207   0.754208   0.452118 
    
    Covariance of Coefficients 
    
               	intercept      sex         kcal        fat        fiber       alcohol           
    intercept  	 0.041856 
    sex       	-0.005764     0.022894 
    kcal      	-0.000020    -0.000005    0.000000 
    fat        	 0.000186     0.000025   -0.000002    0.000034 
    fiber     	-0.000423     0.000100   -0.000001    0.000012    0.000069 
    alcohol   	 0.000185    -0.000080   -0.000002    0.000024    0.000020    0.000030 
    Back to Top

    5.1.2 A multivariate example

    In this section, we use multivariate estimating equations to perform linear regression with the EE program. Here the data are from a large case-control study conducted in the multiethnic population of Hawaii to compare the colorectal cancer risk of relatives (parents and siblings) of colorectal cancer cases with that of relatives of controls. The outcome variable is colorectal cancer (y) and the group identification (id). The covariates are cc (case 1, control 0), cjap (Japanese case relative 1, other case relative 0), ccau (Caucasians case relative 1, other case relative 0), rage (relative age = (actual age - 50)/10) and page (proband age = (actual age - 50)/10). The results show that relatives of cases have 1.5=exp(0.386) fold increased risk of colorectal cancer compared to relatives of controls. The control file colorect6.eec created to perform a logistic regression on the data contains following specifications:

    Data filename				: colorect6.eed
    
    Identification variable of cluster	: id
    Outcome variable			: y
    Covariates in Mean			: intercept cc cjap ccau rage page
    
    Mean link				: LOGIT
    Variance link				: BIN
    Correlation link			: LIN
    
    The result output file is:
    
    Data			: colorect6.eed
    
    Outcome variable	: y
    Covariates in Mean	: intercept cc        cjap      ccau      rage      page
    
    Mean link		: Logistic mean  link function
    Variance link		: Binomial variance link function
    Correlation link 	: Linear correlation link function
    
    Dependence of Variances on Means	: 0.000000
    
    epsilon1				: 0.000100
    epsilon2				: 0.001000
    Number of observations			: 15523
    Number of groups			: 2377
    Maximum groups size			: 23
    Minimum groups size			: 1
    Number of Fisher Scoring Iterations	: 5
    
    =====================================================================
    Marginal Means
    =====================================================================
    
    		 Estimate      SE       robust          SE       z-value      p-value
    intercept	-4.49777      0.16362   0.15160      -29.66927	 0.00000
    cc		 0.38599      0.23601   0.23203       1.66354    0.09620
    cjap		 0.58089      0.22885   0.22399       2.59335    0.00950
    ccau		 0.70483      0.26262   0.26924       2.61783    0.00850
    rage		 0.23144      0.03896   0.03047       7.59598    0.00000
    page		-0.21644      0.06847   0.07100      -3.04846    0.00230
    
    Correlation of Coefficients
    
    		 intercept     cc           cjap        ccau        rage
    cc	         -0.376105
    cjap		  0.035243    -0.754507
    ccau	         -0.018145    -0.641089    0.669916
    rage	         -0.173857     0.060252   -0.099542   -0.028132
    page	         -0.492877     0.093073   -0.000321   -0.035539    -0.319607
    
    Covariance of Coefficients
    
    		intercept     cc         cjap        ccau        rage       page
    intercept       0.022982
    cc             -0.013229    0.053837
    cjap            0.001197   -0.039214    0.050172
    ccau	 	0.000741   -0.040050    0.040401    0.072491
    rage           -0.000803    0.000426   -0.000679   -0.000231    0.000928
    page	       -0.005305    0.001533   -0.000005   -0.000679   -0.000691    0.005041
    
    
    =====================================================================
    Marginal Correlation
    =====================================================================
    
    		Estimate     SE        robust  	SE    	    z-value      p-value
    intercept	0.04751      0.21422   0.01701  2.79275     0.00523 
    Back to Top

    5.2 Poisson outcome

    Count data that do not arise from a situation with a maximum count possible typically follow a Poisson distribution. To perform Poisson regression using EE, specify the exponential mean link function with Poisson variance link function. The exponential variance link function allows for a power family, with an additional parameter specifying the power of the contribution of the mean in this variance link function.

    The data in the following example are from Laishes and Rolfe (1980). The number of GT-positive liver colonies detected in the ARL lobe of the livers of rats injected with diseased cells is used as the outcome variable. The covariates are the number of cells injected into the rat liver (cells), the number of the particular experiment (1, 2, or 3) (exper), and the section for each observation (the experiment was replicated within each specimen in two different sections) (section). The results of an analysis using EE for Poisson regression are shown below. The section does not appear to have a significant effect on the outcome, while the number of GT-positive colonies increases with increasing number of injected cells.


    Data:      		liver.eed
    
    Outcome variable:     	ARL
    Covariates in Mean:   	cells     exper     section    
    
    
    Mean link:              Exponential link function
    Variance link:         	Poisson variance function
    Dependence of Variances on Means:      1.000000
    
    epsilon1 =                             0.000100
    epsilon2 =                             0.001000
    Number of observations:                52
    Number of Fisher Scoring Iterations:   4
    
    
    =====================================================================
    Marginal Means
    =====================================================================
    
               	Estimate    SE          robust       SE       z-value  p-value
    cells      	0.25819     0.00587     0.01599   16.14432    0.00000  
    exper      	0.43941     0.03110     0.06970    6.30395    0.00000  
    section    	0.27012     0.07249     0.19752    1.36759    0.17144  
    
    
    Correlation of Coefficients 
    
               	cells         exper             
    exper     	-0.626994 
    section   	-0.132547   -0.492546 
    
    
    Covariance of Coefficients 
    
               	cells         exper      section           
    cells      	 0.000256 
    exper     	-0.000699    0.004859 
    section   	-0.000419   -0.006781    0.039013 
    
    Back to Top
    5.3 Continuous outcome

    Linear regression is the most common model for data with a continuous outcome. The linear link function expresses the conditional expectation of the outcome on the covariates as a linear combination of the covariates. A linear mean link with constant variance corresponds to an ordinary least-squares regression.

     

    The data in the following example are from a recent AIDS clinical trial (Volberding et al, 1990). The CD4 cell count, a measure of the immunological status of subjects, is used as the outcome. The covariates are race (1=black, 0=otherwise), hispanic origin (1=hispanic origin, 0=otherwise), sex (1=male, 0=female), age (which is transformed as (age-30)/10), homosexuality (1=yes, 0=no) and parental use of drug (1=yes, 0=no). The results show that CD4 cell counts are greater among males, non-Hispanics, and IV drug users than among females, Hispanics, and non-IV drug users, respectively.

    ______________________________________________________________________________

    Data			: aids2.eed
    Outcome variable	: cd4
    Covariates in Mean	: intercept race sex  hisp age homosex bisex pdrug
    Covariates in Var	:           race sex  hisp age homosex bisex pdrug
    Covariates in Col	: age
    
    Mean link		: Linear mean link function
    Variance link		: Linear variance link function
    Correlation link	: Linear correlation link function
    Dependence of Variances on Means		: 0.000000
    epsilon1					: 0.000100
    epsilon2					: 0.001000
    Number of observations				: 767
    Number of groups				: 296
    Maximum group size				: 4
    Minimum group size				: 1
    Number of Fisher Scoring Iterations		: 37
    
    =====================================================================
    Marginal Means
    ===================================================================== 
    
    		Estimate    SE          robust      SE         	z-value   p-value
    intercept	5.88564     0.12397     0.13462    43.72112     0.00000
    race		0.17587     0.09054     0.09204     1.91086     0.05602
    sex	       -0.04589     0.14639     0.15351    -0.29893     0.76499
    hisp	       -0.01353     0.10511     0.10480    -0.12915     0.89724
    age	       -0.04280     0.03205     0.02991    -1.43084     0.15248
    homosex         0.00758     0.10245     0.10103     0.07501     0.94021
    bisex	       -0.04302     0.10407     0.09659    -0.44543     0.65601
    pdrug	       -0.08556     0.10587     0.13998    -0.61127     0.54102
    
    Correlation of Coefficients
    
    		intercept     race        sex        hisp        age        homosex      bisex
    race	       	-0.220549
    sex	        -0.455414    0.178294
    hisp	       	-0.509924   -0.126538   -0.272034
    age	        -0.200106    0.181740    0.061813   -0.071307
    homosex         -0.100311    0.079924   -0.650060    0.148885    0.102126
    bisex	        -0.040682    0.069350   -0.506308    0.047481    0.036053    0.708477
    pdrug	        -0.396592   -0.050452    0.412659   -0.111312    0.148884   -0.031537   -0.013277
    
    Covariance of Coefficients
    
    	       intercept    race      sex       hisp        age       homosex    bisex           pdrug
    intercept      0.018122
    race	      -0.002733   0.008471
    sex	      -0.009411   0.002519   0.023566
    hisp	      -0.007194  -0.001220  -0.004376  0.010982
    age	      -0.000806   0.000500   0.000284  -0.000224   0.000895
    homosex       -0.001364   0.000743  -0.010082  0.001576    0.000309   0.010207
    bisex	      -0.000529   0.000616  -0.007507  0.000481    0.000104   0.006913   0.009329
    pdrug	      -0.007473  -0.000650   0.008867  -0.001633   0.000623  -0.000446  -0.000180    0.019593
    
    
    =====================================================================
    Marginal Variance
    =====================================================================    
    
    		 Estimate    SE          robust        SE        z-value      p-value      
    intercept	 0.25322     0.05248     0.10852     2.33343     0.01963
    race		 0.00095     0.03626     0.06447     0.01470     0.98827
    sex		 0.08854     0.05933     0.10059     0.88018     0.37876
    hisp		-0.02680     0.04628     0.09169    -0.29225     0.77010
    age		-0.00194     0.01292     0.02342    -0.08297     0.93388
    homosex		-0.05531     0.03969     0.05493    -1.00700     0.31393
    bisex		-0.12025     0.03897     0.05281    -2.27689     0.02279
    pdrug		 0.01889     0.04730     0.10134     0.18640     0.85213
    
    Correlation of Coefficients
    
    		 intercept      race      sex         hisp         age        homosec     bisex
    race		-0.251102
    sex		-0.509835    0.383899
    hisp		-0.633539   -0.160563   -0.176536
    age		-0.098871    0.339738    0.146322   -0.167215
    homosex		-0.011417   -0.134563   -0.549546    0.108393    -0.111768
    bisex		 0.016829    0.101839   -0.366664   -0.075173    -0.058066    0.572637
    pdrug		-0.372748    0.077465    0.488143   -0.108004     0.168403   -0.059314   -0.017718
    
    Covariance of Coefficients
     
    	        intercept    race         sex        hisp        age       homosex      bisex       pdrug
    intercept      	0.011776
    race           -0.001757    0.004157
    sex	       -0.005565    0.002490    0.010119
    hisp	       -0.006304   -0.000949   -0.001628    0.008407
    age	       -0.000251    0.000513    0.000345   -0.000359    0.000548
    homosex        -0.000068   -0.000477   -0.003036    0.000546   -0.000144    0.003017
    bisex	        0.000096    0.000347   -0.001948   -0.000364   -0.000072    0.001661    0.002789
    pdrug	       -0.004099    0.000506    0.004976   -0.001004    0.000400   -0.000330   -0.000095    0.010271
    
    =====================================================================
    Marginal Correlation
    =====================================================================
     
    		 Estimate    SE         robust 	     SE    	 z-value     p-value
    intercept	 0.54310    0.04745     0.04625     11.74226     0.00000
    age		-1.10456    1.00455     0.09331    -11.83749     0.00000 
    
    Correlation of Coefficients
    
    		intercept
    age		-0.998388
    
    
    Covariance of Coefficients
    
    		 intercept      age
    intercept	 0.002139
    age		-0.004309    0.008707
    
    Back to Top
    

    6. USING EE METHODS IN SPLUS

    6.1 Introduction

    The EE object code can be linked to Splus creating a new executable version capable of ee based statistical modeling. The necessary Splus functions which facilitates using ee under the created Splus executable are available. These functions are modifications of, and are used similarly as, those developed by V. Carey and A. Mcdermott of Chaining Laboratory, Harvard Medical School for their Splus implementation of GEE. The primary function, named ee, is used to pass appropriate parameters and invoke the EE module then creates an ee-object with a class attribute glm based on the parameters passed and the results returned by the estimation procedure. The ee function is defined as follows:

    ee <- function (formula = formula(data), mean.function = NA,  variance.function = NA, corr.function = NA,id = NA,  deno = NA, data = sys.parent(), subset, na.action, mean.coef = NA, variance.coef = NA, corr.coef = NA, lambda= 0, tol = 0.001, maxiter = as.integer(25), mean.link = "LIN", variance.link = "CONST", corr.link = NA,joint = F, silent = T) {...}
    where
    formula is an Splus expression of the form response~predictors. Response is the response variable and predictors include all the other variables to be used in the model.
    mean.function is an Splus expression of the form ~predictors. Predictors are the variables included in the left-hand side of the formula expression above for the mean link function. This parameter can be omitted if the predictors specified in the formula expression will all be used for the mean link function.
    variance.function is an Splus expression of the form ~predictors. Predictors are the variables included in the left-hand side of the formula expression above for the mean link function. This parameter can be omitted if the predictors specified in the formula expression will all be used for the mean link function.
    corr.function is an Splus expression of the form ~predictors. Here predictors include the variables for the correlation link function.
    id specifies the variable to identify clusters.
    deno is the denominator variable.
    data is the source data frame for the formula expression above and for the id and deno variables.
    subset expression specifying a subset of the rows of the data to be used in the fit.
    na.action is a function to filter missing data.
    mean.coeff is a vector of initial values for the mean link coefficients.
    variance.coeff is a vector of initial values for the variance link coefficients.
    corr.coeff is a vector of initial values for the correlation link coefficients.
    tol is the convergence criteria.
    maxiter is the maximum number of iterations allowed for the fitting algorithm.
    mean.link is the mean link function type.
    variance.link is the variance link function type.
    corr.link is the correlation link function type.
    joint is a flag set to True if joint estimation is desired
    silent is a flag set to False to get a display of the estimated coefficients at each iteration.

    The other functions are print.ee and print.summary.ee which are used for displaying respectively all the details or a summary including the model specified, the estimated coefficients and variance/covariance matrix of an ee-object. A help file is also available which can be installed as an on-line help in Splus describing the parameters of the ee function.
    Back to Top

    6.2 Instalation and use of the ee function in Splus

    After downloading all the files, the following commands will install the macro and the associated Splus help file.

             Splus QINSTALL .Data *.q
             Splus HINSTALL .Data/.Help *.d

    If these commands are not issued from the directory containing the downloaded .q and .d files, the path to these files must be specified. Or if these commands are not issued from the directory containing your .Data directory, a path to that directory must be specified. It may be necessary to create the .Help directory within your .Data directory.

    To invoke Splus, issue the command

             Splus EXEC local.Sqpe

    If this command is not issued from the directory containing the downloaded local.Sqpe file the path to this file must be specifed.

    Once the above commands have been executed, the Splus command

             help(ee)

    will supply a help file on the ee macro, with a description of the required and optional arguements for the ee macro.
    Back to Top

    7. TECHNICAL INFORMATION

    This program was written by programmers and researchers in the Quantitative Genetic Epidemiology Group. We appreciate your feedback and will be happy to assist with any questions you may have regarding the program.

    7.1 Program details

    The EE program is designed using object-oriented methodology and is written in the C++ programming language. The EE program is originally designed for solving univariate estimating equation problems, then has been extended for solving both univariate and multivariate estimating equation problems. The basic classes are EEModel, Accumulator, Linkfunction and SysFile, where EEModel dealing with estimating equation computing models; Accumulator controlling regression loops; Linkfunction choosing different mean and variance link functions and SysFile providing system format data file as input data. There are some subclasses such as MultiEEModel, MultiAccumulator and MultiLinkfunction etc., derived from the base classes of EEModel, Accumulator and Linkfunction respectively. The EE program also has some utility components to process data file format translation; regression results output, and user select options. Because of its object-oriented style, the EE program is quite flexible and can be easily extended to include additional link, variance and correlation functions.

    The EE program is developed under AIX (Unix) environment on IBM RS6000 workstation. It can be run under DOS or MS window on PC as well.
    Back to Top

    7.2 Methodological development of estimating equation

    The development of estimating equation can be traced back to the early sixties, and later in the eighties, when Godambe and his coworkers [1960], Huber [1967, 1981] and White [1982] independently proposed the concept of an estimating function (or likelihood estimation of misspecified model) and studied its theoretical properties. These research activities remained largely theoretical, and the relevant literature was not well known to the majority of statisticians until the late eighties. In 1986, Liang and Zeger, without being aware of the earlier works in estimating functions, independently proposed estimating equation for analyzing longitudinal data (especially discrete data) with minimum statistical assumptions [Liang and Zeger, 1986; Zeger and Liang, 1986]. Their work may be thought as a multivariate extension of the generalized linear model. Following this research, other researchers including Prentice, Zhao, and Self [Prentice, 1988; Zhao and Prentice, 1990; Zhao and Prentice, 1991; Zhao, Prentice and Self, 1992; Prentice and Zhao, 1991] studied the properties of estimating equation, established the connection between the estimating equation and maximum likelihood frameworks, and extended the estimating equation framework from marginal means to include higher-order moments. They have used the estimating equation framework to establish a number of methodologies for biomedical problems arising from public health research, especially cancer research.

    Let y denote an outcome, and let x=(x1,...,xp) denote a vector of p covariates. Regression of y on x assesses the dependence of the outcome y on the covariate vector x via

    ,

    where a =(a 1,...,a p) is a vector of p regression coefficients, ¢ is the transpose operator, and g(.) is a specified function. The specification g(a) = a corresponds to linear regression, g(a) = 1/[1+exp(-a)] to logistic regression, g(a) = exp(a) to exponential regression, etc. The main objective of regression analysis is to estimate the regression coefficients a , and to make relevant statistical inference. For example, if a 1=0, it implies that the covariate x1 is not associated with the mean of the outcome y, since any change in x1 does not alter the mean of y.

    In classical statistics, a distribution assumption is often made regarding the random outcome y given the covariate vector x, i.e. f(yi|a , xi). Consider a sample of N random observations, (yi,xi), i=1,...,N. Under the assumption f(yi|a , xi), the likelihood function is obtainable, and is written as the product over all N random observations of the density functions f. Maximizing this likelihood with respect to a results in an estimate of the regression coefficients (if such an estimate exists) which is consistent to the true underlying coefficient and has an asymptotic normal distribution.

    The asymptotic normal distribution can then be used for making statistical inference about the regression coefficient. Also according to the maximum likelihood theory, the estimate is fully efficient under the assumed distribution.

    During the early seventies, Nelder and Wedderburn [Nelder and Wedderburn, 1972; Wedderburn, 1974] recognized that all of the commonly used regression methods rely on distributions that are members of the exponential family, and hence their likelihood functions are proportional to an exponential function

    where the canonical parameter q i is a function of m i and c(yi) is a shape function. Consequently, the score estimating equations, which are the first derivative of the log-likelihood function with respect to the parameters of interest, have a unified form, namely,

    where the summation is over all N independent samples, Di is a derivative matrix of m i with respect to a , and Vi is the variance of yi. The estimate of a is a solution to the equations above. Interestingly, the above estimating equations have the same form regardless of the type of outcome. This uniform expression for the score estimating equations forms the basis of the generalized linear model [McCullagh and Nelder, 1991].

    A usual concern with any maximum likelihood estimation is its validity under a violation of the assumed distribution. A possible consequence of violating the distributional assumption is that the estimates of the parameters or their variances may be biased, or may not have an asymptotic normal distribution at all. The concern with misspecified distribution assumption motivates many statisticians to establish methods that are distribution-free, i.e., robust statistical methods. By careful observation of the score estimating equations, Wedderburn [1974] discovered that the distribution assumption may be relaxed, and may be replaced by a weaker assumption of the first two moments, namely, the marginal mean, m i, and the marginal variance, Vi, since they fully specify the score estimating equations above and their correct specification ensures a valid estimate of the regression coefficients and statistical inference about them. In fact, under the assumption of these two moments the estimate is fully efficient. This discovery leads to the development of the quasi-likelihood function, which is the result of integrating the assumed "score estimating equations" with respect to the regression coefficient. The estimate from quasi-likelihood is no longer a likelihood estimate, since the quasi-likelihood may not be a likelihood function at all. Nevertheless, the quasi-likelihood estimate possesses all of the properties of likelihood estimate.

    In the quasi-likelihood regression method above, both the marginal mean and the marginal variance are assumed, although the marginal variances or their parameters are often not of interest in most regression problems. Unfortunately, any misspecification of the variance function, although it does not bias the estimate of the regression coefficients, may yield an incorrect estimate of the asymptotic variance and hence may invalidate the statistical inference. Through further examination of the score estimating equations above, Liang and Zeger [Liang and Zeger, 1986; Zeger and Liang, 1986], Prentice and Zhao [Prentice, 1988; Zhao and Prentice, 1990; Zhao and Prentice, 1991; Prentice and Zhao, 1991] and others [Lipsitz et al, 1991; Zeger et al, 1988; Zhao et al, 1992; Liang et al, 1992] have found in the context of multivariate problems that the relevant estimate, even with incorrectly specified variances, still has an asymptotic normal distribution with an easily estimable variance. In fact, the "score estimating equations" above, with the variance function replaced by an arbitrary weight, leads to a set of estimating equations. Provided that the marginal mean m i is correctly specified, the estimating equations yield a consistent estimate of the regression coefficients, and the estimate has an asymptotic normal distribution. As the ratios of the weights versus the actual variances approach a constant, the estimate from estimating equations is fully efficient [Zhao et al, 1992]. It should be noted that estimating equations are also referred to as GEE, which is an acronym for generalized estimating equations.
    Back to Top

    7.3 References for EE

    Please use the following reference when reporting results based on EE:

    Q.G.E.: "EE: Estimating Equations", Fred Hutchinson Cancer Research Center,

    Quantitative Genetic Epidemiology, Technical report 126.
    Back to Top

    8. LITERATURE

    Godambe V.P. (1960). An optimum property of regular maximum likelihood estimation. Annals of Mathematical Statistics 31: 1208-1212.

    Huber P.J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proceeding of the Fifth Berkeley Symposium in Mathematical Institute and Probability. Berkeley: UC Press.

    Huber P.J. (1981). Robust Statistics. New York: John Wiley & Sons.

    Karim M.R., Zeger S.L. (1988). GEE: A SAS Macro for Longitudinal Analysis. Technical Report #674. Dept. of Biostatistics, John Hopkins University.

    Laishes B.A., Rolfe P.B. (1980). Quantitative Assessment of Liver Colony Formation and Hepatocellular Carcinoma Incidence in Rats Receiving Intravenous Injections of Isogeneic Liver Cells Isolated during Hepatocarcinogenesis. Cancer Research 40: 4133-4143.

    Liang K.Y., Zeger S.L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73: 13-22.

    Liang K.Y., Zeger S.L., Qaqish B. (1992). Multivariate regression analysis for categorical data. Journal of the Royal Statistical Society, Series B 54: 3-40.

    Lipsitz S.R., Laird N.M., Harrington D.P. (1991). Generalized estimating equations for correlated binary data: Using the odds ratio as a measure of association. Biometrika 78: 153-160.

    McCullagh P., Nelder J.A. (1989). Generalized Linear Models, second edition. London: Chapman and Hall.

    Meyer F., White E. (1993). Alcohol and Nutrients in Relation to Colon Cancer in Middle-aged Adults. American Journal of Epidemiology 138: 225-236.

    Prentice R.L. (1988). Correlated binary regression with covariates specific to each binary observation. Biometrics 44: 1033- 1048.

    Prentice R.L., Zhao L.P. (1991). Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics 47: 825-839.

    Wedderburn R.W.M. (1974). Quasilikelihood functions, generalized linear models and the Gauss-Newton method. Biometrika 61:439-447.

    White H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50: 1-25.

    Zeger S.L., Liang K.Y. (1986). Longitudinal data analysis for discrete and continuous outcomes. Biometrics 42: 121-130.

    Zeger S.L., Liang K.Y., Albert P. (1988). Models for longitudinal data: A generalized estimating equation approach. Biometrics 44: 1049-1060.

    Zhao L.P., Grove J., Quiaoit F. (1992). A method for assessing patterns of familial resemblance in complex human pedigrees with an application to the nevus count data in 28 Utah kindreds. American Journal of Human Genetics 51: 178-190.

    Zhao L.P., Prentice R.L. (1990). Correlated binary regression using a quadratic exponential model. Biometrika 77: 642-648.

    Zhao L.P., Prentice R.L. (1991). Use of a quadratic exponential model to generate estimating equations for means, variances, and covariances. In Estimating Functions, Godambe V.P. (ed). Oxford: Oxford Press.

    Zhao L.P., Prentice R.L., Self S.G. (1992). Multivariate mean parameter estimation using a partly exponential model. Journal of the Royal Statistical Society, Series B 54: 805-811.

    Back to Top