ComBat:
‘Combatting’ Batch Effects When Combining Batches of Gene Expression Microarray Data
 
 
If you are familiar with R and microarray analysis, source the 'ComBat.R' script and use the 'ComBat()' function. The parameters are described below and in the comments of the script.

If you are new to R and/or to microarray analysis, here are some steps to get you through:

Gene expression index file:
Skip this step if you already have a tab-delimited .txt file containing your expression values. I'll help you do this in dChip because that's what I know best! First, read your .CEL files into dChip, normalize, and calculate the model-based expression index (I assume you can do this, otherwise how do you know you have batch effects in your data? But if you are new to dChip, check the dChip Manual at http://biosun1.harvard.edu/complab/dchip/manual.htm). Now go to 'Tools/Export Expression Value'. Under 'Gene list file' select 'all genes', then highlight all your arrays in the box below. Make sure the 'Has Presence or SNP call' box is checked. Note where you are saving the file and click 'OK'. You now have a tab-delimited text file containing your expression indices!

Sample information file:
The sample information should be a tab-delimited .txt file containing information about the arrays to be adjusted. The first column must contain the names of the arrays, as they appear as column names in your expression index file. If your expression index file contains presence calls, do not include the 'call' column names in the sample information file. The columns after the array and sample names hold batch and covariate (treatment) information. The batch column is required and must be named 'Batch'. Note that ComBat currently handles only categorical covariates; numerical covariates have not been implemented. Here is an example of a sample information file:
Array name    Sample name    Batch    Covariate 1    ...
Array 1       Sample 1       1        Tissue 1       ...
Array 2       Sample 2       2        Tissue 2       ...
...           ...            ...      ...            ...
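
If you would rather build the sample information file in R than in a spreadsheet, here is a minimal sketch using only base R. The array names, sample names, tissue labels, and the file name 'sample_info.txt' are made up for illustration, and 'expr_cols' simply stands in for the array columns of your own expression index file. The sketch writes a tab-delimited file and then checks the two requirements stated above: a column named 'Batch', and first-column names that match the expression column names.

```r
# Hypothetical sample information for four arrays (all names are placeholders).
saminfo <- data.frame(
  "Array name"  = c("Array1", "Array2", "Array3", "Array4"),
  "Sample name" = c("Sample1", "Sample2", "Sample3", "Sample4"),
  "Batch"       = c(1, 1, 2, 2),
  "Tissue"      = c("Liver", "Kidney", "Liver", "Kidney"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Write a tab-delimited text file without row names or quotes.
info_file <- file.path(tempdir(), "sample_info.txt")
write.table(saminfo, info_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back as a tab-delimited file with a header row.
check <- read.delim(info_file, check.names = FALSE, stringsAsFactors = FALSE)

# Requirement 1: exactly one column named 'Batch'.
has_batch <- sum(colnames(check) == "Batch") == 1

# Requirement 2: the first column matches the expression column names.
# 'expr_cols' is a placeholder for the array columns of your expression file.
expr_cols <- c("Array1", "Array2", "Array3", "Array4")
names_match <- setequal(check[[1]], expr_cols)

print(c(has_batch = has_batch, names_match = names_match))
```

If either check comes back FALSE, fix the sample information file before running ComBat.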

Running ComBat in R:
The two files above are all that is needed to run ComBat. To run the function, open R, go to 'File/Change dir', and select the folder that contains 'ComBat.R' and your expression and sample information files. Now type the following:
> source('ComBat.R')
> ComBat('your expression filename here', 'your sample information filename here')

This is it! ComBat should now write a new file to your directory containing the adjusted data!
If you get an error, check your expression and sample information files, or the parameter descriptions below, to make sure everything is set up correctly.
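
If you prefer typing to menus, setwd() does the same job as 'File/Change dir'. The sketch below is one way to confirm that all three files are in place before calling ComBat. The helper name 'missing_inputs' and the file names are placeholders of mine and are not part of ComBat.R.

```r
# Hypothetical helper (not part of ComBat.R): return the names of any
# required files that are not found in 'dir'.
missing_inputs <- function(dir, files) {
  files[!file.exists(file.path(dir, files))]
}

# Placeholder names; substitute your own folder and file names.
work_dir <- getwd()
needed <- c("ComBat.R", "my_expression.txt", "my_sample_info.txt")

absent <- missing_inputs(work_dir, needed)
if (length(absent) == 0) {
  setwd(work_dir)  # same effect as 'File/Change dir'
  source("ComBat.R")
  ComBat("my_expression.txt", "my_sample_info.txt")
} else {
  cat("Missing from", work_dir, ":", paste(absent, collapse = ", "), "\n")
}
```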

Parameter descriptions:
expression_xls      Expression index file (e.g. outputted by dChip);
sample_info_file   Tab-delimited text file containing the columns: Array name, 
                                Sample name, Batch, and any other covariates to be 
                                included in the modeling.

type                        File type for the expression values. Currently supports two     
                                data file types: 'txt' for a tab-delimited text file and 'csv' for 
                                an Excel .csv file.  I don't claim to be an expert at R and I ran 
                                into some problems getting the expression data into R. For 
                                this reason I included the 'csv' option. My suggestion is to 
                                use the default ('txt') and if this doesn't work, open the 
                                expression data in Excel, save it as a .csv file, and then use 
                                the 'csv' option.

write                        If 'T', ComBat writes the adjusted data to a file. If 'F', 
                                ComBat returns the adjusted data matrix instead (so assign it 
                                to an object, e.g. NewData <- ComBat('my expression.xls', 
                                'Sample info file.txt', write=F)).

covariates              If 'all' ComBat will use all of the columns in your sample 
                                info file in the modeling (except array/sample name). If you 
                                only want to use some of the columns in your sample 
                                info file, specify these columns here as a vector (you must 
                                include the Batch column in this list but not array/sample 
                                name).

par.prior                If 'T', ComBat will use the parametric adjustments. If 'F', it 
                                uses the nonparametric adjustments. If you are unsure what 
                                to use, try the parametric adjustments (they run faster) and 
                                check the plots to see if these priors are reasonable. If the 
                                red and black lines don't match up well, use the 
                                nonparametric adjustments (par.prior=F).

filter                       If you do not have presence/absence call in your expression 
                                file, set this to 'F'. If you have presence/absence call, the 
                                'filter=value' (where 0 < value < 1) filters the genes with 
                                absent calls in > 1-value of the samples. The default here is 
                                'F' (in dChip it is .8). The EB adjustments work better after 
                                filtering, so filter if you can. Note: 'filter' must be numeric if 
                                your expression index file contains presence/absence calls 
                                (but you can set it >1 if you don't want to filter any genes) 
                                and must be 'F' if your data doesn't have presence/absence 
                                calls.

skip                       The number of columns that contain probe names and gene 
                               information, so 'skip=5' implies the first expression values 
                               are in column 6.

prior.plots              If 'T', ComBat draws prior plots in which the black line is a 
                                kernel density estimate of the batch effects and the red line 
                                is the parametric estimate. Quantile-quantile plots are also 
                                included. If the red and black lines don't match up well, use 
                                the nonparametric adjustments.
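
To get a feel for what 'match up well' means, here is a rough sketch of the same kind of comparison on simulated numbers. This is not ComBat's plotting code: the vector 'fake_effects' is invented data, and a normal curve is used as the parametric fit purely for illustration. A black kernel density estimate is overlaid with a red fitted curve, followed by a quantile-quantile plot.

```r
set.seed(1)
# Invented stand-in for per-gene batch effect estimates (not real data).
fake_effects <- rnorm(500, mean = 0.3, sd = 0.5)

# Black: kernel density estimate of the values.
kde <- density(fake_effects)

# Red: normal density with the same mean and standard deviation,
# evaluated at the same x positions as the kernel estimate.
fit_y <- dnorm(kde$x, mean = mean(fake_effects), sd = sd(fake_effects))

plot(kde, col = "black", main = "Kernel estimate (black) vs. normal fit (red)")
lines(kde$x, fit_y, col = "red")

# Quantile-quantile plot: points close to the line suggest a reasonable fit.
qqnorm(fake_effects)
qqline(fake_effects, col = "red")
```

If the black curve is clearly skewed or has two humps while the red one does not, that is the kind of mismatch where the nonparametric adjustments (par.prior=F) are worth trying.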

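The 'skip' and 'filter' rules in the parameter descriptions can be spelled out on a toy table. The sketch below is not taken from ComBat.R: the column names, the 'P'/'A' coding of the calls, and the values are all assumptions made for illustration. It separates the first 'skip' annotation columns from the rest, then flags genes whose share of absent calls exceeds 1 - filter.

```r
# Toy table: 2 annotation columns, then 4 arrays, each followed by its call column.
toy <- data.frame(
  probe = c("p1", "p2", "p3"),
  gene  = c("g1", "g2", "g3"),
  A1 = c(10, 5, 2), A1.call = c("P", "P", "A"),
  A2 = c(11, 6, 3), A2.call = c("P", "A", "A"),
  A3 = c(12, 4, 2), A3.call = c("P", "P", "A"),
  A4 = c(10, 5, 1), A4.call = c("P", "A", "P"),
  stringsAsFactors = FALSE
)

# skip = 2: columns 1-2 hold probe and gene information,
# so the first expression values are in column 3.
skip <- 2
geneinfo <- toy[, 1:skip]
rest <- toy[, -(1:skip)]

# Split expression values from call columns (assumed here to end in 'call').
is_call <- grepl("call$", colnames(rest))
expr  <- as.matrix(rest[, !is_call])
calls <- as.matrix(rest[, is_call])

# filter = 0.5: remove genes with absent calls in more than 1 - 0.5 of the samples.
filter_value <- 0.5
frac_absent <- rowMeans(calls == "A")
keep <- frac_absent <= 1 - filter_value

print(data.frame(geneinfo, frac_absent, keep))
```

In this toy table only g3, which is absent in 3 of the 4 arrays, is removed; g2 is absent in exactly half of the arrays and is kept.
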
Final note: If anything on this page is unclear or incorrect, please let me know by joining the ComBat users forum and posting your comment. You can join at: http://groups.google.com/group/combat-user-forum. Also, if you have any usage questions or problems, please post them and I'll answer what I can, depending on my time constraints and the complexity of the question!