This file briefly describes the Matlab files and data that can be
downloaded from this web site and that relate to the paper
"Summarizing Probe Intensities of Affymetrix GeneChip 3'
Expression Arrays Taking into Account Day-to-Day Variability"
by Magni et al., published in IEEE/ACM Transactions on Computational
Biology and Bioinformatics (2011).
The distribution directory contains the following files:
- README (this file) containing the main instructions for the use of
the Matlab functions and some descriptions of the data *.mat files.
Matlab functions
----------------
- rmasummary.m -> Function of the Bioinformatics toolbox that
implements the standard RMA summary, i.e. it calculates the gene
expression values using the Robust Multi-array Average procedure.
For details on its use, see the Matlab manuals. It is called within
the Step3Summarization script file (see below).
- rmabackadj_Sacchi.m -> Version of the rmabackadj function of the
Bioinformatics toolbox, slightly modified by the authors to remove
some bugs. It performs the background adjustment of Affymetrix
microarray probe-level data using the RMA procedure. For details on
its use, see the Matlab manuals for rmabackadj. It is called within
the Step1backadjustment script file (see below).
- quantilenorm.m -> Function of the Bioinformatics toolbox that
implements quantile normalization over multiple arrays. For
details on its use, see the Matlab manuals. It is called within the
Step2Normalization script file (see below).
- D2DsumII.m -> Function implementing the second step of the D2Dsum
algorithm as described in the Appendix of the paper published in
IEEE/ACM TCBB (doi:10.1109/TCBB.2010.82). It takes 2 inputs and
returns up to 4 outputs, as described in its header comment:
%function [mua,ba,valgamma,gamma]=D2DsumII(VRMA,day)
% Input:
% * VRMA - G x A matrix whose elements are the gene expression levels computed by D2Dsum (or standard RMA) in its first step.
% * day - A x 1 vector whose elements are d=h(a), i.e. the day d at which the array a was hybridized.
% Output:
% * mua - G x A matrix whose elements are the log2 gene expression levels as computed by D2Dsum.
% * ba - G x A matrix whose elements are the "array effect" as computed by D2Dsum.
% * valgamma - G x number of days matrix whose elements are the "day" effect as computed by D2Dsum.
% * gamma - G x A matrix whose elements are the "day" effect as computed by D2Dsum.
%
% By Paolo Magni - Lab. Informatica Medica - Dip. Informatica e Sistemistica - Univ. Pavia - 2007
% Algorithm presented in Magni et al. IEEE/ACM TCBB 2011
It is called within the Step3Summarization script file (see below).
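
As an illustration of the calling convention above, the following
minimal sketch builds toy inputs with the documented shapes and calls
D2DsumII only if the file is on the Matlab path. The sizes G and A, the
toy values and the coding of the days as integers 1, 2, 3 are
assumptions made for this example; they are not taken from the paper.

```matlab
% Toy dimensions (hypothetical): G genes, A arrays hybridized on 3 days.
G = 5;
A = 6;
VRMA = 8 + reshape(1:G*A, G, A) / 10;   % G x A toy log2 expression levels
day  = [1; 1; 2; 2; 3; 3];              % A x 1 vector, day(a) = h(a)

% Consistency checks on the documented input shapes.
assert(isequal(size(VRMA), [G A]));
assert(isequal(size(day), [A 1]));
nDays = numel(unique(day));             % number of distinct days

% Call the function only when D2DsumII.m is available on the path.
if exist('D2DsumII', 'file') == 2
    [mua, ba, valgamma, gamma] = D2DsumII(VRMA, day);
    % Documented sizes: mua, ba, gamma are G x A; valgamma is G x nDays.
    disp(size(mua));
    disp(size(valgamma));
else
    disp('D2DsumII.m not found on the path; only the inputs were built.');
end
```

Guarding the call with exist keeps the sketch runnable even without the
distribution files.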
Matlab scripts
--------------
Three script files are provided to reproduce the analysis discussed in
the IEEE/ACM TCBB paper. The analysis is divided into three steps, and
the intermediate results, saved in *.mat files, are provided too.
- Step1backadjustment.m -> In the first step the background
adjustment is performed. The original data were previously loaded
from the *.cel files and stored in a Matlab structure called
allpmStructNo24Ord.
The script first loads the data from the allpmStructNo24Ord.mat file,
extracts the perfect match (PM) intensities from this structure,
stores them in a matrix (PMmatrix) and saves it to the PMmatrix.mat
file. The probe indices, which give for each gene the number of each
probe, are put into a vector and saved in a *.mat file
(ProbeIndex). Finally, the RMA background adjustment is performed
by means of the rmabackadj_Sacchi function and the results
are saved in the pmMatrix_bg.mat file.
- Step2Normalization.m -> In the second step the quantile
normalization is performed by the quantilenorm function and the
results are saved in the pmMatrix_bgnorm.mat file.
- Step3Summarization.m -> In the third step both the RMA and the D2Dsum
summarizations are performed. First, the RMA summarization is computed
with the rmasummary Matlab function and the results are saved in the
PMsum_glob.mat file. Then, the day on which each experiment was
performed is extracted from the name of the chip, and the
giorniChipNo24 vector is created and saved in the corresponding
*.mat file. The D2Dsum algorithm is then invoked by calling the
D2DsumII function. Finally, the microarrays are randomized to avoid
possible bias in the clustering analysis, and *.txt files with the
results of the summarization process are generated for use with a
clustering tool.
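
The last part of the third step (day extraction, randomization of the
arrays, export to text) might look like the following minimal sketch.
The chip names, the 'D<number>' naming pattern, the toy values, the
output file name and the tab-delimited layout are all assumptions made
for illustration; the actual script may differ.

```matlab
% Hypothetical chip names; the real naming scheme is not given in this README.
chipNames = {'D1_a', 'D1_b', 'D2_a', 'D2_b'};
expr = [7.1 7.3 8.0  8.2;    % toy G x A matrix of log2 expression levels
        5.0 5.1 5.6  5.5;
        9.4 9.2 9.9 10.1];

% Extract the day from each name (assumed pattern: 'D' followed by digits).
tok  = regexp(chipNames, '^D(\d+)', 'tokens', 'once');
days = cellfun(@(t) str2double(t{1}), tok(:));   % A x 1 vector of days

% Shuffle the arrays, applying one permutation to names, days and columns.
perm      = randperm(numel(chipNames));
exprShuf  = expr(:, perm);
namesShuf = chipNames(perm);
daysShuf  = days(perm);

% Write a tab-delimited text file: header row of names, one row per gene.
outFile = fullfile(tempdir, 'toy_expression.txt');   % hypothetical file name
fid = fopen(outFile, 'w');
fprintf(fid, '%s\t', namesShuf{1:end-1});
fprintf(fid, '%s\n', namesShuf{end});
for g = 1:size(exprShuf, 1)
    fprintf(fid, '%.4f\t', exprShuf(g, 1:end-1));
    fprintf(fid, '%.4f\n', exprShuf(g, end));
end
fclose(fid);
```

Using a single permutation vector keeps the chip names, the days and the
expression columns aligned after the shuffle.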
Matlab files with stored data
-----------------------------
- allpmStructNo24Ord.mat Structure containing the data loaded from
the Affymetrix *.cel files.
- giorniChipNo24.mat Vector with the day on which the transcription
and the hybridization were performed for each chip.
- nomiChipNo24.mat Names of the chips as reported in the paper.
- PMmatrix.mat Matrix with the original data (PM intensities).
- pmMatrix_bg.mat PM intensities after background correction.
- pmMatrix_bgnorm.mat PM intensities after background correction and
normalization.
- PMsum_glob.mat Gene expression levels as computed by RMA.
- ProbeIndices.mat Vector containing the number of each probe within
each probe set.
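
To make the layout of such a probe index vector concrete, here is a
minimal sketch that builds one for three toy probe sets. The probe set
sizes are hypothetical, and the numbering from 0 follows the convention
described in the Bioinformatics toolbox documentation for rmasummary;
that ProbeIndices.mat uses the same convention is an assumption.

```matlab
% Hypothetical probe set sizes: three probe sets with 3, 2 and 4 probes.
probesPerSet = [3 2 4];

% Number the probes within each probe set, starting from 0 (assumed).
idx = arrayfun(@(n) (0:n-1)', probesPerSet, 'UniformOutput', false);
probeIndices = vertcat(idx{:});      % column vector, one entry per probe

% A new probe set starts wherever the index returns to 0.
setId = cumsum(probeIndices == 0);   % probe set label for every probe
disp([probeIndices setId]);
```

With this layout, the rows of the PM matrix belonging to one gene can be
recovered from the positions where the index restarts.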
Text files with gene expression data
------------------------------------
- geniRMA.txt File containing the gene expression levels evaluated
by the RMA algorithm.
- geniD2D.txt File containing the gene expression levels evaluated
by the D2Dsum algorithm.
- nomiChipNo24.txt File containing the chip names.