Matlab files and data downloadable from this web site and related to the paper
"Summarizing Probe Intensities of Affymetrix GeneChip 3' Expression Arrays
Taking into Account Day-to-Day Variability" by Magni et al., published in
IEEE/ACM Transactions on Computational Biology and Bioinformatics (2011), are
briefly described here.

The distribution directory contains the following files:

- README (this file), containing the main instructions for using the Matlab
  functions and some descriptions of the *.mat data files.

Matlab functions
----------------

- rmasummary.m -> Function of the Bioinformatics toolbox that implements the
  standard RMA summary, i.e. it calculates the gene expression values using
  the Robust Multi-array Average (RMA) procedure. For details on its use, see
  the Matlab manuals. It is called within the Step3Summarization script file
  (see below).

- rmabackadj_Sacchi.m -> Version of the rmabackadj function of the
  Bioinformatics toolbox, slightly modified by the authors to fix some bugs.
  It performs the background adjustment of Affymetrix microarray probe-level
  data using the RMA procedure. For details on its use, see the Matlab
  manuals. It is called within the Step1backadjustment script file (see
  below).

- quantilenorm.m -> Function of the Bioinformatics toolbox that implements
  quantile normalization over multiple arrays. For details on its use, see
  the Matlab manuals. It is called within the Step2Normalization script file
  (see below).

- D2DsumII.m -> Function implementing the second step of the D2Dsum algorithm
  as described in the Appendix of the paper published in IEEE/ACM TCBB
  (doi:10.1109/TCBB.2010.82). It has 2 inputs and up to 4 outputs, as
  described in the following lines.

  %function [mua,ba,valgamma,gamma]=D2DsumII(VRMA,day)
  % Input:
  %   * VRMA - G x A matrix whose elements are the gene expression levels
  %            computed by D2Dsum (or standard RMA) in its first step.
  %   * day  - A x 1 vector whose elements are d=h(a), i.e.
  %            the day d at which the array a was hybridized.
  % Output:
  %   * mua      - G x A matrix whose elements are the log2 gene expression
  %                levels as computed by D2Dsum.
  %   * ba       - G x A matrix whose elements are the "array effect" as
  %                computed by D2Dsum.
  %   * valgamma - G x number of days matrix whose elements are the "day"
  %                effect as computed by D2Dsum.
  %   * gamma    - G x A matrix whose elements are the "day" effect as
  %                computed by D2Dsum.
  %
  % By Paolo Magni - Lab. Informatica Medica - Dip. Informatica e Sistemistica
  % - Univ. Pavia - 2007
  % Algorithm presented in Magni et al. IEEE/ACM TCBB 2011

  It is called within the Step3Summarization script file (see below).

Matlab scripts
--------------

Three script files, which perform the analysis discussed in the IEEE/ACM TCBB
paper, are provided. The analysis is divided into three steps, and the
intermediate results were saved in *.mat files, which are provided as well.

- Step1backadjustment.m -> In the first step the background adjustment is
  performed. The original data were previously loaded from *.cel files and
  stored in a Matlab structure called allpmStructNo24Ord. The script first
  loads the data from the allpmStructNo24Ord.mat file, extracts the perfect
  match (PM) intensities from this structure into a matrix (PMmatrix), and
  saves the matrix in the PMmatrix.mat file. Probe indices, containing for
  each gene the number of each probe, are put into a vector and saved in a
  *.mat file (ProbeIndex). Finally, the RMA background adjustment is
  performed by means of the rmabackadj_Sacchi function, and the results are
  saved in the pmMatrix_bg.mat file.

- Step2Normalization.m -> In the second step the quantile normalization is
  performed by the quantilenorm function, and the results are saved in the
  pmMatrix_bgnorm.mat file.

- Step3Summarization.m -> In the third step both the RMA and the D2Dsum
  summarizations are performed. First, the RMA summarization is performed by
  the rmasummary Matlab function. Results are saved in the PMsum_glob.mat
  file.
  Then, the day on which each experiment was performed is extracted from the
  chip name, and the giorniChipNo24 vector is created and saved in the
  corresponding *.mat file. The D2Dsum algorithm is invoked by calling the
  D2DsumII function. Finally, the microarrays are randomized to avoid
  possible bias in the clustering analysis, and *.txt files with the
  summarization results, to be used with a clustering tool, are generated.

Matlab files with stored data
-----------------------------

- allpmStructNo24Ord.mat
  Structure containing the data loaded from the Affymetrix *.cel files.
- giorniChipNo24.mat
  Vector with the day on which the transcription and the hybridization were
  performed.
- nomiChipNo24.mat
  Names of the chips as reported in the paper.
- PMmatrix.mat
  Matrix with the original data (PM intensities).
- pmMatrix_bg.mat
  PM intensities after background correction.
- pmMatrix_bgnorm.mat
  PM intensities after background correction and normalization.
- PMsum_glob.mat
  Gene expression levels as computed by RMA.
- ProbeIndices.mat
  Vector containing the number of each probe within each probe set.

Text files with gene expression data
------------------------------------

- geniRMA.txt
  File containing the gene expression levels computed by the RMA algorithm.
- geniD2D.txt
  File containing the gene expression levels computed by the D2Dsum
  algorithm.
- nomiChipNo24.txt
  File containing the chip names.
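
As an illustration of the D2DsumII input and output sizes described above, the
following minimal sketch shows how an A x 1 day vector can relate a
G x (number of days) matrix to a G x A matrix. It does not call D2DsumII and
does not reproduce its computations. All values are synthetic, the names
valgammaDemo and gammaDemo are hypothetical stand-ins, and the assumption that
each column of gamma repeats the valgamma column of the corresponding day (with
days in sorted order) is inferred only from the dimensions listed above.

```matlab
% Synthetic sizes, chosen only for illustration (not the paper's data).
G   = 3;                        % number of genes (probe sets)
day = [1; 1; 2; 3; 3; 2];       % A x 1 vector: hybridization day of each array
A   = numel(day);               % number of arrays

% Map each array to a day index in 1..nDays (sorted unique day labels).
[dayLabels, ~, dayIdx] = unique(day);
nDays = numel(dayLabels);

% Hypothetical per-day effects, G x nDays (stand-in for valgamma).
valgammaDemo = reshape(1:G*nDays, G, nDays) / 10;

% ASSUMPTION: expand to one column per array, G x A (stand-in for gamma),
% so that arrays hybridized on the same day share the same column.
gammaDemo = valgammaDemo(:, dayIdx);

disp(size(valgammaDemo));       % size of the per-day matrix
disp(size(gammaDemo));          % size of the per-array matrix
```

With the distributed files on the Matlab path, the actual call follows the
signature given above: [mua,ba,valgamma,gamma] = D2DsumII(VRMA,day), where VRMA
is the G x A summarized matrix and day is the A x 1 vector of hybridization
days.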