|
[Obstacle Course 1-12] Ranking differentially expressed genes from Affymetrix gene expression d
<span style="color:#444444;"><font face="Verdana, Helvetica, Arial, sans-serif"><div style="text-align:center;"><strong><font face="&quot; ">Abstract</font></strong></div><br /><strong><font face="&quot; ">Background: </font></strong><font face="&quot; ">To identify differentially expressed genes (DEGs) from microarray data, users of the Affymetrix GeneChip system need to select both a preprocessing algorithm to obtain expression-level measurements and a method of ranking genes to obtain the most plausible candidates. We recently recommended suitable combinations of a preprocessing algorithm and gene ranking method that can be used to identify DEGs with a higher level of sensitivity and specificity. However, in addition to these recommendations, researchers also want to know which combinations enhance reproducibility.</font><br /><br /><strong><font face="&quot; ">Results: </font></strong><font face="&quot; ">We compared eight conventional methods for ranking genes: weighted average difference (WAD), average difference (AD), fold change (FC), rank products (RP), moderated <em>t</em> statistic (modT), significance analysis of microarrays (samT), shrinkage <em>t</em> statistic (shrinkT), and intensity-based moderated <em>t</em> statistic (ibmT), with six preprocessing algorithms (PLIER, VSN, FARMS, multi-mgMOS (mmgMOS), MBEI, and GCRMA). A total of 36 real experimental data sets were evaluated on the basis of the area under the receiver operating characteristic curve (AUC) as a measure of both sensitivity and specificity. We found that the RP method performed well for VSN-, FARMS-, MBEI-, and GCRMA-preprocessed data, and the WAD method performed well for mmgMOS-preprocessed data. 
Our analysis of the MicroArray Quality Control (MAQC) project's data sets showed that the FC-based gene ranking methods (WAD, AD, FC, and RP) had a higher level of reproducibility: the percentages of overlapping genes (POGs) across different sites for the FC-based methods were higher overall than those for the <em>t</em>-statistic-based methods (modT, samT, shrinkT, and ibmT). In particular, POG values for WAD were the highest overall among the FC-based methods irrespective of the choice of preprocessing algorithm.</font><br /><br /><strong><font face="&quot; ">Conclusion: </font></strong><font face="&quot; ">Our results demonstrate that to increase sensitivity, specificity, and reproducibility in microarray analyses, we need to select suitable combinations of preprocessing algorithms and gene ranking methods. We recommend the use of FC-based methods, in particular RP or WAD.</font><br /><br /><br /><div style="text-align:center;"><strong><font face="&quot; ">Background</font></strong></div><br /><br /><font face="&quot; ">Microarray analysis is often used to detect differentially expressed genes (DEGs) under different conditions. Because gene ranking methods differ considerably in how well they perform [1,2], choosing the best method of ranking these genes is important. Furthermore, Affymetrix GeneChip users need to choose a preprocessing algorithm from a number of competitors in order to obtain expression-level measurements [3].</font><br /><br /><font face="&quot; ">We [1] and another group [2] recently reported that there are suitable combinations of preprocessing algorithms and gene ranking methods. 
We evaluated three preprocessing algorithms, MAS [4], RMA [5], and DFW [6], and eight gene ranking methods, WAD [1], AD, FC, RP [7], modT [8], samT [9], shrinkT [10], and ibmT [11], by using a total of 38 data sets (including 36 real experimental data sets) [1]. Meanwhile, Pearson [2] evaluated nine preprocessing algorithms, MAS [4], RMA [5], DFW [6], MBEI [12], CP [13], PLIER [14], GCRMA [15], mmgMOS [16], and FARMS [17], and five gene ranking methods, modT [8], FC, a standard <em>t</em>-test, cyberT [18], and PPLR [19], by using only one artificial 'spike-in' data set, the Golden Spike data set [13]. </font><br /><br /><font face="&quot; ">When we re-evaluated the two reports using the algorithms and methods common to both, we found that the suitable gene ranking methods for each of the three preprocessing algorithms (MAS, RMA, and DFW) converge: the combinations of MAS and modT (MAS/modT), RMA/FC, and DFW/FC can thus be recommended. However, the final conclusions of the original reports are understandably different: our recommendations [1] are MAS/WAD, RMA/FC, and DFW/RP, while Pearson [2] recommends mmgMOS/PPLR, GCRMA/FC, and so on. This difference arises mainly because fewer preprocessing algorithms were evaluated in our previous study [1]. </font><br /><br /><font face="&quot; ">We investigated suitable gene ranking methods for each of six preprocessing algorithms: MBEI, VSN [20], PLIER, GCRMA, FARMS, and mmgMOS. 
We also investigated the best combination of a preprocessing algorithm and gene ranking method using another evaluation metric, i.e., the percentage of overlapping genes (POG), proposed by the MAQC study [21].</font><br /><br /><font face="&quot; ">Most authors of methodological papers claim that their methods achieve greater area under the receiver operating characteristic curve (AUC) values, i.e., both high sensitivity and specificity [1,2]. However, reproducibility is rarely mentioned [21]. A good method should produce high POG values, which indicate reproducibility, as well as high AUC values, which indicate sensitivity and specificity. We will discuss suitable combinations of preprocessing algorithms and gene ranking methods.</font><br /><br /><br /><div style="text-align:center;"><strong><font face="&quot; ">Conclusion</font></strong></div><br /><br /><font face="&quot; ">We evaluated the performance of combinations of six preprocessing algorithms and eight gene ranking methods in terms of AUC values (sensitivity and specificity) and POG values (reproducibility). Our comprehensive evaluation confirmed the importance of using suitable combinations of preprocessing algorithms and gene ranking methods.</font><br /><br /><font face="&quot; ">Overall, two FC-based gene ranking methods (RP and WAD) can be recommended. Our current and previous results indicate that any of the following combinations, RMA/RP, DFW/RP, PLIER/RP, VSN/RP, FARMS/RP, MBEI/RP, GCRMA/RP, MAS/WAD, and mmgMOS/WAD, enhances both sensitivity and specificity, and also that using the WAD method enhances reproducibility.</font>
<br /><br /><div style="text-align:center;"><strong><font face="&quot; ">Methods</font></strong></div><br /><br /><font face="&quot; ">The raw data (Affymetrix CEL files) for Datasets 3–38 were obtained from the Gene Expression Omnibus (GEO) website [32]. All analyses were performed using R (ver. 2.7.2) [33] and Bioconductor [34]. The versions of the R libraries used in this study are as follows: <em>plier</em> (ver. 1.10.0), <em>vsn</em> (3.2.1), <em>farms</em> (1.3), <em>puma</em> (1.6.0), <em>affy</em> (1.16.0) [35], <em>gcrma</em> (2.10.0), <em>RankProd</em> (2.12.0) [36], <em>st</em> (1.0.3) [10], <em>limma</em> (2.14.7) [8], and <em>ROC</em> (1.14.0). The main functions in the R libraries are as follows: <em>justPlier</em> for PLIER, <em>vsnrma</em> for VSN, <em>q.farms</em> for FARMS, <em>mmgmos</em> for mmgMOS, <em>expresso</em> for MBEI (PM only model), <em>gcrma</em> for GCRMA, <em>mas5</em> for MAS, <em>rma</em> for RMA, <em>expresso</em> and the R code available in [37] for DFW, <em>RP</em> for RP, <em>modt.stat</em> for modT, <em>sam.stat</em> for samT, <em>shrinkt.stat</em> for shrinkT, <em>IBMT</em> for ibmT [38], and <em>pumaComb</em> and <em>pumaDE</em> for PPLR [19].</font><br /><br /><font face="&quot; ">Since the MBEI and MAS expression measures do not output logged values, signal intensities under 1 in those preprocessed data were set to 1 so that the logarithm of the data could be taken. Logged values smaller than 0 in PLIER-, VSN-, FARMS-, mmgMOS-, and GCRMA-preprocessed data were set to 0. For reproducible research, we made the R code for analyzing Dataset 4 (GEO ID: GSM189708–189713) available as an additional file [see Additional file 3]. 
The R code for the other data sets is available upon request.</font><br /><br /><font face="&quot; ">The raw data for the MAQC data sets were obtained from the MAQC website [39]. The evaluation based on POG was done with 12 data sets produced by the MAQC project [21], in which two RNA sample types and two mixtures of the original samples were used: Sample A, a universal human reference RNA; Sample B, a human brain reference RNA; Sample C, which consisted of 75% Sample A and 25% Sample B; and Sample D, which consisted of 25% Sample A and 75% Sample B. Five replicate experiments for each of the four sample types were conducted at six independent test sites (Sites 1–6), and thus there are 20 files at each site. The data preprocessing was performed at each site. The gene ranking methods were applied independently for the comparisons of "Sample A versus B" and "Sample C versus D".</font></font></span><span style="color:#444444;"><font face="Verdana, Helvetica, Arial, sans-serif"><font face="&quot; "><br /></font></font></span><br /><span style="color:#444444;"><font face="Verdana, Helvetica, Arial, sans-serif"><font face="&quot; ">Source: </font></font></span><a href="http://bbs.gter.net/bbs/viewthread.php?tid=994272&highlight=" target="_blank">http://bbs.gter.net/bbs/viewthread.php?tid=994272&highlight=</a> |
|
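The Methods describe two mechanical steps that are easy to misread in prose: flooring expression values before or after taking logarithms, and scoring reproducibility as the percentage of overlapping genes (POG) between the top-ranked gene lists from two sites. The Python sketch below illustrates both; it is not the authors' R code. All function names are hypothetical. It also rests on three assumptions that the text does not state: a base-2 logarithm, ranking by absolute score, and a simplified POG that ignores the direction of regulation.

```python
import math


def log2_with_floor(raw_intensities):
    """Unlogged measures (e.g. MBEI, MAS): set values under 1 to 1, then log.

    Base 2 is an assumption; the text only says "the logarithm".
    """
    return [math.log2(max(x, 1.0)) for x in raw_intensities]


def floor_logged(logged_values):
    """Already-logged measures (e.g. PLIER, VSN): set values below 0 to 0."""
    return [max(v, 0.0) for v in logged_values]


def top_genes(scores, n):
    """Return the n gene IDs with the largest absolute ranking score.

    `scores` maps gene ID -> statistic (e.g. a log fold change).
    Ranking by absolute value is an illustrative assumption.
    """
    ranked = sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
    return set(ranked[:n])


def pog(scores_site_a, scores_site_b, n):
    """Percentage of genes shared by the top-n lists of two sites."""
    shared = top_genes(scores_site_a, n) & top_genes(scores_site_b, n)
    return 100.0 * len(shared) / n


# Toy data: the same four genes scored at two hypothetical sites.
site1 = {"g1": 3.0, "g2": -2.5, "g3": 0.1, "g4": 1.0}
site2 = {"g1": 2.8, "g2": 0.2, "g3": -0.1, "g4": 1.5}

print(log2_with_floor([0.5, 1.0, 8.0]))  # [0.0, 0.0, 3.0]
print(floor_logged([-1.2, 0.0, 2.5]))    # [0.0, 0.0, 2.5]
print(pog(site1, site2, 2))              # 50.0
```

In the toy data the top-2 lists are {g1, g2} at the first site and {g1, g4} at the second, so one of the two genes overlaps and the POG is 50%. A ranking method whose top lists agree more often across sites would score closer to 100%.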