Minimum Information about a high-throughput SeQuencing Experiment - MINSEQE (Draft Proposal)
As an output of discussions at the recent FGED-organized UHTS standardization workshop (in Berkeley, March 17-18, 2008), a new Minimum Information checklist is being developed called MINSEQE [PDF - 11KB] [Or, as shown below].
Published E-Letter in Science - FGED urges scientific journals to adopt and support MINSEQE to preserve MIAME-enabled achievements in this new age of advancing UHTS.
We welcome your feedback on this draft proposal.
Feedback received so far:
Batch effects are a problem in most (if not all) high-throughput technologies, and initial work indicates that they are also present in high-throughput sequencing (see, for example, http://www.nature.com/nrg/journal/vaop/ncurrent/full/nrg2825.html ). One of the most useful pieces of information for finding a batch effect in public data is the date of the experiment. There are several examples where the conclusions of published studies have been criticized (and shown to be wrong) because of batch effects.
Existing "raw" file formats (such as FASTQ and SAM) for the various sequencing platforms do not encode the date of the experiment, unlike several standard microarray formats (for example, the Affymetrix CEL file format). The likely result is that independent researchers will be unable to properly assess the quality of publicly available data.
In light of this, we find it important to amend the MINSEQE guidelines to include the processing date. Of course, processing date can be defined in many ways, such as
- date of the library prep
- date of the sequencing
- date of the computational analysis
We believe the processing date should be the date of the library prep, the date of the sequencing, or both.
Kasper Daniel Hansen, Jeff Leek, and Rafael Irizarry
Department of Biostatistics
School of Public Health
Johns Hopkins University
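To illustrate why a recorded processing date helps in finding batch effects, here is a minimal sketch in Python. It assumes each sample carries a sequencing date alongside its experimental condition. The field names (`sample`, `condition`, `sequencing_date`), the example records, and both helper functions are hypothetical and are not part of the MINSEQE draft. The check flags the simplest problematic layout, in which every date holds only one condition, so a date effect cannot be separated from the condition effect.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample annotations; field names and values are illustrative
# only and are not defined by the MINSEQE draft.
samples = [
    {"sample": "S1", "condition": "control", "sequencing_date": date(2010, 1, 11)},
    {"sample": "S2", "condition": "control", "sequencing_date": date(2010, 1, 11)},
    {"sample": "S3", "condition": "treated", "sequencing_date": date(2010, 1, 12)},
    {"sample": "S4", "condition": "treated", "sequencing_date": date(2010, 1, 12)},
]


def conditions_by_date(records, date_field="sequencing_date"):
    """Group the experimental conditions observed on each processing date."""
    grouped = defaultdict(set)
    for record in records:
        grouped[record[date_field]].add(record["condition"])
    return dict(grouped)


def is_confounded(records, date_field="sequencing_date"):
    """Return True if several conditions exist but each date holds only one.

    In that layout, differences between conditions cannot be told apart
    from differences between processing dates.
    """
    grouped = conditions_by_date(records, date_field)
    all_conditions = set().union(*grouped.values())
    return len(all_conditions) > 1 and all(
        len(conditions) == 1 for conditions in grouped.values()
    )


if __name__ == "__main__":
    print(is_confounded(samples))
```

Without a date in the submitted metadata, a check of this kind cannot be run on public data at all, which is the concern raised in the feedback above.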
- Additional quality measures may be needed. Currently the only mention of quality concerns sequencing reads. The new working group on UHTS Data Quality will hopefully address this. Stoeckert
- Repository Support for MINSEQE. Parkinson H, Barrett T and Sansone SA
Repository Support for MINSEQE

Production Repositories

Name: ArrayExpress
Submission Format: MAGE-TAB
Download Format: MAGE-TAB
Example: E-MTAB-5
Instructions for submitters: AE Submission Info
User support: Users supported by team of curators
Standards supported: MINSEQE, MIAME
DataTypes: Any array technology, UHTS, metabolomic

Name: GEO
Submission and Download Formats: GEOarchive SOFT, MINiML, native raw files
Example: GSE11172
Comment: Raw sequence data is shared with NCBI's Short Read Archive sequence database.
Instructions for submitters: GEO Sequence Submission Information
User support: Users supported by team of curators
Standards supported: MIAME, MINSEQE
DataTypes: Any array technology, quantitative high throughput sequencing

Repositories under development

Name: BioInvestigation Index
Submission Format: ISA-TAB
Download Format: ISA-TAB
Example: Example
Comment: Launch planned in Fall 08; download a one page overview [PPT - 4.79MB]
Standards supported: MIBBI, which includes MIAME, MIGS, MIAPE etc. and should include MINSEQE too
DataTypes: Entry point for multi-assay studies, using omics and conventional technologies: experimental metadata, including the sample/assay/data relationships, is stored in the Index; transcriptomics and proteomics data are dispatched to EBI's ArrayExpress and PRIDE, metabolomics to a file archive.
- Experimental descriptors can be generalized and modularized with MIAME-relevant parts. Should we do this via the MIBBI project? Sansone SA
- I suggest we explain the 'relation' with the MIGS specifications for the users' benefit. Sansone SA