Minimum information about a microarray experiment - MIAME
Draft March 21, 2001, based on November 17, 2000 For archive purposes
New draft as of 22 Mar 2001: available with change-tracking and without change-tracking.
The goal of MIAME is to specify the minimum information that must be reported about a microarray (or any DNA array) based gene expression monitoring experiment in order to ensure the interpretability of the results, as well as their potential verification by third parties. The background aim is to facilitate the establishment of public repositories and a data exchange format for microarray based gene expression data. The MGED group will encourage scientific journals and funding agencies to adopt policies requiring data submissions to repositories, once MIAME compliant repositories are established.
Introduction:
The definition of the minimum information is aimed at co-operative data providers; it is not designed to close every loophole through which information could be withheld.
Among the concepts in the definition is a list of 'qualifier, value, source' triplets, where 'source' is either user defined, or a reference to an externally defined ontology or controlled vocabulary, such as the species taxonomy database at NCBI. Where necessary, the authors are encouraged to define their own qualifiers and provide the appropriate values so that the list as a whole gives sufficient information to interpret the particular part of the experiment. The judgement regarding the necessary level of detail is left to the data providers. In the future these 'voluntary' qualifier lists may be gradually substituted by required fields, as the respective ontologies are developed.
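A minimal sketch of how such a triplet list might be represented, assuming a simple record type. The class name, the field names and the example values are illustrative assumptions; only the NCBI taxonomy source is named in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qualifier:
    qualifier: str  # what is being described, e.g. "organism"
    value: str      # the value for this part of the experiment
    source: str     # "user defined" or the name of an external vocabulary


# Hypothetical entries for one sample (values are made up for illustration).
sample_qualifiers = [
    Qualifier("organism", "Homo sapiens", "NCBI taxonomy"),
    Qualifier("growth medium", "RPMI 1640", "user defined"),
]


def user_defined(triplets):
    """Return the qualifiers that do not refer to an external vocabulary."""
    return [t.qualifier for t in triplets if t.source == "user defined"]
```

Keeping the source next to each value makes it possible to see later which qualifiers could be replaced by required fields once an ontology exists.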
Parts of the MIAME can be provided as a reference or link to an externally existing description. For instance, for commercial or other standard arrays all the required information should normally be provided only once by the array provider and referenced by the users. Standard protocols should also normally be provided only once. Either a valid reference or the information itself must be provided for every experiment set.
Definition:
The minimum information about a published microarray based gene expression experiment should include the description of
- Experimental design: the set of the hybridisation experiments as a whole
- Array design: each array used and each element (spot) on the array
- Samples: samples used, the extract preparation and labeling
- Hybridisations: procedures and parameters
- Measurements: images, quantitation, specifications
- Controls: types, values, specifications
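As a rough illustration, the six parts listed above could be used as a completeness checklist. The sketch below assumes a submission is held as a dictionary keyed by section name; that layout and the function name are hypothetical and not part of MIAME.

```python
# Section names taken from the list above; the dictionary layout is an assumption.
MIAME_SECTIONS = (
    "experimental design",
    "array design",
    "samples",
    "hybridisations",
    "measurements",
    "controls",
)


def missing_sections(submission):
    """Return the MIAME sections that are absent or empty in a submission."""
    return [name for name in MIAME_SECTIONS if not submission.get(name)]
```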
The following details should be provided for each array, each sample, hybridisation and measurement in the experiment set:
1. Experimental design: the set of the hybridisation experiments as a whole
This section gives information describing the experiment, which may consist of one or more hybridisations, as a whole. Normally 'experiment' should include a set of hybridisations which are inter-related and performed in a limited period of time. For instance, it may be all the hybridisations related to research published in a single paper.
author (submitter), laboratory, contact information, links (URL)
type of the experiment (maximum one line), for instance:
- normal vs. diseased comparison
- treated vs. untreated comparison
- time course
- dose response
- effect of gene knock-out
- effect of gene knock-in (transgenics)
- shock
(multiple types possible)
experimental factors, i.e. parameters or conditions tested (e.g., time, dose, genetic variation, response to a treatment or compound)
the list of all platforms used (commercial or in-house made; if commercial, provider)
single or multiple hybridisations
For multiple hybridisations:
- ordered/unordered
- serial (yes/no), with type (e.g., time course, dose response)
- grouping (yes/no), with type (e.g., normal vs. diseased, multiple tissue comparison)
- relationships between all the samples, arrays and hybridisations in the experiment: each sample, each array, and each hybridisation should be given a unique ID or number, and all the relationships should be listed, possibly with appropriate comments. For instance:
Samples: S1, S2, S3; Arrays: A1, A2, A3; Hybridisations: H1 is S1+S2 on A1, H2 is S2+S3 on A2, H3 is S1+S2 on A3.
Note that a detailed description of each sample, array and hybridisation is given in further sections.
- which hybridisations are replicates (e.g., H1 and H3)
quality related indicators
- has the work been published in a peer reviewed journal
- number of replicate hybridisations
- any other quality control steps taken (polyA, non-specific binding, etc.)
optional user defined "qualifier, value, source" list (see Introduction)
a free text description of the experiment set or a link to a publication
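The sample, array and hybridisation relationships from the example in this section (H1, H2, H3) can be written down as a small lookup table. The sketch below is one possible encoding, and it assumes that two hybridisations count as replicates when they use the same set of samples; a submitter may apply a stricter definition, for instance also requiring the same array design.

```python
# Hybridisation ID -> (sample IDs, array ID), copied from the example above.
hybridisations = {
    "H1": (("S1", "S2"), "A1"),
    "H2": (("S2", "S3"), "A2"),
    "H3": (("S1", "S2"), "A3"),
}


def replicate_groups(hybs):
    """Group hybridisations that use the same set of samples."""
    groups = {}
    for hyb_id, (samples, _array) in hybs.items():
        groups.setdefault(frozenset(samples), []).append(hyb_id)
    # Only groups with more than one member are replicates.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]
```

With the example data this identifies H1 and H3 as replicates, in line with the list above.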
2. Array design: each array used and each element (spot) on the array.
This section describes details of each array used in the experiment. Each array has a separate section 2, though an array of each type has to be described only once and then referenced. Moreover, we expect that the array descriptions will be given by the array providers, in which case the users will be able to reference them.
The section consists of three parts: a) description of the array as a whole, b) description of each type of element (spot) used, giving properties that are typically common to many elements (e.g., 'synthesized oligo-nucleotide' or 'DNA from a clone'), and c) description of the properties that are typically different for each element, such as the DNA sequence. In practice, the last part will be provided as a spread-sheet or tab-delimited file.
- array related information
- unique ID as used in part 1 (for commercial or standard arrays a unique ID given by the provider may be used)
- array design name (e.g., "Stanford Human 10K set")
- platform type: in situ synthesized or spotted
- array provider (source)
- surface type: glass, membrane, other
- surface type name
- array support (e.g. slide) dimensions
- number of elements on the array
- a reference system allowing each element (spot) on the array to be located (in the simplest case the number of columns and rows is sufficient)
- production protocol (obligatory if applicable)
- optional "qualifier, value, source" list (see Introduction)
- properties of each group of elements (spots) on the array; elements may be simple, i.e., containing only identical molecules, or composite, i.e., containing different oligonucleotides obtained from the same reference molecule;
- simple or composite
- element type: synthesized oligo-nucleotides, PCR products, plasmids, colonies, other
- single or double stranded
- element (spot) dimensions
- element generation protocol that includes sufficient information to reproduce the element
- attachment (covalent/ionic/other)
- optional "qualifier, value, source" list (see Introduction)
- element (spot) on the array - for each element the following must be given:
- position on the array, allowing the spot to be identified in the image (see section 5, hybridisation scan raw data, below)
- clone information, obligatory for elements obtained from clones:
- clone ID, clone provider, date, availability
- sequence or PCR information, obligatory for synthetic elements:
- sequence accession number in DDBJ/EMBL/GenBank if known
- sequence itself (if databases do not contain it)
- for composite oligonucleotide elements:
- oligonucleotide sequences, if given
- number of oligonucleotides and the reference sequence (or accession number), otherwise
- approximate lengths if exact sequence not known
- gene name and links to appropriate databases (e.g., SWISS-PROT, or organism specific databases), if known and relevant
Normally this information will be provided in one or more spread-sheets or tab-delimited files.
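A minimal sketch of reading such a tab-delimited element table follows. MIAME does not prescribe a file layout, so the column names and the two rows are hypothetical placeholders chosen only to mirror the properties listed above.

```python
import csv
import io

# Hypothetical column names and rows; MIAME does not prescribe a file layout.
ELEMENT_TABLE = (
    "row\tcolumn\telement_type\tclone_id\taccession\tgene_name\n"
    "1\t1\tPCR product\tCLONE-0001\t\tgeneA\n"
    "1\t2\tsynthesized oligo-nucleotide\t\tACC00001\tgeneB\n"
)


def load_elements(text):
    """Parse a tab-delimited element table into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def missing_identifiers(elements):
    """Return (row, column) of elements with neither clone ID nor accession."""
    return [
        (e["row"], e["column"])
        for e in elements
        if not e["clone_id"] and not e["accession"]
    ]
```

A check like `missing_identifiers` is one way to flag elements for which neither the clone information nor the sequence information has been filled in.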
3. Samples: samples used, extract preparation and labeling
By a 'sample' we understand the biological material from which the RNA gene products (or DNA) have been extracted for subsequent labeling, hybridisation and measuring. This section describes the source of the sample (e.g., organism, cell type or line), its treatment, as well as the preparation of the extract and its labeling, i.e., all steps that precede the contact with an array (i.e., hybridisation). Each sample used in the experiment has a separate section 3. In practice, if the treatments are similar, differing only slightly, the descriptions can be given together, clearly pointing out the differences.
sample source and treatment (this section describes the biological sample in which we intend to measure the gene expression, and the biological treatment which happens before the extract preparation and labelling; for each sample only some of the qualifiers given below may be relevant):
- ID as used in section 1
- organism (NCBI taxonomy)
- additional "qualifier, value, source" list; each qualifier in the list is obligatory if applicable; the list includes:
- cell source and type (if derived from primary source(s))
- sex
- age
- development stage
- organism part (tissue)
- animal/plant strain or line
- genetic variation (e.g., gene knockout, transgenic variation)
- individual
- individual genetic characteristics (e.g., disease alleles, polymorphisms)
- disease state or normal
- target cell type
- cell line and source (if applicable)
- in vivo treatments (organism or individual treatments)
- in vitro treatments (cell culture conditions)
- treatment type (e.g., small molecule, heat shock, cold shock, food deprivation)
- compound
- separation technique (e.g., none, trimming, microdissection, FACS)
- laboratory protocol for sample treatment
hybridisation extract preparation
- laboratory protocol for extract preparation, including:
- extraction method
- whether total RNA, mRNA, or genomic DNA is extracted
- amplification (RNA polymerases, PCR)
- optional "qualifier, value, source" list (see Introduction)
labeling
- laboratory protocol for labelling, including:
- amount of nucleic acids labeled
- label used (e.g., Cy3, Cy5, 33P)
- optional "qualifier, value, source" list (see Introduction)
4. Hybridisations: procedures and parameters
This section describes details of each hybridisation in the experiment. Each hybridisation has a separate section 4, though similar hybridisations may be described together.
ID as given in section 1
laboratory protocol for hybridisation, including:
- the solution (e.g., concentration of solutes)
- blocking agent
- wash procedure
- quantity of labelled target used
- time, concentration, volume, temperature
- description of the hybridisation instruments
optional "qualifier, value, source" list (see Introduction)
5. Measurements: images, quantitation, specifications:
This section describes the data obtained from each scan and their combinations.
hybridisation scan raw data:
- the scanner image file (e.g., TIFF) from the hybridised microarray scanning
- scanning information:
- parsed header of the TIFF file, including laser power, spatial resolution, pixel space, PMT voltage;
- laboratory protocol for scanning, including:
- hybridisation ID as in Section 1
- scanning hardware
- scanning software
image analysis and quantitation
- the complete image analysis output (of the particular image analysis software) for each element (or composite element, see section 2 b)), for each channel; normally given as a spread-sheet or other external file
- image analysis information:
- image analysis software specification and version, availability, and the description of the algorithm
- all parameters
summarized information from possible replicates
- derived measurement value summarizing related elements as used by the author (this may constitute replicates of the element on the same or different arrays or hybridisations, as well as different elements related to the same entity e.g., gene)
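- reliability indicator for the derived measurement value as used by the author (e.g., standard deviation); may be "unknown"
- specification of how the derived measurement value and its reliability indicator are calculated; the specification should be based on the complete image analysis output (see image analysis and quantitation above)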
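As an illustration of a derived measurement value with a reliability indicator, the sketch below summarizes replicate measurements by their arithmetic mean and sample standard deviation. This is only one possible choice; the method actually used is left to the author and must be specified as described above. The function name is hypothetical.

```python
from statistics import mean, stdev


def summarize(values):
    """Return (derived value, reliability indicator) for replicate measurements.

    The derived value is the arithmetic mean; the reliability indicator is the
    sample standard deviation, or "unknown" when fewer than two values exist.
    """
    derived = mean(values)
    reliability = stdev(values) if len(values) > 1 else "unknown"
    return derived, reliability
```

Returning "unknown" for a single measurement follows the allowance in the list above that the reliability indicator may be unknown.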
6. Normalisation controls, values, specifications for hybridisations
Normalisation strategy
- spiking
- "housekeeping gene"
- total array
- optional user defined "quality value"
Normalisation algorithm
- linear regression
- log-linear regression
- ratio statistics
- log(ratio) mean/median centering
- nonlinear regression
- optional user defined "quality value"
Control array elements
- position (the abstract coordinate on the array)
- control type (spiking, normalisation, negative, positive)
- control qualifier (endogenous, exogenous)
- optional user defined "quality value"
Hybridisation extract preparation
- spike type
- spike qualifier
- target element
- optional user defined "quality value"
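To make one of the listed normalisation algorithms concrete, the sketch below shows log(ratio) median centering for a two-channel hybridisation under the "total array" strategy. It assumes that all intensities are positive and that every element on the array takes part in the normalisation; the function name and the intensity values are hypothetical.

```python
import math
from statistics import median


def median_center_log_ratios(ch1, ch2):
    """Return log2(ch1/ch2) per element, shifted so the median is zero."""
    log_ratios = [math.log2(a / b) for a, b in zip(ch1, ch2)]
    centre = median(log_ratios)
    return [r - centre for r in log_ratios]


# Hypothetical two-channel intensities (e.g. Cy5 and Cy3) for three elements.
cy5 = [400.0, 800.0, 1600.0]
cy3 = [100.0, 100.0, 100.0]
normalised = median_center_log_ratios(cy5, cy3)
```

Mean centering would follow the same pattern with `statistics.mean` in place of `median`.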
------------------------------------------------
7. Meta-analysis (From John Q) - I think this should be MAML supported, but I'm not 100% sure that this is a part of MIAME - can we impose on anybody to do any meta-analysis at all?
gene normalisation protocol
experiment normalisation protocol
distance metric (Pearson correlation coefficient, Euclidean distance, Cosine, etc.)
"clustering" algorithm (average linkage, SOM, k-means, etc.)
clustering parameters (e.g. for SOM, x-dim, y-dim, topology, neighbourhood)
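For reference, the three distance metrics named above can be sketched in a few lines for two expression profiles of equal length. The function names are illustrative, and the sketch assumes that neither profile is constant or all zero, since the cosine and Pearson measures are undefined in those cases.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between two profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation: cosine similarity of the mean-centred profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity(
        [a - mean_x for a in x],
        [b - mean_y for b in y],
    )
```

Reporting which of these was used matters because two profiles that differ only in scale are identical under the Pearson and cosine measures but not under Euclidean distance.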