Version 1.1 of the MIAME specification is available as a Word document, a Rich Text Format (.rtf) document, or as HTML text as seen below.
Minimum Information About a Microarray Experiment - MIAME 1.1 Draft 1
Version 1.1 (Draft, January 28, 2002) - revision of MIAME 1.0, which was adopted at the MGED 3 meeting, Stanford University, March 28, 2001.
The goal of this document is to outline the minimum information required to unambiguously interpret and potentially verify array-based gene expression monitoring experiments. Although details for particular experiments may be different, MIAME aims to define the core that is common to most experiments. MIAME is not a formal specification, but a set of guidelines.
A major objective of MIAME is to guide the development of microarray databases and data management software. A standard microarray data model and exchange format, MAGE, which is able to capture the information specified by MIAME, has recently been developed by the OMG. Links to software tools supporting MIAME information capture and management are available from www.mged.org/miame.
Although MIAME concentrates on the content of the information and should not be confused with a data format, it also tries to provide a conceptual structure for microarray experiment descriptions. More information about the MIAME rationale can be found in 'Minimum information about a microarray experiment (MIAME)-toward standards for microarray data', A. Brazma, et al., Nature Genetics, vol 29 (December 2001), pp 365 - 371. MIAME is continuously developing in accordance with our understanding of microarray technology and its applications.
The MIAME structure
MIAME recommendations include sections that will usually be provided in a free text format, along with information that is recommended to be given by maximum use of controlled vocabularies or external ontologies (such as species taxonomy, cell types, anatomy terms, chemical compound nomenclature). The use of controlled vocabularies is needed to enable database queries and automated data analysis.
Since few controlled vocabularies have been fully developed, MIAME encourages users, if necessary, to provide their own qualifiers and values, identifying the source of the terminology. This is achieved through the use of (qualifier, value, source) triplets (e.g., qualifier: cell type, value: epithelial, source: Gray's anatomy, 38th ed.), which are recommended instead of, or in addition to, free text format descriptions wherever possible. This will allow the community to build up a knowledge base of the most useful controlled vocabularies for describing microarray experiments. The MGED group is developing an ontology for microarray experiment description, and where the ontology is sufficiently mature, the MIAME document recommends its use (see www.mged.org/ontology).
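As an illustration only, a (qualifier, value, source) triplet could be held in a small record such as the one sketched below. The class name Annotation and the method is_sourced are hypothetical; they are not defined by MIAME or MAGE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """One (qualifier, value, source) triplet; all names here are illustrative."""
    qualifier: str                # what is being described, e.g. "cell type"
    value: str                    # the term chosen, e.g. "epithelial"
    source: Optional[str] = None  # where the terminology comes from

    def is_sourced(self) -> bool:
        # True only when a non-blank source of the terminology is given.
        return bool(self.source and self.source.strip())


# The example triplet from the text, and the same term without a source.
cell_type = Annotation("cell type", "epithelial", "Gray's anatomy, 38th ed.")
untraced = Annotation("cell type", "epithelial")
```

Keeping the source next to the value makes it possible to tell apart identical terms drawn from different vocabularies.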
Microarrays are often manufactured independently of particular experiments and their design description can be given separately. Therefore, MIAME has two major sections:
(1) array design description;
(2) gene expression experiment description.
Another potentially reusable part of the experiment description is laboratory protocols, including data processing methods (e.g., normalization). MIAME encourages the user to assign unique identifiers to all reusable parts of experiment description and to reference these when the respective parts are reused (possibly indicating the deviations).
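One way to picture reuse by reference is sketched below: a protocol is stored once under a unique identifier, and each reuse points to that identifier and lists its deviations. The identifier scheme ("P-NORM-1"), the class names and the function resolve are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Protocol:
    identifier: str   # unique identifier (the scheme used here is hypothetical)
    description: str


@dataclass
class ProtocolUse:
    protocol_id: str                                     # reference to a stored Protocol
    deviations: List[str] = field(default_factory=list)  # differences from the stored text


def resolve(use: ProtocolUse, registry: Dict[str, Protocol]) -> str:
    """Return the referenced protocol text followed by any stated deviations."""
    protocol = registry[use.protocol_id]  # a KeyError signals a dangling reference
    if not use.deviations:
        return protocol.description
    return protocol.description + " Deviations: " + "; ".join(use.deviations)


registry = {"P-NORM-1": Protocol("P-NORM-1", "Total-intensity normalization.")}
reused = ProtocolUse("P-NORM-1", deviations=["control spots excluded"])
```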
I Array design
The array design specification consists of the description of the common features of the array as a whole, and the description of each array design element (e.g., each spot). Following terminology used in MAGE, we distinguish between three levels of array design elements: feature - the location on the array, reporter - the nucleotide sequence present in a particular location on the array, and composite element - a set of reporters used collectively to measure the expression of a particular gene, exon, or splice-variant. The details that should be given for each of them are described below.
1) Array related information
- array design name
- platform type: in situ synthesized, spotted or other
- surface and coating specification
- physical dimensions of array support (e.g. of slide)
- number of elements on the array
- availability (e.g., for commercial arrays) or production protocol for custom made arrays
2a) For each reporter type
- the type of the reporter: synthetic oligo-nucleotides, PCR products, plasmids, colonies, other
- single or double stranded
2b) For each reporter
- sequence or PCR primer information:
- sequence or a reference sequence (e.g., for oligonucleotides), if known
- sequence accession number in DDBJ/EMBL/GenBank, if one exists
- primer pair information, if relevant
- approximate lengths if exact sequence not known
- clone information, if relevant (clone ID, clone provider, date, availability)
- element generation protocol that includes sufficient information to reproduce the element for custom-made arrays that are not generally available
3a) For each feature type
- attachment (covalent/ionic/other)
3b) For each feature
- which reporter and the location on the array
4) For each composite element
- which reporters it contains
- the reference sequence
- gene name and links to appropriate databases (e.g., SWISS-PROT, or organism specific databases), if known and relevant
For each array that is not generally available (e.g., not commercially available), the provided information should be sufficient to reproduce the array and all its design features.
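The three levels of array design elements described in this section (feature, reporter, composite element) could be related to one another roughly as in the following sketch. The field names, the (row, column) coordinate scheme and the helper features_for are assumptions for illustration and do not reproduce the MAGE object model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Reporter:
    name: str
    sequence: Optional[str] = None   # given if known
    accession: Optional[str] = None  # DDBJ/EMBL/GenBank accession, if any


@dataclass
class Feature:
    position: Tuple[int, int]  # (row, column) on the array; scheme is assumed
    reporter: Reporter         # which reporter sits at this location


@dataclass
class CompositeElement:
    gene_name: str
    reporters: List[Reporter] = field(default_factory=list)


def features_for(element: CompositeElement, features: List[Feature]) -> List[Feature]:
    """Find every array location whose reporter belongs to the composite element."""
    wanted = {r.name for r in element.reporters}
    return [f for f in features if f.reporter.name in wanted]


oligo_a = Reporter("oligo_a", sequence="ACGT")
oligo_b = Reporter("oligo_b")
features = [Feature((0, 0), oligo_a), Feature((0, 1), oligo_b), Feature((1, 0), oligo_a)]
gene = CompositeElement("geneX", [oligo_a])
```

Here the same reporter is spotted at two locations, so the composite element maps to two features.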
II Experiment design
By experiment we understand a set of one or more hybridizations that are in some way related (e.g., related to the same publication). The minimum information includes a description of the following five parts.
- Experimental design
- Samples used, extract preparation and labeling
- Hybridization procedures and parameters
- Measurement data and specifications
- Normalization and controls
MIAME recommends the following details on each of these sections.
1. Experimental design
This section describes what is common to all the hybridizations done in the experiment, such as the goal, a brief description, and the experimental factors tested. It includes the following:
1) Authors, laboratory, contact
2) Type of the experiment, for instance,
3) Experimental factors, i.e. parameters or conditions tested, for instance,
4) How many hybridizations are in the experiment?
5) Is a common reference used for all the hybridizations?
6)Quality control steps taken:
- whether any replicates were done (yes/no), what type of replicates, description
- whether dye swap is used (only for two channel platforms)
- other (e.g., polyA tails, low complexity regions, unspecific binding)
7) A brief description of the experiment and its goal, and a link to a publication if one exists
8) Links (URL), citations
2. Samples used, extract preparation and labeling
By a sample we understand the biological material (biomaterial) from which the nucleic acids have been extracted for subsequent labeling and hybridization. In this section we describe all steps that precede the hybridization with the array. We can usually distinguish between the source of the sample (bio-source, e.g., organism, cell type or line), its treatment, the extract preparation, and its labeling. MGED is developing an ontology for sample description (www.mged.org/ontology), the use of which is encouraged. Here we list the most essential items that are usually needed.
1) Bio-source properties
- organism (NCBI taxonomy)
- contact details for sample
- descriptors relevant to the particular sample, such as
- development stage
- organism part (tissue)
- cell type
- animal/plant strain or line
- genetic variation (e.g., gene knockout, transgenic variation)
- individual genetic characteristics (e.g., disease alleles, polymorphisms)
- disease state or normal
- is additional clinical information available (link)
- the individual (for interrelation of the samples in the experiment)
2) Biomaterial manipulations: laboratory protocol, including relevant parameters, e.g.,
3) Hybridization extract preparation protocol for each extract prepared from the sample, including
- extraction method
- whether total RNA, mRNA, or genomic DNA is extracted
- amplification (RNA polymerases, PCR)
4) Labeling protocol for each labeling prepared from the extract, including
- amount of nucleic acids labeled
- label used (e.g., A-Cy3, G-Cy5, 33P, ..)
- label incorporation method
3. Hybridization procedures and parameters
Each hybridization description should include information about which labeled extract (related to which sample, which extract) and which array (e.g., array design, batch and serial number) has been used in the experiment and the laboratory protocol, normally including
- the solution (e.g., concentration of solutes)
- quantity of labeled target used
- time, concentration, volume, temperature
- description of the hybridization instruments
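The requirement that each hybridization identify its labeled extract, the extract and sample behind it, and the array used can be pictured as a chain of references. The sketch below is a simplification with hypothetical class and field names; it models a single labeled extract per hybridization, whereas a two channel hybridization would reference more than one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    sample_id: str
    organism: str


@dataclass(frozen=True)
class Extract:
    extract_id: str
    sample: Sample
    nucleic_acid: str  # e.g. "total RNA", "mRNA", "genomic DNA"


@dataclass(frozen=True)
class LabeledExtract:
    labeled_id: str
    extract: Extract
    label: str  # e.g. "Cy3"


@dataclass(frozen=True)
class Hybridization:
    hyb_id: str
    labeled_extract: LabeledExtract
    array_design: str
    array_serial: str

    def provenance(self) -> str:
        """Trace the hybridization back to its extract and sample."""
        le = self.labeled_extract
        chain = [self.hyb_id, le.labeled_id, le.extract.extract_id, le.extract.sample.sample_id]
        return " <- ".join(chain)


# All identifiers below are made up for the example.
sample = Sample("S1", "Mus musculus")
extract = Extract("E1", sample, "total RNA")
labeled = LabeledExtract("L1", extract, "Cy3")
hyb = Hybridization("H1", labeled, array_design="design-A", array_serial="0001")
```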
4. Measurement data and specifications
We distinguish between three levels of data processing: raw data (images), image quantitations, and the gene expression data matrix. Each hybridization has at least one image, and each image has a corresponding image quantitation table, in which each row represents an array design element and each column a different quantitation type, such as mean or median pixel intensity. Several quantitation tables can be combined to obtain the 'final' gene expression measurement table associated with the experiment.
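As a minimal sketch of how several quantitation tables might be combined, the example below picks one quantitation type from each table and lines the values up by array design element. The nested-dictionary layout and the function name expression_matrix are assumptions; a real pipeline would normally also apply normalization before the values are considered final.

```python
from typing import Dict, List

# One quantitation table per image: element id -> {quantitation type -> value}.
QuantTable = Dict[str, Dict[str, float]]


def expression_matrix(tables: Dict[str, QuantTable], quantitation: str) -> Dict[str, List[float]]:
    """Build element -> [one value per hybridization], using one quantitation type.

    Assumes at least one table; only elements present in every table are kept.
    """
    hyb_names = sorted(tables)  # fixed column order
    elements = sorted(set.intersection(*(set(t) for t in tables.values())))
    return {e: [tables[h][e][quantitation] for h in hyb_names] for e in elements}


# Made-up values for two hybridizations and two spots.
tables = {
    "hyb1": {"spot1": {"mean": 100.0, "median": 90.0},
             "spot2": {"mean": 50.0, "median": 55.0}},
    "hyb2": {"spot1": {"mean": 120.0, "median": 110.0},
             "spot2": {"mean": 40.0, "median": 45.0}},
}
matrix = expression_matrix(tables, "median")
```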
1) Raw data description should include
- for each scan, the laboratory protocol for scanning, including scanning hardware and software, and scan parameters, including laser power, spatial resolution, pixel space, PMT voltage;
- scanned images;
It should be noted that MGED has not reached consensus on whether the provision of images is a part of MIAME.
2) Image analysis and quantitation
- image analysis software specification and version, availability, and the description or identification of the algorithm and all the parameters used
- for each image the complete image analysis output (of the particular image analysis software)
3) Normalized and summarized data - gene expression data matrix
- data transformation protocol, including normalization algorithm
- final gene expression data table(s) derived from the experiment as a whole,
- derived measurement value summarizing related elements and replicates as used by the author (this may constitute replicates of the element on the same or different arrays or hybridizations, as well as different elements related to the same entity e.g., gene)
- providing a reliability indicator for each datapoint (e.g., standard deviation) is encouraged
5. Normalization controls, values, specifications
1) Normalization strategy, for instance
- "housekeeping" genes
- total array
2) Control array elements
- position (the abstract coordinate on the array)
- control type (spiking, normalization, negative, positive)
- control qualifier (endogenous, exogenous)
3) Control elements during hybridization extract preparation
- spike type
- spike qualifier
- target element
For more details see www.mged.org/normalisation
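To make two of the items above concrete, the sketch below shows one possible reading of the "total array" normalization strategy (scaling each array to a common total intensity) and a replicate summary with the standard deviation as the reliability indicator. Both function names and the choice of scaling target are assumptions; MIAME asks that the method be described, not that this particular one be used.

```python
from statistics import mean, stdev
from typing import List, Tuple


def total_array_normalize(intensities: List[float], target_total: float = 1.0) -> List[float]:
    """Scale one array so that its intensities sum to target_total."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return [x * target_total / total for x in intensities]


def summarize_replicates(values: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) for replicate measurements."""
    if len(values) < 2:
        raise ValueError("need at least two replicates for a standard deviation")
    return mean(values), stdev(values)


# Made-up intensities: one array scaled to a total of 4.0, then three replicates.
normalized = total_array_normalize([2.0, 6.0], target_total=4.0)
summary = summarize_replicates([1.0, 2.0, 3.0])
```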
Relation to MGED ontologies
MIAME is a general set of guidelines for a variety of different users: experimenters, software developers and data analysts. Further details on specific experiment types and appropriate descriptors for these can be found at the MGED ontology site (www.mged.org/ontology). The MGED ontology is a work in progress and not all possible types of experiments have yet been addressed. We encourage experimenters to contribute to the development of ontologies that specifically describe their work and to help us integrate these into the MGED ontologies.
This document has been prepared for discussion at the MGED 4 meeting.