This is an old version of MIAME.
Version 1.1 Draft 6 of the MIAME specification is available as a Word document, a Rich Text Format (.rtf) document, or as HTML text as seen below.
Please find the latest version from here.
Minimum Information About a Microarray Experiment - MIAME 1.1 Draft 6
Version 1.1 (Draft 6, April 1, 2002) - discussed at MGED 4
The goal of this document is to outline the minimum information required to interpret unambiguously and potentially reproduce and verify an array-based gene expression monitoring experiment. Although details for particular experiments may be different, MIAME aims to define the core that is common to most experiments. MIAME is not a formal specification, but a set of guidelines.
A major objective of MIAME is to guide the development of microarray databases and data management software. A standard microarray data model and exchange format, MAGE, which is able to capture information specified by MIAME, has been submitted by EBI (for MGED) and Rosetta Biosoftware and recently became an Adopted Specification of the OMG standards group (see http://www.mged.org/mage). Many organizations, including Agilent, Affymetrix, and Iobion, have contributed ideas to MAGE. Links to software tools supporting MIAME information capture and management are available from http://www.mged.org/miame.
Although MIAME concentrates on the content of the information and should not be confused with a data format, it also tries to provide a conceptual structure for microarray experiment descriptions. More information about the MIAME rationale can be found in 'Minimum information about a microarray experiment (MIAME)-toward standards for microarray data', A. Brazma, et al., Nature Genetics, vol 29 (December 2001), pp 365 - 371. For explanation of some of the terminology, see http://www.mged.org/Workgroups/MIAME/miame_glossary.html.
The MIAME structure
MIAME recommendations include sections that will usually be provided in a free text format, along with information that is recommended to be given by maximum use of controlled vocabularies or external ontologies (such as species taxonomy, cell types, anatomy terms, chemical compound nomenclature). The use of controlled vocabularies is needed to enable database queries and automated data analysis.
Since few controlled vocabularies have been fully developed, MIAME encourages the users, if necessary, to provide their own qualifiers and values identifying the source of the terminology. This is achieved through the use of
(qualifier, value, source)
triplets, for instance,
(qualifier: 'cell type', value: 'epithelial', source: 'Gray's anatomy, 38th ed.'),
which is recommended instead of or in addition to free text format descriptions wherever possible. This will allow the community to build up a knowledge base of the most useful controlled vocabularies for describing microarray experiments. The MGED group is developing an ontology for microarray experiment description, and where the ontology is sufficiently mature, the MIAME document recommends its use (see http://www.mged.org/ontology).
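As an illustration only, a triplet of this kind can be held in a small record type. The sketch below is in Python; the names `Annotation` and `group_by_source` are hypothetical and are not part of MIAME or MAGE, and the second example value is made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    """One (qualifier, value, source) triplet. The class name is hypothetical."""
    qualifier: str
    value: str
    source: str


def group_by_source(annotations):
    """Group triplets by the vocabulary they cite, to see which sources are in use."""
    grouped = defaultdict(list)
    for annotation in annotations:
        grouped[annotation.source].append(annotation)
    return dict(grouped)


annotations = [
    # The example triplet from the text.
    Annotation("cell type", "epithelial", "Gray's anatomy, 38th ed."),
    # An illustrative value, not taken from the MIAME document.
    Annotation("organism", "Homo sapiens", "NCBI taxonomy"),
]
by_source = group_by_source(annotations)
```

Keeping the source next to every value is what lets a database later tell which vocabularies its submitters actually rely on.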
Microarrays are often manufactured independently of particular experiments and their design description can be given separately. Therefore MIAME has two major sections:
(1) array design description;
(2) gene expression experiment description.
Another potentially reusable part of the experiment description is laboratory protocols, including data processing methods (e.g., normalization). MIAME encourages the user to assign unique identifiers to all reusable parts of the experiment description and to reference these when the respective parts are reused (possibly indicating the deviations). A standard for the description of protocols, including data transformation protocols, is being developed by MGED; for details see http://www.mged.org/normalisation.
I Array design description
The array design specification consists of the description of the common features of the array as a whole, and the description of each array design element (e.g., each spot). Following the terminology used in MAGE, we distinguish between three levels of array design elements: feature - the location on the array, reporter - the nucleotide sequence present in a particular location on the array, and composite sequence - a set of reporters used collectively to measure the expression of a particular gene, exon, or splice-variant. The details that should be given for each of them are described below.
1) Array related information
- array design name
- platform type: in situ synthesized, spotted or other
- surface and coating specification
- physical dimensions of array support (e.g. of slide)
- number of features on the array
- availability (e.g., for commercial arrays) or production protocol for custom made arrays
2a) For each reporter type
- the type of the reporter: synthetic oligo-nucleotides, PCR products, plasmids, colonies, other
- single or double stranded
2b) For each reporter
- sequence or PCR primer information:
- sequence or a reference sequence (e.g., for oligonucleotides), if known
- sequence accession number in DDBJ/EMBL/GenBank, if one exists
- primer pair information, if relevant
- approximate lengths if exact sequence not known
- clone information, if relevant (clone ID, clone provider, date, availability)
- element generation protocol that includes sufficient information to reproduce the element for custom-made arrays that are not generally available
3a) For each feature type
- attachment (covalent/ionic/other)
3b) For each feature
- which reporter and the location on the array
4) For each composite sequence
- which reporters it contains
- the reference sequence
- gene name and links to appropriate databases (e.g., SWISS-PROT, or organism specific databases), if known and relevant
5) Control elements on the array
- position of the feature (the abstract coordinate on the array)
- control type (spiking, normalization, negative, positive)
- control qualifier (endogenous, exogenous)
For each array that is not generally available (e.g., commercially available), the provided information should be sufficient to reproduce the array and all its design features.
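The three levels of array design elements described in this section (feature, reporter, composite sequence) can be pictured as linked records. The following Python sketch is one possible reading, not the MAGE object model: all class and field names are hypothetical, and the use of a (row, column) pair for the feature location is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Reporter:
    """The nucleotide sequence placed at one or more locations."""
    reporter_id: str
    reporter_type: str                # e.g. "synthetic oligo-nucleotide", "PCR product"
    sequence: Optional[str] = None    # may be unknown
    accession: Optional[str] = None   # DDBJ/EMBL/GenBank accession, if one exists


@dataclass
class Feature:
    """A location on the array and the reporter found there."""
    row: int                          # assumed coordinate scheme
    column: int
    reporter_id: str
    control_type: Optional[str] = None  # spiking, normalization, negative, positive


@dataclass
class CompositeSequence:
    """A set of reporters that together measure one gene, exon, or splice-variant."""
    name: str
    reporter_ids: List[str] = field(default_factory=list)


def features_for(composite, features):
    """Return the features whose reporter belongs to the composite sequence."""
    wanted = set(composite.reporter_ids)
    return [f for f in features if f.reporter_id in wanted]


reporters = [Reporter("R1", "PCR product"), Reporter("R2", "PCR product")]
features = [Feature(0, 0, "R1"), Feature(0, 1, "R2"), Feature(1, 0, "R1")]
gene_x = CompositeSequence("gene_x", ["R1"])
located = features_for(gene_x, features)
```

Separating the reporter from the feature lets the same sequence appear at several locations without repeating its description.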
II Experiment description
By experiment we understand a set of one or more hybridizations that are in some way related (e.g., related to the same publication). The minimum information includes a description of the following four parts.
- Experimental design
- Samples used, extract preparation and labeling
- Hybridization procedures and parameters
- Measurement data and specifications of data processing
MIAME recommends the following details on each of these sections.
1. Experimental design
This section describes what is common to all the hybridizations done in the experiment, such as the goal, a brief description, and the experimental factors tested. It includes the following.
1) Authors, laboratory, contact
2) Type of the experiment, for instance,
3) Experimental factors, i.e. parameters or conditions tested, for instance,
- genetic variation
- response to a treatment or compound
4) How many hybridizations in the experiment?
5) Is a common reference used for all the hybridizations?
6) Quality control steps taken:
- whether any replicates were done (yes/no) and, if so, the type of replicates and a description
- whether dye swap is used (only for two-channel platforms)
- other (e.g., polyA tails, low complexity regions, unspecific binding)
7) A brief description of the experiment and its goal, and a link to a publication if one exists
8) Links (URL), citations
2. Samples used, extract preparation and labeling
By a sample we understand the biological material (biomaterial) from which the nucleic acids have been extracted for subsequent labeling and hybridization. In this section all steps that precede the hybridization with the array are described. We can usually distinguish between the source of the sample (bio-source, e.g., organism, cell type or line), its treatment, the extract preparation, and its labeling. MGED is developing an ontology for sample description (see http://www.mged.org/ontology), the use of which is encouraged. Here we list the most essential items that are usually needed.
1) Bio-source properties
- organism (NCBI taxonomy)
- contact details for sample
- descriptors relevant to the particular sample, such as
- development stage
- organism part (tissue)
- cell type
- animal/plant strain or line
- genetic variation (e.g., gene knockout, transgenic variation)
- individual genetic characteristics (e.g., disease alleles, polymorphisms)
- disease state or normal
- is additional clinical information available (link)
- the individual (for interrelation of the samples in the experiment)
2) Biomaterial manipulations: laboratory protocol, including relevant parameters, e.g.,
3) Hybridization extract preparation protocol for each extract prepared from the sample, including
- extraction method
- whether total RNA, mRNA, or genomic DNA is extracted
- amplification (RNA polymerases, PCR)
4) Labeling protocol for each labeling prepared from the extract, including
- amount of nucleic acids labeled
- label used (e.g., A-Cy3, G-Cy5, 33P, ..)
- label incorporation method
5) External controls added to hybridization extract(s) (spiking controls)
- element on array expected to hybridize to spiking control
- spike type (e.g., oligonucleotide, plasmid DNA, transcript)
- spike qualifier (e.g., concentration, expected ratio, labelling methods if different than that of the extract)
3. Hybridization procedures and parameters
Each hybridization description should include
1) information about which labeled extract (related to which sample, which extract) and which array (e.g., array design, batch and serial number) has been used in the experiment; and
2) the hybridization protocol, normally including
- the solution (e.g., concentration of solutes)
- quantity of labeled target used
- time, concentration, volume, temperature
- description of the hybridization instruments
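A hybridization record that refers to its labeled extracts, array, and protocol by identifier makes it possible to trace each hybridization back to the samples involved. The Python sketch below assumes a hypothetical identifier scheme; none of the class, field, or identifier names come from MIAME or MAGE.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LabeledExtract:
    extract_id: str
    sample_id: str    # the sample the extract was prepared from
    label: str        # e.g. "Cy3"


@dataclass(frozen=True)
class Hybridization:
    hybridization_id: str
    labeled_extract_ids: Tuple[str, ...]
    array_design: str
    array_serial: str
    protocol_id: str  # reference to a reusable hybridization protocol


def samples_used(hybridization, extracts):
    """Trace a hybridization back to the samples behind its labeled extracts."""
    by_id = {extract.extract_id: extract for extract in extracts}
    return sorted({by_id[i].sample_id for i in hybridization.labeled_extract_ids})


extracts = [
    LabeledExtract("LE1", "S1", "Cy3"),
    LabeledExtract("LE2", "S2", "Cy5"),
]
hyb = Hybridization("H1", ("LE1", "LE2"), "design-A", "0001", "P-hyb-1")
samples = samples_used(hyb, extracts)
```

Referring to the protocol by identifier rather than copying its text follows the recommendation to give reusable parts of the description unique identifiers.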
4. Measurement data and specifications of data processing
We distinguish between three levels of data processing - raw data (images), image quantitations and gene expression data matrix. Each hybridization has at least one image, and each image has a corresponding image quantitation table, in which each row represents an array design element and each column a different quantitation type, such as mean or median pixel intensity. Several quantitation tables can be combined using data processing metrics to obtain the 'final' gene expression measurement table associated with the experiment.
1) Raw data description should include
- for each scan, the laboratory protocol for scanning, including scanning hardware and software and scan parameters such as laser power, spatial resolution, pixel space, and PMT voltage;
- scanned images;
It should be noted that MGED does not have consensus whether the provision of images is a part of MIAME.
2) Image analysis and quantitation
- image analysis software specification and version, availability, and the description or identification of the algorithm and all the parameters used
- for each image the complete image analysis output (of the particular image analysis software)
3) Normalized and summarized data - gene expression data matrix
gene expression data table(s) derived from the experiment as a whole, including
- data processing protocol, including normalization algorithm (for detailed recommendations, see
- derived measurement value summarizing related elements and replicates as used by the author (this may constitute replicates of the element on the same or different arrays or hybridizations, as well as different elements related to the same entity e.g., gene)
- providing a reliability indicator for each datapoint (e.g., standard deviation) is encouraged
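To make the relation between quantitation tables and the derived table concrete, the Python sketch below combines replicate quantitation tables into one summary value per element, with the standard deviation as a reliability indicator. It is a minimal illustration under stated assumptions: the table layout (a dict per image), the quantitation name `median_intensity`, and the use of a plain mean with no normalization are choices made for this example, not recommendations of MIAME.

```python
from statistics import mean, stdev


def summarize(tables, quantitation="median_intensity"):
    """Combine replicate quantitation tables into (mean, standard deviation) per element.

    tables: list of dicts mapping element id -> {quantitation type: value}.
    The standard deviation is None when only one value is available.
    """
    elements = sorted(set().union(*(table.keys() for table in tables)))
    summary = {}
    for element in elements:
        values = [table[element][quantitation] for table in tables if element in table]
        spread = stdev(values) if len(values) > 1 else None
        summary[element] = (mean(values), spread)
    return summary


# Two hypothetical replicate hybridizations with made-up intensities.
replicate_1 = {"gene_a": {"median_intensity": 100.0}, "gene_b": {"median_intensity": 40.0}}
replicate_2 = {"gene_a": {"median_intensity": 120.0}, "gene_b": {"median_intensity": 44.0}}
summary = summarize([replicate_1, replicate_2])
```

A real data processing protocol would normally normalize the tables before summarizing them; whichever method is used, it should be described as part of the protocol.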
This ends the experiment description. The document is based on the earlier version MIAME 1.0 and discussions at the MGED 4 meeting. MIAME is continuously developing in accordance with our understanding of microarray technology and its applications. Please join the MIAME discussion list (MIAME mailing list subscribe) and contribute your ideas and comments.