
This is an old version of MIAME.
Please refer to the latest version.

Minimum Information About a Microarray Experiment - MIAME

Version 1.0
Approved at MGED 3 meeting, Stanford University, March 28, 2001

The goal of MIAME is to specify the minimum information that must be reported about an array-based gene expression monitoring experiment in order to ensure the interpretability of the results, as well as their potential verification by third parties. This is intended to facilitate the establishment of repositories and a data exchange format for array-based gene expression data. The MGED group will encourage scientific journals and funding agencies to adopt policies requiring data submission to repositories once MIAME-compliant repositories and annotation tools are established.

Introduction:

The definition of the minimum information is aimed at cooperative data providers; it is not intended to close every loophole that would allow information to be withheld.

Among the concepts in the definition is a list of 'qualifier, value, source' triplets, by means of which we would like to encourage authors to define their own qualifiers and provide the appropriate values, so that the list as a whole gives sufficient information to fully describe the particular part of the experiment.  The idea stems from the information sciences, where 'qualifier' defines a concept and 'value' contains the appropriate instance of the concept. 'Source' is either user defined or a reference to an externally defined ontology or controlled vocabulary, such as the species taxonomy database. The judgement regarding the necessary level of detail is left to the data providers. In the future these 'voluntary' qualifier lists may gradually be replaced by predefined fields as the respective ontologies are developed.
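As an illustration of the triplet idea, the following is a minimal Python sketch. The class name `Annotation`, the helper `user_defined`, and the example qualifiers and values are assumptions made for this illustration; MIAME 1.0 itself does not prescribe any encoding.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    """One 'qualifier, value, source' triplet (hypothetical representation)."""
    qualifier: str  # the concept being described
    value: str      # the instance of that concept
    source: str     # "user defined" or the name of an external vocabulary


# Illustrative values only; they are not taken from the MIAME document.
sample_annotations: List[Annotation] = [
    Annotation("organism", "Saccharomyces cerevisiae", "NCBI taxonomy"),
    Annotation("growth medium", "YPD", "user defined"),
]


def user_defined(annotations: List[Annotation]) -> List[Annotation]:
    """Return the triplets whose source is not an external vocabulary."""
    return [a for a in annotations if a.source == "user defined"]
```

Keeping the source next to every value makes it possible to see which qualifiers could later be replaced by predefined fields once a suitable ontology exists.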

Parts of MIAME can be provided as references or links to pre-existing and identifiable descriptions.  For instance, for commercial or other standard arrays, all the required information should normally be provided only once by the array provider and referenced thereafter by the users.  Standard protocols should also normally be provided only once.  For every experiment set, either a valid reference or the information itself must be provided.

Definition:

The minimum information about a published microarray based gene expression experiment should include a description of the:

  1. Experimental design: the set of hybridisation experiments as a whole
  2. Array design: each array used and each element (spot) on the array
  3. Samples: samples used, extract preparation and labeling
  4. Hybridisations: procedures and parameters
  5. Measurements: images, quantitation, specifications
  6. Normalisation controls: types, values, specifications

An additional section dealing with the data quality assurance will be added in the next MIAME release. 

The following details should be provided for each array, sample, hybridisation and measurement in the experiment set:

1. Experimental design: the set of hybridisation experiments as a whole

This section describes the experiment, which may consist of one or more hybridisations, as a whole.  Normally 'experiment' should include a set of hybridisations which are inter-related and address a common question.  For instance, it may be all the hybridisations related to research published in a single paper.

  1. author (submitter), laboratory, contact information, links (URL), citations
  2. type of the experiment - maximum one line, for instance:
    • normal vs. diseased comparison
    • treated vs. untreated comparison
    • time course
    • dose response
    • effect of gene knock-out
    • effect of gene knock-in (transgenics)
    • shock
      (multiple types possible)
       
  3. experimental variables, i.e. parameters or conditions tested (e.g., time, dose, genetic variation, response to a treatment or compound)
  4. single or multiple hybridisations
    For multiple hybridisations:
    • serial (yes/no)
      • type (e.g., time course, dose response)
    • grouping (yes/no)
      • type (e.g., normal vs. diseased, multiple tissue comparison)

        Relationships between all the samples, arrays and hybridisations in the experiment.  Each sample, each array, and each hybridisation should be given a unique ID, and all the relationships should be listed (with appropriate comments where necessary).  For instance:

        Samples: S1, S2, S3
        Extracts: e1S1, e1S2, e1S3
        Labeled extracts: l1e1S1, l2e1S1, l1e1S2, l1e1S3
        Array types: T1, T2
        Arrays: a1T1, a2T1, a3T2
        Hybridisations: H1 is l1e1S1+l1e1S2 on a1T1
        H2 is l1e1S2+l1e1S3 on a2T1
        H3 is l2e1S1+l1e1S2 on a3T2

        Note that detailed descriptions of each sample, array and hybridisation  are provided in further sections. In the general case each sample may produce more than one extract, and each extract, more than one labeled extract.
         
  5. quality related indicators
    • quality control steps taken:
    • biological replicates?
    • technical replicates (replicate spots or hybs)?
    • polyA tails
    • low complexity regions
    • unspecific binding
    • other
  6. optional user defined "qualifier, value, source" list (see Introduction)
  7. a free text description of the experiment set or a link to a publication
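The ID scheme from item 4 above can be written down as plain mappings and checked for dangling references. This is only a sketch: the IDs are copied from the example, but the dictionary layout and the function names `samples_in` and `check_references` are assumptions, not part of MIAME.

```python
# Parent links copied from the example in item 4; the layout is an assumption.
extract_of = {"e1S1": "S1", "e1S2": "S2", "e1S3": "S3"}
labeled_from = {
    "l1e1S1": "e1S1",
    "l2e1S1": "e1S1",
    "l1e1S2": "e1S2",
    "l1e1S3": "e1S3",
}
array_type = {"a1T1": "T1", "a2T1": "T1", "a3T2": "T2"}

# Each hybridisation: (labeled extracts used, physical array used).
hybridisations = {
    "H1": (["l1e1S1", "l1e1S2"], "a1T1"),
    "H2": (["l1e1S2", "l1e1S3"], "a2T1"),
    "H3": (["l2e1S1", "l1e1S2"], "a3T2"),
}


def samples_in(hyb_id):
    """Trace a hybridisation back to the samples it was derived from."""
    labeled, _array = hybridisations[hyb_id]
    return sorted({extract_of[labeled_from[l]] for l in labeled})


def check_references():
    """Return every ID that is used somewhere but never declared."""
    missing = []
    for labeled, array in hybridisations.values():
        missing += [l for l in labeled if l not in labeled_from]
        if array not in array_type:
            missing.append(array)
    missing += [e for e in labeled_from.values() if e not in extract_of]
    return missing
```

For the example data, `samples_in("H2")` traces back to samples S2 and S3, and `check_references()` returns an empty list because every referenced ID is declared.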

2. Array design: each array used and each element (spot) on the array.

This section describes details of each array used in the experiment.  It has two parts: 2.1 lists the physical arrays themselves, each of which refers to a specific array design type described in 2.2. We expect that the array design type descriptions will be given by the array providers and manufacturers, in which case the users will simply need to reference them.

2.1 Array copy (physical instance)

  • unique ID as used in part 1
  • array design name (e.g., "Stanford Human 10K set")

2.2 Array design

This section consists of three parts: a) a description of the array as a whole; b) a description of each type of element (spot) used, covering properties that are typically common to many elements (e.g., 'synthesized oligo-nucleotides' or 'PCR products from cDNA clones'); and c) a description of the specific properties of each element, such as the DNA sequence.  In practice, the last part will be provided as a spread-sheet or tab-delimited file.

  1. a) array related information
    • array design name (e.g., "Stanford Human 10K set") as given in 2.1
    • platform type: in situ synthesized,  spotted or other
    • array provider (source)
    • surface type:  glass, membrane, other
    • surface type name
    • physical dimensions of array support (e.g. of slide)
    • number of elements on the array
    • a reference system that allows each element (spot) on the array to be located (in the simplest case the number of columns and rows is sufficient)
    • production date
    • production protocol (obligatory if custom produced)
    • optional "qualifier, value, source" list (see Introduction)
       
  2. properties of each type of elements (spots) on the array;  elements may be simple, i.e., containing only identical molecules, or composite, i.e., containing different oligo-nucleotides obtained from the same reference molecule;
    • element type unique ID
    • simple or composite
    • element type: synthetic oligo-nucleotides, PCR products, plasmids, colonies, other
    • single or double stranded
    • element (spot) dimensions
    • element generation protocol that includes sufficient information to reproduce the element
    • attachment (covalent/ionic/other)
    • optional "qualifier, value, source" list (see Introduction)
       
  3. specific properties of each element (spot) on the array:
    • element type ID from 2.2b
    • position on the array, allowing the spot to be identified in the image (see 5. a) below)
    • clone information, obligatory for elements obtained from clones:
      • clone ID, clone provider, date, availability
    • sequence or PCR primer information:
      • sequence accession number in DDBJ/EMBL/GenBank if known
      • sequence itself (if databases do not contain it)
      • primer pair information, if relevant
    • for composite oligonucleotide elements:
      • oligonucleotide sequences, if given
      • number of oligonucleotides and the reference sequence (or accession number), otherwise
    • one of the above should unambiguously identify the element
    • approximate lengths if exact sequence not known
    • gene name and links to appropriate databases (e.g., SWISS-PROT, or organism specific databases), if known and relevant
      Normally this information will be provided in one or more spread-sheets or tab-delimited files.
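As a sketch of how such a tab-delimited element file might be checked against the rule that one of the identifying fields should unambiguously identify the element, consider the following Python fragment. The column names, the example rows and the clone ID are hypothetical; MIAME 1.0 does not define a file layout.

```python
import csv
import io

# Hypothetical column names; MIAME 1.0 does not prescribe a file layout.
ID_COLUMNS = ("clone_id", "accession", "sequence")

# Made-up example table: the third element carries no identifying field.
EXAMPLE = (
    "element_type_id\trow\tcolumn\tclone_id\taccession\tsequence\n"
    "ET1\t1\t1\tCLONE-0001\t\t\n"
    "ET1\t1\t2\t\t\tACGTACGTACGT\n"
    "ET2\t1\t3\t\t\t\n"
)


def unidentified_elements(text):
    """Return (row, column) positions of elements with no identifying field."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = []
    for record in reader:
        if not any((record.get(col) or "").strip() for col in ID_COLUMNS):
            missing.append((int(record["row"]), int(record["column"])))
    return missing
```

The check only tests that some identifier is present; whether that identifier is truly unambiguous still has to be judged by the data provider.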

3. Samples: samples used, extract preparation and labeling

By a 'sample' we understand the biological material from which the RNA gene products (or DNA) have been extracted for subsequent labeling, hybridisation and measuring.  This section describes the source of the sample (e.g., organism, cell type or line), its treatment, as well as preparation of the extract and its labeling, i.e., all steps that precede the contact with an array (i.e., hybridisation).  This section is separate for each sample used in the experiment.  In practice, if the treatments are similar, differing only slightly, the descriptions can be given together, clearly pointing out the differences.

  1. sample source and treatment (this section describes the biological treatment which happens before the extract preparation and labelling, i.e., biological sample in which we intend to measure the gene expression; for each sample only some of the qualifiers given below may be relevant): 
    • ID as used in section 1
    • organism (NCBI taxonomy)
    • additional "qualifier, value, source" list; each qualifier in the list is obligatory if applicable; the list includes:
      • cell source and type (if derived from primary source(s))
      • sex
      • age
      • growth conditions
      • development stage
      • organism part (tissue)
      • animal/plant strain or line
      • genetic variation (e.g., gene knockout, transgenic variation)
      • individual
      • individual genetic characteristics (e.g., disease alleles, polymorphisms)
      • disease state or normal
      • target cell type
      • cell line and source (if applicable)
      • in vivo treatments (organism or individual treatments)
      • in vitro treatments (cell culture conditions)
      • treatment type (e.g., small molecule, heat shock, cold shock, food deprivation)
      • compound
      • is additional clinical information available (link)
      • separation technique (e.g., none, trimming, microdissection, FACS)
    • laboratory protocol for sample treatment
  2. hybridisation extract preparation
    • ID as given in section 1
    • laboratory protocol for extract preparation, including:
      • extraction method
      • whether total RNA, mRNA, or genomic DNA is extracted
      • amplification (RNA polymerases, PCR)
    • optional "qualifier, value, source" list (see Introduction)
  3. labeling
    • ID as given in section 1
    • laboratory protocol for labelling, including:
      • amount of nucleic acids labeled
      • label used (e.g., A-Cy3, G-Cy5, 33P, ..)
      • label incorporation method
    • optional "qualifier, value, source" list (see Introduction)

4. Hybridisations: procedures and parameters

This section describes details of each hybridisation in the experiment.  Each hybridisation has a separate section 4, though if they are similar they may be described together.

  • ID as given in section 1
  • laboratory protocol for hybridisation, including:
    • the solution (e.g., concentration of solutes)
    • blocking agent
    • wash procedure
    • quantity of labelled target used
    • time, concentration, volume, temperature
    • description of the hybridisation instruments
  • optional "qualifier, value, source" list (see Introduction)

5. Measurements: images, quantitation, specifications:

This section describes the data obtained from each scan and their combinations.

  1. hybridisation scan raw data:
    1. a1) the scanner image file (e.g., TIFF, DAT) from the hybridised microarray scanning
    2. a2) scanning information:
      • input: hybridisation ID as in Section 1
      • image unique ID
      • scan parameters, including laser power, spatial resolution, pixel space, PMT voltage;
      • laboratory protocol for scanning, including:
        • scanning hardware
        • scanning software
  2. image analysis and quantitation
    1. the complete image analysis output (of the particular image analysis software) for each element (or composite element - see 2.2.b), for each channel - normally given as a spread-sheet or other external file
    2. image analysis information:
      • input: image ID
      • quantitation unique ID
      • image analysis software specification and version, availability, and the description or identification of the algorithm
      • all parameters
  3. summarized information from possible replicates
    1. derived measurement value summarizing related elements as used by the author (this may constitute replicates of the element on the same or different arrays or hybridisations, as well as different elements related to the same entity e.g., gene)
    2. reliability indicator for the value of c1) as used by the author (e.g., standard deviation); may be "unknown"
    3. specification of how c1) and c2) are calculated
      • input: one or more quantitation ID's
      • the specification should be based on values provided in b1
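A minimal sketch of items c1) and c2) follows, assuming the author summarizes replicates by their arithmetic mean and reports the sample standard deviation as the reliability indicator. Both choices, the function name `summarise`, and the quantitation IDs and values are assumptions for illustration; MIAME only asks that whatever calculation was used be specified.

```python
import statistics


def summarise(values):
    """Return (derived value, reliability indicator) for replicate measurements.

    The derived value is the arithmetic mean and the reliability indicator
    is the sample standard deviation; both are assumptions of this sketch.
    With fewer than two replicates the indicator is reported as "unknown".
    """
    if not values:
        raise ValueError("at least one measurement is required")
    derived = statistics.mean(values)
    reliability = statistics.stdev(values) if len(values) > 1 else "unknown"
    return derived, reliability


# Hypothetical quantitation IDs mapped to the measured value for one element.
replicates = {"Q1": 10.0, "Q2": 12.0, "Q3": 14.0}
value, sd = summarise(list(replicates.values()))
```

Recording the quantitation IDs alongside the inputs keeps the summarized value traceable to the image analysis output it was derived from.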

6. Normalisation controls, values, specifications

This section will be further detailed in the next MIAME version.

  1. Normalisation strategy
    • spiking
    • "housekeeping" genes
    • total array
    • optional user defined "quality value"
  2. Normalisation algorithm
    • linear regression
    • log-linear regression
    • ratio statistics
    • log(ratio) mean/median centering
    • nonlinear regression
    • optional user defined "quality value"
  3. Control array elements
    • position (the abstract coordinate on the array)
    • control type (spiking, normalization, negative, positive)
    • control qualifier (endogenous, exogenous)
    • optional user defined "quality value"
  4. Hybridisation extract preparation
    • spike type
    • spike qualifier
    • target element
    • optional user defined "quality value"
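MIAME 1.0 only names the normalisation algorithms and does not define them. As one possible reading of "log(ratio) median centering", the sketch below takes the base-2 logarithm of the ratio between two channels for each element and subtracts the median, so that the centred values have a median of zero. The function name, the choice of base 2 and the example intensities are assumptions.

```python
import math
import statistics


def median_centre_log_ratios(channel_a, channel_b):
    """Return log2(a/b) per element, shifted so that the median is zero."""
    if len(channel_a) != len(channel_b):
        raise ValueError("both channels must cover the same elements")
    log_ratios = [math.log2(a / b) for a, b in zip(channel_a, channel_b)]
    centre = statistics.median(log_ratios)
    return [r - centre for r in log_ratios]


# Made-up intensities for three elements measured in two channels.
centred = median_centre_log_ratios([200.0, 400.0, 800.0], [100.0, 100.0, 100.0])
```

Intensities must be positive for the logarithm to be defined, so background-corrected values that reach zero or below would need separate handling.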

 

Section 7 on quality control will be added to the next MIAME version.

This document represents the overall consensus of the MGED working group on microarray data annotations in all parts except section 5 a) 'hybridisation scan raw data'. A considerable majority of the working group supports the view that providing raw image data is an essential part of MIAME. However, there is also a notable minority that does not agree with this view. It is possible that this requirement is platform specific. We would like to encourage the microarray community to give us their views on this question, as well as on MIAME version 1.0 in general.

 


Last modified: 26 Sep, 2005.