Array Design File - ADF
The ADF (Array Design File) format contains only data about the array design. It is intended to be simple and human-readable, easy to create without specific knowledge of MAGE-OM, and compliant with the array design part of MIAME.
Initially, an ADF was a single tabular plain text file composed of the following three domains (see the initial ADF specification[adfc] for more details):
- Features - coordinates of the element (the spot) on the microarray;
- Reporters - data about the reporters associated with features;
- CompositeSequences - data about composite sequences.
The objects of each of these domains were divided into items. Examples of the initial ADF can be seen on MIAMExpress[miaa].
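As a rough illustration, the three domains can be pictured as three small record types. The sketch below models them as Python dataclasses; the field names, including the meta-column/meta-row/column/row grid used for the feature coordinates, are assumptions chosen for illustration and are not the item names defined in the ADF specification[adfc].

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Feature:
    """Position of one spot on the array (hypothetical coordinate fields)."""
    meta_column: int
    meta_row: int
    column: int
    row: int


@dataclass
class Reporter:
    """Reporter deposited at a feature; the sequence may be unknown."""
    name: str
    sequence: Optional[str] = None


@dataclass
class CompositeSequence:
    """Biological entity (e.g. a gene or transcript) represented by reporters."""
    name: str
    reporter_names: List[str] = field(default_factory=list)


# One spot, its reporter and the composite sequence it contributes to
# (all identifiers are made up).
spot = Feature(meta_column=1, meta_row=1, column=3, row=7)
rep = Reporter(name="R001", sequence="ACGTACGT")
comp = CompositeSequence(name="GeneA", reporter_names=[rep.name])
```

Making `Feature` frozen keeps it hashable, so a spot position can serve as a dictionary key when linking features to reporters.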
There are different relations between Features, Reporters and CompositeSequences:
- one Feature - one Reporter - one CompositeSequence (Figure 1.1);
- one Feature - one Reporter - several CompositeSequences (Figure 1.2).
Figure 1.1: Feature/Reporter/CompositeSequence relation no. 1 (F: Feature, R: Reporter, C: CompositeSequence).
Figure 1.2: Feature/Reporter/CompositeSequence relation no. 2 (F: Feature, R: Reporter, C: CompositeSequence).
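The two relations can be told apart from a reporter-to-composite mapping: a reporter linked to exactly one composite sequence follows the relation of Figure 1.1, and one linked to several follows Figure 1.2. The sketch below assumes such a mapping is already available as a dictionary; the function name and the reporter and composite identifiers are hypothetical.

```python
from typing import Dict, List


def classify_relations(reporter_to_composites: Dict[str, List[str]]) -> Dict[str, str]:
    """Label each reporter with the figure number of the relation it follows."""
    labels: Dict[str, str] = {}
    for reporter, composites in reporter_to_composites.items():
        if len(composites) == 1:
            labels[reporter] = "1.1"  # one reporter -> one composite sequence
        elif len(composites) > 1:
            labels[reporter] = "1.2"  # one reporter -> several composite sequences
        else:
            labels[reporter] = "unmapped"  # no composite sequence recorded
    return labels


mapping = {
    "R001": ["GeneA"],
    "R002": ["TranscriptA1", "TranscriptA2"],
}
print(classify_relations(mapping))
# {'R001': '1.1', 'R002': '1.2'}
```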
The ADF format has been revised in order to:
- ease the creation process for biologists and array manufacturers;
- allow data representation and capture so that enough information is supplied to fully meet MIAME requirements and the needs of MAGE good practice[adfb].
In the new ADF, the data corresponding to Features, Reporters and CompositeSequences have been split into three tabular plain text files (or three worksheets in a Microsoft Excel workbook), according to the ADF specification[adfa], and supplemented with additional information needed or useful for conversion to MAGE-ML[magb] (see section 1.1.2).
The three new components of an ADF are therefore (see annexe A):
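For the plain-text variant, a tool first has to find the component files of a design. The sketch below assumes, as a hypothetical convention, that the files share a base name and differ only by the extensions .adh (Header), .adr (FeatureReporter) and .adc (Composites), and that only the Composites file may be absent. The function name is illustrative.

```python
from pathlib import Path
from typing import Dict, Optional

# Assumed extensions of the three ADF components; Composites is optional.
REQUIRED = {"header": ".adh", "feature_reporter": ".adr"}
OPTIONAL = {"composites": ".adc"}


def locate_adf_files(directory: Path, base_name: str) -> Dict[str, Optional[Path]]:
    """Return the paths of the ADF component files for one array design.

    Raises FileNotFoundError if a required component is missing; the
    optional Composites file is reported as None when absent.
    """
    found: Dict[str, Optional[Path]] = {}
    for part, ext in REQUIRED.items():
        path = directory / f"{base_name}{ext}"
        if not path.is_file():
            raise FileNotFoundError(path)
        found[part] = path
    for part, ext in OPTIONAL.items():
        path = directory / f"{base_name}{ext}"
        found[part] = path if path.is_file() else None
    return found
```

Treating a missing .adc file as `None` rather than as an error mirrors the one-to-one case, where no CompositeSequence file is supplied.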
- "Header" worksheet / Header file - ".adh" (see annexe A.1.1)
Contains additional information needed for conversion to MAGE-ML or genuinely useful for comparison in a database, such as the date of public release and the submitter name.
- "FeatureReporter" worksheet / FeatureReporter file - ".adr" (see annexe A.1.1)
Contains Feature and Reporter data. This file corresponds to the previous ADF format, with slight modifications. Each row corresponds to one deposit on the array: its position (feature) and biological information about its reporter.
- "Composites" worksheet / CompositeSequence file - ".adc" (see annexe A.1.1)
Optional; used only when a more complex array design has been created in which Reporters can be combined to represent different genetic elements, typically an array designed to monitor splice variant events or an operon. Reporters representing exons can be grouped in various ways to represent the different transcripts. This third file describes such relationships (a map) between Reporters and CompositeSequences (representing the various transcripts).
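The Composites file is essentially a map from reporters to composite sequences. The sketch below reads such a map from tab-delimited text and groups the reporters (for example exons) under each composite sequence (for example a transcript). The column headers "Reporter Identifier" and "CompositeSequence Identifier" are hypothetical placeholders, not the headers defined in the ADF specification[adfa].

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Hypothetical column headers, for illustration only.
REPORTER_COL = "Reporter Identifier"
COMPOSITE_COL = "CompositeSequence Identifier"


def read_composite_map(text: str) -> Dict[str, List[str]]:
    """Group reporter identifiers by composite sequence from tab-delimited text."""
    groups: Dict[str, List[str]] = defaultdict(list)
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        groups[row[COMPOSITE_COL]].append(row[REPORTER_COL])
    return dict(groups)


# Two splice variants of one gene sharing exon 1 (made-up identifiers).
adc_text = (
    "Reporter Identifier\tCompositeSequence Identifier\n"
    "exon1\ttranscript_A\n"
    "exon2\ttranscript_A\n"
    "exon1\ttranscript_B\n"
    "exon3\ttranscript_B\n"
)
print(read_composite_map(adc_text))
# {'transcript_A': ['exon1', 'exon2'], 'transcript_B': ['exon1', 'exon3']}
```

Because the same reporter (`exon1`) appears under two transcripts, this example also illustrates the one-reporter-to-several-composites relation.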
In the case of a one-to-one relation between reporters and composite sequences (no CompositeSequence file), the reporter and the composite sequence are assumed to carry essentially the same data. Most of the reporter data are carried over to the CompositeSequence: the name, plus either the biological sequence associated with the reporter (for CGH and gene expression applications) or the assigned genes (BioSequence) (for PCR technology or ChIP applications).
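The one-to-one case can be sketched as deriving one composite sequence per reporter, reusing the reporter's name and its biological sequence. This sketch covers only the sequence case (CGH and gene expression); for PCR or ChIP applications the assigned genes would be carried over instead. The function name and dictionary keys are hypothetical.

```python
from typing import Dict, List, Optional


def composites_from_reporters(
    reporters: List[Dict[str, Optional[str]]],
) -> List[Dict[str, object]]:
    """Derive one composite sequence per reporter (one-to-one relation)."""
    composites: List[Dict[str, object]] = []
    for rep in reporters:
        composites.append({
            "name": rep["name"],              # composite reuses the reporter name
            "sequence": rep.get("sequence"),  # and its biological sequence, if any
            "reporters": [rep["name"]],       # linked back to exactly one reporter
        })
    return composites


reporters = [
    {"name": "R001", "sequence": "ACGTTGCA"},
    {"name": "R002", "sequence": None},
]
for comp in composites_from_reporters(reporters):
    print(comp)
```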
Pierre Marguerite - EBI, pierre@ebi.ac.uk