File Type | File Name | Description |
---|---|---|
Raw Data | Sample1_R1.fastq.gz | Raw read1 sequence data |
Raw Data | Sample1_R2.fastq.gz | Raw read2 sequence data |
iSAAC Alignment File | Sample1_sorted.bam | iSAAC alignment file |
iSAAC Alignment File | Sample1_sorted.bam.bai | iSAAC alignment index file |
SNP/INDEL Result | Sample1_SNP_INDEL.vcf | SNP/INDEL file (VCF format) |
SNP/INDEL Result | Sample1_[chr*].xlsx | SNP/INDEL result converted to an Excel file |
SNP/INDEL Result | Sample1.genome.vcf.gz | Genomic VCF |
SNP/INDEL Result | Sample1.genome.vcf.gz.tbi | Genomic VCF index file |
CNV Result | Sample1_CNVs.xlsx | Control-FREEC CNV result |
SV Result | Sample1_SV.vcf | Manta SV result |
Md5sum | Order_#samples_md5sum.xlsx | An MD5 checksum is a string of 32 hexadecimal characters that serves as a 'fingerprint' of a file. By comparing the supplied MD5 value with the value computed locally by an md5sum utility, you can confirm that the file you downloaded has not been tampered with or otherwise changed from the original file stored on our server. |
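The MD5 comparison can be scripted. The following is a minimal sketch using Python's standard `hashlib` module; the function names `md5_of_file` and `verify_md5` are illustrative, and the expected value is assumed to be copied from the supplied md5sum spreadsheet.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large FASTQ/BAM files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected_hex):
    """Compare the computed digest with the supplied one (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

A call such as `verify_md5("Sample1_R1.fastq.gz", supplied_value)` returns `True` only when the local file matches the supplied checksum.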
Example:
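Each record in a FASTQ file consists of four lines: the read identifier, the base sequence, a '+' separator line, and the quality line.
Each character in the quality line encodes the quality score of the base at the same position, using Phred+33 encoding (the ASCII code of the character minus 33).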
Q = -10 log10(error rate)
Phred Quality Score | Probability of Incorrect Base Call | Base Call Accuracy |
---|---|---|
10 | 1 in 10 | 90% |
20 | 1 in 100 | 99% |
30 | 1 in 1000 | 99.9% |
40 | 1 in 10000 | 99.99% |
50 | 1 in 100000 | 99.999% |
60 | 1 in 1000000 | 99.9999% |
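As an illustration of the Phred+33 encoding and the formula Q = -10 log10(error rate), the sketch below decodes a quality line and converts a Q score back into an error probability. The function names and the sample record in the usage note are made up for this example.

```python
def phred33_to_scores(quality_line):
    """Decode a Phred+33 quality string into a list of integer Q scores."""
    return [ord(char) - 33 for char in quality_line]


def error_probability(q):
    """Invert Q = -10 * log10(P) to obtain the error probability P."""
    return 10 ** (-q / 10)


def read_fastq_record(lines):
    """Split the four lines of one FASTQ record into (read_id, sequence, scores)."""
    header, sequence, _separator, quality = (line.rstrip("\n") for line in lines)
    # The identifier line starts with '@', which is dropped here.
    return header[1:], sequence, phred33_to_scores(quality)
```

For a hypothetical record with the lines `@read1`, `ACGT`, `+`, `I5+!`, the decoded scores are 40, 20, 10 and 0, since `I`, `5`, `+` and `!` have ASCII codes 73, 53, 43 and 33.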
Q scores have been calibrated specifically for the Illumina system and its consumables. The system uses Q-score binning, which is necessary for Illumina runs because of the quantity of data being generated, and it cannot be turned off.
More information can be found here:
Illumina_technote_understanding_quality_scores.pdf
BAM is a compressed binary version of the SAM (Sequence Alignment/Map) format. A BAM file contains information about the alignment of reads against a large reference sequence.
Example :
Tag | Description |
---|---|
@HD | The header line |
@PG | Program and command line |
@RG | Read group (platform and sample name information) |
@SQ | Reference sequence dictionary. The order of @SQ lines defines the alignment sorting order. |
Field | Description |
---|---|
QNAME | Query template name (read ID) |
FLAG | Bitwise flag |
RNAME | Reference sequence name (chromosome id) |
POS | 1-based leftmost mapping position |
MAPQ | Mapping quality |
CIGAR | CIGAR string |
RNEXT | Ref. name of the mate/next read |
PNEXT | Position of the mate/next read |
TLEN | Observed template length |
SEQ | Segment sequence |
QUAL | ASCII of Phred-scaled base QUALity+33 |
Optional | Optional fields, each in the form TAG:TYPE:VALUE |
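To make the alignment fields concrete, here is a minimal sketch that parses one tab-separated alignment line of SAM text (for example, as printed by a BAM viewer) into the eleven mandatory fields plus optional tags, and decodes some bits of the bitwise FLAG. The function names are illustrative, only a subset of FLAG bits is listed, and the sample line in the usage note is invented.

```python
# Names of the 11 mandatory SAM fields, in column order.
SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")

# A subset of the FLAG bits defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x400: "duplicate",
}


def parse_sam_line(line):
    """Parse one alignment line into a dict of mandatory fields and tags."""
    columns = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, columns[:11]))
    for name in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[name] = int(record[name])
    # Optional fields follow the mandatory ones as TAG:TYPE:VALUE.
    record["tags"] = {}
    for field in columns[11:]:
        tag, type_code, value = field.split(":", 2)
        record["tags"][tag] = int(value) if type_code == "i" else value
    return record


def decode_flag(flag):
    """List the names of the known FLAG bits that are set."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]
```

For a line whose FLAG is 99 (1 + 2 + 32 + 64), `decode_flag(99)` reports a paired read in a proper pair, first in the pair, whose mate maps to the reverse strand.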
The Variant Call Format (VCF) is a text file format that contains information about variants found at specific positions in a reference genome. The file format consists of meta-information lines, a header line, and data lines. Each data line contains information about a single variant.
Example :
Header | Description |
---|---|
#CHROM | Chromosome |
POS | Position (with the 1st base having position 1) |
ID | The dbSNP rs identifier of the SNP |
REF | Reference base(s) |
ALT | Comma separated list of alternate non-reference alleles called on at least one of the samples |
QUAL | A Phred-scaled quality score assigned by the variant caller. Higher scores indicate higher confidence in the variant (and lower probability of errors). |
FILTER | See FILTER tag table for possible entries. |
INFO | See INFO tag table for possible entries. |
FORMAT | See FORMAT tag table for possible entries. |
Filter status: PASS if this position has passed all filters, i.e. a call is made at this position. Otherwise, if the site has not passed all filters, a semicolon-separated list of codes for the filters that failed. The possible codes are listed below.
Tag | Description |
---|---|
IndelConflict | Indel genotypes from two or more loci conflict in at least one sample |
SiteConflict | Site is filtered due to an overlapping indel call filter |
LowGQX | Locus GQX is below threshold or not present |
HighDPFRatio | The fraction of basecalls filtered out at a site is greater than 0.4 |
HighSNVSB | Sample SNV strand bias value (SB) exceeds 10 |
HighDepth | Locus depth is greater than 3x the mean chromosome depth |
LowDepth | Locus depth is below |
NotGenotyped | Locus contains forcedGT input alleles which could not be genotyped |
PloidyConflict | Genotype call from variant caller not consistent with chromosome ploidy |
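The FILTER column can be read programmatically as follows. This is a minimal sketch that works on plain VCF text lines; the function names are illustrative, and it assumes the standard VCF column order in which FILTER is the seventh column.

```python
def failed_filters(filter_column):
    """Return the failed filter codes: [] for PASS, None if filters were not applied."""
    if filter_column == "PASS":
        return []
    if filter_column == ".":
        return None
    return filter_column.split(";")


def passing_records(vcf_lines):
    """Yield the columns of every data line whose FILTER value is PASS."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        columns = line.rstrip("\n").split("\t")
        if columns[6] == "PASS":
            yield columns
```

For example, `failed_filters("LowGQX;HighDPFRatio")` gives the two codes as a list, which can then be looked up in the table above.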
Additional information: INFO fields are encoded as a semicolon-separated series of short keys with optional values in the format: <key>=<data>. The exact format of each INFO sub-field should be specified in the meta-information.
Tag | Description |
---|---|
END | End position of the region described in this record |
BLOCKAVG_min30p3a | Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)). |
SNVHPOL | SNV contextual homopolymer length |
CIGAR | CIGAR alignment for each alternate indel allele |
RU | Smallest repeating sequence unit extended or contracted in the indel allele relative to the reference. RUs are not reported if longer than 20 bases |
REFREP | Number of times RU is repeated in reference |
IDREP | Number of times RU is repeated in indel allele |
MQ | RMS of mapping quality |
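The `<key>=<data>` encoding of the INFO column, and the range constraint quoted for BLOCKAVG_min30p3a blocks, can be sketched as below. The function names are illustrative; values are kept as strings because their types are defined in the meta-information lines, which this sketch does not read.

```python
def parse_info(info_column):
    """Parse a VCF INFO column into a dict; keys without '=' are flags set to True."""
    if info_column in (".", ""):
        return {}
    info = {}
    for item in info_column.split(";"):
        key, separator, value = item.partition("=")
        info[key] = value if separator else True
    return info


def block_range_ok(x, y):
    """Check the non-variant block constraint y <= max(x + 3, x * 1.3) for a range [x, y]."""
    return y <= max(x + 3, x * 1.3)
```

With this, an INFO value such as `END=10250;BLOCKAVG_min30p3a` yields the end position as a string and the block marker as a flag. The range check shows that for small values the `x + 3` term dominates (a range of [10, 13] is allowed), while for large values the 30% term does.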
Tag | Description |
---|---|
GT | Genotype. 0/0: the sample is homozygous reference; 0/1: the sample is heterozygous, carrying 1 copy of each of the REF and ALT alleles; 1/1: the sample is homozygous alternate |
GQ | Genotype quality |
GQX | Empirically calibrated genotype quality score for variant sites, otherwise minimum of {Genotype quality assuming variant position,Genotype quality assuming non-variant position} |
DP | Filtered basecall depth used for site genotyping. In a non-variant multi-site block this value represents the average of all sites in the block. |
DPF | Basecalls filtered from input prior to site genotyping. In a non-variant multi-site block this value represents the average of all sites in the block. |
MIN_DP | Minimum filtered basecall depth used for site genotyping within a non-variant multi-site block |
AD | Allelic depths for the ref and alt alleles in the order listed. For indels this value only includes reads which confidently support each allele (posterior prob 0.51 or higher that read contains indicated allele vs all other intersecting indel alleles) |
ADF | Allelic depths on the forward strand |
ADR | Allelic depths on the reverse strand |
FT | Sample filter, 'PASS' indicates that all filters have passed for this sample |
DPI | Read depth associated with indel, taken from the site preceding the indel |
PL | Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification |
PS | Phase set identifier |
SB | Sample site strand bias |
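The FORMAT column names the keys, and the sample column holds the matching colon-separated values. The sketch below pairs them up, labels a genotype, and derives an alternate-allele fraction from AD. The function names and the example values are assumptions made for illustration; the allele fraction is a convenience calculation, not a field of the file.

```python
def parse_sample(format_column, sample_column):
    """Pair FORMAT keys with the values of one sample column."""
    return dict(zip(format_column.split(":"), sample_column.split(":")))


def genotype_label(gt):
    """Translate a GT value such as 0/0, 0/1 or 1/1 into words."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "no call"
    if all(allele == "0" for allele in alleles):
        return "homozygous reference"
    if len(set(alleles)) == 1:
        return "homozygous alternate"
    return "heterozygous"


def alt_allele_fraction(ad):
    """Fraction of AD depth supporting non-reference alleles, or None at zero depth."""
    depths = [int(depth) for depth in ad.split(",")]
    total = sum(depths)
    return sum(depths[1:]) / total if total else None
```

For a sample column `0/1:80:80:30:1:14,16` under the FORMAT string `GT:GQ:GQX:DP:DPF:AD`, the genotype is heterozygous and 16 of the 30 counted reads support the ALT allele.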
The Sample1_CNVs.xlsx file lists the coordinates of the predicted copy number alterations.
Header | Description |
---|---|
Chromosome | Chromosome |
Start | Start position |
End | End position |
Predicted copy number | The number of copies |
Type of alteration | Types of CNV (gain, loss) |
Gene | Gene annotation in the CNV regions |
Example :
Header | Description |
---|---|
#CHROM | Chromosome |
POS | Position (with the 1st base having position 1) |
ID | Annotation, in the case of BND ('breakend') records for translocations, the ID value is used to link breakend mates or partners. |
REF, ALT | All variants are reported in the VCF using symbolic alleles unless they are classified as a small indel, in which case full sequences are provided for the VCF REF and ALT allele fields. A variant is classified as a small indel if all of these criteria are met: (1) the variant can be entirely expressed as a combination of inserted and deleted sequence; (2) the deletion or insertion length is not 1000 or greater; (3) the variant breakends and/or the inserted sequence are not imprecise. |
QUAL | A Phred-scaled quality score assigned by the variant caller. Higher scores indicate higher confidence in the variant (and lower probability of errors). |
FILTER | See FILTER tag table for possible entries. |
INFO | See INFO tag table for possible entries. |
FORMAT | See FORMAT tag table for possible entries. |
Tag | Description |
---|---|
Ploidy | For DEL & DUP variants, the genotypes of overlapping variants (with similar size) are inconsistent with diploid expectation |
MaxDepth | Depth is greater than 3x the median chromosome depth near one or both variant breakends |
MaxMQ0Frac | For a small variant (<1000 bases), the fraction of reads in all samples with MAPQ0 around either breakend exceeds 0.4 |
NoPairSupport | For variants significantly larger than the paired read fragment size, no paired reads support the alternate allele in any sample. |
MinQUAL | QUAL score is less than 20 |
MinGQ | GQ score is less than 15 (filter applied at sample level and record level if all samples are filtered) |
MinSomaticScore | SOMATICSCORE is less than 30 |
SampleFT | No sample passes all the sample-level filters |
HomRef | Homozygous reference call |
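Three of the filters above (MinQUAL, MinGQ and MinSomaticScore) are plain thresholds, so their relationship to the values in a record can be sketched directly. This is only an illustration built from the thresholds in the table, not Manta's implementation: the function names are hypothetical, and the remaining filters depend on read-level evidence that is not modelled here.

```python
# Thresholds taken from the filter table above.
MIN_QUAL = 20
MIN_GQ = 15
MIN_SOMATIC_SCORE = 30


def threshold_filters(qual=None, sample_gqs=(), somatic_score=None):
    """Return the threshold-based record-level filter codes that would fail."""
    failed = []
    if qual is not None and qual < MIN_QUAL:
        failed.append("MinQUAL")
    # MinGQ reaches the record level only when every sample is below the threshold.
    if sample_gqs and all(gq < MIN_GQ for gq in sample_gqs):
        failed.append("MinGQ")
    if somatic_score is not None and somatic_score < MIN_SOMATIC_SCORE:
        failed.append("MinSomaticScore")
    return failed


def filter_column(failed):
    """Format failed codes the way the FILTER column does."""
    return ";".join(failed) if failed else "PASS"
```

In a two-sample record where only one sample has GQ below 15, MinGQ would appear in that sample's FT field but not in the record-level FILTER column, which is what the `all(...)` check reflects.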
Tag | Description |
---|---|
IMPRECISE | Imprecise structural variation |
SVTYPE | Type of structural variant |
SVLEN | Difference in length between REF and ALT alleles |
END | End position of the variant described in this record |
CIPOS | Confidence interval around POS |
CIEND | Confidence interval around END |
CIGAR | CIGAR alignment for each alternate indel allele |
MATEID | ID of mate breakend |
EVENT | ID of event associated to breakend |
HOMLEN | Length of base pair identical homology at event breakpoints |
HOMSEQ | Sequence of base pair identical homology at event breakpoints |
SVINSLEN | Length of insertion |
SVINSSEQ | Sequence of insertion |
LEFT_SVINSSEQ | Known left side of insertion for an insertion of unknown length |
RIGHT_SVINSSEQ | Known right side of insertion for an insertion of unknown length |
INV3 | Inversion breakends open 3' of reported location |
INV5 | Inversion breakends open 5' of reported location |
BND_DEPTH | Read depth at local translocation breakend |
MATE_BND_DEPTH | Read depth at remote translocation mate breakend |
JUNCTION_QUAL | If the SV junction is part of an EVENT (i.e. a multi-adjacency variant), this field provides the QUAL value for the adjacency in question only |
SOMATIC | Flag indicating a somatic variant |
SOMATICSCORE | Somatic variant quality score |
JUNCTION_SOMATICSCORE | If the SV junction is part of an EVENT (i.e. a multi-adjacency variant), this field provides the SOMATICSCORE value for the adjacency in question only |
CONTIG | Assembled contig sequence, if the variant is not imprecise (with --outputContig) |
Tag | Description |
---|---|
GT | Genotype |
FT | Sample filter, 'PASS' indicates that all filters have passed for this sample |
GQ | Genotype Quality |
PL | Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification |
PR | Spanning paired-read support for the ref and alt alleles in the order listed |
SR | Split reads for the ref and alt alleles in the order listed, for reads where P(allele|read)>0.999 |
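PR and SR each hold a pair of counts, "ref,alt", in the order listed. The sketch below extracts both from a sample column and totals the reads supporting each allele. The function name and the example values are invented for illustration, and summing PR and SR into one total is a convenience assumption rather than something the file itself reports.

```python
def sv_support(format_column, sample_column):
    """Return ({key: (ref, alt)}, ref_total, alt_total) for the PR and SR fields."""
    sample = dict(zip(format_column.split(":"), sample_column.split(":")))
    support = {}
    for key in ("PR", "SR"):
        value = sample.get(key, ".")
        if value != ".":
            ref_count, alt_count = (int(count) for count in value.split(","))
            support[key] = (ref_count, alt_count)
    ref_total = sum(ref_count for ref_count, _alt in support.values())
    alt_total = sum(alt_count for _ref, alt_count in support.values())
    return support, ref_total, alt_total
```

For a sample column `0/1:PASS:120:170,0,999:20,8:15,6` under `GT:FT:GQ:PL:PR:SR`, this gives 8 paired reads and 6 split reads supporting the alternate allele. Records without split-read evidence may omit SR, in which case only PR is returned.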