The FTDNA VCF file

The VCF file is about 11.4Mb. It may be saved, imported into MS Excel and then saved as an xlsx file. There are around 240 thousand rows.
The image below shows a screenshot of part of the file. Of particular interest here is position hg38 #2914361, which shows a G > T mutation.
The next image shows the corresponding BAM file. This is about 640Mb.
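
As an alternative to scrolling through the spreadsheet, a position can be looked up directly in the VCF text. The following is a minimal Python sketch, not part of my usual workflow. It relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT, ...). The function name find_vcf_record, the chromosome label chrY and the sample lines are assumptions made up for illustration; only the position and the G > T change are taken from the text above.

```python
from typing import Iterable, Optional


def find_vcf_record(lines: Iterable[str], position: int) -> Optional[dict]:
    """Return the first VCF record at `position`, or None if there is none."""
    for line in lines:
        if line.startswith("#"):
            # Skip meta-information lines (##...) and the column header (#CHROM...)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            # Not a complete data line; ignore it
            continue
        if int(fields[1]) == position:
            return {
                "chrom": fields[0],
                "pos": int(fields[1]),
                "ref": fields[3],
                "alt": fields[4],
            }
    return None


# Illustrative lines only; "chrY" and the other field values are assumptions.
sample_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrY\t2914361\t.\tG\tT\t.\tPASS\t.",
]

print(find_vcf_record(sample_lines, 2914361))
print(find_vcf_record(sample_lines, 2914363))  # no record at this position
```

With a real file, the lines could come from open("my_file.vcf"), where the file name is hypothetical.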


VCF file

The BAM file


VCF file
Detail of BAM file location
Detail of BAM
On row 523, the read at hg38 #2914352 begins with a C and then continues AAACCCAATTATAAAGAAGAAA.

The bold T is the mutation at hg38 #2914361 from G > T. However, note that the row above has a G at the same position. A closer look is shown below:

Close up

Notice that in row 523 (above the arrow) the four bases starting at position 2914361 are TTAT, while the two rows above show GTAG. The same locations in ybrowse.org are GTAT. The VCF file reports the T allele at 2914361, yet, interestingly, only row 523 shows that mutation. Position 2914363 is A, but the VCF has no result for it. Measured against the ybrowse.org sequence, row 523 carries the G > T change at ~361, whereas rows 522 and 521 match it at ~361 and instead show a T > G change at ~364. Hence I don't usually use the VCF file.
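
The comparison above can also be reproduced in a few lines of Python. This is only a sketch under stated assumptions: the read is taken to be aligned without gaps (no insertions, deletions or clipping), so a base's offset in the read is simply its position minus the read's start position. The helper names bases_at and differences are hypothetical. The read sequence, its start position, GTAT (from ybrowse.org) and GTAG (the rows above) are the values quoted in the text.

```python
from typing import List, Tuple


def bases_at(read_start: int, read_seq: str, first_pos: int, length: int) -> str:
    """Return `length` bases of a read, starting at genomic position `first_pos`.

    Assumes an ungapped alignment, so offset = position - read start.
    """
    offset = first_pos - read_start
    if offset < 0 or offset + length > len(read_seq):
        raise ValueError("requested positions are not covered by this read")
    return read_seq[offset:offset + length]


def differences(first_pos: int, reference: str, observed: str) -> List[Tuple[int, str, str]]:
    """List (position, reference base, observed base) wherever the two differ."""
    return [
        (first_pos + i, ref, obs)
        for i, (ref, obs) in enumerate(zip(reference, observed))
        if ref != obs
    ]


READ_START = 2914352                    # start of the read quoted above
READ_SEQ = "CAAACCCAATTATAAAGAAGAAA"    # C followed by AAACCCAATTATAAAGAAGAAA
REFERENCE = "GTAT"                      # ybrowse.org bases at 2914361..2914364

row_bases = bases_at(READ_START, READ_SEQ, 2914361, 4)
print(row_bases)                                   # TTAT
print(differences(2914361, REFERENCE, row_bases))  # [(2914361, 'G', 'T')]
print(differences(2914361, REFERENCE, "GTAG"))     # [(2914364, 'T', 'G')]
```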

My BAM file

The downloaded BAM file is large at 657Mb. However, it is easy to manipulate once a BAM reader is installed. I use BAMSeek.jar from Google Code Archive as it does the job admirably: it is easy to use and scrolls quickly through the BAM file. It may be downloaded here: BAMseek download
It is a .jar file and needs the Java Runtime Environment to be installed. This may be downloaded (free) from Java at https://www.java.com/en/download/ The .jar file needs to be associated with Java, which is not easy. However, there is a small utility that makes it easy to open .jar files with a normal double click. It may be downloaded (also free) from https://bitstorm.org/jarx/
The first time BAMSeek is used, opening may be slow (a few seconds). Once the BAM file is opened there are a number of columns that I do not use; as in a spreadsheet, these may be reduced to minimum width. The only columns of interest are Position and the Read Sequence/ data, as above.
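
For readers who prefer a script to a viewer, the same two columns can be pulled out of the text (SAM) form of the alignments. This is a hedged sketch only. It assumes the BAM has already been converted to SAM text by a separate tool such as samtools view, which is not part of the setup described above. In a SAM line the position is the 4th tab-separated field and the read sequence the 10th. The function name position_and_sequence, the read name, the chromosome label and the other field values in the sample line are made up for illustration; the position and sequence are those quoted earlier.

```python
from typing import Tuple


def position_and_sequence(sam_line: str) -> Tuple[int, str]:
    """Return (POS, SEQ) from one SAM alignment line.

    SAM fields are tab-separated: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR,
    RNEXT, PNEXT, TLEN, SEQ, QUAL, followed by optional tags.
    """
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("not a complete SAM alignment line")
    return int(fields[3]), fields[9]


# Illustrative line only; every field except POS and SEQ is an assumption.
sample_line = "read1\t0\tchrY\t2914352\t60\t23M\t*\t0\t0\tCAAACCCAATTATAAAGAAGAAA\t*"

pos, seq = position_and_sequence(sample_line)
print(pos, seq)
```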
