Friday 9 August 2013

vcf-compare problems in practice

Recently I have been using vcftools to compare two VCF files. The command for this job is vcf-compare, but the vcftools developers don't explain in detail how to use it, which causes a lot of problems in practice.

The two VCF files I'm comparing should overlap a lot with each other, but vcf-compare told me there was no overlap between them. This is the output of vcf-compare:

Error: There is no overlap between any of the samples, yet haplotype comparison was requested.

Even by chance there should have been some overlap, so I suspected there were problems in my pipeline and data. After testing a lot of cases, I found two problems with the data:
1. The header line should be the same. Since the two VCF files come from different BAM files, the last field of the #CHROM header line (the sample column that follows FORMAT) is the name of the original BAM file. The two files therefore have different sample names, and these need to be made the same.

2. The chromosome names. In my case, one file used 1 to represent chr1, while the other used chr1, so I needed to add "chr" before the numbers.
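Both fixes can be sketched in a few lines of Python, assuming plain-text (uncompressed) VCF input. The function name `normalize_vcf` and the sample names are hypothetical and not part of vcftools: the function replaces the sample columns of the `#CHROM` line and prepends `chr` to chromosome names in data lines and `##contig` header lines. bcftools also offers `reheader --samples` and `annotate --rename-chrs` for the same job, which may be preferable for large files.

```python
# Minimal sketch (assumption: plain-text VCF lines; function and sample
# names are hypothetical, not part of vcftools).

CONTIG_PREFIX = "##contig=<ID="


def normalize_vcf(lines, new_samples=None, add_chr=False):
    """Yield VCF lines with renamed sample columns and/or 'chr'-prefixed contigs.

    lines       -- iterable of VCF text lines (trailing newlines are stripped)
    new_samples -- replacement sample names for the columns after FORMAT
    add_chr     -- prepend 'chr' to chromosome names that do not already have it
    """
    for line in lines:
        line = line.rstrip("\n")
        if add_chr and line.startswith(CONTIG_PREFIX):
            contig = line[len(CONTIG_PREFIX):]
            if not contig.startswith("chr"):
                line = CONTIG_PREFIX + "chr" + contig
        elif line.startswith("#CHROM") and new_samples is not None:
            fields = line.split("\t")
            # Columns 0-8 are fixed (#CHROM ... FORMAT); sample names follow.
            if len(new_samples) != len(fields) - 9:
                raise ValueError("number of new sample names does not match the file")
            line = "\t".join(fields[:9] + list(new_samples))
        elif add_chr and line and not line.startswith("#"):
            # The first tab-separated column of a data line is CHROM.
            chrom, rest = line.split("\t", 1)
            if not chrom.startswith("chr"):
                line = "chr" + chrom + "\t" + rest
        yield line


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.1",
        "##contig=<ID=1,length=249250621>",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA.bam",
        "1\t12345\t.\tA\tG\t50\tPASS\t.\tGT\t0/1",
    ]
    for out in normalize_vcf(example, new_samples=["sample1"], add_chr=True):
        print(out)
```

Mitochondrial names are not handled: `MT` would become `chrMT`, whereas UCSC-style files use `chrM`. The output is uncompressed, and vcf-compare is normally run on bgzipped, tabix-indexed files, so the result would need recompressing and indexing first.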

After fixing the two problems above, vcf-compare works well.
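Both problems can be spotted before running vcf-compare at all. Below is a rough pre-check sketch in Python; the function names and command-line usage are hypothetical, and it only reads the `#CHROM` line and the first column of each data line. It reports when the two files share no sample names or no chromosome names.

```python
# Rough pre-check sketch (assumption: function names and CLI are hypothetical).
import gzip
import sys


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def vcf_names(lines):
    """Return (sample names, set of chromosome names) found in VCF lines."""
    samples, chroms = [], set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#CHROM"):
            samples = line.split("\t")[9:]
        elif line and not line.startswith("#"):
            chroms.add(line.split("\t", 1)[0])
    return samples, chroms


def precheck(lines_a, lines_b):
    """Print any sample-name or chromosome-name mismatch; return True if none."""
    samples_a, chroms_a = vcf_names(lines_a)
    samples_b, chroms_b = vcf_names(lines_b)
    ok = True
    if not set(samples_a) & set(samples_b):
        print("No shared sample names:", samples_a, "vs", samples_b)
        ok = False
    if not chroms_a & chroms_b:
        print("No shared chromosome names, e.g.",
              sorted(chroms_a)[:3], "vs", sorted(chroms_b)[:3])
        ok = False
    return ok


if __name__ == "__main__" and len(sys.argv) == 3:
    with open_vcf(sys.argv[1]) as a, open_vcf(sys.argv[2]) as b:
        sys.exit(0 if precheck(a, b) else 1)
```

For example, running it as `python precheck_vcf.py a.vcf.gz b.vcf.gz` (hypothetical script name) prints the mismatching names and exits with status 1 if either check fails.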

