Nadezhda Sazonova, Edward Sazonov, E. James Harner, Journal of Biomedical Informatics 43 (2010) 51–59.

Haplotyping and block partitioning have been studied extensively for regular genotype data, but more cost-efficient data, called XOR-genotypes, remain under-investigated. Previous studies developed methods for haplotyping short-sequence partial XOR-genotypes. In this paper we propose a new algorithm that haplotypes long-range partial XOR-genotype data, possibly with missing entries, and simultaneously finds the block structure of the data. Our method is implemented as a fast and practical algorithm. We also investigate how the percentage of fully genotyped individuals in a sample affects the accuracy of results, with and without missing data. The algorithm is validated on HapMap data. The results show good prediction rates for samples both with and without missing data. The accuracy of predicting XOR sites is not significantly affected when 10% or less of the data is missing.
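The abstract does not define XOR-genotypes. The following is a minimal illustration that assumes the common convention: for each biallelic SNP, an XOR-genotype records only whether an individual's two haplotypes differ (heterozygous, 1) or agree (homozygous, 0). The sketch computes this value from a binary haplotype pair and checks a candidate pair against an observed XOR-genotype, where `None` marks a missing entry. The 0/1 encoding and the names `xor_genotype` and `consistent` are hypothetical. This is not the authors' haplotyping or block-partitioning algorithm.

```python
from typing import List, Optional

# Hypothetical encoding (illustration only): a haplotype is a list of 0/1
# alleles at consecutive biallelic SNP sites; None marks a missing entry.
Allele = Optional[int]


def xor_genotype(h1: List[int], h2: List[int]) -> List[int]:
    """Return the XOR-genotype of a haplotype pair: 1 where the two
    haplotypes differ (heterozygous site), 0 where they agree (homozygous)."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same sites")
    return [a ^ b for a, b in zip(h1, h2)]


def consistent(h1: List[int], h2: List[int], xor_g: List[Allele]) -> bool:
    """Check whether a candidate haplotype pair explains an observed
    XOR-genotype; missing entries (None) impose no constraint."""
    if not (len(h1) == len(h2) == len(xor_g)):
        raise ValueError("all sequences must cover the same sites")
    return all(g is None or (a ^ b) == g for a, b, g in zip(h1, h2, xor_g))


if __name__ == "__main__":
    # Two different haplotype pairs give the same XOR-genotype [0, 1, 0, 1]:
    # at homozygous sites the XOR value does not reveal which allele is present.
    print(xor_genotype([0, 1, 1, 0], [0, 0, 1, 1]))
    print(xor_genotype([1, 0, 0, 1], [1, 1, 0, 0]))
    print(consistent([0, 1, 1, 0], [0, 0, 1, 1], [0, 1, None, 1]))
```

In the example, both pairs produce the same XOR-genotype. This shows why, under this convention, XOR data carry less information than regular genotypes at homozygous sites, and why haplotype inference from such data must draw on other information in the sample.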