SIFTing the human genome for polymorphisms that affect protein function

Pauline Ng, University of Washington

Approximately half of the gene lesions currently known to be responsible for inherited human disease are due to amino acid substitutions. Many missense substitutions are identified in single nucleotide polymorphism (SNP) data and in large-scale random mutagenesis projects; some of these may affect protein function. We have constructed a computational tool that uses sequence homology to predict whether a substitution affects protein function.

SIFT is a program that sorts intolerant from tolerant substitutions. SIFT may be used to identify plausible disease candidates among the SNPs that cause missense substitutions. Assuming that disease-causing amino acid substitutions are damaging to protein function, we applied SIFT to a database of missense substitutions associated with or involved in disease. SIFT predicted 69% of these to be damaging. We also applied SIFT to over 3000 nonsynonymous SNPs (nsSNPs) from dbSNP, a database of sequence variants that may or may not be involved in disease. Of these variants, 75% were predicted to be tolerated. Some of the nsSNPs predicted to affect function were variants known to be associated with disease; others were artifacts of SNP discovery.
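
The general idea of using sequence homology to sort tolerated from damaging substitutions can be illustrated with a toy example. The sketch below is not SIFT's actual algorithm: it assumes a small hypothetical alignment of homologous sequences, scores a substitution by the raw frequency of the new residue at the aligned position, and applies an assumed cutoff of 0.05. The function names, the example sequences, and the threshold are all illustrative assumptions.

```python
from collections import Counter


def position_frequencies(alignment, position):
    """Return the fraction of each residue in one alignment column, ignoring gaps."""
    column = [seq[position] for seq in alignment if seq[position] != "-"]
    total = len(column)
    if total == 0:
        return {}
    counts = Counter(column)
    return {residue: n / total for residue, n in counts.items()}


def predict(alignment, position, new_residue, threshold=0.05):
    """Call a substitution 'tolerated' if the new residue is seen among homologs.

    The threshold is an assumed cutoff for this toy model, not a value
    taken from the abstract.
    """
    freqs = position_frequencies(alignment, position)
    score = freqs.get(new_residue, 0.0)
    return "tolerated" if score >= threshold else "damaging"


# Hypothetical aligned homologs (made up for illustration).
homologs = [
    "MKVLA",
    "MKILA",
    "MRVLA",
    "MKLLA",
]

# Position 2 varies among V, I and L, so a change to I is tolerated here.
print(predict(homologs, 2, "I"))
# Position 0 is invariant (M), so a change to K is called damaging.
print(predict(homologs, 0, "K"))
```

A real predictor would need many more homologs and some correction for unobserved residues; raw frequencies from four sequences are used here only to keep the example readable.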

A WWW interface and source code for SIFT are available at http://sift-dna.org/.

Abstract Author(s): Pauline C. Ng and Steven Henikoff