Predicting the deleteriousness of missense variants is extremely difficult. Whether a variant lies within a site essential
to protein function is an important criterion for variant pathogenicity prediction. However, for many proteins these
essential sites have not been identified due to a lack of statistical power and experimental data.
Here, we developed a novel statistical framework to identify regions enriched for pathogenic missense variants
(pathogenic enriched regions, PERs). We compared the density of missense variants identified in individuals of the general
population (gnomAD, n = 2,219,811) against the density of missense variants retrieved from patient variant databases
(ClinVar/HGMD, n = 76,153). To gain power, we grouped 9,990 genes into 2,871 gene families and evaluated the density of
pathogenic variants across all members of each gene family.
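
The density comparison described above can be illustrated with a minimal sketch. It is not the framework used in this work: the choice of a one-sided Fisher's exact test, the sliding window of 9 positions, the function names (fisher_enrichment_p, scan_windows) and the toy positions are all assumptions made for illustration. Positions stand for columns of a gene or gene-family alignment, with variants from all family members pooled onto the same coordinates.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test for enrichment (assumed test, not from the source).

    2x2 table:
        a = patient variants inside the window
        b = patient variants outside the window
        c = population variants inside the window
        d = population variants outside the window
    Returns P(X >= a) under the hypergeometric null.
    """
    n_patient = a + b            # all patient variants
    n_window = a + c             # all variants inside the window
    n_total = a + b + c + d
    upper = min(n_patient, n_window)
    tail = sum(
        comb(n_patient, k) * comb(n_total - n_patient, n_window - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n_total, n_window)


def scan_windows(patient_pos, population_pos, length, window=9):
    """Slide a fixed-size window over positions 1..length.

    Returns a list of (start, end, p_value) tuples. The window size is a
    hypothetical parameter chosen only for this example.
    """
    results = []
    n_pat, n_pop = len(patient_pos), len(population_pos)
    for start in range(1, length - window + 2):
        end = start + window - 1
        a = sum(start <= p <= end for p in patient_pos)
        c = sum(start <= p <= end for p in population_pos)
        results.append((start, end, fisher_enrichment_p(a, n_pat - a, c, n_pop - c)))
    return results


# Toy data (invented): patient variants cluster around positions 10-14.
patient = [10, 11, 12, 12, 13, 14, 40]
population = [1, 3, 5, 20, 22, 25, 27, 30, 33, 35, 38, 44, 47, 50]

windows = scan_windows(patient, population, length=50)
best = min(windows, key=lambda w: w[2])
print("most enriched window:", best[0], "-", best[1])
```

In a real analysis the per-window p-values would need correction for multiple testing, and overlapping significant windows would have to be merged into regions; both steps are omitted here.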
We identified 464 PERs spanning 41,463 amino acids in 1,252 genes. In addition, gene-wise analysis identified
251 further PERs involving 2,639 amino acids. These regions are constrained against variation in the general
population and at the same time enriched for disease-associated variants.
With the present tool you can explore the neutral/pathogenic missense burden at
the gene or gene-family level. In particular, for large gene families, such as ion channels, our approach
facilitates evaluation of variant pathogenicity.