Nash AJ, Lenhard B.
Bioinformatics. 2019 Jul 15;35(14):2354-2361. doi: 10.1093/bioinformatics/bty1014.
A novel measure of non-coding genome conservation identifies genomic regulatory blocks within primates.
“The kurtosis of the distribution of the lengths of all identical sequences was calculated in [30 kbp] bins across the genome.”
“Runs of 100% sequence identity were [...] filtered for annotated repeats and exonic sequences.”
“The kurtosis of the distribution of lengths in each bin was then calculated as [...] R(F) = (q0.99(F) − q0.01(F)) / G50 where F is the distribution of the lengths of runs of perfect sequence identity in a bin, and G50 is the range of the middle 50% of the distribution of lengths of all runs of identity, from all bins (background distribution); calculated as [...] q0.75(J) − q0.25(J) where J is the distribution of the lengths of runs of perfect sequence identity across the whole genome.”
“This is an adaptation of the robust kurtosis measure proposed in Ruppert (1987).”
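Note: a minimal Python sketch of the per-bin calculation quoted above, not the authors' implementation. Everything not in the quotes is an assumption: the function names `quantile` and `robust_kurtosis_by_bin` are hypothetical, runs are given as (start, length) pairs on a single sequence, a run is assigned to the bin containing its start position, quantiles use linear interpolation (the quotes do not say which rule was used), and repeat and exon filtering is assumed to have been done beforehand.

```python
from collections import defaultdict


def quantile(values, p):
    """Quantile by linear interpolation between order statistics.

    Assumption: the interpolation rule is not stated in the quoted text;
    this is the same rule NumPy uses by default.
    """
    xs = sorted(values)
    if not xs:
        raise ValueError("quantile of empty data")
    pos = p * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def robust_kurtosis_by_bin(runs, bin_size=30_000):
    """Return {bin_index: R(F)} for runs given as (start, length) pairs.

    R(F) = (q0.99(F) - q0.01(F)) / G50, where G50 = q0.75(J) - q0.25(J)
    and J is the background distribution of all run lengths.
    Assumption: a run belongs to the bin that contains its start position.
    """
    runs = list(runs)
    background = [length for _, length in runs]  # J: all runs, all bins
    g50 = quantile(background, 0.75) - quantile(background, 0.25)
    if g50 == 0:
        raise ValueError("background interquartile range is zero")

    bins = defaultdict(list)
    for start, length in runs:
        bins[start // bin_size].append(length)  # F for each bin

    return {
        b: (quantile(f, 0.99) - quantile(f, 0.01)) / g50
        for b, f in sorted(bins.items())
    }


if __name__ == "__main__":
    # Toy input: two runs in bin 0, two in bin 1 (made-up values).
    toy_runs = [(0, 10), (10, 20), (30_000, 30), (30_010, 40)]
    for bin_index, r in robust_kurtosis_by_bin(toy_runs).items():
        print(bin_index, round(r, 4))
```

Dividing by the genome-wide G50 rather than each bin's own interquartile range puts every bin on the same scale, so bins containing unusually long runs of identity stand out against the background.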
“There is a strong correlation between kurtosis and CNE density, and this correlation is greater within GRBs than outside GRBs”