Nash AJ, Lenhard B.

Bioinformatics. 2019 Jul 15;35(14):2354-2361. doi: 10.1093/bioinformatics/bty1014.

A novel measure of non-coding genome conservation identifies genomic regulatory blocks within primates.

“our method may have utility in the analysis of GRB developmental gene regulation in species that have undergone extreme genome compaction such as the puffer fish, Tetraodon nigroviridis, and the sea squirt, Oikopleura dioica”

“The kurtosis of the distribution of the lengths of all identical sequences was calculated in [30 kbp] bins across the genome.”

“Runs of 100% sequence identity were [...] filtered for annotated repeats and exonic sequences.”
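
As a rough illustration of what a "run of 100% sequence identity" means in practice, the sketch below scans one pairwise alignment and reports each maximal stretch of identical, ungapped columns. The function name `identity_runs`, the gapped-string input format, and the treatment of `N` as a non-match are assumptions made for this note, not details from the paper. The repeat and exon filtering mentioned in the quote is not shown.

```python
def identity_runs(ref_aln, qry_aln):
    """Return (ref_start, length) for each maximal run of identical columns.

    ref_aln and qry_aln are two rows of a pairwise alignment, with '-' for gaps.
    ref_start is a 0-based coordinate on the ungapped reference sequence.
    (Hypothetical helper: input format and N handling are assumptions.)
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    runs = []
    ref_pos = 0      # position on the ungapped reference
    run_start = 0    # reference coordinate where the current run began
    run_len = 0      # length of the current run (0 means no open run)

    for r, q in zip(ref_aln, qry_aln):
        r_up, q_up = r.upper(), q.upper()
        is_match = r != "-" and q != "-" and r_up == q_up and r_up != "N"
        if is_match:
            if run_len == 0:
                run_start = ref_pos
            run_len += 1
        elif run_len:
            # A mismatch or gap closes the open run.
            runs.append((run_start, run_len))
            run_len = 0
        if r != "-":
            ref_pos += 1

    if run_len:
        runs.append((run_start, run_len))
    return runs
```

Each returned pair gives a reference coordinate and a run length, which is the input needed for the per-bin length distributions described in the quotes that follow.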

“The kurtosis of the distribution of lengths in each bin was then calculated as [...] R(F) = (q0.99(F) − q0.01(F)) / G50 where F is the distribution of the lengths of runs of perfect sequence identity in a bin, and G50 is the range of the middle 50% of the distribution of lengths of all runs of identity, from all bins (background distribution); calculated as [...] q0.75(J) − q0.25(J) where J is the distribution of the lengths of runs of perfect sequence identity across the whole genome.”

“This is an adaptation of the robust kurtosis measure proposed in Ruppert (1987).”
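
The quoted measure can be sketched in a few lines: for each bin, take the range between the 1st and 99th percentiles of run lengths in that bin and divide it by the genome-wide interquartile range G50. This is a minimal sketch under several assumptions that the quotes do not settle: runs are assigned to a bin by their start coordinate, quantiles use NumPy's default linear interpolation, bins with no runs are reported as NaN, and no minimum number of runs per bin is enforced. The function name and parameters are hypothetical.

```python
import numpy as np


def robust_kurtosis_by_bin(starts, lengths, genome_length, bin_size=30_000):
    """Compute R(F) = (q0.99(F) - q0.01(F)) / G50 for each fixed-width bin.

    starts  : 0-based start coordinate of each run of perfect identity
    lengths : length of each run, in the same order as starts
    Returns an array with one score per bin; empty bins are NaN.
    (Assumption: a run belongs to the bin containing its start coordinate.)
    """
    starts = np.asarray(starts, dtype=np.int64)
    lengths = np.asarray(lengths, dtype=float)
    if starts.shape != lengths.shape:
        raise ValueError("starts and lengths must have the same shape")

    # G50: interquartile range of run lengths across the whole genome (J).
    q25, q75 = np.quantile(lengths, [0.25, 0.75])
    g50 = q75 - q25
    if g50 == 0:
        raise ValueError("background interquartile range is zero")

    n_bins = -(-genome_length // bin_size)  # ceiling division
    bin_index = starts // bin_size
    scores = np.full(n_bins, np.nan)

    for b in np.unique(bin_index):
        f = lengths[bin_index == b]          # F: run lengths in this bin
        q01, q99 = np.quantile(f, [0.01, 0.99])
        scores[b] = (q99 - q01) / g50
    return scores
```

Because the numerator spans the 1st to 99th percentiles, a bin containing a few unusually long runs scores high even when most of its runs are short, which is the heavy-tail behaviour the measure is meant to capture.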

“There is a strong correlation between kurtosis and CNE density, and this correlation is greater within GRBs than outside GRBs”