Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.

Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. doi:10.1073/pnas.1932072100

Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes.

Primary paper for chains and nets, built with the BLASTZ and AXTCHAIN programs. Chains are one-to-many alignments and allow skipping over local inversions. Human/mouse comparisons show 2.0 inversions per Mbp (median length 814 bases) and 398.6 double gaps of ≥ 100 bases per Mbp (median length 411 bases). Chains are called “short” when their span is <100,000 bases; the span distribution of short chains appears to be bimodal. The 579 “long” chains (average length 983 kb) cover 32.9% of the bases in the human genome. Collectively, all chains span 96.3% of the human genome and align to 34.6% of it. The authors note that the observed distribution of gap lengths violates the affine gap model usually assumed by aligners.

“A chained alignment [is] an ordered sequence of traditional pairwise nucleotide alignments (“blocks”) separated by larger gaps, some of which may be simultaneous gaps in both species. [...] intervening DNA in one species that does not align with the other because it is locally inverted or has been inserted in by lineage-specific translocation or duplication is skipped”

“The chains are then put into a list sorted with the highest-scoring chain first. [...] each iteration taking the next chain off of the list, throwing out the parts of the chain that intersect with bases already covered by previously taken chains, and then marking the bases that are left in the chain as covered. [...] If a chain covers bases that are in a gap in a previously taken chain, it is marked as a child of the previous chain. In this way, a hierarchy of chains is formed that we call a net.”
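
The greedy procedure quoted above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: the `Chain` class, the helper names (`subtract`, `merge`, `build_net`), and the use of half-open reference intervals are all hypothetical, and a chain's parent is decided from its first surviving base only, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the reference genome


@dataclass
class Chain:
    name: str
    score: float
    blocks: List[Interval]                      # aligned blocks, sorted by start
    kept: List[Interval] = field(default_factory=list)
    parent: Optional[str] = None                # name of the enclosing chain, if any


def subtract(block: Interval, covered: List[Interval]) -> List[Interval]:
    """Parts of `block` not overlapping `covered` (sorted, disjoint intervals)."""
    start, end = block
    pieces = []
    for c_start, c_end in covered:
        if c_end <= start or c_start >= end:
            continue                            # no overlap with this covered piece
        if c_start > start:
            pieces.append((start, c_start))     # uncovered stretch before the overlap
        start = max(start, c_end)
        if start >= end:
            break
    if start < end:
        pieces.append((start, end))
    return pieces


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and fuse any that touch or overlap."""
    merged: List[Interval] = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def build_net(chains: List[Chain]) -> List[Chain]:
    """Take chains best-first, trim covered bases, and record parent/child links."""
    covered: List[Interval] = []
    gaps: List[Tuple[int, int, str]] = []       # (gap_start, gap_end, owner name)
    taken: List[Chain] = []
    for chain in sorted(chains, key=lambda c: c.score, reverse=True):
        chain.kept = [p for b in chain.blocks for p in subtract(b, covered)]
        if not chain.kept:
            continue                            # fully covered by better chains
        first = chain.kept[0][0]
        enclosing = [g for g in gaps if g[0] <= first < g[1]]
        if enclosing:
            # Simplification: the innermost gap containing the first kept base wins.
            chain.parent = min(enclosing, key=lambda g: g[1] - g[0])[2]
        covered = merge(covered + chain.kept)
        for (_, left_end), (right_start, _) in zip(chain.kept, chain.kept[1:]):
            if right_start > left_end:
                gaps.append((left_end, right_start, chain.name))
        taken.append(chain)
    return taken


net = build_net([
    Chain("A", 1000, [(0, 100), (200, 300)]),
    Chain("B", 500, [(120, 180)]),
    Chain("C", 100, [(90, 130), (400, 450)]),
])
for c in net:
    print(c.name, c.kept, c.parent)
```

In this made-up example, chain B falls entirely inside the gap of chain A and becomes its child, while chain C loses the bases already claimed by A and B before being placed.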

“To be considered syntenic, a chain has to either have a very high score itself or be embedded in a larger chain, on the same chromosome, and come from the same region as the larger chain. Thus, inversions and tandem duplications are considered syntenic.”

“We define the (human) span of a chain to be the distance in bases in the human genome from the first to the last human base in the chain, including gaps, and we define the size of the chain as the number of aligning bases in it, not including gaps.”
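
The difference between span and size is easy to see on a small example. The sketch below assumes a chain is given as a sorted list of half-open `(start, end)` human intervals, one per aligned block; the function names and the coordinates are hypothetical. The 100,000-base cutoff for “short” chains is the one reported in the summary above.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in human coordinates

SHORT_SPAN_CUTOFF = 100_000  # chains with a smaller span are called "short"


def chain_span(blocks: List[Interval]) -> int:
    """Distance from the first to the last human base, gaps included."""
    return blocks[-1][1] - blocks[0][0]


def chain_size(blocks: List[Interval]) -> int:
    """Number of aligning bases, gaps excluded."""
    return sum(end - start for start, end in blocks)


def is_short(blocks: List[Interval]) -> bool:
    """True when the chain's span is below the 100,000-base cutoff."""
    return chain_span(blocks) < SHORT_SPAN_CUTOFF


blocks = [(1000, 1200), (1500, 1600), (5000, 5400)]
print(chain_span(blocks))  # 4400
print(chain_size(blocks))  # 700
print(is_short(blocks))    # True
```

Here the three blocks align 700 bases in total, but the chain spans 4,400 bases because the two gaps between blocks count toward the span.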