Work in progress
Chains and nets
Kent et al. (2003) computed chained alignments with the AXTCHAIN program.
LAST
See LAST for a detailed bibliography.
Cactus
Paten et al. (March 2011) describe cactus graphs, in which nodes are sets of adjacencies and edges are aligned blocks of sequence. A genome can be represented as a path in these graphs.
Armstrong et al. (2020) describe Progressive Cactus, an iterative approach in which ancestral genomes are reconstructed from 2-5 pairs of in- and out-group comparisons and then progressively aligned to each other.
Consumers of multiple genome sequence alignments
- PhyloCSF (Lin, Jungreis and Kellis, 2011).
Lin MF, Jungreis I, Kellis M.
Bioinformatics. 2011 Jul 1;27(13):i275-82. doi:10.1093/bioinformatics/btr209
PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions.
Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J, Genereux D, Johnson J, Marinescu VD, Alföldi J, Harris RS, Lindblad-Toh K, Haussler D, Karlsson E, Jarvis ED, Zhang G, Paten B.
Nature. 2020 Nov;587(7833):246-251. doi:10.1038/s41586-020-2871-y
Progressive Cactus is a multiple-genome aligner for the thousand-genome era
Paten B, Diekhans M, Earl D, John JS, Ma J, Suh B, Haussler D.
J Comput Biol. 2011 Mar;18(3):469-81. doi:10.1089/cmb.2010.0252
Cactus graphs for genome comparisons.
Edgar RC.
Nat Commun. 2022 Nov 15;13(1):6968. doi:10.1038/s41467-022-34630-w
Muscle5: High-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny.
Frith MC, Hamada M, Horton P.
BMC Bioinformatics. 2010 Feb 9;11:80. doi:10.1186/1471-2105-11-80
Parameters for accurate genome alignment.
Frith MC, Noé L, Kucherov G.
Bioinformatics. 2020 Dec 21;36(22-23):5344-50. doi:10.1093/bioinformatics/btaa1054
Minimally-overlapping words for sequence similarity search.
Song B, Marco-Sola S, Moreto M, Johnson L, Buckler ES, Stitzer MC.
Proc Natl Acad Sci U S A. 2022 Jan 4;119(1):e2113075119. doi:10.1073/pnas.2113075119
AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication.
Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.
Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. doi:10.1073/pnas.1932072100
Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes.
“A chained alignment [is] an ordered sequence of traditional pairwise nucleotide alignments (“blocks”) separated by larger gaps, some of which may be simultaneous gaps in both species. [...] intervening DNA in one species that does not align with the other because it is locally inverted or has been inserted in by lineage-specific translocation or duplication is skipped”
“The chains are then put into a list sorted with the highest-scoring chain first. [...] each iteration taking the next chain off of the list, throwing out the parts of the chain that intersect with bases already covered by previously taken chains, and then marking the bases that are left in the chain as covered. [...] If a chain covers bases that are in a gap in a previously taken chain, it is marked as a child of the previous chain. In this way, a hierarchy of chains is formed that we call a net.”
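The greedy netting procedure quoted above can be sketched in a few lines. This is a toy under stated assumptions, not the UCSC implementation: chains are given only as reference-side intervals (the real procedure tracks both genomes), coverage is kept as per-base sets rather than efficient interval structures, and the parent of a new chain is taken to be the most recently taken chain whose gap receives its bases. The names `Chain`, `NetNode` and `build_net` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Chain:
    name: str
    score: float
    # Aligned blocks on the reference genome only, as half-open [start, end)
    # intervals sorted by start (the other species is ignored in this toy).
    blocks: list[tuple[int, int]]


@dataclass
class NetNode:
    chain: Chain
    kept: set[int]  # reference bases this chain still contributes
    gaps: set[int]  # reference bases lying in this chain's gaps
    children: list["NetNode"] = field(default_factory=list)


def aligned_bases(chain: Chain) -> set[int]:
    return {pos for start, end in chain.blocks for pos in range(start, end)}


def gap_bases(chain: Chain) -> set[int]:
    span = range(chain.blocks[0][0], chain.blocks[-1][1])
    return set(span) - aligned_bases(chain)


def build_net(chains: list[Chain]) -> list[NetNode]:
    covered: set[int] = set()
    taken: list[NetNode] = []  # chains in the order they were taken
    roots: list[NetNode] = []
    # Highest-scoring chain first.
    for chain in sorted(chains, key=lambda c: c.score, reverse=True):
        # Throw out the parts that intersect already-covered bases.
        kept = aligned_bases(chain) - covered
        if not kept:
            continue
        covered |= kept  # mark what is left as covered
        node = NetNode(chain, kept, gap_bases(chain))
        # Parent: most recently taken chain whose gap holds some kept base,
        # i.e. the innermost enclosing chain in this simplified model.
        parent = next((p for p in reversed(taken) if kept & p.gaps), None)
        (parent.children if parent else roots).append(node)
        taken.append(node)
    return roots


example_net = build_net([
    Chain("top", 100, [(0, 10), (40, 50)]),
    Chain("inner", 50, [(15, 30)]),
    Chain("overlap", 20, [(5, 20)]),
])
```

In the example, `inner` fills a gap of `top`, while `overlap` loses the bases already covered by `top` and `inner` and keeps only positions 10-14, which also lie in a gap of `top`; both therefore become children of `top`.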
“To be considered syntenic, a chain has to either have a very high score itself or be embedded in a larger chain, on the same chromosome, and come from the same region as the larger chain. Thus, inversions and tandem duplications are considered syntenic.”
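The synteny rule can be written as a predicate over a net. This sketch assumes each chain knows its parent in the net (the "larger chain" it is embedded in) and its span in the other species, and reads "same chromosome" and "same region" as the other species' chromosome and span. The thresholds `HIGH_SCORE` and `REGION_SLACK` are hypothetical placeholders, since the quote gives no values.

```python
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL cutoffs: the quoted rule gives no numbers.
HIGH_SCORE = 100_000  # stands in for "a very high score"
REGION_SLACK = 0      # how far outside the parent's span a child may fall


@dataclass
class NetChain:
    score: float
    q_chrom: str  # chromosome in the other (query) species
    q_start: int  # query-side span, half-open [q_start, q_end)
    q_end: int
    parent: Optional["NetChain"] = None  # enclosing chain in the net, if any


def is_syntenic(c: NetChain, high_score: float = HIGH_SCORE,
                slack: int = REGION_SLACK) -> bool:
    if c.score >= high_score:
        return True  # a very high score suffices on its own
    p = c.parent
    if p is None:
        return False  # low-scoring top-level chain: nothing to be embedded in
    # Same query chromosome, and query span inside the parent's span.
    # Strand is not checked, so a local inversion inside the parent passes.
    return (c.q_chrom == p.q_chrom
            and c.q_start >= p.q_start - slack
            and c.q_end <= p.q_end + slack)


top = NetChain(500_000, "chr7", 1_000, 90_000)
inversion = NetChain(3_000, "chr7", 20_000, 25_000, parent=top)
elsewhere = NetChain(3_000, "chr2", 5_000, 9_000, parent=top)
```

Because orientation is ignored, an inverted or tandemly duplicated segment lying within its parent's region is accepted, consistent with the quoted statement; a chain from another chromosome is not.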
“We define the (human) span of a chain to be the distance in bases in the human genome from the first to the last human base in the chain, including gaps, and we define the size of the chain as the number of aligning bases in it, not including gaps.”
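The chain definition and the span/size distinction translate directly into code. This sketch assumes a chain is stored as ungapped blocks, each with a start in both genomes and a shared length (loosely modeled on, but not a parser for, the UCSC chain format). It counts the span as end minus start in half-open coordinates, so both the first and last aligned human base are included. `Block`, `is_ordered`, `chain_span` and `chain_size` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    t_start: int  # 0-based start in the reference (human) genome
    q_start: int  # 0-based start in the other genome
    length: int   # ungapped block length, identical in both genomes


def is_ordered(blocks: list[Block]) -> bool:
    """Blocks must advance in both genomes; gaps may be one- or two-sided."""
    return all(a.t_start + a.length <= b.t_start
               and a.q_start + a.length <= b.q_start
               for a, b in zip(blocks, blocks[1:]))


def chain_span(blocks: list[Block]) -> int:
    """Human bases from the first to the last aligned base, gaps included."""
    return blocks[-1].t_start + blocks[-1].length - blocks[0].t_start


def chain_size(blocks: list[Block]) -> int:
    """Number of aligning bases, gaps excluded."""
    return sum(b.length for b in blocks)


chain = [Block(100, 5_000, 50), Block(200, 5_060, 30), Block(400, 5_300, 20)]
```

For this example chain the span is 420 - 100 = 320 human bases, while the size is only 50 + 30 + 20 = 100 aligning bases; the difference is the gapped sequence that the chain skips over.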
Frith MC, Noé L.
Nucleic Acids Res. 2014 Apr;42(7):e59. doi:10.1093/nar/gku104
Improved search heuristics find 20,000 new alignments between human and mouse genomes.
Mitsuhashi S, Frith MC, Mizuguchi T, Miyatake S, Toyota T, Adachi H, Oma Y, Kino Y, Mitsuhashi H, Matsumoto N.
Genome Biol. 2019 Mar 19;20(1):58. doi:10.1186/s13059-019-1667-6
Tandem-genotypes: robust detection of tandem repeat expansions from long DNA reads.
Frith MC, Khan S.
Nucleic Acids Res. 2018 Feb 28;46(4):1661-1673. doi:10.1093/nar/gkx1266
A survey of localized sequence rearrangements in human DNA.
Treangen TJ, Ondov BD, Koren S, Phillippy AM.
Genome Biol. 2014;15(11):524.
The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes.
Frith MC, Kawaguchi R.
Genome Biol. 2015 May 21;16:106. doi:10.1186/s13059-015-0670-9
Split-alignment of genomes finds orthologies more accurately.
Garrison E, Sirén J, Novak AM, Hickey G, Eizenga JM, Dawson ET, Jones W, Garg S, Markello C, Lin MF, Paten B, Durbin R.
Nat Biotechnol. 2018 Oct;36(9):875-879. doi:10.1038/nbt.4227
Variation graph toolkit improves read mapping by representing genetic variation in the reference.
Frith MC.
PLoS One. 2011;6(12):e28819. doi:10.1371/journal.pone.0028819
Gentle masking of low-complexity sequences improves homology search.