pages tagged LAST

bibliography in progress...

Whole-genome alignments with reversed sequences as negative controls showed that e-value filtering is not enough to remove spurious alignments of tandem repeats, which therefore need to be masked (Frith MC, Hamada M and Horton P, 2010).
lastdb can use various seeding schemes to build its index. Frith and Noé (2014) discuss some of them. The RY seeds are made of minimally overlapping words over the two-letter alphabet R = A|G, Y = C|T, which increases speed with a good tradeoff in sensitivity (Frith MC, Noé L, Kucherov G, 2020).
last-postmask (Frith, 2011): discards alignments that contain a significant amount of lower-case-masked sequences.
last-split (Frith and Kawaguchi, 2015): heuristic algorithm inspired by the “repeated matches algorithm” of Durbin and coll. (1998). It searches for an optimal set of local alignments (as opposed to a set of optimal local alignments). Its output is also used by the third-party tool NanoSV (Cretu and coll., 2017).
last-train (Hamada, Ono, Asai and Frith, 2017): estimation of alignment parameters.
local-rearrangements (Frith and Khan, 2018): detection and display of rearrangements supported by multiple long reads and by the ancestrality of the reference sequence.
tandem-genotypes (Mitsuhashi and coll., 2019): detection of expansion of tandem repeats, after alignment with last-split.
LAST can align DNA sequences to protein databases using a 64 x 21 substitution matrix (Yao and Frith, 2021).
JRA (Joint Read Alignment) uses LAST (Shrestha and coll., 2018).
A tutorial for the use of dnarrange is published in Frith and Mitsuhashi (2022).

Martin C. Frith, Satomi Mitsuhashi

Posted August 15, 2022. doi:10.1101/2022.05.30.494079

Finding rearrangements in nanopore DNA reads with last and dnarrange

Tutorial on how to use dnarrange. Examples of gene conversion, repeat insertion in the reference, pseudogene insertion in the query, etc.

Frith MC, Hamada M, Horton P.

BMC Bioinformatics. 2010 Feb 9;11:80. doi: 10.1186/1471-2105-11-80

Parameters for accurate genome alignment.

Aligned genomes after reversing (not reverse-complementing) them, as negative controls. In these comparisons, all alignments are spurious. A large number of spurious alignments were found, and this number could be reduced by masking tandem repeats. Spurious alignments in tandem repeats get abnormally high scores. “Bad” scoring matrices tend to extend alignments with spurious low-quality arms. The X-drop parameter prevents the aligner from extending alignments too far, but high X-drop values can cause small alignments to be discarded by some software because the score becomes negative.
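
The difference between reversing and reverse-complementing matters for this negative control, so here is a minimal Python sketch of both operations. The function names are illustrative and are not part of LAST. Reverse-complementing yields the opposite strand of the same molecule, which is still biologically related to the original; plain reversal yields a sequence with similar composition but no expected homology.

```python
# Illustrative helpers (hypothetical names, not part of LAST).

# Translation table for complementing DNA, preserving letter case
# (lower case is commonly used for soft-masked sequence).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_only(seq: str) -> str:
    """Reverse a sequence without complementing it (negative control)."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    seq = "ACGGT"
    print(reverse_only(seq))        # control sequence
    print(reverse_complement(seq))  # opposite strand, not a control
```

Any alignment found between a genome and the plain reversal of another genome can then be treated as spurious.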

Yin Yao, Martin C. Frith

In: Martín-Vide C., Vega-Rodríguez M.A., Wheeler T. (eds) Algorithms for Computational Biology. AlCoB 2021. Lecture Notes in Computer Science, vol 12715. Springer, Cham. DOI:10.1007/978-3-030-74432-8_11

Improved DNA-versus-Protein Homology Search for Protein Fossils

Uses a 64 x 21 substitution matrix and automatically learns the genetic code. Detected fossils of the polinton and DIRS/Ngaro repeat elements in the human genome. 10 times faster than blastx.
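
To make the matrix dimensions concrete, the sketch below builds an empty table with one row per codon and one column per protein symbol. It assumes that the 21 columns are the 20 standard amino acids plus a stop symbol, which is an interpretation and not stated in the note above. The zero scores are placeholders, not values from the paper.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# Assumption: 20 standard amino acids plus "*" for stop gives 21 columns.
PROTEIN_SYMBOLS = "ACDEFGHIKLMNPQRSTVWY*"

# All 4^3 = 64 codons, in lexicographic order.
CODONS = ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]


def empty_codon_matrix() -> dict:
    """Return a 64 x 21 table of placeholder scores, indexed [codon][symbol]."""
    return {codon: {sym: 0 for sym in PROTEIN_SYMBOLS} for codon in CODONS}


if __name__ == "__main__":
    matrix = empty_codon_matrix()
    print(len(matrix), "rows x", len(matrix["ATG"]), "columns")
```

Because each codon has its own row of scores, nothing in the table fixes in advance which codon encodes which amino acid, which is compatible with the genetic code being learned from data.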

Frith MC, Noé L, Kucherov G.

Bioinformatics. 2020 Dec 21;36(22-23):5344–50. doi:10.1093/bioinformatics/btaa1054.

Minimally-overlapping words for sequence similarity search.

Sparse seeds made of minimally overlapping words improve the speed with a good tradeoff on sensitivity. Describes the seeds RY4, …, RY32 used in LAST.
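
The two-letter alphabet behind the RY seeds (R = A|G, Y = C|T) can be illustrated with a short Python sketch. It only shows the alphabet reduction; it does not reproduce how lastdb selects or indexes words, and the function name is hypothetical.

```python
# Map purines (A, G) to R and pyrimidines (C, T) to Y, preserving case.
_RY_TABLE = str.maketrans("AGCTagct", "RRYYrryy")


def to_ry(seq: str) -> str:
    """Reduce a DNA sequence to the two-letter purine/pyrimidine alphabet."""
    return seq.translate(_RY_TABLE)


if __name__ == "__main__":
    original = "GATTACA"
    transition = "AATTACA"    # G -> A at the first position
    transversion = "CATTACA"  # G -> C at the first position
    for seq in (original, transition, transversion):
        print(seq, to_ry(seq))
```

In the reduced alphabet a transition leaves the word unchanged while a transversion alters it, so a word match in RY space tolerates transitions.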

Frith MC, Noé L.

Nucleic Acids Res. 2014 Apr;42(7):e59. doi:10.1093/nar/gku104

Improved search heuristics find 20,000 new alignments between human and mouse genomes.

“using more codesigned seed patterns makes the alignment more sensitive but slower. The interesting point, though, is that using more seeds beats increasing the rareness threshold. For example, using four seeds with m = 10 is both faster and more sensitive than one seed with m = 100. The downside is that more seeds require more memory.” “We also tried aligning 10 000 random 1-kb chunks of the melanogaster genome to the pseudoobscura genome. In this case, the 1:1 [transitions:transversions] seeds perform better than the 3:2 seeds, as expected.” “Mammals have a greater excess than Drosophila, presumably because they have more methylcytosine, which mutates rapidly to thymine. Less-similar genomes have a lower excess of transitions: this is as expected because the transitions cannot keep increasing linearly but instead tend to an asymptote.”
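
The transition:transversion ratios quoted above refer to the two classes of nucleotide substitution. As a minimal sketch, the Python below counts both classes in a pair of equal-length aligned sequences. The function names are illustrative, and columns containing gaps or ambiguous letters are simply skipped, which is a simplifying assumption.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify(a: str, b: str) -> str:
    """Classify one aligned column of two unambiguous bases."""
    if a == b:
        return "match"
    pair = {a, b}
    # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_substitutions(x: str, y: str) -> dict:
    """Count matches, transitions and transversions column by column."""
    counts = {"match": 0, "transition": 0, "transversion": 0}
    for a, b in zip(x.upper(), y.upper()):
        if a in BASES and b in BASES:  # skip gaps and ambiguity codes
            counts[classify(a, b)] += 1
    return counts


if __name__ == "__main__":
    print(count_substitutions("ACGT-A", "GCTTAA"))
```

With uniformly random substitutions there are twice as many possible transversions as transitions, so an observed ratio near 1:1 or above already indicates an excess of transitions.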

Shrestha AMS, Frith MC, Asai K, Richard H.

Nucleic Acids Res. 2018 Feb 16;46(3):e18. doi:10.1093/nar/gkx1175

Jointly aligning a group of DNA reads improves accuracy of identifying large deletions.

For short reads.

Cretu Stancu M, van Roosmalen MJ, Renkens I, Nieboer MM, Middelkamp S, de Ligt J, Pregno G, Giachino D, Mandrile G, Espejo Valle-Inclan J, Korzelius J, de Bruijn E, Cuppen E, Talkowski ME, Marschall T, de Ridder J, Kloosterman WP.

Nat Commun. 2017 Nov 6;8(1):1326. doi:10.1038/s41467-017-01343-4

Mapping and phasing of structural variation in patient genomes using nanopore sequencing.

Primary paper for NanoSV. Fed with last-split alignments.