Liu X, Liu W, Lenstra JA, Zheng Z, Wu X, Yang J, Li B, Yang Y, Qiu Q, Liu H, Li K, Liang C, Guo X, Ma X, Abbott RJ, Kang M, Yan P, Liu J.

Nat Commun. 2023 Sep 19;14(1):5617. doi:10.1038/s41467-023-41220-x

Evolutionary origin of genomic structural variations in domestic yaks.

“assemblies were constructed for 6 wild and 15 domestic yaks”

“we used a uniform standard pipeline to annotate these 47 bovine genomes. We identified an average of 24,368 protein-coding genes for each assembly”

“1,048,639 high-confidence SNPs were detected and used in phylogenetic analyses with the water buffalo genome as outgroup”

“We further constructed a species tree on the basis of 8428 single-copy core genes through selecting one representative individual of each species”

“Pangenomes were constructed for yaks and cattle and a super-pangenome for the 7 Bovini species. For the yak pangenome the total gene set approached saturation at n = 20. The percentages of core (present in all 22 genomes), near-core (present in 20–21 genomes) and variable (found in 1–19 genomes) gene families were 50.18, 10.91, 38.91%, respectively.”

“In pairwise comparisons of the assemblies constituting the pangenome, each assembly possessed 123 to 2113 genes not present in the other genome”

“we constructed a multi-assembly graph-based genome of the 47 genomes used in the phylogenomic and super-pangenome analyses. This comprised 3.14 gigabases (Gb) spread across 5,449,222 nodes (the number of fragments of sequences) and connected by 4,889,530 edges (the connections between nodes), with non-reference nodes spanning 387.0 Mb. The core (shared by all genomes), near-core (in 46 or 45 genomes) and variable nodes (in 44 or less samples) accounted for 60.8, 17.0, 22.2% of all nodes.”

“We detected SVs (≥ 50 bp) in the graph-based genome using the bubble popping algorithm of gfatools [25] and retained 293,712 SVs (81.7% <500 bp, 99.76% <10 kb) that could be genotyped in the BosMut3.0 yak reference genome or at least one other genome”

“Next, we used the graph-genotyping software Vg (v1.36.0) on 386 bovines (233 yaks, 140 cattle, 4 bison, 8 wisent and one gaur, including the novel genome sequences for assembling the super-pangenome) for which resequencing data with >6× coverage were available. This yielded 610,921 genotyped SVs, from which 57,432 were retained after quality filtering”