Li C, Lenhard B, Luscombe NM.

Genome Res. 2018 Apr 4. pii: gr.231449.117. doi:10.1101/gr.231449.117

Integrated analysis sheds light on evolutionary trajectories of young transcription start sites in the human genome.

Works from the premise that “sequence homology is a reasonable proxy for TSS age”. Found polyA artefacts in the FANTOM5 HeliScopeCAGE libraries. Concludes that “1) new TSSs tend to have weaker transcription than old ones; 2) they tend to appear in repeat elements and associate with transcripts of uncertain functional status; 3) they are less likely to have a clear regulatory role, as demonstrated by the weaker regulatory potential from functional genomic data; 4) they also tend to appear in already active chromatin regions (e.g. near existing TSSs); and 5) new TSSs evolve more rapidly during their early phase of existence, which may be explained by the inherent instability of neighboring sequences or lack of a vital function”. Suggests that new promoters may start out with a TATA box and lose it later: “We also found that ~50% of young LTR-associated TSSs contain a TATA-box motif 25~35 bp upstream (Supplemental Fig. S6) – a TATA-box motif upstream to the TSS is found in many, but not all LTR consensus sequences – whereas the proportion drops to ~30% for old LTR-associated TSSs. This suggests that a substantial fraction of TATA-less promoters may have originated as LTR-derived TATA-box containing promoters.” Reports a strong association between young TSSs and repeat elements: “~70% of young TSSs have at least one repeat element within ±100 bp, but that only 24% of old TSSs do.”
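To make the “at least one repeat element within ±100 bp” criterion concrete, here is a minimal Python sketch of how such a fraction could be computed for TSSs on a single chromosome. It is not the authors' pipeline: the function names, the 1-based inclusive coordinates, and the toy positions are assumptions made for illustration only.

```python
def has_repeat_nearby(tss, repeats, window=100):
    """Return True if any repeat overlaps [tss - window, tss + window].

    `repeats` is a list of (start, end) tuples, assumed 1-based and
    inclusive, all on the same chromosome as `tss` (hypothetical layout).
    """
    lo, hi = tss - window, tss + window
    return any(start <= hi and end >= lo for start, end in repeats)


def fraction_with_repeat(tss_positions, repeats, window=100):
    """Fraction of TSS positions with at least one repeat inside the window."""
    if not tss_positions:
        return 0.0
    hits = sum(has_repeat_nearby(t, repeats, window) for t in tss_positions)
    return hits / len(tss_positions)


# Toy data, not taken from the paper.
toy_repeats = [(1000, 1300)]
toy_tss = [900, 1401, 1150, 5000]
toy_fraction = fraction_with_repeat(toy_tss, toy_repeats)
```

For genome-scale data, the linear scan over repeats would normally be replaced by an interval index or a dedicated interval tool; the scan is kept here only for readability.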