Long-read sequence assembly: a technical evaluation in barley

2021 | journal article. A publication with affiliation to the University of Göttingen.

Cite this publication

Long-read sequence assembly: a technical evaluation in barley
Mascher, M.; Wicker, T.; Jenkins, J.; Plott, C.; Lux, T.; Koh, C. S. & Ens, J. et al. (2021)
The Plant Cell 33(6), pp. 1888-1906. DOI: https://doi.org/10.1093/plcell/koab077

License

GRO License

Details

Authors
Mascher, Martin; Wicker, Thomas; Jenkins, Jerry; Plott, Christopher; Lux, Thomas; Koh, Chu Shin; Ens, Jennifer; Gundlach, Heidrun; Boston, Lori B; Tulpová, Zuzana; Stein, Nils
Abstract
Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence based on short reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.
Issue Date
2021
Journal
The Plant Cell 
ISSN
1040-4651
eISSN
1532-298X
Language
English
