High-quality chromosome-level genome assembly of Nicotiana benthamiana (2024)


Sci Data. 2024; 11: 386.

Published online 2024 Apr 16. doi:10.1038/s41597-024-03232-0

PMCID: PMC11021556

PMID: 38627408

Seo-Rin Ko,^#,1,2 Sanghee Lee,^#,1,3 Hyunjin Koo,^#,1 Hyojeong Seo,^4 Jaewoong Yu,^4 Yong-Min Kim,^1,2,5 Suk-Yoon Kwon,^1,3 and Ah-Young Shin^1,2


Abstract

Nicotiana benthamiana is a fundamental model organism in plant research. Recent advances in genome sequencing have revealed significant intraspecific genetic variation. This study addresses the need for a precise genome sequence specific to geographic origin by presenting a genome assembly of the N. benthamiana LAB strain from the Republic of Korea (NbKLAB) and comparing it with the widely used NbLAB360 strain. The result is a high-quality, chromosome-level assembly comprising 19 chromosomes and spanning 2,762 Mb, with an N50 of 142.6 Mb. We annotated 46,215 protein-coding genes, and the assembly achieved a BUSCO completeness score of 99.5%. Furthermore, the NbKLAB assembly improved the quality value (QV) from 33 for NbLAB360 to 49. This chromosome-level assembly, together with the comparative analyses, provides a valuable resource and a comparative framework for genomics research and molecular biology in N. benthamiana.

Subject terms: DNA sequencing, Polyploidy in plants

Background & Summary

Nicotiana benthamiana is an indispensable model organism in plant science, particularly for studying plant-microbe interactions and plant pathology due to its high susceptibility to various diseases, especially viral infections¹. Its susceptibility to Agrobacterium has led to the use of agro-infiltration techniques for transient gene expression in leaf tissues². In recent years, plant-derived systems, with N. benthamiana at the forefront, have become leading platforms for producing recombinant proteins, enzymes, vaccine antigens, antimicrobial peptides, diagnostic/research reagents, and monoclonal antibodies^3–6. N. benthamiana plays a pivotal role in fundamental discoveries related to RNA interference, plant-pathogen interactions, metabolic pathway engineering, functional genomics, synthetic biology, and gene editing⁷. Despite its potential for biomanufacturing, challenges in achieving optimal yield and purity of protein products, often due to unintended protein degradation, persist⁸. N. benthamiana, belonging to the Suaveolentes section of the Nicotiana genus, is an allopolyploid believed to have originated from a single crossbreeding event. It possesses a basal haploid chromosome number of n = 12. Initially thought to have 24 chromosome pairs, subsequent polyploidization and chromosomal rearrangements have reduced the count of chromosomes to 19, resulting in a deficit of five chromosomes compared to the presumed ancestral state^9–12. Short-read sequencing initially yielded fragmented drafts of the N. benthamiana genome^13,14.

Recognizing the limitations of short-read sequencing, recent efforts have turned to long-read sequencing techniques. Two recent publications reported new N. benthamiana draft genomes based on long reads. Kurotani et al. sequenced the genome using a PacBio Sequel II with seven SMRT cells¹⁵. In another paper, a hybrid approach that combined PacBio and Oxford Nanopore Technologies (ONT) sequencing platforms produced a high-quality genome assembly for the N. benthamiana LAB strain NbLAB360¹⁶. Comparative analyses highlighted disparities in single nucleotide polymorphism frequencies between NbLAB360 from the USA and EU laboratory accessions, emphasizing intraspecific genomic variations linked to geographical origin¹⁶. We also analysed differences from the most recently published N. benthamiana genome, Niben261¹⁷. A similar pattern of breed-specific genomic variation across regions was reported in a recent study of Korean native cattle¹⁸. These findings underscore the need for an accurate and high-quality genome sequence of the N. benthamiana LAB strain widely utilized in the Republic of Korea, NbKLAB.

In this investigation, we assembled a high-quality genome of N. benthamiana by using a combination of Illumina short reads, ONT long reads, and high-throughput chromosome conformation capture (Hi-C) data. This comprehensive approach yielded a genome assembly spanning 2,762 Mb, characterized by an N50 value of 142.6 Mb. Employing Hi-C scaffolding, we validated the presence of 19 chromosomes by utilizing the genome contact map. Furthermore, our efforts culminated in the identification of a total of 46,215 protein-coding genes, leading to an exceptional Benchmarking Universal Single-Copy Orthologs (BUSCO) score of 99.5%. This high-quality chromosomal-level genome assembly of NbKLAB establishes a robust cornerstone for prospective fundamental and applied research endeavors centered around N. benthamiana.

Methods

DNA extraction and genome sequencing

N. benthamiana Republic of Korea LAB (NbKLAB) plants were grown in standard fertilized soil under controlled environmental conditions at a constant temperature of 25°C with a 16-h light and 8-h dark photoperiod. Ten grams of young leaves were collected from plants grown for 4 weeks, and high-molecular-weight genomic DNA was extracted. Nuclei were first isolated from N. benthamiana cells using an N. benthamiana Nuclei Isolation Buffer (NIBM) (10 mM Tris-HCl pH 8.0, 10 mM EDTA pH 8.0, 100 mM KCl, 0.5 M sucrose, 4 mM spermidine, 1 mM spermine, and 0.15% β-mercaptoethanol). High-quality genomic DNA (gDNA) was obtained from these intact nuclei using a lysis buffer (50 mM Tris-HCl pH 7.5, 1.4 M NaCl, 20 mM EDTA pH 8.0, and 0.5% SDS). The quality of the isolated gDNA was assessed by measuring A₂₆₀/₂₈₀ absorbance ratios, which ranged from 1.8 to 2.0, using a Nanodrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). Gel electrophoresis was performed to evaluate the concentration and purity of the gDNA. The size distribution of the gDNA fragments was determined using a TapeStation system (Agilent, Australia). Most gDNA fragments were distributed between 10 and 100 kb. Sequencing of N. benthamiana was conducted on three ONT PromethION R10.4 flow cells (FLO-PRO112). Sequencing libraries were prepared according to the protocols recommended by ONT.

Hi-C library preparation

Hi-C technology was also employed for chromosome-level genome assembly. The Hi-C library construction protocol was as follows. Flower, root, and leaf tissues were mixed with 1% formaldehyde to fix chromatin, and the nuclei were then isolated following a nuclei isolation method¹⁹. Fixed chromatin was digested with HindIII-HF (New England BioLabs), and we filled the 5′ overhangs with nucleotides and biotin-14-dCTP (Invitrogen) and ligated free blunt ends. After ligation, we purified the DNA and removed biotin from unligated DNA ends. Fragmentation and size selection were performed to shear the Hi-C DNA. Hi-C library preparation was performed using the ThruPLEX® DNA-seq Kit (Takara Bio USA, Inc., Mountain View, CA, USA). The Hi-C library was evaluated by its fragment size distribution with TapeStation D1000 (Agilent Technologies, Santa Clara, CA, USA) and sequenced on an Illumina NovaSeq 6000 (Illumina) as 150-bp paired-end reads. For Hi-C scaffolding analysis, 42.6 Gb (~15.3X) of NovaSeq data was generated.
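
The fold-coverage figure quoted above can be checked with simple arithmetic. The sketch below is illustrative only: it assumes that depth is total sequenced bases divided by assembly length, and it assumes the draft assembly length from Table 1 (2,792,201,312 bp) as the denominator, which the text does not state explicitly. The function and constant names are hypothetical.

```python
def fold_coverage(sequenced_bases: float, genome_size_bp: float) -> float:
    """Return sequencing depth as total bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return sequenced_bases / genome_size_bp


# Values quoted in the text; the choice of denominator is an assumption.
HIC_BASES = 42.6e9                # 42.6 Gb of NovaSeq Hi-C data
ASSEMBLY_SIZE_BP = 2_792_201_312  # draft assembly length reported in Table 1

hic_depth = fold_coverage(HIC_BASES, ASSEMBLY_SIZE_BP)
print(f"Hi-C depth: {hic_depth:.1f}X")  # prints "Hi-C depth: 15.3X"
```

Under these assumptions the result agrees with the ~15.3X stated in the text.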

Genome de novo assembly

To achieve a high-quality assembly, we began with rigorous quality control of the raw reads. Reads with a quality score below 7 and reads shorter than 5,000 bp were filtered out. Additionally, to remove chloroplast and mitochondrial sequences, we obtained organelle sequences from closely related species and conducted BLAST analysis. Sequences with a query coverage of 80% or higher were then removed, and Hi-C scaffolding was performed. With these procedures, we sought to thoroughly eliminate potential contamination from chloroplast or mitochondrial genomes. Basecalling using Guppy v.6.1.1²⁰ was carried out to eliminate low-quality reads, and the length and quality distributions of the reads were then assessed using Nanoplot v.1.39²¹. This led to the retention of 5,442,228 reads, spanning a total of 144,579,996 kb, with an N50 read length of 36,409 bp.

Next, we utilized NextDenovo v.2.5.0 (https://github.com/Nextomics/NextDenovo) to assemble the N. benthamiana genome using only the Nanopore long reads. The draft assembly was polished using NextPolish v.1.4.0²², first for one round with the long reads used in the de novo assembly, and then for two rounds with short reads produced by the Illumina sequencing platform.

We then employed Hi-C technology to obtain a chromosome-level genome assembly. First, the paired-end Illumina reads were mapped onto the polished assembly using HiC-Pro v.3.1.0²³ with default parameters to check the quality of the raw Hi-C reads. We obtained reads with approximately 15.3-fold coverage through Hi-C, with a total of 34,262,399 contacts, accounting for 25.37% of the filtered reads. Juicer v.2.13.07²⁴ and 3D-DNA v.201008²⁵ were then applied to cluster the genomic contig sequences into potential chromosomal groups. Afterward, contig orientations were validated and ambiguous fragments were removed by manual curation using Juicebox v.1.11.08²⁶, whereby consecutive contigs were linked to generate a high-quality genome assembly. The density of Hi-C interactions between chromosomes was confirmed through heatmap analysis and the Hi-C matrix (Fig. 1a, Table 1). Our evaluation of k-mer completeness indicates that N. benthamiana possesses a paleopolyploid genome (Fig. 1b).
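
The read-level quality control described above can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: it assumes a read is kept only if it is at least 5,000 bp long and has a mean quality of at least 7, and it assumes the mean quality is derived by averaging per-base error probabilities of Phred+33 encoded qualities. The `Read` type and function names are hypothetical.

```python
import math
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    name: str
    sequence: str
    quality: str  # Phred+33 encoded quality string


def mean_quality(quality: str) -> float:
    """Mean Phred score obtained by averaging per-base error probabilities."""
    if not quality:
        return 0.0
    error_probs = [10 ** (-(ord(ch) - 33) / 10) for ch in quality]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def filter_reads(
    reads: Iterable[Read],
    min_quality: float = 7.0,  # threshold quoted in the text
    min_length: int = 5000,    # threshold quoted in the text
) -> Iterator[Read]:
    """Yield only reads that satisfy both the length and quality thresholds."""
    for read in reads:
        long_enough = len(read.sequence) >= min_length
        if long_enough and mean_quality(read.quality) >= min_quality:
            yield read


# Toy reads: '5' encodes Q20 and '$' encodes Q3 in Phred+33.
toy_reads = [
    Read("long_good", "A" * 6000, "5" * 6000),
    Read("short_good", "A" * 100, "5" * 100),
    Read("long_poor", "A" * 6000, "$" * 6000),
]
kept_names = [read.name for read in filter_reads(toy_reads)]
print(kept_names)
```

Only the read that is both long and of adequate quality survives in this toy example.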


Fig. 1

Table 1

Hi-C library statistics for NbKLAB genome.

	Long read + short read polishing (including scaffolds)	Long read + short read polishing + Hi-C scaffolding (including scaffolds)	Long read + short read polishing + Hi-C scaffolding (chromosomes only)
Contigs	140	668	19
Total length (bp)	2,792,201,312	2,792,183,363	2,762,242,804
N50 (bp)	55,310,590	142,645,986	142,645,986
Minimum length (bp)	23,786	1,000	129,057,907
Maximum length (bp)	144,363,079	182,285,862	182,285,862
GC (%)	37.84	37.84	37.75
BUSCO (%)	99.70	99.60	99.50


We conducted a comparative analysis of the NbKLAB, NbLAB360, and Niben261 genomes to assess genome similarity. All three datasets featured the same 19 chromosomes. Quantitative metrics for NbKLAB, including genome size (2,762,242,804 bp), maximum contig size (182,285,862 bp), and N50 (142.6 Mb), were remarkably similar to those of the NbLAB360 dataset. However, the BUSCO v.5.3.2²⁷ value for NbKLAB reached 99.5%, indicating slightly superior assembly quality compared with NbLAB360 and Niben261, whose BUSCO values are 98.5% and 98.7%, respectively (Table 2). In addition, the Long Terminal Repeat (LTR) Assembly Index (LAI), a method for assessing genome assembly completeness by examining the accuracy of repeat sequence assemblies, was calculated using LTR_retriever²⁸. We also employed Circos v.0.69-9²⁹ to depict the genome density features shown in Fig. 1c.

Table 2

Genome assembly and annotation statistics for the NbKLAB, NbLAB360, and Niben261 genomes.

Species	NbKLAB	NbLAB360	Niben261
Number of chromosomes	19	19	19
N50 (Mb)	142.6	143.1	152.6
L50	9	9	9
Maximum (bp)	182,285,862	182,027,195	194,605,305
Genome size (bp)	2,762,242,804	2,770,503,033	2,939,860,383
GC (%)	37.75	37.75	37.94
BUSCO (%)	99.5	98.5	98.7
Protein-coding genes	46,215	45,796	60,260


BUSCO: Benchmarking Universal Single-Copy Orthologs.
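
For readers unfamiliar with the contiguity metrics in Table 2, the following sketch shows how N50 and L50 are conventionally computed from a list of sequence lengths: sequences are sorted from longest to shortest, and N50 is the length of the sequence at which the running total first reaches half of the assembly size, while L50 is the number of sequences needed to get there. The lengths used here are toy values, not the NbKLAB chromosome lengths, and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("no sequence lengths supplied")


# Toy example: total = 290, half = 145; 80 + 70 = 150 reaches the halfway point.
toy_lengths = [30, 80, 20, 70, 50, 40]
toy_n50, toy_l50 = n50_l50(toy_lengths)
print(toy_n50, toy_l50)  # prints "70 2"
```

Applied to the 19 chromosome lengths of an assembly, the same function would give the chromosome-level N50 and L50 values of the kind listed in Table 2.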

Genome annotation

The annotation of protein-coding genes was conducted using the BRAKER2 software³⁰. To obtain transcriptome evidence, RNA-seq reads³¹ were aligned to the NbKLAB reference genome using HISAT2 v.2.2.1³². A protein database of previously published sequences was then aligned to our genome assembly with ProtHint v.2.6.0³³. These datasets were integrated with GeneMark-ETP³³, combining evidence from both transcriptomic and protein sequence alignments. The training and prediction of gene models were further refined using AUGUSTUS v.3.3.2³⁴. The predictions from AUGUSTUS and GeneMark-ETP were merged using TSEBRA³⁵. To ensure the quality of the predicted protein-coding genes, we applied a filtering step that used BLASTP to remove poor-quality sequences (E-value cut-off of 1e-10, query coverage > 0.3, subject coverage > 0.3). Finally, we identified a total of 46,215 protein-coding genes.
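
The BLASTP-based filtering criteria above can be expressed as a small predicate. This is a sketch under stated assumptions rather than the authors' script: it assumes coverage is the aligned span divided by the full sequence length, that a predicted gene is retained if at least one hit meets all three thresholds, and that each hit carries alignment coordinates and sequence lengths. The `Hit` record, its field names, and the gene identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Hit:
    query_id: str
    subject_id: str
    evalue: float
    q_start: int
    q_end: int
    q_len: int
    s_start: int
    s_end: int
    s_len: int


def coverage(start: int, end: int, length: int) -> float:
    """Fraction of a sequence spanned by the alignment (1-based, inclusive)."""
    return (abs(end - start) + 1) / length


def passes(hit: Hit, max_evalue: float = 1e-10,
           min_qcov: float = 0.3, min_scov: float = 0.3) -> bool:
    """Apply the E-value and coverage thresholds quoted in the text."""
    return (
        hit.evalue <= max_evalue
        and coverage(hit.q_start, hit.q_end, hit.q_len) > min_qcov
        and coverage(hit.s_start, hit.s_end, hit.s_len) > min_scov
    )


def supported_genes(hits: Iterable[Hit]) -> Set[str]:
    """Return query IDs with at least one hit passing all thresholds."""
    return {hit.query_id for hit in hits if passes(hit)}


# Hypothetical hits: only geneA satisfies every criterion.
example_hits = [
    Hit("geneA", "ref1", 1e-50, 1, 300, 300, 1, 300, 320),
    Hit("geneB", "ref2", 1e-50, 1, 50, 300, 1, 50, 300),   # coverage too low
    Hit("geneC", "ref3", 1e-5, 1, 300, 300, 1, 300, 300),  # E-value too weak
]
print(sorted(supported_genes(example_hits)))  # prints "['geneA']"
```

Expressing the thresholds as keyword arguments makes it easy to see how the retained gene count would shift under stricter or looser cut-offs.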

Comparative genomic analysis

To compare genome sequences between NbKLAB and NbLAB360 at the chromosome level, we conducted pairwise comparisons using Circos v.0.69-9 and MUMmer4³⁶. Protein sequences from both NbKLAB and NbLAB360 were aligned using BLASTP v.2.5.0. We identified conserved syntenic and collinear blocks across the entire genome with the MCScanX program³⁷. To focus on significant conserved genomic regions, we selected scaffolds larger than 1 Mb from all genomes for comparison. The results were then visualized using Circos (Fig. 2a). Additionally, we conducted sequence comparisons between chromosomes using Nucmer within the MUMmer4 software, with the parameters "-l 100 -c 500". The MUMmer analysis showed that all 19 chromosomes aligned successfully between NbKLAB and NbLAB360 (Fig. 2b). These results indicate strong genome-wide concordance between the two assemblies.


Fig. 2

Comparative genomics. (a) Syntenic relationship between the NbKLAB and NbLAB360 genomes. (b) Comparison of all chromosomes of NbKLAB and NbLAB360 genomes using MUMmer plot. Alignment of whole genomes demonstrates a clear collinearity for all chromosomes. Dots distributed across the figure represent repetitive sequences aligning at various genomic locations. Red dots represent collinear sequences, while blue dots represent inverted sequences.

Repeat annotation

We employed an integrative approach that combined homology alignment and de novo prediction for repeat annotation. A repeat library was constructed from the assembled genomes using RepeatModeler v.2.0.3³⁸. Subsequent repeat annotation was conducted with RepeatMasker v.4.1.3³⁹ (https://www.repeatmasker.org/). NbKLAB displayed a slightly higher content of LTR elements, at 1.40 Gb (49.99% of its genome), than NbLAB360, at 1.37 Gb (48.24% of its genome). In contrast, SINE and LINE elements were less abundant in NbKLAB than in NbLAB360 (Table 3).

Table 3

Comparative statistics of repetitive sequences in the NbKLAB, NbLAB360, and Niben261 genomes.

	NbKLAB		NbLAB360		Niben261
	Repeat length (bp)	Proportion (%)	Repeat length (bp)	Proportion (%)	Repeat length (bp)	Proportion (%)
SINES	1,142,314	0.04	3,259,793	0.11	937,157	0.03
LINES	105,513,178	3.78	115,717,888	4.08	118,975,624	3.92
LTR elements	1,395,831,879	49.99	1,367,691,325	48.24	1,002,900,094	33.04
DNA transposons	69,392,040	2.49	70,452,776	2.49	72,036,154	2.37
Unclassified	678,817,584	24.31	686,309,469	24.21	958,766,645	31.58


SINES: Short Interspersed Elements; LINES: Long Interspersed Elements; LTR: Long Terminal Repeat.
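
The proportions in Table 3 are repeat lengths expressed as a percentage of genome length. The sketch below recomputes the NbKLAB column; it assumes that the denominator is the total length of the Hi-C scaffolded assembly including unplaced scaffolds (2,792,183,363 bp, Table 1), an inference that is not stated in the text but that reproduces the reported percentages. The variable and function names are hypothetical.

```python
from typing import Dict

# Assumed denominator: scaffold-inclusive assembly length from Table 1.
ASSEMBLY_LENGTH_BP = 2_792_183_363

# NbKLAB repeat lengths (bp) as listed in Table 3.
NBKLAB_REPEAT_BP: Dict[str, int] = {
    "SINEs": 1_142_314,
    "LINEs": 105_513_178,
    "LTR elements": 1_395_831_879,
    "DNA transposons": 69_392_040,
    "Unclassified": 678_817_584,
}


def repeat_proportions(repeat_bp: Dict[str, int], genome_bp: int) -> Dict[str, float]:
    """Return each repeat class as a percentage of the genome, to two decimals."""
    return {name: round(100 * bp / genome_bp, 2) for name, bp in repeat_bp.items()}


nbklab_percent = repeat_proportions(NBKLAB_REPEAT_BP, ASSEMBLY_LENGTH_BP)
for name, percent in nbklab_percent.items():
    print(f"{name}: {percent:.2f}%")
```

With this denominator the computed values match the NbKLAB column of Table 3 (0.04, 3.78, 49.99, 2.49, and 24.31%).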

Data Records

The raw sequencing data (Illumina, Nanopore, and Hi-C) used for genome assembly have been deposited in the NCBI Sequence Read Archive under the accession number PRJNA1034276⁴⁰. The final genome assembly sequence of N. benthamiana cv. NbKLAB is available through the NCBI GenBank under accession number JAXGFW000000000⁴¹. Gene annotation data for N. benthamiana cv. NbKLAB has been submitted to the online open-access repository Figshare database³¹.

Technical Validation

We conducted a comprehensive evaluation of the quality and completeness of the raw ONT reads, which totaled 9,000,040 reads. To assess the integrity of the raw reads, we employed Guppy v.6.1.1 to extract duplex bases and unpaired-simplex bases. The quality of the raw reads was analyzed using Nanoplot v.1.39 (Fig. 3).


Fig. 3

Raw data validation. Raw read length proportion and read quality.

In this study, we employed a dual-reader approach with the ONT v10.3 platform for genome sequencing, which yielded an N50 read length of 34,701 bp. These long reads were pivotal for the accuracy of the assembly and contributed to a more precise and comprehensive reconstruction of the genome than earlier N. benthamiana assemblies.

To assess the genome assembly completeness of NbKLAB and compare it with NbLAB360 and Niben261, we conducted a two-step validation. First, we used paired-end Illumina short reads to estimate the k-mer completeness score and the QV using Merqury v.1.3⁴². NbLAB360 and Niben261 had completeness scores of 97.8% and 98.8%, respectively, whereas NbKLAB reached 99.4%. NbLAB360 and Niben261 achieved QV scores of 33 and 29.5, respectively, whereas NbKLAB reached a QV of 49, indicating higher base-level accuracy. Second, we estimated BUSCO completeness using a set of 1,440 embryophyta genes⁴³. The NbKLAB genome assembly contained 99.5% of the conserved genes as complete, whereas NbLAB360 and Niben261 contained 98.5% and 98.7%, respectively (Table 4). We used the BRAKER2 software for annotation and subsequently conducted BLASTP analysis against NbLAB360 to validate the annotation. A total of 39,525 genes had query coverage exceeding 90% and subject coverage exceeding 80%, indicating a high-quality gene set (Fig. 4) and supporting the reliability of the annotation. Collectively, these metrics show that the NbKLAB assembly improves on previous assemblies in both assembly and annotation quality.

Table 4

Genome assembly validation.

	NbKLAB	NbLAB360	Niben261
QV value	49	31.5	29.5
Merqury k-mer completeness score (%)	99.4	97.8	98.8
Scaffold N50 (Mb)	142.6	145	151
Complete BUSCOs (C) (%)	99.5	98.1	98.7
Complete and single-copy BUSCOs (S) (%)	34.3	46	33.7
Complete and duplicated BUSCOs (D) (%)	65.2	52.1	65
LTR Assembly index	15.82	17.4	8.78


BUSCO: Benchmarking Universal Single-Copy Orthologs; LTR: Long Terminal Repeat; QV: Quality Value.
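
QV values such as those in Table 4 are on the Phred scale, where QV = -10 log10(E) and E is the estimated per-base error probability. The sketch below converts the Table 4 values into error rates to make the differences easier to interpret; the function names are hypothetical, and the conversion is the standard Phred relationship rather than anything specific to this study.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


# QV values as listed in Table 4.
QV_TABLE4 = {"NbKLAB": 49.0, "NbLAB360": 31.5, "Niben261": 29.5}

for assembly, qv in QV_TABLE4.items():
    rate = qv_to_error_rate(qv)
    print(f"{assembly}: QV {qv} ~ one error per {1 / rate:,.0f} bases")
```

Because the scale is logarithmic, every 10-point increase in QV corresponds to a tenfold reduction in the estimated error rate.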


Fig. 4

Quality Assessment of NbKLAB and NbLAB360 genes through BLASTP Analysis.

Acknowledgements

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (2023R1A2C1006404), the Korea Research Institute of Bioscience and Biotechnology (KRIBB) Research Initiative Program (KGM9942421 and KGM1002412) to A.Y.S., and the Basic Science Research Program through the NRF funded by the Ministry of Education (NRF-2021R1I1A2044678) to Y.M.K.

Author contributions

A.Y.S., S.Y.K., and Y.M.K. conceived the project, designed the analysis, and organized the manuscript. S.R.K. and S.L. generated the Nanopore raw data, H.S. and J.Y. performed the genome assembly, and H.K. performed the genome annotation. S.R.K., S.L., and H.K. analyzed the data and performed the genome assembly evaluation. A.Y.S., S.Y.K., and Y.M.K. wrote the manuscript. All authors critically commented on and approved the manuscript.

Code availability

All software employed for data processing was executed following the guidelines of the bioinformatic software cited above. If no detailed parameters are mentioned, the default parameters were used.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Seo-Rin Ko, Sanghee Lee, Hyunjin Koo.

Contributor Information

Yong-Min Kim, Email: ymkim@kribb.re.kr.

Suk-Yoon Kwon, Email: sykwon@kribb.re.kr.

Ah-Young Shin, Email: shinay@kribb.re.kr.

References

1. Goodin MM, Zaitlin D, Naidu RA, Lommel SA. Nicotiana benthamiana: Its history and future as a model for plant-pathogen interactions. Mol. Plant-Microbe Interact. 2008;21:1015–1026.

2. Chen Q, et al. Delivery for Production of Pharmaceutical Proteins. Adv Tech Biol Med. 2014;1:1–21.

3. Lobato Gómez M, et al. Contributions of the international plant science community to the fight against human infectious diseases – part 1: epidemic and pandemic diseases. Plant Biotechnol. J. 2021;19:1901–1920.

4. Shanmugaraj B, Phoolcharoen W. Addressing demand for recombinant biopharmaceuticals in the COVID-19 era. Asian Pac. J. Trop. Med. 2021;14:49–51.

5. Capell T, et al. Potential Applications of Plant Biotechnology against SARS-CoV-2. Trends Plant Sci. 2020;25:635–643.

6. Kumar M, et al. A comprehensive overview on the production of vaccines in plant-based expression systems and the scope of plant biotechnology to combat against SARS-CoV-2 virus pandemics. Plants. 2021;10.

7. Waterhouse PM, Helliwell CA. Exploring plant genomes by RNA-induced gene silencing. Nat. Rev. Genet. 2003;4:29–38.

8. Grosse-Holz F, et al. The transcriptome, extracellular proteome and active secretome of agroinfiltrated Nicotiana benthamiana uncover a large, diverse protease repertoire. Plant Biotechnol. J. 2018;16:1068–1084.

9. Kelly LJ, et al. Intragenic recombination events and evidence for hybrid speciation in Nicotiana (Solanaceae). Mol. Biol. Evol. 2010;27:781–799.

10. Bally J, et al. The extremophile Nicotiana benthamiana has traded viral defence for early vigour. Nat. Plants. 2015;1:1–6.

11. Chase MW, et al. Molecular systematics, GISH and the origin of hybrid taxa in Nicotiana (Solanaceae). Ann. Bot. 2003;92:107–127.

12. Clarkson JJ, et al. Phylogenetic relationships in Nicotiana (Solanaceae) inferred from multiple plastid DNA regions. Mol. Phylogenet. Evol. 2004;33:75–90.

13. Naim F, et al. Advanced Engineering of Lipid Metabolism in Nicotiana benthamiana Using a Draft Genome and the V2 Viral Silencing-Suppressor Protein. PLoS One. 2012;7.

14. Bombarely A, et al. A draft genome sequence of Nicotiana benthamiana to enhance molecular plant-microbe biology research. Mol. Plant-Microbe Interact. 2012;25:1523–1530.

15. Kurotani KI, et al. Genome Sequence and Analysis of Nicotiana benthamiana, the Model Plant for Interactions between Organisms. Plant Cell Physiol. 2023;64:248–257.

16. Ranawaka B, et al. A multi-omic Nicotiana benthamiana resource for fundamental research and biotechnology. Nat. Plants. 2023;9:1558–1571.

17. D’Andrea L, et al. Polyploid Nicotiana section Suaveolentes originated by hybridization of two ancestral Nicotiana clades. Front. Plant Sci. 2023;14:1–14.

18. Jang J, et al. Chromosome-level genome assembly of Korean native cattle and pangenome graph of 14 Bos taurus assemblies. Sci. Data. 2023;10:1–9.

19. Garcia-Arraras JE, Dolmatov IY. Echinoderms: potential model systems for studies on muscle regeneration. Curr Pharm Des. 2010;16:942–955.

20. Wick RR, Judd LM, Holt KE. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 2019;20:1–10.

21. De Coster W, D’Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: Visualizing and processing long-read sequencing data. Bioinformatics. 2018;34:2666–2669.

22. Hu J, Fan J, Sun Z, Liu S. NextPolish: A fast and efficient genome polishing tool for long-read assembly. Bioinformatics. 2020;36:2253–2255.

23. Servant N, et al. HiC-Pro: An optimized and flexible pipeline for Hi-C data processing. Genome Biol. 2015;16:1–11.

24. Durand NC, et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 2016;3:95–98.

25. Dudchenko O, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–95.

26. Durand NC, et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst. 2016;3:99–101.

27. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212.

28. Ou S, Chen J, Jiang N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 2018;46:e126.

29. Krzywinski M, et al. Circos: An information aesthetic for comparative genomics. Genome Res. 2009;19:1639–1645.

30. Brůna T, Hoff KJ, Lomsadze A, Stanke M, Borodovsky M. BRAKER2: Automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genomics Bioinforma. 2021;3:1–11.

31. Shin A-Y. 2024. Nicotiana benthamiana KLAB Genome assembly and annotation. figshare.

32. Kim D, Langmead B, Salzberg SL. HISAT: A fast spliced aligner with low memory requirements. Nat. Methods. 2015;12:357–360.

33. Brůna T, Lomsadze A, Borodovsky M. GeneMark-EP+: Eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genomics Bioinforma. 2020;2:1–14.

34. Stanke M, et al. AUGUSTUS: Ab initio prediction of alternative transcripts. Nucleic Acids Res. 2006;34:435–439.

35. Gabriel L, Hoff KJ, Brůna T, Borodovsky M, Stanke M. TSEBRA: transcript selector for BRAKER. BMC Bioinformatics. 2021;22:1–12.

36. Delcher AL, Phillippy A, Carlton J, Salzberg SL. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 2002;30:2478–2483.

37. Wang Y, et al. MCScanX: A toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40:1–14.

38. Flynn JM, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA. 2020;117:9451–9457.

39. Chen N. Using Repeat Masker to identify repetitive elements in genomic sequences. Curr protoc Bioinformatics. 2004;5:4–10.

40. 2023. NCBI Sequence Read Archive. SRP469582

41. Shin A-Y. 2023. Chromosome level genome assembly of Nicotiana benthamiana using ONT sequencing platform. GenBank. JAXGFW000000000

42. Rhie A, Walenz BP, Koren S, Phillippy AM. Merqury: Reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020;21:1–27.

43. Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol. Biol. Evol. 2021;38:4647–4654.
