Genomics and Bioinformatics Team | 2019 CucCAP Progress Report

Team members: Zhangjun Fei (Boyce Thompson Institute), Umesh Reddy (West Virginia State University), Amnon Levi (USDA, ARS), Yiqun Weng (USDA, ARS), Michael Mazourek (Cornell University), Pat Wechter (USDA, ARS), Rebecca Grumet (Michigan State University)

1.1. Develop genomic and bioinformatic platforms for cucurbit crops


1.1.1. Genotyping by sequencing

Working closely with the Cornell Genomic Diversity Facility, we have set up a genotyping-by-sequencing (GBS) platform for cucurbit species.

1.1.2. Sequence data processing/analysis

We have established a GBS data analysis pipeline based on TASSEL-GBS (http://www.maizegenetics.net/tassel).
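The tag-counting idea behind a TASSEL-GBS-style pipeline can be illustrated with a minimal Python sketch. This is not TASSEL's implementation: the barcode-to-sample map, the short tag length and all function names are hypothetical, and only the threshold of at least 10 reads per tag is taken from this report (section 1.2.1). Reads without an expected barcode are discarded, the remaining reads are trimmed to a fixed-length tag, and tags are counted per sample and in total.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical keyfile: barcode -> sample name. Real GBS barcodes vary in length
# and are designed so that no barcode is a prefix of another.
BARCODES: Dict[str, str] = {"ACGT": "sample_A", "TGCA": "sample_B"}
TAG_LENGTH = 8   # hypothetical, kept short for readability
MIN_READS = 10   # minimum reads per tag used for SNP calling in this report


def demultiplex(read: str, barcodes: Dict[str, str] = BARCODES) -> Optional[Tuple[str, str]]:
    """Return (sample, insert) if the read starts with a known barcode, else None."""
    for barcode, sample in barcodes.items():
        if read.startswith(barcode):
            return sample, read[len(barcode):]
    return None


def count_tags(reads: Iterable[str], tag_length: int = TAG_LENGTH) -> Counter:
    """Count reads per (tag, sample) after trimming each insert to a fixed-length tag."""
    counts: Counter = Counter()
    for read in reads:
        hit = demultiplex(read)
        if hit is None:
            continue  # no expected barcode: the read is discarded
        sample, insert = hit
        if len(insert) < tag_length:
            continue  # too short to form a full tag
        counts[(insert[:tag_length], sample)] += 1
    return counts


def frequent_tags(counts: Counter, min_reads: int = MIN_READS) -> Dict[str, int]:
    """Sum read counts per tag across samples and keep tags with >= min_reads reads."""
    totals: Counter = Counter()
    for (tag, _sample), n in counts.items():
        totals[tag] += n
    return {tag: n for tag, n in totals.items() if n >= min_reads}


if __name__ == "__main__":
    reads = ["ACGTCAGCTTAAGG"] * 12 + ["TGCACAGCTTAAGG"] * 3 + ["GGGGCAGCTTAAGG"] * 5
    print(frequent_tags(count_tags(reads)))  # {'CAGCTTAA': 15}
```

In this toy input the five reads starting with `GGGG` carry no known barcode and are dropped, while the 15 barcoded reads collapse into one tag that passes the 10-read threshold.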

1.1.3. ICuGI database development

We have re-implemented the ICuGI database, now named the Cucurbit Genomics Database (CuGenDB; http://cucurbitgenomics.org/), using the GMOD Tripal system (http://gmod.org/wiki/Tripal) and the Chado database schema (http://gmod.org/wiki/Chado). The redesigned database was released in May 2017. It currently contains genome sequences of melon, watermelon (97103 and Charleston Gray), cucumber (Chinese Long and Gy14), wild cucumber (Cucumis sativus var. hardwickii PI 183967), four Cucurbita species (C. pepo, C. maxima, C. moschata and C. argyrosperma) and bottle gourd. Syntenic regions between every pair of sequenced cucurbit genomes have been identified, and a synteny viewer has been implemented in the database. An “expression” module has been developed using publicly available cucurbit RNA-Seq datasets, mainly collected from the NCBI Sequence Read Archive (SRA). A set of tools to mine and analyze these RNA-Seq datasets, such as heatmap views of expression profiles and differential gene expression analysis, has been implemented. The synteny viewer and the expression module have been packaged as Tripal extension modules that can be deployed in other genomic databases built with Tripal. Development of tools and interfaces to analyze and integrate genotype and phenotype data is ongoing. A manuscript describing the database has been published (Zheng et al., 2019, Nucleic Acids Research, 47:D1128).

1.1.4. Community standardized nomenclature

This is in progress.

1.1.5. Genomic, bioinformatics workshops

A workshop on the Cucurbit Genomics Database was held at the Solcuc2017 meeting in Valencia, Spain, in September 2017. A talk on the database was presented at CUCURBITACEAE 2018 in Davis, California, in November 2018.

1.2. Perform GBS analysis of PI collections, establish core populations of the four species, and provide a community resource for genome-wide association studies (GWAS)

1.2.1. GBS of cucurbit species, establish molecular-informed core populations

We have finished GBS for all cucumber, melon, watermelon, Cucurbita pepo, C. maxima and C. moschata accessions collected from the USDA National Plant Germplasm System (Table 1). After removing accessions with insufficient reads and merging duplicated accessions, a total of 1,564 cucumber, 2,077 melon, 1,365 watermelon, 852 C. pepo, 463 C. maxima and 314 C. moschata accessions have been genotyped (Table 2). We have finished processing the GBS data and SNP calling for cucumber, melon and watermelon, and analysis of the GBS data for C. pepo, C. maxima and C. moschata is underway.

We obtained a total of 1.71, 1.57 and 0.88 billion GBS reads with expected barcodes for melon, cucumber and watermelon, respectively. From these reads, 54,192,089, 76,860,960 and 34,621,369 unique tags were identified, of which 743,545, 593,678 and 388,298 tags with at least 10 reads were used for SNP calling. A total of 89,377, 114,338 and 62,258 SNPs were called in melon, cucumber and watermelon, respectively, and 27,846, 23,828 and 25,930 SNPs were retained after filtering for a missing data rate < 0.5 and a minor allele frequency (MAF) > 0.01 (Table 3).
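The two SNP filtering criteria above (missing data rate < 0.5 and MAF > 0.01) can be expressed as a short Python sketch. It assumes genotypes are coded as alternate-allele dosages (0, 1, 2) with None for a missing call; this coding and the function names are illustrative assumptions, not the format used by the pipeline in this report.

```python
from typing import Dict, List, Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 copies of the alternate allele; None = missing


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of accessions with no genotype call at this SNP."""
    return sum(g is None for g in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """MAF computed from non-missing calls (two alleles per call)."""
    called = [g for g in calls if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_filters(calls: Sequence[Genotype], max_missing: float = 0.5,
                   min_maf: float = 0.01) -> bool:
    """Strict thresholds as stated in the report: missing rate < 0.5 and MAF > 0.01."""
    return missing_rate(calls) < max_missing and minor_allele_frequency(calls) > min_maf


def filter_snps(snps: Dict[str, List[Genotype]]) -> Dict[str, List[Genotype]]:
    """Keep only the SNPs that pass both filters."""
    return {snp_id: calls for snp_id, calls in snps.items() if passes_filters(calls)}
```

Because the thresholds are strict inequalities, a SNP missing in exactly half of the accessions is removed, as is a SNP whose minor allele appears at exactly 1%.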

Table 1 Summary of cucurbit GBS

Batch	DNA plate No.	Multiplex level	Crop	DNA submission date	Data release date
1	8	96	cucumber	4/13/2016	12/7/2016
2	9	96	cucumber	2/5/2016	12/7/2016
3	11,12,13,14	384	cucumber	8/24/2016	10/18/2016
4	2,5,6,16	384	cucumber	9/23/2016	11/21/2016
5	1,4,7,15	384	cucumber	3/10/2016	11/21/2016
6	31,34,35,36	384	watermelon	10/19/2016	11/21/2016
7	37,38,39,40	384	watermelon	10/31/2016	3/1/2017
8	41,42,43,44	384	watermelon	4/11/2016	2/15/2017
9	49	96	melon	8/12/2016	3/16/2017
10	3,10,17,46	384	cucumber	1/20/2017 & 2/2/2017	5/31/2017
11	50,51,52,53	384	melon	2/14/2017	5/5/2017
12	54,55,56,57	384	melon	2/22/2017	5/5/2017
13	58,59,60,61	384	melon	2/3/2017	5/5/2017
14	62,63,64,65	384	melon	3/16/2017	5/5/2017
15	66,67,68,69	384	melon	3/23/2017	5/5/2017
16	21,32,33,70	384	melon & watermelon	3/23/2017	5/31/2017
17	71,72,73,74	384	1 melon & 3 squash	4/19/2017	6/13/2017
18	75,76,77,78	384	squash	5/31/2017	11/7/2017
19	22,23,79,80	384	squash	8/18/2017	9/25/2017
20	18,19,28,29	384	cucumber	1/24/2018	5/3/2018
21	90	96	watermelon	8/2/2018	5/3/2018
22	91	96	watermelon	8/2/2018	5/3/2018
23	92	96	watermelon	8/2/2018	5/3/2018
24	93	96	watermelon	8/2/2018	5/3/2018
25	94	96	watermelon	8/2/2018	5/3/2018
26	81,82	192	C. maxima	1/3/2018	3/22/2018
27	83,84	192	C. maxima	1/3/2018	3/22/2018
29	27	96	C. maxima	3/20/2018	9/10/2018

Note: Batches highlighted in yellow in the original table are samples from mapping populations.

Table 2 Summary of cucurbit accessions genotyped using GBS

	melon	cucumber	watermelon	C. pepo	C. moschata	C. maxima
Total No. of plants genotyped	2090	1604	1377	854	318	463
No. accessions with low reads	5	3	11	0	0	0
No. accessions genotyped more than once	8	36	1	1	4	0
Final No. accessions genotyped	2077	1564	1365	852	314	463

Table 3 Summary of GBS sequencing and called SNPs

	melon	cucumber	watermelon
Total good barcoded reads	1.71E+09	1.57E+09	8.84E+08
Total reads in tags with >= 10 reads	1.61E+09	1.44E+09	8.28E+08
Mapped reads	1.26E+09	1.05E+09	5.51E+08
Unmapped reads	3.45E+08	3.87E+08	2.77E+08
Total tags	54192089	76860960	34621369
Tags with >= 10 reads	743545	593678	388298
Mapped tags	373133	351594	246506
Unmapped tags	370412	242084	141792
No. raw SNPs	89377	114338	62258
No. SNPs at missing rate < 0.5	62789	91092	41601
No. SNPs at missing rate < 0.5 and MAF > 0.01	27846	23828	25930

We have developed a core collection selection strategy. Briefly, about 400 accessions will be selected for each species. Around 300 of these, representing the majority of the genetic diversity of the germplasm, will be chosen by core collection analysis using GenoCore (Jeong et al., 2017, PLoS ONE 12:e0181420). Another ~100 accessions that carry interesting traits and/or are parents of mapping or breeding populations will be added. If a selected line is known to be derived from a PI accession that is also in the final core collection, that PI will be replaced with its most closely related accession on the phylogenetic tree. Likewise, accessions in the final core collection whose genomes have already been resequenced will be replaced by their most closely related accessions on the phylogenetic tree, unless they harbor particularly interesting or important traits. Using this strategy, we have established core collections for melon and cucumber. The melon core collection contains 384 accessions and captures 98.96% of the allelic diversity in the melon germplasm we genotyped. The cucumber core collection contains 395 accessions: 354 from the GBS collection, which together capture 95.9% of the allelic diversity, and 41 historical varieties with important horticultural and disease resistance traits. Principal component analysis (PCA) of the melon and cucumber core collections showed a pattern similar to that of the entire collections (e.g., melon, Figure 1). Core collection selection is currently underway for watermelon and C. pepo.

Figure 1. Principal component analysis of the melon core collection (red) and the entire collection (gray)
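GenoCore selects core subsets that maximize allelic coverage. The Python sketch below is a simplified greedy coverage heuristic in that spirit, not GenoCore's actual algorithm: at each step it adds the accession contributing the most alleles not yet covered, and it reports the fraction of all observed alleles captured, the kind of figure quoted above for melon and cucumber. The dosage coding, accession names and stopping rule are assumptions, and the trait-based additions and phylogenetic replacements in the strategy above are not modeled.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Allele = Tuple[int, int]  # (SNP index, 0 = reference allele / 1 = alternate allele)


def alleles_of(calls: Sequence[Optional[int]]) -> Set[Allele]:
    """Alleles observed in one accession; calls are dosages 0/1/2, None = missing."""
    observed: Set[Allele] = set()
    for i, g in enumerate(calls):
        if g is None:
            continue
        if g < 2:
            observed.add((i, 0))  # carries at least one reference allele
        if g > 0:
            observed.add((i, 1))  # carries at least one alternate allele
    return observed


def greedy_core(genotypes: Dict[str, Sequence[Optional[int]]], max_size: int,
                target: float = 1.0) -> Tuple[List[str], float]:
    """Greedily add the accession with the most uncovered alleles.

    Stops at max_size accessions, once the target coverage fraction is reached,
    or when no accession adds a new allele. Returns (core, coverage).
    """
    allele_sets = {acc: alleles_of(calls) for acc, calls in genotypes.items()}
    universe: Set[Allele] = set().union(*allele_sets.values())
    covered: Set[Allele] = set()
    core: List[str] = []
    while len(core) < max_size and len(covered) < target * len(universe):
        candidates = [acc for acc in allele_sets if acc not in core]
        if not candidates:
            break
        best = max(candidates, key=lambda acc: len(allele_sets[acc] - covered))
        gain = allele_sets[best] - covered
        if not gain:
            break  # remaining accessions add no new alleles
        core.append(best)
        covered |= gain
    coverage = len(covered) / len(universe) if universe else 1.0
    return core, coverage
```

With a toy input in which `PI_2` duplicates `PI_1`, the duplicate is never chosen because it adds no new alleles; ties are broken by input order, which a production tool would handle more carefully.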

1.2.2. Population genomics and GWAS analyses

Using SNPs called from the GBS data, we have performed population genomic analyses, including phylogenetic, PCA and population structure analyses, for cucumber, watermelon and melon accessions. Results for the watermelon accessions are shown in Figure 2 as an example. We have also investigated linkage disequilibrium (LD) decay patterns and population differentiation in these species.

Figure 2. Phylogenetic relationship and population structure of Citrullus spp. accessions. (a) Maximum-likelihood tree of 1,367 Citrullus spp. accessions. (b) Model-based clustering analysis with K from 2 to 5. Each accession is represented by a vertical bar; each color represents one ancestral population, and the length of each colored segment in a bar represents the proportion contributed by that ancestral population. (c) Principal component analysis of 1,367 watermelon accessions, with PC1 and PC2 explaining 63.7% and 2.1% of the variance. (d) Principal component analysis of C. lanatus and C. mucosospermus accessions, with PC1 and PC2 explaining 4.6% and 2.3% of the variance.
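LD decay is commonly summarized as the mean r² between SNP pairs, binned by physical distance. The minimal Python sketch below assumes unphased dosage genotypes (0, 1, 2, None for missing) for SNPs on a single chromosome and uses the squared Pearson correlation of dosages as r². This genotypic r² is a common approximation for unphased data, but it is an assumption here; the report does not state which LD estimator was used.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Optional, Sequence

Calls = Sequence[Optional[int]]  # dosages 0/1/2, None = missing


def genotype_r2(x: Calls, y: Calls) -> Optional[float]:
    """Squared Pearson correlation of dosages over accessions called at both SNPs."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None  # monomorphic among shared calls; r2 undefined
    return sxy * sxy / (sxx * syy)


def ld_decay(positions: Sequence[int], genotypes: List[Calls],
             bin_size: int) -> Dict[int, float]:
    """Mean r2 of all SNP pairs, grouped into distance bins of bin_size bp."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for i, j in combinations(range(len(positions)), 2):
        r2 = genotype_r2(genotypes[i], genotypes[j])
        if r2 is None:
            continue
        b = abs(positions[j] - positions[i]) // bin_size
        sums[b] += r2
        counts[b] += 1
    # keys are the lower edge of each distance bin, in bp
    return {b * bin_size: sums[b] / counts[b] for b in sorted(sums)}
```

This version compares every SNP pair, which grows quadratically; genome-scale analyses usually restrict pairs to a maximum distance window.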

We have collected historical phenotype data from the USDA National Plant Germplasm System for cucumber, watermelon and melon accessions, and performed GWAS to identify SNPs and genomic regions significantly associated with important agronomic traits. GWAS of watermelon resistance to powdery mildew race 2 is shown in Figure 3 as an example.
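The report does not specify the association model used, so the following Python sketch is only a simplified illustration of a single-marker scan: it regresses a phenotype on SNP dosage and converts the t statistic to a two-sided p-value with a normal approximation. GWAS in structured germplasm collections typically use mixed models that correct for population structure and kinship, which this sketch omits; all names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def single_marker_test(calls: Sequence[Optional[int]],
                       phenotypes: Sequence[Optional[float]]) -> Optional[Tuple[float, float]]:
    """Least-squares slope of phenotype on dosage and an approximate two-sided p-value.

    The p-value treats the t statistic as standard normal, which is only
    reasonable for large sample sizes.
    """
    pairs = [(g, y) for g, y in zip(calls, phenotypes) if g is not None and y is not None]
    n = len(pairs)
    if n < 3:
        return None
    mean_g = sum(g for g, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sgg = sum((g - mean_g) ** 2 for g, _ in pairs)
    if sgg == 0:
        return None  # monomorphic among phenotyped accessions
    sgy = sum((g - mean_g) * (y - mean_y) for g, y in pairs)
    slope = sgy / sgg
    intercept = mean_y - slope * mean_g
    rss = sum((y - intercept - slope * g) ** 2 for g, y in pairs)
    se = math.sqrt(rss / (n - 2) / sgg)
    if se == 0:
        return slope, 0.0  # perfect fit
    t = slope / se
    return slope, math.erfc(abs(t) / math.sqrt(2))


def scan(snps: Dict[str, Sequence[Optional[int]]],
         phenotypes: Sequence[Optional[float]]) -> List[Tuple[str, float, float]]:
    """Test every SNP; return (snp_id, slope, -log10 p) sorted strongest first."""
    results = []
    for snp_id, calls in snps.items():
        tested = single_marker_test(calls, phenotypes)
        if tested is None:
            continue
        slope, p = tested
        score = math.inf if p == 0 else -math.log10(p)
        results.append((snp_id, slope, score))
    return sorted(results, key=lambda r: r[2], reverse=True)
```

The -log10 p scores returned by `scan` are the values typically plotted along the genome in a Manhattan plot.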

A manuscript reporting the results of the population genomics and GWAS analyses, as well as core collection development, for cucumber has been published (Wang et al., 2018, Horticulture Research 5:64); the corresponding manuscript for watermelon has been submitted, and the one for melon is in preparation. Analysis of the GBS data for Cucurbita species is underway.

Figure 3. Genome-wide association studies (GWAS) of resistance to powdery mildew race 2 in stem (left) and leaf (right) of watermelon.

1.2.3. Genomic resequencing of core collections

We have compared cost-effective services for Illumina genomic library construction to fit our budget for resequencing the core collections, and selected the “Nextera skim sequencing WGS library preps (1/3 concentration)” service provided by the Cornell Biotechnology Resource Center (http://www.biotech.cornell.edu/brc/genomics/services/price-list#ht). The service charges $1,152 per full plate (96 samples) plus an additional $900 for pooling and BluePippin size selection ($2,052 in total, or about $21.4 per sample).
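The per-sample library cost quoted above follows from simple arithmetic, reproduced here as a small Python check that uses only the prices stated in this report; the constant and function names are illustrative.

```python
LIBRARY_PREP_PER_PLATE = 1152.00     # USD per full 96-sample plate (from the report)
POOLING_AND_SIZE_SELECTION = 900.00  # USD per plate, pooling + BluePippin (from the report)
SAMPLES_PER_PLATE = 96


def library_cost_per_sample(prep: float = LIBRARY_PREP_PER_PLATE,
                            pooling: float = POOLING_AND_SIZE_SELECTION,
                            samples: int = SAMPLES_PER_PLATE) -> float:
    """Total plate cost divided evenly across the samples on the plate."""
    return (prep + pooling) / samples


plate_total = LIBRARY_PREP_PER_PLATE + POOLING_AND_SIZE_SELECTION
per_sample = library_cost_per_sample()
print(f"${plate_total:,.0f} per plate, ${per_sample:.1f} per sample")
# -> $2,052 per plate, $21.4 per sample
```

The exact value is $21.375 per sample, which the report rounds to $21.4; a partially filled plate would raise the per-sample cost, since the pooling charge is fixed per plate.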

We have sent 21 C. pepo samples (one Illumina lane) and one plate of cucumber samples (96 samples; six lanes) from the core collections for library construction. The constructed libraries have been sequenced at GENEWIZ (~$1,500 per lane; each lane generates ~120 Gb of paired-end sequence data). We have obtained cleaned sequence data at >10× coverage depth for most of the accessions (Figure 4).

Figure 4. Sequencing depth (based on the final cleaned data) of 96 cucumber accessions (left) and 21 C. pepo accessions (right).
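A rough expectation of raw sequencing depth per sample follows from the lane yield and the number of samples sharing the lanes. In this Python sketch the lane yield (~120 Gb) and price (~$1,500) come from the report, but the 300 Mb genome size is an illustrative round-number assumption, not a measured value for either species. Raw depth also overestimates cleaned depth, because trimming and filtering remove data.

```python
GB_PER_LANE = 120.0      # ~120 Gb paired-end data per lane (from the report)
USD_PER_LANE = 1500.0    # ~$1,500 per lane at GENEWIZ (from the report)
GENOME_SIZE_MB = 300.0   # ASSUMPTION: illustrative round number, not a measured genome size


def raw_depth(lanes: float, samples: int, gb_per_lane: float = GB_PER_LANE,
              genome_size_mb: float = GENOME_SIZE_MB) -> float:
    """Expected raw fold coverage per sample if lane output is split evenly."""
    gb_per_sample = lanes * gb_per_lane / samples
    return gb_per_sample * 1000.0 / genome_size_mb


def sequencing_cost_per_sample(lanes: float, samples: int,
                               usd_per_lane: float = USD_PER_LANE) -> float:
    """Sequencing cost per sample, excluding library construction."""
    return lanes * usd_per_lane / samples


for crop, lanes, samples in [("cucumber", 6, 96), ("C. pepo", 1, 21)]:
    print(f"{crop}: ~{raw_depth(lanes, samples):.1f}x raw depth, "
          f"${sequencing_cost_per_sample(lanes, samples):.2f} per sample")
```

Under these assumptions, six lanes shared by 96 cucumber samples give about 7.5 Gb (~25×) and $93.75 of sequencing per sample, and one lane shared by 21 C. pepo samples gives about 5.7 Gb (~19×) and about $71.43 per sample, leaving headroom above the >10× cleaned depth reported.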
