2018 CucCAP Genomics and Bioinformatics Team Annual Report

Progress Reports and Work Plans

Team members Zhangjun Fei (Boyce Thompson Institute), Umesh Reddy (West Virginia St. Univ.), Amnon Levi (USDA, ARS), Yiqun Weng (USDA, ARS), Michael Mazourek (Cornell University), Pat Wechter (USDA, ARS) and Rebecca Grumet (Michigan State University) reported on their labs' progress and plans.

Work in progress and plans

1.1. Develop genomic and bioinformatic platforms for cucurbit crops

1.1.1. Genotyping by sequencing

Working closely with the Cornell Genomic Diversity Facility, we have set up a genotyping-by-sequencing (GBS) platform for cucurbit species.

1.1.2. Sequence data processing/analysis

We have established a GBS data analysis pipeline based on TASSEL-GBS (http://www.maizegenetics.net/tassel).

1.1.3. ICuGI database development

We have re-implemented the ICuGI database (now named the Cucurbit Genomics Database; new URL: http://cucurbitgenomics.org/) using the GMOD Tripal system (http://gmod.org/wiki/Tripal) and the Chado database schema (http://gmod.org/wiki/Chado). The newly designed and developed database was released in May 2017. In addition to the genomes already in the previous database, genome sequences of melon, watermelon Charleston Gray, cucumber Gy14, wild cucumber (Cucumis sativus var. hardwickii PI 183967), three Cucurbita species (C. pepo, C. maxima, C. moschata) and bottle gourd have been processed and added to the new database. Syntenic regions between every pair of sequenced cucurbit genomes have been identified, and a synteny viewer has been implemented in the database. An “expression” module has been developed using publicly available RNA-Seq datasets for cucurbit species, mainly collected from the NCBI Sequence Read Archive (SRA). A set of tools to mine and analyze the RNA-Seq datasets, such as a heatmap view of expression profiles and differential gene expression analysis, has been implemented. The synteny viewer and the expression module have been packaged as Tripal extension modules that can be installed in other genomic databases built on the Tripal system. We are currently developing a module for the analysis of small RNA (sRNA) datasets. Development of tools and interfaces to analyze and integrate genotype and phenotype data is planned.

1.1.4. Community standardized nomenclature

This has not begun.

1.1.5. Genomic, bioinformatics workshops

A workshop on the Cucurbit Genomics Database was held at the Solcuc2017 meeting in Valencia, Spain, in September 2017. We are currently working with the BTI communication team to prepare a webinar on how to use the database. The video will be posted on the database website.

1.2. Perform GBS analysis of PI collections, establish core populations of the four species, and provide community resource for genome wide association studies (GWAS)

1.2.1. GBS of cucurbit species, establish molecular-informed core populations

We have finished GBS for all cucumber, melon, watermelon, Cucurbita pepo and C. moschata accessions collected from the USDA National Plant Germplasm System. Four 96-well plates of C. maxima samples were submitted to the Cornell GBS facility in early March 2018 and are currently being processed (Table 1). After removing accessions with insufficient reads and merging duplicated accessions, a total of 1,564 cucumber, 2,077 melon, 1,365 watermelon, 829 C. pepo and 191 C. moschata accessions have been genotyped. We have finished GBS data processing and SNP calling for cucumber, melon and watermelon, while GBS data analysis for C. pepo and C. moschata is underway.

Table 1. Status of cucurbit GBS (March 13, 2018)

Batch | DNA plate no. | Multiplex level | Crop | DNA submission date | Data release date
1 | 8 | 96 | cucumber | 4/13/2016 | 7/12/2016
2 | 9 | 96 | cucumber | 5/2/2016 | 7/12/2016
3 | 11,12,13,14 | 384 | cucumber | 8/24/2016 | 10/18/2016
4 | 2,5,6,16 | 384 | cucumber | 9/23/2016 | 11/21/2016
5 | 1,4,7,15 | 384 | cucumber | 10/3/2016 | 11/21/2016
6 | 31,34,35,36 | 384 | watermelon | 10/19/2016 | 11/21/2016
7 | 37,38,39,40 | 384 | watermelon | 10/31/2016 | 1/3/2017
8 | 41,42,43,44 | 384 | watermelon | 11/4/2016 | 2/15/2017
9 | 49 | 96 | melon | 12/8/2016 | 3/16/2017
10 | 3,10,17,46 | 384 | cucumber | 1/20/2017 & 2/2/2017 | 5/31/2017
11 | 50,51,52,53 | 384 | melon | 2/14/2017 | 5/5/2017
12 | 54,55,56,57 | 384 | melon | 2/22/2017 | 5/5/2017
13 | 58,59,60,61 | 384 | melon | 3/2/2017 | 5/5/2017
14 | 62,63,64,65 | 384 | melon | 3/16/2017 | 5/5/2017
15 | 66,67,68,69 | 384 | melon | 3/23/2017 | 5/5/2017
16 | 21,32,33,70 | 384 | melon & watermelon | 3/23/2017 | 5/31/2017
17 | 71,72,73,74 | 384 | melon & squash | 4/19/2017 | 6/13/2017
18 | 75,76,77,78 | 384 | squash | 5/31/2017 | 7/11/2017
19 | 22,23,79,80 | 384 | squash | 8/18/2017 | 9/25/2017
20 | 18,19,28,29 | 384 | cucumber & melon | 1/24/2018 | 3/5/2018
21 | 90 | 96 | watermelon | 2/8/2018 | 3/5/2018
22 | 91 | 96 | watermelon | 2/8/2018 | 3/5/2018
23 | 92 | 96 | watermelon | 2/8/2018 | 3/5/2018
24 | 93 | 96 | watermelon | 2/8/2018 | 3/5/2018
25 | 94 | 96 | watermelon | 2/8/2018 | 3/5/2018
26 | 81,82,83,84 | 384 | C. maxima | 3/1/2018 |

Note: Rows 20 to 24 are samples from mapping populations.

We obtained a total of 1.57, 1.71 and 0.88 billion GBS reads with expected barcodes for cucumber, melon and watermelon, respectively. From these reads, a total of 76,860,960, 54,192,089 and 34,621,369 unique tags were obtained, of which 593,678, 743,545 and 388,298 tags with at least 10 reads were used for SNP calling in cucumber, melon and watermelon, respectively. A total of 113,854, 89,204 and 61,520 SNPs were called in cucumber, melon and watermelon, respectively, and 24,319, 27,835 and 25,739 SNPs were retained after applying the criteria of missing data rate < 0.5 and minor allele frequency (MAF) > 0.01.
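The two filtering criteria above (missing data rate < 0.5 and MAF > 0.01) can be illustrated with a minimal Python sketch. This is not the pipeline used in the project; it assumes a hypothetical genotype matrix coded as alternate-allele counts (0, 1, 2) for diploid, biallelic SNPs, with NaN marking missing calls, and the function name `filter_snps` is introduced here for illustration only.

```python
import numpy as np

def filter_snps(genotypes, max_missing=0.5, min_maf=0.01):
    """Return a boolean mask of SNPs (columns) that pass both filters.

    genotypes: 2-D array, rows = accessions, columns = SNPs; values are
    alternate-allele counts (0, 1, 2) with np.nan for missing calls.
    This coding is an assumption made for the sketch.
    """
    g = np.asarray(genotypes, dtype=float)
    n_called = (~np.isnan(g)).sum(axis=0)
    missing_rate = 1.0 - n_called / g.shape[0]
    # Alternate-allele frequency among called diploid genotypes.
    alt_count = np.nansum(g, axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        alt_freq = alt_count / (2.0 * n_called)
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    # A SNP with no calls has NaN MAF; NaN comparisons are False, so it fails.
    return (missing_rate < max_missing) & (maf > min_maf)

# Toy example: 4 accessions x 4 SNPs.
example = np.array([
    [0, 0, np.nan, 0],
    [1, 0, np.nan, 0],
    [2, 0, np.nan, 0],
    [0, 0, 1,      0],
])
mask = filter_snps(example)
print(mask.tolist())  # [True, False, False, False]
```

In the toy example only the first SNP is kept: the second and fourth are monomorphic (MAF = 0), and the third is missing in three of four accessions.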

Figure 1. Principal component analysis of the melon core collection (red) and the entire collection (gray)

A core collection selection strategy has been developed. Briefly, a total of ~400 accessions will be selected for each species. Around 300 accessions that represent the majority of the genetic diversity of the germplasm will be selected based on core collection analysis using GenoCore (Jeong et al., 2017, PLoS ONE 12:e0181420). Another ~100 accessions with interesting traits and/or that are parents of mapping/breeding populations will be selected. In the final core collection, if a selected line is known to be derived from a PI accession that is also in the final core collection, then the corresponding PI should be replaced with the most closely related accession on the phylogenetic tree. Accessions in the final core collection whose genomes have already been resequenced should also be replaced by the most closely related accessions on the phylogenetic tree, unless they harbor very interesting or important traits. Based on this strategy, a melon core collection containing 384 accessions has been established. The melon core collection captures 98.96% of all allelic diversity in the melon germplasm we have genotyped. Principal component analysis (PCA) of the melon core collection showed a pattern similar to that of the entire collection (Figure 1). Core collection selection is currently underway for cucumber and watermelon.
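The idea of a core collection that "captures" a given share of allelic diversity can be sketched as follows. This is a simplified greedy heuristic written for illustration, not the GenoCore algorithm or the procedure used for the melon core; the function names and the 0/1/2 genotype coding (NaN for missing) are assumptions.

```python
import numpy as np

def alleles_present(genotypes):
    """Boolean array (accessions x SNPs x 2) marking which of the two
    alleles (reference, alternate) each accession carries at each SNP."""
    g = np.asarray(genotypes, dtype=float)
    has_ref = g < 2   # genotypes 0 and 1 carry the reference allele
    has_alt = g > 0   # genotypes 1 and 2 carry the alternate allele
    return np.stack([has_ref, has_alt], axis=-1)  # NaN -> False for both

def allele_coverage(genotypes, selected):
    """Fraction of alleles seen in the whole collection that also occur
    in the selected accessions."""
    present = alleles_present(genotypes)
    total = present.any(axis=0)
    captured = present[list(selected)].any(axis=0)
    return (captured & total).sum() / total.sum()

def greedy_core(genotypes, core_size):
    """Pick accessions one at a time, each time taking the accession that
    adds the most alleles not yet covered (ties go to the lowest index)."""
    present = alleles_present(genotypes)
    flat = present.reshape(present.shape[0], -1)
    covered = np.zeros(flat.shape[1], dtype=bool)
    available = np.ones(flat.shape[0], dtype=bool)
    selected = []
    for _ in range(min(core_size, flat.shape[0])):
        gain = (flat & ~covered).sum(axis=1)
        gain = np.where(available, gain, -1)  # never re-pick an accession
        best = int(np.argmax(gain))
        selected.append(best)
        available[best] = False
        covered |= flat[best]
    return selected

# Toy example: 3 accessions x 3 SNPs.
toy = np.array([[0, 0, 0], [2, 2, 2], [0, 0, 2]])
core = greedy_core(toy, 2)
coverage = allele_coverage(toy, core)
```

With a real SNP matrix, `allele_coverage` applied to the chosen accessions would yield a figure analogous to the coverage percentage reported above, although the exact definition used by GenoCore may differ.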

1.2.2. Population genomics and GWAS analyses

Using SNPs called from the GBS data, we have performed population genomic analyses, including phylogenetic, PCA and population structure analyses, for cucumber, watermelon and melon accessions. The results for cucumber accessions are shown in Figure 2 as an example. Linkage disequilibrium (LD) decay patterns and population differentiation have also been investigated for these species.
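As a rough illustration of the PCA step, the sketch below computes principal component scores from a SNP matrix with a singular value decomposition. It is a generic approach, not necessarily the software or settings used for the analyses reported here; the 0/1/2 coding, mean imputation of missing calls, and the name `snp_pca` are assumptions, and every SNP is assumed to have at least one called genotype.

```python
import numpy as np

def snp_pca(genotypes, n_components=2):
    """PCA of an accessions x SNPs matrix (0/1/2 coding, NaN = missing).

    Returns (scores, explained), where scores has one row per accession
    and explained holds the variance fraction of each returned component.
    """
    g = np.asarray(genotypes, dtype=float)
    col_mean = np.nanmean(g, axis=0)
    # Simple mean imputation so that missing calls sit at the SNP average.
    g = np.where(np.isnan(g), col_mean, g)
    centered = g - col_mean
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

# Toy example: two groups of accessions fixed for opposite alleles.
toy = np.array([[0, 0, 0], [0, 0, 0], [2, 2, 2], [2, 2, 2]])
scores, explained = snp_pca(toy)
```

In the toy data the first component separates the two groups and accounts for essentially all of the variance; the sign of a component is arbitrary.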

We have collected historical phenotype data from the USDA National Plant Germplasm System for cucumber, watermelon and melon accessions. GWAS have been performed to identify SNPs and regions that are significantly associated with important agronomic traits. GWAS for cucumber anthracnose resistance is shown in Figure 3 as an example.

Figure 2. Phylogenetic and population genomic analyses of cucumber accessions. Unrooted neighbor-joining phylogenetic tree (a), principal component analysis (b) and population structure analysis (c and d) of cucumber accessions.

Manuscripts reporting the results from population genomics and GWAS analyses as well as core collection development are in preparation for watermelon, melon and cucumber. Analysis of the GBS data for Cucurbita species is underway.

Figure 3. Frequency distribution of the anthracnose resistance trait (left) and Manhattan plot of GWAS result for anthracnose resistance in cucumber (right).

1.2.3. Genomic resequencing of core collections

We have been trying to identify cost-effective services for Illumina genomic library construction that fit our budget for genome resequencing of the core collections. We have submitted 43 DNA samples to the Cornell Genomic Diversity Facility, which charges $33 for each library (http://www.biotech.cornell.edu/brc/genomic-diversity-facility/price-list). An additional $30 is charged per library to determine its concentration. For our samples, three randomly selected libraries were processed for concentration determination.


The libraries were combined into two pools (21 samples in one pool and 22 in the other) and sent to Novogene for sequencing on a HiSeq X platform. We obtained 96 Gb and 88 Gb for the two pools, respectively. However, after processing with Trimmomatic to remove low-quality sequences, only 73.6% of the reads and 59.6% of the bases remained. In addition, sequencing output varied widely among libraries (Figure 4). We are currently seeking other commercial services for library construction, with a backup plan of making the libraries in our own or collaborators' labs.
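To put these numbers in perspective, the short calculation below estimates the usable sequence per library. It assumes that the 59.6% base retention applies to the combined output of both pools and that the yield is split evenly across libraries; the second assumption is a simplification, since the output actually varied widely among libraries.

```python
# Figures taken from the text above.
raw_gb_per_pool = [96.0, 88.0]      # raw output of the two pools (Gb)
libraries_per_pool = [21, 22]       # samples in each pool
bases_retained = 0.596              # fraction of bases left after trimming

total_raw_gb = sum(raw_gb_per_pool)                  # 184 Gb in total
clean_gb = total_raw_gb * bases_retained             # about 109.7 Gb
n_libraries = sum(libraries_per_pool)                # 43 libraries
mean_clean_gb_per_library = clean_gb / n_libraries   # about 2.55 Gb

print(f"{clean_gb:.1f} Gb clean, {mean_clean_gb_per_library:.2f} Gb per library")
```

Under these assumptions, about 110 Gb of the 184 Gb survives trimming, or roughly 2.55 Gb per library on average.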

Figure 4. Depth of coverage for resequenced cucumber accessions.