Genomics and Bioinformatics Team | 2024 Progress Report

View the Genomics Team progress report, including all tables and figures, on pages 12–19 of the PDF version of this report.

Genomics and Bioinformatics Team members:

  • Zhangjun Fei (Boyce Thompson Institute)
  • Shan Wu (Boyce Thompson Institute)
  • Amnon Levi (USDA, ARS)
  • Yiqun Weng (USDA, ARS)
  • Michael Mazourek (Cornell University)
  • Jim McCreight (USDA, ARS)
  • Rebecca Grumet (Michigan State University)

Objectives: Develop novel advanced bioinformatic, pan-genome, and genetic mapping tools for cucurbits.

1.1. Develop genomic and bioinformatic platforms for cucurbit crops.

1.1.1. Development of high-resolution genotyping platforms for cucurbits.

Genome resequencing of the cucumber (388 accessions) and the squash (207 Cucurbita pepo accessions) core collections has been completed. The average depths of cleaned sequences of the cucumber and squash cores are 49.7× and 49.9×, respectively. For the melon (384 accessions) and watermelon (372 accessions) cores, genome sequencing of 313 and 301 accessions, respectively, has been completed. In addition, we have completed genome resequencing for 26 C. maxima and seven C. moschata accessions.

The sequence data of the cucumber, squash, watermelon and melon cores have been processed for SNP and small indel calling using the Gy14 genome (v2.1), the MU-CU-16 genome (v4.1), the 97103 genome (v2.5) and the DHL92 genome (v4) as the references, respectively. Statistics of called variants are summarized in Table 1. Raw sequencing data and called variants have been distributed to the industry partners who requested access to the data. Biallelic variants with MAF > 0.01 from the cucumber and squash core collections are publicly available for mining at CuGenDBv2. The remaining accessions in the melon and watermelon cores are undergoing sample collection and DNA preparation and will be sequenced. Of the remaining 71 accessions in the watermelon core, DNA has been prepared for 45, while the other 26 did not germinate. Variants will be updated for the watermelon and melon cores once new sequences are available.
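The biallelic MAF > 0.01 filter mentioned above can be illustrated with a minimal Python sketch. It assumes genotypes are given as VCF-style diploid strings ("0/0", "0/1", "1/1", "./."); the function names are hypothetical and this is not the pipeline used to produce the CuGenDBv2 variant sets.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency for one biallelic site.

    `genotypes` is a list of diploid calls such as "0/0", "0/1", "1/1".
    Missing alleles (".") are ignored. Returns None if no allele was called.
    """
    counts = Counter()
    for gt in genotypes:
        # Treat phased ("|") and unphased ("/") calls the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele in ("0", "1"):
                counts[allele] += 1
    total = counts["0"] + counts["1"]
    if total == 0:
        return None
    return min(counts["0"], counts["1"]) / total


def passes_maf(genotypes, threshold=0.01):
    """True if the site has a MAF strictly greater than `threshold`."""
    maf = minor_allele_frequency(genotypes)
    return maf is not None and maf > threshold
```

With this definition, a site where a single heterozygote occurs among 101 called accessions has a MAF of 1/202 and would be dropped.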

Table 1 Summary of genome sequencing of cucurbit core collections

We recently found that 58 accessions in the cucumber core contain large proportions of missing SNPs (5-35%) due to the poor quality of the sequencing libraries. These libraries were constructed during CucCAP1 using a low-cost protocol. Sequencing of these accessions is being redone. DNA has been prepared for 45 accessions, while the remaining 13 accessions did not germinate. Variants will be updated with new sequences when available.
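A per-accession missing-call check of the kind that would surface such libraries can be sketched as follows. The 5% cutoff is an assumption taken from the lower end of the range quoted above, and the input layout (a dict mapping accession name to a list of genotype strings) is hypothetical.

```python
def missing_rate(calls):
    """Fraction of sites with a missing genotype for one accession."""
    if not calls:
        return 0.0
    missing = sum(1 for gt in calls if gt in ("./.", ".|.", "."))
    return missing / len(calls)


def flag_accessions(genotype_matrix, cutoff=0.05):
    """Return the sorted names of accessions whose missing rate is >= cutoff.

    `genotype_matrix` maps accession name -> list of genotype strings,
    one entry per SNP site (an assumed, simplified layout).
    """
    return sorted(
        name
        for name, calls in genotype_matrix.items()
        if missing_rate(calls) >= cutoff
    )
```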

1.1.2. Development of novel, advanced genome and pan-genome platforms for cucurbit species.

For cucumber, we have selected 25 accessions including five wild Cucumis sativus var. hardwickii, four semi-wild Xishuangbanna and 16 cultivated cucumbers for PacBio HiFi sequencing. Ten of these 25 accessions are from the core collection. HiFi sequences have been generated for all the 25 accessions, with an average depth of 33.4×.

For watermelon, we selected a total of 135 accessions for reference-grade genome development, including one Citrullus naudinianus, one C. rehmii, two C. ecirrhosus, five C. colocynthis, 16 C. amarus, seven C. mucosospermus, five C. lanatus var. cordophanus, seven landraces, 82 cultivars, and nine interspecific hybrids. HiFi sequences have been generated for all 135 accessions, with an average depth of 30.3×.

For melon, a total of 27 representative accessions have been selected for HiFi sequencing, including 14 C. melo ssp. melo and 13 C. melo ssp. agrestis accessions. Of these, 13 are from India/Pakistan, two from Turkey, three from the Americas, two from Africa, four from Central/West Asia, two from East Asia, and one from Europe. HiFi sequences have been generated for 22 of the 27 accessions, with an average depth of 33.7×.

For squash, three accessions, two from Cucurbita pepo ssp. texana (also known as ssp. ovifera) and one from C. pepo ssp. pepo, have been selected for HiFi sequencing. HiFi sequences of these three accessions have been generated. We have also generated HiFi sequences for C. maxima Rimu and C. moschata Rifu.

1.1.3. De novo genome assembly and pan-genome construction

We have finished assembling chromosome-scale genomes of the 25 cucumber accessions. The assembled genome sizes of the 25 accessions range from 259.0 Mb to 302.3 Mb (average: 287.43 Mb) and N50 contig sizes from 5.25 Mb to 22.98 Mb (average: 15.46 Mb). The BUSCO completeness rate of these genome assemblies ranges from 96.4% to 98.8%, with an average of 98.4%. An average of 95.5% of the contigs (ranging from 90.3% to 97.8%) are assigned to the seven cucumber chromosomes. Protein-coding genes have been predicted in these genomes, as well as in an additional 11 previously published chromosome-scale cucumber genomes (seven cultivated, one Xishuangbanna and three wild hardwickii). The number of predicted genes ranges from 21,347 to 22,551, with an average of 21,870. The BUSCO completeness rate of genes predicted from each of these 36 cucumber genome assemblies ranges from 93.0% to 97.0%, with an average of 96.0%. Using the newly assembled WI7631 (‘Chinese long’) genome as the reference/backbone, large structural variants (SVs) have been called for the other 24 assembled genomes and the 11 previously published genomes (Table 2). A graph pan-genome has been constructed using the WI7631 genome and the called SVs and used to genotype these SVs in the core collection using the resequencing short reads.
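For readers unfamiliar with the N50 statistic reported here, the following Python sketch shows the standard definition: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly size. The function name is illustrative and the example lengths in the usage note are made up.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (any consistent unit)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running sum to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no contigs supplied")
```

For example, contigs of 5, 4, 3, 2 and 1 Mb total 15 Mb; the two longest contigs already cover 9 Mb, which exceeds 7.5 Mb, so the N50 is 4 Mb.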

For watermelon, we have finished chromosome-scale genome assemblies and gene prediction for all 135 accessions. The assembled genome sizes range from 368.6 Mb to 406.7 Mb (average: 377.5 Mb) and N50 sizes are all greater than 20 Mb (20.37-35.64 Mb; average: 30.49 Mb). The BUSCO completeness rate of these genome assemblies ranges from 93.9% to 99.2%, with an average of 99.0%. An average of 99.2% of the contigs (ranging from 96.2% to 99.9%) are assigned to the 11 watermelon chromosomes. The number of predicted protein-coding genes ranges from 20,834 to 23,330 (average: 21,785). The BUSCO completeness rate of genes predicted from each of these 135 watermelon genome assemblies ranges from 91.6% to 96.6%, with an average of 95.9%. Using the newly assembled ‘97103’ genome as the backbone, SVs have been called in the other 134 watermelon accessions, as well as in three previously published long-read assemblies (Table 2). The final SVs and the ‘97103’ genome have been used to construct a Citrullus graph pan-genome, which has been used to genotype these SVs in the core collection and other accessions using the resequencing short reads (a total of 756 accessions, including 436 cultivars, 114 landraces, 13 cordophanus, 39 mucosospermus, 120 amarus, 33 colocynthis and one rehmii).

Table 2 Summary statistics of SVs identified in cucumber and watermelon across 36 and 138 genome assemblies, respectively.

For melon, we have finished the chromosome-level assemblies of 22 accessions. The assembled genome sizes range from 355.7 Mb to 387.0 Mb (average: 371.7 Mb) and N50 contig sizes from 9.41 Mb to 19.60 Mb (average: 13.85 Mb). The BUSCO completeness rate of these genome assemblies ranges from 93.7% to 97.9%, with an average of 97.3%. An average of 97.2% of the contigs (ranging from 92.4% to 99.5%) are assigned to the 12 melon chromosomes. Protein-coding genes have been predicted in 21 of the 22 assembled genomes, and the number of genes predicted in each genome ranges from 23,108 to 27,678 (average: 24,570). The BUSCO completeness rate of genes predicted from each of these 21 melon genomes ranges from 95.5% to 97.6%, with an average of 96.6%.

For Cucurbita species, we have finished genome assemblies and gene predictions of three squash (C. pepo) accessions, and C. maxima Rimu and C. moschata Rifu (Table 3).

Table 3 Statistics of Cucurbita genome assemblies.

1.1.4. Breeder-friendly web-based database for phenotypic, genotypic and QTL information

We updated CuGenDB to version 2 (CuGenDBv2) and officially released it in April 2022. CuGenDBv2 currently hosts 34 reference genomes from 27 cucurbit species/subspecies belonging to 10 different genera. Protein-coding genes from all 34 genomes (total: 919,903; average: 27,056) have been comprehensively annotated, and the annotated genes can be queried and extracted in the database. Genomic synteny blocks and syntenic gene pairs have been identified between any two and within each of the 34 cucurbit genome assemblies (595 pairwise genome comparisons). A total of 391,379 synteny blocks and 12,130,719 syntenic gene pairs (average: 31 per synteny block) have been identified between the 34 cucurbit genomes. The ‘Synteny Viewer’ module has been re-implemented in CuGenDBv2 to improve the efficiency of processing and displaying the large-scale synteny data.
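The figure of 595 comparisons follows from counting every unordered pair of the 34 genomes plus one self-comparison per genome. A short Python check of that arithmetic and of the two averages quoted above (the function name is illustrative):

```python
from math import comb


def synteny_comparisons(n_genomes):
    """Count between-genome pairs plus one within-genome comparison each."""
    between = comb(n_genomes, 2)  # unordered pairs of distinct genomes
    within = n_genomes            # each genome compared against itself
    return between + within


print(synteny_comparisons(34))       # 561 + 34 = 595
print(round(919_903 / 34))           # average genes per genome: 27056
print(round(12_130_719 / 391_379))   # average gene pairs per block: 31
```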

A new ‘Genotype’ module has been developed in CuGenDBv2. The module provides a suite of functions that allow users to mine, analyze, extract, and download variants, including SNPs and small indels, from large-scale population genome sequencing projects. Currently, variants (SNPs and small indels) called for the cucumber and squash core collections and the watermelon resequencing panel, and SNPs called from the GBS data generated under CucCAP1 for watermelon, melon, cucumber, C. pepo, C. maxima and C. moschata, are available in the database for query and mining.

The ‘Expression’ module in CuGenDBv2 has been redesigned to provide a complete cucurbit gene expression atlas using the publicly available cucurbit RNA-Seq datasets. Currently, raw RNA-Seq data from a total of 221 projects, 1,513 distinct samples and 3,560 runs (or libraries) have been downloaded from NCBI and processed to derive expression values, which can be queried in CuGenDBv2 to display expression profiles of genes of interest in different tissues, at different developmental stages, and under different treatment conditions.
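The report does not state which expression unit CuGenDBv2 uses. As one common possibility, the sketch below converts raw read counts to transcripts per million (TPM); the choice of TPM, the function name, and the input layout are all assumptions made for illustration.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to TPM.

    `counts` maps gene ID -> read count; `lengths_bp` maps gene ID -> gene
    length in base pairs. Both layouts are assumed for this example.
    """
    # Step 1: reads per kilobase of gene length.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Step 2: scale so that the values in one sample sum to one million.
    scale = sum(rates.values()) / 1_000_000
    if scale == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / scale for gene, rate in rates.items()}
```

Because every sample sums to the same total, TPM values for a gene can be compared across tissues or treatments more directly than raw counts.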

Phenotype data have been generated for the melon and cucumber core collections. A total of 33 vegetative, flower and fruit characters and two disease resistance traits have been evaluated for the melon core collection. For the cucumber core collection, a combination of 15 external and internal characteristics has been collected for immature and mature fruit of plants grown in 2019 and 2021. A tool to display the fruit images of cucumber core accessions has been developed. Additional tools to visualize and analyze the phenotypic data will be developed in CuGenDBv2.

1.2. Perform seed multiplication and sequencing analysis of core collections of the four species, and provide community resources for genome-wide association studies (GWAS).

1.2.1. Seed multiplication of core collections

For cucumber, seed increases of the 388 accessions in the core collection were carried out by five participating seed companies. As of March 2024, seeds have been received for 310 accessions, each with more than 1,000 seeds.

For watermelon, HM.Clause is increasing the seeds for 293 accessions in the core collection given to them by USDA-ARS. HM.Clause has already shipped S3 seeds of 177 accessions (about 1,000 seeds per accession) to the USDA, ARS, U.S. Vegetable Laboratory and will ship the S3 seeds of the other 116 accessions it committed to increase during 2024. S2 seed of an additional 39 accessions will be sent by the University of Georgia to HM.Clause for increase. During 2024, S2 seeds of an additional 167 PIs (mainly Citrullus amarus) will be increased at the USDA, ARS, U.S. Vegetable Laboratory to reach 500 S3 seeds per accession.

Three companies assisted in advancing the melon core set in 2023: 259 of the 384 melon core lines were sent to three seed company cooperators, and seed was obtained from 180 of those lines. United Genetics advanced 13 S0 lines to S1 and three S1 lines to S2. Nunhems advanced 13 S0 lines to S1 (Table 4). Sakata advanced 151 S2 lines to S3, with seed counts per line, estimated from seed weight, ranging from 21 to 3,100; only 57 lines produced 1,000 or more S3 seeds (Table 5).
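The report does not describe how seed counts were derived from seed weight. One common approach, sketched here purely as an assumption, is to weigh a counted subsample (for example 100 seeds) and scale up to the bulk weight; all names and numbers below are hypothetical.

```python
def estimate_seed_count(bulk_weight_g, sample_weight_g, sample_count=100):
    """Estimate the number of seeds in a lot from a counted, weighed subsample."""
    if sample_weight_g <= 0 or sample_count <= 0:
        raise ValueError("sample weight and sample count must be positive")
    per_seed_g = sample_weight_g / sample_count
    return round(bulk_weight_g / per_seed_g)


def meets_target(seed_count, target=1000):
    """True if a line reaches the target number of seeds."""
    return seed_count >= target
```

For instance, a 50 g lot whose 100-seed subsample weighs 2.5 g would be estimated at 2,000 seeds and would meet a 1,000-seed target.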

Table 4 Seed multiplication status of melon core

Table 5 Estimated number of seeds per S3 Melon core lines (based on seed weight) by Sakata

For the C. pepo squash core increase, we expect to receive the last of the seed this summer. All of the squash core will be increased by a professional nursery, Villa Plants, and will have robust phytosanitary documentation. One line may have some IP restrictions and may be dropped from the core.

1.2.2. Population genetics and phenotype-genotype association analysis

Phylogenies of accessions in the cucumber, melon, squash, and watermelon cores have been inferred using the LD-pruned SNPs at four-fold degenerate sites. The phylogenies of cucumber and melon core accessions are largely consistent with their geographic origins and the phylogeny of watermelon accessions is consistent with their species classifications, while no clear separations were observed for squash accessions related to their geographic origins or improvement status.
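As background on the four-fold degenerate sites used for these phylogenies: a third codon position is four-fold degenerate when any of the four bases at that position encodes the same amino acid. The Python sketch below derives those codon families from the standard genetic code and locates such positions in an in-frame coding sequence. It is an illustration only, not the team's site-selection pipeline, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def fourfold_prefixes():
    """Two-base codon prefixes whose third position is four-fold degenerate."""
    prefixes = ("".join(pair) for pair in product(BASES, repeat=2))
    return {
        prefix
        for prefix in prefixes
        if len({CODON_TABLE[prefix + base] for base in BASES}) == 1
    }


def fourfold_sites(cds):
    """Return 0-based positions of four-fold degenerate sites in an in-frame CDS."""
    prefixes = fourfold_prefixes()
    cds = cds.upper()
    return [
        start + 2
        for start in range(0, len(cds) - 2, 3)
        if cds[start:start + 2] in prefixes
    ]
```

Substitutions at these positions do not change the protein, so they are often treated as close to neutral when inferring relationships among accessions.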
Phenotype-genotype association analysis has been performed for the cucumber core. The cucumber core accessions were grown in the field at the Michigan State University Horticulture Teaching and Research Center in 2019-2022. Young and mature fruits were harvested at ~5-7 and 30-40 days post pollination, respectively. The following traits were measured for mature fruit: fruit length, diameter, fruit shape index, carpel number, seed cavity, flesh thickness, hollowness, curvature, tapering, skin color, flesh color and netting; and the following for young fruit: fruit shape index, curvature, tapering, skin color, and spine density. Genome-wide association studies (GWAS) were performed on these fruit traits using different models, including FarmCPU, BLINK, MLMM, and MLM (Fig. 1). Chromosomal locations of the significantly associated SNPs are illustrated in Fig. 2. QTLs for some of the traits were closely clustered. For example, SNPs for several highly correlated fruit size and shape traits, including mature fruit length, young fruit shape index, carpel number, and seed cavity size, were closely located on chromosome 1 at ~10 Mb. Multiple external fruit traits were also mapped to the same region on chromosome 1, such as netting, spine density, and young fruit color R/G values. Several significant SNPs identified by GWAS were also in close vicinity (within 1 Mb) to previously identified fruit trait QTL and candidate genes.
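The "within 1 Mb" comparison against previously identified QTL can be expressed as a simple positional check. The sketch below assumes SNPs are given as (chromosome, position) tuples and known QTL as (name, chromosome, position) tuples; the data layout, function name, and example values in the usage note are hypothetical and not taken from the report's results.

```python
def near_known_qtl(snps, qtls, window_bp=1_000_000):
    """Pair each significant SNP with known QTL within `window_bp` on the same chromosome.

    `snps` is a list of (chromosome, position) tuples.
    `qtls` is a list of (name, chromosome, position) tuples.
    Returns a list of ((chromosome, position), qtl_name) pairs.
    """
    hits = []
    for chrom, pos in snps:
        for name, qtl_chrom, qtl_pos in qtls:
            if chrom == qtl_chrom and abs(pos - qtl_pos) <= window_bp:
                hits.append(((chrom, pos), name))
    return hits
```

For example, a SNP at 10.2 Mb on chromosome 1 would be paired with a made-up QTL placed at 10.0 Mb on the same chromosome, while a SNP 4 Mb away from the nearest QTL would not be reported.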