Large-scale gene expression alterations introduced by structural variation drive morphotype diversification in Brassica oleracea - PubMed (original) (raw)

. 2024 Mar;56(3):517-529.

doi: 10.1038/s41588-024-01655-4. Epub 2024 Feb 13.

Yong Wang # 1, Chengcheng Cai # 1 2, Jialei Ji # 1, Fengqing Han # 1, Lei Zhang # 1, Shumin Chen 1, Lingkui Zhang 1, Yinqing Yang 1, Qi Tang 1, Johan Bucher 2, Xuelin Wang 1, Limei Yang 1, Mu Zhuang 1, Kang Zhang 3, Honghao Lv 4, Guusje Bonnema 5, Yangyong Zhang 6, Feng Cheng 7

Affiliations

Large-scale gene expression alterations introduced by structural variation drive morphotype diversification in Brassica oleracea

Xing Li et al. Nat Genet. 2024 Mar.

Abstract

Brassica oleracea, globally cultivated for its vegetable crops, consists of very diverse morphotypes, characterized by specialized enlarged organs as harvested products. This makes B. oleracea an ideal model for studying rapid evolution and domestication. We constructed a B. oleracea pan-genome from 27 high-quality genomes representing all morphotypes and their wild relatives. We identified structural variations (SVs) among these genomes and characterized these in 704 B. oleracea accessions using graph-based genome tools. We show that SVs exert bidirectional effects on the expression of numerous genes, either suppressing through DNA methylation or promoting probably by harboring transcription factor-binding elements. The following examples illustrate the role of SVs modulating gene expression: SVs promoting BoPNY and suppressing BoCKX3 in cauliflower/broccoli, suppressing BoKAN1 and BoACS4 in cabbage and promoting BoMYBtf in ornamental kale. These results provide solid evidence for the role of SVs as dosage regulators of gene expression, driving B. oleracea domestication and diversification.

© 2024. The Author(s).

PubMed Disclaimer

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1

Fig. 1. Phylogenetic analysis and transposable element characteristics in B. oleracea.

a, Phylogenetic tree of 704 B. oleracea accessions. Different colors of branches indicate accessions from different morphotype groups. The images of the 27 representative accessions were placed next to their branches. The light blue, yellow and green backgrounds denote the following three main clusters: the wild/ancestral group, the arrested inflorescence lineage and the leafy head lineage. The red stars denote the 22 newly assembled genomes and the red rectangles denote five previously reported genomes. b, Phylogenetic tree of the 27 representative B. oleracea accessions, with the genome of B. rapa as the outgroup. c, The estimated insertion time (y axis) of all the full-length LTRs in the 27 B. oleracea genomes along the nine chromosomes (x axis) of B. oleracea. The lengths of nine chromosomes were normalized to 0–100, proportional to their physical lengths. Each dot represents one LTR insertion event. The heatmap denotes the density of the full-length LTRs. Purple bars below each chromosome denote centromeric regions detected by centromere-specific repetitive sequences. d, Distribution of insertion time of full-length Copia and Gypsy LTRs in the 27 individual genomes. Each line represents a genome in the left graph. The two circles show the Copia and Gypsy LTRs that can be clustered into groups with sequence similarity of ≥90%. e, The heatmap shows the TAD prediction on chromosome eight of T10 (as an example), in which the region colored in dark red denotes a TAD structure. The line charts below the heatmap show the density of Copia and Gypsy LTRs, respectively, highlighting the enrichment of Copia LTRs in the centromere region, which is surrounded by high density of Gypsy LTRs.

Fig. 2

Fig. 2. The syntenic pan-genome constructed from the 27 B. oleracea genomes.

a, The number of syntenic pan and core gene families in the 27 genomes. b, Composition of the syntenic pan-genome. The histogram shows the frequency distribution of syntenic gene families shared by different numbers of genomes. The pie chart shows the proportion of different groups of syntenic gene families. c, Percentage of different groups of syntenic gene families in each of the 27 genomes. d, Presence and absence information of all syntenic gene families in the 27 genomes. e,f, The average number of TE insertions in genes and the expression level of genes in different groups of syntenic gene families (two-sided Student’s t test; centerline, median; box limits, first and third quartiles; whiskers, 1.5× IQR). Different lowercase letters above the box plots represent significant differences (P < 0.05). g, Functional analysis (gene ontology) of lost genes in the syntenic softcore or dispensable gene families, in different B. oleracea morphotypes_,_ highlighting strong function enrichment associated with specific metabolites. The number of lost genes in different morphotypes is provided in the tree diagram. h, Syntenic gene families were separated into three groups corresponding to the numbers of homoeologs (single-, two- or three-copy) retained from the Brassica mesohexaploidization event. The percentage of gene families in different pan-genome classes for these groups is shown in each of the 27 B. oleracea genomes. IQR, interquartile range.

Fig. 3

Fig. 3. Characteristics of structural variations identified in the 27 B. oleracea genomes.

a, The distribution of GC content (33–41%), gene numbers (0–200 Mb−1) and TE density (20–100%) in the T10 reference genome, the nonredundant SVs (presence, 2–100 kb/Mb; absence, 20–400 kb/Mb and all SVs, 10–400 kb Mb−1) among 27 genomes, as well as the SNPs (10–40 kb−1) and InDels (10–30 kb Mb−1) identified in the 704 B. oleracea accessions. b, The number of different types of SVs from the nonredundant set of SVs in individual B. oleracea genomes. c, The number of SVs present in different numbers of query genomes. The bottom lines colored in light blue, light orange and light green mark these accessions from the wild/ancestral group, the AIL and the LHL, respectively. The sample IDs colored in light orange and light green denote accessions from broccoli/cauliflower and cabbage, respectively. The red rectangle marks the accessions of broccoli/cauliflower, highlighting the lower number of SVs in broccoli/cauliflower compared to the other accessions. d, The number of private SVs in wild B. oleracea, broccoli/cauliflower and cabbage genomes, showing significantly more private SVs in wild B. oleracea than in others (n = 7 versus 5 versus 7; two-sided Wilcoxon rank-sum test; centerline, median; box limits, first and third quartiles; whiskers, 1.5× IQR). e, The frequency distribution of SVs in the following five different genomic regions: upstream (within −3 kb), exon, intron, downstream (within +3 kb) and intergenic regions. The SV ratios in the five regions were calculated for each of the 27 genomes, and these values were then sorted and plotted from small to large for each of the five regions. f, The density of SV sequences per 100 bp in gene bodies and 5 kb flanking regions in the 27 B. oleracea genomes. The area plots mark the maximum and minimum values across the 27 B. oleracea genomes, and the lines denote average values.

Fig. 4

Fig. 4. SVs identified by multiple genome comparisons and their effect on the expression of associated genes (SV genes).

a, Different types of SV genes, based on the location of the SV relative to the gene, with data on expression, show a high proportion of SV genes with gene expression changes. b, The expression of SV genes from 6,526 syntenic gene families, with separated expression values for the absence and presence genotype groups (of corresponding SV). The x axis shows two groups, of which 3,536 and 2,990 syntenic gene families associated with suppression and promotion SVs, respectively. The y axis shows the normalized (z score) expression values. The green/yellow lines link the average expression values from each syntenic gene family for their presence and absence of genotype groups. c, Comparison of CpG island density and the ratio of highly methylated CpG islands between different types of SVs in −1.5 kb (n = 484 versus 369 versus 2,794; permutation test for 10,000 times; centerline, median; triangle, mean; box limits, first and third quartiles; whiskers, 1.5× IQR) or −3 kb (n = 153 versus 148 versus 1,391; permutation test for 10,000 times; centerline, median; triangle, mean; box limits, first and third quartiles; whiskers, 1.5× IQR). d, The expression fold changes of SV genes between the presence and absence of genotype groups. The black stars below the term ‘Suppress’ denote DNA methylation modifications. The x axis shows the distance between SV and SV genes.

Fig. 5

Fig. 5. GWAS analysis identified SVs associated with the cauliflower/broccoli morphotype and details of SVs that change the expression of genes BoPNY and BoCKX3.

a, Manhattan plot showing the SV signals associated with cauliflower/broccoli (significance was calculated by two-tailed Fisher’s exact test. A Bonferroni-corrected P < 0.05 was interpreted as significant). The light red dots show the top 5% P values and deep red dots show the top 1% P values. b, One SV is associated with BoPNY. c, The number of accessions with presence or absence SV (associated with BoPNY) genotype for broccoli/cauliflower accessions and all the other accessions (statistical test: two-tailed Fisher’s exact test). d, Expression comparison of BoPNY between SV presence and absence accessions (two-sided Student’s t test; centerline, median; box limits, first and third quartiles; whiskers, 1.5× IQR). e, Sequence methylation level around BoPNY between absence and presence genotype groups, which is negatively associated with the expression level of the gene. f, Two SVs associated with BoCKX3. g, The number of accessions with presence or absence SV (associated with BoCKX3) genotypes for broccoli/cauliflower accessions and all other accessions (statistical test: two-tailed Fisher’s exact test). h, The four possible haplotype groups are formed by two SVs. Haplotype 4 was not detected in our population. i, Expression comparison of BoCKX3 between the three haplotype groups (two-sided Student’s t test; centerline, median; box limits, first and third quartiles; whiskers, 1.5× IQR). j, Expression of BoCKX3 in different tissues of cauliflower and cabbage, highlighting high expression of this gene in leaf 2 of cauliflower. Leaf 1 denotes fresh leaf before curd initiation; leaf 2 denotes fresh leaf during curd development; curd 1 denotes developing curd; curd 2 denotes mature curd. ‘N’ indicates a missing value as cabbage makes no curds.

Fig. 6

Fig. 6. SVs derived by PIF/Harbinger-type TE insertions promote the expression of BoMYBtf in ornamental kale; overview of increased expression levels of genes in the 27 B. oleracea accessions with PIF/Harbinger insertions in their promoter regions.

a, One SV (PIF/Harbinger-type TE insertion) is associated with BoMYBtf. b, The number of accessions with presence or absence of SV (associated with BoMYBtf) genotypes for ornamental kale accessions and all other accessions (statistical test: two-tailed Fisher’s exact test). c, Expression comparison of BoMYBtf between SV presence and absence accessions (two-sided Student’s t test; centerline, median; box limits, first and third quartiles; whiskers, 1.5× IQR). d, TF-binding elements identified in the PIF/Harbinger insertion. e, Schematic diagrams of reporter constructs used for the LUC/REN assay. The upstream sequences of BoMYBtf from ornamental kale T18 (with TE, 1,239 bp), wild B. oleracea T10 (without TE, 951 bp), cabbage T20 (without TE, 968 bp) and the SV sequence (TE itself, 280 bp). The empty vector was set as mock control. The activities of these promoter constructs are reflected by the LUC/REN ratio (two-sided Student’s t test; data are presented as the mean ± s.d.). f, Distribution of the PIF/Harbinger insertion in the 27 B. oleracea genomes. g, Boxplot showing normalized (z score) expression of 44 syntenic gene families, with a PIF/Harbinger insertion within a −3 kb region from the nearest genes. The light blue and light purple backgrounds denote these syntenic gene families with PIF/Harbinger insertions located within −1.5 kb and −3 kb to −1.5 kb, respectively, of corresponding gene members (red stars); whereas the gray dots denote their syntenic gene members without PIF/Harbinger insertion (centerline, median; box limits, first and third quartiles; whiskers, 1.5× IQR).

References

    1. Francis A, Lujan-Toro BE, Warwick SI, Macklin JA, Martin SL. Update on the Brassicaceae species checklist. Biodivers. Data J. 2021;9:e58773. doi: 10.3897/BDJ.9.e58773. -DOI -PMC -PubMed
    1. Borpatragohain P, Rose TJ, King GJ. Fire and brimstone: molecular interactions between sulfur and glucosinolate biosynthesis in model and crop Brassicaceae. Front. Plant Sci. 2016;7:1735. doi: 10.3389/fpls.2016.01735. -DOI -PMC -PubMed
    1. Lee YR, et al. Reactivation of PTEN tumor suppressor for cancer treatment through inhibition of a MYC-WWP1 inhibitory pathway. Science. 2019;364:eaau0159. doi: 10.1126/science.aau0159. -DOI -PMC -PubMed
    1. Cheng F, et al. Subgenome parallel selection is associated with morphotype diversification and convergent crop domestication in Brassica rapa and Brassica oleracea. Nat. Genet. 2016;48:1218–1224. doi: 10.1038/ng.3634. -DOI -PubMed
    1. Guo N, et al. Genome sequencing sheds light on the contribution of structural variants to Brassica oleracea diversification. BMC Biol. 2021;19:93. doi: 10.1186/s12915-021-01031-2. -DOI -PMC -PubMed

MeSH terms

Grants and funding

LinkOut - more resources