Transcription is under tight regulatory control. The core promoter, which surrounds the transcriptional start site (TSS), controls transcription initiation. As a functionally important component of the genome, the core promoter is expected to be under evolutionary selection, adjusting gene expression levels for better adaptation to different environments. While the basic structure of the core promoter has been extensively studied, little is known about core promoter diversity in the human population and its relationship with disease.
We analyzed human core promoter diversity using genomic data from 2,682 individuals of 25 worldwide human ethnic populations collected by the 1000 Genomes Project. We identified 31,996 variants in the core promoter region (−100 to +100 relative to the TSS) of 12,509 human genes. Analysis of the variation data revealed highly ethnic-specific features of core promoter variation in the human population. It also identified the genes with highly variable core promoters, the core promoter motifs most affected by the variants, and the functional pathways involving genes with core promoter variation. eQTL analysis revealed altered expression of genes affected by core promoter variation. Comparison with the GWAS database identified 163 core promoter variants reported as GWAS trait-associated variants for multiple types of diseases. Data from our study highlight the highly diversified nature of the core promoter in humans and imply that core promoter variation can play important roles not only in the regulation of gene expression but also in disease predisposition.
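To make the −100 to +100 window concrete, the following minimal Python sketch shows one way a variant position could be tested against a core promoter. It is an illustration under stated assumptions, not the pipeline used in the study: the `Gene` record, the function names, and the example coordinates are hypothetical, the TSS is treated as offset 0, and upstream is taken relative to the gene's strand.

```python
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    """Hypothetical gene record; the fields are illustrative assumptions."""
    name: str
    chrom: str
    tss: int      # genomic coordinate of the transcriptional start site
    strand: str   # "+" or "-"


def promoter_offset(gene: Gene, chrom: str, pos: int) -> Optional[int]:
    """Signed offset of a variant from the TSS (negative = upstream).

    Returns None when the variant lies on a different chromosome.
    Assumption: the TSS itself is offset 0.
    """
    if chrom != gene.chrom:
        return None
    delta = pos - gene.tss
    # On the minus strand, upstream corresponds to larger coordinates.
    return delta if gene.strand == "+" else -delta


def in_core_promoter(gene: Gene, chrom: str, pos: int,
                     upstream: int = 100, downstream: int = 100) -> bool:
    """True if the variant falls inside the -upstream..+downstream window."""
    offset = promoter_offset(gene, chrom, pos)
    return offset is not None and -upstream <= offset <= downstream


# Made-up coordinates, for illustration only.
plus_gene = Gene("GENE_A", "chr1", 1000, "+")
minus_gene = Gene("GENE_B", "chr2", 5000, "-")

inside = in_core_promoter(plus_gene, "chr1", 900)     # offset -100
outside = in_core_promoter(plus_gene, "chr1", 899)    # offset -101
minus_up = promoter_offset(minus_gene, "chr2", 5100)  # upstream on "-" strand
```

Handling strand explicitly matters because a variant 100 bp upstream of a minus-strand gene has a larger genomic coordinate than the TSS, so a purely coordinate-based window would mislabel upstream and downstream positions.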
We developed this database to share the core promoter variation data generated in our study with the research community. The data should facilitate gene expression studies and serve as a reference for identifying disease-related mutations in the core promoter region.