Bioinformatics analysis reveals novel hub gene pathways associated with IgA nephropathy

Background Immunoglobulin A nephropathy (IgAN) is the most common primary glomerulopathy worldwide. However, the molecular events underlying IgAN remain to be fully elucidated. This study aimed to identify novel biomarkers of IgAN through bioinformatics analysis and elucidate the possible molecular mechanism. Methods Based on the microarray datasets GSE93798 and GSE37460 downloaded from the Gene Expression Omnibus database, the differentially expressed genes (DEGs) between IgAN samples and normal controls were identified. Using the DEGs, we further performed a series of functional enrichment analyses. Protein–protein interaction (PPI) networks of the DEGs were constructed using the STRING online search tool and were visualized using Cytoscape. Next, hub genes were identified and the most important module among the DEGs, Biological Networks Gene Ontology tool (BiNGO), was used to elucidate the molecular mechanism of IgAN. Results In total, 148 DEGs were identified, comprising 53 upregulated genes and 95 downregulated genes. Gene Ontology (GO) analysis indicated that the DEGs for IgAN were mainly enriched in extracellular exosome, region and space, fibroblast growth factor stimulus, inflammatory response, and innate immunity. Module analysis showed that genes in the top 1 significant module of the PPI network were mainly associated with innate immune response, integrin-mediated signaling pathway and inflammatory response. The top 10 hub genes were constructed in the PPI network, which could well distinguish the IgAN and control group in monocyte and tissue samples. We finally identified the integrin subunit beta 2 (ITGB2) and Fc fragment of IgE receptor Ig (FCER1G) genes that may play important roles in the development of IgAN. Conclusions We identified key genes along with the pathways that were most closely related to IgAN initiation and progression. Our results provide a more detailed molecular mechanism for the development of IgAN and novel candidate gene targets of IgAN.


Background
IgA nephropathy (IgAN), the most prevalent type of glomerulonephritis in humans, is characterized by mesangial cell proliferation, the expansion of the glomerular mesangial matrix. Nearly 25-30% of affected patients develop end-stage renal disease. Presently, several clinical biomarkers have been identified to be associated with IgAN progression, such as proteinuria, serum creatinine, hypertension and advanced histological involvement [1]. In 2011 [2], Suzuki et al. hypothesized that the pathogenesis of IgAN is based on four hits: first, the occurrence of an abnormal IgA1 glycosylation process leading to galactose-deficient IgA1 (Gd-IgA1); second, the formation of antiglycan antibodies against Gd-IgA1; third, the formation of nephrogenic circulating immune complexes; fourth, the deposition of these complexes in the mesangium of glomeruli, leading to renal injury with variable clinical expression. However, the exact pathogenesis is not very clear.
Many studies have also shown a genetic predisposition to IgAN [3]. Serino et al. found six significantly upregulated miRNAs, two of which modulate the O-glycosylation process of IgA1. Specifically, let-7b regulates the gene GALNT2 and miR-148 modulates the gene target C1GALT1, which has been considered an underlying biomarker to predict the probability of IgAN [4,5]. Wang et al. found that low urinary levels of miR-29b and miR-29c are correlated with proteinuria and renal function. High levels of miR-93 were correlated with glomerular scarring. miR-200a, miR-200b, and miR-429 have also been suggested as potential biomarkers to monitor the progression of the disease at the renal level in IgAN patients [6]. However, due to the lack of large-scale studies, the limitation of animal models and current lowthroughput genetic studies, the crucial genes involved in the development and effective treatment of IgAN have remained elusive.
Bioinformatics studies have been widely performed in various fields to extract potential information and reveal the underlying mechanics of various diseases. Recently, bioinformatics analysis has gradually provided insight into the molecular mechanisms of kidney disease. For example, PSMB8, as a novel hub gene, plays a significant role in the occurrence of membrane nephropathy [7]. In lupus, bioinformatics analysis revealed that CD38 and CCL2 are hub macrophage-related genes [8]. Additionally, EST1 may be a drug target for diabetic nephropathy treatment [9]. Currently, only a few bioinformatics analyses have been performed on IgAN; its critical associated genes and interactions have not been thoroughly investigated.
In the present study, two original microarray datasets were selected from the Gene Expression Omnibus (GEO) database. After identifying the differentially expressed genes (DEGs) in IgAN patients and control group, we employed the Database for Annotation, Visualization and Integration Discovery (DAVID) to identify the functions of the identified DEGs and performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses. The protein-protein interaction (PPI) network was generated using the STRING database, and hub genes and the most significant module among the PPI networks were identified using cyto-Hubba and the Molecular Complex Detection (MCODE) plug-in.
The present study aimed to identify potential novel candidate hub genes to diagnose and treat IgAN.

Microarray data
The microarray data were downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo) using IgAN as the search term. GSE93798 is based on the Affymetrix Human GeneChip U133 2.0 platform (includes 42 samples, 20 IgAN patients and 22 healthy controls). GSE37460 is based on the Human Genome U133A Affymetrix platform (includes 54 samples, 27 IgAN patients and 27 healthy controls).

Identification of DEGs
The DEGs were identified based on the series matrix file using the Limma package in R software (version 3.5.0). An adj P value < 0.01 and a |log FC (fold change) | ≥ 1 were defined as the thresholds for DEG screening. The DEGs overlapped between the two datasets were identified and then used for further functional enrichment analysis. The overlapped DEGs were subjected to bidirectional hierarchical clustering analysis using the Pheatmap package in R to recognize and visualize the differences in DEGs between IgAN and the control.

PPI network construction and module analysis hub gene identification
The DEGs identified were subjected to PPI analysis using the search functionality of STRING (http://strin g.embl. de/) [12] to explore the association between the DEGs, and a network interaction matrix was built. An interaction with a combined score > 0.4 was set as the cut-off value. Next, the network was visualized using Cytoscape software [13], which is a broadly used tool to visualize the interaction networks among numerous biomolecules, including proteins and genes. The MCODE plug-in was used to identify the most significant module in the PPI networks with MCODE scores > 5, degree cut-off = 2, node score cut-off = 0.2, max depth = 100 and k score = 2. CytoHubba [14] is a tool used to identify hub objects and subnetworks from a complex interactome. 'MCC' is a topological analysis method in CytoHubba that was used to identify featured nodes and the hub genes from all the DEGs. The biological processes of the hub genes were visualized using the Biological Networks Gene Ontology  [15], with a significance threshold of 0.01 and Homo sapiens as the selected organism.

External microarray dataset validation
To validate the expression of FCER1G and ITGB2 in other glomerulonephritis types and IgAN, we used GSE104948 as an independent validate cohort, comprising focal segmental glomerular sclerosis(FSGS), minimal change disease (MCD), membranous nephropathy (MN), and thin membranous disease (TMD). The relative mRNA expression levels of FCER1G and ITGB2 in each patient were extracted from the raw data, which were then analyzed by Graphpad Prism 8. The data were presented as mean ± standard deviation. One-way analysis of variance was used to examine the differences between different groups. P < 0.05 was considered to indicate a statistically significant difference.

Identification of DEGs
We identified 96 samples, comprising 47 IgAN samples and 49 normal samples, in the two datasets. Based on the cut-off criteria of |log2 FC| ≥ 1.0 and adj P-value ≤ 0.05, 148 overlapping DEGs were shown by a Venn diagram obtained from the IgAN group vs. control group, comprising 53 upregulated and 95 downregulated genes. The results of the expression level analysis are presented in a volcano plot in Fig. 1a. As indicated in the clustering heat map (Fig. 1b), these DEGs could well distinguish the IgAN and control group completely.

Gene ontology and KEGG analyses of DEGs
To investigate the biological classification of DEGs, the overall genes in three ontologies were identified using DAVID. The cut-off criterion was set as P < 0.05. The GO function annotation is divided into three functional groups, cell component (CC), molecular function (MF), and biological process (BP

Construction of the PPI network and module analysis
To further investigate the interaction among the 148 DEGs, a PPI network was constructed from STRING (Fig. 4a). The most significant module was obtained using Cytoscape MCODE plug-in. The module comprised 15 nodes and 89 edges, including the Fc fragment of IgE receptor Ig (FCER1G), HCK proto-oncogene, Src family tyrosine kinase (HCK), TYRO protein tyrosine kinase binding protein (TYROBP), V-set and immunoglobulin domain containing 4 (VSIG4), colony stimulating factor 1 receptor (CSF1R), complement C1q A chain (C1QA), complement C3a receptor 1 (C3AR1), cytochrome b-245 beta chain (CYBB), hematopoietic cell-specific Lyn substrate 1 (HCLS1), integrin subunit beta 2 (ITGB2), interleukin 10 receptor subunit alpha (IL10RA), lysosomal protein transmembrane 5 (LAPTM5), neutrophil cytosolic factor 2 (NCF2), CD48 and CD53, which exhibited the highest score, 12.714 (Fig. 4b). These findings indicate that these genes exhibited higher hub degrees and could play critical roles in the development of IgAN. GO analysis of the module showed that the CC terms of the DEGs were mostly enriched in integral component of plasma membrane, cell surface and plasma membrane (Figs. 5a, 6a). The MF terms were mainly enriched in superoxide-generating NADPH oxidase, receptor activity and protein binding (Figs. 5b, 6b). The BP terms were mainly enriched in the innate immune response, integrin-mediated signaling pathway and inflammatory response (Figs. 5c, 6c).
KEGG pathway enrichment analysis revealed that the DEGs were mainly enriched in tuberculosis, natural killer cell-mediated cytotoxicity, osteoclast differentiation and Staphylococcus aureus infection (Figs. 5d, 6d).

Identification and analysis of hub genes
We exported the STRING data to Cytoscape to construct and visualize the PPI network by implementing cyto-Hubba. Thereafter, we implemented the MCC method to evaluate the significance of the genes in the network. The top ten genes included IL10RA, ITGB2, HCK, C3AR1, CYBB, LAPTM5, FCER1G, CD53, C1QA and TYROBP. Hierarchical clustering of the hub genes was performed as indicated in the clustering heat map (Fig. 7a); these hub genes could well distinguish the IgAN and control group completely. The biological process analysis of hub genes was performed using the BiNGO plug-in shown in Fig. 7b.
For further analysis, the microarray dataset GSE58539 was downloaded from GEO. This dataset contained 17 monocyte samples, including 15 monocytes samples isolated from IgAN patients and 2 monocytes samples isolated from a healthy control group. We used these selected hub genes for analysis. The scatter plot showed that each hub gene was significantly different between the IgAN and control group (Fig. 8a). Hierarchical clustering of the hub genes was performed. As indicated in the clustering heat map (Fig. 8b), the hub genes could well distinguish the IgAN and control group in monocyte sample. FCER1G and ITGB2 are the first and second-ranked hub genes. We further validated their gene expression in IgAN and determined whether they are specific for IgAN. We validated the relative gene expression in another independent cohort that combined IgAN and other primary glomerulonephritis types (such as FSGS, MCD, MN, and TMD) from the GEO database (GSE104948). Compared with LD, FCER1G and ITGB2 overexpression was observed in all the disease groups (P < 0.05) while the IgAN group showed much higher expression than other diseases (Fig. 9). Compared with the other glomerulonephritis types, FCER1G expression was much higher in IgAN (P < 0.05). ITGB2 expression has obvious overexpression in IgAN compared with the MN and MCD groups (P < 0.05).

Discussion
Bioinformatics analysis plays an important role in disease studies, facilitating the understanding of pathogenesis by integrating data at the genome level with systematic bioinformatics methods. In the present study, 148 DEGs were identified from microarray data by reanalyzing the datasets that could distinguish between the IgAN and healthy controls. Previous studies have shown a significant association of IgAN development and prognosis with the inflammatory reaction, and activation of TGF-β signaling is closely related to fibrosis in IgAN [16,17]. In our study, enrichment analysis revealed that the DEGs were significantly enriched in response to cAMP, cellular response to fibroblast growth factor stimulus and inflammatory response, a finding that is consistent with that in a previous study.
Infection plays an important role in the onset of IgAN, and nearly 30% of patients have a clear history of disease exacerbation after upper respiratory or gastrointestinal infections. Novak et al. reported that viruses (e.g., Epstein-Barr virus) or bacteria (e.g., Streptococcus) expressing GalNAc-containing moieties induce the development of IgG antiglycan autoantibodies, which might subsequently cross-react with glycans on IgA1, resulting in the formation of IgA1-IgG complexes [18]. This 'molecular mimicry' could also explain the association of macroscopic hematuria with upper respiratory tract infections. Yamamoto Y reported that the antigens of Haemophilus parainfluenzae are detected in the renal tissue of patients with IgAN [19]. In our study, KEGG analysis revealed that the DEGs were mainly enriched in pertussis and Staphylococcus aureus infection, a finding that coincides with the above studies.
Using STRING and MCODE, we selected the most important module, which comprised 15 nodes and 89 edges, including CSF1R, IL10RA, ITGB2, HCK, NCF2, C3AR1, CYBB, HCLS1, CD48, C1QA, VSIG4, LAPTM5, FCER1G, CD53 and TYROBP. Further GO analysis revealed that the BP was mainly enriched in the innate immune response, integrin-mediated signaling pathway and inflammatory response. Previous studies have shown a significant association of IgAN development and prognosis with the inflammatory reaction and innate immune response. Toll-like receptors (TLRs) are the key components of the mammalian innate immune system and mediate immune and inflammatory responses through binding PAMPs and/or DAMPs [20]. Many studies have confirmed the elevated expression of TLR4 mRNA in IgAN rats. In an in vitro coculture system of IgA and mesangial cells, TLR4 mediates MAPK activation and MCP-1 secretion, indicating that TLR4 is engaged in glomerular mesangium damage by inducing inflammatory cytokines in IgAN [21]. TLR4 is also involved in the activation of NF-κB, triggering the transcription of mRNA encoding many inflammatory mediators, such as cytokines, chemokines, and fibrinogen and contributing significantly to the effects of the innate and adaptive immune responses [22]. We further implemented the MCC method and selected 10 hub genes, all of which overlapped with the important module selected by MCODE. Hierarchical clustering of the hub genes showed that these hub genes could well distinguish the IgAN and control groups completely. We further introduced these genes into blood samples for testing, and the results showed that the genes also played a crucial role in differentiating disease from control in the blood tissue.
Hck is a member of the highly conserved Src family of cytoplasmic protein tyrosine kinases that transduce various extracellular signals. Hck has been reported to be significantly upregulated in diabetic nephropathy, IgA nephropathy, and lupus nephritis, and is a key mediator of renal fibrosis via its effects on inflammation, fibroblast cell proliferation, and regulation of TGF-β signaling [23].
LAPTM5, which is preferentially expressed in hematopoietic cells and localized to the lysosome, was initially isolated by a subtractive hybridization strategy between hematopoietic and nonhematopoietic cells. A recent study showed that LAPTM5 is a positive regulator of proinflammatory signaling pathways by facilitating NF-κB and MAPK signaling, as well as proinflammatory cytokine production in macrophages [24,25]. CYBB is also responsive to several inflammatory cytokines such as IFN-γ, LPS, and TNF-α [26]. CD53 codes for cluster of differentiation 53, a leukocyte surface antigen. Many studies have indicated that CD53 plays a substantial role in cellular stability and the inflammatory response to adverse conditions [27]. The inflammatory response plays an important role in IgAN, and the above hub genes were all identified to be involved in the pathogenesis of IgAN.
FCER1G is a protein coding gene that interacts with other factors and participates in various nuclear pathways [28]. Specifically, FCER1G is a constitutive component of the high-affinity immunoglobulin E receptor and interleukin-3 receptor complex and is mainly involved in mediating the allergic inflammatory signaling of mast cells, selectively mediating the production of interleukin 4 by basophils, and initiating the transfer from T cells to the effector T-helper 2 subset [29]. Additionally, FCER1G is associated with the progression of clear-cell renal cell carcinoma and may improve prognosis by affecting immune-related pathways. Furthermore, FCER1G is a critical molecule in signaling pathways and is widely involved in various immune responses and cell types [30]. Until now, no study has reported the association of FCER1G with IgAN.
ITGB2 is a protein coding gene that encodes an integrin beta chain, which combines with multiple different alpha chains to form different integrin heterodimers. Integrins are integral cell-surface proteins that participate in cell adhesion as well as cell surface-mediated signaling. The encoded protein plays an important role in the immune response, and defects in this gene cause leukocyte adhesion deficiency. ITGB2 was reported to be involved in cellular adhesion and ECM remodeling in patients with renal cancer [31]. Furthermore, ITGB2 was identified to be closely associated with apoptosis in patients with Alzheimer's disease [32]. Bioinformatics analysis in CKD patients showed that ITGB2, CTSS and CCL5 are correlated negatively with the eGFR of CKD patients [33].
In our study, FCER1G and ITGB2 were the first-and second-ranked hub genes, respectively, and BiNGO analysis confirmed that FCER1G is directly involved in the innate immune response. Further analysis uncovered that, except for IgAN, both hub genes exhibit higher expression in other primary glomerulonephritis types (FSGS, MCD, MN TMD). The latter finding indicates that these hub genes may also be associated with the pathogenesis of other primary glomerulonephritis types. Although these two genes have the highest expression in IgAN compared with other primary glomerulonephritis types, they may play an important role in IgAN but are not specific for the disease. Presently, limited research has reported the association between ITGB2 or FCER1G and IgAN. Further investigation of these two genes is warranted.

Conclusions
Through bioinformatics analysis, we identified hub genes involved in the pathological changes of IgAN. These genes not only can be used in tissue samples, but also play important roles in blood samples. The present study is the first to apply an integrated bioinformatics analysis to investigate novel candidate genes and mechanisms involved in the pathogenesis of IgAN. Among the genes, ITGB2 and FCER1G may play important roles in the development of IgAN and act as potential candidate molecular targets that deserve further research.