DCN, NPM3 and SULF1 are hub genes related to vasculogenic mimicry in lung adenocarcinoma

Data source

Gene expression data and corresponding clinical information for LUAD were obtained from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/), which included 541 tumor tissues and 59 adjacent normal tissues, serving as the primary training cohort. For independent validation, the GSE140797 dataset, comprising 7 LUAD and 7 normal samples, was downloaded from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/). A set of 82 vasculogenic mimicry-related genes (VMRGs) was curated from the "ANGIOGENESIS.v2023.2.Hs" and "HALLMARK_ANGIOGENESIS.v2023.2.Hs" gene sets within the Molecular Signatures Database (MSigDB, https://www.gsea-msigdb.org/gsea/msigdb).

Differential expression analysis (DEA)

DEA between LUAD and normal samples in the training cohort was performed using the "limma" R package. Genes with an absolute log2-fold change (|log2FC|) greater than 1 and an adjusted p-value (Padj) of less than 0.05 were defined as statistically significant differentially expressed genes (DEGs). The results were visualized using a volcano plot. Additionally, a heatmap was generated to display the expression patterns of the top 10 most significantly up- and down-regulated DEGs, ranked by their Padj values.

Weighted gene co-expression network analysis (WGCNA)

In this study, we employed VMRGs as the gene set and conducted single-sample gene set enrichment analysis (ssGSEA) to score all disease samples in the training set. The resulting ssGSEA scores were treated as quantitative traits to construct a co-expression network using the R package “WGCNA,” aiming to identify gene modules most strongly associated with the ssGSEA scores.

Following ssGSEA scoring, hierarchical clustering based on Euclidean distance of expression levels was performed to detect potential outlier samples; no outliers were identified and thus no samples were excluded. We then selected an appropriate soft-thresholding power (β) from 1 to 20 to ensure a scale-free co-expression network. Based on the criterion of achieving a scale-free topology fit index (R2) > 0.85, the optimal soft-thresholding power was determined as β = 3. The relationship between β and both the scale-free model fit (R2) and average connectivity was evaluated accordingly.

A signed co-expression network was constructed using β = 3, with a minimum module size set to 30 genes. Module-trait associations were assessed by correlating module eigengenes with the ssGSEA scores. Modules exhibiting a strong absolute correlation with the ssGSEA scores (|cor|> 0.3, p < 0.05) were considered key modules. Among these, the yellow and brown modules showed the most significant correlations with the VM phenotype (ssGSEA score).

To further prioritize genes within these key modules, we applied thresholds for Gene Significance (GS > 0.2), which reflects the correlation between gene expression and the ssGSEA score, and Module Membership (MM > 0.7), indicating the correlation between gene expression and module eigengene. This filtering process yielded 101 high-confidence candidate genes associated with VM.

Functional enrichment analysis and protein–protein interaction network establishment

Candidate genes were identified by taking the intersection of the DEGs and the VM-associated genes derived from the WGCNA. To elucidate their functional roles, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed using the clusterProfiler R package. The GO analysis comprehensively covered biological processes (BP), molecular functions (MF), and cellular components (CC). For all enrichment results, statistical significance was determined using an adjusted p-value (FDR) < 0.05 after applying the Benjamini-Hochberg (BH) procedure to control the false discovery rate in multiple testing. Furthermore, a protein–protein interaction (PPI) network was constructed using the STRING database with a minimum interaction score of 0.4 and visualized with Cytoscape software.

Machine learning

Hub gene identification was performed using two complementary machine learning algorithms: (1) Least Absolute Shrinkage and Selection Operator (LASSO) regression was carried out with the "glmnet" R package. The optimal regularization parameter (lambda) was determined through tenfold cross-validation. To ensure model parsimony and prevent overfitting, the lambda value corresponding to one standard error above the minimum mean squared error (lambda.1se) was selected. This procedure identified 12 candidate feature genes. (2) Random Forest (RF) analysis was implemented using the "randomForest" R package with 500 trees under default settings. Genes were ranked by importance according to the %IncMSE (Percentage Increase in Mean Squared Error) metric. A threshold of MeanDecreaseGini > 1 was applied, retaining the top 26 significant genes. The final set of hub genes was defined as the intersection of the genes selected by both LASSO and RF approaches.

Expression validation

The expression levels of candidate genes were compared between tumor and normal tissues in both the training and validation cohorts using Wilcoxon rank-sum tests, with results presented as bar graphs. Protein expression was further assessed using immunohistochemistry data from the Human Protein Atlas (HPA; https://www.proteinatlas.org/). A gene was designated as a key biomarker only if it exhibited consistent differential expression across all datasets and achieved statistical significance (P < 0.05).

Molecular regulatory network analysis

To explore the transcriptional regulation of the key genes, we predicted their interacting transcription factors (TFs) using the ChEA database via the NetworkAnalyst platform (https://www.networkanalyst.ca/), and visualized the resulting mRNA-TF interaction network. Subsequently, a protein–protein interaction and co-expression network for each key gene and its associated TFs was constructed using GeneMANIA (http://genemania.org/) to identify functionally related genes and uncover predominant interaction types and biological functions within the network.

Gene set enrichment analysis (GSEA)

GSEA was performed using the "clusterProfiler" R package to identify biological pathways associated with the key genes. The analysis involved ranking all genes in the training set based on their correlation with each key gene, followed by enrichment testing against the MSigDB collections "c2.all.v2024.1.Hs.symbols" (curated pathways) and "c5.all.v2024.1.Hs.symbols" (Gene Ontology terms). Pathways meeting the dual thresholds of nominal p-value < 0.05 and FDR < 0.05 were considered statistically significant.

Immune infiltration analysis

The immune infiltration landscape in the TCGA-LUAD cohort was characterized using ssGSEA implemented in the GSVA R package, which computed sample-wise enrichment scores based on predefined gene signatures representative of each immune cell population to quantify the relative abundance of 28 immune cell types. The resulting enrichment scores were visualized in a heatmap. Differences in immune infiltration between tumor and normal tissues were assessed using Pearson correlation analysis (visualized with the corrplot package) and Wilcoxon rank-sum tests (visualized with box plots). Immune cell subtypes showing significant correlations with key genes (|r|> 0.3, p < 0.05) were considered biologically relevant.

Cell culture and transfection

The A549 cell line was obtained from the China Center for Type Culture Collection (CCTCC, Wuhan, China) and cultured in RPMI 1640 medium (Gibco, 11875093) containing 10% fetal bovine serum (Gibco, A5256701) at 37 °C with 5% CO₂. To overexpress sulfatase 1 (SULF1), lentiviral vectors carrying the SULF1 sequence (Ov-SULF1) or an empty vector control (Ov-NC) were constructed using the GV493 backbone (Ledel Biotechnology, Guangzhou, China). After 48 h of transduction, transfection efficiency was assessed. Stable polyclonal cell lines were subsequently established by selection with 6 μg/ml puromycin (Beyotime, China).

RNA extraction and quantitative real-time polymerase reaction (qRT-PCR)

Total RNA was extracted with RNAiso Plus reagent, and cDNA was synthesized using the PrimeScript RT reagent kit under the following conditions: 25 °C for 10 min, 42 °C for 30 min, and 85 °C for 5 s. qRT-PCR was performed on an ABI 7600 system with SYBR Premix DimerEraser, using the following cycling protocol: 95 °C for 5 min, then 40 cycles of 95 °C for 15 s and 60 °C for 32 s. Gene expression was quantified via the 2^ (–ΔΔCt) method, normalizing to GAPDH as the internal control. The primer sequences used were as follows: SULF1 (forward: 5′-GGCTGCTCAGGAAGTAGATAG-3′; reverse: 5′-GCCAGTGGTTGTTGTCATG-3′) and GAPDH (forward: 5′-GGGAAACTGTGGCGTGAT-3′; reverse: 5′-GAGTGGGTGTCGCTGTTGA-3′).

(Livak and Schmittgen 2001).

Transwell assay

Cell migration was evaluated using Transwell chambers (BD Falcon, 353,097). Briefly, 1 × 105 treated cells in serum-free medium were seeded into the upper chamber, while the lower chamber was filled with complete medium as a chemoattractant. After incubation for 12–48 h at 37 °C, non-migratory cells on the upper surface of the membrane were carefully removed. Cells that had migrated to the lower surface were fixed, stained with 0.1% crystal violet, and counted using ImageJ software.

For the invasion assay, Transwell chambers were pre-coated with Matrigel matrix diluted 1:3 in serum-free medium and allowed to polymerize at 37 °C for 2 h, followed by overnight equilibration. Cells were then seeded following the same procedure as the migration assay. After 24–48 h, invaded cells were quantified using the same staining and counting method.

Three-dimensional cell culture

Tube formation ability was assessed with a Matrigel-based assay. In brief, 2 × 104 treated A549 cells were seeded per well into 96-well plates pre-coated with growth factor-reduced Matrigel (BD Biosciences, 356,234) and cultured for 48 h under standard conditions. The number of branch points per field was quantified under a phase-contrast microscope to evaluate VM formation.

Periodic acid-Schiff (PAS) staining

Cells were seeded in 96-well plates at a density of 1 × 104 cells per well and treated as indicated. After treatment, PAS staining was performed as follows: cells were washed with PBS, fixed with PAS fixative for 10 min, and rinsed with distilled water. They were then oxidized in 0.5% periodic acid for 15 min at room temperature, washed again, and incubated in Schiff’s reagent for 10 min in the dark. After a 10-min rinse under running water, nuclei were counterstained with hematoxylin for 2 min, followed by a final distilled water wash. Stained cells were visualized under a bright-field microscope.

Western blot analysis

Protein lysates were separated by 10% SDS-PAGE and transferred onto PVDF membranes. After blocking with 5% skim milk for 1 h at room temperature, the membranes were incubated overnight at 4 °C with primary antibodies against VEGF (ab46154, Abcam), TGF-β (81,746–2-RR, Proteintech), Vimentin (10,366–1-AP, Proteintech), and GAPDH (KC-5G5, Kangcheng Bio). Subsequently, membranes were incubated with appropriate HRP-conjugated secondary antibodies—Goat Anti-Rabbit IgG (1:20,000, 4050–05, SouthernBiotech) or Rabbit Anti-Mouse IgG (1:10,000, 6170–05, SouthernBiotech)—for 1 h at room temperature. Protein bands were visualized using an ECL detection system and quantified with a Bio-Rad imaging analyzer.

Animal model

Five-to-six-week-old female BALB/c nude mice (Guangdong MedKang Biotechnology Co., Ltd) were acclimatized for one week. Subsequently, 2 × 10⁶ A549 cells suspended in 200 μL PBS were subcutaneously injected into the right flank of each mouse (Day 0). Tumor size was measured every three days starting when the tumor diameter reached approximately 5 mm. For experimental procedures, mice were anesthetized with pentobarbital sodium (50 mg/kg, i.p.). Upon reaching the humane endpoint (tumor diameter ≤ 1.5 cm), mice were euthanized by CO₂ inhalation followed by cervical dislocation. Tumors were then harvested, photographed, and prepared for further analysis.

Statistical analysis

All statistical analyses were conducted using R software (version 4.1.3) and GraphPad Prism 8. Continuous variables are presented as mean ± standard deviation (SD). Differences between groups were evaluated using two-tailed Student’s t-tests, Pearson correlation analysis, and Wilcoxon rank-sum tests, with a P-value < 0.05 considered statistically significant.

Comments (0)

No login
gif