1 Hybrid Neural Network Model for Person-Job Matching
1.1 Local Semantic Feature Extraction Layer
This layer employs multi-scale convolutional neural networks to extract local semantic features from resume and job description texts. Addressing the recognition needs for skill keywords (such as Python, project management) and experience fragments (such as 3 years of internet development experience) in person-job matching scenarios [1], the design includes parallel convolutional kernels of three sizes: 3×1, 5×1, and 7×1, capturing phrase-level, sentence-level, and paragraph-level features respectively. For input text vector sequence $X$, the feature extraction process of the $i$-th convolutional kernel is expressed as:
$$C_i = W_i^{(k)} * X + b_i$$
Where $W_i^{(k)}$ is the convolutional kernel weight matrix, $k$ is the convolutional kernel size, $b_i$ is the bias term, and $*$ denotes the convolution operation.
Local features are aggregated through max pooling:
$$p_i = \max(C_i)$$
Where $C_i$ is the feature vector extracted by the $i$-th convolutional kernel.
Finally, all convolutional kernel outputs are concatenated to form the local semantic representation $H_{local}$, effectively capturing fine-grained semantics of skill terms and their contextual associations.
The process of parallel processing text sequences by 3×1, 5×1, and 7×1 convolutional kernels is shown in Fig. 1.
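As a rough illustration of the convolve, max-pool, and concatenate steps described in this subsection, the following pure-Python sketch uses a single randomly initialized filter per kernel size. The sequence length `n`, embedding size `d`, the helper names, and the one-filter-per-size setup are assumptions made for illustration and are not taken from the authors' implementation.

```python
import random

def conv1d_valid(X, W, b):
    """Slide a k x d kernel W over the n x d sequence X without padding."""
    k = len(W)
    feature_map = []
    for t in range(len(X) - k + 1):
        s = b
        for r in range(k):
            for c in range(len(W[r])):
                s += W[r][c] * X[t + r][c]
        feature_map.append(s)
    return feature_map

def local_features(X, kernels):
    """Max-pool each kernel's feature map and concatenate the pooled values."""
    return [max(conv1d_valid(X, W, b)) for W, b in kernels]

random.seed(0)
n, d = 10, 4  # hypothetical sequence length and embedding dimension
X = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
kernels = [
    ([[random.uniform(-1, 1) for _ in range(d)] for _ in range(k)], 0.0)
    for k in (3, 5, 7)  # the three kernel heights named in the text
]
H_local = local_features(X, kernels)
print(len(H_local))  # one pooled value per kernel size
```

A practical model would typically use many filters per kernel size and a nonlinearity, so the concatenated vector would be correspondingly wider.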
1.2 Global Semantic Association Modeling Layer
This layer models the global semantic dependencies between resumes and job descriptions based on the Transformer encoder architecture. It adopts a 6-layer encoder structure, with each layer containing an 8-head self-attention mechanism and a feedforward neural network. The multi-head self-attention calculation formula is:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^O$$
Where single-head attention is defined as:
$$\mathrm{head}_i = \mathrm{softmax}\left(\frac{(Q W_i^Q)(K W_i^K)^\top}{\sqrt{d_k}}\right) V W_i^V$$
Where $Q$, $K$, $V$ are the query, key, and value matrices respectively, $W_i^Q$, $W_i^K$, $W_i^V$ are projection matrices, $W^O$ is the output projection matrix, $h$ is the number of attention heads, and $d_k$ is the dimension per head.
By calculating attention weights between any two positions [2], this mechanism captures the semantic differences of "deep learning frameworks" between algorithm engineer and data analyst positions, as well as long-range associations between work experience and job requirements. Positional encoding uses sine and cosine functions to preserve sequence order information, and the feedforward network performs nonlinear feature mapping through two linear transformations and a ReLU activation function, ultimately outputting the global semantic vector $H_{global}$.
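To make the attention computation concrete, the sketch below implements single-head scaled dot-product attention and a sine/cosine positional encoding in plain Python. The projection matrices are omitted for brevity, and the base of 10000 in the positional encoding follows the commonly used Transformer formulation; the text does not state which constants the authors used, so treat both as assumptions.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention; Q, K, V are lists of row vectors."""
    d_k = len(K[0])
    output = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        output.append(
            [sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))]
        )
    return output

def positional_encoding(pos, d_model):
    """Sine on even dimensions, cosine on odd dimensions (assumed base 10000)."""
    pe = []
    for i in range(d_model):
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

# Two identical keys receive equal weight, so the output is the mean of V.
out = attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]])
print(out)
```

A multi-head layer would run this routine once per head on separately projected inputs and concatenate the results before the output projection.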
1.3 Adaptive Feature Fusion Layer
To dynamically balance the contribution of local features and global semantics, this layer designs a gating fusion mechanism. Traditional fixed-weight fusion cannot adapt to different text characteristics, while the adaptive mechanism dynamically adjusts weights based on input content. The gating unit calculation formula is:
$$\alpha = \sigma\left(W_g \left[H_{local}; \bar{H}_{global}\right] + b_g\right)$$
Where $\sigma$ is the Sigmoid activation function, $W_g$ is the gating weight matrix, $b_g$ is the bias term, $\bar{H}_{global}$ is the average pooling result of global features, and $\alpha$ is the fusion weight.
The final fusion representation is:
$$H_{fused} = \alpha \odot H_{local} + (1 - \alpha) \odot \bar{H}_{global}$$
Where $H_{fused}$ is the fused semantic vector. When processing skill-intensive resumes, $\alpha$ tends toward 1 to strengthen local features; when facing positions with complex job responsibilities, $\alpha$ tends toward 0 to enhance global understanding.
This mechanism automatically learns the optimal weight allocation strategy through backpropagation, solving the core problem of multi-dimensional information fusion in person-job matching, enabling the model to possess both fine-grained recognition and macro-level association modeling capabilities.
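The gating behavior can be sketched as follows: a sigmoid gate is computed from the concatenated local and pooled global vectors, and the fusion weight (written `alpha` here, matching the α reported in the ablation analysis) interpolates between the two. The element-wise form, the equal vector dimensions, and all function names are assumptions for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(h_local, h_global, W_g, b_g):
    """Return (fused, alpha) for equally sized local and pooled global vectors."""
    z = h_local + h_global  # list concatenation [h_local; h_global]
    alpha = [
        sigmoid(sum(w * v for w, v in zip(row, z)) + b)
        for row, b in zip(W_g, b_g)
    ]
    fused = [a * l + (1.0 - a) * g for a, l, g in zip(alpha, h_local, h_global)]
    return fused, alpha

h_local = [1.0, 3.0]
h_global = [3.0, 1.0]
zero_W = [[0.0] * 4 for _ in range(2)]

# With zero weights and zero bias the gate is 0.5, i.e. fixed-weight averaging.
fused_even, alpha_even = gated_fusion(h_local, h_global, zero_W, [0.0, 0.0])
# A strongly positive gate pushes alpha toward 1 and favors the local features.
fused_local, alpha_local = gated_fusion(h_local, h_global, zero_W, [50.0, 50.0])
print(fused_even, fused_local)
```

The first call reproduces the 0.5:0.5 fixed-weight baseline as a special case, which is one way to see the gated mechanism as a generalization of fixed fusion.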
1.4 Person-Job Matching Degree Calculation and Loss Function
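The matching degree calculation module maps resume representation $h_r$ and job representation $h_j$ into a unified semantic space, using cosine similarity to measure matching scores:
$$s(r, j) = \frac{h_r \cdot h_j}{\lVert h_r \rVert \, \lVert h_j \rVert}$$
Where $s(r, j)$ represents the matching degree between resume $r$ and job $j$.
Addressing the "one-to-many" scenario characteristics of person-job matching, a combined loss function is designed. Contrastive learning loss is used to pull matching samples closer and push non-matching samples apart:
$$\mathcal{L}_{con} = -\log \frac{\exp\left(s(r, j^{+}) / \tau\right)}{\sum_{j \in \mathcal{J}} \exp\left(s(r, j) / \tau\right)}$$
Where $j^{+}$ is the positive job sample, $\mathcal{J}$ is the candidate job set, and $\tau$ is the temperature coefficient.
Cross-entropy loss supervises the binary classification task:
$$\mathcal{L}_{ce} = -\left[y \log \hat{y} + (1 - y) \log (1 - \hat{y})\right]$$
Where $y$ is the true label, and $\hat{y}$ is the predicted probability. The total loss is:
$$\mathcal{L} = \lambda_1 \mathcal{L}_{con} + \lambda_2 \mathcal{L}_{ce}$$
Where $\lambda_1$ and $\lambda_2$ are balancing coefficients.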
This loss design balances both ranking and classification performance, adapting to the dual needs of enterprise batch screening and precise recommendation. The comparison of neural network architectures for person-job matching is shown in Table 1.
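The scoring and loss terms described in this subsection can be sketched in a few lines of Python. The softmax-style contrastive term below is the common InfoNCE form, which fits the description of a positive job, a candidate set, and a temperature; the default temperature, the default balancing coefficients, and the function names are assumptions, since the text does not give their values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(resume, jobs, pos_index, tau=0.1):
    """Negative log-softmax of the positive job's score over all candidates."""
    logits = [cosine(resume, job) / tau for job in jobs]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[pos_index]

def bce_loss(y, y_hat, eps=1e-12):
    """Binary cross-entropy for one label and one predicted probability."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))

def total_loss(l_con, l_ce, lam1=1.0, lam2=1.0):
    """Weighted sum of the two terms; lam1 and lam2 are placeholder values."""
    return lam1 * l_con + lam2 * l_ce

resume = [1.0, 0.0]
jobs = [[1.0, 0.0], [0.0, 1.0]]  # first job is the positive sample
loss = total_loss(contrastive_loss(resume, jobs, 0), bce_loss(1, 0.9))
print(loss)
```

Lowering `tau` sharpens the softmax over candidates, so hard negatives with similar scores contribute more to the contrastive term.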
Table 1
Comparison of Neural Network Architectures for Person-Job Matching
| Architecture | Local Feature Extraction | Global Semantic Modeling | Adaptive Fusion | Matching Scenario Adaptability |
|---|---|---|---|---|
| CNN-only | Multi-scale convolution (3×1, 5×1, 7×1) | ✗ | ✗ | Skill keyword matching |
| Transformer-only | ✗ | Multi-head self-attention (8 heads, 6 layers) | ✗ | Long-range dependency capture |
| CNN-RNN | Single-scale convolution | Bidirectional LSTM | Fixed weight (0.5:0.5) | Limited to sequential text |
| Hybrid Model (Ours) | Multi-scale convolution | Transformer encoder | Gated mechanism (dynamic α) | Multi-dimensional matching |
2 Person-Job Semantic Matching System Architecture Design
2.1 Hierarchical System Architecture
The system adopts a four-layer decoupled architecture to achieve data flow and functional modularization. The components and data flow relationships of the data layer, preprocessing layer, model layer, and application layer are shown in Fig. 2.
The data layer stores structured resume and job fields (name, education, work years, job responsibilities, etc.) based on MySQL, and uses Elasticsearch to build inverted indexes to support rapid retrieval of skill keywords[3]. Index granularity is refined to the word level, with query response time controlled within 50ms. The preprocessing layer encapsulates text cleaning pipelines, including HTML tag filtering, special character normalization, and Named Entity Recognition (NER) to extract skill terms and educational background, using the BERT-base model to convert text into 768-dimensional dense vectors. The model layer deploys TensorFlow Serving to provide RESTful API interfaces, supporting batch inference requests[4], capable of processing 128 resume-job pairs per batch, with GPU inference latency of approximately 35ms. The application layer integrates matching query modules, which call the model API after receiving job descriptions to obtain Top-K candidate resumes. The result visualization module displays skill matching degree distribution through heatmaps, and the model update module supports incremental training to adapt to new domain positions.
This architecture achieves independent module iteration through standardized inter-layer interfaces. The data layer and preprocessing layer use message queues (Kafka) for asynchronous communication to ensure high throughput. The model layer and application layer achieve low-latency interaction through gRPC protocol. The overall system concurrent capability reaches 1000 QPS, meeting the needs of tens of thousands of daily matching queries in medium and large enterprises.
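As an illustration of how the application layer might batch requests for the model layer, the sketch below builds request bodies in the row format accepted by TensorFlow Serving's REST predict endpoint (`/v1/models/<name>:predict` with an `"instances"` list). The host name, model name, and the `resume_vec`/`job_vec` input keys are hypothetical; no request is actually sent.

```python
import json

BATCH_SIZE = 128  # batch size stated for the model layer

def predict_url(host, model_name, port=8501):
    """URL of the TensorFlow Serving REST predict endpoint (8501 is its default REST port)."""
    return f"http://{host}:{port}/v1/models/{model_name}:predict"

def batched_payloads(pairs, batch_size=BATCH_SIZE):
    """Yield JSON request bodies containing at most batch_size pairs each."""
    for start in range(0, len(pairs), batch_size):
        chunk = pairs[start:start + batch_size]
        yield json.dumps({"instances": chunk})

# Hypothetical input: 300 resume-job pairs with tiny placeholder vectors.
pairs = [{"resume_vec": [0.1, 0.2], "job_vec": [0.3, 0.4]} for _ in range(300)]
payloads = list(batched_payloads(pairs))
url = predict_url("model-layer.internal", "person_job_matcher")
print(url, [len(json.loads(p)["instances"]) for p in payloads])
```

Each payload could then be sent with an HTTP POST to `url`; the response of the predict endpoint carries a `"predictions"` list aligned with the submitted instances.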
2.2 Core Module Interaction Mechanism
Inter-module interaction follows the "request-processing-response" process, ensuring data consistency and service availability. When HR inputs job description text, the application layer first performs parameter validation (field completeness, text length limit of 512 tokens), then serializes the request into JSON format and sends it via HTTP POST to the preprocessing layer API gateway (using Nginx load balancing). The preprocessing layer calls the text cleaning service to remove redundant information, extracts core fields (such as "Python ≥ 3 years", "Bachelor's degree or above") through Named Entity Recognition service, and the BERT encoder converts cleaned text into vectors and caches them in Redis (TTL = 1 hour) to accelerate repeated queries. Vector data is transmitted to the model layer via gRPC interface, TensorFlow Serving loads pre-trained weight files to execute inference[5], outputting a list of resume IDs and matching scores. The application layer batch queries resume details from the data layer, sorts them in descending order of scores, and returns Top-20 results while asynchronously recording query logs to HDFS for offline analysis.
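The vector caching step in this flow can be sketched with a small in-memory cache that expires entries after a fixed time-to-live. It stands in for the Redis cache named above and is not a Redis client; the class, the injected clock, and the toy encoder are assumptions for illustration.

```python
import time

class TTLCache:
    """Minimal in-memory stand-in for a key-value cache with expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # drop the expired entry
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

def encode_with_cache(text, cache, encoder):
    """Return the cached vector for text, encoding it only on a miss."""
    vector = cache.get(text)
    if vector is None:
        vector = encoder(text)
        cache.set(text, vector)
    return vector

# Demo with a fake clock so that expiry can be shown without waiting.
fake_now = [0.0]
encoder_calls = []

def toy_encoder(text):
    encoder_calls.append(text)
    return [float(len(text))]  # placeholder for a 768-dimensional BERT vector

cache = TTLCache(ttl_seconds=3600, clock=lambda: fake_now[0])
encode_with_cache("Python >= 3 years", cache, toy_encoder)
encode_with_cache("Python >= 3 years", cache, toy_encoder)  # served from cache
fake_now[0] = 3601.0                                         # one hour later
encode_with_cache("Python >= 3 years", cache, toy_encoder)  # re-encoded
print(len(encoder_calls))
```

Keying the cache on the cleaned text (or a hash of it) means that repeated job descriptions skip the encoder entirely until the entry expires.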
Inter-module communication adopts a circuit breaker mechanism (Hystrix). When the preprocessing layer response times out (threshold 200 ms), the system automatically downgrades to a keyword matching strategy to ensure robustness. Versions are managed through Docker containerized deployment. Model updates follow a blue-green deployment strategy, with the new and old versions running in parallel for 24 hours before traffic is switched. Rollback takes less than 5 minutes, ensuring continuous service availability.
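The timeout-triggered downgrade can be sketched as below. This is only the timeout-and-fallback part of the behavior, not a full circuit breaker with open and half-open states, and it does not use Hystrix; the resume data, function names, and the simulated slow service are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

RESUMES = {"r1": "python tensorflow", "r2": "marketing seo"}  # toy data

def keyword_match(query):
    """Degraded strategy: resumes that share at least one term with the query."""
    terms = set(query.lower().split())
    return [rid for rid, text in RESUMES.items() if terms & set(text.split())]

def fast_semantic(query):
    return ["r1"]  # placeholder for the normal semantic pipeline

def slow_semantic(query):
    time.sleep(0.5)  # simulates a preprocessing layer that exceeds the threshold
    return ["r1"]

def match_with_fallback(primary, fallback, query, timeout_s=0.2):
    """Run primary; if it exceeds timeout_s, return the fallback result instead."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(primary, query)
    try:
        return future.result(timeout=timeout_s), "semantic"
    except FutureTimeout:
        return fallback(query), "keyword"
    finally:
        pool.shutdown(wait=False)  # do not block on the slow call

print(match_with_fallback(fast_semantic, keyword_match, "python developer"))
print(match_with_fallback(slow_semantic, keyword_match, "python developer"))
```

A production circuit breaker would additionally count recent failures and skip the primary call entirely while the circuit is open, which avoids paying the timeout on every request during an outage.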
2.3 System Performance Optimization
Multi-level optimization strategies are implemented for the high concurrency and low latency requirements of the person-job matching system. The functional components and technical specifications of system layers are shown in Table 2.
Table 2
Functional Components and Technical Specifications of System Layers
| Layer | Core Component | Technology Stack | Performance Metric | Scalability |
|---|---|---|---|---|
| Data Layer | MySQL + Elasticsearch | InnoDB storage engine, inverted index | Query latency: 50 ms | Horizontal sharding (10M+ records) |
| Preprocessing Layer | Text cleaning + NER + BERT encoding | Python 3.8, spaCy 3.0, Transformers 4.20 | Throughput: 500 requests/s | Microservice cluster (5 nodes) |
| Model Layer | TensorFlow Serving + GPU inference | TF 2.10, CUDA 11.6, TensorRT 8.4 | Batch size: 128, Latency: 35 ms | Auto-scaling (2–10 pods) |
| Application Layer | Matching query + Visualization | Flask 2.0, ECharts 5.3, React 18.0 | Concurrent capacity: 1000 QPS | CDN acceleration |
Model lightweighting compresses Transformer layers from 6 to 4 through knowledge distillation technology, reducing parameters by 35% while F1 score only decreases by 1.2%, and inference speed improves by 40%. Mixed precision training (FP16) and TensorRT optimization engine are adopted, GPU memory usage decreases by 50%, and single-card batch processing capability increases from 64 to 128. Cache mechanism is deployed in three levels: local cache (Guava Cache) stores high-frequency job vectors (hit rate 75%), distributed cache (Redis Cluster) stores candidate resume pools (expiration time 30 minutes), and database connection pool (HikariCP) reuses connections to reduce establishment overhead.
Load balancing adopts a weighted round-robin algorithm [6] that dynamically adjusts weights based on server CPU and memory usage. It is combined with a Kubernetes horizontal scaling strategy that automatically starts Pod replicas during traffic peaks. The incremental learning module monitors new job types and triggers model fine-tuning when the sample size exceeds 500, using LoRA (Low-Rank Adaptation) to update only 0.1% of parameters and reducing training time from 8 hours to 45 minutes. Performance monitoring collects metrics such as QPS, latency, and error rate through Prometheus, with real-time visualization in Grafana. When P99 latency exceeds 100 ms, alerts notify the operations team to ensure system stability. The technical implementation paths of model compression, multi-level caching, load balancing, and incremental learning are shown in Fig. 3.
Figure 3: System Performance Optimization Flowchart
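To illustrate the weighted round-robin idea used for load balancing, the sketch below implements the smooth weighted round-robin variant, in which each server accumulates its weight on every step and the server with the largest accumulated value is chosen. The text does not say which variant the system uses, so this choice, the class name, and the example weights are assumptions; in the described system the weights would be recomputed from CPU and memory usage.

```python
class SmoothWeightedRoundRobin:
    """Smooth weighted round-robin over named servers."""

    def __init__(self, weights):
        self.weights = dict(weights)
        self.current = {name: 0 for name in self.weights}

    def set_weight(self, name, weight):
        """Update a server's weight, e.g. after a new load measurement."""
        self.weights[name] = weight
        self.current.setdefault(name, 0)

    def next(self):
        total = sum(self.weights.values())
        for name, weight in self.weights.items():
            self.current[name] += weight
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= total
        return chosen

balancer = SmoothWeightedRoundRobin({"a": 5, "b": 1, "c": 1})
picks = [balancer.next() for _ in range(7)]
print(picks)  # ['a', 'a', 'b', 'a', 'c', 'a', 'a']
```

Over one full cycle each server is picked in proportion to its weight, and the heavier server's turns are interleaved rather than issued in a single burst.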
3 Experimental Validation and Performance Analysis
3.1 Person-Job Matching Dataset Construction and Experimental Setup
The experiment constructs training corpus by combining public datasets and self-built annotated sets. The public dataset uses the Job-Resume dataset[7] open-sourced by Zhilian Recruitment, containing 28,500 resume-job pairs across technical positions (algorithm engineers, backend developers), operational positions (product operations, market promotion), and management positions (project managers, department supervisors), with a positive-to-negative sample ratio of 1:3. The self-built annotated set is formed by crawling recruitment data from BOSS Zhipin and Lagou over the past 6 months, manually annotated to form 15,000 high-quality samples covering emerging positions (large model engineers, data governance experts). The dataset is divided into training set (30,450 samples), validation set (6,525 samples), and test set (6,525 samples) at a ratio of 7:1.5:1.5. The text preprocessing process includes: filtering HTML tags and special symbols using regular expressions, Chinese word segmentation using jieba segmentation tool[8], and constructing a stop word list by screening low-frequency words (occurrence count < 5) through TF-IDF algorithm[9].
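The reported split sizes can be checked against the stated dataset sizes and the 7:1.5:1.5 ratio with a few lines of integer arithmetic. The helper name and the choice to assign any rounding remainder to the training set are assumptions; with these totals no remainder arises.

```python
PUBLIC_PAIRS = 28_500       # public Job-Resume dataset
SELF_BUILT_PAIRS = 15_000   # self-built annotated set
SPLIT_PERCENT = (70, 15, 15)  # 7 : 1.5 : 1.5 expressed in percent

def split_sizes(total, percents=SPLIT_PERCENT):
    """Integer split sizes; any rounding remainder goes to the training set."""
    sizes = [total * p // 100 for p in percents]
    sizes[0] += total - sum(sizes)
    return sizes

total = PUBLIC_PAIRS + SELF_BUILT_PAIRS
train, val, test = split_sizes(total)
print(total, train, val, test)  # 43500 30450 6525 6525
```

The result agrees with the 30,450 / 6,525 / 6,525 figures given for the training, validation, and test sets.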
The average resume length is 385 tokens and the average job description length is 210 tokens; inputs are uniformly truncated to 512 tokens to fit the BERT input limit. Data augmentation strategies include synonym replacement (WordNet), back translation (Chinese-English mutual translation), and random deletion, expanding the training samples to 1.8 times the original volume. Experimental environment configuration: NVIDIA Tesla V100 32GB GPU, PyTorch 1.12 framework, batch size 32, learning rate 5e-5, AdamW optimizer, and training for 20 epochs with an early stopping strategy (training stops when the validation loss does not decrease for 3 consecutive epochs).
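The early stopping rule in the training setup can be sketched as a small function that scans per-epoch validation losses and reports the epoch at which training would stop. The function name and the example loss values are hypothetical; a patience of 3 follows the description above.

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training stops.

    Training stops once the validation loss has failed to improve on the
    best value seen so far for `patience` consecutive epochs; otherwise it
    runs through all supplied epochs.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses)

# Hypothetical validation curve: the best loss is reached at epoch 3.
losses = [1.00, 0.90, 0.80, 0.85, 0.81, 0.82, 0.70]
print(early_stop_epoch(losses))  # 6
```

In this example the later improvement at epoch 7 is never reached, which is the usual trade-off of a short patience window; checkpointing the best epoch keeps the model from epoch 3.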
3.2 Baseline Models and Person-Job Matching Evaluation Metrics
Six baseline models in three categories are selected for comparative experiments. Traditional methods include TF-IDF + cosine similarity (representing resumes and jobs as word frequency vectors and then calculating similarity) and the BM25 algorithm (considering word frequency saturation and document length normalization). Deep learning single models include TextCNN (3-layer convolution + max pooling), BiLSTM (bidirectional LSTM capturing sequence dependencies), and BERT-base (12-layer Transformer encoder). The existing hybrid model is CNN-RNN (3-layer convolution + 2-layer GRU, fixed weight 0.5:0.5 fusion). Comparisons of different models on Precision@k, Recall@k, and F1-score metrics are shown in Fig. 4.
The evaluation metric system is designed following both ranking and classification dimensions: Precision@k (k = 5, 10, 20) measures the proportion of truly matched positions in Top-k recommendation results, reflecting screening accuracy[10]; Recall@k evaluates recall rate to ensure quality candidates are not missed; F1-score balances precision and recall comprehensively; MAP (Mean Average Precision) calculates the mean average precision across all queries, adapting to "one-to-many" matching scenarios[11]; MRR (Mean Reciprocal Rank) evaluates the average reciprocal rank of the first correct match, reflecting recommendation top position accuracy[12]; AUC-ROC area under curve measures binary classification performance.
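For reference, the ranking metrics listed above can be computed per query as in the following sketch; MAP and MRR are then the means of `average_precision` and `reciprocal_rank` over all queries. The function names and the example ranking are illustrative assumptions.

```python
def precision_at_k(ranked, relevant, k):
    """Share of the top-k results that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Share of all relevant items that appear in the top-k results."""
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item occurs."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

ranked = ["r3", "r1", "r7", "r2", "r9"]  # hypothetical model ranking
relevant = {"r1", "r2"}                  # hypothetical ground-truth matches
print(precision_at_k(ranked, relevant, 5), recall_at_k(ranked, relevant, 5))
print(average_precision(ranked, relevant), reciprocal_rank(ranked, relevant))
```

Here the first match sits at rank 2, so the reciprocal rank is 0.5, and the two matches at ranks 2 and 4 give an average precision of (1/2 + 2/4) / 2 = 0.5.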
Addressing person-job matching scenario characteristics, additional metrics include Skill Matching Accuracy (SMA) and Experience Matching Accuracy (EMA)[13], calculating the model's recognition accuracy in skill keyword and work years dimensions respectively. Named Entity Recognition tools extract skill terms (such as Python, project management) and experience descriptions (such as 3–5 years) from true labels and predictions[14], calculating Jaccard similarity coefficient as the score.
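The Jaccard score underlying SMA and EMA is the size of the intersection of two term sets divided by the size of their union. A minimal sketch follows; the convention of returning 1.0 when both sets are empty and the example skill lists are assumptions, since the text does not specify these details.

```python
def jaccard(predicted_terms, true_terms):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of terms."""
    a, b = set(predicted_terms), set(true_terms)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as a perfect match
    return len(a & b) / len(a | b)

predicted = ["python", "project management", "sql"]
labeled = ["python", "sql", "tensorflow"]
print(jaccard(predicted, labeled))  # 0.5
```

Two of the four distinct terms are shared, giving a score of 0.5; averaging this score over all test pairs would give the dataset-level accuracy.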
3.3 Main Experimental Results Comparison
Experimental results show that the hybrid neural network model outperforms the baseline models across all evaluation metrics. In overall performance, the model's F1-score reaches 87.6%, improving by 9.2 percentage points over the best baseline BERT-base (78.4%) and by 12.5 percentage points over CNN-RNN (75.1%). MAP is 0.823, a relative improvement of 55.0% over TF-IDF (0.531), and MRR is 0.891, indicating that the first correct match is typically ranked at or near the top. Detailed dimension analysis shows that Skill Matching Accuracy (SMA) reaches 92.3%, higher than TextCNN (81.7%) and BiLSTM (79.5%), which supports the multi-scale convolutional kernels' ability to capture skill keywords precisely; Experience Matching Accuracy (EMA) is 88.1%, superior to BERT-base (82.6%), which is attributed to the Transformer encoder's effective modeling of long-distance semantics such as "3 years of experience at major internet companies".
Regarding performance differences across job types, technical positions (algorithm engineers, backend developers) have an F1-score of 89.2%, higher than operational positions (85.3%) and management positions (86.7%), because technical position resumes contain more structured skill terms, facilitating local feature extraction. Text length impact experiments show that when resume length exceeds 400 tokens, the model F1-score only decreases by 2.1%, while TextCNN decreases by 8.7%, proving the long text processing advantage of the global semantic modeling layer. In low-sample scenario testing, when the training set is reduced to 30%, the model F1-score is 81.4%, with generalization ability superior to CNN-RNN (68.9%), validating the robustness of the adaptive fusion mechanism. Performance comparison on person-job matching dataset is shown in Table 3.
Table 3
Performance Comparison on Person-Job Matching Dataset
| Model | Precision@10 | Recall@10 | F1-score | MAP | MRR | SMA | EMA |
|---|---|---|---|---|---|---|---|
| TF-IDF | 0.542 | 0.518 | 0.530 | 0.531 | 0.623 | 0.694 | 0.671 |
| BM25 | 0.586 | 0.561 | 0.573 | 0.579 | 0.658 | 0.712 | 0.695 |
| TextCNN | 0.734 | 0.709 | 0.721 | 0.728 | 0.801 | 0.817 | 0.763 |
| BiLSTM | 0.718 | 0.695 | 0.706 | 0.711 | 0.785 | 0.795 | 0.748 |
| BERT-base | 0.791 | 0.776 | 0.784 | 0.789 | 0.854 | 0.863 | 0.826 |
| CNN-RNN | 0.762 | 0.741 | 0.751 | 0.756 | 0.824 | 0.828 | 0.791 |
| Hybrid Model (Ours) | **0.883** | **0.870** | **0.876** | **0.823** | **0.891** | **0.923** | **0.881** |

Note: Bold values indicate the best performance. SMA = Skill Matching Accuracy, EMA = Experience Matching Accuracy.
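The headline comparisons can be reproduced from the tabulated values with two small helpers: F1 as the harmonic mean of precision and recall, and relative gain as the difference divided by the baseline. The helper names are illustrative; the numbers are taken from Table 3.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def relative_gain(new, old):
    """Relative improvement of new over old, as a fraction of old."""
    return (new - old) / old

hybrid_f1 = round(f1(0.883, 0.870), 3)                     # Hybrid row: P@10, R@10
map_gain_pct = round(relative_gain(0.823, 0.531) * 100, 1)  # MAP vs TF-IDF
f1_points_vs_bert = round((0.876 - 0.784) * 100, 1)         # percentage points
print(hybrid_f1, map_gain_pct, f1_points_vs_bert)  # 0.876 55.0 9.2
```

Note the distinction between the two kinds of improvement: the F1 gain over BERT-base is an absolute difference in percentage points, while the MAP gain over TF-IDF is a relative percentage.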
3.4 Ablation Experiments and Module Contribution Analysis
To validate the effectiveness of each sub-module of the hybrid neural network, four ablation experiment groups are designed. The Full Model includes local semantic extraction layer, global semantic modeling layer, and adaptive fusion layer. Removing multi-scale convolution (w/o Multi-scale CNN) and keeping only single-scale 3×1 convolutional kernel, F1-score drops to 82.1% (-5.5 percentage points), SMA decreases to 85.7%, proving the key role of multi-scale design in capturing skill terms of different granularities[15]. Removing Transformer encoder (w/o Transformer) and using only CNN to extract features, F1-score drops to 79.8% (-7.8 percentage points), EMA decreases to 80.3%, validating the necessity of global semantic modeling for long-range dependency recognition.
Removing adaptive fusion mechanism (w/o Adaptive Fusion) and using fixed weight 0.5:0.5 fusion, F1-score drops to 83.4% (-4.2 percentage points), indicating that dynamic weight adjustment significantly improves model adaptability. Further analysis of gating unit weight distribution reveals that skill-intensive resumes (such as algorithm engineers) have an average α value of 0.68, strengthening local features[16]; positions with complex responsibilities (such as project managers) have an average α value of 0.42, focusing on global understanding, confirming the rationality of the adaptive mechanism[17]. Hyperparameter sensitivity testing shows that when Transformer layers decrease from 6 to 4, F1-score drops by 1.8%; when attention heads decrease from 8 to 4, it drops by 2.3%; convolutional kernel size combination (3, 5, 7) outperforms (2, 4, 6) by approximately 1.5 percentage points; learning rate at 5e-5 achieves optimal convergence effect. Comparison details are shown in Fig. 5.
3.5 System Engineering Performance and Typical Case Analysis
System engineering performance testing is conducted in a simulated production environment, with a deployment architecture of a 4-core CPU (Intel Xeon E5-2680) and 1 GPU (NVIDIA Tesla V100). Concurrent stress testing uses the Apache JMeter tool[18], simulating 1000 concurrent users submitting matching requests simultaneously. The system's average response latency is 85 ms, P95 latency is 127 ms, and P99 latency is 168 ms, meeting enterprise-level real-time query requirements (< 200 ms). Throughput testing shows that peak system QPS (queries per second) reaches 1200, CPU utilization stabilizes at 75%, and GPU memory usage is 18 GB, with no memory overflow or service timeouts. In terms of resource consumption, a single matching query consumes 35 ms of GPU computing time, 28 ms of preprocessing time, 12 ms of database query time, and 10 ms of network transmission time, which together account for the 85 ms average response latency.
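The per-stage timings can be summed to check them against the average latency, and tail latencies such as P95 and P99 can be computed from raw samples with a nearest-rank percentile. The stage names, the helper function, and the choice of the nearest-rank method are assumptions; the text does not say how the percentiles were computed.

```python
STAGE_MS = {
    "gpu_inference": 35,
    "preprocessing": 28,
    "database_query": 12,
    "network_transmission": 10,
}

def percentile_nearest_rank(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceil(p * n / 100)
    return ordered[rank - 1]

total_ms = sum(STAGE_MS.values())
shares = {name: round(100 * ms / total_ms, 1) for name, ms in STAGE_MS.items()}
print(total_ms)  # 85
print(shares)

# Hypothetical latency samples of 1..100 ms, used only to exercise the helper.
samples = list(range(1, 101))
print(percentile_nearest_rank(samples, 95), percentile_nearest_rank(samples, 99))
```

The stage total of 85 ms equals the reported average response latency, with GPU inference as the largest single contributor.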
Typical case analysis selects person-job matching scenarios from a recruitment software platform and a university research project. For algorithm engineer position requirements (Python, deep learning frameworks, 3+ years of experience), the system recommends Top-10 results from 1000 candidate resumes, 9 of which agree with human screening results, a 90% agreement rate. Specific cases show that the system successfully identifies the semantic association between "proficient in using TensorFlow for model training" in resumes and "deep learning framework development experience" in job requirements[19], while traditional TF-IDF methods miss such matches due to vocabulary mismatch. In the contrasting operational position cases (new media operations, data analysis capability), the system distinguishes the weight differences of "data analysis" between technical and operational positions, supporting the scenario adaptation capability of the adaptive fusion mechanism[20].
Conclusion
This study proposes a person-job semantic matching system based on hybrid neural networks. By fusing the complementary advantages of CNN and Transformer and combining them with an adaptive gating fusion mechanism, it effectively improves the ability to capture semantic associations between resumes and job descriptions. The system adopts a hierarchical engineering architecture, achieving an end-to-end process from data processing to result output. On the person-job matching dataset combining the public and self-built annotated sets, the F1-score reaches 87.6%, skill matching accuracy and experience matching accuracy both outperform the baseline models, and response latency meets enterprise-level deployment requirements.
Ablation experiments validate the key role of the adaptive fusion module in handling skill polysemy and experience associations. The system can be directly applied to scenarios such as precise screening in enterprise batch recruitment and job recommendation for talent career development, with potential for migration to text matching tasks such as recommendation systems and question-answering systems. Future research can be deepened from directions such as introducing skill knowledge graphs, exploring model compression and edge deployment, and enhancing matching result interpretability.