April 25–26, 2026, Copenhagen, Denmark
Inés Pérez 1, Lara Suárez 1, Gabriel Novas 1, and Santiago Muíños 1
Automatic retrieval and standardization of material data from scientific literature remains a critical challenge. This work introduces an artificial intelligence-based framework that extracts and structures material property data and converts it into the International System of Units using state-of-the-art open-source models (Mistral, LLaMA3). The system is highly modular, enabling independent development of specific information retrieval tasks, including PDF parsing, semantic table interpretation, text segmentation, reranking and embedding models for retrieval, and post-processing for data structuring, format validation, and unit conversion. Both supervised and unsupervised evaluation methods are employed to quantify accuracy, consistency, and confidence in the extracted data. By leveraging multiple complementary PDF parsing configurations, we implemented a robust unsupervised evaluation strategy that enhances output reliability. Large Language Models (LLMs) were systematically evaluated using both zero-shot and in-context learning, complemented by a custom scoring system for supervised assessment. The framework was applied to two representative manufacturing use cases (metal additive manufacturing and fiber-reinforced composites) to address the challenge of incomplete, scattered, and heterogeneous material property data in simulation workflows. Results show high extraction accuracy across diverse document layouts, with complementary parsing strategies improving LLM comprehension and tabular processing further boosting overall performance.
Information Retrieval, PDF Parsing, LLMs, Data Standardization, Material Properties
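The SI-conversion post-processing step described in this abstract can be illustrated with a minimal sketch: parse a "value unit" string as extracted by the LLM and rescale it to base SI units. The conversion table, function name, and pass-through behavior for unknown units are illustrative assumptions, not the framework's actual implementation.

```python
import re

# Each entry maps a source unit to (SI unit, multiplicative factor).
# Illustrative subset only; a real table would be far larger.
UNIT_TABLE = {
    "MPa": ("Pa", 1e6),
    "GPa": ("Pa", 1e9),
    "ksi": ("Pa", 6.894757e6),
    "g/cm3": ("kg/m3", 1e3),
    "mm": ("m", 1e-3),
}

def to_si(raw: str) -> tuple[float, str]:
    """Convert an extracted 'value unit' string to SI, e.g. '210 GPa' -> (2.1e11, 'Pa')."""
    m = re.fullmatch(r"\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*([\w/]+)\s*", raw)
    if m is None:
        raise ValueError(f"unparseable quantity: {raw!r}")
    value, unit = float(m.group(1)), m.group(2)
    # Units already in SI (or unknown) pass through unchanged.
    si_unit, factor = UNIT_TABLE.get(unit, (unit, 1.0))
    return value * factor, si_unit
```

In a full pipeline this function would sit after format validation, so that malformed LLM outputs are rejected before conversion rather than raising here.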
Bin Ge 1, Chunhui He 1, Qingqing Zhao 1, Chong Zhang 1, Jibing Wu 1
1 Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, China
Identifying high-quality frontier topics from massive scientific research data to help researchers conduct their work accurately is of paramount importance. Traditional analysis methods face bottlenecks such as limited cross-domain adaptability, high resource consumption, and low efficiency. To address these challenges, this study proposes an AI-agent-based frontier topic mining method built on an innovative generative-verification dual-agents (D-Agents) architecture. Specifically, prompt engineering is employed to develop a generative agent (G-Agent), which leverages the semantic understanding capabilities of large-scale pre-trained language models to automatically generate candidate frontier topics. Subsequently, a verification agent (V-Agent) is introduced to establish a multi-dimensional evaluation system, which automatically verifies candidate topics along dimensions including academic novelty, topic accuracy, and topic completeness. The effectiveness of the proposed method is validated on three manually labeled datasets in computer vision (CV), natural language processing (NLP), and machine learning (ML). Experimental results demonstrate that the D-Agents framework can perform frontier topic mining across multiple domains simultaneously: on the three labeled datasets (CV-DataSet, NLP-DataSet, and ML-DataSet), it achieves precision exceeding 74% while maintaining recall over 85%. Compared with traditional bibliometric methods, the proposed approach significantly improves precision and recall in frontier topic mining across three distinct fields (altitude sickness, recommendation systems, and oyster reef ecosystems), with performance exceeding 67%.
The D-Agents framework effectively mitigates the G-Agent's hallucination issue through its automatic generation and self-verification mechanism, thereby substantially enhancing the efficiency of frontier topic mining.
LLMs, Frontier Topics, Prompt Engineering, D-Agents, G-Agent, V-Agent, RAG
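The generate-then-verify pattern this abstract describes can be sketched in a few lines. In the paper both agents are LLM-backed; here the candidate generator, the scorer stubs, and the acceptance threshold are hypothetical stand-ins that only illustrate the control flow.

```python
def g_agent(corpus):
    """Stand-in generative agent: propose deduplicated candidate frontier topics."""
    return sorted({topic for doc in corpus for topic in doc["topics"]})

def v_agent(candidate, scorers, threshold=2):
    """Stand-in verification agent: accept a candidate if enough evaluation
    dimensions (e.g. novelty, accuracy, completeness) vote yes."""
    return sum(1 for score in scorers if score(candidate)) >= threshold

def mine_frontier_topics(corpus, scorers, threshold=2):
    """Pipeline: G-Agent generates, V-Agent filters."""
    return [t for t in g_agent(corpus) if v_agent(t, scorers, threshold)]
```

The point of the structure is that generation and verification are decoupled: hallucinated candidates from the generator are caught downstream because each must pass a majority of independent checks.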
Lingzhi Gao 1, Xuan Wang 2, Tianrun Cai 3, Xiunao Lin 4, and Chao Wu 1*
1 Zhejiang University, Hangzhou, China
2 China Media Group, Beijing, China
3 University of Manchester, Manchester, UK
4 Zhejiang Post & Telecommunication Construction Co., Ltd., Hangzhou, China
Chain-of-thought (CoT) prompting has shown great potential in enhancing the reasoning capabilities of large language models (LLMs), and recent studies have explored distilling this ability into smaller models. However, existing CoT distillation methods often overlook student model errors as valuable learning signals. In this paper, we propose CDFG, a two-stage distillation framework that treats model errors as opportunities for improvement. After an initial imitation-based training phase, the teacher model analyzes the student’s incorrect outputs and generates natural language feedback that highlights reasoning flaws and suggests correction strategies. The student model is then retrained using this guided input. Experiments on several mathematical reasoning benchmarks demonstrate that CDFG consistently improves student model performance. Our results show that incorporating feedback-driven learning into CoT distillation can enhance reasoning accuracy.
Chain-of-thought distillation, Large language model, Reasoning
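The two-stage loop this abstract describes (imitation, then retraining on teacher feedback about the student's errors) can be sketched schematically. The function names, the callables passed in, and the way feedback is appended to the input are hypothetical placeholders, not CDFG's actual interfaces.

```python
def distill_with_feedback(student, train_set, train, solve,
                          teacher_answer, teacher_feedback):
    """Two-stage feedback-driven distillation sketch.

    train_set: list of (question, gold_answer) pairs.
    train(student, pairs) -> updated student; solve(student, q) -> prediction;
    teacher_answer(q) -> teacher rationale/answer;
    teacher_feedback(q, wrong_pred) -> natural-language critique.
    """
    # Stage 1: imitation - student trains on teacher-generated targets.
    stage1 = [(q, teacher_answer(q)) for q, _ in train_set]
    student = train(student, stage1)

    # Stage 2: error-driven - the teacher critiques each wrong answer,
    # and the student retrains on the question augmented with that feedback.
    corrections = []
    for q, gold in train_set:
        pred = solve(student, q)
        if pred != gold:
            fb = teacher_feedback(q, pred)
            corrections.append((q + " [feedback] " + fb, teacher_answer(q)))
    return train(student, corrections)
```

The key design choice the abstract highlights is that stage 2 only fires on the student's actual mistakes, turning errors into targeted training signal instead of discarding them.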
Youssef Alothman and Mohamed Bader-El-Den
1 University of Portsmouth, UK
2 Abdullah Al Salem University, Kuwait
The semiconductor manufacturing industry generates large volumes of highly imbalanced, non-stationary, and operationally critical textual data. Although transformer-based language models achieve strong classification accuracy, their robustness and probability calibration under industrial constraints remain insufficiently addressed, particularly in resource-limited deployments. This paper proposes LiteFormer, a lightweight and calibrated transformer framework for imbalanced industrial text classification. The approach integrates geometry-aware minority oversampling using D-SMOTE, imbalance-sensitive optimization through Focal Loss, and post-hoc temperature scaling for probability calibration. Experimental evaluation on a large-scale industrial Root Cause Analysis corpus demonstrates improved macro-F1 performance and substantially reduced Expected Calibration Error compared to standard transformer baselines, while maintaining computational efficiency. Results under temporal and domain shift further confirm stable performance and reliable confidence estimation.
Imbalanced text classification, lightweight transformers, probability calibration, focal loss, industrial NLP
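The calibration ingredients named in this abstract (focal loss, post-hoc temperature scaling, and the Expected Calibration Error metric) have standard textbook forms, sketched below in plain Python. This is not LiteFormer's code; the temperature value, bin count, and binary focal-loss form are illustrative.

```python
import math

def focal_loss(p_t, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t), where p_t is the
    probability assigned to the true class. gamma > 0 down-weights easy examples."""
    return -((1 - p_t) ** gamma) * math.log(p_t)

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: T > 1 flattens (de-confidences) the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then take the bin-size-weighted
    gap between mean confidence and empirical accuracy in each bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(avg_conf - acc)
    return ece
```

Post-hoc temperature scaling fits a single scalar T on a held-out set to minimize negative log-likelihood and then divides all logits by T at inference; because it is monotone, it changes confidences (and hence ECE) without changing the predicted class or macro-F1.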