Full Description
Accelerate materials innovation using language models and machine learning methods
Language models and machine learning are transforming how researchers discover, design, and optimize advanced materials. AI-Powered Innovation in Materials Science: The Role of Language Models in Discovery and Design provides a systematic exploration of these methods, from data mining and predictive modeling to autonomous experimentation. Written by award-winning researchers from the University of Science and Technology Beijing, this reference connects foundational AI theory with practical implementations.
The book covers the evolution of language models in materials science, demonstrating methodologies through real-world case studies in energy, sustainability, and advanced manufacturing applications. Readers gain actionable insights into predicting material properties before experimental validation, optimizing synthesis pathways, and uncovering hidden correlations in materials data. The authors critically analyze current challenges while mapping future directions for materials intelligence research.
You'll also discover:
- Methodologies for integrating AI throughout the materials research pipeline, from initial data mining through autonomous experimentation and discovery workflows
- Practical case studies demonstrating how language models accelerate innovation in renewable energy, aerospace, and high-performance electronics applications
- Frameworks for predictive modeling that minimize costly trial-and-error processes while optimizing synthesis pathways for scalable material production
- Strategies for translating laboratory breakthroughs into practical manufacturing solutions through end-to-end lifecycle management and sustainability considerations
- Critical analysis of current limitations and a comprehensive roadmap for developing next-generation materials intelligence capabilities and research directions
Materials scientists, theoretical chemists, computational scientists, and computer scientists working at the intersection of AI and materials research will find this book invaluable. It provides the theoretical foundations and practical methodologies needed to accelerate materials development for grand challenges in energy, sustainability, and advanced manufacturing.
Contents
Preface xi
1 The Revolution of AI for Materials 1
1.1 Introduction 1
1.2 What Is AI4Mater? 1
1.2.1 Definition 1
1.2.2 History 2
1.2.3 Motivation 4
1.3 Foundations and Frontiers 5
1.4 Previous Works 6
1.4.1 Materials Data Infrastructure 6
1.4.2 Machine Learning in Materials 9
1.4.3 Autonomous Experiments 16
1.4.4 Intelligent Computation 20
1.4.5 Intelligent Manufacture 24
References 26
2 Fundamentals of Language Models and NLP 37
2.1 Introduction 37
2.2 Historical Evolution of NLP 38
2.2.1 Statistical Language Models 38
2.2.2 Machine Learning Models 39
2.2.3 Deep Learning Models 41
2.2.4 Pre-training and LLMs 42
2.3 Core Architectures in Modern NLP 44
2.3.1 Language Models 44
2.3.2 Encoder-decoder 46
2.3.3 Transformers 49
2.4 Training and Optimization Methods for LLMs 50
2.4.1 Pretraining Strategy 50
2.4.2 Fine-tuning Strategy 53
2.4.3 RAG 55
2.4.4 Agent 57
2.4.5 Reinforcement Learning 58
2.4.6 Corpus Building 59
2.5 Major Language Model Families 62
2.5.1 Word2Vec 62
2.5.2 BERT 65
2.5.3 GPT 68
2.5.4 T5 70
2.6 Practical Tools and Libraries for NLP 73
2.6.1 Hugging Face 73
2.6.2 PyTorch 75
2.6.3 NLTK 76
References 78
3 Reinforcement Learning in Materials 89
3.1 Introduction 89
3.1.1 The Basic Concepts of RL 90
3.1.2 The Development History of RL 94
3.2 Key Algorithms 98
3.2.1 Bellman Equation 98
3.2.2 Value-based Algorithm 104
3.2.3 Policy-based Algorithm 110
3.2.4 AC Methods 111
3.3 Typical Applications of RL 117
3.3.1 Applications of RL in Materials Science 118
3.3.2 Applications of RL in LLMs 128
References 132
4 Materials Word Embedding Models 135
4.1 Introduction 135
4.2 Unsupervised Word Embeddings Capture Latent Knowledge 136
4.2.1 Foundations of Unsupervised Word Embeddings for Materials Science 136
4.2.2 Encoding Scientific Knowledge Through Semantic Relationships 138
4.2.3 Element Embedding Spaces and Periodic Table Correlations 141
4.2.4 Discovering and Predicting Materials with Word Embeddings 142
4.2.5 Historical Validation and Temporal Trends in Discovery Prediction 144
4.2.6 Unconventional Discoveries and Knowledge Beyond Composition 146
4.2.7 Conclusion 147
4.3 Context Similarity for Designing UHEAs 148
4.3.1 From Linguistics to Materials Science 148
4.3.2 Element Embeddings and Chemical Intuition 149
4.3.3 From Similarity Scores to Alloy Discovery 150
4.3.4 Correlation with Thermodynamic Predictors 151
4.3.5 Designing Lightweight HEAs 152
4.3.6 Integration with ICME and KGs 153
4.4 Conclusion 153
References 153
5 Materials Transformer-based Models 157
5.1 Introduction 157
5.2 Encoder-based Models for Materials 157
5.2.1 BatteryBERT 158
5.2.2 MatSciBERT 169
5.3 Decoder-based Models for Materials 179
5.3.1 Chemistry Assistant: A Decoder-based Model for MOF Synthesis Text Mining and Prediction 179
5.3.2 NatureLM: Unlocking the Language of Nature for Scientific Discovery 189
5.4 Conclusion 200
References 200
6 Materials Data Extraction from Literature by NLP and Large Language Models 205
6.1 Introduction 205
6.2 The Pipeline for Automatically Extracting Materials Data Using NLP 206
6.2.1 Overview of NLP and Its Differences from LLMs in Data Extraction 207
6.2.2 Traditional NLP Pipeline 209
6.2.3 Recent Developments Using LLMs 212
References 214
7 Case Studies of Chemical Information Extraction 219
7.1 Introduction 219
7.2 ChemDataExtractor 219
7.3 A General-purpose Extraction Pipeline for Polymer Properties 229
References 239
8 Case Studies of Alloy Information Extraction 243
8.1 Introduction 243
8.2 Automated Pipeline for Superalloy Data by Text Mining 243
8.3 Alloy Synthesis and Processing by Semi-supervised Text Mining 266
References 291
9 Case Studies of Materials Synthesis Information Extraction 299
9.1 Introduction 299
9.2 Machine-learned and Codified Synthesis Parameters of Oxide Materials 299
9.3 Automated Extraction of Chemical Synthesis Actions From Experimental Procedures 306
References 314
10 Materials Predictive Modeling with Language-augmented Approaches 317
10.1 Introduction 317
10.2 Materials Descriptors and Representations 318
10.3 Structured Descriptors in Materials Science 318
10.3.1 Understanding Structured Descriptors 318
10.3.2 Element-based Descriptors of Materials Informatics 319
10.4 Textual and Contextual Features from Literature 323
10.4.1 Sparse Vector Space Models 323
10.4.2 Dense Vector Representations 324
10.4.3 Comparison of Different Methods 326
10.5 Atomic Structure and Graph-based Features 326
10.5.1 From Atomic Coordinates to Graph Representations 327
10.5.2 Classical Graph-derived and Structural Fingerprints 328
10.5.3 GNNs Learning Features from Atomic Architectures 329
10.6 Strategies for Multimodal Data Fusion 332
10.6.1 Early Fusion: Feature-level Integration 333
10.6.2 Intermediate Fusion: Joint Learning and Hybrid Architectures 333
10.6.3 Late Fusion: Decision-level Integration and Ensemble Methods 334
10.6.4 Multimodal Strategies in Materials Informatics 334
References 336
11 Case Studies of Materials Predictive Modeling 339
11.1 Introduction 339
11.2 Polymer Property Prediction with Language Models 339
11.2.1 Data Preparation 340
11.2.2 Model Architecture and Pre-training 341
11.2.3 Fingerprints Characterization 345
11.2.4 Performance Evaluation and Future Potential 349
11.3 Steel Design and Process Optimization 351
11.3.1 Corpus Collection 353
11.3.2 Pre-training of SteelBERT 355
11.3.3 Prediction Model Architecture 356
11.3.4 SteelBERT Evaluation 357
11.3.5 SteelBERT Interpretability 358
11.3.6 Mechanical Property Prediction 361
11.3.7 Steel Design 365
11.4 Transformer-generated Atomic Embeddings to Enhance Prediction Accuracy 368
11.4.1 Leveraging Language Models for Material Representation 368
11.4.2 Characterizing Transformer Material Representations 369
11.4.3 Enhancing Property Prediction Accuracy with Embeddings 370
11.4.4 Summary and Future Directions 372
11.5 Fine-tuning Foundation Models for Materials Discovery 374
11.5.1 A Two-stage Fine-tuning Strategy for Domain Adaptation 375
11.5.2 An Empirical Evaluation of Fine-tuning Strategies 376
11.5.3 Unpacking the Benefits of MTL 378
11.5.4 Benchmarking and Application to Bandgap Prediction 379
References 380
12 Retrieval-augmented Generation for Materials Large Language Models 387
12.1 Introduction 387
12.2 LLM-powered Research Assistants: Design Principles 387
12.2.1 Design Principles for Literature-related Functions 388
12.2.2 Design Principles for Experimental Design 389
12.2.3 Design Principles for Data Analysis 389
12.2.4 Design Principles for Theoretical Research 390
12.3 Overview of the RAG Framework 391
12.3.1 Naive RAG 392
12.3.2 Advanced RAG 394
12.3.3 Modular RAG 395
12.4 Retrieval in RAG Framework 398
12.4.1 Retrieval Source 398
12.4.2 Indexing Optimization 400
12.4.3 Query Optimization 401
12.4.4 Embedding 402
12.4.5 Adapter 403
12.5 Case Studies in Materials 404
12.5.1 Methodology 405
12.5.2 Graph RAG Versus G-RAG 405
12.5.3 The G-RAG System Architecture: Key Components and Workflow 405
12.5.4 PDF Parsing 406
12.5.5 Entity Linking and Relation Extraction 407
12.5.6 Span Parser 408
12.5.7 Passage Processor 410
12.5.8 Experimental Settings 410
12.5.9 Results and Discussion 411
References 413
13 Fine-tuning and Application for Materials Large Language Models 419
13.1 Introduction 419
13.2 The Evolution, Capabilities, and Adaptation Strategies for LLMs 419
13.2.1 Data Preprocessing 421
13.2.2 Architectures 422
13.2.3 Pretraining Objectives 422
13.2.4 Supervised Fine-tuning 424
13.2.5 LLM Evaluation 425
13.3 Case Studies for Materials LLM Fine-tuning 427
13.3.1 Language-interfaced Fine-tuning for Classification and Regression 428
13.3.2 Beyond Fine-tuning of OpenAI Models 430
13.3.3 Regression 432
13.3.4 Stretching the Limits 433
13.4 LLM Evaluation Example 435
13.4.1 Benchmark Corpus 436
13.4.2 Benchmark Suite Design 437
13.4.3 Overall System Performance 438
13.4.4 Confidence Estimation 440
References 441
14 Materials Agents for Autonomous Research 449
14.1 Introduction 449
14.2 Architecture of Autonomous Materials Research Agents 450
14.2.1 Perception 451
14.2.2 Decision-making 451
14.2.3 Execution 452
14.2.4 Integration Framework of LLMs and Robotic Laboratories 453
14.2.5 Coordination Mechanisms of Heterogeneous Systems 462
14.3 Core Techniques in Agents 467
14.3.1 Natural Language Interaction Interfaces for Equipment Control 467
14.3.2 Design of Transformer Models for Experimental Planning 475
14.3.3 Reinforcement Learning Strategies for Iterative Optimization 485
References 491
15 Case Studies of Materials Agents 505
15.1 Introduction 505
15.2 AtomAgents: Alloy Design and Discovery Through Multi-agents 505
15.2.1 System Architecture of AtomAgents 506
15.2.2 Experimental Design and Verification 507
15.3 Coscientist: Autonomous Chemical Research with LLMs 512
15.3.1 System Architecture and Module Functions 513
15.3.2 Experimental Verification and Performance 515
15.4 ChemAgent: A Multiagent-driven Robotic AI Chemist Enabling Autonomous Chemical Research on Demand 522
15.4.1 Core Technology and System Architecture 522
15.4.2 Experimental Tasks and Performance Verification 525
15.5 ChatMOF: An AI System for Predicting and Generating MOFs using LLMs 535
15.5.1 Agent Module 536
15.5.2 Toolkit Module 536
15.5.3 Evaluation Module 544
15.6 Summary and Prospect 546
References 548
16 Challenges and Future Developments 551
16.1 Challenges 551
16.1.1 Numerical Understanding 551
16.1.2 Quantitative Prediction 551
16.1.3 Efficiency and Resource Optimization 552
16.1.4 Scientific Reasoning 552
16.2 Future Developments 553
16.2.1 Materials Data Circulation Infrastructure 553
16.2.2 Advances in AI Power, Algorithms, Models, and Tools 554
16.2.3 Federated Training of Materials Science LLMs 554
16.2.4 Intelligent Experimental Techniques 555
16.2.5 Multiscale Modeling and Digital Twin Technology 556
16.2.6 Building an Interdisciplinary Workforce 556
Index 559