Full Description
This book focuses on the new paradigm of artificial intelligence and systematically introduces the key technologies, foundation models, and typical applications of multimodal large models. To make the technical content accessible to lower-year undergraduate students and newcomers to the AI field, the book presents each key technical point in an easy-to-understand manner and provides numerous intuitive examples. It analyses in depth the structure and techniques of several classic multimodal large models. The aim is to offer readers a clear guide to the technical methods, open-source platforms, and application scenarios of multimodal large models, and to provide insights into achieving artificial general intelligence (AGI), including cutting-edge technologies such as causal reasoning, world models, embodied intelligence, and multi-agent systems. The book aspires to provide a clear perspective for both academia and industry, helping AI researchers gain a more comprehensive understanding of multimodal large model technologies and the development directions of the next generation of artificial intelligence.
The book is divided into five chapters. Chapter 1 explores the most representative large model structures in depth. Chapter 2 provides a thorough analysis of the core technologies of multimodal large models. Chapter 3 introduces several representative multimodal large models. Chapter 4 delves into three typical applications: visual question answering, AI-generated content (AIGC), and embodied intelligence. Chapter 5 discusses feasible approaches to achieving artificial general intelligence (AGI).
This book is suitable not only as a textbook for senior undergraduate and graduate students in relevant university programs but also as an essential reference for IT professionals. The Chinese version of this book has been selected for the undergraduate textbook series at Sun Yat-sen University.
The translation was done with the help of artificial intelligence. A subsequent human revision was done primarily in terms of content.
Contents
- 1 The Large Model Family
- 2 Core Technology of Multimodal Large Models
- 3 Multimodal Foundation Models
- 4 Applications of Multimodal Large Models
- 5 Multimodal Large Models Towards AGI



