Full Description
AI images flood feeds, yet the models behind them feel mysterious. Relying on black boxes risks bias, errors, and costly creative dead ends. You deserve hands-on skills to build, audit, and improve these generators yourself. This book starts from a blank notebook, guiding you through every line of Python code. Learn transformers for vision, then craft diffusion models that sharpen noise into art. Finish with a custom system generating high-resolution images from any text prompt.
- Vision transformer anatomy: Decode image patches and attention flows for transparent decision paths.
- End-to-end diffusion pipeline: Transform random noise into detailed, photorealistic pictures you can trust.
- Captioning and classification builds: Extend models to describe or categorize images for downstream tasks.
- Fine-tuning walkthroughs: Adapt pretrained networks quickly, saving compute while boosting domain accuracy.
- Deepfake detection skills: Differentiate authentic photos from generated fakes to safeguard projects and brands.
- Fully runnable notebooks: Experiment, tweak, and visualize results without configuration hassles.
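The diffusion pipeline mentioned above rests on one simple mechanic: clean images are gradually corrupted with noise according to a variance schedule, and a network is trained to reverse that corruption. A minimal sketch of the forward-noising step, using NumPy and a toy linear beta schedule (illustrative only, not code from the book):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear beta (noise-variance) schedule over T timesteps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_cumprod = np.cumprod(1.0 - betas)  # a_bar_t: how much signal survives to step t

def add_noise(x0, t):
    """Forward diffusion: jump a clean sample x0 directly to timestep t via
    x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps, with eps ~ N(0, I)."""
    a_bar = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps, eps

x0 = np.ones((8, 8))           # stand-in "image"
x_mid, _ = add_noise(x0, 500)  # partially noised
x_end, _ = add_noise(x0, 999)  # almost pure noise: a_bar_999 is near zero
```

Training a denoiser to predict `eps` from `x_t` and `t`, then running the reverse process, is exactly what turns random noise into pictures.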
In Build a Text-to-Image Generator (from Scratch), the author combines clear prose, diagrams, and production-ready Python to deliver practical authority.
Starting with patch tokenization, you implement a vision transformer, then pivot to diffusion models. Step-by-step chapters layer theory, code, and visual outputs, ensuring concepts click before you move on. By the final page you can craft, tune, and deploy image generators that suit your data, budget, and ethical standards. You control every hyperparameter and understand every pixel produced.
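Patch tokenization, the starting point described above, simply cuts an image into fixed-size squares and flattens each square into a vector, giving the transformer a token sequence to attend over. A hypothetical NumPy sketch (function name and shapes are illustrative, not the book's code):

```python
import numpy as np

def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into flattened patch tokens of shape
    (num_patches, patch_size * patch_size * C)."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    return (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
             .transpose(0, 2, 1, 3, 4)   # group the two patch-grid axes together
             .reshape(-1, patch_size * patch_size * c)
    )

# A 32x32 RGB image cut into 8x8 patches yields 16 tokens of length 192.
tokens = image_to_patches(np.zeros((32, 32, 3)), patch_size=8)
print(tokens.shape)  # (16, 192)
```

In a real vision transformer each token is then linearly projected to the model dimension and given a positional embedding before entering the attention layers.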
Ideal for data scientists and Python-savvy enthusiasts eager to master state-of-the-art image generation.
Contents
PART 1: UNDERSTANDING ATTENTION AND TRANSFORMERS
1 A TALE OF TWO MODELS: TRANSFORMERS AND DIFFUSION MODELS
2 BUILD A TRANSFORMER
3 CLASSIFY IMAGES WITH A VISION TRANSFORMER (VIT)
4 ADD CAPTIONS TO IMAGES
PART 2: INTRODUCTION TO DIFFUSION MODELS
5 GENERATE IMAGES WITH DIFFUSION MODELS
6 CONTROL WHAT IMAGES TO GENERATE IN DIFFUSION MODELS
7 GENERATE HIGH-RESOLUTION IMAGES WITH DIFFUSION MODELS
PART 3: TEXT-TO-IMAGE GENERATION WITH DIFFUSION MODELS
8 CLIP: A MODEL TO MEASURE THE SIMILARITY BETWEEN IMAGE AND TEXT
9 TEXT-TO-IMAGE GENERATION WITH LATENT DIFFUSION
10 A DEEP DIVE INTO STABLE DIFFUSION
PART 4: TEXT-TO-IMAGE GENERATION WITH TRANSFORMERS
11 VQGAN: CONVERT IMAGES INTO SEQUENCES OF INTEGERS
12 A MINIMAL IMPLEMENTATION OF DALL-E
PART 5: NEW DEVELOPMENTS AND CHALLENGES
13 NEW DEVELOPMENTS AND CHALLENGES IN TEXT-TO-IMAGE GENERATION
APPENDIX
INSTALL PYTORCH AND ENABLE GPU TRAINING LOCALLY AND IN COLAB