Summer Reading: AI Edition, A Review of “Generative Deep Learning: Teaching Machines to Paint, Write, Compose, and Play” by David Foster (O’Reilly Second Edition)

Blog

May 25, 2024

Sarah Oh Lam

May 25, 2024

Sarah Oh Lam

Artificial Intelligence

For your poolside reading, go ahead and add Generative Deep Learning to the summer reading bag. It’s not a legal thriller or self-help book, but it will improve your thinking for fall policy discussions! Policymakers may find that for less than $50, it could be worth the time and money to better understand how generative AI works under the hood.

Generative Deep Learning, Second Edition 2023, by David Foster
Source: O’Reilly Media, Inc.

The first part of the book explains the neural networks, weights, and probability analysis used by learning models to output prediction estimates. A good way to describe this first part of the book is the question, “Is this image a picture of a chihuahua or muffin?” For people who remember AI discussions ten years ago, this was the canonical example of artificial intelligence. Economists with training in econometrics will likely be quick studies of this section of the book. For any data scientists who have dabbled in Kaggle competitions, this section of the book explains the stacking and dropping layers to discover weights that are used to get make predictions.

The second part of the book goes into the “generative” mechanics that have made this wave of AI innovation such a paradigm-shift. Teaching machines how to paint and compose words, images, and audio has been the work of computer scientists over the last ten years with rapid consumer use in the last two.

How Generative AI Works

The book includes updates as of publication in May 2023, such as Midjourney, Dall.E 2, and Stable Diffusion. It’s been a year already since this book’s publication and in May 2024, these chapters may already be dated. But even as a May 2023 edition, it explains a lot of why current generative AI models are so powerful. It also explains large multimodal models such as text-to-image models, such as ControlNet, which is built on a Stable Diffusion model with fine-grained control of output. The figure below shows the ability of the model to change the image based on textual descriptions and prompts.

ControlNet, built on Stable Diffusion
Source: Lvmin Zhang, ControlNet (Foster, page 403, Fig. 14-6)

Another interesting section of the book describes so-called “world models” for reinforcement learning, which is from research that was first published in 2018.[1] Reinforcement learning is one of three major branches of machine learning, which also includes supervised learning (using labeled data) and unsupervised learning (using unlabeled data).

The training process in reinforcement learning is behind many of today’s generative AI applications. For example, the book shows readers the inner workings of a world model and the iterative and statistical evolutionary strategy for making predictions.

Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
Source: CMA-ES, Wikipedia (Foster, Fig. 12-14, page 351)

The world model discussion may be reminiscent of computer science materials from the 1990s, and that’s because the basics are the same with a Model-View-Controller (MVC) software design architectures. Each part of the world model is trained separately. The figure below is an example from the book of the view, the model, and the controller. The training process in reinforcement learning allows models to train themselves on new tasks based on their understanding of the environment.

Reinforcement Learning in a “World Model” Architecture.
Source: Foster, Fig. 12-3, p. 336

The book also includes a chapter on generative music, explaining how music generation is a sequence prediction problem. Economists will recognize the autoregressive models from time series analysis that are applied here in treating musical notes as data sequences of tokens. Notes and durations of notes are tokenized or parsed to create training datasets.

Width Window of 4 Sixteenth Notes in Music Model
Source: Foster, p. 305, Fig. 11-5.

Heat Map of Prediction of Next Musical Note in Music Model
Source: Foster, p. 310, Fig. 11-9.

The inputs and outputs in a “musical Transformer model” are chunked in input windows of various sizes in the diagram. A width window of 4 sixteenth notes is illustrated. The next diagram shows how distributions of notes over time are collected and shown in a heat mat. The next note is predicted based on the training data.

How Much Do Policymakers Need to Know?

How much do policymakers need to know about how generative AI works? On the one hand, knowing how generative AI works is critically important for policymakers. As innovation moves forward and resources move with it, policymakers would do well to have a baseline understanding of the machinery in generative AI in order to direct institutional expertise to the policy questions around these new technologies.

On the other hand, understanding many of the details probably isn’t necessary for most policymakers. Do policymakers really need to understand how electricity or nuclear power is made? Isn’t the job rather to keep watch on Congressional lawmaking, administrative law, and separation of powers? Institutions such as the courts, Congress, and federal agencies are filled with people who are asking the questions, collecting the data, and hearing from experts in order to interpret the impacts of AI on society. The people who make decisions in these institutions should have a basic understanding of technology proportionate to the task at hand.

That optimal level of understanding is somewhere between competent and expert. Reading materials such as this accessible guide to Generative Deep Learning may just be what you need to keep up this summer!

[1] David Ha, Jurgen Schmidhuber, World Models (2018), https://arxiv.org/abs/1803.10122.

Sarah Oh Lam

+ posts

Sarah Oh Lam is a Senior Fellow at the Technology Policy Institute. Oh completed her PhD in Economics from George Mason University, and holds a JD from GMU and a BS in Management Science and Engineering from Stanford University. She was previously the Operations and Research Director for the Information Economy Project at George Mason School of Law. She has also presented research at the 39th Telecommunications Policy Research Conference and has co-authored work published in the Northwestern Journal of Technology & Intellectual Property among other research projects. Her research interests include law and economics, regulatory analysis, and technology policy.

Share This Article

View More Publications by
Sarah Oh Lam

Explore More Topics

Antitrust and Competition 181

Artificial Intelligence 34

Big Data 21

Blockchain 29

Broadband 382

China 2

Content Moderation 15

Economics and Methods 36

Economics of Digitization 15

Evidence-Based Policy 18

Free Speech 20

Infrastructure 1

Innovation 2

Intellectual Property 56

Miscellaneous 334

Privacy and Security 137

Regulation 12

Trade 2

Uncategorized 4

Shane Greenstein on Co-Invention and the Geography of AI Innovation

November 17, 2025

From Simple to Impossible: How Task Complexity Limits AI Research Assistants

November 4, 2025

Measuring AI Intensity by Occupation: Adjusting for Workforce Size

October 24, 2025

AI’s Energy Crisis Is Not What You Think

September 25, 2025

Needham’s Laura Martin on Why Disney Should Ditch ABC

July 30, 2025

Want AI Leadership? Stop Attacking the Science That Creates It

July 4, 2025

Does Agentic AI Require New Policy Frameworks?

May 21, 2025

DISCOVER THE LATEST IN TECH POLICY

Shane Greenstein on Co-Invention and the Geography of AI Innovation

From Simple to Impossible: How Task Complexity Limits AI Research Assistants

Monetizing AI: Subscriptions, Ads, or Something New with Catherine Tucker

EXPLORE OUR RESEARCH TOPICS

Antitrust and Competition

Artificial Intelligence

Broadband

Content Moderation

Economics and Methods

Free Speech

Evidence-Based Policy

Intellectual Property

Privacy and Security

Miscellaneous

Antitrust and Competition

Broadband

Content Moderation

Economics and Methods

Economics of Digitization

Evidence-Based Policy

Intellectual Property

Miscellaneous

Privacy and Security

Summer Reading: AI Edition, A Review of “Generative Deep Learning: Teaching Machines to Paint, Write, Compose, and Play” by David Foster (O’Reilly Second Edition)

How Generative AI Works

How Much Do Policymakers Need to Know?

Sarah Oh Lam

Share This Article

Get The Latest On Tech Policy In Your Inbox

View More Publications by Sarah Oh Lam

Recommended Reads

Avi Goldfarb on AI and Predictive Analytics

Kristina McElheran on The Effects of AI on Workers and Firms

The Case for Not Regulating AI … Or, At Least, Not Yet

Explore More Topics

Related Articles

Shane Greenstein on Co-Invention and the Geography of AI Innovation

From Simple to Impossible: How Task Complexity Limits AI Research Assistants

Measuring AI Intensity by Occupation: Adjusting for Workforce Size

AI’s Energy Crisis Is Not What You Think

Needham’s Laura Martin on Why Disney Should Ditch ABC

Want AI Leadership? Stop Attacking the Science That Creates It

Does Agentic AI Require New Policy Frameworks?

Preserving the Bottom Rung: How We Can Protect Experiential Learning in the AI Era

Connect with Us

Sign Up For Updates

Sign Up for Updates

View More Publications by
Sarah Oh Lam