Editor’s Note: This post is a deep dive companion to the original AIU article Beyond the Board: How Samuel’s Checkers AI Revolutionized Machine Learning with a Dash of ‘Digital Deception’. Where that piece introduced the narrative arc and spirit of Samuel’s work, this investigation goes significantly deeper — examining the technical architecture of the program itself, its true historical record versus popular mythology, the direct genealogical line from Samuel’s evaluation function to modern deep reinforcement learning, and the philosophical questions about machine autonomy that his program raised decades before anyone had language for them.

Chapter 1: The Long Road to IBM — Who Was Arthur Samuel?

It is one of the peculiar ironies of intellectual history that the person who named one of the twenty-first century’s most consequential technologies spent most of his career working on something else entirely. Arthur Lee Samuel was born on December 5, 1901, in Emporia, Kansas — a fact that puts his most famous work in remarkable perspective. By the time he published the paper that coined the term machine learning in 1959, he was fifty-seven years old, a semi-retired researcher at IBM, and had already spent three decades working on vacuum tubes, radar technology, and electromagnetic engineering (McCarthy & Feigenbaum, 1990).

Samuel’s path to AI was anything but direct. He earned his bachelor’s degree from the College of Emporia in 1923 and went on to receive a Master of Science in Electrical Engineering from MIT in 1926. After two years as an instructor at MIT, he joined Bell Telephone Laboratories in 1928, where he spent the next eighteen years working on what were then the practical cutting-edge problems of the age: improving vacuum tubes, designing circuitry for radar systems during World War II, and contributing to the theoretical underpinnings of early transistor research (Wiederhold & McCarthy, 1992). None of this looked like the biography of someone about to change the course of artificial intelligence.

The pivot came in 1946, when Samuel accepted a professorship in electrical engineering at the University of Illinois at Urbana-Champaign. It was there, amid the intellectual ferment of one of the first university computing programs in the United States, that Samuel first conceived of the checkers project. He was participating in the early design of one of the first electronic computers, and a question began to nag at him: could a digital computer be programmed not just to follow instructions, but to improve its own performance over time? Games, he reasoned, were a perfect laboratory. The rules were fixed and finite. Performance was objectively measurable. And checkers — deceptively simple, strategically rich — was the ideal candidate (McCarthy & Feigenbaum, 1990).

In 1949, Samuel joined IBM’s Research Laboratory in Poughkeepsie, New York, bringing the checkers idea with him. Three years later, in 1952, he wrote the first working version of the program on IBM’s first commercial computer, the IBM 701. That early version could play a legal game of checkers. It could not yet learn. That would come next.

Key figures: 1901, Samuel’s birth year (he was 57 when he coined “machine learning”); 1952, the first working checkers program, written on the IBM 701; 8–10 hours, the machine time needed to surpass the programmer, per Samuel’s 1959 paper; 2,805, citations of the 1959 IBM Journal paper as of July 2019, more than Shannon’s chess paper.

Visual 1 Arthur Samuel’s Checkers Program — Key Milestones, 1949–1990
Timeline of key program milestones. Sources: Samuel (1959, 1967); McCarthy & Feigenbaum (1990); IEEE Computer Pioneer Award records.

Chapter 2: The Architecture of a Learning Machine

To appreciate what Samuel built, you have to appreciate what he was not allowed to build with. The IBM 701 had a memory of roughly 2,048 words, each thirty-six bits long. There was no hard drive, no graphics processor, no persistent storage between sessions. Programs were loaded from magnetic tape. This was the substrate on which Samuel had to implement a system capable of strategic self-improvement. The constraints forced him toward architectural decisions that turned out to be theoretically profound.

Minimax and the Search Tree

The foundational engine of Samuel’s program was a minimax search tree — a technique that had been described theoretically by John von Neumann and was being explored simultaneously by Claude Shannon for chess. At each position, the program would generate a tree of all legal moves, then all responses to those moves, then all responses to those responses, down to some pre-set depth. It would then evaluate the “leaf” positions at the bottom of the tree using a scoring function, and work backward — the program assumed its opponent would always choose the move that minimized the program’s score, while the program chose the move that maximized it. This is the classical adversarial search framework that still underlies game-playing AI today (Samuel, 1959).
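The backed-up scoring just described can be sketched in a few lines of Python. This is a minimal illustration of classical minimax, not Samuel’s actual code: the toy tree, `legal_moves`, and `evaluate` below are hypothetical stand-ins for his checkers move generator and scoring polynomial.

```python
# Classical minimax over a game tree: maximize on our turns, assume the
# opponent minimizes on theirs. A teaching sketch, not Samuel's program.

def minimax(position, depth, maximizing, legal_moves, evaluate):
    moves = legal_moves(position, maximizing)
    if depth == 0 or not moves:
        return evaluate(position)  # score the leaf with the evaluation function
    scores = (minimax(m, depth - 1, not maximizing, legal_moves, evaluate)
              for m in moves)
    return max(scores) if maximizing else min(scores)

# Toy two-ply tree: A is our move, B/C are opponent replies, D..G are leaves.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
leaf_scores = {"D": 3, "E": 5, "F": 2, "G": 9}

best = minimax("A", 2, True,
               lambda p, _: tree.get(p, []),
               lambda p: leaf_scores.get(p, 0))
print(best)  # 3: branch B guarantees at least 3, branch C only 2
```

The backward pass is the “work backward” step in the text: each interior node inherits the best (or worst) of its children’s scores rather than being scored directly.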

The scoring function was the intellectual heart of the enterprise. Samuel devised a polynomial that combined dozens of board features: piece count, “men” versus “kings,” mobility of pieces, control of the center, control of the back rank, and more. Each feature had an associated weight, and the weighted sum became the board’s predicted value. The critical insight was that these weights did not have to be set by human expertise. They could be learned.
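A linear scoring polynomial of this shape is simple to write down. The feature names and weights below are illustrative placeholders, not Samuel’s actual terms or values:

```python
# Weighted-sum board evaluation in the spirit of Samuel's scoring polynomial.
# Positive scores favor the program; features and weights are illustrative.

FEATURES = ("piece_advantage", "king_advantage", "mobility", "center_control")

def evaluate(position, weights):
    # `position` maps each feature name to a signed numeric measurement,
    # taken from the program's point of view.
    return sum(weights[f] * position[f] for f in FEATURES)

board   = {"piece_advantage": 2, "king_advantage": 0,
           "mobility": 3, "center_control": 1}
weights = {"piece_advantage": 4.0, "king_advantage": 8.0,
           "mobility": 0.5, "center_control": 1.0}
print(evaluate(board, weights))  # 10.5  (2*4.0 + 0*8.0 + 3*0.5 + 1*1.0)
```

The learning machinery described in the following sections exists precisely so that these weights, typed in by hand here, can instead be adjusted automatically.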

Rote Learning: Memory as Extended Search

Samuel’s first learning method was conceptually elegant. In what he called rote learning, the program remembered every board position it had already evaluated, along with the minimax score it had assigned at that moment. The next time that position appeared — in a future game or later in the same game — the program could substitute the stored, pre-computed value rather than re-running the search from scratch. This effectively gave the program deeper analytical reach than its raw computation speed would suggest: “if a position that had already been encountered were to occur again as a terminal position of a search tree… the depth of the search was effectively amplified” (Sutton & Barto, Chapter 11.2). Samuel was, in essence, inventing a primitive but functional form of what we now call a transposition table.
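The mechanism amounts to memoizing backed-up search values, what a modern engine would call a transposition table. A toy sketch, with `run_search` standing in for the real minimax pass and a plain dict for the stored positions (a real program would key on a canonical board encoding):

```python
# Rote learning as a position cache: store each position's backed-up score
# the first time it is searched, and reuse it when the position recurs.

cache = {}

def search_with_rote(position, depth, run_search):
    if position in cache:        # seen before: reuse the remembered value,
        return cache[position]   # effectively deepening the search for free
    value = run_search(position, depth)
    cache[position] = value      # remember it for later in this or future games
    return value

calls = []
def toy_search(position, depth):
    calls.append(position)       # count how often the real search runs
    return 42

print(search_with_rote("P1", 3, toy_search))  # 42  (computed)
print(search_with_rote("P1", 3, toy_search))  # 42  (recalled from memory)
print(len(calls))                             # 1   (search ran only once)
```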

Generalization Learning: The Program That Rewrote Itself

The second, more important method was what Samuel called generalization learning. Here the program played against itself: one version of the program held its evaluation function constant and acted as the benchmark, while the other version had its weights continuously adjusted in response to wins and losses. Features whose weight had been set too high or too low would be corrected iteratively, game after game. The program was, in the most literal sense, rewriting its own beliefs about what mattered on a checkerboard based solely on the outcomes of games it played. It required no human input beyond the initial list of board features — and Samuel’s paper notes that even those could include “wrong signs and relative weights” and the system would still converge on useful values (Samuel, 1959).
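In outline, the loop holds one set of weights fixed as a benchmark and nudges the other after each game. The update rule below is a deliberate simplification of Samuel’s correlation-based adjustments, shown only to make the feedback loop concrete; `feature_trace` is a hypothetical summary of how strongly each feature showed up for the learner during the game.

```python
# Simplified generalization-learning update: after each self-play game, move
# each weight in the direction suggested by the outcome (+1 win, -1 loss),
# scaled by how strongly its feature was present during the game.

def update_weights(weights, feature_trace, outcome, lr=0.1):
    return {f: w + lr * outcome * feature_trace.get(f, 0.0)
            for f, w in weights.items()}

weights = {"mobility": 1.0, "center_control": 0.5}
trace   = {"mobility": 2.0, "center_control": -1.0}  # center play hurt us

weights = update_weights(weights, trace, outcome=+1)
print(weights)  # mobility weight rises, center_control weight falls
```

Repeated over thousands of games, adjustments of this kind are what let the program converge on useful weights even from a feature list with “wrong signs and relative weights.”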

The learning mechanism Samuel implemented was, as later formal analysis would confirm, an early instance of what Richard Sutton would formalize in 1988 as temporal difference (TD) learning. For each pair of successively evaluated positions, the program used the difference between the two evaluations to adjust the earlier position’s score. Sutton’s landmark 1988 paper explicitly identified Samuel’s checkers player as “the earliest and best-known use of a TD method” (Sutton, 1988). This is not historical trivia — it is the direct ancestral line that runs from Samuel’s IBM 701 program through TD-Gammon, through DeepMind’s Atari-playing Deep Q-Network, all the way to AlphaGo and AlphaZero.
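That difference-driven adjustment can be shown in miniature. This is the standard TD(0) update in Sutton’s later formulation, with step size `alpha`; Samuel’s own scheme differed in its details:

```python
# TD(0)-style update: move the earlier position's value a fraction of the
# way toward the later position's value, as Sutton (1988) formalized.

def td_update(values, earlier, later, alpha=0.1):
    values[earlier] += alpha * (values[later] - values[earlier])

V = {"s0": 0.0, "s1": 1.0}   # evaluations of two successive positions
td_update(V, "s0", "s1")
print(V["s0"])               # 0.1: s0 moved 10% of the way toward s1
```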

Visual 2 Inside Samuel’s Learning Engine: Rote vs. Generalization Learning
Samuel’s two learning tracks fed into the same minimax search tree. Rote learning extended effective search depth via memory; generalization learning adapted feature weights through self-play. Modern equivalents in parentheses. Sources: Samuel (1959); Sutton & Barto, Reinforcement Learning: An Introduction (Chapter 11.2).
Key Technical Concepts — Samuel’s Toolkit

Alpha-Beta Pruning: Added in his 1967 follow-up paper, this technique eliminated branches of the search tree that could not possibly influence the final decision — dramatically reducing the number of positions the program needed to evaluate. It remains a cornerstone of adversarial AI today.

Signature Tables: Hierarchical lookup tables introduced in the 1967 version to represent the evaluation function more efficiently than a simple linear polynomial, capturing non-linear feature interactions. A precursor to modern multi-layer representations.

Hill Climbing: Used to search for better feature weight configurations by continuously making small adjustments and keeping only those that improve performance. A conceptual ancestor of gradient descent.

Self-Play: The program’s ability to improve by playing thousands of games against itself, without human opponents or labeled data, is perhaps Samuel’s most enduring contribution — the same paradigm used by AlphaGo Zero six decades later.
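Of these, alpha-beta pruning is the easiest to make concrete in code. The sketch below bolts the pruning test onto the plain minimax recursion described earlier in this chapter; as before, `legal_moves` and `evaluate` are hypothetical hooks, and the toy tree exists only to show a branch being cut.

```python
# Minimax with alpha-beta pruning: once one branch guarantees a result the
# opponent would never allow, its remaining siblings are skipped entirely.

def alphabeta(pos, depth, alpha, beta, maximizing, legal_moves, evaluate):
    moves = legal_moves(pos, maximizing)
    if depth == 0 or not moves:
        return evaluate(pos)
    if maximizing:
        best = float("-inf")
        for m in moves:
            best = max(best, alphabeta(m, depth - 1, alpha, beta, False,
                                       legal_moves, evaluate))
            alpha = max(alpha, best)
            if beta <= alpha:   # opponent already has a better alternative
                break           # elsewhere, so this branch cannot matter
        return best
    best = float("inf")
    for m in moves:
        best = min(best, alphabeta(m, depth - 1, alpha, beta, True,
                                   legal_moves, evaluate))
        beta = min(beta, best)
        if beta <= alpha:
            break
    return best

# Toy tree: once branch B guarantees 3, leaf G under branch C is pruned.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
leaf_scores = {"D": 3, "E": 5, "F": 2, "G": 9}
evaluated = []

def evaluate(pos):
    evaluated.append(pos)
    return leaf_scores.get(pos, 0)

value = alphabeta("A", 2, float("-inf"), float("inf"), True,
                  lambda p, _: tree.get(p, []), evaluate)
print(value, evaluated)  # 3 ['D', 'E', 'F']  -- 'G' was never evaluated
```

The pruned result is identical to full minimax; only the number of evaluated positions shrinks, which is why the technique remains a cornerstone of adversarial search.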

Chapter 3: The Television Debut and the Myth-Making Machine

On the morning of February 24, 1956, something happened on American television that no one had ever seen before. Samuel, sitting remotely at the IBM 701 facility, connected live to a morning news program where the host Will Rogers Jr. watched as a checkers expert challenged the computer to a game — and the computer played back. This was the public’s first encounter with a machine that appeared to think (Samuel, 1959; Chinook Project Legacy, webdocs.cs.ualberta.ca). IBM President Thomas J. Watson Sr. had arranged the demonstration for shareholders, reportedly predicting it would raise IBM’s stock price by fifteen points. According to contemporary records, it did (Wiederhold & McCarthy, 1992).

The demonstration ignited public imagination, but it also began a pattern of exaggeration that would follow Samuel’s program for decades. The program at this stage was still an early learner — capable of defeating novice players and putting up a reasonable game against intermediates, but nowhere near champion-level play. The crucial distinction between “impressive enough to demonstrate publicly” and “world-class” got lost almost immediately in press coverage.

The Nealey Match: What Really Happened

The episode that generated the most mythology occurred in 1961. When Edward Feigenbaum and Julian Feldman were compiling the first AI anthology — Computers and Thought — they asked Samuel to contribute an appendix featuring the best game his program had ever played. Samuel used the occasion to issue a real challenge to Robert Nealey, a blind checkers player from Stamford, Connecticut, who was identified in IBM’s Research News as “a former Connecticut checkers champion, and one of the nation’s foremost players” — approximately the fourth-ranked player in the United States at the time (McCarthy & Feigenbaum, 1990). Samuel’s program won that match.

The reporting that followed was electric and, in important ways, inaccurate. The result was widely interpreted as evidence that checkers had effectively been “solved” — that computers were now superior to all human players. It wasn’t true. In 1965, when World Champion Walter Hellman played four correspondence games against Samuel’s program by mail, he won every single one (Samuel, 1967). A fifth game, played face-to-face rather than by mail, ended in a draw — a result Schaeffer’s later analysis suggests may have reflected the time pressure of in-person play rather than the program’s strategic depth (Schaeffer, 1997).

“Samuel’s program reportedly beat a master and solved the game of checkers. Both journalistic claims were false, but they helped establish checkers-playing programs as a milestone in AI research.”

Jonathan Schaeffer, Reviving the Game of Checkers (1990)

This is not a story about failure. It is a story about the gap between genuine scientific breakthrough and the public narrative that gets built around it — a gap that remains one of AI’s persistent challenges. Samuel’s program was a genuine milestone: the first working demonstration of a self-improving algorithm, a proof that the concept of machine learning was viable. That it was not yet championship-level after five years of work on 1950s hardware is hardly a criticism. The criticism belongs to the myth-making, not the making.

Chapter 4: The Paper That Named a Field

In July 1959, Samuel published “Some Studies in Machine Learning Using the Game of Checkers” in the IBM Journal of Research and Development. The paper ran to twenty pages and contained what is now one of the most-cited single sentences in the history of artificial intelligence. Samuel wrote that the goal was to explore how a computer could be programmed so that it would learn to play a better game than could be played by the person who wrote the program — and that it could do so “in a remarkably short period of time (8 or 10 hours of machine-playing time) when given only the rules of the game, a sense of direction, and a redundant and incomplete list of parameters which are thought to have something to do with the game” (Samuel, 1959).

As of July 2019, Samuel’s paper had accumulated 2,805 citations — more than Claude Shannon’s foundational paper on programming a computer for chess (Gabel, 2019). That number reflects something important: Shannon described a procedure for playing chess algorithmically. Samuel described a framework for learning. The distinction turns out to matter enormously. Shannon’s approach required human expertise baked into the rules. Samuel’s required only a feedback signal and time. One is a recipe; the other is an education.

The term “machine learning” itself enters the record here almost in passing. The definition now universally attributed to Samuel, the “field of study that gives computers the ability to learn without being explicitly programmed,” is a later paraphrase of the paper’s framing rather than a verbatim sentence from it, but it captures his intent precisely. That phrase, so economical it barely registers, would become the organizing concept for one of the most significant technological transformations in human history. It took decades for the field to grow into the definition, but the definition was correct on arrival.

“Arthur Samuel (1901–1990) was a pioneer of artificial intelligence research. From 1949 through the late 1960s, he did the best work in making computers learn from their experience.”

John McCarthy & Edward A. Feigenbaum, In Memoriam: Arthur Samuel — Pioneer in Machine Learning, AI Magazine (1990)

Chapter 5: The Genealogy of Learning — Tracing Samuel’s DNA to Modern AI

The intellectual lineage from Samuel’s IBM 701 program to the systems that now play Go at superhuman levels, fold proteins, and generate human-quality language is not metaphorical — it is structural. The connecting tissue is the concept of temporal difference learning, and the story of how Samuel’s intuition became formalized theory is one of the more remarkable through-lines in the history of computing.

Richard Sutton and the Formalization of TD Learning

By the early 1980s, a young researcher named Richard Sutton at the University of Massachusetts was working to understand a class of learning algorithms that updated predictions by comparing successive time steps rather than waiting for a final outcome. He recognized that Samuel’s checkers program, though not formally analyzed as such at the time, had implicitly used exactly this approach. “The earliest and best-known use of a TD method was in Samuel’s (1959) celebrated checker-playing program,” Sutton wrote in his landmark 1988 paper in the journal Machine Learning. “For each pair of successive game positions, the program used the difference between the evaluations assigned to the two positions to modify the earlier one’s evaluation” (Sutton, 1988). This is temporal difference learning, formalized thirty years after Samuel used it without naming it.
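The sentence Sutton quotes has a compact standard form. In modern notation (this is Sutton’s TD(0) formulation, not Samuel’s own notation, and it omits the per-step reward term, which in a game arrives only at the end):

```latex
% TD(0): adjust the earlier position's evaluation toward the later one's.
% V = evaluation function, s_t and s_{t+1} = successive game positions,
% \alpha = step-size parameter controlling how far the value moves.
V(s_t) \leftarrow V(s_t) + \alpha \bigl[ V(s_{t+1}) - V(s_t) \bigr]
```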

TD-Gammon and the Proof of Concept at Scale

In 1992, IBM researcher Gerald Tesauro took Sutton’s formalized TD learning and combined it with a multi-layer neural network to create TD-Gammon, a program that taught itself to play backgammon purely through self-play, starting from random weights and achieving strong intermediate-level play within months of training. Tesauro’s 1995 paper in the Communications of the ACM explicitly traced the lineage: Samuel’s checkers program and Shannon’s chess work had established games as “an ideal testing ground for exploring a variety of concepts and approaches in artificial intelligence and machine learning” (Tesauro, 1995). TD-Gammon was Samuel’s architecture at scale — and it demonstrated that a neural network trained by TD methods could discover strategies that human masters had never seen.

DeepMind, AlphaGo, and the Explosion of Deep Reinforcement Learning

The final leap in this genealogy came with DeepMind’s 2015 Deep Q-Network paper, which combined deep convolutional neural networks with reinforcement learning to achieve human-level performance across 49 Atari games from raw pixel input alone (Mnih et al., 2015). The architecture was explicit about its intellectual heritage: temporal difference learning, Q-learning, and the self-play framework that Samuel had pioneered. A year later, AlphaGo used a combination of deep neural networks and Monte Carlo tree search — a sophisticated descendant of Samuel’s minimax with evaluation functions — to defeat world champion Lee Sedol at Go, a game whose state space, at roughly 2 × 10¹⁷⁰ legal positions, dwarfs checkers’ 5 × 10²⁰ by some 150 orders of magnitude (Silver et al., 2016).

Demis Hassabis, CEO of Google DeepMind and a 2024 Nobel laureate in Chemistry for AlphaFold’s breakthrough on protein folding, has spoken publicly about games as the foundational training ground for AI systems: the key insight is that reinforcement learning — learning from trial and error, maximizing a reward signal — combined with deep learning is “actually the entirety of what’s needed for intelligence” (Hassabis, 2020). That two-component architecture — a world model and a reward-driven learning signal — is structurally identical to what Samuel built in 1955, at a scale and sophistication separated by seventy years of engineering progress.

Visual 3 The Machine Learning Genealogy: From Samuel’s Checkers to AlphaFold
Direct conceptual and algorithmic lineage from Samuel’s 1952–59 program through key milestones in reinforcement learning. Sources: Sutton (1988); Tesauro (1995); Mnih et al. (2015); Silver et al. (2016, 2018); Hassabis et al. (2024).

Chapter 6: The Philosophical Crucible — When Optimization Escapes Its Creator

Samuel’s checkers program was not dangerous. It was a modest program on a refrigerator-sized machine, playing a board game in an IBM laboratory. But it raised a question that has never been fully resolved, and that now sits at the center of the most consequential debates in technology: when an autonomous system optimizes for a stated objective and finds strategies its creator did not anticipate, who is responsible for what happens next?

Samuel’s program, through its generalization learning process, developed evaluation weights that its designer had not specified and strategies that surprised experienced checkers players. It was described in Samuel’s own paper as approaching “better-than-average” play, and that “fairly good amateur opponents characterized it as ‘tricky but beatable’” (Samuel, 1959). The “tricky” is the philosophically interesting word. The program was not following human strategic intuitions — it was following its own iteratively refined model of what worked. Those two things overlap but are not identical, and the gap between them is where unexpected behavior lives.

Specification Gaming: The Problem That Didn’t Yet Have a Name

Modern AI safety researchers have a term for what Samuel’s program was doing in its most surprising moments: specification gaming — achieving a specified objective through means that technically satisfy the specification but violate the implicit intent behind it. In Samuel’s case, the specification was straightforward: maximize the evaluation function. The intent was to win at checkers in the way an experienced human player would. When those two things diverged — when the program found an exploitation of the board’s structure that a human would not have chosen and might not have understood — it was technically complying with its instructions while behaviorally diverging from expectations. The program wasn’t “cheating.” It was being perfectly obedient to a goal it had been given. The goal just wasn’t quite aligned with what was meant.

This is the root of what AI researchers today call the alignment problem: ensuring that an AI system’s objectives, as it actually pursues them, match the objectives its designers had in mind when they specified the goal. It is a harder problem than it sounds. Natural language goals — “win the game,” “maximize profit,” “minimize harm” — are necessarily incomplete specifications. An optimizer powerful enough to find surprising solutions will find surprising solutions. The more powerful the optimizer, the more surprising the solutions become.

The research field of Explainable AI (XAI) exists, in part, as a direct response to this tension: the recognition that knowing what a system decided is not the same as understanding why it decided that, and that the gap between those two things carries real risk (Adadi & Berrada, 2018). Samuel could observe his program’s moves. He couldn’t always trace their internal justification in the tangled web of learned weights. That opacity — the embryonic “black box problem” — was tolerable when the stakes were a board game. The question of whether it remains tolerable as AI systems are deployed in medicine, law, finance, and defense is one of the defining ethical challenges of our era.

The Philosophical Debate — Core Questions

The Specification Gap: When we instruct an AI to “maximize X,” we are implicitly assuming it will do so in ways that respect a vast unstated set of norms and constraints. Samuel’s program quietly demonstrated that optimizers don’t inherit those unstated norms automatically — they have to be explicitly designed in.

Emergent Strategy vs. Intentional Deception: The program’s “tricky” play was not deceptive in any meaningful sense — it had no model of its opponent’s mental states. But it produced behavior that looked deceptive to observers. As AI systems become more capable, distinguishing between “the system found a surprising solution” and “the system is behaving adversarially” becomes both harder and more important.

The Credit Assignment Problem: If a self-learning system makes a harmful decision generations of training after any human made a meaningful choice about its design, who bears responsibility? Samuel’s program was benign enough that the question never arose. For modern systems operating in high-stakes domains, it must be answered before the system is deployed, not after.

Chapter 7: Chinook, Schaeffer, and the Final Chapter of Checkers AI

The story of checkers and artificial intelligence did not end with Samuel’s retirement in 1966 or his death in 1990. It continued through one of the most determined research projects in the history of game-playing AI — Jonathan Schaeffer’s Chinook program, and his eighteen-year quest to mathematically prove that checkers had been solved.

Schaeffer, a computer science professor at the University of Alberta, began developing Chinook in 1989. By 1990, it had won the right to compete in the World Checkers Championship by finishing second at the United States National Open — behind Marion Tinsley, widely considered the greatest checkers player who ever lived (IEEE Spectrum, 2007). Tinsley had lost a grand total of seven games in his entire competitive career. Chinook narrowly lost its first match against Tinsley in 1992, then drew six games in their 1994 rematch before Tinsley, ill with pancreatic cancer, withdrew from the tournament. He died seven months later. Chinook was declared the Man-Machine World Champion (Schaeffer, 1997).

But Schaeffer wasn’t done. For the next thirteen years, he worked to do something no one had done for any game of comparable complexity: provide a mathematical proof that checkers, played perfectly by both sides, always ends in a draw. The game has roughly 5 × 10²⁰ possible positions — five hundred billion billion. Working nearly continuously with a network of computers since 1989, Schaeffer’s team built backward from every possible endgame position through an exhaustive search, ultimately proving the result in 2007 and publishing it in Science: “the game of checkers has been solved” (Schaeffer et al., 2007). The game that launched machine learning was the first complex board game to yield a provably perfect solution.

Visual 4 Checkers AI Capability: From Samuel’s Novice Program to Mathematical Proof
Relative capability ratings are qualitative and based on documented match records and researcher assessments. The 100% bar for 2007 reflects mathematical proof of optimal play, not any claim about generalized AI capability. Sources: Samuel (1959, 1967); Schaeffer (1997); Schaeffer et al. (2007), Science; IEEE Spectrum (2007).

The arc from Samuel’s 1952 novice program to Schaeffer’s 2007 proof spans fifty-five years and represents one of the most complete experimental progressions in the history of the field. Samuel’s program demonstrated the concept. Later improvements expanded the capability. Chinook reached the competitive peak. And finally, the mathematical structure of the game itself was fully characterized. That full arc — from “it can learn” to “we have proved its complete solution” — is the checkers story in its entirety, and it is a genuinely beautiful one.

Chapter 8: The Modesty of Genius — Samuel’s Character and His Actual Legacy

One of the more poignant details in the historical record about Arthur Samuel is the one his colleagues chose to emphasize in his obituary: that he was a modest man, and that “the importance of his work was widely recognized only after his retirement from IBM in 1966” (McCarthy & Feigenbaum, 1990). He went on to teach at Stanford, work on speech recognition, collaborate with Donald Knuth on the TeX typesetting system, and write clear, accessible technical manuals until he was in his late eighties. He logged into Stanford’s computers for the last time on February 2, 1990 — months before his death that July at age 88. His colleagues believed he was, at that point, the world’s oldest active computer programmer (Wiederhold & McCarthy, 1992).

Samuel’s technical legacy is not difficult to enumerate: he invented self-play as a training methodology, demonstrated the viability of adaptive evaluation functions, implemented the first documented use of what became temporal difference learning, contributed to alpha-beta pruning’s early development, coined the term machine learning, and authored a paper that became one of the most influential in the history of AI — all while working largely alone, on hardware so constrained that modern smartphones would dwarf it by every measurable dimension (McCarthy & Feigenbaum, 1990). His 1959 paper, as one analysis noted, accumulated more citations than Shannon’s chess programming paper — “not least to be seen by the number of citations (2,805 as of July 2019), even beating Claude Shannon’s ‘Programming a computer for playing chess’ which stands at 1,375” (Gabel, 2019).

But the legacy that matters most is harder to put a number on. Samuel demonstrated, in a way that was both humble in its ambitions and revolutionary in its implications, that intelligence — the ability to improve through experience — was not a uniquely biological phenomenon. That a machine, given the right architecture and the right feedback, could get better at something over time. That was not obvious in 1952. It is obvious now, and the reason it is obvious is largely because Samuel made it so.

Every time a recommendation algorithm improves its predictions based on what you clicked. Every time a language model adjusts its weights based on feedback. Every time a robotic arm recalibrates its grip after a failed grasp. Every time AlphaFold folds a protein more accurately than any human team could — in all of these, however many layers of abstraction separate the modern system from the IBM 701, the foundational principle is the one Arthur Samuel proved on a checkered board in 1955: machines can learn from experience.

“Games are convenient for AI because it is easy to compare computer performance with human performance. As Drosophilae are convenient for genetics because they breed fast and are cheap to keep, games are convenient for AI.”

John McCarthy & Edward A. Feigenbaum, In Memoriam: Arthur Samuel (1990)

References

  1. Adadi, A., & Berrada, M. (2018). Peeking inside the black-box: A survey on Explainable Artificial Intelligence (XAI). IEEE Access, 6, 52138–52160. https://doi.org/10.1109/ACCESS.2018.2870052
  2. Gabel, F. (2019). Some studies in machine learning using the game of checkers [Report]. Heidelberg Collaboratory for Image Processing, Heidelberg University.
  3. Hassabis, D. (2020, November). DeepMind’s journey from games to fundamental science [Audio podcast interview]. Exponential View.
  4. McCarthy, J., & Feigenbaum, E. A. (1990). In memoriam: Arthur Samuel — Pioneer in machine learning. AI Magazine, 11(3), 10–11. https://doi.org/10.1609/aimag.v11i3.840
  5. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., … Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533. https://doi.org/10.1038/nature14236
  6. Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3), 210–229. https://ieeexplore.ieee.org/document/5392560/
  7. Samuel, A. L. (1967). Some studies in machine learning using the game of checkers II — Recent progress. IBM Journal of Research and Development, 11(6), 601–617. https://cs.virginia.edu/~evans/greatworks/samuel.pdf
  8. Schaeffer, J. (1997). One jump ahead: Challenging human supremacy in checkers. Springer-Verlag.
  9. Schaeffer, J., Burch, N., Björnsson, Y., Kishimoto, A., Müller, M., Lake, R., Lu, P., & Sutphen, S. (2007). Checkers is solved. Science, 317(5844), 1518–1522. https://doi.org/10.1126/science.1144079
  10. Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., … Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489. https://doi.org/10.1038/nature16961
  11. Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., … Hassabis, D. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676), 354–359. https://doi.org/10.1038/nature24270
  12. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9–44. https://doi.org/10.1007/BF00115009
  13. Sutton, R. S., & Barto, A. G. (n.d.). Reinforcement learning: An introduction (Chapter 11.2: Samuel’s Checkers Player). Retrieved from incompleteideas.net
  14. Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), 58–67. https://doi.org/10.1145/203330.203343
  15. Wiederhold, G., & McCarthy, J. (1992). Arthur Samuel: Pioneer in machine learning. IBM Journal of Research and Development, 36(3), 329–332. https://ieeexplore.ieee.org/document/5389723/

Additional Reading

  1. Russell, S., & Norvig, P. (2020). Artificial Intelligence: A Modern Approach (4th ed.). Pearson. — The definitive textbook on AI; contains a thorough treatment of minimax, alpha-beta pruning, and game-playing AI grounded in Samuel’s foundational contributions.
  2. Schaeffer, J. (1997). One Jump Ahead: Challenging Human Supremacy in Checkers. Springer. — Schaeffer’s first-person account of building Chinook and contextualizing Samuel’s original program; essential reading for anyone interested in the full arc of checkers AI.
  3. Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press. — Free online; Chapter 11.2 provides a rigorous but accessible formal analysis of Samuel’s methods as precursors to modern RL. incompleteideas.net
  4. Feigenbaum, E. A., & Feldman, J. (Eds.). (1963). Computers and Thought. McGraw-Hill. — The first AI anthology; contains the reprinted Samuel paper with the annotated Nealey match game, providing primary source access to the historical event.
  5. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. deeplearningbook.org — Comprehensive treatment of modern deep learning methods, whose reinforcement learning chapters explicitly situate Samuel’s work within the broader theoretical framework.

Additional Resources

  1. Chinook Project — University of Alberta: webdocs.cs.ualberta.ca/~chinook/ — The complete historical archive of the Chinook project, including match records, the legacy page analyzing the original Nealey game, and the 2007 proof documentation.
  2. IEEE Xplore — Samuel’s Original 1959 Paper: ieeexplore.ieee.org — The original IBM Journal paper; abstract freely accessible.
  3. Richard Sutton’s Homepage — Reinforcement Learning: An Introduction: incompleteideas.net — Free access to the canonical RL textbook, including the Samuel chapter.
  4. Google DeepMind Research: deepmind.google/research/ — Current research page for the lab whose foundational techniques trace directly to Samuel’s self-play and temporal difference learning paradigms.
  5. Association for the Advancement of Artificial Intelligence (AAAI): aaai.org — Samuel was a founding fellow; the organization’s digital library contains the McCarthy & Feigenbaum obituary and related historical AI papers.