The tutorial that forms the basis for much of the lecture is the intro paper by Kaelbling, Littman and Moore. Ballard Chapters 10 and 11 offer another treatment.
The section on hidden state comes from work by Andrew Kachites McCallum, e.g. "Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks", in From Animals to Animats, Fourth International Conference on Simulation of Adaptive Behavior (SAB'96), Cape Cod, Massachusetts, September 1996.
The best entry point into current RL research seems to be The Reinforcement Page.