Just finished reading “The Tangled Web: A Guide to Securing Modern Web Applications” by Michal Zalewski.
The book is surprisingly small given the amount of information it covers about the interaction of web browsers, web sites and client-side web technologies.
The book starts with a discussion of what a valid URL could look like (http://yahoo.com:email@example.com/microsoft.com; think about which site is being connected to here) and then discusses several fundamental building blocks of the modern web (like cookies) as well as standard technologies (like Flash) in depth. The same-origin policy, and how it differs between the DOM, cookies and pseudo-URLs, is explained with amazing clarity.
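For anyone who wants to check their answer to the URL puzzle above, here is a minimal sketch using Python's standard `urllib.parse`. The helper name `describe_url` is my own and not something from the book, and this shows how one particular parser splits the URL, which is not necessarily how every browser treats it.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the pieces that decide where a request goes.

    Hypothetical helper for illustration only; it relies on Python's
    urlsplit, which splits the authority section into userinfo and host.
    """
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,  # text before ':' in the userinfo
        "password": parts.password,  # text after ':' in the userinfo
        "hostname": parts.hostname,  # the host actually connected to
        "port": parts.port,          # None when no explicit port is given
        "path": parts.path,
    }


if __name__ == "__main__":
    info = describe_url("http://yahoo.com:email@example.com/microsoft.com")
    for key, value in info.items():
        print(f"{key}: {value}")
```

Everything before the `@` is treated as credentials and everything after the first `/` following the host is just a path, so neither of the familiar-looking names is the host being contacted.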
One of the best things about this book is that it regularly cites the relevant RFCs for authoritative answers and then describes the corresponding deviant [and undefined] behavior that browsers actually implement.
The book also covers new (HTML5) security features in detail.
While reading the book, I occasionally felt overloaded with information, but I think the tangled web itself, and not the book “The Tangled Web”, is responsible for that.
I would strongly recommend this book to anyone who deals with web[site] security or with parsing HTML.
Disclosure: We both work for [different teams under] Google Security.