Just finished reading “The Tangled Web: A Guide to Securing Modern Web Applications” by Michal Zalewski.

The book is surprisingly short given how much it covers about the interaction of web browsers, websites, and client-side web technologies.

The book starts with a discussion of what a valid URL can look like (http://yahoo.com:80@google.com/microsoft.com; think about which site is actually being connected to here) and then discusses in depth several fundamental building blocks of the modern web (like cookies) as well as standard technologies (like Flash). The same-origin policy, and how it differs between the DOM, cookies, and pseudo-URLs, is explained with amazing clarity.
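To make the URL puzzle above concrete, here is a small sketch of my own (not from the book) that splits that URL with Python's standard `urllib.parse`. It only shows how one generic URL parser reads the string; browsers have their own parsers and may handle, warn about, or reject the userinfo part differently.

```python
from urllib.parse import urlsplit

# The tricky URL from the review. Everything before the "@" in the
# authority section is userinfo (credentials), not the host.
url = "http://yahoo.com:80@google.com/microsoft.com"
parts = urlsplit(url)

print("username:", parts.username)  # yahoo.com
print("password:", parts.password)  # 80
print("hostname:", parts.hostname)  # google.com
print("port:    ", parts.port)      # None (no explicit port after the host)
print("path:    ", parts.path)      # /microsoft.com
```

So, at least by this parser's reading, the connection goes to google.com; "yahoo.com" and "80" are just a username and password, and "microsoft.com" is a path.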
One of the best things about this book is that it regularly refers to RFCs for authoritative answers and contrasts them with the deviant [and often undefined] behavior that browsers actually implement.
The book also covers HTML5 security features in detail.
While reading, I occasionally felt information overload, but I think the tangled web itself, not the book “The Tangled Web”, is responsible for that.

I would strongly recommend this book to anyone who deals with web[site] security or with parsing HTML.

Disclosure: The author and I work on different teams in Google Security.