Introduction to Information Retrieval is my book of the week. It is co-authored by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. The authors generously make the electronic version of the book freely available to the public. I benefited from this generosity and read the PDF version. It is very convenient to follow the links in the PDF, although the lack of back-links makes it hard to return to the place a link came from.
This book is very comprehensive and probably the best textbook available if you wish to learn about information retrieval. It covers both fundamental and advanced topics. Moreover, each chapter includes a References and Further Reading section, providing more resources for readers who would like to dive further into specific topics. The other notable attribute of this book is its clarity in explaining the concepts without introducing unnecessarily complicated formulas. The text accompanying the algorithms expresses the logic clearly. If you are still unsure how a certain algorithm works after reading the text, thinking through one of the several exercises typically included in each chapter helps a great deal.
The online version is from April 2009. A great many recent advances in the information retrieval field are not covered here. However, grasping the content of the book will no doubt help in understanding more recent work. There are more recent lecture notes based on this book available online that I have not explored yet, partly because I have a fairly recently published book, Information Retrieval: Implementing and Evaluating Search Engines, on my to-read list for the near future. Should you become positively obsessed with this topic, like me, you might appreciate that the authors also very helpfully offer a comprehensive list of information retrieval resources.
In the preface, the authors discuss the organisation of this book in depth. Here is my feeble attempt to show you what this book covers at a very high level. If you are interested in learning how search engines work or how to build one, Chapters 1 to 8 cover the basics, such as the inverted index, index construction, compression, the vector space model, relevance score calculation, evaluation and so on. Chapter 9, on relevance feedback and query expansion, offers great guidance for real-world projects; I was handling exactly such a challenge in my work while reading this book. In the authors’ words, it discusses methods by which retrieval can be enhanced through the use of techniques like relevance feedback and query expansion, which aim at increasing the likelihood of retrieving relevant documents. Chapters 10 to 18 cover more advanced topics, for example probabilistic and language models, text classification, clustering and latent semantic analysis. Chapters 19 to 21 start with web search basics and then go into more depth on crawling, indexing and finally link analysis. Forgive me for my lack of diligence here, since there could be no better overview than the one written by the authors in the preface.
I enjoyed reading this book no less than the Cicero trilogy (Imperium, Conspirata and Dictator) and made many notes for future re-visits.