Resources
Main Textbook
- Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan & Hinrich Schütze, Cambridge University Press, 2008. The main textbook of this course.
Coursera
- Introduction to Information Retrieval, by Christopher D. Manning
Articles
- "Database and Information-Retrieval Methods for Knowledge Discovery," by Gerhard Weikum, Gjergji Kasneci, Maya Ramanath and Fabian Suchanek. Communications of the ACM. Vol. 52, No. 4, April 2009, pp. 56-64.
- "Challenges on Distributed Web Retrieval," by R. Baeza-Yates, C. Castillo, F. Junqueira, V. Plachouras and F. Silvestri. Proceedings of the IEEE 23rd International Conference on Data Engineering (ICDE 2007), 15-20 April 2007, pp. 6-20.
- "The anatomy of a large-scale hypertextual Web search engine," by Sergey Brin, Larry Page, Computer Networks and ISDN Systems, vol. 30, no. 1-7, pp. 107-117, 1998.
- "Combining Systems and Databases: A Search Engine Retrospective," by E. Brewer. In Readings in Database Systems, Hellerstein and Stonebraker (editors), 2005. MIT Press.
Useful Textbooks
- Search Engines: Information Retrieval in Practice. Bruce Croft, Donald Metzler, Trevor Strohman. Addison-Wesley, 2009. An introductory book for search engines. Recommended for undergraduate students.
- Modern Information Retrieval. R. Baeza-Yates, B. Ribeiro-Neto. Addison-Wesley, 2011.
- Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan & Hinrich Schütze, Cambridge University Press, 2008.
- Modeling the Internet and the Web: Probabilistic Methods and Algorithms. P. Baldi, P. Frasconi, P. Smyth. Wiley, 2003. Recommended for those who have a good foundation in probability theory, but are new to IR.
- Google's PageRank and Beyond: The Science of Search Engine Rankings, Amy N. Langville & Carl D. Meyer, Princeton University Press, 2006. It offers a comprehensive and erudite presentation of PageRank and related search-engine algorithms, and it is written in an approachable way, given the mathematical foundations involved.
- Information Retrieval: Algorithms and Heuristics. D.A. Grossman, O. Frieder. Springer, 2004. #1 or #2 seller on Amazon.
- Managing Gigabytes. I.H. Witten, A. Moffat, T.C. Bell. Morgan Kaufmann, 1999. The authority on index construction and compression.
- Mining the Web: Analysis of Hypertext and Semi Structured Data. S. Chakrabarti. Morgan Kaufmann, 2002. The best introduction for Web-centric IR.
- Finding Out About. R. Belew. Cambridge UP, 2001. Suitable for undergraduate classes.
- Information Retrieval: A Health and Biomedical Perspective. W.R. Hersh. Springer, 2002. As the title says: a health/biomedical perspective.
- TREC: Experiment and Evaluation in Information Retrieval. E.M. Voorhees, D.K. Harman. MIT Press, 2005. A survey of recent research results.
- Language Modeling for Information Retrieval. W.B. Croft, J. Lafferty. Springer, 2003. Language models are of increasing importance in IR.
- Readings in Information Retrieval. K. Sparck Jones, P. Willett. Morgan Kaufmann, 1997. A collection of classical IR papers.
- Recommended Reading for IR Research Students. A. Moffat, J. Zobel, D. Hawking. SIGIR Forum, 39(2), 2005. Not a book, but a collection of seminal papers, more up-to-date than Sparck-Jones et al.
- Information Storage and Retrieval Systems. G. Kowalski, M.T. Maybury. Springer, 2005. Takes a system approach, discussing all aspects of an Information Retrieval System.
- The Geometry of Information Retrieval. C.J. van Rijsbergen. Cambridge UP, 2004. An ambitious attempt to develop quantum mechanics as a new foundation for IR.
- Introduction to Modern Information Retrieval. G.G. Chowdhury. Neal-Schuman, 2003. Intended for students of library and information studies.
- Text Information Retrieval Systems. C. Meadow, B. Boyce, D. Kraft. Academic Press, 2000. Takes a library/information science perspective.
- Readings in DataBase Systems. Hellerstein and Stonebraker (editors), 4th edition, MIT Press, 2005. A collection of classical DB papers.
- Lucene in Action. O. Gospodnetic, E. Hatcher. Manning, 2004. Packed with examples and advice on how to use Lucene effectively.
Journals
- ACM Transactions on Information Systems (TOIS): dblp home
- Information Processing and Management (IPM): dblp home
- Information Retrieval: dblp home
- International Journal on Digital Libraries: dblp home
- Journal of the American Society for Information Science and Technology (JASIST): dblp home
- SIGIR Forum: dblp home
- Data & Knowledge Engineering: dblp home
- Information Processing Letters: dblp home
- Information Systems: dblp home
- Journal of Intelligent Information Systems: dblp home
- Knowledge and Information Systems: dblp home
- Foundations and Trends in Information Retrieval: home
Proceedings of the International IR Conferences
Useful Papers
- J. Zobel and A. Moffat, Inverted files for text search engines, ACM Computing Surveys, vol. 38, no. 2, 2006.
- S. Melnik, S. Raghavan, B. Yang, H. Garcia-Molina, Building a distributed full-text index for the Web, ACM Transactions on Information Systems, vol. 19, no. 3, pp. 217-241, 2001.
- M. W. Berry, Z. Drmac, E.R. Jessup, Matrices, vector spaces and information retrieval, SIAM Review, vol. 41, no. 2, pp. 335-362, 1999.
- A. Moffat and J. Zobel, Self-indexing inverted files for fast text retrieval, ACM Transactions on Information Systems, vol. 14, no. 4, pp. 349-379, 1996.
- M.W. Berry, S.T. Dumais, G.W. O'Brien, Using linear algebra for intelligent information retrieval, SIAM Review, vol. 37, no. 4, pp. 573-595, 1995.
- Jon Kleinberg, Authoritative sources in a hyperlinked environment, Journal of the ACM, vol. 46, no. 5, pp. 604-632, 1999.
- Sergey Brin, Larry Page, The anatomy of a large-scale hypertextual Web search engine, Computer Networks and ISDN Systems, vol. 30, no. 1-7, pp. 107-117, 1998.
- Monica Bianchini, Marco Gori, Franco Scarselli, Inside PageRank, ACM Transactions on Internet Technology, vol. 5, no. 1, pp. 92-128, 2005.
- Amy N. Langville and Carl D. Meyer, Deeper Inside PageRank, Internet Mathematics, vol. 1, no. 3, pp. 335-380, 2003-2004.
- Pavel Berkhin, A Survey on PageRank Computing, Internet Mathematics, vol. 2, no. 1, pp. 73-120, 2005-2006.
- Gravano, L., García-Molina, H., and Tomasic, A. 1999. GlOSS: text-source discovery over the Internet. ACM Trans. Database Syst. 24, 2, 229-264.
- Dhillon, I. S., Mallela, S., and Kumar, R. 2002. Enhanced word clustering for hierarchical text classification. In Proceedings of the Eighth ACM SIGKDD international Conference on Knowledge Discovery and Data Mining (Edmonton, Alberta, Canada, July 23 - 26, 2002).
- Scholer, F., Williams, H. E., Yiannis, J., and Zobel, J. 2002. Compression of inverted indexes for fast query evaluation. In Proceedings of the 25th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Tampere, Finland, August 11 - 15, 2002).
- Khan, L., Awad, M., and Thuraisingham, B. 2007. A new intrusion detection system using support vector machines and hierarchical clustering. The VLDB Journal 16, 4, 507-521, Oct. 2007.
- Search Privacy Practices: A Work in Progress, Center for Democracy and Technology Report, August 2007.
- Weikum, G. 2007. DB&IR: both sides now. In Proceedings of the 2007 ACM SIGMOD international Conference on Management of Data (Beijing, China, June 11 - 14, 2007). Extended Abstract, Slides
Videos and Podcasts
- The Web: Wisdom of Crowds and a Long Tail, Prof. Ricardo Baeza-Yates, CS Colloquium.
- On Information Retrieval, interview of Ricardo Baeza-Yates by ACM SIGMOD Online.
- Scalability and Efficiency on Data Mining Applied to Internet Applications, Wagner Meira (Aug. 2007, Google).
- The Search, John Battelle (12/7/2005, Google New York).
- Google: The Story, David Vise (March 2006, Google).
- Search, Google, and Life: Sergey Brin - Google (Fall 2005, UC Berkeley, SIMS 141 course).
- Google: A Behind-the-Scenes Look, Jeff Dean (October 2004, University of Washington colloquium).
- An Inside Look at Google, commercial video made by Google.
- , by Dr. Gary Marchionini
- , by Michael Cafarella
Popular Articles
- "The Future of Search", Wired, 2013
- Wikipedia: Information Retrieval
- Wikipedia: Search Engines
- Bruce Croft: What Do People Want From IR
- Information Retrieval on the World Wide Web
- Michael Lesk: The Seven Ages of Information Retrieval
Blogs
- Search Engine Land: Must Read News about Search Marketing and Search Engines.
- John Battelle's blog. John Battelle is a co-founding editor of WIRED and the author of the book "The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture."
- Apache Lucene Blog: Web blog about the open source search technology.
- GoogleBlog: Insight from Googlers into their products, technology and the Google culture
Software / IR Toolkits
- Lucene: A free open source information retrieval library
- Lemur & Indri (CMU/Univ. of Massachusetts)
- UIMA (IBM Research)
- Smart (Cornell)
- MG4J (Managing Gigabytes for Java)
- Galago search toolkit
- Terrier (Glasgow)
- Hadoop: an open-source implementation of the MapReduce paradigm
- Doug Oard's list of available text retrieval systems
- Avi Rappoport: open source search engines
- MySQL full text search
- Text to Matrix Generator, a MATLAB toolbox for indexing, retrieval and other text processing tasks
- Ferret: A high-performance, full-featured text search engine library written for Ruby
Professional Organizations
Other Resources
- Information retrieval resources
- Search Engine Showdown: The user's guide to Web searching
- U.S. search share rankings, press release, December 2007
- Web IR & IE site
- Research Channel
- Glossary (Modern Information Retrieval)
- Information retrieval research links @ Search Tools
- BUBL: Information Retrieval Links
- LSU: Information Retrieval Systems
- Open Directory: Information Retrieval Links
- UBC: Indexing Resources
- IR & Neural Networks, Symbolic Learning, Genetic Algorithms
- A stop list (that is, a list of stop words)
- Machine Learning and Information Retrieval (older collection of papers)
Related Courses on the Web
- Information Retrieval and Web Mining (Stanford): http://www.stanford.edu/class/cs276/
- Information Retrieval (CMU): http://www.cs.cmu.edu/~yiming/courses/11741/index.html
- Information Retrieval, Discovery and Delivery (Princeton): http://www.cs.princeton.edu/courses/archive/spring06/cos435/
- Web Information Systems (Purdue): http://www.cs.purdue.edu/homes/lsi/CS490W_Fall_07/CS490W.html
- Information Retrieval and Text Mining (Stuttgart): http://www.ims.uni-stuttgart.de/lehre/teaching/2007-SS/ir/
- Information Retrieval and Web Search (Texas): http://www.cs.utexas.edu/~mooney/ir-course/
- The Structure of Information Networks (Cornell): http://www.cs.cornell.edu/Courses/cs685/2007fa/
- Information Retrieval (UMass): http://ciir.cs.umass.edu/cmpsci646/
- Introduction to Text Information Systems (Illinois): http://sifaka.cs.uiuc.edu/course/410s07/
- Information Retrieval and Web Agents (Johns Hopkins University): http://www.cs.jhu.edu/~yarowsky/cs466.html
- Information Retrieval and Data Mining (MPI): http://www.mpi-inf.mpg.de/departments/d5/teaching/ws05_06/irdm/
- Search Engines: Technology, Society, and Business (Berkeley): http://courses.ischool.berkeley.edu/i141/f07/
- Search Engine Technology (Columbia): http://www1.cs.columbia.edu/~radev/SET07.html
- Information Retrieval Systems (University of Crete): http://www.csd.uoc.gr/~hy463/2007/en/index.html
- Information Retrieval (Cornell): http://www.infosci.cornell.edu/courses/info430/2007fa/index.html
- Information Retrieval Systems (University of Maryland): http://www.umiacs.umd.edu/~jimmylin/LBSC796-INFM718R-2006-Spring/
- Information Retrieval and Search Engines (Pennsylvania State University): http://clgiles.ist.psu.edu/IST441/index.html