PLOS ONE  2012 

Your Relevance Feedback Is Essential: Enhancing the Learning to Rank Using the Virtual Feature Based Logistic Regression

DOI: 10.1371/journal.pone.0050112


Abstract:

Information retrieval applications must present their output as ranked lists, which motivates research into methods that can automatically learn effective ranking models. Many existing methods analyze the multidimensional features of query-document pairs directly and do not take users' interactive feedback into account; they therefore incur high computation overhead and low retrieval performance when queries are ambiguously expressed. In this paper, we propose a Virtual Feature based Logistic Regression (VFLR) ranking method that performs logistic regression on a set of essential, mutually independent variables called virtual features (VF), which are extracted via principal component analysis (PCA) using the user's relevance feedback. We then predict the ranking score of each queried document to produce a ranked list. We systematically evaluate our method on the LETOR 4.0 benchmark datasets. The experimental results demonstrate that the proposal outperforms state-of-the-art methods in terms of Mean Average Precision (MAP), Precision at position k (P@k), and Normalized Discounted Cumulative Gain at position k (NDCG@k).
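The pipeline sketched in the abstract — extract uncorrelated virtual features with PCA, fit a logistic regression on them, then rank documents by predicted relevance probability — can be illustrated as follows. This is a minimal sketch with synthetic data, not the paper's implementation: the feature dimensions, component count, and learning rate are illustrative assumptions, and the random labels stand in for real relevance feedback.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic query-document features (stand-in for LETOR-style per-pair
# features) and binary relevance labels standing in for user feedback.
X = rng.normal(size=(200, 46))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

# Step 1: "virtual features" = projections onto the top-k principal
# components (PCA computed via SVD of the centered feature matrix).
k = 5
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
VF = (X - mean) @ Vt[:k].T          # uncorrelated virtual features

# Step 2: logistic regression on the virtual features, fit here by
# plain batch gradient descent on the log-loss.
w, b = np.zeros(k), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(VF @ w + b)))   # predicted P(relevant)
    w -= 0.5 * (VF.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

# Step 3: score candidate documents for a query and rank by P(relevant).
X_docs = rng.normal(size=(10, 46))
vf_docs = (X_docs - mean) @ Vt[:k].T
scores = 1.0 / (1.0 + np.exp(-(vf_docs @ w + b)))
ranking = np.argsort(-scores)       # document indices, best first
```

Because the virtual features are mutually uncorrelated by construction, the regression avoids the collinearity among raw query-document features that the abstract identifies as a source of overhead.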

