- 2016
An Ontology-Based Readability Computation Model for Vertical Search
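Abstract (from the Chinese original): As an emerging evaluation measure in information retrieval, readability plays an important role in assessing document relevance, utility, and quality. How to provide users with documents that are both relevant and readable has become a pressing problem in vertical search. To address this problem, a readability computation model based on ontology structure is proposed. Set against the abstraction process of a user's reading, the model computes text readability at the discourse surface level and at the concept level, and introduces three readability indicators: Concept Topography, Concept Scope, and Document Coherence. Specifically, the readability score computed from a single indicator or from a combination of indicators is integrated into a conventional vertical retrieval model to re-rank the initial retrieval results. In the medical domain, user experiments show that readability indicators based on ontology concept sequence information predict the true readability level of documents more effectively than traditional non-sequential indicators. System experiments further show that the readability-based re-ranking model accounts for both the relevance and the readability of documents and improves retrieval performance in vertical domains.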
Abstract: As an emerging evaluation criterion in information retrieval (IR), readability plays an important role in assessing document relevance, utility, and quality. How to provide different users with documents that are both relevant and readable has become an urgent problem in vertical search. To address this problem, we propose a new ontology-based readability model. Following the user's reading process, we measure document readability at the surface level and at the conceptual level. The model introduces three readability indicators: Concept Topography, Concept Scope, and Document Coherence. Specifically, the readability score of a document, computed from an individual indicator or from a combination of indicators, is used to re-rank the initial list of documents returned by a conventional search engine. In the medical domain, user-oriented evaluations show that our model correlates well with human judgments in readability prediction. Our model is also competitive with one of the state-of-the-art readability models in system-oriented evaluation.
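The re-ranking step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how readability and relevance are combined, so the linear interpolation, the min-max normalization, the `weight` parameter, and the function names `min_max` and `rerank` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to [0, 1]. An empty or constant input maps to 0.0 per document."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def rerank(
    relevance: Dict[str, float],
    readability: Dict[str, float],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank an initial result list by mixing relevance and readability.

    relevance:   document id -> score from the initial retrieval run
    readability: document id -> readability score (higher means more readable)
    weight:      hypothetical mixing parameter in [0, 1]; 0 keeps the original order
    """
    rel = min_max(relevance)
    read = min_max(readability)
    combined = {
        # Documents without a readability score are treated as least readable.
        doc: (1.0 - weight) * rel[doc] + weight * read.get(doc, 0.0)
        for doc in rel
    }
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    initial_run = {"a": 3.0, "b": 2.0, "c": 1.0}   # made-up retrieval scores
    readable = {"a": 0.1, "b": 0.9, "c": 0.5}      # made-up readability scores
    for doc_id, score in rerank(initial_run, readable, weight=0.5):
        print(doc_id, round(score, 3))
```

Setting `weight` to 0 reproduces the initial ranking, while larger values let a more readable document overtake a slightly more relevant one. In practice the readability scores would come from one of the three indicators or from a combination of them.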