Readability and the Web

DOI: 10.3390/fi4010238

Keywords: web document readability, content extraction, corpus statistics

Readability indices measure how easy or difficult it is to read and comprehend a text. In this paper, we examine the relation between readability indices and web documents from two perspectives. First, we analyse how to measure the readability of web documents reliably by applying content extraction techniques and incorporating a bias correction. Second, we investigate how web-based corpus statistics can be used to measure readability in a novel, language-independent way.


