Chen Yu,Ma Weiying,Zhang Hongjiang. Detecting Web Page Structure for Adaptive Viewing on Small Form Factor Devices // Proc of the 12th International Conference on World Wide Web. Budapest,Hungary,2003: 225-233
[2]
Yu Shipeng,Cai Deng,Wen Jirong,et al. Improving Pseudo-Relevance Feedback in Web Information Retrieval Using Web Page Segmentation // Proc of the 12th International Conference on World Wide Web. Budapest,Hungary,2003: 11-18
[3]
Uszkoreit J,Ponte J M,Popat A C,et al. Large Scale Parallel Document Mining for Machine Translation // Proc of the 23rd International Conference on Computational Linguistics. Beijing,China,2010: 1101-1109
[4]
Adelberg B. NoDoSE: A Tool for Semi-Automatically Extracting Structured and Semistructured Data from Text Documents // Proc of the ACM SIGMOD International Conference on Management of Data. Washington,USA,1998: 283-294
[5]
Kang D K,Choi J. MetaNews: An Information Agent for Gathering News Articles on the Web // Proc of the 14th International Symposium on Methodologies for Intelligent Systems. Maebashi,Japan,2003: 179-186
[6]
Yang Shaohua,Lin Hailüe,Han Yanbo. Automatic Data Extraction from Template-Generated Web Pages. Journal of Software,2008,19(2): 209-223
[7]
Kohlschütter C,Fankhauser P,Nejdl W. Boilerplate Detection Using Shallow Text Features // Proc of the 3rd ACM International Conference on Web Search and Data Mining. New York,USA,2010: 441-450
[8]
Song Ruihua,Liu Haifeng,Wen Jirong,et al. Learning Important Models for Webpage Blocks Based on Layout and Content Analysis. ACM SIGKDD Explorations Newsletter,2004,6(2): 14-23
[9]
Gibson J,Wellner B,Lubar S. Adaptive Web-page Content Identification // Proc of the 9th ACM International Workshop on Web Information and Data Management. Lisbon,Portugal,2007: 105-112
[10]
Ziegler C N,Skubacz M. Content Extraction from News Pages Using Particle Swarm Optimization on Linguistic and Structural Features // Proc of the IEEE/WIC/ACM International Conference on Web Intelligence. Fremont,USA,2007: 242-249
[11]
Pasternack J,Roth D. Extracting Article Text from the Web with Maximum Subsequence Segmentation // Proc of the 18th International Conference on World Wide Web. Madrid,Spain,2009: 971-980
[12]
Finn A,Kushmerick N,Smyth B. Fact or Fiction: Content Classification for Digital Libraries // Proc of the 2nd DELOS Network of Excellence Workshop on Personalization and Recommender Systems in Digital Libraries. Dublin,Ireland,2001: 1-6
[13]
Pinto D,Branstein M,Coleman R,et al. QuASM: A System for Question Answering Using Semi-Structured Data // Proc of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries. Portland,USA,2002: 46-55
[14]
Mantratzis C,Orgun M,Cassidy S. Separating XHTML Content from Navigation Clutter Using DOM-Structure Block Analysis // Proc of the 16th ACM Conference on Hypertext and Hypermedia. Salzburg,Austria,2005: 145-147
[15]
Debnath S,Mitra P,Giles C L. Automatic Extraction of Informative Blocks from Webpages // Proc of the ACM Symposium on Applied Computing. Santa Fe,USA,2005: 1722-1726
[16]
Gottron T. Content Code Blurring: A New Approach to Content Extraction // Proc of the 19th International Conference on Database and Expert Systems Applications. Turin,Italy,2008: 29-33
[17]
Gibson D,Punera K,Tomkins A. The Volume and Evolution of Web Page Templates // Proc of the 14th International Conference on World Wide Web. Chiba,Japan,2005: 830-839
[18]
Weninger T,Hsu W H,Han Jiawei. CETR: Content Extraction via Tag Ratios // Proc of the 19th International Conference on World Wide Web. Raleigh,USA,2010: 971-980