Application Research of Computers (计算机应用研究), 2009
Hierarchical information extraction from research papers based on conditional random fields
Abstract:
Current CRF-based methods for extracting information from research papers segment the text only into whole blocks or into individual words, so they cannot fully exploit contextual information to segment and extract fields at the proper granularity. This paper proposes a hierarchical, CRF-based method for extracting information from research papers. The algorithm uses format information such as list separators, new-line characters, and line-header characters, and combines it with the feature functions of CRFs to segment the text hierarchically into lines, blocks, and words. CRFs are then applied at each level of the hierarchy to extract the information in specific fields. Experimental results show that the proposed method outperforms CRF-based methods that simply segment the text into whole blocks or words.
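The three-level segmentation described in the abstract (new-line characters, then list separators, then words) and the format cues it relies on can be sketched as follows. This is a minimal illustration, not the authors' implementation: the separator set (`,`, `;`, `and`), the line-header characters (`*`, dagger, double dagger), and the helper names `segment_hierarchically` and `line_format_features` are all hypothetical assumptions, since the abstract does not specify them. In a linear-chain CRF, binary observation tests like these are typically conjoined with label transitions to form the feature functions; the CRF training and decoding step itself is not shown.

```python
import re
from dataclasses import dataclass, field

# Hypothetical format cues; the abstract does not list the actual sets used.
LIST_SEPARATORS = re.compile(r"\s*(?:,|;|\band\b)\s*")  # assumed list separators
LINE_HEADERS = ("*", "\u2020", "\u2021")  # assumed line-header marks: *, dagger, double dagger


@dataclass
class Line:
    text: str
    blocks: list = field(default_factory=list)  # each block is a list of words


def segment_hierarchically(text):
    """Split text into lines, each line into blocks, and each block into words."""
    lines = []
    for raw in text.split("\n"):              # level 1: new-line characters
        raw = raw.strip()
        if not raw:
            continue
        line = Line(raw)
        for chunk in LIST_SEPARATORS.split(raw):  # level 2: list separators
            words = chunk.split()                 # level 3: whitespace-delimited words
            if words:
                line.blocks.append(words)
        lines.append(line)
    return lines


def line_format_features(line, index):
    """Binary format features for one line, usable as CRF observation tests."""
    return {
        "starts_with_header_char": line.text.startswith(LINE_HEADERS),
        "has_list_separator": len(line.blocks) > 1,
        "is_first_line": index == 0,
        "contains_email": "@" in line.text,
        "contains_digit": any(c.isdigit() for c in line.text),
    }
```

For example, on a header fragment such as `"Hierarchical information extraction\nWang Li, Zhang Wei and Chen Hua\n*Dept. of Computer Science\nwang@example.com"`, the second line splits into the three author blocks `Wang Li`, `Zhang Wei`, and `Chen Hua`, while the third line is flagged by its leading `*`. A line-level CRF could label lines (title, author, affiliation, email) and a block- or word-level CRF could then refine fields within each line, which is the kind of granularity the hierarchical approach targets.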