Intelligent web crawler
algorithms, design, domain knowledge, experimentation, importance-metrics, ontology, performance, reliability, web applications, web crawler, web services
The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.
This paper proposes a novel system that aids in the writing of research papers by gathering and analysing other researchers’ comments on a given reference paper to identify the features, advantages or disadvantages of the referenced research. A lexicon-based reference comments crawler (LRCC) uses part-of-speech lexicons and a dynamic text window to classify the comments about a reference paper, together with their surrounding sentences, into four categories: normal, advantage, disadvantage and complex. Comments and their surrounding sentences are extracted from research papers effectively and efficiently using the reference identifier and a few simple extraction rules. Because the reference identifier is key to sentence extraction in the LRCC system, this paper considers the various types of reference identifiers. Several experiments on published research papers were performed to evaluate the LRCC’s precision and recall. The results showed that the LRCC can extract and classify comments with high precision and recall and present them to the user effectively and efficiently.
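The abstract describes the LRCC pipeline only at a high level, so the following is a minimal Python sketch of the idea under stated assumptions, not the authors' implementation. It finds sentences that cite a numeric reference identifier such as [3], grows a text window over the following sentences that open with an assumed continuation cue (it, this, however, ...), and labels the window normal, advantage, disadvantage or complex from lexicon hits. The word lists, the cue list, the `max_extra` limit and all function names are hypothetical; plain word sets stand in for the paper's part-of-speech lexicons, and treating a window with both kinds of hits as complex is an assumed reading of that category.

```python
import re

# Hypothetical lexicons (assumption): plain word sets stand in for the
# part-of-speech lexicons described in the abstract, which are not given.
ADVANTAGE_WORDS = {"efficient", "accurate", "robust", "improves", "outperforms", "effective"}
DISADVANTAGE_WORDS = {"slow", "fails", "limited", "inaccurate", "costly", "weak"}

# Assumed continuation cues: a following sentence that starts with one of
# these words is treated as still commenting on the same reference.
CONTINUATION_CUES = {"it", "its", "this", "they", "these", "however", "but", "moreover"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lower-cased alphabetic tokens; bracketed numbers are dropped."""
    return re.findall(r"[a-z]+", sentence.lower())


def ref_pattern(ref_number):
    """Match a numeric identifier such as [3], [2, 3] or [3,7] but not [13]."""
    n = re.escape(str(ref_number))
    return re.compile(r"\[(?:\d+\s*,\s*)*" + n + r"(?:\s*,\s*\d+)*\]")


def text_window(sentences, i, max_extra=2):
    """Dynamic window (assumed rule): the citing sentence plus up to
    max_extra following sentences that open with a continuation cue."""
    window = [sentences[i]]
    j = i + 1
    while j < len(sentences) and len(window) <= max_extra:
        words = tokenize(sentences[j])
        if not words or words[0] not in CONTINUATION_CUES:
            break
        window.append(sentences[j])
        j += 1
    return window


def classify_window(window):
    """Map lexicon hits in the window to one of the four categories."""
    words = set(tokenize(" ".join(window)))
    has_adv = bool(words & ADVANTAGE_WORDS)
    has_dis = bool(words & DISADVANTAGE_WORDS)
    if has_adv and has_dis:
        return "complex"
    if has_adv:
        return "advantage"
    if has_dis:
        return "disadvantage"
    return "normal"


def comments_for_reference(text, ref_number):
    """Return (category, window) for every sentence that cites ref_number."""
    sentences = split_sentences(text)
    pattern = ref_pattern(ref_number)
    results = []
    for i, sentence in enumerate(sentences):
        if pattern.search(sentence):
            window = text_window(sentences, i)
            results.append((classify_window(window), window))
    return results


if __name__ == "__main__":
    paper = (
        "Crawlers have been studied widely. The crawler in [3] is efficient on large sites. "
        "However, it fails on dynamic pages. The system in [4] is described in detail. "
        "It was evaluated on news sites. Reference [5] is robust. The parser in [6] is slow."
    )
    for ref in (3, 4, 5, 6):
        for label, window in comments_for_reference(paper, ref):
            print(ref, label, "|", " ".join(window))
```

With the sample text, reference [3] is labelled complex because its window pulls in the following "However, ..." sentence, while [4] stays normal, [5] is an advantage and [6] a disadvantage. A fuller system would also need author-year identifiers such as (Smith et al., 2010) and a sentence splitter that does not break on abbreviations like "et al.".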