Automatic extraction of semi-structured Web content: Case study of Brazilian football
Keywords:
Information Extraction, Production Rules, JEOPS, Wrapper, Crawler
Abstract
Information extraction techniques automatically generate a structured representation from unstructured or semi-structured content. Structured information enables or facilitates further processing by third-party Web applications. This work describes the implementation of a domain-oriented information extraction system. The system automatically converts semi-structured Web content into structured content by means of object-oriented production rules that instantiate the provided domain-specific classes. These rules are implemented in JEOPS, a Java-based first-order forward-chaining inference engine. We have fully specified classes modeling the Brazilian Soccer Championship to show the feasibility of the proposal. Taking a Web site address as input, the system uses the facts and rules defined in its knowledge base to identify related links, find the championship classification table, and extract the table data. As a result, it automatically populates instances of the domain classes.
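The final step described in the abstract, turning rows of a classification table into domain class instances, can be illustrated with a minimal sketch. It is not the authors' implementation: plain Java stands in for the JEOPS production rules, and the names TeamStanding and StandingsExtractor, the three-column table layout, and the regular-expression parsing are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical domain class: one row of a championship classification table.
class TeamStanding {
    final int position;
    final String team;
    final int points;

    TeamStanding(int position, String team, int points) {
        this.position = position;
        this.team = team;
        this.points = points;
    }
}

// Hypothetical extractor; a plain-Java stand-in for rule-based instantiation.
class StandingsExtractor {
    // Matches one <tr>...</tr> block, and each <td>...</td> cell inside it.
    private static final Pattern ROW =
        Pattern.compile("<tr[^>]*>(.*?)</tr>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    private static final Pattern CELL =
        Pattern.compile("<td[^>]*>(.*?)</td>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    static List<TeamStanding> extract(String html) {
        List<TeamStanding> standings = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                // Drop nested tags (links, bold) and surrounding whitespace.
                cells.add(cell.group(1).replaceAll("<[^>]+>", "").trim());
            }
            // Condition part of the "rule": exactly three cells, with an
            // integer position and integer points. Header rows built from
            // <th> cells yield no <td> matches and are skipped.
            if (cells.size() == 3 && isInteger(cells.get(0)) && isInteger(cells.get(2))) {
                // Action part of the "rule": instantiate the domain class.
                standings.add(new TeamStanding(
                    Integer.parseInt(cells.get(0)),
                    cells.get(1),
                    Integer.parseInt(cells.get(2))));
            }
        }
        return standings;
    }

    private static boolean isInteger(String s) {
        return s.matches("\\d{1,9}"); // at most 9 digits, so parseInt cannot overflow
    }

    public static void main(String[] args) {
        String html = "<table><tr><th>Pos</th><th>Team</th><th>Pts</th></tr>"
            + "<tr><td>1</td><td>Team A</td><td>67</td></tr></table>";
        for (TeamStanding s : extract(html)) {
            System.out.println(s.position + " " + s.team + " " + s.points);
        }
    }
}
```

In a rule engine such as JEOPS, the condition and action would be declared as a production rule over facts in the knowledge base rather than hard-coded in a loop, and a real page would call for a proper HTML parser instead of regular expressions.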
How to Cite
de Melo, A. S., & Macedo, H. T. (2011). Automatic extraction of semi-structured Web content: Case study of Brazilian football. Scientia Plena, 5(8). Retrieved from https://scientiaplena.org.br/sp/article/view/640
Issue
Section
Articles
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.