OKKAM Community Portal

 

Applications you can build with OKKAM tools


Below we summarize what an entity-based approach makes possible and describe two applications we are currently developing.

Functionality enabled by an Entity-based approach

Link entities to information about them in an unambiguous manner. This is useful for:

  • Content producers, at authoring time, who get more information on what they are writing about
  • Content consumers, after publication, who get more information on the subject discussed in the article

Retrieve entities and information about entities from different sources. This is useful for all Information Retrieval applications.

Possible applications

  • Automatic extraction and annotation of entities in text. Example I: a news portal
  • Interactive extraction and annotation of entities in text. Example II: scientific article authoring

Example I: News portal

Overview

Goal: offer background knowledge to the viewer and pointers to relevant news
Focus: entities contained in news, namely people, organizations and locations
User interaction scenario: the user can browse a news item, click on a particular entity and get a pop-up containing an entity profile, plus references to other news on the same entity
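The "other news on the same entity" part of this scenario can be pictured as a lookup from entity id to news items. The following is only a minimal sketch of that idea, not the portal's actual implementation: the item structure, the `ent:` identifiers and the function names are all hypothetical.

```python
from collections import defaultdict

def build_entity_index(news_items):
    """Map each entity id to the ids of the news items that mention it."""
    index = defaultdict(list)
    for item in news_items:
        for entity_id in item["entities"]:
            if item["id"] not in index[entity_id]:
                index[entity_id].append(item["id"])
    return index

def related_news(index, entity_id, current_item_id):
    """Return the other news items that mention the same entity."""
    return [i for i in index.get(entity_id, []) if i != current_item_id]

# Hypothetical sample data: entity ids here are placeholders, not real OKKAM ids.
news_items = [
    {"id": "news-1", "entities": ["ent:rome", "ent:fiat"]},
    {"id": "news-2", "entities": ["ent:rome"]},
    {"id": "news-3", "entities": ["ent:fiat", "ent:rome"]},
]
index = build_entity_index(news_items)
print(related_news(index, "ent:rome", "news-1"))  # ['news-2', 'news-3']
```

Because every mention carries the same identifier, the lookup does not need to compare names or spellings.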

Input/Output

Source of the process: ANSA English news archive
Source news encoding: NewsML v.2
Output: annotated news on online portal
Output encoding: XHTML + RDFa annotation

Technology

Pipeline for news items:

  • Text analysis to detect the entities and associate their OKKAM id (NewsML -> annotated NewsML)
  • XSLT transformation from annotated NewsML to XHTML + RDFa
  • Submission of the pages to Sindice, the Semantic Search engine -> extraction of the RDFa information + indexing of the page
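To give an idea of what the annotated output might look like, here is a rough Python sketch that wraps detected entity spans in RDFa-annotated `<span>` elements. It is an illustration only: the real pipeline uses XSLT on annotated NewsML, and the entity URI, the `ex:Location` type and the choice of `rdfs:label` are assumptions made for this example (the `ex:` and `rdfs:` prefixes would have to be declared in the enclosing XHTML page).

```python
from html import escape

def annotate(text, entities):
    """Wrap each detected entity span in an RDFa-annotated <span>.

    `entities` is a list of (start, end, entity_uri, rdf_type) tuples
    with non-overlapping character offsets into `text`.
    """
    parts = []
    cursor = 0
    for start, end, uri, rdf_type in sorted(entities):
        # Plain text between the previous entity and this one.
        parts.append(escape(text[cursor:start]))
        parts.append(
            '<span about="{}" typeof="{}" property="rdfs:label">{}</span>'.format(
                escape(uri), escape(rdf_type), escape(text[start:end])
            )
        )
        cursor = end
    # Remaining text after the last entity.
    parts.append(escape(text[cursor:]))
    return "".join(parts)

# Hypothetical input: the URI is a placeholder, not a real OKKAM id.
text = "Rome hosts the summit."
entities = [(0, 4, "http://example.org/entity/rome", "ex:Location")]
print(annotate(text, entities))
```

An RDFa-aware indexer can then read the `about` attribute of each span to learn which entity the page mentions.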

Screenshots

[Screenshot: news portal with pop-up window]

Resources for developers

The entity extraction/annotation pipeline, available here

Documentation available here

Sindice: search engine indexing sources containing RDF, RDFa or Microformats annotations.

Sigma: live, embeddable information summaries from sites which use RDF, RDFa or Microformats.

Example II: Scientific article authoring

Overview

Goal: unambiguously detect entities in scientific articles in order to:

  • Link this information to background knowledge for the author (at authoring time) and reader (after publication)
  • Make information contained in articles unambiguously searchable via the identifier

Focus: entities contained in scientific articles, e.g. biological entities and authors

User interaction scenario: an authoring tool (a Word plugin) analyzes the text, detects candidate entities and lets the user mark up entities with the correct ids.

Input/Output

Source of the process: articles being written by an author
Source encoding: Word document (or any format Word can edit)
Output: annotated articles
Output encoding: in-document annotations (separate section, Word comments), or export to XML and CSV
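As an illustration of the CSV export option, the sketch below serializes a list of annotations with Python's standard `csv` module. The column layout (`start`, `end`, `text`, `entity_id`) and the sample identifier are assumptions made for this example. The plugin's actual export format is not described here.

```python
import csv
import io

def annotations_to_csv(annotations):
    """Serialize annotation records to CSV text, one row per annotated entity."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["start", "end", "text", "entity_id"],  # assumed column layout
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(annotations)
    return buffer.getvalue()

# Hypothetical annotation: the identifier is a placeholder, not a real id.
annotations = [
    {"start": 12, "end": 15, "text": "p53", "entity_id": "example:protein-1"},
]
print(annotations_to_csv(annotations))
```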

Technology

The user interacts with a Word plugin that:

  • Sends text from the article to one or more Named Entity Recognition (NER) services
  • For each entity detected by the NER, retrieves the relevant identifiers (e.g. UniProt for proteins, OKKAM ids for authors)
  • Presents the user with a list of candidate ids and related information, so that the user can assign the correct id to each entity
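The three steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the plugin's code: the NER service is replaced by a fixed list of names, the identifier lookup by a small dictionary of placeholder ids, and the user's choice by a callback. All names here are hypothetical.

```python
import re

# Hypothetical stand-in for a NER service: a fixed list of known names.
KNOWN_NAMES = ["p53", "insulin"]

# Hypothetical stand-in for the identifier lookup: name -> candidate ids.
CANDIDATES = {
    "p53": ["example:protein-A", "example:protein-B"],
    "insulin": ["example:protein-C"],
}

def recognize(text):
    """Return (start, end, surface form) for every known name found in text."""
    mentions = []
    for name in KNOWN_NAMES:
        for match in re.finditer(re.escape(name), text, flags=re.IGNORECASE):
            mentions.append((match.start(), match.end(), match.group()))
    return sorted(mentions)

def lookup_candidates(surface):
    """Return the candidate identifiers for a detected surface form."""
    return CANDIDATES.get(surface.lower(), [])

def assign_ids(text, choose):
    """Detect entities, fetch candidates, and let `choose` pick one id each."""
    result = []
    for _start, _end, surface in recognize(text):
        candidates = lookup_candidates(surface)
        if candidates:
            result.append((surface, choose(surface, candidates)))
    return result

# The lambda simulates a user who always accepts the first candidate.
print(assign_ids("Insulin and p53 were measured.", lambda s, c: c[0]))
```

Keeping the choice in a callback mirrors the interactive design: the tool proposes candidates, but the final id always comes from the user.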

Screenshots

[Screenshot: the Word plugin]

Resources for developers

The Word plugin available here with documentation

Sindice: search engine indexing sources containing RDF, RDFa or Microformats annotations.

Sigma: live, embeddable information summaries from sites which use RDF, RDFa or Microformats.



 

 


Last Updated on Tuesday, 18 May 2010 12:52