On every project we are involved with, there comes a time to discuss the ontology required for the information and processes being managed. An ontology allows you to categorise your information based on the business realm it is relevant to. This enhances user productivity through accurate, metadata-based content discovery and process mapping. Defining types, properties and relationships is key to managing information. However, maintaining a metadata structure for a document often seems counterintuitive to users, who cannot see the long-term goal of faster content discovery. From a change management perspective, it is tough going to convince users that the upfront effort will help them in the long term.
Semantic Content Management is a practice that can help with this, among other things. It augments non-semantic information, such as documents in Alfresco, with semantic information such as entities (person, place or organisation), geo-tag information, abstracts etc. Semantic Content Management is supported by the Semantic Web. According to the W3C, “The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries”. The Semantic Web includes sites like dbpedia.org, which publishes its pages in a defined, structured format (RDF) that contextualizes the content. Items such as the language, place or type of data are stored with the content, and the relationships to other content are clearly defined. This structured web information is often referred to as the Web of Data. The advantage is that the content is machine readable, and systems can exploit this to extract contextualized metadata and populate the managed content with it.
Apache Stanbol is a system that provides services around Semantic Content Management. At Seed we are using Stanbol for one of our customers to automatically extract the people, organisations and places referenced in the content and attach them to the content as metadata. The plan is to use this enhanced data, in conjunction with search, to allow analysis of the data in terms of entities and entity graphs, and also to annotate any content with further semantic details such as related articles, definitions, geo-tagging etc. Furthermore, we plan to build up a database of entities that can then be further enriched and provide historic data to our auto categorisation process.
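To make the extraction step a little more concrete, here is a minimal Python sketch of how a piece of text could be sent to a Stanbol enhancer and how the returned entities could be grouped before being attached to a document. It is an illustration under stated assumptions rather than our actual integration: the URL assumes a Stanbol launcher running locally with its enhancer at /enhancer, the JSON Accept header is one of several formats you might request, and group_entities works on a simplified, hypothetical list of annotations (a "type" and a "label" each) rather than on Stanbol's real RDF output, which would need to be parsed first.

```python
import urllib.request

# Assumption: a Stanbol launcher running locally with the enhancer at /enhancer.
STANBOL_ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancer_request(text, url=STANBOL_ENHANCER_URL):
    """Build (but do not send) a POST request carrying plain text to the enhancer."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            # Assumption: we ask for a JSON serialisation of the enhancements.
            "Accept": "application/json",
        },
        method="POST",
    )


def group_entities(annotations):
    """Group simplified annotations by entity type, dropping duplicate labels.

    `annotations` is a hypothetical, already-parsed list of dicts such as
    {"type": "person", "label": "Tim Berners-Lee"}; it is not Stanbol's raw output.
    """
    grouped = {"person": [], "organisation": [], "place": []}
    for annotation in annotations:
        kind = annotation.get("type")
        label = annotation.get("label")
        if kind in grouped and label and label not in grouped[kind]:
            grouped[kind].append(label)
    return grouped
```

Sending the request would be a call to urllib.request.urlopen(build_enhancer_request(text)) against a running Stanbol instance. The grouped result maps naturally onto multi-valued metadata properties (people, organisations, places) on the document in the repository.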
Over the next few weeks we will expand on Semantic Content Management and provide you with examples of how Stanbol features such as Content Enhancement, Reasoning, Knowledge Models and Persistence services are integrated with Alfresco ECM to provide auto categorisation, enriched analysis capability and semantic annotations.