What are Linked Data and Linked Open Data?
Linked Data is one of the core concepts and pillars of the Semantic Web, also known as the Web of Data. The Semantic Web is all about making links between datasets understandable not only to humans but also to machines, and Linked Data provides the best practices for making those links. Linked Data is a set of design principles for sharing machine-readable interlinked data on the Web.
The Linked Data Rules of the Game
The more things (concepts, objects, people, locations) are connected, the more powerful the Web of Data becomes. However, in order to link, merge and integrate huge sets of data from disparate raw sources, the Linked Data movement needs basic guidelines to stick to.
The inventor of the World Wide Web and the creator and advocate of the Semantic Web and Linked Data, Sir Tim Berners-Lee, laid down the four design principles of Linked Data as early as 2006.
1. Use URIs as names for things.
A Uniform Resource Identifier (URI) is a single global identifier, a kind of unique ID, for each thing being linked, so that we can distinguish between things, integrate them without confusion, and know that a thing in one dataset is the same as a thing in another dataset because they share one and the same URI.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
The Resource Description Framework (RDF) is a standard model for data publishing and interchange on the Web developed by the W3C. RDF is the standard used in a semantic graph database, also referred to as an RDF triplestore.
A semantic graph database is a technology developed to store interlinked data and to make sense of it by semantically enriching the datasets. Unlike a relational database, a triplestore maps the various relationships between entities as a graph. SPARQL, in turn, is the W3C-standardized query language for RDF triplestores.
4. Include links to other URIs so that they can discover more things.
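The four principles can be illustrated with a toy sketch in plain Python. All `example.org` URIs below are hypothetical placeholders (only the FOAF property URI is a real vocabulary term), and a dictionary stands in for HTTP lookup:

```python
# Principles 1 & 2: use HTTP URIs as globally unique names for things.
# The example.org URIs are hypothetical; foaf:knows is a real FOAF property.
ALICE = "http://example.org/person/alice"
BOB = "http://example.org/person/bob"
KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Principle 3: looking up a URI should return useful, standards-based data.
# Here a dict stands in for an HTTP server answering a lookup with RDF triples.
dataset = {
    ALICE: [(ALICE, KNOWS, BOB)],
    BOB: [(BOB, KNOWS, ALICE)],
}

def look_up(uri):
    """Simulate dereferencing an HTTP URI: return the triples about it."""
    return dataset.get(uri, [])

# Principle 4: the returned triples link to other URIs, so a client can
# "follow its nose" from Alice to Bob and discover more things.
discovered = {obj for (_s, _p, obj) in look_up(ALICE)}
```

The point of the sketch is the discovery loop: every answer to a lookup contains further URIs that can themselves be looked up.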
Linked Data vs. Open Data
Still, not all data is freely available and open for anyone to use and share. Open Data is data that can be freely used and distributed by anyone, subject only to, at most, the requirement to attribute and share-alike.
Open Data does not equal Linked Data. Open Data can be made available to everyone without links to other data. At the same time, data can be linked without being freely available for reuse and distribution.
Therefore, the efforts of the W3C community and all advocates of data openness are channeled into enriching the Linked Open Data (LOD) cloud.
Linked Open Data
Linked Open Data is a powerful blend of Linked Data and Open Data: it is both linked and drawn from open sources. A graph database, for instance, can handle huge raw datasets from various sources and link them to Open Data, which enables richer queries and findings in data management and analysis. One notable example of a Linked Open Data source is DBpedia, a crowd-sourced community effort to extract structured information from Wikipedia and make it available on the Web.
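A minimal sketch of how such linking works in practice: a private record is aliased to an open DBpedia identifier via `owl:sameAs` (a real OWL property), after which the two descriptions can be merged. The local URIs and property names are invented for illustration:

```python
# owl:sameAs is a real OWL property; everything under example.org is made up.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

local_city = "http://example.org/city/42"           # our private identifier
dbpedia_city = "http://dbpedia.org/resource/Sofia"  # open, shared identifier

local_triples = [
    (local_city, "http://example.org/prop/population", "1236000"),
    (local_city, SAME_AS, dbpedia_city),
]
open_triples = [
    (dbpedia_city, "http://dbpedia.org/ontology/country",
     "http://dbpedia.org/resource/Bulgaria"),
]

# Once identifiers are aliased, merging is just rewriting local IDs to the
# open URI and taking the union of the two graphs.
alias = {s: o for (s, p, o) in local_triples if p == SAME_AS}
merged = [(alias.get(s, s), p, o) for (s, p, o) in local_triples if p != SAME_AS]
merged += open_triples
```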
The Benefits of Linked (Open) Data
Linked Data breaks down the information silos that exist between various formats and brings down the fences between various sources. Linked Data makes data integration and browsing through complex data easier, due to the standards it adheres to. Those guidelines also allow for easy updates and extensions of the data models.
Representing data in a linked way under a set of global principles also increases data quality. In addition, a semantic graph database representing Linked Data creates semantic links between disparate sources and formats and infers new knowledge from existing facts.
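How a triplestore "infers new knowledge from existing facts" can be sketched with the standard RDFS subclass rule: if X has type C and C is a subclass of D, then X also has type D. The entities below are illustrative, not real vocabulary:

```python
# Toy forward-chaining inference over subclass hierarchies (RDFS-style).
# All ex: identifiers are invented for illustration.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

triples = {
    ("ex:GraphDB", TYPE, "ex:Triplestore"),
    ("ex:Triplestore", SUBCLASS, "ex:Database"),
    ("ex:Database", SUBCLASS, "ex:Software"),
}

def infer(facts):
    """Apply the subclass rules repeatedly until no new triples appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for (x, p1, c) in facts:
            for (c2, p2, d) in facts:
                if p1 == TYPE and p2 == SUBCLASS and c == c2:
                    new.add((x, TYPE, d))          # type propagates upward
                if p1 == SUBCLASS and p2 == SUBCLASS and c == c2:
                    new.add((x, SUBCLASS, d))      # subclass is transitive
        if not new <= facts:
            facts |= new
            changed = True
    return facts

inferred = infer(triples)
```

From the three asserted facts the engine derives, among others, that GraphDB is also a database and a piece of software, without those triples ever being stated.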
Furthermore, linking open datasets enhances creativity and innovation as all developers, citizens and businesses can use all those datasets to put things into context and create knowledge and apps. For example, Linked Open Data encourages the creation of applications to discover the best neighborhood to live in, based on data on schools, transportation, office buildings and clubs/parks in the area.
Due to the common standards and the open data policy for transparency, Linked Open Data is useful to organizations and society alike.
What is Semantic Technology?
Semantic Technology, as the phrase itself suggests, uses formal semantics to give meaning to all the disparate and raw data that surrounds us. The Semantic Web Technology – or technology for the Web of Data or the Linked Data technology as envisioned by World Wide Web inventor Sir Tim Berners-Lee – builds relationships between data in various formats and sources, from one string to another, helping build context and creating links out of those relationships.
Semantic Technology defines and links data on the Web or within an enterprise by developing languages that express rich, self-describing interrelations of data in a form machines can process. Thus, machines are not only able to process long strings of characters and index tons of data, but also to store, manage and retrieve information based on meaning and logical relations. Semantics adds another layer to the Web and can show related facts and items instead of just matching words.
Semantic Technology at a Glance
The principal Semantic Technology components, such as the semantic graph database, use a set of universal standards set down by the World Wide Web Consortium (W3C), the international community that develops open web standards.
The core difference between Semantic Technologies and other data technologies, the relational database for instance, is that Semantic Technology deals with the meaning rather than the structure of the data.
W3C’s Semantic Web initiative states that the purpose of this technology in the context of the Semantic Web is to create a ‘universal medium for the exchange of data’ by smoothly interconnecting the global sharing of any kind of personal, commercial, scientific and cultural data. W3C has developed open specifications for semantic technology developers to follow and has identified, via open source development, the infrastructure components needed to scale on the Web and to be applicable elsewhere.
In terms of Semantic Technology, the standards that apply are primarily the Resource Description Framework (RDF), SPARQL (SPARQL Protocol and RDF Query Language), and optionally OWL (Web Ontology Language).
RDF represents data as triples (subject, predicate, object statements) and is the format Semantic Technology uses to store data in graph databases; RDFS (RDF Schema) adds a basic vocabulary for describing classes and properties.
SPARQL is the semantic query language of the Semantic Web, which is specifically designed to query data across various systems and databases and to retrieve and process data stored in RDF format.
OWL, which is optional, is a computational logic-based language designed to express the data schema and to represent rich and complex knowledge about hierarchies of things and the relations between them. It is complementary to RDF and allows a data schema or ontology in a given domain to be formalized separately from the data itself.
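The core idea behind SPARQL, querying by triple pattern, can be sketched in a few lines of Python: a pattern is a triple in which variables (here, strings starting with `?`) match any value. The data and prefixes are invented for illustration, and this toy matcher ignores SPARQL features such as joins and filters:

```python
# Illustrative data; the ex: prefix is a made-up namespace.
triples = [
    ("ex:Alice", "ex:worksFor", "ex:Ontotext"),
    ("ex:Bob",   "ex:worksFor", "ex:Ontotext"),
    ("ex:Alice", "ex:knows",    "ex:Bob"),
]

def match(pattern, data):
    """Return one variable binding per data triple that fits the pattern."""
    results = []
    for triple in data:
        binding = {}
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                binding[want] = have    # a variable matches anything
            elif want != have:
                break                   # a constant must match exactly
        else:
            results.append(binding)
    return results

# Roughly: SELECT ?who WHERE { ?who ex:worksFor ex:Ontotext }
employees = match(("?who", "ex:worksFor", "ex:Ontotext"), triples)
```

A real SPARQL engine generalizes this by joining the bindings of many such patterns and evaluating them efficiently over indexed triples.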
Using all those standards, Semantic Technology makes life easier by helping computers find the right piece of data right away and filter items to create more value.
Industry application of Semantic Technology
Semantic Technology helps users and enterprises discover smarter data, infer links and extract knowledge from enormous sets of raw data in various formats and from various sources. Semantic Web technology, such as GraphDB, makes data content easier for machines to integrate, find, access, retrieve, process and automate. This, in turn, enables organizations to gain faster and more cost-effective access to meaningful and accurate data, to analyze that data and to turn it into knowledge. They can then use that knowledge to gain insights, apply predictive models and make data-driven decisions.
Various businesses are already using semantic technologies and graph databases to manage their content, repurpose and reuse information, cut costs and gain new revenue streams. The BBC, the FT and Elsevier use semantic publishing; in healthcare and life sciences, AstraZeneca also uses semantic technology. The financial industry and insurance companies have also started adopting technologies to semantically enrich content and to access and process complex and heterogeneous data. E-commerce, the automotive industry, government and the public sector, technology providers, the energy sector and the services sector, among others, are also employing semantic technology to extract knowledge from data by attributing meaning to various datasets.
Meaning: this is what the Semantic Web is all about.
As early as 2007, Sir Tim Berners-Lee told Bloomberg: “The Semantic Technology isn’t inherently complex. The Semantic Technology language, at its heart, is very, very simple. It’s just about the relationships between things.”
Chances are that the ‘relationships between things’ will make the lives of all users easier and will help organizations manage data more efficiently to create more and smarter data and gain more value.
Semantic Web Activity Statement
October 2013: Work conducted under the Semantic Web Activity has ended or is now nearing the end of its charter. See the highlights section for the current situation.
The goal of the Semantic Web initiative is as broad as that of the Web: to create a universal medium for the exchange of data. It is envisaged to smoothly interconnect personal information management, enterprise application integration, and the global sharing of commercial, scientific and cultural data. Facilities to put machine-understandable data on the Web are quickly becoming a high priority for many organizations, individuals and communities.
The Web can reach its full potential only if it becomes a place where data can be shared and processed by automated tools as well as by people. For the Web to scale, tomorrow's programs must be able to share and process data even when these programs have been designed totally independently. The Semantic Web Activity is an initiative of the World Wide Web Consortium (W3C) designed to provide a leadership role in defining this Web. The Activity develops open specifications for those technologies that are ready for large scale deployment, and identifies, through open source advanced development, the infrastructure components that will be necessary to scale in the Web in the future.
The principal technologies of the Semantic Web fit into a set of layered specifications. The current components are the Resource Description Framework (RDF) Core Model, the RDF Schema language, the Web Ontology language (OWL), and the Simple Knowledge Organization System (SKOS). Building on these core components is a standardized query language, SPARQL (pronounced "sparkle"), enabling querying of decentralized collections of RDF data. The POWDER recommendations provide technologies to find resource descriptions for specific resources on the Web; descriptions which can be “joined” to other RDF data. The GRDDL and RDFa Recommendations aim at creating bridges between the RDF model and various XML formats, like XHTML. RDFa also plays an important role as a format to add Structured Data to HTML, i.e., as a means to help using Linked Data in Web Applications. The goal of the R2RML language is to provide a standard language to map relational data and relational database schemas to RDF and OWL. Finally, the goal of the newly proposed Linked Data Profile Working Group is to provide an “entry level” layer to manage Linked Data files using a RESTful, HTTP-based API.
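What an R2RML mapping accomplishes, turning relational rows into RDF triples via a URI template, can be shown with a much-simplified Python sketch. This is not R2RML syntax (a real mapping is itself a Turtle document); the table, template and property URIs are invented:

```python
# Hypothetical relational rows, as a mapping processor would read them.
rows = [
    {"id": 1, "name": "Alice", "dept": "R&D"},
    {"id": 2, "name": "Bob",   "dept": "Sales"},
]

# In real R2RML the subject template and predicate URIs are declared in a
# Turtle mapping document; here they are plain Python values.
SUBJECT_TEMPLATE = "http://example.org/employee/{id}"

def map_rows(rows):
    """Mint a subject URI per row, then emit one triple per column."""
    triples = []
    for row in rows:
        subject = SUBJECT_TEMPLATE.format(id=row["id"])
        triples.append((subject, "http://example.org/prop/name", row["name"]))
        triples.append((subject, "http://example.org/prop/dept", row["dept"]))
    return triples

triples = map_rows(rows)
```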
Highlights Since the Previous Advisory Committee Meeting
The RDF Working Group began its work in February 2011; its charter has been extended until the 31st of December 2013. The mission of the group is to update the 2004 version of the Resource Description Framework (RDF) Recommendation. The group has published many Candidate Recommendations in recent months: Turtle (Terse RDF Triple Language) in February, RDF 1.1 Concepts and Abstract Syntax in July 2013 and RDF 1.1 Semantics in November 2013.
JSON-LD Syntax 1.0 was published as a Candidate Recommendation in September 2013 and has just moved to Proposed Recommendation. The standard was originally developed by a separate JSON for Linking Data Community Group and has now been incorporated into the Recommendation track document of the RDF Working Group (JSON-based serialization of RDF is part of the group’s charter). Last Call Working Drafts have also been published for TriG, N-Triples and N-Quads.
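The core idea of JSON-LD can be shown with a small hand-built document: an `@context` maps ordinary JSON keys to URIs, which lets an RDF processor read plain-looking JSON as triples. The FOAF property URIs are real vocabulary terms; the `example.org` identifiers are made up:

```python
import json

# A minimal JSON-LD document: @context binds the short keys "name" and
# "knows" to FOAF property URIs, and "@type": "@id" marks "knows" as a
# link to another resource rather than a literal string.
doc = {
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "knows": {"@id": "http://xmlns.com/foaf/0.1/knows", "@type": "@id"},
    },
    "@id": "http://example.org/person/alice",
    "name": "Alice",
    "knows": "http://example.org/person/bob",
}

serialized = json.dumps(doc, indent=2)
```

An RDF processor would read this as two triples about `http://example.org/person/alice`: her name and whom she knows.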
The RDFa Working Group has successfully taken HTML+RDFa 1.1 to Recommendation and is now dormant pending the completion of HTML5 and RDF 1.1 that may trigger an edited Recommendation.
The Web Schemas Task Force, under the control of the Semantic Web Interest Group, continues its activity; it has become the major public discussion forum for the evolution of schema.org vocabularies, and we envisage keeping this task force open for the coming period. The proposed Data Activity assigns Team resources to this group to support the development of vocabularies at W3C (see below).
The Linked Data Platform Working Group began its operation in June 2012. The goal of this group is to define an “entry level” set of RESTful APIs to develop simpler Linked Data Applications, which may include large scale Enterprise Integration or Web Applications based on Linked Data. The Working Group has published a Last Call Working Draft for the Linked Data Platform 1.0 document.
The Semantic Web Health Care and Life Sciences Interest Group (HCLSIG) continues to be a primary forum for experts in this area, considering Semantic Web technologies for the management of biomedical data.
Semantic tagging is powered by a Knowledge graph that combines public and private data.
How Does Semantic Tagging Work?
This service analyzes the text, extracts concepts, identifies topics, keywords, and important relationships, and disambiguates similar entities. The resulting semantic fingerprint of the document comprises metadata, aligned to a knowledge graph that serves as the foundation of all content management solutions.
The Key Ingredient of Semantic Tagging: Ontotext’s Concept Extraction Service
The Concept Extraction Service (CES) extracts the essence from the content. CES enriches the processed documents with semantic tags, containing references to a Knowledge graph, thus creating meaningful connections between the unstructured text and the structured data.
Content Classification provides context-sensitive analysis and automation features for the purpose of organizing unstructured content.
How Does Content Classification Work?
The solution categorizes unstructured information by performing Knowledge graph-powered semantic analysis over the full text of the documents and applying supervised machine learning and rules that automate classification decisions.
Machine Learning and Semantic Fingerprint Creation Working in Concert
When classifying with big hierarchical taxonomies (up to 10000 classes on up to 8 levels), the machine learning model leverages the semantic fingerprint of the document, the concept rank and the relationships in the Knowledge graph such as class proximity, class co-occurrence, parent-child relationships, etc.
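One way such taxonomy relationships can feed a classifier is by letting direct concept matches on a class also lend some decayed weight to its ancestors. The toy taxonomy, concept sets and decay factor below are invented for illustration and are not Ontotext's actual model:

```python
# Hypothetical three-level taxonomy: child class -> parent class.
taxonomy_parent = {"ex:FootballNews": "ex:SportsNews",
                   "ex:SportsNews": "ex:News"}

# Concepts attached to each class in the knowledge graph (illustrative).
class_concepts = {
    "ex:FootballNews": {"ex:Football", "ex:Goal"},
    "ex:SportsNews": {"ex:Sport"},
    "ex:News": {"ex:Event"},
}

# The document's semantic fingerprint: concepts extracted from its text.
fingerprint = {"ex:Football", "ex:Goal", "ex:Sport"}

def score_classes(fingerprint, decay=0.5):
    """Direct concept overlap, plus decayed credit propagated to ancestors."""
    scores = {cls: len(class_concepts[cls] & fingerprint)
              for cls in class_concepts}
    for cls, direct in list(scores.items()):
        parent, weight = taxonomy_parent.get(cls), decay
        while parent is not None:
            scores[parent] += direct * weight
            parent, weight = taxonomy_parent.get(parent), weight * decay
    return scores

scores = score_classes(fingerprint)
```

Propagating evidence up the hierarchy means a document strongly about football also counts, more weakly, as sports news and as news in general.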
Ontotext’s Content Recommendation solution helps media and publishers to suggest similar items to their audiences.
How Does Content Recommendation Work?
By leveraging the documents’ semantic fingerprints extracted by the Concept Extraction Service, The Recommendation Service efficiently suggests relevant related content. In addition, the quality of the Recommendation Service is further enhanced by custom tailored behavioral recommendations, based on the actions of the readers and their profiles.
The Best of All Techniques
We combine document similarity, relevance to the user’s reading history and collaborative filtering to increase user engagement, encourage content consumption and improve the reader experience.
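A simple way to combine such signals is a weighted blend; the sketch below is an illustration with made-up weights and scores, not Ontotext's actual ranking formula:

```python
# Blend three recommendation signals, each assumed to be normalized to [0, 1].
# The weights are arbitrary illustrative choices.
def blend(content_similarity, history_relevance, collaborative_score,
          weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the three normalized signals."""
    signals = (content_similarity, history_relevance, collaborative_score)
    return sum(w * s for w, s in zip(weights, signals))

# Candidate articles scored against one reader (made-up numbers):
# (semantic similarity, fit with reading history, collaborative filtering).
candidates = {
    "article-a": blend(0.9, 0.2, 0.4),
    "article-b": blend(0.3, 0.8, 0.9),
}
best = max(candidates, key=candidates.get)
```

In practice the weights would be tuned against engagement metrics rather than fixed by hand.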
Linking Text and Knowledge Graphs to Deliver Actionable Insights
Ontotext Platform is a cognitive content analytics technology. It supports the cognitive reading machines vision and the dynamic semantic publishing (DSP) application pattern. The Platform uses big knowledge graphs for text analysis. It interlinks text and graphs and enriches these graphs with facts extracted from the text.
Ontotext Platform provides Semantic Search, Exploration, Categorization and Recommendation as well as Deep Analytics.
Semantic Knowledge Graph
Ontotext Platform consists of a set of databases, machine learning algorithms, APIs and tools we use to build various solutions for specific enterprise needs.
At a high level, the platform consists of:
GraphDB™ as a semantic graph database for storing semantic indexes, consolidated entity profiles, linked open data and user behavior profiles;
Machine learning models for text analysis, disambiguation of entities, concept extraction and classification;
APIs for text analysis, model training, search, recommendations, content management and concept profiles;
Tools used to build UIs for contextual authoring, enrichment monitoring, curation and quality assurance, template definition and other user applications.
Semantic Data Modeling
We are able to ingest and normalize content and data from any number of diverse sources.
As users work with the system, Ontotext Platform is constantly learning from human feedback, remembering and refining responses.
We create user profiles based on user activity and combine them with contextual similarity derived from the semantic analysis.
Ontotext’s Cognitive Content Analytics Vision
Ontotext’s technology is inspired by the vision for cognitive reading machines that have:
knowledge and awareness
Knowledge and Awareness
Represented as a knowledge graph that contains factual information: diverse data about a rich set of entities and concepts of interest.
Possessing enough lexical and grammatical knowledge to comprehend text, combined with the skill to resolve ambiguity and extract relationships from free text.
ability to learn
Ability to Learn
Being able to learn new facts from a text and get better at comprehending its meaning.
Ontotext Platform capabilities are demonstrated by NOW – a public news portal that allows for topic-centric reading and exploration.
BBC Case Study
The BBC transformed its content delivery by using Dynamic Semantic Publishing and Linked Data.
With 32 teams, 8 groups and 776 individual players, managing the Web site for the 2010 FIFA World Cup was a daunting task. There were simply too many pages and too few journalists to create and manage the site’s content.
Euromoney Case Study
Euromoney needed a unified semantic publishing platform for creating and presenting content.
After growing in part through acquisition, Euromoney found itself with 84 brands and more than 100 different publications. They turned to Ontotext for a solution that would help them easily reuse and repurpose content within and between the business units.
Ontotext Cognitive Cloud
Enterprise Grade Software. Freemium Model.
Enterprise text analysis services for news, life sciences and social media.
Access to knowledge graphs such as DBpedia, Wikidata and GeoNames.
A scalable semantic graph database-as-a-service and analytics.
Secure and fully managed service.
What is Ontotext Cognitive Cloud?
GraphDB™ is the semantic graph database (RDF triplestore) that powers Ontotext Cloud. With GraphDB™ you can store, interlink and search entities extracted from the Ontotext Cloud text mining services, and create your private knowledge graphs integrating structured and unstructured data with facts from open knowledge graphs.
The Ontotext Cloud text analytics services can identify and extract entities from free flowing text and map them to reference entities in knowledge graphs. The services can also extract relationships between entities and categorize documents. Results are provided in JSON but can also be transformed into RDF and stored in the semantic graph database-as-a-service.
Some of Our Clients
Oxford University Press
Houses of Parliament
Why Ontotext Cognitive Cloud
On Demand, in the Cloud
Ontotext Cloud provides on-demand access to text analytics, semantic graph databases and Linked Data technology in the cloud. You can start building Smart Data prototypes without the need for licensing, provisioning, installation and maintenance. Experiment more, experiment faster!
Cognitive Cloud Pricing
Ontotext Cloud is available at a fraction of the price of enterprise solutions. We offer tiered pricing based on the data volume processed by the text analytics services, or data volume managed by the graph database. You only pay for what you use and you can start at no cost with Ontotext Cloud’s free tier.
Cognitive Cloud Scalability
The text analytics services of Ontotext Cloud provide near real-time performance and can handle large volumes of documents of various types and formats. The platform also provides access to a highly scalable semantic graph database-as-a-service, which can accommodate your needs for large-scale graph analytics and management.