Introduction to Linked Data

Every organization aims to provide its employees and customers with the information they need (data, master data), when they need it and in the required quality. Unfortunately, data is typically scattered across various places and data formats (e.g. relational databases using different schemas, or text files and spreadsheets created by different authors with different structures) and, what is more, the data is insufficiently linked (integrated). Such a situation leads, for example, to redundant data (data about one customer is held in multiple databases) or to difficult and error-prone analyses of the current financial situation (as a result of manually pairing realized sales with paid invoices).

The absence of integrated data leads to chaos and causes the company direct and indirect financial losses - e.g., the same advertisement is sent to one customer more than once, or non-integrated data leads to a biased market analysis. Furthermore, whenever an integrated view over data from multiple sources is needed, the company has to pay for ad hoc manual data integration.

Is there an acceptable solution for cheaper and more systematic data integration? Yes! You can publish and work with your data according to Linked Data principles. To do that, follow these steps:

  • Define unique HTTP identifiers (URIs) for master data (e.g. a customer, a product, an invoice).
  • Publish data and data models in the RDF data model, which makes the data machine readable (applications can understand the data).
  • Link (integrate) the machine-readable, uniquely identified data from the previous two steps with other data.
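
The three steps above can be illustrated with a minimal sketch in plain Python. It is not a definitive implementation: the namespace `http://example.com/id/`, the customer record, the external CRM URI, and all helper names are assumptions made up for illustration, and the use of the schema.org vocabulary is just one possible choice. The sketch mints a URI for a customer, describes the customer as RDF triples serialized in N-Triples syntax, and links it to another identifier with `owl:sameAs`.

```python
# Hypothetical namespace; in practice use a domain your organization controls.
BASE = "http://example.com/id/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
SCHEMA = "http://schema.org/"


def uri(kind, key):
    """Step 1: mint a unique HTTP identifier for a piece of master data."""
    return f"{BASE}{kind}/{key}"


def iri(value):
    """Format a URI as an N-Triples IRI term."""
    return f"<{value}>"


def literal(value):
    """Format a value as an N-Triples string literal with basic escaping."""
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def customer_triples(record):
    """Step 2: describe a customer record as RDF triples."""
    subject = iri(uri("customer", record["id"]))
    return [
        (subject, iri(RDF_TYPE), iri(SCHEMA + "Person")),
        (subject, iri(SCHEMA + "name"), literal(record["name"])),
    ]


def link(local_uri, external_uri):
    """Step 3: state that two URIs identify the same thing."""
    return (iri(local_uri), iri(OWL_SAME_AS), iri(external_uri))


def to_ntriples(triples):
    """Serialize triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Illustrative record and external identifier (both invented for this sketch).
record = {"id": 42, "name": "Jane Doe"}
triples = customer_triples(record)
triples.append(link(uri("customer", 42), "http://crm.example.org/contact/jane-doe"))
print(to_ntriples(triples))
```

In a real project a dedicated RDF library or triple store would normally replace the hand-written serialization; the sketch only shows how little is conceptually required for each step.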

The result of applying Linked Data principles to your data is a single data space (see the public Linking Open Data Cloud for an example); you can query that data space transparently and develop applications that work transparently on top of it. Machine readability of the published data also makes it possible to clean and link the data automatically, and enables advanced querying that goes beyond today's keyword search. The good news is that there are many Linked Data tools available to help you store, publish, link, and query Linked Data. Our goal is to teach you how to work with these tools.

Why shouldn't you be afraid of Linked Data?

  • You do not have to throw away your relational databases or other data sources in order to start publishing data according to Linked Data principles. We will help you create a Linked Data wrapper around the data sources that should become available as Linked Data.
  • You can switch to Linked Data gradually, in a series of iterations. All your systems should remain fully functional throughout.
  • There are many Linked Data tools for storing, cleaning, linking, publishing, and querying the data.
  • Linked Data tools are used in a number of commercial projects, such as those at the Guardian or the BBC.
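
To make the first point above more concrete, here is a minimal sketch of what a Linked Data wrapper around a relational source might look like. Everything in it is an assumption for illustration: the `customer` table, its columns, the namespace `http://example.com/id/`, the function name `rows_to_triples`, and the choice of the FOAF `name` property. An in-memory SQLite database stands in for an existing database, which is only read and never modified.

```python
import sqlite3

# Hypothetical namespace and vocabulary choice for this sketch.
BASE = "http://example.com/id/"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def rows_to_triples(connection):
    """Yield one (subject, predicate, object) triple per customer row.

    The relational data stays where it is; the wrapper only reads it
    and exposes each row under a URI derived from its primary key.
    """
    query = "SELECT id, name FROM customer ORDER BY id"
    for customer_id, name in connection.execute(query):
        yield (f"{BASE}customer/{customer_id}", FOAF_NAME, name)


# An in-memory database stands in for an existing relational source.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
connection.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob")],
)

triples = list(rows_to_triples(connection))
for subject, predicate, obj in triples:
    # Simplified N-Triples output; real data would need literal escaping.
    print(f'<{subject}> <{predicate}> "{obj}" .')
```

Because the wrapper is a read-only view, existing applications can keep using the database unchanged while the Linked Data view is introduced step by step.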

What is the difference between the use of Linked Data tools and data warehouse products?

  • Linked Data tools for data integration are typically free of charge.
  • You are not bound to a single data warehouse provider.
  • Thanks to unique identifiers and open standards, you can integrate freely available Linked Data sources from the public data space (governmental data, encyclopedic data, statistical data, etc.). The amount of such linked open data has been growing exponentially over the last few years.