Dictionaries, encyclopedias, and many other repositories aimed at defining the real world are structured as a set of concepts (each with identifiable keywords in each language). Real-world things may be classified into these concepts. The Internet is a huge repository of content made publicly available as a set of web pages that browsers can render visually. However, web pages do not usually attach common semantics to their content explicitly. This is why Internet search engines have traditionally been limited to syntactic searches based on the words that pages contain.
But what if we could progressively conceptualize Internet content as data classified into commonly defined concepts? For example, in a web page that lists the events taking place in Barcelona, we could specify that the tag with the title “Barcelona” is an instance of the common concept “City”, and that “The Miserables” is an instance of the common concept “Music Event”, currently associated with the city “Barcelona.” And what if other pages with similar information enriched their published content in the same way? In that context, Internet searches could provide global semantic answers about, for example, where the musical event “The Miserables” is taking place, while also checking the sources for inconsistencies to improve the quality assurance of Internet information. Some initiatives are already working in this direction (see schema.org, for example). However, this requires enriching web pages with semantic references based on a commonly accepted definition of concepts (a conceptual schema). In this way, web page data become instances of such concepts, turning simple data into semantic information that can be used for better knowledge acquisition. In this situation, the full query and processing potential of conceptual schemas would become applicable to the Internet. Nevertheless, several technical, organizational, and ethical challenges need to be considered: which “authority” governs the global conceptual schema of Internet information; how such concept definitions can be maintained in a world that changes day to day; how website designers will mark up their web pages in a standardized way; how we can reach common acceptance of concept definitions; how we can support different meanings and cultural diversity; and so on.
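To make the Barcelona example more concrete, here is a minimal sketch, assuming the pages publish their semantics as schema.org JSON-LD (one of the syntaxes schema.org supports, typically embedded in a `<script type="application/ld+json">` element). It uses the real schema.org types `MusicEvent` and `City` (a subtype of `Place`, so it can fill the `location` property). The helper functions `music_event_markup`, `cities_for_event`, and `is_inconsistent`, as well as the example URLs, are hypothetical and exist only for illustration. The sketch shows how a search could answer “where is The Miserables taking place?” across several sources and flag sources that disagree.

```python
from __future__ import annotations

import json
from collections import defaultdict


def music_event_markup(event_name: str, city_name: str) -> dict:
    """Hypothetical helper: schema.org JSON-LD stating that an event takes place in a city."""
    return {
        "@context": "https://schema.org",
        "@type": "MusicEvent",
        "name": event_name,
        # City is a schema.org subtype of Place, so it is a valid value for MusicEvent.location.
        "location": {"@type": "City", "name": city_name},
    }


def cities_for_event(pages: dict[str, dict], event_name: str) -> dict[str, list[str]]:
    """Collect, across all pages, which city each source claims for the named event.

    Returns a mapping from city name to the list of page URLs that make that claim.
    """
    claims: dict[str, list[str]] = defaultdict(list)
    for url, doc in pages.items():
        # Only MusicEvent instances with the requested name are relevant.
        if doc.get("@type") != "MusicEvent" or doc.get("name") != event_name:
            continue
        location = doc.get("location")
        # schema.org also allows a plain text location, so accept both forms.
        city = location.get("name") if isinstance(location, dict) else location
        if city is not None:
            claims[city].append(url)
    return dict(claims)


def is_inconsistent(claims: dict[str, list[str]]) -> bool:
    """Sources disagree if they place the same event in more than one city."""
    return len(claims) > 1


if __name__ == "__main__":
    # Hypothetical pages; the URLs are placeholders, not real sites.
    pages = {
        "https://example.org/barcelona-events": music_event_markup("The Miserables", "Barcelona"),
        "https://example.net/whats-on": music_event_markup("The Miserables", "Barcelona"),
        "https://example.com/agenda": music_event_markup("The Miserables", "Madrid"),
    }
    print(json.dumps(pages["https://example.org/barcelona-events"], indent=2))
    claims = cities_for_event(pages, "The Miserables")
    print(claims)
    print("inconsistent:", is_inconsistent(claims))
```

The disagreement check is deliberately naive: a touring show can legitimately appear in several cities, so a more realistic check would also compare dates (for example, the schema.org `startDate` property) before treating differing locations as an error.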
Again, we come back to modeling… but now the challenge is modeling the world in a world of many perceptions, great diversity, and huge dimensions… If we help move forward in this direction, we will be “rethinking” the Internet.
About Albert Tort
Albert Tort is CTO of Sogeti Spain. He is a software engineering and testing & quality assurance specialist.