Controlled vocabularies therefore facilitate the linking and
searching of information in software systems.
For example, users of the system can search or analyze different types of
information that are linked by those controlled vocabularies.
And, of course, they enable humans but not computational agents to reference and
exchange knowledge.
If you want to build a computational system that can operate on knowledge,
you first have to formalize that knowledge in a way that it can be understood and
processed by the computer.
One way of doing this is using ontologies.
The original use of the term ontology
goes back to the Greek philosophers Parmenides and Plato.
The philosophical definition of ontology is the study of existence and
the nature of being.
So in philosophy, ontology would ask which things exist,
what is the nature of things, and what is reality?
The computer science definition of ontology, which we are talking about here,
is a formal description of knowledge of a subject domain of interest.
So an ontology would be a formal,
logic-based definition of the types of real objects in a particular subject
domain, their properties, and their interrelationships.
For example, to formalize the domain family, we can define the relationships
husband-wife, mother-daughter, brother-sister, and so on.
If done properly, the knowledge in the subject domain,
in this case family, can be formalized using logical axioms.
So for example, in the case of family we would understand that a cousin
means a child of a father's or mother's brother or sister.
To be computer processable, of course, all of this also needs to be
formalized in a machine-processable specification.
One such format for ontologies is the Web Ontology Language (OWL),
which uses a particular kind of logic called description logic.
So given all this, a very brief definition of ontology can be given as,
an ontology is a specification of a conceptualization.
Going beyond a controlled vocabulary, in which entities are defined using
human-readable definitions, an ontology contains entities, which are called
classes, and their relationships, which are called object properties.
By doing so,
an ontology allows us to capture abstract knowledge using logical axioms.
This is done using an explicit specification based on logic and using a
language, for example, the Web Ontology Language Description Logic, OWL DL.
So an ontology allows us to build a formal knowledge model, and
allows us to compute with that knowledge, using so-called reasoning engines.
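To make the idea of computing with knowledge a little more concrete, here is a
minimal sketch in Python. It is not how an OWL reasoning engine is implemented.
It only illustrates one kind of inference: following subclass axioms
transitively. The class names and the helper superclasses are hypothetical and
chosen to match the family example.

```python
# Hypothetical subclass axioms: each pair reads "first is a subclass of second".
SUBCLASS_OF = {
    ("Mother", "Parent"),
    ("Father", "Parent"),
    ("Parent", "Person"),
}


def superclasses(cls):
    """Return every class that `cls` is a stated or inferred subclass of."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in SUBCLASS_OF:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


# "Mother is a subclass of Person" is never stated, but it follows from the axioms.
print(sorted(superclasses("Mother")))  # ['Parent', 'Person']
```

Only three axioms are stated explicitly. The fourth fact, that every Mother is
a Person, is derived rather than stored.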
Ontologies are a foundation of semantic web information systems.
Semantic Web is also sometimes referred to as Web 3.0.
So ontologies are formalized representations of knowledge that
enable computing with that knowledge.
That is in contrast to controlled vocabularies,
which allow humans to exchange knowledge but not computational agents.
For those of you who know relational databases and
are interested in Semantic Web technologies, here is a brief contrast of
relational database management systems with ontologies.
Relational databases operate under the closed world assumption.
That means there is a pre-defined schema and
every entity has to fit somewhere in that schema.
Ontologies, in contrast, operate under the open world assumption.
That means that a class is defined by its relationships to other classes and that
the reasoning engine decides where it is placed in the framework of the ontology.
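A complementary way in which the two assumptions are commonly described is in
terms of missing facts: under the closed world assumption, anything that is not
recorded is treated as false, while under the open world assumption it is
simply unknown. The following Python sketch illustrates only that difference.
The facts and function names are hypothetical.

```python
# Hypothetical known facts, stored as (subject, relation, object) tuples.
KNOWN = {("alice", "hasSibling", "bob")}


def ask_closed_world(fact):
    """Closed world: a fact that is not recorded is treated as false."""
    return fact in KNOWN


def ask_open_world(fact):
    """Open world: a fact that is not recorded is unknown (None), not false."""
    return True if fact in KNOWN else None


missing = ("alice", "hasSibling", "carol")
print(ask_closed_world(missing))  # False
print(ask_open_world(missing))    # None
```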
Relational database management systems do not support reasoning.
Everything we want to query from the system needs to be put in explicitly to
the database.
In contrast, semantic web technologies support inference reasoning.
For example, it would be relatively simple to infer somebody's cousin
if you know his or her parents, the parents' siblings and their children.
So, in contrast to the relational database system,
we do not have to explicitly put in who is somebody's cousin.
Instead, you can infer that information.
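The cousin example can be sketched in a few lines of Python. This is only an
illustration of the inference, not a semantic web reasoner: the people, the
PARENTS table, and the helper functions are all hypothetical. Only parent
relationships are stored, and cousins are derived from them.

```python
# Hypothetical family facts: child -> set of parents. All names are invented.
PARENTS = {
    "alice": {"carol", "david"},
    "bob": {"erin", "frank"},
    "carol": {"grace", "henry"},
    "erin": {"grace", "henry"},
}


def children_of(person):
    """Everyone who lists `person` as a parent."""
    return {child for child, parents in PARENTS.items() if person in parents}


def siblings_of(person):
    """People who share at least one parent with `person`."""
    sibs = set()
    for parent in PARENTS.get(person, set()):
        sibs |= children_of(parent)
    sibs.discard(person)
    return sibs


def cousins_of(person):
    """Children of the siblings of `person`'s parents."""
    result = set()
    for parent in PARENTS.get(person, set()):
        for aunt_or_uncle in siblings_of(parent):
            result |= children_of(aunt_or_uncle)
    result.discard(person)
    return result


# No cousin relationship was stored, yet it can be inferred.
print(cousins_of("alice"))  # {'bob'}
```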
To ask a specialized query of a relational database, you need to know the schema and
how to combine information from different tables.
In contrast, semantic web technologies,
via formal semantics, provide a restriction-free framework to ask queries.
In a way, those queries work by traversing a graph that is defined by the ontology.
As a consequence, in relational database systems,
data sharing is often not easy, because there are no formal semantics.
And if you want to exchange any information,
the schemas would have to be somehow compatible.
In contrast, semantic web technologies provide easier data sharing and
knowledge sharing.
Data and knowledge sharing using semantic web technologies requires that
the ontologies talk to each other. This is facilitated by formal semantics
and is relatively easy for common domains,
for example, family, where everybody agrees what the relationships and
the classes are.
But it is much more difficult, and not yet achieved, for very complex knowledge,
for example, in the biomedical domains, where there are many different ontologies
for overlapping subdomains.
Information systems based on semantic web technologies use so-called triple stores.
In a triple store,
all relationships between all individuals are explicitly stored.
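As a rough illustration of the idea, a triple can be thought of as a
(subject, predicate, object) statement, and a query as a pattern with
wildcards. The sketch below uses a plain Python set rather than a real triple
store such as Jena or Virtuoso. The triples and the match function are
hypothetical.

```python
# Hypothetical triples: (subject, predicate, object).
TRIPLES = {
    ("alice", "hasParent", "carol"),
    ("bob", "hasParent", "erin"),
    ("carol", "hasSibling", "erin"),
}


def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in TRIPLES
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    )


# Find every stated parent relationship without knowing any table schema.
print(match(predicate="hasParent"))
# [('alice', 'hasParent', 'carol'), ('bob', 'hasParent', 'erin')]
```

Because every statement has the same three-part shape, a new kind of
relationship can be added without changing any schema.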
Relational databases are very powerful for
very large datasets that all have the same structure,
whereas a triple store can be more flexible and
is better suited to very complex domain knowledge.
Relational database systems are an established technology, and
there are industry standards, for example, Oracle.
Open source examples of relational database systems include MySQL and
Postgres.
In contrast, semantic web technologies such as Jena or Virtuoso are still
relatively early stage, and standards in this domain are still emerging.
Hundreds of biomedical ontologies have been developed over the years.
One of the most comprehensive repositories of biomedical ontologies is
the NCBO BioPortal, which has already been introduced in the previous lecture.
Another important resource of biological and
biomedical ontologies is the OBO Foundry.
In contrast to the NCBO BioPortal,
which has a very open policy in allowing developers to deposit their ontologies,
the OBO Foundry takes a much more selective approach.
OBO Foundry ontologies have to comply with Foundry principles,
including design decisions, naming conventions, and
the use of certain upper-level ontologies.
New ontologies are typically admitted as OBO Foundry candidate ontologies and only
after a thorough review process can they be promoted to OBO Foundry ontologies.
One of the goals of the OBO Foundry is to promote the collection of compatible and
non-overlapping biological and biomedical ontologies.
The OBO Foundry is a good starting point to research existing biological and
biomedical ontologies.
Another important resource of biological and biomedical ontologies
is the EBI Ontology Lookup Service at the European Bioinformatics Institute.
The EBI Ontology Lookup Service provides a centralized query interface for
almost 100 ontologies.
During the last section of this lecture,
I want to briefly introduce the LINCS metadata standards.
Recall that LINCS signatures have three primary dimensions.
The first dimension is the biological model system.
LINCS biological model systems include proliferating immortalized cell lines,
primary cells, induced pluripotent stem cells, and differentiated cells.
Another dimension is the perturbation of the model system.
LINCS perturbagens include small molecules, RNAis, proteins and
other reagents.
The third dimension of LINCS signatures is the molecular entities and
cellular features that are detected and quantified in an assay.