A crucial step in this analysis is the part that assembles the transcripts and
then quantifies them.
So I'm going to spend a little bit more time explaining those.
What you see here on the left hand side is the computational process
that creates a representation of a gene and its splice variants.
So we start with an RNA molecule, shown at the top.
You see how it is organized into exons, with their locations along the genome,
and the portions that are alternatively spliced are shown in red.
So you see here that we have an exon skipping event: exon two is skipped,
that is, included in some transcripts and excluded from others, and
we also have another event that we call intron retention.
That particular intron is sometimes included in the exon, sometimes not.
Once the RNA molecule is sequenced, then we have a number of reads,
which we show underneath.
The reads are then mapped to the genome.
If a read falls entirely inside an exon,
then it's going to be aligned as a contiguous fragment.
However, if the read spans the boundary between two different exons,
it's going to have to be spliced.
So we talked already about spliced alignments.
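
To make the contiguous versus spliced distinction concrete, here is a minimal sketch, assuming alignments are described with SAM-style CIGAR strings, where the N operation marks a region skipped on the reference (an intron). The function names and the 0-based, half-open coordinates are choices made for this illustration only, not part of any particular aligner.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def aligned_blocks(start, cigar):
    """Return the genomic blocks (start, end) covered by one alignment.

    `start` is assumed to be a 0-based leftmost position; blocks are half-open.
    A read inside one exon yields one block; a spliced read yields several.
    """
    blocks = []
    block_start = start
    ref = start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=XD":          # these operations consume the reference
            ref += length
        elif op == "N":           # skipped region: close the block, jump the intron
            blocks.append((block_start, ref))
            ref += length
            block_start = ref
        # I, S, H and P do not consume the reference, so nothing to do
    blocks.append((block_start, ref))
    return blocks

def is_spliced(cigar):
    """A read is spliced if its alignment skips at least one intron."""
    return "N" in cigar

contiguous = aligned_blocks(100, "50M")        # read entirely inside an exon
spliced = aligned_blocks(100, "20M300N30M")    # read spanning an exon boundary
```

In this sketch the first read comes back as a single block, while the second comes back as two blocks separated by a 300-base gap, which is the intron.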
Now this type of information, when seen along the genome, gives us
two particular types of clues that can inform transcript assemblers and
gene-finding methods as to how they can build the most likely gene structure.
The first type of information comes from the bulk of the alignments.
Because the reads come from the exons,
when viewed along the genome the alignments are going to cluster in columns
at the location of the exons.
So the bulk of the alignments are going to tell us
roughly where the exons are located.
The other type of information comes from the spliced reads, and
they are going to give us information about the introns,
the introns connecting the exons.
So that's what you see in the next two pictures: the coverage levels of the reads along
the genome, and the splice junctions that can be obtained from the spliced reads.
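
The two kinds of clues can be sketched in a few lines of code. This is only a toy illustration under stated assumptions: each read is given as a list of aligned blocks with 0-based, half-open coordinates, the read data is made up, and the function names and the `min_depth` threshold are hypothetical. Real assemblers use far more careful models.

```python
from collections import Counter

def coverage_and_junctions(reads):
    """Count per-base read depth and the number of reads supporting each junction."""
    coverage = Counter()
    junctions = Counter()
    for blocks in reads:
        for start, end in blocks:
            for pos in range(start, end):
                coverage[pos] += 1
        # The gap between consecutive blocks of one read is an intron.
        for (_, left_end), (right_start, _) in zip(blocks, blocks[1:]):
            junctions[(left_end, right_start)] += 1
    return coverage, junctions

def covered_intervals(coverage, min_depth=1):
    """Merge adjacent covered positions into rough exon intervals."""
    positions = sorted(p for p, depth in coverage.items() if depth >= min_depth)
    intervals = []
    for pos in positions:
        if intervals and pos == intervals[-1][1]:
            intervals[-1][1] = pos + 1      # extend the current interval
        else:
            intervals.append([pos, pos + 1])  # start a new interval
    return [tuple(interval) for interval in intervals]

# Made-up reads: three unspliced, two spliced across the same intron.
toy_reads = [
    [(0, 10)],
    [(5, 15)],
    [(10, 20), (50, 60)],
    [(15, 20), (50, 55)],
    [(52, 62)],
]
toy_coverage, toy_junctions = coverage_and_junctions(toy_reads)
rough_exons = covered_intervals(toy_coverage)
```

Here the covered intervals approximate where the exons are, and the junction counts say which exon ends are connected by an intron and how many reads support that connection.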
The next stage of transcript assembly methods typically
creates a graph representation that combines the gene and
its transcripts together into a compact form.
There are a variety of graph representations: overlap graph,
exon graph, subexon graph, connectivity graph.
But the one that is shown here is a very simple exon graph.
Let me put it simply.
Exons are represented as nodes in the graph,
and you see them here represented with bars, red or blue.
And then we connect two exons or
two nodes by an edge if there is an intron that connects them.
So that's what we see here.
Now we can traverse the graph from a node that has no incoming edge
all the way to the end, to the node that has no outgoing edge, and
we can obtain all the possible splice variants, or transcripts, for the gene.
So, if you're looking at our example, we can have four possible splice variants.
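
The traversal can be sketched as a small path enumeration. The graph below is a hypothetical reconstruction of the example, not the exact graph on the slide: the node names are invented, and modeling the retained intron as its own node (I3) is an assumption made so that both events fit in one simple graph.

```python
def enumerate_paths(graph):
    """List every path from a node with no incoming edge to a node with no outgoing edge."""
    targets = {node for successors in graph.values() for node in successors}
    nodes = set(graph) | targets
    sources = sorted(nodes - targets)   # nodes with no incoming edge
    paths = []

    def walk(node, path):
        successors = graph.get(node, [])
        if not successors:              # no outgoing edge: one complete transcript
            paths.append(path)
            return
        for successor in successors:
            walk(successor, path + [successor])

    for source in sources:
        walk(source, [source])
    return paths

# Hypothetical exon graph: E2 can be skipped, and the segment I3 between
# E3 and E4 is either retained or spliced out.
exon_graph = {
    "E1": ["E2", "E3"],
    "E2": ["E3"],
    "E3": ["I3", "E4"],
    "I3": ["E4"],
}
variants = enumerate_paths(exon_graph)
```

With one exon skipping event and one intron retention event, each with two outcomes, this toy graph yields four paths, in line with the four splice variants of the example. Enumerating every path is fine for a small gene, but the number of paths can grow very quickly as more alternative events are added.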