Ensembl 'ContigView' is the principal data visualisation tool for genome sequence annotation information. It provides a high level view of the contig sequences that form the genome sequence assembly, and of genes and other features that have been placed on it.
'ContigView' can be customised to suit you. More information can be added or the displays can be simplified to make browsing faster. Please look at the 'Menu Bar' section for details.
The page is split into four sections representing different levels of zooming into the chromosome:
The red boxes in the 'Chromosome', 'Overview' and 'Detailed View' panels represent the regions shown at higher magnification in the panel immediately below. The absolute base pair location of the region displayed in 'Detailed View' is indicated in the 'Navigation Bar' at the top of the panel. You can use this bar to navigate along any chromosome by entering a new chromosome and location. Physical map locations may be specified directly by entering base pair coordinates or numbers with a 'kb' or 'Mb' suffix.
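As an illustration of the 'kb' and 'Mb' suffixes, the following minimal Python sketch converts such an entry into a plain base pair coordinate. It is not the code used by Ensembl; the function name parse_position, the case-insensitive matching and the acceptance of decimal values such as '1.5Mb' are assumptions made for this example.

```python
import re

# Multipliers for the suffixes described above. Treating the suffix as
# case-insensitive is an assumption of this sketch.
_MULTIPLIER = {"": 1, "kb": 1_000, "mb": 1_000_000}

_POSITION = re.compile(r"\s*([0-9]+(?:\.[0-9]+)?)\s*(kb|mb)?\s*", re.IGNORECASE)


def parse_position(text: str) -> int:
    """Convert an entry such as '1500000', '1500kb' or '1.5Mb' to base pairs."""
    match = _POSITION.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised position: {text!r}")
    number, suffix = match.groups()
    # A missing suffix means the number is already in base pairs.
    return round(float(number) * _MULTIPLIER[(suffix or "").lower()])
```

With this sketch, parse_position("1.5Mb") and parse_position("1500kb") both give 1500000.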
The 'Chromosome' panel displays an ideogram of an entire chromosome together with its cytogenetic banding pattern. Maps of cytogenetic bands to the genome sequence allow for rather crude orientation and are not available for all species. For all those species with genome sequence assemblies in a pre-chromosome stage, Ensembl displays other 'top-level' sequence entities such as 'scaffolds'.
The red box illustrates the extent of the region displayed in the 'Overview' panel below and can be moved by clicking anywhere on the chromosome. The 'Chromosome' display can be turned on or off using the plus [+] or minus [-] boxes, respectively.
The 'Overview' panel displays a larger section of a chromosome together with its basic annotation. Usually the range is set to 1 Mb but can be smaller for species with genomes of higher density.
The panel displays the following information:
Chromosome bands - Cytogenetic chromosome band(s) that map to a particular region allow for crude orientation on the genome sequence. Data sets mapping cytogenetic bands to genome sequence coordinates are currently not available for all species displayed in Ensembl.
Scale bar - A scale bar illustrates the physical map coordinates this particular region falls into. Generally, Ensembl displays the genome sequence in standard notation from the p-telomere to the q-telomere. Since estimated gap sizes are included in the physical map, coordinates might change between genome sequence assembly builds.
DNA (contigs) - The individual contig sequences that form the genome sequence assembly in this particular region are depicted in alternating dark and light blue colour. Where no blue contig is shown, this indicates a gap in the assembly.
Markers - The positions of Sequence Tagged Sites (STS) markers are indicated in magenta directly under the contig map ideogram.
Ensembl genes - Coloured boxes represent automatically annotated Ensembl genes. 'Ensembl known genes', which correspond to species-specific entries in public sequence databases, are red; 'Ensembl novel genes' are black; and 'Ensembl pseudogenes' are drawn in grey.
ncRNA genes - Several classes of hand checked RNA genes are also drawn if available for the organism-specific data set.
Vega Havana genes - Manually curated genes from the Vega database (Ashurst, et al., 2005) annotated by the Havana group at the Wellcome Trust Sanger Institute.
Vega External genes - Manually curated genes from the Vega database (Ashurst, et al., 2005) annotated by groups other than Havana. The group responsible for annotating the gene is indicated by the prefix of the gene name.
Gene legend - The color coding of all types of genes is dynamically shown at the bottom of the 'Overview' panel in the 'Gene legend' track. For additional information about the categories of genes displayed, see transcript information below.
The red box illustrates the extent of the region displayed in the subsequent 'Detailed View' panel below. You may click anywhere in the 'Overview' panel to re-centre the red box at that point on the contig map. The 'Detailed View' display below will change accordingly. Except for re-centring the display, contigs and genes are not clickable in the 'Overview' display, but they are selectable in the 'Detailed View' panel below. The 'Overview' display can be turned on or off using the plus [+] or minus [-] button, respectively.
The third panel 'Detailed View' shows smaller regions of chromosomes and provides more detailed insight into genome annotation. Features are annotated in tracks along the genome sequence assembly in its standard notation from the p-telomere to the q-telomere. The genomic DNA sequence is generally assembled from smaller sequence-level entities (BAC clones, whole genome sequencing scaffolds or contig sequences in general), which are represented by alternating dark and light blue blocks. Colour-coded features above the contigs are positioned on the forward strand, while those below are on the reverse strand, respectively.
The entire 'Detailed View' display panel can be turned on or off using the plus [+] or minus [-] button, respectively.
The 'Menu Bar' and the 'Navigation Bar' on top of the 'Detailed View' panel are the main tools to customise this display. A set of pull-down menus is available; opening a menu lets you select options via check boxes. Changes take effect when you click the 'Close menu' option at the bottom of the menu. The following menus are available:
Features - Genes, transcripts, markers and features in general that are annotated on the genome sequence are organised into tracks. Some feature tracks are not displayed by default, but can be added. Turning off unwanted features and functions will not only make the web pages download and render faster but also make it easier to see the features of interest.
Generally, Ensembl annotates a broad variety of features. Some of them may be species-specific. A more detailed description of feature tracks and the underlying data sets follows below.
Genes - Gene tracks (e.g. Fgenesh Models) display several gene categorizations, distinguished by color. The "Genes" menu provides the ability to control the display of specific categories of genes. Checking off categories turns off the display of their color-coded genes. Gene categories are defined by the specific analysis method that generates the genes or the data set on which the genes are annotated. Please see Transcripts and Genes for specific information about gene categories. By default, all gene categories are displayed.
Comparative - Ensembl provides both pairwise and multiple whole genome alignments. This menu provides a list of tracks that can be added to the display. Please see the 'Whole Genome Similarity Matches' section for a detailed description of the calculation method. The following naming convention is in use:
Species - We use the common name (e.g. Human for Homo sapiens, Cat for Felis catus, ...) or the abbreviated form of the scientific name when the common name is not suitable (C.intestinalis for Ciona intestinalis or C.savignyi for Ciona savignyi)
Conservation - Ensembl compares the genomes of species pairs within phyla (e.g. vertebrates or arthropods) where significant homology can be expected. The conservation (cons) track annotates the results of this comparison. For closely related species pairs the initial conservation information is then re-scored into a second level of conservation, which is annotated by the high conservation (high cons) track.
Algorithm - Ensembl may compare species pairs with two or more different algorithms. In these cases the algorithm is indicated in full (e.g. BLAT) or abbreviated form (e.g. bz for BLASTz). Please see the 'Whole Genome Similarity Matches' section for more details.
Constrained elements are displayed in dark pink (available for the 10 way Pecan multiple alignments only). These are stretches of the multiple alignment where the sequences are highly conserved according to the score calculated using Gerp (Cooper GM et al.).
DAS Sources - While all features are part of the underlying Ensembl databases, the Distributed Annotation System (DAS) provides a way to display Ensembl-external data sets in the genome browser. Ensembl already provides a set of pre-configured data sources, which can be added by selecting their check boxes. Other data sets can be added and configured in Ensembl 'DasConfView', which is available via the 'Manage sources ...' option in this menu.
Repeats - Ensembl characterises and annotates several classes of repetitive sequences in repeats tracks. Options in this menu allow annotation of individual or all classes simultaneously.
Options are properties of the assembly or display, rather than features located on the assembly.
Half-height glyphs - It is also possible to set the display to show most of the features at half their normal height.
Show empty tracks - This option will cause an information message to be displayed for a feature type, even when no features of that type appear in the current view. You may prefer this to the default behaviour, whereby the track is not displayed at all when there are no features to display.
Quality Scores Gridlines - This option toggles horizontal gridlines at quality score levels 30, 60, and 90.
Quality Score Line Plot - This option controls whether the plot is drawn as a line plot (on) or as an area plot (off).
Gene legend - Unchecking this option will remove the descriptive legend explaining the colour-coding of different gene and transcript types.
Show register lines - Unchecking this option will remove the evenly spaced vertical lines.
Show pop-up menus - For most features in the 'Detailed View' panel, extra information and links can be displayed in pop-up text windows by pointing at features.
The pop-up menu function can be turned off entirely by unchecking this option. This may speed up your browsing. You will still be able to click on a feature to go to the corresponding information page. An alternative way of using the pop-up windows is also available: you can choose to have pop-up menus appear only when you click on a feature, by checking the '... popup on click' option. However, if you plan to use Ensembl regularly it is well worth getting used to the default behaviour of the pop-up menus.
Clones -
Tile path - The tile path track in human Ensembl shows the tiling path (i. e. the locations) of BAC clones that form the current genome sequence assembly (the "golden path"). The different colours of red, orange and gold are only used to help distinguish between clones in the display. Pink clones are still in the phase1Ac stage. The name of the clone will be displayed if there is room in the display. Clones for which fluorescence in situ hybridisation (FISH) mapping information is available are marked with a black triangle in the top left corner. Where a clone is shown in outline, the mapping of the clone to the sequence assembly is problematic and the true length is not displayed. Mouse-over brings up information about a particular clone and an option to re-centre the display around its location.
1 Mb Clone Set - The 1 Mb clone set has been developed as a resource to aid the identification of breakpoints in chromosome rearrangements. The clones were selected to provide a set spaced at approximately 1 Mb intervals across the entire genome. Clones for which FISH-mapping data is available are marked in the top left corner with a black triangle. Dark and light green are only used to help distinguish between clones in the display. The name of the clone will be displayed if there is room in the display. Pointing at a clone will display a pop-up window with information about the clone and a clickable link to re-centre the clone.
32k Clone Set - Clones from the human genome high-resolution BAC re-arrayed 32k clone set mapped to the genome sequence. The 32k clone set and individual clones from it are available via the BAC-PAC resource.
Export - This pull-down menu gives several options for downloading the data represented in the 'Detailed View' panel.
'Flat file', 'FASTA' and 'Image' will redirect you to an 'ExportView' page preset to the extent of genome sequence displayed in 'Detailed View' and to the kind of download you have requested.
Ensembl gene list, EST gene list, Vega gene list and SNP list will redirect you to the BioMart data mining system, with the displayed region and choice of focus already selected.
Image Size - By default, the overall image size is set to a width of 700 pixels. This is appropriate for standard-sized screens. The 'Image size' pull-down menu on the gold bar allows you to adjust the width up to 2000 pixels.
Help - An additional pull-down menu on the menu bar gives you the option to jump directly to help sections on 'Detailed View' display configuration, description of 'DAS Sources', this general help page for 'ContigView' and to a page for sending questions or comments to the Helpdesk.
The 'Navigation Bar' and the 'Menu Bar' on top of the 'Detailed View' panel are the main tools to customise this display. The following navigation functions are available:
Horizontal Scrolling - Navigation buttons allow horizontal scrolling of the display by 1, 2 or 5 Mb to the left or right. As the 'Overview' panel displays 1 Mb of sequence, these buttons shift the entire display by multiples of this panel's range. Window buttons move the display only 80% to the left or right, preserving 20% of the display to facilitate orientation.
Zooming Buttons - By default, the 'Detailed View' panel shows a region of 1 Mb. Clicking the plus or minus button zooms into or out of the region, respectively, by a factor of approximately 2. Together, the buttons allow the displayed region to range from as little as 1 bp to as much as 1 Mb.
You can also navigate by clicking on the scale bars at the top and bottom of the 'Detailed View' panel. This brings up a clickable pop-up menu that lets you zoom or re-centre the 'Detailed View' display.
Zooming Ladder - The zooming ladder restricts or expands the field of view to a scale suitable to view any feature of interest. Individual steps of the ladder represent 1, 5, 10, 50, 100, 200 or 500 kb, or 1 Mb of sequence. Regions larger than 1 Mb up to an entire chromosome are best viewed in Ensembl 'CytoView'.
Physical Coordinates - The physical coordinates of the sequence region displayed in the 'Detailed View' panel are indicated in the 'Navigation Bar'. To move to a different chromosome or to specify a new chromosomal location in base pairs, enter numbers in the appropriate boxes and click the Refresh button.
To specify a region between two generic features such as cytogenetic bands or STS markers, use Ensembl 'MapView'.
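The navigation functions above amount to simple arithmetic on the start and end coordinates of the displayed region. The following Python sketch illustrates this under stated assumptions: the Region class and its method names are hypothetical, zooming is taken to be exactly a factor of 2 around the centre of the view, and the right-hand end of the chromosome is not checked because its length is not known here. It is not the browser's own implementation.

```python
from dataclasses import dataclass

MIN_WIDTH = 1          # smallest 'Detailed View' region described above (1 bp)
MAX_WIDTH = 1_000_000  # largest 'Detailed View' region described above (1 Mb)
# Zooming ladder steps in base pairs: 1, 5, 10, 50, 100, 200, 500 kb and 1 Mb.
LADDER = [1_000, 5_000, 10_000, 50_000, 100_000, 200_000, 500_000, 1_000_000]


@dataclass(frozen=True)
class Region:
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def width(self) -> int:
        return self.end - self.start + 1

    def shifted(self, offset: int) -> "Region":
        """Scroll by offset bp (negative is left), never going left of base 1."""
        offset = max(offset, 1 - self.start)
        return Region(self.start + offset, self.end + offset)

    def window(self, direction: int) -> "Region":
        """Window button: move by 80% of the width; direction is -1 or +1."""
        return self.shifted(direction * (self.width * 8 // 10))

    def resized(self, width: int) -> "Region":
        """Keep the centre fixed and set a new width, clamped to 1 bp..1 Mb."""
        width = max(MIN_WIDTH, min(MAX_WIDTH, width))
        centre = (self.start + self.end) // 2
        start = max(1, centre - width // 2)
        return Region(start, start + width - 1)

    def zoom_in(self) -> "Region":
        return self.resized(self.width // 2)

    def zoom_out(self) -> "Region":
        return self.resized(self.width * 2)

    def ladder(self, step: int) -> "Region":
        """Jump to one of the zooming ladder widths (step is an index 0..7)."""
        return self.resized(LADDER[step])
```

For example, Region(1_000_001, 1_100_000).window(+1) gives Region(1_080_001, 1_180_000): the view moves right by 80 kb and the last 20 kb of the previous view remain visible.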
Feature tracks are named at the left side of the 'Detailed View' panel. Clicking a track name will directly link to a description in Ensembl 'HelpView'. Black track names represent Ensembl-internal feature tracks, while blue names indicate tracks served via the Distributed Annotation System from external DAS sources. Tracks may be turned on or off and customized to suit your requirements. Pointing the mouse to a feature will bring up a pop-up window showing the feature identifier together with links to more detailed information whenever available. Pop-up menus can be turned off by un-checking 'show pop-up menus' in the 'Options' pull-down menu. A single click on most features will take you to an appropriate page with more information on that particular feature, unless the '... pop-up on click' option from the Options menu has been selected. (See customizing the display for more details.)
Quality Scores track - This track displays a score plot of the raw Phred quality scores. Quality values range between 0 and 90 and are associated with individual bases. They express, on a logarithmic scale, the probability that the base was called incorrectly, so higher scores indicate more reliable base calls. The quality scores track provides a visual cue for the confidence in predicted and computational annotations in the sequence region. The quality scores track is available on both the 'Detailed View' and 'Basepair View'.
Several options provide greater control over the Quality Scores track and are available in the Options menu.
Clicking on any point on the score plot shows a context menu with the raw Phred score at that point.
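To relate a quality score to an error probability, the following Python sketch assumes the standard Phred definition, Q = -10 * log10(P), where P is the probability that the base call is wrong. The function names are chosen for this example only, and the assumption is that the scores in this track follow that standard scaling.

```python
import math


def phred_to_error_probability(quality: float) -> float:
    """Probability that a base with the given Phred score was called incorrectly."""
    return 10 ** (-quality / 10)


def error_probability_to_phred(probability: float) -> float:
    """Phred score corresponding to a given error probability (0 < p <= 1)."""
    return -10 * math.log10(probability)
```

Under this definition, the gridline levels 30, 60 and 90 correspond to error probabilities of about one in a thousand, one in a million and one in a billion, respectively.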
Improved Regions track - This track displays regions within BAC contigs that were tagged by the sequence finishing team for further work. These features are only annotated on BAC clones with the HTGS_IMPROVED status designation. The DNA sequence contained within these regions is generally of higher quality.
Because of the nature of the sequencing project, maize BAC clones will likely not reach beyond GenBank Phase I. In order to maximize the quality of the generated sequence, problematic regions are tagged in several stages and specific sequence reads are then called. An automated two-phase approach is used for identifying such regions. First, sequences with less desirable qualities are mined for high repeat content. Next, GSS and GLL reads are aligned within those regions in order to "rescue" regions that are then further sequenced.
It is important to note that while sequences within improved regions are believed to be of high quality, this does not preclude non-improved regions from being of high quality as well. Improved regions are specifically targeted at problematic regions that are thought to contain biologically meaningful markers.
The 'DNA (contigs)' track shows a representation of the genomic sequence assembly. Alternating light and dark blue blocks represent individual contig sequences in the genome sequence assembly. Small arrows near sequence identifiers represent the relative orientation of a particular contig sequence within the genome assembly in standard notation. Where no blue contig is shown, there is a gap in the assembly.
Pointing at a contig sequence representation in 'Detailed View' will display a pop-up menu with the complete Ensembl sequence identifier (e.g. AC120349.5.1.183055) at the top. Sequence identifiers regularly include an EMBL accession number whenever available, as well as a sequence version, a start and an end coordinate ([EMBL accession number].[sequence version].[start].[end]). Since Ensembl is designed to use several coordinate systems like 'contigs', 'clones', 'supercontigs', 'scaffolds', 'chunks' or 'chromosomes' in parallel, corresponding sequence regions in other coordinate systems will be listed. Links in the pop-up windows allow for export of the sequence region or for centring the 'Detailed View' panel on a particular sequence region. For BAC clones, Ensembl will provide an "EMBL source file" link to the underlying sequence database record in the pop-up window.
Clicking on a contig sequence representation in the 'Detailed View' track will immediately centre on the sequence region.
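The identifier layout described above ([EMBL accession number].[sequence version].[start].[end]) can be taken apart as in the following Python sketch. The names ContigId and parse_contig_id are hypothetical, and the sketch only handles identifiers that follow this four-part layout; identifiers without an EMBL accession number may look different.

```python
from typing import NamedTuple


class ContigId(NamedTuple):
    accession: str
    version: int
    start: int
    end: int


def parse_contig_id(identifier: str) -> ContigId:
    """Split an identifier such as 'AC120349.5.1.183055' into its four parts."""
    # Split from the right so that only the last three dots are used.
    parts = identifier.rsplit(".", 3)
    if len(parts) != 4:
        raise ValueError(f"expected accession.version.start.end: {identifier!r}")
    accession, version, start, end = parts
    return ContigId(accession, int(version), int(start), int(end))
```

For the example identifier above, this gives accession AC120349, sequence version 5, start 1 and end 183055, that is a sequence of 183055 bp.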
'Detailed View' does not display genes as such but rather as their individual transcripts. Transcripts shown above the DNA:contig bar are transcribed in the forward direction (left to right), while transcripts shown below the bar are transcribed in the reverse direction (right to left). Ensembl considers genes as a collection of exons, which may form several transcripts. A colored box represents each exon, while angled lines represent introns joining all exons in a transcript. 5' and 3' untranslated regions of the transcript (UTRs) are shown as coloured outlines, while the predicted coding regions are shown in solid colour. This distinction is seen best when viewing relatively small regions of a chromosome.
If there are several transcripts displayed on different lines at the same point on the sequence, then that gene has been assessed as producing multiple alternatively spliced transcripts.
Several transcript types are available within the Ensembl system:
Fgenesh Models - This track shows gene models predicted by Fgenesh (Salamov, A., Solovyev, V., 2000), an ab initio gene prediction algorithm. The gene predictions are classified based on their similarity to the non-redundant GenBank protein set. Similarity is computed using protein-protein BLAST (Altschul, S. et al, 1990). The track distinguishes these classifications as color-coded gene objects:
Protein-coding genes are predicted models that have a significant protein alignment to a known protein.
Hypothetical genes are predicted models that do not align significantly to any known proteins.
Transposon-like genes are predicted models that have a significant alignment to known transposable elements. The list of transposable elements is a carefully-annotated subset of the full NR data set.
Pointing at an Fgenesh model in 'Detailed View' will display a pop-up menu with the stable transcript identifier (e.g. AC211704.1_FGT029) or an assigned gene symbol at the top. Additional identifiers link to 'Ensembl Gene Report', 'Ensembl Transcript Report' and 'Ensembl Protein Report' pages. 'Export cDNA' and 'Export peptide' links lead to Ensembl 'ExportView' pages, allowing cDNA or protein sequence export for this particular transcript in FASTA format, respectively.
Clicking on a transcript in the 'Detailed View' track will directly lead to the corresponding 'Ensembl Gene Report' page.
Vega Havana Transcripts - This track shows a set of manually curated transcripts from the Vega database (Ashurst, et al., 2005) annotated by the Havana group at the Wellcome Trust Sanger Institute. Since manual curation is very labour intensive, manually curated genes and transcripts are currently limited to certain chromosomes of popular research organisms. An extensive list of manually annotated sets and credits for the annotation are available directly from the Vega genome browser.
'Curated Transcripts' are shown in shades of blue, purple and grey and the colour coding is summarised at the bottom of the 'Overview' panel. Generally, Vega transcripts are either labelled with gene symbols or identifiers derived from international BAC clone names (e.g. RP11-217H1), gene and transcript numbers (e.g. RP11-217H1.1-001). When the track is expanded, the Transcript class is shown below the transcript; when the track is condensed, the Gene type is shown instead.
These transcripts are the results of comprehensive manual annotation and a detailed description of the different categories of Vega genes and transcripts is given in the corresponding Vega gene classification help document.
Clicking on a Vega transcript in 'Detailed View' will display a pop-up menu with the assigned name at the top, followed by the Transcript class, Gene type and the Vega group responsible for the annotation. Additional identifiers link to 'Vega Gene Report', 'Vega Transcript Report' and 'Vega Protein Report' pages. 'Export cDNA' and 'Export peptide' links lead to Ensembl 'ExportView' pages, allowing cDNA or protein sequence export for this particular transcript in FASTA format, respectively. The 'View in Vega' link leads to the Vega Transcript Report in the Vega genome browser.
Merged Havana Transcripts - For human, full-length CDS (ATG-Stop) transcripts from Havana have been merged into the Ensembl set. Where Havana and Ensembl genes overlap, a merged gene is created (gold genes in the 'Overview' panel of 'ContigView'). Where the Havana transcript is identical to an Ensembl transcript (all exon boundaries, including UTR exons), a merged transcript is created (gold transcripts in the 'Detailed View' panel of 'ContigView'). Where Havana and Ensembl transcripts with the same CDS differ only in the outer boundaries of the 5' and 3' UTR exons, the transcript with the longer UTR is retained and the other is removed. If the Havana transcript is unique (no Ensembl equivalent), it is added to the Ensembl gene.
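The transcript-level merge rules above can be read as a small decision procedure, sketched here in Python. This is an illustration only, not the Ensembl merge code: the Transcript class and the outcome labels are hypothetical, "longer UTR" is approximated by total exon length (valid only because both transcripts share the same CDS), and keeping the Havana transcript on a tie is an arbitrary choice of this sketch.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive genomic coordinates


@dataclass(frozen=True)
class Transcript:
    exons: Tuple[Exon, ...]  # sorted by start, UTR exons included
    cds: Tuple[int, int]     # genomic start and end of the coding region

    @property
    def length(self) -> int:
        return sum(end - start + 1 for start, end in self.exons)


def inner_boundaries(transcript: Transcript) -> List[int]:
    """All exon boundaries except the outermost start and end of the transcript."""
    flat = [position for exon in transcript.exons for position in exon]
    return flat[1:-1]


def resolve(havana: Transcript, ensembl: Sequence[Transcript]) -> str:
    """Return 'merge', 'keep_havana', 'keep_ensembl' or 'add_havana'."""
    # Rule 1: identical exon structure (UTR exons included) gives a merged transcript.
    for candidate in ensembl:
        if candidate.exons == havana.exons:
            return "merge"
    # Rule 2: same CDS, differing only in the outer UTR boundaries:
    # retain the transcript with the longer UTR (assumption: longer total length).
    for candidate in ensembl:
        if (candidate.cds == havana.cds
                and inner_boundaries(candidate) == inner_boundaries(havana)):
            return "keep_havana" if havana.length >= candidate.length else "keep_ensembl"
    # Rule 3: no Ensembl equivalent, so the Havana transcript is added to the gene.
    return "add_havana"
```

The gene-level rule (overlapping Havana and Ensembl genes form a merged gene) is not covered by this sketch.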
Vega External Transcripts - This track shows a set of manually curated transcripts from the Vega database (Ashurst, et al., 2005) annotated by groups other than the Havana group. Since manual curation is very labour intensive, manually curated genes and transcripts are currently limited to certain chromosomes of popular research organisms. An extensive list of manually annotated sets and credits for the annotation are available directly from the Vega genome browser.
'Curated Transcripts' are shown in shades of orange and grey and the colour coding is summarised at the bottom of the 'Overview' panel. Generally, Vega transcripts are either labelled with gene symbols or identifiers derived from international BAC clone names (e.g. RP11-217H1), gene and transcript numbers (e.g. RP11-217H1.1-001). When the track is expanded, the Transcript class is shown below the transcript; when the track is condensed, the Gene type is shown instead.
These transcripts are the results of comprehensive manual annotation and a detailed description of the different categories of Vega genes and transcripts is given in the corresponding Vega gene classification help document.
Clicking on a Vega transcript in 'Detailed View' will display a pop-up menu with the assigned name at the top, followed by the Transcript class, Gene type and the Vega group responsible for the annotation. Additional identifiers link to 'Vega Gene Report', 'Vega Transcript Report' and 'Vega Protein Report' pages. 'Export cDNA' and 'Export peptide' links lead to Ensembl 'ExportView' pages, allowing cDNA or protein sequence export for this particular transcript in FASTA format, respectively. The 'View in Vega' link leads to the Vega Transcript Report in the Vega genome browser.
ncRNA - Non-coding RNAs are identified through conserved patterns of secondary structure. We use cmsearch to search the genome using RFAM covariance models. Because covariance model searching is compute-intensive, an initial BLAST step against RFAMSEQ identifies the regions of the genome to search.
Unlike most ncRNAs, miRNAs show very high sequence conservation across species, so we take a different approach to identify them. MicroRNA precursor sequences from miRBase are aligned to the genome using BLAST. The resulting alignments are assessed to ensure they encompass the mature miRNA sequence and RNAfold is used to confirm that the precursor sequence can fold into a hairpin structure.
Ensembl also includes a set of hand-checked non-coding RNA genes provided by Sean Eddy and Tom Jones. The ncRNA set, as well as a detailed description of the annotation methods can be obtained from ftp://selab.janelia.org/.
The following non-coding RNA gene types are annotated:
tRNA - Nuclear transfer RNA (or pseudogene).
Mt-tRNA - Mitochondrially-derived tRNA pseudogenes located in the nuclear genome.
rRNA - Ribosomal RNA (or pseudogene).
scRNA - Small cytoplasmic RNA (or pseudogene).
snRNA - Small nuclear RNA (or pseudogene).
snoRNA - Small nucleolar RNA (or pseudogene).
miRNA - microRNA precursors (or pseudogene).
misc_RNA - Miscellaneous other RNA, such as Xist (or pseudogene).
EST Transcripts displays transient transcript predictions annotated by the Ensembl analysis and annotation pipeline using EST evidence alone (Eyras et al., 2004). You may wish to compare these predictions with those in the Ensembl or Vega transcript tracks.
Pointing at an EST transcript in 'Detailed View' will display a pop-up menu with the non-stable EST transcript identifier (e.g. ENSESTT12345678901) at the top. Additional identifiers link to 'EST Transcript Report' and 'EST Protein Report' pages. 'Export cDNA' and 'Export peptide' links lead to Ensembl 'ExportView' pages, allowing cDNA or protein sequence export for this particular transcript in FASTA format, respectively.
Clicking on an EST transcript in the 'Detailed View' track will directly lead to the corresponding Ensembl 'EST Transcript Report' page.
GENSCAN tracks display transcripts predicted ab initio by the GENSCAN gene prediction programme. GENSCAN is run on individual contigs, so that predictions do not span more than one contig.
Pointing at a GENSCAN transcript in 'Detailed View' will display a pop-up menu with the non-stable GENSCAN transcript identifier (e.g. GENSCAN12345678901) at the top. Additional identifiers link to GENSCAN Transcript Report and GENSCAN Protein Report pages. 'Export cDNA' and 'Export peptide' links lead to Ensembl 'ExportView' pages, allowing cDNA or protein sequence export for this particular transcript in FASTA format, respectively.
Clicking on a GENSCAN transcript in the 'Detailed View' track will directly lead to the corresponding 'GENSCAN Transcript Report' page.
GENSCAN information is also available via flat file export by selecting 'Prediction Features' on the 'Flat File' tab on Ensembl 'ExportView' pages. Complete sets of GENSCAN-predicted transcripts or peptides can be found in 'abinitio' files in 'cdna' or 'peptide' directories in the Ensembl Download area.
SNAP tracks display transcripts predicted ab initio by the Semi-HMM-based Nucleic Acid Parser (SNAP). Like GENSCAN, it predicts transcripts solely on the basis of the underlying genomic sequence and does not take any experimental evidence into account. The SNAP track is not available for all species, but SNAP performs better than GENSCAN in some species.
Pointing at a SNAP transcript in 'Detailed View' will display a pop-up menu with the non-stable SNAP transcript identifier (e.g. SNAP12345678901) at the top. Additional identifiers link to 'SNAP Transcript Report' and 'SNAP Protein Report' pages. 'Export cDNA' and 'Export peptide' links lead to Ensembl 'ExportView' pages, allowing cDNA or protein sequence export for this particular transcript in FASTA format, respectively.
Clicking on a SNAP transcript in the 'Detailed View' track will directly lead to the corresponding 'SNAP Transcript Report' page.
SLAM tracks display transcripts predicted by this comparative-based tool for syntenic genomic sequences. SLAM predicts gene structures for any suitably related pair of organisms (e. g. human and mouse or human and rat).
Genefinder systematically uses statistical criteria (primarily log likelihood ratios, or LLRs) to attempt to identify likely genes within a region of genomic sequence. Candidate genes are evaluated on the basis of scores that reflect their splice site, translation start site, and coding potential LLRs, and intron sizes. A dynamic programming algorithm is used to find the set of non-overlapping candidate genes (on a given strand) having the highest total score (among all such sets). Genefinder is an unpublished work of Colin Wilson, LaDeana Hilyer, and Phil Green. The source code is freely available for research and educational purposes.
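Selecting the highest-scoring set of non-overlapping candidates is a classic dynamic programming problem (weighted interval scheduling). Since Genefinder itself is unpublished, the following Python sketch is not its code; it only illustrates the general technique for candidates on a single strand, with hypothetical names and candidates reduced to (start, end, score) tuples.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

Candidate = Tuple[int, int, float]  # (start, end, score), inclusive coordinates


def best_non_overlapping(candidates: Sequence[Candidate]) -> Tuple[float, List[Candidate]]:
    """Return the highest total score and the non-overlapping set achieving it."""
    ordered = sorted(candidates, key=lambda c: c[1])  # sort by end coordinate
    ends = [c[1] for c in ordered]
    # best[i] is the best total score using only the first i candidates.
    best = [0.0] * (len(ordered) + 1)
    taken = [False] * len(ordered)
    previous = [0] * len(ordered)
    for i, (start, _end, score) in enumerate(ordered):
        # Number of earlier candidates that end strictly before this one starts.
        previous[i] = bisect_left(ends, start, 0, i)
        with_current = best[previous[i]] + score
        if with_current > best[i]:
            best[i + 1] = with_current
            taken[i] = True
        else:
            best[i + 1] = best[i]
    # Trace back through the table to recover the chosen candidates.
    chosen: List[Candidate] = []
    i = len(ordered)
    while i > 0:
        if taken[i - 1]:
            chosen.append(ordered[i - 1])
            i = previous[i - 1]
        else:
            i -= 1
    chosen.reverse()
    return best[-1], chosen
```

For the candidates (1, 100, 5.0), (50, 150, 8.0), (120, 200, 4.0) and (160, 300, 3.0), the sketch selects (50, 150, 8.0) and (160, 300, 3.0) for a total score of 11.0. Candidates with a score of zero or less are never selected.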
Displays of gene and transcript predictions from NCBI and other groups may be available as DAS sources.
Protein homology evidence tracks display protein sequence entries from various databases aligned against the genome sequence. These entries provide evidence for Ensembl gene predictions. The presence of an entry in an evidence track shows that it has significant homology with at least one of the exons displayed in an Ensembl or GENSCAN transcript. The data sets displayed differ between Ensembl species.
Proteins - Hits to protein sequence database entries from UniProt/Swiss-Prot, UniProt/TrEMBL and proteins annotated as CDS features in EMBL nucleotide sequence records.
Species-specific Proteins - Alignments of species-specific sub-sets of the above databases are provided.
Pointing at a protein sequence representation in 'Detailed View' will display a pop-up menu with the external database accession number and a clickable link to a UniProt/Swiss-Prot or UniProt/TrEMBL display of this entry. The same database record is also reached by directly clicking on the feature.
Please note that a maximum of seven entries is displayed in any one position, although more entries may have been mapped to this location. (All protein entries mapped to a certain genome position can be retrieved from the 'protein_align_feature' tables in species-specific Ensembl 'core' databases.) Those entries that were actually used during the building of an Ensembl transcript can be seen in more detail by examining the 'Supporting Evidence' section on Ensembl 'ExonView' pages.
Evidence for Ensembl gene predictions, taken from mRNA sequence entries in databases. The presence of an entry in an evidence track shows that it has significant homology with at least one of the exons displayed in an Ensembl or GENSCAN transcript (except for the ESTs track and the human cDNA track which show all above-threshold hits to the assembly - see below). The data sets that were used for the gene predictions and that are displayed in 'ContigView' differ for the different Ensembl species.
Pointing at a cDNA sequence representation in 'Detailed View' will display a pop-up menu with the external database accession number and a clickable link to an EMBL display of this entry. The same database record is also reached by directly clicking on the feature.
Please note that a maximum of seven entries is displayed in any one position, although more entries may have been mapped to this location. (All mRNA entries mapped to a certain genome position can be retrieved from the 'dna_align_feature' tables in species-specific Ensembl 'core' databases.) Those entries that were actually used during the building of an Ensembl transcript can be seen in more detail by examining the 'Supporting Evidence' section on Ensembl 'ExonView' pages.
EMBL mRNAs - Hits to vertebrate mRNA sequences deposited in the EMBL Nucleotide Sequence Database. Pointing at an alignment representation will display a pop-up menu, which will show the EMBL ID and a clickable link to this EMBL entry. This entry can also be reached by directly clicking on the feature. A maximum of seven entries is displayed in any one position, although more entries may have been mapped to this location. The 'Supporting Evidence' section on Ensembl 'ExonView' pages displays those entries that were actually used during the building of an Ensembl transcript.
Species-specific cDNAs - Hits to species-specific mRNAs deposited in the EMBL Nucleotide Sequence Database are displayed in dark green, and to the mRNA class ('NM_' prefixed accession numbers) of NCBI RefSeq entries in light green.
Those entries that were actually used during the building of an Ensembl transcript can be seen by examining the 'Supporting Evidence' section for that transcript in 'ExonView'.
cDNA Updates - While all other biological evidence tracks show alignments of biological sequences to the genome that have been calculated as the basis for every new gene-build procedure, these cDNA tracks are updated with every Ensembl release. The aim is to display the current state of biological evidence as all cDNAs that have become available from source databases after the gene-build procedure has been completed will be included in this track.
UniGene - Hits to NCBI UniGene clusters of GenBank nucleotide sequence database entries.
Pointing at a sequence representation in 'Detailed View' will display a pop-up menu with the non-stable UniGene Cluster ID and a clickable link to the database at the NCBI. The same database record is also reached by directly clicking on the feature.
Please note that a maximum of seven entries is displayed in any one position, although more entries may have been mapped to this location. (All nucleotide entries mapped to a certain genome position can be retrieved from the 'dna_align_feature' tables in species-specific Ensembl 'core' databases.)
EST - This track displays hits to species-specific Expressed Sequence Tags (ESTs). Please note that this does not represent the complete set of ESTs available from public sequence databases but rather a stringently filtered set. Only ESTs better than 97% identical to the genome over more than 90% of their length are included.
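As a minimal sketch of the filter described above, the two thresholds can be expressed as a simple test. The function name and its parameters are hypothetical and do not correspond to any Ensembl pipeline code; in particular, how the aligned length and percent identity are computed is assumed to be given.

```python
def passes_est_filter(aligned_length, est_length, percent_identity,
                      min_identity=97.0, min_coverage=90.0):
    """Return True if an EST alignment passes both thresholds.

    aligned_length: number of EST bases included in the alignment
    est_length: full length of the EST sequence
    percent_identity: identity of the aligned region to the genome
    """
    coverage = 100.0 * aligned_length / est_length
    # Both criteria are strict: "better than" 97% and "more than" 90%
    return percent_identity > min_identity and coverage > min_coverage


print(passes_est_filter(460, 500, 98.5))  # 92% coverage, 98.5% identity
print(passes_est_filter(440, 500, 99.0))  # only 88% coverage
```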
In human and mouse Ensembl, this track shows the evidence on which EST transcripts are based. Mouse-over will show the EMBL accession number and a clickable link to the database entry. The entry is also reached by directly clicking on the feature. A maximum of seven entries is displayed in any one position, although more entries may have been mapped to this location. Unlike the other 'evidence' tracks, the EST track shows ESTs mapped by homology to all of the genomic sequence, instead of just to predicted exon regions.
Danio rerio EST Clusters shows EST assemblies from Washington University Zebrafish Genome Resources Project. Note that these are not generated in the same way as the Ensembl EST transcripts displayed in human and mouse Ensembl.
Indicates the location of tRNA genes, predicted by the tRNAscan-SE program.
Todd M. Lowe and Sean R. Eddy
tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.
Nucleic Acids Res. 1997 Mar 1;25(5):955-964
Warning: In the mouse and rat genomes, tRNAscan is currently unable to distinguish reliably between functional tRNAs, pseudogenes, and tRNA-related SINE repeats. The tRNA track in these species will therefore show a mixture of these elements.
Indicates non-coding RNA families from Rfam (RNA families database of alignments and CMs).
Rfam: an RNA family database.
Sam Griffiths-Jones, Alex Bateman, Mhairi Marshall, Ajay Khanna and Sean R. Eddy.
Nucleic Acids Research, 2003, 31, 1, 439-441.
Eponine is an algorithm that predicts transcription start sites in mammalian genomic DNA, based on the linear combination of weight matrices. Although Ensembl annotates Eponine transcription start site (TSS) predictions on the genomic sequence, its predictions are presently not used to artificially extend Ensembl gene model predictions beyond solid biological evidence.
Thomas A. Down and Tim J. P. Hubbard
Computational detection and location of transcription start sites in mammalian genomic DNA.
Genome Res. 2002 Mar;12(3):458-461.
The First Exon Finder FirstEF is a 5' terminal exon and promoter prediction program. It implements a decision tree based on discriminant functions that can recognise structural and compositional features such as CpG islands, promoter regions and first splice-donor sites. The probabilistic models are optimised to find potential first donor sites and CpG-related and non-CpG-related promoter regions based on discriminant analysis. For every potential first donor site (GT) and an upstream promoter region, FirstEF decides whether or not the intermediate region can be a potential first exon, based on a set of quadratic discriminant functions.
Ramana V. Davuluri, Ivo Grosse and Michael Q. Zhang
Computational identification of promoters and first exons in the human genome.
Nat Genet. 2001 Dec;29(4):412-417.
doi:10.1038/ng780
Ensembl annotates microarray probe sets on the genome sequences if manufacturers have disclosed the individual probe set sequences for a particular microarray. The mapping process is a two-step procedure outlined in the Microarray Probe Set Mapping document.
Tracks displaying whole genome similarity matches to other genomes in Ensembl are available from the 'Compara' menu. The track names include a four-letter abbreviation of the systematic species name and the method used for characterising the whole genome similarity matches.
An overview document lists species pairs and methods involved in the comparison.
The following tracks are available:
Conservation - we use Gerp to calculate conservation scores and call constrained elements on the 10 way multiple alignments (see below). Conservation scores are estimated on a column-by-column basis. Constrained elements are stretches of the multiple alignment where the sequences are highly conserved according to the previous score.
This track is collapsed by default. When it is expanded (click on the [+]), a conservation plot is displayed on a dynamic scale defined by the window shown.
Gregory M. Cooper, Eric A. Stone, George Asimenos, NISC Comparative Sequencing Program, Eric D. Green, Serafim Batzoglou and Arend Sidow
Distribution and intensity of constraint in mammalian genomic sequence.
Genome Res. 2005 Jul;15(7):901-913.
Pecan - The Pecan algorithm is used to obtain global multiple genomic alignments. First, co-linear regions are defined with Mercator and then Pecan builds alignments in these syntenic regions. Please refer to this document for an up-to-date list of Pecan alignments.
Pecan is a global multiple sequence alignment program that makes practical the probabilistic consistency methodology for significant numbers of sequences of practically arbitrary length. As input it takes a set of sequences and a phylogenetic tree. The parameters and heuristics it employs are highly user configurable. It is written entirely in Java and also requires the installation of Exonerate. Read more about Pecan.
BlastZ-net - Untranslated whole genome comparisons by BLASTz are performed for species pairs, which are thought to be similar enough to be able to detect homology directly at the DNA level. Some of the BLASTz data were obtained from the UCSC Genome Bioinformatics group. After running BLASTz, the alignments are cleaned and grouped into 'chains' using the 'AxtChain' algorithm. See the UCSC Genome Browser for complete details of the parameters used for each of the species pairs. When Ensembl and the UCSC Genome Browser are out of sync with the genome sequence assemblies, or the Genome Bioinformatics group has not performed a comparison that is of interest to the Ensembl user, we run BLASTz comparisons in house using the same procedure.
Scott Schwartz, W. James Kent, Arian Smit, Zheng Zhang, Robert Baertsch, Ross C. Hardison, David Haussler and Webb Miller
Human-Mouse Alignments with BLASTZ.
Genome Res. 2003 Jan;13(1):103-107.
Translated BLAT - The translated BLAT algorithm is used to compare genomes from more evolutionarily distant species, at the amino acid level. Thus regions of similarity will be biased towards those that code for proteins, although highly conserved non-coding regions might be detected as well.
The soft masked database sequence is translated in all six reading frames and stored in memory as an index of non-overlapping amino acid pentamers. These pentamers are then compared to the query sequence, also translated in all six reading frames, to find regions of likely similarity, which are then extended into full un-gapped alignments. The scoring matrix is a simple +2/-1 matrix, which also helps to speed up the querying.
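The seeding idea in the paragraph above can be sketched as follows. This is an illustration only, not the BLAT implementation: it starts from sequences that have already been translated, and the function names, the example sequences and the simple whole-string scoring are assumptions.

```python
from collections import defaultdict


def build_tile_index(db_protein, k=5):
    """Index the NON-overlapping k-mers (tiles) of a translated database."""
    index = defaultdict(list)
    for pos in range(0, len(db_protein) - k + 1, k):
        index[db_protein[pos:pos + k]].append(pos)
    return index


def seed_hits(index, query_protein, k=5):
    """Look up every (overlapping) k-mer of the query in the tile index."""
    hits = []
    for qpos in range(len(query_protein) - k + 1):
        for dpos in index.get(query_protein[qpos:qpos + k], []):
            hits.append((qpos, dpos))
    return hits


def ungapped_score(a, b, match=2, mismatch=-1):
    """Score two equal-length segments with a simple +2/-1 scheme."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


index = build_tile_index("MKTAYIAKQRQISFVKSHFS")
print(seed_hits(index, "GGIAKQRQIS"))            # seed positions (query, db)
print(ungapped_score("IAKQRQIS", "IAKQRQLS"))    # 7 matches, 1 mismatch
```

Indexing only non-overlapping tiles keeps the in-memory index small, while scanning every overlapping pentamer of the query ensures that a shared pentamer is still found whichever tile boundary it falls on.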
W. James Kent
BLAT - The BLAST-Like Alignment Tool.
Genome Res. 2002 Apr;12(4):656-664.
Clicking the red plus [+] or minus [-] box to the left of the track toggles whole genome alignment tracks between expanded and collapsed display, respectively.
When the matches are shown as expanded, individual high scoring pairs (HSPs) of identical orientation are joined by horizontal lines and the minus [-] box is shown. Pointing at the track produces pop-up windows with the coordinates of the assembly segment from the matching species, the relative orientation and a link to see that segment in 'ContigView'.
When the matches are shown as collapsed, individual high scoring pairs (HSPs) on the same region of the chromosome or scaffold are not joined by horizontal lines and the plus [+] button is shown. Pointing at a hit provides a pop-up window with links to the pairwise alignment in Ensembl 'AlignView', a dot matrix display of the aligned region from the two species in 'DotterView' and an option to display genomic regions from both species simultaneously in 'MultiContigView'. Alternatively, you can display the alignments in 'AlignSliceView' by selecting the "View alignment with..." option in the left hand side menu. There is also a link to the corresponding 'ContigView' display for the other species.
N. B. All similarity matches are strand independent tracks and are therefore displayed at the top of the 'Detailed View' panel. For more information about regions of conserved synteny, consult Ensembl 'SyntenyView'.
Mouse-over will show the marker identifier and an option to view details and synonyms in 'MarkerView'. Note that only a sub-set of the markers stored in the Ensembl databases are displayed. Information about other markers may be found via the text search box near the top of almost any Ensembl page. For detailed instructions see the Ensembl 'TextView' page.
Only a preliminary mapping of Rattus norvegicus Quantitative Trait Loci (QTLs) is available at present. Loci are mapped onto the genome sequence assembly via mapping of QTL-defining sequence tagged sites (STS) markers. Because some markers could not be mapped to the assembly by the Ensembl analysis and annotation pipeline, some QTLs may be represented by just one marker, while others are not shown at all. The loci are annotated as red blocks, with the name of the trait displayed on the block if there is enough space. Where only one of the defining markers could be mapped, a red block of arbitrary size 1 Mb is drawn around it. Pointing to a block produces a pop-up menu with the name of the trait and a link to either the Rat Genome Database (RGD) or the RatMap resource for further information.
Indicates a CpG island. Pointing with the mouse over an island provides the score and the location as additional information. The Ensembl analysis and annotation pipeline uses the cpg program for the definition of CpG islands. This programme was developed by Gos Micklem and is essentially identical to the newcpgreport programme in the EMBOSS package.
For the inclusion of CpG islands into the Ensembl database we require a minimum length of 1000 bp, a minimal observed/expected ratio of 0.6 and a minimal GC content of 50%, for human. For other species, the length cutoff may be different.
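The three criteria can be illustrated with a short sketch. It is a simplification: it evaluates a candidate region as a whole rather than reproducing the windowed scan of the cpg or newcpgreport programs, and the function names are hypothetical. The observed/expected ratio is computed with the commonly used formula (number of CpG x length) / (number of C x number of G).

```python
def cpg_stats(sequence):
    """Return (length, GC percentage, observed/expected CpG ratio)."""
    seq = sequence.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")
    gc_percent = 100.0 * (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return n, gc_percent, obs_exp


def is_cpg_island(sequence, min_length=1000, min_obs_exp=0.6,
                  min_gc_percent=50.0):
    """Apply the thresholds quoted in the text above (defaults as for human)."""
    n, gc_percent, obs_exp = cpg_stats(sequence)
    return (n >= min_length and obs_exp >= min_obs_exp
            and gc_percent >= min_gc_percent)


print(cpg_stats("ACGT"))
print(is_cpg_island("CG" * 600))
```

The length cutoff is a parameter so that a different value can be supplied for other species.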
The regulatory build provides a single "best guess" set of regulatory elements. These elements are based on the information contained within the ensembl-functional genomics database.
Regulatory features are built as a composite set of annotations based on co-occurrence analysis and classification of multiple genome-wide epigenomic data sets. Each feature was built on some or all of the following data sets:
Anchor/Focus Sets | Data type | Source |
---|---|---|
DNase1 Hypersensitivity site | ChIP-Seq | 1 |
CCCTC-binding factor (CTCF) | ChIP-Chip* | 2 |
Histone 3 Lysine 4 Tri-Methylation (H3K4me3) | ChIP-Chip | 3 |
Supporting Sets | Data type | Source |
---|---|---|
H4K20me3 | ChIP-Chip | 3 |
H3K27me3 | ChIP-Chip | 3 |
H3K36me3 | ChIP-Chip | 3 |
H3K79me3 | ChIP-Chip | 3 |
H3K9me3 | ChIP-Chip | 3 |
The anchor/focus sets were chosen to define a set of regions as potentially regulatory and for their previously known specific properties, including DNaseI as a marker of open chromatin, H3K4me3 association with active promoters, and CTCF's association with "insulator regions."
In short, the Regulatory Build process performs an overlap analysis on each anchor/focus set with respect to each other and each of the supporting sets. The result of this analysis was then combined into one 'RegulatoryFeature' set, merging constituent feature boundaries up to a maximum of 4 kb and integrating information on proximity (<2.5 kb) to transcription start and end sites. The 4 kb limit was chosen to avoid chaining of large regulatory regions (e.g. the Hox cluster) in an attempt to provide more granularity over regions of interest. On breach of the 4 kb maximum length limit, features were broken down into H3K4me3 elements, or CTCF elements where appropriate.
The composite regulatory features were then classified by the patterns of data sets observed across each regulatory feature. Some preliminary analysis has identified the following combinations that are common and strongly associated with other annotated features in Ensembl:
We have used these patterns as the basis of the annotations used in this version of the Ensembl Regulatory Build.
Data source citations:
1. Genome-wide identification of DNaseI hypersensitive sites was performed by Greg Crawford and Terry Furey (Duke University) using a whole genome DNase-sequencing protocol (Crawford et al., Genome Research 2006).
DNase-sequencing was performed using the Illumina (Solexa) sequencing by synthesis method from a DNase treated library generated from the GM06990 cell line (Crawford and Furey, unpublished). A Parzen density estimator used density of sequences in regions to generate scores indicating the presence of DNaseI hypersensitive sites.
2. Kim, T.H.; Abdullaev, Z.K.; Smith, A.D.; Ching, K.A.; Loukinov, D.I.; Green, R.D.; Zhang, M.Q.; Lobanenkov, V.V. & Ren, B.
Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome.
Cell, 2007 , 128 , 1231-1245
3. Hirst, M; Hurd, P.J.; Bainbridge, M.; Robertson, G.; Kirmizis, A.; Nelson, C.; Zhao, Y.; Zeng, T.; Pandoh, P.; Tam, A.; Prabhu, A.; Dhalla, N.; Sa, D.; Delaney, A.; Bilenky, M.; Jones, S.; Kouzarides, T.; Marra, M. (In preparation)
* The CTCF data was processed with the Nessie HMM (Flicek, unpublished).
Enriched sites were identified by the nessie algorithm for ChIP-chip data analysis (Flicek, unpublished). For this analysis, nessie uses a two-state hidden Markov Model.
Raw data from tiling array experiments is normalised and displayed as simple wiggle tracks. This data is supplied to support and give a visual reference for the associated annotated features track. The default normalisation of the data uses the VSN (Variance Stabilisation Normalisation) package from Bioconductor, which performs a generalised log transformation. The transformed value roughly equates to the difference between the control and experimental values at low signal, and smoothly changes to the log ratio between the values at high (i.e. significant) signal. This has the effect of minimising anomalies arising from low-signal pairs giving high ratio scores.
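The behaviour of a generalised log transformation can be illustrated with the inverse hyperbolic sine, which is a common form of it. This sketch is an assumption-laden illustration and not the VSN package itself: VSN also estimates calibration parameters for each array, which are omitted here, and the function name is hypothetical.

```python
import math


def glog_difference(experiment, control):
    """Difference of arsinh-transformed intensities (uncalibrated sketch)."""
    return math.asinh(experiment) - math.asinh(control)


# Low signal: behaves like a plain difference (0.02 - 0.01 = 0.01)
print(glog_difference(0.02, 0.01))

# High signal: behaves like a natural log ratio (ln(2000 / 1000) = ln 2)
print(glog_difference(2000.0, 1000.0), math.log(2))
```

Because arsinh(x) is close to x near zero and close to ln(2x) for large x, a pair of low intensities cannot produce the inflated score that a plain ratio would give.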
The CTCF data source is:
Kim, T.H.; Abdullaev, Z.K.; Smith, A.D.; Ching, K.A.; Loukinov, D.I.; Green, R.D.; Zhang, M.Q.; Lobanenkov, V.V. & Ren, B.
Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome.
Cell, 2007 , 128 , 1231-1245
For human, this track displays regulatory features imported from the cisRED database and microRNA regulatory features resulting from the miRanda analysis performed by Anton Enright's group at the Wellcome Trust Sanger Institute.
Those regions of the genome that were subjected to cisRED analysis are indicated in the track 'cisRED search regions'.
For fly this track displays regulatory features based on a curated set of transcription factor binding sites imported from the Drosophila DNase I Footprint Database and a set of 120 likely transcription factor binding sites identified from a large set of Drosophila promoter regions, using the Tiffin pipeline.
Many of these motifs could be correlated to patterns of embryonic gene expression. Regulatory regions on the Drosophila melanogaster genome are predicted using a phylogenetic HMM, then scanned using the motif set to identify probable transcription factor binding sites.
Single Nucleotide Polymorphisms and other sequence variations are mapped to the genome sequence. Small insertions and deletions (in-dels) are annotated with small triangles, while SNPs are represented by vertical bars and colour-coded as follows:
When zoomed in on a small region, as in the 'Basepair View' panel, the ambiguity code for the SNP is displayed.
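The ambiguity code follows the standard IUPAC nucleotide nomenclature. As a minimal sketch, a slash-separated allele string can be converted as shown here; the function name and the input format are assumptions for illustration, and in-dels (which have no single-letter code) are rejected.

```python
# Standard IUPAC nucleotide codes, keyed by the sorted set of bases
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def ambiguity_code(alleles):
    """Return the IUPAC code for an allele string such as 'A/G'."""
    key = "".join(sorted(set(alleles.upper().split("/"))))
    if key not in IUPAC_CODES:
        raise ValueError("no IUPAC code for alleles: %s" % alleles)
    return IUPAC_CODES[key]


print(ambiguity_code("A/G"))  # purine transition
print(ambiguity_code("C/T"))  # pyrimidine transition
```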
Pointing at a genetic variation representation in 'Detailed View' or 'Basepair View' will display a pop-up menu with the SNP identifier at the top and clickable 'SNP properties' link to Ensembl 'SNPView'. Depending on the source of the variation data, a summary about variation properties and links to HGBASE data, TSC-CSHL data and dbSNP data are available. Ensembl 'SNPView' is also reached by directly clicking on the feature.
Please note that the alleles and ambiguity codes for genetic variations shown in 'ContigView' and 'SNPView' are identical to those of the NCBI dbSNP entry. To see the alleles and effects as appropriate to a transcript or protein, look at the SNP information in 'TransView' and 'ProteinView'.
Repeats - Repetitive sequence regions of all classes are annotated in this track. Most tracks are characterised by running the RepeatMasker program, while the Tandem Repeats Finder generates the 'tandem repeats' track. Mouse-over on an individual repeat element brings up additional information. Tracks may be switched on or off using the 'Repeats' pull-down menu on the 'menu bar'.
Repeat Sub-Classes - Sub-classes of repeats like Dust, LTRs, Low complexity regions, simple repeats, RNA repeats can be selected from the 'Repeats' pull-down menu on the 'menu bar'.
Tile path - The tile path track in human Ensembl shows the location of BAC clones within the current genome sequence assembly. Clones for which fluorescence in situ hybridisation (FISH) mapping information is available are marked with a black triangle in the top left corner. Where a clone is shown in outline, the mapping of the clone to the sequence assembly is problematic and the true length is not displayed. Mouse-over brings up information about a particular clone. More information about clones is available from Ensembl 'CytoView', which can be reached via the 'Jump to...' menu on the menu bar above the 'Detailed View' panel.
Acc clones - The accessioned clones track displays BAC clones for which some sequence has been deposited in the nucleotide sequence databases. Coloured bars represent BAC clones and pointing at them displays a pop-up window with more information about the clone, including its EMBL nucleotide sequence database accession number, sequencing status and estimated length. The segment of the sequence assembly to which the clone has been mapped can be exported by going to ExportView, selecting 'Sequence ID' in the Feature section, and entering the international clone name or its accession number. To retrieve the actual sequence deposited in the sequence databases, use EMBL or GenBank directly. Note that these clones represent only a small proportion of the BAC clones positioned on the complete FPC-based clone map. To view other clones, go to Ensembl CytoView using the 'Jump to' pull-down menu on the menu bar above the 'Detailed View' panel.
Vertical black lines at the left or right ends of a clone indicate that the BAC end sequence of that end has been matched to this point on the assembly. The length of the black line underneath a clone indicates its length estimated by fingerprinting. The line length may differ from the length of the coloured bar, because the clone lengths as shown by bars have been adjusted so that the entire FPC map will fit around the points at which clones on the map have been matched to the assembly. If the BAC is represented by an outline instead of a coloured bar, the adjusted clone length is unrealistically large and should be treated with caution. The different length estimates are shown in the pop-up window list.
Fosmid Map - For mouse, end sequences of WIBR-1 fosmid library clones have been mapped to the genome sequence.
Shows gaps in the current sequence assembly. Where possible, gaps are categorised as:
Pointing with the mouse shows the category, and the size and position of the gap in the assembly.
The plot shows the relative content of the nucleotides G+C along the genome sequence. The horizontal red line indicates 50% G+C.
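A plot of this kind can be derived by computing the G+C percentage in consecutive windows along the sequence. The sketch below is illustrative only; the window size actually used by 'ContigView' is not stated here, so the 'window' parameter and the function name are assumptions.

```python
def gc_percent_windows(sequence, window=100):
    """Return the G+C percentage of each consecutive full window."""
    sequence = sequence.upper()
    values = []
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        values.append(100.0 * gc / window)
    return values


# Two windows of 8 bp: the first is half G+C, the second entirely G+C
print(gc_percent_windows("GGCCAATTGCGCGCGC", window=8))
```

Values above 50.0 would be drawn above the horizontal red line, values below 50.0 beneath it.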
The Distributed Annotation System (DAS) provides a way of displaying external annotation.
A pre-configured, species-specific set of external data sources is available from the 'DAS sources' pull-down menu. Tick to select individual sources. The selected tracks will then be displayed after the menu has been closed. Brief descriptions for pre-configured DAS sources are available from the 'Apropos: Genome DAS' document.
In addition to the pre-configured sources it is also possible to add DAS sources from external DAS servers via the 'Manage sources' option in the 'DAS Sources' menu.
You can also upload your own data sets into a DAS server provided by Ensembl using the 'Upload data' menu item.
Generally, all DAS tracks in Ensembl 'ContigView' and 'CytoView' panels have blue name labels. Pointing to features in most DAS tracks will produce a pop-up menu showing an identifier, and one or more links to view an associated sequence in FASTA format if appropriate. Ensembl 'FASTAView' pages also provide, where possible, a brief description of the data source and a link to a web page from the group that provided the data for the DAS track.
Ensembl allows for attachment and display of smaller user data sets via the simple URL source mechanism. Data sets obeying simple formatting rules are generally placed within user directories that are exported via web servers. URLs corresponding to these data files can be attached using the "URL-based data" option in the DAS Sources menu. Ensembl will then query the corresponding third party web server before rendering the page.
Detailed data set formatting rules can be found in the corresponding help page.
The fourth panel is used to show features on a small segment of the assembly. By default, 'Basepair View', shows a region of 100 bp, taken from the centre of the 'Detailed View' panel. The display can be zoomed in or out, and moved left or right, using navigation controls similar to those of the 'Detailed View' panel. Clicking the plus or minus buttons zooms into or out of the region by a factor of approximately 2. These buttons allow zooming from as little as 1 bp up to 500 bp. The zooming ladder restricts or expands the field of view to a scale suitable to view any feature of interest. Individual steps of the ramp represent 25, 50, 100, 200, 300 and 500 bp sequence, respectively. Regions larger than 500 bp can only be seen in the 'Detailed View' panel.
The 'Basepair View' panel can be turned on or off using the plus [+] or minus [-] button. The tracks can be switched off or on using the 'Options' pull-down menu on the gold bar of the 'Detailed View' panel. 'Basepair View' displays some of the tracks from 'Detailed View' (such as DNA (contigs), Transcripts and Genes, repeats and tile path) plus the following additional features:
The forward strand of the genomic sequence is displayed above the DNA (contigs) track, the reverse strand below. Each nucleotide has a different background colour, to make it easier to visualise runs of bases:
green: adenine (A)
blue: cytosine (C)
yellow: guanine (G)
red: thymine (T)
The IUPAC single letter codes for the nucleotides are shown when there is room for display.
Raw translations of the assembled genome sequence. The three possible reading frames of each strand are shown, above the DNA (contigs) bar for the forward strand and below it for the reverse strand. Each amino acid has a different background colour, and amino acids with related physico-chemical properties have related shades:
green: hydrophobic (A, G, I, L, M, P, V)
blue: large hydrophobic (F, H, W, Y)
gold: negative charge (D, E)
pink: positive charge (K, R)
purple: polar (N, Q, S, T)
yellow: cysteine (C)
red: stop codon (*)
The IUPAC single letter codes for the amino acids are shown when there is room for display.
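The raw translations described above can be reproduced with a short sketch that uses the standard genetic code. The frame labels ('+1' to '+3' for the forward strand, '-1' to '-3' for the reverse strand) and the function names are assumptions for illustration, not Ensembl conventions.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return the raw translation of all six reading frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(dna[offset:])
        frames["-%d" % (offset + 1)] = translate(reverse[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
print(frames["+1"])  # ATG GCC TAA
print(frames["-1"])  # TTA GGC CAT
```

Stop codons are returned as '*', matching the symbol used in the colour key above.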
The positions of potential start and stop codons are displayed only when a region of less than 50 kb is displayed in 'Detailed View'.
This track shows the sequence of potential restriction endonuclease cleavage sites together with the name of the enzyme. Enzymes that cut the DNA strands in a staggered fashion to produce 'sticky ends' are drawn in blue. Sites in green denote enzymes that cut the DNA strands at the same point to produce 'blunt ends'. Red vertical lines mark the expected cleavage positions, joined by horizontal lines to the recognition site where the cleavage site does not overlap the recognition site. Pointing to a site produces a pop-up window with the name of the enzyme and its general recognition sequence.
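The distinction between 'sticky' and 'blunt' ends can be sketched with two well-known enzymes, EcoRI (G^AATTC) and SmaI (CCC^GGG). The small enzyme table, the function names and the output layout are assumptions for illustration; the track itself draws on a much larger set of enzymes.

```python
# (recognition site, cut offset on the top strand) for palindromic sites
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "SmaI": ("CCCGGG", 3),   # CCC^GGG
}


def end_type(site, cut):
    """For a palindromic site the bottom strand is cut at len(site) - cut,
    so the ends are blunt only if the cut is exactly in the middle."""
    return "blunt" if 2 * cut == len(site) else "sticky"


def find_sites(sequence):
    """Return (enzyme, site start, cut position, end type), sorted by start."""
    sequence = sequence.upper()
    found = []
    for name, (site, cut) in ENZYMES.items():
        start = sequence.find(site)
        while start != -1:
            found.append((name, start, start + cut, end_type(site, cut)))
            start = sequence.find(site, start + 1)  # allow overlapping sites
    return sorted(found, key=lambda hit: hit[1])


for hit in find_sites("TTGAATTCAACCCGGGTT"):
    print(hit)
```

Positions are zero-based offsets into the input string; the cut position is the index of the first base after the cut on the top strand.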
For most features in the 'Detailed View' panel, you can display extra information and links with the mouse pointer. Putting the mouse pointer over a feature ("mouse-over") brings up a pop-up menu window. The top menu item is a name or database identifier in bold text. Below this may be one or more information points (text in grey and not clickable), and one or more clickable links (text in colour).
So that the display does not become cluttered, the pop-up text windows stay on the screen for only about 6 seconds. Click on the X at top-right or the title bar to close the menu window immediately. The window will also disappear if you move the pointer onto a different feature. To re-display, move the pointer off the feature and then point again. To click on a displayed link, move the mouse pointer down the menu - the hand link symbol will appear when you are over a clickable link.
Hints: Does the pop-up window disappear when you try to move the pointer onto it? You have probably moved through another feature without realising it. Try moving slowly onto a feature and stop moving when the text appears. If you keep having problems, zoom in on the region you are exploring, so that features are not so close together. You can also click directly on most features to go straight to a default link.
The pop-up menus can be turned off by unchecking the 'show popup menus' option on the 'Options' pull-down menu in the 'Detailed View' 'Menu Bar'. This may speed your browsing. You will still be able to click on a feature to go to an appropriate information page. An alternative way of using the pop-up windows is also available. To have pop-up menus appearing only when clicking on a feature, check the '... popup on click' option in the 'Options' menu. However, if you plan to use Ensembl regularly it is well worth getting used to the default behaviour of the pop-up menus.
The search box at the top of the page allows you to search for any identifier present in Ensembl. For detailed instructions see the Ensembl 'TextView' page.