Uppsala universitet

RetroTector

The program package RetroTector (formerly RetroSpector) is designed to identify and characterize entire or fragmented endogenous retroviruses (ERVs) in genomic material, in a fashion robust to mutations and with considerable flexibility.


    It relies on a database of motifs and their properties, and on alignments of known retroviral proteins. At present it is oriented primarily towards searching for ERVs in the human genome, but it can be adapted to other species.

    It also utilizes a considerable number of adjustable parameters. Adjusting them may, among other things, affect the balance between speed, sensitivity and selectivity as required. The optimal settings for many of these have not yet been determined.

    The architecture is flexible and allows new motifs, new procedures etc. to be plugged in. It can also be run automatically over an entire genome.

    The program is written in Java and quite portable. It is in use under Windows, MacOS X and Linux.

    A full version is available to those seriously interested (see below). Small jobs (<10 Mbases) can be run at

The RetroTector online URL


Algorithms

For RetroTector three types of algorithms have been developed:

    1. “Fragment threading”, whereby characteristic motifs are combined into chains that satisfy distance criteria.

   The fragment threading algorithm depends on a collection of Motifs, conserved features of various kinds. At present, the bulk of the Motifs are consensus amino acid sequences, but there are a number of others, including motifs based on neural networks, weight matrices etc. For a "Motif hit", perfect fulfilment of the criteria is not needed. However, the hit is assigned a score depending on the degree of fulfilment.
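The idea of a scored, imperfect Motif hit can be illustrated with a minimal Java sketch. It assumes that a Motif is a plain consensus string and that the score is simply the fraction of matching positions; the class name, the method names and the threshold are hypothetical, and RetroTector's actual scoring is not described here.

```java
import java.util.ArrayList;
import java.util.List;

final class MotifHitSketch {

    /** Fraction of consensus positions matched when the consensus is laid at the given offset. */
    static double scoreAt(String consensus, String sequence, int offset) {
        int matches = 0;
        for (int k = 0; k < consensus.length(); k++) {
            if (consensus.charAt(k) == sequence.charAt(offset + k)) {
                matches++;
            }
        }
        return (double) matches / consensus.length();
    }

    /** Offsets whose score reaches the threshold; imperfect matches still count as hits. */
    static List<Integer> findHits(String consensus, String sequence, double threshold) {
        List<Integer> offsets = new ArrayList<>();
        for (int offset = 0; offset + consensus.length() <= sequence.length(); offset++) {
            if (scoreAt(consensus, sequence, offset) >= threshold) {
                offsets.add(offset);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Toy amino acid string with one imperfect match (YVDD) and one perfect match (YMDD).
        System.out.println(findHits("YMDD", "AAYVDDLLYMDDQ", 0.75)); // prints [2, 8]
    }
}
```

Lowering the threshold admits more distant matches at the cost of more false hits, which is one example of the balance between sensitivity and selectivity.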

    Also vital is information about acceptable distances between Motif hits. The acceptable ranges are usually set rather wide, to account for the possibility that unknown ERVs may have hitherto unknown distances.

    The fragment threading algorithm constructs candidate ERVs as chains of Motif hits fulfilling the distance criteria and assigns each a score, depending on the hit scores, the number of hits and some other criteria such as reading frame consistency. "Broken" chains, violating a small number of the distance criteria, are also acceptable.

    In practice, the number of possible chains is so large that an exhaustive search is not feasible. Procedures to make a semi-exhaustive search without serious loss have been devised, one of them being a two-stage process whereby Motif hits are first threaded into "subgene hits", which are then threaded into chains.
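The chaining step can be illustrated with a deliberately simplified Java sketch. It assumes that every Motif has a fixed rank along the provirus, that one global distance range applies to all neighbouring hits, and that the chain score is the sum of the hit scores. Under these toy assumptions a simple dynamic programming pass finds the best chain. Broken chains, reading frame consistency and the two-stage subgene procedure are not modelled, and all names (Hit, motifRank, bestChain, minGap, maxGap) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ThreadingSketch {

    /** One Motif hit. All field names are hypothetical. */
    static final class Hit {
        final int motifRank;  // assumed order of the motif along a provirus, 5' to 3'
        final int position;   // base position of the hit
        final double score;   // degree of fulfilment of the Motif criteria

        Hit(int motifRank, int position, double score) {
            this.motifRank = motifRank;
            this.position = position;
            this.score = score;
        }
    }

    /**
     * Best-scoring chain in which motif rank increases and every gap between
     * neighbouring hits lies in [minGap, maxGap]. Chain score = sum of hit scores.
     */
    static List<Hit> bestChain(List<Hit> hits, int minGap, int maxGap) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.position));
        int n = sorted.size();
        double[] best = new double[n]; // best[j]: best score of a chain ending at hit j
        int[] prev = new int[n];       // prev[j]: predecessor of hit j in that chain, or -1
        int bestEnd = -1;
        for (int j = 0; j < n; j++) {
            Hit hj = sorted.get(j);
            best[j] = hj.score;
            prev[j] = -1;
            for (int i = 0; i < j; i++) {
                Hit hi = sorted.get(i);
                int gap = hj.position - hi.position;
                boolean compatible = hi.motifRank < hj.motifRank && gap >= minGap && gap <= maxGap;
                if (compatible && best[i] + hj.score > best[j]) {
                    best[j] = best[i] + hj.score;
                    prev[j] = i;
                }
            }
            if (bestEnd < 0 || best[j] > best[bestEnd]) {
                bestEnd = j;
            }
        }
        // Walk the predecessor links backwards to recover the chain in 5' to 3' order.
        List<Hit> chain = new ArrayList<>();
        for (int k = bestEnd; k >= 0; k = prev[k]) {
            chain.add(0, sorted.get(k));
        }
        return chain;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(0, 100, 5.0));
        hits.add(new Hit(1, 900, 3.0));
        hits.add(new Hit(2, 2100, 4.0));
        hits.add(new Hit(1, 50000, 9.0)); // strong hit, but too far away to join the chain
        for (Hit h : bestChain(hits, 100, 3000)) {
            System.out.println("motif " + h.motifRank + " at " + h.position);
        }
    }
}
```

In this example the three nearby hits form a chain with total score 12, which beats the isolated hit with score 9. Once broken chains and further criteria are allowed, the search space grows far beyond what such a simple pass can cover.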

    2a. A fast dynamic programming algorithm of Needleman-Wunsch type for checking similarity between two DNA base sequences.
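For orientation, the textbook Needleman-Wunsch global alignment score can be sketched in Java as follows. This is the standard algorithm with a linear gap penalty, not RetroTector's fast variant; the class name, the method name and the scoring values used in the example are assumptions.

```java
final class NeedlemanWunschSketch {

    /** Global alignment score of a and b with a linear gap penalty, using two rows of memory. */
    static int globalScore(String a, String b, int match, int mismatch, int gap) {
        int n = a.length();
        int m = b.length();
        int[] prev = new int[m + 1];
        int[] cur = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j * gap; // aligning a prefix of b against nothing
        }
        for (int i = 1; i <= n; i++) {
            cur[0] = i * gap; // aligning a prefix of a against nothing
            for (int j = 1; j <= m; j++) {
                int diag = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int up = prev[j] + gap;      // base of a opposite a gap
                int left = cur[j - 1] + gap; // base of b opposite a gap
                cur[j] = Math.max(diag, Math.max(up, left));
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[m];
    }

    public static void main(String[] args) {
        // Six matching bases and one gap: 6 * 1 + (-2) = 4
        System.out.println(globalScore("GATTACA", "GATACA", 1, -1, -2)); // prints 4
    }
}
```

Only the score is needed for a similarity check, so the full matrix and the traceback are omitted here.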

    2b. A dynamic programming algorithm of Needleman-Wunsch type for fitting an amino acid sequence to a DNA base sequence, taking into account known related peptides and other factors suggesting the preferred reading frame.
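How an amino acid sequence can be fitted to DNA by dynamic programming is sketched below in Java, under strong simplifications. Each residue is matched against one codon translated with the standard genetic code, a frameshift is modelled as skipping one or two bases at a penalty, and the DNA flanks are free. Identity scoring replaces a substitution matrix, and the information from related peptides and reading frame preferences is left out. The class name, the method names and all penalty values are hypothetical.

```java
final class ProteinToDnaFitSketch {

    private static final String BASES = "TCAG";
    // Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
    private static final String AMINO =
        "FFLLSSSSYY**CC*W" + "LLLLPPPPHHQQRRRR" + "IIIMTTTTNNKKSSRR" + "VVVVAAAADDEEGGGG";

    /** Amino acid encoded by the codon starting at 'start'; 'X' if a base is not T, C, A or G. */
    static char translate(String dna, int start) {
        int a = BASES.indexOf(dna.charAt(start));
        int b = BASES.indexOf(dna.charAt(start + 1));
        int c = BASES.indexOf(dna.charAt(start + 2));
        if (a < 0 || b < 0 || c < 0) {
            return 'X';
        }
        return AMINO.charAt(16 * a + 4 * b + c);
    }

    /** Best score for fitting the whole protein somewhere into the DNA (upper case, forward strand). */
    static int fitScore(String protein, String dna) {
        final int match = 2, mismatch = -1, gap = -2, shift = -3; // assumed values
        int n = protein.length();
        int m = dna.length();
        // dp[i][j]: best score for the first i residues ending at DNA position j.
        // Row 0 stays all zero, so the fit may start anywhere in the DNA.
        int[][] dp = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                int best = dp[i - 1][j] + gap; // residue without a codon
                if (j >= 3) {
                    boolean same = protein.charAt(i - 1) == translate(dna, j - 3);
                    best = Math.max(best, dp[i - 1][j - 3] + (same ? match : mismatch));
                    best = Math.max(best, dp[i][j - 3] + gap); // codon without a residue
                }
                if (j >= 1) {
                    best = Math.max(best, dp[i][j - 1] + shift); // frameshift by one base
                }
                if (j >= 2) {
                    best = Math.max(best, dp[i][j - 2] + shift); // frameshift by two bases
                }
                dp[i][j] = best;
            }
        }
        int result = dp[n][0];
        for (int j = 1; j <= m; j++) {
            result = Math.max(result, dp[n][j]); // the fit may end anywhere in the DNA
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fitScore("MKV", "CCATGAAAGTTCC")); // in frame: ATG AAA GTT
        System.out.println(fitScore("MKV", "ATGAAACGTT"));    // one extra base before GTT
    }
}
```

Because frameshifts are ordinary moves in the recursion, a reconstructed protein can follow a reading frame that changes along a mutated sequence.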


Procedure

RetroTector may be used to analyze short sequences, but should then not be expected to perform optimally. It is designed to search in large DNA sequences such as chromosomes and genomes. The typical procedure is:

    1. The SweepDNA module cuts the sequence into handier "chunks", and removes ALUs and L1 fragments using algorithm 2a.

    2. The LTRID module identifies possible LTR pairs (reasonably well, though rather uncritically) and single LTRs (not very satisfactorily at present). In principle, the polyadenylation signal or an equivalent is identified, a number of LTR markers are evaluated in its vicinity, and a pair companion is sought using algorithm 2a.

    3. The RetroVID module searches for hits by, at present, about 275 Motifs, and makes chains out of them and the LTRs using fragment threading.

    4. "Puteins", i.e. attempted reconstructions of gag, pol, pro and env proteins, are made by the ORFID module, using algorithm 2b. Information about actual ORFs is also provided.

    5. Possible other exons may be suggested by the XonID module.

    A number of modules for graphic display of the results are available.
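Step 2 above starts from the polyadenylation signal. A minimal Java sketch of such a first scan is given below, assuming the canonical signal AATAAA and the common variant ATTAAA on the forward strand only. The class and method names are hypothetical, and the evaluation of LTR markers and the search for a pair companion are not shown.

```java
import java.util.ArrayList;
import java.util.List;

final class PolyASignalSketch {

    // Assumed signal set: the canonical hexamer and one frequent variant.
    private static final String[] SIGNALS = {"AATAAA", "ATTAAA"};

    /** Start positions (0-based) of polyadenylation signal candidates on the forward strand. */
    static List<Integer> signalPositions(String dna) {
        String s = dna.toUpperCase();
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i + 6 <= s.length(); i++) {
            for (String signal : SIGNALS) {
                if (s.startsWith(signal, i)) {
                    positions.add(i);
                    break;
                }
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(signalPositions("ccgAATAAAggattaaatt")); // prints [3, 11]
    }
}
```

Each position found this way would only be a candidate anchor; the surrounding sequence still has to be evaluated before an LTR is proposed.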


RTShell

This is a separate program written in Visual FoxPro. It is useful for organizing and presenting the RetroTector results, collecting them in a database, relating them to RepBase etc. As it is effectively limited to Windows, some of its functions are being transferred to the platform-independent RetroTector.


Contacts

For further information, you may turn to
Jonas.Blomberg@medsci.uu.se, the leader of the project (to whom requests should be addressed), or
Goran.Sperber@neuro.uu.se who wrote RetroTector.



Changed: 2008-06-26 (jb)
Responsible unit: Dept. of medical science, Unit for clinical virology


©-2002. UPPSALA UNIVERSITET, Box 256, 751 05 Uppsala