help  This is the Absynte help page

abstract  Abstract
Absynte (Archaeal and Bacterial Synteny Explorer) is a web-based service designed to display local syntenies in completely sequenced prokaryotic chromosomes. The genomic contexts are determined with a multiple center star clustering topology on the basis of a user-provided protein sequence and all (or a set of) chromosomes from the publicly available archaeal and bacterial genomes. The results consist of a dynamic web page where consistent color coding allows rapid visual evaluation of the relative positions of homologous genes within the synteny. Each gene in the synteny can be further queried interactively using either local or remote databases. Absynte results can be exported in .CSV or high-resolution .PDF format for printing, archiving, further editing or publication. Performance, real-time computation, user-friendliness and daily database updates constitute the principal advantages of Absynte over similar web services.
The web service is available at http://archaea.u-psud.fr/absynte.

Reference: Despalins A, Marsit S & Oberto J - Bioinformatics (2011); doi: 10.1093/bioinformatics/BTR473  
work  Absynte workflow
Upon submission of a protein sequence, Absynte computes the local genomic context (or synteny) of the corresponding homologous genes from all (or a user-selected list of) fully sequenced archaeal or bacterial genomes. Absynte displays local genomic maps drawn to scale with a consistent color code. This allows immediate comparative visual analysis of gene order conservation in the selected organisms. Absynte queries are computationally intensive, and several solutions have been developed to increase performance. The Absynte workflow is executed locally on the server and consists of seven major steps:
Step 1. The protein sequence is matched against itself using BLASTP and the resulting bit score is used as the reference score (100%).
Step 2. The query protein is matched against the selected chromosomes translated in all six reading frames using the TBLASTN algorithm.
Step 3. The resulting scores are normalized to the reference score determined above. Only scores above a user-selected threshold are retained (the default and minimal value is 10%). The user also chooses whether only one score per chromosome or all scores are retained. The chromosomes are then ranked by decreasing score.
Step 4. For each positive scoring chromosome, Absynte pulls out a DNA sequence segment of 15000 bp centered on the TBLASTN hit and translates all the open reading frames according to GenBank annotations.
Step 5. The proteins from the highest ranking chromosome are compared to each other in order to detect potential homologs using the Smith-Waterman-Gotoh (SWG) algorithm. This procedure enables a multiple center star gene clustering topology.
Step 6. The protein sequences extracted from the highest ranking chromosome are then matched against all the proteins from the other chromosomes using the SWG algorithm. A consistent color code is assigned to matching proteins across genomes.
Step 7. Synteny maps are then drawn to scale and the corresponding open reading frames are color coded as described above.
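Step 4 can be illustrated with the minimal sketch below. It is not Absynte's code, and several parts of it are assumptions. The `Gene` record and the function names are hypothetical, and the record stands for GenBank CDS features already parsed into 1-based, inclusive coordinates. The chromosome is treated as linear, with the window clipped at its ends, because the handling of circular chromosomes is not described here. The translation of each selected gene is omitted.

```python
from dataclasses import dataclass

WINDOW_BP = 15000  # segment length stated in Step 4


@dataclass(frozen=True)
class Gene:
    """Hypothetical record for one annotated CDS (1-based, inclusive coordinates)."""
    name: str
    start: int
    end: int
    strand: int  # +1 or -1


def context_window(hit_start, hit_end, chrom_length, window=WINDOW_BP):
    """Return (left, right) bounds of a window centered on the hit, clipped to the chromosome."""
    lo, hi = min(hit_start, hit_end), max(hit_start, hit_end)  # minus-strand hits come reversed
    center = (lo + hi) // 2
    left = center - window // 2
    right = left + window - 1  # exactly `window` bp before clipping
    return max(1, left), min(chrom_length, right)


def genes_in_window(genes, left, right):
    """Keep every gene that overlaps [left, right], sorted by start position."""
    overlapping = [g for g in genes if g.end >= left and g.start <= right]
    return sorted(overlapping, key=lambda g: g.start)
```

Keeping every overlapping gene, rather than only genes fully inside the window, means that genes cut by the window edges are still drawn. This is a design choice of the sketch.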
work  "Multiple center star" gene clustering topology
The multiple center star gene clustering topology allows the detection of potential homologs even in the highest ranking chromosome as shown in the following figure.
Fig. 1. Absynte example using B. subtilis 168 Gcp as query protein. In most bacteria, gcp (or ygjD) has a paralog called ydiC (or yeaZ). In many Bacillaceae, the two paralogs lie in close proximity, and Absynte detects them as paralogs, as indicated by their identical color.
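One plausible reading of the multiple center star topology (Steps 5 and 6) is sketched below. It is an illustration under stated assumptions, not Absynte's implementation. Every protein of the highest ranking chromosome acts as a center. Centers that score above a threshold against each other are treated as paralogs and share one color; a small union-find merges them. Each protein from another chromosome then takes the color of its best-scoring center, provided that score passes the threshold. The `score` callable stands in for an SWG alignment, and both it and the `threshold` value are hypothetical.

```python
from itertools import combinations


def assign_colors(centers, others, score, threshold):
    """Map protein id -> color index; proteins with no match above threshold get no color."""
    parent = {c: c for c in centers}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    # Step 5 (sketch): link paralogous centers within the highest ranking chromosome.
    for a, b in combinations(centers, 2):
        if score(a, b) >= threshold:
            parent[find(a)] = find(b)

    # One color per group of linked centers, numbered in order of first appearance.
    color_of_root = {}
    colors = {}
    for c in centers:
        colors[c] = color_of_root.setdefault(find(c), len(color_of_root))

    # Step 6 (sketch): each protein from another chromosome joins its best-scoring center.
    for p in others:
        best = max(centers, key=lambda c: score(c, p), default=None)
        if best is not None and score(best, p) >= threshold:
            colors[p] = colors[best]
    return colors
```

In a Fig. 1 situation, the two reference-chromosome paralogs would end up with the same color index. Any gene from another genome that matches either of them would inherit that color.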
chrom  Normal search mode vs. expert search mode
Absynte provides two independent search modes. In the normal search mode, the user can select up to 50 individual chromosomes. In the expert mode, the user does not need to select individual chromosomes: the system selects them instead. Warning: an expert mode search may take several minutes.
chrom  Available chromosomes (normal search mode)
The archaeal and bacterial databanks available to Absynte originate from the NCBI repository. The genome update is fully automated and occurs daily at 07:00 GMT/Zulu. New genomes are downloaded using the FTP protocol and formatted; obsolete or renamed genomes are discarded. Chromosomes are listed independently by organism name. When a genome is composed of multiple chromosomes, the suffixes _C1, _C2, etc. are appended in order of decreasing chromosome size. In normal search mode, chromosomes can be chosen either singly or as a multiple selection, using the key/mouse-click combination appropriate to each operating system.
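The naming rule for multi-chromosome genomes can be expressed as a short function. The `(organism, accession, length)` tuples it consumes are a hypothetical input format. The function name is illustrative, and chromosomes of equal size are assumed to keep their input order.

```python
from collections import defaultdict


def chromosome_labels(records):
    """records: iterable of (organism, accession, length). Returns {accession: display label}."""
    by_organism = defaultdict(list)
    for organism, accession, length in records:
        by_organism[organism].append((accession, length))

    labels = {}
    for organism, chroms in by_organism.items():
        if len(chroms) == 1:
            labels[chroms[0][0]] = organism  # single chromosome: no suffix
            continue
        # Largest chromosome gets _C1, the next _C2, and so on (stable sort keeps ties in order).
        ranked = sorted(chroms, key=lambda c: c[1], reverse=True)
        for i, (accession, _) in enumerate(ranked, start=1):
            labels[accession] = f"{organism}_C{i}"
    return labels
```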
chrom  Selected chromosomes (normal search mode)
A maximum of 50 individual archaeal and bacterial chromosomes can be selected for each search. This limit has proven empirically to exceed the needs of typical synteny queries, and it also preserves computing resources. Upon query submission, all the chromosomes in the leftmost list box will be investigated.
matrix  Insert query sequence
Absynte accepts a protein sequence in any readable format, such as FASTA or raw sequence. Other formats are accepted as well: blank spaces and numerical digits are automatically stripped from the submitted sequence.
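A minimal sketch of this clean-up is shown below. The source only states that blank spaces and digits are removed. Two further rules here are assumptions: FASTA header lines (starting with `>`) are discarded, and the remaining letters are upper-cased. The function name is hypothetical.

```python
def clean_protein_sequence(text):
    """Return a bare one-letter protein sequence from FASTA or raw input."""
    residues = []
    for line in text.splitlines():
        if line.lstrip().startswith(">"):
            continue  # assumed: skip FASTA header lines
        # Drop whitespace and numerical digits (e.g. position numbers in pasted sequences).
        residues.extend(ch for ch in line if not ch.isspace() and not ch.isdigit())
    return "".join(residues).upper()  # assumed: normalize to upper case
```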
matrix  Search parameters & minimal search threshold
To keep the tool intuitive, parameter tweaking has been reduced to a strict minimum. The user can choose whether only one score per chromosome or all scores are retained. The minimal normalized BLAST bit score threshold can be set anywhere between 10% and 100%. To obtain this normalized score, the query protein sequence is matched against itself using BLASTP, and the resulting bit score is used as the 100% reference for the subsequent alignments.
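Steps 1 to 3 of the workflow can be sketched in a few lines: self-score normalization, threshold filtering, the optional one-score-per-chromosome filter, and ranking. The sketch assumes that the TBLASTN hits are already available as `(chromosome, bit_score)` pairs and that "above the threshold" includes scores equal to it. The input format and function name are hypothetical.

```python
MIN_THRESHOLD = 10.0  # default and minimal threshold, in percent


def rank_hits(hits, self_bit_score, threshold=MIN_THRESHOLD, best_only=True):
    """Normalize TBLASTN bit scores to the query's BLASTP self-score and rank them.

    hits: iterable of (chromosome, bit_score) pairs.
    Returns a list of (chromosome, normalized_percent), highest first.
    """
    if not MIN_THRESHOLD <= threshold <= 100.0:
        raise ValueError("threshold must lie between 10 and 100 percent")
    normalized = [(chrom, 100.0 * bits / self_bit_score) for chrom, bits in hits]
    kept = [(chrom, s) for chrom, s in normalized if s >= threshold]
    if best_only:
        best = {}
        for chrom, s in kept:
            if s > best.get(chrom, float("-inf")):
                best[chrom] = s  # keep only the highest score per chromosome
        kept = list(best.items())
    return sorted(kept, key=lambda item: item[1], reverse=True)
```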
matrix  Performance
Absynte was designed using a bottom-up approach and built as an assembly of individual components. Each component was developed and tested separately for performance. Particular attention was paid to the routines where most of the computing time was spent. Unsurprisingly, optimizing the protein alignment routines produced the biggest benefits. Two solutions in particular were instrumental in boosting Absynte's overall performance:
1. BLAST throughput improvement. The query protein is matched against the translated genomes by running a highly optimized executable inside Absynte (see Absynte workflow, Step 2). The query sequence and the complete genomes are fed to the executable, and its output is parsed and further processed by Absynte. We compared the respective performance of the BLAST and BLAT executables. Both were equally fast, but BLAT failed to detect distant homologs found by BLAST. BLAST is therefore clearly the better choice when both speed and sensitivity are required to align a protein or gene sequence to a complete genome. BLAST throughput was further maximized by keeping notoriously slow disk reads and writes to a minimum: query protein sequences are fed directly from live memory (RAM) to the BLAST executable using an undocumented shell command.
2. Use and parallelization of the Smith-Waterman-Gotoh (SWG) algorithm. Multiple protein-protein alignments are required in Steps 5 and 6 of the Absynte workflow (see above). For this purpose, the SWG algorithm was preferred over the more commonly used BLASTP for several reasons: i) as an exhaustive dynamic-programming local alignment, it is more sensitive than the heuristic BLASTP; ii) it can easily be run in parallel on multiple processor cores; and iii) both the query and the subject protein sequences can be fed to SWG directly from memory, without disk reads or writes.
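The sketch below shows the Gotoh affine-gap variant of Smith-Waterman local alignment, with a parallel driver built on Python's `concurrent.futures`. It is not Absynte's code. The toy match/mismatch scores stand in for a real substitution matrix such as BLOSUM62, and the gap penalties and function names are hypothetical. Sequences are passed as in-memory strings, mirroring point iii).

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import repeat

NEG_INF = float("-inf")


def swg_score(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Best local alignment score of a and b with affine gaps (Gotoh).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    Only two rows are kept: O(len(a) * len(b)) time, O(len(b)) memory.
    """
    m = len(b)
    prev_h = [0] * (m + 1)        # H: best score of an alignment ending at (i, j)
    prev_f = [NEG_INF] * (m + 1)  # F: alignments ending with a gap in b (vertical move)
    best = 0
    for i in range(1, len(a) + 1):
        cur_h = [0] * (m + 1)
        cur_f = [NEG_INF] * (m + 1)
        e = NEG_INF  # E: alignments ending with a gap in a (horizontal move)
        for j in range(1, m + 1):
            e = max(cur_h[j - 1] - gap_open, e - gap_extend)
            cur_f[j] = max(prev_h[j] - gap_open, prev_f[j] - gap_extend)
            diag = prev_h[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur_h[j] = max(0, diag, e, cur_f[j])  # 0 lets a local alignment start fresh
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best


def score_all(query, subjects, workers=4):
    """Score one query against many subjects in parallel worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(swg_score, repeat(query), subjects))
```

For example, `score_all("MKVLL", ["MKLL", "AAAA", "MKVLL"])` returns `[5, 0, 10]`. Worker processes are used instead of threads because CPython threads do not run CPU-bound Python code in parallel. On platforms that start workers with "spawn", call `score_all` under an `if __name__ == "__main__":` guard.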
search  Search Genomes
The genome search option allows organism names to be identified by keyword or substring query. If an organism matching the keyword is found, external links to the NCBI Taxonomy Browser are provided to obtain information on its taxonomic lineage.
absynte  Absynte Results
If the search is successful, Absynte displays a list of genomic contexts with the corresponding organism/chromosome names, sorted by decreasing TBLASTN score. The gene shown in bold at the center of each synteny corresponds to the query protein sequence. In addition, further information can be obtained for every gene displayed in the context maps:
1. Gene identification (local). This information, obtained from the local database, can be displayed by hovering the mouse cursor over any gene. A tooltip pops up showing the full gene name, the gene product, the protein coding capacity and the sequence identifier number (GI).
2. Link to the NCBI database (remote). For each coding gene shown in the context maps, additional information can be retrieved with a mouse click. A new NCBI page opens, showing the complete protein entry, including its annotations and amino acid sequence.
pdf  Export Absynte Acrobat .PDF Report
This feature allows the user to save Absynte results as a local Acrobat .PDF file, compatible with Adobe Acrobat Reader on all operating systems, for printing, storage or further use. The Acrobat report contains all the results generated by Absynte and allows device-independent, high-resolution printing in A4 and Letter formats. The report is generated "on the fly" from system memory: no user data is saved on the server at any time.
csv  Export Absynte Excel .CSV Report
This feature allows the user to save Absynte results as a local .CSV file, compatible with Excel and other spreadsheet programs on all operating systems, for further processing. The .CSV report contains most of the results generated by Absynte, with the exception of the synteny maps. The report is generated "on the fly" from system memory: no user data is saved on the server at any time.
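Generating a .CSV report entirely in memory can be done with Python's standard `csv` module writing to an `io.StringIO` buffer. The resulting text is sent to the browser and never written to disk. The column names and the function name below are hypothetical, since the exact layout of the Absynte report is not specified here.

```python
import csv
import io

# Hypothetical column layout; the actual Absynte report may differ.
COLUMNS = ["rank", "organism", "normalized_score", "gene", "product", "gi"]


def results_to_csv(rows):
    """Serialize result rows (dicts keyed by COLUMNS) to CSV text without touching the disk."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```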
database  Genomic database updates
All the Absynte genomic databases are stored on the server. Their update is fully automated and occurs daily at 07:00 GMT/Zulu. The databases are shared with the BAGET and FITBAR web services. If your favorite genome is not found in the database, please let us know: we would be pleased to include additional genomes (user-provided in GenBank format).
links  Useful links

Archaea
Help file last updated 2011 May 1st.