
Toxins affecting sodium channels are widespread in the venoms of scorpions throughout the world. These molecules comprise an evolutionarily related peptide family with three shared features: a conserved three-dimensional structure, a conserved gene organization, and similar function. Based on their different pharmacological profiles and binding properties, scorpion sodium channel toxins are divided into alpha- and beta-groups. However, their evolutionary relationship is not yet established. Here, we report a gene isolated from the venom gland of the scorpion Mesobuthus martensii that encodes a novel sodium channel toxin-like peptide of 64 amino acids, named Mesotoxin. The Mesotoxin gene is organized into three exons and two introns, with the location of the second intron conserved across the family.

Arabidopsis thaliana became the model organism for plant studies because of its small diploid genome, rapid life cycle and short adult size. Its genome was the first among plants to be sequenced, becoming the reference in plant genomics. However, the Arabidopsis genome is characterized by an inherently complex organization, since it has undergone ancient whole genome duplications, followed by gene reduction, diploidization events and extended rearrangements, which relocated and split up the retained portions. These events, together with probable chromosome reductions, dramatically increased the genome complexity, limiting its role as a reference. The identification of paralogs and single copy genes within a highly duplicated genome is a prerequisite to understanding its organization and evolution and to improving its exploitation in comparative genomics. This identification is still controversial, even in the widely studied Arabidopsis genome, in part because of the lack of a reference bioinformatics pipeline that can exhaustively identify paralogs and singleton genes. We describe here a complete computational strategy to detect both duplicated and single copy genes in a genome, discussing all the methodological issues that may strongly affect the results, their quality and their reliability. This approach was used to analyze the organization of Arabidopsis nuclear protein coding genes. Besides classifying computationally defined paralogs into networks and single copy genes into different classes, it unraveled further intriguing aspects concerning the genome annotation and the gene relationships in this reference plant species. Since our results may be useful for comparative genomics and genome functional analyses, we organized a dedicated web interface to make them accessible to the scientific community.
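The idea of sorting genes into paralog networks and single copy genes can be illustrated with a minimal sketch. It is not the pipeline described here: it assumes that pairwise similarity hits have already been computed and filtered by some threshold, and it simply treats each connected group of similar genes as one network. The function name `group_paralogs`, the gene identifiers and the input format are all hypothetical.

```python
from collections import defaultdict


def group_paralogs(genes, similar_pairs):
    """Split genes into paralog networks and single copy genes.

    genes: iterable of gene identifiers (hypothetical names).
    similar_pairs: iterable of (gene_a, gene_b) pairs that are ASSUMED to
        have already passed a similarity threshold, e.g. filtered
        all-against-all protein comparisons.
    Returns (networks, singletons): a list of sets with two or more genes,
    and a set of genes that have no similar partner.
    """
    # Union-find structure: every gene starts as its own group.
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the group representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in similar_pairs:
        # Genes seen only in the pair list are registered on the fly.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        if a == b:
            continue  # a self hit says nothing about paralogy
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    # Collect the members of each group under its representative.
    groups = defaultdict(set)
    for g in parent:
        groups[find(g)].add(g)

    networks = [members for members in groups.values() if len(members) > 1]
    singletons = {next(iter(m)) for m in groups.values() if len(m) == 1}
    return networks, singletons


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4", "g5"]
    pairs = [("g1", "g2"), ("g2", "g3"), ("g4", "g4")]
    networks, singletons = group_paralogs(genes, pairs)
    print(networks)    # one network containing g1, g2 and g3
    print(singletons)  # g4 and g5 remain single copy
```

Grouping by connected components means that two genes end up in the same network even when they are linked only through intermediates, so the choice of similarity threshold for the input pairs strongly affects the size of the resulting networks.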
