We compared the OGs automatically selected for the phylogenetic assessment with several manually compiled gene lists. These comparisons indicated that, depending on the genome coverage and annotation of the drafts employed, our selection of OGs broadly agrees with the genes previously used for phylogenetic inference. Furthermore, the functional distribution of the automatically selected genes exhibits the expected behaviour at different taxonomic levels. Selections at broader taxonomic levels exhibit a larger representation of genes involved in central metabolism,
while the proportion of clade-specific genes increases at narrower taxonomic levels. The distribution of COG categories shows that central metabolism and ribosomal proteins are favoured when comparing distant genomes, as they are in phylogenetic studies based on one or a few loci. Genes in these categories are better suited than genes in other COG categories or unclassified genes because of two characteristics that are important for phylogenetic assessment. Firstly, genes involved in central metabolism and ribosomal genes are usually single-copy. Genes with in-paralogs are normally avoided in phylogenetic inference given the difficulty of identifying
corresponding genes within sets of paralogs [67], despite some efforts to include them in phylogenetic analyses (e.g., [68]). Secondly, these genes are often present even in genomes from distantly related organisms. Although phylogenetic reconstructions based on gene content have proven successful (e.g., [69]), high resolution is hard to achieve below the species level, and such reconstructions are not possible with incomplete draft genomes. Additional genes suitable for phylogenetic analyses were detected through automated identification of orthologs, allowing a higher resolution
among closely related taxa. These genes are usually not included in MLSA, although they can add important information about relationships within the group. For closely related bacteria (such as the X. oryzae pv. oryzae strains), such additional information is important because variability among genomes is low. Therefore, the option to select orthologs without a priori knowledge of the genes that will be included allows for flexibility in terms of data availability and for optimized phylogenetic resolution at any taxonomic level under study. A previous study [42] suggested reductive evolution in the genome of X. albilineans, revealed by its small genome (3.77 Mbp) and high level of putative pseudogenization. We present evidence supporting the hypothesis that reductive genome evolution occurs across the genus and is not restricted to the species X. albilineans. In our analyses, X. albilineans did show large genomic reductions, but even larger reductions were found in the species X.