
All sequences smaller than 60 bases were eliminated based on the assumption that small reads might represent sequencing artifacts [21]. The trimmed and size-selected reads were then assembled using the publicly available program CAP3 [42], which can use quality scores to aid read assembly. The overlap settings used for this assembly were 40 bp and 80% similarity, with all other parameters set to their default values.

Sequence annotation

The assembled sequences were compared against the NCBI non-redundant (Nr) protein database and the Swiss-Prot database using BlastX with an E-value cutoff of 1e-4. Gene names were assigned to each assembled sequence based on the best BLAST hit (highest score). To increase computational speed, the search was limited to the first 10 significant hits for each query.
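The best-hit assignment described above can be sketched in a few lines of Python. This is an illustration only, not the authors' pipeline: it assumes the BlastX results were saved in the standard 12-column tabular format (E-value in column 11, bit score in column 12), which the text does not state, and the function name `best_hits` is hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-4  # threshold given in the text
MAX_HITS = 10          # "first 10 significant hits" per query


def best_hits(tabular_text):
    """Map each query to its highest-scoring significant hit.

    Assumes 12-column BLAST tabular output (an assumption, not stated
    in the source): query, subject, ..., evalue (col 11), bitscore (col 12).
    """
    best = {}   # query -> (subject, bit score)
    seen = {}   # query -> number of significant hits examined so far
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > E_VALUE_CUTOFF:
            continue  # not significant at the chosen cutoff
        seen[query] = seen.get(query, 0) + 1
        if seen[query] > MAX_HITS:
            continue  # only the first 10 significant hits are considered
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best
```

Calling `best_hits` on the text of a tabular result file returns a dictionary from which a gene name could be looked up for each assembled sequence; queries with no significant hit are simply absent.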

To annotate the assembled sequences with GO terms describing biological processes, molecular functions and cellular components, the Swiss-Prot BLAST results were imported into Blast2GO [43]–[45], a software package that retrieves GO terms, allowing gene functions to be determined and compared. These GO terms are assigned to query sequences, producing a broad overview of groups of genes cataloged in the transcriptome for each of three ontology vocabularies: biological processes, molecular functions and cellular components. The obtained annotation was enriched and refined using ANNEX [46], Validate Annotations and GO Slim [47], [48] integrated in the Blast2GO software. The data presented herein represent a GO analysis at level 2, illustrating general functional categories.

KEGG pathways were assigned to the assembled sequences using the online KEGG Automatic Annotation Server (KAAS), http://www.genome.jp/kegg/kaas/. The bi-directional best hit (BBH) method was used to obtain KEGG Orthology (KO) assignments [49]. The output of the KEGG analysis includes KO assignments and the KEGG pathways populated with those assignments.

SSR and SNP discovery

The SciRoko program v3.3 [50] was used to identify and localize microsatellite motifs. We searched for all types of SSRs from dinucleotides to hexanucleotides using default settings. Potential SNPs were detected using QualitySNP [51]. SNP identification was carried out in a procedure separate from the main annotation pipeline. All the clean reads were first assembled using the CAP3 program, with overlap settings of 100 bp and 95% similarity.
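As an illustration of the microsatellite search described above (dinucleotide to hexanucleotide motifs), a minimal regular-expression sketch in Python follows. It is not SciRoko's algorithm, and the minimum repeat counts in `MIN_REPEATS` are hypothetical placeholders rather than the program's default settings.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumed values,
# not SciRoko's defaults): di- to hexanucleotide motifs.
MIN_REPEATS = {2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq):
    """Return (motif, start, repeat_count) for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "ATAT".
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((motif, m.start(), len(m.group(0)) // unit_len))
    return hits
```

The function reports only perfect repeats; dedicated tools also handle imperfect microsatellites and compound motifs, which this sketch does not attempt.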

SNP identification was limited to clusters containing at least four reads.

Supporting Information

Table S1 Sequences with significant BLAST matches against the Nr and Swiss-Prot databases. (XLS)

Table S2 KEGG biochemical mappings for P. yessoensis. (DOC)

Table S3 Candidate genes involved in growth, reproduction, stimulus response and immune defense.
