Methodology for phylogenetic analysis

Phylogenetic reconstruction

We built 2283 phylogenetic trees by reconstructing the phylogeny of peroxisomal proteins from human, yeast, plants and kinetoplastids together with their homologues in the 36 eukaryotic genomes analysed. The trees are displayed with an embedded Archaeopteryx applet, with branches coloured according to taxonomy. Phylogenetic trees are available in Newick format and are associated with taxonomic data via phyloXML. To highlight the query protein, type "QUERY" in the Search box.
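For readers who download a tree in Newick format rather than using the applet, the following minimal sketch shows one way to list the leaf labels and pick out the query. It assumes unquoted leaf labels and that the query leaf carries the string "QUERY" in its label; the example tree and the function names are hypothetical and not taken from the database.

```python
import re

# Matches a leaf label: the text that directly follows "(" or "," and is not
# itself an opening parenthesis. Labels that follow ")" are internal node
# labels or support values and are deliberately not matched.
# Assumption: labels are unquoted and contain no parentheses, commas, colons
# or semicolons.
_LEAF = re.compile(r"[(,]\s*([^(),:;\s][^(),:;]*)")


def leaf_labels(newick):
    """Return the leaf labels of a Newick string in order of appearance."""
    return [label.strip() for label in _LEAF.findall(newick)]


def find_query_leaves(newick, marker="QUERY"):
    """Return the leaf labels that contain the marker string."""
    return [label for label in leaf_labels(newick) if marker in label]


# Hypothetical example tree with one support value (0.9) on an internal node.
example = "((A:0.1,B:0.2)0.9:0.3,QUERY_C:0.4);"
print(leaf_labels(example))
print(find_query_leaves(example))
```

A dedicated tree library would be the more robust choice for quoted labels or comments embedded in the Newick string.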

Homologous sequences were aligned using MUSCLE 3.6 with default parameters. Gap-rich positions in the alignment were removed using trimAl v1.2, with a gap threshold of 25% and a conservation threshold of 50%. Phylogenetic trees were then reconstructed by Maximum Likelihood as implemented in the aLRT version of PhyML, using the option “Minimum of SH-like and Chi2-based” to obtain approximate Likelihood Ratio Test (aLRT) support values for the different partitions. In all cases, JTT was used as the evolutionary model, assuming a discrete gamma-distribution model with four rate categories and invariant sites, where the gamma shape parameter and the proportion of invariant sites were estimated from the data.
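The gap-filtering step can be illustrated with a minimal sketch. It is not trimAl's implementation: it assumes the 25% gap threshold means that a column is kept when at least 25% of the sequences have a residue at that position, and it leaves out the conservation threshold entirely. The function name and the toy alignment are hypothetical.

```python
def trim_gappy_columns(alignment, min_non_gap_fraction=0.25, gap_chars="-"):
    """Drop alignment columns in which too few sequences have a residue.

    alignment: list of equal-length aligned sequences (strings).
    min_non_gap_fraction: a column is kept when the fraction of sequences
        with a non-gap character is at least this value (assumed reading
        of the 25% gap threshold).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    n_seqs = len(alignment)
    keep = []
    for col in range(length):
        non_gaps = sum(seq[col] not in gap_chars for seq in alignment)
        if non_gaps / n_seqs >= min_non_gap_fraction:
            keep.append(col)

    # Rebuild every sequence from the retained columns only.
    return ["".join(seq[col] for col in keep) for seq in alignment]


# Toy alignment: the third column has a residue in only 1 of 5 sequences.
toy = ["MK-AL", "MK-AL", "M--AL", "MKT-L", "MK-AL"]
print(trim_gappy_columns(toy))
```

In the toy alignment only the third column falls below the threshold (1 of 5 sequences, 20%), so it is the only column removed.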

For peroxisomal proteins without yeast or human homologues, seed proteins were taken from plants and kinetoplastids. Groups of homologous sequences were aligned using MAFFT. Insertions, positions that could not be aligned with confidence, and incomplete sequences were removed. Additional phylogenetic analyses were performed using the Bayesian method implemented in MrBayes, with a mixed model of amino acid substitution and a gamma correction (four discrete categories plus a proportion of invariant sites) to account for among-site rate variation. MrBayes was run with eight chains for 1 million generations, and trees were sampled every 100 generations. The first 1000 sampled trees were discarded as burn-in before the consensus tree was constructed.
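The sampling figures above imply how many trees contribute to the consensus. The short sketch below works through that arithmetic. It assumes the counts apply to a single run and ignores the tree recorded at the starting generation; the function name is hypothetical.

```python
def mcmc_sampling_summary(generations, sample_every, discarded_trees):
    """Summarise how many sampled trees remain after discarding the first ones."""
    sampled = generations // sample_every  # trees sampled during the run
    if discarded_trees > sampled:
        raise ValueError("cannot discard more trees than were sampled")
    return {
        "sampled_trees": sampled,
        "discarded_trees": discarded_trees,
        "discarded_fraction": discarded_trees / sampled,
        "retained_trees": sampled - discarded_trees,
    }


# Values stated in the text: 1 million generations, one tree every
# 100 generations, first 1000 trees discarded.
summary = mcmc_sampling_summary(1_000_000, 100, 1000)
print(summary)
```

With these values, 10,000 trees are sampled, the discarded trees correspond to the first 10% of the run, and 9,000 trees remain for the consensus.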