poptimizer is an application for optimizing the structure and parameters of stochastic P system models using evolutionary algorithms. poptimizer takes a library of modules that represent basic biological processes of interest and combines them in many different ways to discover a possible assembly that mimics the behavior of the target data. During the search process, each model is evaluated by simulating its behavior with mcss. poptimizer and mcss are being used to develop Systems and Synthetic Biology computational models of bacterial colonies and plant systems.
For instructions on how to compile and install poptimizer, see the README file included with the poptimizer distribution.
Once installed, poptimizer is run by typing the following command:
$ poptimizer PARAMETER_FILE
where PARAMETER_FILE is a file containing the input parameters required by poptimizer. For example, to run the promoter model optimization provided in the directory examples/ of the poptimizer distribution, change to the corresponding directory and type:
$ poptimizer all_para_promoter_inputpara.xml
The output of the optimization procedure can be inspected in several files. The log information with the generation number, number of function evaluations, and fitness of the best solution is saved to file evolveprocess_Run0.txt. The best P system obtained at the end of the optimization is saved to bestPsystem_Run0.txt and the corresponding time series to bestsimulation_Run0_initfile0.txt.
Currently, poptimizer can process two types of input data from cell systems biological data:
The structure of the parameter file required for executing poptimizer is described in file poptimizer-parameters-template.xml under directory src/poptimizer/. Different examples for the parameter file can be seen under directory examples/.
The models built by poptimizer have flexible structure and parameters. A particular model is composed of a set of elementary modules (previously specified in a library) that act as the ‘building blocks’. Users can define their own module library based on specific knowledge or simply on elementary biological motifs described in the Systems Biology literature.
While certain modules can have fixed rules and kinetic constants (fixed module library), others can be instantiated with different objects (proteins, genes, etc.) and parameter values (non-fixed library). Many kinetic constants referring to well-known reactions can be taken from the literature and introduced in the library, whereas others need to be evolved by the parameter optimization methods available in poptimizer.
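To illustrate the idea of a non-fixed module, the following is a minimal Python sketch of a module template whose rules contain placeholders that are filled in with concrete objects and kinetic constants. It is not the format used by poptimizer (the actual libraries are XML files); the class name `ModuleTemplate`, the `$placeholder` rule syntax, the module name `UnReg`, and the numeric constants are all assumptions made for illustration.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class ModuleTemplate:
    """Hypothetical stand-in for one entry of a module library."""
    name: str
    rules: tuple   # rule strings with $placeholders for objects
    fixed: bool    # True would mean rules and constants are used as given

    def instantiate(self, objects, constants):
        """Return concrete (rule, constant) pairs for one choice of objects."""
        return [
            (Template(rule).substitute(objects), constants[i])
            for i, rule in enumerate(self.rules)
        ]


# Hypothetical non-fixed module: unregulated expression and degradation.
unreg = ModuleTemplate(
    name="UnReg",
    rules=("$gene -> $gene + $protein", "$protein -> "),
    fixed=False,
)

# The same template can be instantiated with different objects and constants.
concrete = unreg.instantiate(
    {"gene": "geneA", "protein": "proteinA"},
    [0.5, 0.01],  # illustrative kinetic constants, not literature values
)
for rule, constant in concrete:
    print(rule, constant)
```

In this picture, a fixed module would carry its constants with it, while a non-fixed module leaves both the objects and the constants open for the optimizer to choose.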
The optimization of the model structure concerns the choice of which modules should compose the model. The number of modules and their instantiation (the choice of objects for each module) are also explored to minimize the error between the output data generated by the model and the target data. A genetic algorithm that selects, recombines, and mutates different sets of modules is used to optimize the model structure.
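The select, recombine, and mutate loop described above can be sketched in a few lines of Python. This is a generic genetic algorithm under stated assumptions, not poptimizer's implementation: the module names, the limit `MAX_MODULES`, the operators, and the rates are hypothetical, and `toy_error` is a stand-in for the real error, which poptimizer obtains by simulating each candidate with mcss and comparing against the target data.

```python
import random

# Hypothetical module names; a real library is defined by the user.
LIBRARY = ["UnReg", "PosReg", "NegReg", "CoopPos", "CoopNeg"]
MAX_MODULES = 4


def random_model(rng):
    """A candidate structure is a list of 1..MAX_MODULES module names."""
    return [rng.choice(LIBRARY) for _ in range(rng.randint(1, MAX_MODULES))]


def crossover(a, b, rng):
    """One-point recombination of two module lists, truncated to MAX_MODULES."""
    child = a[:rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]
    return child[:MAX_MODULES] or [rng.choice(LIBRARY)]


def mutate(model, rng):
    """Replace, add, or remove a single module."""
    model = list(model)
    op = rng.choice(["replace", "add", "remove"])
    if op == "replace":
        model[rng.randrange(len(model))] = rng.choice(LIBRARY)
    elif op == "add" and len(model) < MAX_MODULES:
        model.append(rng.choice(LIBRARY))
    elif op == "remove" and len(model) > 1:
        model.pop(rng.randrange(len(model)))
    return model


def tournament(pop, error, rng, k=2):
    """Pick the better of k randomly chosen candidates."""
    return min(rng.sample(pop, k), key=error)


def evolve_structure(error, generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    pop = [random_model(rng) for _ in range(pop_size)]
    best = min(pop, key=error)
    for _ in range(generations):
        children = []
        for _ in range(pop_size - 1):
            child = crossover(tournament(pop, error, rng),
                              tournament(pop, error, rng), rng)
            if rng.random() < 0.3:  # illustrative mutation rate
                child = mutate(child, rng)
            children.append(child)
        pop = children + [best]     # elitism: never lose the best model
        best = min(pop, key=error)
    return best


# Toy error: distance to a hidden target set of modules (assumption).
TARGET = ["PosReg", "NegReg", "UnReg"]


def toy_error(model):
    return len(set(model) ^ set(TARGET)) + abs(len(model) - len(TARGET))


best = evolve_structure(toy_error)
print(best, toy_error(best))
```

Because the best candidate is carried over unchanged each generation, the error of the returned model can never be worse than that of the best model in the initial population.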
The optimization of the model parameters concerns learning the appropriate kinetic constants for each of the rules specified in the modules. When the kinetic constants are not known from the literature, the module library specifies the parameter ranges (and a choice of linear/logarithmic scale) for each kinetic constant. The parameter optimization methods currently available include genetic algorithms (GA), differential evolution (DE), opposition differential evolution (ODE), and the covariance matrix adaptation evolution strategy (CMA-ES).
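As an illustration of how a parameter range with a linear or logarithmic scale can be combined with one of the methods listed above, here is a minimal Python sketch of textbook differential evolution (DE/rand/1/bin) searching over the unit hypercube, with each coordinate decoded into a kinetic constant. The ranges in `RANGES`, the hidden constants, the control settings `f` and `cr`, and `toy_error` are assumptions for illustration; poptimizer's own DE and its error computation (via mcss simulation) may differ.

```python
import math
import random

# Hypothetical (lower, upper, scale) range for each kinetic constant.
RANGES = [(1e-4, 1e1, "log"), (0.0, 5.0, "linear")]


def decode(unit, lower, upper, scale):
    """Map a value in [0, 1] to [lower, upper] on a linear or log scale."""
    if scale == "log":
        lo, hi = math.log10(lower), math.log10(upper)
        return 10 ** (lo + unit * (hi - lo))
    return lower + unit * (upper - lower)


def differential_evolution(error, dim, pop_size=20, generations=100,
                           f=0.5, cr=0.9, seed=0):
    """Textbook DE/rand/1/bin minimizing `error` over the unit hypercube."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    cost = [error(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.sample(others, 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    v = min(1.0, max(0.0, v))  # clip to the unit interval
                else:
                    v = pop[i][d]
                trial.append(v)
            trial_cost = error(trial)
            if trial_cost <= cost[i]:  # greedy one-to-one replacement
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


# Toy error: distance to hidden constants (stand-in for a simulation error).
HIDDEN = [0.02, 3.0]


def toy_error(unit_vector):
    k0, k1 = (decode(u, *r) for u, r in zip(unit_vector, RANGES))
    return math.log10(k0 / HIDDEN[0]) ** 2 + (k1 - HIDDEN[1]) ** 2


best_unit, best_cost = differential_evolution(toy_error, dim=len(RANGES))
constants = [decode(u, *r) for u, r in zip(best_unit, RANGES)]
print(constants, best_cost)
```

Searching on a logarithmic scale gives each order of magnitude of a kinetic constant the same share of the search space, which helps when the plausible range spans several decades.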
poptimizer can use two different fitness functions to quantify the quality of candidate models. These are:
More detailed information about the methodology can be found in the paper entitled Evolving Cell Models for Systems and Synthetic Biology, to appear in the Systems and Synthetic Biology journal.
This section briefly describes three different running examples for poptimizer. The first two examples are taken from the reference paper cited above and the third refers to a pulse generator with different initial conditions.
This case study investigates regulatory networks consisting of three genes that are able to produce a pulse in the expression of a specific gene. The corresponding files can be found in examples/threegene/. To run this example, change to the corresponding directory, and type:
$ poptimizer threegene_inputpara.xml
The non-fixed module library used is specified in file threegene_module_library.xml, the target data in target_data_threegene.txt, and the initial values for each of the genes in initial_values_threegene.txt.
This case study investigates a gene regulatory network consisting of five genes that is able to behave as a bandwidth detector. The corresponding files can be found in examples/promoter/. To run this example, change to the corresponding directory, and type:
$ poptimizer all_para_promoter_inputpara.xml
The non-fixed module library used is specified in file all_para_module_library_promoter.xml, the target data in target_data_promoter.txt, and the initial values for each of the genes in initial_values_promoter.txt.
The last example deals with a network of at most five genes that simulates a pulse generator for one of the genes under different initial conditions. The corresponding files can be found in examples/fourinitial/. To run this example, change to the corresponding directory, and type:
$ poptimizer four_initial_inputpara.xml
A fixed module library specified in file library2.xml is now used together with the non-fixed library library1-lin.xml. The target data is now specified in four different files (target1.txt, target2.txt, target3.txt, target4.txt), as well as the initial values (initials1.txt, initials2.txt, initials3.txt, initials4.txt).
The poptimizer distribution, including all source code, model examples, and documentation, is the copyright of the Infobiotics Team (Hongqing Cao, Claudio Lima, Natalio Krasnogor, Francisco Romero-Campero, Jamie Twycross, and Jonathan Blakes) and is released under the GNU GPL version 3 license.
poptimizer was written by Hongqing Cao, with contributions from Claudio Lima, Natalio Krasnogor, Jamie Twycross, Francisco Romero-Campero, and Jonathan Blakes. It is being used on Systems Biology research projects in the Centre for Plant Integrative Biology and the School of Computer Science, University of Nottingham, U.K. This work is funded by BBSRC grant BB/D0196131.
For further information or any questions please contact cvf AT cs.nott.ac.uk.
Copyright 2009 Infobiotics Team, released under GNU GPL version 3.