Learning a parameterized skill is essential for autonomous robots operating in unpredictable environments. Previous techniques learned a policy for each example task individually and constructed a regression model to map between the task and policy parameter
spaces. However, these techniques are less successful when applied to whole-body dynamic skills, such as jumping or walking, which involve the challenges of handling discrete contacts and balancing an under-actuated system under gravity. This paper
introduces an evolutionary optimization algorithm for learning parameterized skills to achieve whole-body dynamic tasks. Our algorithm simultaneously learns policies for a range of tasks instead of learning each policy individually. The problem
can be formulated as a nonconvex optimization whose solution is a closed segment of a curve, rather than a single point, in the policy parameter space. We develop a new optimization algorithm that maintains a parameterized probability distribution for the
entire range of tasks and iteratively updates the distribution using selected elite samples. Our algorithm exploits each sample more effectively, greatly reducing the number of samples required to optimize a parameterized skill for all the tasks
in the range of interest.