New methods to evaluate the impact of single point protein
mutations on human health.
1. Summary of the project objectives
In this report we summarize the research activity performed by
Dr. Emidio Capriotti during the outgoing phase of the Marie-Curie
IOF at the Department of Bioengineering, Stanford University under
the supervision of Dr. Russ B. Altman.
The main aims of our proposal are the following:
i. Study and characterization of the rate of evolution of Single
Nucleotide Polymorphisms and their effect in human disease.
ii. Study and characterization of the structural determinants of
iii. Development of new general machine learning methods for disease
iv. Development of disease-specific predictors.
v. Development of a World Wide Web server for predicting the likelihood
that a SNP variant is associated with human disease.
These 5 aims correspond to 6 different tasks to be accomplished in
36 months. In the proposal's timeline, we planned to perform about
4 of the 6 tasks during the outgoing phase (24 months). Accordingly,
we mainly achieved the first 3 objectives and part of the 4th and 5th.
The remaining parts of specific aims 4 and 5 will be performed during
the returning phase at the University of Balearic Islands.
2. Description of the work performed since the beginning of the
During the first months of the project Dr. Emidio Capriotti (EC) selected
a set of annotated missense Single Nucleotide Variants (mSNVs) from the
SwissVar database. The dataset used in this work was downloaded at the
end of October 2009. To select the subset of mSNVs for which the
three-dimensional structure of the protein is known, EC implemented programs
that automatically compare the sequences of the mutated proteins with
the sequences of the proteins collected in the Protein Data Bank (PDB).
In a second phase EC performed an evolutionary analysis, calculating the
selective pressure acting at the codon level using alignments between the
human DNA sequences and their homologs in mammalian species. In the next
steps Dr. Capriotti built machine learning-based approaches to predict
the impact of mSNVs, evaluating the discriminative power of different
features. In these algorithms we included features from sequence analysis,
such as evolutionary and functional information, and protein structure
information. In the last period a disease-specific method was
developed to predict cancer-causing mSNVs.
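
The sequence-to-structure comparison step mentioned above can be illustrated
with a minimal sketch. It is not the program EC implemented: it assumes that
the protein sequence and the PDB chain sequence are nearly identical and uses
Python's difflib as a stand-in for a real sequence alignment tool. The
function name and the two toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def map_position(protein_seq, chain_seq, position):
    """Map a 1-based position in protein_seq to a 1-based index in chain_seq.

    Returns None when the residue lies outside every matched block,
    i.e. it is not observed in the structure chain.
    """
    # autojunk=False disables the popularity heuristic, which could
    # otherwise discard frequent residues in long sequences.
    matcher = SequenceMatcher(None, protein_seq, chain_seq, autojunk=False)
    index = position - 1
    for block in matcher.get_matching_blocks():
        if block.a <= index < block.a + block.size:
            return block.b + (index - block.a) + 1
    return None


# Hypothetical example: the structure chain lacks the first two residues.
full_sequence = "MKTAYIAKQR"
chain_sequence = "TAYIAKQR"
mapped = map_position(full_sequence, chain_sequence, 5)  # residue Y5
```

A production pipeline would replace the difflib matching with a proper
pairwise alignment and would also check the residue identity at the mapped
position.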
3. Description of the achieved results
With the research activity performed during the outgoing phase, EC achieved
most of the objectives described in our proposal. In particular,
we defined a set of discriminative features derived from protein sequence
profile and protein structure. This allowed us to
develop a new machine learning-based method for the prediction of deleterious
variants, taking as input information from protein sequence profile, protein
function and protein structure. The improvement in prediction accuracy
resulting from the use of structure information has been quantitatively
estimated by comparing the structure-based method with similar sequence-based
4. Expected final results and their potential impact
After the returning phase we expect to have developed a user-friendly web
server interface for the prediction of the effect of mSNVs. Currently,
EC is implementing these web tools, including both protein sequence and
structure information. We believe that the use of structural information
in the prediction of deleterious variants will be important for
understanding the disease mechanism. In addition, the developed method
will have an impact on personal genomics, allowing new hypotheses to be made
about the onset of genetic diseases. As a natural consequence of this work we
are planning to study the relationship between genetic variants and drug
response. The application of newly developed tools in clinical settings
will be important for the establishment of personalized medicine.
Project objectives for the period
During the two years of the outgoing phase the first three aims
of the project have been accomplished. These objectives correspond
to the first four tasks. In detail, after collecting a set of
manually curated human variants, we analyzed them considering different
evolutionary information obtained by aligning the protein sequence with a
set of related proteins in other species. In the second period, we selected
a subset of mutations for which structural information is available.
Analyzing these mutations, we identified important features to improve
the prediction of deleterious variants. To accomplish the 3rd aim of
the project, we developed different general machine learning-based
approaches to discriminate between disease-related and neutral variants.
After a testing procedure we showed that protein structure features
increase the quality of the prediction and provide information about the
mechanism of the disease. During the last months EC started to collect a
manually curated dataset of mutations related to cancer and to characterize
them from the functional point of view. In the next period, EC is planning
to develop a disease-specific method for the prediction of cancer-causing
variants using sequence information. For more details about the
activity performed during the outgoing phase please refer to the next
Work progress and achievements during the period
1. Progress towards objectives and details for each task
In this section we summarize the objectives achieved during the outgoing
phase at Stanford University for each of the five aims described in
our proposal.
1.1 Study and characterization of the rate of evolution of Single Nucleotide
Polymorphisms and their effect in human disease.
To achieve the first objective of our proposal Dr. Capriotti performed two
different types of analysis. The evolution of the protein at the mutated
position was studied considering both alignments of similar protein
sequences and DNA alignments between homologous genes. Across different
features, these data confirm the idea that wild-type residues are more
conserved at disease-related sites than at neutral ones, and that mutated
residues appear more frequently in the positions of the multiple sequence
alignment corresponding to neutral mutations than in those corresponding to
disease-related mutations.
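
The kind of conservation measure discussed above can be sketched as follows.
This is only an illustration under simple assumptions, not the feature set
used in the project: it takes one column of a multiple sequence alignment,
ignores gap characters, and returns the frequencies of the wild-type and
mutated residues. The function name and the example column are hypothetical.

```python
from collections import Counter


def residue_frequencies(column, wild_type, mutant):
    """Return (wild-type frequency, mutant frequency) in an alignment column.

    column is a string with one character per aligned sequence at the
    mutated position; '-' and '.' are treated as gaps and skipped.
    """
    residues = [r for r in column.upper() if r not in "-."]
    if not residues:
        return 0.0, 0.0
    counts = Counter(residues)
    total = len(residues)
    return counts[wild_type] / total, counts[mutant] / total


# Hypothetical column for an R->K substitution: 6 R, 1 K and one gap.
wt_freq, mut_freq = residue_frequencies("RRRRKR-R", "R", "K")
```

Under the trend reported above, disease-related sites would tend to show a
high wild-type frequency and a low mutant frequency, while neutral sites
would tolerate the mutated residue more often.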
1.2 Study and characterization of the structural determinants of human
After the characterization of mSNVs in terms of conservation and evolution
across species, we analyzed structural features of mutated residues. In
this direction the first goal has been the development of an automatic
method to map the mSNVs from protein sequence to structure. The protein
structures and the mutated sites in the selected dataset have been analyzed
to find new discriminative features. First we calculated the distributions
of the relative solvent accessible area (RASA) for disease-related and neutral
mSNVs and the occurrences in each secondary structure class. The results show
a significant difference between the distributions of RASA for disease-related
and neutral variants. In more detail, disease-related mutations are more
likely to occur in the core of the proteins and neutral ones on the surface.
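
As a rough illustration of the RASA comparison described above, the sketch
below normalizes an absolute solvent accessible area by a reference maximum
for the residue type and counts how many sites fall below a burial
threshold. The reference maxima, the 0.2 threshold and all numbers in the
example are assumptions made for illustration; they are not values or
results from the project.

```python
def relative_accessibility(asa, max_asa):
    """Relative solvent accessible area: observed ASA over a reference maximum.

    Both arguments use the same area unit; the result is capped at 1.0.
    """
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    return min(asa / max_asa, 1.0)


def buried_fraction(rasa_values, threshold=0.2):
    """Fraction of sites whose RASA is below the (assumed) burial threshold."""
    if not rasa_values:
        return 0.0
    buried = sum(1 for value in rasa_values if value < threshold)
    return buried / len(rasa_values)


# Made-up (ASA, reference maximum) pairs for three mutated sites.
example_sites = [(5.0, 100.0), (30.0, 200.0), (90.0, 150.0)]
example_rasa = [relative_accessibility(a, m) for a, m in example_sites]
example_buried = buried_fraction(example_rasa)
```

Computing the buried fraction separately for disease-related and neutral
mSNVs gives a simple summary of the difference between the two RASA
distributions.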
1.3 Development of new general machine learning methods for disease
EC developed a sequence-based method. This algorithm is a Support Vector
Machine (SVM) trained on data extracted from the protein sequence, the
sequence profile, the output of PANTHER and a functional score derived from
the Gene Ontology terms of the mutated protein and all their parent terms.
The sequence-based SVM takes as input a 51-element vector. A second
SVM-based method has been developed that includes structural information.
The sequence-based algorithm has thus been used as a baseline to quantify
the improvement in prediction resulting from the use of structural
information.
1.4 Development of disease-specific predictors
For this particular task EC is focusing on the detection of mutations involved
in the onset of different types of cancer. To this end, a set of manually
curated driver variants associated with cancer has been selected.
2. Researcher training activities/transfer of knowledge
During the outgoing phase at Stanford University, EC was appointed
as a postdoc in the Department of Bioengineering. He had the opportunity to
attend different courses. In addition, he collaborated on specific projects
with researchers at the PharmGKB consortium, which is interested in analyzing
the effect of genetic variants on drug response.
3. Highlight significant results
During the first phase, EC achieved many significant results related to the
main objectives of the Mut2Dis project. First of all, EC analyzed a large
dataset of annotated mutations, evaluating evolutionary and structural
information. A more practical result is the development of a new machine
learning-based approach that uses protein structure information to predict
the effect of missense single nucleotide variants.
4. Statement on the use of resources
For the development of this project during the last two years the University of
Balearic Islands had total expenses of 156,475.14 € (see table in the attached
file). In the Form C attached to this report we included only the amount
provided by the European Community (Total 152,627.84 €).
During the outgoing period EC dedicated part of his time to disseminating
the results of this project at international conferences and workshops
and in invited seminars at institutions in both the US and Europe. To
summarize the dissemination activity performed in the last two years, EC
published one paper about the results obtained in this study
([...] and Altman, BMC Bioinformatics, 2011) and one review
about the challenges in personalized medicine
([...] et al., Bioinformatics, 2011). He also
submitted 10 posters to meetings and conferences, 6 of which were
selected for oral presentation. Finally, EC was invited to give 12
seminars where he presented the results of the Mut2Dis research project.
EC also maintains a web page where details of the project are made
available. In the returning phase, EC is planning to attend important
meetings and conferences to advertise the web server for the prediction
of deleterious variants that will be developed during the next year. In
addition, other papers and reviews are in preparation, and we believe that
they will be published during the second phase.
1. Project planning and status - from management point of
During the outgoing phase all the aims and tasks of the project have
been fulfilled according to the timeline described in the research
proposal. From the management point of view, the available funds
have been used to cover medical insurance expenses for EC during his
stay in the US and the expenses of applying for the US visa. Part of the
funds has been used to compensate the UIB personnel involved in the
management of this grant. During the returning phase no medical insurance
expenses are expected.
2. Problems which have occurred and how they were solved or envisaged
3. Changes to the legal status of any of the beneficiaries
4. Impact of possible deviations from the planned milestones and
5. Development of the project website
EC as beneficiary of the fellowship is maintaining and updating a
dedicated web site where details and information about the project
are reported (see http://snps.uib.es/mut2dis).
6. Gender issues; Ethical issues
7. Justification of subcontracting (if applicable)
There are no subcontracting expenses in this period.
8. Justification of real costs (management costs)
The management costs include:
i. Management of the contract between beneficiary and the University of
Balearic Islands (3,004.20 Euro)
ii. The travel insurance fees for the beneficiary for two years
iii. Costs of management of the US VISA for the beneficiary and fees
9. Indirect costs
Overheads granted are 10% of the total direct costs (excluding
management costs). The actual overheads at UIB used in FP7 are
calculated using a simplified method, which includes all the indirect
costs of the institution (communication costs, maintenance and depreciation
of buildings and infrastructures, courier services, security services,
electric power and water expenses, research support personnel and so on)
and represent, for the year 2010, a rate of 81.01% of the personnel costs.
This rate has already been audited. Only maximum reimbursable indirect
costs have been included in Form C, because in this Grant Agreement
indirect costs are calculated by a flat rate.