New methods to evaluate the impact of single point protein
mutation on human health.
1. Summary of the project objectives
In this report we summarize the research activity performed by Dr.
Emidio Capriotti during the returning phase of the Marie-Curie IOF
at the Department of Mathematics and Computer Science, University
of Balearic Islands under the supervision of Dr. Jairo Rocha.
The main aims of our proposal are the following:
i. Study and characterization of the rate of evolution of Single
Nucleotide Polymorphisms and their effect in human disease.
ii. Study and characterization of the structural determinants of
iii. Development of new general machine learning methods for disease
iv. Development of disease-specific predictors.
v. Development of a World Wide Web server for predicting the likelihood
of a SNP variant to be associated with human disease.
These 5 aims correspond to 6 different tasks to be accomplished
in 36 months. According to the proposal's timeline, the final two tasks were planned
for the returning phase (12 months). Accordingly, during this phase we mainly accomplished
the 4th and 5th aims. In conclusion, over the whole period of the project (36 months) we
achieved all five objectives described in our proposal.
2. Description of the work performed since the beginning of the
During the last 12 months of the project Dr. Emidio Capriotti developed a
new method for the prediction of disease-specific mutations, focusing on cancer.
In addition, he implemented two different web servers for the predictions of
In detail, EC selected a manually curated set of cancer driver missense Single Nucleotide
Variants (mSNVs). This dataset, previously used to train another method (Carter et al.,
Cancer Research 2009), was analyzed by performing sequence analysis of the mutated
protein. For each protein, the sequence profile was calculated from similar proteins
retrieved with the BLAST algorithm. Using all the sequence information calculated in this way,
EC developed a machine learning approach to discriminate between cancer-causing and neutral
variants. For this particular task only sequence information was used, because the
number of cancer mutations for which a protein three-dimensional structure was available
was too small to train a machine learning method. Finally, EC implemented two
web servers: the first, more general, for the prediction of disease-related mSNVs and the
second, more specific, for the detection of cancer-causing mSNVs.
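As an illustration of the sequence-profile analysis described above, the following
minimal sketch computes residue frequencies and a conservation score at a mutated
position. It assumes that the BLAST hits have already been aligned to the query protein;
the function names, the entropy-based conservation measure and the toy alignment are
hypothetical and are not taken from the project's implementation.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_frequencies(aligned_seqs, position):
    """Relative residue frequencies at a 0-based alignment column (gaps ignored)."""
    residues = [s[position] for s in aligned_seqs if s[position] in AMINO_ACIDS]
    counts = Counter(residues)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.items()}

def conservation(freqs):
    """1 - normalized Shannon entropy: 1.0 means fully conserved (assumed measure)."""
    if not freqs:
        return 0.0
    entropy = -sum(p * math.log(p) for p in freqs.values() if p > 0)
    return 1.0 - entropy / math.log(len(AMINO_ACIDS))

def profile_features(aligned_seqs, position, wild_type, mutant):
    """Profile-derived features for one missense variant (hypothetical feature set)."""
    freqs = column_frequencies(aligned_seqs, position)
    return {
        "f_wild": freqs.get(wild_type, 0.0),   # frequency of the wild-type residue
        "f_mut": freqs.get(mutant, 0.0),       # frequency of the mutant residue
        "conservation": conservation(freqs),
    }

# Toy alignment (made-up sequences): query first, then three aligned hits.
alignment = ["MKLV", "MKIV", "MRLV", "MKLV"]
features = profile_features(alignment, 1, "K", "R")
```

In this toy case the wild-type lysine is seen in three of the four sequences at the
mutated column, so the position scores as fairly conserved while the mutant residue is rare.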
3. Description of the achieved results
The research activity performed during the returning phase reached all the aims
described in our proposal. In particular, it has been demonstrated that for diseases
with a sufficient number of annotated mutations it is possible to build
disease-specific predictors. We tested this hypothesis on
cancer-causing mSNVs, showing that the disease-specific method performs better
than the general method. In addition, we implemented a web-available version of the method
that can be used by the scientific community to evaluate possible deleterious mutations
4. Expected final results and their potential impact
At the end of the returning phase we have developed a user-friendly web server interface
for the prediction of the effect of mSNVs. The implemented web tools include a general
method for the detection of disease-related variants that uses both protein sequence and
structure information, and a cancer-specific algorithm that takes into account only
sequence information. In conclusion, we demonstrated that structural information is
important for improving the prediction of deleterious variants. When structural information
is not available but a good set of mutations has been annotated, functional information is
important to improve the performance of the predictors on a specific class of diseases.
We believe that in the near future, when more mSNV data become available, the development
of disease-specific methods will be a key strategy for building more accurate algorithms
and for understanding disease mechanisms.
Project objectives for the period
During the last year, in the returning phase, the last two aims of the project
were accomplished. These objectives correspond to the last two tasks.
In more detail, a set of manually curated cancer driver variants was selected
and analyzed using the evolutionary information derived from a set of related sequences
retrieved with the BLAST algorithm. In addition, functional information from
a reduced subset of Gene Ontology terms (GO slim) was used to characterize
particular functions that are more frequent in proteins related to cancer. With this
work we accomplished the 4th aim of our proposal by developing a Support Vector
Machine based method able to discriminate between cancer driver mSNVs and neutral
polymorphisms. In the last period of the grant EC accomplished the 5th objective of the
Mut2Dis project by developing different web servers for the prediction of the impact of
For more details about the activity performed during the returning phase, please refer
to the attached file in the next section.
Work progress and achievements during the period
1. Progress towards objectives and details for each task
In this section we summarize the objectives achieved for each
of the last two aims described in our proposal during the returning phase
at the University of Balearic Islands.
1.1 Development of disease-specific predictors.
To accomplish the 5th task, EC started in the last part of the outgoing
phase by collecting cancer-related mSNVs, selecting only mutations with disease names
associated with the MeSH term "neoplasm". During the returning phase this set
was compared with a manually curated dataset of cancer driver mutations, to select a
set of cancer-causing mutations and remove possible passenger mSNVs that do not directly
cause the pathological state.
Using these data, EC compared the sequence profile at the mutated position between
the set of cancer-causing mutations and an equally sized set of SwissVar mSNVs not
associated with any disease, which was used as the negative set. In the next step, the
frequency of particular classes of protein function in the subset of cancer-causing mutations
was compared with that in a similar set of randomly selected disease-related mSNVs not
associated with cancer. Finally, the most discriminative features were selected and used
to train and test a binary classifier
able to discriminate between cancer-causing and non-cancer-causing mSNVs.
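The final training step can be illustrated with a small linear Support Vector Machine.
This is only a sketch under stated assumptions: the two features (conservation at the
mutated site and frequency of the mutant residue), the toy data, the hyperparameters and
the plain sub-gradient training loop are all hypothetical choices made for illustration,
and do not reproduce the kernel, features or software used in the project.

```python
def train_linear_svm(samples, labels, lam=0.001, lr=0.05, epochs=10000):
    """Batch sub-gradient descent on the L2-regularized hinge loss (linear SVM).

    samples: list of feature lists; labels: +1 (cancer-causing) or -1 (neutral).
    """
    n, dim = len(samples), len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]      # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0
        for x, y in zip(samples, labels):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * score < 1.0:              # sample inside the margin: hinge term active
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 or -1 according to the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Made-up toy features: [conservation at mutated site, frequency of mutant residue]
X = [[0.90, 0.00], [0.95, 0.05], [0.85, 0.00], [0.80, 0.10],   # labeled +1
     [0.20, 0.40], [0.30, 0.30], [0.10, 0.50], [0.25, 0.35]]   # labeled -1
y = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

A real evaluation would use held-out data or cross-validation rather than predictions
on the training set, which are shown here only to keep the example short.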
1.2 World Wide Web server for the disease-related mutation prediction.
During the last period of the returning phase, to accomplish the 6th task,
EC implemented different web servers to make the methods developed in this
project available to the scientific community. In detail, EC implemented
an updated version of the SNPs&GO algorithm that predicts the effect of mSNVs
using only sequence information. Following the findings of this research activity,
a new version of the SNPs&GO algorithm that takes into account protein structure
information (SNPs&GO3d) has been made available on the web. The SNPs&GO server and its
implementation based on protein three-dimensional structure are reachable at
The promising results obtained in the analysis of cancer driver mutations have been used
to implement a web server for predicting cancer-causing mSNVs (Dr. Cancer).
The Dr. Cancer web server is available at
2. Researcher training activities/transfer of knowledge
During the returning phase at the University of Balearic Islands, EC was a contracted
researcher in the Department of Mathematics and Computer Science. EC attended the Bologna
Winter School 2012, a 5-day course dedicated to the study of proteins and their variants
from the structural and functional point of view. He also had the opportunity to attend the
course on Optimization held by Dr Jairo Rocha. There was also the opportunity to
collaborate with other members of the Computational Biology and Bioinformatics Research Group
on a statistical analysis for the detection of the highly discriminative sequence- and
structure-based features included in our algorithms.
3. Highlight significant results
During the second phase, EC achieved several significant results related to the main aims
of the Mut2Dis project. First of all, EC analyzed a large dataset of cancer-causing mSNVs,
evaluating evolutionary and functional information to discriminate them from neutral
polymorphisms and other disease-related mutations. The results have shown that residue
conservation at the mutated site, derived from the protein sequence profile, is one of the
most discriminative features. This finding was also verified by comparing subsets of
cancer-causing mSNVs and polymorphisms. We have also shown that cancer-specific GO scores
are more accurate than general GO-term scores in the identification of cancer-related
proteins, improving the detection of cancer-causing mSNVs. Finally, the new version of the
SNPs&GO algorithm resulting from this research project has scored among the best in its
category both in tests performed by other groups (Thusberg et al. Human Mutation 2011) and
on the blind set of mutations on CHK2 released by the Critical Assessment for Genome
Interpretation (CAGI) organizers during the last two editions.
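To make the idea of a cancer-specific GO score concrete, the sketch below scores each
functional term by the log-odds of its frequency in a cancer-related protein set versus a
background set, and sums the term scores over a protein's annotations. The log-odds form,
the pseudocount, the function names and the term labels are assumptions introduced for
illustration; they are placeholders and not real GO identifiers or project data.

```python
import math
from collections import Counter

def go_log_odds(positive_annotations, background_annotations, pseudocount=1.0):
    """Log-odds of each term's frequency in the positive set vs the background set.

    Each argument is a list of per-protein term lists.
    """
    pos_counts = Counter(t for terms in positive_annotations for t in set(terms))
    bg_counts = Counter(t for terms in background_annotations for t in set(terms))
    n_pos, n_bg = len(positive_annotations), len(background_annotations)
    scores = {}
    for term in set(pos_counts) | set(bg_counts):
        # The pseudocount keeps the ratio finite for terms seen in only one set.
        f_pos = (pos_counts[term] + pseudocount) / (n_pos + 2 * pseudocount)
        f_bg = (bg_counts[term] + pseudocount) / (n_bg + 2 * pseudocount)
        scores[term] = math.log(f_pos / f_bg)
    return scores

def protein_score(terms, scores):
    """Sum of term scores over a protein's annotations (unknown terms count as 0)."""
    return sum(scores.get(t, 0.0) for t in set(terms))

# Made-up annotations with placeholder term labels.
cancer_proteins = [["cell_cycle", "kinase"], ["cell_cycle"], ["kinase", "transport"]]
background_proteins = [["transport"], ["transport", "metabolism"],
                       ["cell_cycle", "metabolism"]]

scores = go_log_odds(cancer_proteins, background_proteins)
high = protein_score(["cell_cycle", "kinase"], scores)
low = protein_score(["transport", "metabolism"], scores)
```

Terms that are more frequent in the cancer-related set receive positive scores, so a
protein annotated mostly with such terms obtains a higher total than one annotated with
terms typical of the background set.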
The great interest of the international scientific community in our methods is shown
by the geographic and numeric representations of the accesses to the
http://snps.uib.es/ web server during the last few years.
4. Statement on the use of resources
For the development of this project during the returning phase, the University of
Balearic Islands had total expenses of 64,976.67 € (see table in the attached file).
During the returning period EC dedicated part of his time to disseminating the
results of this project at international conferences and workshops and in invited
seminars at institutions in both the US and Europe. To summarize the dissemination activity
performed during the last year, EC published one paper about the results obtained
analyzing cancer-causing missense Single Nucleotide Variants (mSNVs)
and Altman, Genomics, 2011) and another paper
about the prediction of the deleterious effect of mSNVs detected in a family quartet
PLOS Genetics, 2012) and
two reviews about the future perspective in personal genomics
et al., Briefings in Bioinformatics, 2012)
and the use of protein structure information for the detection of
mSNVs affecting drug response
et al., Journal of Royal Society Interface, 2012).
In collaboration with other colleagues, EC also submitted 3 posters to meetings and
conferences, 2 of which were presented orally by collaborators. Finally, EC was
invited to give 4 seminars, where he presented the results of the Mut2Dis research
project. EC also maintains a web page where details of the project are made
available. Other papers and reviews related to this research project, currently
in preparation, are expected to be published during the next few months.
1. Project planning and status - from management point of
During the returning phase all the aims and tasks of the project have been
fulfilled according to the timeline described in the research proposal. Dr Jairo
Rocha managed the scientific part of the project, supervising the research activity
of Emidio Capriotti. Xavier Garcia supervised the financial part of the project,
checking and keeping track of the budget for the realization of the project.
2. Problems which have occurred and how they were solved or envisaged
3. Changes to the legal status of any of the beneficiaries
4. Impact of possible deviations from the planned milestones and
5. Development of the project website
EC as beneficiary of the fellowship is maintaining and updating a
dedicated web site where details and information about the project
are reported (see http://snps.uib.es/mut2dis).
6. Gender issues; Ethical issues
7. Justification of subcontracting (if applicable)
There are no subcontracting expenses in this period.
8. Justification of real costs (management costs)
The management expenses consist of the management of the contract between the
beneficiary and the University of Balearic Islands (2,204.70 €).
9. Indirect costs
Overheads granted are 10% of the total direct costs.
The actual overheads at UIB used in FP7 are
calculated using a simplified method, which includes all the indirect
costs of the institution (communication costs, maintenance and depreciation
of buildings and infrastructures, courier services, security services,
electric power and water expenses, research support personnel and so on)
and represent, for the year 2011, a rate of 81.09% of the personnel costs.
This rate has already been audited.