Predictive modelling of facial features from DNA

Jans, G.C.M.

Abstract

Background: In recent years, attention to the genetic architecture of normal-range variation in facial morphology has increased, and genome-wide association studies (GWAS) have identified genetic loci associated with facial morphology. However, these loci give no insight into how genetics shapes the face. Predictive modelling can be used to investigate the relationship between genotype and phenotype. Here, predictive modelling refers to genetic prediction models: tools that aim to predict a phenotype from the genotype.
Objectives: The objective of this project was to investigate the possibility of predictive modelling of facial features from DNA. Such a model could be used to visualize the variation in facial features caused by the underlying genetics.
Methods: In this project, the LDAK genetic prediction model was used; LDAK is a software package for phenotype prediction that uses multilinear regression. Two datasets were used: Generation R and the Rotterdam Study. The Generation R data consisted of subjects aged nine years, and from the Rotterdam Study, subjects aged ≥ 45 years were included. The datasets were processed separately. From both datasets, phenotype and genotype data were used. The phenotype data consisted of 3D facial meshes that had been reduced to 200 endophenotypes with an auto-encoder prior to this study, and the genotype data consisted of SNPs acquired using genotyping arrays. The prediction model was trained on 90% of the data; the remaining 10% was used for testing, in which facial morphology was predicted from the SNPs. To evaluate the prediction, a similarity measure was computed between each predicted face and every ground truth face in the test set, after which the ground truth faces were ranked in ascending order based on the computed similarity. Next, the rank of the subject's own ground truth face was determined. Based on these ranks, an accuracy plot was constructed for both datasets and the accuracy ratios (AR) were computed.
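The train/test and ranking procedure above can be sketched in a few lines of NumPy. This is a minimal illustration under several assumptions, not the study's implementation: multi-output ridge regression stands in for LDAK's model; the similarity measure is assumed to be Euclidean distance between endophenotype vectors (so ascending order puts the most similar face first); and the AR is assumed to follow the cumulative-accuracy-profile convention, scaled so that random ranking gives about 0 and perfect ranking gives 1. All function names, parameters and the toy data are hypothetical.

```python
import numpy as np


def fit_ridge(X, Y, lam=1.0):
    """Multi-output ridge regression (one linear SNP model per endophenotype).
    Assumed stand-in for LDAK; this is NOT LDAK's algorithm."""
    mu = X.mean(axis=0)
    y_mu = Y.mean(axis=0)
    Xc = X - mu
    p = X.shape[1]
    # Solve (Xc^T Xc + lam I) B = Xc^T (Y - y_mu) for the effect-size matrix B
    B = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (Y - y_mu))
    return mu, y_mu, B


def predict_ridge(model, X):
    mu, y_mu, B = model
    return (X - mu) @ B + y_mu


def true_match_ranks(pred, truth):
    """For each predicted face, the 1-based rank of its own ground truth face."""
    # Pairwise distances, shape (n_test, n_test); Euclidean is an assumption
    d = np.linalg.norm(pred[:, None, :] - truth[None, :, :], axis=2)
    # Ascending distance: most similar ground truth face comes first
    order = np.argsort(d, axis=1, kind="stable")
    # Position of index i in row i of the ordering, converted to 1-based rank
    return np.argmax(order == np.arange(len(pred))[:, None], axis=1) + 1


def accuracy_ratio(ranks):
    """Assumed CAP-style AR: 1 = perfect, about 0 = random, -1 = worst."""
    n = len(ranks)
    k = np.arange(1, n + 1)
    # Accuracy curve: fraction of subjects whose true face is in the top k
    curve = (ranks[None, :] <= k[:, None]).mean(axis=1)
    random_area = (n + 1) / (2 * n)  # mean of k/n over k = 1..n
    return (curve.mean() - random_area) / (1.0 - random_area)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, q = 500, 300, 20  # toy sizes: subjects, SNPs, endophenotypes
    X = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # allele dosages 0/1/2
    B_true = rng.normal(0.0, 0.05, size=(p, q))
    Y = X @ B_true + rng.normal(0.0, 1.0, size=(n, q))  # weak simulated signal

    idx = rng.permutation(n)
    n_train = int(0.9 * n)  # 90% training, 10% testing
    train, test = idx[:n_train], idx[n_train:]

    model = fit_ridge(X[train], Y[train], lam=100.0)
    ranks = true_match_ranks(predict_ridge(model, X[test]), Y[test])
    print("AR:", round(float(accuracy_ratio(ranks)), 3))
```

Under this assumed convention, an AR of 0 is the chance level, which is one way to read the "lower bound for presence of predictive performance"; the study's exact AR formula may differ.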
Results: The AR was 0.06 for the Generation R dataset and 0.02 for the Rotterdam Study dataset. These results indicate some predictive power; however, the ARs are only slightly above the lower bound for the presence of predictive performance. The difference in AR between the Generation R and Rotterdam Study datasets could be the result of an increased environmental component in the facial morphology of the older Rotterdam Study subjects, which would reduce genetic predictability. Overall, the predictive power is currently minimal. This could be caused by several factors, such as the limited number of subjects and a prediction model that is restricted to linear relationships.
Conclusion: The objective of this project was to explore the possibility of predictive modelling of facial features from DNA. Some predictive power was found, but it was very limited. Research on predictive modelling of facial features is still at an early stage, and further research is required to improve the predictive power.
