Reconstruction of a matrix of genotypic correlations between variants within a gene for joint analysis of imputed and sequenced data

Abstract

Combining imputed and sequenced data in a single gene-based association analysis requires reconstructing the matrix of genotypic correlations. For a given gene, the correlations between genotypes of all imputed variants and between genotypes of all sequenced variants are known, but the correlations between an imputed variant and a sequenced variant are not. To recover these correlations, we propose an efficient method based on maximising the determinant of the matrix. The method has a number of useful properties and admits an analytical solution for our task. We validated the method by comparing reconstructed correlation matrices with real ones computed from individual genotypes in the UK Biobank. Gene-based association analyses performed with the SKAT, BT and PCA methods on reconstructed and real matrices, using both simulated summary statistics and summary statistics calculated for real phenotypes, showed high reconstruction quality and robustness of the method to different gene structures.
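The idea of determinant-maximising completion can be illustrated with a minimal sketch. It is not the authors' implementation; it relies on a standard textbook result and on assumptions the abstract does not state. The function name `complete_max_det`, the argument `n_shared`, and the variant ordering are hypothetical. The sketch assumes that some variants may be present in both the imputed and the sequenced set, and that both sets agree on the correlations among those shared variants. Under these assumptions, the unknown block that maximises the determinant is R13 = R12 R22^{-1} R23. When no variants are shared, the maximiser is a zero block.

```python
import numpy as np


def complete_max_det(r_imp, r_seq, n_shared):
    """Fill the unknown imputed-vs-sequenced block by determinant maximisation.

    Assumed (hypothetical) layout:
      r_imp: correlations of imputed variants, ordered [imputed-only, shared]
      r_seq: correlations of sequenced variants, ordered [shared, sequenced-only]
      n_shared: number of variants present in both sets (0 if none)
    """
    r_imp = np.asarray(r_imp, dtype=float)
    r_seq = np.asarray(r_seq, dtype=float)
    n1 = r_imp.shape[0] - n_shared  # imputed-only variants
    n3 = r_seq.shape[0] - n_shared  # sequenced-only variants

    if n_shared == 0:
        # With no shared variants the determinant is largest when the
        # unknown cross block is zero (block-diagonal completion).
        zeros = np.zeros((n1, n3))
        return np.block([[r_imp, zeros], [zeros.T, r_seq]])

    r11 = r_imp[:n1, :n1]
    r12 = r_imp[:n1, n1:]
    # Assumption: the shared block is taken from r_imp and is taken to
    # match the corresponding block of r_seq.
    r22 = r_imp[n1:, n1:]
    r23 = r_seq[:n_shared, n_shared:]
    r33 = r_seq[n_shared:, n_shared:]

    # Closed-form maximiser: R13 = R12 R22^{-1} R23.
    r13 = r12 @ np.linalg.solve(r22, r23)

    return np.block([
        [r11, r12, r13],
        [r12.T, r22, r23],
        [r13.T, r23.T, r33],
    ])


if __name__ == "__main__":
    # Toy example: one imputed-only, one shared, one sequenced-only variant.
    r_imp = np.array([[1.0, 0.5], [0.5, 1.0]])
    r_seq = np.array([[1.0, 0.5], [0.5, 1.0]])
    print(complete_max_det(r_imp, r_seq, n_shared=1))
```

In the toy example the recovered correlation between the imputed-only and the sequenced-only variant is 0.5 × 0.5 = 0.25. The inverse of the completed matrix has a zero in the position of the reconstructed entry, which is a characteristic property of the determinant-maximising completion.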

About the authors

G. R. Svishcheva

Federal Research Centre, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences; Vavilov Institute of General Genetics, Russian Academy of Sciences

Author for correspondence.
Email: gulsvi@mail.ru
Russian Federation, 630090, Novosibirsk; 119991, Moscow

A. V. Kirichenko

Federal Research Centre, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences

Email: gulsvi@mail.ru
Russian Federation, 630090, Novosibirsk

N. M. Belonogova

Federal Research Centre, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences

Email: gulsvi@mail.ru
Russian Federation, 630090, Novosibirsk

E. E. Elgaeva

Federal Research Centre, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences; Novosibirsk State University

Email: gulsvi@mail.ru
Russian Federation, 630090, Novosibirsk; 630090, Novosibirsk

A. Ya. Tsepilov

Federal Research Centre, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences

Email: gulsvi@mail.ru
Russian Federation, 630090, Novosibirsk

I. V. Zorkoltseva

Federal Research Centre, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences

Email: gulsvi@mail.ru
Russian Federation, 630090, Novosibirsk

T. I. Axenovich

Federal Research Centre, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences

Email: gulsvi@mail.ru
Russian Federation, 630090, Novosibirsk

Copyright (c) 2024 Russian Academy of Sciences