Genotype Imputation Reference Panel Intelligent Customization
ImpRefIC is a software tool for genotype imputation in genomics research. It selects a customized reference panel for the target samples in order to achieve higher imputation accuracy. ImpRefIC reads Variant Call Format (VCF) files and converts the genotypes into a numerical format suitable for machine learning. It classifies samples with Logistic Regression and handles imbalanced data by upsampling with RandomOverSampler. Results, including the predicted optimal reference population and the associated probabilities, are saved in a user-specified directory, and the tool also reports its running time. Note that prediction precision may be affected when the input has few consistent SNPs or limited chromosomal diversity.
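The two preprocessing ideas described above, numeric genotype encoding and upsampling of under-represented classes, can be sketched roughly as follows. This is an illustration only, not ImpRefIC's actual code: the 0/1/2 alternate-allele-count encoding, the helper names `encode_genotype`, `encode_vcf_line` and `random_oversample`, and the toy data are all assumptions, and the pure-Python oversampler is a stand-in for the `RandomOverSampler` class of the imbalanced-learn package.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def encode_genotype(gt: str) -> Optional[int]:
    """Map a diploid VCF GT string to a non-reference allele count (0, 1 or 2).

    Assumed encoding: '0/0' -> 0, '0/1' -> 1, '1/1' -> 2; missing calls -> None.
    Phased ('|') and unphased ('/') genotypes are treated the same.
    """
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or any(a == "." for a in alleles):
        return None
    return sum(1 for a in alleles if a != "0")


def encode_vcf_line(line: str) -> List[Optional[int]]:
    """Encode every sample column of one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # column 9 is FORMAT
    return [encode_genotype(s.split(":")[gt_index]) for s in fields[9:]]


def random_oversample(
    X: Sequence[Sequence[int]], y: Sequence[str], seed: int = 0
) -> Tuple[List[Sequence[int]], List[str]]:
    """Duplicate randomly chosen rows of smaller classes until all classes
    are as large as the largest one (random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Hypothetical VCF record with four samples
record = "1\t1000\trs1\tA\tG\t.\tPASS\t.\tGT:DP\t0/0:12\t0|1:9\t1/1:15\t./.:0"
print(encode_vcf_line(record))  # [0, 1, 2, None]

# Hypothetical imbalanced training set: three Duroc samples, one Landrace
X = [[0, 1], [0, 2], [1, 1], [2, 2]]
y = ["Duroc", "Duroc", "Duroc", "Landrace"]
X_bal, y_bal = random_oversample(X, y)
print(sorted(Counter(y_bal).items()))  # [('Duroc', 3), ('Landrace', 3)]
```

After balancing, the encoded matrix and labels would be passed to a classifier such as Logistic Regression; how ImpRefIC handles missing genotypes is not stated here, so the `None` placeholder is left untreated in this sketch.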
The current version of ImpRefIC is dedicated primarily to porcine research. We are working on extending ImpRefIC to other species, so stay tuned for updates.
☀Installation
git lfs clone --recursive https://github.com/klzhang2022/ImpRefIC.git
☀Requirements
Python 3 (https://www.python.org)
Python modules and packages
git
git-lfs
☀Usage
cd ImpRefIC_path
python3 ImpRefIC.py ./example/test.vcf.gz ./ ./example
# probability matrix of target samples (rows) and 64 breeds/lines (columns)
American_Yorkshire | Canadian_Yorkshire | Danish_Yorkshire | Dutch_Yorkshire | French_Yorkshire | Unknown_Yorkshire_lines | Landrace | Duroc | Berkshire | Goettingen_Minipig | Hampshire | Iberian | Mangalica | Pietrain | Angler_Sattleschwein | British_Saddleback | Bunte_Bentheimer | Calabrese | Casertana | Chato_Murciano | Cinta_Senese | Gloucester_Old_Spot | Large_Black | Leicoma | Linderodsvin | Middle_White | Nero_Siciliano | Tamworth | European_Wild_boar | Yucatan_minipig | Creole | American_Wild_boar | Bamei | Baoshan | Enshi_black | Erhualian | Hetao | Jinhua | Korean_black_pig | Laiwu | Meishan | Min | Neijiang | Rongchang | Tibetan | Tongcheng | Hubei_White | Daweizi | Jiangquhai | Leping_Spotted | Penzhou | songliao_black_pig | Taihu | Wannan_Spotted | Wujin | Ya_nan | Diannanxiaoer | Luchuan | Wuzhishan | Bamaxiang | MiniLEWE | Xiang | Asia_Wild_boar | Hybrid |
0.0000 | 0.0000 | 0.9999 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
0.0000 | 0.0000 | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
0.0000 | 0.0000 | 0.9999 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
0.0000 | 0.0000 | 0.9999 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
0.0000 | 0.0000 | 0.9999 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
0.9988 | 0.0002 | 0.0002 | 0.0001 | 0.0001 | 0.0001 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0002 |
0.9962 | 0.0003 | 0.0002 | 0.0001 | 0.0002 | 0.0007 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0018 |
0.0253 | 0.0036 | 0.0564 | 0.0032 | 0.0043 | 0.0644 | 0.1547 | 0.0002 | 0.0017 | 0.0010 | 0.0018 | 0.0010 | 0.0027 | 0.0022 | 0.0051 | 0.0016 | 0.0026 | 0.0012 | 0.0015 | 0.0026 | 0.0009 | 0.0009 | 0.0010 | 0.0018 | 0.0029 | 0.0027 | 0.0077 | 0.0009 | 0.0036 | 0.0010 | 0.0020 | 0.0010 | 0.0006 | 0.0003 | 0.0013 | 0.0000 | 0.0012 | 0.0001 | 0.0013 | 0.0007 | 0.0002 | 0.0037 | 0.0001 | 0.0000 | 0.0009 | 0.0002 | 0.0259 | 0.0002 | 0.0003 | 0.0003 | 0.0022 | 0.0015 | 0.0013 | 0.0003 | 0.0005 | 0.0001 | 0.0002 | 0.0001 | 0.0003 | 0.0001 | 0.0006 | 0.0002 | 0.0006 | 0.5914 |
0.0134 | 0.0171 | 0.1671 | 0.0033 | 0.0057 | 0.0639 | 0.2500 | 0.0002 | 0.0019 | 0.0017 | 0.0012 | 0.0006 | 0.0016 | 0.0029 | 0.0054 | 0.0015 | 0.0012 | 0.0012 | 0.0016 | 0.0022 | 0.0013 | 0.0009 | 0.0013 | 0.0027 | 0.0023 | 0.0035 | 0.0030 | 0.0014 | 0.0031 | 0.0019 | 0.0022 | 0.0013 | 0.0012 | 0.0005 | 0.0042 | 0.0001 | 0.0018 | 0.0001 | 0.0011 | 0.0008 | 0.0005 | 0.0026 | 0.0001 | 0.0001 | 0.0013 | 0.0005 | 0.0641 | 0.0004 | 0.0004 | 0.0004 | 0.0038 | 0.0017 | 0.0022 | 0.0006 | 0.0005 | 0.0002 | 0.0004 | 0.0001 | 0.0009 | 0.0005 | 0.0012 | 0.0003 | 0.0003 | 0.3389 |
0.0209 | 0.0093 | 0.0347 | 0.0013 | 0.0028 | 0.0273 | 0.0732 | 0.0002 | 0.0021 | 0.0013 | 0.0012 | 0.0005 | 0.0017 | 0.0015 | 0.0026 | 0.0011 | 0.0023 | 0.0008 | 0.0011 | 0.0018 | 0.0010 | 0.0009 | 0.0006 | 0.0014 | 0.0019 | 0.0020 | 0.0031 | 0.0007 | 0.0014 | 0.0009 | 0.0016 | 0.0006 | 0.0003 | 0.0002 | 0.0017 | 0.0000 | 0.0011 | 0.0001 | 0.0007 | 0.0007 | 0.0002 | 0.0035 | 0.0001 | 0.0001 | 0.0007 | 0.0002 | 0.0247 | 0.0003 | 0.0003 | 0.0003 | 0.0021 | 0.0009 | 0.0018 | 0.0002 | 0.0004 | 0.0001 | 0.0002 | 0.0000 | 0.0002 | 0.0003 | 0.0006 | 0.0002 | 0.0004 | 0.7536 |
# target sample prediction results: sample ID; breed/line most similar to each target sample sequence
Sample ID | Breed/Line |
sample_1 | Danish_Yorkshire |
sample_2 | Danish_Yorkshire |
sample_3 | Danish_Yorkshire |
sample_4 | Danish_Yorkshire |
sample_5 | Danish_Yorkshire |
sample_6 | American_Yorkshire |
sample_7 | American_Yorkshire |
sample_8 | Hybrid |
sample_9 | Hybrid |
sample_10 | Hybrid |
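In the example above, each predicted breed/line matches the column with the highest probability in the corresponding row of the probability matrix. A minimal sketch of that lookup, assuming the pipe-delimited layout shown here, is given below. The function names `parse_pipe_row` and `predict_breeds` are hypothetical, and the input is a four-column excerpt of the example matrix rather than the full 64 columns.

```python
from typing import List


def parse_pipe_row(line: str) -> List[str]:
    """Split a 'a | b | c |' style line into stripped cells."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def predict_breeds(header: str, rows: List[str]) -> List[str]:
    """Return, for each row, the breed/line whose probability is largest."""
    breeds = parse_pipe_row(header)
    predictions = []
    for row in rows:
        probs = [float(cell) for cell in parse_pipe_row(row)]
        if len(probs) != len(breeds):
            raise ValueError("row width does not match header width")
        best = max(range(len(probs)), key=probs.__getitem__)
        predictions.append(breeds[best])
    return predictions


# Four-column excerpt of the example probability matrix
header = "American_Yorkshire | Danish_Yorkshire | Landrace | Hybrid |"
rows = [
    "0.0000 | 0.9999 | 0.0000 | 0.0000 |",
    "0.9988 | 0.0002 | 0.0000 | 0.0002 |",
    "0.0134 | 0.1671 | 0.2500 | 0.3389 |",
]
print(predict_breeds(header, rows))
# ['Danish_Yorkshire', 'American_Yorkshire', 'Hybrid']
```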
# customized reference panel for target samples
Customized reference panel |
Danish_Yorkshire |
American_Yorkshire |
Hybrid |
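The customized reference panel in this example contains each predicted breed/line once, in order of first appearance. Assuming that is how the panel is derived (the README does not spell out the rule), the step can be sketched as follows; `customized_panel` is a hypothetical helper name.

```python
from typing import Iterable, List


def customized_panel(predictions: Iterable[str]) -> List[str]:
    """Collapse per-sample predictions into unique breeds/lines,
    keeping the order in which they first appear."""
    return list(dict.fromkeys(predictions))


# Per-sample predictions from the example output (sample_1 .. sample_10)
predictions = (
    ["Danish_Yorkshire"] * 5 + ["American_Yorkshire"] * 2 + ["Hybrid"] * 3
)
print(customized_panel(predictions))
# ['Danish_Yorkshire', 'American_Yorkshire', 'Hybrid']
```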
# the trained classification model