This section gives a brief introduction to AGIDB. Here, users can learn what information AGIDB provides and how to cite AGIDB.
AGIDB is an all-in-one imputation platform covering a total of 35 animals. Our platform provides a treasure trove of information, featuring in-depth reference panels, dynamic haplotype visualization, and cutting-edge imputation techniques encompassing four tailored scenarios to meet your specific requirements. Capitalize on our ENCODE-annotated SNP database and delve into our robust toolset, including ImpRefIC, principal component analysis, selective sweep analysis, and versatile data format conversions, all meticulously crafted to enhance your research workflow. Embark on your journey through our intuitive interface and unveil groundbreaking insights to propel your research forward. For swine genomic research, we invite you to explore our specialized Pig Imputation Platform (PGIDB) for a more focused experience. Begin your voyage and enrich your research today!
Kaili Zhang, Jiete Liang, Yuhua Fu, Jinyu Chu, Liangliang Fu, Yongfei Wang, Wangjiao Li, You Zhou, Jinhua Li, Xiaoxiao Yin, Haiyan Wang, Xiaolei Liu, Chunyan Mou, Chonglong Wang, Heng Wang, Xinxing Dong, Dawei Yan, Mei Yu, Shuhong Zhao, Xinyun Li, Yunlong Ma, AGIDB: a versatile database for genotype imputation and variant decoding across species, Nucleic Acids Research, Volume 52, Issue D1, 5 January 2024, Pages D835–D849, https://doi.org/10.1093/nar/gkad913
This part provides the operation guide for each functional module of AGIDB. Important steps are marked with red boxes and serial numbers.
Quick Start
This module provides basic information on 35 animals, including introduction, phylogenetic tree, and sample information table, enabling users to gain insight into samples and design optimal reference panel subsets for genotype imputation.
Click on the picture of an animal; the introduction, phylogenetic tree, and sample information will be presented below the picture.
The sample information table contains information such as Sample Id, Breed, Group, Sex, Tissue, Locality, Bases, Mapping Rate, Depth, Coverage, Project, etc.
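As a rough illustration of how the sample information table could help in designing a reference panel subset, the sketch below filters samples by breed and minimum depth. It assumes the table has been saved as a tab-separated file whose header uses the column names listed above ("Sample Id", "Breed", "Depth"). The file layout, the function name `select_samples`, and the example rows are hypothetical and are not an AGIDB export format.

```python
import csv
import io

def select_samples(tsv_text, breeds, min_depth=0.0):
    """Return Sample Ids whose Breed is in `breeds` and whose Depth >= min_depth."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    selected = []
    for row in reader:
        if row["Breed"] in breeds and float(row["Depth"]) >= min_depth:
            selected.append(row["Sample Id"])
    return selected

# Hypothetical table contents, for illustration only.
example = (
    "Sample Id\tBreed\tDepth\n"
    "S1\tDuroc\t12.5\n"
    "S2\tLandrace\t8.0\n"
    "S3\tDuroc\t4.2\n"
)
print(select_samples(example, {"Duroc"}, min_depth=10))  # prints ['S1']
```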
This module provides a free online imputation service, including four imputation scenarios:
1. Imputation from Chip to High-coverage sequencing.
2. Imputation from Low-coverage to High-coverage sequencing.
3. Imputation from Chip to Chip, to realize map conversion between chips.
4. Imputation of the specified map.
This module imputes chip data to sequencing data using Beagle 5.4. Users can upload the target data set in VCF or compressed VCF format, and the imputed or phased file is returned automatically through the browser.
Imputation steps: submit the target file to be imputed (size must be < 50 MB); enter the genomic region of interest; select the default reference panel or a subset of the reference panel; specify phasing and imputation requirements; click the "Submit" button. Users can also click the "example" button to try the imputation function with the example file.
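Before uploading, it can help to check the target file and the region string locally. The sketch below is one way to do that under stated assumptions: the region is written as `chrom:start-end` (the pattern used in the offline commands on this page), "50 MB" is read as decimal megabytes, and compressed VCF files end in `.vcf.gz`. The function names are hypothetical helpers, not part of AGIDB.

```python
import os
import re

# Assumption: "< 50 MB" read as decimal megabytes (the stricter interpretation).
MAX_BYTES = 50 * 1000 * 1000

# Assumption: regions look like chrom:start-end, e.g. 1:1000-2000.
REGION_RE = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")

def parse_region(region):
    """Parse 'chrom:start-end' into (chrom, start, end); raise ValueError if malformed."""
    match = REGION_RE.match(region)
    if match is None:
        raise ValueError(f"region must look like chrom:start-end, got {region!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start, end

def check_upload(path):
    """Return True if the file name looks like VCF and the file is under the size limit."""
    ok_name = path.endswith((".vcf", ".vcf.gz"))
    return ok_name and os.path.getsize(path) < MAX_BYTES

print(parse_region("1:1000-2000"))  # prints ('1', 1000, 2000)
```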
The imputation result file is downloaded directly to the user's local machine.
The reference panel defaults to "AGIDB", that is, all samples of this species provided by the database. Users can also select subsets of the reference panel by breed or group as desired.
This module uses GLIMPSE2 to impute low-depth sequencing data to high-depth sequencing data. Two forms of target file can be uploaded (VCF/BCF or BAM/CRAM), and the resulting file is returned automatically through the browser.
Note: Users can click "example.vcf.gz" to download the example VCF file provided for each species.
The imputation result file is downloaded directly to the user's local machine.
This module supports uploading BAM/CRAM files for a single sample or for multiple samples at once. When uploading BAM/CRAM files for multiple samples, a BAM/CRAM list file must be submitted as well. For the format of the list file, see "List example": one file per line, with an optional second column (space separated) specifying the sample name; otherwise the file name is used.
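The list file format described above (one file per line, optional space-separated sample name) can be generated with a few lines of code. This is a minimal sketch; the file names, sample names, and the helper `bam_list_lines` are hypothetical.

```python
def bam_list_lines(entries):
    """Build list-file lines: one file per line, optional sample name after a space.

    `entries` is a list of (path, sample_name) pairs; sample_name may be None,
    in which case only the path is written and the file name is used as the name.
    """
    lines = []
    for path, sample in entries:
        if " " in path or (sample and " " in sample):
            # A space inside a field would be read as a column separator.
            raise ValueError("paths and sample names must not contain spaces")
        lines.append(f"{path} {sample}" if sample else path)
    return lines

# Hypothetical input files, for illustration only.
entries = [("sampleA.bam", "A01"), ("sampleB.cram", None)]
print("\n".join(bam_list_lines(entries)))
```

Writing the joined lines to a text file gives a list that follows the one-file-per-line layout.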
The imputation guide is shown in the figure below:
The imputation result file is downloaded directly to the user's local machine.
This module provides the conversion function of different versions of chips. By selecting the chip version of interest, a VCF file of the specified map can be obtained. We also provide a Venn diagram between versions for user reference.
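The overlap summarized by a Venn diagram between two chip versions amounts to set operations on their SNP identifiers. The sketch below shows the idea; the SNP IDs and the function `venn_counts` are hypothetical and do not reflect any real chip content.

```python
def venn_counts(chip_a, chip_b):
    """Count SNP IDs unique to each chip and shared by both."""
    a, b = set(chip_a), set(chip_b)
    return {"only_a": len(a - b), "shared": len(a & b), "only_b": len(b - a)}

# Hypothetical SNP IDs, for illustration only.
chip_v1 = ["rs1", "rs2", "rs3"]
chip_v2 = ["rs2", "rs3", "rs4", "rs5"]
print(venn_counts(chip_v1, chip_v2))  # prints {'only_a': 1, 'shared': 2, 'only_b': 2}
```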
This module provides an imputation function for a specified map. By submitting a Map file, the user can obtain the VCF file corresponding to the map.
This module provides breed-specific variant search. Users can easily query breed-specific variants by breed and genomic region. Search results are displayed in a comprehensive table.
This module provides conserved SNP search. Users can easily query conserved SNPs by breed. Search results are displayed in a comprehensive table.
This module provides deleterious variant search. Users can easily query deleterious variants by breed, software, and score. Search results are displayed in a comprehensive table.
This module provides SNP information search for 35 species within selected genomic regions. For each SNP, the allele frequency and annotation are provided, together with the integrated ENCODE peak annotation. Simply select your species and genomic region of interest, and detailed SNP data is presented. Enhance your understanding of genetic variants with this powerful and user-friendly module. We also provide genome region files for each species to help users enter valid genomic regions.
The genomic region files available for download are as follows:
Each column of the table can be sorted in ascending or descending order to help users pick out the more critical SNPs.
The content of any field in the form can be searched through the search box on the right.
The figure below the table shows the Peak and gene information of this region more intuitively. The abscissa represents the SNP position, and the ordinate represents the fold change. When the mouse hovers over the Peak, the "Chromosome", "Peak Start" and "Peak End" information corresponding to the Peak will be displayed. Change the "position" by sliding the abscissa.
This module links to a genome browser, which lets users visualize and browse entire genomes with annotated data, including genomes, genes, and variants. Users can view numerous features and study the relationships between genomic elements.
Select the chromosome of interest in the left track to display the corresponding variant information.
This module provides haplotype block search and LD heatmap visualization. Users can easily query haplotype blocks based on species and genomic regions. Search results are displayed in a comprehensive table.
Click on a BLOCK ID in the second column to display its frequency data. Click the "Close" button to close the popup.
Click the "LD plot" button to obtain the LD heatmap of the selected species and the specified genomic region.
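For readers unfamiliar with what an LD heatmap cell represents, the sketch below computes r², a common pairwise LD statistic, between two biallelic SNPs from phased haplotypes. Which statistic the online heatmap displays is not stated here, so treat this as a general illustration; the function name and the toy haplotypes are hypothetical.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic SNPs from phased haplotypes coded 0 (ref) / 1 (alt)."""
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    p_a = sum(hap_a) / n                      # alt allele frequency at SNP A
    p_b = sum(hap_b) / n                      # alt allele frequency at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # coefficient of disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom

print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))  # prints 1.0 (complete LD)
print(ld_r2([0, 1, 0, 1], [0, 0, 1, 1]))  # prints 0.0 (no LD)
```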
This module provides rare variant search. Users can easily query rare variants by species, variant, allele frequency (AF), and genomic region. Search results are displayed in a comprehensive table.
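To make the AF filter concrete, the sketch below computes an alternate allele frequency from VCF-style GT strings and flags a variant as rare below a threshold. The 0.01 cutoff is only a commonly used convention chosen for illustration, not a value taken from AGIDB, and both function names are hypothetical.

```python
def alt_allele_frequency(genotypes):
    """Alternate allele frequency from VCF GT strings such as '0/1' or '1|1'.

    Missing alleles ('.') are skipped; any non-reference allele counts as alternate.
    Returns None when no allele is called.
    """
    alt = total = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue
            total += 1
            if allele != "0":
                alt += 1
    return alt / total if total else None

def is_rare(genotypes, max_af=0.01):
    """True if the variant is observed and its alternate AF is below max_af (assumed cutoff)."""
    af = alt_allele_frequency(genotypes)
    return af is not None and 0 < af < max_af

print(alt_allele_frequency(["0/0", "0/1", "1|1", "./."]))  # prints 0.5
```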
This module provides principal component analysis (PCA). The user selects a species and submits a compressed VCF file; clicking the submit button runs the analysis. The result is a PCA of the submitted data together with the reference panel of that species in the database, which can support the user's subsequent analysis.
The PCA plot is available for download, and the black dots represent samples uploaded by users.
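The idea behind projecting uploaded samples alongside a reference panel can be sketched with a tiny pure-Python PCA. The example below centers a samples-by-SNPs dosage matrix (0/1/2) and finds the first principal axis by power iteration on the sample Gram matrix. This is a teaching sketch under those assumptions, not the software or settings the platform runs; the genotype values and the function name are hypothetical.

```python
import math

def first_pc_scores(genotypes, iterations=100):
    """Unit-length first-PC coordinates (up to sign) for a samples x SNPs dosage matrix."""
    n, m = len(genotypes), len(genotypes[0])
    # Center each SNP column on its mean dosage.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Sample-by-sample Gram matrix; its leading eigenvector gives PC1 coordinates.
    gram = [[sum(a * b for a, b in zip(ri, rk)) for rk in centered] for ri in centered]
    vec = [float(i + 1) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iterations):
        nxt = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            raise ValueError("no variation left to decompose")
        vec = [x / norm for x in nxt]
    return vec

# Hypothetical dosages: two samples from one group, two from another.
dosages = [
    [0, 0, 0, 2],
    [0, 0, 1, 2],
    [2, 2, 2, 0],
    [2, 2, 1, 0],
]
scores = first_pc_scores(dosages)
print([round(s, 3) for s in scores])
```

In this toy input, the first two samples fall on one side of the axis and the last two on the other, which is the kind of separation a PCA plot makes visible.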
This module converts between seven genomic data formats. We also provide downloadable example files for each data format, so that users can understand the formats more intuitively.
Download example files
Key steps in genomic data format conversion:
The converted file is downloaded automatically to the user's local machine.
This module provides selective sweep analysis. Users can easily query Tajima's D, Pi, and FST by species, gene ID, gene name, and genomic region. Online analysis is also available. Search results are displayed in a comprehensive table.
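Two of the statistics named above can be illustrated with short formulas. The sketch below computes nucleotide diversity (Pi) for a window from alternate allele counts, and a Hudson-style FST for a single SNP from two populations' allele frequencies. The exact estimators, window sizes, and software used by the platform are not stated here, so these are generic textbook forms with hypothetical function names and toy numbers.

```python
def site_pi(alt_count, n_chrom):
    """Expected pairwise differences at one biallelic site among n_chrom chromosomes."""
    if n_chrom < 2:
        raise ValueError("need at least two chromosomes")
    p = alt_count / n_chrom
    return 2 * p * (1 - p) * n_chrom / (n_chrom - 1)

def window_pi(alt_counts, n_chrom, window_length):
    """Nucleotide diversity per base: summed site diversity divided by window length."""
    return sum(site_pi(c, n_chrom) for c in alt_counts) / window_length

def hudson_fst(p1, n1, p2, n2):
    """Hudson-style FST for one SNP; p = allele frequency, n = sampled chromosomes."""
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    if den == 0:
        raise ValueError("FST is undefined when both populations are fixed for the same allele")
    return num / den

print(window_pi([2, 1], n_chrom=4, window_length=10))
print(hudson_fst(1.0, 10, 0.0, 10))  # prints 1.0 (fixed difference)
```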
This module introduces ImpRefIC, an intelligent software designed for genotype imputation in genomics research, for customizing subsets of reference panels to achieve higher imputation accuracy. According to the reference panel subset customized by ImpRefIC, users can select the predicted breed or population as the reference population in imputation scenario 1.
In the "Download" module, we provide species sample information, genome data, ENCODE peak information and chip version files for download, and mark the size of each file. By selecting the species name on the left side of the module, the downloadable files for that species will be displayed on the right side of the module.
In the "Submit" module, users can submit data links to our database.
LDBlockShow -InVCF ref.vcf.gz -Region [chrom]:[start]-[end] -OutPng -SeleVar 2 -OutPut test_LD_plot
java -jar beagle.22Jul22.46e.jar ref=ref.vcf.gz gt=target.vcf.gz chrom=[chrom]:[start]-[end] ne=100 out=target_imp
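When the Beagle command above is run for many regions, assembling it programmatically avoids quoting mistakes. The sketch below builds the same argument list in Python. It only constructs the list and does not launch Java; the function name and the example region are hypothetical, while the `ref=`, `gt=`, `chrom=`, `ne=`, and `out=` arguments are copied from the command above.

```python
def beagle_command(ref, target, region, out_prefix, ne=100,
                   jar="beagle.22Jul22.46e.jar"):
    """Return the argument list for the Beagle command shown above."""
    return [
        "java", "-jar", jar,
        f"ref={ref}",         # reference panel VCF
        f"gt={target}",       # target genotypes to impute
        f"chrom={region}",    # chrom:start-end interval
        f"ne={ne}",           # effective population size, 100 as in the command above
        f"out={out_prefix}",  # output file prefix
    ]

cmd = beagle_command("ref.vcf.gz", "target.vcf.gz", "1:1000000-2000000", "target_imp")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where Java and the Beagle jar are available.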
The target file is in VCF/BCF format.
GLIMPSE2_phase --input-gl target.vcf.gz --reference ref.vcf.gz --input-region [chrom]:[start]-[end] --output-region [chrom]:[start]-[end] --ne 100 --impute-reference-only-variants --output target_imp.bcf
The target file is in BAM/CRAM format.
GLIMPSE2_phase --bam-list target_BAM_list.txt --reference ref.vcf.gz --input-region [chrom]:[start]-[end] --output-region [chrom]:[start]-[end] --ne 100 --output target_imp.bcf --threads 20
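The two GLIMPSE2_phase invocations above differ mainly in how the target is supplied. The sketch below picks `--input-gl` or `--bam-list` from the target file name and otherwise mirrors the options shown above. It assumes the input and output regions are identical and that any target not ending in `.vcf`, `.vcf.gz`, or `.bcf` is a BAM/CRAM list file; the function name is hypothetical, and the command is only assembled, not executed.

```python
def glimpse2_phase_command(target, ref, region, out, ne=100, threads=None):
    """Assemble a GLIMPSE2_phase argument list mirroring the commands above."""
    cmd = ["GLIMPSE2_phase"]
    if target.endswith((".vcf", ".vcf.gz", ".bcf")):
        # Genotype-likelihood input, as in the VCF/BCF command above.
        cmd += ["--input-gl", target]
        extra = ["--impute-reference-only-variants"]
    else:
        # Assumption: anything else is a BAM/CRAM list file.
        cmd += ["--bam-list", target]
        extra = []
    cmd += ["--reference", ref,
            "--input-region", region,
            "--output-region", region,
            "--ne", str(ne)]
    cmd += extra + ["--output", out]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd

print(" ".join(glimpse2_phase_command(
    "target_BAM_list.txt", "ref.vcf.gz", "1:1000000-2000000",
    "target_imp.bcf", threads=20)))
```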
Pig Genotype Imputation Database
A1: You can quickly learn about the core functions of AGIDB through the Home page, and learn about each functional module of AGIDB in more detail through the Tutorial on the Help page.
A2: You can contact us by email in "Contact". You are welcome to contact us anytime.
A3: AGIDB is committed to collecting data for more species and increasing the genetic diversity represented for each species, enriching the imputation scenarios and applications in the database, and overcoming technical limitations of the imputation platform to deliver more practical and efficient imputation services.
A4: Kaili Zhang, Jiete Liang, Yuhua Fu, Jinyu Chu, Liangliang Fu, Yongfei Wang, Wangjiao Li, You Zhou, Jinhua Li, Xiaoxiao Yin, Haiyan Wang, Xiaolei Liu, Chunyan Mou, Chonglong Wang, Heng Wang, Xinxing Dong, Dawei Yan, Mei Yu, Shuhong Zhao, Xinyun Li, Yunlong Ma, AGIDB: a versatile database for genotype imputation and variant decoding across species, Nucleic Acids Research, Volume 52, Issue D1, 5 January 2024, Pages D835–D849, https://doi.org/10.1093/nar/gkad913
Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University Wuhan, Hubei, 430070, PR China
The mailing lists are in no particular order, sorted by first and last name.
Yunlong Ma (Yunlong.Ma@mail.hzau.edu.cn)
Kaili Zhang (kelly1153793935@163.com)