1. Introduction

This section gives a brief introduction to AGIDB. It describes what information AGIDB provides and how to cite AGIDB.

1.1 About AGIDB

AGIDB is an all-in-one imputation platform covering 35 animal species. It provides in-depth reference panels, dynamic haplotype visualization, and genotype imputation in four scenarios tailored to different data types. It also offers an ENCODE-annotated SNP database and a set of tools, including ImpRefIC, principal component analysis, selective sweep analysis, and data format conversion, all designed to support your research workflow. For swine genomic research, we invite you to explore our specialized Pig Imputation Platform (PGIDB) for a more focused experience.

1.2 How to cite AGIDB

Kaili Zhang, Jiete Liang, Yuhua Fu, Jinyu Chu, Liangliang Fu, Yongfei Wang, Wangjiao Li, You Zhou, Jinhua Li, Xiaoxiao Yin, Haiyan Wang, Xiaolei Liu, Chunyan Mou, Chonglong Wang, Heng Wang, Xinxing Dong, Dawei Yan, Mei Yu, Shuhong Zhao, Xinyun Li, Yunlong Ma, AGIDB: a versatile database for genotype imputation and variant decoding across species, Nucleic Acids Research, Volume 52, Issue D1, 5 January 2024, Pages D835–D849, https://doi.org/10.1093/nar/gkad913

2. Tutorial

This part provides the operation guide for each functional module of AGIDB. Important steps are marked with red boxes and serial numbers.

Quick Start

2.1 Reference Panel

This module provides basic information on 35 animals, including an introduction, a phylogenetic tree, and a sample information table for each. With it, users can gain insight into the samples and design optimal reference panel subsets for genotype imputation.

Step 1: Select animals of interest

Click on the picture of an animal. The introduction, phylogenetic tree and sample information are then presented below the animal picture.

Introduction
Phylogenetic tree
Sample information

The sample information table contains information such as Sample Id, Breed, Group, Sex, Tissue, Locality, Bases, Mapping Rate, Depth, Coverage, Project, etc.

2.2 Genotype Imputation

This module provides a free online imputation service, including four imputation scenarios:

1. Imputation from Chip to High-coverage sequencing.

2. Imputation from Low-coverage to High-coverage sequencing.

3. Imputation from Chip to Chip, to realize map conversion between chips.

4. Imputation of the specified map.

Step 1: Select species of interest
Step 2: Four imputation scenarios

2.2.1 Chip-to-sequence

This module imputes chip data to sequencing data using Beagle 5.4. The target data set can be uploaded in VCF or compressed VCF format, and the imputed or phased file is automatically offered for download in the browser.

Step 1: Select species of interest
Step 2: Imputation of specified species

Imputation steps: submit the target file to be imputed (the file size must be less than 50 MB); enter the genomic region of interest; select the default reference panel or a subset of the reference panel; specify the phasing and imputation requirements; click the "Submit" button. Users can also click the "example" button to try the imputation function with the example file.

The imputation result file is downloaded directly to the user's local machine.

The reference panel defaults to "AGIDB", that is, all samples of this species provided by the database. Users can also select subsets of the reference panel by breed or group as desired.
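Before uploading, it can help to check the target file locally. The following is a minimal sketch, not part of AGIDB itself: the function names are hypothetical, "50 MB" is assumed to mean 50 * 1024 * 1024 bytes, and the region is assumed to use the chrom:start-end form that appears in the commands in section 2.16.

```shell
#!/bin/sh
# Pre-upload checks for a Chip-to-sequence job (hypothetical helper names).

# Assumption: the 50 MB limit is interpreted as 50 * 1024 * 1024 bytes.
MAX_BYTES=$((50 * 1024 * 1024))

# Succeeds if the file exists and is smaller than the upload limit.
check_size() {
    [ -f "$1" ] || return 1
    size=$(($(wc -c < "$1")))        # arithmetic expansion strips padding
    [ "$size" -lt "$MAX_BYTES" ]
}

# Succeeds if the region looks like chrom:start-end with start <= end.
check_region() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9_.]+:[0-9]+-[0-9]+$' || return 1
    start=${1#*:}                    # drop "chrom:"
    start=${start%-*}                # drop "-end"
    end=${1##*-}                     # keep the part after the last "-"
    [ "$start" -le "$end" ]
}
```

For example, `check_size target.vcf.gz && check_region "1:1000000-2000000" && echo "ready to submit"` prints the message only when both checks pass (target.vcf.gz is a placeholder file name).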

2.2.2 Low-coverage to High-coverage sequence

This module uses GLIMPSE2 to impute low-depth sequencing data to high-depth sequencing data. Two kinds of target files can be uploaded (VCF/BCF or BAM/CRAM), and the resulting file is automatically offered for download in the browser.

Scenario 1: The target file is in VCF/BCF format.

Note: Users can click "example.vcf.gz" to download the example VCF file provided for each species.

The imputation result file is downloaded directly to the user's local machine.

Scenario 2: The target file is in BAM/CRAM format.

This module supports uploading BAM/CRAM files for a single sample or for multiple samples at once. When uploading files for multiple samples, a BAM/CRAM list file must be submitted together with them. For the format of the list file, see "List example": it contains one file per line, and an optional second column (space-separated) specifies the sample name; otherwise the file name is used as the sample name.
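A list file in the format described above can be written with a few lines of shell. This is only a sketch: `make_bam_list` is a hypothetical helper, and deriving the sample name from the file name without its extension is an assumption made for illustration. The "List example" on the site remains the authoritative format.

```shell
#!/bin/sh
# Write a BAM/CRAM list file: one file per line, with a space-separated
# second column giving the sample name.
# Usage: make_bam_list OUTPUT FILE [FILE ...]
make_bam_list() {
    out=$1
    shift
    : > "$out"                         # start with an empty list file
    for f in "$@"; do
        sample=$(basename "$f")
        sample=${sample%.*}            # assumption: sample name = file name without extension
        printf '%s %s\n' "$f" "$sample" >> "$out"
    done
}
```

Calling `make_bam_list target_BAM_list.txt sampleA.bam sampleB.cram` writes the lines `sampleA.bam sampleA` and `sampleB.cram sampleB`. Because the columns are space-separated, file paths containing spaces would not fit this format.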

The imputation guide is shown in the figure below:

The imputation result file is downloaded directly to the user's local machine.

2.2.3 Chip to Chip

This module converts data between different chip versions. By selecting the chip version of interest, users obtain a VCF file for the specified map. We also provide a Venn diagram of the overlap between versions for reference.

Step 1: Upload a VCF file to be converted or click the "example" button to use an example file
Step 2: Choose a chip version
Step 3: Enter a genomic region of interest
Step 4: Enter the effective population size
Step 5: Click the "Submit" button to perform chip version conversion
Step 6: Download the converted file

2.2.4 To Map

This module provides an imputation function for a specified map. By submitting a Map file, the user can obtain the VCF file corresponding to the map.

Step 1: Upload a Map file or click the "example" button to use an example file
Step 2: Upload a VCF file or click the "example" button to use an example file
Step 3: Enter a genomic region of interest
Step 4: Enter the effective population size
Step 5: Click the "Submit" button to get the imputation result of the specified map
Step 6: Download

2.3 Breed-specific Variant

This module provides a search for breed-specific variants. Users can easily query breed-specific variants by breed and genomic region. Search results are displayed in a comprehensive table.

Step 1: Breed-specific Variant Search
Step 2: Breed-specific Variant
Step 3: Allele Frequency Distribution

2.4 Conserved SNPs

This module provides a search for conserved SNPs. Users can easily query conserved SNPs by breed. Search results are displayed in a comprehensive table.

Step 1: Conserved SNPs Search
Step 2: Conserved SNPs

2.5 Deleterious Variant

This module provides a search for deleterious variants. Users can easily query deleterious variants by breed, prediction software and score. Search results are displayed in a comprehensive table.

Step 1: Deleterious Variant Search
Step 2: Deleterious Variant
Step 3: Deleterious Variant Score

2.6 ENCODE-annotated SNPs

This module provides an SNP search across 35 species and user-selected genomic regions. For each SNP, the allele frequency and annotation are provided, with the ENCODE peak annotation integrated. Select your species and genomic region of interest, and detailed SNP data is presented. We also provide genome region files for each species to help users enter valid genomic regions.

The genomic region files available for download are as follows:

Step 1: Select species of interest
Step 2: Enter a specific genomic region
Step 3: Click the "Search" button

Each column of the table can be sorted in ascending or descending order, which helps users find the most relevant SNPs.

The content of any field in the table can be searched through the search box on the right.

The figure below the table shows the Peak and gene information of this region more intuitively. The abscissa represents the SNP position, and the ordinate represents the fold change. When the mouse hovers over a Peak, its "Chromosome", "Peak Start" and "Peak End" information is displayed. Slide along the abscissa to change the displayed "position".

2.7 Genome Browser

This module links to a genome browser, which enables users to visualize and browse entire genomes with annotated data, including genomes, genes, and variants. Users can view numerous features and study the relationships between genomic elements.

Step 1: Select species of interest
Step 2: Display the genome-wide visualization results of this species

Select the chromosome of interest in the left track to display the corresponding variant information.

2.8 Haplotype Block

This module provides haplotype block search and LD heatmap visualization. Users can easily query haplotype blocks based on species and genomic regions. Search results are displayed in a comprehensive table.

Step 1: Haplotype Block Search
Step 2: Haplotype and frequency

Click on a BLOCK ID in the second column to display its frequency data. Click the "Close" button to close the popup.

Step 3: LD Heatmap

Click the "LD plot" button to obtain the LD heatmap of the selected species and the specified genomic region.

2.9 Rare Variant

This module provides a search for rare variants. Users can easily query rare variants by species, variant, allele frequency (AF), and genomic region. Search results are displayed in a comprehensive table.

Step 1: Rare Variant Search
Step 2: Rare Variant

2.10 PCA

This module provides principal component analysis (PCA). The user selects a species and submits a compressed VCF file. After the user clicks the submit button, the PCA is executed on the submitted data together with the reference panel of that species in the database, and the results can be used in subsequent analysis. The PCA plot can be downloaded.

Step 1: Select a species for PCA analysis
Step 2: Submit a target compressed VCF file or click the "example" button to use an example file
Step 3: Choose the principal component
Step 4: Click the "Submit" button to perform PCA analysis
Step 5: PCA results

The PCA plot is available for download, and the black dots represent samples uploaded by users.

2.11 Format Conversion

This module converts between seven genome data formats. We also provide a downloadable example file for each data format, so that users can understand the formats more intuitively.

Download example files

Key steps in genomic data format conversion:

The converted file is automatically downloaded to the user's local machine.

2.12 Selective Sweep Analysis

This module provides selective sweep analysis. Users can easily query Tajima's D, Pi and FST based on species, gene ID, gene name and genomic regions. Online analysis is also available. Search results are displayed in a comprehensive table.

Step 1: Select Visualization Modules And Online Analysis of Tajima's D, Pi And FST
Step 2: Tajima's D, Pi and FST Search
Tajima's D
Pi
FST
Step 3: Tajima's D, Pi and FST Result
Tajima's D
Pi
FST
Step 4: FST Heatmap
Step 5: Tajima's D, Pi and FST Online Analysis

2.13 ImpRefIC

This module introduces ImpRefIC, an intelligent software designed for genotype imputation in genomics research, for customizing subsets of reference panels to achieve higher imputation accuracy. According to the reference panel subset customized by ImpRefIC, users can select the predicted breed or population as the reference population in imputation scenario 1.

2.14 Download

In the "Download" module, we provide species sample information, genome data, ENCODE peak information and chip version files for download, and mark the size of each file. By selecting the species name on the left side of the module, the downloadable files for that species will be displayed on the right side of the module.

2.15 Submit

In the "Submit" module, users can submit a link to their data for inclusion in our database.

2.16 Pipeline

LD plot

LDBlockShow -InVCF ref.vcf.gz -Region [chrom]:[start]-[end] -OutPng -SeleVar 2 -OutPut test_LD_plot

Imputation tools, version and usage
Beagle 5.4

java -jar beagle.22Jul22.46e.jar ref=ref.vcf.gz gt=target.vcf.gz chrom=[chrom]:[start]-[end] ne=100 out=target_imp

GLIMPSE2

The target file is in VCF/BCF format.

GLIMPSE2_phase --input-gl target.vcf.gz --reference ref.vcf.gz --input-region [chrom]:[start]-[end] --output-region [chrom]:[start]-[end] --ne 100 --impute-reference-only-variants --output target_imp.bcf

The target file is in BAM/CRAM format.

GLIMPSE2_phase --bam-list target_BAM_list.txt --reference ref.vcf.gz --input-region [chrom]:[start]-[end] --output-region [chrom]:[start]-[end] --ne 100 --output target_imp.bcf --threads 20
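When several regions need to be imputed, the Beagle command above can be generated once per region. The sketch below only prints the commands so they can be reviewed before running. `beagle_cmds` and the per-region output prefix are hypothetical, and the options are copied from the Beagle command shown above.

```shell
#!/bin/sh
# Print (do not run) one Beagle command per region read from standard input.
# Each input line is a region in chrom:start-end form.
# Usage: beagle_cmds REF_VCF TARGET_VCF < regions.txt
beagle_cmds() {
    ref=$1
    target=$2
    while IFS= read -r region; do
        [ -n "$region" ] || continue                   # skip blank lines
        # Turn "1:1000-2000" into "1_1000_2000" for a file-name-safe label.
        tag=$(printf '%s' "$region" | tr ':-' '__')
        printf 'java -jar beagle.22Jul22.46e.jar ref=%s gt=%s chrom=%s ne=100 out=target_imp_%s\n' \
            "$ref" "$target" "$region" "$tag"
    done
}
```

For example, `printf '1:1-5000000\n2:1-5000000\n' | beagle_cmds ref.vcf.gz target.vcf.gz` prints two commands, one per region. Piping the output to `sh` would execute them, provided Java and the Beagle jar are available locally.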

2.17 About PGIDB

Pig Genotype Imputation Database

Quick Start

3. FAQs

Q1. How to use AGIDB?

A1: You can quickly learn about the core functions of AGIDB through the Home page, and learn about each functional module of AGIDB in more detail through the Tutorial on the Help page.

Q2. How should I contact you if I find a bug or have a suggestion about the database?

A2: You can contact us at the email addresses listed under "Contact". You are welcome to contact us at any time.

Q3. What is the future plan of AGIDB?

A3: AGIDB is committed to collecting data for more species and increasing the genetic diversity represented for each species, to enriching the imputation scenarios and applications in the database, and to overcoming technical limitations of the imputation platform in order to provide more practical and efficient imputation services.

Q4. How to cite AGIDB?

A4: Kaili Zhang, Jiete Liang, Yuhua Fu, Jinyu Chu, Liangliang Fu, Yongfei Wang, Wangjiao Li, You Zhou, Jinhua Li, Xiaoxiao Yin, Haiyan Wang, Xiaolei Liu, Chunyan Mou, Chonglong Wang, Heng Wang, Xinxing Dong, Dawei Yan, Mei Yu, Shuhong Zhao, Xinyun Li, Yunlong Ma, AGIDB: a versatile database for genotype imputation and variant decoding across species, Nucleic Acids Research, Volume 52, Issue D1, 5 January 2024, Pages D835–D849, https://doi.org/10.1093/nar/gkad913

4. Updates

  • 2024/1/5
  • AGIDB officially went online.
  • 2023/10/27
  • AGIDB manuscript was online. Links to the article
  • 2023/8/6
  • Updated "Imputation" and "Search" modules.
  • 2023/7/30
  • Added selective sweep analysis module.
  • 2023/6/4
  • AGIDB and PGIDB entered trial and testing.
  • 2023/5/4
  • AGIDB and PGIDB were released to the Internet environment.
  • 2023/3/9
  • Update Beagle to the latest version, and update the low-depth sequencing imputation software GLIMPSE1 to GLIMPSE2.
  • 2023/1/18
  • The SNP retrieval function is updated, and the SNP annotation is enriched by using the peak information of ENCODE.
  • 2022/12/2
  • The genome browser is bound to realize the visualization of genome data.
  • 2022/10/17
  • Complete the PCA analysis module and the genome data format conversion module.
  • 2022/9/23
  • Development of ImpRefIC, a software that can customize an optimal reference set for genotype imputation to improve imputation accuracy and speed up imputation.
  • 2022/7/1
  • Customized SNP retrieval and imputation module for pig, designed pig imputation platform (PGIDB).
  • 2022/5/20
  • The software used for low-depth sequencing imputation was determined to be GLIMPSE.
  • 2022/4/10
  • Update genotype imputation scenarios to meet individual imputation needs.
  • 2022/3/25
  • Updated WGS data for 35 species.
  • 2022/2/15
  • Complete the imputation module.
  • 2021/12/28
  • Determine the imputation software and imputation evaluation scheme, and design the imputation module.
  • 2021/12/16
  • Integrate the haplotype analysis results into AGIDB in the form of haplotype modules to realize the retrieval and visualization of haplotype blocks.
  • 2021/11/8
  • Complete data cleaning and phasing, and determine the haplotype analysis scheme.
  • 2021/10/20
  • Whole-genome sequencing (WGS) data for 35 species were collated and cleaned.
  • 2021/6/7
  • Complete the framework design of Animal Genotype Imputation Database (AGIDB).

5. Contact

Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei, 430070, PR China

Email:

The mailing lists are in no particular order, sorted by first and last name.

Yunlong Ma (Yunlong.Ma@mail.hzau.edu.cn)

Kaili Zhang (kelly1153793935@163.com)