Identification and evaluation of causative genetic variants corresponding to a certain phenotype

Xidan Li


Outline

• SIT - identify and evaluate the causative genetic variants within a QTL/GWAS defined region.

• PASE - evaluate the effect of amino acid substitutions on the function of the host protein

• DIPT - identify causative genes underlying an expression phenotype

• Parallel computing


Identification of genetic variants


Possible solutions?


Working process of SIT

• Input: VCF file
• SNP analysis in non-coding regions: splicing sites, CpG islands, UTR regions
• SNP analysis in coding regions: non-synonymous SNPs, evaluated with PASE to give a ranked list of non-synonymous SNPs
• Result: candidate genes with candidate SNPs
• Ensembl (genome annotation)
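SIT starts from a VCF file and works within a QTL/GWAS-defined region. The C sketch below illustrates only that first filtering step, assuming a standard tab-separated VCF data line whose first columns are CHROM and POS. The `Region` type and the `in_region` function are hypothetical illustrations, not part of SIT.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical QTL/GWAS interval: chromosome name plus inclusive bounds. */
typedef struct {
    const char *chrom;
    long start;
    long end;
} Region;

/* Returns 1 if the VCF data line falls inside the region, 0 otherwise.
   Header lines (starting with '#') are skipped. Only the first two
   tab-separated columns, CHROM and POS, are inspected. */
int in_region(const char *vcf_line, const Region *r)
{
    if (vcf_line[0] == '#')
        return 0;

    const char *tab = strchr(vcf_line, '\t');
    if (tab == NULL)
        return 0;

    /* Compare the CHROM column with the region's chromosome name. */
    size_t chrom_len = (size_t)(tab - vcf_line);
    if (strlen(r->chrom) != chrom_len ||
        strncmp(vcf_line, r->chrom, chrom_len) != 0)
        return 0;

    /* Parse the POS column that follows the first tab. */
    char *end;
    long pos = strtol(tab + 1, &end, 10);
    if (end == tab + 1)   /* POS is not a number */
        return 0;

    return pos >= r->start && pos <= r->end;
}
```

Lines that pass this filter would then go on to the coding and non-coding analyses listed above.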


Sample results


Non-synonymous SNPs are ranked
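The ranking step amounts to sorting candidate SNPs by a score. A minimal C sketch, assuming each non-synonymous SNP already carries a single numeric score (for example a combined PASEC-style score); the `SnpScore` type and the `rank_snps` function are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical record for one non-synonymous SNP and its score. */
typedef struct {
    const char *gene;
    int position;     /* amino-acid position */
    char ref, alt;    /* one-letter amino-acid codes */
    double score;     /* e.g. a combined PASEC-style score */
} SnpScore;

/* qsort comparator: higher scores sort first. */
static int by_score_desc(const void *a, const void *b)
{
    double sa = ((const SnpScore *)a)->score;
    double sb = ((const SnpScore *)b)->score;
    return (sa < sb) - (sa > sb);
}

/* Sorts the array in place so that index 0 holds the top-ranked SNP. */
void rank_snps(SnpScore *snps, size_t n)
{
    qsort(snps, n, sizeof *snps, by_score_desc);
}
```

The comparator returns -1, 0 or 1 without subtracting doubles, which avoids truncation and overflow problems in the comparison.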


Life is easy!


Predicting the effects of amino acid substitutions


Effect of amino acid substitutions


Seven selected physico-chemical properties of amino acids

• Transfer free energy from octanol to water
• Normalized van der Waals volume
• Isoelectric point
• Polarity
• Normalized frequency of alpha-helix
• Free energy of solution in water
• Normalized frequency of turn
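The slides list the properties but not how PASE turns them into a score. As a hedged illustration only, the sketch below scores a substitution by the mean absolute change across several properties, each divided by that property's range over the 20 amino acids so that properties measured in different units contribute comparably. The function name, the range normalization and the averaging are assumptions, not PASE's method, and the caller must supply real property values.

```c
#include <stddef.h>

/* Mean range-normalized absolute change over n_props properties.
   ref[k], alt[k]: value of property k for the reference and the
   substituted amino acid; range[k]: max minus min of property k over
   the 20 amino acids (must be > 0). The result lies in [0, 1] when the
   values fall within the given ranges. */
double property_change_score(const double *ref, const double *alt,
                             const double *range, size_t n_props)
{
    double sum = 0.0;
    for (size_t k = 0; k < n_props; k++) {
        double d = alt[k] - ref[k];
        if (d < 0.0)
            d = -d;              /* absolute change */
        sum += d / range[k];     /* scale to the property's range */
    }
    return n_props > 0 ? sum / (double)n_props : 0.0;
}
```

With seven properties, `ref`, `alt` and `range` would each hold seven values, one per property in the list above.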


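Formula for conservation calculation

$1 - 0.95^{N}$: probability that a given amino acid appears at a position at least once among N random sequences in which all 20 amino acids are equally frequent (each with probability $1/20 = 0.05$, so $0.95^{N}$ is the probability that it never appears)

$n_{\text{observed}} / N_{\text{total}}$

Conservation score: $(1 - 0.95^{N}) \times \dfrac{n_{\text{observed}}}{N_{\text{total}}}$, computed on homologous sequences found by a BLAST search and aligned with ClustalW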
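The formula can be evaluated for a single alignment column as sketched below in C. Assumptions, not stated on the slide: N and N_total are both the number of aligned sequences, and n_observed counts sequences carrying the reference residue (gap characters simply count as non-matching). The function names are hypothetical. Note that the factor $1 - 0.95^{N}$ grows toward 1 as N increases (about 0.05 for N = 1, 0.40 for N = 10, 0.92 for N = 50).

```c
#include <stddef.h>
#include <string.h>

/* Weight 1 - 0.95^N, computed by repeated multiplication so that the
   code does not need to link against the math library. */
static double depth_weight(size_t n_seqs)
{
    double p = 1.0;
    for (size_t i = 0; i < n_seqs; i++)
        p *= 0.95;
    return 1.0 - p;
}

/* Conservation score (1 - 0.95^N) * (n_observed / N_total) for one
   alignment column, given as one residue character per sequence.
   Assumes N = N_total = strlen(column) and that n_observed counts
   residues equal to ref_residue. */
double conservation_score(const char *column, char ref_residue)
{
    size_t n_total = strlen(column);
    if (n_total == 0)
        return 0.0;

    size_t n_observed = 0;
    for (size_t i = 0; i < n_total; i++)
        if (column[i] == ref_residue)
            n_observed++;

    return depth_weight(n_total) * ((double)n_observed / (double)n_total);
}
```

For example, a column of ten residues in which nine match the reference gives roughly 0.40 × 0.9 ≈ 0.36.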


Protein kinase AMP-activated gamma 3 (PRKAG3) gene

• R200Q in AMPKγ3 in purebred Hampshire pigs: the RN mutation, which causes excess glycogen content in pig skeletal muscle

• V199I in AMPKγ3: contributes to the effect together with R200Q

• Milan D, et al. (2000). A mutation in PRKAG3 associated with excess glycogen content in pig skeletal muscle. Science 288(5469): 1248–1251.

• Ciobanu D, et al. (2001). Evidence for New Alleles in the Protein Kinase Adenosine Monophosphate-Activated γ3-Subunit Gene Associated With Low Glycogen Content in Pig Skeletal Muscle and Improved Meat Quality. Genetics 159: 1151–1162.


Gene      Position  REF  ALT  MSAC  PASE  PASEC
PRKAG_3   200       R    Q    0.93  0.54  0.50
PRKAG_3   199       V    I    0.85  0.14  0.12

Position: amino-acid position in the protein; MSAC: conservation score; PASE: PASE score; PASEC: combined score

R200Q causes a major increase in muscle glycogen content; V199I contributes a smaller effect.

Ciobanu D, et al. (2001). Evidence for New Alleles in the Protein Kinase Adenosine Monophosphate-Activated γ3-Subunit Gene Associated With Low Glycogen Content in Pig Skeletal Muscle and Improved Meat Quality. Genetics 159: 1151–1162.
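The slides do not state how PASEC combines the two scores, but the PRKAG3 values are consistent with a simple product of the conservation score and the PASE score (0.93 × 0.54 ≈ 0.50; 0.85 × 0.14 ≈ 0.12). Treating that as an assumption, the combined score could be sketched in C as follows; `pasec_score` is a hypothetical name.

```c
/* Assumed combination: PASEC = MSAC * PASE (inferred from the table,
   not confirmed by the source). With both inputs in [0, 1], the result
   is also in [0, 1] and is high only when the site is conserved AND the
   substitution changes the physico-chemical properties. */
double pasec_score(double msac, double pase)
{
    return msac * pase;
}
```

A product rewards substitutions that score high on both axes, which matches R200Q (major effect) ranking well above V199I (smaller effect).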


Testing with SIFT and PolyPhen

Prediction (count)                  MSAC   PASE   PASEC
SIFT: Tolerated (1987)              0.47   0.39   0.18
SIFT: Deleterious (1351)            0.60   0.51   0.30
PolyPhen: Benign (1637)             0.44   0.37   0.16
PolyPhen: Possibly damaging (539)   0.56   0.43   0.24
PolyPhen: Probably damaging (1162)  0.63   0.53   0.33

MSAC: conservation score; PASE: physico-chemical property change score; PASEC: combined score


Features

• Other tools (SIFT, PolyPhen): rely mainly on sequence conservation scores, which require finding homologous sequences.

• PASE: combines a physico-chemical property change score with a sequence conservation score, and can potentially analyze evolutionarily distant protein sequences.


From expression phenotype to associated genotype


Sample result of DIPT


www.computationalgenetics.se/DIPT/


Parallel computing


Principle of parallel computing


Multiple threads – efficient work


Single thread - tough job!


• Parallelizable work is usually found in loops

• The data handled by different iterations must be independent
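The two conditions can be illustrated in C. The sketch below uses OpenMP's `parallel for` pragma as one common way to spread a loop over CPU cores; the slides do not say which mechanism was used, and the function names are hypothetical. Compiled without `-fopenmp`, the pragma is ignored and the loop simply runs serially with the same result.

```c
/* Independent iterations: out[i] depends only on in[i], so the
   iterations can run in any order or at the same time. With OpenMP
   enabled, the pragma splits the loop across threads. */
void square_all(const double *in, double *out, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        out[i] = in[i] * in[i];
}

/* Dependent iterations: out[i] needs the running total from step
   i - 1, so this loop cannot be split across threads with a plain
   parallel-for (it would need a parallel scan algorithm instead). */
void running_sum(const double *in, double *out, int n)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++) {
        acc += in[i];
        out[i] = acc;
    }
}
```

The first loop meets both conditions on the slide; the second is a loop but fails the independence condition.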


GPU vs. CPU


CUDA vs. C

CUDA:

#include <cuda.h>
#include <stdio.h>

// Prototypes
__global__ void helloWorld(char*);

// Host function
int main(int argc, char** argv)
{
    int i;

    // desired output
    char str[] = "Hello World!";

    // mangle contents of output; the null character is left intact for simplicity
    for (i = 0; i < 12; i++)
        str[i] -= i;

    // allocate memory on the device
    char *d_str;
    size_t size = sizeof(str);
    cudaMalloc((void**)&d_str, size);

    // copy the string to the device
    cudaMemcpy(d_str, str, size, cudaMemcpyHostToDevice);

    // set the grid and block sizes
    dim3 dimGrid(2);   // one block per word
    dim3 dimBlock(6);  // one thread per character

    // invoke the kernel
    helloWorld<<< dimGrid, dimBlock >>>(d_str);

    // retrieve the results from the device (cudaMemcpy waits for the kernel to finish)
    cudaMemcpy(str, d_str, size, cudaMemcpyDeviceToHost);

    // free up the allocated memory on the device
    cudaFree(d_str);

    // everyone's favorite part
    printf("%s\n", str);
    return 0;
}

// Device kernel
__global__ void helloWorld(char* str)
{
    // determine where in the thread grid we are
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // unmangle output
    str[idx] += idx;
}

C:

#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}
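To connect the two programs, the kernel's thread indexing can be emulated serially in plain C. The function below is a hypothetical illustration, not part of the original slides: two nested loops stand in for the grid of 2 blocks × 6 threads, and `idx` is computed exactly as in the kernel.

```c
/* Serial emulation of the helloWorld kernel launch: block_idx plays the
   role of blockIdx.x (0 .. grid_dim - 1) and thread_idx the role of
   threadIdx.x (0 .. block_dim - 1). On a GPU these iterations run as
   separate threads; running them one after another is valid only
   because each idx touches a different character. */
void hello_world_serial(char *str, int grid_dim, int block_dim)
{
    for (int block_idx = 0; block_idx < grid_dim; block_idx++) {
        for (int thread_idx = 0; thread_idx < block_dim; thread_idx++) {
            int idx = block_idx * block_dim + thread_idx;
            str[idx] += idx;   /* unmangle, as in the kernel */
        }
    }
}
```

Called on the mangled string with `grid_dim = 2` and `block_dim = 6`, it restores "Hello World!" using the same 12 index values that the GPU threads use.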


Thank You!