nit_lab

8/7/2019 nit_lab


Experiment No: 1

AIM: To browse human genome data and the OMIM and SNP databases to understand genetic and metabolic disorders.

REQUIREMENT: Computer system (with legal software) equipped with an Internet connection, preferably fast broadband.

WEB RESOURCES USED:

http://www.ncbi.nlm.nih.gov/

THEORY AND PRINCIPLE:

Genomes: A genome is all of a living thing's genetic material. It is the entire set of hereditary instructions for building, running, and maintaining an organism, and for passing life on to the next generation. In short, it is the complete set of chromosomes, with all their genes, for that species (for diploid organisms it is often given as a haploid genome).

Genomic Resources: The genomic resources of organisms are stored in genome databases, which are collections of complete and incomplete large-scale sequencing, assembly, annotation and mapping projects for cellular organisms. The Genome database provides views for a variety of genomes, complete chromosomes, sequence maps and integrated genetic and physical maps, organelles and plasmids, as well as genome assemblies.

Human Genome: It is the complete genetic information contained in the 22 pairs of autosomes and the pair of sex chromosomes, X and Y. It includes the location and sequence of the genes along the length of each chromosome, the distance between adjacent genes, and the entire nucleotide sequence of each gene (with its allelic forms) across the full complement of 46 chromosomes.

OMIM Databases: OMIM is a comprehensive, authoritative, and timely compendium of human genes and genetic phenotypes. The full-text, referenced overviews in OMIM contain information on all known Mendelian disorders and over 12,000 genes. OMIM focuses on the relationship between phenotype and genotype. It is updated daily, and the entries contain copious links to other genetics resources.

SNP Databases: A single nucleotide polymorphism (SNP) is a form of genomic variation within a population that may occur anywhere in the genome. SNPs are point mutations, i.e., single-base alterations that distinguish alleles. The International SNP Map Working Group is a consortium whose data on the major and minor alleles at each SNP site are stored in public databases. SNPs are used as genetic markers.
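As a small illustration of what "major" and "minor" allele mean, the sketch below derives allele frequencies from diploid genotype counts at a single biallelic SNP. The function name and the genotype counts are hypothetical; real SNP records report frequencies computed from specific study populations.

```python
def allele_frequencies(genotype_counts):
    """Return (major, minor, minor_allele_frequency) for one biallelic SNP.

    genotype_counts maps a two-letter diploid genotype such as "AG" to the
    number of individuals carrying it; each individual contributes two alleles.
    """
    allele_counts = {}
    for genotype, n in genotype_counts.items():
        for allele in genotype:  # count both alleles of the diploid genotype
            allele_counts[allele] = allele_counts.get(allele, 0) + n
    if len(allele_counts) != 2:
        raise ValueError("expected exactly two alleles at a biallelic SNP")
    total = sum(allele_counts.values())
    major, minor = sorted(allele_counts, key=allele_counts.get, reverse=True)
    return major, minor, allele_counts[minor] / total


# Hypothetical genotype counts from 100 individuals
print(allele_frequencies({"AA": 60, "AG": 30, "GG": 10}))  # ('A', 'G', 0.25)
```

Here 200 alleles are counted in total, 50 of which are G, so G is the minor allele with a frequency of 0.25.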

Genetic Disorders: These are disorders of gene structure that cause the gene to malfunction, thereby affecting the phenotype associated with that gene (its phenotypic expression). Genetic disorders are studied through their inheritance patterns and through gene structure and function.

Metabolic Disorders: Metabolic disorders are disorders of metabolism. They may be inherited (inborn errors of metabolism) or may arise from infection and other causes. Many of them can be treated with drugs.


PROCEDURE:

1. Start the computer and establish an Internet connection.

2. Use a search engine such as Yahoo or Google, or directly open the NCBI web page.

3. Click on the genome-specific resources and browse for specific data.


Experiment No: 2

AIM: To predict gene and promoter sequence.

REQUIREMENT: Computer system (with legal software) equipped with an Internet connection, preferably fast broadband.

WEB RESOURCES USED:

http://genes.mit.edu/GENSCAN.html
http://www-bimas.cit.nih.gov/molbio/proscan/


How are encoded proteins recognized in uncharacterized eukaryotic genomic DNA? Translating from every start codon to every stop (nonsense, chain-terminating) codon in every reading frame produces a list of ORFs (open reading frames), but which of them, if any, actually code for proteins? Moreover, this works only in organisms without exons and introns, or in processed mRNAs. Three general solutions to the gene-finding problem can be imagined:

1. All genes have certain regulatory signals positioned in or about them.

2. All genes by definition contain specific code patterns.

3. Many genes have already been sequenced and recognized in other organisms, so we can infer function and location by homology if our new sequence is similar enough to an existing sequence.
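The ORF-listing step described above can be sketched in Python: scan all six reading frames (three on each strand) and report every stretch from an ATG start codon to the first in-frame stop codon (TAA, TAG or TGA). The function names and the minimum-length cut-off are illustrative assumptions; like the method it illustrates, this naive scan ignores introns and cannot tell which ORFs are real genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Yield (strand, frame, start, end, orf) for ATG...stop ORFs in all six frames.

    start/end are 0-based, end-exclusive coordinates on the strand that was
    scanned; orf includes the stop codon; min_codons excludes the stop codon.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        yield strand, frame, start, i + 3, s[start:i + 3]
                    start = None  # look for the next ATG in this frame


for orf in find_orfs("CCATGAAATTTTAGCC"):
    print(orf)  # ('+', 2, 2, 14, 'ATGAAATTTTAG')
```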

All of these principles can be used to help locate the position of genes in DNA; they are often known as searching by signal, searching by content, and homology inference, respectively. Homology inference can be especially helpful, but it is of little use when no similar proteins exist in the databases, and even when homologues can be found, discovering exon-intron borders and UTRs (5' and 3' untranslated regions) can be very difficult. If cDNA is available, it can be aligned to the genomic sequence to ascertain where the genes lie, but even this can be quite difficult, and cDNA libraries are not always available. No one method is absolutely reliable. The only method that would be 100% certain is to know the complete amino acid sequence of the protein of interest and simply translate all of the DNA until the correct pieces fall out, but one seldom has that luxury. Since we usually have to discover where these pieces are, especially in genomic DNA, computerized analysis becomes essential.

GENE PREDICTION

Gene finding typically refers to the area of computational biology concerned with algorithmically identifying stretches of sequence, usually genomic DNA, that are biologically functional. This especially includes protein-coding genes, but may also include other functional elements such as RNA genes and regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced. In its earliest days, "gene finding" was based on painstaking experimentation on living cells and organisms. Statistical analysis of the rates of homologous recombination of several different genes could determine their order on a certain chromosome, and information from many such experiments could be combined to create a genetic map specifying the rough location of known genes relative to each other. Today, with comprehensive genome sequence and powerful computational resources at the disposal of the research community, gene finding has been redefined as a largely computational problem. Determining that a sequence is functional should be distinguished from determining the function of the gene or its product. The latter still demands in vivo experimentation through gene knockout and other assays, although frontiers of bioinformatics research are making it increasingly possible to predict the function of a gene based on its sequence alone.

PROMOTER PREDICTION

In genetics, a promoter is a region of DNA that facilitates the transcription of a particular gene. Promoters are typically located near the genes they regulate, on the same strand and upstream (towards the 5' region of the sense strand). For transcription to take place, the enzyme that synthesizes RNA, known as RNA polymerase, must attach to the DNA near a gene. Promoters contain specific DNA sequences and response elements which provide a secure initial binding site for RNA polymerase and for proteins called transcription factors that recruit RNA polymerase. These transcription factors have specific activator or repressor sequences of corresponding nucleotides that attach to specific promoters and regulate gene expression.
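Promoter predictors such as PROSCAN score known promoter elements. As a much simpler "search by signal" sketch, the code below scans the region upstream of a putative transcription start site for a TATA-box-like motif, using the commonly cited consensus TATAWAW (W = A or T) written as a regular expression. The consensus choice, the 100 bp window and the function name are assumptions for illustration; real promoter predictors use weight matrices and many more elements.

```python
import re

# TATA-box consensus TATAWAW, where W means A or T (IUPAC code).
# The lookahead lets overlapping matches be reported.
TATA_BOX = re.compile(r"(?=(TATA[AT]A[AT]))")


def tata_candidates(seq, tss, window=100):
    """Return 0-based positions of TATA-like motifs in the `window` bp upstream of tss."""
    start = max(0, tss - window)
    upstream = seq[start:tss].upper()
    return [start + m.start() for m in TATA_BOX.finditer(upstream)]


# Hypothetical sequence with an assumed start site at index 18
print(tata_candidates("GGGGTATAAAAGGGGGCCATG", 18))  # [4]
```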

PROCEDURE:

1. Start the computer and establish an Internet connection.

2. Use a search engine such as Yahoo or Google, or directly open the websites given above, to access these tools.

3. Input your sequence data in the proper format and note down the results obtained.

Experiment No: 3

AIM: To perform multiple sequence alignment and phylogenetic analysis.

REQUIREMENT: Computer system (with legal software) equipped with an Internet connection, preferably fast broadband.

WEB RESOURCES USED:

http://blast.ncbi.nlm.nih.gov/Blast.cgi
http://www.ebi.ac.uk/Tools/msa/clustalw2/


Given the nucleotide or amino acid sequence of a biological molecule, what can we know about that molecule? We can find biologically relevant information in sequences by searching for particular patterns that may reflect some function of the molecule. These can be catalogued motifs and domains, secondary structure predictions, physical attributes such as hydrophobicity, or even the content of the DNA itself, as in some of the gene-finding techniques.


But what about comparisons with other sequences? Can we learn about one molecule by comparing it to another? Yes: inference through homology is a fundamental principle of all the biological sciences, and we can learn a tremendous amount by comparing our sequence against others.

IDENTIFICATION OF HOMOLOGOUS SEQUENCES

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino acid sequences of different proteins or the nucleotide sequences of DNA. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold. Different types of BLAST are available according to the type of query and database sequence (for example, blastn for nucleotide searches and blastp for protein searches). For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if humans carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on similarity of sequence. The BLAST program was designed by Eugene Myers, Stephen Altschul, Warren Gish, David J. Lipman and Webb Miller at the NIH and was published in J. Mol. Biol. in 1990.
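The core BLAST idea of finding short exact "words" shared by query and database sequence and then extending them into high-scoring segment pairs (HSPs) can be sketched for DNA as below. This is a simplified illustration, not NCBI's implementation: it uses exact word matches only, ungapped extension with an X-drop cut-off, and an assumed +1/-3 match/mismatch scheme; the word size, X-drop value and function names are hypothetical choices.

```python
def find_seeds(query, subject, w):
    """Return (query_pos, subject_pos) for every exact word of length w shared by both."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    return [(i, j)
            for j in range(len(subject) - w + 1)
            for i in index.get(subject[j:j + w], [])]


def extend_ungapped(query, subject, qi, sj, w, match=1, mismatch=-3, xdrop=5):
    """Extend a seed in both directions without gaps (X-drop).

    Returns an HSP as (query_start, subject_start, length, score).
    """
    def best_extension(pairs):
        best, best_len, score = 0, 0, 0
        for k, (a, b) in enumerate(pairs, start=1):
            score += match if a == b else mismatch
            if score > best:
                best, best_len = score, k
            elif best - score > xdrop:
                break  # score fell too far below the best seen: stop extending
        return best, best_len

    right_score, right_len = best_extension(zip(query[qi + w:], subject[sj + w:]))
    left_score, left_len = best_extension(zip(reversed(query[:qi]), reversed(subject[:sj])))
    score = w * match + left_score + right_score  # seed is an exact match
    return qi - left_len, sj - left_len, left_len + w + right_len, score


query, subject = "GATTACA", "CCGATTACAGG"
hsps = {extend_ungapped(query, subject, i, j, 4) for i, j in find_seeds(query, subject, 4)}
print(hsps)  # {(0, 2, 7, 7)}
```

All four word hits lie on the same diagonal, so they extend to the same HSP covering the whole query.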

The math can be generalized thus: for any two sequences of length $m$ and $n$, local, best alignments are identified as HSPs (high-scoring segment pairs). HSPs are stretches of sequence pairs that cannot be further improved by extension or trimming. For ungapped alignments, the number of expected HSPs with a score of at least $S$ is given by the formula $E = K m n e^{-\lambda S}$. This is called the E-value for the score $S$. In a database search, $n$ is the size of the database in residues, so $N = mn$ is the search space size. $K$ and $\lambda$ are supplied by statistical theory and can be calculated by comparison to precomputed, simulated distributions. These two parameters define the statistical significance of an E-value.
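The E-value formula can be evaluated directly. The sketch below also converts the raw score $S$ into a bit score $S' = (\lambda S - \ln K)/\ln 2$, for which $E = mn\,2^{-S'}$. The $\lambda$ and $K$ values are those commonly quoted for ungapped BLOSUM62 alignments and, like the query and database lengths, should be treated as assumptions; BLAST prints the parameters it actually used at the end of each search.

```python
import math


def e_value(score, m, n, lam, k):
    """Expected number of HSPs with score >= S: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * score)


def bit_score(score, lam, k):
    """Normalised score S' = (lambda * S - ln K) / ln 2."""
    return (lam * score - math.log(k)) / math.log(2)


# Assumed ungapped BLOSUM62 parameters; query of 250 residues, database of 5e8 residues.
LAMBDA, K = 0.3176, 0.134
for s in (40, 60, 80):
    print(s, round(bit_score(s, LAMBDA, K), 1), e_value(s, 250, 5e8, LAMBDA, K))
```

Because the score enters through an exponential, each extra unit of raw score lowers the E-value by a constant factor of $e^{\lambda}$.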


PHYLOGENETIC ANALYSIS

Every living organism contains DNA, RNA, and proteins. Closely related organisms generally have a high degree of agreement in the molecular structure of these substances, while the molecules of distantly related organisms usually show a pattern of dissimilarity. Conserved sequences, such as mitochondrial DNA, are expected to accumulate mutations over time, and assuming a constant rate of mutation provides a molecular clock for dating divergence. Molecular phylogeny uses such data to build a "relationship tree" that shows the probable evolution of various organisms. Only in recent decades has it been possible to isolate and identify these molecular structures. The most common approach is the comparison of homologous gene sequences using sequence alignment techniques to identify similarity. Another application of molecular phylogeny is DNA barcoding, where the species of an individual organism is identified using short sections of mitochondrial DNA. The same techniques are also applied in human genetics, for example in the increasingly popular use of genetic testing to determine a child's paternity, and in the branch of criminal forensics focused on evidence known as genetic fingerprinting.

ClustalW (Thompson, Higgins & Gibson, 1994) is one of the standard programs in wide use today for multiple sequence alignment; it implements one variant of the progressive method. The W denotes a specific version developed from the original Clustal program.

The basic steps of the algorithm implemented in ClustalW are:

1. Compute pairwise alignments of every sequence against every other. The similarities are stored in a matrix (sequences versus sequences).

2. Convert the sequence similarity matrix values to distance measures reflecting the evolutionary distance between each pair of sequences.

3. Construct a tree (the so-called guide tree) that gives the order in which pairs of sequences are to be aligned and combined with previous alignments. In ClustalW this is done with the neighbour-joining clustering method of Saitou & Nei.

4. Progressively align the sequences/alignments together at each branch point of the guide tree, starting with the least distant (most closely related) pairs of sequences.
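Steps 2 and 3 can be illustrated with a compact neighbour-joining (Saitou & Nei) sketch that turns a distance matrix into a guide tree in Newick format. It is not ClustalW's code: in practice the distances would come from step 1 (for example, 1 minus the fractional identity of each pairwise alignment), whereas the matrix below is hypothetical and chosen to be additive so that the expected tree is easy to check.

```python
from itertools import combinations


def neighbour_joining(names, dist):
    """Return a Newick string for the neighbour-joining tree of a symmetric distance matrix."""
    # Dict-of-dicts keyed by node label, so merged nodes are easy to add.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        row_sum = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(a, b) = (n - 2) d(a, b) - R(a) - R(b).
        best = None
        for a, b in combinations(nodes, 2):
            q = (n - 2) * d[a][b] - row_sum[a] - row_sum[b]
            if best is None or q < best[0]:
                best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent node u.
        la = d[a][b] / 2 + (row_sum[a] - row_sum[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    a, b = nodes
    half = d[a][b] / 2  # place the root midway between the last two nodes
    return f"({a}:{half:g},{b}:{half:g});"


# Hypothetical additive distances for four sequences A-D
example = [[0, 5, 7, 8],
           [5, 0, 8, 9],
           [7, 8, 0, 9],
           [8, 9, 9, 0]]
print(neighbour_joining(["A", "B", "C", "D"], example))
# ((A:2,B:3):0.5,(C:4,D:5):0.5);
```

A and B, and C and D, are joined first, which is the order in which progressive alignment (step 4) would combine them.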


PROCEDURE:

1. Start the computer and establish an Internet connection.

2. Use a search engine such as Yahoo or Google, or directly open the home page using the websites given above.

3. Retrieve the sequence information of a protein of interest.

4. Run a BLAST analysis to find good homologous sequences of the protein.

5. Collect the homologous sequences, perform a multiple sequence alignment on the ClustalW server, and determine the evolutionary status of the protein from the phylogenetic tree.

Experiment No: 4

AIM: To obtain a three-dimensional model of a given protein sequence by the homology modeling method.

REQUIREMENTS: Computer system (with legal software) equipped with an Internet connection, preferably fast broadband.

WEB RESOURCES:

www.expasy.ch
http://blast.ncbi.nlm.nih.gov/Blast.cgi
http://swissmodel.expasy.org/workspace/index.php?func=modelling_simple1
http://nihserver.mbi.ucla.edu/SAVES/

PRINCIPLE:


Homology modeling, also known as comparative modeling of proteins, refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein (the "template"). Homology modeling relies on the identification of one or more known protein structures likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. The approach works because evolutionarily related proteins have similar sequences, and naturally occurring homologous proteins have similar protein structures. It has been shown that three-dimensional protein structure is evolutionarily more conserved than would be expected from sequence conservation alone, so detectable levels of sequence similarity usually imply significant structural similarity. However, sequences falling below 20% sequence identity can have very different structures.
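The identity thresholds quoted here refer to the percent identity of the target-template alignment. A minimal way to compute it from two aligned sequences of equal length, with "-" marking gaps, is sketched below. Tools differ in the denominator they use (gap-free aligned columns, the shorter sequence, or the full alignment length), so this is one assumed convention among several.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


# Hypothetical target/template alignment: 4 identical out of 5 gap-free columns
print(percent_identity("MKT-LV", "MKSALV"))  # 80.0
```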

The quality of the homology model depends on the quality of the sequence alignment and the template structure. The approach can be complicated by alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~1-2 Å root-mean-square deviation (RMSD) between the matched Cα atoms at 70% sequence identity, but only 2-4 Å agreement at 25% sequence identity. The errors are significantly higher in the loop regions, where the amino acid sequences of the target and template proteins may be completely different.
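The RMSD figures above compare matched Cα atoms of a model and a reference structure. Assuming the two coordinate sets are already superposed and listed in matching order, RMSD is the square root of the mean squared distance between corresponding atoms, as sketched below. Finding the optimal superposition itself (for example with the Kabsch algorithm) is a separate step not shown here.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples, in their own units (e.g. Å).

    Assumes the structures are already superposed and atoms are paired by position.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Hypothetical Cα coordinates for three matched residues
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
reference = [(0.0, 1.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 1.0)]
print(round(rmsd(model, reference), 3))
```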

The method comprises the following steps:

1. Template recognition and initial alignment.

2. Select the template sequences of known structure.

3. Align the template and target sequence.

4. Build the model.

5. Model optimization.

6. Model validation.

SWISS-MODEL is a fully automated protein structure homology-modeling server, accessible via the ExPASy web server or from the program DeepView (Swiss-PdbViewer). The purpose of this server is to make protein modeling accessible to all biochemists and molecular biologists worldwide. Homology modeling combines sequence analysis and molecular modeling to predict three-dimensional structures. You will choose a remote homologue of your project protein whose structure has not yet been solved, and use the SWISS-MODEL web resource to model the molecule. The theoretical structure will then be visualized with Swiss-PdbViewer and RasMol to gain insight into how its structure relates to its function. Color coding of different physical attributes (such as residue charge, hydrophobicity, and secondary structure elements), different representations (such as alpha-carbon traces, cartoon graphics, and space-filling models), and superposition of the model with an actual structure all assist in the interpretation.

PROCEDURE:

1. Start the computer and establish an Internet connection.

2. Retrieve the sequence information from ExPASy/NCBI.

3. Open BLAST and select the Protein BLAST option, searching the PDB database, to find a template structure for your protein of interest.

4. With the suitable template structure and the sequence, go to the SWISS-MODEL server (automated mode) for homology modeling.

5. After obtaining the model, visualize it in RasMol and then assess the quality of the model with the PROCHECK and Verify3D tools.

6. Record the results.

Experiment No: 5

AIM: To retrieve drug molecule information from a database and calculate the drug-like properties of the molecule.

REQUIREMENTS: Computer system (with legal software) equipped with an Internet connection, preferably fast broadband.

WEB RESOURCES:

http://pubchem.ncbi.nlm.nih.gov/
http://www.molinspiration.com/cgi-bin/properties

PRINCIPLE:

PubChem

PubChem, released in 2004, provides information on the biological activities of small molecules. It is a component of the NIH's Molecular Libraries Roadmap Initiative. PubChem is organized as three linked databases within NCBI's Entrez information retrieval system: PubChem Substance, PubChem Compound, and PubChem BioAssay. PubChem also provides a fast chemical structure similarity search tool.
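PubChem can also be queried programmatically through its PUG REST web service. The sketch below builds a PUG REST URL that looks up a compound by name and requests several computed properties as JSON. The URL pattern and property names follow the PUG REST documentation as we understand it and should be checked against the current documentation; only the Python standard library is used, and the network call is kept in a separate function that is not executed here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
PROPERTIES = ["MolecularFormula", "MolecularWeight", "XLogP", "TPSA",
              "HBondDonorCount", "HBondAcceptorCount", "RotatableBondCount"]


def property_url(compound_name, properties=PROPERTIES):
    """Build a URL of the form .../compound/name/<name>/property/<list>/JSON."""
    return (f"{PUG_REST}/compound/name/{quote(compound_name)}"
            f"/property/{','.join(properties)}/JSON")


def fetch_properties(compound_name):
    """Download the property table for a compound (requires Internet access)."""
    with urlopen(property_url(compound_name), timeout=30) as response:
        data = json.load(response)
    return data["PropertyTable"]["Properties"]


print(property_url("aspirin"))
```

Opening the printed URL in a browser is an easy way to compare the service output with the values shown on the PubChem compound page.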

Molinspiration server


The server calculates the following properties of drug molecules drawn in its interface.

LogP (octanol/water partition coefficient)

LogP is calculated by the methodology developed by Molinspiration as a sum of fragment-based contributions and correction factors. The method is very robust and is able to process practically all organic and most organometallic molecules.

Molecular Polar Surface Area

It is calculated, following the methodology published by Ertl et al., as a sum of fragment contributions in which O- and N-centered polar fragments are considered. PSA has been shown to be a very good descriptor characterizing drug absorption, including intestinal absorption, bioavailability, Caco-2 permeability and blood-brain barrier penetration.

Molecular Volume

The method for calculating molecular volume developed at Molinspiration is based on group contributions. These were obtained by fitting the sum of fragment contributions to the "real" 3D volume for a training set of about twelve thousand, mostly drug-like, molecules. The 3D molecular geometries for the training set were fully optimized by the semi-empirical AM1 method.

Number of Rotatable Bonds - nrotb

This simple topological parameter is a measure of molecular flexibility and has been shown to be a very good descriptor of the oral bioavailability of drugs. A rotatable bond is defined as any single, non-ring bond between two non-terminal heavy (i.e., non-hydrogen) atoms. Amide C-N bonds are not counted because of their high rotational energy barrier.
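The rotatable-bond definition above can be turned into a small counting routine. The sketch assumes a hypothetical, hydrogen-suppressed molecule representation (a list of element symbols plus bonds given as `(atom1, atom2, order, in_ring)` tuples); it is not Molinspiration's implementation, and real cheminformatics toolkits perceive rings and amide groups automatically.

```python
def count_rotatable_bonds(atoms, bonds):
    """Count single, non-ring bonds between non-terminal heavy atoms, excluding amide C-N.

    atoms: element symbols of heavy atoms only (hydrogens suppressed).
    bonds: (i, j, order, in_ring) tuples with order 1, 2 or 3.
    """
    degree = [0] * len(atoms)
    for i, j, _, _ in bonds:
        degree[i] += 1
        degree[j] += 1

    def is_carbonyl_carbon(idx):
        # A carbon with a double bond to an oxygen atom.
        return atoms[idx] == "C" and any(
            order == 2 and idx in (a, b) and atoms[b if a == idx else a] == "O"
            for a, b, order, _ in bonds)

    count = 0
    for i, j, order, in_ring in bonds:
        if order != 1 or in_ring:
            continue
        if degree[i] < 2 or degree[j] < 2:  # a terminal heavy atom: not rotatable
            continue
        if {atoms[i], atoms[j]} == {"C", "N"}:
            carbon = i if atoms[i] == "C" else j
            if is_carbonyl_carbon(carbon):
                continue  # amide C-N bond: high rotational barrier
        count += 1
    return count


# Butane, C-C-C-C: only the central bond is rotatable.
butane = (["C", "C", "C", "C"], [(0, 1, 1, False), (1, 2, 1, False), (2, 3, 1, False)])
# N-methylacetamide, CH3-C(=O)-NH-CH3: the only internal single bond is the amide C-N.
nma = (["C", "C", "O", "N", "C"],
       [(0, 1, 1, False), (1, 2, 2, False), (1, 3, 1, False), (3, 4, 1, False)])
print(count_rotatable_bonds(*butane), count_rotatable_bonds(*nma))  # 1 0
```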

Rule of 5

The "Rule of 5" properties of a drug molecule are a set of simple molecular descriptors used by Lipinski in formulating his "Rule of 5". The rule states that most drug-like molecules have logP ≤ 5, molecular weight ≤ 500, number of hydrogen bond acceptors ≤ 10, and number of hydrogen bond donors ≤ 5. Molecules violating more than one of these rules may have problems with bioavailability.

PROCEDURE:

1. Start the computer and establish an Internet connection.

2. Open the PubChem database from the NCBI resources.

3. Retrieve the structural features of the drug molecule of interest using specific keywords.

4. Open the Molinspiration server and draw the structure of the drug molecule using the functional groups in the molecular editor.

5. Use the calculate properties button to compute the various properties.
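Once the properties have been computed (for example, copied from the Molinspiration output), Lipinski's criteria can be checked mechanically. The sketch below counts violations of logP ≤ 5, molecular weight ≤ 500, hydrogen-bond acceptors ≤ 10 and hydrogen-bond donors ≤ 5, and treats a molecule as drug-like when it has at most one violation. The field names are hypothetical, and the example values are illustrative, not measured results.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    log_p: float       # octanol/water partition coefficient
    mol_weight: float  # molecular weight, Da
    h_acceptors: int   # hydrogen-bond acceptors (Lipinski counts N and O atoms)
    h_donors: int      # hydrogen-bond donors (Lipinski counts NH and OH groups)


def rule_of_five_violations(p):
    """Return the list of Lipinski criteria the molecule fails."""
    checks = {
        "logP > 5": p.log_p > 5,
        "MW > 500": p.mol_weight > 500,
        "H-bond acceptors > 10": p.h_acceptors > 10,
        "H-bond donors > 5": p.h_donors > 5,
    }
    return [name for name, failed in checks.items() if failed]


def is_drug_like(p):
    """Molecules violating more than one rule may have problems with bioavailability."""
    return len(rule_of_five_violations(p)) <= 1


# Hypothetical property values for two molecules
small = Properties(log_p=1.4, mol_weight=180.2, h_acceptors=4, h_donors=1)
large = Properties(log_p=6.2, mol_weight=720.9, h_acceptors=12, h_donors=3)
print(is_drug_like(small), rule_of_five_violations(large))
# True ['logP > 5', 'MW > 500', 'H-bond acceptors > 10']
```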

Experiment No: 6

AIM: To analyze ligand-receptor binding affinity by molecular docking analysis.

REQUIREMENTS: Computer system (with legal software) equipped with an Internet connection, preferably fast broadband.

WEB RESOURCES:

http://hex.loria.fr/

PRINCIPLE:

In the field of molecular modeling, docking is a method which predicts the preferred orientation of one molecule to a second when they are bound to each other to form a stable complex. Knowledge of the preferred orientation may in turn be used to predict the strength of association, or binding affinity, between the two molecules using, for example, scoring functions. The associations between biologically relevant molecules such as proteins, nucleic acids, carbohydrates, and lipids play a central role in signal transduction. Furthermore, the relative orientation of the two interacting partners may affect the type of signal produced (e.g., agonism vs. antagonism). Docking is therefore useful for predicting both the strength and the type of signal produced. Docking is frequently used to predict the binding orientation of small-molecule drug candidates to their protein targets in order to predict the affinity and activity of the small molecule. Hence docking plays an important role in the rational design of drugs. Given the biological and pharmaceutical significance of molecular docking, considerable efforts have been directed towards improving the methods used to predict docking.
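Scoring functions turn a docked pose into an estimate of binding strength. The sketch below is a deliberately simple physics-style score: a 12-6 Lennard-Jones term plus a Coulomb term, summed over all ligand-receptor atom pairs within a cutoff. It is a generic illustration, not the scoring used by Hex; the `(x, y, z, charge)` atom tuples, the single well depth and radius shared by every pair, the constant dielectric and the cutoff are all assumptions. Lower (more negative) scores indicate more favourable poses.

```python
import math

COULOMB_CONSTANT = 332.0636  # converts e^2/Å to kcal/mol


def pair_energy(r, q1, q2, epsilon=0.2, r_min=4.0, dielectric=4.0):
    """12-6 Lennard-Jones (minimum of -epsilon at r_min) plus Coulomb energy, kcal/mol."""
    ratio6 = (r_min / r) ** 6
    lj = epsilon * (ratio6 ** 2 - 2.0 * ratio6)
    coulomb = COULOMB_CONSTANT * q1 * q2 / (dielectric * r)
    return lj + coulomb


def score_pose(ligand, receptor, cutoff=8.0):
    """Sum pair energies for atom pairs closer than cutoff; atoms are (x, y, z, charge)."""
    total = 0.0
    for lx, ly, lz, lq in ligand:
        for rx, ry, rz, rq in receptor:
            r = math.dist((lx, ly, lz), (rx, ry, rz))
            if 0.0 < r < cutoff:
                total += pair_energy(r, lq, rq)
    return total


# Two uncharged atoms at the assumed optimal distance give the well depth, -0.2 kcal/mol.
print(score_pose([(0.0, 0.0, 0.0, 0.0)], [(4.0, 0.0, 0.0, 0.0)]))
```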

Hex is an interactive molecular graphics program for calculating and displaying feasible docking modes of pairs of protein and DNA molecules. Hex can also calculate protein-ligand docking, assuming the ligand is rigid, and it can superpose pairs of molecules using only knowledge of their 3D shapes. When its documentation was written, Hex had been available for about 12 years; according to its author, it was still the only docking and superposition program to use spherical polar Fourier (SPF) correlations to accelerate the calculations, one of the few docking programs with built-in graphics to view the results, and the first protein docking program able to use modern graphics processing units (GPUs) to accelerate the calculations.


PROCEDURE:

1. Start the computer and establish an Internet connection.

2. Select a suitable ligand and receptor molecule in PDB format.

3. Load the ligand and receptor molecules into Hex.

4. Start docking by choosing the appropriate button, using the default parameters.

5. Docking usually takes a few minutes to complete, depending on the system configuration.

6. When docking has finished, note down the reported binding energy, save the ligand-receptor complex in PDB format, and observe the ligand binding site with a visualization tool.
