BioJava Core API
Java for Bioinformatics?
Cross platform means develop on one platform, deploy on any.
Widely accepted industry standard.
Lots of support libraries for modern technologies (XML, WebServices, JDBC).
Scales well from small to industrial strength enterprise sized programs.
Java for Bioinformatics?
Object Oriented.
Rapid development due to:
  Very strict types
  Simple clear syntax
  Exception handling and recovery
  Cross platform
  Extensive class library
  Code reuse
What is BioJava?
A collection of Java objects that represent and manipulate biological data
Not a program, rather a programming library
Open source (LGPL) open for all development, even commercial. Not ‘sticky’ or ‘viral’.
What is BioJava?
Collection of objects to assist bioinformatics research
Started at EBI/Sanger in 1998 by Matthew Pocock and Thomas Down
25+ developers have contributed (5 core)
What is BioJava?
BioJava has acquired 1100+ classes, 130,000+ lines of code.
Uses CVS version control, JUnit testing and ANT builds.
It now has a fairly stable API. 76 packages!
Where is BioJava
Home Page www.biojava.org
BioJava in Anger http://www.biojava.org/docs/bj_in_anger/
Mailing Lists [email protected] [email protected]
Nightly Builds http://www.derkholm.net/autobuild/
Obtaining BioJava
Download: http://www.biojava.org/download/ (get binaries, source and docs)
biojava-live (requires cvs):
    cvs -d :pserver:[email protected]:/home/repository/biojava login
    (the password is ‘cvs’)
    cvs -d :pserver:[email protected]:/home/repository/biojava checkout biojava-live
    cvs update -Pd
Compiling biojava-live
Requires the ANT build tool http://jakarta.apache.org/ant/
The ANT tool will use build.xml to:
  Arrange source code
  Compile source
  Make jar file
  Make Java docs
  Build demos
  Build and run tests
Change to biojava-live; type ant
Unit testing requires JUnit http://junit.sourceforge.net/
Setting up BioJava
Put the following JAR files on your class path:
  biojava.jar
  bytecode-0.92.jar
  commons-cli.jar
  commons-collections-2.1.jar
  commons-dbcp-1.1.jar
  commons-pool-1.1.jar
Object Oriented Patterns and BioJava Design
BioJava Design
Uses some reasonably “advanced” concepts:
  Design by Interface
  Protected or Private constructors
  Factory classes and Methods
  Flyweight / Singleton objects
Interfaces Hide Implementation
In BioJava there are several implementations of the Distribution interface.
Any can be legally returned by a method that returns a Distribution (the returning method may even return different ones depending on the situation).
Any can be legally used as an argument to a method that requires a Distribution.
All are guaranteed to contain a minimal set of common methods.
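The points above can be sketched in plain Java. This is a toy model, not BioJava code: `Weights`, `UniformWeights`, `TableWeights` and `InterfaceDemo` are hypothetical names invented for the illustration, standing in for an interface such as Distribution and its implementations.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface; callers program against this type only.
interface Weights {
    double weightOf(char symbol);
}

// Implementation 1: every symbol has the same weight, so nothing is stored per symbol.
final class UniformWeights implements Weights {
    private final int alphabetSize;

    UniformWeights(int alphabetSize) {
        this.alphabetSize = alphabetSize;
    }

    public double weightOf(char symbol) {
        return 1.0 / alphabetSize;
    }
}

// Implementation 2: explicit weights kept in a map; unknown symbols weigh 0.
final class TableWeights implements Weights {
    private final Map<Character, Double> table;

    TableWeights(Map<Character, Double> table) {
        this.table = new HashMap<>(table);
    }

    public double weightOf(char symbol) {
        return table.getOrDefault(symbol, 0.0);
    }
}

final class InterfaceDemo {
    // The return type is the interface; which class comes back depends on the input.
    static Weights create(Map<Character, Double> table, int alphabetSize) {
        return (table == null || table.isEmpty())
                ? new UniformWeights(alphabetSize)
                : new TableWeights(table);
    }

    // Accepts any implementation because it only uses the interface method.
    static double total(Weights weights, String symbols) {
        double sum = 0.0;
        for (char c : symbols.toCharArray()) {
            sum += weights.weightOf(c);
        }
        return sum;
    }

    public static void main(String[] args) {
        Weights uniform = create(null, 4);
        System.out.println(uniform.getClass().getSimpleName());
        System.out.println(total(uniform, "acgt"));
    }
}
```

The caller of `create` never names a concrete class, so the library author stays free to add or swap implementations later.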
Flyweight and Singleton Objects
A Singleton is a class with only one instance and only one access point.
A Singleton will need a Private constructor and may be static (e.g. AlphabetManager).
A Flyweight object uses sharing to support large numbers of fine-grained objects efficiently.
For example in BioJava there is only ever one instance of the DNA Symbol “A”. A sequence of A’s is really just a list of pointers to that one object.
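A minimal flyweight sketch in plain Java, assuming a four-letter DNA alphabet. `Base` and `FlyweightDemo` are hypothetical names for this illustration and do not reproduce how BioJava implements its Symbols.

```java
import java.util.ArrayList;
import java.util.List;

final class Base {
    // The only four instances that will ever exist.
    static final Base A = new Base('a');
    static final Base C = new Base('c');
    static final Base G = new Base('g');
    static final Base T = new Base('t');

    private final char token;

    // Private constructor: outside code cannot create extra instances.
    private Base(char token) {
        this.token = token;
    }

    char token() {
        return token;
    }

    // Static factory method that hands out the shared instances.
    static Base forToken(char token) {
        switch (Character.toLowerCase(token)) {
            case 'a': return A;
            case 'c': return C;
            case 'g': return G;
            case 't': return T;
            default: throw new IllegalArgumentException("Not a DNA base: " + token);
        }
    }

    // A "sequence" is a list of references to the four shared objects.
    static List<Base> parse(String text) {
        List<Base> bases = new ArrayList<>(text.length());
        for (char c : text.toCharArray()) {
            bases.add(forToken(c));
        }
        return bases;
    }
}

final class FlyweightDemo {
    public static void main(String[] args) {
        List<Base> polyA = Base.parse("aaaa");
        // Every position refers to the same object, so identity comparison holds.
        System.out.println(polyA.get(0) == polyA.get(3)); // prints true
    }
}
```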
Factory and Static methods
Sometimes it is useful to prevent a user from directly constructing an object via a constructor:
  If the construction is complex.
  If the choice of the optimal implementation is best left to the API developer.
  If important resources are best protected from end users, e.g. Singletons / Flyweights.
Rather than instantiating the object via its constructor, a static method or Factory object is used.
Examples
Static method: FiniteAlphabet dna = DNATools.getDNA();
Static field: DistributionFactory df = DistributionFactory.DEFAULT;
Factory method: Distribution d = df.createDistribution(dna);
Two Levels of BioJava
Macro type programming
  Tools classes (SeqIOTools, DistributionTools etc).
  Static methods for common tasks.
Full programming
  Lots of customizations and ‘plug and play’ possible.
  More exposure to the sharp edges of the API.
  Less documentation.
Alphabets, Symbols and Sequences
Symbols
In BioJava the DNA residue “A” is an object.
In Bioperl “A” would be a String.
The “A” object is part of the sequence, not the sequence.
“A” from DNA is not equal to “A” from RNA or “A” from Protein.
Why not Strings?
DNA A != RNA A != Protein A
For Strings “A”.equals(“A”);
DNA Alphabet also contains K,Y,W,S,R,M,B,D,H,V,N
Why not Strings?
Object Y contains C and T; the String “Y” doesn’t contain anything.
Translation HashMaps with Strings are flawed.
  BioJava: GGN translates to GLY
  String: GGN maps to null
A fully redundant String to String HashMap translation table requires 4096 keys!
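The 4096 figure can be checked with a few lines of arithmetic. The sketch below assumes the count comes from 16 possible DNA symbols per codon position, that is, every subset of {a, c, g, t} including the empty set used for a gap; that reading is an assumption, since the slide gives only the total. `RedundantTableSize` is a hypothetical name.

```java
final class RedundantTableSize {
    // Number of distinct symbols if every subset of the atomic symbols is a symbol.
    static int symbolCount(int atomicSymbols) {
        return 1 << atomicSymbols;
    }

    // Number of keys needed to list every word of the given length explicitly.
    static long keyCount(int symbols, int wordLength) {
        long keys = 1;
        for (int i = 0; i < wordLength; i++) {
            keys *= symbols;
        }
        return keys;
    }

    public static void main(String[] args) {
        int symbols = symbolCount(4);                // 16 symbols over a, c, g, t
        System.out.println(keyCount(symbols, 3));    // 4096 keys with ambiguity
        System.out.println(keyCount(4, 3));          // 64 keys for unambiguous codons
    }
}
```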
Symbols are Canonical
DNATools.a() == DNATools.a();
There is only one instance of ‘a’.
DNATools.a().equals(DNATools.a());
ProteinTools.a() != DNATools.a();
Even on remote JVMs!
During serialization Alphabet indexing is transient and ‘reconnected’ via readResolve() methods.
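The role of readResolve() can be illustrated with standard Java serialization alone. This is a sketch of the general technique, not BioJava's code: `CanonicalSymbol`, its registry and `roundTrip` are hypothetical, and the transient Alphabet indexing mentioned above is not modelled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

final class CanonicalSymbol implements Serializable {
    private static final long serialVersionUID = 1L;

    // Registry holding the single instance for each (alphabet, name) pair.
    private static final Map<String, CanonicalSymbol> REGISTRY = new HashMap<>();

    static final CanonicalSymbol DNA_A = register("DNA", "a");
    static final CanonicalSymbol PROTEIN_A = register("PROTEIN", "a");

    private final String alphabet;
    private final String name;

    private CanonicalSymbol(String alphabet, String name) {
        this.alphabet = alphabet;
        this.name = name;
    }

    private static CanonicalSymbol register(String alphabet, String name) {
        CanonicalSymbol symbol = new CanonicalSymbol(alphabet, name);
        REGISTRY.put(alphabet + ":" + name, symbol);
        return symbol;
    }

    // Invoked by Java serialization after reading; replaces the fresh copy
    // with the registered instance so that == keeps working.
    private Object readResolve() {
        return REGISTRY.get(alphabet + ":" + name);
    }

    // Serializes and deserializes an object in memory.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(DNA_A) == DNA_A);     // prints true
        System.out.println(roundTrip(DNA_A) == PROTEIN_A); // prints false
    }
}
```

Without the readResolve() method the first comparison would print false, because deserialization would create a second object.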
Alphabets
A set of Symbols.
Alphabets can be infinite: DoubleAlphabet, IntegerAlphabet.
Some Alphabets have a finite number of Symbols: DNA, RNA etc.
Alphabet and FiniteAlphabet interfaces.
org.biojava.bio.symbol.Alphabet
boolean contains(Symbol s) Returns whether or not this Alphabet contains the symbol.
List getAlphabets() Return an ordered List of the alphabets which make up a compound alphabet.
Symbol getAmbiguity(java.util.Set syms) Get a symbol that represents the set of symbols in syms.
Symbol getGapSymbol() Get the 'gap' ambiguity symbol that is most appropriate for this alphabet
String getName() Get the name of the alphabet.
Symbol getSymbol(java.util.List rl) Get a symbol from the Alphabet which corresponds to the specified ordered list of symbols.
SymbolTokenization getTokenization(java.lang.String name) Get a SymbolTokenization by name.
void validate(Symbol s) Throws a precanned IllegalSymbolException if the symbol is not contained within this Alphabet.
org.biojava.bio.symbol.FiniteAlphabet
In addition to the previous methods
void addSymbol(Symbol s) Adds a symbol to this Alphabet
Iterator iterator() Retrieve an Iterator over the Symbols in this Alphabet.
void removeSymbol(Symbol s) Remove a symbol from this alphabet.
int size() The number of symbols in the alphabet.
The Default Alphabets
DNA (a,c,g,t)
RNA (a,c,g,u)
PROTEIN (all amino acids including ‘Sel’)
PROTEIN-TERM (all PROTEIN plus “*”)
STRUCTURE (PDB structure symbols)
Alphabet of all integers (Infinite Alphabet)
  Can generate SubIntegerAlphabets
Alphabet of all doubles (Infinite Alphabet)
Getting the common Alphabets
    import org.biojava.bio.symbol.*;
    import java.util.*;
    import org.biojava.bio.seq.*;

    public class AlphabetExample {
        public static void main(String[] args) {
            Alphabet dna, rna, prot;

            //get the DNA alphabet by name
            dna = AlphabetManager.alphabetForName("DNA");
            //get the RNA alphabet by name
            rna = AlphabetManager.alphabetForName("RNA");
            //get the Protein alphabet by name
            prot = AlphabetManager.alphabetForName("PROTEIN");
            //get the protein alphabet that includes the * termination Symbol
            prot = AlphabetManager.alphabetForName("PROTEIN-TERM");

            //get those same Alphabets from the Tools classes
            dna = DNATools.getDNA();
            rna = RNATools.getRNA();
            prot = ProteinTools.getAlphabet();
            //or the one with the * symbol
            prot = ProteinTools.getTAlphabet();
        }
    }
SymbolLists are made of Symbols
org.biojava.bio.symbol.SymbolList
A sequence of Symbols from the same Alphabet.
Uses biological coordinates from 1 to length (cf. a String, which runs from 0 to length-1).
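The difference between the two coordinate systems is easy to get wrong, so here is a small plain-Java sketch of the mapping. `BioCoordinates` and its methods are hypothetical helpers that operate on a String; they only imitate 1-based, end-inclusive addressing and are not part of BioJava.

```java
final class BioCoordinates {
    // Symbol at a 1-based position: position 1 is the first character.
    static char symbolAt(String seq, int index) {
        if (index < 1 || index > seq.length()) {
            throw new IndexOutOfBoundsException(
                    "Index " + index + " is outside 1.." + seq.length());
        }
        return seq.charAt(index - 1);
    }

    // Region from start to end, both 1-based and both inclusive.
    static String subStr(String seq, int start, int end) {
        if (start < 1 || end > seq.length() || start > end) {
            throw new IndexOutOfBoundsException(
                    "Range " + start + ".." + end + " is outside 1.." + seq.length());
        }
        // String.substring is 0-based with an exclusive end, hence start - 1 and end.
        return seq.substring(start - 1, end);
    }

    public static void main(String[] args) {
        String dna = "atgccg";
        System.out.println(symbolAt(dna, 1));   // prints a
        System.out.println(subStr(dna, 2, 4));  // prints tgc
    }
}
```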
Doesn’t this waste memory?
A SymbolList is not really a List of Symbol Objects.
Rather a List of Object references.
Still a bit heavier than a char[] but not serious.
[Figure: the sequence AACGTGGGTTCCAACT drawn as references to the four shared Symbol objects A, C, G and T]
The Bigger Picture
[Figure: the sequence AACGTGGGTTCCAACT referencing the shared Symbols A, C, G and T, with the AlphabetManager holding the “DNA” and “Protein” alphabets]
The SymbolList interface
void edit(Edit edit) Apply an edit to the SymbolList as specified by the edit object.
Alphabet getAlphabet() The alphabet that this SymbolList is over.
Iterator iterator() An Iterator over all Symbols in this SymbolList.
int length() The number of symbols in this SymbolList.
String seqString() Stringify this symbol list.
SymbolList subList(int start, int end) Return a new SymbolList for the symbols start to end inclusive.
String subStr(int start, int end) Return a region of this symbol list as a String.
Symbol symbolAt(int index) Return the symbol at index, counting from 1.
List toList() Returns a List of symbols.
String to SymbolList
    import org.biojava.bio.seq.*;
    import org.biojava.bio.symbol.*;

    public class StringToSymbolList {
        public static void main(String[] args) {
            try {
                //create a DNA SymbolList from a String
                SymbolList dna = DNATools.createDNA("atcggtcggctta");

                //create a RNA SymbolList from a String
                SymbolList rna = RNATools.createRNA("auugccuacauaggc");

                //create a Protein SymbolList from a String
                SymbolList aa = ProteinTools.createProtein("AGFAVENDSA");
            } catch (IllegalSymbolException ex) {
                //this will happen if you use a character in one of your strings that is
                //not an accepted IUB Character for that Symbol.
                ex.printStackTrace();
            }
        }
    }
SymbolList to String
    import org.biojava.bio.symbol.*;

    public class SymbolListToString {
        public static void main(String[] args) {
            SymbolList sl = null;

            //code here to instantiate sl

            //convert sl into a String
            String s = sl.seqString();
        }
    }
The Sequence Interface
A Sequence is a SymbolList with more information.
In addition to the methods of Annotatable and SymbolList:
String getName() The name of this sequence.
String getURN() A Uniform Resource Identifier (URI) which identifies the sequence represented by this object.
Also implements FeatureHolder which allows addition of Feature Objects.
Quickly generate a Sequence
    import org.biojava.bio.seq.*;
    import org.biojava.bio.symbol.*;

    public class StringToSequence {
        public static void main(String[] args) {
            try {
                //create a DNA sequence with the name dna_1
                Sequence dna = DNATools.createDNASequence("atgctg", "dna_1");

                //create an RNA sequence with the name rna_1
                Sequence rna = RNATools.createRNASequence("augcug", "rna_1");

                //create a Protein sequence with the name prot_1
                Sequence prot = ProteinTools.createProteinSequence("AFHS", "prot_1");
            } catch (IllegalSymbolException ex) {
                //an exception is thrown if you use a non IUB symbol
                ex.printStackTrace();
            }
        }
    }
More Complex Symbols and Alphabets
Ambiguity Symbols
Ambiguous or Fuzzy data is a fact of life, especially with sequencing.
DNA traces can contain symbols such as n, r, w, v, h, k, y etc.
In BioJava DNA symbols a, c, g, t are AtomicSymbols.
Ambiguous symbols like y are BasisSymbols.
BasisSymbols
A BasisSymbol may be represented as a list of one or more Symbols.
BasisSymbol extends Symbol.
Ambiguity Symbols are always BasisSymbols.
getSymbols() The list of symbols that this symbol is composed from.
AtomicSymbols
AtomicSymbols are not ambiguous.
They cannot be further divided into Symbols that are valid members of the parent Alphabet.
In the case of compound Alphabets they can be divided into valid Symbols from component Alphabets.
AtomicSymbols
The AtomicSymbol interface extends BasisSymbol but adds no new methods only behaviour contracts.
AtomicSymbol instances guarantee that getMatches() returns an Alphabet containing just that Symbol and each element of the List returned by getSymbols() is also atomic.
Atomic and Basis
[Figure: the sequence AATW references the AtomicSymbols A and T and the BasisSymbol W, all belonging to the “DNA” alphabet held by the AlphabetManager]
Translating Ambiguity
BioJava handles translation of ambiguity very smoothly.
DNA ‘n’ = [a,c,g,t]
Transcribes to RNA ‘n’ = [a,c,g,u]
ggn translates to Gly
agn translates to [Ser, Arg]
Most protein ambiguities have no ‘token’ and are printed as ‘X’
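One way to see why ggn gives a single amino acid while agn gives two is to expand each ambiguity code into the bases it matches and translate every concrete codon. The plain-Java sketch below does this with the IUPAC nucleotide codes and the standard genetic code; it translates DNA codons directly rather than going through RNA, and `AmbiguousTranslation` is a hypothetical class, not BioJava's translation machinery.

```java
import java.util.Set;
import java.util.TreeSet;

final class AmbiguousTranslation {
    // Standard genetic code as one-letter amino acids, with codons ordered
    // TTT, TTC, TTA, TTG, TCT, ... (base order t, c, a, g at each position).
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
    private static final String BASE_ORDER = "tcag";

    // Bases matched by an IUPAC nucleotide code.
    static String matches(char code) {
        switch (Character.toLowerCase(code)) {
            case 'a': return "a";
            case 'c': return "c";
            case 'g': return "g";
            case 't': return "t";
            case 'r': return "ag";
            case 'y': return "ct";
            case 's': return "cg";
            case 'w': return "at";
            case 'k': return "gt";
            case 'm': return "ac";
            case 'b': return "cgt";
            case 'd': return "agt";
            case 'h': return "act";
            case 'v': return "acg";
            case 'n': return "acgt";
            default: throw new IllegalArgumentException("Not an IUPAC DNA code: " + code);
        }
    }

    // All amino acids that a possibly ambiguous codon can encode.
    static Set<Character> translate(String codon) {
        if (codon.length() != 3) {
            throw new IllegalArgumentException("A codon has three bases: " + codon);
        }
        Set<Character> result = new TreeSet<>();
        for (char b1 : matches(codon.charAt(0)).toCharArray()) {
            for (char b2 : matches(codon.charAt(1)).toCharArray()) {
                for (char b3 : matches(codon.charAt(2)).toCharArray()) {
                    int index = 16 * BASE_ORDER.indexOf(b1)
                            + 4 * BASE_ORDER.indexOf(b2)
                            + BASE_ORDER.indexOf(b3);
                    result.add(AMINO_ACIDS.charAt(index));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(translate("ggn")); // prints [G]      (Gly)
        System.out.println(translate("agn")); // prints [R, S]   (Arg, Ser)
    }
}
```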
CrossProduct Alphabets
A CrossProductAlphabet is a combination of two or more Alphabets.
Any type of CrossProductAlphabet is possible
  Dimers (DNA x DNA)
  Codon (DNA x DNA x DNA)
  Conditional ((DNA x DNA) x DNA)
  Mixed ((DNA x DNA x DNA) x PROTEIN)
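The idea of combining alphabets can be sketched as an ordinary Cartesian product over lists of tokens. The code is plain Java with a hypothetical `CrossProductSketch` class; it enumerates the atomic combinations only and does not model ambiguity or BioJava's CrossProductAlphabet objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class CrossProductSketch {
    // Builds every ordered combination that takes one symbol from each component alphabet.
    static List<String> crossProduct(List<List<String>> alphabets) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (List<String> alphabet : alphabets) {
            List<String> next = new ArrayList<>(result.size() * alphabet.size());
            for (String prefix : result) {
                for (String symbol : alphabet) {
                    next.add(prefix + symbol);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> dna = Arrays.asList("a", "c", "g", "t");

        // Dimers: DNA x DNA
        System.out.println(crossProduct(Collections.nCopies(2, dna)).size()); // prints 16

        // Codons: DNA x DNA x DNA
        List<String> codons = crossProduct(Collections.nCopies(3, dna));
        System.out.println(codons.size());  // prints 64
        System.out.println(codons.get(0));  // prints aaa
    }
}
```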
Finite and Compound Alphas
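[Figure: A, C, G and T are DNA AtomicSymbols; in the sequence [AAC][GTG]GGTTCCAACT the bracketed triplets are read against (DNA x DNA x DNA); ACA and GTG are (DNA x DNA x DNA) AtomicSymbols; GNG is a (DNA x DNA x DNA) BasisSymbol]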
What are they good for?
Codon Symbols (DNA x DNA x DNA).
Many analysis Classes such as Count and Distribution use Symbol as an argument. A hexamer can be an AtomicSymbol.
Phred is DNA x Integer.
1st and higher order Markov Models use CrossProductAlphabets.
How do I make a CrossProductAlphabet?
    import java.util.*;
    import org.biojava.bio.seq.*;
    import org.biojava.bio.symbol.*;

    public class CrossProduct {
        public static void main(String[] args) {
            //make a CrossProductAlphabet from a List
            List l = Collections.nCopies(3, DNATools.getDNA());
            Alphabet codon = AlphabetManager.getCrossProductAlphabet(l);

            //get the same Alphabet by name
            Alphabet codon2 =
                AlphabetManager.generateCrossProductAlphaFromName("(DNA x DNA x DNA)");

            //show that the two Alphabets are canonical
            System.out.println(codon == codon2);
        }
    }
Making Triplet Views on a SymbolList
    import org.biojava.bio.seq.*;
    import org.biojava.bio.symbol.*;

    public class CodonView {
        public static void main(String[] args) {
            try {
                //make a DNA SymbolList
                SymbolList dna = DNATools.createDNA("atgcccgcgtaa");
                System.out.println("Length of dna " + dna.length());

                //get a Codon View (window size of three)
                SymbolList codons = SymbolListViews.windowedSymbolList(dna, 3);
                System.out.println("Length of codons " + codons.length());

                //get a Triplet View
                SymbolList triplets = SymbolListViews.orderNSymbolList(dna, 3);
                System.out.println("Length of triplets " + triplets.length());
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
    }
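The two views differ in whether the windows overlap. The plain-Java sketch below works on Strings and assumes the views behave as their names suggest: a windowed view steps forward a whole window at a time, an order-N view steps one symbol at a time. `TripletViews` is a hypothetical class, and rejecting a sequence whose length is not a multiple of the window size is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class TripletViews {
    // Non-overlapping windows: positions 1-3, 4-6, 7-9, ...
    static List<String> windowed(String seq, int size) {
        if (seq.length() % size != 0) {
            throw new IllegalArgumentException(
                    "Length " + seq.length() + " is not a multiple of " + size);
        }
        List<String> windows = new ArrayList<>();
        for (int i = 0; i < seq.length(); i += size) {
            windows.add(seq.substring(i, i + size));
        }
        return windows;
    }

    // Overlapping windows: positions 1-3, 2-4, 3-5, ...
    static List<String> orderN(String seq, int order) {
        List<String> windows = new ArrayList<>();
        for (int i = 0; i + order <= seq.length(); i++) {
            windows.add(seq.substring(i, i + order));
        }
        return windows;
    }

    public static void main(String[] args) {
        String dna = "atgcccgcgtaa";
        System.out.println(windowed(dna, 3));        // prints [atg, ccc, gcg, taa]
        System.out.println(orderN(dna, 3).size());   // prints 10
    }
}
```

For a sequence of length 12 the windowed view therefore has 12 / 3 = 4 elements, while the overlapping view has 12 - 3 + 1 = 10.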
Getting a Symbol for a Codon
    import java.util.*;
    import org.biojava.bio.seq.*;
    import org.biojava.bio.symbol.*;

    public class MakeATG {
        public static void main(String[] args) {
            //make a CrossProductAlphabet from a List
            List l = Collections.nCopies(3, DNATools.getDNA());
            Alphabet codon = AlphabetManager.getCrossProductAlphabet(l);

            //get the codon made of atg
            List syms = new ArrayList(3);
            syms.add(DNATools.a());
            syms.add(DNATools.t());
            syms.add(DNATools.g());

            Symbol atg = null;
            try {
                atg = codon.getSymbol(syms);
            } catch (IllegalSymbolException ex) {
                //used Symbol from Alphabet that is not a component of codon
                ex.printStackTrace();
            }
            System.out.println("Name of atg: " + atg.getName());
        }
    }
Breaking a Codon into its Parts
    import java.util.*;
    import org.biojava.bio.seq.*;
    import org.biojava.bio.symbol.*;

    public class BreakingComponents {
        public static void main(String[] args) {
            //make the 'codon' alphabet
            List l = Collections.nCopies(3, DNATools.getDNA());
            Alphabet alpha = AlphabetManager.getCrossProductAlphabet(l);

            //get the first symbol in the alphabet
            Iterator iter = ((FiniteAlphabet)alpha).iterator();
            AtomicSymbol codon = (AtomicSymbol)iter.next();
            System.out.print(codon.getName() + " is made of: ");

            //break it into a list of its components
            List symbols = codon.getSymbols();
            for (int i = 0; i < symbols.size(); i++) {
                if (i != 0) System.out.print(", ");
                Symbol sym = (Symbol)symbols.get(i);
                System.out.print(sym.getName());
            }
        }
    }
Basic Sequence Operations
Getting a section of a SymbolList
symbolAt(int i) Returns a Symbol
subList(int min, int max) Returns a SymbolList
subStr(int min, int max) Returns the subsection tokenized to a String
Transcription
In BioJava DNA sequences and RNA sequences are from different Alphabets. To convert between them:
    //make a DNA SymbolList
    SymbolList dna = DNATools.createDNA("atgccgaatcgtaa");

    //convert it to RNA
    SymbolList rna = DNATools.toRNA(dna);

    //just to prove it worked
    System.out.println(rna.seqString()); //augccgaaucguaa

    //biological transcription (ie copy and reverse strand)
    rna = DNATools.transcribeToRNA(dna); //5' atgccgaatcgtaa 3'
    System.out.println(rna.seqString()); //5' uuacgauucggcau 3'
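The two conversions give different strings because the second one reads the input as a template strand. A plain-Java sketch on Strings makes the difference explicit; `TranscriptionSketch` and its methods are hypothetical and handle only the four unambiguous bases.

```java
final class TranscriptionSketch {
    // Alphabet conversion only: same order, t replaced by u.
    static String toRna(String dna) {
        return dna.toLowerCase().replace('t', 'u');
    }

    static char complement(char base) {
        switch (Character.toLowerCase(base)) {
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default: throw new IllegalArgumentException("Not a DNA base: " + base);
        }
    }

    // Reverse the strand and complement every base.
    static String reverseComplement(String dna) {
        StringBuilder result = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            result.append(complement(dna.charAt(i)));
        }
        return result.toString();
    }

    // Transcription from a template strand: reverse complement, then t to u.
    static String transcribe(String template) {
        return toRna(reverseComplement(template));
    }

    public static void main(String[] args) {
        String dna = "atgccgaatcgtaa";
        System.out.println(toRna(dna));       // prints augccgaaucguaa
        System.out.println(transcribe(dna));  // prints uuacgauucggcau
    }
}
```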
Reverse Complement
    import org.biojava.bio.symbol.*;
    import org.biojava.bio.seq.*;

    public class ReverseComplement {
        public static void main(String[] args) throws Exception {
            SymbolList forward = DNATools.createDNA("atcgctagcgatcg");

            //two step
            SymbolList reverse = SymbolListViews.reverse(forward);
            SymbolList revc1 = DNATools.complement(reverse);

            //one step
            SymbolList revc2 = DNATools.reverseComplement(forward);

            //test for equivalence
            System.out.println(revc1.equals(revc2));
        }
    }
Translation
RNATools contains the “Universal” RNA to Protein TranslationTable.
Standard procedure is transcribe DNA to RNA and then translate.
Translation Example
    import org.biojava.bio.symbol.*;
    import org.biojava.bio.seq.*;

    public class Translate {
        public static void main(String[] args) {
            try {
                //create a DNA SymbolList
                SymbolList symL = DNATools.createDNA("atggccattgaatga");

                //transcribe to RNA
                symL = DNATools.toRNA(symL);

                //translate to protein
                symL = RNATools.translate(symL);

                //prove that it worked
                System.out.println(symL.seqString());
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
    }
Sequence I/O
Don’t ever write another Parser
If you can avoid it! BioJava supports:
Genbank, GenPept, RefSeq, EMBL, SwissProt, PDB, Fasta, ABI, LocusLink, Unigene (requires Java 1.4)
GAME, AGAVE
Blast, Fasta, HMMER (models and results), BlastXML, MEME, Phred
OBDA, BioIndex, BioSQL, DAS, GFF, XFF
Ensembl (with biojava-ensembl package)
StAX / Tag value
RMI and Serialization
Simple I/O
Most of BioJava’s simpler I/O operations are conveniently wrapped up behind static methods from the SeqIOTools class.
SeqIOTools can read and write:
Fasta (protein or DNA)
EMBL
GenBank (flat file and XML)
SwissProt
GenPept
MSF (protein or DNA)
Fasta Alignments
SeqIOTools Reader Methods
SequenceIterator i = SeqIOTools.readGenbank(br);
SequenceIterator i = SeqIOTools.readGenpept(br);
SequenceIterator i = SeqIOTools.readSwissprot(br);
SequenceIterator i = SeqIOTools.readEmbl(br);
etc…
SequenceIterator i = (SequenceIterator) SeqIOTools.fileToBiojava("fasta", "dna", br);
Alignment a = (Alignment) SeqIOTools.fileToBiojava("MSF", "rna", br);
Features, Locations, Annotations
Features and Annotations
Sequence data often comes with added information about the various properties of the sequence (Genbank, SwissProt etc).
BioJava divides this information into global properties (Annotations) and localized properties (Features).
Annotatable
Annotatable is a “mix-in” interface that indicates the implementing object contains an Annotation object.
It defines one method: Annotation getAnnotation();
Annotations
org.biojava.bio.Annotation
Annotations are used for global properties: species, accession number, xrefs, date, publication.
Key-value maps. Keys and values are Objects but are almost always Strings.
Annotation.EMPTY_ANNOTATION: a static convenience object; a good placeholder that avoids null pointer exceptions; immutable.
Annotation API
Map asMap() Return a map that contains the same key/values as this Annotation.
boolean containsProperty(java.lang.Object key) Returns whether the property is defined.
Object getProperty(java.lang.Object key) Retrieve the value of a property by key.
Set keys() Get a set of key objects.
void removeProperty(java.lang.Object key) Delete a property
void setProperty(java.lang.Object key, java.lang.Object value) Set the value of a property.
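The key-value behaviour listed above can be illustrated without BioJava on the classpath. The following is a minimal plain-Java sketch, not BioJava's implementation: the class name SketchAnnotation and its EMPTY constant are hypothetical stand-ins that only mirror the method names on this slide. Returning null for a missing key is a simplification of this sketch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a key-value annotation bundle; not BioJava's class.
final class SketchAnnotation {
    // Shared immutable placeholder, so callers never need to hand out null.
    static final SketchAnnotation EMPTY = new SketchAnnotation(Collections.emptyMap());

    private final Map<Object, Object> properties;

    SketchAnnotation() {
        this(new LinkedHashMap<>());
    }

    private SketchAnnotation(Map<Object, Object> backing) {
        this.properties = backing;
    }

    boolean containsProperty(Object key) { return properties.containsKey(key); }

    // Simplification: returns null when the key is absent.
    Object getProperty(Object key) { return properties.get(key); }

    // Throws UnsupportedOperationException on EMPTY, whose backing map is immutable.
    void setProperty(Object key, Object value) { properties.put(key, value); }

    void removeProperty(Object key) { properties.remove(key); }

    Set<Object> keys() { return Collections.unmodifiableSet(properties.keySet()); }

    Map<Object, Object> asMap() { return Collections.unmodifiableMap(properties); }
}
```

An object that has nothing to annotate can return SketchAnnotation.EMPTY from its getter, so client code can call containsProperty() without a null check.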
FeatureHolder
FeatureHolder is another “mix-in” interface which allows the implementing object to hold Features.
Sequence implements FeatureHolder.
Features are created by FeatureHolders.
FeatureHolders can be filtered.
FeatureHolder methods
boolean containsFeature(Feature f) Check if the feature is present in this holder.
int countFeatures() Count how many features are contained.
Feature createFeature(Feature.Template ft) Create a new Feature, and add it to this FeatureHolder.
Iterator features() Iterate over the features in no well defined order.
FeatureHolder filter(FeatureFilter filter) Query this set of features using a supplied FeatureFilter.
FeatureHolder filter(FeatureFilter fc, boolean recurse) Return a new FeatureHolder that contains all of the children of this one that passed the filter fc.
FeatureFilter getSchema() Return a schema-filter for this FeatureHolder.
void removeFeature(Feature f) Remove a feature from this FeatureHolder.
Features are Annotatable
Features implement Annotatable.
Can hold an Annotation.
Global annotations of a Feature: /note:, /db_xref:, etc.
Features may be nested
Features implement FeatureHolder!
Therefore Features may hold nested Features.
c.f. the AWT Menu is a MenuItem.
e.g. a gene has exons and introns.
Filtering can be recursive.
A Feature cannot hold itself (directly or indirectly).
Location API
Locations are objects that specify a minimum and maximum bound on a region of sequence.
Contains some useful methods, particularly getMin() and getMax().
Many methods have been deprecated and are now delegated to LocationTools.
LocationTools is the best place to get new instances of a Location.
Location types include PointLocation, RangeLocation, CircularLocation and CompoundLocation.
LocationTools
static boolean areEqual(Location locA, Location locB) Return whether two locations are equal.
static boolean contains(Location locA, Location locB) Return true iff all indices in locB are also contained by locA.
static Location flip(Location loc, int len) Flips a location relative to a length.
static Location intersection(Location locA, Location locB) Return the intersection of two locations.
static CircularLocation makeCircularLocation(int min, int max, int seqLength) A simple method to generate a RangeLocation wrapped in a CircularLocation
static Location makeLocation(int min, int max) Return a contiguous Location from min to max.
static boolean overlaps(Location locA, Location locB) Determines whether the locations overlap or not.
static Location subtract(Location x, Location y) Subtract one location from another.
static Location union(java.util.Collection locs) The n-way union of a Collection of locations.
static Location union(Location locA, Location locB) Return the union of two locations.
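To make the semantics of a few of these operations concrete, here is a minimal plain-Java sketch on closed integer ranges. RangeSketch is a hypothetical class, not BioJava's Location, and the 1-based mirror convention used in flip() is an assumption of this sketch rather than a statement about LocationTools.flip().

```java
// Hypothetical closed integer range [min, max]; not BioJava's Location.
final class RangeSketch {
    final int min;
    final int max;

    RangeSketch(int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.min = min;
        this.max = max;
    }

    // True iff every index of other also lies in this range.
    boolean contains(RangeSketch other) {
        return min <= other.min && other.max <= max;
    }

    // True iff the two ranges share at least one index.
    boolean overlaps(RangeSketch other) {
        return min <= other.max && other.min <= max;
    }

    // The shared sub-range, or null when the ranges do not overlap.
    RangeSketch intersection(RangeSketch other) {
        if (!overlaps(other)) {
            return null;
        }
        return new RangeSketch(Math.max(min, other.min), Math.min(max, other.max));
    }

    // Mirror the range within a sequence of length len (assumed 1-based coordinates).
    RangeSketch flip(int len) {
        return new RangeSketch(len - max + 1, len - min + 1);
    }
}
```

Under this sketch's convention, the range 3-8 on a 19-residue sequence flips to 12-17, which is the same stretch counted from the other end.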
Location Example
import org.biojava.bio.symbol.*;
import org.biojava.bio.seq.*;
public class SpecifyRange {
  public static void main(String[] args) {
    try {
      //make a RangeLocation specifying the residues 3-8
      Location loc = LocationTools.makeLocation(3,8);
      //print the location
      System.out.println("Location: "+loc.toString());
      //make a SymbolList
      SymbolList sl = RNATools.createRNA("gcagcuaggcggaaggagc");
      System.out.println("SymbolList: "+sl.seqString());
      //get the SymbolList specified by the Location
      SymbolList sym = loc.symbols(sl);
      System.out.println("Symbols specified by Location: "+sym.seqString());
    }
    catch (IllegalSymbolException ex) {
      //illegal symbol used to make sl
      ex.printStackTrace();
    }
  }
}
Filtering Features
FeatureHolders have a filter method that accepts a FeatureFilter as an argument.
Features that are accepted by the FeatureFilter are returned as a new FeatureHolder.
Filtering may be done recursively so that nested Features are subjected to the same FeatureFilter.
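The difference between a flat and a recursive filter can be sketched in plain Java. SketchFeature is a hypothetical class (not BioJava's Feature or FeatureHolder), and java.util.function.Predicate stands in for the filter object; the boolean recurse flag mirrors the two-argument filter method described earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical feature that can hold child features; not BioJava's Feature.
final class SketchFeature {
    final String type;
    final List<SketchFeature> children = new ArrayList<>();

    SketchFeature(String type) {
        this.type = type;
    }

    // Add a child and return this, so nested structures can be built fluently.
    SketchFeature add(SketchFeature child) {
        children.add(child);
        return this;
    }

    // Collect accepted children; descend into grandchildren only when recurse is true.
    List<SketchFeature> filter(Predicate<SketchFeature> accept, boolean recurse) {
        List<SketchFeature> out = new ArrayList<>();
        for (SketchFeature child : children) {
            if (accept.test(child)) {
                out.add(child);
            }
            if (recurse) {
                out.addAll(child.filter(accept, true));
            }
        }
        return out;
    }
}
```

With a sequence holding a gene that holds an exon, a non-recursive filter for exons on the sequence finds nothing, while the recursive one finds the nested exon.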
FeatureFilters
FeatureFilter is an interface that specifies one method: boolean accept(Feature f)
There are 26 implementations of FeatureFilter in BioJava available as inner classes of the FeatureFilter interface.
Most commonly used are ByType, BySource, StrandFilter, OverlapsLocation, ContainedByLocation.
Also boolean logic filters: And, Or, Not
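Combining simple filters with boolean logic can be sketched with the standard java.util.function.Predicate, whose and(), or() and negate() methods play the role of the And, Or and Not filters. Everything in this block is hypothetical illustration: the Feat class and the byType, bySource and onStrand helpers are not BioJava API and only loosely echo the filter names above.

```java
import java.util.function.Predicate;

final class FilterLogicSketch {
    // Hypothetical minimal feature: a type, a source and a strand ('+' or '-').
    static final class Feat {
        final String type;
        final String source;
        final char strand;

        Feat(String type, String source, char strand) {
            this.type = type;
            this.source = source;
            this.strand = strand;
        }
    }

    // Accept features whose type matches exactly.
    static Predicate<Feat> byType(String type) {
        return f -> f.type.equals(type);
    }

    // Accept features whose source matches exactly.
    static Predicate<Feat> bySource(String source) {
        return f -> f.source.equals(source);
    }

    // Accept features on the given strand.
    static Predicate<Feat> onStrand(char strand) {
        return f -> f.strand == strand;
    }
}
```

For example, byType("CDS").and(onStrand('-').negate()) accepts coding features that are not on the minus strand.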
Analysis and Distributions
Distributions and Counts
The Distribution and Count interfaces are from the org.biojava.bio.dist package.
Counts are maps from AtomicSymbols to counts.
Distributions are maps from Symbols to frequencies.
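The relationship between the two can be shown with a small plain-Java sketch: tally observations into counts, then divide by the total to get frequencies. The class CountsToFrequencies and its use of characters as symbols are assumptions made for illustration; BioJava's Count and Distribution types are not used here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CountsToFrequencies {
    // Tally how often each character occurs in the sequence.
    static Map<Character, Double> count(String seq) {
        Map<Character, Double> counts = new LinkedHashMap<>();
        for (char c : seq.toCharArray()) {
            counts.merge(c, 1.0, Double::sum);
        }
        return counts;
    }

    // Divide each count by the total so that the resulting values sum to 1.
    static Map<Character, Double> normalize(Map<Character, Double> counts) {
        double total = 0.0;
        for (double v : counts.values()) {
            total += v;
        }
        Map<Character, Double> freqs = new LinkedHashMap<>();
        for (Map.Entry<Character, Double> e : counts.entrySet()) {
            freqs.put(e.getKey(), e.getValue() / total);
        }
        return freqs;
    }
}
```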
Distributions
Distributions are central to analysis.
Map Symbols to frequencies.
Can be trained, or weights can be set.
Used heavily in the dp (dynamic programming) package: HMM transitions and emissions.
Many implementations; frequently used are: SimpleDistribution, OrderNDistribution, UniformDistribution.
Distribution API
Alphabet getAlphabet() The alphabet from which this spectrum emits symbols.
Distribution getNullModel() Retrieve the null model Distribution that this Distribution recognizes.
double getWeight(Symbol s) Return the probability that Symbol s is emitted by this spectrum.
void registerWithTrainer(DistributionTrainerContext dtc) Register this distribution with a training context.
Symbol sampleSymbol() Sample a symbol from this state's probability distribution.
void setNullModel(Distribution nullDist) Set the null model Distribution that this Distribution recognizes.
void setWeight(Symbol s, double w) Set the probability or odds that Symbol s is emitted by this state.
DistributionFactory
Generally a Distribution is created using a DistributionFactory.
The DistributionFactory interface provides a static field called DEFAULT that holds a default implementation of DistributionFactory.
DistributionFactory df = DistributionFactory.DEFAULT;
Distribution d = df.createDistribution(dna.getAlphabet());
Distribution Training
Distributions can be trained on observed sequences using a DistributionTrainerContext (DTC).
One or more Distributions can be registered with the DTC.
//register the Distributions with the trainer
dtc.registerDistribution(dnaDist);
DistributionTrainerContext
A DistributionTrainer is assigned to each registered Distribution by the DTC.
If unusual training behaviour is required you can register your own DistributionTrainer at the same time.
The DTC can also add pseudocounts if needed.
Ambiguities are automagically handled. Counts are split according to the null model.
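One way to read "counts are split according to the null model" is sketched below in plain Java. This is an interpretation for illustration, not BioJava's training code: the class AmbiguityCountSketch, the use of strings such as "ct" to list the members of an ambiguity symbol, and the uniform pseudocount in train() are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AmbiguityCountSketch {
    private final Map<Character, Double> counts = new LinkedHashMap<>();
    private final Map<Character, Double> nullModel;

    // nullModel maps every unambiguous symbol to its background probability.
    AmbiguityCountSketch(Map<Character, Double> nullModel) {
        this.nullModel = nullModel;
    }

    // Record one observation that may be any symbol in members
    // ("a" for plain a, "ct" for y, "acgt" for n). The weight is shared
    // among the members in proportion to their null-model probabilities.
    void addCount(String members, double weight) {
        double norm = 0.0;
        for (char m : members.toCharArray()) {
            norm += nullModel.get(m);
        }
        for (char m : members.toCharArray()) {
            counts.merge(m, weight * nullModel.get(m) / norm, Double::sum);
        }
    }

    // Turn counts into frequencies, adding the same pseudocount to every symbol first.
    Map<Character, Double> train(double pseudocount) {
        double total = 0.0;
        for (char s : nullModel.keySet()) {
            total += counts.getOrDefault(s, 0.0) + pseudocount;
        }
        Map<Character, Double> freqs = new LinkedHashMap<>();
        for (char s : nullModel.keySet()) {
            freqs.put(s, (counts.getOrDefault(s, 0.0) + pseudocount) / total);
        }
        return freqs;
    }
}
```

With a uniform null model, one observation of y contributes half a count to c and half a count to t. A non-zero pseudocount keeps unobserved symbols from ending up with a frequency of exactly zero.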
Training Example
//make a DNA SymbolList
SymbolList dna = DNATools.createDNA("atcgctagcgtyagcntatsggca");
//get a DistributionTrainerContext
DistributionTrainerContext dtc = new SimpleDistributionTrainerContext();
//make the Distribution
Distribution dnaDist = DistributionFactory.DEFAULT.createDistribution(dna.getAlphabet());
//register the Distribution with the trainer
dtc.registerDistribution(dnaDist);
for(int j = 1; j <= dna.length(); j++){
  dtc.addCount(dnaDist, dna.symbolAt(j), 1.0);
}
//train the Distribution
dtc.train();
setWeight() Example
FiniteAlphabet a = DNATools.getDNA();
Distribution d = DistributionFactory.DEFAULT.createDistribution(a);
//set the weight of each symbol
d.setWeight(DNATools.a(),0.3);
d.setWeight(DNATools.c(),0.2);
d.setWeight(DNATools.g(),0.2);
d.setWeight(DNATools.t(),0.3);
DistributionTools
DistributionTools holds static methods for creating and manipulating Distributions.
Tasks include:
Equal emission spectra?
Shannon Entropy, information, KL Distance.
Generate biased sequences.
Make a Distribution[] from an Alignment (each Distribution represents one position in an Alignment).
Average two or more Distributions.
Randomize a Distribution.
Make a Distribution from a Count.
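For readers unfamiliar with two of the quantities named above, here is a plain-Java sketch of Shannon entropy and Kullback-Leibler divergence over probability arrays. It uses the textbook formulas in bits; the class EntropySketch is hypothetical and makes no claim about how DistributionTools computes or names these values.

```java
final class EntropySketch {
    private static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Shannon entropy in bits: -sum p * log2(p), skipping zero-probability symbols.
    static double shannonEntropy(double[] p) {
        double h = 0.0;
        for (double pi : p) {
            if (pi > 0.0) {
                h -= pi * log2(pi);
            }
        }
        return h;
    }

    // Kullback-Leibler divergence in bits: sum p * log2(p / q).
    // Assumes p and q have the same length and q[i] > 0 wherever p[i] > 0.
    static double klDivergence(double[] p, double[] q) {
        double d = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > 0.0) {
                d += p[i] * log2(p[i] / q[i]);
            }
        }
        return d;
    }
}
```

A uniform distribution over four symbols has the maximum entropy of 2 bits, and the divergence of any distribution from itself is 0.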
Serialization of Distributions
Distributions are Serializable: write to and read from binary; RMI.
XMLDistributionWriter: write any Distribution to a stream in XML format.
XMLDistributionReader: SAX parser; read any Distribution from an XML stream.
XML Output
<?xml version="1.0" ?>
<Distribution type="Distribution">
<alphabet name="DNA" />
<weight sym="adenine" prob="0.32178516910737204" />
<weight sym="cytosine" prob="0.04596199299395364" />
<weight sym="guanine" prob="0.1405504188012911" />
<weight sym="thymine" prob="0.4917024190973832" />
</Distribution>
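The layout above is simple enough to reproduce by hand. The following plain-Java sketch emits the same element and attribute names from a map of symbol names to probabilities. It is not XMLDistributionWriter: the class DistributionXmlSketch is hypothetical, and it assumes the names contain no characters that need XML escaping.

```java
import java.util.Map;

final class DistributionXmlSketch {
    // Build the XML text for an alphabet name and a symbol-name-to-probability map.
    // Assumes names contain no characters that need XML escaping.
    static String toXml(String alphabetName, Map<String, Double> weights) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" ?>\n");
        sb.append("<Distribution type=\"Distribution\">\n");
        sb.append("<alphabet name=\"").append(alphabetName).append("\" />\n");
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            sb.append("<weight sym=\"").append(e.getKey())
              .append("\" prob=\"").append(e.getValue()).append("\" />\n");
        }
        sb.append("</Distribution>\n");
        return sb.toString();
    }
}
```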
What Else??
Dynamic Programming (HMMs)
Bibliography
Alignments
Blast and Fasta parsing
What Else??
BioSQL support
GUI components
Chromatograms
Molecular Biology (pI, mass, restriction enzymes)
Molecular Structure