Introduction to Matlab & Data Analysis
Tutorials 8 and 9: Cell Arrays
Advanced Text Processing And File Handling
Please change directory to E:\Matlab (cd E:\Matlab;)
From the course website
(http://www.carine.co.il/htmls/page_1176.aspx?c0=13889&bsp=14333&bssearch=4,0,5,3,41,0)
download t89.zip and unzip it.
Weizmann 2010 ©
2
Outline
Cell arrays:
    Creating and indexing
    Useful functions for lists of strings
Structures
Advanced string manipulation:
    Regular expressions
File handling:
    Reading files
    Writing to files
    High-level file handling functions
Final example – P53
3
Cell Arrays – Lecture Reminder
Cell arrays are used for keeping different types of data in the same array.
For example:
A{1} = 2;
A{2} = 4:2:44;
A{3} = 'hello';
Extremely useful for handling lists of strings.
Notice the curly brackets.
[Figure: a cell array whose three cells hold 2, 4:2:44 and 'hello']
4
Creating Cell Arrays – Lecture Reminder
A(1) = {3};
A{2} = 3;
A{3} = 'radio blabla';
A{4} = 2:2:66;
B(1:3) = {3, [1, 2], 'abc'};
C = {'george clooney' ; ...
     'richard gere' };
% Initializing an empty cell array:
D = cell(4,2);
>> A'
ans =
    [           3]
    [           3]
    'radio blabla'
    [1x33 double]
C =
    'george clooney'
    'richard gere'
D =
    []    []
    []    []
    []    []
    []    []
5
Indexing Cell Arrays
Define a cell array:
>> A(1) = {3};
>> A{2} = 3;
>> A{3} = 'radio blabla';
>> A{4} = 2:2:66;
(or load A.mat;)
What is the difference between A(1) and A{1}? Try:
>> x = A(1)
x = [3]
>> class(x)
cell
>> x = A{1}
x = 3
>> class(x)
double
>> x = A(3)
x = 'radio blabla'
>> class(x)
cell
>> x = A{3}
x = radio blabla
>> class(x)
char
[Figure: a cell array whose three cells hold 3, [1,2,7] and 'Str']
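The difference between the two kinds of brackets can be checked directly. The following is a minimal sketch that rebuilds the cell array A from this slide; the variable names c, s and v are only illustrative.

```matlab
% Parentheses return a (smaller) cell array; curly brackets return the content.
A = {3, 3, 'radio blabla', 2:2:66};

c = A(3);          % 1x1 cell that still wraps the string
s = A{3};          % the char array itself

disp(class(c));    % cell
disp(class(s));    % char

% Content indexing can be chained with ordinary indexing:
v = A{4}(1:3);     % first three numbers stored in the fourth cell
disp(v);
```

Use A(i) when the result should stay a cell array (for example, to pass a sub-list on), and A{i} when the stored value itself is needed.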
6
Manipulating Cell Arrays
Just like numerical arrays… Examples:
x([1,3,5]) = {'aaa','bbb','ccc'}
x = repmat(x,2,3)
x(:,4)
x(1:2,3:5)
% Notice:
% Indexing several cells with curly brackets returns a comma-separated list of their contents
[a, b] = x{1:2}
Unassigned elements default to zero in a numerical array and to [] in a cell array.
7
Cell Arrays Are Very Useful For Keeping Lists of Strings
Cell arrays of strings can be treated similarly to numerical arrays.
Many functions work on both numerical and cell arrays.
Many functions that work on strings can also handle cell arrays.
load fruit.mat;
%fruit={'mango','banana','melon','apple','kiwi','orange'};
%fruit_prices=[30 15 10 5 35 8];
What is the price of melon?
ind = find(strcmp(fruit,'melon'));
fruit_prices(ind)
ans = 10
Sort the fruits from cheapest to most expensive:
[sorted_p,y] = sort(fruit_prices);
fruit(y)
{'apple','orange','melon','banana','mango','kiwi'}
8
Manipulating Cell Arrays That Hold Lists Of Strings
unique
intersect
setdiff
union
9
Manipulating Cell Arrays That Hold Lists Of Strings - Example
%fruit={'mango','banana','melon','apple','kiwi','orange'};
%fruit_sales={'mango','banana','melon',...
%    'mango','mango','kiwi','banana','mango'};
Which fruits were not sold today?
setdiff(fruit,unique(fruit_sales))
{'apple','orange'}
(unique is applied first for efficiency)
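The four set functions listed above behave the same way on cell arrays of strings. Here is a minimal sketch using the fruit lists from this slide, typed in by hand instead of loaded from fruit.mat; the result variable names are illustrative.

```matlab
fruit       = {'mango','banana','melon','apple','kiwi','orange'};
fruit_sales = {'mango','banana','melon','mango','mango','kiwi','banana','mango'};

sold      = unique(fruit_sales);      % every sold fruit once, sorted alphabetically
not_sold  = setdiff(fruit, sold);     % in fruit but not in sold
both      = intersect(fruit, sold);   % present in both lists
all_names = union(fruit, sold);       % present in at least one list

disp(not_sold);
```

All four functions return sorted output by default, so the original order of fruit is not preserved.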
10
ismember Function Is Useful For Mapping One List To Another
Finds if an element exists in a list:
>> b = {'z','y','x','w'};
>> a = ismember('x',b)
a = 1
If it does, ismember can tell you where it is:
>> [a,map] = ismember('x',b)
a = 1, map = 3
ismember is good for mapping one list to another, when order is important!
>> [a,map] = ismember({'x','y','c'},b);
a = [1 1 0], map = [3 2 0]
11
Comparing Two Lists of Strings: ismember, find and intersect
Which function to use?
I want to find the order of the elements of one list in another list: ismember
I want to find which elements of a list are also in another list: intersect
I want to find all the occurrences of an element in a list: find (with strcmp)
Note: when the element appears in the list more than once, ismember returns only one position. In the MATLAB version used in this course it is the last position; from MATLAB R2013a on it is the first (lowest) index.
12
Using ismember - Example
>> a = ismember('banana', fruit_sales)
a = 1
>> a = ismember('orange', fruit_sales)
a = 0
>> a = ismember(fruit, fruit_sales);
a = [1 1 1 0 1 0]
% Reminder: fruit_prices = 30 15 10 5 35 8
Example: calculate the amount of money made by each fruit sale
>> [a,b] = ismember(fruit_sales, fruit);
a = [1, 1, 1, 1, 1, 1, 1, 1]
b = [1, 2, 3, 1, 1, 5, 2, 1]
>> sales_money = fruit_kilos .* fruit_prices(b)
sales_money = [90, 30, 10, 60, 240, 17.5, 45, 150]
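The mapping step can be reproduced without fruit.mat. In the sketch below, fruit_kilos is an assumption: the slide never lists it, so the values were chosen to reproduce the sales_money shown above. The per-fruit total with accumarray is an addition for illustration, not part of the original example.

```matlab
fruit        = {'mango','banana','melon','apple','kiwi','orange'};
fruit_prices = [30 15 10 5 35 8];          % price per kilo, same order as fruit
fruit_sales  = {'mango','banana','melon','mango','mango','kiwi','banana','mango'};
fruit_kilos  = [3 2 1 2 8 0.5 3 5];        % ASSUMED kilos per sale (not given on the slide)

% idx(k) is the position in fruit of the k-th sale
[found, idx] = ismember(fruit_sales, fruit);
sales_money  = fruit_kilos .* fruit_prices(idx);
disp(sales_money);

% Total income per fruit, in the order of the fruit list
income = accumarray(idx(:), sales_money(:), [numel(fruit) 1])';
disp(income);
```

The second output of ismember acts as a lookup table from the sales list into the price list, which is why the order of the first argument matters.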
13
Structures
14
Lecture Reminder - Structures Creation
>> dogs.name = 'rufus';
>> dogs.breed = 'Bulldog';
>> dogs.age = 1.5; % in years
>> dogs.special_food = 'none';
>> dogs
dogs =
            name: 'rufus'
           breed: 'Bulldog'
             age: 1.5000
    special_food: 'none'
15
Lecture Reminder - Structures Creation
Adding more dogs…
>> dogs(2).name = 'king-kong';
>> dogs(2).breed = 'Chihuahua';
>> dogs(2).age = 5;
>> dogs(2).special_food = 'filet mignon';
>> dogs(3).name = 'wong';
>> dogs(3).breed = 'pekingese';
>> dogs(3).age = 20;
>> dogs(3).special_food = 'sushi';
>> dogs
dogs =
1x3 struct array with fields:
    name
    breed
    age
    special_food
16
Structures – Short Example
Define a "fruits" structure array that has the fields name, price and color, and contains two fruits of your choice.
Get:
    A cell array of the names
    An array of the prices
    The first fruit
>> fruits(1).name = 'Lemon';
>> fruits(1).color = 'Yellow';
>> fruits(1).price = 20;
>> fruits(2).name = 'Apple';
>> fruits(2).color = 'Green';
>> fruits(2).price = 10;
>> {fruits.name}
'Lemon' 'Apple'
>> [fruits.price]
20 10
>> a = fruits(1)
a =
     name: 'Lemon'
    color: 'Yellow'
    price: 20
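Collecting a field across a structure array combines naturally with the cell-array functions from the first part. A small sketch building on the fruits example; the lookups for the cheapest and the green fruit are illustrative additions.

```matlab
fruits(1).name = 'Lemon';  fruits(1).color = 'Yellow';  fruits(1).price = 20;
fruits(2).name = 'Apple';  fruits(2).color = 'Green';   fruits(2).price = 10;

names  = {fruits.name};      % comma-separated list collected into a cell array
prices = [fruits.price];     % comma-separated list collected into a numeric array

% Find the cheapest fruit by index, then read its fields
[min_price, cheapest] = min(prices);
fprintf('Cheapest: %s (%d)\n', fruits(cheapest).name, min_price);

% Select elements of the structure array by a field value
green = fruits(strcmp({fruits.color}, 'Green'));
```

fruits.name on its own produces a comma-separated list, which is why it must be wrapped in {} or [] to be stored in one variable.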
17
Structure Advertisement
Although this tutorial focuses on cells:
Using Structures to aggregate variables that belong to the same entity makes the program easier to design, more readable and easier to debug.
18
Advanced Text processing (String Manipulation)
1. Review of useful functions:
    1. findstr, strfind, strtok, strtrim
    2. sprintf
2. Regular expressions
19
Review of Useful Functions For String Manipulation
So far we have learned simple string manipulations:
    str2num, num2str
    strcmp, strncmp, strcmpi, strncmpi
More advanced string manipulation functions (used in text processing):
    findstr, strfind
    strtok
    strtrim
    sprintf (related functions: fprintf, sscanf)
20
Finding One String Inside Another - findstr and strfind
findstr(str1,str2) – searches the longer of the two input strings for any occurrences of the shorter string (input order does not matter!):
>> k = findstr('beauty is in the eyes of the beholder','be')
k = [1, 30]
(findstr is not recommended in newer MATLAB versions; prefer strfind.)
strfind(str1,str2) – the order matters: finds str2 inside str1.
str1 can be a cell array of strings!!!
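When the first argument of strfind is a cell array, the result is a cell array with one index vector per string. A minimal sketch with made-up example lines:

```matlab
lines = {'beauty is in the eyes of the beholder', ...
         'to be or not to be', ...
         'nothing here'};

hits = strfind(lines, 'be');           % cell array: start indices for each line
has_be = ~cellfun('isempty', hits);    % logical mask of lines containing 'be'

disp(hits{1});
disp(has_be);
```

The logical mask can then be used to index the original list, for example lines(has_be).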
23
Example – Parsing a Line Using strtok
Consider the line 'this is an example'.
How do we write a program that breaks it into a cell array of single words?
rem = 'this is an example';
words = cell(0);
while 1
    [tok,rem] = strtok(rem);
    if isempty(tok)
        break;
    end
    words{end+1} = tok;
end
words'
ans =
    'this'
    'is'
    'an'
    'example'
25
sprintf – Write Formatted Data Into Strings
sprintf(format,…) – writes formatted data into a string
Good for creating messages for disp
Related functions: fprintf, sscanf
Special characters in format:
    %s – a string
    %d – an integer
    %f – a floating-point number
Example (%d is filled with the number, %s with the string):
load fruit.mat;
for i=1:length(fruit)
    s = sprintf('Fruit number %d: %s', i, fruit{i});
    disp(s);
end
Fruit number 1: mango
Fruit number 2: banana
Fruit number 3: melon
Fruit number 4: apple
Fruit number 5: kiwi
Fruit number 6: orange
26
sprintf - Example
Consider the cell array
names = {'Danny', 'Noa', 'Moti'};
Write a script that prints:
Number:1, Name:Danny.
Number:2, Name:Noa.
Number:3, Name:Moti.
Answer:
for i=1:length(names)
    s = sprintf('Number:%d, Name:%s.', ...
        i, names{i});
    disp(s);
end
See also: sscanf & fprintf
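Format specifiers also accept a field width and precision, and sscanf reverses the operation. The sketch below is illustrative only: the scores array is invented for the example and does not come from the tutorial files.

```matlab
names  = {'Danny', 'Noa', 'Moti'};
scores = [91.5 78 84.25];              % hypothetical values, for illustration only

rows = cell(1, numel(names));
for i = 1:numel(names)
    % %-6s : string left-aligned in 6 characters
    % %6.2f: number in 6 characters with 2 digits after the decimal point
    rows{i} = sprintf('%d. %-6s %6.2f', i, names{i}, scores(i));
    disp(rows{i});
end

% sscanf reads values back out of a string using the same kind of format
n = sscanf('Number:3', 'Number:%d');
```

Fixed widths are handy when the lines should align in columns in the command window or in a report file.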
27
More Useful String Manipulation Functions
strtrim(str) – removes all leading and trailing white-space
>> strtrim(' do not blink ')
'do not blink'
strtok(str,delim) – breaks a string into "tokens"
>> [tok,rem] = strtok('this is an example', ' ')
tok = 'this'
rem = ' is an example'
strfind(str1,str2) – searches for str2 in str1. str1 can be a cell array of strings!
>> k = strfind('beauty is in the eyes of the beholder','be')
k = [1, 30]
findstr(str1,str2) – searches the longer of the two input strings for any occurrences of the shorter string
More useful functions at:
Help -> Matlab -> Functions by category -> Strings functions
28
Regular expressions
29
Regular Expression - Definition
Wikipedia – Regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters.
ind = regexp(long_str,'\w+ain')
(the second argument is the regular expression)
We need to learn the syntax of the regular expressions "language".
30
Regular Expressions Syntax
Defining a pattern:
[] is like OR
    Any character out of a, b, c or d: [abcd]
    Anything other than a, b, c or d: [^abcd]
    Character range (all characters a to z): [a-z]
Special characters used in defining a pattern:
    Any character: .
    Whitespace: \s
    Newline: \n
    Tab: \t
    Any alphanumeric character: \w (same as [a-zA-Z_0-9])
    Any digit: \d (same as [0-9])
31
Regular Expressions Syntax
Pattern definition - expression quantifiers:
    One or more: exp+ (example: '[\w]+')
    Zero or more: exp*
    Between n and m times: exp{n,m}
Examples:
    '\w\s+\w' – two alphanumeric characters with one or more whitespace characters between them
    '[SRM]amy' – Ramy, Samy or Mamy
Function: loc = regexp(str, pattern)
Read more about "regular expressions" in the MATLAB help! (search "regular expressions")
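The syntax elements above can be tried on a short string. This is a minimal sketch; the test strings are made up for the illustration.

```matlab
str = 'Ramy met Samy, Tamy and Mamy at 10:30 on 2010-05-17';

names   = regexp(str, '[SRM]amy', 'match');      % one of S, R or M, then 'amy'
numbers = regexp(str, '\d+', 'match');           % one or more digits
year    = regexp(str, '\d{4}', 'match');         % exactly four digits in a row
others  = regexp('abcdxyz', '[^abcd]+', 'match');% run of characters other than a-d
gaps    = regexp('a  b', '\w\s+\w', 'match');    % two word characters around whitespace

disp(names);
disp(numbers);
```

With the 'match' option regexp returns the matched text instead of the start indices.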
32
Using Regular Expressions to Search For Pattern occurrences In a Long String
Example:
prof_higgins = 'The rain in Spain falls mainly on the plain.';
We would like to find all the words that rhyme with 'ain'.
1. Defining the pattern: one or more alphanumeric characters, then 'ain', then a whitespace or a period (so that 'ain' ends the word):
pattern = '\w+ain[\s\.]'
OR
pattern = '[a-zA-Z]+ain[\s\.]'
33
Using Regular Expressions to Search For Pattern Occurrences In a Long String
>> prof_higgins = ...
    'The rain in Spain falls mainly on the plain.';
Find the indices of the occurrences:
>> loc = regexp(prof_higgins,'\w+ain')
loc = [5 13 25 39]
Get the pattern occurrences:
>> words = regexp(prof_higgins,'\w+ain','match')
words = {'rain','Spain','main','plain'}
34
Using Regular Expressions to Replace Pattern Occurrences In a Long String
Replace all pattern occurrences:
>> eliza_doolittle = regexprep(prof_higgins,'ain','yne')
eliza_doolittle = 'The ryne in Spyne falls mynely on the plyne.'
Split a line into words (good for parsing lines of an input file):
>> words = regexp(prof_higgins, '\s', 'split');
words = {'The','rain','in','Spain','falls','mainly','on','the','plain.'}
35
Using Regular Expression to Parse a Line (see strtok for another option)
no_rhymes = regexp(prof_higgins, 'ain\w*\s', 'split')
no_rhymes =
{'The r' 'in Sp' 'falls m' 'on the plain.'}
Problem: the last word has no space after it, so it is not matched (and 'mainly' is split although it does not rhyme).
Fixing it:
no_rhymes = regexp(prof_higgins, '\w+ain[\s\.]', 'split')
no_rhymes =
{'The ' 'in ' 'falls mainly on the ' ''}
36
Running Example – Finding Bomb Threats
You are a CIA agent in charge of identifying potential bombing threats against cities by going over the emails of terrorists.
37
Using Regular Expression to Identify Significant Lines
Assume an email is stored as a cell array of strings (one line per cell), called "email".
Using a regular expression: identify lines that contain the expression "bomb". When you find such a line, print "HELP!!!".
load email.mat;
for i=1:length(email)
    line = email{i};
    if (~isempty(regexp(line,'bomb')))
        disp('HELP!!!');
    end
end
38
Using Regular Expression to Identify Significant Lines
Notice there is a "bug" in the code:
load email.mat;
for i=1:length(email)
    line = email{i};
    if (~isempty(regexp(line,'bomb')))
        disp(['HELP!!!:' line]);
    end
end
HELP!!!:thinking of bombing rehovot
HELP!!!:thinking of bombing sderot
HELP!!!:thinking of going to the bombamella festival next week
How do we fix the bug?
Hint: inside parentheses, | means OR: 'smil(e|ed|ing)'
39
Using Regular Expression to Identify Significant Lines
Here is a fix for the bug (| is OR):
load email.mat;
for i=1:length(email)
    line = email{i};
    if (~isempty(regexp(line,'[Bb]omb(ed|ing|s)?\s')))
        disp(['HELP!!!:' line]);
    end
end
HELP!!!:thinking of bombing rehovot
HELP!!!:thinking of bombing sderot
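A note on alternation: | means OR only inside parentheses. Inside square brackets it is an ordinary character, so 'smil[e|ed|ing]' matches 'smil' followed by one character from the set e, |, d, i, n, g. The sketch below illustrates the difference; the email list is a hand-typed stand-in for email.mat, using two of the lines shown in the slide output.

```matlab
% Character class vs. group with alternation
class_match = regexp('smiling', 'smil[e|ed|ing]', 'match');   % stops after one char
group_match = regexp('smiling', 'smil(e|ed|ing)', 'match');   % matches the whole word

% Stand-in for email.mat (ASSUMED content, copied from the slide output)
email = {'thinking of bombing rehovot', ...
         'thinking of going to the bombamella festival next week'};

% 'bomb', an optional ending (ed, ing or s), then whitespace
is_threat = ~cellfun('isempty', regexp(email, '[Bb]omb(ed|ing|s)?\s'));
disp(is_threat);
```

The ? quantifier makes the whole group optional, so plain 'bomb' followed by a space is still detected while 'bombamella' is not.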
40
Regular Expression Tokens Are Used to Retrieve Specific Part of the Pattern Occurrences
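tokens = regexp( ...
    'bla bla ami@weizmann.ac.il bli bli tami@tau.ac.il ya', ...
    '(\w+)@(\w+)\.ac\.il', 'tokens')
In the pattern, the first (\w+) is token 1 and the second (\w+) is token 2.
tokens =
{ {'ami', 'weizmann'} {'tami', 'tau'} }
Each inner cell is one occurrence (occurrence 1, occurrence 2) and holds token 1 and token 2 of that occurrence:
tokens{1}{1} = 'ami'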
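The nested result can be walked with two levels of curly-bracket indexing. A minimal sketch; the input string uses the two addresses implied by the token output on this slide, and the variable users is illustrative.

```matlab
str = 'bla bla ami@weizmann.ac.il bli bli tami@tau.ac.il ya';
tokens = regexp(str, '(\w+)@(\w+)\.ac\.il', 'tokens');

% tokens{k} is the k-th occurrence; tokens{k}{j} is its j-th token
for k = 1:numel(tokens)
    fprintf('user: %s, institute: %s\n', tokens{k}{1}, tokens{k}{2});
end

% Collect the first token of every occurrence into one cell array
users = cellfun(@(t) t{1}, tokens, 'UniformOutput', false);
```

'UniformOutput', false is needed because each result is a string rather than a scalar.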
41
Using Tokens to Retrieve Specific Parts of the Pattern Occurrences
Now that you have identified the suspicious emails, extract the threatened city.
Hint: use regexp(line, <some expression>, 'tokens').
for i=1:length(email)
    line = email{i};
    if (~isempty(regexp(line,'[Bb]omb(ed|ing|s)?\s')))
        % (?:...) groups without creating a token, so the city is token 1
        city = regexp(line, ...
            '[Bb]omb(?:ed|ing|s)?\s(\w+)', ...
            'tokens');
        disp(['HELP!!! Bomb threat on: ' city{1}{1}]);
    end
end
HELP!!! Bomb threat on: rehovot
HELP!!! Bomb threat on: sderot
42
Using Tokens to Retrieve Specific Parts of the Pattern Occurrences
Here is a loop-less version (regexp can handle a cell array):
load email.mat;
cities = regexp(email, '[Bb]omb(?:ed|ing|s)?\s(\w+).*', 'tokens')
is_threat = ~cellfun('isempty',cities);
cities = cities(is_threat);
cities = [cities{:}];
cities = [cities{:}];
warnings = strcat('HELP!!! Bomb threat on: ', cities)
disp(strvcat(warnings))
HELP!!! Bomb threat on:rehovot
HELP!!! Bomb threat on:sderot
43
Handling Files
44
Lecture Reminder –Opening and Closing Files
Opening a file for reading:
fid = fopen('filename','r');
Opening a file for writing:
fid = fopen('filename','w');
fid is a scalar MATLAB integer, called a file identifier. You use the fid as the first argument to other file input/output routines.
Always close your file!!!
fclose(fid);
Permissions:
    'a' – append
    'r+' – read and write
    More in the HELP…
45
Lecture Reminder –Reading a File Line by Line
Reading line by line:
line = fgetl(fid);
How can we read the entire file?
fid = fopen('names.txt');        % open
while feof(fid)==0               % feof – has the file reached its end?
    tline = fgetl(fid);          % fgetl – file get line
    if ~ischar(tline)            % break if not char
        break;
    end
    tline = strtrim(tline);
    %<do whatever you want>
end
fclose(fid);                     % close
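The read loop can be tried end to end without the course files. The sketch below first writes a small scratch file (its name and content are made up for the illustration) and then reads it back into a cell array with the same fgetl pattern.

```matlab
% Create a small scratch file to read (hypothetical content)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '  Danny \nNoa\n Moti\n');
fclose(fid);

% Read it back line by line into a cell array
names = {};
fid = fopen(fname, 'r');
if fid < 0
    error('Could not open %s', fname);
end
while true
    tline = fgetl(fid);
    if ~ischar(tline)                 % fgetl returns -1 at the end of the file
        break;
    end
    names{end+1} = strtrim(tline);    % drop leading/trailing white-space
end
fclose(fid);
delete(fname);                        % clean up the scratch file

disp(names);
```

Checking fid right after fopen gives a clear error message instead of a failure later inside fgetl.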
46
Lecture Reminder – Writing to a File
Open the file with writing permission.
Write, line by line, using:
fprintf(fid,format,…); % similar to sprintf!!!
format is a string with special characters:
    %s – a string, %d – an integer, %f – a floating-point number
Close the file.
Example:
fid = fopen('tmp.txt', 'w');
for i=1:length(lines)
    fprintf(fid,'this is a line: %s\n',lines{i});
end
fclose(fid);
47
File Handling - Example
Open the file names.txt for reading.
Display it with line numbers:
Line number 1: <line1>
Line number 2: <line2>
…
Close the file.
fid = fopen('names.txt', 'r');
l_cnt = 0;
while feof(fid)==0
    line = fgetl(fid);
    if ~ischar(line)
        break;
    end
    l_cnt = l_cnt + 1;
    disp(['Line number ' num2str(l_cnt) ': ' line]);
end
fclose(fid);
48
File Handling - Example
Congratulations! You were just promoted to senior spy.
You have a directory full of email text files.
Now you need to read all the email files, identify the bomb threats, and write them into a summary file, threat_report.txt.
49
File Handling - Example
Solution strategy:
1. Open the output threats file
2. Go over all the emails in a given directory:
    1. Open an input email file
    2. Read it, line by line
    3. Identify threats. When a threat is identified, print the line
    4. Close the input email file
3. Close the output threats file
50
File Handling – Example:Programs Design
searchEmailsDirForThreats:
    Open the report output file
    Open a directory and get all the file names
    For each file run searchEmailForThreats
searchEmailForThreats (inputs: 1. email file name, 2. report output fid):
    Open the email input file
    Search line by line for a threat
    If a threat is found, write it to the output file
51
File Handling – Example:Main Function Design
function threats_found = searchEmailsDirForThreats(in_emails_dir, out_report_fname)
% <getting all file names>
% <opening report output file>
% <going over the files>
% <closing report output file>
52
File Handling – Example:Main Function Design
function threats_found = searchEmailsDirForThreats(in_emails_dir, out_report_fname)
% <getting all file names>
if (~isdir(in_emails_dir))
    error([in_emails_dir ' is not a directory']);
end
% getting file names
fs = dir(in_emails_dir);
file_names = {fs.name};
Directory management:
dir, pwd, cd, copyfile, delete, movefile, mkdir, rmdir, …
53
File Handling – Example:Main Function Design
function threats_found = searchEmailsDirForThreats(in_emails_dir, out_report_fname)
% <getting all file names>
% <opening report output file>
out_report_fid = fopen(out_report_fname, 'w');
if out_report_fid < 0
    error(['File ', out_report_fname, ' could not be opened']);
end
threats_found = 0;
54
File Handling – Example:Main Function Design
function threats_found = searchEmailsDirForThreats(in_emails_dir, out_report_fname)
% <going over the files>
for i=1:length(file_names)
    email_fname = file_names{i};
    % skip sub-directories (including '.' and '..'), checked inside in_emails_dir
    if (~isdir([in_emails_dir '/' email_fname]))
        threats_found = threats_found + ...
            searchEmailForThreats(out_report_fid, ...
                [in_emails_dir '/' email_fname]);
    end
end
% <closing report output file>
fclose(out_report_fid);
55
File Handling – Example:Looking for Threats In an Email
function threats_found = searchEmailForThreats(out_report_fid,email_fname)
% <opening email input file>
% <going over the file line by line>
while feof(in_fid) == 0
    % <read line>
    if % <is found threat>
        % <get the threatened city>
        % <adding to the report>
    end
end
% <closing input file>
56
File Handling – Example:Opening File For Read
function threats_found = searchEmailForThreats(out_report_fid,email_fname)
% <opening email input file>
in_fid = fopen(email_fname, 'rt');
if in_fid < 0
    error(['File ', email_fname, ' was not found.']);
end
threats_found = 0;
l_cnt = 0;
57
File Handling – Example:Reading a File Line by Line
function threats_found = searchEmailForThreats(out_report_fid,email_fname)
% <opening email input file>
% <going over the file line by line>
while feof(in_fid) == 0
    % <read line>
    line = fgetl(in_fid);
    if ~ischar(line)
        break;
    end
    l_cnt = l_cnt+1;
    line = strtrim(line);
    if % <is found threat>
        …
    end
end
58
File Handling – Example: Using Regular Expressions to Find and Retrieve Pattern Occurrences
while feof(in_fid) == 0
    % <read line>
    % <is found threat>
    if (~isempty(regexp(line,'.*bomb.*')))
        city = regexp(line, '.*bomb\w*\s([\w-]+).*', 'tokens');
        % <adding to the report>
        fprintf(out_report_fid,'File: %s, Line number:%d, Threat on %s - %s\n', ...
            email_fname, l_cnt, city{1}{1}, line);
        threats_found = threats_found + 1;
    end
end
59
File Handling – Example:Looking For Threats in an Email
function threats_found = searchEmailForThreats(out_report_fid,email_fname)
% <opening email input file>
% <going over the file line by line>
while feof(in_fid) == 0
    % <read line>
    if % <is found threat>
        % <get the threatened city>
        % <adding to the report>
    end
end
% <closing input file>
fclose(in_fid);
60
High-Level File Handling Functions
61
Matlab Has a Collection of High-Level Read / Write Functions
These functions can save the need to read or write the file line by line.
Examples:
    dlmread, dlmwrite
    textread, textscan
    xlsread
    importdata
62
High-level File Reading Function Example- textread
Reading an entire text file in one line:
lines = textread(filename,format,parameters)
Example: when reading a file containing a single word in every line:
names = textread('names.txt','%s');
If there are more words in a line, each word will be read separately.
Example 1:
email = textread('email.txt','%s');
What happens?
email = {'thinking' 'of' 'bombing' 'rehovot' 'thinking' …}
63
High-level File Reading Function Example- textread
Example 2: Reading a text file, line by line
Try:
email = textread('email.txt', '%s', 'delimiter','\n');
What happens?
email = {'thinking of bombing rehovot'
         'thinking of bombing sderot'
         'thinking of going to the bombamella festival next week'}
64
MATLAB functions for High-level file reading
Reading an entire Excel file in one line:
[nums,t] = xlsread(filename,options…)
Will create a numerical array nums and a cell array t.
Try:
[n,t] = xlsread('rt_example3.xls')
What happens?
    Textual cells are set to NaN in n
    Numerical cells are set to '' (empty strings) in t
Note: each sheet can be read separately (read the HELP)
65
MATLAB functions for High-level file reading
Reading an entire Excel file, tab-delimited text file or other preformatted file:
A = importdata(filename,options…)
Will create a structure A, which contains:
    A.data – a numerical array
    A.textdata – a cell array
Try:
A = importdata('rt_example3.xls')
What happens?
66
Summary – File Handling
Matlab has diverse and powerful functions for text processing.
Before you start coding with low-level I/O functions, check whether one of the high-level functions solves the problem.
67
Final example:Looking for p53 TFBS
(Transcription Factor Binding Sites)in human promoters
68
Looking for p53 TFBS in human promoters
A TF can recognize a variable site:
    Some positions are fixed
    Some are optional, e.g. A/T are acceptable, but not G/C
Consensus sequence: the pattern representing all possible recognized sites.
69
Looking for p53 TFBS in human promoters
Let's define a consensus for a p53 half-site:
1. Pos #1: G/A/T
2. Pos #2: G/A
3. Pos #3: A/G/C
4. Pos #4: C
5. Pos #5: A/T
6. Pos #6: A/T
7. Pos #7: G
8. Pos #8: N
9. Pos #9: T/C/G
10. Pos #10: T/C
Full site: half-site, variable spacer of 0-13 bases, half-site
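The position list translates almost one-to-one into a regular expression: each position becomes a character class, N becomes '.', and the spacer becomes '.{0,13}'. The sketch below builds the pattern and tries it on a made-up test sequence (the sequence is invented for the illustration and is not taken from the promoter file).

```matlab
% One character class per position of the half-site; N (any base) is '.'
half_site     = '[GAT][GA][AGC]C[AT][AT]G.[TCG][TC]';
% Full site: half-site, 0 to 13 arbitrary bases, half-site
p53_consensus = [half_site '.{0,13}' half_site];

% Hypothetical test sequence: two half-sites separated by a 3-base spacer
seq = ['TTTT' 'GGACATGCCT' 'AAA' 'AGGCTTGATC' 'TTTT'];

[m, s, e] = regexp(seq, p53_consensus, 'match', 'start', 'end');
disp(m);
```

Building the full pattern from half_site keeps the two halves identical and makes the spacer range easy to change.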
70
Looking for p53 TFBS in human promoters
How do we even start???
1. Read the promoter file into a cell array.
2. Go through the promoters:
    Look for the p53 consensus (need to define it as a regular expression)
    When we find it, store the data on the hit
3. Open a result file
4. Go through all the hits you found:
    Print them into the results file
71
Looking for p53 TFBS in human promoters
1. Reading the promoter file:
The file name: masked_promoters.some.txt
The file format: FASTA
>gene1 header line
Sequence…
Sequence…
>gene2 header line
Sequence…
Sequence…
For example:
>GENE=ENSG00000001036 Transcript=1 LLid=2519 orgDBsym=FUCA2 other details…
CCATGTTCTAAACGACTTCATAGATTTATTTCTTTCAGTCAT…
72
Looking for p53 TFBS in human promoters
1. Reading the promoter file:
promoters={}; ensID={}; symb={};
fid=fopen('masked_promoters.all.txt');
while feof(fid)==0
    tline = fgetl(fid);
    % <process the data>
end
fclose(fid);
73
Looking for p53 TFBS in human promoters
1. Reading the promoter file:
while feof(fid)==0
    % <from previous slide…>
    if(tline(1)=='>') % it is a header
        tmp = regexp(tline, ...
            '.*GENE=(\w+)\s.*orgDBsym=(\w+)', ...
            'tokens');
        ensID{end+1} = tmp{1}{1};
        symb{end+1} = tmp{1}{2};
    else % it is a promoter
        promoters{end+1} = tline;
    end
end
74
Looking for p53 TFBS in human promoters
2. Go through the promoters:
hit_seq={}; hit_gene=[]; hit_pos=[];
p53_consensus = ['[GAT][GA][AGC]C[AT][AT]G.[TCG][TC].{0,13}' ...
                 '[GAT][GA][AGC]C[AT][AT]G.[TCG][TC]'];
for i=1:length(promoters)
    [m s e] = regexp(promoters{i}, p53_consensus, 'match', ...
        'start', 'end');
    % let's ignore that DNA is double stranded…
    if(~isempty(m))
        hit_seq(end+1:end+length(m)) = m;
        hit_gene(end+1:end+length(m)) = repmat(i,1,length(m));
        hit_pos(end+1:end+length(m)) = s;
    end
end
75
Looking for p53 TFBS in human promoters
3&4. Open a result file, print all the hits
fid=fopen('p53_TFBS.txt','w');
% printing a header line
fprintf(fid,'gene ID\tgene name\tsite\tpos\n');
for i=1:length(hit_gene)
    fprintf(fid,'%s\t%s\t%s\t%d\n', ...
        ensID{hit_gene(i)}, ...
        symb{hit_gene(i)}, ...
        hit_seq{i}, ...
        hit_pos(i));
end
fclose(fid);