Data Wrangling Lab
Sept 26-29, 2016 (c) 2016 iCDO@UALR 1
David /WEI DAI
CDO-1 Certificate Program: Foundations for Chief Data Officers
Agenda
• Basic Python Program
• MongoDB Lab
• Clean Data Lab
A Tutorial on the Python Programming Language
Why do we choose Python?
• C or C++
• Java
• Perl
• Scheme
• Fortran
• Python
• Matlab
Python is a modern, interpreted, object-oriented, full-featured, high-level programming language:
• Portable (Unix/Linux, Mac OS X, Windows)
• Open source; intellectual property rights held by the Python Software Foundation
• Python versions: 2.x and 3.x
• 3.x is not backwards compatible with 2.x
• This course uses version 3.x
• Fast program development
• Simple syntax
• Easy to write readable code
• Large standard library
• Lots of third-party libraries: NumPy, SciPy, Biopython, Matplotlib
Python Program Platform
• Open a browser and access the website:
• https://teslae.host.ualr.edu:8888
• Password: python
Hello World
• At the prompt, type "hello world!"
The print Function and Strings
>>> print('hello')
hello
>>> print('hello', 'David')
hello David
• Elements separated by commas print with a space between them
• Strings are immutable
• “+” is overloaded to do concatenation
>>> x = 'hello'
>>> x = x + ' America'
>>> print(x)
hello America
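The immutability point can be checked directly. The following is a minimal sketch (variable names are illustrative only): assigning to a character position raises TypeError, while + leaves the original string untouched and builds a new one.

```python
# Strings are immutable: assigning to a character position fails.
s = 'hello'
try:
    s[0] = 'J'
    changed = True
except TypeError:
    changed = False  # item assignment is not supported for str

# "+" never edits a string in place; it builds a new string object.
t = s + ' America'

print(changed)  # False
print(s)        # hello
print(t)        # hello America
```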
Substrings and Methods
>>> s = '012345'
>>> print(s[3])
3
>>> print(s[1:4])
123
>>> print(s[2:])
2345
>>> print(s[:4])
0123
>>> print(s[-2])
4
• len(String) – returns the number of characters in the String
• str(Object) – returns a String representation of the Object
>>> print(len(s))
6
>>> print(str(10.3))
10.3
• Relational operators
== equal
!= not equal (the older <> form was removed in Python 3)
> greater than
>= greater than or equal
< less than
<= less than or equal
• Logical operators
and logical and
or logical or
not logical not
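As a short illustration of the operators listed above, here is a minimal sketch; the variables age and name and their values are made up for the example.

```python
age = 30
name = 'David'

# Relational operators return a bool.
is_adult = age >= 18          # True
is_other = name != 'David'    # False

# Logical operators combine bools; "and" and "or" short-circuit.
both = is_adult and not is_other   # True
either = is_other or age < 10      # False

print(is_adult, is_other, both, either)  # True False True False
```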
Variables
• Are not declared, just assigned
• The variable is created the first time you assign it a value
• Assignment is = and comparison is ==
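The three points above can be seen in a few lines. This is only a sketch, and the name count is arbitrary.

```python
# No declaration: the first assignment creates the variable.
count = 3
first_type = type(count).__name__
print(first_type)    # int

# The same name can later be bound to a value of another type.
count = 'three'
second_type = type(count).__name__
print(second_type)   # str

# "=" assigns, "==" compares.
same = (count == 'three')
print(same)          # True
```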
Lists
• Ordered collection of data
• Data can be of different types
• Lists are mutable
• Issues with shared references and mutability
• Same subset operations as Strings
>>> x = [1, 'hello', (3 + 2j)]
>>> print(x)
[1, 'hello', (3+2j)]
>>> print(x[2])
(3+2j)
>>> print(x[0:2])
[1, 'hello']
Lists: Modifying Content
• x[i] = a reassigns the ith element to the value a
• Since x and y point to the same list object, both are changed
• The method append also modifies the list
>>> x = [1, 2, 3]
>>> y = x
>>> x[1] = 15
>>> print(x)
[1, 15, 3]
>>> print(y)
[1, 15, 3]
>>> x.append(12)
>>> print(y)
[1, 15, 3, 12]
Lists: Modifying Contents
• The method append modifies the list and returns None
• List addition (+) returns a new list
>>> x = [1, 2, 3]
>>> y = x
>>> z = x.append(12)
>>> print(z == None)
True
>>> print(y)
[1, 2, 3, 12]
>>> x = x + [9, 10]
>>> print(x)
[1, 2, 3, 12, 9, 10]
>>> print(y)
[1, 2, 3, 12]
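One common way to avoid the shared-reference surprise is to copy the list before modifying it. The sketch below is not part of the original slides; the names alias and snapshot are illustrative, and list(x) makes a shallow copy only.

```python
x = [1, 2, 3]
alias = x            # a second name for the same list object
snapshot = list(x)   # a new list with the same elements (shallow copy)

x.append(12)         # in-place change, visible through every alias

print(alias is x)     # True
print(alias)          # [1, 2, 3, 12]
print(snapshot is x)  # False
print(snapshot)       # [1, 2, 3]
```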
If Else Statements
if expression:
    statement(s)
else:
    statement(s)
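A concrete instance of the template might look like the sketch below. The variable hours and the labels are invented for illustration; note that the indented lines form the body of each branch.

```python
hours = 3

if hours >= 3:
    label = 'full course'
else:
    label = 'short course'

print(label)  # full course
```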
For Loops
• Similar to Perl for loops, iterating through a list of values
forloop1.py
for x in [1, 6, 12, 3]:
    print(x)
Output:
1
6
12
3
forloop2.py
for x in range(4):
    print(x)
Output:
0
1
2
3
• range(n) generates the numbers 0, 1, …, n-1 (in Python 3 it returns a lazy range object, not a list)
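Because this course uses Python 3, range does not build a list up front. A small sketch of the difference, using list() to materialize the values:

```python
r = range(4)
print(r)          # range(0, 4)
print(list(r))    # [0, 1, 2, 3]

# range also accepts a start and a step: range(start, stop, step)
stepped = list(range(1, 10, 3))
print(stepped)    # [1, 4, 7]
```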
Functions are first class objects
• Can be assigned to a variable
• Can be passed as a parameter
• Can be returned from a function
• Functions are treated like any other variable in Python; the def statement simply assigns a function to a variable
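Each of the bullets above can be shown in a few lines. This is a minimal sketch; the function names square, apply_twice, and make_adder are made up for the example.

```python
def square(n):
    return n * n


def apply_twice(func, value):
    """Call func on value, then call it again on the result."""
    return func(func(value))


def make_adder(k):
    """Return a new function that adds k to its argument."""
    def add(n):
        return n + k
    return add


f = square                      # assigned to a variable
print(f(4))                     # 16
print(apply_twice(square, 3))   # 81 (passed as a parameter)
add5 = make_adder(5)            # returned from a function
print(add5(10))                 # 15
```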
Function Basics
functionbasics.py
def max(x, y):
    if x > y:
        return x
    else:
        return y

>>> max(2, 5)
5
Python for graph
• Matplotlib is a Python 2D plotting library that produces high-quality figures
• Ready-made demos are available in the plot_demo.ipy file
MongoDB LAB
MongoDB Express User Interface
• http://teslae.host.ualr.edu:8081
• Username: mongotest
• Password: mongotest
MongoDB Express
• MongoDB Express is a web-based MongoDB admin interface
• You can create, review, export, and delete data through the platform
MongoDB Express Lab
• Export cities.json
• Add a new city of your choice to MongoDB
• Query or find the new city name
• Delete the new city name
Clean Data Lab
Courses Data in MongoDB
Connect to MongoDB
CRUD Operations for MongoDB
Basic Python-MongoDB Lab
• Write code to add a new course:
{"courseid": "71XX",                 <-- change XX
 "subject": "information science",
 "title": "data quality algorithm",  <-- change the course name
 "hours": 3                          <-- change hours
}
• Write code to search for your courses:
query = {"title": "data quality algorithm"}  <-- change the title
projection = {"hours": 3}                    <-- change hours
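One possible shape for this exercise is sketched below. It is not the lab's official solution: the helper names add_course and find_courses are hypothetical, and the connection string, database name, and collection name in the usage comment are assumptions. The helpers only rely on the insert_one and find methods that a PyMongo collection provides. In MongoDB, a projection normally lists the fields to return, with 1 to include and 0 to exclude.

```python
def add_course(collection, course):
    """Insert one course document and return its generated _id."""
    return collection.insert_one(course).inserted_id


def find_courses(collection, query, projection=None):
    """Return matching documents as a list; projection selects the fields."""
    return list(collection.find(query, projection))


new_course = {
    "courseid": "71XX",                  # change XX, as the slide asks
    "subject": "information science",
    "title": "data quality algorithm",
    "hours": 3,
}
query = {"title": "data quality algorithm"}
projection = {"title": 1, "hours": 1, "_id": 0}  # 1 = include, 0 = exclude

# Usage against a real server (assumed names; requires pymongo):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   courses = client["school"]["coursesinfo"]
#   add_course(courses, new_course)
#   print(find_courses(courses, query, projection))
```

Passing the collection in as a parameter keeps the helpers independent of any particular server, which also makes them easy to try against a stand-in object.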
Basic Python-MongoDB Lab (cont.)
• A challenge project
• Write code to add your name to the teachers’ list
Clean Data lab (cont.)
• Teachers, Courses, and Students are master data management (MDM) data, so they are accurate and trusted.
• student_course_report and teacher_course_report contain incorrect data, but teacherid, studentid, and courseid are correct.
Teachersinfo teacher_course_report
Clean Data lab (cont.)
teacher_course_report
Clean Data lab (cont.)
• Write code to clean student_course_report
• Tips:
coursesinfo
studentsinfo
student_course_report
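The tips on this slide were screenshots, so the following is only a hedged sketch of one possible approach. Since the ids in the report are correct, each id can be looked up in the trusted master data and the descriptive fields overwritten. Plain dicts stand in for documents read from MongoDB; the function name clean_report, the fields name and title, and all sample values are hypothetical.

```python
def clean_report(report_rows, students, courses):
    """Return corrected copies of report rows, using trusted master data."""
    student_by_id = {s["studentid"]: s for s in students}
    course_by_id = {c["courseid"]: c for c in courses}
    cleaned = []
    for row in report_rows:
        fixed = dict(row)  # copy, so the input rows stay untouched
        student = student_by_id.get(row["studentid"])
        course = course_by_id.get(row["courseid"])
        if student is not None:
            fixed["name"] = student["name"]    # hypothetical field
        if course is not None:
            fixed["title"] = course["title"]   # hypothetical field
        cleaned.append(fixed)
    return cleaned


# Made-up sample data, for illustration only.
students = [{"studentid": "S1", "name": "Ann Lee"}]
courses = [{"courseid": "7101", "title": "data quality algorithm"}]
report = [{"studentid": "S1", "courseid": "7101",
           "name": "An Le", "title": "data qualty algoritm"}]

cleaned = clean_report(report, students, courses)
print(cleaned)
```

Rows whose id is missing from the master data are passed through unchanged here; a real solution would need to decide how to flag them.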
Clean Data lab (cont.)
• A challenge project
• Write code to clean t_s_c_report.
coursesinfo
studentsinfo
Teachersinfo
THANK YOU
Reference
• http://www.scipy-lectures.org/packages/statistics/index.html
• https://github.com/mongo-express/mongo-express
• https://api.mongodb.com/python/current/
• http://www.fh.huji.ac.il/~goldmosh/PythonTutorialFeb152012.ppt
• https://docs.mongodb.com/manual/
• http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/5944/pdf
• O'Higgins, Niall. MongoDB and Python: Patterns and Processes for the Popular Document-Oriented Database. O'Reilly Media, Inc., 2011.