OpenRefine Class Tutorial
Uploaded by ashwin-dinoriya
Advances in Data Science, Fall 2016
TUTORIAL
INTRODUCTION
FEATURES
INSTALLATION
DEMO
COMPARISON
WHAT IS …??
• Formerly known as Google Refine
OpenRefine is a power tool for working with messy data, primarily for:
• detecting and fixing inconsistencies
• transforming data from one structure or format to another
• extending it with web services and external data
• connecting names within your data to name registries (databases)
Use OpenRefine when you need something ...
• more powerful than a spreadsheet
• more interactive and visual than scripting
• more provisional / exploratory / experimental / playful than a database
IMPORTANT FEATURES:
• Import data in various formats (e.g., TSV, CSV, Excel (.xls, .xlsx), XML, RDF as XML, JSON)
• Explore datasets in a matter of seconds
• Apply basic and advanced cell transformations
• Deal with cells that contain multiple values
• Create instantaneous links between datasets
• Filter and partition your data easily with regular expressions
• Use named-entity extraction on full-text fields to automatically identify topics
• Perform advanced data operations with the General Refine Expression Language (GREL)
LENDING CLUB LOAN STATS DATA
The LendingClub data contains complete loan data for all loans issued through the time period stated, including the current loan status (Current, Late, Fully Paid, etc.) and the latest payment information.
Our aim is to perform exploratory analysis on the given financial data.
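A first exploratory step in OpenRefine is usually a text facet, which lists each distinct value in a column together with its row count. As a minimal sketch of what that facet computes, assuming the loan data has been exported to CSV with a column named `loan_status` (the column name, the helper `facet_counts`, and the sample rows are assumptions for illustration, not the real dataset), the same counts can be produced in Python:

```python
import csv
import io
from collections import Counter

def facet_counts(csv_text, column):
    """Count how many rows hold each distinct value of `column`,
    similar to the list an OpenRefine text facet displays."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)

# Hypothetical sample rows; real LendingClub files have many more columns
sample = """id,loan_amnt,loan_status
1,5000,Fully Paid
2,2400,Current
3,10000,Late
4,3000,Current
"""

# Print the facet, most frequent value first
for status, count in facet_counts(sample, "loan_status").most_common():
    print(status, count)
```

In OpenRefine itself the facet is interactive: clicking a value filters the rows, which a static count like this cannot do.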
STEPS
• Getting the data
• Looking at the data
• Cleansing
• Transforming
• Creating visualizations
TUTORIAL
1 – Getting started with OpenRefine
2 – Analyzing and Fixing Data
3 – Advanced Data Operations
4 – Linking Datasets
5 – Regular Expressions and GREL
HOW TO INSTALL
• Requirements:
  • Java JRE installed
• Download:
  • OpenRefine is a desktop application. Here's the link: Google OpenRefine
  • Unlike most other desktop applications, it runs as a small web server on your own computer.
  • You point your web browser at that web server in order to use Refine. So, think of Refine as a personal and private web application.
INSTALLATION: WINDOWS
• Install:
  • Once you have downloaded the .zip file, uncompress it into a folder wherever you want (such as C:\Google-Refine).
• Run:
  • Run the .exe file in that folder. You should see the command window in which OpenRefine runs. By default, the command window has a black background and monospace text.
• Shut down:
  • When you need to shut down OpenRefine, switch to that command window and press Ctrl-C. Wait until a message says the shutdown is complete. The window might close automatically, or you can close it yourself. If you are asked "Terminate all batch processes? Y/N", just press Y.
INSTALLATION: MAC
• Install:
  • Once you have downloaded the .dmg file, open it and drag the OpenRefine icon into the Applications folder icon (just as you would normally install Mac applications).
• Run:
  • To launch OpenRefine, go to the Applications folder and double-click the OpenRefine app. You'll see the OpenRefine app appear in your Dock.
• Shut down:
  • Switch to the OpenRefine app (click its icon in the Dock) and invoke its Quit command.
• If you use Yosemite, you will need to install Java for OS X 2014-001 first.
INSTALLATION: LINUX
• Install / Run: Once you have downloaded the tar.gz file, open a shell and type:
  tar xzf google-refine.tar.gz
  cd google-refine
  ./refine
• This will start OpenRefine and open your browser to its starting page.
• Shut down: Press Ctrl-C in the shell.
RUN OPENREFINE
• To increase memory on Windows: refine.bat /m 4096m (on Linux, the equivalent is ./refine -m 4096m)
IMPORT DATA
EXPLORING DATA
MANIPULATING COLUMNS
USING THE PROJECT HISTORY
EXPORTING A PROJECT
ANALYZING AND FIXING DATA
WORKING ON THE DATA
• sorting data
• faceting data
• detecting duplicates
• applying a text filter
• using simple cell transformations
• removing matching rows
• splitting data across columns
• adding derived columns
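Two of these operations, detecting duplicates and splitting data across columns, can be mirrored in plain Python to show what they do to the rows. This is a minimal sketch, not OpenRefine's implementation: the helper names `find_duplicates` and `split_column` and the sample values are hypothetical. `find_duplicates` flags every row whose value occurs more than once, roughly what the duplicates facet (under Customized facets) reports, and `split_column` mimics "Split into several columns" with a separator and a maximum column count.

```python
from collections import Counter

def find_duplicates(values):
    """Return one boolean per row: True where the value occurs more than once."""
    counts = Counter(values)
    return [counts[v] > 1 for v in values]

def split_column(values, separator, max_columns):
    """Split each cell on `separator` into at most `max_columns` parts,
    padding short rows with None so every row has the same width."""
    rows = []
    for v in values:
        # maxsplit = max_columns - 1 keeps any extra separators in the last part
        parts = v.split(separator, max_columns - 1)
        parts += [None] * (max_columns - len(parts))
        rows.append(parts)
    return rows

# Hypothetical member IDs and issue dates
ids = ["A1", "B2", "A1", "C3"]
print(find_duplicates(ids))

dates = ["Dec-2011", "Nov-2011", "2011"]
for row in split_column(dates, "-", 2):
    print(row)
```

Padding with None corresponds to the blank cells OpenRefine leaves when a value has fewer parts than the other rows.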
SPECIAL FEATURE
• Regular Expressions and GREL
• Can also use Python (Jython) or Clojure instead of GREL for expressions
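OpenRefine's text filter can treat its query as a regular expression and keep only rows whose cell contains a match, and a Jython cell transformation receives the cell as `value` and returns the new content. The sketch below reproduces both ideas with Python's standard `re` module outside OpenRefine; the helper names `regex_filter` and `to_rate`, the sample values, and the percent-stripping rule are assumptions chosen for illustration with LendingClub-style strings such as " 13.56%".

```python
import re

def regex_filter(values, pattern):
    """Keep only values that contain a match for `pattern`,
    like a text filter with the regular-expression option ticked."""
    compiled = re.compile(pattern)
    return [v for v in values if compiled.search(v)]

def to_rate(value):
    """Cell transformation: turn a string like ' 13.56%' into the float 13.56.
    Inside OpenRefine, a Jython expression would apply similar logic to `value`
    and end with a `return` statement."""
    cleaned = re.sub(r"[%\s]", "", value)  # drop percent signs and whitespace
    return float(cleaned)

terms = [" 36 months", " 60 months", "n/a"]
print(regex_filter(terms, r"^\s*\d+ months$"))
print([to_rate(v) for v in [" 13.56%", "7.9%"]])
```

Keeping the filter and the transformation as separate functions mirrors OpenRefine's workflow, where you first narrow the rows and then transform only the matching cells.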
ADDING A RECONCILIATION SERVICE AND RECONCILING WITH LINKED DATA
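Behind the Reconcile menu, OpenRefine talks to a reconciliation service over HTTP: it sends a batch of queries as a JSON object keyed by query IDs and receives ranked candidates, each with an `id`, `name`, `score`, and a boolean `match` flag. A minimal sketch of both sides of that exchange, assuming the JSON shape of the Reconciliation Service API; no real service is contacted, and the helper names, entity IDs, and scores below are invented for illustration.

```python
import json

def build_queries(names, entity_type=None, limit=3):
    """Build a batch of reconciliation queries keyed q0, q1, ...
    The returned JSON string is what would be sent as the `queries` parameter."""
    batch = {}
    for i, name in enumerate(names):
        query = {"query": name, "limit": limit}
        if entity_type is not None:
            query["type"] = entity_type  # optionally restrict candidates to one type
        batch[f"q{i}"] = query
    return json.dumps(batch)

def best_matches(response):
    """For each query ID, return the name of the candidate flagged as a match,
    or None when the service found no confident match."""
    picks = {}
    for qid, payload in response.items():
        matched = [c for c in payload.get("result", []) if c.get("match")]
        picks[qid] = matched[0]["name"] if matched else None
    return picks

print(build_queries(["California", "New York"]))

# Hypothetical service response, shaped like the reconciliation API's output
sample_response = {
    "q0": {"result": [{"id": "X1", "name": "California", "score": 100, "match": True}]},
    "q1": {"result": [{"id": "X2", "name": "New York City", "score": 60, "match": False}]},
}
print(best_matches(sample_response))
```

Rows whose best candidate is not flagged as a match are the ones OpenRefine leaves for you to confirm by hand.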
ADVANCED DATA OPERATIONS
• handling multi-valued cells
• alternating between rows and records mode
• clustering similar cells
• transforming cell values
• adding derived columns
• transposing rows and columns
• installing extensions
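Clustering groups cells that probably mean the same thing. Its default key-collision method builds a "fingerprint" key: trim, lowercase, remove punctuation and control characters, fold accented characters to ASCII, then split into tokens, sort and de-duplicate them, and join them again. The Python sketch below follows those steps; it is an approximation (Python's notion of word characters may differ slightly from OpenRefine's punctuation rules), and the helper names and sample loan-status values are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Key similar strings to the same form, following the steps of
    OpenRefine's fingerprint keying method."""
    s = value.strip().lower()
    # Drop punctuation and control characters (anything not a word char or whitespace)
    s = re.sub(r"[^\w\s]", "", s)
    # Fold accented characters to plain ASCII, e.g. "é" -> "e"
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Split into tokens, de-duplicate, sort, and join back together
    return " ".join(sorted(set(s.split())))

def key_collision_clusters(values):
    """Group values whose fingerprints collide; keep groups with 2+ distinct values."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]

statuses = ["Fully Paid", "fully paid ", "Paid, Fully", "Charged Off"]
print(key_collision_clusters(statuses))
```

In OpenRefine you would then pick one value per cluster (for example "Fully Paid") and merge, which rewrites every member of the cluster to that value.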