Analisis Perbandingan Algoritma Kompresi Data FLC, VLC dan Algoritma Huffman
learn data science by building ACADEMY TRACKS - Algoritma · learn data science by building A fun,...
Transcript of learn data science by building ACADEMY TRACKS - Algoritma · learn data science by building A fun,...
ACADEMYTRACKS
l earn data sc ience by bu i ld ing
https://algorit.ma/data-visualization-specialization/
l earn data sc ience by bu i ld ing
A fun, hands-on, and project-based specialization that help students gain full proficiency in data visualization systems and tools. Create compelling narratives by combining charting elements with custom aesthetics under the guidance of our instructors.
The learn-by-building module in all the workshops follow our project-based learning philosophy to this specialization. The course’s capstone require the students to build a real-world application under stringent criteria modeled after real business scenarios.
DATA VISUALIZATION SPECIALIZATION
Programming for Data Science is a course that cover the important programming paradigms and tools used by data analysts and data scientists today. You will be guided through a series of coding exercises designed to maximize your familiarity with data science programming in RStudio, an integrated development environment for the statistical computing language R.
Upon completion of this workshop, you will be familiar with the programming language, popular tools, libraries (data science packages) and tool kits required to excel in your data analysis and statistical computing projects.
Programming for Data Science3-day workshop
Module 1: Data Science in R Module 2: Data Manipulation
Data Science in RR Programming BasicsWhy Learn R?R Studio InterfaceData Structures in R
Working with DataReading & Extracting DataUnderstanding StatisticsExploratory Data Analysis
Data Manipulation IIVector Types and ClassesList and ObjectsMatrix and Data Frames
Practical Data CleansingThe Data Transformation ProcessReproducible Data Science ProjectsReading and Writing from Your IDE
Data ManipulationWorking with Your Global EnvironmentGetting Familiar with Your WorkspaceContinuous and Categorical Data
R in PracticeProgramming Exercise: e-Commerce Retail DatasetsIn-depth Review of Data Frame SubsettingSampling and RandomizationCross-TabulationsAggregations
WEEK 1
WEEK 1
Academy Modules
Learning-by-Building Module (2 Points)
Working with RR Scripts and FunctionsR MarkdownWhy Care About Reproducibility
Graded Quiz
Retail Sales Pre-Diagnostic Cleanup ScriptBuild a programming script that reads data into your workspace, perform various data cleansing tasks, and save the result in the appropriate formats for data science work.
Reproducible Data ScienceCreate an R Markdown file that combines data transformation code with explanatory text. Add formatting styles and hierarchical structure using Markdown.
WEEK 1
Pave the statistical foundation for more advanced machine learning theories later on in the specialization by picking up the key ideas in statistical thinking. Learn to interpret correlations, construct confidence intervals and other statistical principles that form the basis of many common machine learning models.
The 2-day course is optional for participations of the Data Science and Machine Learning Specialization and intended for learners without prior experience in statistics.
PracticalStatistics2-day workshop
Module 1: Descriptive Statistics
5-Number SummaryMean, Median and ModeMeasures of Central TendencyQuantiles in R
Central Tendency & VariabilityProbability Distribution FunctionVisualizing Central TendencyVariation, Variance and Covariance
Module 2: Inferential Statistics
ProbabilitiesProbability Mass FunctionProbability Density FunctionExpected Values
IntervalsConfidence IntervalsPrediction Intervalsp-Values
Standard Score and z-ScoreStandard Normal CurveCentral Limit Theoremz-Score Calculation & Student's T-test
Inferential Statistics in PracticeHypothesis TestingDeriving Scientific Truths from DataCase Study
WEEK 1
Academy Modules
Learning-by-Building Module (Not Graded)
Tips & Techniques: R for Statisticians
Statistical Treatment of Retail Dataset
Is there a difference in profitability between standard shipment and same-day shipment?
Using what you’ve learned, formulate a question and derive a statistical hypothesis test to answer the question. You have to demonstrate that you’re able to make decisions using data in a scientific manner. Examples of questions can be:
Density PlotsInterpreting Box Plots (Box-and-Whisker)Better Summary Statistics with skimr()Pais Matrix
Supposed there is no difference in profitability between the different product segment, what is the probability that we obtain the current observation due to pure chance alone?
WEEK 2
Module 1: Plotting Essentials
Base Plotting IPlots and LinesBuilt-in Plot TypesLegends and AnnotationsOther Built-in Plotting Functionalities
Base Plotting IIHistograms and CurvesCleveland's Dot PlotAxis, Titles, Subtitles and Panel Styles The Notorious Pie Chart
Module 2: Richer Visualization Techniques
Enhancing ggplot2 IIFlipping Coordinates and Axis RotationMulti-dimensional FacetingText Layers and Label LayersExpected Values
Enhancing ggplot2 IIIEnriching: Scatterplots and Bubble Plots Enriching: JitterplotsEnriching: Boxplots and Violin PlotsLayer Transparency
Working with ggplot2Grammar of Graphics SystemMapping AestheticsWorking with GeometriesBackground Image
Enhancing ggplot2Axis, Titles and ScalesAdding Themes to Your PlotsCustom Aesthetics and StylesWorking with Legends
Enhancing ggplot2 IVEnriching: Column Plots Enriching: Texts and LabelsEnriching: Horizontal and Vertical LinesFills and Colors
Other Visualization ToolsetDiscrete, Continuous, and Gradient ColorsFacet with Wraps and GridsVisualizing Spatial DataWorking with Leaflet and Maps
A fun, hands-on, and project-based workshop that helps student gain full proficiency in data visualization systems and tools. Create compelling narratives by combining charting elements with custom aesthetics under the guidance of our instructors.
The 9-hour course follows our learn-by-building approach, in that students are tasked to reproduce a series of plots applying what they’ve learned. While it covers the three main plotting systems in R, its particular focus is on ggplot2 and the additional libraries centered around it that bring interactivity and enhanced aesthetic options to the art of creating rich, powerful visualizations.
Data Visualizationin R4-day workshop
WEEK 2
Academy Modules
Learning-by-Building Module (2 Points)
Project: Mining Trending Videos on YouTube
Creating a Publication-Grade Plot
Creating an Interactive Map
Applying what you’ve learned, create an economics- or social-related plot that is polished with the appropriate annotations, aesthetics, and some simple commentary. You may use the same "YouTube Trending Videos" dataset or any other dataset for this practice.
Applying what you’ve learned, create a web page with an interactive map embedded on it. Use a custom icon for the map markers to represent business locations, and show details about each location pin (“markers”) upon user’s interaction with it.
Hands-on Data VisualizationIdentifying Temporal Patterns in Trending VideosCombining Aesthetics and Geometries
WEEK 3
Module 1: Interactive Visualization
Working with PlotlyRefresher on dplyrggplotly FunctionVisualization as a HTML WidgetRange Slider and Other Interactivity
Publication & Layout OptionsMultiple Plots ArrangementMore Export FunctionsSubplotsTips and Techniques for Layouts
Module 2: Web Dashboard Development
Flex DashboardCreating Flex Dashboard from RStudioLayoutsHands-on Practice: Text, Plots, TablesDemonstration and Practical Advice
Interactive DocumentInputs and OutputsThe renderPlot() FunctionEmbedded ApplicationDemonstration and Practical Advice
Shiny Web AppShiny DashboardTabs and PaginationUI, Server and Shiny FunctionsCustom Styles, Structure
Building on the foundation from previous classes, we will create a series of interactive plots and gadgets that renders multiple visualization elements based on user’s input. This is the final workshop leading up to the data visualization capstone project.
The 9-hour course follows our learn-by-building approach, in that students are tasked to reproduce a series of plots applying what they’ve learned. It covers an exhaustive list of techniques that add interactivity to an R document and set the stage for the data science capstone project.
Interactive Plotting &Web Dashboard4-day workshop
WEEK 3
WEEK 4
Academy Modules
Learning-by-Building Module (4 Points)
Tips on Web Dashboard Deployment Working with Live DataApp Deployment SolutionsTips for Live Dashboard Performance
Applying what you’ve learned, create a paginated web dashboard with a rich set of UI elements coupled with the appropriate server logic. The web dashboard can be of any theme, using any dataset, but must feature an input panel that accepts end user inputs and render the output accordingly.
Building an Interactive Dashboard
Data VisualizationCapstone Project
After having learned and explored appropriate techniques on visualizing data, students are required to deploy an interactive web dashboard application using a shiny server which contains any plotting objects such as ggplot and/or leaflet that display useful insights. In addition, students are given the freedom to use their own dataset or past datasets from previous classes.
Marks of the project is out of 30 points, the rubrics for assessment and grading will be discussed in the class.
A careful combination of statistical theory, hands-on coding and programming exercises to help students understand and implement some of the most widely-used and fundamental, machine learning algorithms. By building regressors and classifier algorithms from scratch, the students will go beyond applying machine learning models to actually developing their own models — and learn the right approach to fine-tuning the model performance as well as evaluating model fit against unseen data. Upon completion of the workshop, the students will be well versed in an array of important, versatile machine learning algorithms and equipped with the right knowledge to apply them to future datasets in their daily job.
MACHINE LEARNING SPECIALIZATION
l e a rn da ta s c i e n ce by bu i l d i n g
WEEK 5
This course strives for a fine balance between business applications and mathematical rigor in its treatment to regression models, one of the most essential statistical techniques in the field of machine learning. Its aim is to equip you with the knowledge to investigate relationships between variables of a data effectively and rigorously.
We strongly recommend that you complete Practical Statistics prior to taking this course. Upon completion of this workshop, you will acquire a rigorous statistical understanding of machine learning models, allowing you to extrapolate the same ideas into other, more advanced machine learning models.
RegressionModels4-day workshop
Module 1: Regression Models I Module 2: Regression Models II
OLS RegressionUnderstanding Least SquaresOutliers ISimple Linear Regression
Linear Models in RUnderstanding CoefficientsPlotting RegressionModel Construction
Interpreting Linear ModelsEstimates and Standard Errorst-Value and p-ValueAdjusted R-Squared
Multiple RegressionMulticollinearity and VIFModel Assumptions Bias-Variance Trade-offOutliers II: Leverage and Influence
Interpreting Linear ModelsResiduals ManuallyCoefficients ManuallyR-Squared Manually
Dive Deeper: Regression ModelsModel Selection and SpecificationStep-wise RegressionAll-possible RegressionsResidual PlotsModel DiagnosticsLimitations of Regression Models
WEEK 5
Academy Modules
Learning-by-Building Module (3 Points)
Graded Quiz
Recommendation on Lowering Crime RatesWrite a regression analysis report applying what you’ve learned in the workshop. Using the dataset provided by you, write your findings on the different socioeconomic variables most highly correlated to crime rates.
Explain your recommendations where appropriate.
WEEK 6
Learn to solve binary and multi-class classification models using machine learning algorithms that is easily understood and readily interpretable. You will learn to write a classification algorithm from scratch, and appreciate the mathematical foundations underpinning logistic regressions and nearest neighbors algorithms.
We strongly recommend that you complete the regression models workshop prior to taking this course. Upon completion of this workshop, you will acquire the depth to develop, apply, and evaluate two highly versatile algorithms widely used today.
Classification in Machine Learning I4-day workshop
Module 1: Logistic Regression Module 2: Nearest Neighbours Algorithm
Relating Probabilities to OddsUnderstanding OddsUnderstanding Log of OddsPlotting Odds and Log of Odds
Logistic Regression from First PrinciplesSigmoidal Logistic FunctionKey Assumptions of Sigmoid FunctionExtra Proof: Intuition Behind The Sigmoid Function
Closer Look at ClassificationProbabilities vs Class ResponseConfusion MatrixSensitivity, Specificity & Precision
k-NN in ActionCharacteristics of k-NNPositives and NegativesDiagnosing Breast Cancer with k-NN
Logistic Regression in ActionBinary Logistic RegressionInterpreting CoefficientsInterpretation Against Continuous & Discrete Variables
Practical Tips and Case StudyFlight Delay Prediction ExamplesCustomer Churn and Attrition ExamplesRisk Modeling on Loans from Quarter 4, 2017
Performance Evaluation and Model SelectionAIC (Akaike Information Criteria)Null Deviance and Residual DevianceHauck Donner Effect
Building Blocks of k-NNDistance Function (Euclidean, Minkowsky)The k ParameterStandardization vs Min-Max Normalization
k-NN from First PrinciplesClassifying Customer Segments with k-NNWriting Your Own k-NN ClassifierPredicting Using Your Own k-NN Classifier
WEEK 6
Academy Modules
Learning-by-Building Module (3 Points)
Graded Quiz
Logistic Regression on Credit RiskApplying what you’ve learned, present a simple R Markdown document in which you demonstrate the use of logistic regression on the lbb_loans.csv dataset. Explain your findings wherever necessary and show the necessary data preparation steps. To help you through the exercise, consider the following questions throughout the document:
Customer Segment PredictionApplying what you’ve learned, present a simple R Markdown document in which you demonstrate the use of knn on the wholesale.csv dataset. Compare the k-NN to the logistic regression model and answer the following questions throughout the document:
How do we correctly interpret the negative coefficients obtained from your logistic regression?How do we know which of the variables are more statistically significant as predictors?What are some strategies to improve your model?
What is your accuracy? Was the logistic regression better than kNN in terms of accuracy? (recall the lesson on obtaining an unbiased estimate of the model’s accuracy)Was the logistic regression better than our kNN model at explaining which of the variables are good predictors of a customer’s industry?List down 1 disadvantage and 1 strength of each of the approach (kNN and logistic regression)
WEEK 7
Learn to apply the law of probabilities, boosting, bootstrap aggregation, k-fold cross validation, ensembling methods, and a variety of other techniques as we build some of the most widely used machine learning algorithms today. Learn to add performance to your models using mathematically sound principles you’ll learn in this course.
We strongly recommend that you complete the Machine Learning: Classification 1 workshop prior to taking this course. Some concepts presented throughout the lecture may be less-than-ideal for practitioners who have not completed the pre-requisite courses.
Classification in Machine Learning II4-day workshop
Module 1: Naive Bayes Module 2: Tree-Based Methods and Ensembles
Law of ProbabilityDependent and Independent EventsBayes TheoremFormula for Posterior Probability
Naive Bayes ClassifierCharacteristics of a Naive Bayes ClassifierThe "Naive" AssumptionsCustomer Churn Example
Decision TreesAdvantages and Model CharacteristicsInformation Gain and Splitting CriterionPruning and Tree Size
Decision Trees in ActionPredicting Diabetes from Diagnostics MeasurementAUC Curve Key Considerations and Practical AdvicePractical and Performance
ConsiderationsThe Case for SmoothingLaplace (Add-One)Thinking about Training vs Prediction Speed
Naive Bayes in ActionSpam ClassificationPredicting on Text (Corpus)Predicting Political Party Affiliation
Machine Learning TheoriesLogistic Regression, Naive Bayes and Decision Trees Have More in Common Than You ThinkIndustrial ApplicationsThinking About Decision Boundaries
High-Performance Machine LearningBias-Variance Tradeoff Revisited k-Fold Cross ValidationPredicting Exercise Form with Fitness Tracker Data
WEEK 7
Academy Modules
Learning-by-Building Module (3 Points)
Graded Quiz
Identifying Risky Bank LoansUse any of the 3 classification algorithms you’ve learned in this lesson to predict the risk status of a bank loan. The variable default in the dataset indicates whether the applicant did default on the loan issued by the bank.
Use an R Markdown document to lay out your process, and explain the methodology in 1 or 2 brief paragraph. The student should be awarded the full (3) points when:
The preprocessing steps are done, and the student show an understanding of holding out a test/cross - validation set for an estimate of the model’s performance on unseen dataThe model’s performance is sufficiently explained (accuracy may not be the most helpful metric here! Recall about what you’ve learned regarding specificity and sensitivity)The student demonstrated extra effort in evaluating his/her model, and proposes ways to improve the accuracy obtained from the initial model
WEEK 8
Learn PCA (Principal Component Analysis), Clustering, and other algorithms to work with unsupervised machine learning tasks where the target variable is not known or defined. Applying what you’ll learn from this workshop, you will be tasked to develop an anomaly detection or an e-commerce product recommendation model that can be related to real-life business scenarios.
We strongly recommend that you complete the pre-requisite courses prior to taking this course. Some concepts presented throughout the lecture may be less-than-ideal for practitioners who are new to the field of machine learning.
UnsupervisedMachine Learning4-day workshop
Module 1: Dimensionality Reduction Module 2: k-Means Clustering
BackgroundUnderstanding Unsupervised LearningThe "Dimensionality" ProblemIndustrial Use of PCA
Principle Component AnalysisRethinking About CovariancesThe Case for PCAEigenvalues and Eigenvectors
Understanding ClusteringCentroid-based Clustering AlgorithmsThe k-Means ProcedureMathematical Details
k-Means Clustering in ActionCluster-based Product RecommendationScaling and Implementation DetailsVisualizing Clusters
PCA from First PrinciplesJust Enough Matrix AlgebraMathematical ProofVisualization and Visual Proof
PCA in ActionDubious Property Sales in NYCPCA on US Arrests DataBiplot and The Variables Factor Map
Evaluating k-MeansBetween Sum-of-squaresWithin Sum-of-squaresCombining k-Means with PCA
PCA in Action IIEigenfacesPCA on Credit Loan DataDeconstruction and Reconstructing Faces with PCAPrincipal Components by Hand
WEEK 8
Academy Modules
Learning-by-Building Module (3 Points)
Graded Quiz
Diving into Wholesale TransactionsUsing any of the two unsupervised learning algorithms you’ve learned, produce a simple R markdown document where you demonstrate an exercise of either clustering or dimensionality reduction on the wholesale.csv data provided to you
Digging Deep into NYC Property SalesUsing any of the two unsupervised learning algorithms you’ve learned, produce a simple R markdown document where you demonstrate an exercise of either clustering or dimensionality reduction on the nyc data provided to you
Explain your choice of parameters (how you choose k for k-means clustering, or how you choose to retain n number of dimensions for PCA) from the original data. What are some business utilities for the unsupervised model you’ve developed? The R Markdown document should be no longer than 4 paragraph, and contain one or two visualizations.
WEEK 9
Decomposition of time series allows us to learn about the underlying seasonality, trend and random fluctuations in a systematic fashion. In this workshop, we learn the methods to account for seasonality and trend, work with autocorrelation models and create industry-scale forecasts using modern tools and frameworks.
We strongly recommend that you complete the pre-requisite workshops prior to taking this course. Some concepts presented throughout the lecture may be less-than-ideal for practitioners who have not completed the pre-requisite courses.
Time Series &Forecasting4-day workshop
Module 1: Time Series I Module 2: Forecasting
Working with Time SeriesApplication of Time SeriesDefinition of a ts ObjectFunctions to Work with Time Series
Time Series in ActionIndonesia's Gas Emissions, 1970-2012Frequency, Start and EndTime Series Plots
Forecasting ISimple Moving AverageSimple Moving Average from First PrinciplesLog-Transformation
Forecasting IIForecasting Using One-sided SMAForecasting Using Exponential SmoothingHolt's Exponential Smoothing
Classical DecompositionTrend, Seasonality and ResidualsUnderstanding LagsAdditive vs Multiplicative
Classical Decomposition in ActionMonthly Airline Passenger, 1949-1960The decompose FunctionUnderstanding Smoothing
Forecasting IIIThe beta and gamma CoefficientsMathematical DetailsHolt-Winters Exponential Smoothing
Advanced Time SeriesACF and PACFARMA and ARIMA ModelsStationarity and Differencing
Advanced Time Series IIAugmented Dickey-Fuller (ADF) TestSeasonal ARIMATips to Work With xtsFacebook's ProphetQuantmod for Quantitative Traders
Techniques to Work with Time SeriesAdjusting for SeasonalityDetrendingDecomposing Non-Seasonal Time Series
WEEK 9
Academy Modules
Learning-by-Building Module (3 Points)
Graded Quiz
Forecasting the Crime Rate in ChicagoDownload the dataset from Chicago Crime Portal, and use a sample of these data to build a forecasting project where you inspect the seasonality and trend of crime inChicago. Submit your project in the form of a RMD format, and address the following questions:
The student should be awarded the full (3) points if they address at least 2 of the above questions.
Is crime generally rising in Chicago in the past decade (last 10 years)?Is there a seasonal component to the crime rate?Which time series method seems to capture the variation in your time series better? Explain your choice of algorithm and its key assumptions
WEEK 10
Develop artificial neural networks that can recognize face, handwriting patterns and are at the core of some of the most cutting-edge cognitive models in the AI landscape. We will learn to create a backpropagation neural network from scratch, and use our neural network for classification tasks. This class is the final course in the Machine Learning Specialization.
We strongly recommend that you complete the pre-requisite workshops prior to taking this course. Some concepts presented throughout the lecture may be less-than-ideal for practitioners who have not completed the pre-requisite courses.
Neural Network &Deep Learning4-day workshop
Module 1: Neural Network Module 2: Deep Learning
Artificial Neural NetworksThe Biological Brain InspirationCost FunctionThe Building Blocks of Neural Networks
Neural Network ArchitectureLayers, Nodes and SignalsNetwork TopologyDirection of Signal
Neural Networks from First PrinciplesSum of Squared ErrorsCross-Entropy ErrorThe Gradient Descent Algorithm
Neural Networks from ScratchGradient Descent by HandNeural Network by HandLearning Rate and Implementation Details
Neural Network Architecture IIHidden LayersComputing with Neural NetworkMathematical Details
Multi-Layer Perceptrons (MLP)Backpropagation of ErrorFeed-forward vs RecurrentMathematical Details
Neural Networks in ActionPutting It All TogetherParameterization and Practical AdviceDeep Learning for Classification and Regression
Deep Learning in ActionTheorizing with Effect of DepthActivation FunctionsVisualizing Logarithmic Loss
Deep Learning in Action IIPredicting Bank Telemarketing CampaignVisualizing Tricks for Deep Neural NetworksParameterization and Practical Advice
MXNet in ActionThinking About ParallelismMNIST Handwritten Digit RecognitionPredictions with MXNet and Practical Advice
WEEK 10
Academy Modules
Learning-by-Building Module (3 Points)
Graded Quiz
Image Classification Using Neural NetworkBuild a neural network capable of classifying images into one of many classes and explain the choice of your architecture. Test your neural network using unseen images – can your algorithm correctly classify 80% of the images?
WEEK 11
Machine LearningCapstone Project
After having learned various machine learning methods and its application, students are required to choose one project that challenges them to construct an optimal model from the dataset given. The selection of methods include Forecasting, Regression and Classification.
Marks of the project is out of 24 points, the rubrics for assessment and grading will be discussed in the class.
l earn data s c ien c e b y b u i ld in g
Menara Kadin lvl .4 (Parking Building)Jl . H. R . Rasuna Said Blok X-5 No.Kav 3, RT.1/RW.2, Kuningan Timur, Kecamatan Setia Budi, DKI Jakarta 12950
+62 877-7835-3007
teamalgoritma
Team Algoritma
teamalgoritma
teamalgoritma
BLOCK71 JakartaAriobimo Sentral , Jl . H. R . Rasuna SaidKav. X-2 No. 5, RT. 9 / RW. 4, Kuningan Timur, Kecamatan Setiabudi, DKI Jakarta 12950