
M20773 Analyzing Big Data with Microsoft R

The main purpose of this course is to teach students how to use Microsoft R Server to create and run analyses on large datasets, and how to apply it in big data environments such as a Hadoop or Spark cluster or a SQL Server database.

Audience profile

  •   The primary audience for this course is people who wish to analyze large datasets within a big data environment.
  •   The secondary audience is developers who need to integrate R analyses into their solutions.

Learning Objectives

After completing this course, students will be able to:

  •   Explain how Microsoft R Server and Microsoft R Client work
  •   Use R Client with R Server to explore big data held in different data stores
  •   Visualize data by using graphs and plots
  •   Transform and clean big data sets
  •   Implement options for splitting analysis jobs into parallel tasks
  •   Build and evaluate regression models generated from big data
  •   Create, score, and deploy partitioning models generated from big data
  •   Use R in the SQL Server and Hadoop environments

Pre-Requisites

In addition to their professional experience, students who attend this course should have:

  •   Programming experience using R, and familiarity with common R packages
  •   Knowledge of common statistical methods and data analysis best practices
  •   Basic knowledge of the Microsoft Windows operating system and its core functionality
  •   Working knowledge of relational databases

It is recommended that delegates review this self-paced content to gain an introduction to the R language:

https://www.edx.org/course/introduction-r-data-science-microsoft-dat204x-5

Course Content

Module 1: Microsoft R Server and R Client

Explain how Microsoft R Server and Microsoft R Client work.

Lessons

  •   What is Microsoft R Server
  •   Using Microsoft R Client
  •   The ScaleR functions

Lab : Exploring Microsoft R Server and Microsoft R Client

  •   Using R Client in VSTR and RStudio
  •   Exploring ScaleR functions
  •   Connecting to a remote server

Module 2: Exploring Big Data

At the end of this module, the student will be able to use R Client with R Server to explore big data held in different data stores.

Lessons

  •   Understanding ScaleR data sources
  •   Reading data into an XDF object
  •   Summarizing data in an XDF object

Lab : Exploring Big Data

  •   Reading a local CSV file into an XDF file
  •   Transforming data on input
  •   Reading data from SQL Server into an XDF file
  •   Generating summaries over the XDF data

Module 3: Visualizing Big Data

Explain how to visualize data by using graphs and plots.

Lessons

  •   Visualizing in-memory data
  •   Visualizing big data

Lab : Visualizing data

  •   Using ggplot to create a faceted plot with overlays
  •   Using rxLinePlot and rxHistogram

Module 4: Processing Big Data

Explain how to transform and clean big data sets.

Lessons

  •   Transforming Big Data
  •   Managing datasets

Lab : Processing big data

  •   Transforming big data
  •   Sorting and merging big data
  •   Connecting to a remote server

Module 5: Parallelizing Analysis Operations

Explain how to implement options for splitting analysis jobs into parallel tasks.

Lessons

  •   Using the RxLocalParallel compute context with rxExec
  •   Using the RevoPemaR package

Lab : Using rxExec and RevoPemaR to parallelize operations

  •   Using rxExec to maximize resource use
  •   Creating and using a PEMA class

Module 6: Creating and Evaluating Regression Models

Explain how to build and evaluate regression models generated from big data.

Lessons

  •   Clustering Big Data
  •   Generating regression models and making predictions

Lab : Creating a linear regression model

  •   Creating a cluster
  •   Creating a regression model
  •   Generating data for making predictions
  •   Using the models to make predictions and comparing the results

Module 7: Creating and Evaluating Partitioning Models

Explain how to create and score partitioning models generated from big data.

Lessons

  •   Creating partitioning models based on decision trees
  •   Testing partitioning models by making and comparing predictions

Lab : Creating and evaluating partitioning models

  •   Splitting the dataset
  •   Building models
  •   Running predictions and testing the results
  •   Comparing results

Module 8: Processing Big Data in SQL Server and Hadoop

Explain how to use R in the SQL Server and Hadoop environments.

Lessons

  •   Using R in SQL Server
  •   Using Hadoop Map/Reduce
  •   Using Hadoop Spark

Lab : Processing big data in SQL Server and Hadoop

  •   Creating a model and predicting outcomes in SQL Server
  •   Performing an analysis and plotting the results using Hadoop Map/Reduce
  •   Integrating a sparklyr script into a ScaleR workflow

Virtual Classroom

Virtual classrooms provide all the benefits of attending a classroom course without the need to arrange travel and accommodation. Please note that virtual courses are attended in real time, commencing on a specified date.
