# Statistics for Big Data & Analytics

CSense Statistics for Big Data & Analytics is designed for experts venturing into Data Science, Big Data and Data Analytics, and helps them build a strong foundation in statistics.

Data scientists say that every second we are creating close to 1 megabyte of data per capita, and the volume of data is growing faster than ever before.
In line with the availability of data, studies on collecting, processing, analysing and interpreting data for decision making are also expanding. Just a few decades back, statisticians were seen as eccentric personalities in our society. But today, statistics forms the foundation for analysing the available data and for interpreting the population or predicting its future.

Today, we have different categories of data analysis, distinguished by the field of use and the techniques deployed: Data Analytics, Big Data, Data Science and so on. So it becomes more and more important to learn statistics in the context of Analytics and Big Data.

CSense Statistical Data Analysis for Analytics covers the necessary basics of statistics. We can say that the statistical concepts discussed here are the grammar, or the syntax, of the language of data.

## What is Data Science?

Data Science is the field encompassing everything about data, structured and unstructured. It deals with data collection, cleansing, preparation, analysis, and inference. Data science draws on skills such as understanding business needs, statistics, problem-solving, decision making and programming.

## What is Analysis?

We define data analysis as collecting structured data (preferably from samples), arranging and visualising it, fitting a statistical model to it, and inferring or predicting the behaviour of the population or the future. Data analysis helps in solving business problems using limited structured data and statistics. It can be done using simple spreadsheets like MS Excel or using analysis software like Minitab, JMP, etc.

## What is Big Data?

There is no universally accepted definition of Big Data yet. A working description is:

Big Data is data whose volume, variety, and velocity require new infrastructure, new techniques, and new analytics to handle it, manage it and extract value from it.

## What is Analytics?

Analytics covers the entire spectrum: collecting and mining data (huge volumes or streaming data, in structured or unstructured form), arranging and cleaning it, analysing it for inferences, and feeding those inferences to a machine to enable faster, online prediction (machine learning).

Analytics calls for programming skills, predominantly in R or SAS. Data-handling technologies like NoSQL, Hadoop, etc., are also used.

# Training Contents – Statistics for Big Data & Analytics

## Overview of Big Data

• How does data become Big Data?
• Characteristics of Big Data
• Applications of Big Data in various industries

## Data Analytics

• Overview of Data Analytics
• 5 Steps of Analytics
  • Identify the theme
  • Analyse data
  • Statistical Data Modelling
  • Predictive interpretations
  • Conclusion

## Statistical Basics

• Types of Data
• Understanding data
• Data Reduction – Descriptive Statistics
• Central Tendency
• Dispersion
• Inference from Data
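
As a small illustration of the descriptive statistics listed above, the sketch below reduces a sample to measures of central tendency and dispersion using Python's standard `statistics` module. The sample values and the helper name `describe` are made up for illustration and are not taken from the course material.

```python
import statistics


def describe(sample):
    """Summarise a numeric sample by central tendency and dispersion."""
    return {
        "mean": statistics.mean(sample),      # central tendency
        "median": statistics.median(sample),  # central tendency, robust to extremes
        "range": max(sample) - min(sample),   # dispersion
        "stdev": statistics.stdev(sample),    # dispersion (sample standard deviation, n - 1)
    }


# Hypothetical sample, e.g. daily order counts for one week
orders = [12, 15, 11, 14, 18, 13, 15]
summary = describe(orders)
print(summary)
```

Reporting a measure of spread next to the mean is what makes the summary usable for inference: two samples with the same mean can behave very differently.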

## Managing Data

• Data Cleansing
• Finding Outliers
• Missing Data
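
One common way to approach the topics above is sketched here: missing entries are dropped, and the remaining values are checked against the 1.5 × IQR fences that a box plot also uses. This is only one of several possible cleansing rules; the readings, the use of `None` for a missing value, and the helper name `clean` are assumptions made for illustration.

```python
import statistics


def clean(values):
    """Drop missing entries (None), then separate outliers using 1.5 * IQR fences."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [v for v in present if low <= v <= high]
    outliers = [v for v in present if v < low or v > high]
    return kept, outliers


# Hypothetical sensor readings with two missing entries and one suspicious value
readings = [10.1, 9.8, None, 10.3, 10.0, 42.0, 9.9, None, 10.2]
kept, outliers = clean(readings)
print("kept:", kept)
print("outliers:", outliers)
```

Whether a flagged value should be removed, corrected or kept is a judgement about the data source; the rule only points out candidates.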

## Exploratory Data Analysis

• Data Visualisation
• Box Plot and Outliers
• Histogram
• Statistical Data Modelling
• Probability Distribution
• Normal Model
• Other Models
• Trends & Patterns in Data
• Trend Analysis
• Proposing and validating Hypotheses
• t-Test
• F-Test
• ANOVA
• Chi-Square Test
• Establishing Relationships among data
• Regression Analysis
• Regression and Correlation
• Coefficient of Determination
• Simple Linear Regression
• Curvilinear Regression
• Multiple Correlation
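
To make the regression topics above concrete, here is a minimal sketch of simple linear regression by ordinary least squares, together with the coefficient of determination (R²). It uses plain Python rather than R or SAS; the data points and the names `fit_line`, `ad_spend` and `sales` are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares; also return R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (unexplained variation / total variation)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical paired observations
ad_spend = [1, 2, 3, 4, 5]
sales = [2.1, 3.9, 6.2, 7.8, 10.0]
slope, intercept, r_squared = fit_line(ad_spend, sales)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.4f}")
```

An R² close to 1 means the fitted line explains most of the variation in the response; it does not by itself show that the relationship is causal.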

## Where to go from here?

• Exercises
• Finding more – data sources
• Overview of open-source applications for Big Data Analytics