Data Sciences: June 2020

Monday, June 22, 2020

Data wrangling: Definitions, tools and techniques.

Today, I will discuss data wrangling, also known as data munging.

What is data wrangling?

It is the process of cleaning, restructuring, and enriching data.

There are different views of data wrangling.

In the first view, the process begins with discovering the data through data mapping, after which the data is structured using tabulations.

Next, the researcher cleans the data by handling outliers and missing values. If the data is insufficient for analysis, it is enriched with additional data. The data is then validated using a normality test and, finally, published to databases.
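The cleaning step in this first view can be sketched in Python, the language used in the Colab post on this blog. This is only a minimal illustration. It assumes that missing values are recorded as None and that outliers are defined by the common 1.5 x IQR rule; the function name clean and the sample values are hypothetical.

```python
import statistics

def clean(values):
    """Drop missing entries, then remove points outside the 1.5 * IQR fences."""
    # Step 1: missing value handling (here: simply discard None entries).
    present = [v for v in values if v is not None]
    # Step 2: outlier handling with the interquartile-range rule.
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in present if low <= v <= high]

raw = [97, 92, None, 104, 189, 156, 156, 125, 900]  # hypothetical sample
cleaned = clean(raw)
print(cleaned)
```

Discarding values is only one option. Depending on the study, missing values may instead be imputed and outliers capped or investigated.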

In the second view, the process consists of pre-processing, standardizing, cleaning, consolidating, matching, and filtering the data.

One can search for data using Google Dataset Search. Many tools have emerged for data wrangling:

Google Cloud Dataprep
Microsoft Excel Power Query
Data Wrangler
csvkit
OpenRefine
R data cleansing, etc.

In a nutshell, the data wrangling techniques a data scientist should know are:

Missing value analysis
Outlier analysis
Transformations
Visual binning
Multicollinearity
Principal component analysis
Dummy variable analysis
Singular value decomposition
Linear discriminant analysis
Multidimensional scaling
t-SNE, and
Independent component analysis.

Thursday, June 11, 2020

Descriptive statistics tools

Descriptive statistics tools include

1. Mean
2. Median
3. Mode
4. Skewness
5. Kurtosis
6. Normal plots
7. TURF analysis
8. Cross tabulation
9. Test of linearity
10. Tests of normality
11. Standard deviation
12. Variance
13. Maximum value
14. Minimum value
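Several of these measures can be computed with Python's standard library alone. The sketch below reuses the sample from the R posts on this blog. The skewness shown is the plain moment-based version; some statistical packages report a bias-corrected variant, so their values may differ slightly.

```python
import statistics

x = [97, 92, 104, 189, 156, 156, 125]  # sample used in the R posts on this blog

n = len(x)
mean = statistics.mean(x)
summary = {
    "mean": mean,
    "median": statistics.median(x),
    "mode": statistics.mode(x),
    "minimum": min(x),
    "maximum": max(x),
}

# Moment-based skewness: third central moment over the 1.5 power of the second.
m2 = sum((v - mean) ** 2 for v in x) / n
m3 = sum((v - mean) ** 3 for v in x) / n
summary["skewness"] = m3 / m2 ** 1.5

for name, value in summary.items():
    print(f"{name}: {value}")
```

A positive skewness, as this sample produces, indicates a longer tail on the high side of the mean.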

Features of SPSS: Making business decisions effectively

Features of SPSS:

1. Data visualization features help organizations understand data behavior.
2. Regression, neural networks, and time series analysis help researchers forecast business demand.
3. Correlation analysis measures the relationship between two variables, such as cost and sales.
4. Parametric and non-parametric tests support conclusions about the hypotheses of business research.
5. Market segmentation is done using cluster analysis.
6. Conjoint analysis is applied in new product development.
7. Multidimensional scaling is applied for positioning.
8. Statistical quality control charts in SPSS are widely used for total quality management.
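To make the correlation item concrete, here is a small Python sketch of the Pearson correlation coefficient between cost and sales. It is not SPSS output: the function name pearson_r and the cost and sales figures are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

cost = [10, 20, 30, 40, 50]    # hypothetical figures
sales = [25, 41, 58, 79, 102]  # hypothetical figures
r = pearson_r(cost, sales)
print(f"r = {r:.3f}")
```

A value of r close to +1 indicates that sales rise almost linearly with cost in this made-up sample. A value near 0 would indicate no linear relationship.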

Wednesday, June 10, 2020

Multivariate analysis: definition and objectives

Multivariate Analysis

Definition

Multivariate analysis is a set of statistical techniques that a researcher uses to test hypotheses about multiple variables measured on the sampling unit or units of an experiment or research design.

Examples:
1. Sampling unit: Learners
Variables: Grades in mathematics, statistics, big data analytics, database management.
2. Sampling unit: Patient
Variables: Heart rate, body mass index, weight, height.

Learning Objectives:

After studying this chapter, the learner will be able to:

1. Know the purposes, assumptions, and limitations of multivariate techniques.

2. Identify appropriate multivariate techniques for data analysis.

3. Interpret software output to gain meaningful insights.

Objectives of Multivariate analysis:

1. To understand the relationship between several dependent variables and several independent variables.

2. To identify the data structures of multiple variables.

3. To classify and categorize the data.

4. To reduce the data.

Multivariate data analysis questions for examinations

Discuss the importance of factor analysis in data reduction.
What is the difference between varimax and equimax in factor analysis?
Explain the rotated component matrix in factor analysis.
What do you mean by eigenvalue?
Define communalities.
Explain principal component analysis.
Discuss the use of the maximum likelihood function in factor analysis.
Explain the multivariate normal distribution.
Discuss tests of covariance matrices.
Explain the importance of discriminant analysis.
Elaborate on the application of canonical correlation.
Explain multiple regression with an example.
Discuss the application of cluster analysis in segmentation.
Distinguish between hierarchical clustering and two-step clustering.
Explain K-means clustering.
Write a note on MANOVA.
What is the general linear model, and how is it different from the generalized linear model?
How is Wilks' lambda used in multivariate analysis?
What do you mean by bootstrapping?
Explain latent structure discovery.
List any five tools of data mining.
Distinguish between OLS and PLS regression.
Write a note on SIMCA.

Monday, June 8, 2020

Python for MBAs: Chi-square test

Chi-square test using Google Colab

Goodness-of-fit test

Problem 1:

Perform the chi-square test for the following values:

14, 16, 12, 15, 17

Coding

from scipy.stats import chisquare

# With no expected frequencies given, chisquare assumes all categories are equally likely.
chisquare([14, 16, 12, 15, 17])

Output

Power_divergenceResult(statistic=1.0, pvalue=0.9097959895689501)
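The result above can be reproduced by hand, which shows where the statistic comes from. The sketch below uses only the standard library. The helper names are hypothetical, and the p-value helper relies on a closed form that holds only when the degrees of freedom are even, as they are here (df = 4).

```python
import math

def chi_square_gof(observed):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    expected = sum(observed) / len(observed)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    df = len(observed) - 1
    return statistic, df

def chi2_sf_even_df(statistic, df):
    """Upper-tail probability of a chi-square variable; valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = statistic / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

stat, df = chi_square_gof([14, 16, 12, 15, 17])
p_value = chi2_sf_even_df(stat, df)
print(stat, df, p_value)
```

Each expected count is 74 / 5 = 14.8. The squared deviations sum to 14.8, so the statistic is 14.8 / 14.8 = 1.0. With such a large p-value, there is no evidence that the five categories differ in frequency.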

Saturday, June 6, 2020

R for MBAs: Calculating quartiles/quantiles using R

Coding

> x<-c(97,92,104,189,156,156,125)

> quantile(x)

Output

0% 25% 50% 75% 100%
92.0 100.5 125.0 156.0 189.0
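For readers following the Python posts, the same quartiles can be obtained with the standard library. This sketch assumes R's default quantile type (type 7), which corresponds to method="inclusive" in statistics.quantiles.

```python
import statistics

x = [97, 92, 104, 189, 156, 156, 125]

# method="inclusive" interpolates between order statistics in the same way
# as R's default quantile() (type 7).
q1, q2, q3 = statistics.quantiles(x, n=4, method="inclusive")
five_number = [min(x), q1, q2, q3, max(x)]
print(five_number)
```

The first quartile, 100.5, lies halfway between the second and third smallest values (97 and 104), which is how the interpolation arrives at a number that is not in the data.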

R for MBAs: Calculating the range of numbers using R

Coding:

> x<-c(97,92,104,189,156,156,125)

> range(x)

Output

[1] 92 189

R for MBAs: Calculating standard deviation and variance using R

Coding and output

> x<-c(97,92,104,189,156,156,125)

> sd(x)

Standard deviation: [1] 36.64112

> var(x)

Variance: [1] 1342.571
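R's sd() and var() use the sample formulas, which divide by n - 1 rather than n. The Python sketch below writes the calculation out step by step, as a check on the values above.

```python
import math

x = [97, 92, 104, 189, 156, 156, 125]

n = len(x)
mean = sum(x) / n
# Sample variance: squared deviations divided by n - 1, not n.
variance = sum((v - mean) ** 2 for v in x) / (n - 1)
std_dev = math.sqrt(variance)

print(round(std_dev, 5), round(variance, 3))
```

Dividing by n instead would give the population variance, which is smaller and does not match the R output.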

Friday, June 5, 2020

Text analytics or text mining (sentiment analysis and topic detection) without coding, using Microsoft Azure.

Thursday, June 4, 2020

Facial Analysis without coding

R for MBAs: Chi-square test (goodness of fit)

Coding

> x<-c(1567,1233,1456,1678,1456,1111,1895)

> chisq.test(x)

Output

Chi-squared test for given probabilities

data: x

X-squared = 280.87, df = 6, p-value < 2.2e-16
