Data Sciences
Data Science tutorials, Data Science eBooks, Data Science data sets, Data Science codes, Data Science programming languages, and Data Science reports.
Tuesday, June 29, 2021
Sunday, May 9, 2021
Introduction to Data Sciences
https://docs.google.com/presentation/d/18rOWO8lfiRj8yaF7_DOrZxUcKh4LmytlAv6W1-1wnmk/edit?usp=drivesdk
Wednesday, January 20, 2021
Python for MBA's- Multiple bar charts using Python
import numpy as np
import matplotlib.pyplot as plt
# Each inner list is one data series; each column is one group on the x-axis
Agricultureproduction = [[30, 25, 50, 20],
                         [40, 23, 51, 17],
                         [35, 22, 45, 19]]
X = np.arange(4)  # x positions of the four groups
fig = plt.figure()
ax = fig.add_axes([0,0,1,1])
# Shift each series by the bar width (0.25) so the bars sit side by side
ax.bar(X + 0.00, Agricultureproduction[0], color = 'b', width = 0.25)
ax.bar(X + 0.25, Agricultureproduction[1], color = 'g', width = 0.25)
ax.bar(X + 0.50, Agricultureproduction[2], color = 'r', width = 0.25)
plt.show()
Python for MBA's- Barplot using Python
import matplotlib.pyplot as plt
fig=plt.figure()
ax=fig.add_axes([0,0,1,1])
# GDP values, one per year
GDP=[8.2,8.4,7.2,6.5,8.5,9,9.2]
Year=['2010','2011','2012','2013','2014','2015','2016']
# One bar per year, with height equal to the GDP value
ax.bar(Year,GDP)
plt.show()
Calculating linear regression using R
> # calculating the impact of digital marketing campaign cost on sales
> campaign<-c(24,26,28,29)
> sales<-c(134,145,167,172)
> relation<-lm(sales~campaign)
> # Note: the output below is reproduced as posted. Its Call line shows lm(formula = x ~ y)
> # and five residuals, so it comes from a different data set than the four-point
> # campaign and sales vectors above.
> print(relation)
Call:
lm(formula = x ~ y)
Coefficients:
(Intercept)            y
    33.8312      -0.1052
> print(summary(relation))
Call:
lm(formula = x ~ y)
Residuals:
      1       2       3       4       5
 -9.568  23.642   3.747  -6.148 -11.674
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  33.8312    16.3441   2.070    0.130
y            -0.1052     0.6855  -0.154    0.888
Residual standard error: 16.72 on 3 degrees of freedom
Multiple R-squared: 0.007795, Adjusted R-squared: -0.3229
F-statistic: 0.02357 on 1 and 3 DF, p-value: 0.8877
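For readers following the Python posts, here is a minimal sketch of the same simple regression in plain Python. It assumes sales is the response and campaign cost is the predictor, as the comment in the R session states. The helper name fit_line is illustrative, not part of any library.

```python
# Campaign cost and sales figures taken from the R session above
campaign = [24, 26, 28, 29]
sales = [134, 145, 167, 172]

def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-products of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(campaign, sales)
print("intercept:", round(intercept, 3))
print("slope:", round(slope, 3))
```

The slope is the estimated change in sales for a one-unit increase in campaign cost. With only four observations, the estimate should be read as an illustration rather than a reliable forecast.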
Friday, July 3, 2020
Support Vector machines
- The support vectors are data points close to the hyperplanes. These data points influence the position and orientation of the hyperplanes.
- Removing a support vector alters the position and orientation of the hyperplane.
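The two points above can be illustrated with a one-dimensional toy example. This is a sketch under simplifying assumptions, not a general SVM solver: the data are hypothetical and linearly separable, and in one dimension the "hyperplane" is just a threshold value. The function name max_margin_threshold is illustrative.

```python
def max_margin_threshold(negatives, positives):
    """Maximum-margin boundary for linearly separable 1-D data.

    The support vectors are the largest negative point and the smallest
    positive point; the boundary sits halfway between them.
    """
    sv_neg = max(negatives)
    sv_pos = min(positives)
    if sv_neg >= sv_pos:
        raise ValueError("classes are not linearly separable")
    threshold = (sv_neg + sv_pos) / 2
    margin = (sv_pos - sv_neg) / 2
    return threshold, margin, (sv_neg, sv_pos)

# Hypothetical data for the two classes
negatives = [1.0, 2.0, 3.0]
positives = [6.0, 8.0, 9.0]

full = max_margin_threshold(negatives, positives)
without_far_point = max_margin_threshold([2.0, 3.0], positives)       # drop 1.0
without_support_vector = max_margin_threshold([1.0, 2.0], positives)  # drop 3.0

print("all points:            ", full)
print("far point removed:     ", without_far_point)
print("support vector removed:", without_support_vector)
```

Dropping the point 1.0, which lies far from the boundary, leaves the threshold unchanged. Dropping 3.0, which is a support vector, moves the threshold and widens the margin.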
Monday, June 22, 2020
Data wrangling: Definitions, tools and techniques.
Tools:
- Google Cloud Dataprep
- Microsoft Excel Power Query
- Data Wrangler
- csvkit
- OpenRefine
- R data-cleansing packages, etc.
Techniques:
- Missing value analysis
- Outlier analysis
- Transformations
- Visual binning
- Multicollinearity
- Principal component analysis
- Dummy variable analysis
- Singular value decomposition
- Linear discriminant analysis
- Multidimensional scaling
- t-SNE and
- Independent component analysis.
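Two of the techniques listed above, missing value analysis and outlier analysis, can be sketched in a few lines of plain Python. The data are hypothetical, and the choices of median imputation and the 1.5 x IQR fence are assumptions made for illustration; other rules are equally common.

```python
import statistics

# Hypothetical raw data: None marks a missing value, 900 is a suspicious entry
raw = [97, 92, None, 104, 189, 156, 156, 125, 900]

# Missing value analysis: count the gaps, then fill them with the median
observed = [v for v in raw if v is not None]
missing_count = len(raw) - len(observed)
fill_value = statistics.median(observed)
cleaned = [fill_value if v is None else v for v in raw]

# Outlier analysis: flag values outside the 1.5 x IQR fences
q1, _, q3 = statistics.quantiles(cleaned, n=4, method="inclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in cleaned if v < lower_fence or v > upper_fence]

print("missing values:", missing_count)
print("filled with median:", fill_value)
print("outliers:", outliers)
```

The median is used for imputation here because, unlike the mean, it is not pulled upward by the extreme value 900.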
Thursday, June 11, 2020
Descriptive statistics tools
Descriptive statistics tools include:
1. Mean
2. Median
3. Mode
4. Skewness
5. Kurtosis
6. Normal plots
7. TURF analysis
8. Cross tabulation
9. Test of linearity
10. Tests of normality
11. Standard deviation
12. Variance
13. Maximum value
14. Minimum value
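Several of the measures listed above can be computed with Python's standard library. The sketch below borrows the seven values used in the R posts on this blog. Skewness and kurtosis are computed from unadjusted (population) moments, which is an assumption: SPSS and other packages apply sample-size corrections, so their figures will differ slightly.

```python
import statistics

x = [97, 92, 104, 189, 156, 156, 125]
n = len(x)

mean = statistics.mean(x)
median = statistics.median(x)
mode = statistics.mode(x)
minimum, maximum = min(x), max(x)

# Central moments of order 2, 3 and 4 (divided by n, not n - 1)
m2 = sum((v - mean) ** 2 for v in x) / n
m3 = sum((v - mean) ** 3 for v in x) / n
m4 = sum((v - mean) ** 4 for v in x) / n

skewness = m3 / m2 ** 1.5         # above 0 means a longer right tail
excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution

print("mean:", round(mean, 4), "median:", median, "mode:", mode)
print("min:", minimum, "max:", maximum)
print("skewness:", round(skewness, 4))
print("excess kurtosis:", round(excess_kurtosis, 4))
```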
Features of SPSS: Taking business decisions effectively
Features of SPSS:
1. Data visualization features help organizations understand how their data behaves.
2. Regression, neural networks, and time series analysis help researchers forecast business demand.
3. Correlation analysis finds the relationship between two variables, such as cost and sales.
4. Parametric and non-parametric tests support conclusions about the hypotheses of business research.
5. Market segmentation is done using cluster analysis.
6. Conjoint analysis is applied to new product development.
7. Multidimensional scaling is applied to positioning.
8. Statistical quality control charts in SPSS are widely used for total quality management.
Wednesday, June 10, 2020
Multivariate analysis definition, and objectives
Multivariate Analysis
Definition
Multivariate analysis is a set of statistical techniques used by a researcher to test hypotheses about multiple variables measured on one or more sampling units in an experiment or research design.
Examples:
1. Sampling unit: Learners
Variables: Grades in mathematics, statistics, big data analytics, database management.
2. Sampling unit: Patient
Variables: Heart rate, body mass index, weight, height
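The patient example can be made concrete with a small sketch: each row is one sampling unit and each column is one variable. All figures below are hypothetical. The mean vector and the covariance matrix computed here are the usual starting point for most multivariate techniques.

```python
import statistics

variables = ["heart_rate", "bmi", "weight", "height"]

# Hypothetical measurements: one row per patient, one column per variable
patients = [
    [72, 22.5, 65.0, 170.0],
    [80, 27.8, 85.0, 175.0],
    [65, 20.1, 55.0, 165.0],
    [90, 31.2, 98.0, 177.0],
    [76, 24.4, 72.0, 172.0],
]

def column(rows, j):
    """Return variable j across all sampling units."""
    return [row[j] for row in rows]

def covariance(a, b):
    """Sample covariance (divides by n - 1)."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)

k = len(variables)
mean_vector = [statistics.mean(column(patients, j)) for j in range(k)]
cov_matrix = [
    [covariance(column(patients, i), column(patients, j)) for j in range(k)]
    for i in range(k)
]

for name, row in zip(variables, cov_matrix):
    print(name, [round(value, 2) for value in row])
```

The diagonal of the matrix holds each variable's variance, and the off-diagonal entries show how pairs of variables move together.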
Learning Objectives:
Objectives of Multivariate analysis:
Multivariate data analysis questions for examinations
- Discuss the importance of factor analysis in data reduction.
- What is the difference between varimax and equimax rotation in factor analysis?
- Explain the rotated component matrix in factor analysis.
- What do you mean by Eigenvalue?
- Define communalities.
- Explain principal component analysis.
- Discuss the use of the maximum likelihood function in factor analysis.
- Explain multivariate normal distribution.
- Discuss tests of covariance matrices.
- Explain the importance of discriminant analysis.
- Elaborate the application of canonical correlation.
- Explain multiple regression with an example.
- Discuss the application of cluster analysis in segmentation.
- Distinguish between hierarchical clustering and two-step clustering.
- Explain K-means clustering.
- Write a note on MANOVA.
- What is the general linear model, and how is it different from the generalized linear model?
- How is Wilks' lambda used in multivariate analysis?
- What do you mean by bootstrapping?
- Explain latent structure discovery.
- List any five tools of data mining.
- Distinguish OLS and PLS regression.
- Write a note on SIMCA.
Monday, June 8, 2020
Python for MBA's: Chi-square test
Chi-square test using Google Colab
Independent test
Problem 1:
coding
output
Power_divergenceResult(statistic=1.0, pvalue=0.9097959895689501)
Saturday, June 6, 2020
R for MBA's- Calculating quartiles/quantiles using R
Coding
> x<-c(97,92,104,189,156,156,125)
> quantile(x)
Output
0% 25% 50% 75% 100%
92.0 100.5 125.0 156.0 189.0
R for MBA's: Calculating Range of numbers using R
Coding:
> x<-c(97,92,104,189,156,156,125)
> range(x)
Output
[1] 92 189
R for MBA's- Calculating Standard Deviation and Variance using R
Coding and output
> x<-c(97,92,104,189,156,156,125)
> sd(x)
Standard deviation: [1] 36.64112
> var(x)
variance: [1] 1342.571
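The three R results above (quartiles, range, standard deviation and variance) can be cross-checked in Python with the standard library. This is a sketch, assuming that the "inclusive" method of statistics.quantiles corresponds to R's default quantile type; for this data set it returns the same quartiles as the R output.

```python
import statistics

x = [97, 92, 104, 189, 156, 156, 125]

# Quartiles: the 25%, 50% and 75% points
quartiles = statistics.quantiles(x, n=4, method="inclusive")

# Range: smallest and largest value, as returned by range(x) in R
value_range = (min(x), max(x))

# Sample standard deviation and variance (both divide by n - 1, like R)
sd = statistics.stdev(x)
var = statistics.variance(x)

print("quartiles:", quartiles)
print("range:", value_range)
print("standard deviation:", round(sd, 5))
print("variance:", round(var, 3))
```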
Friday, June 5, 2020
Thursday, June 4, 2020
R for MBA's- Chi-square test (Independent)
coding
> x<-c(1567,1233,1456,1678,1456,1111,1895)
> chisq.test(x)
Output
Chi-squared test for given probabilities
data: x
X-squared = 280.87, df = 6, p-value < 2.2e-16
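The header "Chi-squared test for given probabilities" indicates that R compared the seven counts against equal expected counts. A minimal Python sketch of that calculation follows. The function names are illustrative, and the p-value helper uses a closed-form expression that is valid only when the degrees of freedom are even, which is the case here (df = 6).

```python
import math

def chi_square_equal_expected(observed):
    """Chi-square statistic against equal expected counts: returns (statistic, df)."""
    expected = sum(observed) / len(observed)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, len(observed) - 1

def chi2_p_value_even_df(statistic, df):
    """Upper-tail probability of the chi-square distribution for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = statistic / 2
    term = 1.0
    total = 1.0
    # Sum of half**i / i! for i = 0 .. df/2 - 1
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

x = [1567, 1233, 1456, 1678, 1456, 1111, 1895]
statistic, df = chi_square_equal_expected(x)
p_value = chi2_p_value_even_df(statistic, df)

print("X-squared:", round(statistic, 2), "df:", df)
print("p-value below 2.2e-16:", p_value < 2.2e-16)
```

The statistic and degrees of freedom agree with the R output. Because the p-value is far below any usual significance level, the hypothesis that all seven categories are equally likely is rejected.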
Wednesday, June 3, 2020
Friday, May 29, 2020
Wednesday, May 27, 2020
R for MBA's- Hypothesis testing - t test
one sample
Problem 1:
Coding:
Output
data: x
t = 12.581, df = 8, p-value = 1.495e-06
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
52.17808 75.59970
sample estimates:
mean of x
63.88889
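The data behind the output above is not shown, so the following sketch uses hypothetical values to illustrate how a one-sample t statistic of this kind is computed. The sample, the hypothesized mean mu0, and the variable names are all assumptions. The critical value 2.306 is the standard two-sided 95% table value for 8 degrees of freedom.

```python
import math
import statistics

# Hypothetical sample of nine observations (not the data from the post)
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5]
mu0 = 12.0  # hypothesized population mean (assumption)

n = len(sample)
df = n - 1
mean = statistics.mean(sample)
sd = statistics.stdev(sample)        # sample standard deviation
standard_error = sd / math.sqrt(n)

# t statistic: distance of the sample mean from mu0 in standard-error units
t_statistic = (mean - mu0) / standard_error

# 95% confidence interval for the mean, using the table value for 8 df
t_critical = 2.306
ci_low = mean - t_critical * standard_error
ci_high = mean + t_critical * standard_error

print("t =", round(t_statistic, 3), "df =", df)
print("95% CI:", round(ci_low, 4), round(ci_high, 4))
print("reject H0 at 5%:", abs(t_statistic) > t_critical)
```

The R output above has the same structure: a t value, its degrees of freedom, a confidence interval around the sample mean, and the sample mean itself.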