Monday, June 22, 2020

Data wrangling: Definitions, tools and techniques.

Today I will discuss data wrangling, also called data munging.
What is data wrangling?
It is the process of cleaning, restructuring, and enriching data.
There are different views of the data wrangling process.
In the first view, the process begins with discovering the data through data mapping, after which the data is structured using tabulations.
Next, the researcher cleans the data by handling outliers and missing values. If the data is insufficient for analysis, it is enriched with additional data. The data is then validated using a normality test. Finally, the data is published to databases.
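The cleaning step in this first view can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the sample values, the choice of median imputation, and the 1.5 x IQR outlier rule are assumptions made for the example, not the only way to do it. The function names impute_median and drop_iqr_outliers are made up for this sketch.

```python
import statistics

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def drop_iqr_outliers(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical sample: two missing values and one obvious outlier (200).
raw = [12, 15, None, 14, 13, 200, 16, None, 15]
filled = impute_median(raw)          # missing values handled first
cleaned = drop_iqr_outliers(filled)  # then outliers removed
print(cleaned)
```

Imputing before filtering keeps the two steps independent. In practice the order and the rules depend on the data set.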
In the second view, the process consists of pre-processing, standardizing, cleaning, consolidating, matching, and filtering the data.
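A small sketch may make the second view more concrete. It standardizes a few records, consolidates duplicates by matching on a key, and then filters. The field names (name, city, age), the sample rows, and the age threshold are all hypothetical and chosen only for illustration.

```python
def standardize(record):
    """Trim whitespace, normalise case, and convert types so equal values compare equal."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "city": record["city"].strip().lower(),
        "age": int(record["age"]),
    }

def consolidate(records):
    """Match records on (name, city) and keep the first one seen for each key."""
    seen = {}
    for r in records:
        seen.setdefault((r["name"], r["city"]), r)
    return list(seen.values())

# Hypothetical raw rows: the first two describe the same person.
rows = [
    {"name": " alice  smith", "city": "Pune ", "age": "34"},
    {"name": "Alice Smith", "city": "pune", "age": "34"},
    {"name": "bob ray", "city": "Delhi", "age": "17"},
]

clean = consolidate([standardize(r) for r in rows])
adults = [r for r in clean if r["age"] >= 18]  # filtering step
print(adults)
```

Standardizing before matching matters here: without it, the two Alice Smith rows would not be recognised as duplicates.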
One can search for data using Google Dataset Search. Many tools have emerged for data wrangling:
  1. Google Cloud Dataprep
  2. Microsoft Excel Power Query
  3. Data Wrangler
  4. csvkit
  5. OpenRefine
  6. R data-cleaning packages, etc.
In a nutshell, these are the data wrangling techniques a data scientist should know:
  • Missing value analysis 
  • Outlier analysis 
  • Transformations 
  • Visual binning 
  • Multicollinearity analysis
  • Principal component analysis 
  • Dummy variable analysis
  • Singular value decomposition
  • Linear discriminant analysis
  • Multidimensional scaling 
  • t-SNE and
  • Independent component analysis.
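
Three of the simpler techniques in this list (transformations, binning, and dummy variables) can be sketched with the Python standard library alone. The function names and sample data below are assumptions for illustration. The binning shown is plain equal-width binning, and the sketch assumes the values are not all identical.

```python
import math

def log_transform(values):
    """Transformation: log(1 + x) compresses a long right tail."""
    return [math.log1p(v) for v in values]

def equal_width_bins(values, n_bins):
    """Binning: assign each value a bin index from 0 to n_bins - 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes hi > lo
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def dummy_encode(values):
    """Dummy variables: one 0/1 column per category, dropping the first
    category so the columns are not perfectly collinear."""
    kept = sorted(set(values))[1:]
    return [{c: int(v == c) for c in kept} for v in values]

# Hypothetical sample data.
incomes = [10, 20, 30, 40, 100]
colors = ["red", "green", "blue", "green"]

logged = log_transform(incomes)
bins = equal_width_bins(incomes, 3)
dummies = dummy_encode(colors)
print(bins)
print(dummies)
```

Dropping one category in dummy_encode ties back to the multicollinearity item above: keeping all categories would make the dummy columns sum to one in every row.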
