http://easy-ai.oss-accelerate.aliyuncs.com/2021-03-07-edabanner.png

Exploratory data analysis (EDA) is the process of taking raw data and using technical means to understand it better, extract "good features", and build a preliminary model.

This article explains how to classify data and how to visualize each type of data.

What is exploratory data analysis?

When it comes to basketball, everyone knows that height and wingspan are key characteristics of a player.

What about handball? Most people probably can't say.

When you enter a field you are not familiar with, you need to build a basic understanding of it quickly.

There are 2 ways to get to know an unfamiliar field:

  1. Consult industry insiders. Experienced insiders can pass on what they have learned.
  2. Study data from the field. We can collect physical and performance data on handball players and analyze what distinguishes the best ones. Even without any industry experience, the data can reveal useful insights.

http://easy-ai.oss-accelerate.aliyuncs.com/2021-03-07-ask-eda.png

The second approach above is exploratory data analysis (EDA).

Exploratory data analysis is a data analysis method and philosophy that uses a range of techniques (mostly data visualization) to explore the internal structure and patterns of the data.

The purpose of exploratory data analysis is to gain as much insight as possible into the data set, discover the internal structure of the data, extract important features, detect outliers, test basic hypotheses, and establish preliminary models.
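Detecting outliers is one of these goals. A common convention, which is not taken from this article, is Tukey's rule: flag values lying more than 1.5 × IQR (interquartile range) outside the first or third quartile. The minimal sketch below uses only Python's standard library. The heights are made-up illustration data.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical heights (cm) of handball players; 250 looks like a data-entry error
heights = [188, 192, 185, 190, 195, 187, 193, 250, 189, 191]
print(iqr_outliers(heights))  # → [250]
```

The rule only flags candidates. Whether 250 is a typo or a real, unusually tall player is a question for the analyst.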

The 3-step approach to exploratory data analysis

http://easy-ai.oss-accelerate.aliyuncs.com/2021-03-08-3steps.png

The process of exploratory data analysis is roughly divided into 3 steps:

  1. Data classification
  2. Data visualization
  3. Data insight

Step 1: Data classification

Once we have the data, the first step is to classify it, so that each type of data can be processed with the appropriate methods.
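As a rough illustration, suppose a table is stored as a dict of column lists (a hypothetical layout chosen for this sketch). A crude first pass could then apply the quantitative vs qualitative split described below, labelling each column by the Python type of its values:

```python
def classify_column(values):
    """Heuristic: all-numeric columns are quantitative, everything else qualitative."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "qualitative"

# Hypothetical player table: each key is a column name, each value is a column
players = {
    "name": ["Li", "Wang", "Zhang"],
    "blood_type": ["A", "O", "B"],
    "height_cm": [188, 192.5, 185],
}
print({col: classify_column(vals) for col, vals in players.items()})
# → {'name': 'qualitative', 'blood_type': 'qualitative', 'height_cm': 'quantitative'}
```

Type checks are only a starting point. A jersey-number column would be labelled quantitative here, even though the numbers are really just names.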

Data can be classified in the following ways, from coarse to fine:

http://easy-ai.oss-accelerate.aliyuncs.com/2021-03-07-xifen.png

Structured data vs unstructured data

Structured data: data that can be organized into a table of rows and columns.

For example: data in an Excel spreadsheet or a MySQL table.

Unstructured data: data that is not organized in tabular form.

For example: text, images, videos...
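The following small Python sketch makes the difference concrete. The player data and the regular expression are illustrative assumptions. Structured data can be read by column name. The same facts written as free text have to be dug out with a pattern we must guess.

```python
import csv
import io
import re

# Structured: a CSV table where every value sits under a named column
structured = "name,height_cm\nLi,188\nWang,192\n"
rows = list(csv.DictReader(io.StringIO(structured)))
table_heights = [int(row["height_cm"]) for row in rows]

# Unstructured: the same facts buried in a sentence; there is no schema,
# so we rely on a hand-written pattern that may miss other phrasings
text = "Li is 188 cm tall; his teammate Wang is taller at 192 cm."
text_heights = [int(h) for h in re.findall(r"(\d+)\s*cm", text)]

print(table_heights, text_heights)  # → [188, 192] [188, 192]
```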

 

Quantitative data vs qualitative data

Quantitative data: numeric values that measure how much of something there is.

For example: the birth year 1985

Qualitative data: categories that describe a property of something.

For example: "post-80s" (people born in the 1980s)
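To illustrate the two examples above, the short Python sketch below turns quantitative birth years into qualitative cohort labels. The function name `birth_cohort` is hypothetical. Arithmetic still makes sense on the years. On the labels we can only test equality or count occurrences.

```python
def birth_cohort(year: int) -> str:
    """Turn a quantitative birth year into a qualitative cohort label."""
    decade = year % 100 // 10 * 10  # 1985 -> 80, 2003 -> 0
    return f"post-{decade:02d}s"

years = [1985, 1987, 1992, 1999, 2003]
print([birth_cohort(y) for y in years])
# → ['post-80s', 'post-80s', 'post-90s', 'post-90s', 'post-00s']

# Arithmetic is meaningful on the quantitative years...
print(max(years) - min(years))  # → 18
# ...but on the labels we can only compare for equality or count them
print(sum(birth_cohort(y) == "post-90s" for y in years))  # → 2
```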

 

4 levels of data

Nominal level: the first level of data, with the weakest structure. Values are only classified by name and have no natural order.

For example: blood type (A, B, AB, O), name, color

Ordinal level: adds a natural order on top of the nominal level, so values can be compared and ranked.

For example: a restaurant's star rating, a company's rating grade

Interval level: values must be numeric. Besides being ordered, they can be added and subtracted, because the distance between two values is meaningful.

For example: temperature in Fahrenheit or Celsius (the zero point is arbitrary and temperatures can be negative, so multiplication and division are not meaningful)

Ratio level: adds an absolute zero point on top of the interval level, so values support multiplication and division as well as addition and subtraction.

For example: money, weight
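One way to see the four levels is as a ladder. Each level keeps every operation of the levels below it and adds new ones. The Python sketch below encodes that idea. The operation sets are a simplified assumption for illustration, not an exhaustive list. The last lines use Kelvin, a scale with a true zero, to show why Celsius stops at the interval level.

```python
from enum import IntEnum

class Level(IntEnum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Operations each level adds on top of the levels below it (simplified)
NEW_OPERATIONS = {
    Level.NOMINAL: {"==", "count"},
    Level.ORDINAL: {"<", ">"},
    Level.INTERVAL: {"+", "-"},
    Level.RATIO: {"*", "/"},
}

def allowed_operations(level: Level) -> set:
    """Collect the operations of this level and every weaker level."""
    ops = set()
    for lvl in Level:
        if lvl <= level:
            ops |= NEW_OPERATIONS[lvl]
    return ops

print(sorted(allowed_operations(Level.INTERVAL)))
# → ['+', '-', '<', '==', '>', 'count']

# Why Celsius stops at the interval level: its zero point is arbitrary
c_low, c_high = 10.0, 20.0
print(c_high / c_low)  # → 2.0, yet 20 °C is not "twice as hot" as 10 °C
# Kelvin has an absolute zero, so ratios there are meaningful
print(round((c_high + 273.15) / (c_low + 273.15), 3))  # → 1.035
```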

 

Step 2: Data visualization

To gain better insight into the data, we can visualize it, which makes its characteristics easier to observe.
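In practice this is usually done with a plotting library such as matplotlib or seaborn. To keep the idea self-contained, the Python sketch below draws a bar chart of category frequencies as plain text using only the standard library. The blood-type sample is hypothetical.

```python
from collections import Counter

def text_bar_chart(values, width=20):
    """Render category frequencies as a horizontal bar chart made of '#'."""
    counts = Counter(values)
    peak = max(counts.values())
    lines = []
    for label, n in counts.most_common():  # most frequent category first
        bar = "#" * round(n / peak * width)
        lines.append(f"{label:>3} | {bar} {n}")
    return "\n".join(lines)

# Hypothetical blood types of ten players (nominal data)
blood_types = ["A", "O", "B", "O", "AB", "A", "O", "A", "O", "B"]
print(text_bar_chart(blood_types))
```

Bar lengths are scaled so that the most frequent category (O, with 4 players) fills the full width.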

There are several commonly used data visualizations:

http://easy-ai.oss-accelerate.aliyuncs.com/2021-03-07-keshihua.png

Easyai public number

Each of the four data levels above calls for different visualization methods. The table below can help you choose a suitable chart.

It lists only basic options; in practice, more complex or combined charts can be used.

| Data level | Properties | Descriptive statistics | Charts |
|---|---|---|---|
| Nominal | Discrete, unordered | Frequency (proportion), mode | Bar chart, pie chart |
| Ordinal | Ordered categories, comparable | Frequency, mode, median, percentiles | Bar chart, pie chart |
| Interval | Differences between values are meaningful | Frequency, mode, median, mean, standard deviation | Bar chart, pie chart, box plot |
| Ratio | Continuous | Mean, standard deviation | Bar chart, line chart, pie chart, box plot |
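The statistics column of the table can be sketched in Python with the standard `statistics` module. This sketch covers only a subset of the table (percentiles are omitted). It also assumes that each level keeps the statistics of the weaker levels, which is why the ratio level gets frequency and mode as well. The `describe` function and the sample data are hypothetical.

```python
import statistics
from collections import Counter

def describe(values, level):
    """Compute a subset of the descriptive statistics the table lists per level.

    Assumes each level also supports the statistics of the weaker levels.
    """
    n = len(values)
    summary = {
        "frequency": {k: c / n for k, c in Counter(values).items()},
        "mode": statistics.mode(values),
    }
    if level == "ordinal":
        # median_low always returns an observed category, never an average of two
        summary["median"] = statistics.median_low(values)
    if level in ("interval", "ratio"):
        summary["median"] = statistics.median(values)
        summary["mean"] = statistics.mean(values)
        summary["stdev"] = statistics.stdev(values)  # sample standard deviation
    return summary

stars = [4, 5, 3, 4, 4, 2, 5]      # hypothetical restaurant star ratings (ordinal)
weights_kg = [70, 80, 75, 85, 90]  # hypothetical player weights (ratio)
print(describe(stars, "ordinal")["median"])   # → 4
print(describe(weights_kg, "ratio")["mean"])  # → 80
```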

Step 3: Insight into the data

Data visualization helps us gain insight into the data. We can discover more efficiently which variables matter most, how different variables may be related, and which ones influence each other.
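A first numeric check for a possible relationship between two variables is the Pearson correlation coefficient. It ranges from -1 to +1. The Python sketch below computes it by hand. The handball measurements are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical handball data: player height (cm) vs. shot speed (km/h)
height_cm = [180, 185, 190, 195, 200]
shot_speed = [80, 84, 85, 90, 91]
print(round(pearson(height_cm, shot_speed), 2))  # → 0.98
```

A value near +1 suggests a strong positive linear relationship in this sample. On its own, it does not show that height causes faster shots.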

It is called exploratory data analysis precisely because there is no fixed routine, so this step has no standard recipe to describe.

Final Thoughts

Exploratory data analysis is a data analysis method and philosophy that uses a range of techniques (mostly data visualization) to explore the internal structure and patterns of the data.

The process of exploratory data analysis is roughly divided into 3 steps:

  1. Data classification
  2. Data visualization
  3. Data insight