This article is reposted from the WeChat public account Fresh Jujube Classroom (xzclasscom). Original address:

In recent years, big data has been a fashionable concept that comes up constantly and attracts a great deal of attention.

Many people, on first hearing the term "big data", naturally take it literally: big data is a large amount of data, and big data technology is technology for storing large amounts of data.

But that is not the case.

Big data is more complicated than imagined. It is not just a data storage technology, but a series of extraction, integration, management, analysis and interpretation technologies related to massive data. It is a huge framework system.

Beyond that, big data is a new way of thinking and a new business model.

Big Data Era

In today's article, let us spend five minutes finding out what big data really is.

  Definition of big data  

First, we need to revisit the definition of big data.

The industry has many definitions of big data, some broad and some narrow.

The broad definition has a slightly philosophical flavor: big data is the mapping and refinement of the physical world into the digital world, in which we discover the characteristics of the data and use them to make decisions that improve efficiency.

The narrow definition comes from engineers: big data is a new technical architecture that mines value from high-volume data by acquiring, storing, and analyzing it.

Of the two, I still prefer the technical definition, haha.

Did you notice the keywords in that sentence?

What is done? Acquiring data, storing data, and analyzing data.

What is it done to? High-volume data.

What is the purpose? Mining value.

There is nothing unusual about acquiring, storing, and analyzing data. We use computers every day and do exactly this.

For example, at the beginning of each month, the attendance administrator obtains each employee's attendance records, enters them into an Excel spreadsheet, stores it on the computer, and then tallies how many people were late or absent so that their pay can be docked.

However, the same approach does not work for big data. In other words, only data at a scale that traditional personal computers and conventional software cannot handle is called "big data".

  Big data, how big is it?  

The data our traditional personal computers process is at the GB/TB level. For example, a hard drive today is usually 1 TB, 2 TB, or 4 TB.

The relationship between TB, GB, MB, and KB should be familiar to everyone:

1 KB = 1024 B (KB - kilobyte)

1 MB = 1024 KB (MB - megabyte) 

1 GB = 1024 MB (GB - gigabyte) 

1 TB = 1024 GB (TB - terabyte) 

What scale is big data? The PB/EB level.

Most people have never heard of these units. They simply continue the pattern of multiplying by 1024:

1 PB = 1024 TB (PB - petabyte) 

1 EB = 1024 PB (EB - exabyte) 

Letters alone are not very intuitive, so let me give some examples.

1 TB fits on a single hard disk. That is roughly 200,000 photos, or 20 million MP3 songs, or 671 copies of the novel "Dream of Red Mansions".

Normal hard disk


2 cabinets

1 EB requires approximately 2,000 storage cabinets. Lined up side by side, they would stretch for 1.2 kilometers. Placed in a machine room, they would fill 21 standard basketball courts.

21 basketball courts

Internet giants such as Alibaba, Baidu, and Tencent are said to hold data volumes close to the EB level.

Ali Data Center Interior

EB is not the biggest unit. The total amount of data held by all of humanity is currently measured in ZB.

1 ZB = 1024 EB (ZB - zettabyte) 

In 2011, the total amount of data created and copied worldwide was 1.8 ZB.

By 2020, the data stored on electronic devices worldwide will reach 35 ZB. A machine room built to store this data would cover more area than 42 Bird's Nest stadiums.

The amount of data is not only large but also growing fast, at about 50% a year. In other words, it roughly doubles every two years.
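As a rough check of the figures quoted above, the following sketch computes the doubling time implied by 50% annual growth, and the average annual growth rate implied by going from 1.8 ZB in 2011 to 35 ZB in 2020. The function names are illustrative, and the calculation assumes steady compound growth, which real data volumes need not follow.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Constant annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # Doubling time at the 50% annual growth quoted in the text.
    print(f"doubling time at 50%/year: {doubling_time_years(0.5):.2f} years")

    # Growth rate implied by the 2011 and 2020 totals quoted in the text (in ZB).
    rate = implied_annual_growth(1.8, 35, 2020 - 2011)
    print(f"implied growth 2011-2020: {rate:.1%} per year")
```

Under these assumptions, 50% a year doubles the total in about 1.7 years, so "every two years" is a slightly conservative rounding. The two quoted totals imply an average of roughly 39% a year, a little below the 50% headline figure.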

Current big data applications have not yet reached the ZB level. They are mainly concentrated at the PB/EB level.

Big data level positioning

1 KB = 1024 B (KB - kilobyte)

1 MB = 1024 KB (MB - megabyte) 

1 GB = 1024 MB (GB - gigabyte) 

1 TB = 1024 GB (TB - terabyte) 

1 PB = 1024 TB (PB - petabyte) 

1 EB = 1024 PB (EB - exabyte) 

1 ZB = 1024 EB (ZB - zettabyte) 
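The 1024-fold steps in the list above can be sketched as a small converter. This is only an illustration: the names `UNITS`, `to_bytes`, and `humanize` are made up for this example, and the 4 TB disk size is simply one of the drive sizes mentioned earlier.

```python
# Units in ascending order; each one is 1024 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(value: float, unit: str) -> float:
    """Convert a value expressed in `unit` into bytes."""
    return value * 1024 ** UNITS.index(unit)


def humanize(num_bytes: float) -> str:
    """Express a byte count in the largest unit that keeps the number >= 1."""
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1024:
            return f"{value:g} {unit}"
        value /= 1024
    return f"{value:g} {UNITS[-1]}"


if __name__ == "__main__":
    print(humanize(to_bytes(1024, "TB")))  # 1024 TB expressed in the next unit up
    # How many 4 TB hard disks (an assumed size) it takes to hold 1 EB.
    disks = to_bytes(1, "EB") / to_bytes(4, "TB")
    print(f"4 TB disks needed for 1 EB: {disks:,.0f}")
```

Because every step is the same factor, moving two units up (TB to EB) multiplies by 1024 twice, which is why an exabyte is about a million terabytes.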

  Source of data  

Why is the data growing so fast?

To answer that, we need to review several important stages in how human society has generated data.

In general, there are three important stages.

The first stage began after the computer was invented. Once databases appeared in particular, the complexity of data management dropped sharply, and data from all walks of life began to be recorded in databases. Data in this period was mostly structured data ("structured data" is explained later), and it was generated passively.

The world's first general purpose computer - ENIAC

The second stage came with the Web 2.0 era. The defining feature of Web 2.0 is user-generated content. As the Internet and mobile communication devices spread, people began using blogs and social networks such as Facebook and YouTube, actively generating large amounts of data.

The third stage is the sensing-system stage. With the development of the Internet of Things, sensing-layer nodes of all kinds, such as sensors and cameras around the world, began to generate large amounts of data automatically.

After passing through these three stages (passive, active, and automatic), the total amount of human data expanded rapidly.

  Big data 4Vs  

The industry sums up the characteristics of big data as the 4 Vs. The huge amount of data discussed above is Volume. The other three are Variety, Velocity, and Value.

Let us go through them one by one.

  • Variety (diversified)

Data takes many forms: numbers (prices, transaction data, weights, head counts, etc.), text (email, web pages, etc.), images, audio, video, and location information (latitude and longitude, altitude, etc.). All of it is data.

Data is divided into structured data and unstructured data.

As can be seen from the name, structured data refers to data that can be expressed in a predefined data model or stored in a relational database.

Structured data

For example, the ages of everyone in a class, or the prices of all the goods in a supermarket, are structured data.

Web articles, email content, images, audio, video, etc. are all unstructured data.

On the Internet, unstructured data already accounts for more than 80% of all data.

Big data fits this pattern: its forms are diverse, and the proportion of unstructured data is high.
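To make the distinction concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table, names, ages, and email text are all made up for illustration.

```python
import sqlite3

# Structured data: every record follows a predefined schema (name, age),
# so it fits a relational table and can be queried directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT NOT NULL, age INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO students (name, age) VALUES (?, ?)",
    [("Li", 19), ("Wang", 20), ("Zhang", 21)],  # made-up sample rows
)
average_age = conn.execute("SELECT AVG(age) FROM students").fetchone()[0]
conn.close()

# Unstructured data: free text has no schema, so even a simple question
# such as "how old is the sender?" first needs parsing or extraction.
email_body = "Hi, I'm Li. I turned 19 last month and just joined the class."
word_count = len(email_body.split())

print(f"average age from the table: {average_age}")
print(f"words in the free-text email: {word_count}")
```

The table answers an aggregate question with one query, while the email only yields a word count until some extraction step imposes structure on it. That extra step is part of why a high share of unstructured data makes big data harder to process.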

  • Velocity (timeliness)

Another characteristic of big data is timeliness. The window between when data is generated and when it is consumed is very small. Data changes, and must be processed, faster and faster. For example, data that used to change by the day now changes by the second or even the millisecond.

Let the numbers speak again:

What happened in the data world in just the past minute?

Email: 204 million emails are sent

Google: 2 million search requests are submitted

YouTube: 2,880 minutes of video are uploaded

Facebook: 695,000 status updates are posted

Twitter: 98,000 tweets are sent

12306: 1,840 tickets are sold


How about that? Is it changing fast enough?

  • Value

The last feature is the value density.

Big data is large in volume, but its value density is very low: only a small part of the data is truly valuable.

For example, when surveillance video is searched for a suspect, several terabytes of footage may contain only a few seconds that are actually valuable.

After the 2013 Boston Marathon bombing in the United States, investigators pulled 10 TB of surveillance data (including communication records from mobile base stations, surveillance video from nearby stores, gas stations, and newsstands, and footage provided by volunteers) and eventually found a photo of the suspect.

  The value of big data  

Having just discussed value density, we have also touched on the core of big data: value.

The main reason people proposed big data, and study it, is to extract the value hidden in it.

What is the value of big data?

As early as 1980, the famous futurist Alvin Toffler stated in his book The Third Wave that "data is wealth", and hailed big data as "the cadenza of the third wave".

  • The first wave: the agricultural stage, beginning about 10,000 years ago
  • The second wave: the industrial stage, beginning in the 17th century
  • The third wave: the information stage, beginning in the 1950s

After entering the 21st century, with the development of the second and third stages described above, the rise of the mobile Internet, and leaps in storage and cloud computing capability, big data began to attract more and more attention.

The 2012 World Economic Forum stated: "Data has become a new class of economic asset, just like money and gold." This undoubtedly pushed the value of big data to an unprecedented level.

Today, big data applications are beginning to enter our lives, affecting our food, clothing, and housing.

Didi's use of big data to charge regular customers higher prices is something I believe everyone has heard of.

Big data is developing so rapidly because more and more industries and enterprises are recognizing its value and trying to take part in extracting it.

In summary, the value of big data comes mainly from two aspects:

1. It helps companies understand users

Through correlation analysis, big data links customers with products and services and pinpoints user preferences, so that companies can offer more accurate, better-targeted products and services and improve sales.

A typical example is e-commerce.

An e-commerce platform like Alibaba's Taobao has accumulated a huge amount of user purchase data. In the early days, this data was a burden, and storing it meant heavy hardware costs. Today, it is Alibaba's most valuable asset.

With this data, user behavior can be analyzed, and the spending habits, brand preferences, and geographic distribution of target customer groups can be identified precisely, which in turn guides merchants' operations, brand positioning, and marketing.

Big data can have a direct impact on performance. Its efficiency and accuracy far exceed those of traditional user research.

Beyond e-commerce, big data is used in energy, film and television, securities, finance, agriculture, industry, transportation, public utilities, and more.
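As a toy illustration of the kind of analysis described above, the sketch below counts which items are bought together and uses the counts to suggest related products. It is a deliberately simple co-occurrence count over made-up orders, not the method any real platform uses, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Made-up orders: each set holds the items one customer bought together.
orders = [
    {"phone", "phone case", "charger"},
    {"phone", "phone case"},
    {"laptop", "mouse"},
    {"phone", "charger"},
    {"laptop", "mouse", "laptop bag"},
]


def co_purchase_counts(orders):
    """Count how often each pair of items appears in the same order."""
    pairs = Counter()
    for items in orders:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(items), 2):
            pairs[pair] += 1
    return pairs


def recommend(item, pairs, top_n=2):
    """Items most often bought together with `item`, most frequent first."""
    related = Counter()
    for (a, b), count in pairs.items():
        if a == item:
            related[b] += count
        elif b == item:
            related[a] += count
    return [name for name, _ in related.most_common(top_n)]


if __name__ == "__main__":
    pairs = co_purchase_counts(orders)
    print("often bought with a phone:", recommend("phone", pairs))
    print("often bought with a laptop:", recommend("laptop", pairs, top_n=1))
```

With only five orders the result is obvious by eye. The point of big data is that the same counting, run over a platform's full purchase history, surfaces preferences that no manual survey could.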

Big data can even help the president

2. It helps companies understand themselves

In addition to helping companies understand users, big data can help them understand themselves.

Running an enterprise consumes a large amount of resources, and big data can analyze and pinpoint the specifics of those resources, such as the distribution of reserves and demand trends. Visualizing these resources helps managers grasp the company's operating status more intuitively, identify problems faster, adjust operating strategies in time, and reduce operating risk.

All in all, "know yourself and know your enemy, and you will win every battle." Big data exists to support decision-making.

  Big data and cloud computing  

Having said that, we have to answer a question many people have in mind: what is the relationship between big data and cloud computing?

Put simply, data itself is an asset, and cloud computing provides suitable tools for mining that asset's value.

Technically, big data depends on cloud computing. The massive data storage technology, massive data management technology, and distributed computing models in cloud computing are the foundation of big data technology.

Cloud computing is like an excavator, and big data is the mine. Without cloud computing, the value of big data cannot be extracted.

Conversely, the processing demands of big data have also stimulated the development and deployment of cloud computing technologies.

That is to say, without the mine that is big data, many of the excavator's powerful capabilities would never have been developed.

To borrow an old saying, cloud computing and big data complement each other.

  Big Data and the Internet of Things (5G)  

The second question: what is the relationship between big data and the Internet of Things?

I think everyone can answer this one quickly, since it was mentioned earlier.

The Internet of Things is "an Internet in which things are connected to one another." Its perception layer generates massive amounts of data, which greatly promotes the development of big data.

Likewise, big data applications bring out the value of the Internet of Things, which in turn stimulates demand for it. As more companies find that they can gain value from Internet of Things big data, they become willing to invest in building the Internet of Things.

In fact, this question can be extended further to the relationship between big data and 5G.

On the one hand, the upcoming 5G increases connection speeds, improving the experience of the "Internet of people" and encouraging people to actively create data.

On the other hand, 5G serves the "Internet of Things" even more. Features such as low latency and massive terminal connections are all requirements of Internet of Things scenarios.

5G stimulates the development of the Internet of Things, and the Internet of Things stimulates the development of big data. All communications infrastructure is paving the way for the rise of big data.

  Big data industry chain  

Next, let us talk about the big data industry chain.

The industry chain of big data is closely tied to how big data is processed. Simply put, it consists of producing data, aggregating data, analyzing data, and consuming data.

Each link has its corresponding players, as shown below:

At present, foreign vendors hold a large share of the big data industry, especially in the upstream segment, which is occupied almost entirely by foreign companies. Domestic IT companies still lag far behind.

Big data related key areas and enterprises (technical)

  Big data challenge  

Having said so many good things about big data does not mean big data is perfect.

Big data also faces many challenges.

Besides the difficulty of data management technology, the biggest challenge for big data is security.

Data is both an asset and a matter of privacy. No one wants their privacy exposed, so people are paying more and more attention to protecting it. Governments are also steadily strengthening the protection of citizens' privacy rights and have introduced many laws.

In 2018, the European Union's GDPR (General Data Protection Regulation), described as the strictest in history, took effect, raising online data protection to an unprecedented level.

In this environment, companies need to consider carefully whether the way they obtain user data is ethical and legal. A company that breaks the law will pay a very heavy price.

Moreover, even a company that acquires data legally must worry about malicious attacks and theft. These risks cannot be ignored.

In addition to security, big data also faces problems such as energy consumption.

In other words, if you cannot protect and make good use of the big data in your hands, it becomes a hot potato.