Complete Guide to Big Data and HDFS

The world generates a huge amount of data every day from sources such as social media, authentication systems, and user-generated content. Traditional means are not adequate to process it, and traditional database management systems cannot manage it efficiently either.

SQL and RDBMS alone cannot solve this problem, so we need a new kind of architecture and processing system that can manage huge amounts of data. Big data is going to be a major subject in the future, and the amount of data is expected to grow many times over in the near future.

Before moving on to handling big data, we should first know what big data is!

Big data is a collection of data so large that it cannot be processed with traditional computing techniques. A combination of many tools, techniques, and business models is needed to store and process it.

Now the question is: which kinds of data count as big data?

  • Data stored by the black boxes of airplanes, helicopters, fighter planes, and so on. It contains various kinds of information, such as images, sound, video, conversations, and statistics from the various machines on board.
  • Data on social media platforms, which includes pictures, videos, chats, and other user-generated content.
  • Data stored during trading on a stock exchange, such as company share prices, trading patterns, and rises and falls in share valuation.
  • Data from energy generation centers, power distribution centers, and grids, which produce bulk data about transmission, distribution, and many other aspects of the grid.
  • Data generated by web browsers, which is enormous and so also falls under the category of big data.

Kinds of Big Data

Big data can be divided into three basic categories: structured data, semi-structured data, and unstructured data.

  1. Data that is organized into related fields and records, as in the tables of a relational database, is called relational or structured data.
  2. Data that carries some structure of its own, such as tags that describe each fragment, but does not follow a rigid schema is called semi-structured data. Data generated in XML form comes under it.
  3. Data that has no predefined structure relating one piece to another is called unstructured data. Word documents, pictures, statistics or log files, and PDFs come under the unstructured data type.
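To make the three categories concrete, here is a minimal Python sketch that handles one tiny sample of each kind. The sample records, field names, and the keyword check are all made up for illustration; they are assumptions, not data from any real system.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, and every row follows the same schema.
structured = io.StringIO("user_id,country,purchases\n1,IN,3\n2,US,5\n")
rows = list(csv.DictReader(structured))
total_purchases = sum(int(row["purchases"]) for row in rows)

# Semi-structured: tags describe each fragment, but records may differ
# (the second user below has no <email> element).
semi_structured = """
<users>
  <user id="1"><name>Asha</name><email>asha@example.com</email></user>
  <user id="2"><name>Ben</name></user>
</users>
""".strip()
root = ET.fromstring(semi_structured)
names = [user.findtext("name") for user in root.findall("user")]
emails = [user.findtext("email") for user in root.findall("user")]

# Unstructured: free text with no schema; any structure must be inferred.
unstructured = "Loved the new phone, but the battery drains too fast."
mentions_battery = "battery" in unstructured.lower()

print(total_purchases)
print(names)
print(emails)
print(mentions_battery)
```

The structured sample can be summed directly because the column is always there, the XML sample needs a check for missing fields, and the free text only yields information after some form of analysis.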

If such a huge amount of data is so difficult to process, why is it so important?

Once processed, this data becomes very productive and even path-breaking for a company. It contains census information, complaints, opinions, and reactions, which can be decisive factors in the growth of any industry.

Comments and reactions on social media create trends, so they play an important role in a company's business development plan and product promotion strategy.

Problems associated with big data

Storing, processing, transferring, and presenting data, and fetching the desired piece of it, are the common problems associated with big data. The Hadoop ecosystem addresses them with components such as the following.

1. HDFS (Hadoop Distributed File System)

It is the storage layer for big data. It splits the data into smaller units called blocks and stores them on “slave nodes” (DataNodes). A master node (the NameNode), running on more reliable hardware, keeps the information about which slave node holds which block. Each block is copied to multiple slave nodes, so if one node fails, the responsibility shifts to another node that holds a copy. In this way each piece of data is handled independently, and the slave nodes serve data to the client directly.
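The block splitting and replication described above can be sketched in a few lines of Python. This is a toy model, not the HDFS implementation: the function names are hypothetical, the 4-byte block size is chosen only so the example stays readable (HDFS commonly defaults to 128 MB blocks and a replication factor of 3), and the round-robin placement is an assumption that ignores the rack-aware placement real clusters use.

```python
BLOCK_SIZE = 4    # bytes per block; toy value for illustration only
REPLICATION = 3   # copies kept of each block


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list[str],
                 replication: int = REPLICATION) -> dict[int, list[str]]:
    """Round-robin placement: each block lands on `replication` distinct nodes.

    The returned mapping plays the role of the master node's metadata.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(placement: dict[int, list[str]],
                           failed: set[str]) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


blocks = split_into_blocks(b"hello big data world")
placement = place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print(len(blocks))
print(placement[0])
print(readable_after_failure(placement, {"dn1", "dn2"}))
print(readable_after_failure(placement, {"dn1", "dn2", "dn3"}))
```

With three replicas spread over four nodes, losing any two nodes still leaves a live copy of every block, while losing three nodes can make a block unreadable.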

2. MapReduce

It is the processing engine that works on top of the HDFS structure. It processes the data on each slave node (the map step), aggregates the fragmented results at the reducer (the reduce step), and gives the output in a form the user can understand.
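As a rough illustration of the map and reduce steps, here is a single-process word count in plain Python. It does not use Hadoop at all; the function names and the two input chunks are hypothetical, and each chunk simply stands in for the data held by one slave node.

```python
from collections import defaultdict


def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: collapse each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


# Each string stands in for the data stored on one slave node.
chunks = ["big data is big", "data is everywhere"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

In a real cluster the map calls run on different machines at the same time, and the shuffle moves the pairs across the network before the reducers run.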


3. YARN

It is responsible for scheduling work and keeping the slave nodes productive. It is the component that allots cluster resources, such as memory and processor time, to the tasks that run on each node. The nodes work on their share of the data in parallel, which is where the tremendous speed comes from, while YARN decides which task runs where and when. For this reason YARN is often called the mind of Hadoop.
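The scheduling role can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the `Node` class, the `schedule` function, the memory figures, and the greedy "most free memory first" rule. Real YARN uses pluggable schedulers with queues and more resource types, so treat this only as a simplified picture of matching task requests to node capacity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker node with a fixed amount of free memory (hypothetical model)."""
    name: str
    free_memory_mb: int
    containers: list[str] = field(default_factory=list)


def schedule(requests: list[tuple[str, int]], nodes: list[Node]) -> list[str]:
    """Greedily place each (task, memory_mb) request on the node with the
    most free memory; return the tasks that could not be placed."""
    pending = []
    for task, memory_mb in requests:
        best = max(nodes, key=lambda node: node.free_memory_mb)
        if best.free_memory_mb >= memory_mb:
            best.free_memory_mb -= memory_mb
            best.containers.append(task)
        else:
            pending.append(task)  # must wait until resources are released
    return pending


nodes = [Node("node-a", 4096), Node("node-b", 2048)]
requests = [("map-1", 2048), ("map-2", 2048), ("map-3", 2048), ("reduce-1", 1024)]
pending = schedule(requests, nodes)
print([(node.name, node.containers) for node in nodes])
print(pending)
```

Here the three map tasks fill both nodes, so the reduce task stays pending until a running task finishes and frees its memory.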

4. PIG

PIG works as a support system for MapReduce. Scripts written in its language, Pig Latin, are translated into MapReduce jobs that extract, transform, and process the data. It can process all kinds of data: structured, semi-structured, and unstructured.
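Pig scripts are written in Pig Latin, not Python, but the shape of a typical data flow can be mirrored in Python to show what such a script expresses. The sketch below follows the usual load, filter, group, and count steps; the records and variable names are invented for illustration, and nothing here calls Pig itself.

```python
from collections import defaultdict

# "LOAD": hypothetical (user, action) records standing in for an input file.
records = [
    ("asha", "click"),
    ("ben", "view"),
    ("asha", "click"),
    ("ben", "click"),
    ("asha", "view"),
]

# "FILTER": keep only the click events.
clicks = [record for record in records if record[1] == "click"]

# "GROUP ... BY user": collect each user's click events together.
grouped = defaultdict(list)
for user, action in clicks:
    grouped[user].append(action)

# "FOREACH ... GENERATE COUNT": one count per user.
click_counts = {user: len(actions) for user, actions in grouped.items()}
print(click_counts)
```

In Pig each of these steps is a single statement, and the platform turns the whole flow into map and reduce stages on the user's behalf.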

Billy Mark is a Microsoft Office expert and has been working in the technical industry since 2002. As a technical expert, Billy has written technical blogs, manuals, white papers, and reviews for many websites such as
