Replica Synchronization in Distributed File System

ABSTRACT – The MapReduce framework provides a scalable, fault-tolerant model for large-scale, data-intensive computing. In this paper, we propose an algorithm that improves the I/O performance of distributed file systems by reducing the communication bandwidth consumed by replica synchronization. The algorithm performs adaptive replica synchronization among storage servers, each of which keeps a chunk list holding information about the replicas of the relevant chunk. The proposed algorithm improves the I/O data rate for write-intensive workloads. Experimental results show that it achieves good I/O performance with less synchronization overhead.
Index terms – Big data, distributed file system, MapReduce, adaptive replica synchronization

I. INTRODUCTION

A distributed file system is a file system deployed in a distributed environment to improve performance and scalability [1]. It consists of many I/O devices, with the chunks of each data file spread across the nodes. A client first sends a request to the metadata server (MDS), which manages the whole system, to obtain permission to access a file. The client then contacts the corresponding storage server (SS), which handles data management, to perform the actual operation using the information returned by the MDS.
In such a file system, the MDS manages all information about the chunk replicas, and replica synchronization is triggered whenever any one of the replicas is updated [2]. When data are updated, the newly written data must be stored on disk, which becomes the bottleneck. To solve this problem, we apply adaptive replica synchronization to an MDS-based file system.

MapReduce is a programming primitive: the programmer maps the input set to an intermediate output set, and that output is sent to the reducer to produce the final result. The map and reduce functions are written as if for a single node, and the MapReduce framework synchronizes them [3]. Like other distributed programming models, the framework takes care of data splitting, synchronization, and fault tolerance. MapReduce is a programming model, with an associated implementation, for processing large data sets with a parallel, distributed algorithm on a cluster of nodes.
Hadoop MapReduce is a framework for developing applications that process large data sets, up to multiple terabytes, in parallel on large clusters of thousands of commodity nodes in a fault-tolerant and reliable manner. The input and output of a MapReduce job are stored in the Hadoop Distributed File System (HDFS).
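The map and reduce roles described above can be illustrated with a small word count. This is a single-process sketch only, not the Hadoop API: the function names map_words, reduce_counts, and run_mapreduce are hypothetical, and the grouping step stands in for the shuffle that a real framework performs across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(lines: Iterable[str]) -> Dict[str, int]:
    """Toy driver: map every line, group values by key, then reduce each group."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(run_mapreduce(["big data big cluster", "data node"]))
```

The programmer writes only the two small functions; splitting the input, grouping by key, and recovering from failures are the parts a framework such as Hadoop MapReduce would take over.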

II. RELATED WORKS

GPFS [4] supports chunk replication by allocating space for multiple copies of the data on different storage servers and writing updates to all locations. GPFS keeps track, at the primary storage server, of which files have been updated on a chunk replica. Ceph [5] uses a similar replica synchronization scheme: newly written data must be sent to all replicas, which are stored on different storage servers, before the client receives a response. In the Hadoop file system [6], large data are split into chunks that are replicated and stored on storage servers; the copies of any stripe are maintained by the MDS, so replica synchronization is handled by the MDS and is carried out whenever new data are written to a replica. In GFS [7], the MDS manages the location and data layout across a number of chunk servers. For reliability, chunks are replicated on multiple chunk servers, and replica synchronization is handled by the MDS. The Lustre file system [8], a well-known parallel file system, also has a replication mechanism.
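The write path shared by these systems, in which every replica is updated before the client is answered, might be sketched as follows. This is an illustrative model under stated assumptions, not the code of any of the cited systems: StorageServer and eager_write are hypothetical names, and the servers are plain in-memory objects.

```python
from typing import Dict, List


class StorageServer:
    """Hypothetical in-memory storage server holding chunk replicas."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.chunks: Dict[str, bytes] = {}

    def write(self, chunk_id: str, data: bytes) -> bool:
        """Store the chunk and return an acknowledgement."""
        self.chunks[chunk_id] = data
        return True


def eager_write(replicas: List[StorageServer], chunk_id: str, data: bytes) -> bool:
    """Update the replicas in turn; report success only if every one acknowledges."""
    return all(server.write(chunk_id, data) for server in replicas)
```

Because the client waits for the slowest replica, the write latency of this scheme grows with the number of replicas, which is the cost that motivates deferring synchronization.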
For better performance, MosaStore [9] uses dynamic replication for data reliability. When an application creates a new data block, the MosaStore client stores the block at one of the SSs, and the MDS replicates the new block to the other SSs, which avoids a bottleneck when the block is created. Replica synchronization is done in the MDS of MosaStore.
The Gfarm file system [10] uses data replication for reliability and availability. In distributed and parallel file systems, the MDS controls data replication and sends the data to the storage servers, which puts pressure on the MDS. Data replication brings the benefit of better data access where the data are required, while providing data consistency. In the parallel file system of [11], data replication improves I/O throughput, data durability, and availability. That mechanism analyses the data access pattern and replicates data according to a cost analysis, but replica synchronization is still done in the MDS.
In the PARTE file system, parts of the metadata file can be replicated to the storage servers to improve metadata availability and keep the service highly available [12]. In more detail, the metadata file is divided into chunks that are distributed and replicated on the storage servers, and the client-side file system keeps a record of some of the metadata requests that have been sent to the server. If the active MDS crashes for any reason, the standby MDS uses these client-side backup requests to restore the metadata lost during the crash.
III. PROPOSED SYSTEM OVERVIEW
The adaptive replica synchronization mechanism is used to improve I/O throughput, communication bandwidth, and performance in the distributed file system. The MDS manages the information in the distributed file system, in which large data are split into chunks that are then replicated.
The main motivation for adaptive replica synchronization is that a single storage server cannot withstand a large number of concurrent read requests to a specific replica; in that case, adaptive replica synchronization is triggered to propagate the up-to-date chunk data to the other related SSs in the Hadoop distributed file system [13][5]. Adaptive replica synchronization is performed to satisfy heavy concurrent reads when the access frequency of the target replica exceeds a predefined threshold. The adaptive replica synchronization mechanism among SSs is intended to enhance the performance of the I/O subsystem.

Fig 1: Architecture of replica synchronization mechanism
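The trigger condition in the overview can be made concrete with a short sketch. The paper does not state how the access frequency is computed, so the definition below (reads per second since the last write) is an assumption, as are the names access_frequency and should_synchronize. The comparison uses >=, following the algorithm listing.

```python
def access_frequency(read_count: int, elapsed_seconds: float) -> float:
    """Reads per second since the replica was last written (assumed definition)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return read_count / elapsed_seconds


def should_synchronize(read_count: int, elapsed_seconds: float, threshold: float) -> bool:
    """True when the observed read rate reaches the configured threshold."""
    return access_frequency(read_count, elapsed_seconds) >= threshold
```

With a threshold of 20 reads per second, for example, a replica that served 50 reads in the last second would trigger synchronization, while one that served 5 would not.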
A. Big data preparation and distributed data storage
First, configure the storage servers in the distributed storage environment. The Hadoop distributed file system setup consists of the big data set, the metadata servers (MDS), a number of replicas, and the storage servers (SS); configure the file system from these components with proper communication among them. Next, prepare the social-network big data set, which contains each user's ID, name, status, and updates. Once the data set is prepared, store it on the distributed storage servers.
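As a rough illustration of the storage step, the sketch below splits a data set into fixed-size chunks and assigns each chunk to several storage servers. The round-robin placement policy and the names split_into_chunks and place_replicas are assumptions made for this example; the paper does not specify a placement policy.

```python
from typing import Dict, List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Cut the data into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(num_chunks: int, servers: List[str], replicas: int) -> Dict[int, List[str]]:
    """Assumed round-robin policy: chunk i goes to `replicas` consecutive servers."""
    if not 0 < replicas <= len(servers):
        raise ValueError("replicas must be between 1 and the number of servers")
    return {
        i: [servers[(i + r) % len(servers)] for r in range(replicas)]
        for i in range(num_chunks)
    }
```

For instance, two chunks placed on three servers with two replicas each would put chunk 0 on the first and second server and chunk 1 on the second and third.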

B. Data update in distributed storage
The user communicates with the distributed storage servers to access the big data, and reaches the data through a storage server (SS). Based on the user's query, the big data are updated in the distributed storage database, and the updated data are stored on the storage server.
C. Chunk list replication to storage servers
The chunk list holds all the information about the replicas that belong to the same chunk file, and it is stored on the SSs. The primary storage server, which holds the newly updated chunk replica, conducts adaptive replica synchronization when a large number of read requests arrive concurrently within a short time; the mechanism is designed to satisfy these requests with minimum overhead.
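One possible in-memory shape for the chunk list is sketched below. The four fields follow the names used in the algorithm listing ([Replica Location], [Dirty], [cnt], [ver]); the class names ReplicaEntry and ChunkList and the helper mark_others_dirty are hypothetical, and the layout is an assumption rather than the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ReplicaEntry:
    """One row of the chunk list, describing a single replica of a chunk."""
    location: str        # [Replica Location]: storage server holding the replica
    dirty: bool = False  # [Dirty]: another replica holds newer data
    cnt: int = 0         # [cnt]: read counter, restarted by each write
    ver: int = 0         # [ver]: ID of the last write request applied


@dataclass
class ChunkList:
    """All replica entries that belong to the same chunk."""
    chunk_id: str
    entries: Dict[str, ReplicaEntry] = field(default_factory=dict)

    def add_replica(self, location: str) -> None:
        self.entries[location] = ReplicaEntry(location)

    def mark_others_dirty(self, writer: str, version: int) -> None:
        """Record a write on `writer` and flag every other replica as stale."""
        for location, entry in self.entries.items():
            if location == writer:
                entry.dirty, entry.ver, entry.cnt = False, version, 1
            else:
                entry.dirty = True
```

Keeping the flags next to the replica locations lets a storage server decide locally whether it may serve a request or must point the client to the latest replica.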
D. Adaptive replica synchronization
Replica synchronization is not performed immediately when one of the replicas is modified. The proposed adaptive replica synchronization mechanism improves I/O subsystem performance by reducing write latency, and it makes replica synchronization more effective because the target chunk might be written again in the near future. In other words, the other replicas do not need to be updated until adaptive replica synchronization has been triggered by the primary storage server.
In the distributed file system, adaptive replica synchronization is used to increase performance and reduce communication bandwidth under a large number of concurrent read requests. It works as follows. In the first step, the chunk is saved on the storage servers and the chunk list is initialized. In the second step, a write request is sent to one of the replicas, after which the version and count are updated; the other SSs update the corresponding flag in their chunk lists and reply with an ACK to that SS. In the next step, read/write requests that reach the other, now overdue replicas are answered with the status of the latest replica, so the updated replica has to handle all requests to the target chunk; its count is incremented on every read operation and the access frequency is computed. In addition, the remaining replica synchronization for updated chunks that are not hot-spot objects after data modification is conducted when the SSs are less busy than during working hours. As a result, better I/O bandwidth can be obtained with minimum synchronization overhead. The proposed algorithm is listed below.
ALGORITHM: Adaptive replica synchronization
Precondition and Initialization:
1) MDS handles replica management without synchronization, such as creating a new replica;
2) Initialize [Replica Location], [Dirty], [cnt], and [ver] in Chunk List when the relevant chunk replicas have been created.
Iteration:
while Storage server is active do
    if An access request to the chunk then
        /* Other replica has been updated */
        if [Dirty] == 1 then
            Return the latest Replica Status;
            break;
        end if
        if Write request received then
            [ver] ← I/O request ID;
            Broadcast Update Chunk List Request;
            Conduct write operation;
            if Receiving ACK to Update Request then
                /* Initialize read count */
                [cnt] ← 1;
            else
                /* Revoke content updates */
                Undo the write operation;
                Recover its own Chunk List;
            end if
            break;
        end if
        if Read request received then
            Conduct read operation;
            if [cnt] > 0 then
                [cnt] ← [cnt] + 1;
                Compute [Freq];
                if [Freq] >= Configured Threshold then
                    Issue adaptive replica synchronization;
                end if
            end if
        end if
    else
        if Update Chunk List Request received then
            Update Chunk List and ACK;
            [Dirty] ← 1; break;
        end if
        if Synchronization Request received then
            Conduct replica synchronization;
        end if
    end if
end while
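To make the control flow of the listing easier to follow, the sketch below simulates it in a single process. It is an illustration under several assumptions, not the authors' implementation: message passing is replaced by direct method calls, [Freq] is taken to be the number of reads since the last write, [cnt] is reset to 0 after synchronization, and the reachable flag is a hypothetical switch for simulating a lost ACK. The undo path restores only the writer's own state, as in the listing; flags already set on peers are not revoked.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Replica:
    """Per-chunk state kept by one storage server; fields mirror the listing."""
    server: str
    data: bytes = b""
    dirty: bool = False     # [Dirty]: a peer replica holds newer data
    cnt: int = 0            # [cnt]: read counter, started at 1 by a write
    ver: int = 0            # [ver]: ID of the last write request applied
    reachable: bool = True  # hypothetical switch to simulate a lost ACK
    peers: List["Replica"] = field(default_factory=list, repr=False)

    def on_update_chunk_list(self) -> bool:
        """Peer side: mark this replica stale and ACK (False if unreachable)."""
        if not self.reachable:
            return False
        self.dirty = True
        return True

    def on_synchronization(self, data: bytes, ver: int) -> None:
        """Peer side: adopt the latest data and clear the stale flag."""
        self.data, self.ver, self.dirty, self.cnt = data, ver, False, 0

    def write(self, request_id: int, data: bytes) -> str:
        if self.dirty:
            return "stale"  # a peer is newer; the caller must go to that replica
        saved = (self.data, self.ver, self.cnt)
        self.ver, self.data = request_id, data
        if all(peer.on_update_chunk_list() for peer in self.peers):
            self.cnt = 1    # begin counting reads of this version
            return "ok"
        self.data, self.ver, self.cnt = saved  # undo the local write
        return "undone"

    def read(self, threshold: int) -> Optional[bytes]:
        if self.dirty:
            return None     # stale replica: no data is served
        if self.cnt > 0:
            self.cnt += 1
            freq = self.cnt  # assumption: frequency = reads since last write
            if freq >= threshold:
                self.synchronize()
        return self.data

    def synchronize(self) -> None:
        """Push the latest data to every peer, then stop counting reads."""
        for peer in self.peers:
            peer.on_synchronization(self.data, self.ver)
        self.cnt = 0


def make_group(names: List[str]) -> List[Replica]:
    """Create fully connected replicas of one chunk."""
    group = [Replica(name) for name in names]
    for replica in group:
        replica.peers = [p for p in group if p is not replica]
    return group
```

A short session shows the intended behaviour: after a, b, c = make_group(["ss0", "ss1", "ss2"]) and a.write(1, b"v1"), the replicas b and c are flagged stale and refuse reads; once a has served enough reads to reach the threshold, it pushes the data to b and c, which can then serve reads themselves.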
IV. PERFORMANCE RESULTS
When the replica of the target chunk has been modified, the primary SS retransmits the update to the other relevant replicas. Write latency is the time required for each write; with the proposed adaptive replica synchronization mechanism, write latency is measured while varying the size of the data written.

Fig 2: Write latency
With adaptive replica synchronization, we measure the read and write bandwidth (throughput) of the file system. We evaluate both the I/O data rate and the processing time of metadata operations.

Fig 3: I/O data throughput
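The two metrics reported in this section, write latency and I/O throughput, can be obtained from a simple timing loop. The helper below is a hypothetical measurement sketch, not the benchmark used for the figures: measure_write and its parameters are assumed names, and write_fn stands for whatever write operation is under test.

```python
import time
from typing import Callable, Tuple


def measure_write(write_fn: Callable[[bytes], None], payload: bytes,
                  repeats: int = 5) -> Tuple[float, float]:
    """Return (mean latency in seconds, throughput in MB/s) for write_fn."""
    if repeats <= 0:
        raise ValueError("repeats must be positive")
    start = time.perf_counter()
    for _ in range(repeats):
        write_fn(payload)
    elapsed = time.perf_counter() - start
    mean_latency = elapsed / repeats
    megabytes = len(payload) * repeats / (1024 * 1024)
    # Guard against a zero reading from a very coarse clock.
    throughput = megabytes / elapsed if elapsed > 0 else float("inf")
    return mean_latency, throughput
```

Repeating the measurement for several payload sizes gives a latency curve over the written data size, and dividing the bytes moved by the elapsed time gives the data rate.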
V. CONCLUSION
In this paper, we have presented an efficient algorithm for handling a large number of concurrent requests in a distributed file system, increasing performance and reducing I/O communication bandwidth. Our approach, adaptive replica synchronization, is applicable to distributed file systems and achieves better performance and I/O data bandwidth with less synchronization overhead. Furthermore, the main contribution is improved feasibility, efficiency, and applicability compared with other synchronization algorithms. In future work, we plan to extend the analysis by enhancing the robustness of the chunk list.
REFERENCES
[1] E. Dede, Z. Fadika, Madhusudhan, and L. Ramakrishnan, “Benchmarking MapReduce implementations under different application scenarios,” Grid and Cloud Computing Research Laboratory, Department of Computer Science, State University of New York (SUNY) at Binghamton, and Lawrence Berkeley National Laboratory.
[2] N. Nieuwejaar and D. Kotz, “The Galley parallel file system,” Parallel Comput., vol. 23, no. 4/5, pp. 447–476, Jun. 1997.
[3] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The Hadoop distributed file system,” in Proc. 26th IEEE Symp. MSST, 2010, pp. 1–10.
[4] M. P. I. Forum, “MPI: A message-passing interface standard,” 1994.
[5] F. Schmuck and R. Haskin, “GPFS: A shared-disk file system for large computing clusters,” in Proc. Conf. FAST, 2002, pp. 231–244, USENIX Association.
[6] S. Weil, S. Brandt, E. Miller, D. Long, and C. Maltzahn, “Ceph: A scalable, high-performance distributed file system,” in Proc. 7th Symp. OSDI, 2006, pp. 307–320, USENIX Association.
[7] W. Tantisiriroj, S. Patil, G. Gibson, S. Son, and S. J. Lang, “On the duality of data-intensive file system design: Reconciling HDFS and PVFS,” in Proc. SC, 2011, p. 67.
[8] S. Ghemawat, H. Gobioff, and S. Leung, “The Google file system,” in Proc. 19th ACM SOSP, 2003, pp. 29–43.
[9] The Lustre file system. [Online]. Available: http://www.lustre.org
[10] E. Vairavanathan, S. AlKiswany, L. Costa, Z. Zhang, D. S. Katz, M. Wilde, and M. Ripeanu, “A workflow-aware storage system: An opportunity study,” in Proc. Int. Symp. CCGrid, Ottawa, ON, Canada, 2012, pp. 326–334.
[11] Gfarm File System. [Online]. Available: http://datafarm.apgrid.org/
[12] A. Gharaibeh and M. Ripeanu, “Exploring data reliability tradeoffs in replicated storage systems,” in Proc. HPDC, 2009, pp. 217–226.
[13] J. Liao and Y. Ishikawa, “Partial replication of metadata to achieve high metadata availability in parallel file systems,” in Proc. 41st ICPP, 2012, pp. 168–1.
