Enhanced Open Source Feature: HDFS Startup Acceleration

In HDFS, the NameNode must load the metadata file FsImage when it starts. The DataNodes then report their data block information after they start, and once the reported data block information reaches a preset percentage, the NameNode exits safe mode and completes startup. If the number of files stored on HDFS reaches the million or billion level, both processes are time-consuming and lead to a long NameNode startup time. This version therefore optimizes the loading of the metadata file FsImage.

In open source HDFS, FsImage stores all types of metadata information. Each type (such as file metadata and folder metadata) is stored in its own section block, and these section blocks are loaded serially during startup. When a large number of files and folders are stored on HDFS, loading these sections is time-consuming and prolongs HDFS startup.

The enhanced HDFS NameNode divides each type of metadata into segments and stores the data in multiple sections when generating the FsImage file. When the NameNode starts, the sections are loaded in parallel, which accelerates HDFS startup.

Enhanced Open Source Feature: Label-based Block Placement Policies (HDFS NodeLabel)

You may need to control which nodes store the data blocks of HDFS files based on the characteristics of the data. You can configure a label expression for an HDFS directory or file and assign one or more labels to each DataNode, so that the file's data blocks are stored on the specified DataNodes. When the label-based block placement policy is used to select DataNodes for a file, the DataNode range is first narrowed by the label expression, and suitable nodes are then selected from that range. For example, you can store two replicas of a data block on nodes labeled L1 and the remaining replicas on nodes labeled L2. You can also set a fallback policy for block placement failures, for example, selecting a node randomly from all nodes.

An example placement with DataNodes A through F:

- Data in /HBase is stored on A, B, and D.
- Data in /Spark is stored on A, B, D, E, and F.
- Data in /user is stored on C, D, and F.
- Data in /user/shl is stored on A, E, and F.

Enhanced Open Source Feature: HDFS Load Balance

The default read and write policies of HDFS aim mainly at local optimization and do not consider the actual load of nodes or disks. HDFS Load Balance takes the I/O load of each node into account so that, when the HDFS client performs read and write operations, nodes with low I/O load are selected, balancing I/O load across the cluster and making full use of its overall throughput. If HDFS Load Balance is enabled during file writing, the NameNode selects a DataNode in the order of local node, local rack, and remote rack; if the I/O load of the selected node is heavy, the NameNode chooses another DataNode with a lighter load.
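The serial-versus-parallel loading of FsImage sections can be sketched as follows. This is a minimal illustration, not the actual NameNode code: the section names, the `load_section` stub, and the worker count are all invented for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical FsImage sections; in the enhanced NameNode, each metadata
# type (file metadata, folder metadata, ...) is split into several sections.
SECTIONS = ["INODE_0", "INODE_1", "DIR_0", "DIR_1"]

def load_section(name):
    # Placeholder for deserializing one FsImage section.
    return f"{name} loaded"

def load_serial(sections):
    # Open source behavior: section blocks are loaded one after another.
    return [load_section(s) for s in sections]

def load_parallel(sections, workers=4):
    # Enhanced behavior: the sections are loaded concurrently at startup.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_section, sections))

print(load_parallel(SECTIONS))
```

With real deserialization work instead of the stub, the parallel variant is what shortens the startup window; the result is the same metadata either way.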
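The label-based narrowing of the DataNode range can be sketched in the same spirit. The label assignments and the longest-prefix path matching below are invented for illustration and deliberately simplified to a single required label per path (real label expressions are richer, and this does not reproduce the A–F example above exactly).

```python
# Hypothetical DataNode labels (names and assignments invented).
datanode_labels = {
    "A": {"L1"}, "B": {"L1"}, "C": {"L2"},
    "D": {"L2"}, "E": {"L2"}, "F": {"L2"},
}

# Hypothetical path -> label expression mapping, simplified to one label.
path_labels = {"/HBase": "L1", "/user": "L2"}

def candidate_datanodes(path, labels=datanode_labels, rules=path_labels):
    """Narrow the DataNode range by the path's label expression.

    Falls back to all nodes (the 'select randomly from all nodes'
    failure policy) when no labeled node matches.
    """
    # Longest-prefix match, so /user/shl inherits /user's expression
    # unless it has one of its own.
    matches = [p for p in rules if path == p or path.startswith(p + "/")]
    if not matches:
        return sorted(labels)
    wanted = rules[max(matches, key=len)]
    nodes = sorted(n for n, ls in labels.items() if wanted in ls)
    return nodes or sorted(labels)  # fallback: choose from all nodes

print(candidate_datanodes("/HBase"))  # -> ['A', 'B']
```

The replica placer would then pick the required number of nodes from this narrowed range, e.g. two replicas from the L1 nodes and the rest from the L2 nodes.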
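The load-aware selection order for writes (local node, then local rack, then remote rack, skipping heavily loaded nodes) can be sketched as below. The load threshold, the node names, and the `io_load` field are assumptions for the sketch, not actual HDFS parameters.

```python
from dataclasses import dataclass

@dataclass
class DataNode:
    name: str
    locality: int   # 0 = local node, 1 = local rack, 2 = remote rack
    io_load: float  # fraction of I/O capacity currently in use

def pick_datanode(nodes, max_load=0.8):
    """Prefer closer nodes, but skip any whose I/O load is heavy."""
    for node in sorted(nodes, key=lambda n: n.locality):
        if node.io_load <= max_load:
            return node
    # Every candidate is heavily loaded: fall back to the least loaded one.
    return min(nodes, key=lambda n: n.io_load)

nodes = [
    DataNode("local", 0, 0.95),   # preferred by locality, but I/O-saturated
    DataNode("rack", 1, 0.30),
    DataNode("remote", 2, 0.10),
]
print(pick_datanode(nodes).name)  # -> rack
```

Without the load check, locality alone would always pick the saturated local node; with it, the write lands on the lightly loaded rack-local node, which is the behavior the feature describes.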