HDFS Federation

This guide provides an overview of the HDFS Federation feature and how to configure and manage the federated cluster.

Background

HDFS has two main layers:

  • Namespace
  • Block Storage Service, which has two parts:
    • Block Management (performed in the Namenode)
      • Provides datanode cluster membership by handling registrations and periodic heartbeats.
      • Processes block reports and maintains the location of blocks.
      • Supports block-related operations such as create, delete, modify, and get block location.
      • Manages replica placement, replicates under-replicated blocks, and deletes over-replicated blocks.
    • Storage - provided by datanodes, which store blocks on the local file system and allow read/write access.

The prior HDFS architecture allows only a single namespace for the entire cluster, managed by a single Namenode. HDFS Federation addresses this limitation by adding support for multiple Namenodes/namespaces to the HDFS file system.

To scale the name service horizontally, federation uses multiple independent Namenodes/namespaces. The Namenodes are federated: they are independent and do not require coordination with each other. All the Namenodes use the datanodes as common storage for blocks. Each datanode registers with all the Namenodes in the cluster. Datanodes send periodic heartbeats and block reports and handle commands from the Namenodes.

Users may use ViewFs to create personalized namespace views, where ViewFs is analogous to client side mount tables in some Unix/Linux systems.
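
To make the mount-table analogy concrete, here is a minimal Python sketch that resolves a client path to a namespace by longest-prefix match. It is not the ViewFs implementation. The mount points, the `hdfs://ns1` and `hdfs://ns2` URIs, and the `resolve` function are hypothetical names chosen for illustration.

```python
# Hypothetical client-side mount table: mount point -> namespace URI.
MOUNT_TABLE = {
    "/user": "hdfs://ns1",
    "/data": "hdfs://ns2",
    "/data/tmp": "hdfs://ns1",
}


def resolve(path, mount_table=MOUNT_TABLE):
    """Return the namespace-qualified path for the longest matching mount point."""
    best = None
    for mount_point in mount_table:
        # Match whole path components only, so "/database" does not match "/data".
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise KeyError("no mount point covers " + path)
    return mount_table[best] + path
```

With this table, `resolve("/data/tmp/x")` picks the more specific `/data/tmp` entry rather than `/data`, which is how a personalized view can send different subtrees to different namespaces.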

Block Pool

A Block Pool is a set of blocks that belong to a single namespace. Datanodes store blocks for all the block pools in the cluster. Each block pool is managed independently of the others. This allows a namespace to generate Block IDs for new blocks without coordinating with the other namespaces. The failure of one Namenode does not prevent the datanodes from serving the other Namenodes in the cluster.

A namespace and its block pool together are called a Namespace Volume, which is a self-contained unit of management. When a Namenode/namespace is deleted, the corresponding block pool at the datanodes is deleted. During a cluster upgrade, each namespace volume is upgraded as a unit.
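
The independence of block pools can be illustrated with a toy model. This is a sketch for intuition only, not HDFS code: the `Datanode` and `Namespace` classes and the pool IDs `BP-1` and `BP-2` are hypothetical.

```python
import itertools


class Datanode:
    """Toy datanode that keeps blocks grouped by block pool ID."""

    def __init__(self):
        self.pools = {}  # block pool ID -> set of block IDs

    def store(self, pool_id, block_id):
        self.pools.setdefault(pool_id, set()).add(block_id)

    def delete_pool(self, pool_id):
        # Deleting a namespace removes only its own block pool.
        self.pools.pop(pool_id, None)


class Namespace:
    """Toy namespace that allocates block IDs from its own private counter."""

    def __init__(self, pool_id):
        self.pool_id = pool_id
        self._next_id = itertools.count(1)

    def allocate_block(self, datanode):
        # No coordination with other namespaces is needed, because the
        # datanode keys every block by (pool ID, block ID).
        block_id = next(self._next_id)
        datanode.store(self.pool_id, block_id)
        return block_id
```

Two `Namespace` objects sharing one `Datanode` can both hand out block ID 1 without a clash, and `delete_pool("BP-1")` leaves the blocks of `BP-2` untouched.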

ClusterID

A new identifier ClusterID is added to identify all the nodes in the cluster. When a Namenode is formatted, this identifier is provided or auto generated. This ID should be used for formatting the other Namenodes into the cluster.

Key Benefits

  • Namespace Scalability - HDFS cluster storage scales horizontally but the namespace does not. Large deployments, or deployments with a lot of small files, benefit from scaling the namespace by adding more Namenodes to the cluster.
  • Performance - In the prior architecture, file system operation throughput is limited by a single Namenode. Adding more Namenodes to the cluster scales the throughput of file system read/write operations.
  • Isolation - A single Namenode offers no isolation in a multi-user environment. An experimental application can overload the Namenode and slow down production-critical applications. With multiple Namenodes, different categories of applications and users can be isolated to different namespaces.

Federation Configuration

Federation configuration is backward compatible and allows an existing single-Namenode configuration to work without any change. The new configuration is designed so that all the nodes in the cluster have the same configuration, with no need to deploy a different configuration for each type of node.

Federation adds a new abstraction called NameServiceID. A Namenode and its corresponding secondary/backup/checkpointer nodes all belong to one NameServiceID. To support a single configuration file, the Namenode and secondary/backup/checkpointer configuration parameters are suffixed with the NameServiceID and added to the same configuration file.

Configuration:

Step 1: Add the dfs.nameservices parameter to your configuration and set it to a comma-separated list of NameServiceIDs. Datanodes use this list to determine all the Namenodes in the cluster.

Step 2: For each Namenode and Secondary Namenode/BackupNode/Checkpointer, add the following configuration parameters, suffixed with the corresponding NameServiceID, to the common configuration file.

Daemon               Configuration Parameter
Namenode             dfs.namenode.rpc-address
                     dfs.namenode.servicerpc-address
                     dfs.namenode.http-address
                     dfs.namenode.https-address
                     dfs.namenode.keytab.file
                     dfs.namenode.name.dir
                     dfs.namenode.edits.dir
                     dfs.namenode.checkpoint.dir
                     dfs.namenode.checkpoint.edits.dir
Secondary Namenode   dfs.namenode.secondary.http-address
                     dfs.secondary.namenode.keytab.file
BackupNode           dfs.namenode.backup.address
                     dfs.secondary.namenode.keytab.file
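
As a small illustration of the suffixing rule, the sketch below appends a NameServiceID to a few of the per-Namenode keys listed above. The helper name `suffixed_keys` and the ID `ns1` are assumptions made for this example.

```python
# A subset of the per-Namenode keys listed above.
NAMENODE_KEYS = [
    "dfs.namenode.rpc-address",
    "dfs.namenode.servicerpc-address",
    "dfs.namenode.http-address",
]


def suffixed_keys(name_service_id, keys=NAMENODE_KEYS):
    """Append the NameServiceID to each base key, separated by a dot."""
    return [key + "." + name_service_id for key in keys]
```

For example, `suffixed_keys("ns1")` starts with `dfs.namenode.rpc-address.ns1`.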

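Here is an example configuration with two Namenodes:

<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>nn-host1:rpc-port</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1</name>
    <value>nn-host1:http-port</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address.ns1</name>
    <value>snn-host1:http-port</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>nn-host2:rpc-port</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns2</name>
    <value>nn-host2:http-port</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address.ns2</name>
    <value>snn-host2:http-port</value>
  </property>

  .... Other common configuration ...
</configuration>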
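
The way a node can discover every Namenode from this single file can be sketched in Python. The example assumes the standard Hadoop configuration layout (a `<configuration>` element containing `<property>` entries with `<name>` and `<value>`). The function name `namenode_rpc_addresses` is hypothetical, and the host and port strings are the placeholders from the example, not real endpoints.

```python
import xml.etree.ElementTree as ET

# A trimmed configuration in the Hadoop <configuration>/<property> layout.
SAMPLE = """
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>nn-host1:rpc-port</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>nn-host2:rpc-port</value>
  </property>
</configuration>
"""


def namenode_rpc_addresses(xml_text):
    """Map each NameServiceID in dfs.nameservices to its RPC address."""
    root = ET.fromstring(xml_text)
    props = {p.findtext("name"): p.findtext("value") for p in root.iter("property")}
    ids = [s.strip() for s in props.get("dfs.nameservices", "").split(",") if s.strip()]
    # A missing suffixed key yields None rather than raising.
    return {ns: props.get("dfs.namenode.rpc-address." + ns) for ns in ids}
```

Calling `namenode_rpc_addresses(SAMPLE)` returns one entry per NameServiceID, which is the information a datanode needs to register with every Namenode.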

Formatting Namenodes

Step 1: Format a namenode using the following command:

> $HADOOP_PREFIX_HOME/bin/hdfs namenode -format [-clusterId <cluster_id>]

Choose a unique cluster_id that does not conflict with other clusters in your environment. If no cluster_id is provided, a unique ClusterID is generated automatically.

Step 2: Format each additional Namenode using the following command:

> $HADOOP_PREFIX_HOME/bin/hdfs namenode -format -clusterId <cluster_id>

Note that the cluster_id in step 2 must be the same as the cluster_id in step 1. If they differ, the additional Namenodes will not be part of the federated cluster.
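
The requirement that every Namenode is formatted with the same ClusterID can be captured in a small helper that builds the command line once and reuses it. This is a sketch only: the function `format_commands`, the host names, and the locally generated `CID-` value are assumptions for illustration. HDFS generates its own ID when -clusterId is omitted, and nothing here runs a command.

```python
import uuid


def format_commands(namenode_hosts, cluster_id=None):
    """Return (cluster_id, {host: argv}) with one ClusterID for every Namenode."""
    # Assumption: an ID is generated here only so the sketch is self-contained.
    if cluster_id is None:
        cluster_id = "CID-" + str(uuid.uuid4())
    argv = ["hdfs", "namenode", "-format", "-clusterId", cluster_id]
    # Each host gets its own copy of the same argument list.
    return cluster_id, {host: list(argv) for host in namenode_hosts}
```

Because the argument list is built once, the first and the additional Namenodes cannot end up with different ClusterIDs.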

Upgrading from an older release and configuring federation

Older releases supported a single Namenode. Upgrade the cluster to a newer release to enable federation. During the upgrade you can provide a ClusterID as follows:

> $HADOOP_PREFIX_HOME/bin/hdfs start namenode --config $HADOOP_CONF_DIR  -upgrade -clusterId <cluster_id>

If ClusterID is not provided, it is auto generated.

Adding a new Namenode to an existing HDFS cluster

Perform the following steps:

  • Add configuration parameter dfs.nameservices to the configuration.
  • Update the configuration with the NameServiceID suffix. Configuration key names changed after release 0.20; federation requires the new configuration parameter names.
  • Add new Namenode related config to the configuration files.
  • Propagate the configuration file to all the nodes in the cluster.
  • Start the new Namenode and its Secondary/Backup node.
  • Refresh the datanodes to pick up the newly added Namenode by running the following command:
    > $HADOOP_PREFIX_HOME/bin/hdfs dfsadmin -refreshNamenodes <datanode_host_name>:<datanode_rpc_port>
  • The above command must be run against all the datanodes in the cluster.
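
Since the refresh has to be repeated for every datanode, it is convenient to generate the command lines from a host list. The sketch below only builds argument lists and does not execute anything. It uses the `hdfs dfsadmin -refreshNamenodes host:port` form; the function name, host names, and port number are hypothetical.

```python
def refresh_commands(datanodes):
    """Build one `hdfs dfsadmin -refreshNamenodes host:port` argv per datanode."""
    return [
        ["hdfs", "dfsadmin", "-refreshNamenodes", "{}:{}".format(host, port)]
        for host, port in datanodes
    ]


# Hypothetical datanode hosts and RPC port.
commands = refresh_commands([("dn-host1", 9867), ("dn-host2", 9867)])
```

Each entry in `commands` can then be passed to a process runner such as `subprocess.run` on a node where the HDFS client is configured.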

Managing the cluster

Starting and stopping cluster

To start the cluster run the following command:

> $HADOOP_PREFIX_HOME/bin/start-dfs.sh

To stop the cluster run the following command:

> $HADOOP_PREFIX_HOME/bin/stop-dfs.sh

These commands can be run from any node where the HDFS configuration is available. The command uses configuration to determine the Namenodes in the cluster and starts the Namenode process on those nodes. The datanodes are started on nodes specified in the slaves file. The script can be used as reference for building your own scripts for starting and stopping the cluster.

Balancer

Balancer has been changed to work with multiple Namenodes in the cluster. Run it using the following command:

"$HADOOP_PREFIX"/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script "$bin"/hdfs start balancer [-policy <policy>]

Policy could be:

  • node - this is the default policy. It balances the storage at the datanode level, similar to the balancing policy in prior releases.
  • blockpool - this balances the storage at the block pool level. Balancing at the block pool level also balances storage at the datanode level.

    Note that Balancer only balances the data and does not balance the namespace.
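
The difference between the two policies can be pictured as a difference in which usage figure is compared across datanodes. The following sketch is a simplified illustration under that assumption, not the Balancer's actual code. The dictionary layout, the pool IDs, and the `utilization` function are hypothetical.

```python
def utilization(datanode, policy="node", pool_id=None):
    """Percentage of a datanode's capacity used under the given policy."""
    if policy == "node":
        # Datanode level: count the blocks of every block pool on the node.
        used = sum(datanode["pool_used"].values())
    elif policy == "blockpool":
        # Block pool level: count only the blocks of one block pool.
        used = datanode["pool_used"].get(pool_id, 0)
    else:
        raise ValueError("unknown policy: " + policy)
    return 100.0 * used / datanode["capacity"]


# Hypothetical datanode with two block pools (sizes in arbitrary units).
dn = {"capacity": 1000, "pool_used": {"BP-1": 300, "BP-2": 100}}
```

In this model `utilization(dn)` looks at all 400 used units, while `utilization(dn, "blockpool", "BP-1")` looks only at the 300 units that belong to `BP-1`.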

Decommissioning

Decommissioning is similar to prior releases. The nodes that need to be decommissioned are added to the exclude file at all the Namenodes. Each Namenode decommissions its own Block Pool. When all the Namenodes finish decommissioning a datanode, the datanode is considered decommissioned.
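
The completion rule can be expressed as a simple check across Namenodes. This is an illustrative sketch, not HDFS code: the status strings, the host names, and the `is_decommissioned` function are assumptions.

```python
def is_decommissioned(datanode, status_by_namenode):
    """True only when every Namenode reports the datanode as decommissioned."""
    # An empty mapping means no Namenode has reported, so the answer is False.
    return bool(status_by_namenode) and all(
        statuses.get(datanode) == "decommissioned"
        for statuses in status_by_namenode.values()
    )


# Hypothetical per-Namenode view of one datanode.
status = {
    "ns1": {"dn-host1": "decommissioned"},
    "ns2": {"dn-host1": "decommission in progress"},
}
```

Here `is_decommissioned("dn-host1", status)` is false until `ns2` also finishes decommissioning its block pool on that datanode.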

Step 1: To distribute an exclude file to all the Namenodes, use the following command:

"$HADOOP_PREFIX"/bin/distribute-exclude.sh <exclude_file>

Step 2: Refresh all the Namenodes to pick up the new exclude file.

"$HADOOP_PREFIX"/bin/refresh-namenodes.sh

The above command uses HDFS configuration to determine the Namenodes configured in the cluster and refreshes all the Namenodes to pick up the new exclude file.

Cluster Web Console

Similar to the Namenode status web page, federation adds a Cluster Web Console for monitoring the federated cluster at http://<any_nn_host:port>/dfsclusterhealth.jsp. Any Namenode in the cluster can be used to access this web page.

The web page provides the following information:

  • A cluster summary showing the number of files, the number of blocks, and the total configured, available, and used storage capacity for the entire cluster.
  • A list of Namenodes with a per-Namenode summary: number of files, blocks, missing blocks, and live and dead datanodes. Each entry also links to the Namenode web UI.
  • The decommissioning status of datanodes.