- Typically the cluster contains 3 types of nodes:
  - Gateway nodes (also called client nodes or edge nodes)
  - Master nodes
  - Worker nodes
- Developers like us will typically have access only to the gateway or client nodes.
- We can connect to gateway or client nodes using SSH.
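As a concrete illustration, an SSH client can be pointed at a gateway node with a small config entry; the host alias, address, and user below are hypothetical placeholders, not values from this course:

```
# Hypothetical entry in ~/.ssh/config for a cluster gateway node.
# Replace HostName and User with your cluster's actual values.
Host hdfs-gateway
    HostName gw01.example.com
    User youruser
```

With this in place, `ssh hdfs-gateway` opens a session on the gateway node, from which the HDFS commands below can be run.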
- Once logged in, we can interact with HDFS using either `hadoop fs` or `hdfs dfs`. The two are aliases of each other.
- `hadoop` has subcommands other than `fs`; as developers, we typically use it to interact with HDFS or Map Reduce.
- `hdfs` has subcommands other than `dfs`. It is typically used not only to manage files in HDFS but also for administrative tasks related to HDFS components such as the Namenode, Secondary Namenode, Datanodes etc.
- As developers, our scope will be limited to using `hdfs dfs` or `hadoop fs` to interact with HDFS.
- Both commands have subcommands, and each subcommand takes additional control arguments. Let us understand the structure by taking the example of `hdfs dfs -ls -l -S -r /public`:
  - `hdfs` is the main command to manage all the components of HDFS.
  - `dfs` is the subcommand to manage files in HDFS.
  - `-ls` is the file system command to list files in HDFS.
  - `-l -S -r` are control arguments for `-ls` that control the runtime behavior of the command.
  - `/public` is the argument to the `-ls` command. It is a path in HDFS. You will understand as you get into the details.
```sh
%%sh

hadoop
hadoop fs -usage
hdfs dfs -usage
```