Now let us walk through the options available with the `hdfs dfs -ls` command to list files.
We can get the usage by running `hdfs dfs -usage ls`, and a full description of each option by running `hdfs dfs -help ls`.
hdfs dfs -usage ls
Usage: hadoop fs [generic options] -ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...]
hdfs dfs -help ls
-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...] :
  List the contents that match the specified file pattern. If path is not
  specified, the contents of /user/<currentUser> will be listed. For a directory a
  list of its direct children is returned (unless -d option is specified).

  Directory entries are of the form:
    permissions - userId groupId sizeOfDirectory(in bytes) modificationDate(yyyy-MM-dd HH:mm) directoryName

  and file entries are of the form:
    permissions numberOfReplicas userId groupId sizeOfFile(in bytes) modificationDate(yyyy-MM-dd HH:mm) fileName

  -C  Display the paths of files and directories only.
  -d  Directories are listed as plain files.
  -h  Formats the sizes of files in a human-readable fashion
      rather than a number of bytes.
  -q  Print ? instead of non-printable characters.
  -R  Recursively list the contents of directories.
  -t  Sort files by modification time (most recent first).
  -S  Sort files by size.
  -r  Reverse the order of the sort.
  -u  Use time of last access instead of modification for
      display and sorting.
  -e  Display the erasure coding policy of files and directories.
* Let us list all the files in the `/public/nyse_all/nyse_data` folder. It is one of the public data sets available under `/public`. By default, files and folders are sorted by name in ascending order. The `-r` option used here reverses that order, so the file for the latest year is listed first.
hdfs dfs -ls -r /public/nyse_all/nyse_data
Found 21 items
-rw-r--r-- 1 itversity supergroup 519586 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2017.txt.gz
-rw-r--r-- 1 itversity supergroup 11796756 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2016.txt.gz
-rw-r--r-- 1 itversity supergroup 11327417 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2015.txt.gz
-rw-r--r-- 1 itversity supergroup 10552757 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2014.txt.gz
-rw-r--r-- 1 itversity supergroup 9588984 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2013.txt.gz
-rw-r--r-- 1 itversity supergroup 8538688 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2012.txt.gz
-rw-r--r-- 1 itversity supergroup 7980961 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2011.txt.gz
-rw-r--r-- 1 itversity supergroup 7551218 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2010.txt.gz
-rw-r--r-- 1 itversity supergroup 7186235 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2009.txt.gz
-rw-r--r-- 1 itversity supergroup 7179621 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2008.txt.gz
-rw-r--r-- 1 itversity supergroup 6903056 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2007.txt.gz
-rw-r--r-- 1 itversity supergroup 6480175 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2006.txt.gz
-rw-r--r-- 1 itversity supergroup 6207833 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2005.txt.gz
-rw-r--r-- 1 itversity supergroup 5689069 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2004.txt.gz
-rw-r--r-- 1 itversity supergroup 5271305 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2003.txt.gz
-rw-r--r-- 1 itversity supergroup 5021940 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2002.txt.gz
-rw-r--r-- 1 itversity supergroup 4722623 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2001.txt.gz
-rw-r--r-- 1 itversity supergroup 4439306 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2000.txt.gz
-rw-r--r-- 1 itversity supergroup 4297025 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_1999.txt.gz
-rw-r--r-- 1 itversity supergroup 4142942 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_1998.txt.gz
-rw-r--r-- 1 itversity supergroup 3842443 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_1997.txt.gz
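Because each file entry follows the fixed column layout described in the help text (permissions, replicas, user, group, size, date, time, path), the listing is easy to post-process. The following is a minimal sketch and not part of the original walkthrough: it feeds two lines copied from the output above into `awk` and prints only the size and path columns. The `sample_listing` function is a hypothetical stand-in for real `hdfs dfs -ls` output, and the approach assumes paths contain no spaces.

```shell
# Two entries copied verbatim from the listing above
# (a stand-in for the output of a real hdfs dfs -ls call).
sample_listing() {
  cat <<'EOF'
-rw-r--r-- 1 itversity supergroup 519586 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2017.txt.gz
-rw-r--r-- 1 itversity supergroup 11796756 2022-05-29 17:16 /public/nyse_all/nyse_data/NYSE_2016.txt.gz
EOF
}

# File entries have 8 whitespace-separated fields: size is field 5, path is field 8.
# The NF == 8 check skips the "Found N items" header line, which has fewer fields.
sizes_and_paths=$(sample_listing | awk 'NF == 8 { print $5, $8 }')
echo "$sizes_and_paths"
```

On a cluster, the same `awk` program could presumably be applied directly, for example by piping `hdfs dfs -ls /public/nyse_all/nyse_data` into it instead of `sample_listing`.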
* We can sort the files and directories by modification time using the `-t` option. By default, the most recently modified files appear at the top. We can reverse the order by using `-t -r`.
hdfs dfs -ls -t /public/nyse_all/nyse_data
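As a small illustration of the reversal mentioned above, the sketch below lists the same folder oldest first. The `command -v` guard is an addition for convenience, so that the snippet exits cleanly on a machine without the Hadoop client, and `DATA_DIR` is just a local variable name chosen here.

```shell
# Folder used throughout this section.
DATA_DIR=/public/nyse_all/nyse_data

if command -v hdfs >/dev/null 2>&1; then
  # -t sorts by modification time (most recent first); -r reverses it to oldest first.
  hdfs dfs -ls -t -r "$DATA_DIR"
else
  echo "hdfs client not found; skipping listing of $DATA_DIR"
fi
```

Note that in the listing shown earlier every file carries the same modification timestamp (to the minute), so the effect of `-t` may be hard to see on this particular folder.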
* We can sort the files and directories by size using `-S`. By default, the files will be sorted in descending order by size. We can reverse the sorting order using `-S -r`.
hdfs dfs -ls -S /public/nyse_all/nyse_data
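The options can also be combined. The sketch below, which assumes the same folder and uses only flags documented in the help text, lists the smallest files first and prints sizes in human-readable units. As before, the `command -v` guard and the `DATA_DIR` variable are conveniences added for this example.

```shell
# Folder used throughout this section.
DATA_DIR=/public/nyse_all/nyse_data

if command -v hdfs >/dev/null 2>&1; then
  # -S sorts by size (largest first); -r flips that to smallest first;
  # -h prints sizes in human-readable units instead of raw byte counts.
  hdfs dfs -ls -S -r -h "$DATA_DIR"
else
  echo "hdfs client not found; skipping listing of $DATA_DIR"
fi
```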