Cleaning up /tmp under HDFS

A script to wipe out /tmp under HDFS (originally posted here). It can be run from a crontab to periodically delete directories older than a given number of days.


#!/bin/bash

usage="Usage: $0 [days]"

if [ -z "$1" ]; then
  echo "$usage"
  exit 1
fi

now=$(date +%s)

hadoop fs -ls /tmp/hive/hive/ | grep "^d" | while read -r f; do
  dir_date=$(echo "$f" | awk '{print $6}')
  difference=$(( ( now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))

  if [ "$difference" -gt "$1" ]; then
    hadoop fs -ls "$(echo "$f" | awk '{print $8}')"
    ### hadoop fs -rm -r "$(echo "$f" | awk '{print $8}')"
  fi
done
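The age check above hinges on epoch arithmetic with date. A minimal standalone sketch of that calculation, using two hypothetical fixed dates for illustration (assumes GNU date for the -d option):

```shell
# Hypothetical dates chosen to illustrate the day-difference arithmetic
now=$(date -d "2024-01-11" +%s)          # "current" time as epoch seconds
dir_date="2024-01-01"                    # what awk extracts from 'hadoop fs -ls'
difference=$(( ( now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
echo "$difference"   # prints 10
```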


By default the script runs in “dry” mode, listing the directories that are older than the given number of days. Once you’re comfortable with the output, comment out the line containing ‘fs -ls’ and uncomment the one with ‘fs -rm’.
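To schedule the cleanup, a crontab entry along these lines could work. The script path, the 30-day threshold, and the log file are all placeholders, not values from the original post:

```
# Run daily at 2:00, deleting HDFS /tmp directories older than 30 days
0 2 * * * /path/to/hdfs-tmp-cleanup.sh 30 >> /var/log/hdfs-tmp-cleanup.log 2>&1
```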

If you get Java memory errors while executing the script, make sure to set the HADOOP_CLIENT_OPTS environment variable before running it:

export HADOOP_CLIENT_OPTS="-XX:-UseGCOverheadLimit -Xmx4096m"

