Cleaning up /tmp under HDFS

This script wipes out /tmp under HDFS (originally posted here). It can be run from a crontab to periodically delete files older than a given number of days.

    #!/bin/bash

    usage="Usage: cleanup_tmp.sh [days]"

    if [ -z "$1" ]; then
      echo "$usage"
      exit 1
    fi

    now=$(date +%s)

    # List only directories (lines starting with "d") under /tmp/hive/hive/
    hadoop fs -ls /tmp/hive/hive/ | grep "^d" | while read f; do
      # Column 6 of the listing is the modification date (YYYY-MM-DD)
      dir_date=$(echo "$f" | awk '{print $6}')
      # Age of the directory in whole days
      difference=$(( ( now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))

      if [ "$difference" -gt "$1" ]; then
        # Dry run: only list the candidates. Column 8 is the full path.
        hadoop fs -ls $(echo "$f" | awk '{print $8}')
        ### hadoop fs -rm -r $(echo "$f" | awk '{print $8}')
      fi
    done

By default the script executes in a “dry” mode, listing the files that are older than the given number of days. Once you’re comfortable with the output, comment out the line containing ‘fs -ls’ and uncomment the one with ‘fs -rm’.
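For scheduled runs, a crontab entry along these lines works. The script path, retention period, and log file below are assumptions — adjust them to your environment:

```shell
# Hypothetical crontab entry: run nightly at 02:30, deleting /tmp data older than 30 days
30 2 * * * /usr/local/bin/cleanup_tmp.sh 30 >> /var/log/cleanup_tmp.log 2>&1
```

Redirecting stdout and stderr to a log file keeps a record of what was listed (or deleted) on each run.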

If you get Java memory errors while executing the script, make sure to export the HADOOP_CLIENT_OPTS variable before calling the script:

    export HADOOP_CLIENT_OPTS="-XX:-UseGCOverheadLimit -Xmx4096m"
