A script to clean up /tmp on HDFS (originally posted here). It can be run from a crontab to periodically delete files older than XXX days.
#!/bin/bash

usage="Usage: cleanup_tmp.sh [days]"

if [ ! "$1" ]
then
  echo $usage
  exit 1
fi

now=$(date +%s)

hadoop fs -ls /tmp/hive/hive/ | grep "^d" | while read f;
do
  dir_date=`echo $f | awk '{print $6}'`
  difference=$(( ( $now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))

  if [ $difference -gt $1 ];
  then
    hadoop fs -ls `echo $f | awk '{ print $8 }'`;
    ### hadoop fs -rm -r `echo $f | awk '{ print $8 }'`;
  fi
done
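The core of the script is the age arithmetic: it subtracts the directory's modification date (column 6 of `hadoop fs -ls`, midnight of that day) from the current epoch time and divides by the number of seconds in a day. A minimal sketch of that calculation, using fixed timestamps purely for illustration (GNU date assumed):

```shell
#!/bin/bash
# Sketch of the age arithmetic used above; the timestamps are made up for the example.
now=$(date -d "2024-06-15 12:00:00" +%s)   # pretend "now"
dir_date="2024-06-10"                      # as printed in column 6 of `hadoop fs -ls`
difference=$(( ( now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
echo "$difference"   # 5 -> the directory is five days old
```

Note that integer division floors the result, so a directory is only counted as N days old once a full N days have elapsed since its listed date.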
By default the script runs in "dry-run" mode, listing the files that are older than XXX days. Once you're comfortable with the output, comment out the line containing 'fs -ls' and uncomment the one with 'fs -rm'.
If you get Java memory errors while executing the script, make sure to export the HADOOP_CLIENT_OPTS variable before calling the script:
export HADOOP_CLIENT_OPTS="-XX:-UseGCOverheadLimit -Xmx4096m"
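Putting it together for the periodic execution mentioned at the top, a crontab entry along these lines could run the cleanup nightly. The install path (`/usr/local/bin`), the 3 a.m. schedule, and the 30-day threshold are assumptions; adjust them for your environment:

```
# m h dom mon dow  command
0 3 * * * export HADOOP_CLIENT_OPTS="-XX:-UseGCOverheadLimit -Xmx4096m"; /usr/local/bin/cleanup_tmp.sh 30
```

Setting HADOOP_CLIENT_OPTS inline keeps the heap bump scoped to the cron job rather than the whole login environment.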