
Cleaning up /tmp under HDFS

Monday, October 8th, 2018

Here is a script to wipe out /tmp under HDFS (originally posted here). It can be run from a crontab to periodically delete files older than a given number of days.

#!/bin/bash

usage="Usage: cleanup_tmp.sh [days]"

if [ -z "$1" ]; then
  echo "$usage"
  exit 1
fi

now=$(date +%s)

# Keep only directories (permission strings starting with "d")
hadoop fs -ls /tmp/hive/hive/ | grep "^d" | while read -r f; do
  dir_date=$(echo "$f" | awk '{print $6}')   # modification date, field 6
  dir_path=$(echo "$f" | awk '{print $8}')   # full HDFS path, field 8

  # Age of the directory in whole days
  difference=$(( (now - $(date -d "$dir_date" +%s)) / (24 * 60 * 60) ))

  if [ "$difference" -gt "$1" ]; then
    hadoop fs -ls "$dir_path"
    ### hadoop fs -rm -r "$dir_path"
  fi
done
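To sanity-check the day arithmetic used in the loop, here is a small standalone sketch with made-up dates (it assumes GNU date; `-u` pins the calculation to UTC so daylight-saving changes can't skew the result):

```shell
#!/bin/bash
# GNU date turns a YYYY-MM-DD string into epoch seconds; integer division
# by 86400 (24 * 60 * 60) then gives the age in whole days.
now=$(date -u -d "2018-10-08" +%s)        # pretend "today"
dir_epoch=$(date -u -d "2018-09-30" +%s)  # a directory's listed date
age_days=$(( (now - dir_epoch) / (24 * 60 * 60) ))
echo "$age_days"
```

With these dates the script prints 8, so a threshold of 7 days would select the directory for deletion.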

By default the script runs in a “dry” mode, only listing the files that are older than the given number of days. Once you’re comfortable with the output, comment out the line containing ‘fs -ls’ and uncomment the one with ‘fs -rm’.
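For the crontab setup, a hypothetical entry (the install path, retention period, and log file are placeholders, not from the post) that runs the cleanup nightly at 3 a.m. with a 7-day threshold might look like:

```shell
# m h dom mon dow  command
0 3 * * * /opt/scripts/cleanup_tmp.sh 7 >> /var/log/cleanup_tmp.log 2>&1
```

Redirecting both stdout and stderr to a log file keeps a record of what the dry run listed (or what was deleted).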

If you get Java memory errors while executing the script, make sure to export the HADOOP_CLIENT_OPTS variable before calling it:

export HADOOP_CLIENT_OPTS="-XX:-UseGCOverheadLimit -Xmx4096m"
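Since cron jobs don’t inherit your interactive shell environment, one option is a small wrapper script that sets the variable and then runs the cleanup. This is just a sketch; the script path and the 7-day threshold are assumptions:

```shell
#!/bin/bash
# Hypothetical wrapper: raise the Hadoop client JVM heap, then hand off
# to the cleanup script (path and retention period are placeholders).
export HADOOP_CLIENT_OPTS="-XX:-UseGCOverheadLimit -Xmx4096m"
exec /opt/scripts/cleanup_tmp.sh 7
```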