Cleaning up /tmp under HDFS

October 8th, 2018

A script to wipe out /tmp under HDFS (originally posted here). It can be run from a crontab to periodically delete files older than a given number of days.

  #!/bin/bash

  usage="Usage: cleanup_tmp.sh [days]"

  if [ ! "$1" ]; then
    echo "$usage"
    exit 1
  fi

  now=$(date +%s)

  hadoop fs -ls /tmp/hive/hive/ | grep "^d" | while read f; do
    dir_date=$(echo "$f" | awk '{print $6}')
    difference=$(( ( now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))

    if [ "$difference" -gt "$1" ]; then
      # dry run: only list the directories that would be removed
      hadoop fs -ls $(echo "$f" | awk '{print $8}')
      ### hadoop fs -rm -r $(echo "$f" | awk '{print $8}')
    fi
  done

By default the script runs in a “dry” mode, only listing the files that are older than the given number of days. Once you’re comfortable with the output, comment out the line containing ‘fs -ls’ and uncomment the one with ‘fs -rm’.
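
For example, a crontab entry to run the cleanup nightly might look like this (the install path and the 7-day retention are hypothetical, adjust to taste):

  # run every night at 02:30, deleting /tmp/hive/hive dirs older than 7 days
  30 2 * * * /usr/local/bin/cleanup_tmp.sh 7 >> /var/log/cleanup_tmp.log 2>&1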

If you get Java memory errors while executing the script, make sure to export the HADOOP_CLIENT_OPTS variable before calling the script:

  export HADOOP_CLIENT_OPTS="-XX:-UseGCOverheadLimit -Xmx4096m"

Moving /var/log to a different drive under CentOS 7

July 14th, 2018

A quick how-to with the set of instructions for moving the /var/log partition to a different drive. Done on CentOS 7.5.1804, with all partitions managed by LVM and /var/log moving to a USB key.

The tricky part with /var/log is that there is always something being written to it, and although a simple archive/restore might work, you risk losing whatever is written between the moment you create the archive and the moment you restore it. Depending on how big /var/log is, that could be minutes or hours of data. The procedure below assumes an outage, since it is performed offline, but it ensures that no data is lost.
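
In rough strokes, the offline part looks something like the sketch below; the device name /dev/sdc1 is hypothetical, and the full walk-through is behind the cut:

  # from single-user/rescue mode, so nothing is writing to /var/log
  mkfs.ext4 /dev/sdc1
  mount /dev/sdc1 /mnt
  cp -a /var/log/. /mnt/            # copy preserving ownership, modes and timestamps
  rm -rf /var/log/*                 # reclaim space on the old LV
  blkid /dev/sdc1                   # grab the UUID; USB device names are not stable
  echo 'UUID=<uuid-from-blkid> /var/log ext4 defaults 0 2' >> /etc/fstab
  umount /mnt
  mount /var/log
  restorecon -R /var/log            # fix SELinux contexts on CentOS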

Read the rest of this entry »

Brocade ICX/VDX firmware update cheatsheet

July 11th, 2018

Kind of a cheatsheet for updating firmware on Brocade’s ICX (nowadays Ruckus Networks) and VDX (nowadays Extreme Networks) switches.

-=ICX=-

– Make sure to check the release notes to ensure that your model is supported. For example, with the ICX6xxx switches (which are EOL, though), the 08.0.30 branch is the highest you can go; 08.0.60 and 08.0.80 don’t support the ICX6xxx.

  copy scp flash 10.10.11.146 /home/brcdsup/fastiron/08030/ICX64S08030s.bin primary

– If you immediately get a ‘Connecting to remote host… Connection Closed’ error, check whether your SSH server config includes the legacy options below (to be added to the sshd_config file):

  KexAlgorithms diffie-hellman-group1-sha1,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1
  Ciphers 3des-cbc,blowfish-cbc,aes128-cbc,aes128-ctr,aes256-ctr
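
After editing sshd_config, don’t forget to restart the SSH daemon (e.g. systemctl restart sshd on a systemd host) for the legacy options to take effect.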

-=VDX=-

– Version upgrades, such as 6.0 to 7.0 or 7.0 to 7.1, are considered major upgrades and are hence destructive. Be prepared for ~30 minutes of outage plus ~10 minutes for fabric recovery (with data traffic being forwarded).

  firmware download logical-chassis scp directory /home/brcdsup/nos/nos7.1.0b host 10.10.11.146 user brcdsup password xxx rbridge-id all coldboot

make buildworld & HP ProLiant BL460c G7

November 11th, 2017

Upgrade from 11.1-RELEASE to 11.1-RELEASE-p3

Brand: HP ProLiant BL460c G7

I wasn’t able to make it boot from the iLO virtual CD/USB interface no matter what I tried, hence a locally attached USB CD-ROM was the only way to install the OS.

Secondly, no network adapters will be recognized after the installation completes. Therefore, make sure you include src in the system components window, so you can rebuild the kernel with the following stanza added (GENERIC doesn’t include it):

  device oce
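
A minimal sketch of the rebuild, assuming the source tree lives in /usr/src and using a hypothetical config name MYKERNEL:

  cd /usr/src/sys/amd64/conf
  cp GENERIC MYKERNEL
  echo 'device oce' >> MYKERNEL     # Emulex OneConnect 10GbE driver
  cd /usr/src
  make -j4 buildkernel KERNCONF=MYKERNEL
  make installkernel KERNCONF=MYKERNEL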

Processor: 1 x Intel Xeon L5640 2.27 GHz (6 cores)
Memory: 16GB
HDD: 2 x 300GB (10k RPM, 6Gbps SAS 2.5-inch) in RAID1

Softupdates: ON
SMP: ON

  CPU: Intel(R) Xeon(R) CPU L5640 @ 2.27GHz (2266.79-MHz K8-class CPU)
  real memory = 17179869184 (16384 MB)
  avail memory = 16598376448 (15829 MB)
  da0 at ciss0 bus 0 scbus2 target 0 lun 0
  da0: <HP RAID 1(1+0) OK> Fixed Direct Access SPC-3 SCSI device
  da0: 135.168MB/s transfers
  da0: Command Queueing enabled
  da0: 286070MB (585871964 512 byte sectors)

make -j4 buildworld: 1h 6m 23s
make -j4 buildkernel: 5m 26s
make installkernel: 19s
make installworld: 3m 12s

Managing ports for multiple FreeBSD servers

July 31st, 2017

This is a follow-up post on how to manage ports for multiple FreeBSD servers. If you’re looking for how to update the operating system itself, have a look at my almost-three-year-old post: Managing multiple FreeBSD servers.

Alright, so what we’re trying to solve is this: multiple VMs running the same (or different) releases of FreeBSD, and you’re looking for a way to centralize the delivery of packages to them.
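
The details are behind the cut, but the end state is simply pkg(8) on every VM pointing at your own repository instead of the official one; a minimal sketch (the repository URL is hypothetical):

  # /usr/local/etc/pkg/repos/internal.conf
  internal: {
    url: "http://pkg.internal.example/${ABI}/latest",
    enabled: yes
  }
  FreeBSD: { enabled: no }    # optionally disable the stock repository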

Read the rest of this entry »

make buildworld & IBM x3550 m3

June 24th, 2017

Upgrade from 11.0-RELEASE to 11.1-BETA3

Brand: IBM x3550 m3

In order to successfully boot the server, make sure to enable legacy support as per this thread.

Processor: 2 x Intel Xeon E5620 2.40GHz (4 cores each)
Memory: 8GB
HDD: 2 x 146GB (10k RPM, 6Gbps SAS 2.5-inch) in RAID1

Softupdates: ON
SMP: ON

  CPU: Intel(R) Xeon(R) CPU E5620  @ 2.40GHz (2400.13-MHz K8-class CPU)
  real memory  = 8589934592 (8192 MB)
  avail memory = 8244543488 (7862 MB)
  mfi0: <LSI MegaSAS Gen2> port 0x1000-0x10ff mem 0x97940000-0x97943fff,0x97900000-0x9793ffff irq 16 at device 0.0 on pci11
  mfi0: Using MSI
  mfi0: Megaraid SAS driver Ver 4.23
  mfi0: FW MaxCmds = 1008, limiting to 128
  mfid0: 139236MB (285155328 sectors) RAID volume (no label) is optimal

make -j4 buildworld: 1h 36m 28s
make -j4 buildkernel: 5m 58s
make installkernel: 13s
make installworld: 3m 32s

DRBD with OCFS2 and fstab

May 28th, 2017

A two-node active/active DRBD cluster implemented on Debian Jessie with OCFS2 on top of it, so the file system can be mounted and accessed on both nodes at the same time. Sounds like an easy-peasy task, considering the amount of articles on the web (mostly copy/paste of the same content, though).

So, you finish the setup, everything is synced and shiny, you edit fstab, perform the final reboot, and… oopsie daisy, nothing is mounted. You start digging in the _netdev direction, or suspecting that perhaps the order in which drbd and ocfs2 start is to blame, or putting the mount stanza into rc.local; none of this helps. You might even come up with the excuse that you won’t reboot those servers often, but the fact that you need to manually perform some post-reboot actions doesn’t sound promising at all. Particularly if it’s an unexpected reboot over a weekend. Particularly if it happens years after the installation, so you need to find (and, more importantly, remember the existence of) those notes. Particularly if you have already quit this job and there is another poor fella taking care of the servers. And finally, to make things even more complicated, you might have services that actually depend on the availability of the mounted drive after the reboot (Apache or Samba, for example).

Obviously, this needs to be fixed once and for all, and I have good news for you. :) If you were vigilant enough during troubleshooting, you’d notice that a) if you try to mount the drive through /etc/rc.local, a warning is thrown at boot time (something about a missing device), and b) when you mount the drbd drive manually, it’s not mounted instantly; there is a several-second delay before the disk is successfully attached. That brought me to the suspicion that drbd is simply not ready yet at the time the mount in /etc/rc.local is executed, and that deliberately introducing some delay could improve things. And voila, it really did the trick!

Here is my /etc/fstab entry:

  /dev/drbd0   /var/www   ocfs2   noauto,noatime   0   0
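
Note the noauto flag: it keeps the boot-time mount from being attempted (and failing) before DRBD is up; the mount is handled from rc.local instead.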

And here is my /etc/rc.local, introducing a 30-second delay prior to the mount, to give DRBD enough time to come up:

  sleep 30
  mount /dev/drbd0
  exit 0

Now, I’m not sure whether this is by design (DRBD nodes do have to communicate with each other for the initial election and/or sync, which contributes to the delay in creating /dev/drbd0), or whether my environment is just generally slow (everything is virtualized on not-so-super-fast SATA drives), but it works.
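
If a fixed 30 seconds feels fragile, a slightly more robust (though untested here) variant of /etc/rc.local would poll for the device instead of sleeping blindly:

  # wait up to 60 seconds for /dev/drbd0 to show up, then mount
  for i in $(seq 1 60); do
    [ -b /dev/drbd0 ] && break
    sleep 1
  done
  mount /dev/drbd0
  exit 0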

MySQL cluster using ndbcluster engine under FreeBSD 11

January 22nd, 2017

!SPOILER ALERT!

Not to discourage you, but make sure you read and understand the MySQL Cluster limitations thoroughly before you start building the cluster. You don’t want to spend time building the whole thing only to discover at the very end that you’ve hit some hard-coded limitation that can’t be resolved. It’s very easy to get trapped in a Catch-22 here: a third-party vendor might say “why would we adjust our software to overcome MySQL limitations?”, and I’m sure the MySQL dev team had valid reasons to introduce those limitations. So you end up in the middle, and you’re basically stuck.

For example, the vanilla typo3 distribution won’t work with the ndbcluster engine out of the box. You hit the row size limitation almost immediately, and unless you’re willing to spend time analyzing and optimizing the structure of the typo3 database, you’re blocked. You might be lucky, and it could be just a small change from varchar(2000) to varchar(1000), but you might not be. In addition, you’ll almost certainly need a separate MySQL instance with InnoDB or MyISAM, so you can import the DB, dump it, and start feeding it to the ndbcluster engine in batches. All of this adds to the time spent, and during the course of the installation you start considering alternatives, like changing the operating system, trying Galera, or even switching to PostgreSQL altogether. But we’re not looking for easy paths, are we? :)
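
As a flavor of the dump-and-convert step, the usual first pass is a dump-and-rewrite, something along these lines (the database name is hypothetical, and the sed pass is a blunt instrument, so review the result before importing):

  mysqldump -u root -p typo3 > typo3.sql
  sed 's/ENGINE=MyISAM/ENGINE=NDBCLUSTER/g; s/ENGINE=InnoDB/ENGINE=NDBCLUSTER/g' typo3.sql > typo3_ndb.sql
  mysql -u root -p typo3 < typo3_ndb.sql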

Read the rest of this entry »

NSD and OpenDNSSEC under FreeBSD 10 [Part 5: SafeNet HSM]

May 18th, 2016

This is the fifth part in the series of articles explaining how to run NSD and OpenDNSSEC under FreeBSD 10.

This time we’re going to integrate proper hardware HSM support in our setup — a pair of SafeNet Network HSMs (aka Luna SA).

Here is what our updated installation diagram looks like:

[diagram: updated installation layout with the pair of SafeNet Network HSMs]

Before we jump into technical details there are a couple of assumptions:

— I assume that the HSMs are already configured and partitioned. HSM installation is outside the scope of this guide, since it’s a lengthy and pretty time-consuming process which has nothing to do with OpenDNSSEC. It also involves a big chunk of work on the access federation front (different teams accessing different partitions with different PEDs or passwords). SafeNet’s HSM documentation is quite solid, though, so make sure this part is completed. In our setup, both HSMs run the latest software, 6.2.0-15, and there is one partition, called TEST, created on both units. The TEST partition is activated, and we’re going to create a High Availability group, add both HSMs to it, and allow NS-SIGN to access it;

— As you might have noticed, I decided to leave the ZSKs to SoftHSM. One of the things you’ll have to keep an eye on with network HSMs is disk space. The way it works with SafeNet is that you have an appliance with a fixed amount of storage (let’s say 2MB). You then create partitions and allocate space for each partition out of the total (by default it’s an equal distribution). So let’s assume we created five partitions of 417274 bytes each. Normally, storing a public/private key pair consumes very little space, but with OpenDNSSEC we’re talking about a number of domains, each storing a public/private key pair for both the KSK and the ZSK. It’s very important to understand how far you can go, so you’re not surprised several years later to discover that you’ve run out of space.

Let’s do some basic math: one domain, with both the ZSK (1024 bits) and KSK (2048 bits) stored on the HSM, consumes 2768 bytes, so with a 417274-byte partition you should be able to handle ~150 domains. However, during a ZSK or KSK rollover another key pair is temporarily created, and although ZSK and KSK rollovers shouldn’t happen at the same time, and OpenDNSSEC purges expired keys after the rollover completes, you have to budget an extra 2768 bytes per domain (for the period of time defined in the <Purge> stanza in kasp.xml), which leaves you with ~75 domains. As you can see, this isn’t much. That’s why I decided to keep SoftHSM for the ZSKs and save some HSM space (which is not cheap, to say the least!).
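
The same arithmetic in shell form, for the record:

  echo $(( 417274 / 2768 ))        # steady state: ~150 domains per partition
  echo $(( 417274 / (2768 * 2) ))  # during a rollover: ~75 domains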

One disadvantage of keeping both storage engines is that you’ll have one more dependency to worry about should you consider upgrading (for example, to SoftHSM2), hence the choice is yours. Another option would be to store only the private keys in the HSM and leave the public keys aside (the <SkipPublicKey/> option in conf.xml), but I’ve read that this is very much dependent on the HSM provider and could lead to unexpected results. And one more option would be to use <ShareKeys/> in kasp.xml; that way you can share the same key across multiple domains.

Read the rest of this entry »

Viewing package ChangeLog with rpm

April 4th, 2016

Here is how to view the ChangeLog of an installed package using rpm under CentOS:

  rpm -q --changelog libuuid-2.23.2-26.el7_2.2.x86_64 | more

  * Wed Mar 16 2016 Karel Zak <kzak@redhat.com> 2.23.2-26.el7_2.2
  - fix #1317953 - lslogins crash when executed with buggy username

The same applies to the kernel. By adding the -p switch you can check the rpm file itself without installing it:

  rpm -qp --changelog kernel-plus-3.10.0-327.13.1.el7.centos.plus.x86_64.rpm | more

  * Thu Mar 31 2016 Akemi Yagi <toracat@centos.org> [3.10.0-327.13.1.el7.centos.plus]
  - Apply debranding changes
  - Roll in i686 mods
  - Modify config file for x86_64 with extra features turned on including