Adrift in a Data Lake – an Oracle Developer’s first steps in Hadoop

We live in interesting times. As I write, the political life of a great nation is in thrall to a wealthy right-wing populist with a rather distinctive hairstyle.
But enough about Boris Johnson.

For someone used to the way things are done in Oracle, Hadoop can be something of a culture shock.
My initial introduction to Hadoop and its “vibrant ecosystem” was some internally written documentation.
As with most technical documentation, it was written with the assumption of a certain level of prior knowledge on the part of the reader. For this particular reader, it proved to be an erroneous assumption.

After half an hour of struggling through this text, I was left wondering what you would use a Kafka Banana Oozie for.

Maybe it’s for killing zombie processes ? Let’s face it, with all that Java running on your system there are bound to be a few knocking around.
I’m a little hazy on my Zombie Lore, so I’m not entirely clear as to why a rapid application of fresh fruit would pose an existential threat to your average zombie. Maybe they’re potassium intolerant ?

There is a bewildering array of tools associated with Hadoop, many of which appear to provide almost identical functionality.
For example, a relational database traditionally requires only one SQL Engine…and I’ll be taking a look at two of them.

Moving from Oracle to Hadoop can feel rather like trading your car in for a box of Lego.
If the box does contain any instructions they seem to have been translated from Java into English…by someone who doesn’t speak either.
Note to reader : please substitute your own language as appropriate.

Fortunately, there are Hadoop distributions available which bundle the core tools required to get up and running. We’ll gloss over the fact that these distributions – Cloudera and Hortonworks – don’t contain the exact same tools.

In my search to find a coherent explanation of how to use Hadoop, I’ve found that the VM provided by Cloudera, together with the introductory tutorial, is a valuable aid to familiarising myself with the basics.

To start with then, I’ll cover getting hold of the Cloudera VM and setting it up in VirtualBox.
Then I’ll go through some of the tools available and what they do.
I’ll do this from the perspective of an Oracle developer (mainly because I don’t have much option) and will point out the driftwood of familiarity that you might be able to cling to in order to stay afloat in your Data Lake.

What I’ll cover is :

  • The core components of Hadoop
  • HDFS commands
  • Transferring data between a relational database and Hadoop using SQOOP
  • Querying structured data using Hive and Impala
  • Uploading data using Kite

Ready to Dive in ?

Continue reading

Live to Win – Motorhead Covers and Pythonic Irrigation

The recent passing of Lemmy has caused me to reflect on the career of one of the bands who made my growing up (and grown-up) years that much…well…louder.

Yes, I know that serious Python documentation should employ a sprinkling of Monty Python references but, let’s face it, what follows is more of a quick trawl through some basic Python constructs that I’ve found quite useful recently.
If I put them all here, at least I’ll know where to look when I need them again.

In any case, Michael Palin made a guest appearance on the album Rock ‘n’ Roll so that’s probably enough of a link to satisfy the Monty Python criteria.

I find Python a really good language to code in…especially when the alternative is writing a Windows Batch Script. However, there is a “but”.
Python 3 is not backward compatible with Python 2. This can make life rather interesting on occasion.

It is possible to write code that is compatible with both versions of the language and there’s a useful article here on that topic.

The code I’ve written here has been tested on both Python 2 (2.7.6) and Python 3 (3.4.3).

One of the great things about Python is that there are a number of modules supplied as standard, which greatly simplify some common programming tasks.
What I’m going to run through here is :

  • Getting information about the environment
  • Handling runtime arguments with the argparse module
  • Reading config files with configparser
  • Writing information to log files with the logging module
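
As a rough taster of how those four items can fit together, here is a minimal sketch. It is not the code from the full post: the argument names, the "amp" config section and the "volume" setting are all made up for illustration, and it assumes Python 3 (in Python 2 the config module is called ConfigParser and has no read_string method).

```python
import argparse
import configparser
import logging
import platform


def parse_args(argv):
    # Hypothetical arguments, chosen only for illustration.
    parser = argparse.ArgumentParser(description="Standard module taster")
    parser.add_argument("name", help="who to greet")
    parser.add_argument("--shout", action="store_true",
                        help="upper-case the greeting")
    return parser.parse_args(argv)


def read_volume(config_text):
    # Parse INI-style text held in a string. A real script would more
    # likely call parser.read() with the path to a config file.
    # The "amp" section and "volume" key are assumptions for this demo.
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getint("amp", "volume")


def build_greeting(args, volume):
    greeting = "Hello {0}, volume is {1}".format(args.name, volume)
    return greeting.upper() if args.shout else greeting


def main(argv):
    # Log to the console here; pass filename="some.log" to basicConfig
    # to write to a log file instead.
    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")
    # Information about the environment we are running in.
    logging.info("Python %s on %s",
                 platform.python_version(), platform.system())
    args = parse_args(argv)
    volume = read_volume("[amp]\nvolume = 11\n")
    logging.info(build_greeting(args, volume))


if __name__ == "__main__":
    # Fixed demo arguments; a real script would pass sys.argv[1:].
    main(["Lemmy", "--shout"])
```

Passing the argument list into main, rather than reading sys.argv inside it, makes the script easier to try out from an interactive session.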

Continue reading

Resolving Hardware Issues with a Kernel Upgrade in Linux Mint

One evening recently, whilst climbing the wooden hills with netbook in hand, I encountered a cat who had decided that halfway up the stairs was a perfect place to catch forty winks.
One startled moggy later, I had become the owner of what I can only describe as…an ex-netbook.

Now, finally, I’ve managed to get a replacement (netbook, not cat).

As usual when I get a new machine, the first thing I did was to replace Windows with Linux Mint…with the immediate result being that the wireless card stopped working.

The solution ? Don’t (kernel) panic, kernel upgrade !

Support for most of the hardware out there is included in the Linux Kernel. The kernel is enhanced and released every few months. However, distributions such as Mint tend to stick with one kernel version for a while in order to provide a stable base on which to develop.
This means that, if Linux is not playing nicely with your Wireless card/web-cam/any other aspect of your machine’s hardware, a kernel upgrade may resolve your problem.
Obviously it’s always good to do a bit of checking to see if this might be the case.
It’s also good to have a way of putting things back as they were should the change we’re making not have the desired effect.

What I’m going to cover here is the specific issue I encountered with my new Netbook and the steps I took to figure out what kernel version might fix the problem.
I’ll then detail the kernel upgrade itself.

Continue reading

Getting one of your Five-a-Day – connecting Remotely to a Raspberry Pi from Linux Mint

It’s Christmas. To mark the occasion, my son bought me a top-of-the-range computer…

Christmas has come early ! Er, hang on…

Yes, a Raspberry Pi 2 b-spec, complete with 900 MHz Quad-core ARM processor and 1 GB RAM.

Getting it up and running was a bit more of a challenge than I had anticipated.
The Pi uses HDMI for Video output and my ageing monitor is not equipped for HDMI…

The best program on TV – NOOBS doing its thing.

In the end, I had to “borrow” the TV.
This arrangement was, of necessity, extremely temporary. The TV had to be back in its usual place ready for The Strictly-TOWIE-Dancing-Get-Me-Out-Of-Here Christmas Special, on pain of pain.
Therefore, my first Pi project was to connect to it remotely from another machine, namely, my Linux Mint Laptop.
This will enable me to run the Pi headless (i.e. without a monitor/keyboard/mouse attached to it).

I’m going to cover two different methods of connecting to the Pi.
The first is using ssh to connect to the command line.
The second is to connect remotely to the Raspbian desktop itself.

Just to avoid any confusion, I will be referring to the Raspberry Pi as “the Pi” and the machine I’m connecting from as “Mint”.

Continue reading

Stat Wars – using the Force (DBMS_STATS) for Incremental Stats Gathering

We haven’t been queuing for days to see that film.
Darth Debbie was quite insistent that she really did have better things to do than queue up outside a cinema behind someone dressed as a Wookiee.
Even the potential opportunity of uttering the line “will someone get this walking carpet out of my way ?!”, has not moved her on this point.
All things Star Wars are not to be mentioned in our house at the moment. So, any resemblance to a certain Space Opera in what follows is purely coincidental.

Anyway, a Long Time Ago in a Database far, far away….

It wasn’t easy being a DBA on the Jedi Archives Database.
Strong with the Force they may have been, but the users weren’t particularly patient.
On top of that, there was still some muttering going on about that unfortunate data loss incident with Obi Wan Kenobi and the missing planetary system.

All in all then, when complaints began coming in about the nightly batch overrunning, it was with a sense of dread that the Geeki began to investigate….

Continue reading