Root cause maps: aren’t they sexy???

Posted: October 9th, 2009 | Author: Joaquín Bañez | Filed under: Uncategorized
Cause Map - ain't it hot???

The guys from ThinkReliability are offering an Excel template that can be used to track and document any problem investigation, including the tools needed to draw root cause and process maps (yes, in Excel). As they say:

Most people think of Excel as only an application for creating spreadsheets, but it’s an excellent tool for capturing each element of a complete root cause analysis. By changing the way details are documented, a facilitator can improve the entire investigation process. The drawing tools are simple, flexible and Excel is probably already on your computer.

I'd add one thing: you don't even need Excel on your computer; you can use Google Docs, which is free and a solid tool for work teams.

The template isn't designed specifically for IT problems, but that hardly matters; after all, IT problems are still problems. To me, its main advantage is that it lets you draw a cause map in a few clicks: one tab holds a set of basic shapes (boxes and connectors), you copy them to the tab labelled "Cause map", and you're done. Detailed instructions are included in the template. It's worth a look.

You can download the Cause Mapping Template in Microsoft Excel here for free (no payment and no registration required), along with ThinkReliability's list of the advantages of using MS Excel.

PS: have a look at their blogs; they contain plenty of useful examples of how to perform an RCA and what it should deliver.


Problem Management: your customers deserve a (reasonable) explanation

Posted: September 22nd, 2009 | Author: Joaquín Bañez | Filed under: Uncategorized

I read a post on The Opposite of Luck blog about a forensic report explaining the cause of an incident that led to an outage at Fisher's Plaza Technology Complex in Seattle and took down many websites hosted there.

Like them, I applaud how transparently Fisher's Plaza has dealt with this incident, especially towards its tenants, who really deserve an explanation.

You can read the full report here.

Your users also deserve an explanation when an IT service goes down; you don't have to give them a 12-page report for every problem you manage, but you can't simply shrug your shoulders and tell them "shit happens".


The fear of the blank problem record

Posted: June 17th, 2009 | Author: admin | Filed under: Uncategorized

Root cause analysis (RCA) is an intuitive activity; we are naturally inclined to do it in our daily lives. We know the techniques and methods for making deep, sharp analyses of everyday events, from enumerating causal factors to generating recommendations and implementing changes that eliminate the underlying cause.

Why, then, is it so hard to bring that ability into the workplace?



RCA: misunderstood or even a complete stranger

Posted: June 12th, 2009 | Author: Joaquin Baez | Filed under: Uncategorized

The Spanish Deming Cycle

I've been working in IT for about 8 years now, always as a member of support teams (except for a couple of years when I, just me, was the "support team"). Over the last 2 years, beyond the mega-hype of ITIL as a magic recipe for turning IT shops into perfect machines (as if that were enough), one of the most used expressions in selling and buying IT services has been "continual improvement".

In January 2008 I joined a support team of 5 people as the systems administrator. The customer was a private holding company that had bought a managed services package from my company, including the adoption of a few ITIL v2 processes: incident, problem and change management.

In all my years in the IT field, I had never come across anything like problem management, whose goal is to find the underlying root cause of major or recurring incidents and then raise a request for change (to the infrastructure, the operations procedures, the documentation or whatever) to permanently eliminate that cause and so prevent those incidents from recurring.
