Monday, April 20, 2009

Troubleshooting with Assumptions

If you're ever trying to solve a problem, be it a technical problem or a logistical problem or a problem at home or at work or wherever, there is one very important rule you must always remember: don't assume anything is "normal" or "OK" or working as expected. In my experience, the part or situation or knowledge level or step-already-taken that you believe is OK is nearly always either the root cause itself or directly related to the root cause.

Case in point from this morning: one of our helpdesk techs got a call that one of our security applications wasn't working. The user couldn't make it work at all and it was preventing him from accessing his system. So the tech called me to ask me to check a few things. I had to get a few tools going to check this out, so I told him I'd call him back. When I was ready to go and needed more information about the user's situation, I called the tech to get it. He didn't answer, but a few minutes later he sent an e-mail saying that the user hadn't been plugged into the network, and that this was the root cause. Once the user plugged in, everything was fine. The tech apologized for assuming the user knew to make sure he was on the network before doing anything else.

My intent here is not to criticize the tech or the user, but to demonstrate that you can never assume something has been done, that something is the way it is supposed to be, or that someone else knows something they're "supposed" to know. This is logical, really: if something is not working the way it's supposed to, anything could be causing it, and it's foolish to assume any part is working until you actually check it. The tech assumed the user was competent enough to know he needed to be on the network for anything to work, so he didn't check for that. If he had asked that question first, the issue would have been resolved in about 30 seconds. When the tech called me, I assumed that he had done all the basic checks and steps to ensure that it was really an issue for me to look at; I took him at his word that I needed to look at some things. In the tech world, knowledge of the fundamentals of any technology, protocol, or process is critical to building, maintaining, and fixing that technology, protocol, or process. Sometimes, you can't even assume that an otherwise-competent technical coworker knows the fundamentals of something you're dealing with. Usually, they're sharp enough to know their limits and tell you they don't know, but sometimes they won't tell you that (for various reasons). The raw breadth and complexity of today's technology solutions and configurations only exacerbate the issue when something's not right, because there are so many things that could go wrong and you can easily waste a lot of time looking in the wrong places and at the wrong things.
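One way to put this into practice is to write the basic checks down and run them, cheapest first, before anything gets escalated. What follows is only a minimal sketch in Python. The check descriptions and the `first_failed_check` helper are hypothetical, and the hard-coded results stand in for whatever real verification a helpdesk would do.

```python
from typing import Callable, List, Optional, Tuple

# Each check is a (description, function) pair. The function returns True
# only when the thing it describes has actually been verified.
Check = Tuple[str, Callable[[], bool]]


def first_failed_check(checks: List[Check]) -> Optional[str]:
    """Run the checks in order and return the description of the first failure.

    Returns None only when every check was run and passed.
    """
    for description, verify in checks:
        if not verify():
            return description
    return None


# Hypothetical checks for the story above, ordered from cheapest to most
# expensive. The lambdas are placeholders for real verification steps.
basic_checks: List[Check] = [
    ("user is plugged into the network", lambda: False),
    ("security application is running", lambda: True),
]

failed = first_failed_check(basic_checks)
if failed is None:
    print("Basics verified; escalate.")
else:
    print(f"Fix this first: {failed}")
```

The point of the design is that nothing counts as "OK" until its check has actually run; escalation only happens when the whole list comes back clean.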

Assumptions in troubleshooting do two very bad things:

1. They waste time that could be used for legitimate work. The user could have just checked his network connection, but he might not have known to do that. It's the primary function of a Tier-1 helpdesk/desktop support technician to verify the basic, fundamental information and screen out the simple fixes so that the truly difficult ones may be escalated to someone more knowledgeable or experienced. The user is the one who makes the money around here, so he's the one who needs to be functional. The tech and I are liabilities on the bank's balance sheet; we cost money, we don't make money. Plus, the tech is a lot less expensive than I am. So when the user isn't working, he's not making money. Then when he calls the helpdesk, more money is being spent to get him working again, because not only is he not making money, but the tech is costing the bank money with the time he devotes to that issue. Then, when I get involved, the bank is losing even more money, because they're paying a user who isn't making money, a Tier-1 tech at a fixed price who still oversees the ticket, AND my time for helping track down the issue. That's a lot of money for something that could have been resolved in about 30-60 seconds if no assumptions had been made. True, the tech and I are SUPPOSED to be paid to do this work, and we get paid anyway, whether the user's system is busted or not. However, both the tech and I could be working on more serious issues, or making sure that more serious issues don't occur and cost the bank more money (i.e., PM or "preventative maintenance", like changing your car's oil). So, rather than spending the bank's money in the most useful, constructive way that provides better ROI to the bank, we're spending it in a very inefficient way.

2. Overall trust is damaged, both on the user's part and on my part. The tech's failure to check and rule out the basic, fundamental causes up front lowers the tech's (and by extension, the whole IT department's) reputation in the eyes of the user, who senses he's getting something of a runaround because he can tell the tech is grasping at straws and not working methodically. It also lowers the tech's reputation in my eyes, since next time I can't assume that this particular tech has done his basic due diligence. I will have to verify it myself, or force him to verify or re-verify it, which further wastes the user's time, the tech's time, and my time.
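The cost argument in point 1 can be made concrete with some back-of-the-envelope arithmetic. The sketch below is in Python, and every number in it is an invented placeholder: the hourly rates and the durations are assumptions for illustration, not figures from the bank. Only the shape of the calculation matters.

```python
from typing import Dict

# All rates are hypothetical placeholders, not real figures.
HOURLY_COST: Dict[str, float] = {
    "user": 100.0,       # assumed value of the work the user isn't doing
    "tier1_tech": 30.0,  # assumed
    "engineer": 60.0,    # assumed; more expensive than the tech
}


def incident_cost(minutes_by_person: Dict[str, float]) -> float:
    """Total cost of an incident, given the minutes each person spent on it."""
    return sum(
        HOURLY_COST[who] * minutes / 60.0
        for who, minutes in minutes_by_person.items()
    )


# Basics checked first: the user and the tech each lose about a minute.
checked_first = incident_cost({"user": 1, "tier1_tech": 1})

# Assumption made and ticket escalated: assume all three are tied up
# for half an hour.
escalated = incident_cost({"user": 30, "tier1_tech": 30, "engineer": 30})

print(f"checked first: ${checked_first:.2f}")
print(f"escalated:     ${escalated:.2f}")
```

Whatever the real rates are, the escalated case adds a third, more expensive person and multiplies everyone's time, which is why the cheap check belongs at the front.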

Time is money, as they say, and when you're fixing broken stuff, the key is to use your time and tools as wisely and efficiently as possible. It doesn't matter if you're mowing your lawn, working on a car, fixing a computer problem, dealing with a kid's behavioral issues, figuring out your finances, working on your marriage...you can't make any assumptions about what is working and what's not working, or what you should be doing vs. what you are already doing. Even if it's just you and not three people involved in your issue, as it was in mine, you do yourself a disservice by assuming something is fine when it isn't. This might seem obvious, and it might even seem foolish for me to devote this much time to talking about it, but we all make these assumptions, every day.

Thanks for reading along.


