Getting Control of those Pesky Bugs

Some teams seem to have a very hard time managing their bug process.

Even teams who implement good processes sometimes have setbacks when budgets get tight and trade-offs are made. In companies large and small, management often pushes for more features so sales has “new stuff” to sell, and bug fixes go by the wayside. Sometimes the code itself is poor because of a rush to get something out the door. Teams often end up with huge technical debt: piles of bugs that need fixing and code that is brittle and hard to maintain.

The team I’ve worked with for 15 years has implemented great processes and tools to stay ahead of the game. Through the years we’ve learned what works and what doesn’t. Here’s a list of do’s and don’ts to stay ahead of the “Technical Debt” debacle.

How to avoid Technical Debt in the First Place

Avoidance is the best policy. It’s really not that hard. Here’s what you need to do:

  1. Code it right the first time. There is no such thing as a “working prototype”. Prototypes end up being the real code. They were just implemented using shortcuts and temporary coding techniques and will end up becoming a huge technical debt.

  2. Keep it simple. Don’t build functions that are more complex than needed. Some teams get too creative trying to build elegant and esoteric software that instead ends up hard to maintain with many features that are never utilized by clients. Don’t use rocket science if you aren’t sending up a rocket. Keep it simple.

  3. Incentivize developers to find their own bugs. Make it part of the company culture that developers do their own unit tests instead of assuming that QA and automated testing will take care of finding any issues. First, bugs are cheaper to fix the earlier in the development cycle that they are found. Second, developers who are trained to produce quality code instead of lots of code will end up producing products with less technical debt.

  4. Never release code with “known issues”. If your company is used to putting out release notes that include a list of problems/warnings for the clients, you’re in trouble. Contrary to some managers’ belief, it is not only possible to deliver bug-free code, it’s the only smart thing to do.

    Let’s qualify that. It does not mean we need to test to the standards required for space software that can’t be repaired in orbit, or for medical tools which, if they fail, will cause human harm.

    It means testing to the level that is commercially acceptable and fixing what is found with a reasonable amount of test cycles. Yes, the clients may still encounter some new issues based on different configurations and usage. Fix those in the next maintenance release.

    But we never, never find problems during testing that will impact users’ work and then deliver anyway just to meet a schedule. If we continue to hit those kinds of decision points, it means we are not scheduling enough time for adequate testing and the related fixes.

  5. Think long-term. Besides thinking about the functions and new features to be available with this release, think about how your clients will use the system long-term and build your software so they can easily implement and then upgrade from one release to another. This was overlooked during the 20th century by most vendors, particularly the big ERP solution providers. The big gorilla ERP software vendors made a large proportion of their profit from services: Implementation and upgrades. I call that spending money on no-value-added services. I think that is wrong and should/can be avoided.

    In this century, with the advent of cloud computing, software vendors are having to figure out how to make upgrades streamlined and simple. Finally. But if you are building on-premise software, or your systems need to be integrated with other systems and periodically upgraded, think about ways to make it simple for your clients. It will save you a significant amount of bug-fixing time in the long run.

    If you don’t have the developer expertise to do that, I recommend you look at building your software on a cloud platform like Salesforce that provides upgrade functions and features with the platform.

How to dig yourself out if you have Technical Debt

    What if you find yourself in a group with significant technical debt? If you’ve decided you need to get rid of your huge pile of bugs, you may find yourself spending inordinate amounts of time and effort sifting through the bug calls, listening to client complaints and trying to prioritize which bugs to fix first.

    Instead, start with a process and stick to it. The best way to get out of your hole is to stop digging.

    First you’ll need to decide if you need any major re-architecture projects. If you’re an IT team, you may need to look for a vendor solution to replace your in-house tools. Look for one from a company with good processes and clean software.

    If the problem is a huge pile of unaddressed bugs and customer issues, here’s a good approach:

    • Gather a cross-functional team with representatives from the services team (who represent the client), the developers, product management/marketing, and QA.

    • Take one pass through your open bug list and mark each bug with three identifiers:

      • Severity: How bad is this bug?
        • 1 – Critical. A client’s production system is down or data is being degraded. Must get fixed ASAP.
        • 2 – Serious. The bug is causing or will cause issues for clients.
        • 3 – Non-Critical. This is a cosmetic issue or else buried in a place where users are very unlikely to go.

      • Priority: How soon do we need to fix this bug?
        • 1 – High. Needs to be fixed ASAP.
        • 2 – Medium. Shouldn’t sit forever; fix when we can.
        • 3 – Low. Fix when there’s time. No big rush.

        Priority and Severity are only loosely coupled. A Severity = Critical bug will normally also be Priority = High unless, perhaps, the data degradation isn’t actually occurring because the client hasn’t turned on the related function. Otherwise there’s little linkage. Even a Severity = Non-Critical bug could be Priority = High if, for example, it is a misspelling that makes the company look bad.

      • Release: What release do we assign it to?

        This step assumes that, in addition to doing your regular enhancement releases or agile iterations, you are putting out regular maintenance releases. The bug fixes that are not associated with new enhancements should be assigned to release “piles”, leaving some bandwidth in each for unexpected issues.

        Assign bugs to upcoming releases based on a combination of Severity and Priority. Lower Severity and Priority items can be put into a maintenance bucket to be worked off after the more important ones are done.

        Why? Because if these are not assigned somewhere, the team continues to thrash through them, again and again. If they are assigned, even if some adjustments are needed later, we have a sense of when fixes can be made available to the user.

    • Hold weekly or twice-weekly SDRB meetings. Once or twice a week, a cross-functional team should meet to review all of the new issues. Its members should be the highest-level people responsible for each product area: for example, the QA Director, the VP of Engineering and/or product lead, and the Director of Product Management. When our company was small, even the CEO sat in periodically. We call this team the SDRB (Software Development Review Board), and it has complete power to make decisions.
      • Together we decide if the new issues are marked with the right Severity and Priority.
      • Together we assign them to a specific release.
      • There. Done.
      • Instead of being looked at and thrashed through again and again, the issues are ready for developers.

    • Train the Developers in the Process:
    • Only work on bugs assigned to the current release.
    • Never, never change the Severity or Priority or Release of a bug – only the SDRB can do that.
    • If you have a question, assign the bug to the product manager to answer it. It’s helpful if your bug tool captures comments so the product manager can respond and assign it back to the developer.
    • Note in the tool what you did to fix the problem. We also put our bug number in the code itself so later we can tell why changes were made.
    • When the code is checked into the code repository, the bug is marked “resolved”.
    • When it has been tested, the bug is marked “closed”.
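
    The three identifiers and the release assignment described in this list can be sketched in a few lines of Python. This is only an illustration, not the tool my team uses: the class names, the `assign_releases` helper, the `capacity` parameter, and the choice to sort by Priority first and Severity second are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    CRITICAL = 1      # production system down or data being degraded
    SERIOUS = 2       # causing, or will cause, issues for clients
    NON_CRITICAL = 3  # cosmetic, or somewhere users rarely go


class Priority(IntEnum):
    HIGH = 1
    MEDIUM = 2
    LOW = 3


@dataclass
class Bug:
    bug_id: int
    title: str
    severity: Severity
    priority: Priority
    release: Optional[str] = None  # set by the review board, not by developers


def assign_releases(bugs, releases, capacity):
    """Fill the releases in order, most urgent bugs first.

    `capacity` is the number of bug fixes planned per release. Set it below
    what the team could really do, to leave bandwidth for surprises.
    Anything that does not fit goes into a "maintenance" bucket.
    The sort order (Priority, then Severity) is an assumption.
    """
    ordered = sorted(bugs, key=lambda b: (b.priority, b.severity, b.bug_id))
    for index, bug in enumerate(ordered):
        slot = index // capacity
        bug.release = releases[slot] if slot < len(releases) else "maintenance"
    return ordered
```

    Once every bug carries a release, nobody needs to re-sort the whole pile at each meeting; only the new arrivals need a decision.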

    It’s best if you have a good bug tracking tool that allows you to change assignments and ask questions, capture all comments, and record history. At my company we used the in-house tool SD Tracker. That tool was the impetus for building Software 2020.

    Regardless of what tool you use, everyone needs to use it and it needs to be easy and effective. It can be done with spreadsheets – it’s just a lot harder.
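
    To make the developer rules concrete, here is a rough sketch of what such a tool has to enforce: only the SDRB changes Severity, Priority, or Release; questions go to the product manager; and a bug moves from open to resolved to closed with a history entry at each step. The `TrackedBug` class, its method names, and the role strings are hypothetical and are not taken from SD Tracker.

```python
from dataclasses import dataclass, field

TRIAGE_FIELDS = {"severity", "priority", "release"}


@dataclass
class TrackedBug:
    bug_id: int
    severity: int
    priority: int
    release: str
    status: str = "open"
    assignee: str = ""
    history: list = field(default_factory=list)

    def _log(self, who, what):
        self.history.append((who, what))

    def change(self, who, role, field_name, value):
        """Change a triage field; only the review board may do this."""
        if field_name not in TRIAGE_FIELDS:
            raise ValueError(f"{field_name} is not a triage field")
        if role != "sdrb":
            raise PermissionError("only the SDRB can change " + field_name)
        setattr(self, field_name, value)
        self._log(who, f"{field_name} -> {value}")

    def ask(self, who, product_manager, question):
        """Hand the bug to the product manager with a question attached."""
        self.assignee = product_manager
        self._log(who, f"question for {product_manager}: {question}")

    def resolve(self, who, note):
        """Called when the fix is checked into the code repository."""
        if self.status != "open":
            raise ValueError("only an open bug can be resolved")
        self.status = "resolved"
        self._log(who, f"resolved: {note}")

    def close(self, who):
        """Called when the fix has been tested."""
        if self.status != "resolved":
            raise ValueError("only a resolved bug can be closed")
        self.status = "closed"
        self._log(who, "closed after test")
```

    A spreadsheet can hold the same columns, but it cannot refuse a change or record who made it, which is much of why it is harder.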

The “Voice” of the Customer

A few months after the story in my last post, “The Need for a Process”, my customer, the Lt. Colonel, arranged a meeting where he brought some of his soldiers to talk about what functions and features they wanted added or changed in the current software. The software was not used on the front-line. It was back-line communications software. The hardware consisted of a PDP-11 which captured the feeds/signals and a larger VAX/VMS system to analyze and process the results.

One young Army guy got up to talk about the list of change requests and why they were needed. He listed changing the queuing algorithm as a top priority. Hmm. It seemed to me that there were a lot of other items on the list that deserved more attention.

“Let me give an example,” said the Army guy.

“One day we were monitoring communications when all of the front-line systems were attacked. Everything went off-line. The PDP-11 also went down. We brought it back on-line and it started processing messages from the queue.” (This was Desert Storm, remember.)

“A half-hour later it got to the last message that had been posted on the queue. ‘Incoming SCUD!’ It would have been really nice,” he said, pausing for the irony to sink in, “if that had been the first message we received instead of a half-hour later.”

Incoming SCUD? Ah, input received. While the system wasn’t designed for the front line, it had to fill the role when front-line systems went down. Wow. OK then. Priority 1! Change the queuing algorithm!
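
I won’t reproduce the actual change here, and the details below are not from that project. But the general idea, that an urgent message should jump ahead of the routine backlog instead of waiting its turn in arrival order, can be sketched with a priority queue. The `MessageQueue` class, the two urgency levels, and the message texts are invented for the illustration.

```python
import heapq
import itertools

URGENT, ROUTINE = 0, 1  # the lower number is served first


class MessageQueue:
    def __init__(self):
        self._heap = []
        # The counter keeps arrival order among messages of the same level.
        self._counter = itertools.count()

    def post(self, text, level=ROUTINE):
        heapq.heappush(self._heap, (level, next(self._counter), text))

    def next_message(self):
        _level, _order, text = heapq.heappop(self._heap)
        return text

    def __len__(self):
        return len(self._heap)


queue = MessageQueue()
queue.post("status report 1")
queue.post("status report 2")
queue.post("incoming missile warning", level=URGENT)

# The warning was posted last but comes out first.
first = queue.next_message()
```

With a plain first-in, first-out queue the warning would have come out third, behind both status reports.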

Sometimes it isn’t clear why clients would want changes and enhancements. It certainly helps to understand your customer!

The Best Reward

While I worked with the Lt. Colonel, I tried hard to understand things from his perspective. It made sense how upset he was with our “perceived” software coverup when one realized he was on his way the next day with the software to Desert Storm and his soldiers depended on our software. That’s a lot of pressure.

A few years later I accepted a job at ASK/Ingres, leaving Ford Aerospace after almost 20 years. The Lt. Colonel was in town for a review meeting and was told that I had resigned. Before our meeting, he caught me in the hall.

One of the initial tasks on his project before I’d come on board was to convert the software from the Ingres database to Oracle. He stopped me in the hall to say he was glad we had a chance to work together and said “Maybe we went the wrong way” (meaning from Ingres to Oracle instead of vice versa, since I had chosen to work at Ingres). Having this stalwart Army customer believe in my decisions was one of the best indications that I had been hearing the “Voice of the Customer”.

Where there’s Smoke, there’s Fire

Last month, Northern California was ablaze! 1200 wildfires burning – most due to dry lightning, some unfortunately from arson. We went out anyway, spending weekends dawdling on the delta, in an anchorage with our powerboat tied up next to our friends’ big new sailboat, the sky smoky, the sun reddish through the haze. We should probably have been inside a house with the windows closed for our health! It was apparent to everyone in Northern California that we have both smoke and fire here.

But sometimes the smoke is hard to see even when there’s fire a-brewing. Maybe the smoke is just a wisp, or the fire is smoldering under wet ashes and everyone thinks it has been extinguished.

It’s similar with software management. How can you be sure you aren’t missing the signs of your project going astray? Problems brewing under the surface.

The best way is to focus on your customers and how they perceive your company and your product. The old adage “The customer is always right” is a good barometer for gauging how your company and your product are doing. Is smoke spewing while you are not paying attention, or is the fire smoldering, hidden beneath the surface?

Almost every company “says” they are focusing on the customer, and usually decisions are being made in what is perceived to be the best interest of the customer. But often we think we are working for the customer’s benefit while missing some key points. Are we focusing on only one aspect of what they want yet not delivering what they really need?

For example, how often have you heard a project team say “The customer’s schedule for delivery is July 7th, so we had to freeze the design last week to meet their schedule.” But is the design that was frozen going to meet their needs? Is the code that is being delivered going to best solve their problem? What if the design team was stymied by how to meet the requirements? Should the team just go ahead and freeze the design, code, and then deliver just to meet the schedule? Where is the trade-off between quality and schedule?

Maybe delivering whatever we can in the required timeframe avoids the big explosion, the blow-up that would occur if the project manager had to tell the client that the team can’t meet the schedule. But it doesn’t change the fact that there are smoldering embers underneath the ashes, and eventually when the wind blows (when the customer starts doing their final testing) those smoldering embers will erupt in flames. When the software is delivered but doesn’t meet the requirements, there will be fire.

The best managers will be willing to take the heat and tell the customer up-front if the team can’t meet the schedule. Of course, the underlying problem – why the team thought they could meet the schedule but then missed their target – needs to be examined and rectified so the problem doesn’t happen again. But if the team typically estimates well and is able to perform, and one project hits a snag, then the customer isn’t served by focusing on only one element of the delivery, the schedule. Quality always has to come first. In this case, “quality” means delivering software which meets the agreed-upon requirements, requirements that truly meet the need from the customer’s perspective.

Project Managers need to continually scan the horizon for smoke that indicates a fire about to erupt. A project manager who declares milestones complete without actually completing the work is always a sure danger sign. Schedule versus quality is just one example, but one that seems to occur far too often in real life.

Fuzzy Peas

We lived in North Carolina many years ago (our youngest daughter was born there).  On weekends we liked to take rides in the car, my husband and I and our oldest daughter, then two.  We’d go into the mountains, visit the furniture stores, or drive off to the seaside.  One of our favorite places was Asheville, NC – in the foothills of the Blue Ridge Mountains.  It was lovely there in June – just the right temperature.  Not too humid.  Enough elevation to get away from the early summer heat.

There was a great big old house in Asheville that looked like it could have been a plantation or stately manor.  It had a huge porch all the way around the house.  The owners had turned it into a restaurant and the porch now had picnic tables set for visitors.  We went there the first time with our 2-year-old daughter and were seated at one of the picnic tables overlooking pine trees.  There were no menus but the table was already set with plates and dinnerware.  Shortly a woman came to our table and said “Today we’re having chicken and pork.  Would you care to stay for dinner?”  We weren’t sure what that meant, having expected a menu so everyone could order their choice, but we wanted to find out what this Southern option was.  So we agreed, and soon large bowls of roasted chicken, pork, boiled potatoes, and black-eyed peas were brought to our table – Southern cooking, family style.  I’d never had black-eyed peas before.

Years later I attended a software management lecture by a man from Tennessee who, with his very Southern accent, talked about the “fuzzy P’s”.  I initially thought he was referring to those black-eyed peas from the South.  But no.  He was referring to the 3 P’s that drive a software project:  People, Plan, and Product.  “People” are the number of heads you can put on the project.  And while you can’t gather 9 women and produce a baby in a month, there are some impacts that can be made if the right resources are allocated to the right schedules.  “Plan” is the schedule – moving the schedule in or out is an obvious choice and one of the ways a manager can affect the end result.  And “Product” refers to how much product (how many changes, bug fixes, enhancements) is included in that release or that service delivery to the customer.  Remove some features, save some time.

The three “P”s can be adjusted to affect the end result.  But that’s it.  Those are the only viable axes in the three-dimensional world of software that can be controlled and still produce a good product.  The axes are linked: one cannot be shortened without impacting the others.  If a software schedule can’t be met, then either more people are needed or fewer changes, enhancements, and fixes can get into the delivery.

Usually CEOs want it all – they want the product with all the specified features in the timeframe they want it using only the resources that fit their budget.  But if the three axes don’t align, something’s got to give.  And it’s the software manager’s job to juggle the axes – more people here, less product there.  But CEOs push back and too often software managers try to appease them and agree to accept the dictated schedule with the resources allocated and all the specified product features. 

And there’s only one result – the hit is on quality.   When there aren’t enough resources to do the job right, quality suffers.  It isn’t always apparent to the CEO.  Perhaps the team even thinks they are doing a good job by delivering the product and making the milestones.  There’s a big party to celebrate the release, and everyone is congratulated.  But it’s the customers that will be impacted when they encounter the bugs that ultimately will result. 

And ultimately this approach will affect the bottom line.  It’s another well-known software rule that bugs found by a customer are 1000 times more costly to fix than bugs found during the design phase.  Say a bug found during design (at that point, really an issue or problem) takes a few minutes and costs $10 to fix.  If it isn’t found in design but rather during coding, it will take a couple of hours, or $100, to fix.  If found during the QA cycle it costs $1,000 to fix (develop, re-test).  And the same issue found by the client costs $10,000 to fix.  Measure it.  It’s a fact.  Issues found by clients need to go back to the design, they impact code, the changes are likely to cause other issues, and QA needs to be re-done.  Manuals updated.  Other clients notified.  It’s a very expensive proposition.  Not only that, it affects the customers’ perception of the company and its software.
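
The arithmetic behind that rule is simply a factor of ten at each step. Here is a small sketch using the dollar figures above; the phase names and the `cost_to_fix` function are just labels for this example, and the $10 starting point is the illustrative figure from the paragraph, not a measured value.

```python
PHASES = ["design", "coding", "QA", "client"]
DESIGN_COST = 10  # dollars, the illustrative figure used above
FACTOR = 10       # each later phase costs ten times the one before


def cost_to_fix(phase):
    """Cost of fixing one issue, depending on the phase where it is found."""
    return DESIGN_COST * FACTOR ** PHASES.index(phase)


for phase in PHASES:
    print(f"{phase:>6}: ${cost_to_fix(phase):,}")

# Three steps of ten from design to client give the factor of 1000.
ratio = cost_to_fix("client") // cost_to_fix("design")
```

Plug in your own measured cost for a design-phase fix and your own factor; the shape of the curve is the point.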

So why isn’t quality the primary focus since it’s the most expensive error to make?  It’s the fourth fuzzy P.   “Perception.”  As long as the CEO “perceives” that the product is going out regularly, that everything is on track, managers are rewarded and all’s well.  Or seems to be.   But letting quality slide is a slippery slope.  If no one is tracking the overall quality metrics, quality can slide without anyone noticing until the product has degraded to the point the customers rebel.  Take the Microsoft operating system years ago, where the blue screen of death was the well-known scenario.

Bottom line – the trade-off should never be quality.  Good software managers need to watch their P’s and their Q.

As Time Marches On – Use Metrics

It’s March already.  As days, months, and years pass by, often we just move ahead, one step after another, and don’t lift our heads up to see if we’re going in the right direction or what progress we’ve made.  Periodically we need to stop, step back, and assess our progress and how we’re doing.  True in life, true in software companies.

Sometimes in a software company, all organizations are hard at work but something is amiss.  In one software company, the technical support team was feeling that the customer’s needs weren’t getting addressed yet all of the product organizations were working hard, producing new releases with client-requested enhancements, and regularly issuing standard bug fix maintenance releases.  All of the orgs felt they were busy and overworked but that the product and quality were on track.  But by using metrics, they were able to assess the real status.

Metrics were evaluated about the number of customer calls currently being reported that were product bugs or other product issues versus the number one year prior and two years prior.  The metrics included turn-around time to get the issue resolved. 
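
As a rough sketch of that kind of comparison, assume each customer call is recorded with the date it was opened and the date it was resolved. The function below counts calls per year and averages the turn-around time. The record layout and the `yearly_metrics` name are assumptions for the example, not the report that company actually ran.

```python
from collections import defaultdict
from datetime import date


def yearly_metrics(calls):
    """Return {year: (number_of_calls, average_days_to_resolve)}.

    Each call is an (opened, resolved) pair of dates. `resolved` may be
    None for a call that is still open; such a call counts toward the
    total but not toward the average.
    """
    counts = defaultdict(int)
    durations = defaultdict(list)
    for opened, resolved in calls:
        counts[opened.year] += 1
        if resolved is not None:
            durations[opened.year].append((resolved - opened).days)

    result = {}
    for year in sorted(counts):
        days = durations[year]
        average = sum(days) / len(days) if days else None
        result[year] = (counts[year], average)
    return result
```

Laying the current year next to the two before it shows at a glance whether call volume and turn-around time are drifting upward.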

What was clear from the metrics was that the number of bug reports had been steadily increasing as new clients bought and installed the software and existing clients steadily upgraded to the newer releases.  In parallel, several new projects were underway, stretching the bandwidth of the product marketing, development and QA orgs.  So instead of trying to quickly fix all newly reported issues as they came in, which had been the process in prior years, fixes were being pushed out to maintenance releases two, three, or more months in the future instead of the next planned release, in order to reduce the workload on the developers and QA.

As a result, more clients were finding related product issues and more issues were being escalated.  To appease the clients who complained the loudest and wouldn’t wait for the future releases, those clients were sent one-off class files, tested only by the support organization instead of QA.   If multiple clients needed the change in different releases, the developers zipped up sets of fixes.  Then confusion ensued about which client had what file, and instead of easing the load, this new degraded process was actually increasing the amount of work due to more calls and more one-off fixes.  And as a result, the overall product quality was impacted, causing more client frustration.   Compared with prior years, when bugs were immediately categorized and important issues quickly fixed, there were now too many fire drills and much confusion.

Metrics in this case uncovered both the negative quality trend and the underlying cause.  But there is a right way and a wrong way to use metrics.  A company can recognize metrics used in the wrong way when employee behavior is affected in non-useful ways.  For example, one company used metrics to measure their Technical Support response time and rewarded the techs for maintaining 90 percent first-customer-contact turn-around time in less than four hours.  The TS metrics looked great.  But in reality, when the techs received an automated call from a client, they would place their return call during the lunch hour or just after the company closed, raising the probability that they could simply leave a voice message.  That way they responded to the call within 4 hours without having to spend time discussing the call or resolving the problem, which could tie them up and make them miss another client’s 4-hour call window.  As a result, clients were not talking to a human for one or two days, or even up to a week, and were playing “telephone tag” and getting frustrated.

In another company, a percentage of each developer’s merit plan was based on low bug count.  But often issues reported by users as “bugs” were in reality items that were never spec’d or were spec’d incorrectly.  So a lot of conflict resulted, and arguments arose between the development org and support (“It is a bug.”  “No, it isn’t a bug.”).  Team members became opponents, which created organizational silos and mistrust.  Once the underlying issue was realized, the process was changed and a new Tracker category, separate from “bug” or “enhancement”, was created to denote a design flaw or spec bug.  This allowed the Technical Support team to push the point that the issue was a bug in the client’s eyes and thus get the problem resolved in a maintenance release rather than wait for the yearly enhancement releases.

It also correctly removed the “blame” from the development organization, since the issue wasn’t caused by a coding or process error like a real bug would be, and the correct metric was then being used to measure developer performance.  The finger-pointing and arguments ceased, silo walls came down, and the product organizations coalesced into a supportive, cohesive team.

It’s easy to maintain status quo – to march along without noticing the slow and gradual deterioration of quality and effective processes.  But by stepping back periodically and reviewing key metrics, teams can make sure they are working effectively and efficiently.

PS:  Make sure you have measurable metrics.  Use Tracker to track Calls, Bugs, Enhancement requests and more, so the metrics are at your fingertips when you need them later.