BEG for Accountability

Ever been in one of those meetings? Your customers and Technical Support team are yelling about a hot bug that needs to be fixed right away.

The developers are saying “That’s not a bug – we didn’t do anything wrong. Look, the code does exactly what it’s supposed to do. Look at the spec!”

The Product Management lead says “No one has ever wanted the code to do THAT before. We never designed for that scenario. That’s not a bug – it’s a new feature. The customer will have to wait for an enhancement release. We can’t put enhancements in a maintenance release.”

“But”, argues Tech Support, “the customer is saying that’s a common business practice. That our software should already be doing it and without a code change they will be getting the wrong information and can’t rely on the product. It’s a big deal.”

What to do? Customers don’t want enhancements in their ‘bug fix’ maintenance releases – mixing them in makes it hard to upgrade quickly and safely. On the other hand, is it really an enhancement if what’s being asked for is something customers would expect the product to do already? Yet it wouldn’t be right to classify it as a bug – a bug means the developers made a mistake, and it counts against their metrics.

There’s a simple solution – add a third classification to your tracking system.

  • Bug for true defects – code that doesn’t match the spec.
  • Enh for enhancement requests – new features to be spec’d.
  • Gap – Add a new classification for the in-between customer requests. We call them “Gaps”. They are holes in the spec, gaps that the Product Manager forgot or didn’t have enough information about to add. But they are items customers consider a bug.

Having a third classification lets all of the groups agree on how to position customer complaints/calls without having arguments.

  • It’s in the spec but not the code – it’s a bug.
  • It’s not in the spec and definitely a new feature/request – it’s an enhancement request.
  • It’s in-between? Those are the ones everyone used to fight about. Is it a bug, so it gets a top-priority fix? Or an enhancement that waits for the next major release? If neither, it’s a gap. It can go in a maintenance release or an enhancement release. It wasn’t the developer’s fault – the code is correct – so Engineering and Tech Support don’t need to battle it out. It’s simply a gap.
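The decision rules above are mechanical enough to sketch in code. Here’s a minimal illustration in Python – the function, its flags, and the label values are hypothetical, not taken from any actual tracking system:

```python
from enum import Enum

class Classification(Enum):
    BUG = "Bug"          # it's in the spec but not the code
    ENHANCEMENT = "Enh"  # not in the spec, and genuinely a new feature
    GAP = "Gap"          # not in the spec, but customers expect it anyway

def triage(in_spec: bool, code_matches_spec: bool, customer_expects: bool) -> Classification:
    """Apply the BEG decision rules to a customer complaint."""
    if in_spec and not code_matches_spec:
        return Classification.BUG
    if not in_spec and not customer_expects:
        return Classification.ENHANCEMENT
    # in-between: a hole in the spec the Product Manager missed
    return Classification.GAP

print(triage(in_spec=True, code_matches_spec=False, customer_expects=True).value)  # Bug
```

The point of the sketch is simply that the three-way split leaves no ambiguous middle ground to argue over in a review meeting.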

Bug/Enh/Gap – BEG. Having three classifications stopped the arguments and clarified who was responsible for each fix. It also clarified the underlying cause of each issue, which aided our continuous process-improvement efforts.

In addition, when only true bugs – real developer mistakes where the code doesn’t match the spec – are classified as bugs, developers take more responsibility for fixing them, because other items (spec holes, missing specs, new features) are no longer incorrectly classified as bugs. Having three classifications encourages accountability, and accountability encourages better teamwork.

You won’t have to “beg” any more.

How to Document software in the 21st Century – Part 1

Something has been bothering me for a long time. I wrote “Smooth Sailing” a few years ago when I was astonished that people didn’t recognize the value of good documentation – moreover, they thought it wasn’t possible. I’d used DOORS successfully for years. Having developers work against a spec, give feedback, and develop the spec and code together collaboratively, then having QA test against the spec, assured a final spec that matched the code – one Tech Support could use to know, decisively, how the code was supposed to work.

That let us clearly classify customer complaints as Bugs (don’t match the spec), Enhancement Requests (not in the spec), or “Gaps” (things that, in the customer’s eyes, should have been in the spec but were missed – or spec ambiguities). That saved us a lot of discussion time in our SDRB (Software Design Review Board) meetings, where we reviewed the issues clients and internal support people had with the product. Bugs were things developers missed – the onus was on the Development Managers. “Bad Developers”. Enhancements were things we put in the next major release. Obviously not negotiable for maintenance releases. Gaps were negotiable, but there was no blame on the developers – so yep, good idea, let’s move on it. Meetings were clear. No contention.

So why didn’t everyone want good, ‘as built’ specs?

I was surprised when some companies thought they had ‘as built’ specs – only to find that what they were referring to were documents an external tech-writer group wrote after the software was released. Those aren’t “as built” specs so much as “what we think was built” specs. Since the developers didn’t code to them and QA didn’t test to them in the initial release, they are always suspect. I question whether any company devotes enough time and money to guarantee that kind of ‘after-the-fact’ documentation is correct.

Whereas it’s so clean and clear if the spec is used as part of the development process and then as part of the QA process that it “IS” the “bible” of the code. How the code was developed. What the code does do. If you have specs like that, you know what’s in the code. Tech Support knows what the product does. Everything is clean. Clear.

But I’ve come to realize there are roadblocks in companies that keep this from being as easy as I assumed. There are real reasons why people feel it’s hard to maintain true as-built DOORS-type specs. As a manager, I was willing to do the extra work to have as-built DOORS specs because I saw the value. But not everyone is as dedicated to the end result. Still, I think I have a solution and am working on it. It’s very cool.

There has to be a better way.

I think there is … stay tuned. We’re calling it Software 2020 – a software tool for the 21st century “:-)”

The Need for a Process

In an Agile class this week, someone asked the question “How can you be sure the sprint testing covers all code changed? Developers often make a change in some different area of code to fix something and testers don’t know to test that area at all since it wasn’t part of the sprint.”

Huh?

That’s a good example of a team that needs a good process. And he wasn’t the only one in the room waiting for an answer. I learned the answer to that question in the early ’80s. And the lesson was a hard one…

I had just been assigned as manager of the Electronic Defense Systems software department at Ford Aerospace. It wasn’t my first management job, but it was my first position responsible for government customer deliverables rather than internal projects like developing in-house CASE tools or complex algorithms to help the engineers building satellites and ground stations. The five software sections I would manage were each working on separate contracts for different arms of the military. I had met the five supervisors in my new department and their developers, but hadn’t officially started work, when my boss, Sharon, who headed all software engineering for Ford Aerospace, came to me. The customer for one of my new contracts, an Army Lt. Colonel, was going to give awards for completing the testing of a release, and she wanted me to come to the awards ceremony to meet him and see the team get their awards.

It was about 5 PM as we walked across the campus. But as we started up the steps to enter the building, the doors opened and young men and women – the developers and testers – came streaming out and down the front steps, some with tears in their eyes, others in outright sobs.

“What’s going on?” asked Sharon.

They couldn’t answer and ran by us. The company’s General Manager was hurriedly entering the building. He saw Sharon and said sternly “I was called to a meeting with the Lt. Colonel. You’d better come with me.”

We three went upstairs into a warehouse room where an impromptu meeting was being held. The head of Systems Engineering, Ray (Sharon’s boss), was already there with the Lt. Colonel – a tall man in full Army fatigues, red flat-top hair, with a face as red as his hair. He was livid.

“I normally only give companies one chance. But for you,” he said, glaring at our GM, “this is the third strike.”

He saw me and said “Who’s this?”

Sharon told him I was the manager of the software group. He glared at me and said “How are you going to remedy these problems?” Sharon, to my relief, quickly let him know it was my first day as manager. He directed us to find out what was wrong and get it fixed that night, or Ford would never have another contract with his division. But, he said, if we wanted to re-run the test scenarios, he was willing to spend the night there – he had to be on a plane with the verified software by 8 AM. Sharon said we’d find the problem and run the tests.

We went to the supervisor’s office. The supervisor – a tough, seasoned ex-Marine used to conflict and discord – was also crying and said “This is it. I’m through. I quit.”

“What happened?”

“We started the test scenarios and the first test was simply adding a new user. There was a system error. The SETA (Systems Engineering and Technical Assistance) contractor quickly jumped to the conclusion that the code wasn’t complete and we must be hiding problems and issues. The Lt. Colonel blew up, yelled at everyone, and they left in tears. I don’t understand it. We worked long and hard on this release and the tests we ran were perfect.”

A senior programmer, one of Ford’s best, appeared in the doorway. “Excuse me,” he stammered. “I’m afraid I know what is wrong.”

It seemed that he, at the last minute, had noticed a small issue in the software. Although he was well aware that the process was to notify the Configuration Manager (CM) before changing anything – and this change wasn’t even needed for this delivery – he thought it was just a simple, quick fix. Unfortunately, he failed to realize that the code was shared by a newly implemented user-registration function, and his change broke that function. He had since reverted the code to the prior version and was sure it would now pass the tests.

The team was rounded back up, and all were more than willing to spend the night re-running the tests to exonerate themselves. By morning the tests had all passed with flying colors, and at 7 AM the Lt. Colonel left, the awards to be passed out to the team later in the day.

Ironically, the senior developer who caused the issue was the one scheduled to travel with the Lt. Colonel to install the software in Saudi Arabia. The Lt. Colonel seemed to take pleasure in telling him before they left that he’d need to shave his beard. “What?” the developer complained. “Well, it’s up to you, but the gas masks don’t fit well over beards.” This was Desert Storm. He shaved.

A good process needs to be both easy to follow and so integrated into the workflow that it’s second nature. It isn’t a burden, just a simple step in the normal activities. And developers have to “buy in” that it’s required, important, and a step that cannot ever, ever be skipped. The issues caused may not be as significant as the story above, but if changes are made to code without a process, there will be issues. This is true for all projects – from major defense software down to little start-ups.

The process we implemented at Azerity made our SD Tracker tool the tracking system for every change. Every change to our requirements docs or code needed to be an “SDR” in SD Tracker. Adding an SDR was quick and easy: if a developer wanted to change code for any reason, all they needed to do was enter an SDR saying so. Of course, the development managers had guidelines about when and which developers could act immediately on a desire to change other areas of code, and when reviews or other conditions needed to be met. Then QA knew what needed verification, and we could confidently tell our clients that our Release Notes listed all areas of code change – which helped reduce their SOX testing on maintenance releases.
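One lightweight way to make a rule like this “second nature” is to enforce it automatically – for example, a pre-commit check that rejects any change that doesn’t cite a tracking entry. The sketch below is purely illustrative; the “SDR-1234” format and the check itself are assumptions, not Azerity’s actual tooling:

```python
import re

# Hypothetical convention: every commit message must cite the SDR that
# authorized the change, e.g. "SDR-1042: fix rounding in quote totals".
SDR_PATTERN = re.compile(r"\bSDR-\d+\b")

def commit_allowed(message: str) -> bool:
    """Accept a change only if its message references a tracked SDR."""
    return bool(SDR_PATTERN.search(message))

print(commit_allowed("SDR-1042: fix rounding in quote totals"))   # True
print(commit_allowed("quick fix, not needed for this delivery"))  # False
```

Wired into a version-control hook, a check like this would have caught the senior programmer’s “simple quick fix” in the story above before it ever reached the customer’s test scenarios.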

Fuzzy Peas

We lived in North Carolina many years ago (our youngest daughter was born there). On weekends we liked to take rides in the car – my husband and I and our oldest daughter, then two. We’d go into the mountains, visit the furniture stores, or drive off to the seaside. One of our favorite places was Asheville, NC, in the foothills of the Blue Ridge Mountains. It was lovely there in June – just the right temperature. Not too humid. Enough elevation to get away from the early summer heat.

There was a great big old house in Asheville that looked like it could have been a plantation or stately manor. It had a huge porch all the way around the house. The owners had turned it into a restaurant, and the porch now had picnic tables set for visitors. We went there the first time with our two-year-old daughter and were seated at one of the picnic tables overlooking pine trees. There were no menus, but the table was already set with plates and dinnerware. Shortly a woman came to our table and said, “Today we’re having chicken and pork. Would you care to stay for dinner?” Not sure what that meant, having expected a menu so everyone could order their choice, we wanted to find out what this Southern option was, so we agreed. Soon large bowls of roasted chicken, pork, boiled potatoes, and black-eyed peas were brought to our table – Southern cooking, family style. I’d never had black-eyed peas before.

Years later I attended a software management lecture by a man from Tennessee who, with his very Southern accent, talked about the “fuzzy P’s”. I initially thought he was referring to those black-eyed peas from the South. But no. He was referring to the three P’s that drive a software project: People, Plan, and Product. “People” is the number of heads you can put on the project. And while you can’t gather nine women and produce a baby in a month, real impacts can be made if the right resources are allocated to the right schedules. “Plan” is the schedule – moving the schedule in or out is an obvious choice and one of the ways a manager can affect the end result. And “Product” refers to how much product (how many changes, bug fixes, enhancements) is included in that release or service delivery to the customer. Remove some features, save some time.

The three P’s can be adjusted to affect the end result. But that’s it. Those are the only viable axes in the three-dimensional world of software that can be controlled and still produce a good product. And the axes are linked: if one can’t move, the others must. If a software schedule can’t be met, then either more people are needed or fewer changes, enhancements, and fixes can go into the delivery.

Usually CEOs want it all – they want the product with all the specified features in the timeframe they want it using only the resources that fit their budget.  But if the three axes don’t align, something’s got to give.  And it’s the software manager’s job to juggle the axes – more people here, less product there.  But CEOs push back and too often software managers try to appease them and agree to accept the dictated schedule with the resources allocated and all the specified product features. 

And there’s only one result – the hit is on quality.   When there aren’t enough resources to do the job right, quality suffers.  It isn’t always apparent to the CEO.  Perhaps the team even thinks they are doing a good job by delivering the product and making the milestones.  There’s a big party to celebrate the release, and everyone is congratulated.  But it’s the customers that will be impacted when they encounter the bugs that ultimately will result. 

And ultimately this approach affects the bottom line. It’s a well-known software rule that bugs found by a customer are 1,000 times more costly to fix than bugs found during the design phase. If a bug found during design (at that point, just an issue) takes a few minutes – say $10 – to fix, the same bug found during coding costs a couple of hours, or $100. Found during the QA cycle, it costs $1,000 (fix, re-develop, re-test). And the same issue found by a client costs $10,000. Measure it. Issues found by clients go back to the design, impact code, the changes are likely to cause other issues, and QA needs to be redone. Manuals updated. Other clients notified. It’s a very expensive proposition. Not only that, it affects the customers’ perception of the company and its software.
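The escalation described above is a rule of thumb – roughly a tenfold cost increase for each phase a bug survives. A small sketch, with the dollar figures purely illustrative as in the text:

```python
# Rule of thumb: fixing a bug costs ~10x more for each phase it survives.
PHASES = ["design", "coding", "QA", "customer"]
BASE_COST = 10  # dollars to fix an issue caught during design

def fix_cost(phase: str) -> int:
    """Estimated fix cost, growing 10x per phase the bug goes undetected."""
    return BASE_COST * 10 ** PHASES.index(phase)

for phase in PHASES:
    print(f"{phase:8s} ${fix_cost(phase):,}")
```

The exponential shape, not the exact dollar amounts, is the point: catching issues one phase earlier pays for a great deal of review and testing effort.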

So why isn’t quality the primary focus, since it’s the most expensive error to make? It’s the fourth fuzzy P: “Perception.” As long as the CEO “perceives” that the product is going out regularly and that everything is on track, managers are rewarded and all’s well. Or seems to be. But letting quality slide is a slippery slope. If no one is tracking the overall quality metrics, quality can slide without anyone noticing until the product has degraded to the point that the customers rebel. Take the Microsoft operating system years ago, where the blue screen of death was the well-known scenario.

Bottom line – the trade-off should never be quality.  Good software managers need to watch their P’s and their Q.

No Ordinary Moments

Satellite Launch
We had the opportunity last month, as my husband’s retirement gift, to attend the launch of his company’s latest satellite at Cape Canaveral in Florida (Kennedy Space Center). Since I spent the first 20 years at the same company working on satellites, it was a thrill for both of us. We’d never seen a launch before.

This was a special satellite – the largest ever launched at the Cape, with solar array panels that would unfurl to the size of a basketball court and a flexible antenna that would spread 40 feet, large enough to provide satellite communications for the entire US.

But even more special was the fact that this satellite, unlike the other satellites my husband’s company builds – communications satellites for various countries or for Intelsat, weather satellites for the weather service – was commissioned by a small start-up company. Only 50 people, all part owners of the company. And so they all came to watch the launch with their spouses and small children. Their entire company rested on a successful launch, and the riskiest part of a satellite launch is from the launch pad to orbit: the rocket exploding on the pad, an explosion as it is hurled into space, or a failure during the separation stage. Satellites take three years to build. Even though insured, a replacement satellite would take at least two years. A big risk for a start-up ahead of the competition with new technology and leading-edge ideas.

And so it was, with breath held, that everyone watched from the balcony of the observation platform as the rocket’s engines began to spew billowing smoke and, with a roar, the huge weight rose from the launch platform. A cheer went up from the crowd, and everyone hurried to the monitors to watch the rocket perform its roll and booster-separation stages. Mission control provided ongoing updates on the status of the satellite – 5,000 miles above the earth, then 10,000 – until, 30 minutes later, it reached its destination 19,400 miles above Australia, where the satellite module performed its final separation from the rocket, free to use its own thrusters to rise to its final elevation and begin to unfurl its solar arrays. It would be several weeks until it was completely positioned, with the antenna unfurled and ready to transmit, but the high-risk part was over and everyone could breathe a sigh of relief. A successful launch! It was a thrilling event to attend.

One of the company’s Board members said “I’ve been on the Board of many start-ups. But never in my life have I had the experience where so much rested on 30 minutes.”

But was he right? True, the risks and rewards of that 30-minute window were very apparent. But are other time periods, other moments, of lesser importance? In Dan Millman’s “Peaceful Warrior” books, Dan’s teacher Socrates drives home the lesson that there are no ordinary moments. Socrates teaches awareness of your every movement and appreciation of every task – that the more we are able to live in the moment, the more we get from our lives.

How does this apply to companies and managers? It’s surprising how often companies have no real direction or, worse, no sense of urgency. Most companies think they have both, yet an outsider can easily see that they are not moving forward but rather in a circle. March’s blog talks about the use of metrics to measure real progress. But you can recognize moving in a circle in other ways. How often have you been in a meeting and realized that the same meeting was held six months or a year ago, with the same resulting list of goals or the same decision – yet people walked out of that meeting and months later there was no action? Often we move along in a daze – much activity but very little progress toward the goal – without recognizing it. Measuring progress with metrics is a way to tell how you’ve done. But to really be effective, one must focus on the small details, all of the pieces – each moment where a decision, or a lack of action, can make the difference between success and failure. So when we think it’s OK to just do the minimum required to meet a schedule – that as long as we deliver something on time, even though we know it has a minor quality problem here or a known issue there, we’re fine – we are saying those are the ordinary actions most companies take, and they squeak by, so why shouldn’t we? Why keep trying for perfection when we can get by with less?

A 30-minute launch is undeniably spectacular, but let’s not forget that moments that seem ordinary can have a huge impact downstream. If we could recognize how special each moment is and act accordingly – wow. Wouldn’t that be spectacular.

Carrying the Torch

As the Olympic Torch heads toward San Francisco, and what should be an event that joins nations together has instead been mired in controversy, I think of how often one person carrying a torch can bring to light issues and needs that would otherwise remain in the dark.

In a company, torch bearers are needed in every organization.  It’s easy to get into a rut of complacency, doing your job day after day.  Often no one notices the slow deterioration of quality or effectiveness.

Another way of saying it may simply be that no one is “watching the ship.”  Some might argue “Isn’t that just a normal part of good management?”  But to that I’d respond “Yes.  But…”

Yes.  It’s true that good managers continually monitor and measure performance.  But after 30 years in the software business I can also say it’s common for even good managers to get focused on the wrong metrics.  Or to focus on fighting fires while the performance and process metrics go by the wayside.  Or to focus on following the company mantra and miss the signs that indicate real, underlying trouble.

March’s blog talked about the use of metrics to identify and quantify changes in effectiveness over time.  But often the message the metrics are elucidating goes unnoticed unless people are ready and willing to carry the torch and help the decision makers and top-level management see the light.

As Time Marches On – Use Metrics

It’s March already.  As days, months, and years pass by, often we just move ahead, one step after another, and don’t lift our heads up to see if we’re going in the right direction or what progress we’ve made.  Periodically we need to stop, step back, and assess our progress and how we’re doing.  True in life, true in software companies.

Sometimes in a software company, all organizations are hard at work but something is amiss.  In one software company, the technical support team felt that customers’ needs weren’t getting addressed, yet all of the product organizations were working hard, producing new releases with client-requested enhancements and regularly issuing standard bug-fix maintenance releases.  All of the orgs felt busy and overworked but believed the product and quality were on track.  By using metrics, they were able to assess the real status.

The metrics compared the number of customer calls currently being reported as product bugs or other product issues against the numbers one year and two years prior.  The metrics also included the turn-around time to get each issue resolved.

What was clear from the metrics was that the number of bug reports had been steadily increasing as new clients bought and installed the software and existing clients steadily upgraded to the newer releases.  In parallel, several new projects were underway, stretching the bandwidth of the product marketing, development, and QA orgs.  So instead of quickly fixing all newly reported issues as they came in, which had been the process in prior years, fixes were being pushed out to maintenance releases two, three, or more months in the future instead of the next planned release, in order to reduce the workload on the developers and QA.  As a result, more clients were finding related product issues and more issues were being escalated.  To appease the clients who complained the loudest and wouldn’t wait for future releases, clients were sent one-off class files, tested only by the support organization instead of QA.  If multiple clients needed the change in different releases, the developers zipped up sets of fixes.  Then confusion ensued about which client had which file, and instead of easing the load, this degraded process was actually increasing the amount of work due to more calls and more one-off fixes.  As a result, overall product quality was impacted, causing more client frustration.  Compared with prior years, when bugs were immediately categorized and important issues quickly fixed, now there were too many fire drills and much confusion.

Metrics in this case uncovered both the negative quality trend and its underlying cause.  But there is a right way and a wrong way to use metrics.  You can recognize metrics used the wrong way when employee behavior is affected in non-useful ways.  For example, one company used metrics to measure its Technical Support response time and rewarded the techs for maintaining a 90 percent first-customer-contact turn-around time of less than four hours.  The TS metrics looked great, but in reality, when the techs received an automated call from a client, they would place their return call during the lunch hour or just after the company closed, raising the probability that they could simply leave a voice message – thereby responding within four hours without having to spend time discussing or resolving the problem, which could tie them up and make them miss another client’s four-hour call window.  As a result, clients were not talking to a human for one or two days, or up to a week, and were playing “telephone tag” and getting frustrated.
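The gamed metric is easy to reproduce with a toy calculation.  In this hypothetical sketch (the call records and field names are invented for illustration), every call is “answered” within the four-hour window, yet almost no one reaches a human promptly – which is why the second number, not the first, is the one worth rewarding:

```python
# Illustrative call records: hours until first callback vs. first live conversation.
calls = [
    {"first_contact_hrs": 3.5, "live_talk_hrs": 52},   # voicemail at lunch; talked 2 days later
    {"first_contact_hrs": 2.0, "live_talk_hrs": 3.0},  # an actual prompt conversation
    {"first_contact_hrs": 3.9, "live_talk_hrs": 120},  # a week of telephone tag
]

def pct_within(records, field, hours):
    """Percentage of records whose `field` value falls within `hours`."""
    hit = sum(1 for r in records if r[field] <= hours)
    return round(100 * hit / len(records))

print(pct_within(calls, "first_contact_hrs", 4))  # 100 -- the rewarded metric looks great
print(pct_within(calls, "live_talk_hrs", 4))      # 33  -- the customer experience does not
```

The design lesson: measure the outcome customers care about (time to a live conversation, or time to resolution), not a proxy that is cheap to satisfy.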

In another company, a percentage of each developer’s merit plan was based on a low bug count.  But often, issues reported by users as “bugs” were in reality items that were never spec’d or were spec’d incorrectly.  A lot of conflict resulted, and arguments between the development org and support arose (“It is a bug.”  “No, it isn’t a bug.”).  Team members became opponents, which created organizational silos and mistrust.  Once the underlying issue was recognized, the process was changed and a new Tracker category was created, separate from “bug” or “enhancement”, to denote a design flaw or spec bug.  This allowed the Technical Support team to push that an issue was perceived as a bug in the client’s eyes and thus get the problem resolved in a maintenance release rather than waiting for the yearly enhancement releases.

It also correctly removed the “blame” from the development organization, since the issue wasn’t caused by a coding or process failure the way a real bug would be, and the correct metric was then being used to measure developer performance.  The finger-pointing and arguments ceased, silo walls came down, and the product organizations coalesced into a supportive, cohesive team.

It’s easy to maintain status quo – to march along without noticing the slow and gradual deterioration of quality and effective processes.  But by stepping back periodically and reviewing key metrics, teams can make sure they are working effectively and efficiently.

PS:  Make sure you have measurable metrics – use Tracker to track Calls, Bugs, Enhancement requests, and more, so the metrics are at your fingertips when you need them.