(no subject)
Jan. 30th, 2007 11:43 am
Fixing bugs on our brand-new Java application sometimes takes us back to
the 1960's. There are often bugs that only show up at customer sites,
whether due to scale of implementation or manner of use or some external
environmental factors, and the only way to fix them is to make one's
Best Guess™ at where the program is going wrong, make a
change, compile it, and send it to the customer to try. This means that
any typographical or cut-n-paste error will not be caught at our
site--as long as it compiles--but the new code won't fix the problem. So
we have to try all over again, with at least a 24-hour turnaround before
the new code can go out (in the case I'm working with right now, it's
more like 72 hours--they won't let me send it until Friday) and an
additional several hours or days before we can tell whether it worked.
Might as well be writing code in longhand, submitting it to a typist,
and waiting for a scheduled compile and a scheduled run.
These bugs are usually members of what I call the "Mysterious" class:
bugs that leave clear evidence that they've occurred, but no
trace of how. There's no stack trace, or if there is one, the
error is not reproducible. If there isn't one, we can see data integrity
issues after the fact, but can't reproduce whatever caused them. Sometimes, if
I'm lucky, I can find an obvious glaring hole in the code that updates
the data in question, but usually it appears to be fine.
Much of my role here seems to be "collector of mysterious bugs". I end
up with a large number of open Change Requests all reporting the same
irreproducible problem, and poke at them from time to time hoping that
something might have changed in the meantime allowing the bug to be
reproduced, though usually nothing has. And then, miraculously, I get
another CR of the same issue which has just one more vital piece of the
scenario, and voila! Half an hour later, the whole issue is resolved.
A recent example of this type was a situation where all the CR's
reported "The SO closes but the accounts are left pending." (Details as
to what this means are not really relevant.) I could see that the data
was indeed left in the invalid state they claimed, but every time I
repaired the data and did the same process myself--using their software
and everything--the process worked perfectly. Until this week, when I
finally hit one that *repeatedly* threw an error. After the error, if
you continued, you'd get the stated result. And the same thing happened
to me on the retry, over and over. Woohoo! Fourteen "Mystery" CR's, all
closed.
no subject
Date: 2007-01-30 05:45 pm (UTC)
Should we get you a bunch of pins and a nice display box for your collection?
no subject
Date: 2007-01-30 05:53 pm (UTC)
no subject
Date: 2007-01-30 06:44 pm (UTC)
I had 2 laptops. Cloning the original took 6 hours. Sure enough, they had correctly described the problem (after 2 years of attempts). So 10 minutes later I had a laptop that needed to be reformatted.
Once a day I got to run a test. In the end, it was an uninitialized pointer that was being read, so each time I modified the code, the problem would move slightly. It took me about 3 weeks of 1 test a day to get it right.
So, when you say "72 hours" to deploy, I feel your pain.
no subject
Date: 2007-01-30 07:22 pm (UTC)
Well, as much as everyone LOVES UNIX - the whole thing takes us back way before I ever started programming! It's the best software money could buy in the late 50s early 60s!
I feel your pain too - I was here LATE (for me) last night doing much the same thing. (Hey, I almost ran into you in the parking lot as I was attempting to leave in a hurry before someone could find something else for me to fix!)