| Ideas, thoughts, and resources from the permanently curious. |
| Living | World | Politics | Business | SciTech | Health | Entertainment | Opinion | Sports | About | Contact |
Lessons learned about risk management (3/24/05)

Again we find ourselves in the midst of a Squirrel, Inc. story, this time at the Division of Financial Management, where some Risk Management concerns have arisen. SNSI is a regional stock index of Squirrel Nut stocks, a massive zillion-acorns-a-year market.

Hello,

I've already got some experience with Risk Management due to SNSI and a few other things, so I figured I'd compose a lessons learned document. Please let me know if it is helpful.

With the old SNSI system the onus was on the clients (TV & newspapers) to notify us when the system went wrong. This, of course, created no end of stress for everyone involved and in general involved a lot of screaming.

With the new system we tried to put some safeguards in place so that the system would notify us if there was an issue. Nairc and I coded so many fail-safes into it that it should never have failed, and yet it did. Things beyond our control, like the network or email, would have a blip and mess the system up, and then we'd be in the same boat as before, except even more exasperated because we'd done so much work to prevent it and it still happened. Half the time one of our fail-safes had a bug that caused the system to fail, so it was almost as if the system had been better off before all of the work we'd done. I coded fail-safes for the fail-safes, and in general went down a never-ending hole where I felt like I was trying to respond to an endless list of possibilities and the various combinations and permutations of them.

In the end, the breakthrough came from changing my thinking. I took the perspective of a client and had the deliverables sent to my phone via SMS. So, while in the past I'd be chained to a computer every weekday at 5, by assuming this new perspective I gained a degree of freedom: I just needed to be near a computer IF I didn't get the emails on my phone.
I also ditched the fail-safes and focused on fixing the bugs in the "main" code, and stuck in wait times to give the code time to jump to the next section. Rather than depending on something happening exactly when I wanted it to, I had it try a few times, so if the first attempt failed because some other part of the program hadn't finished, it'd accept that, wait a bit, and then move on. In other words, I coded it to expect failure and sense success, which is a lot less complicated (see point #3 below) than "sensing" failure, given the many known and unknown facets of failure.

One other thing that made a huge difference was ditching the bells and whistles we had used to get data to the clients, and giving them more ways to get the data. They still get the data via email, but if the emails fail they also know that after a certain time the data is uploaded to a web server, and they can check a really simple webpage to get whatever they need instead of the slower main page. So while the main SNSI page is data intensive and takes some time to load (encouraging stress, I might add), they now have a quick and simple page to use.
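The "expect failure, sense success" approach described earlier (try a step a few times, wait in between, and only recognise a confirmed result) could be sketched roughly as follows. This is a minimal illustration, not the actual SNSI code: the function name `wait_for_success`, its parameters, and the convention that a step returns `None` when it is not ready are all assumptions made for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for_success(
    step: Callable[[], Optional[T]],
    attempts: int = 5,
    wait_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `step` until it returns a non-None result or attempts run out.

    Hypothetical helper: an exception or a None result is treated as
    "not ready yet" rather than something to diagnose. The loop expects
    failure and only recognises success.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = step()
        except Exception:
            # Expected case: some other part may simply not have finished.
            result = None
        if result is not None:
            return result  # success sensed, stop trying
        if attempt < attempts:
            sleep(wait_seconds)  # give the other part time, then try again
    # Success was never seen; no automated diagnosis is attempted here.
    return None
```

The point of the design is that there is exactly one condition to check (did the step produce a result?) instead of a growing list of failure modes. If the function returns `None`, nothing clever happens: the missing delivery is what tells a person to go and look.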
On that page I also gave them open access to the data itself, so they are free to use their own graphs if they wish. They've got the data, so they're not dependent on our graphs, which would occasionally fail or be a little messed up. Again, it is all about providing them with choices to do as they please.

It is my "chained to the computer" comment above that I really want to ensure we steer around with this effort. With SNSI failing "off", the onus was on me to check every single day to find out whether it had worked or not. By having it go to my cell I simply have to listen for a beep around 5:07pm. If I don't hear it, I know to get to a computer. The knee-jerk reaction to that is to dream up some system that'll listen for the beep and notify me if it doesn't happen, but the complexity of that just adds more points of failure and would end up making the situation worse, rather than better.

In the end it has been reducing points of failure and sensing success that has helped me. It has also helped to focus on those points of failure that do exist, to try to make them less likely to fail.

So, to summarize:
- Jason