Poorly defined software requirements, or a lack of requirements altogether, can cause a project to run late, go over budget, or fail to deliver the required software functionality. Taking the time to write well-defined software requirements with input from all project stakeholders can be the difference between the success and failure of your software development effort.

If you need software development assistance, we can help. Please schedule a complimentary consultation with one of our consultants.

Transcript:

Building fault tolerance into your software gives you a system that detects specific failures and mitigates a significant amount of risk, all at the software level. Can you imagine how beneficial this is for high-reliability and safety-critical systems? At the highest level, start by conducting a risk assessment to identify possible failure points within your system. Once you’ve identified potential areas of failure, assign a criticality rating to each scenario. After you have determined the greatest risks, you can begin designing fault tolerance solutions into your system.
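
To make the criticality rating concrete, here is a minimal sketch that ranks failure scenarios from greatest to least risk. It assumes a common FMEA-style scheme (criticality = severity × likelihood, each scored 1 to 5); that scheme, the FailureScenario class, and the example scenarios are illustrative assumptions, not a rating method prescribed in this post.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FailureScenario:
    """A potential failure point identified during the risk assessment."""
    name: str
    severity: int    # assumed scale: 1 (negligible) to 5 (catastrophic)
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)

    @property
    def criticality(self) -> int:
        # Assumed rating scheme: severity multiplied by likelihood (range 1..25).
        return self.severity * self.likelihood


def rank_by_criticality(scenarios: Iterable[FailureScenario]) -> List[FailureScenario]:
    """Return scenarios ordered from greatest to least criticality."""
    return sorted(scenarios, key=lambda s: s.criticality, reverse=True)


if __name__ == "__main__":
    # Hypothetical failure points for an embedded controller.
    scenarios = [
        FailureScenario("sensor dropout", severity=4, likelihood=3),
        FailureScenario("corrupted config file", severity=3, likelihood=2),
        FailureScenario("actuator command lost", severity=5, likelihood=2),
    ]
    for s in rank_by_criticality(scenarios):
        print(f"{s.criticality:>2}  {s.name}")
```

The scenarios at the top of the ranked list are the ones to design fault tolerance around first; teams that use a different scale or add a detectability factor can swap in their own criticality property.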

Here’s a scenario: fast forward a bit to when your system is fully operational, and your team detects a failure within the system. First, confirm that the error or failure message is valid by running a cyclic redundancy check (CRC) on it. Once you have deemed the error valid, record where it occurred so that troubleshooting can proceed more smoothly.
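
The validate-then-record step might look like the sketch below. It assumes each error report arrives as a payload followed by a 4-byte big-endian CRC-32 computed with Python’s zlib.crc32; that framing, the function names, and the source label are hypothetical choices for illustration. Note that a CRC confirms the report was not corrupted in transit; it does not by itself prove the underlying fault is real.

```python
import logging
import struct
import zlib
from typing import Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("fault_monitor")


def frame_message(payload: bytes) -> bytes:
    """Append a big-endian CRC-32 to a payload (assumed framing format)."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def validate_frame(frame: bytes) -> Optional[bytes]:
    """Return the payload if its trailing CRC-32 matches, else None."""
    if len(frame) < 4:
        return None  # too short to contain a checksum
    payload = frame[:-4]
    (received,) = struct.unpack(">I", frame[-4:])
    return payload if zlib.crc32(payload) == received else None


def handle_error_report(frame: bytes, source: str) -> bool:
    """Validate an error report and, if it is genuine, record where it occurred."""
    payload = validate_frame(frame)
    if payload is None:
        log.warning("discarding error report from %s: CRC mismatch", source)
        return False
    # Record the location alongside the message to speed up troubleshooting.
    log.error("fault at %s: %s", source, payload.decode("utf-8", errors="replace"))
    return True
```

For example, handle_error_report(frame_message(b"overcurrent on channel 2"), "power_board/ch2") logs the fault and returns True, while the same frame with a corrupted byte is discarded and returns False.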

After detecting and validating the failure, the software team can move on to the recovery process. Your response should suit the situation and escalate in the following order: retry, recover, then fail safe.

Retrying lets you ride out brief, intermittent issues. If retrying does not resolve the issue, you can then recover from the detected failure. Recovery minimizes corruption caused by the fault and allows you to access your system’s original files. If all else fails, we highly suggest a fail-safe feature so that the failure does not endanger anyone near the failed system, leaving the system just as safe to be near as it would be without the failure.
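
The retry, recover, fail-safe escalation can be sketched as a single wrapper. This is a minimal illustration, assuming the caller supplies three hypothetical hooks: the operation itself, a recover function (for example, restoring a last-known-good copy of the system’s files), and a fail_safe function (for example, de-energizing an actuator). The retry count and delay are placeholder values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class FailSafeEngaged(Exception):
    """Raised after the system has been driven to its safe state."""


def run_with_fault_tolerance(
    operation: Callable[[], T],
    recover: Callable[[], None],
    fail_safe: Callable[[], None],
    retries: int = 3,
    delay_s: float = 0.05,
) -> T:
    """Escalate retry -> recover -> fail safe, in that order.

    `recover` and `fail_safe` are hypothetical caller-supplied hooks.
    """
    # 1. Retry: ride out brief, intermittent faults.
    for attempt in range(retries):
        try:
            return operation()
        except Exception:
            if attempt < retries - 1:
                time.sleep(delay_s)

    # 2. Recover: restore a known-good state, then try the operation once more.
    try:
        recover()
        return operation()
    except Exception:
        pass  # recovery itself failed, or the fault persists

    # 3. Fail safe: put the system in a state that endangers no one.
    fail_safe()
    raise FailSafeEngaged("operation failed after retry and recovery")
```

Catching the broad Exception keeps the sketch short; a real safety-critical design would likely distinguish transient faults (worth retrying) from permanent ones (skip straight to recovery or fail safe).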

All in all, once fault tolerance is built into your safety-critical system, your engineers can detect, validate, record, and respond to errors by retrying, then recovering, then failing safe, in that order. If you need help implementing fault tolerance in your system, please contact us. At PSI, we specialize in custom software solutions for projects that cannot risk failure, and we are always happy to help! Comment below or reach out to us at info@psi-software.com.

