Backup and Disaster Recovery
Firefly's Backup and Disaster Recovery (DR) capabilities help you safeguard your cloud infrastructure: they let you recover quickly from failures and catch misconfigurations before they cause outages. This guide explains how to use Firefly to mitigate, diagnose, and recover from infrastructure failures, and how to prevent them proactively.
Overview: Disaster Recovery
Disaster recovery (DR) is the process of restoring your cloud environment to a healthy state after an incident such as accidental deletion, misconfiguration, or infrastructure failure. Firefly offers:
Automated backups and configuration history
Rapid recovery tools for deleted or misconfigured assets
Comprehensive mutation logs for root cause analysis
Proactive notifications and insights to prevent disasters
Recovering from Infrastructure Failure
When an infrastructure failure occurs, Firefly provides tools to help you diagnose, resolve, and recover. The recovery process depends on whether you know which asset caused the failure.
Recovering Deleted Assets: When the Responsible Asset is Known
If you know which asset was deleted or misconfigured (e.g., a team member accidentally deleted a resource), you can restore it using Firefly's codification and GitOps integration.
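Codification means turning an asset's recorded attributes into an IaC definition that can be committed and applied. The Python sketch below is a hypothetical illustration of that idea, not Firefly's implementation: it renders a flat dictionary of attributes as a Terraform resource block. The resource type, name, and attribute values are invented for the example, and nested blocks are not handled.

```python
def hcl_value(value):
    """Render a Python value as an HCL literal (bool, number, string, or list)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        escaped = (
            value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("${", "$${")  # keep Terraform from treating it as interpolation
            .replace("%{", "%%{")  # ...or as a template directive
        )
        return f'"{escaped}"'
    if isinstance(value, list):
        return "[" + ", ".join(hcl_value(item) for item in value) + "]"
    raise TypeError(f"unsupported attribute type: {type(value).__name__}")


def codify(resource_type, name, attributes):
    """Render a flat attribute dict as a Terraform resource block (hypothetical helper)."""
    width = max((len(key) for key in attributes), default=0)
    lines = [f'resource "{resource_type}" "{name}" {{']
    for key in sorted(attributes):
        # Align the "=" signs the way `terraform fmt` does.
        lines.append(f"  {key.ljust(width)} = {hcl_value(attributes[key])}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical attributes captured for a deleted S3 bucket.
    print(codify("aws_s3_bucket", "logs", {"bucket": "example-logs", "force_destroy": False}))
```

Committing generated code like this, rather than recreating the asset by hand, is what brings the restored asset under code management.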
Procedure:
Select Inventory > Deleted.
This view lists all assets that have been deleted from your environment.
Filter by Time Range.
Use the filter to narrow down the list to the relevant time period.
Select the Deleted Asset and Codify.
Click the deleted asset, then use the "Codify" action to generate an Infrastructure-as-Code (IaC) template for it.
Create a Pull Request.
Firefly will prompt you to select the appropriate repository and branch for your GitOps workflow. Submit a pull request to restore the asset via code.
Review and Merge.
Once reviewed and merged, your CI/CD pipeline will recreate the asset in your cloud environment.
Tip: This process ensures that the restored asset is managed by code, reducing the risk of future drift or manual misconfiguration.
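Conceptually, the time-range filter keeps only the assets whose deletion falls inside a window. A minimal Python sketch, assuming a hypothetical `DeletedAsset` record with a timezone-aware `deleted_at` timestamp (the field names are illustrative, not Firefly's data model):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DeletedAsset:
    # Hypothetical record shape; field names are illustrative only.
    asset_id: str
    asset_type: str
    deleted_at: datetime  # timezone-aware UTC timestamp


def deleted_between(assets, start, end):
    """Return assets deleted in the half-open window [start, end), newest first."""
    hits = [asset for asset in assets if start <= asset.deleted_at < end]
    return sorted(hits, key=lambda asset: asset.deleted_at, reverse=True)


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    assets = [
        DeletedAsset("sg-1", "aws_security_group", now - timedelta(hours=2)),
        DeletedAsset("db-1", "aws_db_instance", now - timedelta(days=3)),
        DeletedAsset("q-1", "aws_sqs_queue", now - timedelta(minutes=30)),
    ]
    # Narrow the list to the last 24 hours, newest deletion first.
    for asset in deleted_between(assets, now - timedelta(hours=24), now):
        print(asset.asset_id, asset.deleted_at.isoformat())
```

Sorting newest first puts the most likely culprit of a fresh incident at the top of the list.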
Viewing Mutations: When the Responsible Asset is Unknown
If you do not know which asset caused the failure, use Firefly's mutation tracking to investigate recent changes and identify the root cause.
Procedure:
Select Inventory.
View all assets in your environment.
Apply Filters.
Filter assets by data source, environment, account, and location to narrow your search.
Filter by Asset Flags: Mutations.
Use the "Mutations" filter to show assets with recent changes or drifts.
Review Mutation Log.
Select an asset and open its mutation log to see a timeline of configuration changes.
Codify Revision.
For any suspicious or recent change, select the revision date and use "Codify Revision" to generate the IaC template for that point in time.
Revert via Pull Request or Terraform Import.
Restore the asset to a previous configuration by submitting a pull request or using the provided Terraform import commands.
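Choosing a revision to codify amounts to finding the configuration that was in effect at a given moment; a Terraform import then attaches the existing cloud object to a resource address in state. A minimal Python sketch, assuming a hypothetical list of `(timestamp, config)` revisions sorted oldest first; the resource address and instance ID are placeholders. The command follows Terraform's `terraform import ADDRESS ID` syntax, but the exact commands Firefly provides may differ.

```python
import bisect
import shlex
from datetime import datetime, timezone


def revision_at(revisions, when):
    """Return the config in effect at `when`.

    `revisions` is a hypothetical list of (timestamp, config) tuples sorted
    oldest first. Returns None if `when` precedes the first revision.
    """
    timestamps = [ts for ts, _ in revisions]
    # bisect_right counts the revisions recorded at or before `when`.
    index = bisect.bisect_right(timestamps, when)
    return revisions[index - 1][1] if index else None


def import_command(address, cloud_id):
    """Build a shell-quoted `terraform import ADDRESS ID` command line."""
    return shlex.join(["terraform", "import", address, cloud_id])


if __name__ == "__main__":
    utc = timezone.utc
    revisions = [
        (datetime(2024, 4, 1, tzinfo=utc), {"instance_type": "t3.small"}),
        (datetime(2024, 4, 20, tzinfo=utc), {"instance_type": "t3.micro"}),  # suspicious change
    ]
    # Configuration just before the suspicious change.
    print(revision_at(revisions, datetime(2024, 4, 19, tzinfo=utc)))
    print(import_command("aws_instance.web", "i-0123456789abcdef0"))
```

Shell-quoting matters for addresses with indexes such as `aws_instance.web["a"]`, whose brackets and quotes would otherwise be mangled by the shell.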
Tip: Mutation logs provide a detailed audit trail, including who made each change and what was modified, making root cause analysis straightforward.
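At its core, such an audit trail compares consecutive revisions and records who made each change. A minimal Python sketch, assuming a hypothetical `Revision` record with `timestamp`, `actor`, and a flat `config` dictionary (not Firefly's actual log format):

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    # Hypothetical shape of one mutation-log entry.
    timestamp: str
    actor: str
    config: dict = field(default_factory=dict)


def diff_configs(old, new):
    """Return {key: (old_value, new_value)} for keys added, removed, or changed.

    A key missing on one side is reported as None on that side.
    """
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


def audit_trail(revisions):
    """Yield (timestamp, actor, changes) for every revision that changed something."""
    for previous, current in zip(revisions, revisions[1:]):
        changes = diff_configs(previous.config, current.config)
        if changes:
            yield current.timestamp, current.actor, changes


if __name__ == "__main__":
    log = [
        Revision("2024-04-01T09:00Z", "terraform-ci", {"multi_az": True, "size": "db.t3.medium"}),
        Revision("2024-04-20T14:32Z", "alice@example.com", {"multi_az": False, "size": "db.t3.medium"}),
    ]
    for ts, actor, changes in audit_trail(log):
        for key, (before, after) in changes.items():
            print(f"{ts} {actor}: {key} {before!r} -> {after!r}")
```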
Preventing Misconfiguration and Reliability Risks
Proactive prevention is key to avoiding disasters. Firefly enables you to set up notifications and subscribe to insights that alert you to risky configurations or changes.
Receiving Notifications on Asset Changes
Stay informed about changes in your infrastructure by subscribing to notifications. These alerts help you:
Detect single points of failure
Monitor data protection status
Maintain visibility into system operations
How to Subscribe:
Go to Settings > Notifications in Firefly.
Choose your preferred notification channels (Slack, Teams, email, etc.).
Select which events or asset changes should trigger notifications (e.g., deletions, drifts, policy violations).
Tip: Fine-tune your notification settings to avoid alert fatigue and focus on critical events.
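Fine-tuning comes down to matching each event against the event types you subscribed to and delivering it only to the channels configured for that type; anything unsubscribed is never sent. A minimal Python sketch; the `Subscription` structure, event type names, and channel names are hypothetical and do not reflect Firefly's configuration format:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    # Hypothetical: one channel and the event types it should receive.
    channel: str  # e.g. "slack:#infra-alerts" or "email:oncall@example.com"
    event_types: frozenset


def route(event_type, subscriptions):
    """Return the channels subscribed to `event_type`; unsubscribed types go nowhere."""
    return [s.channel for s in subscriptions if event_type in s.event_types]


if __name__ == "__main__":
    subs = [
        Subscription("slack:#infra-alerts", frozenset({"deletion", "drift"})),
        Subscription("email:oncall@example.com", frozenset({"deletion", "policy_violation"})),
    ]
    for event_type in ("deletion", "drift", "tag_change"):
        print(event_type, "->", route(event_type, subs))
```

Here a deletion reaches both channels, drift reaches only Slack, and a tag change reaches no one, which is how a narrow subscription keeps noise down.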
Subscribing to Insights for Reliability and Misconfiguration Prevention
Firefly Insights are policy-driven checks that highlight risky configurations. Subscribing to these insights helps you proactively address issues before they lead to outages.
Top 5 Insights to Reduce Disaster Risk:
Reliability: AWS Auto-Scaling Groups in a Single Availability Zone
Auto-scaling groups should span multiple availability zones for high availability. Single-zone groups risk downtime if that zone fails.
Reliability: AWS Database Instances in a Single Availability Zone
Databases in one zone are vulnerable to zone failures. Multi-AZ deployment is recommended for resilience.
Reliability: AWS RDS Instance Without Deletion Protection
Without deletion protection, accidental or automated deletions can cause permanent data loss.
Reliability: AWS DynamoDB Tables Without Point-in-Time Recovery
Enable point-in-time recovery so you can restore a table to any point within its recovery window (up to the last 35 days), protecting against accidental writes and deletes.
Misconfiguration: AWS ELB/LB Without Access Logs Enabled
Access logs are essential for troubleshooting, monitoring, and security analysis. Enable logging to maintain visibility.
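Each of these insights is essentially a rule evaluated against an asset's configuration. The Python sketch below expresses the five rules as plain functions over dictionaries. It illustrates the idea and is not Firefly's policy engine: the asset-type keys in `CHECKS` are hypothetical, and the configuration keys are assumptions modeled on AWS API fields (`AvailabilityZones`, `MultiAZ`, `DeletionProtection`, `PointInTimeRecoveryStatus`, and the `access_logs.s3.enabled` load balancer attribute, flattened into a dict).

```python
def asg_single_az(asg):
    """Auto Scaling group placed in fewer than two Availability Zones."""
    return len(set(asg.get("AvailabilityZones", []))) < 2


def db_single_az(db):
    """Database instance without a Multi-AZ deployment."""
    return not db.get("MultiAZ", False)


def rds_no_deletion_protection(db):
    """RDS instance that can be deleted without first disabling protection."""
    return not db.get("DeletionProtection", False)


def dynamodb_no_pitr(backups):
    """DynamoDB table whose point-in-time recovery is not enabled."""
    status = (
        backups.get("ContinuousBackupsDescription", {})
        .get("PointInTimeRecoveryDescription", {})
        .get("PointInTimeRecoveryStatus")
    )
    return status != "ENABLED"


def lb_no_access_logs(attributes):
    """Load balancer whose access-log attribute is not "true" (assumed flattened dict)."""
    return attributes.get("access_logs.s3.enabled", "false") != "true"


# Hypothetical mapping from asset type to (insight name, check) pairs.
CHECKS = {
    "aws_autoscaling_group": [("ASG in a single AZ", asg_single_az)],
    "aws_db_instance": [
        ("Database in a single AZ", db_single_az),
        ("RDS without deletion protection", rds_no_deletion_protection),
    ],
    "aws_dynamodb_table": [("DynamoDB without PITR", dynamodb_no_pitr)],
    "aws_lb": [("LB without access logs", lb_no_access_logs)],
}


def evaluate(asset_type, config):
    """Return the names of every insight that flags this asset."""
    return [name for name, check in CHECKS.get(asset_type, []) if check(config)]


if __name__ == "__main__":
    db = {"DBInstanceIdentifier": "orders", "MultiAZ": False, "DeletionProtection": True}
    print(evaluate("aws_db_instance", db))
```

Treating a missing field as non-compliant (for example, no PITR status means "not enabled") errs on the side of flagging an asset rather than silently passing it.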
How to Subscribe:
Go to Settings > Insights in Firefly.
Subscribe to the above insights and configure notification preferences.
Tip: Regularly review insight recommendations and remediate flagged issues to maintain a resilient infrastructure.
Summary
Firefly's backup and disaster recovery features help you:
Rapidly recover from accidental deletions or misconfigurations
Investigate and revert problematic changes
Proactively prevent outages with real-time notifications and policy-driven insights
By integrating these tools and practices into your operations, you can ensure your cloud environment remains resilient, auditable, and secure.