Firefly Documentation Portal

Disaster recovery

Tools to mitigate and recover from infrastructure failure.

Last updated 9 months ago

Recovering from infrastructure failure

To diagnose, resolve, and recover from an infrastructure failure, use the tools below:

  • Recover deleted assets

  • View mutations

Recovering deleted assets: When the asset responsible for infrastructure failure is known

If the infrastructure failure is due to a recent change in your infrastructure and you can identify which asset is responsible, for example when a team member accidentally deleted an asset, you can restore the asset using code.

Procedure

  1. Select Inventory > Deleted.

  2. From the filter, select the appropriate time range.

  3. Select the asset that was accidentally deleted, and then select Codify.

  4. To revive your asset, create a pull request. Verify that you selected the appropriate repository and branch for your GitOps workflow.

Viewing mutations: When the asset responsible for infrastructure failure is unknown

If the asset causing the issue is unknown, follow the steps below to resolve the issue:

Procedure

  1. Select Inventory.

  2. Filter your assets according to data source, environment, account, and location.

  3. From the asset flags filter, select Mutations.

  4. Select an asset > Mutation log.

  5. To view the code for a revision, select the revision date, and then select Codify Revision.

  6. To revert your asset to that configuration, select Pull request or use the Terraform Import Commands.

Preventing misconfiguration and reliability risks

Receiving notifications on changes in the status or configuration of your assets

Prevent disasters by subscribing to notifications that alert you to changes in the state or configuration of your assets. These notifications increase your awareness of single points of failure, gaps in data protection, and limited visibility into system operation. Identifying these issues early is crucial, because they can lead to service disruptions or disaster scenarios.

To reduce the risk of service disruption or disaster, subscribe to the top five Insight notifications below:

Reliability: AWS auto-scaling groups are running with only a single availability zone

Auto-scaling groups automatically adjust the number of instances in a group based on changing demand. They are typically configured to operate across multiple availability zones to ensure high availability and fault tolerance: if one zone experiences issues or failures, the other zones continue to operate and maintain the desired level of service. An auto-scaling group that runs in only a single availability zone poses a risk, because if that zone experiences problems, the result may be service disruption or downtime with no automatic failover to other zones.
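The condition behind this Insight can be illustrated outside Firefly. The following is a minimal sketch, not Firefly's implementation: it assumes data in the shape returned by the AWS `DescribeAutoScalingGroups` API, and the function names and sample group names are hypothetical.

```python
from typing import Any


def single_az_groups(groups: list[dict[str, Any]]) -> list[str]:
    """Return names of auto-scaling groups that span fewer than two availability zones."""
    return [
        g["AutoScalingGroupName"]
        for g in groups
        if len(set(g.get("AvailabilityZones", []))) < 2
    ]


def fetch_groups() -> list[dict[str, Any]]:
    """Fetch all groups with boto3 (needs AWS credentials; never called on import)."""
    import boto3  # imported lazily so the pure check above works without the AWS SDK

    paginator = boto3.client("autoscaling").get_paginator("describe_auto_scaling_groups")
    return [g for page in paginator.paginate() for g in page["AutoScalingGroups"]]


# Hypothetical sample data in the shape of the DescribeAutoScalingGroups response.
sample = [
    {"AutoScalingGroupName": "web", "AvailabilityZones": ["us-east-1a"]},
    {"AutoScalingGroupName": "api", "AvailabilityZones": ["us-east-1a", "us-east-1b"]},
]
print(single_az_groups(sample))  # ['web']
```

Keeping the check as a pure function over the response data makes it easy to test without an AWS account; `fetch_groups()` is only needed when you run it against a real environment.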

Reliability: AWS database instances are currently deployed in only one availability zone

When a database instance is deployed in a single availability zone, it becomes more vulnerable to potential failures or disruptions in that zone. If the availability zone experiences issues, such as hardware failures, network problems, or scheduled maintenance, the entire database instance could become unavailable until the issue is resolved.
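As an illustration only, and not a description of how Firefly evaluates this Insight, the sketch below flags RDS instances whose `MultiAZ` field is false in data shaped like the AWS `DescribeDBInstances` response. The function names and instance identifiers are hypothetical.

```python
from typing import Any


def single_az_instances(instances: list[dict[str, Any]]) -> list[str]:
    """Return identifiers of DB instances that are not deployed as Multi-AZ."""
    return [
        i["DBInstanceIdentifier"]
        for i in instances
        if not i.get("MultiAZ", False)
    ]


def fetch_instances() -> list[dict[str, Any]]:
    """Fetch all DB instances with boto3 (needs AWS credentials; never called on import)."""
    import boto3  # lazy import keeps the pure check usable without the AWS SDK

    paginator = boto3.client("rds").get_paginator("describe_db_instances")
    return [i for page in paginator.paginate() for i in page["DBInstances"]]


# Hypothetical sample data in the shape of the DescribeDBInstances response.
sample = [
    {"DBInstanceIdentifier": "orders-db", "MultiAZ": False},
    {"DBInstanceIdentifier": "users-db", "MultiAZ": True},
]
print(single_az_instances(sample))  # ['orders-db']
```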

Reliability: AWS RDS instance without deletion protection

When deletion protection is turned off, the database instance can be deleted by users or automated processes without an additional safeguard. If the instance is accidentally deleted, its data and configuration can be lost permanently unless a final snapshot or retained backup exists. This can lead to data loss and disrupt the applications or services that rely on the database.
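The setting this Insight refers to appears as the `DeletionProtection` field in the AWS `DescribeDBInstances` response. A minimal sketch of a check over that data, assuming hypothetical instance identifiers and a hypothetical helper name, might look like this:

```python
from typing import Any


def unprotected_instances(instances: list[dict[str, Any]]) -> list[str]:
    """Return identifiers of DB instances that have deletion protection turned off."""
    return [
        i["DBInstanceIdentifier"]
        for i in instances
        if not i.get("DeletionProtection", False)
    ]


# Hypothetical sample data in the shape of the DescribeDBInstances response.
sample = [
    {"DBInstanceIdentifier": "orders-db", "DeletionProtection": True},
    {"DBInstanceIdentifier": "scratch-db", "DeletionProtection": False},
]
print(unprotected_instances(sample))  # ['scratch-db']
```

To run the check against a real account, the list could come from the boto3 RDS client's `describe_db_instances` call, which requires AWS credentials.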

Reliability: AWS DynamoDB tables without point-in-time recovery enabled

Enabling point-in-time recovery on DynamoDB tables is good practice, especially for critical data. It provides an additional layer of protection against data loss by letting you restore the table to a previous state if needed.
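For readers who want to see what this setting looks like in AWS itself, the sketch below reads the point-in-time recovery status from data shaped like the DynamoDB `DescribeContinuousBackups` response. It is an assumption-laden illustration, not Firefly's logic; the helper names are hypothetical.

```python
from typing import Any


def pitr_enabled(response: dict[str, Any]) -> bool:
    """Return True if the DescribeContinuousBackups response reports PITR as enabled."""
    description = response.get("ContinuousBackupsDescription", {})
    pitr = description.get("PointInTimeRecoveryDescription", {})
    return pitr.get("PointInTimeRecoveryStatus") == "ENABLED"


def fetch_pitr_status(table_name: str) -> bool:
    """Query one table with boto3 (needs AWS credentials; never called on import)."""
    import boto3  # lazy import keeps the pure check usable without the AWS SDK

    client = boto3.client("dynamodb")
    return pitr_enabled(client.describe_continuous_backups(TableName=table_name))


# Hypothetical sample response for a table with PITR turned off.
sample = {
    "ContinuousBackupsDescription": {
        "ContinuousBackupsStatus": "ENABLED",
        "PointInTimeRecoveryDescription": {"PointInTimeRecoveryStatus": "DISABLED"},
    }
}
print(pitr_enabled(sample))  # False
```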

Misconfiguration: AWS ELB/LB without any access logs enabled

When access logging is not enabled, the load balancer does not generate log files, so you lack comprehensive insight into the traffic and requests it processes. Without access logs, it is difficult to troubleshoot issues, monitor traffic patterns, and conduct thorough security analysis.
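For Application and Network Load Balancers, AWS exposes this setting as the `access_logs.s3.enabled` attribute. The sketch below checks that attribute in data shaped like the ELBv2 `DescribeLoadBalancerAttributes` response. It does not cover Classic Load Balancers, it is not Firefly's implementation, and the helper names are hypothetical.

```python
from typing import Any


def access_logs_enabled(attributes: list[dict[str, str]]) -> bool:
    """Return True if the attribute list marks S3 access logging as enabled."""
    values = {a["Key"]: a["Value"] for a in attributes}
    return values.get("access_logs.s3.enabled", "false").lower() == "true"


def fetch_access_logs_enabled(load_balancer_arn: str) -> bool:
    """Query one load balancer with boto3 (needs AWS credentials; never called on import)."""
    import boto3  # lazy import keeps the pure check usable without the AWS SDK

    response: dict[str, Any] = boto3.client("elbv2").describe_load_balancer_attributes(
        LoadBalancerArn=load_balancer_arn
    )
    return access_logs_enabled(response["Attributes"])


# Hypothetical sample attributes for a load balancer with logging turned off.
sample = [
    {"Key": "access_logs.s3.enabled", "Value": "false"},
    {"Key": "deletion_protection.enabled", "Value": "true"},
]
print(access_logs_enabled(sample))  # False
```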

To mitigate an infrastructure failure, set up your notification subscriptions to receive notifications on changes in the status or configuration of your assets.

Insights are policies that improve the configuration of your assets. By subscribing to the Insights listed above, you can proactively avoid situations that may lead your account to a disastrous outcome.