
Part 2: Simulate Failures

In this part, you'll simulate failures to see how Temporal handles them. This demonstrates why Temporal is particularly useful for building reliable systems.


Systems fail in unpredictable ways. A seemingly harmless deployment can bring down production, a database connection can time out during peak traffic, or a third-party service can decide to have an outage. Despite our best efforts with comprehensive testing and monitoring, systems are inherently unpredictable and complex. Networks fail, servers restart unexpectedly, and dependencies we trust can become unavailable without warning.

Traditional systems aren't equipped to handle these realities. When something fails halfway through a multi-step process, you're left with partial state, inconsistent data, and the complex task of figuring out where things went wrong and how to recover. Most applications either lose progress entirely or require you to build extensive checkpointing and recovery logic.

In this tutorial, you'll see Temporal's durable execution in action by running two tests: crashing a server while it's working and fixing code problems on the fly without stopping your application.

Recover from a server crash

Unlike other solutions, Temporal is designed with failure in mind. In this part of the tutorial, you'll simulate a server crash mid-transaction and watch how Temporal helps you recover from it.

Here's the challenge: Kill your Worker process while money is being transferred. In traditional systems, this would corrupt the transaction or lose data entirely.

What We're Testing

Worker → Crash → Recovery → Success

Before You Start

What's happening behind the scenes?

Unlike many modern applications that require complex leader election processes and external databases to handle failure, Temporal automatically preserves the state of your Workflow even if the server is down. You can test this by stopping the Temporal Service while a Workflow Execution is in progress.

No data is lost when the Temporal Service goes offline. When it comes back online, the work picks up where it left off before the outage. Keep in mind that this example uses a single instance of the service running on a single machine. In a production deployment, the Temporal Service can be deployed as a cluster, spread across several machines for higher availability and increased throughput.

Instructions

Step 1: Start Your Worker

First, stop any running Worker (Ctrl+C) and start a fresh one in Terminal 2.

Worker Status: RUNNING
Workflow Status: WAITING
Terminal 2 - Worker
python run_worker.py
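
If you want to refresh your memory of what this script does, here's a minimal sketch of a run_worker.py for this tutorial. The module and class names (shared, BankingActivities, MoneyTransfer) are assumptions about the sample project, so your file may differ slightly:

import asyncio

from temporalio.client import Client
from temporalio.worker import Worker

# These imports are assumptions based on the sample project's layout.
from activities import BankingActivities
from shared import MONEY_TRANSFER_TASK_QUEUE_NAME
from workflows import MoneyTransfer


async def main() -> None:
    # Connect to the local Temporal Service you started earlier.
    client = await Client.connect("localhost:7233")
    activities = BankingActivities()

    # The Worker polls the Task Queue and runs your Workflow and Activity code.
    worker = Worker(
        client,
        task_queue=MONEY_TRANSFER_TASK_QUEUE_NAME,
        workflows=[MoneyTransfer],
        activities=[activities.withdraw, activities.deposit],
    )
    await worker.run()


if __name__ == "__main__":
    asyncio.run(main())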

Step 2: Start the Workflow

Now in Terminal 3, start the Workflow. Check the Web UI - you'll see your Worker busy executing the Workflow and its Activities.

Worker Status: EXECUTING
Workflow Status: RUNNING
Terminal 3 - Workflow
python run_workflow.py
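
For reference, the starter script does little more than ask the Temporal Service to run the Workflow and wait for its result. Here is a minimal sketch, using the same assumed names as before (MoneyTransfer, PaymentDetails, the shared Task Queue constant, and the Workflow Id are all illustrative):

import asyncio

from temporalio.client import Client

# These imports and field names are assumptions about the sample project.
from shared import MONEY_TRANSFER_TASK_QUEUE_NAME, PaymentDetails
from workflows import MoneyTransfer


async def main() -> None:
    client = await Client.connect("localhost:7233")

    data = PaymentDetails(
        source_account="85-150",
        target_account="43-812",
        amount=250,
        reference_id="12345",
    )

    # Starts the Workflow Execution on the Task Queue and waits for the result.
    result = await client.execute_workflow(
        MoneyTransfer.run,
        data,
        id="pay-account-transfer-412",  # illustrative Workflow Id
        task_queue=MONEY_TRANSFER_TASK_QUEUE_NAME,
    )
    print(f"Result: {result}")


if __name__ == "__main__":
    asyncio.run(main())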

Step 3: Simulate the Crash

The moment of truth! Kill your Worker while it's processing the transaction.

Go back to Terminal 2 and kill the Worker with Ctrl+C.

Now jump back to the Web UI and refresh. Your Workflow still shows as "Running"!

That's the magic: the Workflow keeps running because Temporal saved its state, even though you killed the Worker.

Worker Status: CRASHED
Workflow Status: RUNNING

Step 4: Bring Your Worker Back

Restart your Worker in Terminal 2. Watch Terminal 3 - you'll see the Workflow finish up and show the result!

Worker Status: RECOVERED
Workflow Status: COMPLETED
Transaction: SUCCESS
Terminal 2 - Recovery
python run_worker.py
Tip: Try This Challenge

Try killing the Worker at different points during execution. Start the Workflow, kill the Worker during the withdrawal, then restart it. Kill it during the deposit. Each time, notice how Temporal maintains perfect state consistency.

Check the Web UI while the Worker is down and you'll see the Workflow is still "Running" even though no code is executing.

Recover from an unknown error

In this part of the tutorial, you will inject a bug into your production code, watch Temporal retry automatically, then fix the bug while the Workflow is still running. This demo application makes a call to an external service in an Activity. If that call fails due to a bug in your code, the Activity produces an error.

To test this out and see how Temporal responds, you'll simulate a bug in the deposit() Activity.

Live Debugging Flow

Bug → Retry → Fix → Success


Instructions

Step 1: Stop Your Worker

Before we can simulate a failure, we need to stop the current Worker process. This allows us to modify the Activity code safely.

In Terminal 2 (where your Worker is running), stop it with Ctrl+C.

What's happening? You're about to modify Activity code to introduce a deliberate failure. The Worker process needs to restart to pick up code changes, but the Workflow execution will continue running in Temporal's service - this separation between execution state and code is a core Temporal concept.

Step 2: Introduce the Bug

Now we'll intentionally introduce a failure in the deposit Activity to simulate real-world scenarios like network timeouts, database connection issues, or external service failures. This demonstrates how Temporal handles partial failures in multi-step processes.

Find the deposit() method and uncomment the failing line while commenting out the working line:

activities.py

@activity.defn
async def deposit(self, data: PaymentDetails) -> str:
    reference_id = f"{data.reference_id}-deposit"
    try:
        # Comment out this working line:
        # confirmation = await asyncio.to_thread(
        #     self.bank.deposit, data.target_account, data.amount, reference_id
        # )

        # Uncomment this failing line:
        confirmation = await asyncio.to_thread(
            self.bank.deposit_that_fails,
            data.target_account,
            data.amount,
            reference_id,
        )
        return confirmation
    except InvalidAccountError:
        raise
    except Exception:
        activity.logger.exception("Deposit failed")
        raise

Save your changes. You've now created a deliberate failure point in your deposit Activity. This simulates a real-world scenario where external service calls might fail intermittently.
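
If you're curious what the failing call actually does, the sample's banking client exposes a method that always raises an error. Here is a rough, purely illustrative sketch of the idea; the real class name, signature, and exception type in the sample may differ:

class IllustrativeBank:
    """Stand-in for the tutorial's banking client; not the sample's real code."""

    def deposit_that_fails(self, account: str, amount: int, reference_id: str) -> str:
        # Always raises a retryable error, simulating a broken external service,
        # so Temporal keeps retrying the deposit Activity until the code is fixed.
        raise RuntimeError("This deposit has failed.")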

Step 3: Start Worker & Observe Retry Behavior

Now let's see how Temporal handles this failure. When you start your Worker, it will execute the withdraw Activity successfully, but hit the failing deposit Activity. Instead of the entire Workflow failing permanently, Temporal will retry the failed Activity according to your retry policy.

python run_worker.py

Here's what you'll see:

  • The withdraw() Activity completes successfully
  • The deposit() Activity fails and retries automatically

Key observation: Your Workflow isn't stuck or terminated. Temporal automatically retries the failed Activity according to your configured retry policy, while maintaining the overall Workflow state. The successful withdraw Activity doesn't get re-executed - only the failed deposit Activity is retried.
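
The retry behavior you are seeing is governed by the retry policy the Workflow attaches when it schedules the Activity. As a rough illustration of what such a policy looks like in the Python SDK (the interval values and the non-retryable error name below are assumptions, not necessarily what this sample's workflows.py uses):

from datetime import timedelta

from temporalio.common import RetryPolicy

# Passed as the retry_policy= argument when the Workflow schedules an Activity.
retry_policy = RetryPolicy(
    initial_interval=timedelta(seconds=1),   # wait 1 second before the first retry
    backoff_coefficient=2.0,                 # double the wait after each failure
    maximum_interval=timedelta(seconds=30),  # cap the wait between retries
    non_retryable_error_types=["InvalidAccountError"],  # errors not worth retrying
)

Because the bug you introduced raises a retryable error, Temporal keeps scheduling new attempts with increasing back-off until the Activity succeeds or the policy gives up.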

Step 4: Fix the Bug

Here's where Temporal really shines - you can fix bugs in production code while Workflows are still executing. The Workflow state is preserved in Temporal's durable storage, so you can deploy fixes and let the retry mechanism pick up your corrected code.

Go back to activities.py and reverse the comments - comment out the failing line and uncomment the working line:

activities.py

@activity.defn
async def deposit(self, data: PaymentDetails) -> str:
    reference_id = f"{data.reference_id}-deposit"
    try:
        # Uncomment this working line:
        confirmation = await asyncio.to_thread(
            self.bank.deposit, data.target_account, data.amount, reference_id
        )

        # Comment out this failing line:
        # confirmation = await asyncio.to_thread(
        #     self.bank.deposit_that_fails,
        #     data.target_account,
        #     data.amount,
        #     reference_id,
        # )
        return confirmation
    except InvalidAccountError:
        raise
    except Exception:
        activity.logger.exception("Deposit failed")
        raise

Save your changes. You've now restored the working implementation. The key insight here is that you can deploy fixes to Activities while Workflows are still executing - Temporal will pick up your changes on the next retry attempt.

Step 5: Restart Worker

To apply your fix, you need to restart the Worker process so it picks up the code changes. Since the Workflow execution state is stored in Temporal's servers (not in your Worker process), restarting the Worker won't affect the running Workflow.

# Stop the current Worker
Ctrl+C

# Start it again with the fix
python run_worker.py

On the next retry attempt, your fixed deposit() Activity will succeed, and you'll see the completed transaction in Terminal 3:

Transfer complete.
Withdraw: {'amount': 250, 'receiver': '43-812', 'reference_id': '1f35f7c6-4376-4fb8-881a-569dfd64d472', 'sender': '85-150'}
Deposit: {'amount': 250, 'receiver': '43-812', 'reference_id': '1f35f7c6-4376-4fb8-881a-569dfd64d472', 'sender': '85-150'}

Check the Web UI - your Workflow shows as completed. You've just demonstrated Temporal's key differentiator: the ability to fix production bugs in running applications without losing transaction state or progress. This is possible because Temporal stores execution state separately from your application code.

Mission Accomplished. You have just fixed a bug in a running application without losing the state of the Workflow or restarting the transaction.

Advanced Challenge

Try this advanced scenario of compensating transactions; a sketch of the possible changes appears after the list.

  1. Modify the retry policy in workflows.py to only retry 1 time
  2. Force the deposit to fail permanently
  3. Watch the automatic refund execute
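
Here is one way those changes might look in workflows.py. This is a sketch, not the sample's exact code: the names BankingActivities, withdraw, deposit, refund, and PaymentDetails are assumptions based on this tutorial's project, and the timeout values are illustrative.

from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy
from temporalio.exceptions import ActivityError

with workflow.unsafe.imports_passed_through():
    from activities import BankingActivities  # assumed names
    from shared import PaymentDetails


@workflow.defn
class MoneyTransfer:
    @workflow.run
    async def run(self, data: PaymentDetails) -> str:
        # One attempt only: a failing deposit becomes a permanent failure
        # instead of being retried, which lets the compensation path run.
        retry_policy = RetryPolicy(maximum_attempts=1)

        withdraw_result = await workflow.execute_activity_method(
            BankingActivities.withdraw,
            data,
            start_to_close_timeout=timedelta(seconds=5),
            retry_policy=retry_policy,
        )

        try:
            deposit_result = await workflow.execute_activity_method(
                BankingActivities.deposit,
                data,
                start_to_close_timeout=timedelta(seconds=5),
                retry_policy=retry_policy,
            )
        except ActivityError:
            # Compensating transaction: the deposit failed permanently,
            # so put the withdrawn money back into the source account.
            await workflow.execute_activity_method(
                BankingActivities.refund,
                data,
                start_to_close_timeout=timedelta(seconds=5),
                retry_policy=retry_policy,
            )
            raise

        return f"Withdraw: {withdraw_result}, Deposit: {deposit_result}"

With maximum_attempts=1, the failing deposit raises immediately, the refund Activity runs, and the Workflow fails with the original error while the source account is made whole again.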

Knowledge Check

Test your understanding of what you just experienced:

Q: What are four of Temporal's value propositions that you learned about in this tutorial?

Answer:

  1. Temporal automatically maintains the state of your Workflow, despite crashes or even outages of the Temporal Service itself.
  2. Temporal's built-in support for retries and timeouts enables your code to overcome transient and intermittent failures.
  3. Temporal provides full visibility into the state of the Workflow Execution, and its Web UI offers a convenient way to see the details of both current and past executions.
  4. Temporal makes it possible to fix a bug in a Workflow Execution that you've already started. After updating the code and restarting the Worker, the failing Activity is retried using the code containing the bug fix, completes successfully, and execution continues with what comes next.
Q: Why do we use a shared constant for the Task Queue name?

Answer: Because the Task Queue name is specified in two different parts of the code (one that starts the Workflow and one that configures the Worker). If the values differed, the Worker would poll a different Task Queue than the one where the Workflow Execution's Tasks are placed, and the Workflow Execution would not progress.
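
For reference, a minimal sketch of that shared constant; the constant name here is an assumption, so check your own shared.py for the exact one:

# shared.py
MONEY_TRANSFER_TASK_QUEUE_NAME = "TRANSFER_MONEY_TASK_QUEUE"

# Both run_worker.py and run_workflow.py import this constant, so the Worker
# polls exactly the same Task Queue that the Workflow starter targets.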

Q: What do you have to do if you make changes to Activity code for a Workflow that is running?

Answer: Restart the Worker.
