Storage and Policies
This page covers the pieces that make the Android SDK resilient over time rather than only correct in the happy path: checkpoint persistence, retry backoff, and failure classification.
Checkpoint store
TaskBridgeCheckpointStore is the durable boundary for stream resume state.
It has three responsibilities:
load(key)save(key, lastEventId)clear(key)
The SDK writes checkpoints after emitted events and clears them on terminal events.
Available store implementations
InMemoryTaskBridgeCheckpointStore
Use it when:
- tasks are short-lived;
- process death recovery does not matter;
- you are testing or prototyping.
Do not use it when you need resume after app restart.
DataStoreTaskBridgeCheckpointStore
Use it when:
- you want stream recovery across process death;
- your app already accepts AndroidX DataStore as part of the dependency graph;
- you need stable resume behavior between foreground and background work.
Real setup shape:
val checkpointStore =
DataStoreTaskBridgeCheckpointStore(
file = File(filesDir, "taskbridge-checkpoints.preferences_pb"),
scope = CoroutineScope(SupervisorJob() + Dispatchers.IO),
)
This store uses Jetpack DataStore preferences under the hood and URL-encodes keys before persistence.
Checkpoint key design
Checkpoint keys are built from:
- optional namespace;
- normalized base URL;
- task ID.
This prevents collisions across environments like staging and production.
Practical implication:
- the same
taskIdfrom two different backends does not overwrite the same checkpoint; - adding a namespace lets one app isolate multiple TaskBridge consumers cleanly.
Namespace guidance
Use checkpointNamespace when:
- one app talks to multiple logical TaskBridge consumers;
- you want to isolate task classes or product areas;
- migrations need old and new checkpoint spaces to coexist temporarily.
Do not add namespaces casually if one client already owns the whole checkpoint file. Extra namespaces are useful when they express a real ownership boundary.
Retry policy
TaskBridgeRetryPolicy answers one question: how long should the SDK wait before the next retry attempt?
The default implementation is ExponentialBackoffTaskBridgeRetryPolicy.
Current behavior:
- base delay starts at
500ms; - delay grows exponentially;
- exponential growth is capped;
- jitter is added to avoid synchronized retries.
This is the right default for most mobile apps because it avoids both hot-loop retries and overly slow recovery.
Transport retry gate
While TaskBridgeRetryPolicy determines how long the backoff delay should be, the TransportRetryGate determines when a retry is allowed to proceed based on external conditions (e.g., active internet availability).
By separating these two concepts, we keep the retry backoff calculation pure, functional, and free of platform-specific dependencies or blocking thread operations.
Interface signature
interface TransportRetryGate {
suspend fun awaitRetryAllowed()
}
When a recoverable stream failure occurs, the SDK first suspends inside awaitRetryAllowed() of the configured gate, and only then executes the backoff delay calculated by TaskBridgeRetryPolicy.
Supported implementations
NoOpTransportRetryGate(Default): Immediately permits retries. Used as the default baseline to maintain backward compatibility.AndroidConnectivityRetryGate: Suspends observation loop retries while the device is offline. It registers a dynamic network callback with Android'sConnectivityManagerinside a coldcallbackFlowto wait for active, validated internet connectivity (NET_CAPABILITY_INTERNETand optionallyNET_CAPABILITY_VALIDATED) before resuming.
Why this matters
- Non-blocking Coroutine Model: It avoids thread-blocking calls like
runBlockingor sleep loops inside retry calculations, keeping standard dispatcher threads (such asDispatchers.IO) free. - Cancellation Support:
awaitRetryAllowed()is fully suspending and respects coroutine cancellation. If the stream observation scope is cancelled while offline, the waiting sequence is cleanly terminated immediately.
Failure classifier
TaskBridgeFailureClassifier decides whether a failure is retryable.
Default behavior:
- retry
IOException; - retry HTTP
429; - retry HTTP
5xx; - do not retry serialization failures;
- do not treat
401as generally retryable in the classifier.
The 401 case is special:
- auth refresh is given a narrow chance in the transport layer;
- repeated auth failure eventually becomes fatal and terminates observation.
Policy interaction
flowchart TD
Error[Throwable] --> Classifier[TaskBridgeFailureClassifier]
Classifier -->|retryable| Gate[TransportRetryGate]
Gate -->|await allowed| Retry[TaskBridgeRetryPolicy]
Retry --> Delay[Backoff delay]
Delay --> Resume[Retry command or stream]
Classifier -->|fatal| Fail[Surface error to caller]
When to customize policies
Customize TaskBridgeRetryPolicy when:
- your app needs tighter or slower retry pacing;
- you are integrating with backend rate limits that require specific backoff behavior.
Customize TaskBridgeFailureClassifier when:
- your backend returns additional retryable statuses;
- a proxy or gateway layer wraps transient failures in app-specific exception types.
Do not customize either one just because you want extra logging. Use diagnostics hooks for that.
Recommended production baseline
For most Android consumers:
- use
DataStoreTaskBridgeCheckpointStore; - keep
ExponentialBackoffTaskBridgeRetryPolicy; - keep
DefaultTaskBridgeFailureClassifier; - add a namespace only if you have multiple logical consumers.
That baseline is already aligned with the way TaskBridge stream recovery works internally.