Proposals

Architecture and infrastructure change proposals

Overview

This section contains proposals for significant changes to applications, infrastructure, or architecture. Each proposal documents the rationale, impact, and implementation plan.

Active Proposals

Browse all proposals below or navigate through the sidebar.

Proposal Template

When creating a new proposal, include:

  • Summary - Brief overview of the proposed change
  • Rationale - Why this change is needed
  • Impact - Which applications and systems are affected
  • Implementation Plan - Steps to execute the proposal
  • Risks - Potential issues and mitigation strategies
  • Status - Draft, Under Review, Approved, Implemented, Rejected

1 - Edge Department Info Service Migration

Proposal for edge-department-info-service migration/modernization

Summary

Proposal for the migration and modernization of the edge-department-info-service.

Rationale

[To be filled in - reasons for this change]

Key benefits expected:

  • [List expected benefits]

Affected Applications

  • flux-handler-service - Django REST-based web service for department info (opening times, etc.) lookups

Impact

Services Affected

  • Edge department info service
  • [List additional services and dependencies]

Technical Considerations

  • API compatibility with existing consumers
  • Department data accuracy and synchronization
  • Performance and response time requirements
  • Caching strategy for department information

Data Considerations

  • Department data storage and updates
  • Data consistency across systems
  • Data source integration
  • Master data management

Implementation Plan

Phase 1: Assessment & Design

  • Document current service functionality
  • Identify all consumers and integration points
  • Evaluate target architecture/platform
  • Assess data migration requirements
  • Define migration strategy
  • Create proof of concept

Phase 2: Development

  • Set up new infrastructure
  • Implement/migrate core functionality
  • Configure API endpoints
  • Set up authentication and authorization
  • Implement monitoring and logging
  • Create comprehensive test suite

Phase 3: Migration

  • Deploy to staging environment
  • Integration testing with consumers
  • Performance and load testing
  • Data validation testing
  • User acceptance testing
  • Security review
  • Create rollback procedures

Phase 4: Cutover

  • Deploy to production
  • Gradual traffic migration
  • Monitor service health and performance
  • Update consumer configurations
  • Decommission old infrastructure
  • Archive legacy documentation

Risks & Mitigation

Risk | Impact | Mitigation
Service disruption | High | Gradual cutover with parallel running systems
Department data inconsistency | High | Thorough data validation and testing
Performance degradation | Medium | Load testing and capacity planning
Integration breakage | High | Extensive integration testing before cutover
Data source synchronization issues | Medium | Implement robust data sync mechanisms

Timeline

[To be determined]

Status

Current Status: Draft

History:

  • 2025-12-22: Proposal created

2 - Edge Sources Service Migration

Proposal for edge-sources-service migration/modernization

Summary

Proposal for the migration and modernization of the edge-sources-service.

Rationale

[To be filled in - reasons for this change]

Key benefits expected:

  • [List expected benefits]

Affected Applications

No applications currently associated with this proposal.

Impact

Services Affected

  • Edge sources service
  • [List additional services and dependencies]

Technical Considerations

  • API compatibility with existing consumers
  • Sources data accuracy and synchronization
  • Performance and response time requirements
  • Caching strategy for sources information

Data Considerations

  • Sources data storage and updates
  • Data consistency across systems
  • Data source integration
  • Master data management

Implementation Plan

Phase 1: Assessment & Design

  • Document current service functionality
  • Identify all consumers and integration points
  • Evaluate target architecture/platform
  • Assess data migration requirements
  • Define migration strategy
  • Create proof of concept

Phase 2: Development

  • Set up new infrastructure
  • Implement/migrate core functionality
  • Configure API endpoints
  • Set up authentication and authorization
  • Implement monitoring and logging
  • Create comprehensive test suite

Phase 3: Migration

  • Deploy to staging environment
  • Integration testing with consumers
  • Performance and load testing
  • Data validation testing
  • User acceptance testing
  • Security review
  • Create rollback procedures

Phase 4: Cutover

  • Deploy to production
  • Gradual traffic migration
  • Monitor service health and performance
  • Update consumer configurations
  • Decommission old infrastructure
  • Archive legacy documentation

Risks & Mitigation

Risk | Impact | Mitigation
Service disruption | High | Gradual cutover with parallel running systems
Sources data inconsistency | High | Thorough data validation and testing
Performance degradation | Medium | Load testing and capacity planning
Integration breakage | High | Extensive integration testing before cutover
Data source synchronization issues | Medium | Implement robust data sync mechanisms

Timeline

[To be determined]

Status

Current Status: Draft

History:

  • 2025-12-22: Proposal created

3 - Migrate Applications from Elastic Beanstalk to AWS Fargate

Proposal to migrate Elastic Beanstalk-hosted applications to AWS Fargate

Summary

Proposal to migrate applications currently hosted on Elastic Beanstalk to AWS Fargate, a serverless container orchestration platform.

Rationale

[To be filled in - reasons for migrating from Elastic Beanstalk to Fargate]

Key benefits expected:

  • Greater infrastructure control and flexibility
  • Improved cost efficiency at scale
  • Enhanced performance and resource allocation
  • Better integration with AWS ecosystem
  • Reduced vendor lock-in
  • Improved scalability options

Affected Applications

No applications currently associated with this proposal.

Impact

Services Affected

  • All Elastic Beanstalk-hosted applications
  • [List specific applications to be migrated]

Technical Considerations

  • Container image creation and registry setup (ECR)
  • VPC and networking configuration
  • Load balancer setup (ALB/NLB)
  • Service discovery and task definitions
  • Auto-scaling policies
  • CI/CD pipeline modifications
  • Logging and monitoring infrastructure
  • Secret management (AWS Secrets Manager/Parameter Store)

Infrastructure Considerations

  • Database migration (Elastic Beanstalk Postgres to RDS/Aurora)
  • Redis/cache migration (Elastic Beanstalk Redis to ElastiCache)
  • File storage migration (S3 integration)
  • Add-on replacements and alternatives
  • Domain and DNS configuration
  • SSL/TLS certificate management

Data Considerations

  • Database backup and migration strategy
  • Data consistency during migration
  • Downtime requirements
  • Rollback procedures

Implementation Plan

Phase 1: Assessment & Design

  • Audit all Elastic Beanstalk applications and dependencies
  • Document current Elastic Beanstalk add-ons and services
  • Design AWS infrastructure architecture
  • Map Elastic Beanstalk add-ons to AWS equivalents
  • Containerize applications (if not already containerized)
  • Create cost analysis and comparison
  • Define migration priority and sequence

Phase 2: Infrastructure Setup

  • Set up AWS accounts and IAM roles
  • Configure VPC, subnets, and security groups
  • Set up container registry (ECR)
  • Configure ECS clusters and Fargate capacity
  • Set up load balancers and target groups
  • Configure CloudWatch logging and monitoring
  • Set up AWS Secrets Manager
  • Create infrastructure as code (Terraform/CloudFormation)

Phase 3: Application Preparation

  • Create Dockerfiles for each application
  • Build and test container images
  • Create ECS task definitions
  • Configure environment variables and secrets
  • Set up CI/CD pipelines for container builds
  • Implement health checks and monitoring
  • Create deployment scripts
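
The ECS task definition step above can be sketched in a deployment script. This is a minimal illustration only: the service name, image URI, role ARN, and port are placeholders, and real task definitions will carry more settings (secrets, environment variables, health checks).

```python
# Sketch of a minimal Fargate task definition, expressed as the keyword
# arguments boto3's ecs.register_task_definition() accepts. All names,
# ARNs, and the image URI below are placeholders.

def build_task_definition(family: str, image: str, execution_role_arn: str,
                          cpu: str = "256", memory: str = "512") -> dict:
    """Return a Fargate-compatible ECS task definition."""
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",           # required for Fargate tasks
        "cpu": cpu,                        # CPU units, as a string per the ECS API
        "memory": memory,                  # MiB, as a string per the ECS API
        "executionRoleArn": execution_role_arn,
        "containerDefinitions": [{
            "name": family,
            "image": image,
            "essential": True,
            "portMappings": [{"containerPort": 8000, "protocol": "tcp"}],
            "logConfiguration": {          # ship container logs to CloudWatch
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": f"/ecs/{family}",
                    "awslogs-region": "eu-west-1",
                    "awslogs-stream-prefix": "ecs",
                },
            },
        }],
    }

task_def = build_task_definition(
    family="example-service",
    image="123456789012.dkr.ecr.eu-west-1.amazonaws.com/example-service:latest",
    execution_role_arn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
)
# In a deployment script this dict would be passed to:
#   boto3.client("ecs").register_task_definition(**task_def)
```

Keeping the definition as data like this makes it straightforward to generate one task definition per application during the migration.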

Phase 4: Data Migration

  • Set up target databases (RDS/Aurora)
  • Configure database replication (if applicable)
  • Plan data migration windows
  • Test database migration procedures
  • Migrate Redis/cache data
  • Migrate file storage to S3

Phase 5: Testing & Validation

  • Deploy to staging/test environment
  • Perform integration testing
  • Load and performance testing
  • Security and compliance review
  • Validate monitoring and alerting
  • Test disaster recovery procedures

Phase 6: Migration & Cutover

  • Execute database migration
  • Deploy applications to Fargate
  • Update DNS records
  • Monitor application health and performance
  • Validate all functionality
  • Address any issues
  • Decommission Elastic Beanstalk applications
  • Cancel Elastic Beanstalk subscriptions

Risks & Mitigation

Risk | Impact | Mitigation
Extended downtime during migration | Critical | Plan careful cutover with blue-green deployment
Data loss during database migration | Critical | Multiple backups, test migrations, validation procedures
Application compatibility issues | High | Thorough testing in staging environment
Increased operational complexity | Medium | Comprehensive documentation, training, automation
Cost overruns | Medium | Detailed cost analysis, monitoring, right-sizing
Performance degradation | High | Performance testing, proper resource allocation
Loss of Elastic Beanstalk add-on functionality | Medium | Identify and implement AWS equivalents beforehand
CI/CD pipeline disruption | Medium | Test new pipelines before cutover

Timeline

[To be determined]

Status

Current Status: Draft

History:

  • 2025-12-22: Proposal created

4 - Migrate Applications from Heroku to AWS Fargate

Proposal to migrate Heroku-hosted applications to AWS Fargate

Summary

Proposal to migrate applications currently hosted on Heroku to AWS Fargate, a serverless container orchestration platform.

Rationale

[To be filled in - reasons for migrating from Heroku to Fargate]

Key benefits expected:

  • Greater infrastructure control and flexibility
  • Improved cost efficiency at scale
  • Enhanced performance and resource allocation
  • Better integration with AWS ecosystem
  • Reduced vendor lock-in
  • Improved scalability options

Affected Applications

No applications currently associated with this proposal.

Impact

Services Affected

  • All Heroku-hosted applications
  • [List specific applications to be migrated]

Technical Considerations

  • Container image creation and registry setup (ECR)
  • VPC and networking configuration
  • Load balancer setup (ALB/NLB)
  • Service discovery and task definitions
  • Auto-scaling policies
  • CI/CD pipeline modifications
  • Logging and monitoring infrastructure
  • Secret management (AWS Secrets Manager/Parameter Store)

Infrastructure Considerations

  • Database migration (Heroku Postgres to RDS/Aurora)
  • Redis/cache migration (Heroku Redis to ElastiCache)
  • File storage migration (S3 integration)
  • Add-on replacements and alternatives
  • Domain and DNS configuration
  • SSL/TLS certificate management

Data Considerations

  • Database backup and migration strategy
  • Data consistency during migration
  • Downtime requirements
  • Rollback procedures

Implementation Plan

Phase 1: Assessment & Design

  • Audit all Heroku applications and dependencies
  • Document current Heroku add-ons and services
  • Design AWS infrastructure architecture
  • Map Heroku add-ons to AWS equivalents
  • Containerize applications (if not already containerized)
  • Create cost analysis and comparison
  • Define migration priority and sequence

Phase 2: Infrastructure Setup

  • Set up AWS accounts and IAM roles
  • Configure VPC, subnets, and security groups
  • Set up container registry (ECR)
  • Configure ECS clusters and Fargate capacity
  • Set up load balancers and target groups
  • Configure CloudWatch logging and monitoring
  • Set up AWS Secrets Manager
  • Create infrastructure as code (Terraform/CloudFormation)

Phase 3: Application Preparation

  • Create Dockerfiles for each application
  • Build and test container images
  • Create ECS task definitions
  • Configure environment variables and secrets
  • Set up CI/CD pipelines for container builds
  • Implement health checks and monitoring
  • Create deployment scripts

Phase 4: Data Migration

  • Set up target databases (RDS/Aurora)
  • Configure database replication (if applicable)
  • Plan data migration windows
  • Test database migration procedures
  • Migrate Redis/cache data
  • Migrate file storage to S3

Phase 5: Testing & Validation

  • Deploy to staging/test environment
  • Perform integration testing
  • Load and performance testing
  • Security and compliance review
  • Validate monitoring and alerting
  • Test disaster recovery procedures

Phase 6: Migration & Cutover

  • Execute database migration
  • Deploy applications to Fargate
  • Update DNS records
  • Monitor application health and performance
  • Validate all functionality
  • Address any issues
  • Decommission Heroku applications
  • Cancel Heroku subscriptions
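
The "gradual traffic migration" step above could, for example, be driven by Route 53 weighted records. The sketch below builds the change batch a cutover script might pass to boto3's route53.change_resource_record_sets(); the domain, Heroku hostname, and ALB DNS name are placeholders.

```python
# Sketch of gradual traffic shifting via Route 53 weighted CNAME records.
# Two records share the same name; Route 53 answers in proportion to
# their weights, letting the cutover move traffic in small increments.

def weighted_cutover_batch(domain: str, heroku_target: str, fargate_target: str,
                           fargate_weight: int) -> dict:
    """Split traffic between old and new stacks; 0 <= fargate_weight <= 100."""
    if not 0 <= fargate_weight <= 100:
        raise ValueError("fargate_weight must be between 0 and 100")

    def record(set_id: str, target: str, weight: int) -> dict:
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": domain,
                "Type": "CNAME",
                "SetIdentifier": set_id,   # distinguishes the weighted records
                "Weight": weight,
                "TTL": 60,                 # short TTL so weight changes apply quickly
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {"Changes": [
        record("heroku", heroku_target, 100 - fargate_weight),
        record("fargate", fargate_target, fargate_weight),
    ]}

# Shift 10% of traffic to Fargate as a first step:
batch = weighted_cutover_batch(
    "api.example.com",
    "example.herokuapp.com",
    "example-alb-1234.eu-west-1.elb.amazonaws.com",
    fargate_weight=10,
)
# A cutover script would then call:
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId=zone_id, ChangeBatch=batch)
```

Raising the weight in stages (10 → 50 → 100) while monitoring error rates gives a cheap rollback path: setting the Fargate weight back to 0 restores the old routing.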

Risks & Mitigation

Risk | Impact | Mitigation
Extended downtime during migration | Critical | Plan careful cutover with blue-green deployment
Data loss during database migration | Critical | Multiple backups, test migrations, validation procedures
Application compatibility issues | High | Thorough testing in staging environment
Increased operational complexity | Medium | Comprehensive documentation, training, automation
Cost overruns | Medium | Detailed cost analysis, monitoring, right-sizing
Performance degradation | High | Performance testing, proper resource allocation
Loss of Heroku add-on functionality | Medium | Identify and implement AWS equivalents beforehand
CI/CD pipeline disruption | Medium | Test new pipelines before cutover

Timeline

[To be determined]

Status

Current Status: Draft

History:

  • 2025-12-22: Proposal created

5 - Migrate Applications from Heroku to Cloudflare

Proposal to migrate Heroku-hosted applications to Cloudflare platform

Summary

Proposal to migrate applications currently hosted on Heroku to Cloudflare’s edge computing platform (Workers, Pages, etc.).

Rationale

[To be filled in - reasons for migrating from Heroku to Cloudflare]

Key benefits expected:

  • Edge computing with global low-latency performance
  • Improved cost efficiency
  • Automatic global distribution
  • Built-in DDoS protection and security
  • Reduced infrastructure complexity
  • Better integration with Cloudflare ecosystem
  • Zero cold starts (Workers)

Affected Applications

No applications currently associated with this proposal.

Impact

Services Affected

  • All Heroku-hosted applications targeted for migration
  • [List specific applications to be migrated]

Technical Considerations

  • Application architecture adaptation for edge computing
  • Cloudflare Workers runtime compatibility
  • Request/response size limitations
  • Execution time limits (CPU time constraints)
  • Code bundling and module system
  • API compatibility and framework support
  • Stateless architecture requirements

Platform Considerations

  • Cloudflare Workers vs Pages vs hybrid approach
  • Database migration (D1, Durable Objects, or external)
  • KV storage for caching and session data
  • R2 for object storage needs
  • Queue system for async processing
  • Analytics and monitoring setup
  • Custom domain configuration

Data Considerations

  • Database compatibility and migration strategy
  • Session storage and state management
  • File/asset storage migration
  • Data residency and compliance requirements

Implementation Plan

Phase 1: Assessment & Design

  • Audit all Heroku applications and dependencies
  • Evaluate application compatibility with Cloudflare platform
  • Identify applications suitable for Workers vs Pages
  • Design edge-native architecture
  • Map Heroku add-ons to Cloudflare/external equivalents
  • Assess code refactoring requirements
  • Create cost analysis and comparison
  • Define migration priority and sequence

Phase 2: Development Environment Setup

  • Set up Cloudflare accounts and API tokens
  • Configure Wrangler CLI and development tools
  • Set up local development environment
  • Create project structure and repositories
  • Configure environment variables and secrets
  • Set up Cloudflare bindings (KV, D1, R2, etc.)

Phase 3: Application Migration

  • Refactor applications for edge compatibility
  • Adapt middleware and frameworks to Workers runtime
  • Implement database access patterns for D1/Durable Objects
  • Configure KV storage for sessions/cache
  • Set up R2 for file storage
  • Implement logging and error handling
  • Bundle and optimize application code
  • Create deployment configurations

Phase 4: Data Migration

  • Set up target databases (D1, external DB, or Durable Objects)
  • Plan data migration strategy
  • Migrate database content
  • Migrate KV data
  • Migrate static assets to R2 or Pages
  • Validate data integrity

Phase 5: Testing & Validation

  • Deploy to Cloudflare preview environments
  • Integration testing
  • Edge performance testing
  • Geographic distribution validation
  • Load testing with global traffic simulation
  • Security review
  • Test custom domains and SSL

Phase 6: Migration & Cutover

  • Deploy to Cloudflare production
  • Configure DNS and custom domains
  • Gradual traffic migration via DNS/load balancing
  • Monitor performance and error rates
  • Validate functionality across regions
  • Address any issues
  • Decommission Heroku applications
  • Cancel Heroku subscriptions

Risks & Mitigation

Risk | Impact | Mitigation
Runtime compatibility issues | High | Thorough testing, identify incompatibilities early
CPU time limit exceeded | High | Optimize code, consider splitting workloads
Database performance on D1 | Medium | Evaluate alternative database solutions if needed
Cold start issues (if any) | Low | Workers have minimal cold starts, but monitor
Code refactoring complexity | High | Incremental migration, thorough testing
Loss of Heroku add-on functionality | Medium | Identify Cloudflare or external equivalents
Geographic data compliance | High | Understand Cloudflare’s data residency options
Vendor lock-in to Cloudflare | Medium | Abstract critical services where possible
Development workflow changes | Medium | Training and documentation for team

Timeline

[To be determined]

Status

Current Status: Draft

History:

  • 2025-12-22: Proposal created

6 - Migrate Exchange REST API to Serverless

Proposal to migrate exchange REST API to Exchange serverless stack

Summary

Proposal to migrate the exchange REST API to a serverless architecture using the Exchange serverless stack.

Rationale

[To be filled in - reasons for migrating to serverless]

Key benefits expected:

  • Improved scalability
  • Reduced operational overhead
  • Cost optimization through pay-per-use model
  • Enhanced deployment flexibility

Affected Applications

No applications currently associated with this proposal.

Impact

Services Affected

  • Exchange REST API endpoints
  • [List additional services]

Technical Considerations

  • API contract compatibility
  • Authentication/authorization changes
  • Performance characteristics
  • Monitoring and logging setup

Data Considerations

  • Database connection management
  • State management in serverless environment
  • Caching strategy

Implementation Plan

Phase 1: Assessment & Design

  • Audit current REST API endpoints
  • Design serverless architecture
  • Evaluate AWS Lambda/serverless framework
  • Define resource requirements
  • Create proof of concept

Phase 2: Development

  • Set up serverless infrastructure
  • Implement API endpoints in serverless stack
  • Configure API Gateway/routing
  • Implement authentication/authorization
  • Set up monitoring and logging
  • Create automated tests
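
To make the endpoint-implementation step above concrete, here is a minimal sketch of one REST endpoint as an AWS Lambda handler behind an API Gateway proxy integration. The /health route and response shape are illustrative assumptions, not the real Exchange API contract.

```python
# Sketch of a serverless REST endpoint: API Gateway (proxy integration)
# invokes this handler with the HTTP method and path in the event, and
# the handler returns a statusCode/headers/body response.
import json

def handler(event: dict, context=None) -> dict:
    """Route API Gateway proxy events to endpoint logic."""
    route = (event.get("httpMethod"), event.get("path"))
    if route == ("GET", "/health"):
        body, status = {"status": "ok"}, 200
    else:
        body, status = {"error": "not found"}, 404
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }

# Local smoke test without deploying anything:
resp = handler({"httpMethod": "GET", "path": "/health"})
```

Because the handler is a plain function taking a dict, the automated tests in this phase can exercise every route locally before any infrastructure exists.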

Phase 3: Migration

  • Deploy to staging environment
  • Performance testing
  • Load testing
  • Security review
  • Create rollback plan

Phase 4: Cutover

  • Deploy to production
  • Update DNS/routing
  • Monitor performance
  • Decommission old infrastructure

Risks & Mitigation

Risk | Impact | Mitigation
API downtime during migration | High | Blue-green deployment with gradual traffic shift
Performance degradation | Medium | Thorough load testing before cutover
Cold start latency | Medium | Implement provisioned concurrency for critical endpoints
Breaking changes to API contract | High | Maintain backward compatibility, version API
Increased complexity | Medium | Comprehensive documentation and runbooks

Timeline

[To be determined]

Status

Current Status: Draft

History:

  • 2025-12-22: Proposal created

7 - Migrate QAB Buy Web Services to Serverless

Proposal to migrate legacy Qab buy web services to QabBuy serverless stack

Summary

Proposal to migrate legacy Qab buy web services to a modern serverless architecture using the QabBuy serverless stack.

Rationale

[To be filled in - reasons for migrating to serverless]

Key benefits expected:

  • Modernize legacy infrastructure
  • Improved scalability and reliability
  • Reduced maintenance burden
  • Cost optimization
  • Enhanced deployment practices
  • Better handling of peak purchase volumes

Affected Applications

No applications currently associated with this proposal.

Impact

Services Affected

  • QAB buy/purchase web services
  • [List additional services and dependencies]

Technical Considerations

  • Legacy code modernization requirements
  • API compatibility with existing consumers
  • Integration with payment systems
  • Transaction handling in serverless environment
  • Performance and response time requirements
  • Error handling and retry logic

Data Considerations

  • Purchase data storage and retrieval
  • Transaction state management
  • Database access patterns
  • Data consistency requirements
  • Audit logging

Implementation Plan

Phase 1: Assessment & Design

  • Document current buy service functionality
  • Identify all consumers and integration points
  • Map payment and transaction flows
  • Design serverless architecture for QabBuy
  • Define migration strategy for legacy code
  • Assess infrastructure and security requirements
  • Create proof of concept

Phase 2: Development

  • Set up QabBuy serverless infrastructure
  • Migrate/refactor purchase logic
  • Implement serverless endpoints
  • Configure API Gateway and routing
  • Implement authentication and authorization
  • Set up transaction handling
  • Implement monitoring, logging, and alerting
  • Create comprehensive test suite including transaction scenarios

Phase 3: Migration

  • Deploy to staging environment
  • Integration testing with consumers
  • Payment system integration testing
  • Performance and load testing
  • Transaction failure scenario testing
  • User acceptance testing
  • Security and compliance review
  • Create rollback procedures

Phase 4: Cutover

  • Gradual traffic migration to new stack
  • Monitor service health and transaction success rates
  • Update consumer configurations
  • Validate payment processing
  • Decommission legacy buy services
  • Archive legacy code and documentation

Risks & Mitigation

Risk | Impact | Mitigation
Transaction failures during migration | Critical | Robust rollback plan, gradual cutover with monitoring
Payment processing issues | Critical | Extensive testing with payment systems, parallel running
Service disruption | High | Blue-green deployment with quick rollback capability
Legacy code complexity | High | Thorough documentation and refactoring plan
Integration breakage | High | Extensive integration testing before cutover
Data inconsistency | High | Transaction integrity testing, audit logging
Performance degradation | Medium | Load testing and capacity planning
Compliance issues | High | Security review and compliance validation

Timeline

[To be determined]

Status

Current Status: Draft

History:

  • 2025-12-22: Proposal created

8 - Migrate QAB Quoting Web Services to Serverless

Proposal to migrate legacy Qab quoting web services to QabQuote serverless stack

Summary

Proposal to migrate legacy Qab quoting web services to a modern serverless architecture using the QabQuote serverless stack.

Rationale

[To be filled in - reasons for migrating to serverless]

Key benefits expected:

  • Modernize legacy infrastructure
  • Improved scalability and reliability
  • Reduced maintenance burden
  • Cost optimization
  • Enhanced deployment practices

Affected Applications

No applications currently associated with this proposal.

Impact

Services Affected

  • QAB quoting web services
  • [List additional services and dependencies]

Technical Considerations

  • Legacy code modernization requirements
  • API compatibility with existing consumers
  • Integration points with other systems
  • Performance and response time requirements
  • Session management in serverless context

Data Considerations

  • Database access patterns
  • Quote data storage and retrieval
  • State management
  • Caching requirements

Implementation Plan

Phase 1: Assessment & Design

  • Document current web service functionality
  • Identify all consumers and integration points
  • Design serverless architecture for QabQuote
  • Define migration strategy for legacy code
  • Assess infrastructure requirements
  • Create proof of concept

Phase 2: Development

  • Set up QabQuote serverless infrastructure
  • Migrate/refactor quoting logic
  • Implement serverless endpoints
  • Configure API Gateway and routing
  • Set up authentication and authorization
  • Implement monitoring and logging
  • Create comprehensive test suite

Phase 3: Migration

  • Deploy to staging environment
  • Integration testing with consumers
  • Performance and load testing
  • User acceptance testing
  • Security review
  • Create rollback procedures

Phase 4: Cutover

  • Gradual traffic migration to new stack
  • Monitor service health and performance
  • Update consumer configurations
  • Decommission legacy web services
  • Archive legacy code and documentation

Risks & Mitigation

Risk | Impact | Mitigation
Service disruption | High | Gradual cutover with parallel running systems
Legacy code complexity | High | Thorough documentation and refactoring plan
Integration breakage | High | Extensive integration testing before cutover
Performance issues | Medium | Load testing and capacity planning
Data inconsistency | High | Thorough testing of quote calculations
Loss of domain knowledge | Medium | Document business logic during migration

Timeline

[To be determined]

Status

Current Status: Draft

History:

  • 2025-12-22: Proposal created

9 - Serverless Callbacks Migration

Proposal to migrate callbacks functionality to serverless architecture

Summary

Proposal to migrate callbacks functionality to a serverless architecture.

Rationale

[To be filled in - reasons for migrating to serverless]

Key benefits expected:

  • Improved scalability for callback processing
  • Reduced operational overhead
  • Cost optimization through pay-per-use model
  • Better handling of variable callback volumes
  • Enhanced reliability and retry mechanisms

Affected Applications

No applications currently associated with this proposal.

Impact

Services Affected

  • Callback handling services
  • [List additional services and dependencies]

Technical Considerations

  • Webhook/callback endpoint compatibility
  • Event-driven architecture implementation
  • Asynchronous processing requirements
  • Retry and failure handling
  • Message queue integration
  • Timeout and execution limits

Data Considerations

  • Callback data storage and logging
  • Event sourcing patterns
  • Audit trail requirements
  • Dead letter queue handling

Implementation Plan

Phase 1: Assessment & Design

  • Document current callback workflows
  • Identify all callback sources and consumers
  • Design event-driven serverless architecture
  • Evaluate messaging/queue services (SQS, EventBridge, etc.)
  • Define retry and error handling strategy
  • Assess monitoring requirements
  • Create proof of concept

Phase 2: Development

  • Set up serverless infrastructure
  • Implement event handlers and processors
  • Configure API Gateway for webhook endpoints
  • Set up message queues and event routing
  • Implement retry logic and dead letter queues
  • Set up monitoring, logging, and alerting
  • Create comprehensive test suite including failure scenarios

Phase 3: Migration

  • Deploy to staging environment
  • Integration testing with callback sources
  • Load and stress testing
  • Failure scenario testing
  • Latency and performance validation
  • Security review
  • Create rollback procedures

Phase 4: Cutover

  • Deploy to production
  • Gradual traffic migration
  • Monitor callback processing rates and success
  • Update webhook endpoint configurations
  • Validate end-to-end callback flows
  • Decommission old infrastructure
  • Archive legacy documentation

Risks & Mitigation

Risk | Impact | Mitigation
Callback delivery failures | High | Implement robust retry mechanisms and dead letter queues
Duplicate callback processing | Medium | Implement idempotency keys and deduplication
Cold start latency | Medium | Use provisioned concurrency for critical callbacks
Message loss during migration | High | Parallel processing with validation before cutover
Timeout issues for long-running callbacks | Medium | Design async processing patterns, evaluate timeout limits
Integration breakage | High | Extensive testing with all callback sources
Monitoring gaps | Medium | Comprehensive logging and alerting setup

Timeline

[To be determined]

Status

Current Status: Draft

History:

  • 2025-12-22: Proposal created

10 - Universal Geo Service Migration

Proposal for universal-geo-service migration/modernization

Summary

Proposal for the migration and modernization of the universal-geo-service.

Rationale

[To be filled in - reasons for this change]

Key benefits expected:

  • [List expected benefits]

Affected Applications

No applications currently associated with this proposal.

Impact

Services Affected

  • Universal geo service
  • [List additional services and dependencies]

Technical Considerations

  • API compatibility with existing consumers
  • Geographic data accuracy and coverage
  • Performance and response time requirements
  • Caching strategy for geo lookups

Data Considerations

  • Geographic data storage and updates
  • Data accuracy and validation
  • Third-party data sources and licensing

Implementation Plan

Phase 1: Assessment & Design

  • Document current service functionality
  • Identify all consumers and integration points
  • Evaluate target architecture/platform
  • Assess data migration requirements
  • Define migration strategy
  • Create proof of concept

Phase 2: Development

  • Set up new infrastructure
  • Implement/migrate core functionality
  • Configure API endpoints
  • Set up authentication and authorization
  • Implement monitoring and logging
  • Create comprehensive test suite

Phase 3: Migration

  • Deploy to staging environment
  • Integration testing with consumers
  • Performance and load testing
  • Data validation testing
  • User acceptance testing
  • Security review
  • Create rollback procedures

Phase 4: Cutover

  • Deploy to production
  • Gradual traffic migration
  • Monitor service health and performance
  • Update consumer configurations
  • Decommission old infrastructure
  • Archive legacy documentation

Risks & Mitigation

Risk | Impact | Mitigation
Service disruption | High | Gradual cutover with parallel running systems
Geographic data accuracy issues | High | Thorough data validation and testing
Performance degradation | Medium | Load testing and capacity planning
Integration breakage | High | Extensive integration testing before cutover
Third-party dependency issues | Medium | Evaluate and test all external dependencies

Timeline

[To be determined]

Status

Current Status: Draft

History:

  • 2025-12-22: Proposal created

11 - Remove Fakertrail Application

Proposal to decommission the fakertrail application

Summary

Proposal to remove the fakertrail application from our infrastructure.

Rationale

[To be filled in - reasons for removing this application]

Affected Applications

Impact

Services Affected

  • [List services that will be impacted]
  • Downstream dependencies

Data Considerations

  • Data migration requirements
  • Backup and archival needs

Implementation Plan

Phase 1: Preparation

  • Audit current usage
  • Identify dependencies
  • Create data backup

Phase 2: Migration

  • Redirect traffic if needed
  • Migrate critical data

Phase 3: Decommissioning

  • Remove from Heroku
  • Update DNS records
  • Remove monitoring/alerts
  • Archive documentation

Risks & Mitigation

Risk                  | Impact | Mitigation
Loss of functionality | Medium | Document all features before removal
Data loss             | High   | Full backup before decommission
Broken integrations   | Medium | Audit all integrations beforehand

Timeline

[To be determined]

Status

Current Status: Draft

History:

  • 2025-12-20: Proposal created

12 - Provider Agnostic Geo Lookup Library

Proposal for a Python library that abstracts geo lookup providers for resilience and reusability
Status | Effort | Client Cost Saving
Draft  | 20     | TBD

Summary

Create a Python library that provides a provider-agnostic interface for geographic address lookup services. This library will support automatic failover between providers, ensuring service continuity during provider outages, and can be deployed across multiple projects and platforms (DRF, Lambda).

Rationale

The Problem

On 2026-02-04, getaddress.io experienced several hours of downtime. This caused a complete failure of the quote service flow because:

  1. Single Provider Dependency: The geo service currently relies exclusively on getaddress.io with no fallback mechanism
  2. Cross-Client Impact: This single point of failure affects multiple clients (Adrian Flux, Sterling, Bikesure) and multiple product lines (EPAs, JAF forms), amplifying the business risk
  3. User Experience Impact: While users could technically enter addresses manually, this option is not immediately obvious in the UI
  4. Business Impact: No quotes could be processed during the outage across any client or product line, directly affecting revenue

Why This Change is Needed

  • Resilience: We have an avoidable single point of failure that affects multiple clients and product lines - a single provider outage takes down quote functionality across the entire business
  • Code Exists: The codebase technically supports multiple providers, but only one is implemented and active
  • Reusability: Other hut42 projects (Viitata, Forecaster, Katala) require similar geo lookup functionality
  • Cost Optimization: Some providers like Google offer free tiers that could serve as fallback options

Proposed Solution

Python Library

A standalone Python library that:

  1. Abstracts Provider Implementation: Unified interface regardless of underlying provider
  2. Supports Multiple Providers:
    • getaddress.io (primary)
    • Google Places API (fallback - potentially free tier)
    • Additional providers as needed
  3. Manual or Automatic Failover: Configurable provider switching (see phased approach below)
  4. Framework Agnostic: Works with DRF services and Lambda functions
  5. Built from Existing Code: Refactored from the current geo service codebase
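As a concrete illustration of the abstraction described above, a minimal sketch of the provider interface and the config-driven client (all class and method names here are assumptions for illustration, not the final library API):

```python
# Sketch of the provider abstraction: a common interface plus a client that
# routes to whichever provider is selected in configuration (Phase 1 manual
# switchover). Class and method names are illustrative assumptions.
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    """Abstract interface every geo lookup provider must implement."""

    name: str

    @abstractmethod
    def lookup(self, postcode: str) -> list[dict]:
        """Return raw address candidates for a postcode."""


class GetAddressProvider(BaseProvider):
    name = "getaddress"

    def lookup(self, postcode: str) -> list[dict]:
        # Real implementation would call the getaddress.io HTTP API here.
        raise NotImplementedError


class GooglePlacesProvider(BaseProvider):
    name = "google_places"

    def lookup(self, postcode: str) -> list[dict]:
        # Real implementation would call the Google Places API here.
        raise NotImplementedError


class GeoLookupClient:
    """Routes lookups to the active provider, selected via configuration."""

    def __init__(self, providers: list[BaseProvider], active: str):
        self._registry = {p.name: p for p in providers}
        self.active = active  # switching providers = changing this config value

    def lookup(self, postcode: str) -> list[dict]:
        return self._registry[self.active].lookup(postcode)
```

Under this shape, a Phase 1 switchover is a one-line configuration change (e.g. setting the active provider name), with no consumer-facing changes.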

Phased Approach

Phase 1 - Manual Switchover: Implement multi-provider support with manual provider switching via configuration. Service health monitoring is handled externally by Uptime Robot, with manual intervention to switch providers during outages.

Phase 2 - Automatic Failover: Add built-in health checking and automatic failover management to the library, removing the need for manual intervention.

Architecture

Phase 1 - Manual Switchover

┌─────────────────────────────────────────────────────────┐
│                    Geo Lookup Library                   │
├─────────────────────────────────────────────────────────┤
│  GeoLookupClient                                        │
│  ├── Provider Registry                                  │
│  ├── Provider Selector (config-driven)                  │
│  └── Response Normalizer                                │
├─────────────────────────────────────────────────────────┤
│  Providers                                              │
│  ├── GetAddressProvider (primary)                       │
│  ├── GooglePlacesProvider (fallback)                    │
│  └── BaseProvider (abstract interface)                  │
└─────────────────────────────────────────────────────────┘
           │
           ▼
    ┌──────────┐        ┌─────────────────┐
    │ Geo Svc  │◄───────│  Uptime Robot   │
│          │        │  (monitoring)   │
    └──────────┘        └─────────────────┘
                                │
                                ▼
                        ┌─────────────────┐
                        │ Manual config   │
                        │ change to swap  │
                        │ provider        │
                        └─────────────────┘

Phase 2 - Automatic Failover

┌─────────────────────────────────────────────────────────┐
│                    Geo Lookup Library                   │
├─────────────────────────────────────────────────────────┤
│  GeoLookupClient                                        │
│  ├── Provider Registry                                  │
│  ├── Health Checker              ◄── NEW                │
│  ├── Failover Manager            ◄── NEW                │
│  └── Response Normalizer                                │
├─────────────────────────────────────────────────────────┤
│  Providers                                              │
│  ├── GetAddressProvider (primary)                       │
│  ├── GooglePlacesProvider (fallback)                    │
│  └── BaseProvider (abstract interface)                  │
└─────────────────────────────────────────────────────────┘
           │                    │                    
           ▼                    ▼                    
    ┌──────────┐        ┌───────────┐
    │ Geo Svc  │        │ Geo Svc   │
    │ [Flux]   │        │ [Hutsoft] │
    └──────────┘        └───────────┘
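The Phase 2 Failover Manager shown above could follow a simple circuit breaker. A minimal sketch, where the class name, thresholds, and cooldown are illustrative assumptions rather than the final design:

```python
# Circuit-breaker style failover sketch: try providers in priority order,
# "open" a provider's circuit after repeated failures, and retry it only
# after a cooldown. All names and threshold values are assumptions.
import time


class FailoverManager:
    def __init__(self, providers, failure_threshold=3, cooldown=60.0):
        self.providers = providers            # ordered: primary first
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown              # seconds before retrying a failed provider
        self._failures = {p.name: 0 for p in providers}
        self._opened_at = {}                  # provider name -> time circuit opened

    def _available(self, provider):
        opened = self._opened_at.get(provider.name)
        if opened is None:
            return True
        if time.monotonic() - opened >= self.cooldown:
            # Half-open: allow a trial call after the cooldown expires.
            del self._opened_at[provider.name]
            self._failures[provider.name] = 0
            return True
        return False

    def lookup(self, postcode):
        last_error = None
        for provider in self.providers:
            if not self._available(provider):
                continue
            try:
                result = provider.lookup(postcode)
                self._failures[provider.name] = 0
                return result
            except Exception as exc:
                last_error = exc
                self._failures[provider.name] += 1
                if self._failures[provider.name] >= self.failure_threshold:
                    self._opened_at[provider.name] = time.monotonic()
        raise RuntimeError("all geo providers unavailable") from last_error
```

Because failed providers are skipped while their circuit is open, repeated outages of the primary do not add per-request latency once the breaker has tripped.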

Key Features

Feature                    | Phase | Description
Provider Abstraction       | 1     | Unified API regardless of underlying service
Response Normalization     | 1     | Consistent response format across all providers
Configurable Provider      | 1     | Select active provider via configuration
External Health Monitoring | 1     | Uptime Robot monitors provider health
Manual Switchover          | 1     | Change provider via config during outages
Metrics & Logging          | 1     | Track provider performance and failures
Built-in Health Checker    | 2     | Periodic health checks on all configured providers
Automatic Failover Manager | 2     | Circuit breaker pattern for automatic provider switching

Affected Applications

Primary Integration

  • Universal geo service (immediate integration)

Main Consumers (highest traffic)

  • Adrian Flux, Sterling and Bikesure EPAs
  • Adrian Flux and Sterling JAF forms

Other Potential Consumers

  • Viitata
  • Forecaster
  • Katala
  • Any future projects requiring geo lookup

Impact

API Compatibility

This proposal maintains 100% API compatibility with the existing geo service. No consumer application changes will be required. The library will be integrated behind the existing service interface, making this change completely transparent to Adrian Flux, Sterling, Bikesure EPAs, JAF forms, and all other consumers.

Services Affected

  • Universal geo service (internal code changes for library integration)
  • Quote service (indirect - improved reliability)

Technical Considerations

  • API compatibility will be maintained for existing geo service consumers
  • Response format normalization between different providers
  • Rate limiting and quota management per provider
  • Caching strategy to reduce API calls and costs
  • Configuration management for API keys across environments
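As one example of the caching strategy mentioned above, a per-postcode TTL cache could sit in front of any provider to cut paid API calls. A minimal sketch; the class name, key normalisation, and the 24-hour TTL are assumptions:

```python
# Illustrative TTL cache in front of a geo provider. Repeated lookups for
# the same postcode within the TTL are served from memory instead of the
# paid upstream API. Names and the default TTL are assumptions.
import time


class CachingLookup:
    def __init__(self, provider, ttl=86_400.0):
        self.provider = provider
        self.ttl = ttl
        self._cache = {}          # normalised postcode -> (timestamp, result)
        self.calls = 0            # upstream calls made, useful for metrics

    def lookup(self, postcode):
        key = postcode.replace(" ", "").upper()   # normalise the cache key
        hit = self._cache.get(key)
        if hit and time.monotonic() - hit[0] < self.ttl:
            return hit[1]
        self.calls += 1
        result = self.provider.lookup(postcode)
        self._cache[key] = (time.monotonic(), result)
        return result
```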

Data Considerations

  • Different providers may return slightly different address formats
  • UK-specific address formatting (getaddress.io strength)
  • Postcode validation consistency across providers
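To illustrate the normalisation concern, each provider adapter would map its own payload onto one canonical shape. A sketch, where both the `Address` fields and the getaddress.io-style payload format shown are assumptions, not verified API contracts:

```python
# Response normalisation sketch: provider-specific payloads are converted
# into one canonical Address shape so consumers never see provider quirks.
# The payload format parsed here is a hypothetical example.
from dataclasses import dataclass


@dataclass
class Address:
    line_1: str
    town: str
    postcode: str


def normalise_getaddress(payload: dict, postcode: str) -> list[Address]:
    # Hypothetical getaddress.io-style payload:
    #   {"addresses": ["1 High St, , King's Lynn", ...]}
    out = []
    for formatted in payload.get("addresses", []):
        parts = [p.strip() for p in formatted.split(",")]
        out.append(Address(line_1=parts[0], town=parts[-1], postcode=postcode))
    return out
```

A matching `normalise_google_places` adapter would produce the same `Address` list from Google's response, so swapping providers never changes the shape consumers receive.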

Provider Analysis

getaddress.io (Current/Primary)

  • Pros: UK-focused, excellent postcode lookup, current integration exists
  • Cons: Single point of failure, as demonstrated by the 2026-02-04 outage
  • Cost: Paid service

Google Places API (Proposed Fallback)

  • Pros: Highly reliable, generous free tier, global coverage
  • Cons: Less UK-specific, may require address normalization
  • Cost: Free tier available (suitable for fallback volumes)

Other Options to Evaluate

  • Postcodes.io (UK specific, open data)
  • Ideal Postcodes
  • OS Places API

Implementation Plan

Phase 1: Multi-Provider Support with Manual Switchover

1.1 Library Development

  • Extract and refactor existing geo code into standalone library
  • Define abstract provider interface (BaseProvider)
  • Implement GetAddressProvider (port existing code)
  • Implement GooglePlacesProvider
  • Create response normalization layer
  • Implement configurable provider selection
  • Write comprehensive test suite
  • Package as installable Python library

1.2 Geo Service Integration

  • Install library in universal-geo-service
  • Configure primary and fallback providers
  • Update service to use library interface
  • Integration testing
  • Deploy to staging

1.3 Monitoring & Rollout

  • Configure Uptime Robot to monitor provider endpoints
  • Set up alerting for provider outages
  • Document manual switchover procedure
  • Deploy to production
  • Test manual switchover process

Phase 2: Automatic Health Checking & Failover

2.1 Library Enhancement

  • Implement Health Checker component
  • Implement Failover Manager with circuit breaker pattern
  • Add automatic provider switching logic
  • Configure failover thresholds and retry policies
  • Update test suite for failover scenarios
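The Health Checker component listed above could be sketched as a periodic probe against each configured provider. The probe postcode, interval, and names are illustrative assumptions:

```python
# Phase 2 Health Checker sketch: probe every configured provider with a
# known-good postcode and record availability. The Failover Manager (or
# metrics/alerting) can then consume the resulting health map.
import time


class HealthChecker:
    PROBE_POSTCODE = "SW1A 1AA"   # assumption: any stable, well-known postcode

    def __init__(self, providers, interval=30.0):
        self.providers = providers
        self.interval = interval
        self._healthy = {p.name: True for p in providers}
        self._last_run = 0.0

    def check(self, now=None):
        """Run one probe cycle if the interval has elapsed; return the health map."""
        now = time.monotonic() if now is None else now
        if now - self._last_run >= self.interval:
            self._last_run = now
            for p in self.providers:
                try:
                    p.lookup(self.PROBE_POSTCODE)
                    self._healthy[p.name] = True
                except Exception:
                    self._healthy[p.name] = False
        return dict(self._healthy)
```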

2.2 Integration & Rollout

  • Update geo service to use automatic failover
  • Deploy to staging
  • Performance and failover testing
  • Deploy to production
  • Monitor automatic failover events

Wider Adoption (Post Phase 1 or 2)

  • Provide library documentation and examples
  • Support Viitata integration
  • Support Forecaster integration
  • Support Katala integration

Risks & Mitigation

Risk                                 | Impact | Phase | Mitigation
Provider response format differences | Medium | 1 & 2 | Robust response normalization and testing
Increased complexity                 | Low    | 1 & 2 | Clean abstraction layer, good documentation
Google free tier limits exceeded     | Low    | 1 & 2 | Monitor usage, upgrade plan if needed
Manual switchover delay              | Medium | 1     | Uptime Robot alerting, documented runbook
Failover latency                     | Low    | 2     | Health checks, circuit breaker pattern
Address accuracy differences         | Medium | 1 & 2 | Thorough testing with UK addresses, consider primary-only for critical paths

Success Metrics

Phase 1

  • Ability to switch providers within minutes of detected outage
  • Zero extended quote service outages due to geo provider failures
  • Library adoption in at least 2 additional projects within 6 months
  • Reduced operational impact from geo lookup alerts

Phase 2

  • < 500ms automatic failover time when primary provider fails
  • Zero manual intervention required for provider outages
  • Reduced operational alerts related to geo lookups

Estimations

Phase 1: 3 days (20 hours)

Task                                                                                    | Effort
Project setup, current implementation investigation, alternative provider investigation | 4 hours
Development build                                                                       | 8 hours
Testing and integration                                                                 | 8 hours

Phase 2: 2 days (16 hours)

Task                    | Effort
Development build       | 8 hours
Testing and integration | 8 hours

Status

Current Status: Draft

History:

  • 2026-02-05: Proposal created following getaddress.io outage