# BaoLife Deployment Guide

Deploy the BaoLife WebSocket backend to Google Cloud Platform (GCP).

## Backend Options

| Backend | Directory | Status | Deployment |
|---------|-----------|--------|------------|
| **TypeScript** | `server/` | Active | Recommended |
| **Python** | `ws/` | Legacy | Maintenance only |

## Architecture Overview

The production deployment uses:

- **Cloud Run**: Serverless container platform for the WebSocket server
  - Native WebSocket support (no configuration needed)
  - Auto-scaling from min to max instances based on traffic
  - Built-in HTTPS/WSS termination
  - Request timeout configurable up to 60 minutes, so a WebSocket connection can stay open for at most one hour before the client must reconnect

- **Cloud SQL**: Managed MySQL database
  - Automatic backups and point-in-time recovery
  - High availability options
  - Secure connections via Unix sockets

- **Secret Manager**: Secure storage for credentials
  - Database passwords
  - JWT secrets
  - API keys

- **Cloud Build**: Container image building
  - Automatic Docker builds from source
  - Image storage in Container Registry

## Prerequisites

### 1. Install gcloud CLI

**macOS:**
```bash
brew install google-cloud-sdk
```

**Linux:**
```bash
curl https://sdk.cloud.google.com | bash
exec -l $SHELL
```

**Windows:**
Download from: https://cloud.google.com/sdk/docs/install

### 2. Authenticate with GCP

```bash
gcloud auth login
gcloud auth application-default login
```

### 3. Create a GCP Project

```bash
# Create new project
gcloud projects create PROJECT_ID --name="BaoLife Production"

# Or use existing project
gcloud config set project PROJECT_ID

# Enable billing (required)
# Visit: https://console.cloud.google.com/billing
```

### 4. Set Required Permissions

Your account needs these roles:
- Cloud Run Admin
- Cloud SQL Admin
- Secret Manager Admin
- Service Account User
- Cloud Build Editor

```bash
# Grant yourself these roles (if you're the project owner)
PROJECT_ID="your-project-id"
USER_EMAIL="your-email@example.com"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="user:$USER_EMAIL" \
  --role="roles/run.admin"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="user:$USER_EMAIL" \
  --role="roles/cloudsql.admin"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="user:$USER_EMAIL" \
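  --role="roles/secretmanager.admin"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="user:$USER_EMAIL" \
  --role="roles/iam.serviceAccountUser"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="user:$USER_EMAIL" \
  --role="roles/cloudbuild.builds.editor"
```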

## TypeScript Backend Deployment

### Local Development

```bash
cd server
npm install
npm run dev      # Development with hot reload on port 8001
```

### Building for Production

```bash
cd server
npm run build    # Compiles to dist/
```

### Docker Build

```bash
cd server
docker build -t baolife-backend .
docker run -p 8001:8001 baolife-backend
```

## GCP Deployment

`deploy-gcp.sh` builds and deploys the active TypeScript backend from `server/`.
The legacy Python backend in `ws/` is not used by this Cloud Run deploy path.
On first login after cutover, the TypeScript backend migrates missing player
rows from legacy `lifesim_savegames` records into the native `players` table.
Legacy JSON saves migrate directly; pickle-only saves use the bundled
`server/scripts/export-legacy-save.py` helper in the runtime image.

### Multi-Environment Support

The deployment script supports three environments with automatic resource sizing:

| Configuration | Development | Staging | Production |
|---------------|-------------|---------|------------|
| Service Name | `baolife-backend-dev` | `baolife-backend-staging` | `baolife-backend-prod` |
| DB Instance | `baolife-db-dev` | `baolife-db-staging` | `baolife-db-prod` |
| DB Tier | db-f1-micro | db-g1-small | db-g1-small |
| Memory | 512Mi | 1Gi | 1Gi |
| Min Instances | 0 (scales to zero) | 0 | 1 (always on) |
| Max Instances | 3 | 5 | 10 |
| Concurrency | 40 | 60 | 80 |
| Storage | 10GB | 10GB | 20GB |
| **Est. Cost/Month** | **~$7** | **~$20** | **~$40+** |

**Key Benefits:**
- **Isolated Resources**: Each environment has separate databases, secrets, and services
- **Cost-Effective Development**: Dev scales to zero when idle
- **Safe Testing**: Test changes in dev before promoting to production
- **Independent Scaling**: Scale each environment based on actual needs
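
The naming and sizing conventions in the table can be captured in a small lookup. The sketch below is illustrative only: `env_settings` is a hypothetical helper, not a function taken from `deploy-gcp.sh`, and it simply mirrors the table values.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the environment table above.
# Prints: service name, DB instance, min instances, max instances, concurrency.
env_settings() {
  case "$1" in
    dev)        echo "baolife-backend-dev baolife-db-dev 0 3 40" ;;
    staging)    echo "baolife-backend-staging baolife-db-staging 0 5 60" ;;
    production) echo "baolife-backend-prod baolife-db-prod 1 10 80" ;;
    *)          echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Example: look up the production service name and scaling bounds.
read -r service db min max concurrency <<< "$(env_settings production)"
echo "$service scales from $min to $max instances at concurrency $concurrency"
```

Note that the `--environment` value is `production`, while the resource names use the shorter `prod` suffix.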

### Initial Deployment

**Development Environment:**
```bash
# Navigate to project root
cd /path/to/lichun

# Deploy development environment
./deploy-gcp.sh \
  --project YOUR_PROJECT_ID \
  --environment dev \
  --db-password "DEV_PASSWORD"
```

For first deploys, export the API key for the configured AI provider so the
script can create the matching Secret Manager entry:

```bash
export OPENAI_API_KEY="sk-..."
./deploy-gcp.sh \
  --project YOUR_PROJECT_ID \
  --environment dev \
  --db-password "DEV_PASSWORD" \
  --ai-provider openai
```

**Production Environment:**
```bash
# Deploy production environment
./deploy-gcp.sh \
  --project YOUR_PROJECT_ID \
  --environment production \
  --db-password "PROD_PASSWORD"
```

**Parameters:**
- `--project`: Your GCP project ID (required)
- `--environment`: Environment type: dev, staging, or production (default: dev)
- `--region`: GCP region (default: us-central1)
- `--db-password`: MySQL database password (required for initial setup)
- `--db-tier`: Cloud SQL tier (auto-set based on environment if not specified)
- `--min-instances`: Minimum running instances (auto-set based on environment if not specified)
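- `--max-instances`: Maximum instances for auto-scaling (auto-set based on environment if not specified)
- `--ai-provider`: AI provider whose API key is stored in Secret Manager, for example `openai`
- `--skip-db`: Skip Cloud SQL setup (for redeploys after the initial setup)
- `--skip-secrets`: Skip secret creation; the secrets must already exist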

**Full deployment takes 10-15 minutes** and includes:
1. Enabling required APIs
2. Creating Cloud SQL instance
3. Setting up database and users
4. Storing secrets securely (with environment prefix)
5. Building Docker image
6. Deploying to Cloud Run
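
Step 1 could look roughly like the following. This is a sketch, not an excerpt from `deploy-gcp.sh`: the API list is an assumption inferred from the services named in the Architecture Overview, and `required_apis` and `enable_apis` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# ASSUMPTION: API list inferred from the Architecture Overview;
# deploy-gcp.sh may enable a different set.
required_apis() {
  printf '%s\n' \
    run.googleapis.com \
    sqladmin.googleapis.com \
    secretmanager.googleapis.com \
    cloudbuild.googleapis.com
}

# Enable every API for the given project (requires an authenticated gcloud).
enable_apis() {
  local project="$1"
  # Word splitting of the API list is intended here.
  gcloud services enable $(required_apis) --project="$project"
}

# Dry run: print what would be enabled without calling gcloud.
required_apis
```

Running `enable_apis YOUR_PROJECT_ID` by hand before the first deploy is one way to surface billing or permission problems early.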

### Update Deployment (Code Changes)

After initial setup, to deploy code updates:

**Development:**
```bash
./deploy-gcp.sh \
  --project YOUR_PROJECT_ID \
  --environment dev \
  --skip-db \
  --skip-secrets
```

**Production:**
```bash
./deploy-gcp.sh \
  --project YOUR_PROJECT_ID \
  --environment production \
  --skip-db \
  --skip-secrets
```

This **takes 3-5 minutes** and only rebuilds/redeploys the application.

## Configuration

### Environment Variables

The following environment variables are automatically configured:

| Variable | Source | Description |
|----------|--------|-------------|
| `ENVIRONMENT` | Set in Cloud Run | `production` |
| `DB_NAME` | Set in Cloud Run | `lifesim` |
| `DB_USER` | Set in Cloud Run | `baolife` |
| `DB_HOST` | Set in Cloud Run | Cloud SQL Unix socket path |
| `AI_PROVIDER` | Set in Cloud Run | `openai`, `together`, `mistral`, or `openrouter` |
| `OPENAI_MODEL` | Set in Cloud Run | OpenAI-compatible conversation model |
| `TOGETHER_MODEL` | Set in Cloud Run | Together model when `AI_PROVIDER=together` |
| `MISTRAL_MODEL` | Set in Cloud Run | Mistral model when `AI_PROVIDER=mistral` |
| `OPENROUTER_MODEL` | Set in Cloud Run | OpenRouter model when `AI_PROVIDER=openrouter` |
| `IMAGE_GENERATION_PROVIDER` | Set in Cloud Run | `imagen4`, `flux`, or `dalle3` |
| `DB_PASSWORD` | Secret Manager | Database password |
| `JWT_SECRET` | Secret Manager | JWT signing key |
| `OPENAI_API_KEY` | Secret Manager | Required when `AI_PROVIDER=openai`; also used by dating bio generation |
| `TOGETHER_API_KEY` | Secret Manager | Required when `AI_PROVIDER=together` |
| `MISTRAL_API_KEY` | Secret Manager | Required when `AI_PROVIDER=mistral` |
| `OPENROUTER_API_KEY` | Secret Manager | Required when `AI_PROVIDER=openrouter` |
| `FAL_AI_KEY` | Secret Manager | Optional image generation key |
| `REPLICATE_API_TOKEN` | Secret Manager | Optional image generation key |
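
For reference, Cloud Run exposes an attached Cloud SQL instance as a Unix socket under `/cloudsql/`, followed by the instance connection name (`PROJECT:REGION:INSTANCE`). The helper below is a minimal sketch of how that path is composed; `cloudsql_socket_path` is a hypothetical name, and whether `DB_HOST` holds exactly this value is an assumption about `deploy-gcp.sh`.

```bash
#!/usr/bin/env bash
# Build the Unix socket path for an attached Cloud SQL instance.
# Format: /cloudsql/PROJECT:REGION:INSTANCE
cloudsql_socket_path() {
  local project="$1" region="$2" instance="$3"
  echo "/cloudsql/${project}:${region}:${instance}"
}

# Example with a placeholder project ID.
cloudsql_socket_path "your-project-id" "us-central1" "baolife-db-dev"
```

The connection name of an existing instance can be read with `gcloud sql instances describe baolife-db-dev --format="value(connectionName)"`.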

Secret names are environment-prefixed, such as `dev-openai-api-key` or
`production-openrouter-api-key`. `deploy-gcp.sh --skip-secrets` expects those
secrets to already exist and fails fast if the active provider key is missing.
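
Before running with `--skip-secrets`, you can check for the provider secret yourself. The sketch below assumes every provider key follows the `ENVIRONMENT-PROVIDER-api-key` pattern shown in the two examples above; `provider_secret_name` and `secret_exists` are hypothetical helpers, and other secrets (database password, JWT secret) are not covered.

```bash
#!/usr/bin/env bash
# Derive the environment-prefixed secret name for a provider's API key.
# ASSUMPTION: all providers follow the pattern of the documented examples.
provider_secret_name() {
  local environment="$1" provider="$2"
  echo "${environment}-${provider}-api-key"
}

# Return success if the secret exists in the project (requires gcloud auth).
secret_exists() {
  local project="$1" name="$2"
  gcloud secrets describe "$name" --project="$project" >/dev/null 2>&1
}

# Example: print the name for the production OpenRouter key.
provider_secret_name production openrouter
```

A pre-deploy check might then read `secret_exists YOUR_PROJECT_ID "$(provider_secret_name dev openai)" || echo "missing secret"`.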

### Scaling Configuration

**Adjust instance count:**
```bash
# For development
gcloud run services update baolife-backend-dev \
  --min-instances=0 \
  --max-instances=5 \
  --region=us-central1

# For production
gcloud run services update baolife-backend-prod \
  --min-instances=2 \
  --max-instances=20 \
  --region=us-central1
```

**Adjust resources:**
```bash
# For production
gcloud run services update baolife-backend-prod \
  --memory=2Gi \
  --cpu=2 \
  --region=us-central1
```

**Concurrency (connections per instance):**
```bash
# For production
gcloud run services update baolife-backend-prod \
  --concurrency=100 \
  --region=us-central1
```
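
Because each open WebSocket occupies one concurrency slot for as long as it stays connected, the product of max instances and concurrency gives a rough upper bound on simultaneous connections. A minimal sketch using the per-environment defaults from the table in "Multi-Environment Support" (`max_connections` is a hypothetical helper):

```bash
#!/usr/bin/env bash
# Upper bound on simultaneous connections:
# max instances x per-instance concurrency.
max_connections() {
  local max_instances="$1" concurrency="$2"
  echo $(( max_instances * concurrency ))
}

max_connections 3 40    # dev defaults
max_connections 5 60    # staging defaults
max_connections 10 80   # production defaults
```

With the defaults this yields 120, 300, and 800 connections. Clients beyond that bound are queued or rejected, so raise `--max-instances` or `--concurrency` before expected peaks.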

### Database Scaling

**Upgrade instance tier:**
```bash
# Check available tiers
gcloud sql tiers list

# Upgrade to higher tier (for production)
gcloud sql instances patch baolife-db-prod \
  --tier=db-n1-standard-1

# For development (if needed)
gcloud sql instances patch baolife-db-dev \
  --tier=db-g1-small
```

## Custom Domain Setup

### 1. Verify Domain Ownership

```bash
gcloud domains verify lichun.app
```

### 2. Map Domain to Service

**For Production:**
```bash
gcloud beta run domain-mappings create \
  --service=baolife-backend-prod \
  --domain=wss.lichun.app \
  --region=us-central1
```

**For Development/Staging:**
```bash
gcloud beta run domain-mappings create \
  --service=baolife-backend-dev \
  --domain=wss-dev.lichun.app \
  --region=us-central1
```

### 3. Update DNS Records

Add the DNS records shown in the output to your domain registrar:
- Type: CNAME
- Name: wss (or wss-dev for development)
- Value: ghs.googlehosted.com

**SSL/TLS certificates are automatically provisioned** by Google.

## Monitoring & Operations

### View Logs

**Stream real-time logs:**
```bash
# Development
gcloud beta run services logs tail baolife-backend-dev --region=us-central1

# Production
gcloud beta run services logs tail baolife-backend-prod --region=us-central1
```

**View recent logs:**
```bash
# Development
gcloud run services logs read baolife-backend-dev \
  --region=us-central1 \
  --limit=100

# Production
gcloud run services logs read baolife-backend-prod \
  --region=us-central1 \
  --limit=100
```

### Monitor Metrics

**Via Console:**
https://console.cloud.google.com/run/detail/REGION/SERVICE_NAME/metrics

**Key metrics to watch:**
- Request count
- Request latency
- Instance count
- Container CPU utilization
- Container memory utilization
- WebSocket connection count

### Database Operations

**Connect to database:**
```bash
# Development
gcloud sql connect baolife-db-dev --user=baolife --quiet

# Production
gcloud sql connect baolife-db-prod --user=baolife --quiet
```

**Create backup:**
```bash
# Production (recommended before major updates)
gcloud sql backups create \
  --instance=baolife-db-prod \
  --description="Manual backup $(date +%Y%m%d)"

# Development
gcloud sql backups create \
  --instance=baolife-db-dev \
  --description="Manual backup $(date +%Y%m%d)"
```

**List backups:**
```bash
# Production
gcloud sql backups list --instance=baolife-db-prod

# Development
gcloud sql backups list --instance=baolife-db-dev
```

### Health Checks

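The Dockerfile includes a `HEALTHCHECK` instruction:
- Checks every 30 seconds
- Attempts TCP connection to port 8001
- Container is marked unhealthy after 3 failed checks

This check applies when the image runs under Docker, for example locally. Cloud Run ignores Dockerfile `HEALTHCHECK` instructions; by default it runs its own TCP startup probe against the container port.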
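
The TCP check described above (a connection attempt to port 8001) can be reproduced by hand against a locally running container. This is a minimal sketch, not the check from the Dockerfile itself; `tcp_check` is a hypothetical helper, and it relies on bash's `/dev/tcp` redirection plus `timeout` from GNU coreutils (not installed by default on macOS).

```bash
#!/usr/bin/env bash
# Succeeds if a TCP connection to host:port opens within 2 seconds.
tcp_check() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example: probe a container started as shown in "Docker Build".
if tcp_check 127.0.0.1 8001; then
  echo "port 8001 is accepting connections"
else
  echo "port 8001 is not reachable"
fi
```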

**View service health:**
```bash
# Development
gcloud run services describe baolife-backend-dev --region=us-central1

# Production
gcloud run services describe baolife-backend-prod --region=us-central1
```

## Cost Optimization

### Cloud Run Pricing

**Free tier includes:**
- 2 million requests/month
- 360,000 GB-seconds/month
- 180,000 vCPU-seconds/month

**Optimization tips:**
1. **Reduce min-instances to 0** for dev/staging (adds cold start delay)
2. **Use smaller memory/CPU** if sufficient (1Gi RAM, 1 vCPU is default)
3. **Set appropriate max-instances** to cap costs
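
To put the free-tier figures in perspective, the arithmetic below converts them to hours and compares them with an instance that never scales to zero. It is a rough sketch that counts raw seconds only; how idle time on minimum instances is billed depends on the service's billing settings, which this does not model.

```bash
#!/usr/bin/env bash
# Monthly free-tier allowances quoted above.
free_vcpu_seconds=180000
free_gb_seconds=360000

echo "free vCPU-hours: $(( free_vcpu_seconds / 3600 ))"
echo "free GB-hours:   $(( free_gb_seconds / 3600 ))"

# vCPU-seconds accumulated by always-on instances over a 30-day month.
always_on_vcpu_seconds() {
  local instances="$1" vcpus="$2"
  echo $(( instances * vcpus * 30 * 24 * 3600 ))
}

always_on_vcpu_seconds 1 1
```

The allowance works out to 50 vCPU-hours and 100 GB-hours, while a single always-on 1 vCPU instance accumulates 2,592,000 vCPU-seconds in a month. This is why scaling dev and staging to zero matters for cost.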

### Cloud SQL Pricing

**Cost factors:**
- Instance tier (db-f1-micro is cheapest)
- Storage size (10GB minimum)
- Backups (7 automatic backups retained)

**Optimization tips:**
1. **Use db-f1-micro** for development
2. **Use db-g1-small or larger** for production (db-f1-micro and db-g1-small are both shared-core tiers)
3. **Enable storage auto-increase** to prevent outages

### Estimated Monthly Costs

**Development (low traffic):**
- Cloud Run: ~$0 (within free tier)
- Cloud SQL (db-f1-micro): ~$7/month
- **Total: ~$7/month**

**Production (moderate traffic):**
- Cloud Run (2 instances avg): ~$15/month
- Cloud SQL (db-g1-small): ~$25/month
- **Total: ~$40/month**

**Production (high traffic):**
- Cloud Run (5 instances avg): ~$40/month
- Cloud SQL (db-n1-standard-1): ~$80/month
- **Total: ~$120/month**
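
The totals above are plain sums of the per-service estimates. If you adjust a tier or instance count, a small helper like this hypothetical `monthly_total` keeps the arithmetic honest; the inputs are the whole-dollar estimates from this section, not live pricing.

```bash
#!/usr/bin/env bash
# Sum per-service monthly estimates (whole US dollars) for one scenario.
monthly_total() {
  local total=0 cost
  for cost in "$@"; do
    total=$(( total + cost ))
  done
  echo "$total"
}

echo "dev:                 \$$(monthly_total 0 7)"
echo "production moderate: \$$(monthly_total 15 25)"
echo "production high:     \$$(monthly_total 40 80)"
```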

## Troubleshooting

### Deployment Fails

**Check enabled APIs:**
```bash
gcloud services list --enabled
```

**Check IAM permissions:**
```bash
gcloud projects get-iam-policy YOUR_PROJECT_ID
```

### Database Connection Issues

**Test Cloud SQL connectivity:**
```bash
# Development
gcloud sql connect baolife-db-dev --user=root

# Production
gcloud sql connect baolife-db-prod --user=root
```

**Check Cloud SQL instance status:**
```bash
# Development
gcloud sql instances describe baolife-db-dev

# Production
gcloud sql instances describe baolife-db-prod
```

### Service Not Responding

**Check service status:**
```bash
# Development
gcloud run services describe baolife-backend-dev --region=us-central1

# Production
gcloud run services describe baolife-backend-prod --region=us-central1
```

**Check for recent errors:**
```bash
# Development
gcloud logging read \
  'resource.type="cloud_run_revision" AND resource.labels.service_name="baolife-backend-dev" AND severity>=ERROR' \
  --limit=50 \
  --format="table(timestamp,severity,textPayload)"

# Production
gcloud logging read \
  'resource.type="cloud_run_revision" AND resource.labels.service_name="baolife-backend-prod" AND severity>=ERROR' \
  --limit=50 \
  --format="table(timestamp,severity,textPayload)"
```

### WebSocket Connection Drops

**Increase timeout:**
```bash
# Development
gcloud run services update baolife-backend-dev \
  --timeout=3600 \
  --region=us-central1

# Production
gcloud run services update baolife-backend-prod \
  --timeout=3600 \
  --region=us-central1
```

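**Check scaling limits** (the live instance count appears under Metrics in the console):
```bash
# Development: prints autoscaling annotations such as autoscaling.knative.dev/maxScale
gcloud run services describe baolife-backend-dev \
  --region=us-central1 \
  --format="yaml(spec.template.metadata.annotations)"

# Production
gcloud run services describe baolife-backend-prod \
  --region=us-central1 \
  --format="yaml(spec.template.metadata.annotations)"
```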

## Security Best Practices

### 1. Secret Management

✅ **DO:**
- Use Secret Manager for all sensitive data
- Rotate secrets regularly
- Use different passwords for each environment

❌ **DON'T:**
- Commit secrets to git
- Hard-code credentials
- Share production secrets

### 2. Database Security

✅ **DO:**
- Use strong passwords (16+ characters)
- Create separate users per application
- Enable Cloud SQL SSL enforcement
- Restrict Cloud SQL authorized networks

### 3. Service Authentication

For production, consider requiring authentication:
```bash
gcloud run services remove-iam-policy-binding baolife-backend-prod \
  --member="allUsers" \
  --role="roles/run.invoker" \
  --region=us-central1
```

Then configure client authentication via service accounts or Identity Platform.

## Rollback Procedure

**List revisions:**
```bash
# Development
gcloud run revisions list \
  --service=baolife-backend-dev \
  --region=us-central1

# Production
gcloud run revisions list \
  --service=baolife-backend-prod \
  --region=us-central1
```

**Rollback to previous revision:**
```bash
# Development
gcloud run services update-traffic baolife-backend-dev \
  --to-revisions=REVISION_NAME=100 \
  --region=us-central1

# Production
gcloud run services update-traffic baolife-backend-prod \
  --to-revisions=REVISION_NAME=100 \
  --region=us-central1
```
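
To fill in `REVISION_NAME` without copying it by hand, the two steps above can be chained. The sketch below assumes the newest revision is the faulty one and that the revision created just before it is the one to restore; `list_revisions` and `previous_revision` are hypothetical helpers, and the revision names in the demonstration are made up.

```bash
#!/usr/bin/env bash
# Read revision names from stdin, newest first, and print the second one.
previous_revision() {
  sed -n '2p'
}

# List revision names for a service, newest first (requires gcloud auth).
list_revisions() {
  local service="$1" region="$2"
  gcloud run revisions list \
    --service="$service" \
    --region="$region" \
    --sort-by="~metadata.creationTimestamp" \
    --format="value(metadata.name)"
}

# Offline demonstration with made-up revision names:
printf '%s\n' baolife-backend-dev-00003-abc baolife-backend-dev-00002-xyz \
  | previous_revision
```

In practice the result would be passed on as `--to-revisions="$(list_revisions baolife-backend-dev us-central1 | previous_revision)=100"`. Check the output of `gcloud run revisions list` first if a recent deploy failed, since the second-newest revision may not be the last one that served traffic.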

## CI/CD Integration

### GitHub Actions Example

Create separate workflows for each environment:

**`.github/workflows/deploy-dev.yml`** (Auto-deploy on push to develop):
```yaml
name: Deploy to Development

on:
  push:
    branches: [develop]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_CREDENTIALS }}

      - uses: google-github-actions/setup-gcloud@v2

      - name: Deploy to Development
        run: |
          ./deploy-gcp.sh \
            --project "${{ secrets.GCP_PROJECT_ID }}" \
            --environment dev \
            --skip-db \
            --skip-secrets
```

**`.github/workflows/deploy-production.yml`** (manual trigger, pushes to `main`, or `v*` tags; gated by environment approval):
```yaml
name: Deploy to Production

on:
  workflow_dispatch:  # Manual trigger from the Actions tab
  push:
    branches: [main]
    tags:
      - 'v*'

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production  # Requires approval
    steps:
      - uses: actions/checkout@v4

      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_CREDENTIALS }}

      - uses: google-github-actions/setup-gcloud@v2

      - name: Deploy to Production
        run: |
          ./deploy-gcp.sh \
            --project "${{ secrets.GCP_PROJECT_ID }}" \
            --environment production \
            --skip-db \
            --skip-secrets
```

## Support

For issues or questions:
1. Check logs for your environment:
   - Development: `gcloud beta run services logs tail baolife-backend-dev --region=us-central1`
   - Production: `gcloud beta run services logs tail baolife-backend-prod --region=us-central1`
2. Review GCP Status: https://status.cloud.google.com/
3. Consult Cloud Run docs: https://cloud.google.com/run/docs

## Additional Resources

- [Cloud Run Documentation](https://cloud.google.com/run/docs)
- [Cloud SQL for MySQL](https://cloud.google.com/sql/docs/mysql)
- [Secret Manager](https://cloud.google.com/secret-manager/docs)
- [gcloud CLI Reference](https://cloud.google.com/sdk/gcloud/reference)
- [WebSocket on Cloud Run](https://cloud.google.com/run/docs/triggering/websockets)
