# Plan 02: Security & Production Readiness

**Status**: 📋 Planned
**Priority**: Critical
**Estimated Effort**: 12-18 hours (2-4 days)
**Dependencies**: Plan 01 (Testing Infrastructure) - recommended but not required

---

## Overview & Motivation

### The Problem

BaoLife currently has **critical security vulnerabilities** and **production-readiness gaps**:

**Security Issues**:
- 🔴 **Hardcoded secrets** - Database password, OpenAI API key in source code
- 🔴 **No input validation** - WebSocket messages accepted without sanitization
- 🔴 **SQL injection risk** - Direct string interpolation in queries
- 🔴 **No authentication** - Anyone can access any user's game
- 🔴 **No rate limiting** - OpenAI API can be abused (expensive!)

**Production Gaps**:
- 🟠 **Global state** - Cannot scale beyond single process
- 🟠 **No logging** - Impossible to debug production issues
- 🟠 **No monitoring** - Can't tell if system is healthy
- 🟠 **No error tracking** - Crashes go unnoticed
- 🟠 **Single database connection** - Bottleneck for multiple users

**Risk Assessment**:
- **Current State**: ⚠️ **NOT production-ready** - Security risks, scalability issues
- **After This Plan**: ✅ **Production-ready** - Secure, scalable, observable

### The Solution

Transform BaoLife into a secure, production-grade system:
- **Security hardening** - Remove secrets, validate inputs, add authentication
- **Scalability** - Multi-user support, connection pooling, session management
- **Observability** - Structured logging, error tracking, metrics
- **Production ops** - Health checks, monitoring, deployment guides

### Success Outcomes

✅ No hardcoded secrets in codebase
✅ All WebSocket inputs validated and sanitized
✅ User authentication and authorization working
✅ Rate limiting on expensive operations
✅ Structured logging for debugging
✅ Health monitoring and alerting
✅ Support 100+ concurrent users
✅ Ready for production deployment

---

## Current State Analysis

### Critical Security Vulnerabilities

#### 1. Hardcoded Credentials (CRITICAL)

**`ws/functions.py:2648`**:
```python
def connect_to_database():
    mydb = mysql.connector.connect(
        host="localhost",
        user="root",
        password="H8g6gRA2r/h$[t{6",  # EXPOSED!
        database="lifesim"
    )
```

**`ws/conversationEvents.py:15`**:
```python
openai.api_key = "sk-proj-..."  # EXPOSED!
```

**Impact**:
- Credentials visible in git history
- Anyone with repo access can access database
- API key can be used to rack up OpenAI bills

#### 2. No Input Validation (CRITICAL)

**`ws/app.py:consumer()`**:
```python
async def consumer(websocket):
    data = json.loads(message)  # No validation!

    if data['type'] == 'answer':
        questionID = data['id']  # Could be malicious
        response = data['response']  # Could be XSS payload
```

**Impact**:
- Can inject malicious data
- Potential XSS attacks
- Can crash server with malformed data

#### 3. SQL Injection Risk (HIGH)

**`ws/functions.py`**:
```python
def saveGame(player):
    sql = f"UPDATE users SET data = '{json.dumps(player_data)}' WHERE userID = '{player.userID}'"
    mycursor.execute(sql)  # Vulnerable to SQL injection!
```

**Impact**:
- Attacker can modify any user's data
- Can extract sensitive information
- Can drop tables

#### 4. No Authentication (CRITICAL)

**`ws/app.py:handler()`**:
```python
async def handler(websocket, path):
    data = json.loads(await websocket.recv())
    userID = data['userID']  # Trust client completely!
    websocket.userID = userID
```

**Impact**:
- Anyone can impersonate any user
- No session validation
- Can access/modify anyone's game

#### 5. No Rate Limiting (HIGH)

**`ws/conversationEvents.py`**:
```python
# OpenAI API called without limits
result = await openai.ChatCompletion.acreate(...)  # $$$
```

**Impact**:
- Can be abused to generate huge bills
- No protection against spam
- Can DoS the service

### Scalability Issues

#### Global State (HIGH)

**`ws/app.py`**:
```python
USERS = set()  # Module-level global
playerRecords = {}  # Shared dict
mydb = connect_to_database()  # Single connection
```

**Impact**:
- Cannot scale to multiple processes
- No horizontal scaling possible
- Shared state causes race conditions

#### Single Database Connection (MEDIUM)

**`ws/functions.py`**:
```python
mydb = connect_to_database()  # Module-level, single connection
```

**Impact**:
- Bottleneck with multiple users
- Connection can drop and not recover
- No connection pooling

### Observability Gaps

#### No Structured Logging (HIGH)

**Current logging**:
```python
print(f"User {userID} connected")  # Unstructured
print("Error occurred")  # No context
```

**Impact**:
- Can't search/filter logs
- No log levels (info vs error)
- No correlation IDs
- Can't debug production issues

#### No Error Tracking (HIGH)

**Current error handling**:
```python
try:
    # ... game logic ...
except Exception as e:
    print(f"Error: {e}")  # Lost forever
```

**Impact**:
- Errors go unnoticed
- Can't track error patterns
- No alerting on failures

#### No Metrics (MEDIUM)

**Current state**: No metrics collection

**Impact**:
- Can't measure performance
- Can't detect anomalies
- Can't capacity plan

---

## Implementation Plan

### Phase 1: Security Hardening (4-5 hours)

#### Task 1.1: Remove Hardcoded Secrets

**Create `.env.example`** (checked into git):
```bash
# Database Configuration
DB_HOST=localhost
DB_PORT=3306
DB_USER=root
DB_PASSWORD=your_password_here
DB_NAME=lifesim

# OpenAI Configuration
OPENAI_API_KEY=sk-proj-your-key-here

# Security Configuration
JWT_SECRET=your-secret-key-here
SESSION_TIMEOUT=3600

# Application Configuration
ENVIRONMENT=development
DEBUG=false
LOG_LEVEL=INFO
```

**Create `.env`** (gitignored, actual secrets):
```bash
# Rotate the leaked values first: anything already committed stays in git history
DB_PASSWORD=...new-rotated-password...
OPENAI_API_KEY=sk-proj-...actual-key...
JWT_SECRET=...generate-random-secret...
```
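
The `JWT_SECRET` placeholder needs a high-entropy value. One way to produce it is sketched below with Python's standard `secrets` module. The helper name `generate_jwt_secret` and the 64-byte default are illustrative assumptions, not project requirements.

```python
import secrets


def generate_jwt_secret(num_bytes: int = 64) -> str:
    """Return a URL-safe random string suitable for an HMAC signing key.

    `num_bytes` is the amount of randomness, not the output length;
    64 bytes is an assumed default for this sketch.
    """
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Paste the printed line into .env; never commit the value itself
    print(f"JWT_SECRET={generate_jwt_secret()}")
```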

**Update `.gitignore`**:
```
.env
*.env
!.env.example
```

**Create `ws/config.py`** (configuration management):
```python
"""
Configuration management using environment variables.

Usage:
    from config import config
    db_password = config.DB_PASSWORD
"""

import os
from pathlib import Path
from typing import Optional

# Load .env file if exists
try:
    from dotenv import load_dotenv
    env_path = Path(__file__).parent.parent / '.env'
    load_dotenv(dotenv_path=env_path)
except ImportError:
    print("Warning: python-dotenv not installed. Using environment variables only.")


class Config:
    """Application configuration from environment variables"""

    # Database
    DB_HOST: str = os.getenv('DB_HOST', 'localhost')
    DB_PORT: int = int(os.getenv('DB_PORT', '3306'))
    DB_USER: str = os.getenv('DB_USER', 'root')
    DB_PASSWORD: str = os.getenv('DB_PASSWORD', '')
    DB_NAME: str = os.getenv('DB_NAME', 'lifesim')

    # OpenAI
    OPENAI_API_KEY: str = os.getenv('OPENAI_API_KEY', '')

    # Security
    JWT_SECRET: str = os.getenv('JWT_SECRET', 'dev-secret-change-in-production')
    SESSION_TIMEOUT: int = int(os.getenv('SESSION_TIMEOUT', '3600'))
    ALLOWED_ORIGINS: list = os.getenv('ALLOWED_ORIGINS', '*').split(',')

    # Application
    ENVIRONMENT: str = os.getenv('ENVIRONMENT', 'development')
    DEBUG: bool = os.getenv('DEBUG', 'false').lower() == 'true'
    LOG_LEVEL: str = os.getenv('LOG_LEVEL', 'INFO')
    MAX_CONNECTIONS: int = int(os.getenv('MAX_CONNECTIONS', '100'))

    # Rate Limiting
    OPENAI_MAX_REQUESTS_PER_HOUR: int = int(os.getenv('OPENAI_MAX_REQUESTS_PER_HOUR', '60'))
    WEBSOCKET_MAX_MESSAGES_PER_MINUTE: int = int(os.getenv('WEBSOCKET_MAX_MESSAGES_PER_MINUTE', '30'))

    # Testing
    TEST_MODE: bool = os.getenv('TEST_MODE', 'false').lower() == 'true'

    @classmethod
    def validate(cls):
        """Validate required configuration"""
        errors = []

        if not cls.DB_PASSWORD and not cls.TEST_MODE:
            errors.append("DB_PASSWORD not set")

        if not cls.OPENAI_API_KEY and not cls.TEST_MODE:
            errors.append("OPENAI_API_KEY not set")

        if cls.ENVIRONMENT == 'production' and cls.JWT_SECRET == 'dev-secret-change-in-production':
            errors.append("JWT_SECRET must be changed in production")

        if errors:
            raise ValueError(f"Configuration errors: {', '.join(errors)}")

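    @classmethod
    def get_database_url(cls) -> str:
        """Get database connection URL (credentials are percent-encoded)"""
        from urllib.parse import quote  # local import keeps this method self-contained

        # Passwords may contain '/', '@' or ':'; encode them so the URL parses correctly
        user = quote(cls.DB_USER, safe='')
        password = quote(cls.DB_PASSWORD, safe='')
        return f"mysql://{user}:{password}@{cls.DB_HOST}:{cls.DB_PORT}/{cls.DB_NAME}"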


# Singleton instance
config = Config()

# Validate on import (except in test mode)
if not config.TEST_MODE:
    config.validate()
```

**Modify `ws/functions.py`**:
```python
from config import config

def connect_to_database():
    """Get database connection using config"""
    return mysql.connector.connect(
        host=config.DB_HOST,
        port=config.DB_PORT,
        user=config.DB_USER,
        password=config.DB_PASSWORD,
        database=config.DB_NAME
    )
```

**Modify `ws/conversationEvents.py`**:
```python
from config import config
import openai

openai.api_key = config.OPENAI_API_KEY
```

**Update `ws/requirements.txt`**:
```txt
python-dotenv==1.0.0
```

#### Task 1.2: Input Validation Layer

**Create `ws/validators.py`**:
```python
"""Input validation and sanitization"""
from typing import Dict, Any, Optional, List
import re
from datetime import datetime

class ValidationError(Exception):
    """Raised when validation fails"""
    pass


class Validator:
    """Input validator with common patterns"""

    @staticmethod
    def validate_user_id(user_id: str) -> str:
        """
        Validate user ID format.

        Args:
            user_id: User ID to validate

        Returns:
            Sanitized user ID

        Raises:
            ValidationError: If invalid
        """
        if not user_id or not isinstance(user_id, str):
            raise ValidationError("User ID must be a non-empty string")

        # Only allow alphanumeric, dash, underscore.
        # fullmatch() is used because '$' with re.match() also accepts a trailing newline.
        if not re.fullmatch(r'[a-zA-Z0-9_-]+', user_id):
            raise ValidationError("User ID contains invalid characters")

        if len(user_id) > 64:
            raise ValidationError("User ID too long (max 64 characters)")

        return user_id

    @staticmethod
    def validate_command(command: str) -> str:
        """
        Validate game command.

        Args:
            command: Command string

        Returns:
            Validated command

        Raises:
            ValidationError: If invalid
        """
        valid_commands = {'start', 'stop', 'restart', 'pause', 'resume'}

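        # Reject non-strings first: unhashable JSON values (lists, dicts) would
        # raise TypeError on the set lookup instead of ValidationError
        if not isinstance(command, str) or command not in valid_commands:
            raise ValidationError(
                f"Invalid command. Must be one of: {', '.join(sorted(valid_commands))}"
            )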

        return command

    @staticmethod
    def validate_speed(speed: Any) -> int:
        """
        Validate game speed.

        Args:
            speed: Speed value

        Returns:
            Validated speed as int

        Raises:
            ValidationError: If invalid
        """
        try:
            speed_int = int(speed)
        except (ValueError, TypeError, OverflowError):
            # OverflowError covers float('inf'), which json.loads accepts as Infinity
            raise ValidationError("Speed must be a number")

        if not (1 <= speed_int <= 5000):
            raise ValidationError("Speed must be between 1 and 5000")

        return speed_int

    @staticmethod
    def validate_answer_id(answer_id: str) -> str:
        """
        Validate question/answer ID.

        Args:
            answer_id: Answer ID to validate

        Returns:
            Sanitized answer ID

        Raises:
            ValidationError: If invalid
        """
        if not answer_id or not isinstance(answer_id, str):
            raise ValidationError("Answer ID must be a non-empty string")

        # Only allow alphanumeric and underscore (function names).
        # fullmatch() avoids the trailing-newline loophole of '$' with re.match().
        if not re.fullmatch(r'[a-zA-Z_][a-zA-Z0-9_]*', answer_id):
            raise ValidationError("Answer ID contains invalid characters")

        if len(answer_id) > 128:
            raise ValidationError("Answer ID too long")

        return answer_id

    @staticmethod
    def validate_response_text(response: str) -> str:
        """
        Validate and sanitize response text.

        Args:
            response: Response text

        Returns:
            Sanitized response

        Raises:
            ValidationError: If invalid
        """
        if not isinstance(response, str):
            raise ValidationError("Response must be a string")

        # Limit length
        if len(response) > 10000:
            raise ValidationError("Response too long (max 10000 characters)")

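        # Strip the most obvious script/iframe tags. This is defense in depth only:
        # regex stripping is easy to bypass (e.g. event-handler attributes), so the
        # client must still HTML-escape this text when rendering it.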
        response = re.sub(r'<script[^>]*>.*?</script>', '', response, flags=re.IGNORECASE | re.DOTALL)
        response = re.sub(r'<iframe[^>]*>.*?</iframe>', '', response, flags=re.IGNORECASE | re.DOTALL)

        return response.strip()

    @staticmethod
    def sanitize_sql_string(value: str) -> str:
        """
        Sanitize string for SQL (escape quotes).

        NOTE: Should use parameterized queries instead when possible.

        Args:
            value: String to sanitize

        Returns:
            Sanitized string
        """
        # Escape single quotes
        return value.replace("'", "''").replace("\\", "\\\\")


def validate_websocket_message(message: Dict[str, Any]) -> Dict[str, Any]:
    """
    Validate incoming WebSocket message.

    Args:
        message: Parsed JSON message

    Returns:
        Validated message dict

    Raises:
        ValidationError: If validation fails
    """
    if not isinstance(message, dict):
        raise ValidationError("Message must be a JSON object")

    # Validate type field
    if 'type' not in message:
        raise ValidationError("Message missing 'type' field")

    msg_type = message['type']
    if not isinstance(msg_type, str):
        raise ValidationError("Message type must be a string")

    # Validate based on type
    if msg_type == 'init':
        if 'userID' not in message:
            raise ValidationError("Init message missing 'userID'")
        message['userID'] = Validator.validate_user_id(message['userID'])

    elif msg_type == 'command':
        if 'value' not in message:
            raise ValidationError("Command message missing 'value'")
        message['value'] = Validator.validate_command(message['value'])

    elif msg_type == 'speed':
        if 'value' not in message:
            raise ValidationError("Speed message missing 'value'")
        message['value'] = Validator.validate_speed(message['value'])

    elif msg_type == 'answer':
        if 'id' not in message:
            raise ValidationError("Answer message missing 'id'")
        if 'response' not in message:
            raise ValidationError("Answer message missing 'response'")

        message['id'] = Validator.validate_answer_id(message['id'])
        message['response'] = Validator.validate_response_text(message['response'])

    else:
        raise ValidationError(f"Unknown message type: {msg_type}")

    return message
```
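
One subtlety when anchoring ID patterns: in Python, `$` used with `re.match` also matches just before a trailing newline, so `"user_123\n"` passes a `^...$` check. The sketch below shows the difference and why `re.fullmatch` is the stricter choice for identifiers. The helper names are hypothetical and exist only for this comparison.

```python
import re

ID_PATTERN = r'[a-zA-Z0-9_-]+'


def is_valid_id_loose(value: str) -> bool:
    """Anchored with ^...$: also accepts one trailing newline."""
    return re.match(rf'^{ID_PATTERN}$', value) is not None


def is_valid_id_strict(value: str) -> bool:
    """fullmatch requires the entire string to match the pattern."""
    return re.fullmatch(ID_PATTERN, value) is not None


if __name__ == "__main__":
    for candidate in ["user_123", "user_123\n", "bad id"]:
        print(repr(candidate),
              is_valid_id_loose(candidate),
              is_valid_id_strict(candidate))
```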

**Modify `ws/app.py`** to use validation:
```python
from validators import validate_websocket_message, ValidationError
import logging

logger = logging.getLogger(__name__)

async def consumer(websocket):
    """Handle incoming WebSocket messages"""
    async for message in websocket:
        try:
            data = json.loads(message)

            # VALIDATE INPUT
            try:
                data = validate_websocket_message(data)
            except ValidationError as e:
                logger.warning(f"Invalid message from {websocket.remote_address}: {e}")
                await websocket.send(json.dumps({
                    'type': 'error',
                    'message': f'Invalid input: {str(e)}'
                }))
                continue

            # Process validated message
            if data['type'] == 'init':
                # ... existing logic with validated data ...
                pass

        except json.JSONDecodeError:
            logger.error(f"Invalid JSON from {websocket.remote_address}")
            await websocket.send(json.dumps({
                'type': 'error',
                'message': 'Invalid JSON'
            }))
        except Exception as e:
            logger.error(f"Error processing message: {e}", exc_info=True)
```

#### Task 1.3: SQL Injection Prevention

**Modify `ws/functions.py`** to use parameterized queries:

```python
def saveGame(player, storage=None):
    """Save game with SQL injection protection"""
    if storage:
        return storage.save(player)

    cursor = None  # Defined before `try` so `finally` never sees an unbound name
    try:
        mydb = get_database_connection()
        cursor = mydb.cursor()

        # Use parameterized query (NOT string interpolation)
        sql = """
            INSERT INTO users (userID, data, lastSaved)
            VALUES (%s, %s, %s)
            ON DUPLICATE KEY UPDATE
                data = VALUES(data),
                lastSaved = VALUES(lastSaved)
        """

        player_json = json.dumps(player.to_dict())
        timestamp = datetime.now()

        # Pass parameters separately (safe from injection)
        cursor.execute(sql, (player.userID, player_json, timestamp))
        mydb.commit()

        return True

    except Exception as e:
        logger.error(f"Failed to save game for {player.userID}: {e}", exc_info=True)
        return False
    finally:
        if cursor:
            cursor.close()


def loadGame(userID, storage=None):
    """Load game with SQL injection protection"""
    if storage:
        return storage.load(userID)

    cursor = None  # Defined before `try` so `finally` never sees an unbound name
    try:
        # Validate user ID first
        from validators import Validator
        userID = Validator.validate_user_id(userID)

        mydb = get_database_connection()
        cursor = mydb.cursor()

        # Parameterized query
        sql = "SELECT data FROM users WHERE userID = %s"
        cursor.execute(sql, (userID,))

        result = cursor.fetchone()
        if result:
            return json.loads(result[0])
        return None

    except Exception as e:
        logger.error(f"Failed to load game for {userID}: {e}", exc_info=True)
        return None
    finally:
        if cursor:
            cursor.close()
```
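
To see why parameter binding matters, here is a small self-contained demonstration. It uses the standard-library `sqlite3` module purely so that it runs without a MySQL server, which is an assumption made for this sketch. Note that SQLite's placeholder is `?` where MySQL Connector uses `%s`, and the table layout is a simplified stand-in for `users`.

```python
import sqlite3


def load_data(conn: sqlite3.Connection, user_id: str):
    """Fetch a user's data with a bound parameter (never string formatting)."""
    row = conn.execute(
        "SELECT data FROM users WHERE userID = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (userID TEXT PRIMARY KEY, data TEXT)")
    conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", '{"age": 30}'))

    # A classic injection payload: with binding it is just an unknown user ID
    payload = "x' OR '1'='1"
    injected = load_data(conn, payload)
    legitimate = load_data(conn, "alice")
    conn.close()
    return injected, legitimate


if __name__ == "__main__":
    print(demo())
```

The payload returns no row because the driver treats the whole string as a value to compare against `userID`, not as SQL.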

#### Task 1.4: Authentication System

**Create `ws/auth.py`**:
```python
"""
Authentication and session management.

Uses JWT tokens for stateless authentication.
"""

import jwt
import bcrypt
import secrets
from datetime import datetime, timedelta
from typing import Optional, Dict, Any
from config import config
import logging

logger = logging.getLogger(__name__)


class AuthError(Exception):
    """Authentication error"""
    pass


class AuthManager:
    """Manage user authentication"""

    def __init__(self):
        self.secret = config.JWT_SECRET
        self.timeout = config.SESSION_TIMEOUT

    def hash_password(self, password: str) -> str:
        """
        Hash password using bcrypt.

        Args:
            password: Plain text password

        Returns:
            Hashed password
        """
        salt = bcrypt.gensalt()
        hashed = bcrypt.hashpw(password.encode('utf-8'), salt)
        return hashed.decode('utf-8')

    def verify_password(self, password: str, hashed: str) -> bool:
        """
        Verify password against hash.

        Args:
            password: Plain text password
            hashed: Hashed password from database

        Returns:
            True if password matches
        """
        try:
            return bcrypt.checkpw(password.encode('utf-8'), hashed.encode('utf-8'))
        except Exception as e:
            logger.error(f"Password verification error: {e}")
            return False

    def create_token(self, user_id: str, additional_claims: Dict = None) -> str:
        """
        Create JWT token for user.

        Args:
            user_id: User ID
            additional_claims: Optional additional JWT claims

        Returns:
            JWT token string
        """
        now = datetime.utcnow()
        payload = {
            'user_id': user_id,
            'iat': now,
            'exp': now + timedelta(seconds=self.timeout),
            'jti': secrets.token_urlsafe(16)  # Unique token ID
        }

        if additional_claims:
            payload.update(additional_claims)

        token = jwt.encode(payload, self.secret, algorithm='HS256')
        return token

    def verify_token(self, token: str) -> Dict[str, Any]:
        """
        Verify JWT token.

        Args:
            token: JWT token string

        Returns:
            Decoded token payload

        Raises:
            AuthError: If token invalid or expired
        """
        try:
            payload = jwt.decode(token, self.secret, algorithms=['HS256'])
            return payload
        except jwt.ExpiredSignatureError:
            raise AuthError("Token expired")
        except jwt.InvalidTokenError as e:
            raise AuthError(f"Invalid token: {e}")

    def create_session(self, user_id: str) -> Dict[str, str]:
        """
        Create authenticated session.

        Args:
            user_id: User ID

        Returns:
            Dict with token and expiry
        """
        token = self.create_token(user_id)

        return {
            'token': token,
            'user_id': user_id,
            'expires_in': self.timeout
        }

    def authenticate_user(self, user_id: str, password: str) -> Optional[Dict[str, str]]:
        """
        Authenticate user with credentials.

        Args:
            user_id: User ID
            password: Password

        Returns:
            Session dict if successful, None otherwise
        """
        # Load user from database
        from functions import get_database_connection

        cursor = None  # Defined before `try` so `finally` never sees an unbound name
        try:
            mydb = get_database_connection()
            cursor = mydb.cursor()

            sql = "SELECT userID, passwordHash FROM users WHERE userID = %s"
            cursor.execute(sql, (user_id,))

            result = cursor.fetchone()
            if not result:
                logger.warning(f"Authentication failed: user {user_id} not found")
                return None

            stored_user_id, password_hash = result

            # Verify password
            if not self.verify_password(password, password_hash):
                logger.warning(f"Authentication failed: invalid password for {user_id}")
                return None

            # Create session
            session = self.create_session(user_id)
            logger.info(f"User {user_id} authenticated successfully")

            return session

        except Exception as e:
            logger.error(f"Authentication error: {e}", exc_info=True)
            return None
        finally:
            if cursor:
                cursor.close()


# Singleton
auth_manager = AuthManager()


def require_auth(websocket, token: str) -> Optional[str]:
    """
    Require authentication for WebSocket connection.

    Args:
        websocket: WebSocket connection
        token: JWT token from client

    Returns:
        User ID if authenticated, None otherwise
    """
    try:
        payload = auth_manager.verify_token(token)
        user_id = payload['user_id']

        # Attach user ID to websocket
        websocket.user_id = user_id
        websocket.authenticated = True

        logger.info(f"WebSocket authenticated for user {user_id}")
        return user_id

    except AuthError as e:
        logger.warning(f"WebSocket authentication failed: {e}")
        websocket.authenticated = False
        return None
```

**Modify `ws/app.py`** to require authentication:
```python
from auth import auth_manager, require_auth, AuthError

async def handler(websocket, path):
    """WebSocket connection handler with authentication"""
    try:
        # First message must be authentication
        auth_message = await websocket.recv()
        auth_data = json.loads(auth_message)

        if auth_data.get('type') != 'auth':
            await websocket.send(json.dumps({
                'type': 'error',
                'message': 'First message must be authentication'
            }))
            return

        token = auth_data.get('token')
        if not token:
            await websocket.send(json.dumps({
                'type': 'error',
                'message': 'Missing authentication token'
            }))
            return

        # Authenticate
        user_id = require_auth(websocket, token)
        if not user_id:
            await websocket.send(json.dumps({
                'type': 'error',
                'message': 'Authentication failed'
            }))
            return

        # Send success
        await websocket.send(json.dumps({
            'type': 'auth_success',
            'user_id': user_id
        }))

        # Continue with normal handler
        websocket.userID = user_id
        USERS.add(websocket)
        # ... rest of existing handler logic ...

    except Exception as e:
        logger.error(f"Handler error: {e}", exc_info=True)
    finally:
        USERS.discard(websocket)
```
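
The first-message checks in the handler can also be factored into a small pure function, which makes them easy to unit test without a live socket. This is a sketch only: `extract_auth_token` and `AuthMessageError` are hypothetical names that do not exist in the codebase.

```python
import json


class AuthMessageError(Exception):
    """Raised when the first WebSocket message is not a valid auth message."""


def extract_auth_token(raw_message: str) -> str:
    """Return the token from a {"type": "auth", "token": "..."} message."""
    try:
        data = json.loads(raw_message)
    except json.JSONDecodeError:
        raise AuthMessageError("Invalid JSON") from None

    # Guard against valid JSON that is not an object (e.g. a list or number)
    if not isinstance(data, dict) or data.get("type") != "auth":
        raise AuthMessageError("First message must be authentication")

    token = data.get("token")
    if not isinstance(token, str) or not token:
        raise AuthMessageError("Missing authentication token")

    return token
```

The handler would then call this once, send the exception text back as an `error` message on failure, and pass the returned token to `require_auth`.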

**Create a login API endpoint** (`api/auth.php`, or `api/login.py` if the API is written in Python):
```python
# api/login.py (if using Python API)
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from auth import auth_manager

app = FastAPI()

class LoginRequest(BaseModel):
    user_id: str
    password: str

@app.post("/api/login")
async def login(request: LoginRequest):
    """Authenticate user and return token"""
    session = auth_manager.authenticate_user(request.user_id, request.password)

    if not session:
        raise HTTPException(status_code=401, detail="Invalid credentials")

    return session
```

#### Task 1.5: Rate Limiting

**Create `ws/rate_limiter.py`**:
```python
"""
Rate limiting for expensive operations.

Uses a sliding-window log: each identifier keeps the timestamps of its recent requests.
"""

from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Tuple
import asyncio
from config import config
import logging

logger = logging.getLogger(__name__)


class RateLimiter:
    """
    Sliding-window rate limiter.

    Usage:
        limiter = RateLimiter(max_requests=60, window_seconds=3600)
        if limiter.is_allowed('user_123'):
            # Allow request
        else:
            # Reject (rate limited)
    """

    def __init__(self, max_requests: int, window_seconds: int):
        """
        Initialize rate limiter.

        Args:
            max_requests: Maximum requests allowed in window
            window_seconds: Time window in seconds
        """
        self.max_requests = max_requests
        self.window = timedelta(seconds=window_seconds)
        self.requests: Dict[str, list] = defaultdict(list)

    def is_allowed(self, identifier: str) -> bool:
        """
        Check if request is allowed.

        Args:
            identifier: User ID or other identifier

        Returns:
            True if allowed, False if rate limited
        """
        now = datetime.now()
        cutoff = now - self.window

        # Clean old requests
        self.requests[identifier] = [
            req_time for req_time in self.requests[identifier]
            if req_time > cutoff
        ]

        # Check limit
        if len(self.requests[identifier]) >= self.max_requests:
            logger.warning(f"Rate limit exceeded for {identifier}")
            return False

        # Record request
        self.requests[identifier].append(now)
        return True

    def get_remaining(self, identifier: str) -> int:
        """
        Get remaining requests in current window.

        Args:
            identifier: User ID or other identifier

        Returns:
            Number of requests remaining
        """
        now = datetime.now()
        cutoff = now - self.window

        # Clean old requests
        self.requests[identifier] = [
            req_time for req_time in self.requests[identifier]
            if req_time > cutoff
        ]

        return max(0, self.max_requests - len(self.requests[identifier]))

    def reset(self, identifier: str):
        """Reset rate limit for identifier"""
        if identifier in self.requests:
            del self.requests[identifier]


# Global rate limiters
openai_limiter = RateLimiter(
    max_requests=config.OPENAI_MAX_REQUESTS_PER_HOUR,
    window_seconds=3600
)

websocket_limiter = RateLimiter(
    max_requests=config.WEBSOCKET_MAX_MESSAGES_PER_MINUTE,
    window_seconds=60
)


async def check_openai_rate_limit(user_id: str) -> bool:
    """
    Check if user can make OpenAI request.

    Args:
        user_id: User ID

    Returns:
        True if allowed, False if rate limited
    """
    if not openai_limiter.is_allowed(user_id):
        remaining = openai_limiter.get_remaining(user_id)
        logger.warning(f"OpenAI rate limit exceeded for {user_id}. Remaining: {remaining}")
        return False
    return True


async def check_websocket_rate_limit(user_id: str) -> bool:
    """
    Check if user can send WebSocket message.

    Args:
        user_id: User ID

    Returns:
        True if allowed, False if rate limited
    """
    if not websocket_limiter.is_allowed(user_id):
        logger.warning(f"WebSocket rate limit exceeded for {user_id}")
        return False
    return True
```
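
`RateLimiter` stores one timestamp per request. If memory per user or burst handling becomes a concern, a token bucket is a common alternative that keeps only two numbers per identifier. A minimal sketch follows. The class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions for illustration and are not part of the codebase.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Per-identifier token bucket: `capacity` burst, refilled at `rate` tokens/sec."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so tests can control time
        # identifier -> (tokens available, time of last refill)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def is_allowed(self, identifier: str, cost: float = 1.0) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(identifier, (self.capacity, now))

        # Refill in proportion to elapsed time, never above capacity
        tokens = min(self.capacity, tokens + (now - last) * self.rate)

        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self.buckets[identifier] = (tokens, now)
        return allowed
```

For example, `TokenBucket(capacity=60, rate=60 / 3600)` would allow a burst of 60 requests and then roughly one per minute, which is close to the hourly OpenAI limit configured above.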

**Modify `ws/conversationEvents.py`** to use rate limiting:
```python
from rate_limiter import check_openai_rate_limit

async def get_ai_response(player, conversation_history, character):
    """Get AI response with rate limiting"""

    # Check rate limit
    if not await check_openai_rate_limit(player.userID):
        return "I'm a bit overwhelmed right now. Can we talk again later?"

    # Proceed with OpenAI call
    try:
        result = await openai.ChatCompletion.acreate(...)
        # Return the reply text so every code path yields a string
        return result["choices"][0]["message"]["content"]
    except Exception as e:
        logger.error(f"OpenAI error: {e}")
        return "Sorry, I'm having trouble thinking right now."
```

**Modify `ws/app.py`** consumer to use rate limiting:
```python
from rate_limiter import check_websocket_rate_limit

async def consumer(websocket):
    """Handle incoming messages with rate limiting"""
    async for message in websocket:
        # Check rate limit
        if not await check_websocket_rate_limit(websocket.userID):
            await websocket.send(json.dumps({
                'type': 'error',
                'message': 'Rate limit exceeded. Please slow down.'
            }))
            continue

        # Process message
        # ... existing logic ...
```

---

### Phase 2: Multi-User Scalability (3-4 hours)

#### Task 2.1: Connection Pooling

**Create `ws/database.py`** (database connection management):
```python
"""
Database connection pooling.

Uses mysql-connector-python pool for efficient connections.
"""

import mysql.connector
from mysql.connector import pooling
from config import config
import logging

logger = logging.getLogger(__name__)

# Connection pool (module-level singleton)
_connection_pool = None


def get_connection_pool():
    """
    Get or create database connection pool.

    Returns:
        MySQL connection pool
    """
    global _connection_pool

    if _connection_pool is None:
        try:
            # mysql-connector-python caps pool_size at 32 (CNX_POOL_MAXSIZE);
            # larger values raise an error, so clamp the configured value
            pool_size = min(config.MAX_CONNECTIONS, 32)
            _connection_pool = pooling.MySQLConnectionPool(
                pool_name="baolife_pool",
                pool_size=pool_size,
                pool_reset_session=True,
                host=config.DB_HOST,
                port=config.DB_PORT,
                user=config.DB_USER,
                password=config.DB_PASSWORD,
                database=config.DB_NAME
            )
            logger.info(f"Database connection pool created (size: {pool_size})")
        except Exception as e:
            logger.error(f"Failed to create connection pool: {e}")
            raise

    return _connection_pool


def get_database_connection():
    """
    Get database connection from pool.

    Returns:
        MySQL connection

    Usage:
        conn = get_database_connection()
        try:
            cursor = conn.cursor()
            cursor.execute("SELECT ...")
        finally:
            conn.close()  # Returns to pool
    """
    try:
        pool = get_connection_pool()
        return pool.get_connection()
    except Exception as e:
        logger.error(f"Failed to get database connection: {e}")
        raise


def close_connection_pool():
    """Drop the module's reference to the pool (for cleanup)"""
    global _connection_pool

    if _connection_pool:
        # MySQLConnectionPool has no public close method, so we only clear the
        # reference here; connections already checked out are not affected
        _connection_pool = None
        logger.info("Database connection pool released")
```
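
The usage pattern in the `get_database_connection` docstring (get, `try`, `finally: close`) could be wrapped in a context manager so callers cannot forget to return the connection. A minimal sketch, assuming a hypothetical helper named `pooled_connection` that is not one of the plan's files; it takes the connection factory as a parameter so it can be exercised without a database:

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def pooled_connection(get_conn: Callable[[], object]) -> Iterator[object]:
    """Yield a connection, commit on success, roll back on error, always close.

    `get_conn` is expected to behave like get_database_connection():
    the returned object needs commit(), rollback() and close().
    """
    conn = get_conn()
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()  # For a pooled connection this returns it to the pool


class FakeConnection:
    """Stand-in used only to demonstrate the call order."""

    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")

    def close(self):
        self.calls.append("close")
```

In `ws/functions.py` this would presumably be used as `with pooled_connection(get_database_connection) as conn:`.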

**Modify `ws/functions.py`** to use pool:
```python
from database import get_database_connection

# Remove old module-level connection
# OLD: mydb = connect_to_database()

# Use connection from pool in functions
def saveGame(player, storage=None):
    """Save game using connection pool"""
    conn = None
    try:
        conn = get_database_connection()
        cursor = conn.cursor()

        # ... save logic ...

        conn.commit()
        return True

    except Exception as e:
        logger.error(f"Save error: {e}")
        if conn:
            conn.rollback()
        return False
    finally:
        if conn:
            conn.close()  # Returns to pool
```
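
`saveGame` is synchronous, and mysql-connector-python blocks while it waits on the server, so calling it directly from an `async` handler would stall every other WebSocket on the same event loop. One way to avoid that, sketched here with a stand-in `blocking_save` function (hypothetical, it only sleeps), is `asyncio.to_thread` (Python 3.9+):

```python
import asyncio
import time


def blocking_save(user_id: str) -> bool:
    """Stand-in for a synchronous saveGame(); sleeps to simulate database I/O."""
    time.sleep(0.05)
    return True


async def save_without_blocking(user_id: str) -> bool:
    """Run the blocking save in a worker thread so the event loop stays responsive."""
    return await asyncio.to_thread(blocking_save, user_id)


async def demo() -> tuple:
    """Save while a ticker coroutine keeps running, to show the loop is not stalled."""
    ticks = 0

    async def ticker():
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1

    task = asyncio.create_task(ticker())
    saved = await save_without_blocking("user-1")
    task.cancel()
    return saved, ticks
```

If this approach is used, the pool size should be at least as large as the number of saves expected to run concurrently, since each worker thread holds a connection while it runs.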

#### Task 2.2: Session Management

**Create `ws/session_manager.py`**:
```python
"""
Session management for active game instances.

Stores active player sessions in memory with TTL.
"""

from typing import Dict, Optional
from datetime import datetime, timedelta
from functions import playerClass
import asyncio
import logging

logger = logging.getLogger(__name__)


class SessionManager:
    """
    Manage active player sessions.

    Replaces global playerRecords dict with proper session management.
    """

    def __init__(self, session_timeout: int = 3600):
        """
        Initialize session manager.

        Args:
            session_timeout: Session timeout in seconds
        """
        self.sessions: Dict[str, Dict] = {}
        self.timeout = timedelta(seconds=session_timeout)
        self._cleanup_task = None

    def create_session(self, user_id: str, player: playerClass) -> str:
        """
        Create new session for player.

        Args:
            user_id: User ID
            player: Player instance

        Returns:
            Session ID (currently the same value as user_id)
        """
        self.sessions[user_id] = {
            'player': player,
            'created_at': datetime.now(),
            'last_accessed': datetime.now(),
            'websocket': None
        }

        logger.info(f"Session created for user {user_id}")
        return user_id

    def get_session(self, user_id: str) -> Optional[playerClass]:
        """
        Get player from session.

        Args:
            user_id: User ID

        Returns:
            Player instance if session exists and valid, None otherwise
        """
        if user_id not in self.sessions:
            return None

        session = self.sessions[user_id]

        # Check if expired
        if datetime.now() - session['last_accessed'] > self.timeout:
            logger.info(f"Session expired for user {user_id}")
            self.remove_session(user_id)
            return None

        # Update access time
        session['last_accessed'] = datetime.now()

        return session['player']

    def remove_session(self, user_id: str):
        """
        Remove session.

        Args:
            user_id: User ID
        """
        if user_id in self.sessions:
            del self.sessions[user_id]
            logger.info(f"Session removed for user {user_id}")

    def attach_websocket(self, user_id: str, websocket):
        """
        Attach WebSocket to session.

        Args:
            user_id: User ID
            websocket: WebSocket connection
        """
        if user_id in self.sessions:
            self.sessions[user_id]['websocket'] = websocket

    def get_websocket(self, user_id: str):
        """
        Get WebSocket for session.

        Args:
            user_id: User ID

        Returns:
            WebSocket if attached, None otherwise
        """
        if user_id in self.sessions:
            return self.sessions[user_id]['websocket']
        return None

    def get_active_sessions(self) -> int:
        """
        Get count of active sessions.

        Returns:
            Number of active sessions
        """
        return len(self.sessions)

    async def cleanup_expired_sessions(self):
        """Cleanup task to remove expired sessions"""
        while True:
            try:
                await asyncio.sleep(300)  # Check every 5 minutes

                now = datetime.now()
                expired = []

                for user_id, session in self.sessions.items():
                    if now - session['last_accessed'] > self.timeout:
                        expired.append(user_id)

                for user_id in expired:
                    logger.info(f"Cleaning up expired session for {user_id}")
                    self.remove_session(user_id)

                if expired:
                    logger.info(f"Cleaned up {len(expired)} expired sessions")

            except Exception as e:
                logger.error(f"Session cleanup error: {e}", exc_info=True)

    def start_cleanup_task(self):
        """Start background cleanup task"""
        if self._cleanup_task is None:
            self._cleanup_task = asyncio.create_task(self.cleanup_expired_sessions())
            logger.info("Session cleanup task started")

    def stop_cleanup_task(self):
        """Stop background cleanup task"""
        if self._cleanup_task:
            self._cleanup_task.cancel()
            self._cleanup_task = None
            logger.info("Session cleanup task stopped")


# Global session manager
session_manager = SessionManager()
```
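
The expiry rule in `get_session` (a session is dropped once `last_accessed` is older than the timeout, and each successful read refreshes it) is easier to unit-test if the clock can be substituted. A minimal sketch of that rule in isolation, assuming a hypothetical `ExpiringStore` class and `clock` parameter that are not part of the `SessionManager` above:

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ExpiringStore:
    """Key/value store whose entries expire after `ttl` seconds without access."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._items: Dict[str, Tuple[Any, float]] = {}  # key -> (value, last_accessed)

    def put(self, key: str, value: Any) -> None:
        self._items[key] = (value, self.clock())

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, last_accessed = entry
        if self.clock() - last_accessed > self.ttl:
            del self._items[key]  # Expired: drop it, like remove_session()
            return None
        self._items[key] = (value, self.clock())  # Sliding expiry: refresh on read
        return value

    def __len__(self) -> int:
        return len(self._items)


class FakeClock:
    """Manually advanced clock for tests."""

    def __init__(self):
        self.now = 0.0

    def __call__(self) -> float:
        return self.now
```

The default `time.monotonic` is not affected by wall-clock adjustments, which `datetime.now()` is; that is a design choice of this sketch, not something the plan requires.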

**Modify `ws/app.py`** to use session manager:
```python
from session_manager import session_manager

# Replace global playerRecords
# OLD: playerRecords = {}

async def handler(websocket, path):
    """WebSocket handler with session management"""
    try:
        # ... authentication logic ...

        user_id = websocket.userID

        # Get or create session
        player = session_manager.get_session(user_id)
        if not player:
            # Load from database
            player = loadGame(user_id)
            if not player:
                player = create_new_player(user_id)

            session_manager.create_session(user_id, player)

        # Attach websocket to session
        session_manager.attach_websocket(user_id, websocket)

        # ... rest of handler ...

    finally:
        # Don't remove session on disconnect (keep in memory)
        pass


# Start cleanup task when server starts
async def main():
    """Main server with cleanup task"""
    session_manager.start_cleanup_task()

    async with websockets.serve(handler, "0.0.0.0", 8001):
        await asyncio.Future()  # Run forever
```

#### Task 2.3: Refactor Global State

**Modify `ws/app.py`** to remove global state:

```python
# OLD (global state):
# USERS = set()
# playerRecords = {}
# mydb = connect_to_database()

# NEW (managed state):
from session_manager import session_manager
from database import get_database_connection

# Active WebSocket connections (still need this for broadcasting)
active_connections = set()

async def handler(websocket, path):
    """Handler without global state"""
    try:
        active_connections.add(websocket)

        # Use session manager instead of global dict
        # ... handler logic ...

    finally:
        active_connections.discard(websocket)


async def broadcast_message(message):
    """Broadcast to all active connections"""
    if active_connections:
        await asyncio.gather(
            *[ws.send(message) for ws in active_connections],
            return_exceptions=True
        )
```
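
With `return_exceptions=True`, a failed `send` does not raise; the exception object comes back in the result list and is otherwise ignored, so a dead socket stays in `active_connections` until its handler exits. A sketch of a variant that discards connections whose send failed, assuming a hypothetical `broadcast_and_prune` name and a fake socket class in place of a real WebSocket:

```python
import asyncio
from typing import Set


async def broadcast_and_prune(connections: Set, message: str) -> int:
    """Send to every connection; drop the ones whose send raised. Returns delivered count."""
    targets = list(connections)  # Fixed order so results can be matched to sockets
    if not targets:
        return 0
    results = await asyncio.gather(
        *(ws.send(message) for ws in targets),
        return_exceptions=True,
    )
    delivered = 0
    for ws, result in zip(targets, results):
        if isinstance(result, Exception):
            connections.discard(ws)
        else:
            delivered += 1
    return delivered


class FakeSocket:
    """Stand-in for a WebSocket connection; `broken` makes send() raise."""

    def __init__(self, broken: bool = False):
        self.broken = broken
        self.sent = []

    async def send(self, message: str) -> None:
        if self.broken:
            raise ConnectionError("peer went away")
        self.sent.append(message)
```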

---

### Phase 3: Observability & Production Ops (3-4 hours)

#### Task 3.1: Structured Logging

**Create `ws/logging_config.py`**:
```python
"""
Structured logging configuration.

Sets up JSON logging for production and human-readable for development.
"""

import logging
import logging.config
import json
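from datetime import datetime, timezone
from config import config


class JSONFormatter(logging.Formatter):
    """Format log records as JSON"""

    def format(self, record):
        """Format log record as JSON"""
        log_data = {
            # Use the record's own creation time (UTC) rather than the time of
            # formatting; datetime.utcnow() is also deprecated since Python 3.12
            'timestamp': datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),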
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
            'module': record.module,
            'function': record.funcName,
            'line': record.lineno
        }

        # Add exception info if present
        if record.exc_info:
            log_data['exception'] = self.formatException(record.exc_info)

        # Add extra fields
        if hasattr(record, 'user_id'):
            log_data['user_id'] = record.user_id

        if hasattr(record, 'request_id'):
            log_data['request_id'] = record.request_id

        return json.dumps(log_data)


def setup_logging():
    """Configure logging based on environment"""

    # Development: Human-readable console logging
    # Production: JSON logging to file

    log_level = getattr(logging, config.LOG_LEVEL.upper(), logging.INFO)

    if config.ENVIRONMENT == 'production':
        # JSON logging for production
        logging_config = {
            'version': 1,
            'disable_existing_loggers': False,
            'formatters': {
                'json': {
                    '()': JSONFormatter
                }
            },
            'handlers': {
                'file': {
                    'class': 'logging.handlers.RotatingFileHandler',
                    'filename': 'logs/baolife.log',
                    'maxBytes': 10485760,  # 10MB
                    'backupCount': 5,
                    'formatter': 'json'
                },
                'error_file': {
                    'class': 'logging.handlers.RotatingFileHandler',
                    'filename': 'logs/baolife_errors.log',
                    'maxBytes': 10485760,
                    'backupCount': 5,
                    'formatter': 'json',
                    'level': logging.ERROR
                }
            },
            'root': {
                'level': log_level,
                'handlers': ['file', 'error_file']
            }
        }
    else:
        # Console logging for development
        logging_config = {
            'version': 1,
            'disable_existing_loggers': False,
            'formatters': {
                'standard': {
                    'format': '%(asctime)s [%(levelname)s] %(name)s: %(message)s'
                }
            },
            'handlers': {
                'console': {
                    'class': 'logging.StreamHandler',
                    'formatter': 'standard'
                }
            },
            'root': {
                'level': log_level,
                'handlers': ['console']
            }
        }

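    # Create the logs directory first: RotatingFileHandler opens its file as soon
    # as dictConfig() builds the handler, and fails if the directory is missing
    if config.ENVIRONMENT == 'production':
        import os
        os.makedirs('logs', exist_ok=True)

    logging.config.dictConfig(logging_config)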


# Initialize logging
setup_logging()
logger = logging.getLogger(__name__)
logger.info(f"Logging configured for environment: {config.ENVIRONMENT}")
```

**Modify all files to use proper logging**:
```python
# Replace all print() statements with logger calls

# OLD:
print(f"User {userID} connected")

# NEW:
import logging
logger = logging.getLogger(__name__)
logger.info("User connected", extra={'user_id': userID})

# OLD:
print(f"Error: {e}")

# NEW:
logger.error("Operation failed", exc_info=True, extra={'user_id': userID})
```
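
Passing `extra={'user_id': ...}` on every call is easy to forget. The standard library's `logging.LoggerAdapter` can bind the field once per connection instead. A minimal sketch, assuming a cut-down formatter (`MiniJSONFormatter`, hypothetical) standing in for `JSONFormatter` so the example runs on its own:

```python
import io
import json
import logging


class MiniJSONFormatter(logging.Formatter):
    """Cut-down stand-in for JSONFormatter: message, level and optional user_id."""

    def format(self, record: logging.LogRecord) -> str:
        data = {"level": record.levelname, "message": record.getMessage()}
        if hasattr(record, "user_id"):
            data["user_id"] = record.user_id
        return json.dumps(data)


def make_user_logger(base: logging.Logger, user_id: str) -> logging.LoggerAdapter:
    """Return an adapter that attaches user_id to every record it emits."""
    return logging.LoggerAdapter(base, {"user_id": user_id})


# Demo wiring: log to an in-memory stream so the output can be inspected
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(MiniJSONFormatter())

demo_logger = logging.getLogger("baolife.demo")
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False

user_log = make_user_logger(demo_logger, "user-42")
user_log.info("User connected")
```

One caveat: with the default `LoggerAdapter`, an `extra=` passed on an individual call is replaced by the adapter's own dict, so per-call fields such as `request_id` would need a custom `process()` override.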

#### Task 3.2: Error Tracking

**Optional: Sentry Integration** (create `ws/error_tracking.py`):
```python
"""
Error tracking with Sentry (optional).

Install: pip install sentry-sdk
"""

import sentry_sdk
from config import config
import logging

logger = logging.getLogger(__name__)


def init_error_tracking():
    """Initialize Sentry error tracking"""

    sentry_dsn = getattr(config, 'SENTRY_DSN', None)

    if not sentry_dsn:
        logger.info("Sentry DSN not configured. Error tracking disabled.")
        return

    sentry_sdk.init(
        dsn=sentry_dsn,
        environment=config.ENVIRONMENT,
        traces_sample_rate=0.1 if config.ENVIRONMENT == 'production' else 1.0,
        profiles_sample_rate=0.1 if config.ENVIRONMENT == 'production' else 1.0
    )

    logger.info("Sentry error tracking initialized")


def capture_exception(exception, context=None):
    """
    Capture exception to Sentry with context.

    Args:
        exception: Exception to capture
        context: Additional context dict
    """
    if context:
        with sentry_sdk.push_scope() as scope:
            for key, value in context.items():
                scope.set_context(key, value)
            sentry_sdk.capture_exception(exception)
    else:
        sentry_sdk.capture_exception(exception)
```

#### Task 3.3: Metrics Collection

**Create `ws/metrics.py`**:
```python
"""
Metrics collection and monitoring.

Simple in-memory metrics (can be exported to Prometheus/Grafana later).
"""

from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict
import asyncio
import logging

logger = logging.getLogger(__name__)


class MetricsCollector:
    """Collect application metrics"""

    def __init__(self):
        self.counters: Dict[str, int] = defaultdict(int)
        self.gauges: Dict[str, float] = {}
        self.histograms: Dict[str, list] = defaultdict(list)
        self.start_time = datetime.now()

    # Counters (monotonically increasing)

    def increment(self, metric: str, value: int = 1):
        """Increment counter"""
        self.counters[metric] += value

    def get_counter(self, metric: str) -> int:
        """Get counter value"""
        return self.counters.get(metric, 0)

    # Gauges (current value)

    def set_gauge(self, metric: str, value: float):
        """Set gauge value"""
        self.gauges[metric] = value

    def get_gauge(self, metric: str) -> float:
        """Get gauge value"""
        return self.gauges.get(metric, 0.0)

    # Histograms (value distributions)

    def observe(self, metric: str, value: float):
        """Record observation"""
        self.histograms[metric].append({
            'value': value,
            'timestamp': datetime.now()
        })

        # Keep last hour only
        cutoff = datetime.now() - timedelta(hours=1)
        self.histograms[metric] = [
            obs for obs in self.histograms[metric]
            if obs['timestamp'] > cutoff
        ]

    def get_histogram_stats(self, metric: str) -> Dict:
        """Get histogram statistics"""
        values = [obs['value'] for obs in self.histograms.get(metric, [])]

        if not values:
            return {'count': 0}

        return {
            'count': len(values),
            'min': min(values),
            'max': max(values),
            'avg': sum(values) / len(values),
            'sum': sum(values)
        }

    # Summary

    def get_summary(self) -> Dict:
        """Get metrics summary"""
        uptime = (datetime.now() - self.start_time).total_seconds()

        return {
            'uptime_seconds': uptime,
            'counters': dict(self.counters),
            'gauges': dict(self.gauges),
            'histograms': {
                name: self.get_histogram_stats(name)
                for name in self.histograms.keys()
            }
        }


# Global metrics
metrics = MetricsCollector()


# Common metrics helpers

def track_request(func):
    """Decorator to track request metrics"""
    import functools
    import time

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()  # Monotonic clock, suited to measuring durations
        metrics.increment('requests_total')

        try:
            result = await func(*args, **kwargs)
            metrics.increment('requests_success')
            return result
        except Exception:
            metrics.increment('requests_error')
            raise
        finally:
            duration = time.perf_counter() - start
            metrics.observe('request_duration_seconds', duration)

    return wrapper
```
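
The module docstring mentions exporting to Prometheus later. As a rough sketch of what that could look like, the function below renders a `get_summary()`-shaped dict in the Prometheus text exposition format. The name `render_prometheus` is hypothetical, and histograms are emitted as `_count`/`_sum` pairs only (no buckets or quantiles), since that is all the in-memory stats carry:

```python
from typing import Dict, List


def render_prometheus(summary: Dict) -> str:
    """Render a MetricsCollector.get_summary()-shaped dict as Prometheus text format."""
    lines: List[str] = []

    lines.append("# TYPE uptime_seconds gauge")
    lines.append(f"uptime_seconds {summary['uptime_seconds']}")

    for name, value in sorted(summary.get("counters", {}).items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")

    for name, value in sorted(summary.get("gauges", {}).items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")

    for name, stats in sorted(summary.get("histograms", {}).items()):
        # Only count and sum are available, so expose them as a bucket-less summary
        lines.append(f"# TYPE {name} summary")
        lines.append(f"{name}_count {stats.get('count', 0)}")
        lines.append(f"{name}_sum {stats.get('sum', 0)}")

    return "\n".join(lines) + "\n"
```

Because `observe()` keeps only the last hour of samples, the `_count` and `_sum` values here can decrease over time, which a real Prometheus summary would not do; a production exporter would need cumulative totals.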

**Instrument key operations** (`ws/app.py`):
```python
from metrics import metrics

async def handler(websocket, path):
    """Handler with metrics"""
    metrics.increment('websocket_connections')
    metrics.set_gauge('active_connections', len(active_connections))

    try:
        # ... handler logic ...
        pass
    finally:
        metrics.increment('websocket_disconnections')
        metrics.set_gauge('active_connections', len(active_connections))


async def initLifeSim(websocket, *args, **kwargs):  # Keep the existing parameters
    """Game loop with metrics"""
    metrics.increment('games_started')

    import time
    while player.controller == 'active':
        tick_start = time.time()

        # ... game tick ...

        tick_duration = time.time() - tick_start
        metrics.observe('game_tick_duration_seconds', tick_duration)

        await asyncio.sleep(...)

    metrics.increment('games_ended')
```

#### Task 3.4: Health Checks

**Create `ws/health.py`**:
```python
"""
Health check endpoints.

Provides /health and /metrics endpoints for monitoring.
"""

from aiohttp import web
from database import get_database_connection
from metrics import metrics
from session_manager import session_manager
import json
import logging

logger = logging.getLogger(__name__)


async def health_check(request):
    """
    Health check endpoint.

    Returns 200 if healthy, 503 if unhealthy.
    """
    checks = {}

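    # Check database
    conn = None
    try:
        conn = get_database_connection()
        cursor = conn.cursor()
        cursor.execute("SELECT 1")
        cursor.fetchone()  # Consume the row so no unread result is left on the cursor
        cursor.close()
        checks['database'] = 'healthy'
    except Exception as e:
        logger.error(f"Database health check failed: {e}")
        checks['database'] = 'unhealthy'
    finally:
        if conn:
            conn.close()  # Always return the connection to the pool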

    # Check sessions
    try:
        active_sessions = session_manager.get_active_sessions()
        checks['sessions'] = {
            'status': 'healthy',
            'active_count': active_sessions
        }
    except Exception as e:
        logger.error(f"Session health check failed: {e}")
        checks['sessions'] = 'unhealthy'

    # Overall status
    is_healthy = all(
        check != 'unhealthy'
        for check in checks.values()
        if isinstance(check, str)
    )

    response_data = {
        'status': 'healthy' if is_healthy else 'unhealthy',
        'checks': checks
    }

    status_code = 200 if is_healthy else 503

    return web.Response(
        text=json.dumps(response_data, indent=2),
        content_type='application/json',
        status=status_code
    )


async def metrics_endpoint(request):
    """
    Metrics endpoint.

    Returns application metrics.
    """
    summary = metrics.get_summary()

    # Add session metrics
    summary['gauges']['active_sessions'] = session_manager.get_active_sessions()

    return web.Response(
        text=json.dumps(summary, indent=2),
        content_type='application/json'
    )


async def start_health_server(port=8002):
    """
    Start health check HTTP server.

    Runs alongside WebSocket server.
    """
    app = web.Application()
    app.router.add_get('/health', health_check)
    app.router.add_get('/metrics', metrics_endpoint)

    runner = web.AppRunner(app)
    await runner.setup()

    site = web.TCPSite(runner, '0.0.0.0', port)
    await site.start()

    logger.info(f"Health check server started on port {port}")
    logger.info(f"  - http://localhost:{port}/health")
    logger.info(f"  - http://localhost:{port}/metrics")

    return runner
```

**Modify `ws/app.py`** to start health server:
```python
from health import start_health_server

async def main():
    """Start both WebSocket and health servers"""
    # Start health check server
    health_runner = await start_health_server(port=8002)

    # Start session cleanup
    session_manager.start_cleanup_task()

    # Start WebSocket server
    async with websockets.serve(handler, "0.0.0.0", 8001):
        logger.info("WebSocket server started on port 8001")
        await asyncio.Future()  # Run forever
```
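
The `main()` above runs until the process is killed, so the health server's runner and the session cleanup task are never released, while the success criteria list graceful shutdown as a requirement. A minimal sketch of one approach, assuming a hypothetical `run_until_stopped` helper: wait on an `asyncio.Event` that signal handlers set, then run cleanup callbacks in reverse order of registration.

```python
import asyncio
import signal
from typing import Awaitable, Callable, List


def install_signal_handlers(stop: asyncio.Event) -> None:
    """Set `stop` on SIGINT/SIGTERM (loop.add_signal_handler is Unix-only)."""
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, stop.set)


async def run_until_stopped(
    stop: asyncio.Event,
    cleanups: List[Callable[[], Awaitable[None]]],
) -> None:
    """Block until `stop` is set, then run cleanups last-registered-first."""
    await stop.wait()
    for cleanup in reversed(cleanups):
        try:
            await cleanup()
        except Exception as exc:  # Keep going so the remaining cleanups still run
            print(f"cleanup failed: {exc!r}")


async def demo() -> List[str]:
    """Simulate a shutdown request arriving shortly after startup."""
    order: List[str] = []
    stop = asyncio.Event()

    async def close_health():
        order.append("health")

    async def close_sessions():
        order.append("sessions")

    asyncio.get_running_loop().call_later(0.01, stop.set)
    await run_until_stopped(stop, [close_health, close_sessions])
    return order
```

In `main()`, `await run_until_stopped(...)` could take the place of `await asyncio.Future()`, with `health_runner.cleanup` (a coroutine on aiohttp's `AppRunner`) and a small async wrapper around `session_manager.stop_cleanup_task()` as the cleanups.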

---

### Phase 4: Event System Improvements (2-3 hours)

#### Task 4.1: Event Registry

**Create `ws/event_registry.py`**:
```python
"""
Event registration and management system.

Provides centralized event management with priority and dependencies.
"""

from typing import Callable, Dict, List, Optional
from dataclasses import dataclass
from enum import IntEnum
import logging

logger = logging.getLogger(__name__)


class EventPriority(IntEnum):
    """Event priority levels"""
    CRITICAL = 100  # Death, game over events
    HIGH = 75       # Major life events (birth, marriage, etc.)
    NORMAL = 50     # Regular events
    LOW = 25        # Minor flavor events


@dataclass
class EventDefinition:
    """Event metadata"""
    name: str
    function: Callable
    priority: EventPriority = EventPriority.NORMAL
    category: str = 'general'
    dependencies: Optional[List[str]] = None  # Events that must trigger first
    conflicts: Optional[List[str]] = None     # Events that cannot trigger together
    description: str = ''

    def __post_init__(self):
        if self.dependencies is None:
            self.dependencies = []
        if self.conflicts is None:
            self.conflicts = []


class EventRegistry:
    """
    Centralized event registry.

    Manages all game events with priority and dependency handling.
    """

    def __init__(self):
        self.events: Dict[str, EventDefinition] = {}
        self.categories: Dict[str, List[str]] = {}

    def register(self,
                name: str,
                function: Callable,
                priority: EventPriority = EventPriority.NORMAL,
                category: str = 'general',
                dependencies: Optional[List[str]] = None,
                conflicts: Optional[List[str]] = None,
                description: str = ''):
        """
        Register an event.

        Args:
            name: Event name (function name)
            function: Event function
            priority: Event priority
            category: Event category
            dependencies: Events that must trigger first
            conflicts: Events that cannot trigger together
            description: Event description
        """
        event_def = EventDefinition(
            name=name,
            function=function,
            priority=priority,
            category=category,
            dependencies=dependencies or [],
            conflicts=conflicts or [],
            description=description
        )

        self.events[name] = event_def

        # Update category index
        if category not in self.categories:
            self.categories[category] = []
        self.categories[category].append(name)

        logger.debug(f"Registered event: {name} (priority: {priority}, category: {category})")

    def get_event(self, name: str) -> Optional[EventDefinition]:
        """Get event definition by name"""
        return self.events.get(name)

    def get_events_by_category(self, category: str) -> List[EventDefinition]:
        """Get all events in category"""
        event_names = self.categories.get(category, [])
        return [self.events[name] for name in event_names]

    def get_sorted_events(self, filter_category: Optional[str] = None) -> List[EventDefinition]:
        """
        Get events sorted by priority.

        Args:
            filter_category: Optional category filter

        Returns:
            List of events sorted by priority (highest first)
        """
        events = list(self.events.values())

        if filter_category:
            events = [e for e in events if e.category == filter_category]

        return sorted(events, key=lambda e: e.priority, reverse=True)

    def check_dependencies(self, event_name: str, triggered_events: List[str]) -> bool:
        """
        Check if event dependencies are satisfied.

        Args:
            event_name: Event to check
            triggered_events: Events already triggered

        Returns:
            True if dependencies satisfied
        """
        event_def = self.events.get(event_name)
        if not event_def:
            return False

        # All dependencies must have triggered
        return all(dep in triggered_events for dep in event_def.dependencies)

    def check_conflicts(self, event_name: str, triggered_events: List[str]) -> bool:
        """
        Check if event conflicts with already triggered events.

        Args:
            event_name: Event to check
            triggered_events: Events already triggered

        Returns:
            True if no conflicts
        """
        event_def = self.events.get(event_name)
        if not event_def:
            return True

        # No conflicts should have triggered
        return not any(conflict in triggered_events for conflict in event_def.conflicts)


# Global registry
event_registry = EventRegistry()


# Decorator for registering events
def register_event(name: Optional[str] = None,
                  priority: EventPriority = EventPriority.NORMAL,
                  category: str = 'general',
                  dependencies: Optional[List[str]] = None,
                  conflicts: Optional[List[str]] = None,
                  description: str = ''):
    """
    Decorator to register event function.

    Usage:
        @register_event(priority=EventPriority.HIGH, category='school')
        def firstDayOfSchool(player, type='message'):
            # ... event logic ...
    """
    def decorator(func):
        event_name = name or func.__name__
        event_registry.register(
            name=event_name,
            function=func,
            priority=priority,
            category=category,
            dependencies=dependencies,
            conflicts=conflicts,
            description=description or func.__doc__ or ''
        )
        return func
    return decorator
```
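
Because `dependencies` are plain strings, a typo or a circular chain would silently keep an event from ever triggering (`check_dependencies` would simply keep returning `False`). A sketch of a startup check, assuming a hypothetical `validate_dependencies` function that takes a name-to-dependencies mapping (for the registry above, something like `{name: e.dependencies for name, e in event_registry.events.items()}`) and uses the standard library's `graphlib` (Python 3.9+):

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def validate_dependencies(deps: Dict[str, List[str]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the graph is sound."""
    problems: List[str] = []

    # 1. Every dependency must name a registered event
    for event, requires in deps.items():
        for dep in requires:
            if dep not in deps:
                problems.append(f"{event}: unknown dependency '{dep}'")

    # 2. The dependency graph must not contain a cycle
    try:
        tuple(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        cycle = " -> ".join(str(node) for node in exc.args[1])
        problems.append(f"dependency cycle: {cycle}")

    return problems
```

Running this once after all event modules are imported, and refusing to start when the list is non-empty, would surface registration mistakes before any player hits them.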

**Refactor `ws/events.py`** to use registry:
```python
from event_registry import register_event, EventPriority

@register_event(
    priority=EventPriority.NORMAL,
    category='school',
    description='Ask student if they like school'
)
def likeSchool(player, type='message', message=False, response=False):
    """Ask if student likes school"""
    # ... existing logic ...


@register_event(
    priority=EventPriority.HIGH,
    category='school',
    description='First day of school event'
)
def firstDayOfSchool(player, type='message'):
    """First day of school"""
    # ... existing logic ...


@register_event(
    priority=EventPriority.HIGH,
    category='relationships',
    dependencies=['makeNewFriend'],  # Must have friend first
    description='Start dating someone'
)
def startDating(player, type='message'):
    """Start dating event"""
    # ... existing logic ...
```

**Modify `ws/app.py`** to use registry:
```python
from event_registry import event_registry

def parseEvents(player):
    """
    Parse events using registry (priority-based).

    Events checked in priority order with dependency resolution.
    """
    triggered_events = []

    # Get events sorted by priority
    sorted_events = event_registry.get_sorted_events()

    for event_def in sorted_events:
        # Check dependencies
        if not event_registry.check_dependencies(event_def.name, player.events):
            continue

        # Check conflicts
        if not event_registry.check_conflicts(event_def.name, triggered_events):
            continue

        # Check if event should trigger
        try:
            result = event_def.function(player, type='check')
            if result:
                # Trigger event
                event_data = event_def.function(player, type='message')
                if event_data:
                    triggered_events.append(event_def.name)
                    yield event_data
        except Exception as e:
            logger.error(f"Error checking event {event_def.name}: {e}", exc_info=True)

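    # Because this function is a generator, the value returned here is only
    # visible as StopIteration.value (or via `yield from`), not by iterating
    return triggered_events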
```

---

### Phase 5: API Consolidation (2-3 hours)

#### Task 5.1: Migrate PHP API to Python

**Create `ws/api_server.py`** (FastAPI REST API):
```python
"""
REST API server using FastAPI.

Replaces PHP API with Python implementation.
"""

from fastapi import FastAPI, HTTPException, Depends, Header
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import Optional
from auth import auth_manager, AuthError
from functions import loadGame, saveGame, playerClass
from config import config
import logging

logger = logging.getLogger(__name__)

app = FastAPI(title="BaoLife API", version="2.0")

# CORS configuration
app.add_middleware(
    CORSMiddleware,
    allow_origins=config.ALLOWED_ORIGINS,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


# Request models
class LoginRequest(BaseModel):
    user_id: str
    password: str

class RegisterRequest(BaseModel):
    user_id: str
    password: str
    email: Optional[str] = None


# Authentication dependency
def get_current_user(authorization: str = Header(None)) -> str:
    """Verify JWT token and return user ID"""
    if not authorization:
        raise HTTPException(status_code=401, detail="Missing authorization header")

    try:
        scheme, token = authorization.split()
        if scheme.lower() != 'bearer':
            raise HTTPException(status_code=401, detail="Invalid authentication scheme")

        payload = auth_manager.verify_token(token)
        return payload['user_id']

    except AuthError as e:
        raise HTTPException(status_code=401, detail=str(e))
    except ValueError:
        raise HTTPException(status_code=401, detail="Invalid authorization header format")


# Endpoints

@app.post("/api/login")
async def login(request: LoginRequest):
    """Authenticate user and return token"""
    session = auth_manager.authenticate_user(request.user_id, request.password)

    if not session:
        raise HTTPException(status_code=401, detail="Invalid credentials")

    return session


@app.post("/api/register")
async def register(request: RegisterRequest):
    """Register new user"""
    # TODO: Implement user registration
    # - Validate user_id doesn't exist
    # - Hash password
    # - Create user in database
    # - Return token

    raise HTTPException(status_code=501, detail="Not implemented")


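# Declared with plain `def` so FastAPI runs it in a worker thread; the
# synchronous loadGame() call would otherwise block the event loop
@app.get("/api/game/{user_id}")
def get_game(user_id: str, current_user: str = Depends(get_current_user)):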
    """Get game data for user"""

    # Check authorization
    if user_id != current_user:
        raise HTTPException(status_code=403, detail="Not authorized")

    # Load game
    game_data = loadGame(user_id)

    if not game_data:
        raise HTTPException(status_code=404, detail="Game not found")

    return game_data


@app.get("/api/health")
async def health():
    """Health check endpoint"""
    return {"status": "healthy"}


# Run server
if __name__ == '__main__':
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8003)
```
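
The header parsing inside `get_current_user` can be pulled out into a pure function, which makes the accepted and rejected shapes explicit and testable without starting FastAPI. A sketch, assuming a hypothetical `parse_bearer_token` helper (token verification itself stays with `auth_manager.verify_token`):

```python
from typing import Optional


def parse_bearer_token(authorization: Optional[str]) -> str:
    """Extract the token from an 'Authorization: Bearer <token>' header value.

    Raises ValueError with a short reason when the header is missing or malformed.
    """
    if not authorization:
        raise ValueError("Missing authorization header")

    parts = authorization.split()
    if len(parts) != 2:
        raise ValueError("Invalid authorization header format")

    scheme, token = parts
    if scheme.lower() != "bearer":
        raise ValueError("Invalid authentication scheme")

    return token
```

The dependency would then catch `ValueError` and turn its message into a 401 response, as the existing `except ValueError` branch already does.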

**Update `ws/requirements.txt`**:
```txt
fastapi==0.104.1
uvicorn==0.24.0
pydantic==2.5.0
```

---

## Files Summary

### New Files Created

**Configuration**:
- `.env.example` - Example environment variables (checked in)
- `.env` - Actual secrets (gitignored)
- `ws/config.py` - Configuration management

**Security**:
- `ws/validators.py` - Input validation
- `ws/auth.py` - Authentication system
- `ws/rate_limiter.py` - Rate limiting

**Scalability**:
- `ws/database.py` - Connection pooling
- `ws/session_manager.py` - Session management

**Observability**:
- `ws/logging_config.py` - Structured logging
- `ws/error_tracking.py` - Error tracking (optional)
- `ws/metrics.py` - Metrics collection
- `ws/health.py` - Health checks

**Architecture**:
- `ws/event_registry.py` - Event registration system
- `ws/api_server.py` - FastAPI REST API

**Documentation**:
- `docs/SECURITY.md` - Security documentation (recommended)
- `docs/DEPLOYMENT.md` - Deployment guide (recommended)

### Modified Files

- `.gitignore` - Add `.env` and secrets
- `ws/requirements.txt` - Add new dependencies
- `ws/functions.py` - Use config, connection pooling, parameterized queries
- `ws/app.py` - Add validation, auth, sessions, logging, metrics, health server
- `ws/conversationEvents.py` - Use config, rate limiting
- `ws/events.py` - Register events with metadata
- `ws/dayEvents.py` - Register events with metadata

---

## Success Criteria

✅ **Security**
- [ ] No hardcoded secrets in codebase
- [ ] All inputs validated and sanitized
- [ ] SQL injection prevented (parameterized queries)
- [ ] Authentication required for all game operations (login, registration, and health checks remain public)
- [ ] Rate limiting on expensive operations

✅ **Scalability**
- [ ] Connection pooling implemented
- [ ] Session management working
- [ ] No global state issues
- [ ] Can handle 100+ concurrent users

✅ **Observability**
- [ ] Structured JSON logging
- [ ] Error tracking configured
- [ ] Metrics collection working
- [ ] Health checks responding

✅ **Production Readiness**
- [ ] Configuration via environment variables
- [ ] Deployment documentation
- [ ] Monitoring setup
- [ ] Graceful shutdown handling
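
One possible shape for the graceful shutdown item, sketched with plain `asyncio`: wait for a stop signal, give in-flight work a bounded grace period, then cancel whatever is left. The names `run_until_shutdown`, `install_signal_handlers`, `in_flight`, and `grace_seconds` are hypothetical, and how this would be wired into `ws/app.py` is an assumption.

```python
import asyncio
import signal


def install_signal_handlers(loop: asyncio.AbstractEventLoop, stop: asyncio.Event) -> None:
    """Set `stop` when SIGINT or SIGTERM arrives (Unix event loops only)."""
    for sig in (signal.SIGINT, signal.SIGTERM):
        try:
            loop.add_signal_handler(sig, stop.set)
        except NotImplementedError:
            # Some event loops (e.g. on Windows) do not support signal handlers.
            pass


async def run_until_shutdown(
    stop: asyncio.Event, in_flight: set, grace_seconds: float = 10.0
) -> int:
    """Block until `stop` is set, then drain tasks; return how many were cancelled."""
    await stop.wait()
    if not in_flight:
        return 0
    # Let running tasks finish on their own for up to grace_seconds.
    _done, pending = await asyncio.wait(in_flight, timeout=grace_seconds)
    for task in pending:
        task.cancel()
    # Wait for cancellations to propagate so cleanup code in the tasks can run.
    await asyncio.gather(*pending, return_exceptions=True)
    return len(pending)
```

In this sketch, `in_flight` would hold the tasks handling active connections; closing the database pool and flushing logs would follow once `run_until_shutdown` returns.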

---

## Timeline

### Day 1 (4-6 hours)
- Phase 1: Security hardening
  - Remove secrets
  - Input validation
  - SQL injection fixes
  - Authentication

### Day 2 (3-4 hours)
- Phase 2: Multi-user scalability
  - Connection pooling
  - Session management
  - Refactor global state

### Day 3 (3-4 hours)
- Phase 3: Observability
  - Structured logging
  - Metrics
  - Health checks

### Day 4 (2-4 hours)
- Phase 4 & 5: Event system + API
  - Event registry
  - API migration (if needed)
  - Testing and documentation

**Total: 12-18 hours over 2-4 days**

---

## Dependencies

**Recommended**: Plan 01 (Testing Infrastructure). Not strictly required, but highly recommended for safe refactoring.

---

## Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| Breaking existing clients | High | Maintain backward compatibility, versioned API |
| Authentication adds friction | Medium | Simple token flow, remember devices |
| Performance overhead from logging | Low | Async logging, sampling in production |
| Database migration needed | Medium | Gradual migration, support both old/new |
| Secrets in git history | High | Rotate all secrets after removing them from code; a new commit does not erase them from earlier history |

---

## Post-Deployment Checklist

After deploying to production:

- [ ] Rotate all secrets (database password, API keys)
- [ ] Enable HTTPS/WSS
- [ ] Configure monitoring alerts
- [ ] Set up log aggregation
- [ ] Test health checks from monitoring system
- [ ] Verify rate limits work
- [ ] Load test with 100+ concurrent users
- [ ] Document incident response procedures
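
For the load test item, a small harness along these lines could drive 100+ simulated users concurrently and summarize failures. This is a sketch under assumptions: `run_load_test` is a hypothetical helper, and `session` stands for whatever coroutine performs one user's interaction (for example, connecting over WebSocket and playing a turn), which is not defined here.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional


async def run_load_test(
    session: Callable[[int], Awaitable[None]], users: int = 100
) -> dict:
    """Run `users` sessions concurrently and report failures and worst latency."""

    async def timed(user_index: int) -> Optional[float]:
        started = time.perf_counter()
        try:
            await session(user_index)
        except Exception:
            return None  # count any error as a failed session
        return time.perf_counter() - started

    results = await asyncio.gather(*(timed(i) for i in range(users)))
    latencies = sorted(r for r in results if r is not None)
    return {
        "users": users,
        "failures": users - len(latencies),
        "max_latency_s": latencies[-1] if latencies else None,
    }
```

Calling `asyncio.run(run_load_test(my_session, users=150))` with a real session coroutine returns a summary dict that can be compared against whatever latency and error budget the team agrees on.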

---

## Next Steps

After this plan:
1. **Performance optimization** - Now that we can measure, we can optimize
2. **Advanced features** - Build on secure, scalable foundation
3. **Mobile app improvements** - Better API for iOS app
4. **Analytics & insights** - Use metrics for game design decisions
