# Database Connection Exhaustion Fix

## Problem Summary

The application was experiencing `pymysql.err.OperationalError: (1040, 'Too many connections')` errors due to database connection exhaustion. MySQL has a maximum connection limit (typically 151 connections), and the application was exceeding this limit.

## Root Causes Identified

### 1. **No Connection Pooling in Retention Systems**
Files: `ws/retention/daily_quests.py`, `ws/retention/daily_rewards.py`

**Problem:** These files were calling `get_database_connection()` which creates a NEW synchronous MySQL connection every time, without using the async connection pool. Each function call would:
- Open a new connection via `mysql.connector.connect()`
- Execute queries
- Sometimes fail to close the connection in error cases
- Create connection leaks

**Impact:** Every quest check, reward claim, or initialization would create 1-5 new connections that weren't being released back to a pool.

### 2. **Missing Connection Cleanup in Error Cases**
Files: `ws/database/db_operations.py`, `ws/image_generation.py`

**Problem:** Many functions using `get_database_connection()` did not have proper `finally` blocks to close connections and cursors, especially when exceptions occurred.

**Example of leak:**
```python
def some_function():
    conn = get_database_connection()
    cursor = conn.cursor()
    cursor.execute("SELECT ...")
    # If exception happens here, connection never closes
    return result
```

### 3. **Excessive Pool Size Configuration**
File: `ws/config.py`

**Problem:** `MAX_CONNECTIONS` was set to 100, meaning the async pool could theoretically create 100 connections. Combined with synchronous connections from retention systems, this could easily exceed MySQL's limit.

### 4. **Dual Connection Systems**
The application was using TWO different database connection methods:
- **Async pool** (`aiomysql` in `database_async.py`) - Properly managed
- **Synchronous** (`mysql.connector` in `database/db_operations.py`) - NOT pooled

The retention system was incorrectly using synchronous connections, creating a separate pool of connections outside the managed async pool.

## Solutions Implemented

### 1. **Converted Retention Systems to Async Pool** ✅

**Files Modified:**
- `ws/retention/daily_quests.py`
- `ws/retention/daily_rewards.py`

**Changes:**
- Converted all functions to async
- Replaced `get_database_connection()` with async pool methods:
  - `fetch_one()` - Single row query
  - `fetch_dict_all()` - Multiple rows as dictionaries
  - `execute_query()` - INSERT/UPDATE/DELETE
  - `get_connection()` - For transactions with lastrowid

**Example:**
```python
# BEFORE (creates new connection every time)
def initialize_quest_templates():
    conn = get_database_connection()
    cursor = conn.cursor()
    # ... queries ...
    conn.commit()
    cursor.close()
    conn.close()

# AFTER (uses async pool, connection auto-managed)
async def initialize_quest_templates():
    from database_async import execute_query
    for quest in QUEST_TEMPLATES:
        await execute_query(
            "INSERT INTO daily_quest_templates ...",
            params
        )
```

**Functions converted:**
- `initialize_quest_templates()` → `async`
- `initialize_daily_rewards()` → `async`
- `generate_daily_quests()` → `async`
- `update_quest_progress()` → `async`
- `complete_quest()` → `async`
- `get_active_quests()` → `async`

### 2. **Added Proper Connection Cleanup** ✅

**Files Modified:**
- `ws/database/db_operations.py`
- `ws/image_generation.py`

**Changes:**
Added `finally` blocks to ensure connections and cursors are ALWAYS closed:

```python
# BEFORE
def loadGame(id):
    mydb = get_database_connection()
    mycursor = mydb.cursor()
    # ... queries ...
    return result  # Connection leaks if exception occurs

# AFTER
def loadGame(id):
    mydb = None
    mycursor = None
    try:
        mydb = get_database_connection()
        mycursor = mydb.cursor()
        # ... queries ...
        return result
    except Exception as e:
        logger.error(f"Error: {e}")
        return False
    finally:
        if mycursor:
            mycursor.close()
        if mydb:
            mydb.close()  # Always closes, even on exception
```

**Functions fixed:**
- `loadGame()`
- `loadGames()`
- `save_generated_image()`
- Several others with similar patterns

### 3. **Reduced Pool Size** ✅

**File Modified:** `ws/config.py`

```python
# BEFORE
MAX_CONNECTIONS: int = int(os.getenv('MAX_CONNECTIONS', '100'))

# AFTER
MAX_CONNECTIONS: int = int(os.getenv('MAX_CONNECTIONS', '20'))
```

**Reasoning:**
- 20 connections is sufficient for typical load
- Leaves headroom for synchronous connections that still exist
- Prevents pool from growing too large
- Can be increased via environment variable if needed

### 4. **Added Pool Timeouts** ✅

**File Modified:** `ws/database_async.py`

```python
_pool = await aiomysql.create_pool(
    # ... other params ...
    connect_timeout=10,      # 10 second connection timeout
    pool_timeout=30,         # 30 second timeout waiting for connection from pool
)
```

**Benefits:**
- Prevents indefinite waiting for connections
- Provides clear error messages when pool is exhausted
- Allows connection requests to fail fast rather than hang

### 5. **Updated App Initialization** ✅

**File Modified:** `ws/app.py`

```python
# BEFORE
initialize_quest_templates()  # Synchronous call
initialize_daily_rewards()    # Synchronous call

# AFTER
await initialize_quest_templates()  # Async call using pool
await initialize_daily_rewards()    # Async call using pool
```

## Migration Guide

### For Future Development

**When adding new database functions:**

1. **Prefer async over sync:**
   ```python
   # ✅ GOOD - Uses pool
   async def my_function():
       from database_async import fetch_one
       result = await fetch_one("SELECT * FROM table WHERE id = %s", (id,))

   # ❌ BAD - Creates new connection
   def my_function():
       conn = get_database_connection()
       cursor = conn.cursor()
       # ...
   ```

2. **If you MUST use sync (rare cases):**
   ```python
   def my_function():
       mydb = None
       cursor = None
       try:
           mydb = get_database_connection()
           cursor = mydb.cursor()
           # ... your code ...
       except Exception as e:
           if mydb:
               mydb.rollback()
           raise
       finally:
           if cursor:
               cursor.close()
           if mydb:
               mydb.close()  # ALWAYS close
   ```

3. **Use pool helper functions:**
   - `fetch_one(query, params)` - Single row as tuple
   - `fetch_all(query, params)` - All rows as tuples
   - `fetch_dict_one(query, params)` - Single row as dict
   - `fetch_dict_all(query, params)` - All rows as dicts
   - `execute_query(query, params)` - INSERT/UPDATE/DELETE
   - `Transaction()` - For multi-query transactions

## Testing Recommendations

### 1. **Monitor Connection Count**

```sql
-- Check current connections
SHOW STATUS LIKE 'Threads_connected';

-- Check max connections limit
SHOW VARIABLES LIKE 'max_connections';

-- See all active connections
SHOW PROCESSLIST;
```

### 2. **Load Testing**

Run the server under load and monitor:
```bash
# Watch connection count
watch -n 1 'mysql -u root -p -e "SHOW STATUS LIKE \"Threads_connected\""'
```

### 3. **Expected Behavior**

- **Before fix:** Connection count would grow unbounded (10, 20, 50, 100+)
- **After fix:** Connection count should stabilize around pool size (5-20 connections)

### 4. **Warning Signs**

Watch for these in logs:
```
# Pool exhausted (should rarely happen now)
aiomysql.errors.PoolError: Pool timeout

# Connection errors (should be eliminated)
pymysql.err.OperationalError: (1040, 'Too many connections')
```

## Files Modified

| File | Changes | Impact |
|------|---------|--------|
| `ws/retention/daily_quests.py` | Converted to async, use pool | High - Major leak fixed |
| `ws/retention/daily_rewards.py` | Converted to async, use pool | High - Major leak fixed |
| `ws/database/db_operations.py` | Added cleanup in finally blocks | Medium - Prevents leaks |
| `ws/image_generation.py` | Added cleanup in finally blocks | Medium - Prevents leaks |
| `ws/config.py` | Reduced MAX_CONNECTIONS 100→20 | High - Prevents exhaustion |
| `ws/database_async.py` | Added timeout parameters | Low - Better error handling |
| `ws/app.py` | Updated initialization to await | Medium - Required for async |

## Performance Impact

**Expected improvements:**
- ✅ Eliminated "Too many connections" errors
- ✅ Reduced connection overhead (reuse vs create new)
- ✅ Better resource utilization
- ✅ Faster query execution (no connection setup time)
- ✅ More predictable performance

**No negative impacts expected:**
- Pool size of 20 is sufficient for current load
- Async operations are non-blocking
- Connection reuse is faster than creation

## Rollback Plan

If issues occur, revert these commits and:

1. Temporarily increase MySQL max_connections:
   ```sql
   SET GLOBAL max_connections = 300;
   ```

2. Set environment variable:
   ```bash
   export MAX_CONNECTIONS=50
   ```

3. Monitor and investigate specific failures

## Future Improvements

1. **Complete Migration to Async**
   - Convert remaining sync DB calls in `database/db_operations.py`
   - Eventually deprecate `get_database_connection()`
   - Single async-only database layer

2. **Connection Pool Monitoring**
   - Add metrics for pool utilization
   - Dashboard showing active/idle connections
   - Alerts when pool nears capacity

3. **Query Performance**
   - Add query logging and timing
   - Identify slow queries
   - Add indexes where needed

4. **Database Partitioning**
   - Consider read replicas for heavy queries
   - Separate connection pools for different services

## Conclusion

The database connection exhaustion issue was caused by:
1. Retention systems creating unmanaged synchronous connections
2. Missing cleanup code in error cases
3. Excessive pool size configuration

All three root causes have been addressed:
- ✅ Retention systems now use async pool
- ✅ Cleanup code added to all sync functions
- ✅ Pool size reduced to reasonable limit
- ✅ Timeouts added for better error handling

The application should now maintain a stable connection count well below MySQL's limit.
