# Conversation Backend: Phase 2 Improvements

## Current System Analysis

### What's Working Well ✅
- Rate limiting with fallbacks
- Context management with summarization
- Long-term character memory
- Cost tracking
- Error handling

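### What Could Be Better 🔧

- GPT-4o-mini refuses or softens some roleplay content
- Answer options are static
- Every character uses the same sampling parameters
- The `messages` table is never read back
- Occasional low-quality or out-of-character replies
- No tracking of emotional arc or conversation goals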

---

## Backend Improvements (Priority Ranked)

### 1. Multi-Model Support with Less Censorship ⭐⭐⭐⭐⭐

**Problem:** OpenAI GPT-4o-mini censors certain content, limiting roleplay freedom.

**Solution:** Add support for multiple AI providers with automatic fallback.

**Recommended Models:**

| Provider | Model | Speed | Cost | Censorship | Best For |
|----------|-------|-------|------|------------|----------|
| **Anthropic** | Claude 3.5 Haiku | Fast | $0.80/M in, $4/M out | Low | Roleplay, nuanced personality |
| **Groq** | Llama 3.1 70B | Ultra-fast | Free tier (rate-limited) | Minimal | Uncensored, natural dialogue |
| **Mistral** | Mistral Large | Fast | $2/M in, $6/M out | Low | European, less restrictive |
| **Together AI** | Llama 3.1 405B | Medium | $3.50/M in, $4/M out | Minimal | Maximum freedom |
| OpenAI | GPT-4o-mini | Fast | $0.15/M in, $0.60/M out | High | Safe default |

**Implementation:**

```python
import logging

logger = logging.getLogger(__name__)


class MultiModelProvider:
    """Support multiple AI providers with automatic fallback"""

    PROVIDERS = {
        'claude': {
            'api': 'anthropic',
            'model': 'claude-3-5-haiku-20241022',
            'cost': {'input': 0.80/1e6, 'output': 4.00/1e6},  # USD per token
            'censorship': 'low',
            'speed': 'fast'
        },
        'groq': {
            'api': 'groq',
            # Groq has deprecated this model ID; check its current model list
            # (e.g. llama-3.3-70b-versatile) before shipping
            'model': 'llama-3.1-70b-versatile',
            'cost': {'input': 0, 'output': 0},  # Free tier (rate-limited)
            'censorship': 'minimal',
            'speed': 'ultra-fast'
        },
        'mistral': {
            'api': 'mistral',
            'model': 'mistral-large-latest',
            'cost': {'input': 2.00/1e6, 'output': 6.00/1e6},
            'censorship': 'low',
            'speed': 'fast'
        },
        'openai': {
            'api': 'openai',
            'model': 'gpt-4o-mini',
            'cost': {'input': 0.15/1e6, 'output': 0.60/1e6},
            'censorship': 'high',
            'speed': 'fast'
        }
    }

    def __init__(self, primary='groq', fallback_chain=('claude', 'mistral', 'openai')):
        # Tuple default avoids Python's shared mutable-default-argument pitfall
        self.primary = primary
        self.fallback_chain = list(fallback_chain)
        self.current_provider = primary

    async def get_response(self, messages, **kwargs):
        """Try primary, fall back to alternatives on failure"""
        # dict.fromkeys keeps order and drops duplicates if primary is also in the chain
        providers_to_try = list(dict.fromkeys([self.primary, *self.fallback_chain]))
        last_error = None

        for provider_name in providers_to_try:
            provider = self.PROVIDERS[provider_name]  # KeyError means a config typo: fail loudly
            try:
                result = await self._call_provider(provider, messages, **kwargs)
            except Exception as e:  # timeouts, rate limits, API errors
                logger.warning("%s failed: %s; trying next provider", provider_name, e)
                last_error = e
                continue

            # Track which provider succeeded (needed for cost tracking)
            self.current_provider = provider_name
            return result

        raise RuntimeError("All providers failed") from last_error

    async def _call_provider(self, provider, messages, **kwargs):
        """Dispatch to the per-SDK wrapper (_call_claude etc. are still to be written)"""
        if provider['api'] == 'anthropic':
            return await self._call_claude(provider, messages, **kwargs)
        elif provider['api'] == 'groq':
            return await self._call_groq(provider, messages, **kwargs)
        elif provider['api'] == 'mistral':
            return await self._call_mistral(provider, messages, **kwargs)
        elif provider['api'] == 'openai':
            return await self._call_openai(provider, messages, **kwargs)
        raise ValueError(f"Unknown provider API: {provider['api']}")
```
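One practical difference the `_call_claude` wrapper has to handle: Anthropic's Messages API takes the system prompt as a separate `system` argument and only accepts `user`/`assistant` roles in `messages`, while the other providers here take OpenAI-style lists that include a `system` message. A minimal sketch, assuming the conversation is already held as OpenAI-style `{"role", "content"}` dicts; `to_anthropic_format` is a hypothetical helper name:

```python
def to_anthropic_format(messages):
    """Split OpenAI-style messages into (system_prompt, messages) for Anthropic.

    All system messages are joined into one string; consecutive messages with
    the same role are merged so user/assistant turns alternate.
    """
    system_parts = []
    converted = []
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role == "system":
            system_parts.append(content)
            continue
        if converted and converted[-1]["role"] == role:
            # Merge back-to-back turns from the same speaker
            converted[-1]["content"] += "\n\n" + content
        else:
            # Build a new dict so the caller's history is never mutated
            converted.append({"role": role, "content": content})
    return "\n\n".join(system_parts), converted
```

With the official `anthropic` SDK, the pieces would then go to `await client.messages.create(model=provider['model'], max_tokens=..., system=system, messages=converted)` on an `anthropic.AsyncAnthropic()` client; `max_tokens` is required there.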

**Why This Matters:**
- **Groq (Llama 3.1)** is free within its rate limits and has very fast inference (hundreds of tokens/second)
- **Claude** is better at nuanced roleplay and character consistency
- **Mistral** has European content policies (less restrictive)
- **Automatic fallback** improves reliability when one provider is down or rate-limited

**Cost Comparison:**
```
10-turn conversation, assuming ~1,000 tokens billed (500 input, 500 output):
- Groq (Llama 3.1): $0.00 (free tier)
- Claude Haiku:     ~$0.0024
- Mistral Large:    ~$0.0040
- GPT-4o-mini:      ~$0.0004

Result: Groq could cut API costs to zero while usage stays within its free-tier limits.
```
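The 1,000-token figure undercounts multi-turn chats: chat APIs are stateless, so every request resends the system prompt and the whole history, and total input tokens grow roughly quadratically with the number of turns. A rough estimator, assuming fixed token counts per message and a `cost` dict shaped like the entries in `PROVIDERS` (USD per token); the message sizes in the example are illustrative, not measured:

```python
def estimate_conversation_cost(cost, turns, system_tokens, user_tokens, reply_tokens):
    """Estimate the USD cost of a chat that resends the full history every turn.

    Turn k (1-indexed) sends: system prompt + (k - 1) earlier exchanges + the new user line.
    """
    input_tokens = 0
    for k in range(1, turns + 1):
        input_tokens += system_tokens + (k - 1) * (user_tokens + reply_tokens) + user_tokens
    output_tokens = turns * reply_tokens
    return input_tokens * cost["input"] + output_tokens * cost["output"]


GPT_4O_MINI = {"input": 0.15 / 1e6, "output": 0.60 / 1e6}

# Illustrative sizes: 300-token system prompt, 30-token player lines, 70-token replies
print(f"${estimate_conversation_cost(GPT_4O_MINI, 10, 300, 30, 70):.4f}")
```

With these sizes a 10-turn GPT-4o-mini chat is billed for 7,800 input tokens rather than 1,000, about $0.0016 in total; the existing summarization step is what keeps this growth bounded.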

---

### 2. Dynamic Answer Options Generation ⭐⭐⭐⭐

**Problem:** Current answer options are static ("Tell me more", "Goodbye").

**Solution:** AI-generated response options based on conversation context.

**Current:**
```javascript
responses = ["Can you tell me more?", "That's all, goodbye."]
```

**Improved:**
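```python
import re

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

# List markers such as "1.", "2)", "-", "*" at the start of a line
_LIST_MARKER = re.compile(r'^\s*(?:\d+[.)]|[-*])\s*')


async def generate_answer_options(conversation, character, player):
    """Generate contextual response options"""
    last_message = conversation.conversation[-1].message

    prompt = f"""Generate 3 short response options (5-10 words each) for the player to respond to:
"{last_message}"

Options should be:
1. A follow-up question (curious/interested)
2. A related personal share (builds connection)
3. A topic change or exit (natural)

Format: Just list 3 options, one per line."""

    # openai>=1.0 client API; openai.ChatCompletion.acreate was removed in v1
    result = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=80,
        temperature=0.7,
    )

    lines = result.choices[0].message.content.strip().split('\n')
    # Remove list markers and quotes without eating trailing punctuation or digits
    options = [_LIST_MARKER.sub('', line).strip().strip('"') for line in lines]
    return [opt for opt in options if opt][:3]
```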

**Example Output:**
```
Character: "I had such a stressful day at work today."

Generated options:
1. "What happened? Want to talk about it?"
2. "I know the feeling, my day was rough too."
3. "Hope tomorrow is better. Talk later?"
```
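Model output will not always match the requested format: numbering styles vary, options may come back quoted, and occasionally fewer than three lines arrive. A defensive parser can normalize the text and top up from the existing static options. A minimal sketch; `parse_answer_options` and `DEFAULT_OPTIONS` are hypothetical names, with the defaults taken from the current static responses:

```python
import re

# Current static responses, reused as a fallback
DEFAULT_OPTIONS = ["Can you tell me more?", "That's all, goodbye."]

# List markers such as "1.", "2)", "-", "*" at the start of a line
LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*])\s*")


def parse_answer_options(raw_text, count=3, defaults=DEFAULT_OPTIONS):
    """Turn raw model output into at most `count` clean, de-duplicated options."""
    options = []
    for line in raw_text.splitlines():
        option = LIST_MARKER.sub("", line).strip().strip('"').strip()
        if option and option not in options:
            options.append(option)
    # Top up from the static defaults if the model returned too few
    for fallback in defaults:
        if len(options) >= count:
            break
        if fallback not in options:
            options.append(fallback)
    return options[:count]
```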

**Benefits:**
- More natural conversation flow
- Player feels more agency
- Better relationship building

**Cost:** ~$0.00006 per generation at GPT-4o-mini rates (about 100 input and up to 80 output tokens), which is negligible

---

### 3. Conversation Personality Tuning ⭐⭐⭐⭐

**Problem:** All characters use same temperature/frequency_penalty.

**Solution:** Per-character personality parameters based on traits.

```python
class ConversationPersonality:
    """Tune AI parameters based on character personality"""

    PERSONALITY_PRESETS = {
        'shy': {
            'temperature': 0.6,  # More predictable
            'frequency_penalty': 0.3,  # Low penalty: tolerates repeated words
            'max_tokens': 80,  # Shorter responses
            'style': 'brief and hesitant'
        },
        'outgoing': {
            'temperature': 0.9,  # More creative
            'frequency_penalty': 1.2,  # High penalty: pushes vocabulary variety
            'max_tokens': 150,  # Longer responses
            'style': 'enthusiastic and detailed'
        },
        'intellectual': {
            'temperature': 0.7,
            'frequency_penalty': 1.0,
            'max_tokens': 140,
            'style': 'thoughtful and articulate'
        },
        'flirty': {
            'temperature': 0.8,
            'frequency_penalty': 0.8,
            'max_tokens': 100,
            'style': 'playful with subtle innuendo'
        },
        'sarcastic': {
            'temperature': 0.85,
            'frequency_penalty': 1.1,
            'max_tokens': 90,
            'style': 'witty and slightly mocking'
        }
    }

    def get_personality_params(self, character):
        """Get AI parameters based on character traits"""
        personality_type = self._analyze_personality(character)
        # Copy the preset: mutating the shared class-level dict would make every
        # later call inherit this character's affinity adjustments
        params = dict(self.PERSONALITY_PRESETS.get(personality_type, self.PERSONALITY_PRESETS['outgoing']))

        # Adjust for relationship closeness
        if character.affinity > 80:
            params['temperature'] += 0.1  # More expressive with close friends
            params['max_tokens'] += 20  # Talk more
        elif character.affinity < 20:
            params['temperature'] -= 0.1  # More guarded
            params['max_tokens'] -= 20  # Shorter responses

        return params

    def _analyze_personality(self, character):
        """Infer personality from character attributes"""
        # Prefer an explicit personality field if it names a known preset
        explicit = getattr(character, 'personality', None)
        if explicit in self.PERSONALITY_PRESETS:
            return explicit

        # Otherwise infer from stats
        if character.social > 70:
            return 'outgoing'
        elif character.intelligence > 70:
            return 'intellectual'
        elif character.social < 30:
            return 'shy'
        else:
            return 'outgoing'  # Default
```

**Result:** Each character feels uniquely different in conversation style.
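The presets mix sampling parameters with a `style` string the chat API does not accept, so the two need to be split before the request: numeric fields go to the completion call and `style` is folded into the system prompt. A sketch, assuming an OpenAI-compatible `chat.completions.create` call; `build_request` and the prompt wording are hypothetical:

```python
SAMPLING_KEYS = ("temperature", "frequency_penalty", "max_tokens")


def build_request(params, system_prompt, history):
    """Split personality params into API kwargs plus a style-aware system prompt."""
    kwargs = {key: params[key] for key in SAMPLING_KEYS if key in params}
    style = params.get("style")
    if style:
        system_prompt = f"{system_prompt}\nSpeaking style: {style}."
    messages = [{"role": "system", "content": system_prompt}, *history]
    return {"messages": messages, **kwargs}
```

The result can be unpacked into the call, e.g. `client.chat.completions.create(model=..., **build_request(params, prompt, history))`. Anthropic's Messages API has no `frequency_penalty`, so the Claude wrapper would need to drop that key.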

---

### 4. Conversation History Loading & Archiving ⭐⭐⭐

**Problem:** Conversations are stored in the player pickle, and the `messages` table is never read.

**Solution:** Proper conversation persistence and archiving.

```python
from datetime import datetime, timedelta


class ConversationRepository:
    """Manage conversation persistence"""

    def archive_old_conversations(self, player, days_old=30):
        """Archive conversations older than N days"""
        cutoff_date = datetime.now() - timedelta(days=days_old)

        archived = []
        active = []

        for conv in player.conversations:
            if not conv.conversation:
                active.append(conv)  # No messages to date it by; leave in place
                continue
            # fromisoformat accepts timestamps with or without microseconds,
            # unlike strptime(..., '%Y-%m-%d %H:%M:%S.%f')
            conv_date = datetime.fromisoformat(conv.conversation[-1].datetime)

            if conv_date < cutoff_date:
                # _archive_conversation: helper that writes to the archive table (not shown)
                self._archive_conversation(conv, player.c.id)
                archived.append(conv)
            else:
                active.append(conv)

        player.conversations = active  # Caller still has to save the player
        print(f"Archived {len(archived)} old conversations")
        return len(archived)

    def load_recent_conversations(self, player_id, character_id=None, limit=10):
        """Load recent conversations from database"""
        mydb = get_database_connection()
        try:
            with mydb.cursor(dictionary=True) as cursor:
                if character_id is not None:  # an id of 0 is still a valid filter
                    cursor.execute("""
                        SELECT conversation_id, MAX(created_date) AS last_message
                        FROM messages
                        WHERE player = %s AND partner = %s
                        GROUP BY conversation_id
                        ORDER BY last_message DESC
                        LIMIT %s
                    """, (player_id, character_id, limit))
                else:
                    cursor.execute("""
                        SELECT conversation_id, partner, MAX(created_date) AS last_message
                        FROM messages
                        WHERE player = %s
                        GROUP BY conversation_id, partner
                        ORDER BY last_message DESC
                        LIMIT %s
                    """, (player_id, limit))

                conversations = cursor.fetchall()

                # Load full message history for each (helper not shown)
                full_conversations = []
                for conv_meta in conversations:
                    conv = self._load_conversation_messages(conv_meta['conversation_id'])
                    full_conversations.append(conv)

                return full_conversations
        finally:
            mydb.close()
```

**Benefits:**
- Player pickle stays small (no bloat)
- Can view conversation history in UI
- Search through old conversations
- Analytics on conversation patterns
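The recent-conversations query can be sanity-checked without MySQL. The sketch below runs the same `GROUP BY ... ORDER BY MAX(created_date)` shape against an in-memory SQLite database; the column names follow the query above, but the schema and sample rows are assumptions, and SQLite uses `?` placeholders where mysql-connector uses `%s`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (conversation_id INTEGER, player INTEGER, partner INTEGER, "
    "message TEXT, created_date TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
    [
        (1, 7, 42, "Hi!", "2024-05-01 10:00:00"),
        (1, 7, 42, "See you.", "2024-05-01 10:05:00"),
        (2, 7, 42, "Back again", "2024-05-03 09:00:00"),
        (3, 7, 99, "Hello there", "2024-05-02 18:30:00"),
    ],
)


def recent_conversation_ids(conn, player_id, character_id=None, limit=10):
    """Return conversation ids ordered by their latest message, newest first."""
    sql = "SELECT conversation_id, MAX(created_date) AS last_message FROM messages WHERE player = ?"
    params = [player_id]
    if character_id is not None:
        sql += " AND partner = ?"
        params.append(character_id)
    sql += " GROUP BY conversation_id ORDER BY last_message DESC LIMIT ?"
    params.append(limit)
    return [row[0] for row in conn.execute(sql, params)]
```

Storing `created_date` as ISO-8601 text (or as a native DATETIME column in MySQL) is what makes `MAX` and `ORDER BY` sort chronologically; an index on `(player, partner, created_date)` would likely keep the query fast as the table grows.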

---

### 5. Response Quality Scoring & Retry ⭐⭐⭐

**Problem:** Sometimes AI generates poor responses (breaking character, too generic).

**Solution:** Score responses and retry if quality is low.

```python
import re


class ResponseQualityScorer:
    """Score AI response quality and retry if needed"""

    BREAKING_CHARACTER = ['as an ai', 'language model', 'i cannot', 'i apologize, but']

    def score_response(self, response, character, conversation):
        """Score response quality (0-100)"""
        score = 100
        text = response.lower()
        history = conversation.conversation

        # Penalties
        if any(phrase in text for phrase in self.BREAKING_CHARACTER):
            score -= 50  # Breaking character

        if len(response) < 10:
            score -= 30  # Too short

        if response.count('!') > 3:
            score -= 20  # Too many exclamation marks

        if len(history) > 1 and response == history[-2].message:
            score -= 40  # Repeating previous message

        if character.firstname.lower() in text:
            score -= 10  # Shouldn't refer to self in third person (usually)

        # Bonuses
        if re.search(r"\byou(?:rs?)?\b", text):
            score += 10  # Engaging with player ("you", "your", "yours" as whole words)

        if '?' in response:
            score += 10  # Asking questions (engaging)

        return max(0, min(100, score))

    async def get_quality_response(self, conversation, character, player, min_score=60, max_retries=2):
        """Retry until the quality threshold is met (max_retries = total attempts)"""
        for attempt in range(max_retries):
            result = await getOpenAIResponse(conversation, character, player)
            response = result.choices[0].message.content

            score = self.score_response(response, character, conversation)
            print(f"Response quality score: {score}/100")

            if score >= min_score:
                return result

            if attempt < max_retries - 1:
                print(f"Quality too low on attempt {attempt + 1}/{max_retries}, retrying...")
                # Assumes getOpenAIResponse appended the reply to the conversation;
                # drop it so the retry does not see the rejected message
                conversation.conversation.pop()

        # Return last attempt even if quality is low (its message stays in the history)
        return result
```
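An alternative retry shape avoids mutating the shared conversation: generate candidates, score each, and keep the best one rather than the last. A sketch with the generator and scorer passed in as callables; in this codebase they would wrap `getOpenAIResponse` and `score_response`, assuming the generator does not append to the history itself (`best_of` is a hypothetical name):

```python
import asyncio


async def best_of(generate, score, min_score=60, max_attempts=2):
    """Call `generate` up to max_attempts times; return (best_response, best_score).

    Stops early once a candidate reaches min_score. Never touches conversation
    state, so a rejected candidate leaves nothing to clean up.
    """
    best, best_score = None, -1
    for _ in range(max_attempts):
        candidate = await generate()
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
        if candidate_score >= min_score:
            break
    return best, best_score


# Example with a stand-in generator and a constant score
async def _stand_in_generate():
    return "Rough day? Tell me about it."


print(asyncio.run(best_of(_stand_in_generate, lambda text: 80)))
```

Only the winning response would then be appended to the conversation, so the history never contains rejected drafts.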

---

### 6. Emotional State Tracking ⭐⭐⭐

**Problem:** No tracking of emotional arc throughout conversation.

**Solution:** Track sentiment progression for dynamic relationship changes.

```python
class EmotionalArcTracker:
    """Track emotional progression in conversations"""

    def analyze_conversation_arc(self, conversation):
        """Classify the recent emotional trajectory of a conversation"""
        sentiments = [msg.sentiment for msg in conversation.conversation]

        # Detect patterns in the last three messages
        if len(sentiments) >= 3:
            recent = sentiments[-3:]

            if recent == ['negative', 'neutral', 'positive']:
                return 'reconciliation'  # Conflict resolved
            elif recent == ['positive', 'negative', 'negative']:
                return 'deteriorating'  # Relationship declining
            elif all(s == 'positive' for s in recent):
                return 'bonding'  # Getting closer
            elif recent[-2:] == ['negative', 'negative']:
                return 'conflict'  # Ongoing tension

        return 'stable'

    def suggest_affinity_change(self, arc_type):
        """Suggest affinity adjustment based on emotional arc"""
        adjustments = {
            'reconciliation': +15,  # Successfully resolved conflict
            'bonding': +8,  # Positive interaction
            'deteriorating': -10,  # Relationship declining
            'conflict': -5,  # Tension
            'stable': 0  # No major change
        }
        return adjustments.get(arc_type, 0)
```
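The exact-match patterns only fire on a few specific three-message sequences; `['negative', 'positive', 'positive']`, for instance, falls through to `'stable'`. One looser alternative (a swapped-in technique, not part of the design above) maps sentiments to numbers and compares the recent window with what came before. The thresholds are arbitrary starting points, and `classify_arc` is a hypothetical name:

```python
SENTIMENT_VALUE = {"positive": 1, "neutral": 0, "negative": -1}


def classify_arc(sentiments, window=3):
    """Classify an emotional arc from 'positive'/'neutral'/'negative' labels."""
    values = [SENTIMENT_VALUE.get(s, 0) for s in sentiments]  # unknown/None counts as neutral
    if len(values) < window:
        return "stable"

    recent = values[-window:]
    # Short chats have no history before the window: compare against its first message
    earlier = values[:-window] or [recent[0]]
    recent_avg = sum(recent) / len(recent)
    trend = recent_avg - sum(earlier) / len(earlier)

    if trend > 0.5 and min(earlier) < 0:
        return "reconciliation"  # Mood improved after negativity
    if recent_avg > 0.5:
        return "bonding"
    if trend < -0.5:
        return "deteriorating"
    if recent_avg < -0.5:
        return "conflict"
    return "stable"
```

The returned labels match the keys of `suggest_affinity_change`, so either detector can feed the same adjustment table.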

---

### 7. Conversation Goals System ⭐⭐⭐

**Problem:** Conversations are aimless, with no clear objectives.

**Solution:** Let players set conversation goals and track progress.

```python
import re


class ConversationGoal:
    """Track player objectives in conversations"""

    GOALS = {
        'build_trust': {
            'description': 'Get character to open up and share personal info',
            'success_criteria': 'Character reveals personal fact',
            'affinity_bonus': 20
        },
        'ask_favor': {
            'description': 'Ask character to help with something',
            'success_criteria': 'Character agrees to help',
            'affinity_requirement': 50
        },
        'flirt': {
            'description': 'Romantic conversation',
            'success_criteria': 'Positive response to flirtation',
            'affinity_bonus': 15
        },
        'apologize': {
            'description': 'Make amends after argument',
            'success_criteria': 'Character accepts apology',
            'affinity_bonus': 25
        }
    }

    # Rough keyword signal for personal sharing; word boundaries stop "my" matching "mystery"
    PERSONAL_SHARE = re.compile(r"\b(?:i feel|my|personal|secret)\b")

    def check_goal_progress(self, goal_type, conversation, character):
        """Check if goal criteria met"""
        last_messages = conversation.conversation[-3:]
        if not last_messages:
            return False, "Keep trying..."

        if goal_type == 'build_trust':
            # Check if character shared personal info
            for msg in last_messages:
                if msg.sender == character.id and self.PERSONAL_SHARE.search(msg.message.lower()):
                    return True, "Character opened up to you!"

        elif goal_type == 'flirt':
            # Positive sentiment on the latest message (does not yet check the player flirted)
            if last_messages[-1].sentiment == 'positive':
                return True, "They seem interested!"

        # ... more goal checks

        return False, "Keep trying..."
```
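`GOALS` defines `affinity_bonus` and `affinity_requirement`, but nothing above applies them. A sketch of how a goal might be gated on the requirement and paid out exactly once; `GoalTracker`, the trimmed `GOALS` copy, and the 0-100 affinity clamp (inferred from the 20/80 thresholds in section 3) are assumptions for illustration:

```python
from dataclasses import dataclass

# Trimmed copy of ConversationGoal.GOALS (only the fields used here)
GOALS = {
    "build_trust": {"affinity_bonus": 20},
    "ask_favor": {"affinity_requirement": 50},
    "flirt": {"affinity_bonus": 15},
    "apologize": {"affinity_bonus": 25},
}


@dataclass
class GoalTracker:
    goal_type: str
    completed: bool = False

    def can_attempt(self, affinity):
        """A goal is available only if affinity meets its requirement (default 0)."""
        return affinity >= GOALS[self.goal_type].get("affinity_requirement", 0)

    def complete(self, affinity):
        """Mark the goal done and return the new affinity.

        The bonus is applied once; repeat calls return affinity unchanged.
        """
        if self.completed:
            return affinity
        self.completed = True
        bonus = GOALS[self.goal_type].get("affinity_bonus", 0)
        return max(0, min(100, affinity + bonus))
```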

---

## Model Comparison: OpenAI vs Alternatives

### Suggested Test Order

I recommend testing these in order:

**1. Groq (Llama 3.1 70B) - BEST VALUE**
- Cost: Free (generous but rate-limited free tier)
- Speed: 300+ tokens/second (ultra-fast)
- Censorship: Minimal (roleplays freely)
- Quality: Great for dialogue
- **API:** https://console.groq.com/

**Setup:**
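```python
import os

from groq import Groq

client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

# Example history (hypothetical character)
messages = [
    {"role": "system", "content": "You are Mia, a friendly barista."},
    {"role": "user", "content": "Hey, how was your day?"},
]

response = client.chat.completions.create(
    # Groq has deprecated this model ID; check its current model list
    # (e.g. llama-3.3-70b-versatile)
    model="llama-3.1-70b-versatile",
    messages=messages,
    max_tokens=120,
    temperature=0.8,
)
print(response.choices[0].message.content)
```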

**2. Anthropic Claude 3.5 Haiku - BEST QUALITY**
- Cost: $0.80/$4 per M tokens (roughly 5-7x GPT-4o-mini)
- Speed: Fast
- Censorship: Low (better at nuanced roleplay)
- Quality: Excellent character consistency
- **API:** https://console.anthropic.com/

**3. Mistral Large - BALANCED**
- Cost: $2/$6 per M tokens (roughly 10-13x GPT-4o-mini)
- Speed: Fast
- Censorship: Low (European company)
- Quality: Very good
- **API:** https://console.mistral.ai/

---

## Implementation Priority

### Phase 2A: Quick Wins (1-2 days)
1. ✅ Add Groq support (FREE API, minimal censorship)
2. ✅ Dynamic answer option generation
3. ✅ Response quality scoring
4. ✅ Per-character personality tuning

### Phase 2B: Infrastructure (3-5 days)
5. ⏳ Conversation archiving system
6. ⏳ Multi-model provider with fallback
7. ⏳ Emotional arc tracking

### Phase 2C: Advanced (1 week)
8. ⏳ Conversation goals system
9. ⏳ Group conversations (multiple NPCs)
10. ⏳ Voice/style fine-tuning per character

---

## Cost Comparison After Improvements

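```
Current (GPT-4o-mini only), to be confirmed against cost tracking:
- 10-turn conversation: $0.25
- Monthly (100 conversations): $25

With Groq as primary:
- 10-turn conversation: $0.00 (free tier)
- Monthly (100 conversations): $0 while usage stays within Groq's free-tier limits
- Requests that fall back to GPT-4o-mini are still billed at OpenAI rates

Savings: up to 100% on API costs 🎉
```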
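The 100% figure holds only while every conversation is served by Groq's free tier. A quick way to see how fallbacks eat into it, assuming a fixed share of conversations falls back entirely to GPT-4o-mini at the per-conversation cost above; the fallback rates are illustrative, not measured, and `expected_monthly_cost` is a hypothetical name:

```python
def expected_monthly_cost(conversations, fallback_cost, fallback_rate, primary_cost=0.0):
    """Blend primary (free) and fallback (paid) per-conversation costs by fallback share."""
    per_conversation = (1 - fallback_rate) * primary_cost + fallback_rate * fallback_cost
    return conversations * per_conversation


for rate in (0.0, 0.05, 0.25):
    cost = expected_monthly_cost(100, 0.25, rate)
    print(f"fallback {rate:.0%}: ${cost:.2f}/month")
```

At $0.25 per conversation, a 5% fallback rate works out to about $1.25 a month instead of $0.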

---

## Recommendation

**Start with:**
1. **Add Groq support** - Immediate cost savings + less censorship
2. **Dynamic answer options** - Better conversation flow
3. **Response quality scoring** - Catch bad responses

**Total effort:** 1-2 days of development
**Cost savings:** Potentially 100% if Groq works well
**Quality improvement:** Expected to be significant (less censorship, more natural dialogue)

Would you like me to implement the Groq integration and dynamic answer options first?
