Troubleshooting Guide¶

This guide helps you diagnose and fix common issues with Sleepless Agent.

Quick Diagnostics¶

Run these commands first to identify issues:

```
# Check system status
sle check

# Verify Claude Code CLI
claude --version
claude /usage

# Check daemon status (pgrep does not match itself, unlike ps | grep)
pgrep -af "sle daemon"

# Review recent logs
tail -n 50 workspace/data/agent.log | grep ERROR
```

Common Issues¶

Agent Not Starting¶

Symptom¶

```
Error: Failed to start daemon
```

Solutions¶

Check Python version:
```
python --version  # Must be 3.11+
```
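
To make the version check scriptable, the comparison can be done numerically in the shell, since a plain string comparison would rank 3.9 above 3.11. This is a minimal sketch: `version_at_least` is a hypothetical helper name, and the 3.11 minimum is taken from the comment above.

```shell
#!/usr/bin/env bash
# version_at_least CURRENT MINIMUM   (hypothetical helper)
# Succeeds when CURRENT >= MINIMUM, comparing major and minor numbers only.
version_at_least() {
    local cur_major cur_minor min_major min_minor
    IFS=. read -r cur_major cur_minor _ <<< "$1"
    IFS=. read -r min_major min_minor _ <<< "$2"
    cur_minor=${cur_minor:-0}
    min_minor=${min_minor:-0}
    if (( cur_major != min_major )); then
        (( cur_major > min_major ))
        return
    fi
    (( cur_minor >= min_minor ))
}

# Example (not run here): "python --version" prints e.g. "Python 3.12.1"
#   version_at_least "$(python --version 2>&1 | awk '{print $2}')" 3.11 \
#       || echo "Python 3.11+ required"
```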

Verify dependencies:

```
pip install --upgrade sleepless-agent
pip list | grep -E "anthropic|slack-sdk|sqlalchemy"
```

Check workspace permissions:

```
ls -la workspace/
chmod -R 755 workspace/
```

Reset database:
```
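cp workspace/data/tasks.db workspace/data/tasks.db.backup  # Keep a copy; the reset deletes all tasks
rm workspace/data/tasks.db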
sle init
```

Slack Bot Not Responding¶

Symptom¶

Slash commands don't trigger any response in Slack.

Solutions¶

Verify Socket Mode:
- Go to Slack App settings
- Settings → Socket Mode should be ON
- Regenerate the app token if needed

Check tokens:

```
# Verify tokens in .env
grep SLACK .env
# SLACK_BOT_TOKEN should start with xoxb-
# SLACK_APP_TOKEN should start with xapp-
```
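
Printing `.env` to the terminal exposes the tokens themselves. As an alternative, the prefix check can be done without echoing any secret. The sketch below assumes `.env` uses plain `NAME=value` lines; `check_token_prefix` is a hypothetical helper, not part of Sleepless Agent.

```shell
#!/usr/bin/env bash
# check_token_prefix ENV_FILE VAR_NAME PREFIX   (hypothetical helper)
# Succeeds when VAR_NAME is set in ENV_FILE and its value starts with PREFIX.
# Only the variable name is printed, never the token itself.
check_token_prefix() {
    local env_file=$1 var_name=$2 prefix=$3 value
    # Take the last assignment of the variable and drop optional quotes.
    value=$(grep -E "^${var_name}=" "$env_file" 2>/dev/null \
        | tail -n 1 | cut -d= -f2- | tr -d "\"'")
    if [[ $value == "$prefix"* ]]; then
        echo "OK: $var_name starts with $prefix"
    else
        echo "PROBLEM: $var_name is missing or does not start with $prefix"
        return 1
    fi
}

# Example (not run here):
#   check_token_prefix .env SLACK_BOT_TOKEN xoxb-
#   check_token_prefix .env SLACK_APP_TOKEN xapp-
```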

Test bot connection:

```
# Check if bot is running
sle daemon --test-slack
```

Restart bot:
```
pkill -f "sle daemon"
sle daemon
```

Tasks Not Executing¶

Symptom¶

Tasks stay in "pending" status and never execute.

Solutions¶

Check Claude Code authentication:
```
claude login
claude /usage
```

Verify usage limits:

```
sle check
# Look for "Pro Usage" - if at threshold, wait for reset
```

Check task status:

```
# View pending tasks
sqlite3 workspace/data/tasks.db \
  "SELECT id, description, status FROM tasks WHERE status='pending';"
```

Restart with debug logging:

```
# Debug output can show why pending tasks are not being picked up
SLEEPLESS_LOG_LEVEL=DEBUG sle daemon
```

Usage Threshold Reached¶

Symptom¶

```
WARNING: Pro usage at 85%, pausing task generation
```

Solutions¶

Check current usage:
```
claude /usage
sle check
```

Wait for window reset:
- The Pro plan usage window resets every 5 hours
- Check the logs for the exact reset time
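
If the logs give the start of the current usage window, the reset time follows by adding five hours. The sketch below assumes GNU `date` and a start time expressed in UTC; `window_reset_time` is a hypothetical helper, and the log remains the authoritative source for the actual reset time.

```shell
#!/usr/bin/env bash
# window_reset_time START [HOURS]   (hypothetical helper)
# Prints the UTC time HOURS (default 5) after START.
# START is interpreted as UTC; requires GNU date for the -d option.
window_reset_time() {
    local start=$1 hours=${2:-5} start_epoch
    # Convert to epoch seconds first; this avoids ambiguous "+ 5 hours" parsing.
    start_epoch=$(date -u -d "$start" +%s) || return 1
    date -u -d "@$(( start_epoch + hours * 3600 ))" '+%Y-%m-%d %H:%M UTC'
}

# Example (not run here):
#   window_reset_time "2024-01-01 10:00"
```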

Adjust thresholds (if needed):

```
# In config.yaml
claude_code:
  threshold_day: 30.0   # Increase daytime limit
  threshold_night: 90.0  # Increase nighttime limit
```

Clear completed tasks:

```
# Archive old tasks to reduce overhead
sle archive --before 7d
```

Database Locked¶

Symptom¶

```
sqlite3.OperationalError: database is locked
```

Solutions¶

Stop all processes:

```
pkill -f sleepless
pkill -f "sle daemon"
```

Check for hung processes:

```
lsof workspace/data/tasks.db
# Kill any processes still using the database
```

Reset database (last resort):

```
cp workspace/data/tasks.db workspace/data/tasks.db.backup
rm workspace/data/tasks.db
sle init
```

Git Integration Issues¶

Symptom¶

Commits or PRs are not being created.

Solutions¶

Configure Git user:

```
git config --global user.name "Sleepless Agent"
git config --global user.email "agent@sleepless.local"
```

Authenticate GitHub CLI:
```
gh auth login
gh auth status
```

Check repository configuration:

```
# In config.yaml
git:
  use_remote_repo: true
  remote_repo_url: git@github.com:username/repo.git
```

Test Git operations:

```
cd workspace/projects/test
git remote -v
git push --dry-run
```

High Memory Usage¶

Symptom¶

Agent consuming excessive RAM.

Solutions¶

Check running tasks:
```
ps aux | grep claude
sle check
```

Clean up workspaces:

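```
# Remove task workspaces not modified for more than 7 days.
# -mindepth/-maxdepth limit the match to top-level task directories,
# so workspace/tasks itself is never deleted.
find workspace/tasks -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +
```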
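
Because the cleanup is irreversible, it can help to list the candidate directories first. The following is a minimal dry-run sketch, assuming GNU `find` and one directory per task directly under `workspace/tasks`; `list_stale_workspaces` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# list_stale_workspaces TASKS_DIR [DAYS]   (hypothetical helper)
# Prints top-level directories under TASKS_DIR that have not been modified
# for more than DAYS days (default 7). Nothing is deleted.
list_stale_workspaces() {
    local tasks_dir=$1 days=${2:-7}
    find "$tasks_dir" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" -print | sort
}

# Example (not run here):
#   list_stale_workspaces workspace/tasks 7
```

Review the printed list, then run the deletion only once the output looks right.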

Limit concurrent tasks:

```
# In config.yaml
agent:
  max_concurrent_tasks: 2
```

Restart daemon:
```
pkill -f "sle daemon"
sle daemon
```

Tasks Failing¶

Symptom¶

Tasks consistently marked as "failed".

Solutions¶

Check error logs:

```
# View recent errors
grep ERROR workspace/data/agent.log | tail -20

# Check specific task
sle report <task_id>
```

Increase timeout:

```
# In config.yaml
agent:
  task_timeout_seconds: 3600  # Increase to 1 hour
```

Test Claude Code directly:

```
# Try a simple prompt in non-interactive (print) mode
claude -p "Reply with the word: test"
```

Check workspace permissions:

```
ls -la workspace/tasks/
chmod -R 755 workspace/
```

Performance Issues¶

Slow Task Execution¶

Check system resources:

```
# pgrep -d, joins multiple PIDs with commas, the format top -p expects
top -p "$(pgrep -d, -f "sle daemon")"
df -h workspace/
```

Optimize database:

```
sqlite3 workspace/data/tasks.db "VACUUM;"
sqlite3 workspace/data/tasks.db "ANALYZE;"
```

Clear old logs:

```
# Archive old logs
mv workspace/data/agent.log workspace/data/agent.log.old
```
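
Moving the log to a fixed `.old` name overwrites any earlier archive. A timestamped name avoids that. This is a sketch only: `archive_log` is a hypothetical helper, and whether the daemon reopens its log file after the move is not covered here, so restarting it afterwards is the cautious choice.

```shell
#!/usr/bin/env bash
# archive_log LOG_FILE   (hypothetical helper)
# Moves LOG_FILE to LOG_FILE.<UTC timestamp> and prints the new path.
archive_log() {
    local log_file=$1 archived
    [[ -f $log_file ]] || { echo "no such log: $log_file" >&2; return 1; }
    archived="$log_file.$(date -u +%Y%m%dT%H%M%SZ)"
    mv -- "$log_file" "$archived" && echo "$archived"
}

# Example (not run here):
#   archive_log workspace/data/agent.log
```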

Queue Backlog¶

View queue status:
```
sle check
```

Cancel stuck tasks:

```
# Cancel old pending tasks
sle cancel --status pending --older-than 24h
```

Prioritize important tasks:

```
# Move task to front of queue
sle prioritize <task_id>
```

Log Analysis¶

Enable Debug Logging¶

```
# Temporary debug mode
SLEEPLESS_LOG_LEVEL=DEBUG sle daemon

# Or in .env
SLEEPLESS_LOG_LEVEL=DEBUG
```

Common Log Patterns¶

```
# Find authentication issues
grep -i "auth\|token\|login" workspace/data/agent.log

# Find task failures
grep "status.*failed" workspace/data/agent.log

# Find usage warnings
grep -i "usage\|threshold\|limit" workspace/data/agent.log

# Find Slack errors
grep -i "slack.*error" workspace/data/agent.log
```
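
For a quick overview, the same four patterns can be counted in one pass over the log. This is a minimal sketch assuming GNU `grep`; `log_pattern_counts` and the category labels are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# log_pattern_counts LOG_FILE   (hypothetical helper)
# Prints how many lines match each of the four patterns listed above.
log_pattern_counts() {
    local log_file=$1
    printf 'auth: %s\n'   "$(grep -ci 'auth\|token\|login' "$log_file")"
    printf 'failed: %s\n' "$(grep -c 'status.*failed' "$log_file")"
    printf 'usage: %s\n'  "$(grep -ci 'usage\|threshold\|limit' "$log_file")"
    printf 'slack: %s\n'  "$(grep -ci 'slack.*error' "$log_file")"
}

# Example (not run here):
#   log_pattern_counts workspace/data/agent.log
```

A category with a high count is usually the best place to start reading the full log.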

Log Rotation¶

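```
# Set up log rotation. Run from the project directory: writing to
# /etc/logrotate.d needs root, and the log path must be absolute
# because logrotate does not run from this directory.
sudo tee /etc/logrotate.d/sleepless-agent > /dev/null << EOF
$(pwd)/workspace/data/agent.log {
    daily
    rotate 7
    compress
    delaycompress
    notifempty
    create 644 $USER $(id -gn)
}
EOF
```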

System Checks¶

Health Check Script¶

Create health_check.sh:

```
#!/bin/bash

echo "=== Sleepless Agent Health Check ==="
echo

# Check daemon
if pgrep -f "sle daemon" > /dev/null; then
    echo "✅ Daemon is running"
else
    echo "❌ Daemon is NOT running"
fi

# Check Claude Code
if claude --version > /dev/null 2>&1; then
    echo "✅ Claude Code CLI installed"
else
    echo "❌ Claude Code CLI missing"
fi

# Check database
if [ -f "workspace/data/tasks.db" ]; then
    echo "✅ Database exists"
    TASK_COUNT=$(sqlite3 workspace/data/tasks.db "SELECT COUNT(*) FROM tasks;" 2>/dev/null)
    echo "   Total tasks: $TASK_COUNT"
else
    echo "❌ Database missing"
fi

# Check Slack tokens
if [ -f ".env" ]; then
    if grep -q "SLACK_BOT_TOKEN=xoxb" .env; then
        echo "✅ Slack bot token configured"
    else
        echo "❌ Slack bot token missing"
    fi
else
    echo "❌ .env file missing"
fi

# Check disk space
WORKSPACE_SIZE=$(du -sh workspace 2>/dev/null | cut -f1)
echo "📊 Workspace size: $WORKSPACE_SIZE"

echo
echo "Run 'sle check' for detailed status"
```

Recovery Procedures¶

Complete Reset¶

```
# Backup current data (including .env, which is deleted below)
tar -czf workspace_backup.tar.gz workspace/ .env

# Stop everything
pkill -f sleepless
pkill -f claude

# Clean workspace
rm -rf workspace/
rm .env

# Reinstall
pip install --upgrade sleepless-agent

# Reconfigure
sle init
cp .env.example .env
# Edit .env with your tokens

# Restart
sle daemon
```

Restore from Backup¶

```
# Stop daemon
pkill -f "sle daemon"

# Restore backup
tar -xzf workspace_backup.tar.gz

# Verify integrity
sqlite3 workspace/data/tasks.db "PRAGMA integrity_check;"

# Restart
sle daemon
```
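
SQLite's `PRAGMA integrity_check` prints the single line `ok` for a healthy database, so the restart can be gated on that result. This is a minimal sketch: `db_integrity_ok` is a hypothetical helper, and it assumes the `sqlite3` command-line tool is installed.

```shell
#!/usr/bin/env bash
# db_integrity_ok DB_PATH   (hypothetical helper)
# Succeeds only when SQLite's integrity check reports exactly "ok".
db_integrity_ok() {
    local db_path=$1 result
    result=$(sqlite3 "$db_path" "PRAGMA integrity_check;" 2>/dev/null) || return 1
    [[ $result == "ok" ]]
}

# Example (not run here): restart only if the restored database is intact
#   db_integrity_ok workspace/data/tasks.db && sle daemon
```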

Getting Help¶

If these solutions don't resolve your issue:

Collect diagnostic information:
```
sle diagnose > diagnostic_report.txt
```

Check existing issues:
- GitHub Issues

Join the community:
- Discord Server

Report a new issue:
- Include the diagnostic report
- Describe steps to reproduce
- Attach relevant log excerpts

Prevention Tips¶

Regular maintenance:
- Clean old task workspaces weekly
- Archive completed tasks monthly
- Rotate logs daily

Monitor resources:
- Set up disk space alerts
- Monitor memory usage
- Track task success rates

Keep updated:
- Update Sleepless Agent regularly
- Update Claude Code CLI
- Update dependencies

Back up important data:
- Database backups
- Configuration backups
- Result archives