Troubleshooting

gomathishankar37 edited this page Feb 6, 2026 · 1 revision

This guide helps diagnose and resolve common issues with crashupload. Each section is organized by symptom, with the likely cause, diagnostic commands, and solutions.


Uploads Not Working

Symptom: Dumps not being uploaded

Check list:

  1. Verify dumps exist:
# For broadband/extender
ls -lh /minidumps/*.dmp

# For video devices
ls -lh /opt/minidumps/*.dmp
ls -lh /var/lib/systemd/coredump/core.*
  2. Check service status:
# Systemd services
systemctl status coredump-upload.service
systemctl status minidump-on-bootup-upload.service

# Check if running
ps aux | grep -E 'crashupload|uploadDumps' | grep -v grep
  3. Verify network connectivity:
# Ping upload server (extract from config)
UPLOAD_SERVER=$(grep POTOMAC_SVR /etc/device.properties | cut -d= -f2 | sed 's|https\?://||' | cut -d/ -f1)
ping -c 3 "$UPLOAD_SERVER"

# Test DNS resolution
nslookup "$UPLOAD_SERVER"

# Test HTTPS connection
curl -I "https://$UPLOAD_SERVER"
  4. Check logs:
# Main log file
tail -100 /var/log/core_log.txt  # or /rdklogs/logs/core_log.txt

# Systemd journal
journalctl -u coredump-upload.service -n 50
journalctl -u minidump-on-bootup-upload.service -n 50

# Look for errors
grep -i error /var/log/core_log.txt | tail -20
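
The server lookup in step 3 can be wrapped in a helper so the same host is reused for ping, nslookup, and curl. This is a minimal sketch, assuming the POTOMAC_SVR key shown above holds either a bare host or an http(s) URL. The function name upload_host is hypothetical and not part of crashupload.

```shell
# upload_host [FILE]: print the host part of POTOMAC_SVR from a properties file.
# Hypothetical helper; assumes a line of the form POTOMAC_SVR=<host or URL>.
upload_host() {
    props="${1:-/etc/device.properties}"
    [ -r "$props" ] || return 1
    # Take the last assignment, drop optional quotes and the scheme,
    # then cut everything from the first "/" or ":" (path or port).
    host=$(sed -n 's|^POTOMAC_SVR=||p' "$props" | tail -n 1 \
        | sed -e 's|^"||' -e 's|"$||' -e 's|^https*://||' -e 's|[/:].*$||')
    # Fail when the key is missing or empty so callers can stop early.
    [ -n "$host" ] && printf '%s\n' "$host"
}
```

Usage: UPLOAD_SERVER=$(upload_host) && ping -c 3 "$UPLOAD_SERVER"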

Symptom: "Permission denied" errors

Cause: Insufficient permissions on dump directories or lock files

Solution:

# Fix dump directory permissions
chmod 755 /minidumps /opt/minidumps /var/lib/systemd/coredump
chown root:root /minidumps /opt/minidumps

# Fix lock file permissions
chmod 666 /tmp/.minidump_upload.lock
chmod 666 /tmp/.coredump_upload.lock

# Fix log file permissions
chmod 666 /var/log/core_log.txt

Symptom: "Telemetry opt-out enabled, skipping upload"

Cause: User has opted out of telemetry via RFC

Check:

# Check opt-out status
tr181Set Device.DeviceInfo.X_RDKCENTRAL-COM_RFC.Feature.TelemetryOptOut.Enable

# Check if feature is enabled
grep -r "TelemetryOptOut" /opt/

Solution:

  • This is expected behavior if user opted out
  • To override for testing: export TELEMETRY_OPTOUT=0
  • For production, respect user preference

Symptom: Upload returns HTTP error codes

HTTP Code  Meaning              Solution
400        Bad Request          Check dump file format, verify metadata
401        Unauthorized         Verify mTLS certificates if enabled
403        Forbidden            Check server-side IP whitelist/firewall
413        Payload Too Large    Dump exceeds server limit, check archive compression
500        Server Error         Server-side issue, retry later
503        Service Unavailable  Server down/overloaded, retry with backoff

Debug:

# Test upload manually with verbose output
export CURL_VERBOSE=1
export DEBUG=1
/lib/rdk/uploadDumps.sh /opt/minidumps/test.dmp 0

# Check HTTP response code
cat /tmp/httpcode
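
The HTTP code table can be expressed as a small lookup that turns the saved response code into a next step. This is only a sketch that mirrors the table. The function name http_code_action and its output labels are hypothetical, and crashupload's own handling may differ.

```shell
# http_code_action CODE: map an HTTP status to a coarse next step.
# Hypothetical helper that mirrors the troubleshooting table; not crashupload logic.
http_code_action() {
    case "$1" in
        2??)     echo "ok" ;;
        400|413) echo "fix-dump" ;;     # bad format/metadata, or archive too large
        401|403) echo "fix-access" ;;   # client certificates or server-side allow list
        500|503) echo "retry" ;;        # server-side problem, retry later or with backoff
        *)       echo "investigate" ;;  # anything the table does not cover
    esac
}
```

Usage: http_code_action "$(cat /tmp/httpcode)"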

Rate Limiting Problems

Symptom: "Rate limit exceeded" message

Cause: More than 10 uploads in 10 minutes

Check current state:

# View rate limit history
cat /tmp/.upload_ratelimit

# Count uploads in last 10 minutes
CURRENT=$(date +%s)
CUTOFF=$((CURRENT - 600))
awk -v cutoff="$CUTOFF" '$1 > cutoff' /tmp/.upload_ratelimit | wc -l
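
The same count can be wrapped in reusable functions. This is a sketch under two assumptions: field 1 of each line is an epoch timestamp (as the awk filter above implies), and the limit is treated as reached at 10 entries in 600 seconds. Both function names are hypothetical.

```shell
# uploads_in_window FILE [WINDOW_SECONDS] [NOW]: count entries newer than NOW - WINDOW.
# Assumes field 1 of every line is an epoch timestamp.
uploads_in_window() {
    file="$1"; window="${2:-600}"; now="${3:-$(date +%s)}"
    [ -f "$file" ] || { echo 0; return 0; }
    awk -v cutoff="$((now - window))" '$1 > cutoff { n++ } END { print n + 0 }' "$file"
}

# rate_limited FILE [MAX] [WINDOW_SECONDS] [NOW]: succeed when MAX entries are in the window.
# MAX defaults to 10; whether the real limit trips at 10 or 11 is an assumption here.
rate_limited() {
    [ "$(uploads_in_window "$1" "${3:-600}" "${4:-$(date +%s)}")" -ge "${2:-10}" ]
}
```

Usage: rate_limited /tmp/.upload_ratelimit && echo "Uploads are currently rate limited"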

Solutions:

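  1. Wait for window to expire:
# Find the oldest upload that is still inside the 10-minute window
NOW=$(date +%s)
CUTOFF=$((NOW - 600))
OLDEST=$(awk -v cutoff="$CUTOFF" '$1 > cutoff { print $1; exit }' /tmp/.upload_ratelimit)
if [ -n "$OLDEST" ]; then
    echo "Rate limit expires in $((OLDEST + 600 - NOW)) seconds"
else
    echo "No uploads in the current window"
fi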
  2. Clear rate limit (testing only):
rm -f /tmp/.upload_ratelimit
  3. Increase limit (configuration):
# Add to /opt/coredump.properties (non-prod)
MAX_UPLOADS=20
RATE_WINDOW=600
  4. Bypass for recovery:
export NO_RATELIMIT=1
/lib/rdk/uploadDumps.sh /opt/minidumps/critical.dmp 0

Symptom: Crash loop detected

Cause: Same process crashing repeatedly (5+ times in 5 minutes)

Check:

# View recent crashes
tail -50 /var/log/core_log.txt | grep "Crash loop"

# Identify looping process
awk '{print $NF}' /tmp/.upload_ratelimit | sort | uniq -c | sort -rn
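
To apply the 5-crashes-in-5-minutes rule directly, the per-process count can be restricted to the window. This is a sketch that assumes field 1 is an epoch timestamp and the last field is the process name, which is how the commands above read the file. The function name looping_processes is hypothetical.

```shell
# looping_processes FILE [THRESHOLD] [WINDOW_SECONDS] [NOW]
# Print "count name" for each process with at least THRESHOLD entries in the window.
# Assumes field 1 is an epoch timestamp and the last field is the process name.
looping_processes() {
    file="$1"; threshold="${2:-5}"; window="${3:-300}"; now="${4:-$(date +%s)}"
    [ -f "$file" ] || return 0
    awk -v cutoff="$((now - window))" -v min="$threshold" '
        $1 > cutoff { count[$NF]++ }
        END { for (p in count) if (count[p] >= min) print count[p], p }
    ' "$file" | sort -rn
}
```

Usage: looping_processes /tmp/.upload_ratelimit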

Solutions:

  1. Fix the crashing application
  2. Temporarily disable rate limiting for debugging
  3. Check for infinite restart loops in systemd

Lock File Issues

Symptom: "Script is already working. Lock file exists"

Cause: A previous instance did not clean up its lock, or is still running

Diagnose:

# Check if process is actually running
LOCK_FILE="/tmp/.minidump_upload.lock"  # or .coredump_upload.lock
if [ -f "$LOCK_FILE" ]; then
    PID=$(cat "$LOCK_FILE")
    echo "Lock file contains PID: $PID"

    if ps -p "$PID" > /dev/null 2>&1; then
        echo "Process $PID is running"
        ps -p "$PID" -o cmd=
    else
        echo "Process $PID is NOT running (stale lock)"
    fi
fi

Solutions:

  1. If process is running - wait or kill:
# Wait for completion
while [ -f /tmp/.minidump_upload.lock ]; do
    sleep 2
    echo "Waiting for upload to complete..."
done

# Or kill if hung
PID=$(cat /tmp/.minidump_upload.lock)
kill "$PID"
sleep 2
kill -0 "$PID" 2>/dev/null && kill -9 "$PID"  # Force kill only if still running
  2. If stale lock - remove:
rm -f /tmp/.minidump_upload.lock
rm -f /tmp/.coredump_upload.lock
  3. Prevent future stale locks:
# Add trap to cleanup on exit
trap 'rm -f /tmp/.minidump_upload.lock' EXIT INT TERM
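
The diagnose and remove steps can be combined so that a lock is deleted only when its owner is gone. This is a sketch, assuming the lock file contains nothing but the owner's PID. The function names lock_state and clear_stale_lock are hypothetical.

```shell
# lock_state FILE: print "absent", "held", or "stale" for a PID-style lock file.
# Assumes the lock file contains only the owner's PID.
lock_state() {
    [ -f "$1" ] || { echo "absent"; return 0; }
    pid=$(cat "$1" 2>/dev/null)
    case "$pid" in
        ''|*[!0-9]*) echo "stale"; return 0 ;;   # empty or non-numeric content
    esac
    # kill -0 delivers no signal; it only tests whether the PID can be signalled.
    if kill -0 "$pid" 2>/dev/null; then echo "held"; else echo "stale"; fi
}

# clear_stale_lock FILE: remove the lock only when no live process owns it.
clear_stale_lock() {
    [ "$(lock_state "$1")" = "stale" ] && rm -f "$1"
}
```

Usage: clear_stale_lock /tmp/.minidump_upload.lock || echo "Lock is absent or still held". Run it as root; for an unprivileged caller, kill -0 can also fail on a live process owned by another user.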

Symptom: Lock created but never removed

Cause: Script crashed before cleanup

Check:

# Find crashed instances
journalctl | grep -A5 "crashupload.*killed"
journalctl | grep -A5 "uploadDumps.*terminated"

# Check for signals
dmesg | grep -i "killed process"

Solution: Ensure SIGTERM handler is working:

# Script should have:
trap 'remove_lock "$LOCK_FILE"' EXIT INT TERM
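
The source names remove_lock but does not show it, so the following is only a sketch of how a lock with guaranteed cleanup could look. The bodies of remove_lock, acquire_lock, and with_lock, and the default lock path, are all assumptions for illustration. INT and TERM are converted into a normal exit so that the EXIT trap always runs.

```shell
#!/bin/sh
# Sketch of a lock with guaranteed cleanup; all names and paths are illustrative.
LOCK_FILE="${LOCK_FILE:-/tmp/.example_upload.lock}"

remove_lock() {
    rm -f "$1"
}

acquire_lock() {
    # With noclobber (set -C) the redirection fails if the file already exists,
    # so checking for the lock and creating it happen in a single step.
    ( set -C; echo "$$" > "$1" ) 2>/dev/null
}

# with_lock COMMAND...: run COMMAND while holding the lock, then release it.
with_lock() {
    acquire_lock "$LOCK_FILE" || { echo "lock busy: $LOCK_FILE" >&2; return 1; }
    trap 'remove_lock "$LOCK_FILE"' EXIT
    trap 'exit 130' INT     # turn signals into an exit so the EXIT trap fires
    trap 'exit 143' TERM
    status=0
    "$@" || status=$?
    remove_lock "$LOCK_FILE"
    trap - EXIT INT TERM
    return "$status"
}
```

Usage: with_lock tar czf /tmp/test.tar.gz /opt/minidumps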

Build Failures

Symptom: "configure: error: libcurl not found"

Cause: Missing development dependencies

Solution:

# Debian/Ubuntu
sudo apt-get update
sudo apt-get install -y \
    build-essential \
    autoconf \
    automake \
    libtool \
    libcurl4-openssl-dev \
    libssl-dev \
    pkg-config

# Yocto/RDK build
bitbake -c populate_sdk rdk-generic-image

Symptom: Compilation errors in C code

Common errors and fixes:

Error                                      Cause           Fix
undefined reference to 'curl_easy_init'    Missing -lcurl  Add -lcurl to LDFLAGS in Makefile.am
error: 'CURLOPT_SSLVERSION' undeclared     Old libcurl     Update to libcurl >= 7.58.0
warning: implicit declaration of function  Missing header  Add the #include for the header that declares the function
error: 'for' loop initial declarations     Old C standard  Use gcc with -std=c99

Debug:

# Verbose compilation
make V=1

# Check compiler version
gcc --version

# Check library versions
pkg-config --modversion libcurl
pkg-config --modversion openssl
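
To compare the reported libcurl version against the 7.58.0 minimum from the table without relying on sort -V, a dotted-version comparison can be done in plain shell. The helper name version_ge is hypothetical. It compares up to three numeric components and ignores suffixes such as "-DEV".

```shell
# version_ge HAVE WANT: succeed when dotted version HAVE is >= WANT.
# Compares up to three numeric components; trailing suffixes are ignored.
version_ge() {
    have=$(printf '%s' "$1" | sed 's/[^0-9.].*$//')
    want=$(printf '%s' "$2" | sed 's/[^0-9.].*$//')
    old_ifs=$IFS; IFS=.
    # Split on dots; the literal zeros pad missing components.
    set -- $have 0 0 0; h1=$1 h2=$2 h3=$3
    set -- $want 0 0 0; w1=$1 w2=$2 w3=$3
    IFS=$old_ifs
    [ "$h1" -gt "$w1" ] && return 0; [ "$h1" -lt "$w1" ] && return 1
    [ "$h2" -gt "$w2" ] && return 0; [ "$h2" -lt "$w2" ] && return 1
    [ "$h3" -ge "$w3" ]
}
```

Usage: version_ge "$(pkg-config --modversion libcurl)" 7.58.0 || echo "libcurl is older than 7.58.0"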

Symptom: Autotools errors

Solutions:

# Regenerate autotools files
cd c_sourcecode
autoreconf --install --force

# Clean and reconfigure
make distclean
./configure
make

Test Failures

Symptom: Unit tests fail

Diagnose:

cd unittest

# Run tests with verbose output
make check VERBOSE=1

# Run specific test
./network_utils_gtest --gtest_filter="*GetMACAddress*"

# Check test logs
cat test-suite.log

Common test failures:

  1. Mock failures:
Expected: mock_curl.easy_perform() called 3 times
  Actual: called 2 times

Fix: Update test expectations or implementation

  2. Assertion failures:
Expected: result == 0
  Actual: -1

Fix: Check error paths in implementation

  3. Memory leaks:
LEAK SUMMARY:
   definitely lost: 64 bytes in 1 blocks

Fix: Free allocated memory

Run with valgrind:

valgrind --leak-check=full ./network_utils_gtest

Symptom: Functional tests fail

Check:

# View L2 test logs
cat /tmp/l2_test_report/report.html

# Run specific scenario
cd test/functional-tests
python3 -m pytest tests/test_basic_upload.py -v -s

Common issues:

  • Mock server not running
  • Incorrect test environment setup
  • Network dependencies

Network Issues

Symptom: "curl: (6) Could not resolve host"

Cause: DNS resolution failure

Diagnose:

# Check DNS servers
cat /etc/resolv.conf

# Test DNS
nslookup crash.rdkcentral.com
dig crash.rdkcentral.com

# Check network interface
ifconfig
ip addr show

Solutions:

# Restart networking
/etc/init.d/networking restart

# Use alternate DNS temporarily (keep a backup so the change can be reverted)
cp /etc/resolv.conf /tmp/resolv.conf.bak
echo "nameserver 8.8.8.8" > /etc/resolv.conf

# Check if interface is up
ifconfig erouter0 up  # or appropriate interface

Symptom: "curl: (28) Operation timed out"

Cause: Network unreachable or slow

Diagnose:

# Test connectivity
ping -c 5 crash.rdkcentral.com

# Trace route
traceroute crash.rdkcentral.com

# Test with longer timeout
curl -v --connect-timeout 30 --max-time 120 https://crash.rdkcentral.com

Solutions:

# Increase timeout
export CURL_UPLOAD_TIMEOUT=180

# Or in configuration
echo "CURL_UPLOAD_TIMEOUT=180" >> /opt/coredump.properties
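
When timeouts are intermittent, retrying with a growing delay is often more useful than a single long timeout. This is a generic sketch of retry with exponential backoff. The function name retry_with_backoff is hypothetical, and crashupload's own retry policy may differ.

```shell
# retry_with_backoff ATTEMPTS BASE_DELAY COMMAND...
# Run COMMAND until it succeeds, doubling the delay after each failure.
# Hypothetical helper; not taken from crashupload.
retry_with_backoff() {
    attempts="$1"; delay="$2"; shift 2
    try=1
    while :; do
        "$@" && return 0
        [ "$try" -ge "$attempts" ] && return 1
        echo "Attempt $try/$attempts failed; retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))
        try=$((try + 1))
    done
}
```

Usage: retry_with_backoff 3 5 curl -fsS --max-time 120 -o /dev/null https://crash.rdkcentral.com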

Symptom: "SSL certificate problem: certificate has expired"

Cause: Expired CA certificates or clock skew

Diagnose:

# Check system time
date

# Check certificate validity (stdin from /dev/null so s_client exits after the handshake)
openssl s_client -connect crash.rdkcentral.com:443 -showcerts </dev/null

# Check CA bundle
ls -l /etc/ssl/certs/ca-certificates.crt

Solutions:

# Update CA certificates
update-ca-certificates

# Sync system time
ntpdate pool.ntp.org

# Temporary workaround (NOT for production)
export CURL_OPTIONS="--insecure"

Symptom: "curl: (35) SSL connect error"

Cause: TLS version mismatch or cipher incompatibility

Diagnose:

# Test TLS versions
curl -v --tlsv1.2 https://crash.rdkcentral.com
curl -v --tlsv1.3 https://crash.rdkcentral.com

# Check OpenSSL version
openssl version

Solution:

# Force TLS 1.2
export TLS="--tlsv1.2"

# Update OpenSSL if too old
opkg update && opkg upgrade openssl

Archive/Compression Issues

Symptom: "Archive creation failed"

Diagnose:

# Check available space
df -h /tmp
df -h /minidumps

# Check permissions
ls -ld /tmp
ls -ld /minidumps

# Test compression manually
tar czf /tmp/test.tar.gz /opt/minidumps/*.dmp

Solutions:

  1. No space:
# Clean /tmp
rm -rf /tmp/*.tar.gz /tmp/*.log

# Increase tmpfs size
mount -o remount,size=100M /tmp
  2. Permission denied:
chmod 1777 /tmp  # Sticky bit + world writable
  3. Compression too slow/large:
# Use faster compression
export GZIP=-1  # Faster, less compression

# Or skip compression (testing only)
tar cf archive.tar file.dmp  # No compression
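
Checking free space before creating the archive avoids a partial file being left in /tmp. This is a sketch that uses only df and awk. The function names free_kb and has_room are hypothetical.

```shell
# free_kb DIR: print the space available on DIR's filesystem, in 1 KiB blocks.
free_kb() {
    # -P forces the POSIX one-line-per-filesystem format; column 4 is "Available".
    df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print $4 }'
}

# has_room DIR NEED_KB: succeed when DIR has at least NEED_KB KiB available.
has_room() {
    avail=$(free_kb "$1")
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}
```

Usage: NEED=$(du -ck /opt/minidumps/*.dmp | awk 'END { print $1 }'); has_room /tmp "$NEED" || echo "Not enough space in /tmp". Using the uncompressed size as the requirement is deliberately conservative.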

Symptom: Archive missing log files

Cause: breakpad-logmapper.conf not configured

Check:

# Verify logmapper file
cat /etc/breakpad-logmapper.conf

# Check if logs exist
ls -lh /opt/logs/

Solution:

# Create/update logmapper
cat > /etc/breakpad-logmapper.conf << 'EOF'
Receiver|/opt/logs/receiver.log
WPEFramework|/opt/logs/wpeframework.log
xre-receiver|/opt/logs/xre.log
EOF
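
To confirm that the mapping resolves as expected, the file can be queried directly. This is a sketch, assuming the "process|logfile" layout used in the example above. The function names mapped_log and missing_mapped_logs are hypothetical.

```shell
# mapped_log PROCESS [FILE]: print the log path mapped to PROCESS; fail if unmapped.
# Assumes one "name|path" entry per line.
mapped_log() {
    awk -F'|' -v name="$1" '
        $1 == name { print $2; found = 1; exit }
        END { exit !found }
    ' "${2:-/etc/breakpad-logmapper.conf}"
}

# missing_mapped_logs [FILE]: list entries whose log file does not exist.
missing_mapped_logs() {
    while IFS='|' read -r name path; do
        if [ -n "$path" ] && [ ! -f "$path" ]; then
            echo "$name: $path"
        fi
    done < "${1:-/etc/breakpad-logmapper.conf}"
}
```

Usage: mapped_log WPEFramework || echo "WPEFramework has no log mapping"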

Configuration Problems

Symptom: Wrong paths being used

Diagnose:

# Check device type
grep DEVICE_TYPE /etc/device.properties

# Verify expected paths
case "$(grep DEVICE_TYPE /etc/device.properties | cut -d= -f2)" in
    broadband)
        echo "Expected: /minidumps, /rdklogs/logs"
        ;;
    video)
        echo "Expected: /opt/minidumps, /var/log"
        ;;
    extender)
        echo "Expected: /minidumps, /var/log/messages"
        ;;
    mediaclient)
        echo "Expected: /opt/minidumps, /opt/logs"
        ;;
esac

Solution:

# Fix device.properties
vi /etc/device.properties
# Set correct DEVICE_TYPE

# Create missing directories
mkdir -p /minidumps /opt/minidumps /rdklogs/logs

Symptom: Configuration not taking effect

Cause: A higher-priority configuration source is overriding the value

Debug:

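#!/bin/bash
# Print full configuration resolution for one variable.
# SOME_VAR is a placeholder; substitute the setting you are debugging.
echo "=== Configuration Resolution ==="

# Remember the value inherited from the environment before any file is sourced
ENV_VALUE="${SOME_VAR:-}"
unset SOME_VAR

# Start with defaults
VALUE="default"
echo "1. Default: $VALUE"

# Load device.properties
if [ -f /etc/device.properties ]; then
    source /etc/device.properties
    echo "2. device.properties: ${SOME_VAR:-<not set>}"
    VALUE="${SOME_VAR:-$VALUE}"
fi

# Load overrides
if [ -f /opt/coredump.properties ]; then
    unset SOME_VAR
    source /opt/coredump.properties
    echo "3. coredump.properties: ${SOME_VAR:-<not set>}"
    VALUE="${SOME_VAR:-$VALUE}"
fi

# Check environment (highest priority)
if [ -n "$ENV_VALUE" ]; then
    echo "4. Environment: $ENV_VALUE"
    VALUE="$ENV_VALUE"
fi

echo "Final value: $VALUE"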

Performance Issues

Symptom: Uploads taking too long

Measure:

# Time an upload
time /lib/rdk/uploadDumps.sh /opt/minidumps/test.dmp 0

# Profile with timestamps
export DEBUG=1
/lib/rdk/uploadDumps.sh /opt/minidumps/test.dmp 0 2>&1 | while IFS= read -r line; do
    echo "$(date +%T.%3N) $line"
done

Optimize:

  1. Large files: Use compiled binary instead of shell script
  2. Slow compression: Use faster compression level
  3. Network slow: Check bandwidth, use closer server
  4. Many files: Process in batches

Symptom: High memory usage

Check:

# Monitor memory during upload
while true; do
    ps aux | grep -E 'crashupload|uploadDumps' | grep -v grep
    free -h
    sleep 1
done

Solutions:

  • Use compiled binary (4-6MB vs shell 8-10MB)
  • Process smaller batch sizes
  • Increase available RAM if possible

Diagnostic Commands

Complete System Check

#!/bin/bash
# crashupload_diag.sh - Complete diagnostic script

echo "=== Crashupload Diagnostics ==="
echo ""

echo "1. System Information:"
uname -a
grep -E "DEVICE_TYPE|MODEL|BUILD_TYPE" /etc/device.properties
echo ""

echo "2. Dump File Status:"
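# grep . fails on empty input, so the fallback message prints when no dumps are listed
ls -lh /minidumps/*.dmp 2>/dev/null | head -5 | grep . || echo "  No dumps in /minidumps"
ls -lh /opt/minidumps/*.dmp 2>/dev/null | head -5 | grep . || echo "  No dumps in /opt/minidumps"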
echo ""

echo "3. Service Status:"
systemctl is-active coredump-upload.service 2>/dev/null || echo "  coredump-upload: unknown"
systemctl is-active minidump-on-bootup-upload.service 2>/dev/null || echo "  minidump-upload: unknown"
echo ""

echo "4. Process Status:"
ps aux | grep -E 'crashupload|uploadDumps' | grep -v grep || echo "  No upload process running"
echo ""

echo "5. Lock Files:"
ls -l /tmp/.*.lock 2>/dev/null || echo "  No lock files"
echo ""

echo "6. Rate Limit Status:"
if [ -f /tmp/.upload_ratelimit ]; then
    COUNT=$(wc -l < /tmp/.upload_ratelimit)
    echo "  $COUNT uploads in rate limit file"
else
    echo "  No rate limit file"
fi
echo ""

echo "7. Recent Logs:"
tail -20 /var/log/core_log.txt 2>/dev/null || tail -20 /rdklogs/logs/core_log.txt 2>/dev/null || echo "  No logs found"
echo ""

echo "8. Network Connectivity:"
ping -c 2 8.8.8.8 > /dev/null 2>&1 && echo "  Internet: ✅" || echo "  Internet: ❌"
echo ""

echo "9. Disk Space:"
df -h | grep -E "Filesystem|/tmp|/opt|/minidumps"
echo ""

echo "=== End Diagnostics ==="

Live Upload Debugging

# Watch uploads in real-time
tail -f /var/log/core_log.txt | grep -E "CRASH_UPLOAD|Upload|Error"

# Monitor network traffic
tcpdump -i any -nn 'tcp port 443' -A

# Watch file changes
inotifywait -m -r /minidumps /opt/minidumps

Log Analysis

Key Log Patterns

Successful upload:

[CRASH_UPLOAD] Upload successful: test.dmp (HTTP 200)
[CRASH_UPLOAD] Deleted file: /opt/minidumps/test.dmp

Rate limit hit:

[CRASH_UPLOAD] Rate limit exceeded: 10 uploads in 10 minutes
[CRASH_UPLOAD] Skipping upload for: app.dmp

Network failure:

[CRASH_UPLOAD] curl error (6): Could not resolve host
[CRASH_UPLOAD] Retry 1/3 in 5 seconds...

Crashloop detection:

[CRASH_UPLOAD] Crash loop detected for process 'Receiver'
[CRASH_UPLOAD] 5 crashes in last 5 minutes

Log Parsing Script

#!/bin/bash
# analyze_logs.sh - Parse crashupload logs

LOG_FILE="/var/log/core_log.txt"

echo "=== Crashupload Log Analysis ==="
echo ""

echo "Upload Statistics:"
echo "  Total attempts: $(grep -c "Starting upload" "$LOG_FILE")"
echo "  Successful: $(grep -c "Upload successful" "$LOG_FILE")"
echo "  Failed: $(grep -c "Upload failed" "$LOG_FILE")"
echo ""

echo "Top Error Types:"
grep -i error "$LOG_FILE" | awk '{print $NF}' | sort | uniq -c | sort -rn | head -5
echo ""

echo "Rate Limit Events:"
grep -c "Rate limit exceeded" "$LOG_FILE"
echo ""

echo "Recent Failures (last 10):"
grep "Upload failed" "$LOG_FILE" | tail -10
echo ""
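
The counts above can also be reduced to a single summary line with a success rate, which is easier to compare across devices. This is a sketch, assuming the same "Upload successful" and "Upload failed" markers that the script greps for. The function name upload_summary is hypothetical.

```shell
# upload_summary LOGFILE: print totals and the integer success rate on one line.
# Assumes the "Upload successful" / "Upload failed" log markers.
upload_summary() {
    awk '
        /Upload successful/ { ok++ }
        /Upload failed/     { bad++ }
        END {
            total = ok + bad
            rate = (total > 0) ? int(100 * ok / total) : 0
            printf "total=%d ok=%d failed=%d success_rate=%d%%\n", total, ok, bad, rate
        }
    ' "$1"
}
```

Usage: upload_summary /var/log/core_log.txt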

Debug Mode

Enable Full Debug Logging

# Set environment variables
export DEBUG=1
export LOG_LEVEL="DEBUG"
export CURL_VERBOSE=1
export KEEP_TEMP_FILES=1

# Run with debug
/lib/rdk/uploadDumps.sh /opt/minidumps/test.dmp 0

# Or for compiled binary
RDK_LOG_LEVEL=DEBUG /usr/bin/crashupload /opt/minidumps/test.dmp 0

Trace Script Execution

# Shell tracing
bash -x /lib/rdk/uploadDumps.sh /opt/minidumps/test.dmp 0 2>&1 | tee /tmp/trace.log

# System call tracing
strace -o /tmp/strace.log -ff /usr/bin/crashupload /opt/minidumps/test.dmp 0

# Library call tracing
ltrace -o /tmp/ltrace.log /usr/bin/crashupload /opt/minidumps/test.dmp 0

Getting Help

If you've tried the solutions above and still have issues:

  1. Gather diagnostic information:
./crashupload_diag.sh > /tmp/diagnostic_report.txt
  2. Collect relevant logs:
tar czf /tmp/crashupload_logs.tar.gz \
    /var/log/core_log.txt \
    /rdklogs/logs/core_log.txt \
    /tmp/.upload_ratelimit \
    /etc/device.properties
  3. Report the issue, attaching the diagnostic report and the log archive.
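
Because tar reports an error for every listed file that does not exist, and only one of the two core_log.txt locations is normally present, it can help to filter the list first. This is a sketch; the function name collect_logs is hypothetical.

```shell
# collect_logs ARCHIVE FILE...: archive only the files that exist and are readable.
# Fails when none of the listed files can be read.
collect_logs() {
    archive="$1"; shift
    # Rebuild the argument list in place: drop each name, re-append it if readable.
    for f in "$@"; do
        shift
        [ -r "$f" ] && set -- "$@" "$f"
    done
    [ "$#" -gt 0 ] || return 1
    tar czf "$archive" "$@"
}
```

Usage: collect_logs /tmp/crashupload_logs.tar.gz /var/log/core_log.txt /rdklogs/logs/core_log.txt /tmp/.upload_ratelimit /etc/device.properties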
