Understanding Results

This guide explains the structure of Magika.jl results and how to interpret them effectively.

Result Structure Overview

A MagikaResult contains three main components:

Path information: The file path or identifier that was analyzed
Status: Whether the identification was successful
Prediction details (if successful): Comprehensive information about the detected content type

Checking Result Status

Always check the status before accessing prediction details:

using Magika

m = MagikaConfig()  # same configuration object used in the usage example below
result = identify_path(m, "file.txt")

if is_ok(result)
    # Safe to access prediction details
    println("Detection successful!")
else
    println("Detection failed with status: $(result.status)")
end

Status Types

Magika.jl defines several status codes:

OK: Detection was successful
FILE_NOT_FOUND_ERROR: The file path doesn't exist
PERMISSION_ERROR: Unable to access the file due to permissions
UNKNOWN_ERROR: An unexpected error occurred
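
The status codes above lend themselves to a small dispatch helper. The following is a minimal sketch, not part of Magika.jl: the `Status` enum is a local stand-in that only mirrors the names listed above so the block runs on its own, and `status_message` is a hypothetical helper.

```julia
# Stand-in for the status codes listed above. In real code these names would
# come from Magika.jl; this enum is an assumption made for illustration.
@enum Status OK FILE_NOT_FOUND_ERROR PERMISSION_ERROR UNKNOWN_ERROR

# Hypothetical helper: map a status code to a message for the user.
function status_message(status::Status)
    if status == OK
        return "detection succeeded"
    elseif status == FILE_NOT_FOUND_ERROR
        return "file not found; check the path"
    elseif status == PERMISSION_ERROR
        return "permission denied; check file access rights"
    else
        return "unexpected error; retry or inspect the file manually"
    end
end

# Print the message for every status code.
for s in instances(Status)
    println(rpad(string(s), 22), " => ", status_message(s))
end
```

In an application, the same branching would typically run on `result.status` after `is_ok(result)` returns false.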

Prediction Details

When status is OK, the result.prediction field contains:

1. Deep Learning Output (`dl`)

The raw prediction from the neural network model:

label: The content type label (e.g., python, jpeg)
description: Human-readable description (e.g., "Python source")
mime_type: Standard MIME type (e.g., "text/x-python")
group: Content category (e.g., "code", "document", "image")
extensions: Common file extensions for this type
is_text: Whether this is a text-based format

2. Final Output (`output`)

The final content type after applying confidence thresholds and overwrite rules:

Has the same fields as dl. The values match dl unless an overwrite was applied, as described under Overwrite Reason.
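
One way to see the relationship between `dl` and `output` is to compare their labels. The sketch below assumes a stand-in `ContentInfo` struct carrying only two of the fields described above; it is not the library's own type, and `was_overwritten` is a hypothetical helper.

```julia
# Stand-in for the content-type information; only the fields used below.
# The real Magika.jl type has more fields (description, mime_type, ...).
struct ContentInfo
    label::String
    is_text::Bool
end

# Hypothetical helper: true when the final output differs from the raw prediction.
was_overwritten(dl::ContentInfo, output::ContentInfo) = dl.label != output.label

# Example values: the model guessed "python", but the final output is "txt".
dl = ContentInfo("python", true)
output = ContentInfo("txt", true)

if was_overwritten(dl, output)
    println("overwritten: $(dl.label) -> $(output.label)")
else
    println("unchanged: $(dl.label)")
end
```

With a real result, the same comparison would read `pred.dl.label != pred.output.label`.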

3. Confidence Score

score: A float between 0.0 and 1.0 indicating prediction confidence
Higher scores indicate more reliable predictions
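
A common use of the score is to separate results that can be trusted automatically from those that need review. This is a minimal sketch under stated assumptions: the `(label, score)` pairs are made-up stand-ins for `pred.output.label` and `pred.score`, and the `min_score` cutoff is an application-specific choice, not a value defined by Magika.jl.

```julia
# Made-up (label, score) pairs standing in for pred.output.label and pred.score.
predictions = [("python", 0.99), ("txt", 0.62), ("jpeg", 0.91)]

# Application-specific cutoff (an assumption; tune it to your risk tolerance).
min_score = 0.85

# Split the labels by whether their score clears the cutoff.
trusted = [label for (label, score) in predictions if score >= min_score]
needs_review = [label for (label, score) in predictions if score < min_score]

println("trusted: ", trusted)
println("needs review: ", needs_review)
```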

4. Overwrite Reason

Explains why the final output might differ from the raw model prediction:

NONE: No overwrite occurred
LOW_CONFIDENCE: Downgraded due to low confidence score
OVERWRITE_MAP: Changed according to predefined mapping rules
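
The three reasons can be turned into user-facing explanations with a short helper. The sketch below defines a local `OverwriteReason` enum that only mirrors the names above so it runs without the package; `explain_overwrite` is a hypothetical helper, not a Magika.jl function.

```julia
# Stand-in for the overwrite reasons listed above (an assumption for illustration).
@enum OverwriteReason NONE LOW_CONFIDENCE OVERWRITE_MAP

# Hypothetical helper: describe why the final output may differ from the raw prediction.
function explain_overwrite(reason::OverwriteReason)
    reason == NONE && return "output matches the raw model prediction"
    reason == LOW_CONFIDENCE && return "score was too low; output replaced by a generic label"
    return "output replaced according to a predefined mapping rule"
end

# Print the explanation for every reason.
for r in instances(OverwriteReason)
    println(r, ": ", explain_overwrite(r))
end
```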

Example: Complete Result Analysis

function analyze_result(result::MagikaResult)
    println("Path: $(result.path)")
    
    if !is_ok(result)
        println("Status: $(result.status)")
        return
    end
    
    pred = result.prediction
    
    println("\nRaw model prediction (dl):")
    println("  Label: $(pred.dl.label)")
    println("  Description: $(pred.dl.description)")
    println("  MIME type: $(pred.dl.mime_type)")
    println("  Group: $(pred.dl.group)")
    println("  Extensions: $(join(pred.dl.extensions, ", "))")
    println("  Is text: $(pred.dl.is_text)")
    
    println("\nFinal output:")
    println("  Label: $(pred.output.label)")
    println("  Description: $(pred.output.description)")
    
    println("\nConfidence score: $(pred.score)")
    
    println("\nOverwrite reason: $(pred.overwrite_reason)")
    if pred.overwrite_reason != NONE
        println("  Note: The raw prediction was modified because of low confidence")
        println("  or predefined mapping rules. Consider the confidence score")
        println("  when making decisions based on this result.")
    end
    
    # Practical interpretation
    if pred.score > 0.95
        println("\nInterpretation: Very high confidence prediction")
    elseif pred.score > 0.85
        println("\nInterpretation: High confidence prediction")
    elseif pred.score > 0.7
        println("\nInterpretation: Medium confidence - generally reliable")
    else
        println("\nInterpretation: Low confidence - consider verification")
    end
    
    # Security considerations
    if pred.overwrite_reason == LOW_CONFIDENCE
        println("\nSecurity note: The model predicted '$(pred.dl.label)' with low confidence,")
        println("so the output was downgraded to the generic label '$(pred.output.label)'.")
        println("Exercise caution when processing this file.")
    end
end

# Usage example
result = identify_path(MagikaConfig(), "unknown_file")
analyze_result(result)

Common Result Patterns

1. High-Confidence Detection

# Raw prediction and final output are the same
# High confidence score (>0.9)
# Overwrite reason is NONE

2. Low-Confidence Text File

# Raw prediction might be "python" but final output is "txt"
# Confidence score is medium (0.6-0.8)
# Overwrite reason is LOW_CONFIDENCE

3. Low-Confidence Binary File

# Raw prediction might be "pebin" but final output is "unknown"
# Confidence score is low (<0.6)
# Overwrite reason is LOW_CONFIDENCE

4. Mapping Override

# Raw prediction might be "windows_dll" but final output is "dll"
# Overwrite reason is OVERWRITE_MAP
# This happens due to predefined content type mappings
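
The four patterns above can be told apart from the overwrite reason and whether the final output is a text type. The following is a rough sketch, assuming a local stand-in enum that mirrors the overwrite reasons and a hypothetical helper `result_pattern`; neither is part of Magika.jl.

```julia
# Stand-in for the overwrite reasons (an assumption for illustration).
@enum OverwriteReason NONE LOW_CONFIDENCE OVERWRITE_MAP

# Hypothetical helper: name the pattern a prediction falls into.
# `output_is_text` stands in for pred.output.is_text.
function result_pattern(reason::OverwriteReason, output_is_text::Bool)
    if reason == NONE
        return "high-confidence detection"
    elseif reason == OVERWRITE_MAP
        return "mapping override"
    elseif output_is_text
        return "low-confidence text file"
    else
        return "low-confidence binary file"
    end
end

println(result_pattern(NONE, true))
println(result_pattern(LOW_CONFIDENCE, true))
println(result_pattern(LOW_CONFIDENCE, false))
println(result_pattern(OVERWRITE_MAP, false))
```

With a real result, the call would be `result_pattern(pred.overwrite_reason, pred.output.is_text)`, provided the library's enum is used in place of the stand-in.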

Recognizing these patterns helps you decide how to handle each file: act on high-confidence results directly, and treat low-confidence or overwritten results with more caution, especially when the file comes from an untrusted source.