API Reference

Magika.ContentTypeInfo — Type

struct ContentTypeInfo

Metadata container for a specific content type that Magika can identify.

Fields

label::ContentTypeLabel: Enum identifier for the content type (e.g., python, jpeg)
mime_type::String: Standard MIME type associated with this content type (e.g., "text/x-python", "image/jpeg")
group::String: Category group of the content type (e.g., "code", "document", "image", "audio")
description::String: Human-readable description of the content type (e.g., "Python source", "JPEG image")
extensions::Vector{String}: Common file extensions associated with this content type (e.g., ["py", "pyi"])
is_text::Bool: Flag indicating whether this content type is text-based (true) or binary (false)

Notes

This struct is automatically populated from Google Magika's content type knowledge base during initialization. Users typically access this information through prediction results rather than constructing it directly.

Each content type in Magika's detection system has a corresponding ContentTypeInfo that provides comprehensive metadata about that file format.

Example

m = MagikaConfig()
result = identify_path(m, "script.py")

if is_ok(result)
    info = result.prediction.output
    println("File format: $(info.description)")
    println("MIME type: $(info.mime_type)")
    println("Common extensions: $(join(info.extensions, ", "))")
    println("Category: $(info.group)")
    println("Text-based: $(info.is_text)")
end


Magika.ContentTypeLabel — Type

@enum ContentTypeLabel

Enumeration of all content type labels that Magika can detect. It contains over 300 labels, ranging from common formats such as PNG and PDF to specialized ones such as BRAINFUCK and VIMRC.

Examples

# Get the string representation of a content type
println(string(ContentTypeLabel("png")))  # "png"

# Convert from string to enum
label = ContentTypeLabel("jpeg")

Notes

Each label corresponds to a specific file format with associated metadata including MIME type, description, and file extensions.
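
How string conversion works for ContentTypeLabel depends on how the package defines it. As a minimal, package-independent sketch, a plain Julia @enum can be mapped to and from strings as shown here. DemoLabel, DEMO_LABELS, and label_from_string are hypothetical names used only for illustration; they are not part of Magika.

```julia
# Hypothetical stand-in enum; Magika's real ContentTypeLabel has far more members.
@enum DemoLabel png jpeg python

# Build a lookup table from each member's name to the member itself.
const DEMO_LABELS = Dict(string(l) => l for l in instances(DemoLabel))

# Return the matching label, or `nothing` when the name is unknown.
label_from_string(name::AbstractString) = get(DEMO_LABELS, lowercase(name), nothing)

println(string(png))                 # enum member to its name
println(label_from_string("JPEG"))   # case-insensitive lookup
println(label_from_string("nope"))   # unknown names give `nothing`
```

Returning `nothing` for unknown names keeps the lookup total, so callers can decide whether an unrecognized label is an error.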


Magika.MagikaConfig — Type

MagikaConfig(; prediction_mode=HIGH_CONFIDENCE, no_dereference=false)

Create a Magika configuration object for file type detection.

Arguments

prediction_mode::PredictionMode: Confidence threshold mode for predictions.
- HIGH_CONFIDENCE: Keeps a specific prediction only when its score meets the threshold defined for that content type
- MEDIUM_CONFIDENCE: Balanced approach using a medium confidence threshold
- BEST_GUESS: Always returns the most likely prediction regardless of confidence
no_dereference::Bool: When true, symbolic links are identified as links rather than their targets

Examples

# Create a detector with default settings
m = MagikaConfig()

# Create a detector that doesn't follow symlinks and uses medium confidence
m = MagikaConfig(prediction_mode=MEDIUM_CONFIDENCE, no_dereference=true)


Magika.MagikaPrediction — Type

struct MagikaPrediction

Container for the prediction results from Magika's deep learning model.

Fields

dl::ContentTypeInfo: The raw prediction directly from the deep learning model, before applying confidence thresholds or overwrite rules
output::ContentTypeInfo: The final prediction after applying confidence thresholds and overwrite rules
score::Float32: Confidence score between 0.0 and 1.0 indicating how certain the model is about its prediction
overwrite_reason::OverwriteReason: Enum explaining why the final output might differ from the raw model prediction:
- NONE: No overwrite occurred, output matches raw prediction
- LOW_CONFIDENCE: Prediction was downgraded due to low confidence score
- OVERWRITE_MAP: Prediction was replaced according to predefined mapping rules

Notes

The distinction between dl (deep learning raw output) and output (final decision) is crucial for understanding Magika's behavior. In some cases, especially with low-confidence predictions, Magika will "downgrade" specific predictions to more generic types (e.g., from a specific programming language to generic "text") for safety and reliability.

The confidence score should be considered when making security-critical decisions about file handling. Higher scores indicate more reliable predictions.

Example

m = MagikaConfig()
result = identify_path(m, "unknown_file")

if is_ok(result)
    pred = result.prediction
    
    # Access final prediction
    println("Detected as: $(pred.output.description)")
    println("Confidence: $(pred.score)")
    
    # Check if prediction was modified from raw model output
    if pred.overwrite_reason != NONE
        println("Original prediction ($(pred.dl.description)) was modified because: $(pred.overwrite_reason)")
    end
    
    # Use confidence score to make decisions
    if pred.score > 0.9
        println("High confidence prediction - safe to process automatically")
    elseif pred.overwrite_reason == LOW_CONFIDENCE
        println("Low confidence prediction - consider human verification")
    end
end
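
The relationship between dl, output, score, and overwrite_reason can be illustrated with a small standalone sketch. It does not call Magika and is not the package's implementation. It assumes that the overwrite map is consulted first, that the score is then compared against a threshold chosen by the prediction mode, and that a failed check falls back to a generic label. DemoMode, DemoReason, decide_output, and the sample threshold values are all hypothetical.

```julia
# Local stand-ins mirroring the enum values documented above (not Magika's own types).
@enum DemoMode HIGH_CONFIDENCE MEDIUM_CONFIDENCE BEST_GUESS
@enum DemoReason NONE LOW_CONFIDENCE OVERWRITE_MAP

# Decide the final label from a raw model label and its score.
# Assumed order: overwrite map first, then the confidence check for the mode.
function decide_output(dl::Symbol, score::Real, mode::DemoMode;
                       thresholds::Dict{Symbol,Float64},
                       medium_threshold::Float64,
                       overwrite_map::Dict{Symbol,Symbol},
                       generic::Symbol = :txt)
    output = get(overwrite_map, dl, dl)
    reason = output == dl ? NONE : OVERWRITE_MAP

    # Pick the threshold the score must reach in this mode.
    required = if mode == BEST_GUESS
        0.0
    elseif mode == HIGH_CONFIDENCE
        get(thresholds, dl, medium_threshold)
    else
        medium_threshold
    end

    # Below the threshold, downgrade to the generic label.
    if score < required
        output, reason = generic, LOW_CONFIDENCE
    end
    return (output = output, reason = reason)
end

thresholds = Dict(:python => 0.9)          # illustrative per-type threshold
overwrite_map = Dict(:randomtxt => :txt)   # illustrative mapping rule

for mode in instances(DemoMode)
    r = decide_output(:python, 0.7, mode; thresholds = thresholds,
                      medium_threshold = 0.5, overwrite_map = overwrite_map)
    println(mode, " => ", r.output, " (", r.reason, ")")
end
```

With a score of 0.7, only the strictest mode in this sketch downgrades the label, which is the kind of difference between dl and output that overwrite_reason is meant to explain.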


Magika.MagikaResult — Type

struct MagikaResult

Container for file identification results.

Fields

path::String: Path or identifier of the analyzed content
status::Status: Status of the identification operation (OK, FILE_NOT_FOUND_ERROR, etc.)
prediction::Union{MagikaPrediction, Nothing}: Prediction details when status is OK, nothing otherwise

Accessors

Use is_ok(result) to check if identification was successful
When successful, access prediction details via result.prediction
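
When many files are processed, it is often convenient to separate successful results from failures before touching the prediction field. The sketch below shows that pattern with stand-in types so that it runs without the package. DemoStatus, DemoResult, demo_is_ok, and summarize are hypothetical names, and a plain label string stands in for MagikaPrediction.

```julia
# Stand-ins that mirror the documented shape of MagikaResult (not Magika's own types).
@enum DemoStatus OK FILE_NOT_FOUND_ERROR PERMISSION_ERROR UNKNOWN_ERROR

struct DemoResult
    path::String
    status::DemoStatus
    prediction::Union{String, Nothing}   # label string in place of MagikaPrediction
end

demo_is_ok(r::DemoResult) = r.status == OK

# Split results into labels for successful paths and failed paths grouped by status.
function summarize(results::Vector{DemoResult})
    labels = Dict{String,String}()
    failures = Dict{DemoStatus,Vector{String}}()
    for r in results
        if demo_is_ok(r)
            labels[r.path] = r.prediction      # only read after the status check
        else
            push!(get!(failures, r.status, String[]), r.path)
        end
    end
    return labels, failures
end

results = [
    DemoResult("script.py", OK, "python"),
    DemoResult("missing.bin", FILE_NOT_FOUND_ERROR, nothing),
    DemoResult("secret.key", PERMISSION_ERROR, nothing),
]

labels, failures = summarize(results)
println(labels)
println(failures)
```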


Magika.ModelConfig — Type

struct ModelConfig

Configuration parameters for the Magika deep learning model. This struct contains all the hyperparameters and settings needed for feature extraction and prediction.

Fields

beg_size::Int: Number of bytes to extract from the beginning of a file for feature extraction.
mid_size::Int: Number of bytes to extract from the middle of a file (currently not used in inference).
end_size::Int: Number of bytes to extract from the end of a file for feature extraction.
use_inputs_at_offsets::Bool: Flag indicating whether to use content at specific byte offsets (reserved for future use).
medium_confidence_threshold::Float32: Default confidence threshold used in MEDIUM_CONFIDENCE prediction mode.
min_file_size_for_dl::Int: Minimum file size (in bytes) required to use the deep learning model. Smaller files use simpler detection methods.
padding_token::Int: Byte value used for padding when file content is shorter than required feature size.
block_size::Int: Size of data blocks read from files during feature extraction.
target_labels_space::Vector{ContentTypeLabel}: Vector of all possible content type labels the model can predict.
thresholds::Dict{ContentTypeLabel, Float32}: Content type-specific confidence thresholds used in HIGH_CONFIDENCE mode.
overwrite_map::Dict{ContentTypeLabel, ContentTypeLabel}: Mapping of content types to be automatically overwritten with alternative labels (e.g., for security reasons).

Notes

This configuration is automatically loaded from the model's config file when a MagikaConfig is initialized. Users typically don't need to construct this struct directly.

The configuration values are optimized during model training to achieve high accuracy while maintaining fast inference times. The model only analyzes the beginning and end portions of files, making detection time nearly constant regardless of file size.

Example

# This struct is typically loaded automatically
m = MagikaConfig()
cfg = m._model_config  # m is a MagikaConfig instance

println("Minimum file size for DL: $(cfg.min_file_size_for_dl) bytes")
println("Medium confidence threshold: $(cfg.medium_confidence_threshold)")
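
The roles of beg_size, end_size, and padding_token can be illustrated with a standalone sketch of fixed-size feature extraction. This is not the package's code. It assumes that the beginning window is padded on the right and the end window on the left, and it uses 256 as an arbitrary padding value for the demonstration. extract_features is a hypothetical name.

```julia
# Take `beg_size` bytes from the start and `end_size` bytes from the end of
# `content`, padding with `padding_token` when the content is too short.
# Assumption: the beginning is padded on the right and the end on the left.
function extract_features(content::Vector{UInt8}; beg_size::Int, end_size::Int,
                          padding_token::Int)
    n = length(content)

    # Beginning window: real bytes first, padding after them.
    beg = fill(padding_token, beg_size)
    nb = min(n, beg_size)
    beg[1:nb] .= content[1:nb]

    # End window: padding first, real bytes aligned to the end.
    fin = fill(padding_token, end_size)
    ne = min(n, end_size)
    fin[end-ne+1:end] .= content[n-ne+1:n]

    return vcat(beg, fin)
end

features = extract_features(Vector{UInt8}("hello"); beg_size = 8, end_size = 4,
                            padding_token = 256)
println(features)
println(length(features))   # always beg_size + end_size, whatever the input length
```

Because the output length depends only on beg_size and end_size, the amount of data handed to the model stays the same for small and large inputs, which matches the note above about near-constant detection time.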


Magika.OverwriteReason — Type

@enum OverwriteReason NONE LOW_CONFIDENCE OVERWRITE_MAP

Enumeration of reasons why a model prediction might be overwritten with a different label.

Values

NONE: No overwrite occurred, using the raw model prediction
LOW_CONFIDENCE: Prediction was downgraded due to low confidence score
OVERWRITE_MAP: Prediction was replaced according to predefined mapping rules

Notes

This information is available in result.prediction.overwrite_reason to understand why a prediction might differ from the model's raw output.


Magika.PredictionMode — Type

@enum PredictionMode HIGH_CONFIDENCE MEDIUM_CONFIDENCE BEST_GUESS

Enumeration of prediction confidence modes that control how strictly Magika applies confidence thresholds.

Values

HIGH_CONFIDENCE: Most conservative mode. Only returns predictions that meet content-type-specific high confidence thresholds.
MEDIUM_CONFIDENCE: Balanced mode. Uses a medium confidence threshold for all content types.
BEST_GUESS: Most permissive mode. Always returns the model's top prediction regardless of confidence score.

Examples

# Create a highly conservative detector
m = MagikaConfig(prediction_mode=HIGH_CONFIDENCE)


Magika.Status — Type

@enum Status OK FILE_NOT_FOUND_ERROR PERMISSION_ERROR UNKNOWN_ERROR

Enumeration of possible status codes for identification operations.

Values

OK: Operation completed successfully
FILE_NOT_FOUND_ERROR: File path does not exist
PERMISSION_ERROR: Unable to access file due to permissions
UNKNOWN_ERROR: Other unexpected error occurred


Magika.identify_bytes — Method

identify_bytes(m::MagikaConfig, content::Vector{UInt8})::MagikaResult

Identify the content type from raw byte content.

Arguments

m::MagikaConfig: Pre-configured Magika detector
content::Vector{UInt8}: Byte array containing file content to analyze

Returns

A MagikaResult object containing the detection results.

Examples

m = MagikaConfig()
content = read("image.png")  # read the whole file as Vector{UInt8}
result = identify_bytes(m, content)
if is_ok(result)
    println("Content type: ", result.prediction.output.label)
end


Magika.identify_path — Method

identify_path(m::MagikaConfig, path::AbstractString)::MagikaResult

Identify the content type of a file at the given path.

Arguments

m::MagikaConfig: Pre-configured Magika detector
path::AbstractString: Path to the file to analyze

Returns

A MagikaResult object containing the detection results. Check with is_ok(result) before accessing prediction details.

Examples

m = MagikaConfig()
result = identify_path(m, "document.pdf")
if is_ok(result)
    println("Detected as: ", result.prediction.output.description)
    println("MIME type: ", result.prediction.output.mime_type)
    println("Confidence: ", result.prediction.score)
else
    println("Error: ", result.status)
end


Magika.identify_stream — Method

identify_stream(m::MagikaConfig, stream::IO)::MagikaResult

Identify the content type from an IO stream.

Arguments

m::MagikaConfig: Pre-configured Magika detector
stream::IO: IO stream to read content from (the stream is repositioned with seek during reading, so its position changes)

Returns

A MagikaResult object containing the detection results.

Examples

m = MagikaConfig()
open("data.csv", "r") do io
    result = identify_stream(m, io)
    if is_ok(result)
        println("File group: ", result.prediction.output.group)
    end
end
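
Because the stream is seeked during identification, a caller that has already consumed part of the stream may want its position restored afterwards. The following standalone sketch shows one way to do that with Base functions only. with_restored_position is a hypothetical helper, and the do-block is a stand-in for a call such as identify_stream; it needs a seekable stream.

```julia
# Run `f(io)` and put the stream back where it was, even if `f` throws.
function with_restored_position(f, io::IO)
    pos = position(io)
    try
        return f(io)
    finally
        seek(io, pos)
    end
end

io = IOBuffer("header,payload")
read(io, 7)                          # caller has consumed "header,"

# Stand-in for a call that seeks around the stream.
size_seen = with_restored_position(io) do s
    seekend(s)
    position(s)
end

println(size_seen)
println(String(read(io)))            # continues from where the caller left off
```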


Magika.is_ok — Method

is_ok(result::MagikaResult)::Bool

Check if a MagikaResult represents a successful identification.

Returns

true if the status is OK, false otherwise.

Examples

m = MagikaConfig()
result = identify_path(m, "file.txt")
if is_ok(result)
    # Safe to access result.prediction
    println("Detected as: ", result.prediction.output.description)
else
    println("Identification failed with status: ", result.status)
end
