API Reference

Magika.ContentTypeInfo (Type)
struct ContentTypeInfo

Metadata container for a specific content type that Magika can identify.

Fields

  • label::ContentTypeLabel: Enum identifier for the content type (e.g., python, jpeg)
  • mime_type::String: Standard MIME type associated with this content type (e.g., "text/x-python", "image/jpeg")
  • group::String: Category group of the content type (e.g., "code", "document", "image", "audio")
  • description::String: Human-readable description of the content type (e.g., "Python source", "JPEG image")
  • extensions::Vector{String}: Common file extensions associated with this content type (e.g., ["py", "pyi"])
  • is_text::Bool: Flag indicating whether this content type is text-based (true) or binary (false)

Notes

This struct is automatically populated from Google Magika's content type knowledge base during initialization. Users typically access this information through prediction results rather than constructing it directly.

Each content type in Magika's detection system has a corresponding ContentTypeInfo that provides comprehensive metadata about that file format.

Example

m = MagikaConfig()
result = identify_path(m, "script.py")

if is_ok(result)
    info = result.prediction.output
    println("File format: $(info.description)")
    println("MIME type: $(info.mime_type)")
    println("Common extensions: $(join(info.extensions, ", "))")
    println("Category: $(info.group)")
    println("Text-based: $(info.is_text)")
end
Magika.ContentTypeLabel (Type)
@enum ContentTypeLabel

Enumeration of all content type labels that Magika can detect. It contains over 300 labels, ranging from common formats such as PNG and PDF to specialized ones such as BRAINFUCK and VIMRC.

Examples

# Get the string representation of a content type
println(string(ContentTypeLabel("png")))  # "png"

# Convert from string to enum
label = ContentTypeLabel("jpeg")

Notes

Each label corresponds to a specific file format with associated metadata including MIME type, description, and file extensions.
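The conversion between label names and enum values can be sketched with plain Julia. The block below is a minimal illustration, not Magika's own code: DemoLabel, LABEL_BY_NAME, and parse_label are hypothetical stand-ins, and the real ContentTypeLabel has far more members.

```julia
# Stand-in enum with a few labels (hypothetical; not the real ContentTypeLabel).
@enum DemoLabel png jpeg python

# Build a lookup table from label names to enum values once.
const LABEL_BY_NAME = Dict(string(l) => l for l in instances(DemoLabel))

# Return the enum value for a name, or `nothing` when the name is unknown.
parse_label(name::AbstractString) = get(LABEL_BY_NAME, lowercase(name), nothing)

println(parse_label("JPEG"))               # enum value looked up case-insensitively
println(string(png))                       # enum value back to its name
println(parse_label("nope") === nothing)   # unknown names do not throw
```

Returning nothing for unknown names keeps the lookup total, so callers decide how to treat unrecognized labels.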

Magika.MagikaConfig (Type)
MagikaConfig(; prediction_mode=HIGH_CONFIDENCE, no_dereference=false)

Create a Magika configuration object for file type detection.

Arguments

  • prediction_mode::PredictionMode: Confidence threshold mode for predictions.
    • HIGH_CONFIDENCE: Only returns predictions that meet the high-confidence threshold specific to each content type
    • MEDIUM_CONFIDENCE: Balanced approach using a medium confidence threshold
    • BEST_GUESS: Always returns the most likely prediction regardless of confidence
  • no_dereference::Bool: When true, symbolic links are identified as links rather than their targets

Examples

# Create a detector with default settings
m = MagikaConfig()

# Create a detector that doesn't follow symlinks and uses medium confidence
m = MagikaConfig(prediction_mode=MEDIUM_CONFIDENCE, no_dereference=true)
Magika.MagikaPrediction (Type)
struct MagikaPrediction

Container for the prediction results from Magika's deep learning model.

Fields

  • dl::ContentTypeInfo: The raw prediction directly from the deep learning model, before applying confidence thresholds or overwrite rules
  • output::ContentTypeInfo: The final prediction after applying confidence thresholds and overwrite rules
  • score::Float32: Confidence score between 0.0 and 1.0 indicating how certain the model is about its prediction
  • overwrite_reason::OverwriteReason: Enum explaining why the final output might differ from the raw model prediction:
    • NONE: No overwrite occurred, output matches raw prediction
    • LOW_CONFIDENCE: Prediction was downgraded due to low confidence score
    • OVERWRITE_MAP: Prediction was replaced according to predefined mapping rules

Notes

The distinction between dl (deep learning raw output) and output (final decision) is crucial for understanding Magika's behavior. In some cases, especially with low-confidence predictions, Magika will "downgrade" specific predictions to more generic types (e.g., from a specific programming language to generic "text") for safety and reliability.

The confidence score should be considered when making security-critical decisions about file handling. Higher scores indicate more reliable predictions.
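The relationship between dl, output, and overwrite_reason can be sketched as a two-step decision. This is a simplified illustration under stated assumptions, not the package's implementation: labels are plain strings, finalize_prediction and DemoReason are hypothetical names, the generic fallback labels "txt" and "unknown" are assumed, and the order of the two steps is a guess.

```julia
@enum DemoReason NONE LOW_CONFIDENCE OVERWRITE_MAP

# Hypothetical finalization step: turn a raw model label into a final label.
function finalize_prediction(dl_label::String, score::Float32, threshold::Float32,
                             is_text::Bool, overwrite_map::Dict{String,String})
    # Step 1: apply the overwrite map if the raw label has an entry.
    output = get(overwrite_map, dl_label, dl_label)
    reason = output == dl_label ? NONE : OVERWRITE_MAP
    # Step 2: downgrade to a generic label when the score is below the threshold.
    if score < threshold
        output = is_text ? "txt" : "unknown"   # assumed generic labels
        reason = LOW_CONFIDENCE
    end
    return (dl = dl_label, output = output, score = score, overwrite_reason = reason)
end

omap = Dict("randomtxt" => "txt")
println(finalize_prediction("python", 0.97f0, 0.80f0, true, omap))
println(finalize_prediction("python", 0.41f0, 0.80f0, true, omap))
println(finalize_prediction("randomtxt", 0.99f0, 0.80f0, true, omap))
```

The raw label is kept in dl in every case, so a caller can always see what the model proposed before any rule was applied.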

Example

m = MagikaConfig()
result = identify_path(m, "unknown_file")

if is_ok(result)
    pred = result.prediction
    
    # Access final prediction
    println("Detected as: $(pred.output.description)")
    println("Confidence: $(pred.score)")
    
    # Check if prediction was modified from raw model output
    if pred.overwrite_reason != NONE
        println("Original prediction ($(pred.dl.description)) was modified because: $(pred.overwrite_reason)")
    end
    
    # Use confidence score to make decisions
    if pred.score > 0.9
        println("High confidence prediction - safe to process automatically")
    elseif pred.overwrite_reason == LOW_CONFIDENCE
        println("Low confidence prediction - consider human verification")
    end
end
Magika.MagikaResult (Type)
struct MagikaResult

Container for file identification results.

Fields

  • path::String: Path or identifier of the analyzed content
  • status::Status: Status of the identification operation (OK, FILE_NOT_FOUND_ERROR, etc.)
  • prediction::Union{MagikaPrediction, Nothing}: Prediction details when status is OK; nothing otherwise

Accessors

  • Use is_ok(result) to check if identification was successful
  • When successful, access prediction details via result.prediction
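Because prediction may be nothing, callers usually guard every access. The sketch below shows one way to wrap that guard in a helper. It uses stand-in types only: DemoStatus, DemoResult, demo_is_ok, and prediction_or are hypothetical, and a String stands in for MagikaPrediction.

```julia
@enum DemoStatus OK FILE_NOT_FOUND_ERROR PERMISSION_ERROR UNKNOWN_ERROR

# Stand-in for MagikaResult; a String replaces the real prediction struct.
struct DemoResult
    path::String
    status::DemoStatus
    prediction::Union{String,Nothing}
end

demo_is_ok(r::DemoResult) = r.status == OK

# Return the prediction, or a caller-supplied fallback when identification failed.
prediction_or(r::DemoResult, fallback) = demo_is_ok(r) ? r.prediction : fallback

results = [
    DemoResult("a.py", OK, "python"),
    DemoResult("missing.bin", FILE_NOT_FOUND_ERROR, nothing),
]
for r in results
    println(r.path, " => ", prediction_or(r, "n/a"), " (", r.status, ")")
end
```

Branching on the status rather than on the prediction field keeps the failure reason available for logging.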
Magika.ModelConfig (Type)
struct ModelConfig

Configuration parameters for the Magika deep learning model. This struct contains all the hyperparameters and settings needed for feature extraction and prediction.

Fields

  • beg_size::Int: Number of bytes to extract from the beginning of a file for feature extraction.
  • mid_size::Int: Number of bytes to extract from the middle of a file (currently not used in inference).
  • end_size::Int: Number of bytes to extract from the end of a file for feature extraction.
  • use_inputs_at_offsets::Bool: Flag indicating whether to use content at specific byte offsets (reserved for future use).
  • medium_confidence_threshold::Float32: Default confidence threshold used in MEDIUM_CONFIDENCE prediction mode.
  • min_file_size_for_dl::Int: Minimum file size (in bytes) required to use the deep learning model. Smaller files use simpler detection methods.
  • padding_token::Int: Byte value used for padding when file content is shorter than required feature size.
  • block_size::Int: Size of data blocks read from files during feature extraction.
  • target_labels_space::Vector{ContentTypeLabel}: Vector of all possible content type labels the model can predict.
  • thresholds::Dict{ContentTypeLabel, Float32}: Content type-specific confidence thresholds used in HIGH_CONFIDENCE mode.
  • overwrite_map::Dict{ContentTypeLabel, ContentTypeLabel}: Mapping of content types to be automatically overwritten with alternative labels (e.g., for security reasons).

Notes

This configuration is automatically loaded from the model's config file when a MagikaConfig is initialized. Users typically don't need to construct this struct directly.

The configuration values are optimized during model training to achieve high accuracy while maintaining fast inference times. The model only analyzes the beginning and end portions of files, making detection time nearly constant regardless of file size.
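To make the role of beg_size, end_size, and padding_token concrete, here is a minimal sketch of fixed-size feature extraction. It is an illustration only: extract_features is a hypothetical helper, the default sizes and the padding value 256 are illustrative rather than taken from the model config, and any preprocessing the real model performs is omitted.

```julia
# Hypothetical extractor: always returns beg_size + end_size integer tokens.
function extract_features(content::Vector{UInt8}; beg_size::Int=4, end_size::Int=4,
                          padding_token::Int=256)
    n = length(content)
    # Beginning window: first beg_size bytes, padded on the right if content is short.
    beg = Int.(content[1:min(beg_size, n)])
    append!(beg, fill(padding_token, beg_size - length(beg)))
    # End window: last end_size bytes, padded on the left if content is short.
    tail = Int.(content[max(1, n - end_size + 1):n])
    prepend!(tail, fill(padding_token, end_size - length(tail)))
    return vcat(beg, tail)
end

println(extract_features(collect(codeunits("hello world"))))
println(extract_features(collect(codeunits("hi"))))
```

The output length depends only on beg_size and end_size, which is why the amount of data fed to the model does not grow with file size.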

Example

# This struct is typically loaded automatically; _model_config is an internal field
m = MagikaConfig()
cfg = m._model_config

println("Minimum file size for DL: $(cfg.min_file_size_for_dl) bytes")
println("Medium confidence threshold: $(cfg.medium_confidence_threshold)")
Magika.OverwriteReason (Type)
@enum OverwriteReason NONE LOW_CONFIDENCE OVERWRITE_MAP

Enumeration of reasons why a model prediction might be overwritten with a different label.

Values

  • NONE: No overwrite occurred, using the raw model prediction
  • LOW_CONFIDENCE: Prediction was downgraded due to low confidence score
  • OVERWRITE_MAP: Prediction was replaced according to predefined mapping rules

Notes

This information is available in result.prediction.overwrite_reason to understand why a prediction might differ from the model's raw output.

Magika.PredictionMode (Type)
@enum PredictionMode HIGH_CONFIDENCE MEDIUM_CONFIDENCE BEST_GUESS

Enumeration of prediction confidence modes that control how strictly Magika applies confidence thresholds.

Values

  • HIGH_CONFIDENCE: Most conservative mode. Only returns predictions that meet content-type-specific high confidence thresholds.
  • MEDIUM_CONFIDENCE: Balanced mode. Uses a medium confidence threshold for all content types.
  • BEST_GUESS: Most permissive mode. Always returns the model's top prediction regardless of confidence score.
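The three modes differ only in the minimum score a prediction must reach. The sketch below illustrates that selection under stated assumptions: DemoMode and required_score are hypothetical names, labels are plain strings, and falling back to the medium threshold when a label has no specific entry is an assumption.

```julia
@enum DemoMode HIGH_CONFIDENCE MEDIUM_CONFIDENCE BEST_GUESS

# Hypothetical helper: the minimum score a prediction needs in each mode.
function required_score(mode::DemoMode, label::String,
                        thresholds::Dict{String,Float32}, medium::Float32)
    if mode == BEST_GUESS
        return 0.0f0                      # any score is accepted
    elseif mode == MEDIUM_CONFIDENCE
        return medium                     # one threshold for every content type
    else
        # Per-type threshold; the fallback to `medium` is an assumption.
        return get(thresholds, label, medium)
    end
end

thresholds = Dict("python" => 0.9f0)
for mode in instances(DemoMode)
    println(mode, " => ", required_score(mode, "python", thresholds, 0.5f0))
end
```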

Examples

# Create a highly conservative detector
m = MagikaConfig(prediction_mode=HIGH_CONFIDENCE)
Magika.Status (Type)
@enum Status OK FILE_NOT_FOUND_ERROR PERMISSION_ERROR UNKNOWN_ERROR

Enumeration of possible status codes for identification operations.

Values

  • OK: Operation completed successfully
  • FILE_NOT_FOUND_ERROR: File path does not exist
  • PERMISSION_ERROR: Unable to access file due to permissions
  • UNKNOWN_ERROR: Other unexpected error occurred
Magika.identify_bytes (Method)
identify_bytes(m::MagikaConfig, content::Vector{UInt8})::MagikaResult

Identify the content type from raw byte content.

Arguments

  • m::MagikaConfig: Pre-configured Magika detector
  • content::Vector{UInt8}: Byte array containing file content to analyze

Returns

A MagikaResult object containing the detection results.

Examples

m = MagikaConfig()
content = read("image.png")  # read the whole file as Vector{UInt8}
result = identify_bytes(m, content)
if is_ok(result)
    println("Content type: ", result.prediction.output.label)
end
Magika.identify_path (Method)
identify_path(m::MagikaConfig, path::AbstractString)::MagikaResult

Identify the content type of a file at the given path.

Arguments

  • m::MagikaConfig: Pre-configured Magika detector
  • path::AbstractString: Path to the file to analyze

Returns

A MagikaResult object containing the detection results. Check with is_ok(result) before accessing prediction details.

Examples

m = MagikaConfig()
result = identify_path(m, "document.pdf")
if is_ok(result)
    println("Detected as: ", result.prediction.output.description)
    println("MIME type: ", result.prediction.output.mime_type)
    println("Confidence: ", result.prediction.score)
else
    println("Error: ", result.status)
end
Magika.identify_stream (Method)
identify_stream(m::MagikaConfig, stream::IO)::MagikaResult

Identify the content type from an IO stream.

Arguments

  • m::MagikaConfig: Pre-configured Magika detector
  • stream::IO: IO stream to read content from (the function seeks within the stream, so its position may change)

Returns

A MagikaResult object containing the detection results.

Examples

m = MagikaConfig()
open("data.csv", "r") do io
    result = identify_stream(m, io)
    # prediction is only available when identification succeeded
    if is_ok(result)
        println("File group: ", result.prediction.output.group)
    end
end
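Since identification seeks within the stream, a caller that needs to keep reading from the same position can save and restore it. The sketch below uses only Base functions (position, seek, seekstart, read). The wrapper with_restored_position is a hypothetical helper, and the closure body merely stands in for a call such as identify_stream(m, s).

```julia
# Hypothetical wrapper: run `f(io)` and put the stream back where it was.
function with_restored_position(f, io::IO)
    pos = position(io)
    try
        return f(io)
    finally
        seek(io, pos)   # restore even if `f` throws
    end
end

io = IOBuffer("a,b,c\n1,2,3\n")
read(io, 4)                                    # advance the stream first
first_bytes = with_restored_position(io) do s
    seekstart(s)                               # stands in for identify_stream(m, s)
    read(s, 3)
end
println(String(copy(first_bytes)))
println(position(io))
```

This only works for seekable streams such as files and IOBuffer; pipes and sockets cannot be rewound.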
Magika.is_ok (Method)
is_ok(result::MagikaResult)::Bool

Check if a MagikaResult represents a successful identification.

Returns

true if the status is OK, false otherwise.

Examples

m = MagikaConfig()
result = identify_path(m, "file.txt")
if is_ok(result)
    # Safe to access result.prediction
    println("Detected as: ", result.prediction.output.description)
else
    println("Identification failed with status: ", result.status)
end