API Reference
Magika.ContentTypeInfo — Type
struct ContentTypeInfo
Metadata container for a specific content type that Magika can identify.
Fields
- label::ContentTypeLabel: Enum identifier for the content type (e.g., python, jpeg)
- mime_type::String: Standard MIME type associated with this content type (e.g., "text/x-python", "image/jpeg")
- group::String: Category group of the content type (e.g., "code", "document", "image", "audio")
- description::String: Human-readable description of the content type (e.g., "Python source", "JPEG image")
- extensions::Vector{String}: Common file extensions associated with this content type (e.g., ["py", "pyi"])
- is_text::Bool: Flag indicating whether this content type is text-based (true) or binary (false)
Notes
This struct is automatically populated from Google Magika's content type knowledge base during initialization. Users typically access this information through prediction results rather than constructing it directly.
Each content type in Magika's detection system has a corresponding ContentTypeInfo that provides comprehensive metadata about that file format.
Example
m = MagikaConfig()
result = identify_path(m, "script.py")
if is_ok(result)
    info = result.prediction.output
    println("File format: $(info.description)")
    println("MIME type: $(info.mime_type)")
    println("Common extensions: $(join(info.extensions, ", "))")
    println("Category: $(info.group)")
    println("Text-based: $(info.is_text)")
end
Magika.ContentTypeLabel — Type
@enum ContentTypeLabel
Enumeration of all supported content type labels that Magika can detect. Contains over 300 labels for different file formats, from common types like PNG and PDF to specialized formats like BRAINFUCK and VIMRC.
Examples
# Get the string representation of a content type
println(string(ContentTypeLabel("png"))) # "png"
# Convert from string to enum
label = ContentTypeLabel("jpeg")
Notes
Each label corresponds to a specific file format with associated metadata including MIME type, description, and file extensions.
Magika.MagikaConfig — Type
MagikaConfig(; prediction_mode=HIGH_CONFIDENCE, no_dereference=false)
Create a Magika configuration object for file type detection.
Arguments
- prediction_mode::PredictionMode: Confidence threshold mode for predictions.
    - HIGH_CONFIDENCE: Only returns predictions that meet the high-confidence threshold specific to each content type
    - MEDIUM_CONFIDENCE: Balanced approach using a single medium confidence threshold
    - BEST_GUESS: Always returns the most likely prediction regardless of confidence
- no_dereference::Bool: When true, symbolic links are identified as links rather than as their targets
Examples
# Create a detector with default settings
m = MagikaConfig()
# Create a detector that doesn't follow symlinks and uses medium confidence
m = MagikaConfig(prediction_mode=MEDIUM_CONFIDENCE, no_dereference=true)
Magika.MagikaPrediction — Type
struct MagikaPrediction
Container for the prediction results from Magika's deep learning model.
Fields
- dl::ContentTypeInfo: The raw prediction directly from the deep learning model, before applying confidence thresholds or overwrite rules
- output::ContentTypeInfo: The final prediction after applying confidence thresholds and overwrite rules
- score::Float32: Confidence score between 0.0 and 1.0 indicating how certain the model is about its prediction
- overwrite_reason::OverwriteReason: Enum explaining why the final output might differ from the raw model prediction:
    - NONE: No overwrite occurred, output matches raw prediction
    - LOW_CONFIDENCE: Prediction was downgraded due to low confidence score
    - OVERWRITE_MAP: Prediction was replaced according to predefined mapping rules
Notes
The distinction between dl (deep learning raw output) and output (final decision) is crucial for understanding Magika's behavior. In some cases, especially with low-confidence predictions, Magika will "downgrade" specific predictions to more generic types (e.g., from a specific programming language to generic "text") for safety and reliability.
The confidence score should be considered when making security-critical decisions about file handling. Higher scores indicate more reliable predictions.
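The relationship between dl, output, and overwrite_reason can be illustrated with a small standalone sketch. It does not call Magika.jl and is not the package's implementation: the module OverwriteSketch, the function resolve, the generic labels "txt" and "unknown", and the order of the two checks are all assumptions made for illustration.

```julia
module OverwriteSketch

# Stand-in for Magika.OverwriteReason so the sketch runs without the package.
@enum Reason NONE LOW_CONFIDENCE OVERWRITE_MAP

"""
Turn a raw model label into a final label. Every argument is a hypothetical
stand-in: `threshold` is whatever cutoff the active prediction mode selects,
`overwrite_map` maps labels to replacements, and `is_text` says whether the
raw label is a text format.
"""
function resolve(dl_label::String, score::Real, threshold::Real,
                 overwrite_map::Dict{String,String}, is_text::Bool)
    if score < threshold
        # Downgrade to a generic label instead of trusting a weak prediction.
        return (output = is_text ? "txt" : "unknown", reason = LOW_CONFIDENCE)
    end
    # Confident prediction: apply the mapping rule if one exists for this label.
    mapped = get(overwrite_map, dl_label, dl_label)
    reason = mapped == dl_label ? NONE : OVERWRITE_MAP
    return (output = mapped, reason = reason)
end

end # module
```

Under these assumptions, a confident prediction with no mapping entry passes through unchanged with reason NONE, while a weak prediction for a text format is reported under the generic text label with reason LOW_CONFIDENCE.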
Example
m = MagikaConfig()
result = identify_path(m, "unknown_file")
if is_ok(result)
    pred = result.prediction
    # Access final prediction
    println("Detected as: $(pred.output.description)")
    println("Confidence: $(pred.score)")
    # Check if prediction was modified from raw model output
    if pred.overwrite_reason != NONE
        println("Original prediction ($(pred.dl.description)) was modified because: $(pred.overwrite_reason)")
    end
    # Use confidence score to make decisions
    if pred.score > 0.9
        println("High confidence prediction - safe to process automatically")
    elseif pred.overwrite_reason == LOW_CONFIDENCE
        println("Low confidence prediction - consider human verification")
    end
end
Magika.MagikaResult — Type
struct MagikaResult
Container for file identification results.
Fields
- path::String: Path or identifier of the analyzed content
- status::Status: Status of the identification operation (OK, FILE_NOT_FOUND_ERROR, etc.)
- prediction::Union{MagikaPrediction, Nothing}: Prediction details when status is OK
Accessors
- Use is_ok(result) to check if identification was successful
- When successful, access prediction details via result.prediction
Magika.ModelConfig — Type
struct ModelConfig
Configuration parameters for the Magika deep learning model. This struct contains all the hyperparameters and settings needed for feature extraction and prediction.
Fields
- beg_size::Int: Number of bytes to extract from the beginning of a file for feature extraction.
- mid_size::Int: Number of bytes to extract from the middle of a file (currently not used in inference).
- end_size::Int: Number of bytes to extract from the end of a file for feature extraction.
- use_inputs_at_offsets::Bool: Flag indicating whether to use content at specific byte offsets (reserved for future use).
- medium_confidence_threshold::Float32: Default confidence threshold used in MEDIUM_CONFIDENCE prediction mode.
- min_file_size_for_dl::Int: Minimum file size (in bytes) required to use the deep learning model. Smaller files use simpler detection methods.
- padding_token::Int: Byte value used for padding when file content is shorter than required feature size.
- block_size::Int: Size of data blocks read from files during feature extraction.
- target_labels_space::Vector{ContentTypeLabel}: Vector of all possible content type labels the model can predict.
- thresholds::Dict{ContentTypeLabel, Float32}: Content type-specific confidence thresholds used in HIGH_CONFIDENCE mode.
- overwrite_map::Dict{ContentTypeLabel, ContentTypeLabel}: Mapping of content types to be automatically overwritten with alternative labels (e.g., for security reasons).
Notes
This configuration is automatically loaded from the model's config file when a MagikaConfig is initialized. Users typically don't need to construct this struct directly.
The configuration values are optimized during model training to achieve high accuracy while maintaining fast inference times. The model only analyzes the beginning and end portions of files, making detection time nearly constant regardless of file size.
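To show why the input size stays fixed, here is a standalone sketch of head-and-tail feature extraction with padding. It is not taken from Magika.jl: the function head_tail_features is hypothetical, the choice to pad the head on the right and the tail on the left is an assumption, and the value 256 in the usage note is only an example of a padding token that cannot collide with a real byte.

```julia
"""
Build a fixed-length feature vector from the first `beg_size` and last
`end_size` bytes of `content`, padding with `padding_token` when the content
is shorter than either window. Hypothetical helper, not part of Magika.jl.
"""
function head_tail_features(content::Vector{UInt8};
                            beg_size::Int, end_size::Int, padding_token::Int)
    n = length(content)

    # Head window: real bytes first, padding on the right.
    beg = fill(padding_token, beg_size)
    nb = min(n, beg_size)
    beg[1:nb] .= content[1:nb]

    # Tail window: padding on the left, real bytes last.
    tail = fill(padding_token, end_size)
    ne = min(n, end_size)
    tail[end-ne+1:end] .= content[n-ne+1:n]

    return vcat(beg, tail)
end
```

For example, head_tail_features(UInt8[1, 2, 3]; beg_size=4, end_size=2, padding_token=256) yields a vector of length 6 regardless of how long the input is, which is what keeps the cost of a prediction independent of file size.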
Example
# This struct is typically loaded automatically
cfg = m._model_config # where m is a MagikaConfig instance
println("Minimum file size for DL: $(cfg.min_file_size_for_dl) bytes")
println("Medium confidence threshold: $(cfg.medium_confidence_threshold)")
Magika.OverwriteReason — Type
@enum OverwriteReason NONE LOW_CONFIDENCE OVERWRITE_MAP
Enumeration of reasons why a model prediction might be overwritten with a different label.
Values
- NONE: No overwrite occurred, using the raw model prediction
- LOW_CONFIDENCE: Prediction was downgraded due to low confidence score
- OVERWRITE_MAP: Prediction was replaced according to predefined mapping rules
Notes
This information is available in result.prediction.overwrite_reason to understand why a prediction might differ from the model's raw output.
Magika.PredictionMode — Type
@enum PredictionMode HIGH_CONFIDENCE MEDIUM_CONFIDENCE BEST_GUESS
Enumeration of prediction confidence modes that control how strictly Magika applies confidence thresholds.
Values
- HIGH_CONFIDENCE: Most conservative mode. Only returns predictions that meet content-type-specific high confidence thresholds.
- MEDIUM_CONFIDENCE: Balanced mode. Uses a medium confidence threshold for all content types.
- BEST_GUESS: Most permissive mode. Always returns the model's top prediction regardless of confidence score.
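The three modes differ only in which cutoff a score is compared against. The standalone sketch below models that choice without calling Magika.jl. The module ThresholdSketch, the function accepts, the string labels, and the fallback to the medium threshold for labels that have no per-type entry are all assumptions of this sketch, not documented behavior.

```julia
module ThresholdSketch

# Stand-in for Magika.PredictionMode so the sketch runs without the package.
@enum Mode HIGH_CONFIDENCE MEDIUM_CONFIDENCE BEST_GUESS

"""
Return true when a prediction for `label` with confidence `score` would be
accepted under `mode`. `thresholds` holds hypothetical per-label cutoffs and
`medium` is the single cutoff used in medium-confidence mode.
"""
function accepts(mode::Mode, label::String, score::Real,
                 thresholds::Dict{String,Float32}, medium::Float32)
    mode == BEST_GUESS && return true                 # never reject
    mode == MEDIUM_CONFIDENCE && return score >= medium
    # HIGH_CONFIDENCE: per-label cutoff, falling back to `medium` when the
    # label has no entry (an assumption of this sketch).
    return score >= get(thresholds, label, medium)
end

end # module
```

With thresholds = Dict("python" => 0.9f0) and medium = 0.5f0, a "python" prediction scored 0.6 is accepted in MEDIUM_CONFIDENCE and BEST_GUESS but rejected in HIGH_CONFIDENCE.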
Examples
# Create a highly conservative detector
m = MagikaConfig(prediction_mode=HIGH_CONFIDENCE)
Magika.Status — Type
@enum Status OK FILE_NOT_FOUND_ERROR PERMISSION_ERROR UNKNOWN_ERROR
Enumeration of possible status codes for identification operations.
Values
- OK: Operation completed successfully
- FILE_NOT_FOUND_ERROR: File path does not exist
- PERMISSION_ERROR: Unable to access file due to permissions
- UNKNOWN_ERROR: Other unexpected error occurred
Magika.identify_bytes — Method
identify_bytes(m::MagikaConfig, content::Vector{UInt8})::MagikaResult
Identify the content type from raw byte content.
Arguments
- m::MagikaConfig: Pre-configured Magika detector
- content::Vector{UInt8}: Byte array containing file content to analyze
Returns
A MagikaResult object containing the detection results.
Examples
m = MagikaConfig()
content = read("image.png")
result = identify_bytes(m, content)
println("Content type: ", result.prediction.output.label)
Magika.identify_path — Method
identify_path(m::MagikaConfig, path::AbstractString)::MagikaResult
Identify the content type of a file at the given path.
Arguments
- m::MagikaConfig: Pre-configured Magika detector
- path::AbstractString: Path to the file to analyze
Returns
A MagikaResult object containing the detection results. Check with is_ok(result) before accessing prediction details.
Examples
m = MagikaConfig()
result = identify_path(m, "document.pdf")
if is_ok(result)
    println("Detected as: ", result.prediction.output.description)
    println("MIME type: ", result.prediction.output.mime_type)
    println("Confidence: ", result.prediction.score)
else
    println("Error: ", result.status)
end
Magika.identify_stream — Method
identify_stream(m::MagikaConfig, stream::IO)::MagikaResult
Identify the content type from an IO stream.
Arguments
- m::MagikaConfig: Pre-configured Magika detector
- stream::IO: IO stream to read content from (the stream is repositioned with seek during analysis)
Returns
A MagikaResult object containing the detection results.
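Because the stream is repositioned during analysis, callers that keep using it afterwards may want to save and restore their position. The standalone sketch below shows the general pattern of reading the head and tail of a seekable stream with Julia's Base I/O functions. The helper read_head_tail is hypothetical and is not how identify_stream is known to be implemented. Restoring the position is a choice made by this sketch.

```julia
"""
Read up to `beg_size` bytes from the start and up to `end_size` bytes from the
end of a seekable stream, then restore the caller's position.
Hypothetical helper, not part of Magika.jl.
"""
function read_head_tail(io::IO, beg_size::Int, end_size::Int)
    start = position(io)                  # remember where the caller left off
    seekend(io)
    total = position(io)                  # stream length in bytes
    seekstart(io)
    head = read(io, min(beg_size, total))
    seek(io, max(total - end_size, 0))    # jump to the start of the tail window
    tail = read(io, min(end_size, total))
    seek(io, start)                       # undo our seeking
    return head, tail
end
```

The same save-and-restore idea (position before the call, seek afterwards) can be wrapped around any function that moves a stream.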
Examples
m = MagikaConfig()
open("data.csv", "r") do io
    result = identify_stream(m, io)
    println("File group: ", result.prediction.output.group)
end
Magika.is_ok — Method
is_ok(result::MagikaResult)::Bool
Check if a MagikaResult represents a successful identification.
Returns
true if the status is OK, false otherwise.
Examples
m = MagikaConfig()
result = identify_path(m, "file.txt")
if is_ok(result)
    # Safe to access result.prediction
    println("Detected as: ", result.prediction.output.description)
else
    println("Identification failed with status: ", result.status)
end