Data Models
This section documents the key data models used throughout the RepoPacker.jl package.
RepoPacker.add_extension — Method
add_extension(ext::AbstractString)
Add a file extension (e.g., ".r", ".sql") to the list of recognized text file extensions. The extension should include the leading dot.
Arguments
- ext: File extension to add (must start with a dot)
Examples
RepoPacker.add_extension(".r")
RepoPacker.add_extension(".sql")
Errors
- Throws ArgumentError if the extension doesn't start with a dot
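The leading-dot rule can be sketched as follows. This is a minimal illustration under stated assumptions, not RepoPacker's actual source: the registry `SKETCH_EXTENSIONS`, its initial contents, and the function `add_extension_sketch` are hypothetical names introduced here.

```julia
# Hypothetical registry of recognized text extensions (not RepoPacker's real list).
const SKETCH_EXTENSIONS = Set{String}([".jl", ".md"])

# Register `ext`, rejecting values without a leading dot as documented above.
function add_extension_sketch(ext::AbstractString)
    startswith(ext, ".") ||
        throw(ArgumentError("extension must start with a dot, got \"$ext\""))
    push!(SKETCH_EXTENSIONS, String(ext))
    return SKETCH_EXTENSIONS
end

add_extension_sketch(".sql")     # accepted
# add_extension_sketch("sql")    # would throw ArgumentError
```

Using a `Set` makes repeated registrations of the same extension harmless; whether the package stores its extensions this way is an assumption.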
RepoPacker.calculate_file_metrics — Method
calculate_file_metrics(text_files::Vector{String}, base_dir::AbstractString)
Calculate metrics for a collection of text files.
Returns
- Tuple containing:
  - total_chars: Total character count
  - total_tokens: Total token estimate
  - file_char_counts: Dict mapping file paths to character counts
  - file_token_counts: Dict mapping file paths to token estimates
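A rough sketch of how the four returned values could be computed is shown below. It is not the package's implementation: `file_metrics_sketch` is a hypothetical name, the dictionary keys are assumed to be paths relative to `base_dir`, and the token estimate assumes the `length(content) ÷ 4` heuristic documented for `estimate_token_count`.

```julia
# Sketch of the four documented return values for a list of files.
# Assumption: dictionary keys are paths relative to `base_dir`.
function file_metrics_sketch(text_files::Vector{String}, base_dir::AbstractString)
    file_char_counts = Dict{String,Int}()
    file_token_counts = Dict{String,Int}()
    for path in text_files
        content = read(path, String)
        key = relpath(path, base_dir)
        file_char_counts[key] = length(content)
        file_token_counts[key] = length(content) ÷ 4   # documented token heuristic
    end
    total_chars = sum(values(file_char_counts); init=0)
    total_tokens = sum(values(file_token_counts); init=0)
    return total_chars, total_tokens, file_char_counts, file_token_counts
end

# Usage on a throwaway directory.
dir = mktempdir()
write(joinpath(dir, "a.txt"), "12345678")   # 8 characters
write(joinpath(dir, "b.txt"), "abc")        # 3 characters
total_chars, total_tokens, char_counts, token_counts = file_metrics_sketch(
    [joinpath(dir, "a.txt"), joinpath(dir, "b.txt")], dir)
println((total_chars, total_tokens))        # (11, 2)
```

Because the estimate is floored per file, small files can contribute zero tokens, so the summed total may be lower than the total character count divided by four.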
RepoPacker.clone_and_pack — Function
clone_and_pack(repo_url::AbstractString, output_file::AbstractString="repo.xml";
               output_style::Symbol=:xml, temp_dir::AbstractString=tempname(), verbose::Bool=false)
Clone a GitHub repository and pack its text files into a file in the specified format.
Arguments
- repo_url: URL of the Git repository to clone
- output_file: Output file path (default: "repo.xml")
- output_style: Format to use (:xml, :json, or :markdown)
- temp_dir: Temporary directory for cloning (default: auto-generated)
- verbose: Whether to enable detailed logging (default: false)
Returns
- Path to the generated output file
Examples
RepoPacker.clone_and_pack("https://github.com/username/repo.git", "output.xml")
Errors
- Throws errors related to Git operations or file writing
RepoPacker.collect_text_files — Method
collect_text_files(dir_path::AbstractString; verbose::Bool=false)
Recursively collect all text files in a directory, skipping .git and neglected paths.
Arguments
- dir_path: Directory path to scan
- verbose: Whether to enable detailed logging (default: false)
Returns
- Vector of text file paths
Examples
files = RepoPacker.collect_text_files(".")
Errors
- Throws ArgumentError if the directory doesn't exist
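The recursive scan described above might look roughly like the sketch below. It is an illustration only: `collect_sketch` and its `allowed` keyword are hypothetical, the extension set is made up, and the neglected-path filtering that the real function performs is left out for brevity.

```julia
# Recursively gather files whose extension is in `allowed`,
# never descending into a `.git` directory.
function collect_sketch(dir::AbstractString; allowed=Set([".jl", ".md", ".txt"]))
    isdir(dir) || throw(ArgumentError("directory does not exist: $dir"))
    found = String[]
    for name in readdir(dir)
        path = joinpath(dir, name)
        if isdir(path)
            name == ".git" && continue          # skip Git metadata entirely
            append!(found, collect_sketch(path; allowed=allowed))
        elseif splitext(name)[2] in allowed
            push!(found, path)
        end
    end
    return found
end

# Usage on a throwaway directory.
root = mktempdir()
mkpath(joinpath(root, ".git"))
mkpath(joinpath(root, "src"))
write(joinpath(root, ".git", "config.txt"), "ignored")
write(joinpath(root, "src", "main.jl"), "println(1)")
write(joinpath(root, "logo.png"), "not text")
println(collect_sketch(root))   # only the path ending in src/main.jl
```

Skipping `.git` before recursing avoids reading the directory at all, which matters for repositories with a large object store.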
RepoPacker.estimate_token_count — Method
estimate_token_count(content::String)
Estimate token count using the simple heuristic length(content) ÷ 4. This approximates GPT-style tokenizers for English-like code/text.
Returns
- Estimated token count as Int
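As a worked example of the heuristic, the snippet below applies `length(content) ÷ 4` directly. The helper name `estimate_tokens_sketch` is hypothetical and stands in for the documented function so the example runs without the package.

```julia
# The documented heuristic: one token per four characters, rounded down.
estimate_tokens_sketch(content::AbstractString) = length(content) ÷ 4

snippet = "function f(x) end"                    # 17 characters
println(estimate_tokens_sketch(snippet))         # 17 ÷ 4 == 4
println(estimate_tokens_sketch("abc"))           # fewer than four characters gives 0
println(estimate_tokens_sketch("naïve"))         # 5 characters (6 bytes) gives 1
```

In Julia, `length` counts characters rather than bytes, so non-ASCII text is not over-counted by this formula.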
RepoPacker.generate_json_content — Method
generate_json_content(text_files::Vector{String}, base_dir::AbstractString; verbose::Bool=false)
Generate repository content in Repomix-inspired JSON format.
Returns
- String containing valid JSON with keys: fileSummary, directoryStructure, files, metrics
RepoPacker.generate_markdown_content — Method
generate_markdown_content(text_files::Vector{String}, base_dir::AbstractString; verbose::Bool=false)
Generate repository content in Repomix-inspired Markdown format.
RepoPacker.generate_xml_content — Method
generate_xml_content(text_files::Vector{String}, base_dir::AbstractString; verbose::Bool=false)
Generate XML content in Repomix format.
Arguments
- text_files: Vector of text file paths to include
- base_dir: Base directory of the repository
- verbose: Whether to enable detailed logging (default: false)
Returns
- String containing the XML content with header
Examples
files = RepoPacker.collect_text_files(".")
xml_content = RepoPacker.generate_xml_content(files, ".")
RepoPacker.get_directory_structure — Method
get_directory_structure(dir_path::AbstractString; verbose::Bool=false)
Generate a visual representation of the directory structure, excluding neglected paths.
Arguments
- dir_path: Directory path to analyze
- verbose: Whether to enable detailed logging (default: false)
Returns
- String representation of the directory structure
Examples
structure = RepoPacker.get_directory_structure(".")
println(structure)
RepoPacker.get_top_files — Function
get_top_files(file_token_counts::Dict{String, Int}, n::Int=5)
Get the top N files with the highest token counts.
Returns
- Vector of tuples (file_path, token_count) sorted by token count in descending order
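The ranking can be sketched with a sort on the dictionary's pairs. This is one plausible way to get the documented result, not necessarily how the package does it; `top_files_sketch` is a hypothetical name and the counts are made-up sample data.

```julia
# Return up to `n` (path, tokens) tuples, largest token count first.
function top_files_sketch(file_token_counts::Dict{String,Int}, n::Int=5)
    ranked = sort(collect(file_token_counts); by=last, rev=true)
    keep = ranked[1:min(n, length(ranked))]      # tolerate n larger than the Dict
    return [(path, tokens) for (path, tokens) in keep]
end

counts = Dict("src/a.jl" => 120, "README.md" => 40, "src/b.jl" => 300)
println(top_files_sketch(counts, 2))   # [("src/b.jl", 300), ("src/a.jl", 120)]
```

When two files have the same count, their relative order depends on the dictionary's iteration order, which Julia does not guarantee.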
RepoPacker.is_text_file — Method
is_text_file(path::AbstractString)
Check if a file is a text file based on its extension.
Arguments
- path: File path to check
Returns
- true if the file has a recognized text extension, false otherwise
Examples
RepoPacker.is_text_file("src/RepoPacker.jl") # returns true
RepoPacker.is_text_file("docs/logo.png") # returns false
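An extension-based check of this kind can be sketched with `splitext` and a set lookup. The names `KNOWN_TEXT_EXTENSIONS` and `is_text_file_sketch` are hypothetical, the extension list is illustrative, and the lowercase normalization is an assumption rather than documented behavior.

```julia
# Hypothetical set of recognized extensions (not the package's real list).
const KNOWN_TEXT_EXTENSIONS = Set([".jl", ".md", ".toml"])

# Compare the file's extension, lowercased (an assumption), against the set.
is_text_file_sketch(path::AbstractString) =
    lowercase(splitext(path)[2]) in KNOWN_TEXT_EXTENSIONS

println(is_text_file_sketch("src/RepoPacker.jl"))   # true
println(is_text_file_sketch("docs/logo.png"))       # false
println(is_text_file_sketch("Makefile"))            # false: no extension at all
```

A check like this never opens the file, so an extensionless text file such as `Makefile` is reported as non-text unless it is handled separately.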
RepoPacker.neglect_path — Method
neglect_path(path::AbstractString)
Add a path (file or directory) to be excluded from packing. Paths are matched as substrings in the full file path (relative to repo root).
Arguments
- path: Path pattern to exclude (can be relative or absolute)
Examples
RepoPacker.neglect_path("test/")
RepoPacker.neglect_path(".env")
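Because patterns are matched as substrings, a short pattern may exclude more than intended. The sketch below uses plain `occursin` to illustrate the effect; `matches_sketch` is a hypothetical helper, and whether RepoPacker applies any normalization before matching is not stated here.

```julia
# Plain substring matching, as the description above suggests.
matches_sketch(pattern::AbstractString, path::AbstractString) = occursin(pattern, path)

pattern = "test/"
for path in ["test/runtests.jl", "src/latest/util.jl", "src/main.jl"]
    println(rpad(path, 22), matches_sketch(pattern, path))
end
# "src/latest/util.jl" also matches, because "latest/" ends with "test/".
```

If plain substring matching is what the package does, a more specific pattern reduces the chance of accidental matches.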
RepoPacker.pack_directory — Function
pack_directory(dir_path::AbstractString, output_file::AbstractString="repo.xml";
               output_style::Symbol=:xml, verbose::Bool=false)
Pack the text files in a directory into a file in the specified format.
Arguments
- dir_path: Directory path to pack
- output_file: Output file path (default: "repo.xml")
- output_style: Format to use (:xml, :json, or :markdown)
- verbose: Whether to enable detailed logging (default: false)
Returns
- Path to the generated output file
Examples
RepoPacker.pack_directory(".", "repo.xml")
Errors
- Throws ArgumentError if the directory doesn't exist
RepoPacker.should_neglect — Method
should_neglect(full_path::AbstractString, base_dir::AbstractString)
Check if a file should be excluded based on the global NEGLECT_PATHS list. Each pattern is compared against both the absolute path and the path relative to base_dir.
Arguments
- full_path: Absolute path of the file
- base_dir: Base directory of the repository
Returns
- true if the file should be excluded, false otherwise
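The two-path comparison can be sketched as below. This is an illustration under stated assumptions, not the package's code: `SKETCH_NEGLECT_PATHS` is a hypothetical stand-in for the global list, `should_neglect_sketch` is a hypothetical name, and the example paths assume POSIX separators.

```julia
# Hypothetical stand-in for the global exclusion list.
const SKETCH_NEGLECT_PATHS = String[".env", "test/"]

# A file is excluded if any pattern occurs in its absolute path
# or in its path relative to `base_dir`.
function should_neglect_sketch(full_path::AbstractString, base_dir::AbstractString)
    rel = relpath(full_path, base_dir)
    return any(p -> occursin(p, full_path) || occursin(p, rel), SKETCH_NEGLECT_PATHS)
end

println(should_neglect_sketch("/repo/test/runtests.jl", "/repo"))   # true
println(should_neglect_sketch("/repo/src/main.jl", "/repo"))        # false
```

Checking the relative path as well lets a pattern match from the repository root without the caller knowing where the repository lives on disk.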