Module content_cleaner

Constants§

JSON_BEGIN_MARKER 🔒
JSON markers for LLM prompt
JSON_END_MARKER 🔒
MAX_CONTEXT_UTILIZATION
Maximum percentage of context window to use in a single request
MAX_CONTEXT_WINDOW
Maximum context window size for LLM in tokens
REQUEST_TEMPERATURE
Temperature for requests, low for deterministic results
SYSTEM_PROMPT 🔒
System prompt for converting course material to markdown
USER_PROMPT_START 🔒
User prompt for converting course material to markdown

Functions§

append_markdown_with_separator
Appends markdown content to a result string with proper newline separators
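A minimal sketch of what `append_markdown_with_separator` might look like; the signature and the blank-line separator are assumptions inferred from the one-line description, not the module's actual implementation:

```rust
/// Appends a markdown fragment to an accumulator, inserting a blank line
/// between fragments so headings and paragraphs stay visually separated.
fn append_markdown_with_separator(result: &mut String, addition: &str) {
    let addition = addition.trim();
    if addition.is_empty() {
        return;
    }
    if !result.is_empty() {
        result.push_str("\n\n");
    }
    result.push_str(addition);
}

fn main() {
    let mut out = String::new();
    append_markdown_with_separator(&mut out, "# Heading");
    append_markdown_with_separator(&mut out, "First paragraph.");
    assert_eq!(out, "# Heading\n\nFirst paragraph.");
}
```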
block_to_json_string 🔒
Converts a block to JSON string, removing any private_spec fields recursively
blocks_to_json_string 🔒
Converts a vector of blocks to JSON string, removing any private_spec fields recursively
calculate_safe_token_limit
Calculates the safe token limit from the context window size and utilization cap
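A sketch of how the limit could be derived from the constants above. The concrete values (a 128k-token window, 75% utilization) are illustrative assumptions, not the module's real configuration:

```rust
// Assumed values for illustration only.
const MAX_CONTEXT_WINDOW: usize = 128_000; // tokens the model accepts
const MAX_CONTEXT_UTILIZATION: f64 = 0.75; // fraction usable per request

/// Leaves headroom below the full context window so the prompt and the
/// model's markdown output both fit in a single request.
fn calculate_safe_token_limit(window: usize, utilization: f64) -> usize {
    (window as f64 * utilization) as usize
}

fn main() {
    let limit = calculate_safe_token_limit(MAX_CONTEXT_WINDOW, MAX_CONTEXT_UTILIZATION);
    assert_eq!(limit, 96_000);
}
```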
convert_material_blocks_to_markdown_with_llm
Cleans content by converting the material blocks to clean markdown using an LLM
prepare_llm_messages
Prepares the messages for the LLM request
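One plausible shape for `prepare_llm_messages`: a system message carrying the conversion instructions, and a user message that wraps the serialized blocks in the JSON markers. The `Message` type and all prompt strings here are placeholders, since the real `SYSTEM_PROMPT`, `USER_PROMPT_START`, and markers are private:

```rust
// All strings below are stand-ins; the module's real prompts are private.
struct Message {
    role: &'static str,
    content: String,
}

const SYSTEM_PROMPT: &str = "Convert the course material JSON to clean markdown.";
const USER_PROMPT_START: &str = "Convert the following blocks:";
const JSON_BEGIN_MARKER: &str = "<JSON>";
const JSON_END_MARKER: &str = "</JSON>";

/// Builds the two-message prompt: instructions, then the data to convert,
/// fenced by markers so the model can locate the JSON payload.
fn prepare_llm_messages(blocks_json: &str) -> Vec<Message> {
    vec![
        Message { role: "system", content: SYSTEM_PROMPT.to_string() },
        Message {
            role: "user",
            content: format!("{USER_PROMPT_START}\n{JSON_BEGIN_MARKER}\n{blocks_json}\n{JSON_END_MARKER}"),
        },
    ]
}

fn main() {
    let msgs = prepare_llm_messages("[{\"id\":1}]");
    assert_eq!(msgs.len(), 2);
    assert!(msgs[1].content.contains(JSON_BEGIN_MARKER));
}
```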
process_block_chunk 🔒
Processes a subset of blocks in a single LLM request
process_chunks 🔒
Processes all chunks and combines the results
remove_private_spec_recursive 🔒
Recursively removes all fields named "private_spec" from a JSON value
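The module presumably operates on a `serde_json::Value`; to keep this sketch dependency-free, it demonstrates the same recursion over a tiny hand-rolled JSON-like enum. Only the traversal pattern is the point, not the type:

```rust
// Minimal JSON-like value, standing in for serde_json::Value.
#[derive(Debug, PartialEq)]
enum Json {
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Drops every field named `private_spec` at any nesting depth, so that
/// e.g. hidden exercise answers never reach the LLM prompt.
fn remove_private_spec_recursive(value: &mut Json) {
    match value {
        Json::Object(fields) => {
            // Remove the field at this level, then recurse into what remains.
            fields.retain(|(key, _)| key != "private_spec");
            for (_, v) in fields.iter_mut() {
                remove_private_spec_recursive(v);
            }
        }
        Json::Array(items) => items.iter_mut().for_each(remove_private_spec_recursive),
        Json::Str(_) => {}
    }
}

fn main() {
    let mut doc = Json::Object(vec![
        ("title".into(), Json::Str("Exercise".into())),
        ("private_spec".into(), Json::Str("answer key".into())),
        ("children".into(), Json::Array(vec![Json::Object(vec![
            ("private_spec".into(), Json::Str("hidden".into())),
        ])])),
    ]);
    remove_private_spec_recursive(&mut doc);
    assert_eq!(doc, Json::Object(vec![
        ("title".into(), Json::Str("Exercise".into())),
        ("children".into(), Json::Array(vec![Json::Object(vec![])])),
    ]));
}
```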
split_blocks_into_chunks
Splits blocks into chunks that fit within the token limit
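A greedy sketch of the chunking step. The chars-divided-by-four token estimate is a common rough heuristic and purely an assumption here; the real module may count tokens with an actual tokenizer:

```rust
/// Rough token estimate: ~4 characters per token (an assumption).
fn estimate_tokens(block: &str) -> usize {
    block.chars().count() / 4 + 1
}

/// Greedily packs blocks into chunks whose estimated token total stays
/// under `limit`, starting a new chunk whenever the next block overflows.
fn split_blocks_into_chunks(blocks: &[String], limit: usize) -> Vec<Vec<String>> {
    let mut chunks = Vec::new();
    let mut current = Vec::new();
    let mut used = 0;
    for block in blocks {
        let cost = estimate_tokens(block);
        if used + cost > limit && !current.is_empty() {
            chunks.push(std::mem::take(&mut current));
            used = 0;
        }
        used += cost;
        current.push(block.clone());
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    let blocks: Vec<String> = vec!["aaaa".into(), "bbbb".into(), "cccc".into()];
    // Each block estimates to 2 tokens, so a limit of 4 fits two per chunk.
    let chunks = split_blocks_into_chunks(&blocks, 4);
    assert_eq!(chunks.len(), 2);
}
```

A block whose lone cost already exceeds the limit still gets its own chunk here, which is presumably where `split_oversized_block` takes over.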
split_oversized_block 🔒
Splits an oversized block into smaller string chunks
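A simple interpretation of `split_oversized_block`, assuming it cuts on a fixed character budget; splitting on `char` boundaries keeps multi-byte UTF-8 text valid:

```rust
/// Splits a block's text into pieces of at most `max_chars` characters,
/// cutting on char boundaries so UTF-8 content is never bisected.
fn split_oversized_block(text: &str, max_chars: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(max_chars)
        .map(|piece| piece.iter().collect())
        .collect()
}

fn main() {
    let parts = split_oversized_block("abcdefgh", 3);
    assert_eq!(parts, vec!["abc", "def", "gh"]);
}
```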