Module content_cleaner


Constants§

JSON_BEGIN_MARKER 🔒
JSON markers for LLM prompt
JSON_END_MARKER 🔒
MAX_CONTEXT_UTILIZATION
Maximum percentage of the context window to use in a single request (see the sketch after this list)
MAX_CONTEXT_WINDOW
Maximum context window size for the LLM, in tokens
REQUEST_TEMPERATURE
Temperature for LLM requests, kept low for more deterministic results
SYSTEM_PROMPT 🔒
System prompt for converting course material to markdown
USER_PROMPT_START 🔒
Start of the user prompt for converting course material to markdown
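
A minimal sketch of how these constants might feed into the `calculate_safe_token_limit` function listed below; the values, the reused names, and the function signature are assumptions for illustration rather than the module's actual definitions:

```rust
// Illustrative values only; the module's real constant values are not shown on this page.
const MAX_CONTEXT_WINDOW: usize = 128_000; // assumed window size, in tokens
const MAX_CONTEXT_UTILIZATION: f64 = 0.8;  // assumed utilization, expressed here as a fraction of the window

/// Sketch of `calculate_safe_token_limit`: cap a single request at a fixed
/// share of the model's context window.
fn calculate_safe_token_limit(context_window: usize, utilization: f64) -> usize {
    (context_window as f64 * utilization) as usize
}

fn main() {
    let limit = calculate_safe_token_limit(MAX_CONTEXT_WINDOW, MAX_CONTEXT_UTILIZATION);
    assert_eq!(limit, 102_400);
}
```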

Functions§

append_markdown_with_separator
Appends markdown content to a result string with proper newline separators
calculate_safe_token_limit
Calculates the safe token limit based on the context window size and utilization cap
convert_material_blocks_to_markdown_with_llm
Cleans content by converting the material blocks to markdown using an LLM
prepare_llm_messages
Prepares messages for the LLM request (sketched after the function list)
process_block_chunk 🔒
Processes a subset of blocks in a single LLM request
process_chunks 🔒
Processes all chunks and combines the results
split_blocks_into_chunks
Splits blocks into chunks that fit within token limits (sketched after the function list)
split_oversized_block 🔒
Splits an oversized block into smaller string chunks
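
A hedged sketch of what `prepare_llm_messages` might look like, assuming it receives a chunk's content as a string; the `Message` type, the prompt wording, and the marker strings are placeholders, since the real `SYSTEM_PROMPT`, `USER_PROMPT_START`, and JSON markers are private to the module:

```rust
/// Hypothetical chat-message type standing in for whatever message struct the
/// LLM client library actually defines.
struct Message {
    role: &'static str,
    content: String,
}

/// Sketch of `prepare_llm_messages`: pair the fixed system prompt with a user
/// message that wraps the block content, serialized as JSON, between the
/// begin/end markers in the user prompt.
fn prepare_llm_messages(block_content: &str) -> Vec<Message> {
    // Placeholder prompt and marker text; the module's actual values are private.
    const SYSTEM_PROMPT: &str = "Convert the given course material to clean markdown.";
    const USER_PROMPT_START: &str = "Convert the following blocks:";
    const JSON_BEGIN_MARKER: &str = "BEGIN_JSON";
    const JSON_END_MARKER: &str = "END_JSON";

    vec![
        Message { role: "system", content: SYSTEM_PROMPT.to_string() },
        Message {
            role: "user",
            content: format!(
                "{}\n{}\n{}\n{}",
                USER_PROMPT_START, JSON_BEGIN_MARKER, block_content, JSON_END_MARKER
            ),
        },
    ]
}

fn main() {
    let messages = prepare_llm_messages(r#"{"blocks": []}"#);
    assert_eq!(messages.len(), 2);
}
```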
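
A rough sketch of the chunk-and-combine flow suggested by `split_blocks_into_chunks` and `append_markdown_with_separator`, assuming blocks are plain strings and tokens are estimated at roughly four characters each; the signatures and the token heuristic are illustrative assumptions:

```rust
/// Rough token estimate (about four characters per token); an assumption
/// standing in for whatever tokenizer or estimate the module actually uses.
fn estimate_tokens(text: &str) -> usize {
    text.len() / 4
}

/// Sketch of `split_blocks_into_chunks`: greedily pack whole blocks into
/// chunks whose combined token estimate stays under the safe limit. A single
/// block exceeding the limit on its own would go through
/// `split_oversized_block`, which this sketch omits.
fn split_blocks_into_chunks(blocks: &[String], token_limit: usize) -> Vec<Vec<String>> {
    let mut chunks: Vec<Vec<String>> = Vec::new();
    let mut current: Vec<String> = Vec::new();
    let mut current_tokens = 0;
    for block in blocks {
        let tokens = estimate_tokens(block);
        if !current.is_empty() && current_tokens + tokens > token_limit {
            chunks.push(std::mem::take(&mut current));
            current_tokens = 0;
        }
        current.push(block.clone());
        current_tokens += tokens;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

/// Sketch of `append_markdown_with_separator`: join per-chunk markdown with a
/// blank line so content from different chunks stays separated.
fn append_markdown_with_separator(result: &mut String, markdown: &str) {
    if !result.is_empty() && !result.ends_with("\n\n") {
        result.push_str("\n\n");
    }
    result.push_str(markdown);
}

fn main() {
    let blocks = vec!["# Heading".to_string(), "Some paragraph text.".to_string()];
    let chunks = split_blocks_into_chunks(&blocks, 100);
    let mut combined = String::new();
    for chunk in &chunks {
        // In the real module each chunk would be sent through an LLM request
        // (`process_block_chunk`); here the text is appended unchanged.
        append_markdown_with_separator(&mut combined, &chunk.join("\n\n"));
    }
    assert!(combined.contains("# Heading"));
}
```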