Module content_cleaner


Constants§

JSON_BEGIN_MARKER 🔒
JSON markers for LLM prompt
JSON_END_MARKER 🔒
MAX_CONTEXT_UTILIZATION
Maximum percentage of the context window to use in a single request (see the sketch after this list)
MAX_CONTEXT_WINDOW
Maximum context window size for the LLM, in tokens
REQUEST_TEMPERATURE
Temperature for LLM requests, kept low for more deterministic results
SYSTEM_PROMPT 🔒
System prompt for converting course material to markdown
USER_PROMPT_START 🔒
Start of the user prompt for converting course material to markdown
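
A minimal sketch of how these constants might feed into the `calculate_safe_token_limit` function listed below; the values, the reused names, and the function signature are assumptions for illustration rather than the module's actual definitions:

```rust
// Illustrative values only; the module's real constant values are not shown on this page.
const MAX_CONTEXT_WINDOW: usize = 128_000; // assumed window size, in tokens
const MAX_CONTEXT_UTILIZATION: f64 = 0.8;  // assumed utilization, expressed here as a fraction of the window

/// Sketch of `calculate_safe_token_limit`: cap a single request at a fixed
/// share of the model's context window.
fn calculate_safe_token_limit(context_window: usize, utilization: f64) -> usize {
    (context_window as f64 * utilization) as usize
}

fn main() {
    let limit = calculate_safe_token_limit(MAX_CONTEXT_WINDOW, MAX_CONTEXT_UTILIZATION);
    assert_eq!(limit, 102_400);
}
```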

Functions§

append_markdown_with_separator
Appends markdown content to a result string with proper newline separators
calculate_safe_token_limit
Calculates the safe token limit based on the context window size and utilization cap
convert_material_blocks_to_markdown_with_llm
Cleans content by converting the material blocks to markdown using an LLM
prepare_llm_messages
Prepares messages for the LLM request (sketched after the function list)
process_block_chunk 🔒
Processes a subset of blocks in a single LLM request
process_chunks 🔒
Processes all chunks and combines the results
split_blocks_into_chunks
Splits blocks into chunks that fit within token limits (sketched after the function list)
split_oversized_block 🔒
Splits an oversized block into smaller string chunks
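
A hedged sketch of what `prepare_llm_messages` might look like, assuming it receives a chunk's content as a string; the `Message` type, the prompt wording, and the marker strings are placeholders, since the real `SYSTEM_PROMPT`, `USER_PROMPT_START`, and JSON markers are private to the module:

```rust
/// Hypothetical chat-message type standing in for whatever message struct the
/// LLM client library actually defines.
struct Message {
    role: &'static str,
    content: String,
}

/// Sketch of `prepare_llm_messages`: pair the fixed system prompt with a user
/// message that wraps the block content, serialized as JSON, between the
/// begin/end markers in the user prompt.
fn prepare_llm_messages(block_content: &str) -> Vec<Message> {
    // Placeholder prompt and marker text; the module's actual values are private.
    const SYSTEM_PROMPT: &str = "Convert the given course material to clean markdown.";
    const USER_PROMPT_START: &str = "Convert the following blocks:";
    const JSON_BEGIN_MARKER: &str = "BEGIN_JSON";
    const JSON_END_MARKER: &str = "END_JSON";

    vec![
        Message { role: "system", content: SYSTEM_PROMPT.to_string() },
        Message {
            role: "user",
            content: format!(
                "{}\n{}\n{}\n{}",
                USER_PROMPT_START, JSON_BEGIN_MARKER, block_content, JSON_END_MARKER
            ),
        },
    ]
}

fn main() {
    let messages = prepare_llm_messages(r#"{"blocks": []}"#);
    assert_eq!(messages.len(), 2);
}
```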
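
A rough sketch of the chunk-and-combine flow suggested by `split_blocks_into_chunks` and `append_markdown_with_separator`, assuming blocks are plain strings and tokens are estimated at roughly four characters each; the signatures and the token heuristic are illustrative assumptions:

```rust
/// Rough token estimate (about four characters per token); an assumption
/// standing in for whatever tokenizer or estimate the module actually uses.
fn estimate_tokens(text: &str) -> usize {
    text.len() / 4
}

/// Sketch of `split_blocks_into_chunks`: greedily pack whole blocks into
/// chunks whose combined token estimate stays under the safe limit. A single
/// block exceeding the limit on its own would go through
/// `split_oversized_block`, which this sketch omits.
fn split_blocks_into_chunks(blocks: &[String], token_limit: usize) -> Vec<Vec<String>> {
    let mut chunks: Vec<Vec<String>> = Vec::new();
    let mut current: Vec<String> = Vec::new();
    let mut current_tokens = 0;
    for block in blocks {
        let tokens = estimate_tokens(block);
        if !current.is_empty() && current_tokens + tokens > token_limit {
            chunks.push(std::mem::take(&mut current));
            current_tokens = 0;
        }
        current.push(block.clone());
        current_tokens += tokens;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

/// Sketch of `append_markdown_with_separator`: join per-chunk markdown with a
/// blank line so content from different chunks stays separated.
fn append_markdown_with_separator(result: &mut String, markdown: &str) {
    if !result.is_empty() && !result.ends_with("\n\n") {
        result.push_str("\n\n");
    }
    result.push_str(markdown);
}

fn main() {
    let blocks = vec!["# Heading".to_string(), "Some paragraph text.".to_string()];
    let chunks = split_blocks_into_chunks(&blocks, 100);
    let mut combined = String::new();
    for chunk in &chunks {
        // In the real module each chunk would be sent through an LLM request
        // (`process_block_chunk`); here the text is appended unchanged.
        append_markdown_with_separator(&mut combined, &chunk.join("\n\n"));
    }
    assert!(combined.contains("# Heading"));
}
```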