Star Coffee Writing about stuff, working in public

Starting jl-note

Starting a new project related to literate programming

Photo by Liam Gamba

Introduction

Hello and welcome back to Star Coffee. This past week (as of writing) I've started working on jl-note, a simple tool for literate programming. It's a quite simple tool, it reads in the content of a plaintext file that contains blocks of Julia code, evaluates the blocks, and writes their output or value back to the file.

This project was heavily inspired by Babel for Org which is in general the same idea. However, I had some problems trying to get Babel to work well with Julia code, and I wanted a more lightweight/universal tool than having to use Emacs.

The Code

The tool is quite simple in its current state, it's by no means optimized or works super smoothly. It also has to evaluate the code from scratch everytime it's run, which means waiting for all of the precompilation Julia needs to do (or it can be run in a Julia REPL, which removes the need for precompilation every time it's run). I'll try to tackle these problems and make it a much more streamlined tool, at which point I might open-source it.

The heavy lifting in jl-note is done by IOCapture.jl, which captures IO from evaluated Julia code and let's you access the results. At the moment, this is the only dependency for jl-note:

using IOCapture

First the script grabs the file path supplied to jl-note via the command line:

file_path = ARGS[1]

Next I defined some structs to store the contents of a file:

struct InputChunk
    content::String
    is_code::Bool
end

struct InputFile
    file_path::String
    file_contents::Vector{InputChunk}
end

The InputChunk struct stores a block of text, and denotes whether the text contains Julia code. The InputFile struct simply stores the file path and a vector of InputChunks. Next I wrote a function to parse a plaintext file and return an InputFile:

function get_file_chunks(file_path::String)::InputFile
    file_chunks = InputChunk[]

    reading_code_chunk = false
    code_chunk_start = 0
    
    current_chunk = ""

    for (i, line) in enumerate(eachline(file_path))
        if (!reading_code_chunk && !occursin("```", line)) || (reading_code_chunk && !occursin("```", line))
            if line != "\n"
                current_chunk *= line * '\n'
            end
        end

        if occursin("```", line) && !reading_code_chunk
            if current_chunk != ""
                push!(file_chunks, InputChunk(current_chunk, false))
            end

            reading_code_chunk = true
            code_chunk_start = i

            current_chunk = line * '\n'
        
        elseif reading_code_chunk && i != code_chunk_start && occursin("```", line)
            reading_code_chunk = false

            current_chunk *= line * '\n'

            if !occursin("```STDOUT", current_chunk) && !occursin("```OUTPUT", current_chunk)
                push!(file_chunks, InputChunk(current_chunk, true))
            end

            current_chunk = ""
        end
    end
    
    return InputFile(file_path, file_chunks)
end

I know the code is quite messy, but it works, optimizing comes later. This function basically goes line-by-line through the input file, and splits the file into chunks, alternating between text and code. Next I defined a CodeChunk struct to encode information about the code chunks, and I wrote a function to parse the InputFile to return a Vector{CodeChunk}:

struct CodeChunk
    properties::Vector{String}
    chunk::String
end

function get_code_chunks(input_file::InputFile)::Vector{CodeChunk}
    filtered = filter((chunk) -> chunk.is_code == true, input_file.file_contents)

    return [CodeChunk(String[], replace(chunk.content, "```julia\n" => "", "\n```" => "")) for chunk in filtered]
end

As of yet, I have not implemented any properties for the code chunks, so the properties field is unused, but I included it for now to allow for future development. Now for the interesting part, next is the function that takes a Vector{CodeChunk} and evaluates the code itself, capturing the output using IOCapture.jl:

function get_io_captures(code_chunks::Vector{CodeChunk})::Vector
    io_captures = []
    
    for chunk in code_chunks
        parsed_entry = replace(chunk.chunk, "\n\t" => "", '\n' => ';')

        io_capture = IOCapture.capture() do
            eval(Meta.parse(parsed_entry))
        end

        push!(io_captures, io_capture)
    end

    return io_captures
end

This function is pretty simple, it first converts the text into a single line and replaces any newline characters with semicolons, and then the code inside each chunk is evaluated. I do not return a concrete type from this function, because the return type from IOCapture.capture() is quite complicated and varies depending on the evaluated code. Next we have a function that takes in the InputFile and the Vector from get_io_captures, and inserts the results of the captures after each code block:

function insert_code_chunks(input_file::InputFile, code_chunks)::Vector{String}
    final_chunks = String[]
    code_counter = 1

    for chunk in input_file.file_contents
        push!(final_chunks, chunk.content)

        if chunk.is_code && code_counter <= length(code_chunks)
            code_output = ""

            if code_chunks[code_counter].output != ""
                code_output *= """\```STDOUT
                $(code_chunks[code_counter].output)
                \```
                """
            end

            if code_chunks[code_counter].value !== nothing
                code_output = """\```OUTPUT
                $(code_chunks[code_counter].value)
                \```
                """
            end

            push!(final_chunks, code_output)
            
            code_counter += 1;
        end
    end

    return final_chunks
end

This function returns a Vector{String} which contains each chunk of the original file with the results of the Julia code blocks inserted. Please note the \ characters in front of each ``` in the function above, I needed to insert the backslashes to not make Franklin.jl crash when writing this blog post. If you're using this code (please feel free to!) just remove those backslashes. Finally, we come to our last function, the one that puts all of the building blocks together and writes the result to the original file:

function run_note!(file_path::String)
    input_file = get_file_chunks(file_path)

    io_captures = get_code_chunks(input_file) |> get_io_captures

    full_string = insert_code_chunks(input_file, io_captures) |> join
    
    open(file_path, "w") do file
        write(file, full_string);
    end
end

And that's it (for now). Running the run_note! function either in the file itself or in the Julia REPL with a supplied file_path results in the script reading the file contents, evaluating any Julia code blocks, and writing the results back to the supplied file. See this tweet of mine to see it in action.

Conclusion

Thanks for reading this short post! I'm very happy with the progress I made on this project so far, and I'm excited to keep working at it in my free time. I recently found a job working at the Canadian Space Agency so I'm a bit busy these days, but I'd like to find some time every now and again to work on projects and write on this blog.

I hope you found this project interesting, feel free to use my code and play around with it. If people are interested I'll try to improve this script and open-source it, but we'll see what happens. Until next time.

CC BY-SA 4.0 Michal Jagodzinski. Last modified: May 09, 2024.
Website built with Franklin.jl and the Julia programming language.