Obsidian Automated AI Voice Workflows

4 minute read

The Problem 🤔

For a long time, I’ve wanted the ability to record voice notes on the go and access them later as text.

No, I’m not talking about simple dictation. I want something that cuts down on my distracted “umms”, “ahhs”, flight-of-ideas, and other incoherent vocalised nonsense as I drive like the incapable-of-multitasking man I am.

For a while, I used AudioPen, and it was pretty fantastic at condensing everything important into neat little summaries. The features have grown over the years, but I’m still a little wary of paying a subscription.

I dreamed a little as I entertained the thought of building something out:

I couldn’t find anything like that yet…

So I started building my own thing, again. (As if I didn’t start enough projects over my parental leave break 🤦‍♂️👶)


The Idea 💡

And so ThinkAloud.md was born!

(And at the time of writing, it is very much still a baby)

🫱 Check it out! 🫲

Here’s the gist:

  1. Audio Recording:

    • Utilise Obsidian’s built-in audio recording feature to capture voice notes directly within the app.
  2. Git Sync Backend:

    • A git push to GitHub is triggered from Obsidian using the Fit or Git extensions.
  3. GitHub Actions Workflow:

    • The push initiates a GitHub Actions workflow that processes the new voice note.
  4. Speech-to-Text Conversion:

    • The workflow sends the audio file to AssemblyAI’s API for transcription.
  5. Content Parsing and Formatting:

    • AssemblyAI’s LeMUR parses the transcribed text to extract metadata and formats the content according to predefined templates.
  6. Text-to-Speech (Optional):

    • If specified in the template, the content is sent to Eleven Labs AI for voice synthesis.
  7. Update Obsidian Note:

    • The processed content is fed back into the original note containing the voice recording.
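As a rough illustration of how steps 4 to 7 could hang together inside the workflow, here is a minimal Python sketch. It is not ThinkAloud.md's actual code. The function names (`find_audio_embeds`, `process_note`) are hypothetical, the list of audio extensions is an assumption, and the transcription, formatting and voice-synthesis services are passed in as plain callables so that no particular API is implied.

```python
import re
from typing import Callable, Optional

# Obsidian embeds attachments as ![[file name.ext]].
# The set of audio extensions below is an assumption for this sketch.
AUDIO_EMBED = re.compile(
    r"!\[\[([^\]|]+\.(?:webm|m4a|mp3|wav|ogg))\]\]", re.IGNORECASE
)


def find_audio_embeds(note_text: str) -> list[str]:
    """Return the file names of audio recordings embedded in a note."""
    return AUDIO_EMBED.findall(note_text)


def process_note(
    note_text: str,
    transcribe: Callable[[str], str],
    format_with_template: Callable[[str], str],
    synthesise: Optional[Callable[[str], None]] = None,
) -> str:
    """Run steps 4-7 for one note and return the updated note text."""
    sections = [note_text.rstrip()]
    for audio_path in find_audio_embeds(note_text):
        transcript = transcribe(audio_path)           # step 4: speech-to-text
        formatted = format_with_template(transcript)  # step 5: parse and format
        if synthesise is not None:
            synthesise(formatted)                     # step 6: optional text-to-speech
        sections.append(formatted)                    # step 7: feed back into the note
    return "\n\n".join(sections) + "\n"
```

In a real workflow the callables would wrap the speech-to-text, LLM and text-to-speech services. Keeping them as parameters also makes the flow easy to try out locally with stand-in functions.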

The Doing It 🏃

Granted, it’s a little async… I’m a bit worried that it’ll be too slow. But it’s very powerful. Anything that can run in a GitHub workflow can influence what ends up in the notes.

My personal use cases are pretty much all while I’m commuting. I like to “think out” long-form ideas on the go and review my notes quite a bit later, so provided the CI/CD workflow doesn’t fail, it should be fine for me.

If you’re after something similar and are happy with less flexibility, I’d recommend AudioPen or Obsidian Scribe.

The Templates 🍽️

Templates guide LeMUR and Eleven Labs AI in processing and formatting the content. Here’s an example of a “journal entry” template:

---
author: {{ Author }}
date: {{ Date }} 
tags: journal, diary, log

match_phrase: captains log
process_stt: ready
process_tts: ready 
---

# {{ Title }}

## In a Nutshell

*Today I was feeling...*

{{ Tone/Emotion, 3 lines }}

**Context**: {{ Context. When, where, etc. 50 words }}

{{ Provide a synopsis on the journal entry, roughly a paragraph }}

## Key points 

{{ Key points, 3 - 5 }}

## Full text 

{{ Provide verbatim transcript lightly cleaned up: "umms", "ahhs", stutters, grammatical mistakes removed etc. }}
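To make the front matter a little more concrete, here is a hedged Python sketch of how a workflow might read those fields and choose a template. It assumes that `match_phrase` is a phrase spoken in the recording that selects the template; the post does not spell that out, so treat it as a guess. The helpers `parse_front_matter` and `pick_template` are hypothetical names. A plain line-based parser is used because values such as `{{ Author }}` are not ordinary YAML scalars.

```python
import re
from typing import Optional


def parse_front_matter(template_text: str) -> dict[str, str]:
    """Read simple `key: value` pairs between the first two `---` lines."""
    lines = template_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:  # blank lines and lines without a colon are skipped
            fields[key.strip()] = value.strip()
    return fields


def _normalise(text: str) -> str:
    """Lower-case and drop punctuation so "Captain's log," matches "captains log"."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower())


def pick_template(transcript: str, templates: dict[str, str]) -> Optional[str]:
    """Return the first template name whose match_phrase occurs in the transcript.

    Assumption: match_phrase is a spoken trigger phrase.
    """
    spoken = _normalise(transcript)
    for name, text in templates.items():
        phrase = _normalise(parse_front_matter(text).get("match_phrase", ""))
        if phrase and phrase in spoken:
            return name
    return None
```

With the journal template above stored under the name `journal`, a transcript beginning "Captain's log, today I..." would select it, while a transcript without the phrase would return `None`.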

In a nutshell, the syntax is pretty much this:

I am a **static** ~~potato~~ *markdown* string, I will be completely ignored by the LLM for context's sake.

## Big Ol' MD Heading

{{ I am a placeholder string. Put instructions in me for the LLM to follow}}

// I am a comment. I will be stripped out of the final output.

Note

Not really sure about the comment syntax. Markdown is funny with comments. They’re not really part of the “spec”.
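The three pieces of that syntax (static markdown, `{{ }}` placeholders, and `//` comment lines) can be separated with a few lines of Python. This is only a sketch under assumptions: the function names are hypothetical, comments are taken to be whole lines starting with `//`, and in the real workflow the LLM fills in the placeholders rather than a list of prepared answers.

```python
import re

# Non-greedy match so each {{ ... }} pair is captured separately.
PLACEHOLDER = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)


def strip_comments(template_body: str) -> str:
    """Drop every line that starts with `//` (the provisional comment syntax)."""
    kept = [
        line
        for line in template_body.splitlines()
        if not line.lstrip().startswith("//")
    ]
    return "\n".join(kept)


def list_placeholders(template_body: str) -> list[str]:
    """Return the instruction text inside each placeholder, in order."""
    return [text.strip() for text in PLACEHOLDER.findall(template_body)]


def fill_placeholders(template_body: str, answers: list[str]) -> str:
    """Replace the n-th placeholder with the n-th answer; static markdown is untouched."""
    body = strip_comments(template_body)
    expected = len(PLACEHOLDER.findall(body))
    if expected != len(answers):
        raise ValueError(f"expected {expected} answers, got {len(answers)}")
    remaining = iter(answers)
    return PLACEHOLDER.sub(lambda _match: next(remaining), body)
```

Only lines that begin with `//` are treated as comments here, so URLs such as `https://...` inside a sentence are left alone.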

The Power ⚡️

Since the templates are processed by a GPT, you can get really creative 🎨 You can even chain additional workflows to automate even more:


Existing Solutions

While developing this workflow, I explored existing plugins that offer similar functionalities:

These plugins offer valuable features, and I could have settled for Vox or Scribe. But while I really enjoy Obsidian, I love the idea of being able to use this for any markdown notes, anywhere.

Better yet is the flexibility of being able to kick off all sorts of actions with CI/CD workflows!


Stay tuned for updates as this project progresses! ✨