Obsidian Automated AI Voice Workflows
4 minutes read
The Problem
For a long time, I've wanted the ability to record voice notes on the go and access them later as text.
No, I'm not talking about simple dictation. I want something that cuts down on my distracted "umms", "ahhs", flight-of-ideas, and other incoherent vocalised nonsense as I drive like the incapable-of-multitasking man I am.
For a while, I used AudioPen, and it was pretty fantastic at condensing everything important into neat little summaries. The features have grown over the years, but I'm still a little wary of paying a subscription.
I dreamed a little as I entertained the thought of building something out:
- It would be open-source
- Ideally extendable
- Loosely-coupled to work with a multitude of tools
- What if it could run any workflow?
I couldn't find anything like that yet…
So I started building my own thing, again. (As if I didn't start enough projects over my parental leave break 🤦‍♂️👶)
The Idea 💡
And so ThinkAloud.md was born!
(And at the time of writing, it is very much still a baby)
🫱 Check it out! 🫲
Here's the gist:
Audio Recording:
- Use Obsidian's built-in audio recording feature to capture voice notes directly within the app.
Git Sync Backend:
- Using the fit or git extensions for Obsidian, a git push to GitHub is triggered.
GitHub Actions Workflow:
- The push initiates a GitHub Actions workflow that processes the new voice note.
Speech-to-Text Conversion:
- The workflow sends the audio file to AssemblyAI's API for transcription.
Content Parsing and Formatting:
- AssemblyAI's LeMUR parses the transcribed text to extract metadata and formats the content according to predefined templates.
Text-to-Speech (Optional):
- If specified in the template, the content is sent to ElevenLabs for voice synthesis.
Update Obsidian Note:
- The processed content is fed back into the original note containing the voice recording.
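To make the final "feed it back into the note" step a bit more concrete, here is a minimal Python sketch of what a workflow step could do. It is not ThinkAloud.md's actual code. It assumes recordings are embedded with Obsidian's `![[file]]` syntax, that the listed audio extensions cover what the recorder produces, and that an HTML comment works as a "done" marker (the marker name is made up). The `transcribe` and `format_text` callables are hypothetical stand-ins for the AssemblyAI and LeMUR calls.

```python
import re
from typing import Callable

# Obsidian embeds attachments as ![[file name.ext]]. The extension list is an
# assumption about what the recorder produces on different platforms.
AUDIO_EMBED = re.compile(r"!\[\[([^\]|]+\.(?:webm|m4a|mp3|wav|ogg))\]\]", re.IGNORECASE)

# Hypothetical marker so a second workflow run does not process a note twice.
MARKER = "<!-- thinkaloud:done -->"


def find_audio_embeds(note: str) -> list[str]:
    """Return the file names of all audio recordings embedded in a note."""
    return AUDIO_EMBED.findall(note)


def process_note(
    note: str,
    transcribe: Callable[[str], str],
    format_text: Callable[[str], str],
) -> str:
    """Insert formatted text under each audio embed that is not yet marked as done."""
    lines = note.splitlines()
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        embeds = AUDIO_EMBED.findall(line)
        already_done = i + 1 < len(lines) and lines[i + 1].strip() == MARKER
        if not embeds or already_done:
            continue
        out.append(MARKER)
        for audio in embeds:
            # transcribe: audio path -> raw text; format_text: raw text -> templated text
            out.append(format_text(transcribe(audio)))
    # Preserve the trailing newline if the note had one.
    return "\n".join(out) + ("\n" if note.endswith("\n") else "")
```

Passing the two services in as plain functions keeps the note-editing logic testable without network access, and the marker makes the step safe to re-run when the same note is pushed again.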
The Doing It
Granted, it's a little async… I'm a bit worried that it'll be too slow. But it's very powerful. Anything that can run in a GitHub workflow can influence what ends up in the notes.
My personal use cases are pretty much all while I'm commuting. I like to "think out" long-form ideas on the go and review my notes quite a bit later, so provided the CI/CD workflow doesn't fail, it should be fine for me.
If you're after something similar and are happy with less flexibility, I'd recommend AudioPen or Obsidian Scribe.
The Templates 🍽️
Templates guide LeMUR and ElevenLabs in processing and formatting the content. Here's an example of a "journal entry" template:
In a nutshell, the syntax is pretty much this:
Note
Not really sure about the comment syntax. Markdown is funny with comments. They're not really part of the "spec".
The Power ⚡️
Since the templates are processed by a GPT, you can get really creative 🎨 You can even chain additional workflows to automate even more:
Clean up ad-lib artefacts: "umms" and "ahhs", mistakes, stutters, silences.
Re-record: Swap your noisy car voice-note with a professional studio-quality voice-over using ElevenLabs.
Speaker diarisation: Label each speaker uniquely for conversational dialogue, like for a podcast.
Journal: Summarise your whole day as a structured journal entry with time and place context, emotional analysis, key points, etc.
Populate Dataviews: Populate Obsidian Dataview fields, autofill tables and queries, and kick off JavaScript tasks.
Integrate with Workflows: Trigger n8n workflows.
Blogging: Create draft posts that match the structure and tone of your personal blog, or even publish directly to GitHub Pages afterwards.
Social Cross-Posting: Publish to multiple social media platforms at once, respecting their various algorithms.
BYD: Generate and publish proof-of-concept apps from spur-of-the-moment app ideas.
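As a small illustration of the "Populate Dataviews" idea, here is a hedged Python sketch that renders extracted metadata as Dataview inline fields, which use the `key:: value` form. The metadata keys shown are invented for the example, and only simple scalar values are handled. How the metadata is actually extracted is left to the template and LeMUR.

```python
def to_dataview_fields(metadata: dict[str, object]) -> str:
    """Render metadata as Dataview inline fields, one `key:: value` per line.

    Only scalar values are handled here; each value is converted with str().
    """
    return "\n".join(f"{key}:: {value}" for key, value in metadata.items())


# Hypothetical metadata, e.g. pulled out of a journal-style voice note.
fields = to_dataview_fields({"mood": "calm", "location": "car"})
```

The resulting text could be appended to the note alongside the formatted transcript so that Dataview queries can pick it up.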
Existing Solutions
While developing this workflow, I explored existing plugins that offer similar functionalities:
-
- Automatically transcribes audio notes, extracting metadata, categories, and tags.
- Frontmatter and verbatim text only, no templating or workflows.
-
- Creates high-quality text transcriptions from media files using OpenAI's Whisper.
- Verbatim text only, no templating or workflows.
-
- Records voice notes, transcribes, summarises, and enriches them with AI.
- Ask questions mid-recording and have the answers filled in automatically.
- Still no workflows.
These plugins offer valuable features, and I could have settled for Vox or Scribe. But while I really enjoy Obsidian, I love the idea of being able to use this for any markdown notes, anywhere.
Better yet is the flexibility of being able to kick off all sorts of actions with CI/CD workflows!
Stay tuned for updates as this project progresses! ✨