ObsidianMD

Unofficial Lemmy community for https://obsidian.md

Using Pandoc to export to Obsidian markdown? (lemmy.world)

Submitted 1 year ago (last edited 1 year ago).

Does anyone have a good setup/configuration for converting documents to Obsidian-flavored markdown with Pandoc? I’ve been fiddling with it for a few hours but can’t seem to get everything right:

- Obsidian markdown doesn’t support ^superscript^. I can get Pandoc to use `<sup>` tags instead by enabling the raw_html extension, but then…
- Image embeds don’t work. Pandoc wants to use `<img>` tags for some reason, and no matter what relative src I use, the image just won’t show up.
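One workaround for the `<img>` problem is a post-processing pass. The sketch below is a hypothetical helper (not from the thread) that, assuming Pandoc's output contains raw `<img src="...">` tags, rewrites each one as an Obsidian `![[...]]` embed. It URL-decodes the path, drops a leading `./` (an assumption about how your vault resolves paths), and discards size or style attributes.

```python
import re
from urllib.parse import unquote

# <img ...> tag; group 2 captures the src value in single or double quotes.
IMG_TAG = re.compile(r"""<img\b[^>]*?\bsrc=(["'])(.*?)\1[^>]*>""", re.IGNORECASE)


def img_tags_to_embeds(markdown):
    """Replace raw <img> tags with Obsidian ![[...]] embeds (other attributes are dropped)."""

    def to_embed(match):
        src = unquote(match.group(2))  # e.g. "my%20pic.png" -> "my pic.png"
        if src.startswith("./"):
            src = src[2:]  # assumption: embeds work better without a leading "./"
        return f"![[{src}]]"

    return IMG_TAG.sub(to_embed, markdown)
```

For example, `<img src="./media/image1.png" style="width:2in" />` becomes `![[media/image1.png]]`.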

I could fix all of this by running the files through a linter of some sort, but I feel like I’m missing something. Surely someone must have had these issues before me, right?

Reply (1 year ago):

I got this mostly working, but it was not easy. Not only does Obsidian have a few peculiarities that make it less compatible with standard Markdown, but Word also does a few funny things.

Here's the config.yaml I used for Pandoc:

```yaml
from: docx
to: markdown-smart-simple_tables-multiline_tables-grid_tables+pipe_tables+yaml_metadata_block-superscript-subscript-bracketed_spans-native_spans-link_attributes-raw_html+rebase_relative_paths+four_space_rule
extract-media: "./"
wrap: preserve
markdown-headings: atx
tab-stop: 2
shift-heading-level-by: 1
standalone: true
template: obsidian.md
filters:
  - compact-list.lua
  - remove-single-characters.py
  - remove-extra-linebreaks.py
metadata:
  tags: "tags/go/here"
```
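A defaults file like this is normally passed to Pandoc as `pandoc --defaults config.yaml input.docx -o output.md`. The sketch below wraps that call to batch-convert a folder. The `pandoc_command` and `convert_folder` names are hypothetical, and it assumes pandoc is on your PATH and that you run it from the directory holding `config.yaml`, the template, and the filter scripts.

```python
import subprocess
from pathlib import Path


def pandoc_command(docx, defaults):
    """Build the pandoc call for one .docx, writing a .md with the same stem."""
    docx = Path(docx)
    return ["pandoc", "--defaults", str(defaults), str(docx), "-o", str(docx.with_suffix(".md"))]


def convert_folder(folder, defaults="config.yaml"):
    """Run pandoc on every .docx directly inside folder (assumes pandoc is on PATH)."""
    for docx in sorted(Path(folder).glob("*.docx")):
        # check=True raises CalledProcessError if pandoc exits with an error.
        subprocess.run(pandoc_command(docx, defaults), check=True)
```

Because the defaults file sets no output file, the `-o` argument decides where each `.md` ends up; here it sits next to its source document.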

What the three filters do, in the order listed above:

- Removes the extra blank lines added between items in bulleted lists, making the lists more compact.
- Removes lines containing only a single character, usually an invisible one such as a non-breaking space (nbsp), which kept Pandoc from removing them automatically.
- Removes line breaks wrapped in Strong elements. This is a Word artifact: when an empty line is bolded, the line break itself is technically bold.
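The filter scripts themselves aren't included in the thread. As a rough illustration only, here is a hypothetical standard-library Python JSON filter that approximates the three behaviors described: it drops paragraphs holding a single character, drops Strong elements containing only breaks or spaces, and turns the first Para of each bullet-list item into Plain (which is how Pandoc marks tight lists). It assumes the JSON AST layout of recent Pandoc versions and is not the commenter's code.

```python
#!/usr/bin/env python3
"""Hypothetical Pandoc JSON filter approximating the three clean-ups.

Pandoc sends the document AST as JSON on stdin and reads the modified AST
back from stdout. Standard library only.
"""
import json
import sys

# Inline elements that carry no visible text.
BLANK_INLINES = {"LineBreak", "SoftBreak", "Space"}


def is_empty_strong(node):
    """Strong element wrapping only breaks/spaces (the bolded-empty-line Word artifact)."""
    return (
        isinstance(node, dict)
        and node.get("t") == "Strong"
        and all(child.get("t") in BLANK_INLINES for child in node.get("c", []))
    )


def is_single_char_para(node):
    """Paragraph whose only content is a one-character string, e.g. a lone nbsp."""
    if not (isinstance(node, dict) and node.get("t") == "Para"):
        return False
    inlines = node.get("c", [])
    return len(inlines) == 1 and inlines[0].get("t") == "Str" and len(inlines[0]["c"]) == 1


def tighten(node):
    """Make bullet lists compact: a Plain (not Para) first block marks a tight item."""
    if isinstance(node, dict) and node.get("t") == "BulletList":
        for item in node["c"]:
            if item and item[0].get("t") == "Para":
                item[0]["t"] = "Plain"
    return node


def walk(value):
    """Recursively drop unwanted nodes from every list, then tighten bullet lists."""
    if isinstance(value, list):
        kept = [v for v in value if not (is_empty_strong(v) or is_single_char_para(v))]
        return [walk(v) for v in kept]
    if isinstance(value, dict):
        return tighten({key: walk(val) for key, val in value.items()})
    return value


def main():
    doc = json.load(sys.stdin)
    doc["blocks"] = walk(doc["blocks"])
    json.dump(doc, sys.stdout)


# Pandoc runs JSON filters with the output format as the first argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

The `len(sys.argv) > 1` check means the module only reads stdin when Pandoc invokes it as a filter, so the functions can also be imported and tested on their own. Note that the single-character rule also drops a paragraph that legitimately contains just "a" or "1".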

I then ran the resulting file through a regex replacement to change the superscript carets into HTML `<sup>` tags.
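The exact regular expression isn't given. A minimal Python version might look like the sketch below. It handles both `^text^` and `^(text)` forms, since which one Pandoc writes depends on the version and enabled extensions (an assumption worth checking against your own output), and it excludes `[` and `]` so footnote markers such as `[^1]` are left alone.

```python
import re

# Either ^(text) or ^text^ with no spaces, brackets, or parentheses inside.
SUPERSCRIPT = re.compile(r"\^\(([^()\n]*)\)|\^([^^\s\[\]()]+)\^")


def carets_to_sup(markdown):
    """Rewrite caret-style superscripts as HTML <sup> tags, which Obsidian renders."""

    def to_sup(match):
        inner = match.group(1) if match.group(1) is not None else match.group(2)
        return f"<sup>{inner}</sup>"

    return SUPERSCRIPT.sub(to_sup, markdown)
```

For example, `E = mc^2^` becomes `E = mc<sup>2</sup>`, while an unpaired caret as in `x^2 + y^2` is left unchanged.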

Even after all this, I still have to go through the files with an Obsidian plugin to convert the standard Markdown links and embeds into [[Wikilink]] style, since Obsidian will only use one style or the other throughout your whole vault.
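The plugin isn't named in the thread. If you'd rather script this step, here is a hypothetical regex-based sketch (not the plugin's method) that rewrites local embeds `![alt](path)` as `![[path]]` and local links `[text](Note.md)` as `[[Note|text]]`, leaving external URLs untouched. It assumes link targets contain no raw spaces or parentheses (percent-encoded spaces such as `%20` are decoded), and it drops image alt text.

```python
import re
from urllib.parse import unquote

# ![alt](target) embeds and [text](target) links; targets without spaces or ")".
EMBED = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")
LINK = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")
# A URL scheme such as "https:" or "mailto:" marks an external target.
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def to_wikilinks(markdown):
    """Convert local Markdown embeds and links to Obsidian [[wikilink]] syntax."""

    def embed(m):
        target = m.group(2)
        if SCHEME.match(target):
            return m.group(0)  # leave remote images alone
        return f"![[{unquote(target)}]]"  # alt text is dropped

    def link(m):
        text, target = m.group(1), m.group(2)
        if SCHEME.match(target):
            return m.group(0)
        path, hash_, heading = unquote(target).partition("#")
        if path.endswith(".md"):
            path = path[:-3]  # wikilinks omit the .md extension
        note = path + hash_ + heading
        return f"[[{note}]]" if note == text else f"[[{note}|{text}]]"

    # Embeds first, so the link pattern never sees the "![...](...)" form.
    return LINK.sub(link, EMBED.sub(embed, markdown))
```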

Reply (1 year ago):

Not sure about a specific plugin, but couldn't you sed your way out of it?

Reply (1 year ago):

I’ve done something like this when converting HTML to Obsidian markdown. I asked GPT-3.5 specifically what I needed to accomplish and went from there. If you can’t handle a formatting quirk in the conversion itself, you can run further passes over the files afterward. I’ve done something similar with BBEdit and VS Code, basically using find-and-replace across a lot of documents.
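The same find-and-replace-across-many-files approach can be scripted. This is a minimal sketch with a hypothetical `RULES` list (trim trailing whitespace, collapse runs of blank lines) standing in for whatever fixes your files actually need. It rewrites files in place, so run it on a copy of the vault first.

```python
import re
from pathlib import Path

# Hypothetical clean-up rules: (compiled pattern, replacement), applied in order.
RULES = [
    (re.compile(r"[ \t]+$", re.MULTILINE), ""),  # strip trailing spaces/tabs
    (re.compile(r"\n{3,}"), "\n\n"),  # collapse 2+ blank lines into one
]


def apply_rules(text, rules=RULES):
    """Apply each find-and-replace rule to text in sequence."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


def fix_folder(folder, rules=RULES):
    """Rewrite every .md file under folder in place; return the paths that changed."""
    changed = []
    for path in sorted(Path(folder).rglob("*.md")):
        original = path.read_text(encoding="utf-8")
        updated = apply_rules(original, rules)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Keeping the rules as data makes it easy to add a new pass (for example, a superscript or wikilink rewrite) without touching the file-handling loop.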

Reply to the comment above (1 year ago):

Oh wait, I think you specifically want to use Pandoc. My bad.