I got this mostly working, but it was not easy. Not only does Obsidian have a few peculiarities that make it less compatible with standard Markdown, but Word also does a few funny things.
Here's the config.yaml I used for Pandoc:
from: docx
to: markdown-smart-simple_tables-multiline_tables-grid_tables+pipe_tables+yaml_metadata_block-superscript-subscript-bracketed_spans-native_spans-link_attributes-raw_html+rebase_relative_paths+four_space_rule
extract-media: "./"
wrap: preserve
markdown-headings: atx
tab-stop: 2
shift-heading-level-by: 1
standalone: true
template: obsidian.md
filters:
- compact-list.lua
- remove-single-characters.py
- remove-extra-linebreaks.py
metadata:
  tags: "tags/go/here"
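For anyone wanting to reproduce this, here is a minimal sketch of how a config like this could be applied to a folder of Word files. It assumes the file is saved as config.yaml and passed to Pandoc as a defaults file via `--defaults`; the function names and the choice to write each .md next to its .docx are illustrative assumptions, not part of the setup described here.

```python
import subprocess
from pathlib import Path


def pandoc_command(docx, defaults="config.yaml"):
    """Build the pandoc call for one Word file (hypothetical helper).

    The Markdown output is written next to the input, with a .md suffix.
    """
    docx = Path(docx)
    return [
        "pandoc",
        "--defaults", str(defaults),  # from/to, filters, template come from here
        str(docx),
        "--output", str(docx.with_suffix(".md")),
    ]


def convert_all(folder="."):
    """Convert every .docx in a folder; stops on the first pandoc error."""
    for docx in sorted(Path(folder).glob("*.docx")):
        subprocess.run(pandoc_command(docx), check=True)
```

Calling `convert_all()` from the folder that holds config.yaml, the template, and the filters keeps the relative paths in the config valid.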
The three filters:
- Remove the extra linebreaks added between bulleted list items, to make the lists more compact.
- Remove lines with only a single character in them. This is usually an invisible character like nbsp, which kept Pandoc from removing those lines automatically.
- Remove linebreaks enclosed in Strong tags. This is an artifact from Word where a line is bolded but has no content: technically the line break is bolded.
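The filter scripts themselves are not shown, so the following is only a rough sketch of how the two Python ones might work as a single Pandoc JSON filter. It uses only the standard library and walks the JSON AST that Pandoc pipes to filters. The function names are made up for illustration, and treating "a line with a single character" as a paragraph made of one character, and "a bolded line break" as a Strong node holding only breaks, are both assumptions.

```python
import json
import sys


def is_single_character(block):
    """True for a paragraph whose whole content is one character (e.g. a lone nbsp)."""
    if block.get("t") not in ("Para", "Plain"):
        return False
    inlines = block["c"]
    # Only look at paragraphs made of plain text and spaces.
    if any(node["t"] not in ("Str", "Space") for node in inlines):
        return False
    text = "".join(node.get("c", " ") for node in inlines)  # Space has no "c"
    return len(text) == 1


def is_empty_strong(inline):
    """True for bold markup that wraps nothing but breaks or spaces."""
    return inline.get("t") == "Strong" and all(
        child["t"] in ("LineBreak", "SoftBreak", "Space") for child in inline["c"]
    )


def clean(node):
    """Return a copy of the AST without the two kinds of Word debris."""
    if isinstance(node, list):
        kept = []
        for item in node:
            if isinstance(item, dict) and (
                is_single_character(item) or is_empty_strong(item)
            ):
                continue  # drop the node entirely
            kept.append(clean(item))
        return kept
    if isinstance(node, dict):
        return {key: clean(value) for key, value in node.items()}
    return node


def main():
    # Pandoc sends the document as JSON on stdin and reads it back from stdout.
    json.dump(clean(json.load(sys.stdin)), sys.stdout)
```

To use it as an actual filter, the script would end with a call to `main()` and be listed under `filters:` like the others.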
I then ran the resulting file through a RegExp replacement to change the superscript carets into HTML sup tags.
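The exact pattern isn't given, so here is a hedged sketch of what such a replacement could look like. It assumes the superscripts show up as either `^(text)` or `^text^`, which may not match every Pandoc configuration; the function name is hypothetical.

```python
import re

# Assumed caret forms: ^(text) and ^text^ (no spaces inside the paired form).
PAREN_FORM = re.compile(r"\^\(([^()\n]+)\)")
PAIRED_FORM = re.compile(r"\^([^\s^]+)\^")


def carets_to_sup(text: str) -> str:
    """Rewrite caret-style superscripts as HTML <sup> tags."""
    text = PAREN_FORM.sub(r"<sup>\1</sup>", text)
    return PAIRED_FORM.sub(r"<sup>\1</sup>", text)


print(carets_to_sup("E = mc^2^ and the 1^(st) draft"))
# prints: E = mc<sup>2</sup> and the 1<sup>st</sup> draft
```

A lone caret, as in "a^b and c", is left alone because the paired pattern does not allow whitespace before the closing caret.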
Even after all this, I still have to run an Obsidian plugin to convert the standard Markdown links and embeds into [[Wikilink]] style, since Obsidian will only use one style or the other throughout your whole vault.
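If you would rather script that last step than use a plugin, something along these lines might work. This is a swapped-in alternative, not the plugin's behavior: the regex only handles simple `[label](target)` links without titles or nested brackets, external URLs are left untouched, and image alt text is dropped because Obsidian uses the part after `|` in an embed for sizing. All names are illustrative.

```python
import re
from urllib.parse import unquote

# Simple inline links and embeds only: optional "!", [label], (target without spaces).
LINK = re.compile(r"(!?)\[([^\]]*)\]\(([^)\s]+)\)")
URL_SCHEME = re.compile(r"[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def to_wikilink(match):
    bang, label, target = match.groups()
    if URL_SCHEME.match(target):
        return match.group(0)  # external link: keep standard Markdown
    target = unquote(target)  # My%20Note.md -> My Note.md
    if bang:
        return f"![[{target}]]"  # embed; alt text is dropped (assumption)
    note = target[:-3] if target.endswith(".md") else target
    if label in ("", note):
        return f"[[{note}]]"
    return f"[[{note}|{label}]]"


def convert_links(text: str) -> str:
    """Convert internal Markdown links and embeds to wikilink style."""
    return LINK.sub(to_wikilink, text)
```

For example, `convert_links("![](media/image1.png)")` gives `![[media/image1.png]]`, and `[my note](notes/My%20Note.md)` becomes `[[notes/My Note|my note]]`.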