Merge branch 'develop'

bump to version v1.2.0
Add basic type stubs (#221 ) (#215 )
2025-08-09 19:41:10 +02:00 · 2025-08-09 19:40:43 +02:00 · 2025-08-03 06:35:46 -04:00 · 2025-08-03 06:24:28 -04:00 · 2025-07-14 21:52:04 +02:00 · 2025-07-09 22:08:47 +02:00
12 changed files with 379 additions and 44 deletions
--- a/.github/workflows/python-app.yml
+++ b/.github/workflows/python-app.yml
@@ -15,7 +15,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
-    - uses: actions/checkout@v2
+    - uses: actions/checkout@v4
    - name: Set up Python 3.8
      uses: actions/setup-python@v2
      with:
@@ -30,3 +30,22 @@ jobs:
    - name: Build
      run: |
        python -m build -nwsx .
  types:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.8
      uses: actions/setup-python@v2
      with:
        python-version: 3.8
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install --upgrade setuptools setuptools_scm wheel build tox mypy types-beautifulsoup4
    - name: Check types
      run: |
        mypy .
        mypy --strict tests/types.py
--- a/.github/workflows/python-publish.yml
+++ b/.github/workflows/python-publish.yml
@@ -13,7 +13,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
-    - uses: actions/checkout@v2
+    - uses: actions/checkout@v4
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
--- a/README.rst
+++ b/README.rst
@@ -110,7 +110,7 @@ code_language_callback
  When the HTML code contains ``pre`` tags that in some way provide the code
  language, for example as class, this callback can be used to extract the
  language from the tag and prefix it to the converted ``pre`` tag.
-  The callback gets one single argument, an BeautifylSoup object, and returns
+  The callback gets one single argument, a BeautifulSoup object, and returns
  a string containing the code language, or ``None``.
  An example to use the class name as code language could be::
@@ -157,6 +157,23 @@ strip_document
  within the document are unaffected.
  Defaults to ``STRIP``.
 strip_pre
  Controls whether leading/trailing blank lines are removed from ``<pre>``
  tags. Supported values are ``STRIP`` (all leading/trailing blank lines),
  ``STRIP_ONE`` (one leading/trailing blank line), and ``None`` (neither).
  Defaults to ``STRIP``.
 bs4_options
  Specify additional configuration options for the ``BeautifulSoup`` object
  used to interpret the HTML markup. String and list values (such as ``lxml``
  or ``html5lib``) are treated as ``features`` arguments to control parser
  selection. Dictionary values (such as ``{"from_encoding": "iso-8859-8"}``)
  are treated as full kwargs to be used for the BeautifulSoup constructor,
  allowing specification of any parameter. For parameter details, see the
  Beautiful Soup documentation at:
 .. _BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
 Options may be specified as kwargs to the ``markdownify`` function, or as a
 nested ``Options`` class in ``MarkdownConverter`` subclasses.
--- a/markdownify/init.py
+++ b/markdownify/init.py
@@ -11,6 +11,10 @@ re_whitespace = re.compile(r'[\t ]+')
 re_all_whitespace = re.compile(r'[\t \r\n]+')
 re_newline_whitespace = re.compile(r'[\t \r\n]*[\r\n][\t \r\n]*')
 re_html_heading = re.compile(r'h(\d+)')
 re_pre_lstrip1 = re.compile(r'^ *\n')
 re_pre_rstrip1 = re.compile(r'\n *$')
 re_pre_lstrip = re.compile(r'^[ \n]*\n')
 re_pre_rstrip = re.compile(r'[ \n]*$')
 # Pattern for creating convert_<tag> function names from tag names
 re_make_convert_fn_name = re.compile(r'[\[\]:-]')
@@ -37,6 +41,9 @@ re_escape_misc_hashes = re.compile(r'(\s|^)(#{1,6}(?:\s|$))')
 # confused with a list item
 re_escape_misc_list_items = re.compile(r'((?:\s|^)[0-9]{1,9})([.)](?:\s|$))')
 # Find consecutive backtick sequences in a string
 re_backtick_runs = re.compile(r'`+')
 # Heading styles
 ATX = 'atx'
 ATX_CLOSED = 'atx_closed'
@@ -51,10 +58,25 @@ BACKSLASH = 'backslash'
 ASTERISK = '*'
 UNDERSCORE = '_'
-# Document strip styles
+# Document/pre strip styles
 LSTRIP = 'lstrip'
 RSTRIP = 'rstrip'
 STRIP = 'strip'
 STRIP_ONE = 'strip_one'
 def strip1_pre(text):
    """Strip one leading and trailing newline from a <pre> string."""
    text = re_pre_lstrip1.sub('', text)
    text = re_pre_rstrip1.sub('', text)
    return text
 def strip_pre(text):
    """Strip all leading and trailing newlines from a <pre> string."""
    text = re_pre_lstrip.sub('', text)
    text = re_pre_rstrip.sub('', text)
    return text
 def chomp(text):
@@ -106,6 +128,7 @@ def should_remove_whitespace_inside(el):
    return el.name in ('p', 'blockquote',
                       'article', 'div', 'section',
                       'ol', 'ul', 'li',
                       'dl', 'dt', 'dd',
                       'table', 'thead', 'tbody', 'tfoot',
                       'tr', 'td', 'th')
@@ -153,6 +176,7 @@ def _next_block_content_sibling(el):
 class MarkdownConverter(object):
    class DefaultOptions:
        autolinks = True
        bs4_options = 'html.parser'
        bullets = '*+-'  # An iterable of bullet types.
        code_language = ''
        code_language_callback = None
@@ -166,6 +190,7 @@ class MarkdownConverter(object):
        newline_style = SPACES
        strip = None
        strip_document = STRIP
        strip_pre = STRIP
        strong_em_symbol = ASTERISK
        sub_symbol = ''
        sup_symbol = ''
@@ -186,11 +211,15 @@ class MarkdownConverter(object):
            raise ValueError('You may specify either tags to strip or tags to'
                             ' convert, but not both.')
        # If a string or list is passed to bs4_options, assume it is a 'features' specification
        if not isinstance(self.options['bs4_options'], dict):
            self.options['bs4_options'] = {'features': self.options['bs4_options']}
        # Initialize the conversion function cache
        self.convert_fn_cache = {}
    def convert(self, html):
-        soup = BeautifulSoup(html, 'html.parser')
+        soup = BeautifulSoup(html, **self.options['bs4_options'])
        return self.convert_soup(soup)
    def convert_soup(self, soup):
@@ -361,16 +390,20 @@ class MarkdownConverter(object):
        if not self.should_convert_tag(tag_name):
            return None
-        # Handle headings with _convert_hn() function
+        # Look for an explicitly defined conversion function by tag name first
        convert_fn_name = "convert_%s" % re_make_convert_fn_name.sub("_", tag_name)
        convert_fn = getattr(self, convert_fn_name, None)
        if convert_fn:
            return convert_fn
        # If tag is any heading, handle with convert_hN() function
        match = re_html_heading.match(tag_name)
        if match:
-            n = int(match.group(1))
+            n = int(match.group(1))  # get value of N from <hN>
-            return lambda el, text, parent_tags: self._convert_hn(n, el, text, parent_tags)
+            return lambda el, text, parent_tags: self.convert_hN(n, el, text, parent_tags)
-        # For other tags, look up their conversion function by tag name
+        # No conversion function was found
-        convert_fn_name = "convert_%s" % re_make_convert_fn_name.sub('_', tag_name)
+        return None
        convert_fn = getattr(self, convert_fn_name, None)
        return convert_fn
    def should_convert_tag(self, tag):
        """Given a tag name, return whether to convert based on strip/convert options."""
@@ -442,7 +475,7 @@ class MarkdownConverter(object):
    def convert_br(self, el, text, parent_tags):
        if '_inline' in parent_tags:
-            return ""
+            return ' '
        if self.options['newline_style'].lower() == BACKSLASH:
            return '\\\n'
@@ -450,10 +483,24 @@ class MarkdownConverter(object):
            return '  \n'
    def convert_code(self, el, text, parent_tags):
-        if 'pre' in parent_tags:
+        if '_noformat' in parent_tags:
            return text
-        converter = abstract_inline_conversion(lambda self: '`')
+
-        return converter(self, el, text, parent_tags)
+        prefix, suffix, text = chomp(text)
        if not text:
            return ''
        # Find the maximum number of consecutive backticks in the text, then
        # delimit the code span with one more backtick than that
        max_backticks = max((len(match) for match in re.findall(re_backtick_runs, text)), default=0)
        markup_delimiter = '`' * (max_backticks + 1)
        # If the maximum number of backticks is greater than zero, add a space
        # to avoid interpretation of inside backticks as literals
        if max_backticks > 0:
            text = " " + text + " "
        return '%s%s%s%s%s' % (prefix, markup_delimiter, text, markup_delimiter, suffix)
    convert_del = abstract_inline_conversion(lambda self: '~~')
@@ -489,6 +536,11 @@ class MarkdownConverter(object):
        return '%s\n' % text
    # definition lists are formatted as follows:
    #   https://pandoc.org/MANUAL.html#definition-lists
    #   https://michelf.ca/projects/php-markdown/extra/#def-list
    convert_dl = convert_div
    def convert_dt(self, el, text, parent_tags):
        # remove newlines from term text
        text = (text or '').strip()
@@ -501,14 +553,14 @@ class MarkdownConverter(object):
        # TODO - format consecutive <dt> elements as directly adjacent lines):
        #   https://michelf.ca/projects/php-markdown/extra/#def-list
-        return '\n%s\n' % text
+        return '\n\n%s\n' % text
-    def _convert_hn(self, n, el, text, parent_tags):
+    def convert_hN(self, n, el, text, parent_tags):
-        """ Method name prefixed with _ to prevent <hn> to call this """
+        # convert_hN() converts <hN> tags, where N is any integer
        if '_inline' in parent_tags:
            return text
-        # prevent MemoryErrors in case of very large n
+        # Markdown does not support heading depths of n > 6
        n = max(1, min(6, n))
        style = self.options['heading_style'].lower()
@@ -538,6 +590,24 @@ class MarkdownConverter(object):
        return '![%s](%s%s)' % (alt, src, title_part)
    def convert_video(self, el, text, parent_tags):
        if ('_inline' in parent_tags
                and el.parent.name not in self.options['keep_inline_images_in']):
            return text
        src = el.attrs.get('src', None) or ''
        if not src:
            sources = el.find_all('source', attrs={'src': True})
            if sources:
                src = sources[0].attrs.get('src', None) or ''
        poster = el.attrs.get('poster', None) or ''
        if src and poster:
            return '[![%s](%s)](%s)' % (text, poster, src)
        if src:
            return '[%s](%s)' % (text, src)
        if poster:
            return '![%s](%s)' % (text, poster)
        return text
    def convert_list(self, el, text, parent_tags):
        # Converting a list to inline is undefined.
@@ -623,8 +693,20 @@ class MarkdownConverter(object):
        if self.options['code_language_callback']:
            code_language = self.options['code_language_callback'](el) or code_language
        if self.options['strip_pre'] == STRIP:
            text = strip_pre(text)  # remove all leading/trailing newlines
        elif self.options['strip_pre'] == STRIP_ONE:
            text = strip1_pre(text)  # remove one leading/trailing newline
        elif self.options['strip_pre'] is None:
            pass  # leave leading and trailing newlines as-is
        else:
            raise ValueError('Invalid value for strip_pre: %s' % self.options['strip_pre'])
        return '\n\n```%s\n%s\n```\n\n' % (code_language, text)
    def convert_q(self, el, text, parent_tags):
        return '"' + text + '"'
    def convert_script(self, el, text, parent_tags):
        return ''
@@ -653,13 +735,13 @@ class MarkdownConverter(object):
    def convert_td(self, el, text, parent_tags):
        colspan = 1
        if 'colspan' in el.attrs and el['colspan'].isdigit():
-            colspan = int(el['colspan'])
+            colspan = max(1, min(1000, int(el['colspan'])))
        return ' ' + text.strip().replace("\n", " ") + ' |' * colspan
    def convert_th(self, el, text, parent_tags):
        colspan = 1
        if 'colspan' in el.attrs and el['colspan'].isdigit():
-            colspan = int(el['colspan'])
+            colspan = max(1, min(1000, int(el['colspan'])))
        return ' ' + text.strip().replace("\n", " ") + ' |' * colspan
    def convert_tr(self, el, text, parent_tags):
@@ -677,6 +759,12 @@ class MarkdownConverter(object):
        )
        overline = ''
        underline = ''
        full_colspan = 0
        for cell in cells:
            if 'colspan' in cell.attrs and cell['colspan'].isdigit():
                full_colspan += max(1, min(1000, int(cell['colspan'])))
            else:
                full_colspan += 1
        if ((is_headrow
             or (is_head_row_missing
                 and self.options['table_infer_header']))
@@ -685,12 +773,6 @@ class MarkdownConverter(object):
            # - is headline or
            # - headline is missing and header inference is enabled
            # print headline underline
            full_colspan = 0
            for cell in cells:
                if 'colspan' in cell.attrs and cell['colspan'].isdigit():
                    full_colspan += int(cell["colspan"])
                else:
                    full_colspan += 1
            underline += '| ' + ' | '.join(['---'] * full_colspan) + ' |' + '\n'
        elif ((is_head_row_missing
               and not self.options['table_infer_header'])
@@ -703,8 +785,8 @@ class MarkdownConverter(object):
            #  - the parent is table or
            #  - the parent is tbody at the beginning of a table.
            # print empty headline above this row
-            overline += '| ' + ' | '.join([''] * len(cells)) + ' |' + '\n'
+            overline += '| ' + ' | '.join([''] * full_colspan) + ' |' + '\n'
-            overline += '| ' + ' | '.join(['---'] * len(cells)) + ' |' + '\n'
+            overline += '| ' + ' | '.join(['---'] * full_colspan) + ' |' + '\n'
        return overline + '|' + text + '\n' + underline
--- a/markdownify/init.pyi
+++ b/markdownify/init.pyi
@@ -0,0 +1,77 @@
 from _typeshed import Incomplete
 from typing import Callable, Union
 ATX: str
 ATX_CLOSED: str
 UNDERLINED: str
 SETEXT = UNDERLINED
 SPACES: str
 BACKSLASH: str
 ASTERISK: str
 UNDERSCORE: str
 LSTRIP: str
 RSTRIP: str
 STRIP: str
 STRIP_ONE: str
 def markdownify(
    html: str,
    autolinks: bool = ...,
    bs4_options: str = ...,
    bullets: str = ...,
    code_language: str = ...,
    code_language_callback: Union[Callable[[Incomplete], Union[str, None]], None] = ...,
    convert: Union[list[str], None] = ...,
    default_title: bool = ...,
    escape_asterisks: bool = ...,
    escape_underscores: bool = ...,
    escape_misc: bool = ...,
    heading_style: str = ...,
    keep_inline_images_in: list[str] = ...,
    newline_style: str = ...,
    strip: Union[list[str], None] = ...,
    strip_document: Union[str, None] = ...,
    strip_pre: str = ...,
    strong_em_symbol: str = ...,
    sub_symbol: str = ...,
    sup_symbol: str = ...,
    table_infer_header: bool = ...,
    wrap: bool = ...,
    wrap_width: int = ...,
 ) -> str: ...
 class MarkdownConverter:
    def __init__(
        self,
        autolinks: bool = ...,
        bs4_options: str = ...,
        bullets: str = ...,
        code_language: str = ...,
        code_language_callback: Union[Callable[[Incomplete], Union[str, None]], None] = ...,
        convert: Union[list[str], None] = ...,
        default_title: bool = ...,
        escape_asterisks: bool = ...,
        escape_underscores: bool = ...,
        escape_misc: bool = ...,
        heading_style: str = ...,
        keep_inline_images_in: list[str] = ...,
        newline_style: str = ...,
        strip: Union[list[str], None] = ...,
        strip_document: Union[str, None] = ...,
        strip_pre: str = ...,
        strong_em_symbol: str = ...,
        sub_symbol: str = ...,
        sup_symbol: str = ...,
        table_infer_header: bool = ...,
        wrap: bool = ...,
        wrap_width: int = ...,
    ) -> None:
        ...
    def convert(self, html: str) -> str:
        ...
    def convert_soup(self, soup: Incomplete) -> str:
        ...
--- a/markdownify/main.py
+++ b/markdownify/main.py
@@ -55,7 +55,9 @@ def main(argv=sys.argv[1:]):
    parser.add_argument('--no-escape-underscores', dest='escape_underscores',
                        action='store_false',
                        help="Do not escape '_' to '\\_' in text.")
-    parser.add_argument('-i', '--keep-inline-images-in', nargs='*',
+    parser.add_argument('-i', '--keep-inline-images-in',
                        default=[],
                        nargs='*',
                        help="Images are converted to their alt-text when the images are "
                        "located inside headlines or table cells. If some inline images "
                        "should be converted to markdown images instead, this option can "
@@ -68,6 +70,11 @@ def main(argv=sys.argv[1:]):
    parser.add_argument('-w', '--wrap', action='store_true',
                        help="Wrap all text paragraphs at --wrap-width characters.")
    parser.add_argument('--wrap-width', type=int, default=80)
    parser.add_argument('--bs4-options',
                        default='html.parser',
                        help="Specifies the parser that BeautifulSoup should use to parse "
                             "the HTML markup. Examples include 'html5.parser', 'lxml', and "
                             "'html5lib'.")
    args = parser.parse_args(argv)
    print(markdownify(**vars(args)))
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "markdownify"
-version = "1.0.0"
+version = "1.2.0"
 authors = [{name = "Matthew Tretter", email = "m@tthewwithanm.com"}]
 description = "Convert HTML to markdown."
 readme = "README.rst"
--- a/tests/test_args.py
+++ b/tests/test_args.py
@@ -2,7 +2,7 @@
 Test whitelisting/blacklisting of specific tags.
 """
-from markdownify import markdownify, LSTRIP, RSTRIP, STRIP
+from markdownify import markdownify, LSTRIP, RSTRIP, STRIP, STRIP_ONE
 from .utils import md
@@ -32,3 +32,16 @@ def test_strip_document():
    assert markdownify("<p>Hello</p>", strip_document=RSTRIP) == "\n\nHello"
    assert markdownify("<p>Hello</p>", strip_document=STRIP) == "Hello"
    assert markdownify("<p>Hello</p>", strip_document=None) == "\n\nHello\n\n"
 def test_strip_pre():
    assert markdownify("<pre>  \n  \n  Hello  \n  \n  </pre>") == "```\n  Hello\n```"
    assert markdownify("<pre>  \n  \n  Hello  \n  \n  </pre>", strip_pre=STRIP) == "```\n  Hello\n```"
    assert markdownify("<pre>  \n  \n  Hello  \n  \n  </pre>", strip_pre=STRIP_ONE) == "```\n  \n  Hello  \n  \n```"
    assert markdownify("<pre>  \n  \n  Hello  \n  \n  </pre>", strip_pre=None) == "```\n  \n  \n  Hello  \n  \n  \n```"
 def bs4_options():
    assert markdownify("<p>Hello</p>", bs4_options="html.parser") == "Hello"
    assert markdownify("<p>Hello</p>", bs4_options=["html.parser"]) == "Hello"
    assert markdownify("<p>Hello</p>", bs4_options={"features": "html.parser"}) == "Hello"
--- a/tests/test_conversions.py
+++ b/tests/test_conversions.py
@@ -79,6 +79,8 @@ def test_blockquote_nested():
 def test_br():
    assert md('a<br />b<br />c') == 'a  \nb  \nc'
    assert md('a<br />b<br />c', newline_style=BACKSLASH) == 'a\\\nb\\\nc'
    assert md('<h1>foo<br />bar</h1>', heading_style=ATX) == '\n\n# foo bar\n\n'
    assert md('<td>foo<br />bar</td>', heading_style=ATX) == ' foo bar |'
 def test_code():
@@ -99,16 +101,19 @@ def test_code():
    assert md('<code>foo<s> bar </s>baz</code>') == '`foo bar baz`'
    assert md('<code>foo<sup>bar</sup>baz</code>', sup_symbol='^') == '`foobarbaz`'
    assert md('<code>foo<sub>bar</sub>baz</code>', sub_symbol='^') == '`foobarbaz`'
    assert md('foo<code>`bar`</code>baz') == 'foo`` `bar` ``baz'
    assert md('foo<code>``bar``</code>baz') == 'foo``` ``bar`` ```baz'
    assert md('foo<code> `bar` </code>baz') == 'foo `` `bar` `` baz'
 def test_dl():
-    assert md('<dl><dt>term</dt><dd>definition</dd></dl>') == '\nterm\n:   definition\n'
+    assert md('<dl><dt>term</dt><dd>definition</dd></dl>') == '\n\nterm\n:   definition\n\n'
-    assert md('<dl><dt><p>te</p><p>rm</p></dt><dd>definition</dd></dl>') == '\nte rm\n:   definition\n'
+    assert md('<dl><dt><p>te</p><p>rm</p></dt><dd>definition</dd></dl>') == '\n\nte rm\n:   definition\n\n'
-    assert md('<dl><dt>term</dt><dd><p>definition-p1</p><p>definition-p2</p></dd></dl>') == '\nterm\n:   definition-p1\n\n    definition-p2\n'
+    assert md('<dl><dt>term</dt><dd><p>definition-p1</p><p>definition-p2</p></dd></dl>') == '\n\nterm\n:   definition-p1\n\n    definition-p2\n\n'
-    assert md('<dl><dt>term</dt><dd><p>definition 1</p></dd><dd><p>definition 2</p></dd></dl>') == '\nterm\n:   definition 1\n:   definition 2\n'
+    assert md('<dl><dt>term</dt><dd><p>definition 1</p></dd><dd><p>definition 2</p></dd></dl>') == '\n\nterm\n:   definition 1\n:   definition 2\n\n'
-    assert md('<dl><dt>term 1</dt><dd>definition 1</dd><dt>term 2</dt><dd>definition 2</dd></dl>') == '\nterm 1\n:   definition 1\nterm 2\n:   definition 2\n'
+    assert md('<dl><dt>term 1</dt><dd>definition 1</dd><dt>term 2</dt><dd>definition 2</dd></dl>') == '\n\nterm 1\n:   definition 1\n\nterm 2\n:   definition 2\n\n'
-    assert md('<dl><dt>term</dt><dd><blockquote><p>line 1</p><p>line 2</p></blockquote></dd></dl>') == '\nterm\n:   > line 1\n    >\n    > line 2\n'
+    assert md('<dl><dt>term</dt><dd><blockquote><p>line 1</p><p>line 2</p></blockquote></dd></dl>') == '\n\nterm\n:   > line 1\n    >\n    > line 2\n\n'
-    assert md('<dl><dt>term</dt><dd><ol><li><p>1</p><ul><li>2a</li><li>2b</li></ul></li><li><p>3</p></li></ol></dd></dl>') == '\nterm\n:   1. 1\n\n       * 2a\n       * 2b\n    2. 3\n'
+    assert md('<dl><dt>term</dt><dd><ol><li><p>1</p><ul><li>2a</li><li>2b</li></ul></li><li><p>3</p></li></ol></dd></dl>') == '\n\nterm\n:   1. 1\n\n       * 2a\n       * 2b\n    2. 3\n\n'
 def test_del():
@@ -162,7 +167,8 @@ def test_hn():
    assert md('<h5>Hello</h5>') == '\n\n##### Hello\n\n'
    assert md('<h6>Hello</h6>') == '\n\n###### Hello\n\n'
    assert md('<h10>Hello</h10>') == md('<h6>Hello</h6>')
-    assert md('<hn>Hello</hn>') == md('Hello')
+    assert md('<h0>Hello</h0>') == md('<h1>Hello</h1>')
    assert md('<hx>Hello</hx>') == md('Hello')
 def test_hn_chained():
@@ -243,6 +249,14 @@ def test_img():
    assert md('<img src="/path/to/img.jpg" alt="Alt text" />') == '![Alt text](/path/to/img.jpg)'
 def test_video():
    assert md('<video src="/path/to/video.mp4" poster="/path/to/img.jpg">text</video>') == '[![text](/path/to/img.jpg)](/path/to/video.mp4)'
    assert md('<video src="/path/to/video.mp4">text</video>') == '[text](/path/to/video.mp4)'
    assert md('<video><source src="/path/to/video.mp4"/>text</video>') == '[text](/path/to/video.mp4)'
    assert md('<video poster="/path/to/img.jpg">text</video>') == '![text](/path/to/img.jpg)'
    assert md('<video>text</video>') == 'text'
 def test_kbd():
    inline_tests('kbd', '`')
@@ -294,6 +308,11 @@ def test_pre():
    assert md("<p>foo</p>\n<pre>bar</pre>\n</p>baz</p>", sub_symbol="^") == "\n\nfoo\n\n```\nbar\n```\n\nbaz"
 def test_q():
    assert md('foo <q>quote</q> bar') == 'foo "quote" bar'
    assert md('foo <q cite="https://example.com">quote</q> bar') == 'foo "quote" bar'
 def test_script():
    assert md('foo <script>var foo=42;</script> bar') == 'foo  bar'
@@ -354,4 +373,4 @@ def test_spaces():
    assert md('test <blockquote> text </blockquote> after') == 'test\n> text\n\nafter'
    assert md(' <ol> <li> x </li> <li> y </li> </ol> ') == '\n\n1. x\n2. y\n'
    assert md(' <ul> <li> x </li> <li> y </li> </ol> ') == '\n\n* x\n* y\n'
-    assert md('test <pre> foo </pre> bar') == 'test\n\n```\n foo \n```\n\nbar'
+    assert md('test <pre> foo </pre> bar') == 'test\n\n```\n foo\n```\n\nbar'
--- a/tests/test_custom_converter.py
+++ b/tests/test_custom_converter.py
@@ -12,7 +12,15 @@ class UnitTestConverter(MarkdownConverter):
    def convert_custom_tag(self, el, text, parent_tags):
        """Ensure conversion function is found for tags with special characters in name"""
-        return "FUNCTION USED: %s" % text
+        return "convert_custom_tag(): %s" % text
    def convert_h1(self, el, text, parent_tags):
        """Ensure explicit heading conversion function is used"""
        return "convert_h1: %s" % (text)
    def convert_hN(self, n, el, text, parent_tags):
        """Ensure general heading conversion function is used"""
        return "convert_hN(%d): %s" % (n, text)
 def test_custom_conversion_functions():
@@ -23,7 +31,11 @@ def test_custom_conversion_functions():
    assert md('<img src="/path/to/img.jpg" alt="Alt text" title="Optional title" />text') == '![Alt text](/path/to/img.jpg "Optional title")\n\ntext'
    assert md('<img src="/path/to/img.jpg" alt="Alt text" />text') == '![Alt text](/path/to/img.jpg)\n\ntext'
-    assert md("<custom-tag>text</custom-tag>") == "FUNCTION USED: text"
+    assert md("<custom-tag>text</custom-tag>") == "convert_custom_tag(): text"
    assert md("<h1>text</h1>") == "convert_h1: text"
    assert md("<h3>text</h3>") == "convert_hN(3): text"
 def test_soup():
--- a/tests/test_tables.py
+++ b/tests/test_tables.py
@@ -267,6 +267,23 @@ table_with_undefined_colspan = """<table>
    </tr>
 </table>"""
 table_with_colspan_missing_head = """<table>
    <tr>
        <td colspan="2">Name</td>
        <td>Age</td>
    </tr>
    <tr>
        <td>Jill</td>
        <td>Smith</td>
        <td>50</td>
    </tr>
    <tr>
        <td>Eve</td>
        <td>Jackson</td>
        <td>94</td>
    </tr>
 </table>"""
 def test_table():
    assert md(table) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
@@ -283,6 +300,7 @@ def test_table():
    assert md(table_with_caption) == 'TEXT\n\nCaption\n\n|  |  |  |\n| --- | --- | --- |\n| Firstname | Lastname | Age |\n\n'
    assert md(table_with_colspan) == '\n\n| Name | | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
    assert md(table_with_undefined_colspan) == '\n\n| Name | Age |\n| --- | --- |\n| Jill | Smith |\n\n'
    assert md(table_with_colspan_missing_head) == '\n\n|  |  |  |\n| --- | --- | --- |\n| Name | | Age |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
 def test_table_infer_header():
@@ -300,3 +318,4 @@ def test_table_infer_header():
    assert md(table_with_caption, table_infer_header=True) == 'TEXT\n\nCaption\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n\n'
    assert md(table_with_colspan, table_infer_header=True) == '\n\n| Name | | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
    assert md(table_with_undefined_colspan, table_infer_header=True) == '\n\n| Name | Age |\n| --- | --- |\n| Jill | Smith |\n\n'
    assert md(table_with_colspan_missing_head, table_infer_header=True) == '\n\n| Name | | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
--- a/tests/types.py
+++ b/tests/types.py
@@ -0,0 +1,70 @@
 from markdownify import markdownify, ASTERISK, BACKSLASH, LSTRIP, RSTRIP, SPACES, STRIP, UNDERLINED, UNDERSCORE, MarkdownConverter
 from bs4 import BeautifulSoup
 from typing import Union
 markdownify("<p>Hello</p>") == "Hello"  # test default of STRIP
 markdownify("<p>Hello</p>", strip_document=LSTRIP) == "Hello\n\n"
 markdownify("<p>Hello</p>", strip_document=RSTRIP) == "\n\nHello"
 markdownify("<p>Hello</p>", strip_document=STRIP) == "Hello"
 markdownify("<p>Hello</p>", strip_document=None) == "\n\nHello\n\n"
 # default options
 MarkdownConverter(
    autolinks=True,
    bs4_options='html.parser',
    bullets='*+-',
    code_language='',
    code_language_callback=None,
    convert=None,
    default_title=False,
    escape_asterisks=True,
    escape_underscores=True,
    escape_misc=False,
    heading_style=UNDERLINED,
    keep_inline_images_in=[],
    newline_style=SPACES,
    strip=None,
    strip_document=STRIP,
    strip_pre=STRIP,
    strong_em_symbol=ASTERISK,
    sub_symbol='',
    sup_symbol='',
    table_infer_header=False,
    wrap=False,
    wrap_width=80,
 ).convert("")
 # custom options
 MarkdownConverter(
    strip_document=None,
    bullets="-",
    escape_asterisks=True,
    escape_underscores=True,
    escape_misc=True,
    autolinks=True,
    default_title=True,
    newline_style=BACKSLASH,
    sup_symbol='^',
    sub_symbol='^',
    keep_inline_images_in=['h3'],
    wrap=True,
    wrap_width=80,
    strong_em_symbol=UNDERSCORE,
    code_language='python',
    code_language_callback=None
 ).convert("")
 html = '<b>test</b>'
 soup = BeautifulSoup(html, 'html.parser')
 MarkdownConverter().convert_soup(soup) == '**test**'
 def callback(el: BeautifulSoup) -> Union[str, None]:
    return el['class'][0] if el.has_attr('class') else None
 MarkdownConverter(code_language_callback=callback).convert("")
 MarkdownConverter(code_language_callback=lambda el: None).convert("")
 markdownify('<pre class="python">test\n    foo\nbar</pre>', code_language_callback=callback)
 markdownify('<pre class="python">test\n    foo\nbar</pre>', code_language_callback=lambda el: None)
Author	SHA1	Message	Date
AlexVonB	c47709c21c	Merge branch 'develop'	2025-08-09 19:41:10 +02:00
AlexVonB	fbc1353593	bump to version v1.2.0	2025-08-09 19:40:43 +02:00
Gareth Jones	85ef82e083	Add basic type stubs (#221 ) (#215 ) * feat: add basic type stubs * feat: add types for constants * feat: add type for `MarkdownConverter` class * ci: add basic job for checking types * feat: add new constant * ci: install types as required * ci: install types package manually * test: add strict coverage for types * fix: allow `strip_document` to be `None` * feat: expand types for MarkdownConverter * fix: do not use `Unpack` as it requires Python 3.12 * feat: define `MarkdownConverter#convert_soup` * feat: improve type for `code_language_callback` * chore: add end-of-file newline * refactor: use `Union` for now	2025-08-03 06:35:46 -04:00
Gareth Jones	f7053e46ab	docs: fix typo (#234 )	2025-08-03 06:24:28 -04:00
Gareth Jones	7edbc5a22b	ci: update `actions/checkout` to v4 (#233 ) * ci: update `actions/checkout` to v4	2025-07-14 21:52:04 +02:00
alheiveea	76e5edb357	limit colspan values to range [1, 1000] (#232 )	2025-07-09 22:08:47 +02:00
Chris Papademetrious	48724e7002	support backticks in <code> spans (#226 ) (#230 ) Signed-off-by: chrispy <chrispy@synopsys.com>	2025-06-29 14:56:21 -04:00
Chris Papademetrious	9b1412aa5b	implement a strip_pre configuration option (#218 ) (#222 ) Signed-off-by: chrispy <chrispy@synopsys.com>	2025-06-14 16:37:47 -04:00
Chris Papademetrious	75ab3064dd	allow BeautifulSoup configuration kwargs to be specified (#224 ) Signed-off-by: chrispy <chrispy@synopsys.com>	2025-06-14 09:06:22 -04:00
Chris Papademetrious	016251e915	ensure that explicitly provided heading conversion functions are used (#212 ) (#214 ) Signed-off-by: chrispy <chrispy@synopsys.com>	2025-05-03 10:57:09 -04:00
Colin	0e1a849346	Add conversion support for <q> tags (#217 )	2025-04-28 06:37:33 -04:00
Chris Papademetrious	e29de4e753	make convert_hn() public instead of internal (#213 ) Signed-off-by: chrispy <chrispy@synopsys.com>	2025-04-20 06:20:01 -04:00
Vincent Kelleher	2d654a6b7e	Add beautiful_soup_parser option (#206 ) * add beautiful_soup_parser option * add Beautiful Soup parser argument to command line --------- Co-authored-by: Vincent Kelleher <vincent.kelleher-ext@francetravail.fr> Co-authored-by: AlexVonB <AlexVonB@users.noreply.github.com>	2025-03-29 11:29:29 +01:00
chrispy	26566891a7	Merge branch 'develop'	2025-03-05 06:48:47 -05:00
chrispy	13183f9925	bump to version v1.1.0 Signed-off-by: chrispy <chrispy@synopsys.com>	2025-03-05 06:47:28 -05:00
Stephen V. Brown	7908f1492a	Generalize handling of colspan in case where colspan is in first row but header row is missing (#203 )	2025-03-04 20:01:16 -05:00
Chris Papademetrious	618747c18c	in inline contexts, resolve <br/> to a space instead of an empty string (#202 ) Signed-off-by: chrispy <chrispy@synopsys.com>	2025-03-04 07:37:22 -05:00
Chris Papademetrious	5122c973c1	add missing newlines for definition lists (#200 ) Signed-off-by: chrispy <chrispy@synopsys.com>	2025-03-02 06:42:56 -05:00
itmammoth	ac5736f0a3	Support `video` tag with `poster` attribute (#189 )	2025-02-28 10:51:42 +01:00