use static version instead of dynamic git tag info

test build in develop and pull requests
Merge branch 'pyproject.toml' of https://github.com/KOLANICH-libs/markdownify.py into KOLANICH-libs-pyproject.toml
2024-07-14 22:34:30 +02:00 · 2024-07-14 22:10:01 +02:00 · 2024-07-14 21:53:09 +02:00 · 2022-11-10 15:29:25 +03:00 · 2022-11-10 15:27:26 +03:00 · 2022-11-10 15:25:39 +03:00
18 changed files with 284 additions and 1091 deletions
--- a/.github/workflows/python-app.yml
+++ b/.github/workflows/python-app.yml
@@ -15,7 +15,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
-    - uses: actions/checkout@v4
+    - uses: actions/checkout@v2
    - name: Set up Python 3.8
      uses: actions/setup-python@v2
      with:
@@ -30,22 +30,3 @@ jobs:
    - name: Build
      run: |
        python -m build -nwsx .
  types:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.8
      uses: actions/setup-python@v2
      with:
        python-version: 3.8
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install --upgrade setuptools setuptools_scm wheel build tox mypy types-beautifulsoup4
    - name: Check types
      run: |
        mypy .
        mypy --strict tests/types.py
--- a/.github/workflows/python-publish.yml
+++ b/.github/workflows/python-publish.yml
@@ -13,7 +13,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
-    - uses: actions/checkout@v4
+    - uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
--- a/README.rst
+++ b/README.rst
@@ -110,7 +110,7 @@ code_language_callback
  When the HTML code contains ``pre`` tags that in some way provide the code
  language, for example as class, this callback can be used to extract the
  language from the tag and prefix it to the converted ``pre`` tag.
-  The callback gets one single argument, a BeautifulSoup object, and returns
+  The callback gets one single argument, an BeautifylSoup object, and returns
  a string containing the code language, or ``None``.
  An example to use the class name as code language could be::
@@ -128,9 +128,9 @@ escape_underscores
  Defaults to ``True``.
 escape_misc
-  If set to ``True``, escape miscellaneous punctuation characters
+  If set to ``False``, do not escape miscellaneous punctuation characters
  that sometimes have Markdown significance in text.
-  Defaults to ``False``.
+  Defaults to ``True``.
 keep_inline_images_in
  Images are converted to their alt-text when the images are located inside
@@ -139,40 +139,10 @@ keep_inline_images_in
  that should be allowed to contain inline images, for example ``['td']``.
  Defaults to an empty list.
 table_infer_header
  Controls handling of tables with no header row (as indicated by ``<thead>``
  or ``<th>``). When set to ``True``, the first body row is used as the header row.
  Defaults to ``False``, which leaves the header row empty.
 wrap, wrap_width
  If ``wrap`` is set to ``True``, all text paragraphs are wrapped at
  ``wrap_width`` characters. Defaults to ``False`` and ``80``.
  Use with ``newline_style=BACKSLASH`` to keep line breaks in paragraphs.
  A `wrap_width` value of `None` reflows lines to unlimited line length.
 strip_document
  Controls whether leading and/or trailing separation newlines are removed from
  the final converted document. Supported values are ``LSTRIP`` (leading),
  ``RSTRIP`` (trailing), ``STRIP`` (both), and ``None`` (neither). Newlines
  within the document are unaffected.
  Defaults to ``STRIP``.
 strip_pre
  Controls whether leading/trailing blank lines are removed from ``<pre>``
  tags. Supported values are ``STRIP`` (all leading/trailing blank lines),
  ``STRIP_ONE`` (one leading/trailing blank line), and ``None`` (neither).
  Defaults to ``STRIP``.
 bs4_options
  Specify additional configuration options for the ``BeautifulSoup`` object
  used to interpret the HTML markup. String and list values (such as ``lxml``
  or ``html5lib``) are treated as ``features`` arguments to control parser
  selection. Dictionary values (such as ``{"from_encoding": "iso-8859-8"}``)
  are treated as full kwargs to be used for the BeautifulSoup constructor,
  allowing specification of any parameter. For parameter details, see the
  Beautiful Soup documentation at:
 .. _BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
 Options may be specified as kwargs to the ``markdownify`` function, or as a
 nested ``Options`` class in ``MarkdownConverter`` subclasses.
@@ -197,7 +167,7 @@ If you have a special usecase that calls for a special conversion, you can
 always inherit from ``MarkdownConverter`` and override the method you want to
 change.
 The function that handles a HTML tag named ``abc`` is called
-``convert_abc(self, el, text, parent_tags)`` and returns a string
+``convert_abc(self, el, text, convert_as_inline)`` and returns a string
 containing the converted HTML tag.
 The ``MarkdownConverter`` object will handle the conversion based on the
 function names:
@@ -210,8 +180,8 @@ function names:
        """
        Create a custom MarkdownConverter that adds two newlines after an image
        """
-        def convert_img(self, el, text, parent_tags):
+        def convert_img(self, el, text, convert_as_inline):
-            return super().convert_img(el, text, parent_tags) + '\n\n'
+            return super().convert_img(el, text, convert_as_inline) + '\n\n'
    # Create shorthand method for conversion
    def md(html, **options):
@@ -225,7 +195,7 @@ function names:
        """
        Create a custom MarkdownConverter that ignores paragraphs
        """
-        def convert_p(self, el, text, parent_tags):
+        def convert_p(self, el, text, convert_as_inline):
            return ''
    # Create shorthand method for conversion
--- a/markdownify/init.py
+++ b/markdownify/init.py
@@ -1,48 +1,15 @@
-from bs4 import BeautifulSoup, Comment, Doctype, NavigableString, Tag
+from bs4 import BeautifulSoup, NavigableString, Comment, Doctype
 from textwrap import fill
 import re
 import six
-# General-purpose regex patterns
+convert_heading_re = re.compile(r'convert_h(\d+)')
-re_convert_heading = re.compile(r'convert_h(\d+)')
+line_beginning_re = re.compile(r'^', re.MULTILINE)
-re_line_with_content = re.compile(r'^(.*)', flags=re.MULTILINE)
+whitespace_re = re.compile(r'[\t ]+')
-re_whitespace = re.compile(r'[\t ]+')
+all_whitespace_re = re.compile(r'[\s]+')
-re_all_whitespace = re.compile(r'[\t \r\n]+')
+html_heading_re = re.compile(r'h[1-6]')
 re_newline_whitespace = re.compile(r'[\t \r\n]*[\r\n][\t \r\n]*')
 re_html_heading = re.compile(r'h(\d+)')
 re_pre_lstrip1 = re.compile(r'^ *\n')
 re_pre_rstrip1 = re.compile(r'\n *$')
 re_pre_lstrip = re.compile(r'^[ \n]*\n')
 re_pre_rstrip = re.compile(r'[ \n]*$')
 # Pattern for creating convert_<tag> function names from tag names
 re_make_convert_fn_name = re.compile(r'[\[\]:-]')
 # Extract (leading_nl, content, trailing_nl) from a string
 # (functionally equivalent to r'^(\n*)(.*?)(\n*)$', but greedy is faster than reluctant here)
 re_extract_newlines = re.compile(r'^(\n*)((?:.*[^\n])?)(\n*)$', flags=re.DOTALL)
 # Escape miscellaneous special Markdown characters
 re_escape_misc_chars = re.compile(r'([]\\&<`[>~=+|])')
 # Escape sequence of one or more consecutive '-', preceded
 # and followed by whitespace or start/end of fragment, as it
 # might be confused with an underline of a header, or with a
 # list marker
 re_escape_misc_dash_sequences = re.compile(r'(\s|^)(-+(?:\s|$))')
 # Escape sequence of up to six consecutive '#', preceded
 # and followed by whitespace or start/end of fragment, as
 # it might be confused with an ATX heading
 re_escape_misc_hashes = re.compile(r'(\s|^)(#{1,6}(?:\s|$))')
 # Escape '.' or ')' preceded by up to nine digits, as it might be
 # confused with a list item
 re_escape_misc_list_items = re.compile(r'((?:\s|^)[0-9]{1,9})([.)](?:\s|$))')
 # Find consecutive backtick sequences in a string
 re_backtick_runs = re.compile(r'`+')
 # Heading styles
 ATX = 'atx'
@@ -58,26 +25,6 @@ BACKSLASH = 'backslash'
 ASTERISK = '*'
 UNDERSCORE = '_'
 # Document/pre strip styles
 LSTRIP = 'lstrip'
 RSTRIP = 'rstrip'
 STRIP = 'strip'
 STRIP_ONE = 'strip_one'
 def strip1_pre(text):
    """Strip one leading and trailing newline from a <pre> string."""
    text = re_pre_lstrip1.sub('', text)
    text = re_pre_rstrip1.sub('', text)
    return text
 def strip_pre(text):
    """Strip all leading and trailing newlines from a <pre> string."""
    text = re_pre_lstrip.sub('', text)
    text = re_pre_rstrip.sub('', text)
    return text
 def chomp(text):
    """
@@ -100,13 +47,13 @@ def abstract_inline_conversion(markup_fn):
    the text if it looks like an HTML tag. markup_fn is necessary to allow for
    references to self.strong_em_symbol etc.
    """
-    def implementation(self, el, text, parent_tags):
+    def implementation(self, el, text, convert_as_inline):
        markup_prefix = markup_fn(self)
        if markup_prefix.startswith('<') and markup_prefix.endswith('>'):
            markup_suffix = '</' + markup_prefix[1:]
        else:
            markup_suffix = markup_prefix
-        if '_noformat' in parent_tags:
+        if el.find_parent(['pre', 'code', 'kbd', 'samp']):
            return text
        prefix, suffix, text = chomp(text)
        if not text:
@@ -119,64 +66,9 @@ def _todict(obj):
    return dict((k, getattr(obj, k)) for k in dir(obj) if not k.startswith('_'))
 def should_remove_whitespace_inside(el):
    """Return to remove whitespace immediately inside a block-level element."""
    if not el or not el.name:
        return False
    if re_html_heading.match(el.name) is not None:
        return True
    return el.name in ('p', 'blockquote',
                       'article', 'div', 'section',
                       'ol', 'ul', 'li',
                       'dl', 'dt', 'dd',
                       'table', 'thead', 'tbody', 'tfoot',
                       'tr', 'td', 'th')
 def should_remove_whitespace_outside(el):
    """Return to remove whitespace immediately outside a block-level element."""
    return should_remove_whitespace_inside(el) or (el and el.name == 'pre')
 def _is_block_content_element(el):
    """
    In a block context, returns:
    - True for content elements (tags and non-whitespace text)
    - False for non-content elements (whitespace text, comments, doctypes)
    """
    if isinstance(el, Tag):
        return True
    elif isinstance(el, (Comment, Doctype)):
        return False  # (subclasses of NavigableString, must test first)
    elif isinstance(el, NavigableString):
        return el.strip() != ''
    else:
        return False
 def _prev_block_content_sibling(el):
    """Returns the first previous sibling that is a content element, else None."""
    while el is not None:
        el = el.previous_sibling
        if _is_block_content_element(el):
            return el
    return None
 def _next_block_content_sibling(el):
    """Returns the first next sibling that is a content element, else None."""
    while el is not None:
        el = el.next_sibling
        if _is_block_content_element(el):
            return el
    return None
 class MarkdownConverter(object):
    class DefaultOptions:
        autolinks = True
        bs4_options = 'html.parser'
        bullets = '*+-'  # An iterable of bullet types.
        code_language = ''
        code_language_callback = None
@@ -184,17 +76,14 @@ class MarkdownConverter(object):
        default_title = False
        escape_asterisks = True
        escape_underscores = True
-        escape_misc = False
+        escape_misc = True
        heading_style = UNDERLINED
        keep_inline_images_in = []
        newline_style = SPACES
        strip = None
        strip_document = STRIP
        strip_pre = STRIP
        strong_em_symbol = ASTERISK
        sub_symbol = ''
        sup_symbol = ''
        table_infer_header = False
        wrap = False
        wrap_width = 80
@@ -211,202 +100,101 @@ class MarkdownConverter(object):
            raise ValueError('You may specify either tags to strip or tags to'
                             ' convert, but not both.')
        # If a string or list is passed to bs4_options, assume it is a 'features' specification
        if not isinstance(self.options['bs4_options'], dict):
            self.options['bs4_options'] = {'features': self.options['bs4_options']}
        # Initialize the conversion function cache
        self.convert_fn_cache = {}
    def convert(self, html):
-        soup = BeautifulSoup(html, **self.options['bs4_options'])
+        soup = BeautifulSoup(html, 'html.parser')
        return self.convert_soup(soup)
    def convert_soup(self, soup):
-        return self.process_tag(soup, parent_tags=set())
+        return self.process_tag(soup, convert_as_inline=False, children_only=True)
-    def process_element(self, node, parent_tags=None):
+    def process_tag(self, node, convert_as_inline, children_only=False):
-        if isinstance(node, NavigableString):
+        text = ''
            return self.process_text(node, parent_tags=parent_tags)
        else:
            return self.process_tag(node, parent_tags=parent_tags)
-    def process_tag(self, node, parent_tags=None):
+        # markdown headings or cells can't include
-        # For the top-level element, initialize the parent context with an empty set.
+        # block elements (elements w/newlines)
-        if parent_tags is None:
+        isHeading = html_heading_re.match(node.name) is not None
-            parent_tags = set()
+        isCell = node.name in ['td', 'th']
        convert_children_as_inline = convert_as_inline
-        # Collect child elements to process, ignoring whitespace-only text elements
+        if not children_only and (isHeading or isCell):
-        # adjacent to the inner/outer boundaries of block elements.
+            convert_children_as_inline = True
        should_remove_inside = should_remove_whitespace_inside(node)
-        def _can_ignore(el):
+        # Remove whitespace-only textnodes in purely nested nodes
-            if isinstance(el, Tag):
+        def is_nested_node(el):
-                # Tags are always processed.
+            return el and el.name in ['ol', 'ul', 'li',
-                return False
+                                      'table', 'thead', 'tbody', 'tfoot',
-            elif isinstance(el, (Comment, Doctype)):
+                                      'tr', 'td', 'th']
-                # Comment and Doctype elements are always ignored.
+
-                # (subclasses of NavigableString, must test first)
+        if is_nested_node(node):
-                return True
+            for el in node.children:
                # Only extract (remove) whitespace-only text node if any of the
                # conditions is true:
                # - el is the first element in its parent
                # - el is the last element in its parent
                # - el is adjacent to an nested node
                can_extract = (not el.previous_sibling
                               or not el.next_sibling
                               or is_nested_node(el.previous_sibling)
                               or is_nested_node(el.next_sibling))
                if (isinstance(el, NavigableString)
                        and six.text_type(el).strip() == ''
                        and can_extract):
                    el.extract()
        # Convert the children first
        for el in node.children:
            if isinstance(el, Comment) or isinstance(el, Doctype):
                continue
            elif isinstance(el, NavigableString):
-                if six.text_type(el).strip() != '':
+                text += self.process_text(el)
                    # Non-whitespace text nodes are always processed.
                    return False
                elif should_remove_inside and (not el.previous_sibling or not el.next_sibling):
                    # Inside block elements (excluding <pre>), ignore adjacent whitespace elements.
                    return True
                elif should_remove_whitespace_outside(el.previous_sibling) or should_remove_whitespace_outside(el.next_sibling):
                    # Outside block elements (including <pre>), ignore adjacent whitespace elements.
                    return True
                else:
                    return False
            elif el is None:
                return True
            else:
-                raise ValueError('Unexpected element type: %s' % type(el))
+                text += self.process_tag(el, convert_children_as_inline)
-        children_to_convert = [el for el in node.children if not _can_ignore(el)]
+        if not children_only:
-
+            convert_fn = getattr(self, 'convert_%s' % node.name, None)
-        # Create a copy of this tag's parent context, then update it to include this tag
+            if convert_fn and self.should_convert_tag(node.name):
-        # to propagate down into the children.
+                text = convert_fn(node, text, convert_as_inline)
        parent_tags_for_children = set(parent_tags)
        parent_tags_for_children.add(node.name)
        # if this tag is a heading or table cell, add an '_inline' parent pseudo-tag
        if (
            re_html_heading.match(node.name) is not None  # headings
            or node.name in {'td', 'th'}  # table cells
        ):
            parent_tags_for_children.add('_inline')
        # if this tag is a preformatted element, add a '_noformat' parent pseudo-tag
        if node.name in {'pre', 'code', 'kbd', 'samp'}:
            parent_tags_for_children.add('_noformat')
        # Convert the children elements into a list of result strings.
        child_strings = [
            self.process_element(el, parent_tags=parent_tags_for_children)
            for el in children_to_convert
        ]
        # Remove empty string values.
        child_strings = [s for s in child_strings if s]
        # Collapse newlines at child element boundaries, if needed.
        if node.name == 'pre' or node.find_parent('pre'):
            # Inside <pre> blocks, do not collapse newlines.
            pass
        else:
            # Collapse newlines at child element boundaries.
            updated_child_strings = ['']  # so the first lookback works
            for child_string in child_strings:
                # Separate the leading/trailing newlines from the content.
                leading_nl, content, trailing_nl = re_extract_newlines.match(child_string).groups()
                # If the last child had trailing newlines and this child has leading newlines,
                # use the larger newline count, limited to 2.
                if updated_child_strings[-1] and leading_nl:
                    prev_trailing_nl = updated_child_strings.pop()  # will be replaced by the collapsed value
                    num_newlines = min(2, max(len(prev_trailing_nl), len(leading_nl)))
                    leading_nl = '\n' * num_newlines
                # Add the results to the updated child string list.
                updated_child_strings.extend([leading_nl, content, trailing_nl])
            child_strings = updated_child_strings
        # Join all child text strings into a single string.
        text = ''.join(child_strings)
        # apply this tag's final conversion function
        convert_fn = self.get_conv_fn_cached(node.name)
        if convert_fn is not None:
            text = convert_fn(node, text, parent_tags=parent_tags)
        return text
-    def convert__document_(self, el, text, parent_tags):
+    def process_text(self, el):
        """Final document-level formatting for BeautifulSoup object (node.name == "[document]")"""
        if self.options['strip_document'] == LSTRIP:
            text = text.lstrip('\n')  # remove leading separation newlines
        elif self.options['strip_document'] == RSTRIP:
            text = text.rstrip('\n')  # remove trailing separation newlines
        elif self.options['strip_document'] == STRIP:
            text = text.strip('\n')  # remove leading and trailing separation newlines
        elif self.options['strip_document'] is None:
            pass  # leave leading and trailing separation newlines as-is
        else:
            raise ValueError('Invalid value for strip_document: %s' % self.options['strip_document'])
        return text
    def process_text(self, el, parent_tags=None):
        # For the top-level element, initialize the parent context with an empty set.
        if parent_tags is None:
            parent_tags = set()
        text = six.text_type(el) or ''
        # normalize whitespace if we're not inside a preformatted element
-        if 'pre' not in parent_tags:
+        if not el.find_parent('pre'):
-            if self.options['wrap']:
+            text = whitespace_re.sub(' ', text)
                text = re_all_whitespace.sub(' ', text)
            else:
                text = re_newline_whitespace.sub('\n', text)
                text = re_whitespace.sub(' ', text)
        # escape special characters if we're not inside a preformatted or code element
-        if '_noformat' not in parent_tags:
+        if not el.find_parent(['pre', 'code', 'kbd', 'samp']):
-            text = self.escape(text, parent_tags)
+            text = self.escape(text)
-        # remove leading whitespace at the start or just after a
+        # remove trailing whitespaces if any of the following condition is true:
-        # block-level element; remove traliing whitespace at the end
+        # - current text node is the last node in li
-        # or just before a block-level element.
+        # - current text node is followed by an embedded list
-        if (should_remove_whitespace_outside(el.previous_sibling)
+        if (el.parent.name == 'li'
-                or (should_remove_whitespace_inside(el.parent)
+                and (not el.next_sibling
-                    and not el.previous_sibling)):
+                     or el.next_sibling.name in ['ul', 'ol'])):
            text = text.lstrip(' \t\r\n')
        if (should_remove_whitespace_outside(el.next_sibling)
                or (should_remove_whitespace_inside(el.parent)
                    and not el.next_sibling)):
            text = text.rstrip()
        return text
-    def get_conv_fn_cached(self, tag_name):
+    def __getattr__(self, attr):
-        """Given a tag name, return the conversion function using the cache."""
+        # Handle headings
-        # If conversion function is not in cache, add it
+        m = convert_heading_re.match(attr)
-        if tag_name not in self.convert_fn_cache:
+        if m:
-            self.convert_fn_cache[tag_name] = self.get_conv_fn(tag_name)
+            n = int(m.group(1))
-        # Return the cached entry
+            def convert_tag(el, text, convert_as_inline):
-        return self.convert_fn_cache[tag_name]
+                return self.convert_hn(n, el, text, convert_as_inline)
-    def get_conv_fn(self, tag_name):
+            convert_tag.__name__ = 'convert_h%s' % n
-        """Given a tag name, find and return the conversion function."""
+            setattr(self, convert_tag.__name__, convert_tag)
-        tag_name = tag_name.lower()
+            return convert_tag
-        # Handle strip/convert exclusion options
+        raise AttributeError(attr)
        if not self.should_convert_tag(tag_name):
            return None
        # Look for an explicitly defined conversion function by tag name first
        convert_fn_name = "convert_%s" % re_make_convert_fn_name.sub("_", tag_name)
        convert_fn = getattr(self, convert_fn_name, None)
        if convert_fn:
            return convert_fn
        # If tag is any heading, handle with convert_hN() function
        match = re_html_heading.match(tag_name)
        if match:
            n = int(match.group(1))  # get value of N from <hN>
            return lambda el, text, parent_tags: self.convert_hN(n, el, text, parent_tags)
        # No conversion function was found
        return None
    def should_convert_tag(self, tag):
-        """Given a tag name, return whether to convert based on strip/convert options."""
+        tag = tag.lower()
        strip = self.options['strip']
        convert = self.options['convert']
        if strip is not None:
@@ -416,28 +204,26 @@ class MarkdownConverter(object):
        else:
            return True
-    def escape(self, text, parent_tags):
+    def escape(self, text):
        if not text:
            return ''
        if self.options['escape_misc']:
-            text = re_escape_misc_chars.sub(r'\\\1', text)
+            text = re.sub(r'([\\&<`[>~#=+|-])', r'\\\1', text)
-            text = re_escape_misc_dash_sequences.sub(r'\1\\\2', text)
+            text = re.sub(r'([0-9])([.)])', r'\1\\\2', text)
            text = re_escape_misc_hashes.sub(r'\1\\\2', text)
            text = re_escape_misc_list_items.sub(r'\1\\\2', text)
        if self.options['escape_asterisks']:
            text = text.replace('*', r'\*')
        if self.options['escape_underscores']:
            text = text.replace('_', r'\_')
        return text
    def indent(self, text, level):
        return line_beginning_re.sub('\t' * level, text) if text else ''
    def underline(self, text, pad_char):
        text = (text or '').rstrip()
-        return '\n\n%s\n%s\n\n' % (text, pad_char * len(text)) if text else ''
+        return '%s\n%s\n\n' % (text, pad_char * len(text)) if text else ''
-    def convert_a(self, el, text, parent_tags):
+    def convert_a(self, el, text, convert_as_inline):
        if '_noformat' in parent_tags:
            return text
        prefix, suffix, text = chomp(text)
        if not text:
            return ''
@@ -457,188 +243,94 @@ class MarkdownConverter(object):
    convert_b = abstract_inline_conversion(lambda self: 2 * self.options['strong_em_symbol'])
-    def convert_blockquote(self, el, text, parent_tags):
+    def convert_blockquote(self, el, text, convert_as_inline):
        # handle some early-exit scenarios
        text = (text or '').strip(' \t\r\n')
        if '_inline' in parent_tags:
            return ' ' + text + ' '
        if not text:
            return "\n"
-        # indent lines with blockquote marker
+        if convert_as_inline:
-        def _indent_for_blockquote(match):
+            return text
            line_content = match.group(1)
            return '> ' + line_content if line_content else '>'
        text = re_line_with_content.sub(_indent_for_blockquote, text)
-        return '\n' + text + '\n\n'
+        return '\n' + (line_beginning_re.sub('> ', text.strip()) + '\n\n') if text else ''
-    def convert_br(self, el, text, parent_tags):
+    def convert_br(self, el, text, convert_as_inline):
-        if '_inline' in parent_tags:
+        if convert_as_inline:
-            return ' '
+            return ""
        if self.options['newline_style'].lower() == BACKSLASH:
            return '\\\n'
        else:
            return '  \n'
-    def convert_code(self, el, text, parent_tags):
+    def convert_code(self, el, text, convert_as_inline):
-        if '_noformat' in parent_tags:
+        if el.parent.name == 'pre':
            return text
-
+        converter = abstract_inline_conversion(lambda self: '`')
-        prefix, suffix, text = chomp(text)
+        return converter(self, el, text, convert_as_inline)
        if not text:
            return ''
        # Find the maximum number of consecutive backticks in the text, then
        # delimit the code span with one more backtick than that
        max_backticks = max((len(match) for match in re.findall(re_backtick_runs, text)), default=0)
        markup_delimiter = '`' * (max_backticks + 1)
        # If the maximum number of backticks is greater than zero, add a space
        # to avoid interpretation of inside backticks as literals
        if max_backticks > 0:
            text = " " + text + " "
        return '%s%s%s%s%s' % (prefix, markup_delimiter, text, markup_delimiter, suffix)
    convert_del = abstract_inline_conversion(lambda self: '~~')
    def convert_div(self, el, text, parent_tags):
        if '_inline' in parent_tags:
            return ' ' + text.strip() + ' '
        text = text.strip()
        return '\n\n%s\n\n' % text if text else ''
    convert_article = convert_div
    convert_section = convert_div
    convert_em = abstract_inline_conversion(lambda self: self.options['strong_em_symbol'])
    convert_kbd = convert_code
-    def convert_dd(self, el, text, parent_tags):
+    def convert_hn(self, n, el, text, convert_as_inline):
-        text = (text or '').strip()
+        if convert_as_inline:
        if '_inline' in parent_tags:
            return ' ' + text + ' '
        if not text:
            return '\n'
        # indent definition content lines by four spaces
        def _indent_for_dd(match):
            line_content = match.group(1)
            return '    ' + line_content if line_content else ''
        text = re_line_with_content.sub(_indent_for_dd, text)
        # insert definition marker into first-line indent whitespace
        text = ':' + text[1:]
        return '%s\n' % text
    # definition lists are formatted as follows:
    #   https://pandoc.org/MANUAL.html#definition-lists
    #   https://michelf.ca/projects/php-markdown/extra/#def-list
    convert_dl = convert_div
    def convert_dt(self, el, text, parent_tags):
        # remove newlines from term text
        text = (text or '').strip()
        text = re_all_whitespace.sub(' ', text)
        if '_inline' in parent_tags:
            return ' ' + text + ' '
        if not text:
            return '\n'
        # TODO - format consecutive <dt> elements as directly adjacent lines):
        #   https://michelf.ca/projects/php-markdown/extra/#def-list
        return '\n\n%s\n' % text
    def convert_hN(self, n, el, text, parent_tags):
        # convert_hN() converts <hN> tags, where N is any integer
        if '_inline' in parent_tags:
            return text
        # Markdown does not support heading depths of n > 6
        n = max(1, min(6, n))
        style = self.options['heading_style'].lower()
        text = text.strip()
        if style == UNDERLINED and n <= 2:
            line = '=' if n == 1 else '-'
            return self.underline(text, line)
        text = re_all_whitespace.sub(' ', text)
        hashes = '#' * n
        if style == ATX_CLOSED:
-            return '\n\n%s %s %s\n\n' % (hashes, text, hashes)
+            return '%s %s %s\n\n' % (hashes, text, hashes)
-        return '\n\n%s %s\n\n' % (hashes, text)
+        return '%s %s\n\n' % (hashes, text)
-    def convert_hr(self, el, text, parent_tags):
+    def convert_hr(self, el, text, convert_as_inline):
        return '\n\n---\n\n'
    convert_i = convert_em
-    def convert_img(self, el, text, parent_tags):
+    def convert_img(self, el, text, convert_as_inline):
        alt = el.attrs.get('alt', None) or ''
        src = el.attrs.get('src', None) or ''
        title = el.attrs.get('title', None) or ''
        title_part = ' "%s"' % title.replace('"', r'\"') if title else ''
-        if ('_inline' in parent_tags
+        if (convert_as_inline
                and el.parent.name not in self.options['keep_inline_images_in']):
            return alt
        return '![%s](%s%s)' % (alt, src, title_part)
-    def convert_video(self, el, text, parent_tags):
+    def convert_list(self, el, text, convert_as_inline):
        if ('_inline' in parent_tags
                and el.parent.name not in self.options['keep_inline_images_in']):
            return text
        src = el.attrs.get('src', None) or ''
        if not src:
            sources = el.find_all('source', attrs={'src': True})
            if sources:
                src = sources[0].attrs.get('src', None) or ''
        poster = el.attrs.get('poster', None) or ''
        if src and poster:
            return '[![%s](%s)](%s)' % (text, poster, src)
        if src:
            return '[%s](%s)' % (text, src)
        if poster:
            return '![%s](%s)' % (text, poster)
        return text
    def convert_list(self, el, text, parent_tags):
        # Converting a list to inline is undefined.
-        # Ignoring inline conversion parents for list.
+        # Ignoring convert_to_inline for list.
        nested = False
        before_paragraph = False
-        next_sibling = _next_block_content_sibling(el)
+        if el.next_sibling and el.next_sibling.name not in ['ul', 'ol']:
        if next_sibling and next_sibling.name not in ['ul', 'ol']:
            before_paragraph = True
-        if 'li' in parent_tags:
+        while el:
-            # remove trailing newline if we're in a nested list
+            if el.name == 'li':
-            return '\n' + text.rstrip()
+                nested = True
-        return '\n\n' + text + ('\n' if before_paragraph else '')
+                break
            el = el.parent
        if nested:
            # remove trailing newline if nested
            return '\n' + self.indent(text, 1).rstrip()
        return text + ('\n' if before_paragraph else '')
    convert_ul = convert_list
    convert_ol = convert_list
-    def convert_li(self, el, text, parent_tags):
+    def convert_li(self, el, text, convert_as_inline):
        # handle some early-exit scenarios
        text = (text or '').strip()
        if not text:
            return "\n"
        # determine list item bullet character to use
        parent = el.parent
        if parent is not None and parent.name == 'ol':
            if parent.get("start") and str(parent.get("start")).isnumeric():
                start = int(parent.get("start"))
            else:
                start = 1
-            bullet = '%s.' % (start + len(el.find_previous_siblings('li')))
+            bullet = '%s.' % (start + parent.index(el))
        else:
            depth = -1
            while el:
@@ -647,45 +339,19 @@ class MarkdownConverter(object):
                el = el.parent
            bullets = self.options['bullets']
            bullet = bullets[depth % len(bullets)]
-        bullet = bullet + ' '
+        return '%s %s\n' % (bullet, (text or '').strip())
        bullet_width = len(bullet)
        bullet_indent = ' ' * bullet_width
-        # indent content lines by bullet width
+    def convert_p(self, el, text, convert_as_inline):
-        def _indent_for_li(match):
+        if convert_as_inline:
-            line_content = match.group(1)
+            return text
            return bullet_indent + line_content if line_content else ''
        text = re_line_with_content.sub(_indent_for_li, text)
        # insert bullet into first-line indent whitespace
        text = bullet + text[bullet_width:]
        return '%s\n' % text
    def convert_p(self, el, text, parent_tags):
        if '_inline' in parent_tags:
            return ' ' + text.strip(' \t\r\n') + ' '
        text = text.strip(' \t\r\n')
        if self.options['wrap']:
-            # Preserve newlines (and preceding whitespace) resulting
+            text = fill(text,
-            # from <br> tags.  Newlines in the input have already been
+                        width=self.options['wrap_width'],
-            # replaced by spaces.
+                        break_long_words=False,
-            if self.options['wrap_width'] is not None:
+                        break_on_hyphens=False)
-                lines = text.split('\n')
+        return '%s\n\n' % text if text else ''
                new_lines = []
                for line in lines:
                    line = line.lstrip(' \t\r\n')
                    line_no_trailing = line.rstrip()
                    trailing = line[len(line_no_trailing):]
                    line = fill(line,
                                width=self.options['wrap_width'],
                                break_long_words=False,
                                break_on_hyphens=False)
                    new_lines.append(line + trailing)
                text = '\n'.join(new_lines)
        return '\n\n%s\n\n' % text if text else ''
-    def convert_pre(self, el, text, parent_tags):
+    def convert_pre(self, el, text, convert_as_inline):
        if not text:
            return ''
        code_language = self.options['code_language']
@@ -693,24 +359,12 @@ class MarkdownConverter(object):
        if self.options['code_language_callback']:
            code_language = self.options['code_language_callback'](el) or code_language
-        if self.options['strip_pre'] == STRIP:
+        return '\n```%s\n%s\n```\n' % (code_language, text)
            text = strip_pre(text)  # remove all leading/trailing newlines
        elif self.options['strip_pre'] == STRIP_ONE:
            text = strip1_pre(text)  # remove one leading/trailing newline
        elif self.options['strip_pre'] is None:
            pass  # leave leading and trailing newlines as-is
        else:
            raise ValueError('Invalid value for strip_pre: %s' % self.options['strip_pre'])
-        return '\n\n```%s\n%s\n```\n\n' % (code_language, text)
+    def convert_script(self, el, text, convert_as_inline):
    def convert_q(self, el, text, parent_tags):
        return '"' + text + '"'
    def convert_script(self, el, text, parent_tags):
        return ''
-    def convert_style(self, el, text, parent_tags):
+    def convert_style(self, el, text, convert_as_inline):
        return ''
    convert_s = convert_del
@@ -723,70 +377,55 @@ class MarkdownConverter(object):
    convert_sup = abstract_inline_conversion(lambda self: self.options['sup_symbol'])
-    def convert_table(self, el, text, parent_tags):
+    def convert_table(self, el, text, convert_as_inline):
-        return '\n\n' + text.strip() + '\n\n'
+        return '\n\n' + text + '\n'
-    def convert_caption(self, el, text, parent_tags):
+    def convert_caption(self, el, text, convert_as_inline):
-        return text.strip() + '\n\n'
+        return text + '\n'
-    def convert_figcaption(self, el, text, parent_tags):
+    def convert_figcaption(self, el, text, convert_as_inline):
-        return '\n\n' + text.strip() + '\n\n'
+        return '\n\n' + text + '\n\n'
-    def convert_td(self, el, text, parent_tags):
+    def convert_td(self, el, text, convert_as_inline):
        colspan = 1
        if 'colspan' in el.attrs and el['colspan'].isdigit():
-            colspan = max(1, min(1000, int(el['colspan'])))
+            colspan = int(el['colspan'])
        return ' ' + text.strip().replace("\n", " ") + ' |' * colspan
-    def convert_th(self, el, text, parent_tags):
+    def convert_th(self, el, text, convert_as_inline):
        colspan = 1
        if 'colspan' in el.attrs and el['colspan'].isdigit():
-            colspan = max(1, min(1000, int(el['colspan'])))
+            colspan = int(el['colspan'])
        return ' ' + text.strip().replace("\n", " ") + ' |' * colspan
-    def convert_tr(self, el, text, parent_tags):
+    def convert_tr(self, el, text, convert_as_inline):
        cells = el.find_all(['td', 'th'])
        is_first_row = el.find_previous_sibling() is None
        is_headrow = (
            all([cell.name == 'th' for cell in cells])
-            or (el.parent.name == 'thead'
+            or (not el.previous_sibling and not el.parent.name == 'tbody')
-                # avoid multiple tr in thead
+            or (not el.previous_sibling and el.parent.name == 'tbody' and len(el.parent.parent.find_all(['thead'])) < 1)
                and len(el.parent.find_all('tr')) == 1)
        )
        is_head_row_missing = (
            (is_first_row and not el.parent.name == 'tbody')
            or (is_first_row and el.parent.name == 'tbody' and len(el.parent.parent.find_all(['thead'])) < 1)
        )
        overline = ''
        underline = ''
-        full_colspan = 0
+        if is_headrow and not el.previous_sibling:
-        for cell in cells:
+            # first row and is headline: print headline underline
-            if 'colspan' in cell.attrs and cell['colspan'].isdigit():
+            full_colspan = 0
-                full_colspan += max(1, min(1000, int(cell['colspan'])))
+            for cell in cells:
-            else:
+                if 'colspan' in cell.attrs and cell['colspan'].isdigit():
-                full_colspan += 1
+                    full_colspan += int(cell["colspan"])
-        if ((is_headrow
+                else:
-             or (is_head_row_missing
+                    full_colspan += 1
                 and self.options['table_infer_header']))
                and is_first_row):
            # first row and:
            # - is headline or
            # - headline is missing and header inference is enabled
            # print headline underline
            underline += '| ' + ' | '.join(['---'] * full_colspan) + ' |' + '\n'
-        elif ((is_head_row_missing
+        elif (not el.previous_sibling
-               and not self.options['table_infer_header'])
+              and (el.parent.name == 'table'
-              or (is_first_row
+                   or (el.parent.name == 'tbody'
-                  and (el.parent.name == 'table'
+                       and not el.parent.previous_sibling))):
                       or (el.parent.name == 'tbody'
                           and not el.parent.find_previous_sibling())))):
            # headline is missing and header inference is disabled or:
            # first row, not headline, and:
-            #  - the parent is table or
+            # - the parent is table or
-            #  - the parent is tbody at the beginning of a table.
+            # - the parent is tbody at the beginning of a table.
            # print empty headline above this row
-            overline += '| ' + ' | '.join([''] * full_colspan) + ' |' + '\n'
+            overline += '| ' + ' | '.join([''] * len(cells)) + ' |' + '\n'
-            overline += '| ' + ' | '.join(['---'] * full_colspan) + ' |' + '\n'
+            overline += '| ' + ' | '.join(['---'] * len(cells)) + ' |' + '\n'
        return overline + '|' + text + '\n' + underline
--- a/markdownify/init.pyi
+++ b/markdownify/init.pyi
@@ -1,77 +0,0 @@
 from _typeshed import Incomplete
 from typing import Callable, Union
 ATX: str
 ATX_CLOSED: str
 UNDERLINED: str
 SETEXT = UNDERLINED
 SPACES: str
 BACKSLASH: str
 ASTERISK: str
 UNDERSCORE: str
 LSTRIP: str
 RSTRIP: str
 STRIP: str
 STRIP_ONE: str
 def markdownify(
    html: str,
    autolinks: bool = ...,
    bs4_options: str = ...,
    bullets: str = ...,
    code_language: str = ...,
    code_language_callback: Union[Callable[[Incomplete], Union[str, None]], None] = ...,
    convert: Union[list[str], None] = ...,
    default_title: bool = ...,
    escape_asterisks: bool = ...,
    escape_underscores: bool = ...,
    escape_misc: bool = ...,
    heading_style: str = ...,
    keep_inline_images_in: list[str] = ...,
    newline_style: str = ...,
    strip: Union[list[str], None] = ...,
    strip_document: Union[str, None] = ...,
    strip_pre: str = ...,
    strong_em_symbol: str = ...,
    sub_symbol: str = ...,
    sup_symbol: str = ...,
    table_infer_header: bool = ...,
    wrap: bool = ...,
    wrap_width: int = ...,
 ) -> str: ...
 class MarkdownConverter:
    def __init__(
        self,
        autolinks: bool = ...,
        bs4_options: str = ...,
        bullets: str = ...,
        code_language: str = ...,
        code_language_callback: Union[Callable[[Incomplete], Union[str, None]], None] = ...,
        convert: Union[list[str], None] = ...,
        default_title: bool = ...,
        escape_asterisks: bool = ...,
        escape_underscores: bool = ...,
        escape_misc: bool = ...,
        heading_style: str = ...,
        keep_inline_images_in: list[str] = ...,
        newline_style: str = ...,
        strip: Union[list[str], None] = ...,
        strip_document: Union[str, None] = ...,
        strip_pre: str = ...,
        strong_em_symbol: str = ...,
        sub_symbol: str = ...,
        sup_symbol: str = ...,
        table_infer_header: bool = ...,
        wrap: bool = ...,
        wrap_width: int = ...,
    ) -> None:
        ...
    def convert(self, html: str) -> str:
        ...
    def convert_soup(self, soup: Incomplete) -> str:
        ...
--- a/markdownify/main.py
+++ b/markdownify/main.py
@@ -55,26 +55,15 @@ def main(argv=sys.argv[1:]):
    parser.add_argument('--no-escape-underscores', dest='escape_underscores',
                        action='store_false',
                        help="Do not escape '_' to '\\_' in text.")
-    parser.add_argument('-i', '--keep-inline-images-in',
+    parser.add_argument('-i', '--keep-inline-images-in', nargs='*',
                        default=[],
                        nargs='*',
                        help="Images are converted to their alt-text when the images are "
                        "located inside headlines or table cells. If some inline images "
                        "should be converted to markdown images instead, this option can "
                        "be set to a list of parent tags that should be allowed to "
                        "contain inline images.")
    parser.add_argument('--table-infer-header', dest='table_infer_header',
                        action='store_true',
                        help="When a table has no header row (as indicated by '<thead>' "
                        "or '<th>'), use the first body row as the header row.")
    parser.add_argument('-w', '--wrap', action='store_true',
                        help="Wrap all text paragraphs at --wrap-width characters.")
    parser.add_argument('--wrap-width', type=int, default=80)
    parser.add_argument('--bs4-options',
                        default='html.parser',
                        help="Specifies the parser that BeautifulSoup should use to parse "
                             "the HTML markup. Examples include 'html5.parser', 'lxml', and "
                             "'html5lib'.")
    args = parser.parse_args(argv)
    print(markdownify(**vars(args)))
--- a/markdownify/py.typed
+++ b/markdownify/py.typed
@@ -1 +0,0 @@
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "markdownify"
-version = "1.2.2"
+version = "0.13.0"
 authors = [{name = "Matthew Tretter", email = "m@tthewwithanm.com"}]
 description = "Convert HTML to markdown."
 readme = "README.rst"
--- a/tests/test_advanced.py
+++ b/tests/test_advanced.py
@@ -1,4 +1,4 @@
-from .utils import md
+from markdownify import markdownify as md
 def test_chomp():
@@ -14,7 +14,7 @@ def test_chomp():
 def test_nested():
    text = md('<p>This is an <a href="http://example.com/">example link</a>.</p>')
-    assert text == '\n\nThis is an [example link](http://example.com/).\n\n'
+    assert text == 'This is an [example link](http://example.com/).\n\n'
 def test_ignore_comments():
--- a/tests/test_args.py
+++ b/tests/test_args.py
@@ -2,8 +2,7 @@
 Test whitelisting/blacklisting of specific tags.
 """
-from markdownify import markdownify, LSTRIP, RSTRIP, STRIP, STRIP_ONE
+from markdownify import markdownify as md
 from .utils import md
 def test_strip():
@@ -24,24 +23,3 @@ def test_convert():
 def test_do_not_convert():
    text = md('<a href="https://github.com/matthewwithanm">Some Text</a>', convert=[])
    assert text == 'Some Text'
 def test_strip_document():
    assert markdownify("<p>Hello</p>") == "Hello"  # test default of STRIP
    assert markdownify("<p>Hello</p>", strip_document=LSTRIP) == "Hello\n\n"
    assert markdownify("<p>Hello</p>", strip_document=RSTRIP) == "\n\nHello"
    assert markdownify("<p>Hello</p>", strip_document=STRIP) == "Hello"
    assert markdownify("<p>Hello</p>", strip_document=None) == "\n\nHello\n\n"
 def test_strip_pre():
    assert markdownify("<pre>  \n  \n  Hello  \n  \n  </pre>") == "```\n  Hello\n```"
    assert markdownify("<pre>  \n  \n  Hello  \n  \n  </pre>", strip_pre=STRIP) == "```\n  Hello\n```"
    assert markdownify("<pre>  \n  \n  Hello  \n  \n  </pre>", strip_pre=STRIP_ONE) == "```\n  \n  Hello  \n  \n```"
    assert markdownify("<pre>  \n  \n  Hello  \n  \n  </pre>", strip_pre=None) == "```\n  \n  \n  Hello  \n  \n  \n```"
 def bs4_options():
    assert markdownify("<p>Hello</p>", bs4_options="html.parser") == "Hello"
    assert markdownify("<p>Hello</p>", bs4_options=["html.parser"]) == "Hello"
    assert markdownify("<p>Hello</p>", bs4_options={"features": "html.parser"}) == "Hello"
--- a/tests/test_basic.py
+++ b/tests/test_basic.py
@@ -1,4 +1,4 @@
-from .utils import md
+from markdownify import markdownify as md
 def test_single_tag():
@@ -6,9 +6,8 @@ def test_single_tag():
 def test_soup():
-    assert md('<div><span>Hello</div></span>') == '\n\nHello\n\n'
+    assert md('<div><span>Hello</div></span>') == 'Hello'
 def test_whitespace():
    assert md(' a  b \t\t c ') == ' a b c '
    assert md(' a  b \n\n c ') == ' a b\nc '
--- a/tests/test_conversions.py
+++ b/tests/test_conversions.py
@@ -1,5 +1,4 @@
-from markdownify import ATX, ATX_CLOSED, BACKSLASH, SPACES, UNDERSCORE
+from markdownify import markdownify as md, ATX, ATX_CLOSED, BACKSLASH, UNDERSCORE
 from .utils import md
 def inline_tests(tag, markup):
@@ -40,11 +39,6 @@ def test_a_no_autolinks():
    assert md('<a href="https://google.com">https://google.com</a>', autolinks=False) == '[https://google.com](https://google.com)'
 def test_a_in_code():
    assert md('<code><a href="https://google.com">Google</a></code>') == '`Google`'
    assert md('<pre><a href="https://google.com">Google</a></pre>') == '\n\n```\nGoogle\n```\n\n'
 def test_b():
    assert md('<b>Hello</b>') == '**Hello**'
@@ -59,12 +53,11 @@ def test_b_spaces():
 def test_blockquote():
    assert md('<blockquote>Hello</blockquote>') == '\n> Hello\n\n'
    assert md('<blockquote>\nHello\n</blockquote>') == '\n> Hello\n\n'
    assert md('<blockquote>&nbsp;Hello</blockquote>') == '\n> \u00a0Hello\n\n'
 def test_blockquote_with_nested_paragraph():
    assert md('<blockquote><p>Hello</p></blockquote>') == '\n> Hello\n\n'
-    assert md('<blockquote><p>Hello</p><p>Hello again</p></blockquote>') == '\n> Hello\n>\n> Hello again\n\n'
+    assert md('<blockquote><p>Hello</p><p>Hello again</p></blockquote>') == '\n> Hello\n> \n> Hello again\n\n'
 def test_blockquote_with_paragraph():
@@ -73,14 +66,17 @@ def test_blockquote_with_paragraph():
 def test_blockquote_nested():
    text = md('<blockquote>And she was like <blockquote>Hello</blockquote></blockquote>')
-    assert text == '\n> And she was like\n> > Hello\n\n'
+    assert text == '\n> And she was like \n> > Hello\n\n'
 def test_br():
    assert md('a<br />b<br />c') == 'a  \nb  \nc'
    assert md('a<br />b<br />c', newline_style=BACKSLASH) == 'a\\\nb\\\nc'
-    assert md('<h1>foo<br />bar</h1>', heading_style=ATX) == '\n\n# foo bar\n\n'
+
-    assert md('<td>foo<br />bar</td>', heading_style=ATX) == ' foo bar |'
+
 def test_caption():
    assert md('TEXT<figure><figcaption>Caption</figcaption><span>SPAN</span></figure>') == 'TEXT\n\nCaption\n\nSPAN'
    assert md('<figure><span>SPAN</span><figcaption>Caption</figcaption></figure>TEXT') == 'SPAN\n\nCaption\n\nTEXT'
 def test_code():
@@ -101,86 +97,51 @@ def test_code():
    assert md('<code>foo<s> bar </s>baz</code>') == '`foo bar baz`'
    assert md('<code>foo<sup>bar</sup>baz</code>', sup_symbol='^') == '`foobarbaz`'
    assert md('<code>foo<sub>bar</sub>baz</code>', sub_symbol='^') == '`foobarbaz`'
    assert md('foo<code>`bar`</code>baz') == 'foo`` `bar` ``baz'
    assert md('foo<code>``bar``</code>baz') == 'foo``` ``bar`` ```baz'
    assert md('foo<code> `bar` </code>baz') == 'foo `` `bar` `` baz'
 def test_dl():
    assert md('<dl><dt>term</dt><dd>definition</dd></dl>') == '\n\nterm\n:   definition\n\n'
    assert md('<dl><dt><p>te</p><p>rm</p></dt><dd>definition</dd></dl>') == '\n\nte rm\n:   definition\n\n'
    assert md('<dl><dt>term</dt><dd><p>definition-p1</p><p>definition-p2</p></dd></dl>') == '\n\nterm\n:   definition-p1\n\n    definition-p2\n\n'
    assert md('<dl><dt>term</dt><dd><p>definition 1</p></dd><dd><p>definition 2</p></dd></dl>') == '\n\nterm\n:   definition 1\n:   definition 2\n\n'
    assert md('<dl><dt>term 1</dt><dd>definition 1</dd><dt>term 2</dt><dd>definition 2</dd></dl>') == '\n\nterm 1\n:   definition 1\n\nterm 2\n:   definition 2\n\n'
    assert md('<dl><dt>term</dt><dd><blockquote><p>line 1</p><p>line 2</p></blockquote></dd></dl>') == '\n\nterm\n:   > line 1\n    >\n    > line 2\n\n'
    assert md('<dl><dt>term</dt><dd><ol><li><p>1</p><ul><li>2a</li><li>2b</li></ul></li><li><p>3</p></li></ol></dd></dl>') == '\n\nterm\n:   1. 1\n\n       * 2a\n       * 2b\n    2. 3\n\n'
 def test_del():
    inline_tests('del', '~~')
-def test_div_section_article():
+def test_div():
-    for tag in ['div', 'section', 'article']:
+    assert md('Hello</div> World') == 'Hello World'
        assert md(f'<{tag}>456</{tag}>') == '\n\n456\n\n'
        assert md(f'123<{tag}>456</{tag}>789') == '123\n\n456\n\n789'
        assert md(f'123<{tag}>\n 456 \n</{tag}>789') == '123\n\n456\n\n789'
        assert md(f'123<{tag}><p>456</p></{tag}>789') == '123\n\n456\n\n789'
        assert md(f'123<{tag}>\n<p>456</p>\n</{tag}>789') == '123\n\n456\n\n789'
        assert md(f'123<{tag}><pre>4 5 6</pre></{tag}>789') == '123\n\n```\n4 5 6\n```\n\n789'
        assert md(f'123<{tag}>\n<pre>4 5 6</pre>\n</{tag}>789') == '123\n\n```\n4 5 6\n```\n\n789'
        assert md(f'123<{tag}>4\n5\n6</{tag}>789') == '123\n\n4\n5\n6\n\n789'
        assert md(f'123<{tag}>\n4\n5\n6\n</{tag}>789') == '123\n\n4\n5\n6\n\n789'
        assert md(f'123<{tag}>\n<p>\n4\n5\n6\n</p>\n</{tag}>789') == '123\n\n4\n5\n6\n\n789'
        assert md(f'<{tag}><h1>title</h1>body</{{tag}}>', heading_style=ATX) == '\n\n# title\n\nbody\n\n'
 def test_em():
    inline_tests('em', '*')
 def test_figcaption():
    assert (md("TEXT<figure><figcaption>\nCaption\n</figcaption><span>SPAN</span></figure>") == "TEXT\n\nCaption\n\nSPAN")
    assert (md("<figure><span>SPAN</span><figcaption>\nCaption\n</figcaption></figure>TEXT") == "SPAN\n\nCaption\n\nTEXT")
 def test_header_with_space():
-    assert md('<h3>\n\nHello</h3>') == '\n\n### Hello\n\n'
+    assert md('<h3>\n\nHello</h3>') == '### Hello\n\n'
-    assert md('<h3>Hello\n\n\nWorld</h3>') == '\n\n### Hello World\n\n'
+    assert md('<h4>\n\nHello</h4>') == '#### Hello\n\n'
-    assert md('<h4>\n\nHello</h4>') == '\n\n#### Hello\n\n'
+    assert md('<h5>\n\nHello</h5>') == '##### Hello\n\n'
-    assert md('<h5>\n\nHello</h5>') == '\n\n##### Hello\n\n'
+    assert md('<h5>\n\nHello\n\n</h5>') == '##### Hello\n\n'
-    assert md('<h5>\n\nHello\n\n</h5>') == '\n\n##### Hello\n\n'
+    assert md('<h5>\n\nHello   \n\n</h5>') == '##### Hello\n\n'
    assert md('<h5>\n\nHello   \n\n</h5>') == '\n\n##### Hello\n\n'
 def test_h1():
-    assert md('<h1>Hello</h1>') == '\n\nHello\n=====\n\n'
+    assert md('<h1>Hello</h1>') == 'Hello\n=====\n\n'
 def test_h2():
-    assert md('<h2>Hello</h2>') == '\n\nHello\n-----\n\n'
+    assert md('<h2>Hello</h2>') == 'Hello\n-----\n\n'
 def test_hn():
-    assert md('<h3>Hello</h3>') == '\n\n### Hello\n\n'
+    assert md('<h3>Hello</h3>') == '### Hello\n\n'
-    assert md('<h4>Hello</h4>') == '\n\n#### Hello\n\n'
+    assert md('<h4>Hello</h4>') == '#### Hello\n\n'
-    assert md('<h5>Hello</h5>') == '\n\n##### Hello\n\n'
+    assert md('<h5>Hello</h5>') == '##### Hello\n\n'
-    assert md('<h6>Hello</h6>') == '\n\n###### Hello\n\n'
+    assert md('<h6>Hello</h6>') == '###### Hello\n\n'
    assert md('<h10>Hello</h10>') == md('<h6>Hello</h6>')
    assert md('<h0>Hello</h0>') == md('<h1>Hello</h1>')
    assert md('<hx>Hello</hx>') == md('Hello')
 def test_hn_chained():
-    assert md('<h1>First</h1>\n<h2>Second</h2>\n<h3>Third</h3>', heading_style=ATX) == '\n\n# First\n\n## Second\n\n### Third\n\n'
+    assert md('<h1>First</h1>\n<h2>Second</h2>\n<h3>Third</h3>', heading_style=ATX) == '# First\n\n\n## Second\n\n\n### Third\n\n'
-    assert md('X<h1>First</h1>', heading_style=ATX) == 'X\n\n# First\n\n'
+    assert md('X<h1>First</h1>', heading_style=ATX) == 'X# First\n\n'
    assert md('X<h1>First</h1>', heading_style=ATX_CLOSED) == 'X\n\n# First #\n\n'
    assert md('X<h1>First</h1>') == 'X\n\nFirst\n=====\n\n'
 def test_hn_nested_tag_heading_style():
-    assert md('<h1>A <p>P</p> C </h1>', heading_style=ATX_CLOSED) == '\n\n# A P C #\n\n'
+    assert md('<h1>A <p>P</p> C </h1>', heading_style=ATX_CLOSED) == '# A P C #\n\n'
-    assert md('<h1>A <p>P</p> C </h1>', heading_style=ATX) == '\n\n# A P C\n\n'
+    assert md('<h1>A <p>P</p> C </h1>', heading_style=ATX) == '# A P C\n\n'
 def test_hn_nested_simple_tag():
@@ -196,12 +157,12 @@ def test_hn_nested_simple_tag():
    ]
    for tag, markdown in tag_to_markdown:
-        assert md('<h3>A <' + tag + '>' + tag + '</' + tag + '> B</h3>') == '\n\n### A ' + markdown + ' B\n\n'
+        assert md('<h3>A <' + tag + '>' + tag + '</' + tag + '> B</h3>') == '### A ' + markdown + ' B\n\n'
-    assert md('<h3>A <br>B</h3>', heading_style=ATX) == '\n\n### A B\n\n'
+    assert md('<h3>A <br>B</h3>', heading_style=ATX) == '### A B\n\n'
    # Nested lists not supported
-    # assert md('<h3>A <ul><li>li1</i><li>l2</li></ul></h3>', heading_style=ATX) == '\n### A li1 li2 B\n\n'
+    # assert md('<h3>A <ul><li>li1</i><li>l2</li></ul></h3>', heading_style=ATX) == '### A li1 li2 B\n\n'
 def test_hn_nested_img():
@@ -211,23 +172,18 @@ def test_hn_nested_img():
        ("alt='Alt Text' title='Optional title'", "Alt Text", " \"Optional title\""),
    ]
    for image_attributes, markdown, title in image_attributes_to_markdown:
-        assert md('<h3>A <img src="/path/to/img.jpg" ' + image_attributes + '/> B</h3>') == '\n\n### A' + (' ' + markdown + ' ' if markdown else ' ') + 'B\n\n'
+        assert md('<h3>A <img src="/path/to/img.jpg" ' + image_attributes + '/> B</h3>') == '### A ' + markdown + ' B\n\n'
-        assert md('<h3>A <img src="/path/to/img.jpg" ' + image_attributes + '/> B</h3>', keep_inline_images_in=['h3']) == '\n\n### A ![' + markdown + '](/path/to/img.jpg' + title + ') B\n\n'
+        assert md('<h3>A <img src="/path/to/img.jpg" ' + image_attributes + '/> B</h3>', keep_inline_images_in=['h3']) == '### A ![' + markdown + '](/path/to/img.jpg' + title + ') B\n\n'
 def test_hn_atx_headings():
-    assert md('<h1>Hello</h1>', heading_style=ATX) == '\n\n# Hello\n\n'
+    assert md('<h1>Hello</h1>', heading_style=ATX) == '# Hello\n\n'
-    assert md('<h2>Hello</h2>', heading_style=ATX) == '\n\n## Hello\n\n'
+    assert md('<h2>Hello</h2>', heading_style=ATX) == '## Hello\n\n'
 def test_hn_atx_closed_headings():
-    assert md('<h1>Hello</h1>', heading_style=ATX_CLOSED) == '\n\n# Hello #\n\n'
+    assert md('<h1>Hello</h1>', heading_style=ATX_CLOSED) == '# Hello #\n\n'
-    assert md('<h2>Hello</h2>', heading_style=ATX_CLOSED) == '\n\n## Hello ##\n\n'
+    assert md('<h2>Hello</h2>', heading_style=ATX_CLOSED) == '## Hello ##\n\n'
 def test_hn_newlines():
    assert md("<h1>H1-1</h1>TEXT<h2>H2-2</h2>TEXT<h1>H1-2</h1>TEXT", heading_style=ATX) == '\n\n# H1-1\n\nTEXT\n\n## H2-2\n\nTEXT\n\n# H1-2\n\nTEXT'
    assert md('<h1>H1-1</h1>\n<p>TEXT</p>\n<h2>H2-2</h2>\n<p>TEXT</p>\n<h1>H1-2</h1>\n<p>TEXT</p>', heading_style=ATX) == '\n\n# H1-1\n\nTEXT\n\n## H2-2\n\nTEXT\n\n# H1-2\n\nTEXT\n\n'
 def test_head():
@@ -237,7 +193,7 @@ def test_head():
 def test_hr():
    assert md('Hello<hr>World') == 'Hello\n\n---\n\nWorld'
    assert md('Hello<hr />World') == 'Hello\n\n---\n\nWorld'
-    assert md('<p>Hello</p>\n<hr>\n<p>World</p>') == '\n\nHello\n\n---\n\nWorld\n\n'
+    assert md('<p>Hello</p>\n<hr>\n<p>World</p>') == 'Hello\n\n\n\n\n---\n\n\nWorld\n\n'
 def test_i():
@@ -249,68 +205,37 @@ def test_img():
    assert md('<img src="/path/to/img.jpg" alt="Alt text" />') == '![Alt text](/path/to/img.jpg)'
 def test_video():
    assert md('<video src="/path/to/video.mp4" poster="/path/to/img.jpg">text</video>') == '[![text](/path/to/img.jpg)](/path/to/video.mp4)'
    assert md('<video src="/path/to/video.mp4">text</video>') == '[text](/path/to/video.mp4)'
    assert md('<video><source src="/path/to/video.mp4"/>text</video>') == '[text](/path/to/video.mp4)'
    assert md('<video poster="/path/to/img.jpg">text</video>') == '![text](/path/to/img.jpg)'
    assert md('<video>text</video>') == 'text'
 def test_kbd():
    inline_tests('kbd', '`')
 def test_p():
-    assert md('<p>hello</p>') == '\n\nhello\n\n'
+    assert md('<p>hello</p>') == 'hello\n\n'
-    assert md("<p><p>hello</p></p>") == "\n\nhello\n\n"
+    assert md('<p>123456789 123456789</p>') == '123456789 123456789\n\n'
-    assert md('<p>123456789 123456789</p>') == '\n\n123456789 123456789\n\n'
+    assert md('<p>123456789 123456789</p>', wrap=True, wrap_width=10) == '123456789\n123456789\n\n'
-    assert md('<p>123456789\n\n\n123456789</p>') == '\n\n123456789\n123456789\n\n'
+    assert md('<p><a href="https://example.com">Some long link</a></p>', wrap=True, wrap_width=10) == '[Some long\nlink](https://example.com)\n\n'
-    assert md('<p>123456789\n\n\n123456789</p>', wrap=True, wrap_width=80) == '\n\n123456789 123456789\n\n'
+    assert md('<p>12345<br />67890</p>', wrap=True, wrap_width=10, newline_style=BACKSLASH) == '12345\\\n67890\n\n'
-    assert md('<p>123456789\n\n\n123456789</p>', wrap=True, wrap_width=None) == '\n\n123456789 123456789\n\n'
+    assert md('<p>12345678901<br />12345</p>', wrap=True, wrap_width=10, newline_style=BACKSLASH) == '12345678901\\\n12345\n\n'
    assert md('<p>123456789 123456789</p>', wrap=True, wrap_width=10) == '\n\n123456789\n123456789\n\n'
    assert md('<p><a href="https://example.com">Some long link</a></p>', wrap=True, wrap_width=10) == '\n\n[Some long\nlink](https://example.com)\n\n'
    assert md('<p>12345<br />67890</p>', wrap=True, wrap_width=10, newline_style=BACKSLASH) == '\n\n12345\\\n67890\n\n'
    assert md('<p>12345<br />67890</p>', wrap=True, wrap_width=50, newline_style=BACKSLASH) == '\n\n12345\\\n67890\n\n'
    assert md('<p>12345<br />67890</p>', wrap=True, wrap_width=10, newline_style=SPACES) == '\n\n12345  \n67890\n\n'
    assert md('<p>12345<br />67890</p>', wrap=True, wrap_width=50, newline_style=SPACES) == '\n\n12345  \n67890\n\n'
    assert md('<p>12345678901<br />12345</p>', wrap=True, wrap_width=10, newline_style=BACKSLASH) == '\n\n12345678901\\\n12345\n\n'
    assert md('<p>12345678901<br />12345</p>', wrap=True, wrap_width=50, newline_style=BACKSLASH) == '\n\n12345678901\\\n12345\n\n'
    assert md('<p>12345678901<br />12345</p>', wrap=True, wrap_width=10, newline_style=SPACES) == '\n\n12345678901  \n12345\n\n'
    assert md('<p>12345678901<br />12345</p>', wrap=True, wrap_width=50, newline_style=SPACES) == '\n\n12345678901  \n12345\n\n'
    assert md('<p>1234 5678 9012<br />67890</p>', wrap=True, wrap_width=10, newline_style=BACKSLASH) == '\n\n1234 5678\n9012\\\n67890\n\n'
    assert md('<p>1234 5678 9012<br />67890</p>', wrap=True, wrap_width=10, newline_style=SPACES) == '\n\n1234 5678\n9012  \n67890\n\n'
    assert md('First<p>Second</p><p>Third</p>Fourth') == 'First\n\nSecond\n\nThird\n\nFourth'
    assert md('<p>&nbsp;x y</p>', wrap=True, wrap_width=80) == '\n\n\u00a0x y\n\n'
 def test_pre():
-    assert md('<pre>test\n    foo\nbar</pre>') == '\n\n```\ntest\n    foo\nbar\n```\n\n'
+    assert md('<pre>test\n    foo\nbar</pre>') == '\n```\ntest\n    foo\nbar\n```\n'
-    assert md('<pre><code>test\n    foo\nbar</code></pre>') == '\n\n```\ntest\n    foo\nbar\n```\n\n'
+    assert md('<pre><code>test\n    foo\nbar</code></pre>') == '\n```\ntest\n    foo\nbar\n```\n'
-    assert md('<pre>*this_should_not_escape*</pre>') == '\n\n```\n*this_should_not_escape*\n```\n\n'
+    assert md('<pre>*this_should_not_escape*</pre>') == '\n```\n*this_should_not_escape*\n```\n'
-    assert md('<pre><span>*this_should_not_escape*</span></pre>') == '\n\n```\n*this_should_not_escape*\n```\n\n'
+    assert md('<pre><span>*this_should_not_escape*</span></pre>') == '\n```\n*this_should_not_escape*\n```\n'
-    assert md('<pre>\t\tthis  should\t\tnot  normalize</pre>') == '\n\n```\n\t\tthis  should\t\tnot  normalize\n```\n\n'
+    assert md('<pre>\t\tthis  should\t\tnot  normalize</pre>') == '\n```\n\t\tthis  should\t\tnot  normalize\n```\n'
-    assert md('<pre><span>\t\tthis  should\t\tnot  normalize</span></pre>') == '\n\n```\n\t\tthis  should\t\tnot  normalize\n```\n\n'
+    assert md('<pre><span>\t\tthis  should\t\tnot  normalize</span></pre>') == '\n```\n\t\tthis  should\t\tnot  normalize\n```\n'
-    assert md('<pre>foo<b>\nbar\n</b>baz</pre>') == '\n\n```\nfoo\nbar\nbaz\n```\n\n'
+    assert md('<pre>foo<b>\nbar\n</b>baz</pre>') == '\n```\nfoo\nbar\nbaz\n```\n'
-    assert md('<pre>foo<i>\nbar\n</i>baz</pre>') == '\n\n```\nfoo\nbar\nbaz\n```\n\n'
+    assert md('<pre>foo<i>\nbar\n</i>baz</pre>') == '\n```\nfoo\nbar\nbaz\n```\n'
-    assert md('<pre>foo\n<i>bar</i>\nbaz</pre>') == '\n\n```\nfoo\nbar\nbaz\n```\n\n'
+    assert md('<pre>foo\n<i>bar</i>\nbaz</pre>') == '\n```\nfoo\nbar\nbaz\n```\n'
-    assert md('<pre>foo<i>\n</i>baz</pre>') == '\n\n```\nfoo\nbaz\n```\n\n'
+    assert md('<pre>foo<i>\n</i>baz</pre>') == '\n```\nfoo\nbaz\n```\n'
-    assert md('<pre>foo<del>\nbar\n</del>baz</pre>') == '\n\n```\nfoo\nbar\nbaz\n```\n\n'
+    assert md('<pre>foo<del>\nbar\n</del>baz</pre>') == '\n```\nfoo\nbar\nbaz\n```\n'
-    assert md('<pre>foo<em>\nbar\n</em>baz</pre>') == '\n\n```\nfoo\nbar\nbaz\n```\n\n'
+    assert md('<pre>foo<em>\nbar\n</em>baz</pre>') == '\n```\nfoo\nbar\nbaz\n```\n'
-    assert md('<pre>foo<code>\nbar\n</code>baz</pre>') == '\n\n```\nfoo\nbar\nbaz\n```\n\n'
+    assert md('<pre>foo<code>\nbar\n</code>baz</pre>') == '\n```\nfoo\nbar\nbaz\n```\n'
-    assert md('<pre>foo<strong>\nbar\n</strong>baz</pre>') == '\n\n```\nfoo\nbar\nbaz\n```\n\n'
+    assert md('<pre>foo<strong>\nbar\n</strong>baz</pre>') == '\n```\nfoo\nbar\nbaz\n```\n'
-    assert md('<pre>foo<s>\nbar\n</s>baz</pre>') == '\n\n```\nfoo\nbar\nbaz\n```\n\n'
+    assert md('<pre>foo<s>\nbar\n</s>baz</pre>') == '\n```\nfoo\nbar\nbaz\n```\n'
-    assert md('<pre>foo<sup>\nbar\n</sup>baz</pre>', sup_symbol='^') == '\n\n```\nfoo\nbar\nbaz\n```\n\n'
+    assert md('<pre>foo<sup>\nbar\n</sup>baz</pre>', sup_symbol='^') == '\n```\nfoo\nbar\nbaz\n```\n'
-    assert md('<pre>foo<sub>\nbar\n</sub>baz</pre>', sub_symbol='^') == '\n\n```\nfoo\nbar\nbaz\n```\n\n'
+    assert md('<pre>foo<sub>\nbar\n</sub>baz</pre>', sub_symbol='^') == '\n```\nfoo\nbar\nbaz\n```\n'
    assert md('<pre>foo<sub>\nbar\n</sub>baz</pre>', sub_symbol='^') == '\n\n```\nfoo\nbar\nbaz\n```\n\n'
    assert md('foo<pre>bar</pre>baz', sub_symbol='^') == 'foo\n\n```\nbar\n```\n\nbaz'
    assert md("<p>foo</p>\n<pre>bar</pre>\n</p>baz</p>", sub_symbol="^") == "\n\nfoo\n\n```\nbar\n```\n\nbaz"
 def test_q():
    assert md('foo <q>quote</q> bar') == 'foo "quote" bar'
    assert md('foo <q cite="https://example.com">quote</q> bar') == 'foo "quote" bar'
 def test_script():
@@ -353,24 +278,14 @@ def test_sup():
 def test_lang():
-    assert md('<pre>test\n    foo\nbar</pre>', code_language='python') == '\n\n```python\ntest\n    foo\nbar\n```\n\n'
+    assert md('<pre>test\n    foo\nbar</pre>', code_language='python') == '\n```python\ntest\n    foo\nbar\n```\n'
-    assert md('<pre><code>test\n    foo\nbar</code></pre>', code_language='javascript') == '\n\n```javascript\ntest\n    foo\nbar\n```\n\n'
+    assert md('<pre><code>test\n    foo\nbar</code></pre>', code_language='javascript') == '\n```javascript\ntest\n    foo\nbar\n```\n'
 def test_lang_callback():
    def callback(el):
        return el['class'][0] if el.has_attr('class') else None
-    assert md('<pre class="python">test\n    foo\nbar</pre>', code_language_callback=callback) == '\n\n```python\ntest\n    foo\nbar\n```\n\n'
+    assert md('<pre class="python">test\n    foo\nbar</pre>', code_language_callback=callback) == '\n```python\ntest\n    foo\nbar\n```\n'
-    assert md('<pre class="javascript"><code>test\n    foo\nbar</code></pre>', code_language_callback=callback) == '\n\n```javascript\ntest\n    foo\nbar\n```\n\n'
+    assert md('<pre class="javascript"><code>test\n    foo\nbar</code></pre>', code_language_callback=callback) == '\n```javascript\ntest\n    foo\nbar\n```\n'
-    assert md('<pre class="javascript"><code class="javascript">test\n    foo\nbar</code></pre>', code_language_callback=callback) == '\n\n```javascript\ntest\n    foo\nbar\n```\n\n'
+    assert md('<pre class="javascript"><code class="javascript">test\n    foo\nbar</code></pre>', code_language_callback=callback) == '\n```javascript\ntest\n    foo\nbar\n```\n'
 def test_spaces():
    assert md('<p> a b </p> <p> c d </p>') == '\n\na b\n\nc d\n\n'
    assert md('<p> <i>a</i> </p>') == '\n\n*a*\n\n'
    assert md('test <p> again </p>') == 'test\n\nagain\n\n'
    assert md('test <blockquote> text </blockquote> after') == 'test\n> text\n\nafter'
    assert md(' <ol> <li> x </li> <li> y </li> </ol> ') == '\n\n1. x\n2. y\n'
    assert md(' <ul> <li> x </li> <li> y </li> </ol> ') == '\n\n* x\n* y\n'
    assert md('test <pre> foo </pre> bar') == 'test\n\n```\n foo\n```\n\nbar'
--- a/tests/test_custom_converter.py
+++ b/tests/test_custom_converter.py
@@ -2,40 +2,21 @@ from markdownify import MarkdownConverter
 from bs4 import BeautifulSoup
-class UnitTestConverter(MarkdownConverter):
+class ImageBlockConverter(MarkdownConverter):
    """
-    Create a custom MarkdownConverter for unit tests
+    Create a custom MarkdownConverter that adds two newlines after an image
    """
-    def convert_img(self, el, text, parent_tags):
+    def convert_img(self, el, text, convert_as_inline):
-        """Add two newlines after an image"""
+        return super().convert_img(el, text, convert_as_inline) + '\n\n'
        return super().convert_img(el, text, parent_tags) + '\n\n'
    def convert_custom_tag(self, el, text, parent_tags):
        """Ensure conversion function is found for tags with special characters in name"""
        return "convert_custom_tag(): %s" % text
    def convert_h1(self, el, text, parent_tags):
        """Ensure explicit heading conversion function is used"""
        return "convert_h1: %s" % (text)
    def convert_hN(self, n, el, text, parent_tags):
        """Ensure general heading conversion function is used"""
        return "convert_hN(%d): %s" % (n, text)
-def test_custom_conversion_functions():
+def test_img():
    # Create shorthand method for conversion
    def md(html, **options):
-        return UnitTestConverter(**options).convert(html)
+        return ImageBlockConverter(**options).convert(html)
-    assert md('<img src="/path/to/img.jpg" alt="Alt text" title="Optional title" />text') == '![Alt text](/path/to/img.jpg "Optional title")\n\ntext'
+    assert md('<img src="/path/to/img.jpg" alt="Alt text" title="Optional title" />') == '![Alt text](/path/to/img.jpg "Optional title")\n\n'
-    assert md('<img src="/path/to/img.jpg" alt="Alt text" />text') == '![Alt text](/path/to/img.jpg)\n\ntext'
+    assert md('<img src="/path/to/img.jpg" alt="Alt text" />') == '![Alt text](/path/to/img.jpg)\n\n'
    assert md("<custom-tag>text</custom-tag>") == "convert_custom_tag(): text"
    assert md("<h1>text</h1>") == "convert_h1: text"
    assert md("<h3>text</h3>") == "convert_hN(3): text"
 def test_soup():
--- a/tests/test_escaping.py
+++ b/tests/test_escaping.py
@@ -1,6 +1,4 @@
-import warnings
+from markdownify import markdownify as md
 from bs4 import MarkupResemblesLocatorWarning
 from .utils import md
 def test_asterisks():
@@ -14,7 +12,7 @@ def test_underscore():
 def test_xml_entities():
-    assert md('&amp;', escape_misc=True) == r'\&'
+    assert md('&amp;') == r'\&'
 def test_named_entities():
@@ -27,51 +25,23 @@ def test_hexadecimal_entities():
 def test_single_escaping_entities():
-    assert md('&amp;amp;', escape_misc=True) == r'\&amp;'
+    assert md('&amp;amp;') == r'\&amp;'
-def test_misc():
+def text_misc():
-    # ignore the bs4 warning that "1.2" or "*" looks like a filename
+    assert md('\\*') == r'\\\*'
-    warnings.filterwarnings("ignore", category=MarkupResemblesLocatorWarning)
+    assert md('<foo>') == r'\<foo\>'
-
+    assert md('# foo') == r'\# foo'
-    assert md('\\*', escape_misc=True) == r'\\\*'
+    assert md('> foo') == r'\> foo'
-    assert md('&lt;foo>', escape_misc=True) == r'\<foo\>'
+    assert md('~~foo~~') == r'\~\~foo\~\~'
-    assert md('# foo', escape_misc=True) == r'\# foo'
+    assert md('foo\n===\n') == 'foo\n\\=\\=\\=\n'
-    assert md('#5', escape_misc=True) == r'#5'
+    assert md('---\n') == '\\-\\-\\-\n'
-    assert md('5#', escape_misc=True) == '5#'
+    assert md('+ x\n+ y\n') == '\\+ x\n\\+ y\n'
-    assert md('####### foo', escape_misc=True) == r'####### foo'
+    assert md('`x`') == r'\`x\`'
-    assert md('> foo', escape_misc=True) == r'\> foo'
+    assert md('[text](link)') == r'\[text](link)'
-    assert md('~~foo~~', escape_misc=True) == r'\~\~foo\~\~'
+    assert md('1. x') == r'1\. x'
-    assert md('foo\n===\n', escape_misc=True) == 'foo\n\\=\\=\\=\n'
+    assert md('not a number. x') == r'not a number. x'
-    assert md('---\n', escape_misc=True) == '\\---\n'
+    assert md('1) x') == r'1\) x'
-    assert md('- test', escape_misc=True) == r'\- test'
+    assert md('not a number) x') == r'not a number) x'
-    assert md('x - y', escape_misc=True) == r'x \- y'
+    assert md('|not table|') == r'\|not table\|'
-    assert md('test-case', escape_misc=True) == 'test-case'
+    assert md(r'\ <foo> &amp;amp; | ` `', escape_misc=False) == r'\ <foo> &amp; | ` `'
    assert md('x-', escape_misc=True) == 'x-'
    assert md('-y', escape_misc=True) == '-y'
    assert md('+ x\n+ y\n', escape_misc=True) == '\\+ x\n\\+ y\n'
    assert md('`x`', escape_misc=True) == r'\`x\`'
    assert md('[text](notalink)', escape_misc=True) == r'\[text\](notalink)'
    assert md('<a href="link">text]</a>', escape_misc=True) == r'[text\]](link)'
    assert md('<a href="link">[text]</a>', escape_misc=True) == r'[\[text\]](link)'
    assert md('1. x', escape_misc=True) == r'1\. x'
    # assert md('1<span>.</span> x', escape_misc=True) == r'1\. x'
    assert md('<span>1.</span> x', escape_misc=True) == r'1\. x'
    assert md(' 1. x', escape_misc=True) == r' 1\. x'
    assert md('123456789. x', escape_misc=True) == r'123456789\. x'
    assert md('1234567890. x', escape_misc=True) == r'1234567890. x'
    assert md('A1. x', escape_misc=True) == r'A1. x'
    assert md('1.2', escape_misc=True) == r'1.2'
    assert md('not a number. x', escape_misc=True) == r'not a number. x'
    assert md('1) x', escape_misc=True) == r'1\) x'
    # assert md('1<span>)</span> x', escape_misc=True) == r'1\) x'
    assert md('<span>1)</span> x', escape_misc=True) == r'1\) x'
    assert md(' 1) x', escape_misc=True) == r' 1\) x'
    assert md('123456789) x', escape_misc=True) == r'123456789\) x'
    assert md('1234567890) x', escape_misc=True) == r'1234567890) x'
    assert md('(1) x', escape_misc=True) == r'(1) x'
    assert md('A1) x', escape_misc=True) == r'A1) x'
    assert md('1)x', escape_misc=True) == r'1)x'
    assert md('not a number) x', escape_misc=True) == r'not a number) x'
    assert md('|not table|', escape_misc=True) == r'\|not table\|'
    assert md(r'\ &lt;foo> &amp;amp; | ` `', escape_misc=False) == r'\ <foo> &amp; | ` `'
--- a/tests/test_lists.py
+++ b/tests/test_lists.py
@@ -1,4 +1,4 @@
-from .utils import md
+from markdownify import markdownify as md
 nested_uls = """
@@ -41,22 +41,19 @@ nested_ols = """
 def test_ol():
-    assert md('<ol><li>a</li><li>b</li></ol>') == '\n\n1. a\n2. b\n'
+    assert md('<ol><li>a</li><li>b</li></ol>') == '1. a\n2. b\n'
-    assert md('<ol><!--comment--><li>a</li><span/><li>b</li></ol>') == '\n\n1. a\n2. b\n'
+    assert md('<ol start="3"><li>a</li><li>b</li></ol>') == '3. a\n4. b\n'
-    assert md('<ol start="3"><li>a</li><li>b</li></ol>') == '\n\n3. a\n4. b\n'
+    assert md('<ol start="-1"><li>a</li><li>b</li></ol>') == '1. a\n2. b\n'
-    assert md('foo<ol start="3"><li>a</li><li>b</li></ol>bar') == 'foo\n\n3. a\n4. b\n\nbar'
+    assert md('<ol start="foo"><li>a</li><li>b</li></ol>') == '1. a\n2. b\n'
-    assert md('<ol start="-1"><li>a</li><li>b</li></ol>') == '\n\n1. a\n2. b\n'
+    assert md('<ol start="1.5"><li>a</li><li>b</li></ol>') == '1. a\n2. b\n'
    assert md('<ol start="foo"><li>a</li><li>b</li></ol>') == '\n\n1. a\n2. b\n'
    assert md('<ol start="1.5"><li>a</li><li>b</li></ol>') == '\n\n1. a\n2. b\n'
    assert md('<ol start="1234"><li><p>first para</p><p>second para</p></li><li><p>third para</p><p>fourth para</p></li></ol>') == '\n\n1234. first para\n\n      second para\n1235. third para\n\n      fourth para\n'
 def test_nested_ols():
-    assert md(nested_ols) == '\n\n1. 1\n   1. a\n      1. I\n      2. II\n      3. III\n   2. b\n   3. c\n2. 2\n3. 3\n'
+    assert md(nested_ols) == '\n1. 1\n\t1. a\n\t\t1. I\n\t\t2. II\n\t\t3. III\n\t2. b\n\t3. c\n2. 2\n3. 3\n'
 def test_ul():
-    assert md('<ul><li>a</li><li>b</li></ul>') == '\n\n* a\n* b\n'
+    assert md('<ul><li>a</li><li>b</li></ul>') == '* a\n* b\n'
    assert md("""<ul>
     <li>
             a
@@ -64,13 +61,11 @@ def test_ul():
     <li> b </li>
     <li>   c
     </li>
- </ul>""") == '\n\n* a\n* b\n* c\n'
+ </ul>""") == '* a\n* b\n* c\n'
    assert md('<ul><li><p>first para</p><p>second para</p></li><li><p>third para</p><p>fourth para</p></li></ul>') == '\n\n* first para\n\n  second para\n* third para\n\n  fourth para\n'
 def test_inline_ul():
-    assert md('<p>foo</p><ul><li>a</li><li>b</li></ul><p>bar</p>') == '\n\nfoo\n\n* a\n* b\n\nbar\n\n'
+    assert md('<p>foo</p><ul><li>a</li><li>b</li></ul><p>bar</p>') == 'foo\n\n* a\n* b\n\nbar\n\n'
    assert md('foo<ul><li>bar</li></ul>baz') == 'foo\n\n* bar\n\nbaz'
 def test_nested_uls():
@@ -78,12 +73,12 @@ def test_nested_uls():
    Nested ULs should alternate bullet characters.
    """
-    assert md(nested_uls) == '\n\n* 1\n  + a\n    - I\n    - II\n    - III\n  + b\n  + c\n* 2\n* 3\n'
+    assert md(nested_uls) == '\n* 1\n\t+ a\n\t\t- I\n\t\t- II\n\t\t- III\n\t+ b\n\t+ c\n* 2\n* 3\n'
 def test_bullets():
-    assert md(nested_uls, bullets='-') == '\n\n- 1\n  - a\n    - I\n    - II\n    - III\n  - b\n  - c\n- 2\n- 3\n'
+    assert md(nested_uls, bullets='-') == '\n- 1\n\t- a\n\t\t- I\n\t\t- II\n\t\t- III\n\t- b\n\t- c\n- 2\n- 3\n'
 def test_li_text():
-    assert md('<ul><li>foo <a href="#">bar</a></li><li>foo bar  </li><li>foo <b>bar</b>   <i>space</i>.</ul>') == '\n\n* foo [bar](#)\n* foo bar\n* foo **bar** *space*.\n'
+    assert md('<ul><li>foo <a href="#">bar</a></li><li>foo bar  </li><li>foo <b>bar</b>   <i>space</i>.</ul>') == '* foo [bar](#)\n* foo bar\n* foo **bar** *space*.\n'
--- a/tests/test_tables.py
+++ b/tests/test_tables.py
@@ -1,4 +1,4 @@
-from .utils import md
+from markdownify import markdownify as md
 table = """<table>
@@ -141,33 +141,6 @@ table_head_body_missing_head = """<table>
    </tbody>
 </table>"""
 table_head_body_multiple_head = """<table>
    <thead>
        <tr>
            <td>Creator</td>
            <td>Editor</td>
            <td>Server</td>
        </tr>
        <tr>
            <td>Operator</td>
            <td>Manager</td>
            <td>Engineer</td>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Bob</td>
            <td>Oliver</td>
            <td>Tom</td>
        </tr>
        <tr>
            <td>Thomas</td>
            <td>Lucas</td>
            <td>Ethan</td>
        </tr>
    </tbody>
 </table>"""
 table_missing_text = """<table>
    <thead>
        <tr>
@@ -228,10 +201,7 @@ table_body = """<table>
    </tbody>
 </table>"""
-table_with_caption = """TEXT<table>
+table_with_caption = """TEXT<table><caption>Caption</caption>
    <caption>
        Caption
    </caption>
    <tbody><tr><td>Firstname</td>
            <td>Lastname</td>
            <td>Age</td>
@@ -267,55 +237,18 @@ table_with_undefined_colspan = """<table>
    </tr>
 </table>"""
 table_with_colspan_missing_head = """<table>
    <tr>
        <td colspan="2">Name</td>
        <td>Age</td>
    </tr>
    <tr>
        <td>Jill</td>
        <td>Smith</td>
        <td>50</td>
    </tr>
    <tr>
        <td>Eve</td>
        <td>Jackson</td>
        <td>94</td>
    </tr>
 </table>"""
 def test_table():
    assert md(table) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
    assert md(table_with_html_content) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| **Jill** | *Smith* | [50](#) |\n| Eve | Jackson | 94 |\n\n'
    assert md(table_with_paragraphs) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
-    assert md(table_with_linebreaks) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith Jackson | 50 |\n| Eve | Jackson Smith | 94 |\n\n'
+    assert md(table_with_linebreaks) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith  Jackson | 50 |\n| Eve | Jackson  Smith | 94 |\n\n'
    assert md(table_with_header_column) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
    assert md(table_head_body) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
    assert md(table_head_body_multiple_head) == '\n\n|  |  |  |\n| --- | --- | --- |\n| Creator | Editor | Server |\n| Operator | Manager | Engineer |\n| Bob | Oliver | Tom |\n| Thomas | Lucas | Ethan |\n\n'
    assert md(table_head_body_missing_head) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
    assert md(table_missing_text) == '\n\n|  | Lastname | Age |\n| --- | --- | --- |\n| Jill |  | 50 |\n| Eve | Jackson | 94 |\n\n'
-    assert md(table_missing_head) == '\n\n|  |  |  |\n| --- | --- | --- |\n| Firstname | Lastname | Age |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
+    assert md(table_missing_head) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
-    assert md(table_body) == '\n\n|  |  |  |\n| --- | --- | --- |\n| Firstname | Lastname | Age |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
+    assert md(table_body) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
-    assert md(table_with_caption) == 'TEXT\n\nCaption\n\n|  |  |  |\n| --- | --- | --- |\n| Firstname | Lastname | Age |\n\n'
+    assert md(table_with_caption) == 'TEXT\n\nCaption\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n\n'
    assert md(table_with_colspan) == '\n\n| Name | | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
    assert md(table_with_undefined_colspan) == '\n\n| Name | Age |\n| --- | --- |\n| Jill | Smith |\n\n'
    assert md(table_with_colspan_missing_head) == '\n\n|  |  |  |\n| --- | --- | --- |\n| Name | | Age |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
 def test_table_infer_header():
    assert md(table, table_infer_header=True) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
    assert md(table_with_html_content, table_infer_header=True) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| **Jill** | *Smith* | [50](#) |\n| Eve | Jackson | 94 |\n\n'
    assert md(table_with_paragraphs, table_infer_header=True) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
    assert md(table_with_linebreaks, table_infer_header=True) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith Jackson | 50 |\n| Eve | Jackson Smith | 94 |\n\n'
    assert md(table_with_header_column, table_infer_header=True) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
    assert md(table_head_body, table_infer_header=True) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
    assert md(table_head_body_multiple_head, table_infer_header=True) == '\n\n| Creator | Editor | Server |\n| --- | --- | --- |\n| Operator | Manager | Engineer |\n| Bob | Oliver | Tom |\n| Thomas | Lucas | Ethan |\n\n'
    assert md(table_head_body_missing_head, table_infer_header=True) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
    assert md(table_missing_text, table_infer_header=True) == '\n\n|  | Lastname | Age |\n| --- | --- | --- |\n| Jill |  | 50 |\n| Eve | Jackson | 94 |\n\n'
    assert md(table_missing_head, table_infer_header=True) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
    assert md(table_body, table_infer_header=True) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
    assert md(table_with_caption, table_infer_header=True) == 'TEXT\n\nCaption\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n\n'
    assert md(table_with_colspan, table_infer_header=True) == '\n\n| Name | | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
    assert md(table_with_undefined_colspan, table_infer_header=True) == '\n\n| Name | Age |\n| --- | --- |\n| Jill | Smith |\n\n'
    assert md(table_with_colspan_missing_head, table_infer_header=True) == '\n\n| Name | | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
--- a/tests/types.py
+++ b/tests/types.py
@@ -1,70 +0,0 @@
 from markdownify import markdownify, ASTERISK, BACKSLASH, LSTRIP, RSTRIP, SPACES, STRIP, UNDERLINED, UNDERSCORE, MarkdownConverter
 from bs4 import BeautifulSoup
 from typing import Union
 markdownify("<p>Hello</p>") == "Hello"  # test default of STRIP
 markdownify("<p>Hello</p>", strip_document=LSTRIP) == "Hello\n\n"
 markdownify("<p>Hello</p>", strip_document=RSTRIP) == "\n\nHello"
 markdownify("<p>Hello</p>", strip_document=STRIP) == "Hello"
 markdownify("<p>Hello</p>", strip_document=None) == "\n\nHello\n\n"
 # default options
 MarkdownConverter(
    autolinks=True,
    bs4_options='html.parser',
    bullets='*+-',
    code_language='',
    code_language_callback=None,
    convert=None,
    default_title=False,
    escape_asterisks=True,
    escape_underscores=True,
    escape_misc=False,
    heading_style=UNDERLINED,
    keep_inline_images_in=[],
    newline_style=SPACES,
    strip=None,
    strip_document=STRIP,
    strip_pre=STRIP,
    strong_em_symbol=ASTERISK,
    sub_symbol='',
    sup_symbol='',
    table_infer_header=False,
    wrap=False,
    wrap_width=80,
 ).convert("")
 # custom options
 MarkdownConverter(
    strip_document=None,
    bullets="-",
    escape_asterisks=True,
    escape_underscores=True,
    escape_misc=True,
    autolinks=True,
    default_title=True,
    newline_style=BACKSLASH,
    sup_symbol='^',
    sub_symbol='^',
    keep_inline_images_in=['h3'],
    wrap=True,
    wrap_width=80,
    strong_em_symbol=UNDERSCORE,
    code_language='python',
    code_language_callback=None
 ).convert("")
 html = '<b>test</b>'
 soup = BeautifulSoup(html, 'html.parser')
 MarkdownConverter().convert_soup(soup) == '**test**'
 def callback(el: BeautifulSoup) -> Union[str, None]:
    return el['class'][0] if el.has_attr('class') else None
 MarkdownConverter(code_language_callback=callback).convert("")
 MarkdownConverter(code_language_callback=lambda el: None).convert("")
 markdownify('<pre class="python">test\n    foo\nbar</pre>', code_language_callback=callback)
 markdownify('<pre class="python">test\n    foo\nbar</pre>', code_language_callback=lambda el: None)
--- a/tests/utils.py
+++ b/tests/utils.py
@@ -1,9 +0,0 @@
 from markdownify import MarkdownConverter
 # for unit testing, disable document-level stripping by default so that
 # separation newlines are included in testing
 def md(html, **options):
    options = {"strip_document": None, **options}
    return MarkdownConverter(**options).convert(html)
Author	SHA1	Message	Date
AlexVonB	4c23c0655f	use static version instead of dynamic git tag info	2024-07-14 22:34:30 +02:00
AlexVonB	e2ace9d633	test build in develop and pull requests	2024-07-14 22:10:01 +02:00
AlexVonB	a5615f7d80	Merge branch 'pyproject.toml' of https://github.com/KOLANICH-libs/markdownify.py into KOLANICH-libs-pyproject.toml	2024-07-14 21:53:09 +02:00
KOLANICH	67100595ae	Migrated the metadata into `PEP 621`-compliant `pyproject.toml`, got rid of `setup.cfg`.	2022-11-10 15:29:25 +03:00
KOLANICH	deba8b5e54	Started populating version automatically from git tags using `setuptools_scm`.	2022-11-10 15:27:26 +03:00
KOLANICH	ca88e4e49d	Move the metadata from `setup.py` into `setup.cfg`. Added `pyproject.toml`. Removed `setup.py` - it is no longer needed. Got rid of tests erroroneously finding their way into the wheel.	2022-11-10 15:25:39 +03:00