use static version instead of dynamic git tag info

test build in develop and pull requests
Merge branch 'pyproject.toml' of https://github.com/KOLANICH-libs/markdownify.py into KOLANICH-libs-pyproject.toml
2024-07-14 22:34:30 +02:00 · 2024-07-14 22:10:01 +02:00 · 2024-07-14 21:53:09 +02:00 · 2024-07-14 21:19:35 +02:00 · 2024-07-14 21:02:49 +02:00 · 2024-06-23 14:30:07 +02:00
18 changed files with 1059 additions and 389 deletions
--- a/.github/workflows/python-app.yml
+++ b/.github/workflows/python-app.yml
@@ -23,11 +23,10 @@ jobs:
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
-        pip install flake8==3.8.4 pytest
-        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
-    - name: Lint with flake8
+        pip install --upgrade setuptools setuptools_scm wheel build tox
+    - name: Lint and test
      run: |
-        python setup.py lint
-    - name: Test with pytest
+        tox
+    - name: Build
      run: |
-        python setup.py test
+        python -m build -nwsx .
--- a/.github/workflows/python-publish.yml
+++ b/.github/workflows/python-publish.yml
@@ -21,11 +21,11 @@ jobs:
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
-        pip install setuptools wheel twine
+        pip install --upgrade setuptools setuptools_scm wheel build twine
    - name: Build and publish
      env:
        TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
        TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
      run: |
-        python setup.py sdist bdist_wheel
+        python -m build -nwsx .
        twine upload dist/*
--- a/.gitignore
+++ b/.gitignore
@@ -8,3 +8,5 @@
 /MANIFEST
 /venv
 build/
+.vscode/settings.json
+.tox/
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -1 +1,2 @@
 include README.rst
+prune tests
--- a/README.rst
+++ b/README.rst
@@ -1,8 +1,8 @@
 |build| |version| |license| |downloads|

-.. |build| image:: https://img.shields.io/github/workflow/status/matthewwithanm/python-markdownify/Python%20application/develop
+.. |build| image:: https://img.shields.io/github/actions/workflow/status/matthewwithanm/python-markdownify/python-app.yml?branch=develop
    :alt: GitHub Workflow Status
-    :target: https://github.com/matthewwithanm/python-markdownify/actions?query=workflow%3A%22Python+application%22
+    :target: https://github.com/matthewwithanm/python-markdownify/actions/workflows/python-app.yml?query=workflow%3A%22Python+application%22

 .. |version| image:: https://img.shields.io/pypi/v/markdownify
    :alt: Pypi version
@@ -32,14 +32,14 @@ Convert some HTML to Markdown:
    from markdownify import markdownify as md
    md('<b>Yay</b> <a href="http://github.com">GitHub</a>')  # > '**Yay** [GitHub](http://github.com)'

-Specify tags to exclude (blacklist):
+Specify tags to exclude:

 .. code:: python

    from markdownify import markdownify as md
    md('<b>Yay</b> <a href="http://github.com">GitHub</a>', strip=['a'])  # > '**Yay** GitHub'

-\...or specify the tags you want to include (whitelist):
+\...or specify the tags you want to include:

 .. code:: python

@@ -53,16 +53,20 @@ Options
 Markdownify supports the following options:

 strip
-  A list of tags to strip (blacklist). This option can't be used with the
+  A list of tags to strip. This option can't be used with the
  ``convert`` option.

 convert
-  A list of tags to convert (whitelist). This option can't be used with the
+  A list of tags to convert. This option can't be used with the
  ``strip`` option.

 autolinks
  A boolean indicating whether the "automatic link" style should be used when
-  a ``a`` tag's contents match its href. Defaults to ``True``
+  a ``a`` tag's contents match its href. Defaults to ``True``.
+
+default_title
+  A boolean to enable setting the title of a link to its href, if no title is
+  given. Defaults to ``False``.

 heading_style
  Defines how headings should be converted. Accepted values are ``ATX``,
@@ -80,24 +84,135 @@ strong_em_symbol
  *emphasized* texts. Either of these symbols can be chosen by the options
  ``ASTERISK`` (default) or ``UNDERSCORE`` respectively.

+sub_symbol, sup_symbol
+  Define the chars that surround ``<sub>`` and ``<sup>`` text. Defaults to an
+  empty string, because this is non-standard behavior. Could be something like
+  ``~`` and ``^`` to result in ``~sub~`` and ``^sup^``.  If the value starts
+  with ``<`` and ends with ``>``, it is treated as an HTML tag and a ``/`` is
+  inserted after the ``<`` in the string used after the text; this allows
+  specifying ``<sub>`` to use raw HTML in the output for subscripts, for
+  example.
+
 newline_style
  Defines the style of marking linebreaks (``<br>``) in markdown. The default
  value ``SPACES`` of this option will adopt the usual two spaces and a newline,
-  while ``BACKSLASH`` will convert a linebreak to ``\\n`` (a backslash an a
+  while ``BACKSLASH`` will convert a linebreak to ``\\n`` (a backslash and a
  newline). While the latter convention is non-standard, it is commonly
  preferred and supported by a lot of interpreters.

+code_language
+  Defines the language that should be assumed for all ``<pre>`` sections.
+  Useful if all code on a page is in the same programming language and
+  should be annotated with `````python`` or similar.
+  Defaults to ``''`` (empty string) and can be any string.
+
+code_language_callback
+  When the HTML code contains ``pre`` tags that in some way provide the code
+  language, for example as class, this callback can be used to extract the
+  language from the tag and prefix it to the converted ``pre`` tag.
+  The callback gets one single argument, a BeautifulSoup object, and returns
+  a string containing the code language, or ``None``.
+  An example to use the class name as code language could be::
+
+    def callback(el):
+        return el['class'][0] if el.has_attr('class') else None
+
+  Defaults to ``None``.
+
+escape_asterisks
+  If set to ``False``, do not escape ``*`` to ``\*`` in text.
+  Defaults to ``True``.
+
+escape_underscores
+  If set to ``False``, do not escape ``_`` to ``\_`` in text.
+  Defaults to ``True``.
+
+escape_misc
+  If set to ``False``, do not escape miscellaneous punctuation characters
+  that sometimes have Markdown significance in text.
+  Defaults to ``True``.
+
+keep_inline_images_in
+  Images are converted to their alt-text when the images are located inside
+  headlines or table cells. If some inline images should be converted to
+  markdown images instead, this option can be set to a list of parent tags
+  that should be allowed to contain inline images, for example ``['td']``.
+  Defaults to an empty list.
+
+wrap, wrap_width
+  If ``wrap`` is set to ``True``, all text paragraphs are wrapped at
+  ``wrap_width`` characters. Defaults to ``False`` and ``80``.
+  Use with ``newline_style=BACKSLASH`` to keep line breaks in paragraphs.
+
 Options may be specified as kwargs to the ``markdownify`` function, or as a
 nested ``Options`` class in ``MarkdownConverter`` subclasses.


+Converting BeautifulSoup objects
+================================
+
+.. code:: python
+
+    from markdownify import MarkdownConverter
+
+    # Create shorthand method for conversion
+    def md(soup, **options):
+        return MarkdownConverter(**options).convert_soup(soup)
+
+
+Creating Custom Converters
+==========================
+
+If you have a special use case that calls for a special conversion, you can
+always inherit from ``MarkdownConverter`` and override the method you want to
+change.
+The function that handles an HTML tag named ``abc`` is called
+``convert_abc(self, el, text, convert_as_inline)`` and returns a string
+containing the converted HTML tag.
+The ``MarkdownConverter`` object will handle the conversion based on the
+function names:
+
+.. code:: python
+
+    from markdownify import MarkdownConverter
+
+    class ImageBlockConverter(MarkdownConverter):
+        """
+        Create a custom MarkdownConverter that adds two newlines after an image
+        """
+        def convert_img(self, el, text, convert_as_inline):
+            return super().convert_img(el, text, convert_as_inline) + '\n\n'
+
+    # Create shorthand method for conversion
+    def md(html, **options):
+        return ImageBlockConverter(**options).convert(html)
+
+.. code:: python
+
+    from markdownify import MarkdownConverter
+
+    class IgnoreParagraphsConverter(MarkdownConverter):
+        """
+        Create a custom MarkdownConverter that ignores paragraphs
+        """
+        def convert_p(self, el, text, convert_as_inline):
+            return ''
+
+    # Create shorthand method for conversion
+    def md(html, **options):
+        return IgnoreParagraphsConverter(**options).convert(html)
+
+
+Command Line Interface
+======================
+
+Use ``markdownify example.html > example.md`` or pipe input from stdin
+(``cat example.html | markdownify > example.md``).
+Call ``markdownify -h`` to see all available options.
+They are the same as listed above and take the same arguments.
+
+
 Development
 ===========

-To run tests:
-
-``python setup.py test``
-
-To lint:
-
-``python setup.py lint``
+To run tests and the linter run ``pip install tox`` once, then ``tox``.
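The ``wrap`` and ``wrap_width`` options documented above appear to map onto the standard library's ``textwrap.fill``, which ``convert_p`` in this change calls with ``break_long_words=False`` and ``break_on_hyphens=False``. A minimal sketch of that call on its own, outside markdownify; the ``wrap_paragraph`` helper and the sample strings are made up for illustration:

```python
from textwrap import fill


def wrap_paragraph(text, wrap_width=80):
    # Hypothetical helper: same keyword arguments as the fill() call
    # that this diff adds to convert_p.
    return fill(text,
                width=wrap_width,
                break_long_words=False,
                break_on_hyphens=False)


sample = ("Markdownify converts HTML documents to Markdown text "
          "and can wrap long paragraphs.")
wrapped = wrap_paragraph(sample, wrap_width=30)

# A word longer than the width is kept whole on its own line
# instead of being split.
url_text = 'see https://example.com/a/very/long/path-with-hyphens here'
url_wrapped = wrap_paragraph(url_text, wrap_width=10)

print(wrapped)
print(url_wrapped)
```

With these settings a line can exceed ``wrap_width`` when a single word (a long URL, for example) is wider than the limit, which is usually preferable to breaking a link.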
--- a/markdownify/__init__.py
+++ b/markdownify/__init__.py
@@ -1,4 +1,5 @@
-from bs4 import BeautifulSoup, NavigableString, Comment
+from bs4 import BeautifulSoup, NavigableString, Comment, Doctype
+from textwrap import fill
 import re
 import six

@@ -25,12 +26,6 @@ ASTERISK = '*'
 UNDERSCORE = '_'


-def escape(text):
-    if not text:
-        return ''
-    return text.replace('_', r'\_')
-
-
 def chomp(text):
    """
    If the text in an inline tag like b, a, or em contains a leading or trailing
@@ -44,19 +39,53 @@ def chomp(text):
    return (prefix, suffix, text)


+def abstract_inline_conversion(markup_fn):
+    """
+    This abstracts all simple inline tags like b, em, del, ...
+    Returns a function that wraps the chomped text in a pair of the string
+    that is returned by markup_fn, with '/' inserted in the string used after
+    the text if it looks like an HTML tag. markup_fn is necessary to allow for
+    references to self.strong_em_symbol etc.
+    """
+    def implementation(self, el, text, convert_as_inline):
+        markup_prefix = markup_fn(self)
+        if markup_prefix.startswith('<') and markup_prefix.endswith('>'):
+            markup_suffix = '</' + markup_prefix[1:]
+        else:
+            markup_suffix = markup_prefix
+        if el.find_parent(['pre', 'code', 'kbd', 'samp']):
+            return text
+        prefix, suffix, text = chomp(text)
+        if not text:
+            return ''
+        return '%s%s%s%s%s' % (prefix, markup_prefix, text, markup_suffix, suffix)
+    return implementation
+
+
 def _todict(obj):
    return dict((k, getattr(obj, k)) for k in dir(obj) if not k.startswith('_'))


 class MarkdownConverter(object):
    class DefaultOptions:
-        strip = None
-        convert = None
        autolinks = True
-        heading_style = UNDERLINED
        bullets = '*+-'  # An iterable of bullet types.
-        strong_em_symbol = ASTERISK
+        code_language = ''
+        code_language_callback = None
+        convert = None
+        default_title = False
+        escape_asterisks = True
+        escape_underscores = True
+        escape_misc = True
+        heading_style = UNDERLINED
+        keep_inline_images_in = []
        newline_style = SPACES
+        strip = None
+        strong_em_symbol = ASTERISK
+        sub_symbol = ''
+        sup_symbol = ''
+        wrap = False
+        wrap_width = 80

    class Options(DefaultOptions):
        pass
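The closing-marker rule in ``abstract_inline_conversion`` can be tried in isolation. The sketch below copies only the prefix/suffix derivation from the hunk above; it deliberately leaves out ``chomp`` and the ``pre``/``code`` parent check, and the names ``closing_markup`` and ``wrap_inline`` are hypothetical, not part of markdownify:

```python
def closing_markup(opening):
    # If the marker looks like an HTML tag, insert '/' after '<' to close it;
    # otherwise the same marker is reused on both sides.
    if opening.startswith('<') and opening.endswith('>'):
        return '</' + opening[1:]
    return opening


def wrap_inline(text, opening):
    # Empty text produces no markup at all, as in the diff.
    if not text:
        return ''
    return opening + text + closing_markup(opening)


print(wrap_inline('2', '~'))      # symmetric marker
print(wrap_inline('2', '<sub>'))  # raw HTML tag pair
```

This is why ``sub_symbol='<sub>'`` yields a proper ``<sub>...</sub>`` pair while ``sub_symbol='~'`` yields ``~...~``.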
@@ -73,26 +102,48 @@ class MarkdownConverter(object):

    def convert(self, html):
        soup = BeautifulSoup(html, 'html.parser')
+        return self.convert_soup(soup)
+
+    def convert_soup(self, soup):
        return self.process_tag(soup, convert_as_inline=False, children_only=True)

    def process_tag(self, node, convert_as_inline, children_only=False):
        text = ''
-        # markdown headings can't include block elements (elements w/newlines)
+
+        # markdown headings or cells can't include
+        # block elements (elements w/newlines)
        isHeading = html_heading_re.match(node.name) is not None
+        isCell = node.name in ['td', 'th']
        convert_children_as_inline = convert_as_inline

-        if not children_only and isHeading:
+        if not children_only and (isHeading or isCell):
            convert_children_as_inline = True

-        # Remove whitespace-only textnodes in lists
-        if node.name in ['ol', 'ul', 'li']:
+        # Remove whitespace-only textnodes in purely nested nodes
+        def is_nested_node(el):
+            return el and el.name in ['ol', 'ul', 'li',
+                                      'table', 'thead', 'tbody', 'tfoot',
+                                      'tr', 'td', 'th']
+
+        if is_nested_node(node):
            for el in node.children:
-                if isinstance(el, NavigableString) and six.text_type(el).strip() == '':
+                # Only extract (remove) whitespace-only text node if any of the
+                # conditions is true:
+                # - el is the first element in its parent
+                # - el is the last element in its parent
+                # - el is adjacent to a nested node
+                can_extract = (not el.previous_sibling
+                               or not el.next_sibling
+                               or is_nested_node(el.previous_sibling)
+                               or is_nested_node(el.next_sibling))
+                if (isinstance(el, NavigableString)
+                        and six.text_type(el).strip() == ''
+                        and can_extract):
                    el.extract()

        # Convert the children first
        for el in node.children:
-            if isinstance(el, Comment):
+            if isinstance(el, Comment) or isinstance(el, Doctype):
                continue
            elif isinstance(el, NavigableString):
                text += self.process_text(el)
@@ -107,10 +158,25 @@ class MarkdownConverter(object):
        return text

    def process_text(self, el):
-        text = six.text_type(el)
-        if el.parent.name == 'li':
-            return escape(all_whitespace_re.sub(' ', text or '')).rstrip()
-        return escape(whitespace_re.sub(' ', text or ''))
+        text = six.text_type(el) or ''
+
+        # normalize whitespace if we're not inside a preformatted element
+        if not el.find_parent('pre'):
+            text = whitespace_re.sub(' ', text)
+
+        # escape special characters if we're not inside a preformatted or code element
+        if not el.find_parent(['pre', 'code', 'kbd', 'samp']):
+            text = self.escape(text)
+
+        # remove trailing whitespace if any of the following conditions is true:
+        # - current text node is the last node in li
+        # - current text node is followed by an embedded list
+        if (el.parent.name == 'li'
+                and (not el.next_sibling
+                     or el.next_sibling.name in ['ul', 'ol'])):
+            text = text.rstrip()
+
+        return text

    def __getattr__(self, attr):
        # Handle headings
@@ -138,6 +204,18 @@ class MarkdownConverter(object):
        else:
            return True

+    def escape(self, text):
+        if not text:
+            return ''
+        if self.options['escape_misc']:
+            text = re.sub(r'([\\&<`[>~#=+|-])', r'\\\1', text)
+            text = re.sub(r'([0-9])([.)])', r'\1\\\2', text)
+        if self.options['escape_asterisks']:
+            text = text.replace('*', r'\*')
+        if self.options['escape_underscores']:
+            text = text.replace('_', r'\_')
+        return text
+
    def indent(self, text, level):
        return line_beginning_re.sub('\t' * level, text) if text else ''

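For experimenting with the new escaping rules without a converter instance, the ``escape`` method above can be rewritten as a plain function. This is a sketch: the two regular expressions are copied from the hunk, while turning the options into keyword arguments is an assumption made for illustration:

```python
import re


def escape(text, escape_misc=True, escape_asterisks=True,
           escape_underscores=True):
    if not text:
        return ''
    if escape_misc:
        # Backslash-escape punctuation that can carry Markdown meaning.
        text = re.sub(r'([\\&<`[>~#=+|-])', r'\\\1', text)
        # Escape "1." / "1)" so the text is not read as an ordered list item.
        text = re.sub(r'([0-9])([.)])', r'\1\\\2', text)
    if escape_asterisks:
        text = text.replace('*', r'\*')
    if escape_underscores:
        text = text.replace('_', r'\_')
    return text


print(escape('1. a*b_c'))
print(escape('x < y'))
```

The order matters: the miscellaneous pass runs first, so the backslashes added later for ``*`` and ``_`` are not escaped a second time.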
@@ -149,26 +227,28 @@ class MarkdownConverter(object):
        prefix, suffix, text = chomp(text)
        if not text:
            return ''
-        if convert_as_inline:
-            return text
        href = el.get('href')
        title = el.get('title')
        # For the replacement see #29: text nodes underscores are escaped
-        if self.options['autolinks'] and text.replace(r'\_', '_') == href and not title:
+        if (self.options['autolinks']
+                and text.replace(r'\_', '_') == href
+                and not title
+                and not self.options['default_title']):
            # Shortcut syntax
            return '<%s>' % href
+        if self.options['default_title'] and not title:
+            title = href
        title_part = ' "%s"' % title.replace('"', r'\"') if title else ''
        return '%s[%s](%s%s)%s' % (prefix, text, href, title_part, suffix) if href else text

-    def convert_b(self, el, text, convert_as_inline):
-        return self.convert_strong(el, text, convert_as_inline)
+    convert_b = abstract_inline_conversion(lambda self: 2 * self.options['strong_em_symbol'])

    def convert_blockquote(self, el, text, convert_as_inline):

        if convert_as_inline:
            return text

-        return '\n' + (line_beginning_re.sub('> ', text) + '\n\n') if text else ''
+        return '\n' + (line_beginning_re.sub('> ', text.strip()) + '\n\n') if text else ''

    def convert_br(self, el, text, convert_as_inline):
        if convert_as_inline:
@@ -179,19 +259,24 @@ class MarkdownConverter(object):
        else:
            return '  \n'

-    def convert_em(self, el, text, convert_as_inline):
-        em_tag = self.options['strong_em_symbol']
-        prefix, suffix, text = chomp(text)
-        if not text:
-            return ''
-        return '%s%s%s%s%s' % (prefix, em_tag, text, em_tag, suffix)
+    def convert_code(self, el, text, convert_as_inline):
+        if el.parent.name == 'pre':
+            return text
+        converter = abstract_inline_conversion(lambda self: '`')
+        return converter(self, el, text, convert_as_inline)
+
+    convert_del = abstract_inline_conversion(lambda self: '~~')
+
+    convert_em = abstract_inline_conversion(lambda self: self.options['strong_em_symbol'])
+
+    convert_kbd = convert_code

    def convert_hn(self, n, el, text, convert_as_inline):
        if convert_as_inline:
            return text

        style = self.options['heading_style'].lower()
-        text = text.rstrip()
+        text = text.strip()
        if style == UNDERLINED and n <= 2:
            line = '=' if n == 1 else '-'
            return self.underline(text, line)
@@ -200,8 +285,21 @@ class MarkdownConverter(object):
            return '%s %s %s\n\n' % (hashes, text, hashes)
        return '%s %s\n\n' % (hashes, text)

-    def convert_i(self, el, text, convert_as_inline):
-        return self.convert_em(el, text, convert_as_inline)
+    def convert_hr(self, el, text, convert_as_inline):
+        return '\n\n---\n\n'
+
+    convert_i = convert_em
+
+    def convert_img(self, el, text, convert_as_inline):
+        alt = el.attrs.get('alt', None) or ''
+        src = el.attrs.get('src', None) or ''
+        title = el.attrs.get('title', None) or ''
+        title_part = ' "%s"' % title.replace('"', r'\"') if title else ''
+        if (convert_as_inline
+                and el.parent.name not in self.options['keep_inline_images_in']):
+            return alt
+
+        return '![%s](%s%s)' % (alt, src, title_part)

    def convert_list(self, el, text, convert_as_inline):

@@ -228,7 +326,7 @@ class MarkdownConverter(object):
    def convert_li(self, el, text, convert_as_inline):
        parent = el.parent
        if parent is not None and parent.name == 'ol':
-            if parent.get("start"):
+            if parent.get("start") and str(parent.get("start")).isnumeric():
                start = int(parent.get("start"))
            else:
                start = 1
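One detail that may be worth checking in the ``start`` handling: ``str.isnumeric()`` also returns ``True`` for characters such as ``'²'`` or ``'½'``, which ``int()`` rejects with a ``ValueError``. A possible variation, shown here as a standalone sketch with a hypothetical ``parse_start`` helper, swaps in ``str.isdecimal()``, which only accepts characters that ``int()`` can parse:

```python
def parse_start(value, default=1):
    # Hypothetical helper, not part of markdownify: return the numeric
    # "start" attribute of an <ol>, or the default when it is missing
    # or not a plain decimal number.
    if value is not None and str(value).isdecimal():
        return int(value)
    return default


print(parse_start('3'))
print(parse_start('abc'))
print(parse_start('²'))
```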
@@ -241,49 +339,94 @@ class MarkdownConverter(object):
                el = el.parent
            bullets = self.options['bullets']
            bullet = bullets[depth % len(bullets)]
-        return '%s %s\n' % (bullet, text or '')
+        return '%s %s\n' % (bullet, (text or '').strip())

    def convert_p(self, el, text, convert_as_inline):
        if convert_as_inline:
            return text
+        if self.options['wrap']:
+            text = fill(text,
+                        width=self.options['wrap_width'],
+                        break_long_words=False,
+                        break_on_hyphens=False)
        return '%s\n\n' % text if text else ''

-    def convert_strong(self, el, text, convert_as_inline):
-        strong_tag = 2 * self.options['strong_em_symbol']
-        prefix, suffix, text = chomp(text)
+    def convert_pre(self, el, text, convert_as_inline):
        if not text:
            return ''
-        return '%s%s%s%s%s' % (prefix, strong_tag, text, strong_tag, suffix)
+        code_language = self.options['code_language']

-    def convert_img(self, el, text, convert_as_inline):
-        alt = el.attrs.get('alt', None) or ''
-        src = el.attrs.get('src', None) or ''
-        title = el.attrs.get('title', None) or ''
-        title_part = ' "%s"' % title.replace('"', r'\"') if title else ''
-        if convert_as_inline:
-            return alt
+        if self.options['code_language_callback']:
+            code_language = self.options['code_language_callback'](el) or code_language

-        return '![%s](%s%s)' % (alt, src, title_part)
+        return '\n```%s\n%s\n```\n' % (code_language, text)
+
+    def convert_script(self, el, text, convert_as_inline):
+        return ''
+
+    def convert_style(self, el, text, convert_as_inline):
+        return ''
+
+    convert_s = convert_del
+
+    convert_strong = convert_b
+
+    convert_samp = convert_code
+
+    convert_sub = abstract_inline_conversion(lambda self: self.options['sub_symbol'])
+
+    convert_sup = abstract_inline_conversion(lambda self: self.options['sup_symbol'])

    def convert_table(self, el, text, convert_as_inline):
-        rows = el.find_all('tr')
-        text_data = []
-        for row in rows:
-            headers = row.find_all('th')
-            columns = row.find_all('td')
-            if len(headers) > 0:
-                headers = [head.text.strip() for head in headers]
-                text_data.append('| ' + ' | '.join(headers) + ' |')
-                text_data.append('| ' + ' | '.join(['---'] * len(headers)) + ' |')
-            elif len(columns) > 0:
-                columns = [colm.text.strip() for colm in columns]
-                text_data.append('| ' + ' | '.join(columns) + ' |')
-            else:
-                continue
-        return '\n'.join(text_data)
+        return '\n\n' + text + '\n'

-    def convert_hr(self, el, text, convert_as_inline):
-        return '\n\n---\n\n'
+    def convert_caption(self, el, text, convert_as_inline):
+        return text + '\n'
+
+    def convert_figcaption(self, el, text, convert_as_inline):
+        return '\n\n' + text + '\n\n'
+
+    def convert_td(self, el, text, convert_as_inline):
+        colspan = 1
+        if 'colspan' in el.attrs and el['colspan'].isdigit():
+            colspan = int(el['colspan'])
+        return ' ' + text.strip().replace("\n", " ") + ' |' * colspan
+
+    def convert_th(self, el, text, convert_as_inline):
+        colspan = 1
+        if 'colspan' in el.attrs and el['colspan'].isdigit():
+            colspan = int(el['colspan'])
+        return ' ' + text.strip().replace("\n", " ") + ' |' * colspan
+
+    def convert_tr(self, el, text, convert_as_inline):
+        cells = el.find_all(['td', 'th'])
+        is_headrow = (
+            all([cell.name == 'th' for cell in cells])
+            or (not el.previous_sibling and not el.parent.name == 'tbody')
+            or (not el.previous_sibling and el.parent.name == 'tbody' and len(el.parent.parent.find_all(['thead'])) < 1)
+        )
+        overline = ''
+        underline = ''
+        if is_headrow and not el.previous_sibling:
+            # first row and is headline: print headline underline
+            full_colspan = 0
+            for cell in cells:
+                if 'colspan' in cell.attrs and cell['colspan'].isdigit():
+                    full_colspan += int(cell["colspan"])
+                else:
+                    full_colspan += 1
+            underline += '| ' + ' | '.join(['---'] * full_colspan) + ' |' + '\n'
+        elif (not el.previous_sibling
+              and (el.parent.name == 'table'
+                   or (el.parent.name == 'tbody'
+                       and not el.parent.previous_sibling))):
+            # first row, not headline, and:
+            # - the parent is table or
+            # - the parent is tbody at the beginning of a table.
+            # print empty headline above this row
+            overline += '| ' + ' | '.join([''] * len(cells)) + ' |' + '\n'
+            overline += '| ' + ' | '.join(['---'] * len(cells)) + ' |' + '\n'
+        return overline + '|' + text + '\n' + underline


 def markdownify(html, **options):
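The cell and header-row formatting introduced by ``convert_td``, ``convert_th`` and ``convert_tr`` can be followed more easily without BeautifulSoup in the way. The sketch below reproduces only the string building from the diff; ``render_cell`` and ``render_header_row`` are hypothetical names, and cells are passed as plain ``(text, colspan)`` tuples instead of tag objects:

```python
def render_cell(text, colspan_attr=None):
    # One trailing ' |' per spanned column, as in convert_td/convert_th.
    colspan = 1
    if colspan_attr is not None and colspan_attr.isdigit():
        colspan = int(colspan_attr)
    return ' ' + text.strip().replace('\n', ' ') + ' |' * colspan


def render_header_row(cells):
    # cells: list of (text, colspan_attr) tuples for the first table row.
    body = ''.join(render_cell(text, span) for text, span in cells)
    total = sum(int(span) if span is not None and span.isdigit() else 1
                for _, span in cells)
    underline = '| ' + ' | '.join(['---'] * total) + ' |'
    return '|' + body + '\n' + underline + '\n'


print(render_header_row([('A', None), ('B', '2')]))
```

A cell with ``colspan="2"`` is followed by an extra empty column, so the separator line always has as many ``---`` entries as the row has logical columns.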
--- a/markdownify/main.py
+++ b/markdownify/main.py
@@ -0,0 +1,73 @@
+#!/usr/bin/env python
+
+import argparse
+import sys
+
+from markdownify import markdownify, ATX, ATX_CLOSED, UNDERLINED, \
+    SPACES, BACKSLASH, ASTERISK, UNDERSCORE
+
+
+def main(argv=sys.argv[1:]):
+    parser = argparse.ArgumentParser(
+        prog='markdownify',
+        description='Converts html to markdown.',
+    )
+
+    parser.add_argument('html', nargs='?', type=argparse.FileType('r'),
+                        default=sys.stdin,
+                        help="The html file to convert. Defaults to STDIN if not "
+                        "provided.")
+    parser.add_argument('-s', '--strip', nargs='*',
+                        help="A list of tags to strip. This option can't be used with "
+                        "the --convert option.")
+    parser.add_argument('-c', '--convert', nargs='*',
+                        help="A list of tags to convert. This option can't be used with "
+                        "the --strip option.")
+    parser.add_argument('-a', '--autolinks', action='store_true',
+                        help="A boolean indicating whether the 'automatic link' style "
+                        "should be used when a 'a' tag's contents match its href.")
+    parser.add_argument('--default-title', action='store_false',
+                        help="A boolean to enable setting the title of a link to its "
+                        "href, if no title is given.")
+    parser.add_argument('--heading-style', default=UNDERLINED,
+                        choices=(ATX, ATX_CLOSED, UNDERLINED),
+                        help="Defines how headings should be converted.")
+    parser.add_argument('-b', '--bullets', default='*+-',
+                        help="A string of bullet styles to use; the bullet will "
+                        "alternate based on nesting level.")
+    parser.add_argument('--strong-em-symbol', default=ASTERISK,
+                        choices=(ASTERISK, UNDERSCORE),
+                        help="Use * or _ to convert strong and italic text")
+    parser.add_argument('--sub-symbol', default='',
+                        help="Define the chars that surround '<sub>'.")
+    parser.add_argument('--sup-symbol', default='',
+                        help="Define the chars that surround '<sup>'.")
+    parser.add_argument('--newline-style', default=SPACES,
+                        choices=(SPACES, BACKSLASH),
+                        help="Defines the style of <br> conversions: two spaces "
+                        "or backslash at the end of the line that should break.")
+    parser.add_argument('--code-language', default='',
+                        help="Defines the language that should be assumed for all "
+                        "'<pre>' sections.")
+    parser.add_argument('--no-escape-asterisks', dest='escape_asterisks',
+                        action='store_false',
+                        help="Do not escape '*' to '\\*' in text.")
+    parser.add_argument('--no-escape-underscores', dest='escape_underscores',
+                        action='store_false',
+                        help="Do not escape '_' to '\\_' in text.")
+    parser.add_argument('-i', '--keep-inline-images-in', nargs='*',
+                        help="Images are converted to their alt-text when the images are "
+                        "located inside headlines or table cells. If some inline images "
+                        "should be converted to markdown images instead, this option can "
+                        "be set to a list of parent tags that should be allowed to "
+                        "contain inline images.")
+    parser.add_argument('-w', '--wrap', action='store_true',
+                        help="Wrap all text paragraphs at --wrap-width characters.")
+    parser.add_argument('--wrap-width', type=int, default=80)
+
+    args = parser.parse_args(argv)
+    print(markdownify(**vars(args)))
+
+
+if __name__ == '__main__':
+    main()
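A point that may deserve a second look in the parser above: with argparse, a ``store_true`` flag defaults to ``False`` and a ``store_false`` flag defaults to ``True`` when the flag is absent. If the diff is read correctly, ``--autolinks`` would therefore default to ``False`` and ``--default-title`` to ``True`` on the command line, the opposite of the library defaults listed in the README. The standalone sketch below reproduces just those two arguments (the ``demo`` program name is made up):

```python
import argparse

parser = argparse.ArgumentParser(prog='demo')
# Same actions as in the diff, reduced to the two flags in question.
parser.add_argument('-a', '--autolinks', action='store_true')
parser.add_argument('--default-title', action='store_false')

# No flags given: these are the values that would be passed on as options.
defaults = vars(parser.parse_args([]))
# Both flags given: each one flips its default.
flipped = vars(parser.parse_args(['-a', '--default-title']))

print(defaults)
print(flipped)
```

One way to keep the CLI in line with the documented defaults would be a ``store_true`` action for ``--default-title`` and a negative flag with ``store_false`` for autolinks, in the style of the ``--no-escape-*`` options already present.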
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -0,0 +1,45 @@
+[build-system]
+requires = ["setuptools>=61.2", "setuptools_scm[toml]>=3.4.3"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "markdownify"
+version = "0.13.0"
+authors = [{name = "Matthew Tretter", email = "m@tthewwithanm.com"}]
+description = "Convert HTML to markdown."
+readme = "README.rst"
+classifiers = [
+    "Environment :: Web Environment",
+    "Framework :: Django",
+    "Intended Audience :: Developers",
+    "License :: OSI Approved :: MIT License",
+    "Operating System :: OS Independent",
+    "Programming Language :: Python :: 2.5",
+    "Programming Language :: Python :: 2.6",
+    "Programming Language :: Python :: 2.7",
+    "Programming Language :: Python :: 3.6",
+    "Programming Language :: Python :: 3.7",
+    "Programming Language :: Python :: 3.8",
+    "Topic :: Utilities",
+]
+dependencies = [
+    "beautifulsoup4>=4.9,<5",
+    "six>=1.15,<2"
+]
+
+[project.urls]
+Homepage = "http://github.com/matthewwithanm/python-markdownify"
+Download = "http://github.com/matthewwithanm/python-markdownify/tarball/master"
+
+[project.scripts]
+markdownify = "markdownify.main:main"
+
+[tool.setuptools]
+zip-safe = false
+include-package-data = true
+
+[tool.setuptools.packages.find]
+include = ["markdownify", "markdownify.*"]
+namespaces = false
+
+[tool.setuptools_scm]
--- a/setup.cfg
+++ b/setup.cfg
@@ -1,2 +0,0 @@
-[flake8]
-ignore = E501
--- a/setup.py
+++ b/setup.py
@@ -1,99 +0,0 @@
-#/usr/bin/env python
-import codecs
-import os
-from setuptools import setup, find_packages
-from setuptools.command.test import test as TestCommand, Command
-
-
-read = lambda filepath: codecs.open(filepath, 'r', 'utf-8').read()
-
-pkgmeta = {
-    '__title__': 'markdownify',
-    '__author__': 'Matthew Tretter',
-    '__version__': '0.7.2',
-}
-
-
-class PyTest(TestCommand):
-    def finalize_options(self):
-        TestCommand.finalize_options(self)
-        self.test_args = ['tests', '-s']
-        self.test_suite = True
-
-    def run_tests(self):
-        import pytest
-        errno = pytest.main(self.test_args)
-        raise SystemExit(errno)
-
-
-class LintCommand(Command):
-    """
-    A copy of flake8's Flake8Command
-
-    """
-    description = "Run flake8 on modules registered in setuptools"
-    user_options = []
-
-    def initialize_options(self):
-        pass
-
-    def finalize_options(self):
-        pass
-
-    def distribution_files(self):
-        if self.distribution.packages:
-            for package in self.distribution.packages:
-                yield package.replace(".", os.path.sep)
-
-        if self.distribution.py_modules:
-            for filename in self.distribution.py_modules:
-                yield "%s.py" % filename
-
-    def run(self):
-        from flake8.api.legacy import get_style_guide
-        flake8_style = get_style_guide(config_file='setup.cfg')
-        paths = self.distribution_files()
-        report = flake8_style.check_files(paths)
-        raise SystemExit(report.total_errors > 0)
-
-
-setup(
-    name='markdownify',
-    description='Convert HTML to markdown.',
-    long_description=read(os.path.join(os.path.dirname(__file__), 'README.rst')),
-    version=pkgmeta['__version__'],
-    author=pkgmeta['__author__'],
-    author_email='m@tthewwithanm.com',
-    url='http://github.com/matthewwithanm/python-markdownify',
-    download_url='http://github.com/matthewwithanm/python-markdownify/tarball/master',
-    packages=find_packages(),
-    zip_safe=False,
-    include_package_data=True,
-    setup_requires=[
-        'flake8>=3.8,<4',
-    ],
-    tests_require=[
-        'pytest>=6.2,<7',
-    ],
-    install_requires=[
-        'beautifulsoup4>=4.9,<5', 'six>=1.15,<2'
-    ],
-    classifiers=[
-        'Environment :: Web Environment',
-        'Framework :: Django',
-        'Intended Audience :: Developers',
-        'License :: OSI Approved :: MIT License',
-        'Operating System :: OS Independent',
-        'Programming Language :: Python :: 2.5',
-        'Programming Language :: Python :: 2.6',
-        'Programming Language :: Python :: 2.7',
-        'Programming Language :: Python :: 3.6',
-        'Programming Language :: Python :: 3.7',
-        'Programming Language :: Python :: 3.8',
-        'Topic :: Utilities'
-    ],
-    cmdclass={
-        'test': PyTest,
-        'lint': LintCommand,
-    },
-)
--- a/shell.nix
+++ b/shell.nix
@@ -0,0 +1,10 @@
+{ pkgs ? import <nixpkgs> {} }:
+pkgs.mkShell {
+  name = "python-shell";
+  buildInputs = with pkgs; [
+    python38
+    python38Packages.tox
+    python38Packages.setuptools
+    python38Packages.virtualenv
+  ];
+}
--- a/tests/test_advanced.py
+++ b/tests/test_advanced.py
@@ -1,6 +1,17 @@
 from markdownify import markdownify as md


+def test_chomp():
+    assert md(' <b></b> ') == '  '
+    assert md(' <b> </b> ') == '  '
+    assert md(' <b>  </b> ') == '  '
+    assert md(' <b>   </b> ') == '  '
+    assert md(' <b>s </b> ') == ' **s**  '
+    assert md(' <b> s</b> ') == '  **s** '
+    assert md(' <b> s </b> ') == '  **s**  '
+    assert md(' <b>  s  </b> ') == '  **s**  '
+
+
 def test_nested():
    text = md('<p>This is an <a href="http://example.com/">example link</a>.</p>')
    assert text == 'This is an [example link](http://example.com/).\n\n'
@@ -14,3 +25,15 @@ def test_ignore_comments():
 def test_ignore_comments_with_other_tags():
    text = md("<!-- This is a comment --><a href='http://example.com/'>example link</a>")
    assert text == "[example link](http://example.com/)"
+
+
+def test_code_with_tricky_content():
+    assert md('<code>></code>') == "`>`"
+    assert md('<code>/home/</code><b>username</b>') == "`/home/`**username**"
+    assert md('First line <code>blah blah<br />blah blah</code> second line') \
+        == "First line `blah blah  \nblah blah` second line"
+
+
+def test_special_tags():
+    assert md('<!DOCTYPE html>') == ''
+    assert md('<![CDATA[foobar]]>') == 'foobar'
--- a/tests/test_conversions.py
+++ b/tests/test_conversions.py
@@ -1,130 +1,17 @@
 from markdownify import markdownify as md, ATX, ATX_CLOSED, BACKSLASH, UNDERSCORE
-import re


-nested_uls = """
-    <ul>
-        <li>1
-            <ul>
-                <li>a
-                    <ul>
-                        <li>I</li>
-                        <li>II</li>
-                        <li>III</li>
-                    </ul>
-                </li>
-                <li>b</li>
-                <li>c</li>
-            </ul>
-        </li>
-        <li>2</li>
-        <li>3</li>
-    </ul>"""
-
-nested_ols = """
-    <ol>
-        <li>1
-            <ol>
-                <li>a
-                    <ol>
-                        <li>I</li>
-                        <li>II</li>
-                        <li>III</li>
-                    </ol>
-                </li>
-                <li>b</li>
-                <li>c</li>
-            </ol>
-        </li>
-        <li>2</li>
-        <li>3</li>
-    </ul>"""
-
-
-table = re.sub(r'\s+', '', """
-<table>
-    <tr>
-        <th>Firstname</th>
-        <th>Lastname</th>
-        <th>Age</th>
-    </tr>
-    <tr>
-        <td>Jill</td>
-        <td>Smith</td>
-        <td>50</td>
-    </tr>
-    <tr>
-        <td>Eve</td>
-        <td>Jackson</td>
-        <td>94</td>
-    </tr>
-</table>
-""")
-
-
-table_head_body = re.sub(r'\s+', '', """
-<table>
-    <thead>
-            <tr>
-            <th>Firstname</th>
-            <th>Lastname</th>
-            <th>Age</th>
-            </tr>
-    </thead>
-    <tbody>
-        <tr>
-            <td>Jill</td>
-            <td>Smith</td>
-            <td>50</td>
-        </tr>
-        <tr>
-            <td>Eve</td>
-            <td>Jackson</td>
-            <td>94</td>
-        </tr>
-    </tbody>
-</table>
-""")
-
-table_missing_text = re.sub(r'\s+', '', """
-<table>
-    <thead>
-            <tr>
-            <th></th>
-            <th>Lastname</th>
-            <th>Age</th>
-            </tr>
-    </thead>
-    <tbody>
-        <tr>
-            <td>Jill</td>
-            <td></td>
-            <td>50</td>
-        </tr>
-        <tr>
-            <td>Eve</td>
-            <td>Jackson</td>
-            <td>94</td>
-        </tr>
-    </tbody>
-</table>
-""")
-
-
-def test_chomp():
-    assert md(' <b></b> ') == '  '
-    assert md(' <b> </b> ') == '  '
-    assert md(' <b>  </b> ') == '  '
-    assert md(' <b>   </b> ') == '  '
-    assert md(' <b>s </b> ') == ' **s**  '
-    assert md(' <b> s</b> ') == '  **s** '
-    assert md(' <b> s </b> ') == '  **s**  '
-    assert md(' <b>  s  </b> ') == '  **s**  '
+def inline_tests(tag, markup):
+    # test template for different inline tags
+    assert md(f'<{tag}>Hello</{tag}>') == f'{markup}Hello{markup}'
+    assert md(f'foo <{tag}>Hello</{tag}> bar') == f'foo {markup}Hello{markup} bar'
+    assert md(f'foo<{tag}> Hello</{tag}> bar') == f'foo {markup}Hello{markup} bar'
+    assert md(f'foo <{tag}>Hello </{tag}>bar') == f'foo {markup}Hello{markup} bar'
+    assert md(f'foo <{tag}></{tag}> bar') in ['foo  bar', 'foo bar']  # Either is OK


 def test_a():
    assert md('<a href="https://google.com">Google</a>') == '[Google](https://google.com)'
-    assert md('<a href="https://google.com">https://google.com</a>', autolinks=False) == '[https://google.com](https://google.com)'
    assert md('<a href="https://google.com">https://google.com</a>') == '<https://google.com>'
    assert md('<a href="https://community.kde.org/Get_Involved">https://community.kde.org/Get_Involved</a>') == '<https://community.kde.org/Get_Involved>'
    assert md('<a href="https://community.kde.org/Get_Involved">https://community.kde.org/Get_Involved</a>', autolinks=False) == '[https://community.kde.org/Get\\_Involved](https://community.kde.org/Get_Involved)'
@@ -140,6 +27,7 @@ def test_a_spaces():
 def test_a_with_title():
    text = md('<a href="http://google.com" title="The &quot;Goog&quot;">Google</a>')
    assert text == r'[Google](http://google.com "The \"Goog\"")'
+    assert md('<a href="https://google.com">https://google.com</a>', default_title=True) == '[https://google.com](https://google.com "https://google.com")'


 def test_a_shortcut():
@@ -148,8 +36,7 @@ def test_a_shortcut():


 def test_a_no_autolinks():
-    text = md('<a href="http://google.com">http://google.com</a>', autolinks=False)
-    assert text == '[http://google.com](http://google.com)'
+    assert md('<a href="https://google.com">https://google.com</a>', autolinks=False) == '[https://google.com](https://google.com)'


 def test_b():
@@ -165,30 +52,71 @@ def test_b_spaces():

 def test_blockquote():
    assert md('<blockquote>Hello</blockquote>') == '\n> Hello\n\n'
+    assert md('<blockquote>\nHello\n</blockquote>') == '\n> Hello\n\n'
+
+
+def test_blockquote_with_nested_paragraph():
+    assert md('<blockquote><p>Hello</p></blockquote>') == '\n> Hello\n\n'
+    assert md('<blockquote><p>Hello</p><p>Hello again</p></blockquote>') == '\n> Hello\n> \n> Hello again\n\n'


 def test_blockquote_with_paragraph():
    assert md('<blockquote>Hello</blockquote><p>handsome</p>') == '\n> Hello\n\nhandsome\n\n'


-def test_nested_blockquote():
+def test_blockquote_nested():
    text = md('<blockquote>And she was like <blockquote>Hello</blockquote></blockquote>')
-    assert text == '\n> And she was like \n> > Hello\n> \n> \n\n'
+    assert text == '\n> And she was like \n> > Hello\n\n'


 def test_br():
    assert md('a<br />b<br />c') == 'a  \nb  \nc'
+    assert md('a<br />b<br />c', newline_style=BACKSLASH) == 'a\\\nb\\\nc'
+
+
+def test_caption():
+    assert md('TEXT<figure><figcaption>Caption</figcaption><span>SPAN</span></figure>') == 'TEXT\n\nCaption\n\nSPAN'
+    assert md('<figure><span>SPAN</span><figcaption>Caption</figcaption></figure>TEXT') == 'SPAN\n\nCaption\n\nTEXT'
+
+
+def test_code():
+    inline_tests('code', '`')
+    assert md('<code>*this_should_not_escape*</code>') == '`*this_should_not_escape*`'
+    assert md('<kbd>*this_should_not_escape*</kbd>') == '`*this_should_not_escape*`'
+    assert md('<samp>*this_should_not_escape*</samp>') == '`*this_should_not_escape*`'
+    assert md('<code><span>*this_should_not_escape*</span></code>') == '`*this_should_not_escape*`'
+    assert md('<code>this  should\t\tnormalize</code>') == '`this should normalize`'
+    assert md('<code><span>this  should\t\tnormalize</span></code>') == '`this should normalize`'
+    assert md('<code>foo<b>bar</b>baz</code>') == '`foobarbaz`'
+    assert md('<kbd>foo<i>bar</i>baz</kbd>') == '`foobarbaz`'
+    assert md('<samp>foo<del> bar </del>baz</samp>') == '`foo bar baz`'
+    assert md('<samp>foo <del>bar</del> baz</samp>') == '`foo bar baz`'
+    assert md('<code>foo<em> bar </em>baz</code>') == '`foo bar baz`'
+    assert md('<code>foo<code> bar </code>baz</code>') == '`foo bar baz`'
+    assert md('<code>foo<strong> bar </strong>baz</code>') == '`foo bar baz`'
+    assert md('<code>foo<s> bar </s>baz</code>') == '`foo bar baz`'
+    assert md('<code>foo<sup>bar</sup>baz</code>', sup_symbol='^') == '`foobarbaz`'
+    assert md('<code>foo<sub>bar</sub>baz</code>', sub_symbol='^') == '`foobarbaz`'
+
+
+def test_del():
+    inline_tests('del', '~~')
+
+
+def test_div():
+    assert md('Hello</div> World') == 'Hello World'


 def test_em():
-    assert md('<em>Hello</em>') == '*Hello*'
+    inline_tests('em', '*')


-def test_em_spaces():
-    assert md('foo <em>Hello</em> bar') == 'foo *Hello* bar'
-    assert md('foo<em> Hello</em> bar') == 'foo *Hello* bar'
-    assert md('foo <em>Hello </em>bar') == 'foo *Hello* bar'
-    assert md('foo <em></em> bar') == 'foo  bar'
+def test_header_with_space():
+    assert md('<h3>\n\nHello</h3>') == '### Hello\n\n'
+    assert md('<h4>\n\nHello</h4>') == '#### Hello\n\n'
+    assert md('<h5>\n\nHello</h5>') == '##### Hello\n\n'
+    assert md('<h5>\n\nHello\n\n</h5>') == '##### Hello\n\n'
+    assert md('<h5>\n\nHello   \n\n</h5>') == '##### Hello\n\n'


 def test_h1():
@@ -201,6 +129,8 @@ def test_h2():

 def test_hn():
    assert md('<h3>Hello</h3>') == '### Hello\n\n'
+    assert md('<h4>Hello</h4>') == '#### Hello\n\n'
+    assert md('<h5>Hello</h5>') == '##### Hello\n\n'
    assert md('<h6>Hello</h6>') == '###### Hello\n\n'


@@ -236,15 +166,28 @@ def test_hn_nested_simple_tag():


 def test_hn_nested_img():
-    assert md('<img src="/path/to/img.jpg" alt="Alt text" title="Optional title" />') == '![Alt text](/path/to/img.jpg "Optional title")'
-    assert md('<img src="/path/to/img.jpg" alt="Alt text" />') == '![Alt text](/path/to/img.jpg)'
    image_attributes_to_markdown = [
-        ("", ""),
-        ("alt='Alt Text'", "Alt Text"),
-        ("alt='Alt Text' title='Optional title'", "Alt Text"),
+        ("", "", ""),
+        ("alt='Alt Text'", "Alt Text", ""),
+        ("alt='Alt Text' title='Optional title'", "Alt Text", " \"Optional title\""),
    ]
-    for image_attributes, markdown in image_attributes_to_markdown:
-        assert md('<h3>A <img src="/path/to/img.jpg " ' + image_attributes + '/> B</h3>') == '### A ' + markdown + ' B\n\n'
+    for image_attributes, markdown, title in image_attributes_to_markdown:
+        assert md('<h3>A <img src="/path/to/img.jpg" ' + image_attributes + '/> B</h3>') == '### A ' + markdown + ' B\n\n'
+        assert md('<h3>A <img src="/path/to/img.jpg" ' + image_attributes + '/> B</h3>', keep_inline_images_in=['h3']) == '### A ![' + markdown + '](/path/to/img.jpg' + title + ') B\n\n'
+
+
+def test_hn_atx_headings():
+    assert md('<h1>Hello</h1>', heading_style=ATX) == '# Hello\n\n'
+    assert md('<h2>Hello</h2>', heading_style=ATX) == '## Hello\n\n'
+
+
+def test_hn_atx_closed_headings():
+    assert md('<h1>Hello</h1>', heading_style=ATX_CLOSED) == '# Hello #\n\n'
+    assert md('<h2>Hello</h2>', heading_style=ATX_CLOSED) == '## Hello ##\n\n'
+
+
+def test_head():
+    assert md('<head>head</head>') == 'head'


 def test_hr():
@@ -253,74 +196,66 @@ def test_hr():
    assert md('<p>Hello</p>\n<hr>\n<p>World</p>') == 'Hello\n\n\n\n\n---\n\n\nWorld\n\n'


-def test_head():
-    assert md('<head>head</head>') == 'head'
-
-
-def test_atx_headings():
-    assert md('<h1>Hello</h1>', heading_style=ATX) == '# Hello\n\n'
-    assert md('<h2>Hello</h2>', heading_style=ATX) == '## Hello\n\n'
-
-
-def test_atx_closed_headings():
-    assert md('<h1>Hello</h1>', heading_style=ATX_CLOSED) == '# Hello #\n\n'
-    assert md('<h2>Hello</h2>', heading_style=ATX_CLOSED) == '## Hello ##\n\n'
-
-
 def test_i():
    assert md('<i>Hello</i>') == '*Hello*'


-def test_ol():
-    assert md('<ol><li>a</li><li>b</li></ol>') == '1. a\n2. b\n'
-    assert md('<ol start="3"><li>a</li><li>b</li></ol>') == '3. a\n4. b\n'
-
-
-def test_p():
-    assert md('<p>hello</p>') == 'hello\n\n'
-
-
-def test_strong():
-    assert md('<strong>Hello</strong>') == '**Hello**'
-
-
-def test_ul():
-    assert md('<ul><li>a</li><li>b</li></ul>') == '* a\n* b\n'
-
-
-def test_nested_ols():
-    assert md(nested_ols) == '\n1. 1\n\t1. a\n\t\t1. I\n\t\t2. II\n\t\t3. III\n\t2. b\n\t3. c\n2. 2\n3. 3\n'
-
-
-def test_inline_ul():
-    assert md('<p>foo</p><ul><li>a</li><li>b</li></ul><p>bar</p>') == 'foo\n\n* a\n* b\n\nbar\n\n'
-
-
-def test_nested_uls():
-    """
-    Nested ULs should alternate bullet characters.
-
-    """
-    assert md(nested_uls) == '\n* 1\n\t+ a\n\t\t- I\n\t\t- II\n\t\t- III\n\t+ b\n\t+ c\n* 2\n* 3\n'
-
-
-def test_bullets():
-    assert md(nested_uls, bullets='-') == '\n- 1\n\t- a\n\t\t- I\n\t\t- II\n\t\t- III\n\t- b\n\t- c\n- 2\n- 3\n'
-
-
 def test_img():
    assert md('<img src="/path/to/img.jpg" alt="Alt text" title="Optional title" />') == '![Alt text](/path/to/img.jpg "Optional title")'
    assert md('<img src="/path/to/img.jpg" alt="Alt text" />') == '![Alt text](/path/to/img.jpg)'


-def test_div():
-    assert md('Hello</div> World') == 'Hello World'
+def test_kbd():
+    inline_tests('kbd', '`')


-def test_table():
-    assert md(table) == '| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |'
-    assert md(table_head_body) == '| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |'
-    assert md(table_missing_text) == '|  | Lastname | Age |\n| --- | --- | --- |\n| Jill |  | 50 |\n| Eve | Jackson | 94 |'
+def test_p():
+    assert md('<p>hello</p>') == 'hello\n\n'
+    assert md('<p>123456789 123456789</p>') == '123456789 123456789\n\n'
+    assert md('<p>123456789 123456789</p>', wrap=True, wrap_width=10) == '123456789\n123456789\n\n'
+    assert md('<p><a href="https://example.com">Some long link</a></p>', wrap=True, wrap_width=10) == '[Some long\nlink](https://example.com)\n\n'
+    assert md('<p>12345<br />67890</p>', wrap=True, wrap_width=10, newline_style=BACKSLASH) == '12345\\\n67890\n\n'
+    assert md('<p>12345678901<br />12345</p>', wrap=True, wrap_width=10, newline_style=BACKSLASH) == '12345678901\\\n12345\n\n'
+
+
+def test_pre():
+    assert md('<pre>test\n    foo\nbar</pre>') == '\n```\ntest\n    foo\nbar\n```\n'
+    assert md('<pre><code>test\n    foo\nbar</code></pre>') == '\n```\ntest\n    foo\nbar\n```\n'
+    assert md('<pre>*this_should_not_escape*</pre>') == '\n```\n*this_should_not_escape*\n```\n'
+    assert md('<pre><span>*this_should_not_escape*</span></pre>') == '\n```\n*this_should_not_escape*\n```\n'
+    assert md('<pre>\t\tthis  should\t\tnot  normalize</pre>') == '\n```\n\t\tthis  should\t\tnot  normalize\n```\n'
+    assert md('<pre><span>\t\tthis  should\t\tnot  normalize</span></pre>') == '\n```\n\t\tthis  should\t\tnot  normalize\n```\n'
+    assert md('<pre>foo<b>\nbar\n</b>baz</pre>') == '\n```\nfoo\nbar\nbaz\n```\n'
+    assert md('<pre>foo<i>\nbar\n</i>baz</pre>') == '\n```\nfoo\nbar\nbaz\n```\n'
+    assert md('<pre>foo\n<i>bar</i>\nbaz</pre>') == '\n```\nfoo\nbar\nbaz\n```\n'
+    assert md('<pre>foo<i>\n</i>baz</pre>') == '\n```\nfoo\nbaz\n```\n'
+    assert md('<pre>foo<del>\nbar\n</del>baz</pre>') == '\n```\nfoo\nbar\nbaz\n```\n'
+    assert md('<pre>foo<em>\nbar\n</em>baz</pre>') == '\n```\nfoo\nbar\nbaz\n```\n'
+    assert md('<pre>foo<code>\nbar\n</code>baz</pre>') == '\n```\nfoo\nbar\nbaz\n```\n'
+    assert md('<pre>foo<strong>\nbar\n</strong>baz</pre>') == '\n```\nfoo\nbar\nbaz\n```\n'
+    assert md('<pre>foo<s>\nbar\n</s>baz</pre>') == '\n```\nfoo\nbar\nbaz\n```\n'
+    assert md('<pre>foo<sup>\nbar\n</sup>baz</pre>', sup_symbol='^') == '\n```\nfoo\nbar\nbaz\n```\n'
+    assert md('<pre>foo<sub>\nbar\n</sub>baz</pre>', sub_symbol='^') == '\n```\nfoo\nbar\nbaz\n```\n'
+
+
+def test_script():
+    assert md('foo <script>var foo=42;</script> bar') == 'foo  bar'
+
+
+def test_style():
+    assert md('foo <style>h1 { font-size: larger }</style> bar') == 'foo  bar'
+
+
+def test_s():
+    inline_tests('s', '~~')
+
+
+def test_samp():
+    inline_tests('samp', '`')
+
+
+def test_strong():
+    assert md('<strong>Hello</strong>') == '**Hello**'


 def test_strong_em_symbol():
@@ -330,5 +265,27 @@ def test_strong_em_symbol():
    assert md('<i>Hello</i>', strong_em_symbol=UNDERSCORE) == '_Hello_'


-def test_newline_style():
-    assert md('a<br />b<br />c', newline_style=BACKSLASH) == 'a\\\nb\\\nc'
+def test_sub():
+    assert md('<sub>foo</sub>') == 'foo'
+    assert md('<sub>foo</sub>', sub_symbol='~') == '~foo~'
+    assert md('<sub>foo</sub>', sub_symbol='<sub>') == '<sub>foo</sub>'
+
+
+def test_sup():
+    assert md('<sup>foo</sup>') == 'foo'
+    assert md('<sup>foo</sup>', sup_symbol='^') == '^foo^'
+    assert md('<sup>foo</sup>', sup_symbol='<sup>') == '<sup>foo</sup>'
+
+
+def test_lang():
+    assert md('<pre>test\n    foo\nbar</pre>', code_language='python') == '\n```python\ntest\n    foo\nbar\n```\n'
+    assert md('<pre><code>test\n    foo\nbar</code></pre>', code_language='javascript') == '\n```javascript\ntest\n    foo\nbar\n```\n'
+
+
+def test_lang_callback():
+    def callback(el):
+        return el['class'][0] if el.has_attr('class') else None
+
+    assert md('<pre class="python">test\n    foo\nbar</pre>', code_language_callback=callback) == '\n```python\ntest\n    foo\nbar\n```\n'
+    assert md('<pre class="javascript"><code>test\n    foo\nbar</code></pre>', code_language_callback=callback) == '\n```javascript\ntest\n    foo\nbar\n```\n'
+    assert md('<pre class="javascript"><code class="javascript">test\n    foo\nbar</code></pre>', code_language_callback=callback) == '\n```javascript\ntest\n    foo\nbar\n```\n'
--- a/tests/test_custom_converter.py
+++ b/tests/test_custom_converter.py
@@ -0,0 +1,25 @@
+from markdownify import MarkdownConverter
+from bs4 import BeautifulSoup
+
+
+class ImageBlockConverter(MarkdownConverter):
+    """
+    Create a custom MarkdownConverter that adds two newlines after an image
+    """
+    def convert_img(self, el, text, convert_as_inline):
+        return super().convert_img(el, text, convert_as_inline) + '\n\n'
+
+
+def test_img():
+    # Create shorthand method for conversion
+    def md(html, **options):
+        return ImageBlockConverter(**options).convert(html)
+
+    assert md('<img src="/path/to/img.jpg" alt="Alt text" title="Optional title" />') == '![Alt text](/path/to/img.jpg "Optional title")\n\n'
+    assert md('<img src="/path/to/img.jpg" alt="Alt text" />') == '![Alt text](/path/to/img.jpg)\n\n'
+
+
+def test_soup():
+    html = '<b>test</b>'
+    soup = BeautifulSoup(html, 'html.parser')
+    assert MarkdownConverter().convert_soup(soup) == '**test**'
--- a/tests/test_escaping.py
+++ b/tests/test_escaping.py
@@ -1,12 +1,18 @@
 from markdownify import markdownify as md


+def test_asterisks():
+    assert md('*hey*dude*') == r'\*hey\*dude\*'
+    assert md('*hey*dude*', escape_asterisks=False) == r'*hey*dude*'
+
+
 def test_underscore():
    assert md('_hey_dude_') == r'\_hey\_dude\_'
+    assert md('_hey_dude_', escape_underscores=False) == r'_hey_dude_'


 def test_xml_entities():
-    assert md('&amp;') == '&'
+    assert md('&amp;') == r'\&'


 def test_named_entities():
@@ -19,4 +25,23 @@ def test_hexadecimal_entities():


 def test_single_escaping_entities():
-    assert md('&amp;amp;') == '&amp;'
+    assert md('&amp;amp;') == r'\&amp;'
+
+
+def test_misc():
+    assert md('\\*') == r'\\\*'
+    assert md('<foo>') == r'\<foo\>'
+    assert md('# foo') == r'\# foo'
+    assert md('> foo') == r'\> foo'
+    assert md('~~foo~~') == r'\~\~foo\~\~'
+    assert md('foo\n===\n') == 'foo\n\\=\\=\\=\n'
+    assert md('---\n') == '\\-\\-\\-\n'
+    assert md('+ x\n+ y\n') == '\\+ x\n\\+ y\n'
+    assert md('`x`') == r'\`x\`'
+    assert md('[text](link)') == r'\[text](link)'
+    assert md('1. x') == r'1\. x'
+    assert md('not a number. x') == r'not a number. x'
+    assert md('1) x') == r'1\) x'
+    assert md('not a number) x') == r'not a number) x'
+    assert md('|not table|') == r'\|not table\|'
+    assert md(r'\ <foo> &amp;amp; | ` `', escape_misc=False) == r'\ <foo> &amp; | ` `'
--- a/tests/test_lists.py
+++ b/tests/test_lists.py
@@ -0,0 +1,84 @@
+from markdownify import markdownify as md
+
+
+nested_uls = """
+    <ul>
+        <li>1
+            <ul>
+                <li>a
+                    <ul>
+                        <li>I</li>
+                        <li>II</li>
+                        <li>III</li>
+                    </ul>
+                </li>
+                <li>b</li>
+                <li>c</li>
+            </ul>
+        </li>
+        <li>2</li>
+        <li>3</li>
+    </ul>"""
+
+nested_ols = """
+    <ol>
+        <li>1
+            <ol>
+                <li>a
+                    <ol>
+                        <li>I</li>
+                        <li>II</li>
+                        <li>III</li>
+                    </ol>
+                </li>
+                <li>b</li>
+                <li>c</li>
+            </ol>
+        </li>
+        <li>2</li>
+        <li>3</li>
+    </ol>"""
+
+
+def test_ol():
+    assert md('<ol><li>a</li><li>b</li></ol>') == '1. a\n2. b\n'
+    assert md('<ol start="3"><li>a</li><li>b</li></ol>') == '3. a\n4. b\n'
+    assert md('<ol start="-1"><li>a</li><li>b</li></ol>') == '1. a\n2. b\n'
+    assert md('<ol start="foo"><li>a</li><li>b</li></ol>') == '1. a\n2. b\n'
+    assert md('<ol start="1.5"><li>a</li><li>b</li></ol>') == '1. a\n2. b\n'
+
+
+def test_nested_ols():
+    assert md(nested_ols) == '\n1. 1\n\t1. a\n\t\t1. I\n\t\t2. II\n\t\t3. III\n\t2. b\n\t3. c\n2. 2\n3. 3\n'
+
+
+def test_ul():
+    assert md('<ul><li>a</li><li>b</li></ul>') == '* a\n* b\n'
+    assert md("""<ul>
+     <li>
+             a
+     </li>
+     <li> b </li>
+     <li>   c
+     </li>
+ </ul>""") == '* a\n* b\n* c\n'
+
+
+def test_inline_ul():
+    assert md('<p>foo</p><ul><li>a</li><li>b</li></ul><p>bar</p>') == 'foo\n\n* a\n* b\n\nbar\n\n'
+
+
+def test_nested_uls():
+    """
+    Nested ULs should alternate bullet characters.
+
+    """
+    assert md(nested_uls) == '\n* 1\n\t+ a\n\t\t- I\n\t\t- II\n\t\t- III\n\t+ b\n\t+ c\n* 2\n* 3\n'
+
+
+def test_bullets():
+    assert md(nested_uls, bullets='-') == '\n- 1\n\t- a\n\t\t- I\n\t\t- II\n\t\t- III\n\t- b\n\t- c\n- 2\n- 3\n'
+
+
+def test_li_text():
+    assert md('<ul><li>foo <a href="#">bar</a></li><li>foo bar  </li><li>foo <b>bar</b>   <i>space</i>.</ul>') == '* foo [bar](#)\n* foo bar\n* foo **bar** *space*.\n'
--- a/tests/test_tables.py
+++ b/tests/test_tables.py
@@ -0,0 +1,254 @@
+from markdownify import markdownify as md
+
+
+table = """<table>
+    <tr>
+        <th>Firstname</th>
+        <th>Lastname</th>
+        <th>Age</th>
+    </tr>
+    <tr>
+        <td>Jill</td>
+        <td>Smith</td>
+        <td>50</td>
+    </tr>
+    <tr>
+        <td>Eve</td>
+        <td>Jackson</td>
+        <td>94</td>
+    </tr>
+</table>"""
+
+
+table_with_html_content = """<table>
+    <tr>
+        <th>Firstname</th>
+        <th>Lastname</th>
+        <th>Age</th>
+    </tr>
+    <tr>
+        <td><b>Jill</b></td>
+        <td><i>Smith</i></td>
+        <td><a href="#">50</a></td>
+    </tr>
+    <tr>
+        <td>Eve</td>
+        <td>Jackson</td>
+        <td>94</td>
+    </tr>
+</table>"""
+
+
+table_with_paragraphs = """<table>
+    <tr>
+        <th>Firstname</th>
+        <th><p>Lastname</p></th>
+        <th>Age</th>
+    </tr>
+    <tr>
+        <td><p>Jill</p></td>
+        <td><p>Smith</p></td>
+        <td><p>50</p></td>
+    </tr>
+    <tr>
+        <td>Eve</td>
+        <td>Jackson</td>
+        <td>94</td>
+    </tr>
+</table>"""
+
+table_with_linebreaks = """<table>
+    <tr>
+        <th>Firstname</th>
+        <th>Lastname</th>
+        <th>Age</th>
+    </tr>
+    <tr>
+        <td>Jill</td>
+        <td>Smith
+        Jackson</td>
+        <td>50</td>
+    </tr>
+    <tr>
+        <td>Eve</td>
+        <td>Jackson
+        Smith</td>
+        <td>94</td>
+    </tr>
+</table>"""
+
+
+table_with_header_column = """<table>
+    <tr>
+        <th>Firstname</th>
+        <th>Lastname</th>
+        <th>Age</th>
+    </tr>
+    <tr>
+        <th>Jill</th>
+        <td>Smith</td>
+        <td>50</td>
+    </tr>
+    <tr>
+        <th>Eve</th>
+        <td>Jackson</td>
+        <td>94</td>
+    </tr>
+</table>"""
+
+
+table_head_body = """<table>
+    <thead>
+        <tr>
+            <th>Firstname</th>
+            <th>Lastname</th>
+            <th>Age</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td>Jill</td>
+            <td>Smith</td>
+            <td>50</td>
+        </tr>
+        <tr>
+            <td>Eve</td>
+            <td>Jackson</td>
+            <td>94</td>
+        </tr>
+    </tbody>
+</table>"""
+
+table_head_body_missing_head = """<table>
+    <thead>
+        <tr>
+            <td>Firstname</td>
+            <td>Lastname</td>
+            <td>Age</td>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td>Jill</td>
+            <td>Smith</td>
+            <td>50</td>
+        </tr>
+        <tr>
+            <td>Eve</td>
+            <td>Jackson</td>
+            <td>94</td>
+        </tr>
+    </tbody>
+</table>"""
+
+table_missing_text = """<table>
+    <thead>
+        <tr>
+            <th></th>
+            <th>Lastname</th>
+            <th>Age</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td>Jill</td>
+            <td></td>
+            <td>50</td>
+        </tr>
+        <tr>
+            <td>Eve</td>
+            <td>Jackson</td>
+            <td>94</td>
+        </tr>
+    </tbody>
+</table>"""
+
+table_missing_head = """<table>
+    <tr>
+        <td>Firstname</td>
+        <td>Lastname</td>
+        <td>Age</td>
+    </tr>
+    <tr>
+        <td>Jill</td>
+        <td>Smith</td>
+        <td>50</td>
+    </tr>
+    <tr>
+        <td>Eve</td>
+        <td>Jackson</td>
+        <td>94</td>
+    </tr>
+</table>"""
+
+table_body = """<table>
+    <tbody>
+        <tr>
+            <td>Firstname</td>
+            <td>Lastname</td>
+            <td>Age</td>
+        </tr>
+        <tr>
+            <td>Jill</td>
+            <td>Smith</td>
+            <td>50</td>
+        </tr>
+        <tr>
+            <td>Eve</td>
+            <td>Jackson</td>
+            <td>94</td>
+        </tr>
+    </tbody>
+</table>"""
+
+table_with_caption = """TEXT<table><caption>Caption</caption>
+    <tbody><tr><td>Firstname</td>
+            <td>Lastname</td>
+            <td>Age</td>
+        </tr>
+    </tbody>
+</table>"""
+
+table_with_colspan = """<table>
+    <tr>
+        <th colspan="2">Name</th>
+        <th>Age</th>
+    </tr>
+    <tr>
+        <td colspan="1">Jill</td>
+        <td>Smith</td>
+        <td>50</td>
+    </tr>
+    <tr>
+        <td>Eve</td>
+        <td>Jackson</td>
+        <td>94</td>
+    </tr>
+</table>"""
+
+table_with_undefined_colspan = """<table>
+    <tr>
+        <th colspan="undefined">Name</th>
+        <th>Age</th>
+    </tr>
+    <tr>
+        <td colspan="-1">Jill</td>
+        <td>Smith</td>
+    </tr>
+</table>"""
+
+
+def test_table():
+    assert md(table) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
+    assert md(table_with_html_content) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| **Jill** | *Smith* | [50](#) |\n| Eve | Jackson | 94 |\n\n'
+    assert md(table_with_paragraphs) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
+    assert md(table_with_linebreaks) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith  Jackson | 50 |\n| Eve | Jackson  Smith | 94 |\n\n'
+    assert md(table_with_header_column) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
+    assert md(table_head_body) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
+    assert md(table_head_body_missing_head) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
+    assert md(table_missing_text) == '\n\n|  | Lastname | Age |\n| --- | --- | --- |\n| Jill |  | 50 |\n| Eve | Jackson | 94 |\n\n'
+    assert md(table_missing_head) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
+    assert md(table_body) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
+    assert md(table_with_caption) == 'TEXT\n\nCaption\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n\n'
+    assert md(table_with_colspan) == '\n\n| Name | | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
+    assert md(table_with_undefined_colspan) == '\n\n| Name | Age |\n| --- | --- |\n| Jill | Smith |\n\n'
--- a/tox.ini
+++ b/tox.ini
@@ -0,0 +1,15 @@
+[tox]
+envlist = py38
+
+[testenv]
+passenv = PYTHONPATH
+deps =
+	pytest==8
+	flake8
+	restructuredtext_lint
+	Pygments
+commands =
+	pytest
+	flake8 --ignore=E501,W503 markdownify tests
+	restructuredtext-lint README.rst
+
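The `start` handling exercised by `test_ol` in `tests/test_lists.py` (a numeric `start` is honoured, while `-1`, `foo` and `1.5` fall back to 1) can be sketched roughly as follows. This is a standalone illustration, not markdownify's actual code: `parse_ol_start` and `render_ol` are hypothetical names, and the rule "only plain non-negative integer strings are accepted" is an assumption inferred from the test expectations.

```python
def parse_ol_start(value, default=1):
    """Hypothetical helper: turn an <ol start="..."> value into a number.

    Assumption: only plain non-negative integer strings such as "3" are
    accepted; anything else ("-1", "foo", "1.5", None) yields the default.
    """
    if isinstance(value, str) and value.isdecimal():
        return int(value)
    return default


def render_ol(items, start_attr=None):
    """Hypothetical renderer: number the items consecutively from the start."""
    start = parse_ol_start(start_attr)
    return ''.join(f'{start + i}. {item}\n' for i, item in enumerate(items))


if __name__ == '__main__':
    print(render_ol(['a', 'b'], '3'), end='')    # numbering begins at 3
    print(render_ol(['a', 'b'], 'foo'), end='')  # falls back to 1
```

Checking with `str.isdecimal()` before calling `int()` keeps the fallback free of exception handling; a `try`/`except ValueError` around `int()` plus a sign check would be an equally reasonable choice.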
Author	SHA1	Message	Date
AlexVonB	4c23c0655f	use static version instead of dynamic git tag info	2024-07-14 22:34:30 +02:00
AlexVonB	e2ace9d633	test build in develop and pull requests	2024-07-14 22:10:01 +02:00
AlexVonB	a5615f7d80	Merge branch 'pyproject.toml' of https://github.com/KOLANICH-libs/markdownify.py into KOLANICH-libs-pyproject.toml	2024-07-14 21:53:09 +02:00
AlexVonB	f6c8daf8a5	bump to v0.13.0	2024-07-14 21:19:35 +02:00
AlexVonB	75a678dab9	fix pytest version to 8	2024-07-14 21:02:49 +02:00
AlexVonB	0a5c89aa49	added test for ol start check	2024-06-23 14:30:07 +02:00
microdnd	51390d7389	handle ol start value is not number (#127) Co-authored-by: Mico <mico_wu@trendmicro.com>	2024-06-23 14:28:53 +02:00
AlexVonB	50b4640db2	better naming for markup variables	2024-06-23 13:30:08 +02:00
Joseph Myers	7861b330cd	Special-case use of HTML tags for converting `<sub>` / `<sup>` (#119) Allow different strings before / after `<sub>` / `<sup>` content In particular, this allows setting `sub_symbol='<sub>'`, `sup_symbol='<sup>'`, to use raw HTML in the output when converting subscripts and superscripts.	2024-06-23 13:28:05 +02:00
AlexVonB	2ec33384de	handle un-parsable colspan values fixes #126	2024-06-23 13:17:20 +02:00
samypr100	c1672aee44	Update MANIFEST.in to exclude tests during packaging (#125)	2024-06-23 12:59:14 +02:00
AlexVonB	43dbe20aaf	fixed github action badges see https://github.com/badges/shields/issues/8671	2024-04-04 21:50:02 +02:00
Joseph Myers	46af45bb3c	Escape all characters with Markdown significance (#118) * Escape all characters with Markdown significance There are many punctuation characters that sometimes have significance in Markdown; more systematically escape them all (based on a new escape_misc configuration option). A limited attempt is made to limit the escaping of '.' and ')' to the context where they might have Markdown significance (after a number, where they can indicate an ordered list item); no such attempt is made for the other characters (and even that limiting of '.' and ')' may not be entirely safe in all cases, as it's possible the HTML could have the number outside the block being escaped in one go, e.g. `<span>1</span>.`. --------- Co-authored-by: AlexVonB <AlexVonB@users.noreply.github.com>	2024-04-04 21:42:58 +02:00
Joseph Myers	2bd0772685	Avoid inline styles inside `<code>` / `<pre>` conversion (#117) * Avoid inline styles inside `<code>` / `<pre>` conversion The check used for this is analogous to that used to avoid escaping potential markup characters inside such tags. Fixes #103 --------- Co-authored-by: AlexVonB <AlexVonB@users.noreply.github.com>	2024-04-04 20:55:54 +02:00
AlexVonB	74ddc408cc	bump to v0.12.1	2024-03-26 21:56:00 +01:00
Eric Xu	3b4a014f25	Table merge cell horizontally (#110) * Fix #109 Table merge cell horizontally * Add test case for colspan --------- Co-authored-by: AlexVonB <AlexVonB@users.noreply.github.com>	2024-03-26 21:50:54 +01:00
AlexVonB	57d4f37923	fixed tests for table caption	2024-03-26 21:43:25 +01:00
Chris Papademetrious	d5fb0fbb85	make sure there are blank lines around table/figure captions (#114) Signed-off-by: chrispy <chrispy@synopsys.com> Co-authored-by: AlexVonB <AlexVonB@users.noreply.github.com>	2024-03-26 21:41:56 +01:00
huuya	e4df41225d	Support conversion of header rows in tables without th tag (#83) * Fixed support for header row conversion for tables without th tag	2024-03-26 21:32:36 +01:00
AlexVonB	804a3f8f07	added further readme for custom converters	2024-03-26 21:21:45 +01:00
Chris Papademetrious	7d0bf46057	revert workaround example in README.rst for <script> and <style> now that it is properly fixed (#115) Signed-off-by: chrispy <chrispy@synopsys.com>	2024-03-26 21:15:22 +01:00
André van Delft	2f9a42d3b8	Strip text before adding blockquote markers (#76)	2024-03-26 21:07:28 +01:00
AlexVonB	96a25cfbf3	added tests for linebreaks in table cells	2024-03-26 21:05:31 +01:00
Carina de Oliveira Antunes	0477a0c8a0	convert_td: strip text (#91)	2024-03-26 20:49:50 +01:00
Veronika Butkevich	f33ccd7c1a	Fix newline start in header tags (#89) * Fix newline start in header tags	2024-03-26 20:46:30 +01:00
G	a2f82678f7	Add no css example to readme (#111) * Add no css example --------- Co-authored-by: G <17325189+Chichilele@users.noreply.github.com>	2024-03-11 21:10:08 +01:00
Thomas L. Kjeldsen	60967c1c95	ignore script and style content (such as css and javascript) (#112)	2024-03-11 21:07:24 +01:00
Chris Papademetrious	c7718b6d81	Merge pull request #104 from chrispy-snps/fix/97-101-102 improve text normalization/escaping for preformatted/code contexts	2024-01-15 15:46:51 -05:00
chrispy	2b22d239ad	avoid text normalization/escaping in any preformatted/code context Signed-off-by: chrispy <chrispy@synopsys.com>	2024-01-15 10:53:14 -05:00
KOLANICH	67100595ae	Migrated the metadata into `PEP 621`-compliant `pyproject.toml`, got rid of `setup.cfg`.	2022-11-10 15:29:25 +03:00
KOLANICH	deba8b5e54	Started populating version automatically from git tags using `setuptools_scm`.	2022-11-10 15:27:26 +03:00
KOLANICH	ca88e4e49d	Move the metadata from `setup.py` into `setup.cfg`. Added `pyproject.toml`. Removed `setup.py` - it is no longer needed. Got rid of tests erroneously finding their way into the wheel.	2022-11-10 15:25:39 +03:00
AlexVonB	e6e23fd512	bump to v0.11.6	2022-09-02 10:10:27 +02:00
Alex	433fad2dec	added nix shell file	2022-09-02 08:50:45 +02:00
Alex	4fb451ffa6	fixed cli parameters closes #75	2022-09-02 08:44:41 +02:00
AlexVonB	e8d041c251	bump to v0.11.5	2022-08-31 21:45:24 +02:00
AlexVonB	f729c3ba43	first test, then lint	2022-08-31 21:44:53 +02:00
AlexVonB	eddfdae4ca	fix cli options: default heading, em symbols	2022-08-31 21:44:42 +02:00
AlexVonB	50b3b73a8f	bump to v0.11.4	2022-08-28 22:03:14 +02:00
AlexVonB	0310216877	fixed readme and added linter to detect this earlier	2022-08-28 22:02:49 +02:00
AlexVonB	9914474828	bump to v0.11.3	2022-08-28 21:42:46 +02:00
AlexVonB	6263f0e5f0	Switch to `tox` for tests (#73)	2022-08-28 21:40:52 +02:00
Adam Bambuch	17d8586843	don't escape text in pre tag (Fenced Code Blocks) (#67) don't escape text in pre tag (Fenced Code Blocks)	2022-08-28 20:58:54 +02:00
AlexVonB	59eb069700	added readme for cli	2022-08-28 20:56:23 +02:00
Daniel J. Perry	e79971a7eb	Add console entry point (#72) * Add console entry point * Make entry point conform to linter settings.	2022-08-28 20:53:15 +02:00
AlexVonB	5adda130b8	bump to v0.11.2	2022-04-24 11:01:29 +02:00
AlexVonB	5f1b98e25d	added wrap option closes #66	2022-04-24 11:00:04 +02:00
AlexVonB	16acd2b763	typo in readme	2022-04-24 10:59:22 +02:00
AlexVonB	207d0f4ec6	bump to v0.11.1	2022-04-14 10:25:25 +02:00
Mikko Korpela	ebb9ea713d	Fix detection of "first row, not headline" (#63) Improved handling of "first row, not headline". Works for tables with 1) neither thead nor tbody 2) tbody but no thead	2022-04-14 10:24:32 +02:00
AlexVonB	87b9f6c88e	bump to v0.11.0	2022-04-13 20:47:30 +02:00
AlexVonB	bda367dad9	Merge branch 'tdgroot-code_language_callback' into develop closes #64	2022-04-13 20:44:18 +02:00
AlexVonB	61e8940486	added readme for callback	2022-04-13 20:42:38 +02:00
AlexVonB	35479d2d3b	Merge branch 'code_language_callback' of https://github.com/tdgroot/python-markdownify into tdgroot-code_language_callback	2022-04-13 20:25:37 +02:00
AlexVonB	b589863715	add escaping of asterisks and option to disable it closes #62	2022-04-13 20:04:12 +02:00
AlexVonB	423b7e948c	add option to allow inline images in selected tags fixes #61	2022-04-13 19:55:34 +02:00
Timon de Groot	0ea95de4d0	Add code language callback	2022-04-09 13:22:28 +02:00
AlexVonB	ed3eee78d2	fixed readme	2022-01-24 18:18:19 +01:00
AlexVonB	ddda696396	bump to v0.10.3	2022-01-23 11:01:26 +01:00
AlexVonB	0a1343a538	allow BeautifulSoup objects to be converted	2022-01-23 11:00:19 +01:00
AlexVonB	9d0b839b73	wording	2022-01-23 10:59:24 +01:00
AlexVonB	d3eff11617	bump to v0.10.2	2022-01-18 08:53:33 +01:00
AlexVonB	bd6b581122	add option to not escape underscores closes #59	2022-01-18 08:51:44 +01:00
AlexVonB	c8f7cf63e3	bump to v0.10.1	2021-12-11 14:44:34 +01:00
AlexVonB	12a68a7d14	allow flake8 v4.x closes #57	2021-12-11 14:43:14 +01:00
AlexVonB	478b1c7e13	bump to v0.10.0	2021-11-17 17:10:15 +01:00
AlexVonB	ffcf6cbcb2	fix readme for code_language	2021-11-17 17:09:47 +01:00
AlexVonB	0ab0452414	add readme for code_language	2021-11-17 17:08:14 +01:00
AlexVonB	b62b067cbd	Merge branch 'Inzaniak-develop' into develop	2021-11-17 17:05:07 +01:00
AlexVonB	cb2646cd93	differentiated between text and code language	2021-11-17 17:03:31 +01:00
AlexVonB	9692b5e714	satisfy linter	2021-11-17 16:55:00 +01:00
Umberto Grando	ac68c53a7d	added language for multiline code	2021-11-01 21:19:35 +01:00
AlexVonB	40dd30419c	bump to v0.9.4	2021-09-04 21:50:05 +02:00
AlexVonB	da56f7f56a	Merge pull request #53 from Hozhyi/fix/bullet_list_tags_in_separate_lines Fixed issue #52 - added stripping of text to list	2021-09-04 21:48:16 +02:00
AlexVonB	8400b39dd9	remove trailing whitespace to satisfy the linter	2021-09-04 21:47:27 +02:00
Viktor Hozhyi	5fc1441fe7	Added appropriate test	2021-09-04 20:51:08 +03:00
Viktor Hozhyi	044615eff1	Fixed issue #52 - added stripping of text to list	2021-09-04 12:39:30 +03:00
AlexVonB	dbd9f3f3d2	bump to v0.9.3	2021-08-25 08:53:17 +02:00
AlexVonB	0fdeb1ff6e	convert tags inside table cells as inline in part resolves #49	2021-08-25 08:48:30 +02:00
AlexVonB	6a2f3a4b42	fix rst syntax error	2021-07-11 13:21:02 +02:00
AlexVonB	22180a166d	bump to v0.9.1	2021-07-11 13:13:31 +02:00
AlexVonB	16d8a0e1f7	Revert "add figure/figcaption" This reverts commit `828e116530`.	2021-07-11 13:12:16 +02:00
AlexVonB	4aa6cf2a24	rewrote text processing to not escape _ in code fixes #47	2021-07-11 13:10:59 +02:00
AlexVonB	828e116530	add figure/figcaption for #46	2021-06-30 13:02:42 +02:00
AlexVonB	62e9f0de02	add examples for custom converters closes #46	2021-06-27 15:53:23 +02:00
AlexVonB	cec570fc49	bump to v0.9.0	2021-05-30 19:10:31 +02:00
AlexVonB	a6a31624ad	add options for sub and sup tags fixes #44	2021-05-30 19:07:43 +02:00
AlexVonB	6f3732307d	restructured test files	2021-05-30 19:06:52 +02:00
AlexVonB	8f6d7e500d	add option 'default_title' to links fixes #39	2021-05-30 18:40:40 +02:00
AlexVonB	e96351b666	bump to v0.8.1	2021-05-30 11:20:16 +02:00
AlexVonB	129c4ef060	ignore doctype tag, test cdata tag fixes #45	2021-05-30 11:18:18 +02:00
AlexVonB	9cb940cbc0	bump to v0.8.0	2021-05-21 14:17:51 +02:00
AlexVonB	70ef9b6e48	added pre tag closes #15	2021-05-21 14:15:41 +02:00
AlexVonB	91d53ddd5a	refactor simple inline conversions	2021-05-21 13:53:00 +02:00
AlexVonB	079f32f6cd	added del and s tags	2021-05-21 12:27:49 +02:00
AlexVonB	89b577e91e	ordering functions alphabetically	2021-05-21 12:21:21 +02:00
AlexVonB	4bf2ea44fc	Merge branch 'AndrewCRichards-andrewcrichards/add_code_samp_kbd_tags' into develop	2021-05-21 12:13:48 +02:00
AlexVonB	77797ebb79	Merge branch 'andrewcrichards/add_code_samp_kbd_tags' of https://github.com/AndrewCRichards/python-markdownify into AndrewCRichards-andrewcrichards/add_code_samp_kbd_tags	2021-05-21 12:11:59 +02:00
AlexVonB	9f3c4c9fa0	bump to v0.7.4	2021-05-18 10:42:16 +02:00
AlexVonB	967db26b3a	Merge branch 'fix-headless-tables' into develop	2021-05-18 10:41:42 +02:00
AlexVonB	ea81407b87	implemented table parsing correctly instead of manually walking down the dom tree in a table, we now rely on the main descent loop and just implement conversion for rows and cells correctly. this enables the use of html inside a table cell.	2021-05-17 14:00:00 +02:00
AlexVonB	e6da15c173	allow tables with headers in first (or any) column	2021-05-17 12:36:48 +02:00
AlexVonB	7dac92e85e	Allow for tables without header row fixes #42	2021-05-16 19:02:04 +02:00
AlexVonB	fc29483899	bump to v0.7.3	2021-05-16 18:41:08 +02:00
AlexVonB	bd7a8d6990	Merge pull request #43 from jiulongw/develop Fix missing whitespaces in <li> node	2021-05-16 18:39:58 +02:00
Jiulong Wang	ddfbf6a364	Keep important spaces in <li> element	2021-05-10 16:07:54 -07:00
Jiulong Wang	91a64e3cd4	Fix missing whitespaces in <li> node	2021-05-10 14:42:05 -07:00
Andrew Richards	7685738344	Formatting tweak Change indent of continuation line; squashes a flake8 warning.	2020-11-27 14:18:08 +00:00
Andrew Richards	92a73c8dfe	Correct test_code_with_tricky_content() Result of previous test didn't check for the trailing ' ' that convert_br() adds: This is needed to ensure that the resulting markdown not only has \n for the <br> but also renders it as a newline.	2020-11-26 22:20:29 +00:00
Andrew Richards	3354f143d8	Add method for <code> tag Add method and tests for inline tag <code>.	2020-11-23 17:28:23 +00:00