Merge branch 'develop'

bump to v0.10.3
allow BeautifulSoup objects to be converted
2022-01-23 11:01:45 +01:00 · 2022-01-23 11:01:26 +01:00 · 2022-01-23 11:00:19 +01:00 · 2022-01-23 10:59:24 +01:00 · 2022-01-18 08:56:33 +01:00 · 2022-01-18 08:53:33 +01:00
21 changed files with 1209 additions and 203 deletions
--- a/.github/workflows/python-app.yml
+++ b/.github/workflows/python-app.yml
@@ -0,0 +1,33 @@
+# This workflow will install Python dependencies, run tests and lint with a single version of Python
+# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
+
+name: Python application
+
+on:
+  push:
+    branches: [ develop ]
+  pull_request:
+    branches: [ develop ]
+
+jobs:
+  build:
+
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Python 3.8
+      uses: actions/setup-python@v2
+      with:
+        python-version: 3.8
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install flake8==3.8.4 pytest
+        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
+    - name: Lint with flake8
+      run: |
+        python setup.py lint
+    - name: Test with pytest
+      run: |
+        python setup.py test
--- a/.github/workflows/python-publish.yml
+++ b/.github/workflows/python-publish.yml
@@ -0,0 +1,31 @@
+# This workflow will upload a Python Package using Twine when a release is created
+# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries
+
+name: Upload Python Package
+
+on:
+  release:
+    types: [created]
+
+jobs:
+  deploy:
+
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Python
+      uses: actions/setup-python@v2
+      with:
+        python-version: '3.8'
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install setuptools wheel twine
+    - name: Build and publish
+      env:
+        TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
+        TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
+      run: |
+        python setup.py sdist bdist_wheel
+        twine upload dist/*
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,11 @@
+*.pyc
+*.egg
+.eggs/
+*.egg-info/
+.DS_Store
+/.env
+/dist
+/MANIFEST
+/venv
+build/
+.vscode/settings.json
--- a/LICENSE
+++ b/LICENSE
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright 2012-2018 Matthew Tretter
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -0,0 +1 @@
+include README.rst
--- a/README.rst
+++ b/README.rst
@@ -0,0 +1,157 @@
+|build| |version| |license| |downloads|
+
+.. |build| image:: https://img.shields.io/github/workflow/status/matthewwithanm/python-markdownify/Python%20application/develop
+    :alt: GitHub Workflow Status
+    :target: https://github.com/matthewwithanm/python-markdownify/actions?query=workflow%3A%22Python+application%22
+
+.. |version| image:: https://img.shields.io/pypi/v/markdownify
+    :alt: Pypi version
+    :target: https://pypi.org/project/markdownify/
+
+.. |license| image:: https://img.shields.io/pypi/l/markdownify
+    :alt: License
+    :target: https://github.com/matthewwithanm/python-markdownify/blob/develop/LICENSE
+
+.. |downloads| image:: https://pepy.tech/badge/markdownify
+    :alt: Pypi Downloads
+    :target: https://pepy.tech/project/markdownify
+
+Installation
+============
+
+``pip install markdownify``
+
+
+Usage
+=====
+
+Convert some HTML to Markdown:
+
+.. code:: python
+
+    from markdownify import markdownify as md
+    md('<b>Yay</b> <a href="http://github.com">GitHub</a>')  # > '**Yay** [GitHub](http://github.com)'
+
+Specify tags to exclude:
+
+.. code:: python
+
+    from markdownify import markdownify as md
+    md('<b>Yay</b> <a href="http://github.com">GitHub</a>', strip=['a'])  # > '**Yay** GitHub'
+
+\...or specify the tags you want to include:
+
+.. code:: python
+
+    from markdownify import markdownify as md
+    md('<b>Yay</b> <a href="http://github.com">GitHub</a>', convert=['b'])  # > '**Yay** GitHub'
+
+
+Options
+=======
+
+Markdownify supports the following options:
+
+strip
+  A list of tags to strip. This option can't be used with the
+  ``convert`` option.
+
+convert
+  A list of tags to convert. This option can't be used with the
+  ``strip`` option.
+
+autolinks
+  A boolean indicating whether the "automatic link" style should be used when
+  an ``a`` tag's contents match its href. Defaults to ``True``.
+
+default_title
+  A boolean to enable setting the title of a link to its href, if no title is
+  given. Defaults to ``False``.
+
+heading_style
+  Defines how headings should be converted. Accepted values are ``ATX``,
+  ``ATX_CLOSED``, ``SETEXT``, and ``UNDERLINED`` (which is an alias for
+  ``SETEXT``). Defaults to ``UNDERLINED``.
+
+bullets
+  An iterable (string, list, or tuple) of bullet styles to be used. If the
+  iterable only contains one item, it will be used regardless of how deeply
+  lists are nested. Otherwise, the bullet will alternate based on nesting
+  level. Defaults to ``'*+-'``.
+
+strong_em_symbol
+  In markdown, both ``*`` and ``_`` are used to encode **strong** or
+  *emphasized* text. Either of these symbols can be chosen by the options
+  ``ASTERISK`` (default) or ``UNDERSCORE`` respectively.
+
+sub_symbol, sup_symbol
+  Define the characters that surround ``<sub>`` and ``<sup>`` text. Defaults to an
+  empty string, because this is non-standard behavior. Could be something like
+  ``~`` and ``^`` to result in ``~sub~`` and ``^sup^``.
+
+newline_style
+  Defines the style of marking linebreaks (``<br>``) in markdown. The default
+  value ``SPACES`` of this option will adopt the usual two spaces and a newline,
+  while ``BACKSLASH`` will convert a linebreak to ``\\n`` (a backslash and a
+  newline). While the latter convention is non-standard, it is commonly
+  preferred and supported by many interpreters.
+
+code_language
+  Defines the language that should be assumed for all ``<pre>`` sections.
+  Useful if all code on a page is in the same programming language and
+  should be annotated with `````python`` or similar.
+  Defaults to ``''`` (empty string) and can be any string.
+
+escape_underscores
+  If set to ``False``, do not escape ``_`` to ``\_`` in text.
+  Defaults to ``True``.
+
+Options may be specified as kwargs to the ``markdownify`` function, or as a
+nested ``Options`` class in ``MarkdownConverter`` subclasses.
+
+
+Converting BeautifulSoup objects
+================================
+
+.. code:: python
+
+    from markdownify import MarkdownConverter
+
+    # Create shorthand method for conversion
+    def md(soup, **options):
+        return MarkdownConverter(**options).convert_soup(soup)
+
+
+Creating Custom Converters
+==========================
+
+If you have a special use case that calls for a special conversion, you can
+always inherit from ``MarkdownConverter`` and override the method you want to
+change:
+
+.. code:: python
+
+    from markdownify import MarkdownConverter
+
+    class ImageBlockConverter(MarkdownConverter):
+        """
+        Create a custom MarkdownConverter that adds two newlines after an image
+        """
+        def convert_img(self, el, text, convert_as_inline):
+            return super().convert_img(el, text, convert_as_inline) + '\n\n'
+
+    # Create shorthand method for conversion
+    def md(html, **options):
+        return ImageBlockConverter(**options).convert(html)
+
+
+Development
+===========
+
+To run tests:
+
+``python setup.py test``
+
+To lint:
+
+``python setup.py lint``
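The `bullets` and `heading_style` options documented in the README select their markers purely from list nesting depth and heading level. The stdlib-only sketch below mirrors the selection logic of `convert_li` and `convert_hn` from this commit; `pick_bullet` and `heading` are hypothetical helper names used for illustration, not part of markdownify's API.

```python
# Hypothetical standalone helpers mirroring the README's description of the
# ``bullets`` and ``heading_style`` options; not part of markdownify's API.
ATX = 'atx'
ATX_CLOSED = 'atx_closed'
UNDERLINED = 'underlined'
SETEXT = UNDERLINED  # alias, as in the README


def pick_bullet(bullets, depth):
    """Return the bullet for a list nested ``depth`` levels deep (0 = top).

    A single-item iterable always yields that item; longer iterables
    alternate by nesting level.
    """
    return bullets[depth % len(bullets)]


def heading(text, level, style=UNDERLINED):
    """Render ``text`` as a level-``level`` Markdown heading."""
    text = text.rstrip()
    if style == UNDERLINED and level <= 2:
        # Setext (underlined) headings only exist for levels 1 and 2;
        # deeper levels fall through to ATX hashes.
        pad = '=' if level == 1 else '-'
        return '%s\n%s\n\n' % (text, pad * len(text))
    hashes = '#' * level
    if style == ATX_CLOSED:
        return '%s %s %s\n\n' % (hashes, text, hashes)
    return '%s %s\n\n' % (hashes, text)


print([pick_bullet('*+-', d) for d in range(4)])  # ['*', '+', '-', '*']
print(repr(heading('Hello', 1)))                   # 'Hello\n=====\n\n'
print(repr(heading('Hello', 3, ATX_CLOSED)))       # '### Hello ###\n\n'
```

Because `pick_bullet` indexes modulo the iterable's length, a single-character `bullets` value such as `'-'` yields the same bullet at every depth, as the README describes.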
--- a/markdownify/__init__.py
+++ b/markdownify/__init__.py
@@ -1,56 +1,186 @@
-from lxml.html.soupparser import fromstring
+from bs4 import BeautifulSoup, NavigableString, Comment, Doctype
 import re
+import six


 convert_heading_re = re.compile(r'convert_h(\d+)')
 line_beginning_re = re.compile(r'^', re.MULTILINE)
-whitespace_re = re.compile(r'[\r\n\s\t ]+')
+whitespace_re = re.compile(r'[\t ]+')
+all_whitespace_re = re.compile(r'[\s]+')
+html_heading_re = re.compile(r'h[1-6]')


-def escape(text):
+# Heading styles
+ATX = 'atx'
+ATX_CLOSED = 'atx_closed'
+UNDERLINED = 'underlined'
+SETEXT = UNDERLINED
+
+# Newline style
+SPACES = 'spaces'
+BACKSLASH = 'backslash'
+
+# Strong and emphasis style
+ASTERISK = '*'
+UNDERSCORE = '_'
+
+
+def escape(text, escape_underscores):
    if not text:
        return ''
-    return text.replace('_', r'\_')
+    if escape_underscores:
+        return text.replace('_', r'\_')
+    return text
+
+
+def chomp(text):
+    """
+    If the text in an inline tag like b, a, or em contains a leading or trailing
+    space, strip the string and return a space as prefix or suffix, if needed.
+    This function is used to prevent conversions like
+        <b> foo</b> => ** foo**
+    """
+    prefix = ' ' if text and text[0] == ' ' else ''
+    suffix = ' ' if text and text[-1] == ' ' else ''
+    text = text.strip()
+    return (prefix, suffix, text)
+
+
+def abstract_inline_conversion(markup_fn):
+    """
+    This abstracts all simple inline tags like b, em, del, ...
+    Returns a function that wraps the chomped text in a pair of the string
+    that is returned by markup_fn. markup_fn is necessary to allow for
+    references to self.strong_em_symbol etc.
+    """
+    def implementation(self, el, text, convert_as_inline):
+        markup = markup_fn(self)
+        prefix, suffix, text = chomp(text)
+        if not text:
+            return ''
+        return '%s%s%s%s%s' % (prefix, markup, text, markup, suffix)
+    return implementation
+
+
+def _todict(obj):
+    return dict((k, getattr(obj, k)) for k in dir(obj) if not k.startswith('_'))


 class MarkdownConverter(object):
-    def __init__(self, tags_to_strip=None, tags_to_convert=None):
-        if tags_to_strip is not None and tags_to_convert is not None:
+    class DefaultOptions:
+        autolinks = True
+        bullets = '*+-'  # An iterable of bullet types.
+        code_language = ''
+        convert = None
+        default_title = False
+        escape_underscores = True
+        heading_style = UNDERLINED
+        newline_style = SPACES
+        strip = None
+        strong_em_symbol = ASTERISK
+        sub_symbol = ''
+        sup_symbol = ''
+
+    class Options(DefaultOptions):
+        pass
+
+    def __init__(self, **options):
+        # Create an options dictionary. Use DefaultOptions as a base so that
+        # it doesn't have to be extended.
+        self.options = _todict(self.DefaultOptions)
+        self.options.update(_todict(self.Options))
+        self.options.update(options)
+        if self.options['strip'] is not None and self.options['convert'] is not None:
            raise ValueError('You may specify either tags to strip or tags to'
-                    ' convert, but not both.')
-        self.tags_to_strip = tags_to_strip
-        self.tags_to_convert = tags_to_convert
+                             ' convert, but not both.')

    def convert(self, html):
-        soup = fromstring(html)
-        return self.process_tag(soup)
+        soup = BeautifulSoup(html, 'html.parser')
+        return self.convert_soup(soup)

-    def process_tag(self, node):
-        text = self.process_text(node.text)
+    def convert_soup(self, soup):
+        return self.process_tag(soup, convert_as_inline=False, children_only=True)
+
+    def process_tag(self, node, convert_as_inline, children_only=False):
+        text = ''
+
+        # markdown headings or cells can't include
+        # block elements (elements w/newlines)
+        isHeading = html_heading_re.match(node.name) is not None
+        isCell = node.name in ['td', 'th']
+        convert_children_as_inline = convert_as_inline
+
+        if not children_only and (isHeading or isCell):
+            convert_children_as_inline = True
+
+        # Remove whitespace-only textnodes in purely nested nodes
+        def is_nested_node(el):
+            return el and el.name in ['ol', 'ul', 'li',
+                                      'table', 'thead', 'tbody', 'tfoot',
+                                      'tr', 'td', 'th']
+
+        if is_nested_node(node):
+            for el in node.children:
+                # Only extract (remove) whitespace-only text node if any of the
+                # conditions is true:
+                # - el is the first element in its parent
+                # - el is the last element in its parent
+                # - el is adjacent to a nested node
+                can_extract = (not el.previous_sibling
+                               or not el.next_sibling
+                               or is_nested_node(el.previous_sibling)
+                               or is_nested_node(el.next_sibling))
+                if (isinstance(el, NavigableString)
+                        and six.text_type(el).strip() == ''
+                        and can_extract):
+                    el.extract()

        # Convert the children first
-        for el in node.findall('*'):
-            text += self.process_tag(el)
+        for el in node.children:
+            if isinstance(el, Comment) or isinstance(el, Doctype):
+                continue
+            elif isinstance(el, NavigableString):
+                text += self.process_text(el)
+            else:
+                text += self.process_tag(el, convert_children_as_inline)

-        convert_fn = getattr(self, 'convert_%s' % node.tag, None)
-        if convert_fn and self.should_convert_tag(node.tag):
-            text = convert_fn(node, text)
-
-        text += self.process_text(node.tail)
+        if not children_only:
+            convert_fn = getattr(self, 'convert_%s' % node.name, None)
+            if convert_fn and self.should_convert_tag(node.name):
+                text = convert_fn(node, text, convert_as_inline)

        return text

-    def process_text(self, text):
-        return escape(whitespace_re.sub(' ', text or ''))
+    def process_text(self, el):
+        text = six.text_type(el) or ''
+
+        # don't remove any whitespace when handling pre or code in pre
+        if not (el.parent.name == 'pre'
+                or (el.parent.name == 'code'
+                    and el.parent.parent.name == 'pre')):
+            text = whitespace_re.sub(' ', text)
+
+        if el.parent.name != 'code':
+            text = escape(text, self.options['escape_underscores'])
+
+        # remove trailing whitespace if any of the following conditions is true:
+        # - current text node is the last node in li
+        # - current text node is followed by an embedded list
+        if (el.parent.name == 'li'
+                and (not el.next_sibling
+                     or el.next_sibling.name in ['ul', 'ol'])):
+            text = text.rstrip()
+
+        return text

    def __getattr__(self, attr):
-        # Handle heading levels > 2
+        # Handle headings
        m = convert_heading_re.match(attr)
        if m:
            n = int(m.group(1))

-            def convert_tag(el, text):
-                return self.convert_hn(n, el, text)
+            def convert_tag(el, text, convert_as_inline):
+                return self.convert_hn(n, el, text, convert_as_inline)

            convert_tag.__name__ = 'convert_h%s' % n
            setattr(self, convert_tag.__name__, convert_tag)
@@ -60,62 +190,183 @@ class MarkdownConverter(object):

    def should_convert_tag(self, tag):
        tag = tag.lower()
-        if self.tags_to_strip is not None:
-            return tag not in self.tags_to_strip
-        elif self.tags_to_convert is not None:
-            return tag in self.tags_to_convert
+        strip = self.options['strip']
+        convert = self.options['convert']
+        if strip is not None:
+            return tag not in strip
+        elif convert is not None:
+            return tag in convert
        else:
            return True

+    def indent(self, text, level):
+        return line_beginning_re.sub('\t' * level, text) if text else ''
+
    def underline(self, text, pad_char):
        text = (text or '').rstrip()
        return '%s\n%s\n\n' % (text, pad_char * len(text)) if text else ''

-    def convert_a(self, el, text):
+    def convert_a(self, el, text, convert_as_inline):
+        prefix, suffix, text = chomp(text)
+        if not text:
+            return ''
        href = el.get('href')
        title = el.get('title')
+        # For the replacement see #29: text nodes underscores are escaped
+        if (self.options['autolinks']
+                and text.replace(r'\_', '_') == href
+                and not title
+                and not self.options['default_title']):
+            # Shortcut syntax
+            return '<%s>' % href
+        if self.options['default_title'] and not title:
+            title = href
        title_part = ' "%s"' % title.replace('"', r'\"') if title else ''
-        return '[%s](%s%s)' % (text or '', href, title_part) if href else text or ''
+        return '%s[%s](%s%s)%s' % (prefix, text, href, title_part, suffix) if href else text

-    def convert_b(self, el, text):
-        return self.convert_strong(el, text)
+    convert_b = abstract_inline_conversion(lambda self: 2 * self.options['strong_em_symbol'])

-    def convert_blockquote(self, el, text):
-        return '\n' + line_beginning_re.sub('> ', text) if text else ''
+    def convert_blockquote(self, el, text, convert_as_inline):

-    def convert_br(self, el, text):
-        return '  \n'
+        if convert_as_inline:
+            return text

-    def convert_em(self, el, text):
-        return '*%s*' % text if text else ''
+        return '\n' + (line_beginning_re.sub('> ', text) + '\n\n') if text else ''

-    def convert_h1(self, el, text):
-        return self.underline(text, '=')
+    def convert_br(self, el, text, convert_as_inline):
+        if convert_as_inline:
+            return ""

-    def convert_h2(self, el, text):
-        return self.underline(text, '-')
-
-    def convert_hn(self, n, el, text):
-        return '%s %s\n\n' % ('#' * n, text.rstrip()) if text else ''
-
-    def convert_i(self, el, text):
-        return self.convert_em(el, text)
-
-    def convert_li(self, el, text):
-        parent = el.getparent()
-        if parent is not None and parent.tag == 'ol':
-            bullet = '%s.' % (parent.index(el) + 1)
+        if self.options['newline_style'].lower() == BACKSLASH:
+            return '\\\n'
        else:
-            bullet = '*'
-        return '%s %s\n' % (bullet, text or '')
+            return '  \n'

-    def convert_p(self, el, text):
+    def convert_code(self, el, text, convert_as_inline):
+        if el.parent.name == 'pre':
+            return text
+        converter = abstract_inline_conversion(lambda self: '`')
+        return converter(self, el, text, convert_as_inline)
+
+    convert_del = abstract_inline_conversion(lambda self: '~~')
+
+    convert_em = abstract_inline_conversion(lambda self: self.options['strong_em_symbol'])
+
+    convert_kbd = convert_code
+
+    def convert_hn(self, n, el, text, convert_as_inline):
+        if convert_as_inline:
+            return text
+
+        style = self.options['heading_style'].lower()
+        text = text.rstrip()
+        if style == UNDERLINED and n <= 2:
+            line = '=' if n == 1 else '-'
+            return self.underline(text, line)
+        hashes = '#' * n
+        if style == ATX_CLOSED:
+            return '%s %s %s\n\n' % (hashes, text, hashes)
+        return '%s %s\n\n' % (hashes, text)
+
+    def convert_hr(self, el, text, convert_as_inline):
+        return '\n\n---\n\n'
+
+    convert_i = convert_em
+
+    def convert_img(self, el, text, convert_as_inline):
+        alt = el.attrs.get('alt', None) or ''
+        src = el.attrs.get('src', None) or ''
+        title = el.attrs.get('title', None) or ''
+        title_part = ' "%s"' % title.replace('"', r'\"') if title else ''
+        if convert_as_inline:
+            return alt
+
+        return '![%s](%s%s)' % (alt, src, title_part)
+
+    def convert_list(self, el, text, convert_as_inline):
+
+        # Converting a list to inline is undefined,
+        # so convert_as_inline is ignored for lists.
+
+        nested = False
+        before_paragraph = False
+        if el.next_sibling and el.next_sibling.name not in ['ul', 'ol']:
+            before_paragraph = True
+        while el:
+            if el.name == 'li':
+                nested = True
+                break
+            el = el.parent
+        if nested:
+            # remove trailing newline if nested
+            return '\n' + self.indent(text, 1).rstrip()
+        return text + ('\n' if before_paragraph else '')
+
+    convert_ul = convert_list
+    convert_ol = convert_list
+
+    def convert_li(self, el, text, convert_as_inline):
+        parent = el.parent
+        if parent is not None and parent.name == 'ol':
+            if parent.get("start"):
+                start = int(parent.get("start"))
+            else:
+                start = 1
+            bullet = '%s.' % (start + parent.index(el))
+        else:
+            depth = -1
+            while el:
+                if el.name == 'ul':
+                    depth += 1
+                el = el.parent
+            bullets = self.options['bullets']
+            bullet = bullets[depth % len(bullets)]
+        return '%s %s\n' % (bullet, (text or '').strip())
+
+    def convert_p(self, el, text, convert_as_inline):
+        if convert_as_inline:
+            return text
        return '%s\n\n' % text if text else ''

-    def convert_strong(self, el, text):
-        return '**%s**' % text if text else ''
+    def convert_pre(self, el, text, convert_as_inline):
+        if not text:
+            return ''
+        return '\n```%s\n%s\n```\n' % (self.options['code_language'], text)
+
+    convert_s = convert_del
+
+    convert_strong = convert_b
+
+    convert_samp = convert_code
+
+    convert_sub = abstract_inline_conversion(lambda self: self.options['sub_symbol'])
+
+    convert_sup = abstract_inline_conversion(lambda self: self.options['sup_symbol'])
+
+    def convert_table(self, el, text, convert_as_inline):
+        return '\n\n' + text + '\n'
+
+    def convert_td(self, el, text, convert_as_inline):
+        return ' ' + text + ' |'
+
+    def convert_th(self, el, text, convert_as_inline):
+        return ' ' + text + ' |'
+
+    def convert_tr(self, el, text, convert_as_inline):
+        cells = el.find_all(['td', 'th'])
+        is_headrow = all([cell.name == 'th' for cell in cells])
+        overline = ''
+        underline = ''
+        if is_headrow and not el.previous_sibling:
+            # first row and is headline: print headline underline
+            underline += '| ' + ' | '.join(['---'] * len(cells)) + ' |' + '\n'
+        elif not el.previous_sibling and el.parent.name == 'table':
+            # first row, not headline, and the parent is the table itself:
+            # print empty headline above this row
+            overline += '| ' + ' | '.join([''] * len(cells)) + ' |' + '\n'
+            overline += '| ' + ' | '.join(['---'] * len(cells)) + ' |' + '\n'
+        return overline + '|' + text + '\n' + underline


-def markdownify(html, strip=None, convert=None):
-    converter = MarkdownConverter(strip, convert)
-    return converter.convert(html)
+def markdownify(html, **options):
+    return MarkdownConverter(**options).convert(html)
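The option handling introduced here layers three sources in order: `DefaultOptions`, the class-level `Options`, then keyword arguments. A minimal sketch of that resolution order, using a hypothetical `Converter` class with just two options, shows why a subclass's nested `Options` does not need to extend `DefaultOptions`:

```python
# Hypothetical sketch of the option-resolution order used by MarkdownConverter
# in this commit: DefaultOptions, then the (possibly overridden) Options class,
# then keyword arguments. Only two options are modelled here for brevity.


def _todict(obj):
    # Collect public class attributes into a dict, as the diff's _todict does.
    return dict((k, getattr(obj, k)) for k in dir(obj) if not k.startswith('_'))


class Converter(object):
    class DefaultOptions:
        heading_style = 'underlined'
        bullets = '*+-'

    class Options(DefaultOptions):
        pass

    def __init__(self, **options):
        self.options = _todict(self.DefaultOptions)
        self.options.update(_todict(self.Options))
        self.options.update(options)  # kwargs win over class-level settings


class AtxConverter(Converter):
    # A subclass only lists what it changes; DefaultOptions fills in the rest.
    class Options:
        heading_style = 'atx'


print(Converter().options['heading_style'])                               # underlined
print(AtxConverter().options['heading_style'])                            # atx
print(AtxConverter().options['bullets'])                                  # *+-
print(AtxConverter(heading_style='atx_closed').options['heading_style'])  # atx_closed
```

Because `DefaultOptions` is always merged first, an overriding `Options` class can stay small; keyword arguments passed to the constructor still take precedence over both.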
--- a/markdownify/version.py
+++ b/markdownify/version.py
@@ -1 +0,0 @@
-__version__ = '0.1.0'
--- a/runtests.py
+++ b/runtests.py
@@ -1,5 +0,0 @@
-#!/usr/bin/env python
-from nose.core import run, collector
-
-if __name__ == '__main__':
-    run()
--- a/setup.cfg
+++ b/setup.cfg
@@ -0,0 +1,2 @@
+[flake8]
+ignore = E501 W503
--- a/setup.py
+++ b/setup.py
@@ -2,43 +2,98 @@
 import codecs
 import os
 from setuptools import setup, find_packages
+from setuptools.command.test import test as TestCommand, Command


 read = lambda filepath: codecs.open(filepath, 'r', 'utf-8').read()
-execfile(os.path.join(os.path.dirname(__file__), 'markdownify', 'version.py'))
+
+pkgmeta = {
+    '__title__': 'markdownify',
+    '__author__': 'Matthew Tretter',
+    '__version__': '0.10.3',
+}
+
+
+class PyTest(TestCommand):
+    def finalize_options(self):
+        TestCommand.finalize_options(self)
+        self.test_args = ['tests', '-s']
+        self.test_suite = True
+
+    def run_tests(self):
+        import pytest
+        errno = pytest.main(self.test_args)
+        raise SystemExit(errno)
+
+
+class LintCommand(Command):
+    """
+    A copy of flake8's Flake8Command
+
+    """
+    description = "Run flake8 on modules registered in setuptools"
+    user_options = []
+
+    def initialize_options(self):
+        pass
+
+    def finalize_options(self):
+        pass
+
+    def distribution_files(self):
+        if self.distribution.packages:
+            for package in self.distribution.packages:
+                yield package.replace(".", os.path.sep)
+
+        if self.distribution.py_modules:
+            for filename in self.distribution.py_modules:
+                yield "%s.py" % filename
+
+    def run(self):
+        from flake8.api.legacy import get_style_guide
+        flake8_style = get_style_guide(config_file='setup.cfg')
+        paths = self.distribution_files()
+        report = flake8_style.check_files(paths)
+        raise SystemExit(report.total_errors > 0)


 setup(
-    name='python-markdownify',
+    name='markdownify',
    description='Convert HTML to markdown.',
    long_description=read(os.path.join(os.path.dirname(__file__), 'README.rst')),
-    version=__version__,
-    author='Matthew Tretter',
-    author_email='matthew@exanimo.com',
+    version=pkgmeta['__version__'],
+    author=pkgmeta['__author__'],
+    author_email='m@tthewwithanm.com',
    url='http://github.com/matthewwithanm/python-markdownify',
    download_url='http://github.com/matthewwithanm/python-markdownify/tarball/master',
    packages=find_packages(),
    zip_safe=False,
    include_package_data=True,
+    setup_requires=[
+        'flake8>=3.8,<5',
+    ],
    tests_require=[
-        'nose',
-        'unittest2',
+        'pytest>=6.2,<7',
    ],
    install_requires=[
-        'lxml',
-        'BeautifulSoup',
+        'beautifulsoup4>=4.9,<5', 'six>=1.15,<2'
    ],
    classifiers=[
        'Environment :: Web Environment',
        'Framework :: Django',
        'Intended Audience :: Developers',
-        'License :: OSI Approved :: BSD License',
+        'License :: OSI Approved :: MIT License',
        'Operating System :: OS Independent',
        'Programming Language :: Python :: 2.5',
        'Programming Language :: Python :: 2.6',
        'Programming Language :: Python :: 2.7',
+        'Programming Language :: Python :: 3.6',
+        'Programming Language :: Python :: 3.7',
+        'Programming Language :: Python :: 3.8',
        'Topic :: Utilities'
    ],
-    setup_requires=[],
-    test_suite='runtests.collector',
+    cmdclass={
+        'test': PyTest,
+        'lint': LintCommand,
+    },
 )
--- a/tests.py
+++ b/tests.py
@@ -1,123 +0,0 @@
-import unittest
-from markdownify import markdownify as md
-
-
-class BasicTests(unittest.TestCase):
-
-    def test_single_tag(self):
-        self.assertEqual(md('<span>Hello</span>'), 'Hello')
-
-    def test_soup(self):
-        self.assertEqual(md('<div><span>Hello</div></span>'), 'Hello')
-
-    def test_whitespace(self):
-        self.assertEqual(md(' a  b \n\n c '), ' a b c ')
-
-
-class ArgTests(unittest.TestCase):
-
-    def test_strip(self):
-        self.assertEqual(
-            md('<a href="https://github.com/matthewwithanm">Some Text</a>', strip=['a']),
-            'Some Text')
-
-    def test_do_not_strip(self):
-        self.assertEqual(
-            md('<a href="https://github.com/matthewwithanm">Some Text</a>', strip=[]),
-            '[Some Text](https://github.com/matthewwithanm)')
-
-    def test_convert(self):
-        self.assertEqual(
-            md('<a href="https://github.com/matthewwithanm">Some Text</a>', convert=['a']),
-            '[Some Text](https://github.com/matthewwithanm)')
-
-    def test_do_not_convert(self):
-        self.assertEqual(
-            md('<a href="https://github.com/matthewwithanm">Some Text</a>', convert=[]),
-            'Some Text')
-
-
-class EscapeTests(unittest.TestCase):
-
-    def test_underscore(self):
-        self.assertEqual(md('_hey_dude_'), '\_hey\_dude\_')
-
-    def test_xml_entities(self):
-        self.assertEqual(md('&amp;'), '&')
-
-    def test_named_entities(self):
-        self.assertEqual(md('&raquo;'), u'\xbb')
-
-    def test_hexadecimal_entities(self):
-        # This looks to be a bug in BeautifulSoup (fixed in bs4) that we have to work around.
-        self.assertEqual(md('&#x27;'), '\x27')
-
-    def test_single_escaping_entities(self):
-        self.assertEqual(md('&amp;amp;'), '&amp;')
-
-
-class ConversionTests(unittest.TestCase):
-
-    def test_a(self):
-        self.assertEqual(
-            md('<a href="http://google.com">Google</a>'),
-            '[Google](http://google.com)'
-        )
-
-    def test_a_with_title(self):
-        self.assertEqual(
-            md('<a href="http://google.com" title="The &quot;Goog&quot;">Google</a>'),
-            r'[Google](http://google.com "The \"Goog\"")'
-        )
-
-    def test_b(self):
-        self.assertEqual(md('<b>Hello</b>'), '**Hello**')
-
-    def test_blockquote(self):
-        self.assertEqual(md('<blockquote>Hello</blockquote>').strip(), '> Hello')
-
-    def test_nested_blockquote(self):
-        self.assertEqual(
-            md('<blockquote>And she was like <blockquote>Hello</blockquote></blockquote>').strip(),
-            '> And she was like \n> > Hello'
-        )
-
-    def test_br(self):
-        self.assertEqual(md('a<br />b<br />c'), 'a  \nb  \nc')
-
-    def test_em(self):
-        self.assertEqual(md('<em>Hello</em>'), '*Hello*')
-
-    def test_h1(self):
-        self.assertEqual(md('<h1>Hello</h1>'), 'Hello\n=====\n\n')
-
-    def test_h2(self):
-        self.assertEqual(md('<h2>Hello</h2>'), 'Hello\n-----\n\n')
-
-    def test_hn(self):
-        self.assertEqual(md('<h3>Hello</h3>'), '### Hello\n\n')
-        self.assertEqual(md('<h6>Hello</h6>'), '###### Hello\n\n')
-
-    def test_i(self):
-        self.assertEqual(md('<i>Hello</i>'), '*Hello*')
-
-    def test_ol(self):
-        self.assertEqual(md('<ol><li>a</li><li>b</li></ol>'), '1. a\n2. b\n')
-
-    def test_p(self):
-        self.assertEqual(md('<p>hello</p>'), 'hello\n\n')
-
-    def test_strong(self):
-        self.assertEqual(md('<strong>Hello</strong>'), '**Hello**')
-
-    def test_ul(self):
-        self.assertEqual(md('<ul><li>a</li><li>b</li></ul>'), '* a\n* b\n')
-
-
-class AdvancedTests(unittest.TestCase):
-
-    def test_nested(self):
-        self.assertEqual(
-            md('<p>This is an <a href="http://example.com/">example link</a>.</p>'),
-            'This is an [example link](http://example.com/).\n\n'
-        )
--- a/tests/__init__.py
+++ b/tests/__init__.py
--- a/tests/test_advanced.py
+++ b/tests/test_advanced.py
@@ -0,0 +1,39 @@
+from markdownify import markdownify as md
+
+
+def test_chomp():
+    assert md(' <b></b> ') == '  '
+    assert md(' <b> </b> ') == '  '
+    assert md(' <b>  </b> ') == '  '
+    assert md(' <b>   </b> ') == '  '
+    assert md(' <b>s </b> ') == ' **s**  '
+    assert md(' <b> s</b> ') == '  **s** '
+    assert md(' <b> s </b> ') == '  **s**  '
+    assert md(' <b>  s  </b> ') == '  **s**  '
+
+
+def test_nested():
+    text = md('<p>This is an <a href="http://example.com/">example link</a>.</p>')
+    assert text == 'This is an [example link](http://example.com/).\n\n'
+
+
+def test_ignore_comments():
+    text = md("<!-- This is a comment -->")
+    assert text == ""
+
+
+def test_ignore_comments_with_other_tags():
+    text = md("<!-- This is a comment --><a href='http://example.com/'>example link</a>")
+    assert text == "[example link](http://example.com/)"
+
+
+def test_code_with_tricky_content():
+    assert md('<code>></code>') == "`>`"
+    assert md('<code>/home/</code><b>username</b>') == "`/home/`**username**"
+    assert md('First line <code>blah blah<br />blah blah</code> second line') \
+        == "First line `blah blah  \nblah blah` second line"
+
+
+def test_special_tags():
+    assert md('<!DOCTYPE html>') == ''
+    assert md('<![CDATA[foobar]]>') == 'foobar'
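`test_chomp` and the `inline_tests` template pin down one rule for inline tags. Leading or trailing whitespace inside the tag is moved outside the Markdown markers, and a tag that is empty after stripping produces no markers at all. The sketch below reproduces that rule with the standard library only. `chomp` and `convert_inline` are hypothetical helpers written for illustration, not claimed to match markdownify's internals, and the sketch assumes whitespace runs have already been collapsed to single spaces.

```python
def chomp(text):
    """Split off one leading/trailing space so the markers hug the text."""
    # Assumption: whitespace runs were already collapsed to single spaces
    prefix = ' ' if text.startswith(' ') else ''
    suffix = ' ' if text.endswith(' ') else ''
    return prefix, suffix, text.strip()


def convert_inline(text, markup):
    """Wrap `text` in `markup` (e.g. '**'), keeping outer spaces outside."""
    prefix, suffix, stripped = chomp(text)
    if not stripped:
        # An empty tag contributes nothing; its surrounding spaces stay put
        return ''
    return f'{prefix}{markup}{stripped}{markup}{suffix}'


# Mirrors ' <b>s </b> ' from test_chomp
print(repr(' ' + convert_inline('s ', '**') + ' '))  # ' **s**  '
```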
--- a/tests/test_args.py
+++ b/tests/test_args.py
@@ -0,0 +1,25 @@
+"""
+Test whitelisting/blacklisting of specific tags.
+
+"""
+from markdownify import markdownify as md
+
+
+def test_strip():
+    text = md('<a href="https://github.com/matthewwithanm">Some Text</a>', strip=['a'])
+    assert text == 'Some Text'
+
+
+def test_do_not_strip():
+    text = md('<a href="https://github.com/matthewwithanm">Some Text</a>', strip=[])
+    assert text == '[Some Text](https://github.com/matthewwithanm)'
+
+
+def test_convert():
+    text = md('<a href="https://github.com/matthewwithanm">Some Text</a>', convert=['a'])
+    assert text == '[Some Text](https://github.com/matthewwithanm)'
+
+
+def test_do_not_convert():
+    text = md('<a href="https://github.com/matthewwithanm">Some Text</a>', convert=[])
+    assert text == 'Some Text'
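The four tests above hinge on the difference between passing an empty list and passing nothing: `strip=[]` converts every tag, while `convert=[]` converts none. A minimal sketch of that decision, assuming `None` means "option not given"; `should_convert` is a hypothetical helper, not a documented markdownify function.

```python
def should_convert(tag, strip=None, convert=None):
    """Decide whether `tag` becomes Markdown or is reduced to its text."""
    tag = tag.lower()
    if strip is not None:
        return tag not in strip    # blacklist: convert all but these
    if convert is not None:
        return tag in convert      # whitelist: convert only these
    return True                    # no option given: convert everything


for kwargs in ({'strip': ['a']}, {'strip': []}, {'convert': ['a']}, {'convert': []}):
    print(kwargs, should_convert('a', **kwargs))
```

Testing `is not None` rather than truthiness is what makes an empty list meaningful in both directions.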
--- a/tests/test_basic.py
+++ b/tests/test_basic.py
@@ -0,0 +1,13 @@
+from markdownify import markdownify as md
+
+
+def test_single_tag():
+    assert md('<span>Hello</span>') == 'Hello'
+
+
+def test_soup():
+    assert md('<div><span>Hello</div></span>') == 'Hello'
+
+
+def test_whitespace():
+    assert md(' a  b \t\t c ') == ' a b c '
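`test_whitespace` expects runs of spaces and tabs to collapse to a single space while the outer spaces survive. Commit 4f8937810b notes that newlines are deliberately no longer replaced, because doing so put a stray space before a heading's `#`. A stdlib sketch of that normalization, assuming a simple regex is enough (the real converter works on BeautifulSoup text nodes); `normalize_whitespace` is a hypothetical name.

```python
import re

# Collapse runs of spaces/tabs only; newlines are left for block-level logic.
WHITESPACE_RE = re.compile(r'[ \t]+')


def normalize_whitespace(text):
    return WHITESPACE_RE.sub(' ', text)


print(repr(normalize_whitespace(' a  b \t\t c ')))  # ' a b c '
```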
--- a/tests/test_conversions.py
+++ b/tests/test_conversions.py
@@ -0,0 +1,217 @@
+from markdownify import markdownify as md, ATX, ATX_CLOSED, BACKSLASH, UNDERSCORE
+
+
+def inline_tests(tag, markup):
+    # test template for different inline tags
+    assert md(f'<{tag}>Hello</{tag}>') == f'{markup}Hello{markup}'
+    assert md(f'foo <{tag}>Hello</{tag}> bar') == f'foo {markup}Hello{markup} bar'
+    assert md(f'foo<{tag}> Hello</{tag}> bar') == f'foo {markup}Hello{markup} bar'
+    assert md(f'foo <{tag}>Hello </{tag}>bar') == f'foo {markup}Hello{markup} bar'
+    assert md(f'foo <{tag}></{tag}> bar') in ['foo  bar', 'foo bar']  # Either is OK
+
+
+def test_a():
+    assert md('<a href="https://google.com">Google</a>') == '[Google](https://google.com)'
+    assert md('<a href="https://google.com">https://google.com</a>') == '<https://google.com>'
+    assert md('<a href="https://community.kde.org/Get_Involved">https://community.kde.org/Get_Involved</a>') == '<https://community.kde.org/Get_Involved>'
+    assert md('<a href="https://community.kde.org/Get_Involved">https://community.kde.org/Get_Involved</a>', autolinks=False) == '[https://community.kde.org/Get\\_Involved](https://community.kde.org/Get_Involved)'
+
+
+def test_a_spaces():
+    assert md('foo <a href="http://google.com">Google</a> bar') == 'foo [Google](http://google.com) bar'
+    assert md('foo<a href="http://google.com"> Google</a> bar') == 'foo [Google](http://google.com) bar'
+    assert md('foo <a href="http://google.com">Google </a>bar') == 'foo [Google](http://google.com) bar'
+    assert md('foo <a href="http://google.com"></a> bar') == 'foo  bar'
+
+
+def test_a_with_title():
+    text = md('<a href="http://google.com" title="The &quot;Goog&quot;">Google</a>')
+    assert text == r'[Google](http://google.com "The \"Goog\"")'
+    assert md('<a href="https://google.com">https://google.com</a>', default_title=True) == '[https://google.com](https://google.com "https://google.com")'
+
+
+def test_a_shortcut():
+    text = md('<a href="http://google.com">http://google.com</a>')
+    assert text == '<http://google.com>'
+
+
+def test_a_no_autolinks():
+    assert md('<a href="https://google.com">https://google.com</a>', autolinks=False) == '[https://google.com](https://google.com)'
+
+
+def test_b():
+    assert md('<b>Hello</b>') == '**Hello**'
+
+
+def test_b_spaces():
+    assert md('foo <b>Hello</b> bar') == 'foo **Hello** bar'
+    assert md('foo<b> Hello</b> bar') == 'foo **Hello** bar'
+    assert md('foo <b>Hello </b>bar') == 'foo **Hello** bar'
+    assert md('foo <b></b> bar') == 'foo  bar'
+
+
+def test_blockquote():
+    assert md('<blockquote>Hello</blockquote>') == '\n> Hello\n\n'
+
+
+def test_blockquote_with_paragraph():
+    assert md('<blockquote>Hello</blockquote><p>handsome</p>') == '\n> Hello\n\nhandsome\n\n'
+
+
+def test_blockquote_nested():
+    text = md('<blockquote>And she was like <blockquote>Hello</blockquote></blockquote>')
+    assert text == '\n> And she was like \n> > Hello\n> \n> \n\n'
+
+
+def test_br():
+    assert md('a<br />b<br />c') == 'a  \nb  \nc'
+    assert md('a<br />b<br />c', newline_style=BACKSLASH) == 'a\\\nb\\\nc'
+
+
+def test_code():
+    inline_tests('code', '`')
+    assert md('<code>this_should_not_escape</code>') == '`this_should_not_escape`'
+
+
+def test_del():
+    inline_tests('del', '~~')
+
+
+def test_div():
+    assert md('Hello</div> World') == 'Hello World'
+
+
+def test_em():
+    inline_tests('em', '*')
+
+
+def test_h1():
+    assert md('<h1>Hello</h1>') == 'Hello\n=====\n\n'
+
+
+def test_h2():
+    assert md('<h2>Hello</h2>') == 'Hello\n-----\n\n'
+
+
+def test_hn():
+    assert md('<h3>Hello</h3>') == '### Hello\n\n'
+    assert md('<h4>Hello</h4>') == '#### Hello\n\n'
+    assert md('<h5>Hello</h5>') == '##### Hello\n\n'
+    assert md('<h6>Hello</h6>') == '###### Hello\n\n'
+
+
+def test_hn_chained():
+    assert md('<h1>First</h1>\n<h2>Second</h2>\n<h3>Third</h3>', heading_style=ATX) == '# First\n\n\n## Second\n\n\n### Third\n\n'
+    assert md('X<h1>First</h1>', heading_style=ATX) == 'X# First\n\n'
+
+
+def test_hn_nested_tag_heading_style():
+    assert md('<h1>A <p>P</p> C </h1>', heading_style=ATX_CLOSED) == '# A P C #\n\n'
+    assert md('<h1>A <p>P</p> C </h1>', heading_style=ATX) == '# A P C\n\n'
+
+
+def test_hn_nested_simple_tag():
+    tag_to_markdown = [
+        ("strong", "**strong**"),
+        ("b", "**b**"),
+        ("em", "*em*"),
+        ("i", "*i*"),
+        ("p", "p"),
+        ("a", "a"),
+        ("div", "div"),
+        ("blockquote", "blockquote"),
+    ]
+
+    for tag, markdown in tag_to_markdown:
+        assert md('<h3>A <' + tag + '>' + tag + '</' + tag + '> B</h3>') == '### A ' + markdown + ' B\n\n'
+
+    assert md('<h3>A <br>B</h3>', heading_style=ATX) == '### A B\n\n'
+
+    # Nested lists not supported
+    # assert md('<h3>A <ul><li>li1</li><li>li2</li></ul> B</h3>', heading_style=ATX) == '### A li1 li2 B\n\n'
+
+
+def test_hn_nested_img():
+    image_attributes_to_markdown = [
+        ("", ""),
+        ("alt='Alt Text'", "Alt Text"),
+        ("alt='Alt Text' title='Optional title'", "Alt Text"),
+    ]
+    for image_attributes, markdown in image_attributes_to_markdown:
+        assert md('<h3>A <img src="/path/to/img.jpg " ' + image_attributes + '/> B</h3>') == '### A ' + markdown + ' B\n\n'
+
+
+def test_hn_atx_headings():
+    assert md('<h1>Hello</h1>', heading_style=ATX) == '# Hello\n\n'
+    assert md('<h2>Hello</h2>', heading_style=ATX) == '## Hello\n\n'
+
+
+def test_hn_atx_closed_headings():
+    assert md('<h1>Hello</h1>', heading_style=ATX_CLOSED) == '# Hello #\n\n'
+    assert md('<h2>Hello</h2>', heading_style=ATX_CLOSED) == '## Hello ##\n\n'
+
+
+def test_head():
+    assert md('<head>head</head>') == 'head'
+
+
+def test_hr():
+    assert md('Hello<hr>World') == 'Hello\n\n---\n\nWorld'
+    assert md('Hello<hr />World') == 'Hello\n\n---\n\nWorld'
+    assert md('<p>Hello</p>\n<hr>\n<p>World</p>') == 'Hello\n\n\n\n\n---\n\n\nWorld\n\n'
+
+
+def test_i():
+    assert md('<i>Hello</i>') == '*Hello*'
+
+
+def test_img():
+    assert md('<img src="/path/to/img.jpg" alt="Alt text" title="Optional title" />') == '![Alt text](/path/to/img.jpg "Optional title")'
+    assert md('<img src="/path/to/img.jpg" alt="Alt text" />') == '![Alt text](/path/to/img.jpg)'
+
+
+def test_kbd():
+    inline_tests('kbd', '`')
+
+
+def test_p():
+    assert md('<p>hello</p>') == 'hello\n\n'
+
+
+def test_pre():
+    assert md('<pre>test\n    foo\nbar</pre>') == '\n```\ntest\n    foo\nbar\n```\n'
+    assert md('<pre><code>test\n    foo\nbar</code></pre>') == '\n```\ntest\n    foo\nbar\n```\n'
+
+
+def test_s():
+    inline_tests('s', '~~')
+
+
+def test_samp():
+    inline_tests('samp', '`')
+
+
+def test_strong():
+    assert md('<strong>Hello</strong>') == '**Hello**'
+
+
+def test_strong_em_symbol():
+    assert md('<strong>Hello</strong>', strong_em_symbol=UNDERSCORE) == '__Hello__'
+    assert md('<b>Hello</b>', strong_em_symbol=UNDERSCORE) == '__Hello__'
+    assert md('<em>Hello</em>', strong_em_symbol=UNDERSCORE) == '_Hello_'
+    assert md('<i>Hello</i>', strong_em_symbol=UNDERSCORE) == '_Hello_'
+
+
+def test_sub():
+    assert md('<sub>foo</sub>') == 'foo'
+    assert md('<sub>foo</sub>', sub_symbol='~') == '~foo~'
+
+
+def test_sup():
+    assert md('<sup>foo</sup>') == 'foo'
+    assert md('<sup>foo</sup>', sup_symbol='^') == '^foo^'
+
+
+def test_lang():
+    assert md('<pre>test\n    foo\nbar</pre>', code_language='python') == '\n```python\ntest\n    foo\nbar\n```\n'
+    assert md('<pre><code>test\n    foo\nbar</code></pre>', code_language='javascript') == '\n```javascript\ntest\n    foo\nbar\n```\n'
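`test_a`, `test_a_with_title` and `test_a_no_autolinks` together describe when a link collapses to an autolink `<href>`: only when autolinks are enabled, no title is to be emitted, and the link text equals the href. Commit 453b604096 adds the subtle part: the text has already had its underscores escaped, so it must be unescaped before the comparison. A sketch under those assumptions; `convert_link` is a hypothetical helper whose keyword names simply mirror the options used in the tests.

```python
def convert_link(text, href, title=None, autolinks=True, default_title=False):
    """Render an <a> element; `text` is assumed to be already escaped."""
    if (autolinks and not title and not default_title
            and text.replace(r'\_', '_') == href):
        return f'<{href}>'
    if default_title and not title:
        title = href
    # Double quotes inside the title must be backslash-escaped
    title_part = ' "{}"'.format(title.replace('"', r'\"')) if title else ''
    return f'[{text}]({href}{title_part})'


print(convert_link(r'https://community.kde.org/Get\_Involved',
                   'https://community.kde.org/Get_Involved'))
```

Checking `default_title` inside the autolink condition is what keeps the `default_title=True` case from collapsing to `<https://google.com>`.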
--- a/tests/test_custom_converter.py
+++ b/tests/test_custom_converter.py
@@ -0,0 +1,25 @@
+from markdownify import MarkdownConverter
+from bs4 import BeautifulSoup
+
+
+class ImageBlockConverter(MarkdownConverter):
+    """
+    Create a custom MarkdownConverter that adds two newlines after an image
+    """
+    def convert_img(self, el, text, convert_as_inline):
+        return super().convert_img(el, text, convert_as_inline) + '\n\n'
+
+
+def test_img():
+    # Create shorthand method for conversion
+    def md(html, **options):
+        return ImageBlockConverter(**options).convert(html)
+
+    assert md('<img src="/path/to/img.jpg" alt="Alt text" title="Optional title" />') == '![Alt text](/path/to/img.jpg "Optional title")\n\n'
+    assert md('<img src="/path/to/img.jpg" alt="Alt text" />') == '![Alt text](/path/to/img.jpg)\n\n'
+
+
+def test_soup():
+    html = '<b>test</b>'
+    soup = BeautifulSoup(html, 'html.parser')
+    assert MarkdownConverter().convert_soup(soup) == '**test**'
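`ImageBlockConverter` works because the converter picks a `convert_<tag>` method per element, so a subclass can override a single tag and delegate to `super()`. The stdlib-only toy below illustrates that lookup-by-name pattern. `ToyConverter`, `process` and the `(attrs, text)` signature are hypothetical and deliberately simpler than `MarkdownConverter`'s `(el, text, convert_as_inline)` methods.

```python
class ToyConverter:
    """Dispatch to convert_<tag> if defined, else return the text unchanged."""

    def process(self, tag, attrs, text=''):
        fn = getattr(self, f'convert_{tag}', None)
        return fn(attrs, text) if fn else text

    def convert_b(self, attrs, text):
        return f'**{text}**'

    def convert_img(self, attrs, text):
        return f"![{attrs.get('alt', '')}]({attrs.get('src', '')})"


class ToyImageBlockConverter(ToyConverter):
    # Same idea as ImageBlockConverter: reuse the parent output, then append
    def convert_img(self, attrs, text):
        return super().convert_img(attrs, text) + '\n\n'


converter = ToyImageBlockConverter()
print(repr(converter.process('img', {'src': '/path/to/img.jpg', 'alt': 'Alt text'})))
```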
--- a/tests/test_escaping.py
+++ b/tests/test_escaping.py
@@ -0,0 +1,23 @@
+from markdownify import markdownify as md
+
+
+def test_underscore():
+    assert md('_hey_dude_') == r'\_hey\_dude\_'
+    assert md('_hey_dude_', escape_underscores=False) == r'_hey_dude_'
+
+
+def test_xml_entities():
+    assert md('&amp;') == '&'
+
+
+def test_named_entities():
+    assert md('&raquo;') == u'\xbb'
+
+
+def test_hexadecimal_entities():
+    # This looks to be a bug in BeautifulSoup (fixed in bs4) that we have to work around.
+    assert md('&#x27;') == '\x27'
+
+
+def test_single_escaping_entities():
+    assert md('&amp;amp;') == '&amp;'
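The entity tests rely on each entity being decoded exactly once: `&amp;amp;` must come out as `&amp;`, not `&`. In markdownify that decoding is done by BeautifulSoup; the sketch below uses the standard library's `html.unescape` as a stand-in to show the same single-pass behaviour, followed by the underscore escaping controlled by `escape_underscores`. `to_markdown_text` is a hypothetical helper.

```python
import html


def to_markdown_text(raw, escape_underscores=True):
    text = html.unescape(raw)  # decode entities exactly once
    if escape_underscores:
        # Backslash-escape underscores so they are not read as emphasis
        text = text.replace('_', r'\_')
    return text


print(to_markdown_text('&amp;amp;'))  # &amp;
```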
--- a/tests/test_lists.py
+++ b/tests/test_lists.py
@@ -0,0 +1,81 @@
+from markdownify import markdownify as md
+
+
+nested_uls = """
+    <ul>
+        <li>1
+            <ul>
+                <li>a
+                    <ul>
+                        <li>I</li>
+                        <li>II</li>
+                        <li>III</li>
+                    </ul>
+                </li>
+                <li>b</li>
+                <li>c</li>
+            </ul>
+        </li>
+        <li>2</li>
+        <li>3</li>
+    </ul>"""
+
+nested_ols = """
+    <ol>
+        <li>1
+            <ol>
+                <li>a
+                    <ol>
+                        <li>I</li>
+                        <li>II</li>
+                        <li>III</li>
+                    </ol>
+                </li>
+                <li>b</li>
+                <li>c</li>
+            </ol>
+        </li>
+        <li>2</li>
+        <li>3</li>
+    </ol>"""
+
+
+def test_ol():
+    assert md('<ol><li>a</li><li>b</li></ol>') == '1. a\n2. b\n'
+    assert md('<ol start="3"><li>a</li><li>b</li></ol>') == '3. a\n4. b\n'
+
+
+def test_nested_ols():
+    assert md(nested_ols) == '\n1. 1\n\t1. a\n\t\t1. I\n\t\t2. II\n\t\t3. III\n\t2. b\n\t3. c\n2. 2\n3. 3\n'
+
+
+def test_ul():
+    assert md('<ul><li>a</li><li>b</li></ul>') == '* a\n* b\n'
+    assert md("""<ul>
+     <li>
+             a
+     </li>
+     <li> b </li>
+     <li>   c
+     </li>
+ </ul>""") == '* a\n* b\n* c\n'
+
+
+def test_inline_ul():
+    assert md('<p>foo</p><ul><li>a</li><li>b</li></ul><p>bar</p>') == 'foo\n\n* a\n* b\n\nbar\n\n'
+
+
+def test_nested_uls():
+    """
+    Nested ULs should alternate bullet characters.
+
+    """
+    assert md(nested_uls) == '\n* 1\n\t+ a\n\t\t- I\n\t\t- II\n\t\t- III\n\t+ b\n\t+ c\n* 2\n* 3\n'
+
+
+def test_bullets():
+    assert md(nested_uls, bullets='-') == '\n- 1\n\t- a\n\t\t- I\n\t\t- II\n\t\t- III\n\t- b\n\t- c\n- 2\n- 3\n'
+
+
+def test_li_text():
+    assert md('<ul><li>foo <a href="#">bar</a></li><li>foo bar  </li><li>foo <b>bar</b>   <i>space</i>.</ul>') == '* foo [bar](#)\n* foo bar\n* foo **bar** *space*.\n'
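`test_nested_uls` and `test_bullets` imply that the bullet character is chosen by nesting depth, cycling through the `bullets` string (`'*+-'` by default, judging from the expected output), with one tab of indentation per level; `test_ol` shows numbering beginning at the `start` attribute. A minimal stdlib sketch of those rules; `render_items` and its nested `(text, children)` input format are hypothetical.

```python
def render_items(items, depth=0, bullets='*+-', ordered=False, start=1):
    """items: list of (text, children), children being a list of the same shape."""
    lines = []
    for i, (text, children) in enumerate(items):
        if ordered:
            marker = f'{start + i}.'
        else:
            # Cycle through the bullet characters by nesting depth
            marker = bullets[depth % len(bullets)]
        lines.append('\t' * depth + f'{marker} {text}')
        if children:
            # Nested lists restart their own numbering at 1
            lines.extend(render_items(children, depth + 1, bullets, ordered))
    return lines


tree = [('1', [('a', [('I', []), ('II', []), ('III', [])]), ('b', []), ('c', [])]),
        ('2', []), ('3', [])]
print('\n'.join(render_items(tree)))
```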
--- a/tests/test_tables.py
+++ b/tests/test_tables.py
@@ -0,0 +1,150 @@
+from markdownify import markdownify as md
+
+
+table = """<table>
+    <tr>
+        <th>Firstname</th>
+        <th>Lastname</th>
+        <th>Age</th>
+    </tr>
+    <tr>
+        <td>Jill</td>
+        <td>Smith</td>
+        <td>50</td>
+    </tr>
+    <tr>
+        <td>Eve</td>
+        <td>Jackson</td>
+        <td>94</td>
+    </tr>
+</table>"""
+
+
+table_with_html_content = """<table>
+    <tr>
+        <th>Firstname</th>
+        <th>Lastname</th>
+        <th>Age</th>
+    </tr>
+    <tr>
+        <td><b>Jill</b></td>
+        <td><i>Smith</i></td>
+        <td><a href="#">50</a></td>
+    </tr>
+    <tr>
+        <td>Eve</td>
+        <td>Jackson</td>
+        <td>94</td>
+    </tr>
+</table>"""
+
+
+table_with_paragraphs = """<table>
+    <tr>
+        <th>Firstname</th>
+        <th><p>Lastname</p></th>
+        <th>Age</th>
+    </tr>
+    <tr>
+        <td><p>Jill</p></td>
+        <td><p>Smith</p></td>
+        <td><p>50</p></td>
+    </tr>
+    <tr>
+        <td>Eve</td>
+        <td>Jackson</td>
+        <td>94</td>
+    </tr>
+</table>"""
+
+
+table_with_header_column = """<table>
+    <tr>
+        <th>Firstname</th>
+        <th>Lastname</th>
+        <th>Age</th>
+    </tr>
+    <tr>
+        <th>Jill</th>
+        <td>Smith</td>
+        <td>50</td>
+    </tr>
+    <tr>
+        <th>Eve</th>
+        <td>Jackson</td>
+        <td>94</td>
+    </tr>
+</table>"""
+
+
+table_head_body = """<table>
+    <thead>
+        <tr>
+            <th>Firstname</th>
+            <th>Lastname</th>
+            <th>Age</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td>Jill</td>
+            <td>Smith</td>
+            <td>50</td>
+        </tr>
+        <tr>
+            <td>Eve</td>
+            <td>Jackson</td>
+            <td>94</td>
+        </tr>
+    </tbody>
+</table>"""
+
+table_missing_text = """<table>
+    <thead>
+        <tr>
+            <th></th>
+            <th>Lastname</th>
+            <th>Age</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td>Jill</td>
+            <td></td>
+            <td>50</td>
+        </tr>
+        <tr>
+            <td>Eve</td>
+            <td>Jackson</td>
+            <td>94</td>
+        </tr>
+    </tbody>
+</table>"""
+
+table_missing_head = """<table>
+    <tr>
+        <td>Firstname</td>
+        <td>Lastname</td>
+        <td>Age</td>
+    </tr>
+    <tr>
+        <td>Jill</td>
+        <td>Smith</td>
+        <td>50</td>
+    </tr>
+    <tr>
+        <td>Eve</td>
+        <td>Jackson</td>
+        <td>94</td>
+    </tr>
+</table>"""
+
+
+def test_table():
+    assert md(table) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
+    assert md(table_with_html_content) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| **Jill** | *Smith* | [50](#) |\n| Eve | Jackson | 94 |\n\n'
+    assert md(table_with_paragraphs) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
+    assert md(table_with_header_column) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
+    assert md(table_head_body) == '\n\n| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
+    assert md(table_missing_text) == '\n\n|  | Lastname | Age |\n| --- | --- | --- |\n| Jill |  | 50 |\n| Eve | Jackson | 94 |\n\n'
+    assert md(table_missing_head) == '\n\n|  |  |  |\n| --- | --- | --- |\n| Firstname | Lastname | Age |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |\n\n'
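The `test_table` expectations fix the output shape: a pipe-guarded header row, a `| --- |` separator, one pipe-guarded line per body row, and a blank line on each side. When the first row has no `<th>` cells (`table_missing_head`), an empty header is emitted and every row, including the first, lands in the body. A stdlib sketch of that layout, assuming the cell texts have already been converted inline; `render_table` is a hypothetical helper.

```python
def render_table(rows, has_header=True):
    """rows: list of lists of already-converted cell strings."""
    if has_header:
        head, body = rows[0], rows[1:]
    else:
        # No header row in the source: emit an empty one
        head, body = [''] * len(rows[0]), rows

    def line(cells):
        # Guard every row with pipes so empty cells still render
        return '| ' + ' | '.join(cells) + ' |'

    lines = [line(head), line(['---'] * len(head))] + [line(r) for r in body]
    return '\n\n' + '\n'.join(lines) + '\n\n'


print(render_table([['Firstname', 'Lastname', 'Age'], ['Jill', 'Smith', '50']]))
```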
Author	SHA1	Message	Date
AlexVonB	eb0330bfc6	Merge branch 'develop'	2022-01-23 11:01:45 +01:00
AlexVonB	ddda696396	bump to v0.10.3	2022-01-23 11:01:26 +01:00
AlexVonB	0a1343a538	allow BeautifulSoup objects to be converted	2022-01-23 11:00:19 +01:00
AlexVonB	9d0b839b73	wording	2022-01-23 10:59:24 +01:00
AlexVonB	28793ac0b3	Merge branch 'develop'	2022-01-18 08:56:33 +01:00
AlexVonB	d3eff11617	bump to v0.10.2	2022-01-18 08:53:33 +01:00
AlexVonB	bd6b581122	add option to not escape underscores closes #59	2022-01-18 08:51:44 +01:00
AlexVonB	9231704988	Merge branch 'develop'	2021-12-11 14:44:58 +01:00
AlexVonB	c8f7cf63e3	bump to v0.10.1	2021-12-11 14:44:34 +01:00
AlexVonB	12a68a7d14	allow flake8 v4.x closes #57	2021-12-11 14:43:14 +01:00
AlexVonB	1613c302bc	Merge branch 'develop'	2021-11-17 17:11:01 +01:00
AlexVonB	478b1c7e13	bump to v0.10.0	2021-11-17 17:10:15 +01:00
AlexVonB	ffcf6cbcb2	fix readme for code_language	2021-11-17 17:09:47 +01:00
AlexVonB	0ab0452414	add readme for code_language	2021-11-17 17:08:14 +01:00
AlexVonB	b62b067cbd	Merge branch 'Inzaniak-develop' into develop	2021-11-17 17:05:07 +01:00
AlexVonB	cb2646cd93	differentiated between text and code language	2021-11-17 17:03:31 +01:00
AlexVonB	9692b5e714	satisfy linter	2021-11-17 16:55:00 +01:00
Umberto Grando	ac68c53a7d	added language for multiline code	2021-11-01 21:19:35 +01:00
AlexVonB	55c9e84f38	Merge branch 'develop'	2021-09-04 21:50:34 +02:00
AlexVonB	40dd30419c	bump to v0.9.4	2021-09-04 21:50:05 +02:00
AlexVonB	da56f7f56a	Merge pull request #53 from Hozhyi/fix/bullet_list_tags_in_separate_lines Fixed issue #52 - added stripping of text to list	2021-09-04 21:48:16 +02:00
AlexVonB	8400b39dd9	remove trailing whitespace to satisfy the linter	2021-09-04 21:47:27 +02:00
Viktor Hozhyi	5fc1441fe7	Added appropriate test	2021-09-04 20:51:08 +03:00
Viktor Hozhyi	044615eff1	Fixed issue #52 - added stripping of text to list	2021-09-04 12:39:30 +03:00
AlexVonB	99875683ac	Merge branch 'develop'	2021-08-25 08:53:38 +02:00
AlexVonB	dbd9f3f3d2	bump to v0.9.3	2021-08-25 08:53:17 +02:00
AlexVonB	0fdeb1ff6e	convert tags inside table cells as inline in part resolves #49	2021-08-25 08:48:30 +02:00
AlexVonB	eaeb0603eb	Merge branch 'develop'	2021-07-11 13:21:20 +02:00
AlexVonB	6a2f3a4b42	fix rst syntax error	2021-07-11 13:21:02 +02:00
AlexVonB	cb73590623	Merge branch 'develop'	2021-07-11 13:14:29 +02:00
AlexVonB	22180a166d	bump to v0.9.1	2021-07-11 13:13:31 +02:00
AlexVonB	16d8a0e1f7	Revert "add figure/figcaption" This reverts commit `828e116530`.	2021-07-11 13:12:16 +02:00
AlexVonB	4aa6cf2a24	rewrote text processing to not escape _ in code fixes #47	2021-07-11 13:10:59 +02:00
AlexVonB	828e116530	add figure/figcaption for #46	2021-06-30 13:02:42 +02:00
AlexVonB	62e9f0de02	add examples for custom converters closes #46	2021-06-27 15:53:23 +02:00
AlexVonB	59417ab115	Merge branch 'develop'	2021-05-30 19:10:49 +02:00
AlexVonB	cec570fc49	bump to v0.9.0	2021-05-30 19:10:31 +02:00
AlexVonB	a6a31624ad	add options for sub and sup tags fixes #44	2021-05-30 19:07:43 +02:00
AlexVonB	6f3732307d	restructured test files	2021-05-30 19:06:52 +02:00
AlexVonB	8f6d7e500d	add option 'default_title' to links fixes #39	2021-05-30 18:40:40 +02:00
AlexVonB	917b01e548	Merge branch 'develop'	2021-05-30 11:20:32 +02:00
AlexVonB	e96351b666	bump to v0.8.1	2021-05-30 11:20:16 +02:00
AlexVonB	129c4ef060	ignore doctype tag, test cdata tag fixes #45	2021-05-30 11:18:18 +02:00
AlexVonB	652714859d	Merge branch 'develop'	2021-05-21 14:18:14 +02:00
AlexVonB	9cb940cbc0	bump to v0.8.0	2021-05-21 14:17:51 +02:00
AlexVonB	70ef9b6e48	added pre tag closes #15	2021-05-21 14:15:41 +02:00
AlexVonB	91d53ddd5a	refactor simple inline conversions	2021-05-21 13:53:00 +02:00
AlexVonB	079f32f6cd	added del and s tags	2021-05-21 12:27:49 +02:00
AlexVonB	89b577e91e	ordering functions alphabetically	2021-05-21 12:21:21 +02:00
AlexVonB	4bf2ea44fc	Merge branch 'AndrewCRichards-andrewcrichards/add_code_samp_kbd_tags' into develop	2021-05-21 12:13:48 +02:00
AlexVonB	77797ebb79	Merge branch 'andrewcrichards/add_code_samp_kbd_tags' of https://github.com/AndrewCRichards/python-markdownify into AndrewCRichards-andrewcrichards/add_code_samp_kbd_tags	2021-05-21 12:11:59 +02:00
AlexVonB	ea5b22824b	Merge branch 'develop'	2021-05-18 10:42:27 +02:00
AlexVonB	9f3c4c9fa0	bump to v0.7.4	2021-05-18 10:42:16 +02:00
AlexVonB	967db26b3a	Merge branch 'fix-headless-tables' into develop	2021-05-18 10:41:42 +02:00
AlexVonB	ea81407b87	implemented table parsing correctly instead of manually walking down the dom tree in a table, we now rely on the main descent loop and just implement conversion for rows and cells correctly. this enables the use of html inside a table cell.	2021-05-17 14:00:00 +02:00
AlexVonB	e6da15c173	allow tables with headers in first (or any) column	2021-05-17 12:36:48 +02:00
AlexVonB	7dac92e85e	Allow for tables without header row fixes #42	2021-05-16 19:02:04 +02:00
AlexVonB	ec5858e42f	Merge branch 'develop'	2021-05-16 18:41:24 +02:00
AlexVonB	fc29483899	bump to v0.7.3	2021-05-16 18:41:08 +02:00
AlexVonB	bd7a8d6990	Merge pull request #43 from jiulongw/develop Fix missing whitespaces in <li> node	2021-05-16 18:39:58 +02:00
Jiulong Wang	ddfbf6a364	Keep important spaces in <li> element	2021-05-10 16:07:54 -07:00
Jiulong Wang	91a64e3cd4	Fix missing whitespaces in <li> node	2021-05-10 14:42:05 -07:00
AlexVonB	02bb914ef3	Merge branch 'develop'	2021-05-02 13:49:30 +02:00
AlexVonB	0fee4b0a80	bump to v0.7.2	2021-05-02 13:49:14 +02:00
AlexVonB	10e1ff3e6e	Merge pull request #23 from SimonIT/ordere-list-update Ordered list update	2021-05-02 13:47:43 +02:00
AlexVonB	73800ced36	fixed whitespace issues at nested lists	2021-05-02 13:44:09 +02:00
AlexVonB	1538cacb94	Merge branch 'develop' into ordere-list-update	2021-05-02 10:58:13 +02:00
AlexVonB	21c0d034d0	Merge branch 'develop'	2021-05-02 10:51:00 +02:00
AlexVonB	f59f9f9a54	bump to v0.7.1	2021-05-02 10:50:49 +02:00
AlexVonB	bd22a16c9e	Merge pull request #40 from jiulongw/jiulongw/hr Add conversion for hr element	2021-05-02 10:47:32 +02:00
AlexVonB	55fb96e3c0	fix hr tests	2021-05-02 10:45:52 +02:00
Jiulong Wang	5f102d5223	Add conversion for hr element	2021-04-29 13:41:28 -07:00
AlexVonB	e3ddc789a2	Merge branch 'develop'	2021-04-22 12:43:27 +02:00
AlexVonB	651d5f00e8	bump to v0.7.0	2021-04-22 12:43:17 +02:00
AlexVonB	3cf324d03d	Merge pull request #36 from BrunoMiguens/add-basic-support-for-tables Add basic support for tables	2021-04-22 12:41:54 +02:00
AlexVonB	96f7e7d307	Merge branch 'develop' into add-basic-support-for-tables	2021-04-22 12:40:16 +02:00
AlexVonB	e1dbbfad42	guard table lines with pipes, resolves the empty header problem	2021-04-22 12:36:11 +02:00
AlexVonB	2d0cd97323	Merge branch 'develop'	2021-04-22 12:13:03 +02:00
AlexVonB	d4882b86b9	bump to v0.6.6	2021-04-22 12:12:51 +02:00
AlexVonB	b47d5f11c8	Merge pull request #37 from andredelft/develop Add `strong_em_symbol` and `newline` options to the converter	2021-04-18 21:35:16 +02:00
André van Delft	29c794e17d	Introduce OPTIONs for `strong_em_symbol`	2021-04-18 18:13:29 +02:00
André van Delft	e877602a5e	Separate the strong_em_symbol and newline style tests	2021-04-05 11:28:42 +02:00
André van Delft	5580b0b51d	Update README.rst	2021-04-05 11:13:52 +02:00
André van Delft	650f377b64	Fix linting	2021-04-05 11:13:19 +02:00
André van Delft	7ee87b1d32	Use .lower() on _style option fetching	2021-04-05 10:50:23 +02:00
André van Delft	16dbc471b9	Test newline_style	2021-04-05 10:47:55 +02:00
André van Delft	c04ec855dd	Change option to newline_style and use variables like heading_style does	2021-04-05 10:44:20 +02:00
André van Delft	8da0bdf998	Test strong_em_symbol	2021-04-05 10:28:46 +02:00
AlexVonB	ec185e2e9c	Merge branch 'develop'	2021-02-21 23:09:55 +01:00
AlexVonB	a59e4b9f48	bump to v0.6.5	2021-02-21 23:09:44 +01:00
AlexVonB	fd293a9714	use python 3.8 instead of 3.6	2021-02-21 23:08:49 +01:00
AlexVonB	99365de669	upgrading code for python 3.x closes #38	2021-02-21 23:06:21 +01:00
AlexVonB	079d1721aa	Merge branch 'develop'	2021-02-21 20:58:34 +01:00
AlexVonB	ed406d3206	bump to v0.6.4	2021-02-21 20:57:57 +01:00
AlexVonB	f320cf87ff	closing #25 and #18 Adds newlines after blockquotes, allowing for paragraphs after a blockquote. Due to merging problems with @lucafrance 's code I had to quickly copy and paste their code. Thanks for the contribution!	2021-02-21 20:53:44 +01:00
André van Delft	a79ed44ec3	Fix code ticks in README	2021-02-15 16:51:20 +01:00
André van Delft	29a4e551f7	Update README with the two new options	2021-02-15 16:37:13 +01:00
André van Delft	b3ac4606a6	Allow for the use of backslash for newlines	2021-02-15 16:29:14 +01:00
André van Delft	f093843f40	Allow for a custom strong or emphasis symbol	2021-02-15 16:19:19 +01:00
Bruno Miguens	de6f91af0e	Revert header validation and leave possibility to empty column	2021-02-08 20:56:18 +00:00
Bruno Miguens	8c28ade348	Remove empty header validation to allow empty header	2021-02-08 20:50:15 +00:00
Bruno Miguens	a152c5b706	Fix lint	2021-02-08 19:32:35 +00:00
Bruno Miguens	292d64bbf4	Remove unnecessary tests	2021-02-08 19:26:27 +00:00
Bruno Miguens	db96eeb785	Add tests for basic and thead/tbody tables	2021-02-08 17:00:09 +00:00
Bruno Miguens	73f7644c0d	Add basic support for HTML tables	2021-02-08 17:00:09 +00:00
AlexVonB	a4d134df97	Merge pull request #34 from BrunoMiguens/add-ignore-comment-tags Add ignore comment tags	2021-02-07 19:46:49 +01:00
Bruno Miguens	457454c713	Add new line at the end of file	2021-02-05 19:49:57 +00:00
Bruno Miguens	321e9eb5f6	Add ignore comment tags	2021-02-05 19:40:43 +00:00
AlexVonB	bf24df3e2e	bump to v0.6.3	2021-01-12 22:43:18 +01:00
AlexVonB	15329588b1	Merge branch 'develop'	2021-01-12 22:42:58 +01:00
AlexVonB	77d1e99bd5	satisfy linter	2021-01-12 22:42:06 +01:00
AlexVonB	34ad8485fa	bump to v0.6.2	2021-01-12 22:40:03 +01:00
AlexVonB	f0ce934bf8	Merge branch 'develop'	2021-01-12 22:39:47 +01:00
AlexVonB	97c78ef55b	Merge branch 'fix-extra-headline-whitespace' into develop	2021-01-12 22:38:59 +01:00
AlexVonB	99cd237f27	Merge branch 'develop'	2021-01-04 10:22:02 +01:00
AlexVonB	b7e1ab889d	bump to v0.6.1	2021-01-04 10:21:27 +01:00
AlexVonB	29e86aec55	Merge branch 'fix-link-underscores' into develop	2021-01-04 10:18:05 +01:00
AlexVonB	453b604096	Fixing autolinks When checking a links href and text for equality, first un-escape the underscores in the text -- because six escapes them. This should fix #29.	2021-01-02 17:22:36 +01:00
AlexVonB	2bde8d3e8e	Merge branch 'develop'	2021-01-02 16:49:28 +01:00
AlexVonB	4f8937810b	dont replace newlines and tabs with spaces this should fix #17, as all leading new lines were replaced with a single space, which in turn was rendered before the # of a headline	2020-12-29 10:28:50 +01:00
AlexVonB	3544322ed2	Bump Version 0.6.0	2020-12-13 23:41:56 +01:00
AlexVonB	c4d0a14ce5	Merge pull request #26 from idvorkin/develop Add support for headings that include nested divs	2020-12-13 23:39:34 +01:00
Igor Dvorkin	05ea8dc58a	Add many tests and support image tag	2020-12-13 17:40:53 +00:00
Igor Dvorkin	7780f82c30	Using a regexp to determine if a tag is a heading.	2020-12-11 16:54:14 -08:00
Andrew Richards	7685738344	Formatting tweak Change indent of continuation line; squashes a flake8 warning.	2020-11-27 14:18:08 +00:00
Andrew Richards	92a73c8dfe	Correct test_code_with_tricky_content() Result of previous test didn't check for the trailing ' ' that convert_br() adds: This is needed to ensure that the resulting markdown not only has \n for the <br> but also renders it as a newline.	2020-11-26 22:20:29 +00:00
Andrew Richards	3354f143d8	Add method for <code> tag Add method and tests for inline tag <code>.	2020-11-23 17:28:23 +00:00
Igor Dvorkin	d558617cd7	Add support for headings that include nested block elements	2020-11-20 06:03:51 -08:00
AlexVonB	8c9b029756	Merge branch 'develop'	2020-09-01 18:10:07 +02:00
AlexVonB	25d68b4265	Bump version 0.5.3	2020-09-01 18:09:24 +02:00
AlexVonB	5561106991	Merge pull request #24 from SimonIT/fix-corrupt-html Fix parsing corrupt html	2020-09-01 18:04:17 +02:00
SimonIT	1b3136ad04	Fix parsing corrupt html	2020-08-31 13:15:10 +02:00
SimonIT	2c7e4a0100	Fix tests	2020-08-26 19:47:11 +02:00
SimonIT	4f00d638d2	Merge remote-tracking branch 'upstream/develop' into ordered-list # Conflicts: # markdownify/__init__.py # tests/test_conversions.py	2020-08-26 19:41:43 +02:00
AlexVonB	987a2a9cae	Merge pull request #20 from SimonIT/badges Add some fancy badges	2020-08-19 10:32:30 +02:00
SimonIT	a4461161bc	Make badges inline	2020-08-19 10:06:21 +02:00
AlexVonB	ae50065872	Merge branch 'develop'	2020-08-18 18:53:10 +02:00
AlexVonB	19e2c3db0d	Bump version 0.5.2	2020-08-18 18:52:53 +02:00
AlexVonB	ba51bbee12	Merge pull request #22 from SimonIT/ol-start-attribute Support the start attribute for ordered lists	2020-08-18 18:44:59 +02:00
AlexVonB	9f3d497053	use python3.6 for linting	2020-08-18 18:41:46 +02:00
AlexVonB	d2fc689b66	set max flake8 version again3	2020-08-18 18:39:20 +02:00
AlexVonB	ab78385b56	set max flake8 version again2	2020-08-18 18:38:17 +02:00
AlexVonB	9ebf726e78	set max flake8 version again	2020-08-18 18:37:39 +02:00
AlexVonB	3f8403aa7a	set max flake8 version	2020-08-18 18:35:31 +02:00
AlexVonB	5b6e76f984	Create python-app.yml	2020-08-18 18:30:55 +02:00
SimonIT	04711027e6	Replace downloads badge	2020-08-13 20:11:18 +02:00
SimonIT	ca98892953	Support the start attribute for ordered lists	2020-08-11 11:43:02 +02:00
AlexVonB	0dc281e6ea	Bump version 0.5.1	2020-08-11 09:51:04 +02:00
AlexVonB	4e6e20e756	Merge pull request #21 from matthewwithanm/python-publish Create python-publish.yml	2020-08-11 09:49:29 +02:00
Matthew Dapena-Tretter	9358522c73	Create python-publish.yml Add workflow for publishing to PyPI.	2020-08-10 19:42:48 -07:00
SimonIT	28d7a22da3	Remove alt because it makes some trouble	2020-08-10 17:42:18 +02:00
SimonIT	8b882ca3c9	Add some fancy badges	2020-08-10 16:24:00 +02:00
AlexVonB	1078610066	ignore build folder	2020-08-10 13:03:12 +02:00
AlexVonB	d23dbc77e4	Merge branch 'master' into develop	2020-08-10 13:01:34 +02:00
AlexVonB	0c4b856b9c	Bump to 0.5.0	2020-08-09 21:22:15 +02:00
AlexVonB	e9cc01938a	Merge branch 'develop'	2020-08-09 21:20:44 +02:00
AlexVonB	aceced68eb	cleaning up changes with help of linter	2020-08-09 21:17:39 +02:00
AlexVonB	3b049cdb9c	added egg dirs to gitignore	2020-08-09 21:13:33 +02:00
AlexVonB	b747378b52	fixed nested lists and wrote correct tests nested lists did not work: after a nested list was over, a new line was inserted. this leads to a large gap before the rest of the parent list. lists are prefixed and suffixed with a single newline, this is now represented in the tests.	2020-08-09 21:11:16 +02:00
AlexVonB	ee73d89879	Merge pull request #14 from AlexVonB/fix-inline-spaces remove prefixed and suffixed spaces from inline tags	2020-08-09 20:24:23 +02:00
Rémi	d23596706d	Remove debug prints	2019-11-22 11:49:22 +01:00
Rémi	6a0e5d8176	Correct inline UL test as paragraphs are followed by two newlines	2019-11-21 09:46:22 +01:00
Rémi	7b788bafd4	Add nested OL test (for newlines) and correct lists nesting	2019-11-21 09:35:34 +01:00
Rémi	146104b41f	Remove newline-only textnodes outside <pre>	2019-11-20 10:37:39 +01:00
AlexVonB	5563161c86	remove needless checks for empty text	2019-07-12 10:23:17 +02:00
AlexVonB	28e447d9ae	remove prefixed and suffixed spaces from inline tags fixes matthewwithanm#13	2019-07-11 23:27:52 +02:00
Matthew Dapena-Tretter	89d14f4487	Merge pull request #11 from AlexVonB/AlexVonB-patch-1 Add newline before and after a markdown list	2019-07-04 08:53:25 -07:00
AlexVonB	5f9243d91d	added tests for matthewwithanm#11	2019-07-04 16:32:21 +02:00
AlexVonB	d0f688d2e4	Add newline before and after a markdown list Fixes matthewwithanm#5 as well as an issue where `<p>foo<p><ul><li>bar</li></ul>` gets converted to `foo * bar` which is not correct	2019-07-04 16:26:09 +02:00
Jonathan Vanasco	5ac08522be	updating classifier to MIT license issue #9	2019-06-19 16:17:47 -07:00
Thomas Lange	78afcc173e	Adding MIT license file	2018-10-16 19:11:02 -07:00
Steven Skoczen	b132a6f5b3	Updates to 0.4.1, pkgmeta included directly in setup.	2017-11-28 12:07:31 +13:00
Steven Skoczen	0abe0a29e8	Merge pull request #2 from crhallberg/html-parser Suppress BeautifulSoup warning	2017-11-13 08:48:45 +13:00
Steven Skoczen	4932df631f	Merge pull request #1 from dmpayton/develop Fixes to get tests passing in Python 3.	2017-11-13 08:48:38 +13:00
Chris Hallberg	8696e2bde1	Suppress BeautifulSoup warning by explicitly passing in the default parser as recommended by the error message: ``` /home/challberg/.local/lib/python2.7/site-packages/bs4/__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. The code that caused this warning is on line 35 of the file unroll.py. To get rid of this warning, change code that looks like this: BeautifulSoup(YOUR_MARKUP}) to this: BeautifulSoup(YOUR_MARKUP, "html.parser") markup_type=markup_type)) ```	2017-06-12 16:03:04 -04:00
dmpayton	ee53d85c41	Fixes to get tests passing in Python 3.	2016-02-23 15:15:29 -08:00
Matthew Tretter	53ba0daa77	Document options	2013-07-31 23:23:44 -04:00
Matthew Tretter	fb98e9878f	Bump to 0.4.0	2013-07-31 23:12:53 -04:00
Matthew Tretter	aa10053fbb	Test custom bullets	2013-07-31 23:11:39 -04:00
Matthew Tretter	253a34c2d7	Test nested unordered lists	2013-07-31 23:08:39 -04:00
Matthew Tretter	3ea09609e6	Add support for "bullets" option	2013-07-31 23:08:36 -04:00
Matthew Tretter	1cd8e56c47	Test ATX and ATX_CLOSED style headings	2013-07-31 22:19:41 -04:00
Matthew Tretter	891a4a8d08	Add "heading_style" option Allow the user to specify a heading style.	2013-07-31 22:17:22 -04:00
Matthew Tretter	e5a1784f30	Remove unneeded raw string	2013-07-31 21:59:35 -04:00
Matthew Tretter	f60d910335	Add "autolinks" option This option allows you to disable the creation of "autolink" style links.	2013-07-31 21:58:48 -04:00
Matthew Tretter	d707d107f6	Support inner Options class	2013-07-31 21:55:30 -04:00
Matthew Tretter	1ef4dd1468	Add shortcut link syntax	2013-07-31 19:23:39 -04:00
Matthew Tretter	934c97b342	Test img tag conversion	2013-07-31 19:23:38 -04:00
Matthew Tretter	8a1e2d9403	Add simple img conversion	2013-07-31 19:23:36 -04:00
Matthew Tretter	5563723cbc	Bump to 0.3.0	2013-07-31 18:16:02 -04:00
Matthew Tretter	a9c13a56da	Identify and single out HTML fragment	2013-07-31 18:13:50 -04:00
Matthew Tretter	7bdeb15b18	Use bs4 This causes a lot more tests to fail. But it'll be worth it in the end.	2013-07-31 18:01:52 -04:00
Matthew Tretter	87c8f3bd5e	Add development notes to README	2013-07-31 17:20:36 -04:00
Matthew Tretter	0211ac6619	Lint code	2013-07-31 17:20:36 -04:00
Matthew Tretter	2515e9e107	Add lint command	2013-07-31 17:20:32 -04:00
Matthew Tretter	ece61a5b1f	Bump to 0.2.0	2013-07-31 17:11:12 -04:00
Matthew Tretter	f46fb8ebbb	Add short description to README	2013-07-31 17:05:37 -04:00
Matthew Tretter	e521fd402f	Add manifest template	2013-07-31 16:55:53 -04:00
Matthew Tretter	fd6f8db132	Add gitignore	2013-07-31 16:55:30 -04:00
Matthew Tretter	c2f32b8049	Switch to pytest	2013-07-31 16:54:37 -04:00
Matthew Tretter	b92428466d	Change name to markdownify	2013-07-31 16:41:08 -04:00
Matthew Tretter	7f75b0bbce	Update package meta	2013-07-31 16:40:56 -04:00