Merge branch 'develop'

bump to v0.7.1
Merge pull request #40 from jiulongw/jiulongw/hr
2021-05-02 10:51:00 +02:00 · 2021-05-02 10:50:49 +02:00 · 2021-05-02 10:47:32 +02:00 · 2021-05-02 10:45:52 +02:00 · 2021-04-29 13:41:28 -07:00 · 2021-04-22 12:43:27 +02:00
18 changed files with 868 additions and 198 deletions
--- a/.github/workflows/python-app.yml
+++ b/.github/workflows/python-app.yml
@@ -0,0 +1,33 @@
+# This workflow will install Python dependencies, run tests and lint with a single version of Python
+# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
+
+name: Python application
+
+on:
+  push:
+    branches: [ develop ]
+  pull_request:
+    branches: [ develop ]
+
+jobs:
+  build:
+
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Python 3.8
+      uses: actions/setup-python@v2
+      with:
+        python-version: 3.8
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install flake8==3.8.4 pytest
+        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
+    - name: Lint with flake8
+      run: |
+        python setup.py lint
+    - name: Test with pytest
+      run: |
+        python setup.py test
--- a/.github/workflows/python-publish.yml
+++ b/.github/workflows/python-publish.yml
@@ -0,0 +1,31 @@
+# This workflow will upload a Python Package using Twine when a release is created
+# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries
+
+name: Upload Python Package
+
+on:
+  release:
+    types: [created]
+
+jobs:
+  deploy:
+
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Python
+      uses: actions/setup-python@v2
+      with:
+        python-version: '3.8'
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install setuptools wheel twine
+    - name: Build and publish
+      env:
+        TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
+        TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
+      run: |
+        python setup.py sdist bdist_wheel
+        twine upload dist/*
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,10 @@
+*.pyc
+*.egg
+.eggs/
+*.egg-info/
+.DS_Store
+/.env
+/dist
+/MANIFEST
+/venv
+build/
--- a/21
+++ b/21
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright 2012-2018 Matthew Tretter
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -0,0 +1 @@
+include README.rst
--- a/README.rst
+++ b/README.rst
@@ -0,0 +1,103 @@
+|build| |version| |license| |downloads|
+
+.. |build| image:: https://img.shields.io/github/workflow/status/matthewwithanm/python-markdownify/Python%20application/develop
+    :alt: GitHub Workflow Status
+    :target: https://github.com/matthewwithanm/python-markdownify/actions?query=workflow%3A%22Python+application%22
+
+.. |version| image:: https://img.shields.io/pypi/v/markdownify
+    :alt: PyPI version
+    :target: https://pypi.org/project/markdownify/
+
+.. |license| image:: https://img.shields.io/pypi/l/markdownify
+    :alt: License
+    :target: https://github.com/matthewwithanm/python-markdownify/blob/develop/LICENSE
+
+.. |downloads| image:: https://pepy.tech/badge/markdownify
+    :alt: PyPI downloads
+    :target: https://pepy.tech/project/markdownify
+
+Installation
+============
+
+``pip install markdownify``
+
+
+Usage
+=====
+
+Convert some HTML to Markdown:
+
+.. code:: python
+
+    from markdownify import markdownify as md
+    md('<b>Yay</b> <a href="http://github.com">GitHub</a>')  # > '**Yay** [GitHub](http://github.com)'
+
+Specify tags to exclude (blacklist):
+
+.. code:: python
+
+    from markdownify import markdownify as md
+    md('<b>Yay</b> <a href="http://github.com">GitHub</a>', strip=['a'])  # > '**Yay** GitHub'
+
+\...or specify the tags you want to include (whitelist):
+
+.. code:: python
+
+    from markdownify import markdownify as md
+    md('<b>Yay</b> <a href="http://github.com">GitHub</a>', convert=['b'])  # > '**Yay** GitHub'
+
+
+Options
+=======
+
+Markdownify supports the following options:
+
+strip
+  A list of tags to strip (blacklist). This option can't be used with the
+  ``convert`` option.
+
+convert
+  A list of tags to convert (whitelist). This option can't be used with the
+  ``strip`` option.
+
+autolinks
+  A boolean indicating whether the "automatic link" style should be used when
+  an ``a`` tag's contents match its href. Defaults to ``True``.
+
+heading_style
+  Defines how headings should be converted. Accepted values are ``ATX``,
+  ``ATX_CLOSED``, ``SETEXT``, and ``UNDERLINED`` (which is an alias for
+  ``SETEXT``). Defaults to ``UNDERLINED``.
+
+bullets
+  An iterable (string, list, or tuple) of bullet styles to be used. If the
+  iterable only contains one item, it will be used regardless of how deeply
+  lists are nested. Otherwise, the bullet will alternate based on nesting
+  level. Defaults to ``'*+-'``.
+
+strong_em_symbol
+  In markdown, both ``*`` and ``_`` are used to encode **strong** or
+  *emphasized* text. Either symbol can be chosen with the options
+  ``ASTERISK`` (default) or ``UNDERSCORE`` respectively.
+
+newline_style
+  Defines the style of marking linebreaks (``<br>``) in markdown. The default
+  value ``SPACES`` of this option will adopt the usual two spaces and a newline,
+  while ``BACKSLASH`` will convert a linebreak to ``\\n`` (a backslash and a
+  newline). While the latter convention is non-standard, it is commonly
+  preferred and supported by a lot of interpreters.
+
+Options may be specified as kwargs to the ``markdownify`` function, or as a
+nested ``Options`` class in ``MarkdownConverter`` subclasses.
+
+
+Development
+===========
+
+To run tests:
+
+``python setup.py test``
+
+To lint:
+
+``python setup.py lint``
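Note on the README's closing "Options" paragraph: the sketch below illustrates how kwargs and a nested ``Options`` class could combine, mirroring the merge order used in ``MarkdownConverter.__init__`` later in this diff (defaults, then the nested class, then kwargs). ``SketchConverter`` and ``AtxConverter`` are hypothetical stand-ins written for illustration, not part of the package, and the option values are plain strings rather than the package's constants.

```python
def _todict(obj):
    # Collect the public (non-underscore) attributes of a class into a dict.
    return dict((k, getattr(obj, k)) for k in dir(obj) if not k.startswith('_'))


class SketchConverter(object):
    # Hypothetical stand-in that only reproduces the option-merging logic.
    class DefaultOptions:
        strip = None
        convert = None
        heading_style = 'underlined'
        bullets = '*+-'

    class Options(DefaultOptions):
        pass

    def __init__(self, **options):
        # Precedence, lowest to highest: DefaultOptions, nested Options, kwargs.
        self.options = _todict(self.DefaultOptions)
        self.options.update(_todict(self.Options))
        self.options.update(options)


class AtxConverter(SketchConverter):
    # The nested class does not need to extend DefaultOptions, because the
    # defaults are always loaded first.
    class Options:
        heading_style = 'atx'


defaults = SketchConverter().options
subclassed = AtxConverter().options
overridden = AtxConverter(heading_style='atx_closed', bullets='-').options

print(defaults['heading_style'])
print(subclassed['heading_style'], subclassed['bullets'])
print(overridden['heading_style'], overridden['bullets'])
```

Under this ordering a subclass only has to declare the options it changes, and a caller can still override any of them per call.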
--- a/markdownify/__init__.py
+++ b/markdownify/__init__.py
@@ -1,10 +1,27 @@
-from lxml.html.soupparser import fromstring
+from bs4 import BeautifulSoup, NavigableString, Comment
 import re
+import six


 convert_heading_re = re.compile(r'convert_h(\d+)')
 line_beginning_re = re.compile(r'^', re.MULTILINE)
-whitespace_re = re.compile(r'[\r\n\s\t ]+')
+whitespace_re = re.compile(r'[\t ]+')
+html_heading_re = re.compile(r'h[1-6]')
+
+
+# Heading styles
+ATX = 'atx'
+ATX_CLOSED = 'atx_closed'
+UNDERLINED = 'underlined'
+SETEXT = UNDERLINED
+
+# Newline style
+SPACES = 'spaces'
+BACKSLASH = 'backslash'
+
+# Strong and emphasis style
+ASTERISK = '*'
+UNDERSCORE = '_'


 def escape(text):
@@ -13,30 +30,72 @@ def escape(text):
    return text.replace('_', r'\_')


+def chomp(text):
+    """
+    If the text in an inline tag like b, a, or em contains a leading or trailing
+    space, strip the string and return a space as suffix or prefix, if needed.
+    This function is used to prevent conversions like
+        <b> foo</b> => ** foo**
+    """
+    prefix = ' ' if text and text[0] == ' ' else ''
+    suffix = ' ' if text and text[-1] == ' ' else ''
+    text = text.strip()
+    return (prefix, suffix, text)
+
+
+def _todict(obj):
+    return dict((k, getattr(obj, k)) for k in dir(obj) if not k.startswith('_'))
+
+
 class MarkdownConverter(object):
-    def __init__(self, tags_to_strip=None, tags_to_convert=None):
-        if tags_to_strip is not None and tags_to_convert is not None:
+    class DefaultOptions:
+        strip = None
+        convert = None
+        autolinks = True
+        heading_style = UNDERLINED
+        bullets = '*+-'  # An iterable of bullet types.
+        strong_em_symbol = ASTERISK
+        newline_style = SPACES
+
+    class Options(DefaultOptions):
+        pass
+
+    def __init__(self, **options):
+        # Create an options dictionary. Use DefaultOptions as a base so that
+        # it doesn't have to be extended.
+        self.options = _todict(self.DefaultOptions)
+        self.options.update(_todict(self.Options))
+        self.options.update(options)
+        if self.options['strip'] is not None and self.options['convert'] is not None:
            raise ValueError('You may specify either tags to strip or tags to'
-                    ' convert, but not both.')
-        self.tags_to_strip = tags_to_strip
-        self.tags_to_convert = tags_to_convert
+                             ' convert, but not both.')

    def convert(self, html):
-        soup = fromstring(html)
-        return self.process_tag(soup)
+        soup = BeautifulSoup(html, 'html.parser')
+        return self.process_tag(soup, convert_as_inline=False, children_only=True)

-    def process_tag(self, node):
-        text = self.process_text(node.text)
+    def process_tag(self, node, convert_as_inline, children_only=False):
+        text = ''
+        # markdown headings can't include block elements (elements w/newlines)
+        isHeading = html_heading_re.match(node.name) is not None
+        convert_children_as_inline = convert_as_inline
+
+        if not children_only and isHeading:
+            convert_children_as_inline = True

        # Convert the children first
-        for el in node.findall('*'):
-            text += self.process_tag(el)
+        for el in node.children:
+            if isinstance(el, Comment):
+                continue
+            elif isinstance(el, NavigableString):
+                text += self.process_text(six.text_type(el))
+            else:
+                text += self.process_tag(el, convert_children_as_inline)

-        convert_fn = getattr(self, 'convert_%s' % node.tag, None)
-        if convert_fn and self.should_convert_tag(node.tag):
-            text = convert_fn(node, text)
-
-        text += self.process_text(node.tail)
+        if not children_only:
+            convert_fn = getattr(self, 'convert_%s' % node.name, None)
+            if convert_fn and self.should_convert_tag(node.name):
+                text = convert_fn(node, text, convert_as_inline)

        return text

@@ -44,13 +103,13 @@ class MarkdownConverter(object):
        return escape(whitespace_re.sub(' ', text or ''))

    def __getattr__(self, attr):
-        # Handle heading levels > 2
+        # Handle headings
        m = convert_heading_re.match(attr)
        if m:
            n = int(m.group(1))

-            def convert_tag(el, text):
-                return self.convert_hn(n, el, text)
+            def convert_tag(el, text, convert_as_inline):
+                return self.convert_hn(n, el, text, convert_as_inline)

            convert_tag.__name__ = 'convert_h%s' % n
            setattr(self, convert_tag.__name__, convert_tag)
@@ -60,62 +119,159 @@ class MarkdownConverter(object):

    def should_convert_tag(self, tag):
        tag = tag.lower()
-        if self.tags_to_strip is not None:
-            return tag not in self.tags_to_strip
-        elif self.tags_to_convert is not None:
-            return tag in self.tags_to_convert
+        strip = self.options['strip']
+        convert = self.options['convert']
+        if strip is not None:
+            return tag not in strip
+        elif convert is not None:
+            return tag in convert
        else:
            return True

+    def indent(self, text, level):
+        return line_beginning_re.sub('\t' * level, text) if text else ''
+
    def underline(self, text, pad_char):
        text = (text or '').rstrip()
        return '%s\n%s\n\n' % (text, pad_char * len(text)) if text else ''

-    def convert_a(self, el, text):
+    def convert_a(self, el, text, convert_as_inline):
+        prefix, suffix, text = chomp(text)
+        if not text:
+            return ''
+        if convert_as_inline:
+            return text
        href = el.get('href')
        title = el.get('title')
+        # For the replacement see #29: underscores in text nodes are escaped
+        if self.options['autolinks'] and text.replace(r'\_', '_') == href and not title:
+            # Shortcut syntax
+            return '<%s>' % href
        title_part = ' "%s"' % title.replace('"', r'\"') if title else ''
-        return '[%s](%s%s)' % (text or '', href, title_part) if href else text or ''
+        return '%s[%s](%s%s)%s' % (prefix, text, href, title_part, suffix) if href else text

-    def convert_b(self, el, text):
-        return self.convert_strong(el, text)
+    def convert_b(self, el, text, convert_as_inline):
+        return self.convert_strong(el, text, convert_as_inline)

-    def convert_blockquote(self, el, text):
-        return '\n' + line_beginning_re.sub('> ', text) if text else ''
+    def convert_blockquote(self, el, text, convert_as_inline):

-    def convert_br(self, el, text):
-        return '  \n'
+        if convert_as_inline:
+            return text

-    def convert_em(self, el, text):
-        return '*%s*' % text if text else ''
+        return '\n' + (line_beginning_re.sub('> ', text) + '\n\n') if text else ''

-    def convert_h1(self, el, text):
-        return self.underline(text, '=')
+    def convert_br(self, el, text, convert_as_inline):
+        if convert_as_inline:
+            return ""

-    def convert_h2(self, el, text):
-        return self.underline(text, '-')
-
-    def convert_hn(self, n, el, text):
-        return '%s %s\n\n' % ('#' * n, text.rstrip()) if text else ''
-
-    def convert_i(self, el, text):
-        return self.convert_em(el, text)
-
-    def convert_li(self, el, text):
-        parent = el.getparent()
-        if parent is not None and parent.tag == 'ol':
-            bullet = '%s.' % (parent.index(el) + 1)
+        if self.options['newline_style'].lower() == BACKSLASH:
+            return '\\\n'
        else:
-            bullet = '*'
+            return '  \n'
+
+    def convert_em(self, el, text, convert_as_inline):
+        em_tag = self.options['strong_em_symbol']
+        prefix, suffix, text = chomp(text)
+        if not text:
+            return ''
+        return '%s%s%s%s%s' % (prefix, em_tag, text, em_tag, suffix)
+
+    def convert_hn(self, n, el, text, convert_as_inline):
+        if convert_as_inline:
+            return text
+
+        style = self.options['heading_style'].lower()
+        text = text.rstrip()
+        if style == UNDERLINED and n <= 2:
+            line = '=' if n == 1 else '-'
+            return self.underline(text, line)
+        hashes = '#' * n
+        if style == ATX_CLOSED:
+            return '%s %s %s\n\n' % (hashes, text, hashes)
+        return '%s %s\n\n' % (hashes, text)
+
+    def convert_i(self, el, text, convert_as_inline):
+        return self.convert_em(el, text, convert_as_inline)
+
+    def convert_list(self, el, text, convert_as_inline):
+
+        # Converting a list to inline is undefined.
+        # Ignoring convert_as_inline for lists.
+
+        nested = False
+        while el:
+            if el.name == 'li':
+                nested = True
+                break
+            el = el.parent
+        if nested:
+            # remove trailing newline if nested
+            return '\n' + self.indent(text, 1).rstrip()
+        return '\n' + text + '\n'
+
+    convert_ul = convert_list
+    convert_ol = convert_list
+
+    def convert_li(self, el, text, convert_as_inline):
+        parent = el.parent
+        if parent is not None and parent.name == 'ol':
+            if parent.get("start"):
+                start = int(parent.get("start"))
+            else:
+                start = 1
+            bullet = '%s.' % (start + parent.index(el))
+        else:
+            depth = -1
+            while el:
+                if el.name == 'ul':
+                    depth += 1
+                el = el.parent
+            bullets = self.options['bullets']
+            bullet = bullets[depth % len(bullets)]
        return '%s %s\n' % (bullet, text or '')

-    def convert_p(self, el, text):
+    def convert_p(self, el, text, convert_as_inline):
+        if convert_as_inline:
+            return text
        return '%s\n\n' % text if text else ''

-    def convert_strong(self, el, text):
-        return '**%s**' % text if text else ''
+    def convert_strong(self, el, text, convert_as_inline):
+        strong_tag = 2 * self.options['strong_em_symbol']
+        prefix, suffix, text = chomp(text)
+        if not text:
+            return ''
+        return '%s%s%s%s%s' % (prefix, strong_tag, text, strong_tag, suffix)
+
+    def convert_img(self, el, text, convert_as_inline):
+        alt = el.attrs.get('alt', None) or ''
+        src = el.attrs.get('src', None) or ''
+        title = el.attrs.get('title', None) or ''
+        title_part = ' "%s"' % title.replace('"', r'\"') if title else ''
+        if convert_as_inline:
+            return alt
+
+        return '![%s](%s%s)' % (alt, src, title_part)
+
+    def convert_table(self, el, text, convert_as_inline):
+        rows = el.find_all('tr')
+        text_data = []
+        for row in rows:
+            headers = row.find_all('th')
+            columns = row.find_all('td')
+            if len(headers) > 0:
+                headers = [head.text.strip() for head in headers]
+                text_data.append('| ' + ' | '.join(headers) + ' |')
+                text_data.append('| ' + ' | '.join(['---'] * len(headers)) + ' |')
+            elif len(columns) > 0:
+                columns = [colm.text.strip() for colm in columns]
+                text_data.append('| ' + ' | '.join(columns) + ' |')
+            else:
+                continue
+        return '\n'.join(text_data)
+
+    def convert_hr(self, el, text, convert_as_inline):
+        return '\n\n---\n\n'


-def markdownify(html, strip=None, convert=None):
-    converter = MarkdownConverter(strip, convert)
-    return converter.convert(html)
+def markdownify(html, **options):
+    return MarkdownConverter(**options).convert(html)
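Note on ``chomp``: the helper added above moves leading and trailing spaces outside the emphasis markers, so ``<b> foo</b>`` no longer becomes ``** foo**``. The standalone sketch below copies the ``chomp`` logic from this diff and pairs it with ``wrap_strong``, a hypothetical free-function version of ``convert_strong`` written only for illustration.

```python
def chomp(text):
    # Remember whether the text started or ended with a space, then strip it.
    prefix = ' ' if text and text[0] == ' ' else ''
    suffix = ' ' if text and text[-1] == ' ' else ''
    return prefix, suffix, text.strip()


def wrap_strong(text, symbol='*'):
    # Hypothetical helper: same steps as convert_strong, without the class.
    prefix, suffix, text = chomp(text)
    if not text:
        # Empty or whitespace-only content produces no markers at all.
        return ''
    marker = 2 * symbol
    return '%s%s%s%s%s' % (prefix, marker, text, marker, suffix)


print(repr(wrap_strong(' foo')))
print(repr(wrap_strong('foo ')))
print(repr(wrap_strong(' s ', '_')))
print(repr(wrap_strong('   ')))
```

The spaces end up outside the markers, which matches the expectations in ``test_chomp`` and the ``*_spaces`` tests in this change.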
--- a/markdownify/version.py
+++ b/markdownify/version.py
@@ -1 +0,0 @@
-__version__ = '0.1.0'
--- a/runtests.py
+++ b/runtests.py
@@ -1,5 +0,0 @@
-#!/usr/bin/env python
-from nose.core import run, collector
-
-if __name__ == '__main__':
-    run()
--- a/setup.cfg
+++ b/setup.cfg
@@ -0,0 +1,2 @@
+[flake8]
+ignore = E501
--- a/setup.py
+++ b/setup.py
@@ -2,43 +2,98 @@
 import codecs
 import os
 from setuptools import setup, find_packages
+from setuptools.command.test import test as TestCommand, Command


 read = lambda filepath: codecs.open(filepath, 'r', 'utf-8').read()
-execfile(os.path.join(os.path.dirname(__file__), 'markdownify', 'version.py'))
+
+pkgmeta = {
+    '__title__': 'markdownify',
+    '__author__': 'Matthew Tretter',
+    '__version__': '0.7.1',
+}
+
+
+class PyTest(TestCommand):
+    def finalize_options(self):
+        TestCommand.finalize_options(self)
+        self.test_args = ['tests', '-s']
+        self.test_suite = True
+
+    def run_tests(self):
+        import pytest
+        errno = pytest.main(self.test_args)
+        raise SystemExit(errno)
+
+
+class LintCommand(Command):
+    """
+    A copy of flake8's Flake8Command
+
+    """
+    description = "Run flake8 on modules registered in setuptools"
+    user_options = []
+
+    def initialize_options(self):
+        pass
+
+    def finalize_options(self):
+        pass
+
+    def distribution_files(self):
+        if self.distribution.packages:
+            for package in self.distribution.packages:
+                yield package.replace(".", os.path.sep)
+
+        if self.distribution.py_modules:
+            for filename in self.distribution.py_modules:
+                yield "%s.py" % filename
+
+    def run(self):
+        from flake8.api.legacy import get_style_guide
+        flake8_style = get_style_guide(config_file='setup.cfg')
+        paths = self.distribution_files()
+        report = flake8_style.check_files(paths)
+        raise SystemExit(report.total_errors > 0)


 setup(
-    name='python-markdownify',
+    name='markdownify',
    description='Convert HTML to markdown.',
    long_description=read(os.path.join(os.path.dirname(__file__), 'README.rst')),
-    version=__version__,
-    author='Matthew Tretter',
-    author_email='matthew@exanimo.com',
+    version=pkgmeta['__version__'],
+    author=pkgmeta['__author__'],
+    author_email='m@tthewwithanm.com',
    url='http://github.com/matthewwithanm/python-markdownify',
    download_url='http://github.com/matthewwithanm/python-markdownify/tarball/master',
    packages=find_packages(),
    zip_safe=False,
    include_package_data=True,
+    setup_requires=[
+        'flake8>=3.8,<4',
+    ],
    tests_require=[
-        'nose',
-        'unittest2',
+        'pytest>=6.2,<7',
    ],
    install_requires=[
-        'lxml',
-        'BeautifulSoup',
+        'beautifulsoup4>=4.9,<5', 'six>=1.15,<2'
    ],
    classifiers=[
        'Environment :: Web Environment',
        'Framework :: Django',
        'Intended Audience :: Developers',
-        'License :: OSI Approved :: BSD License',
+        'License :: OSI Approved :: MIT License',
        'Operating System :: OS Independent',
        'Programming Language :: Python :: 2.5',
        'Programming Language :: Python :: 2.6',
        'Programming Language :: Python :: 2.7',
+        'Programming Language :: Python :: 3.6',
+        'Programming Language :: Python :: 3.7',
+        'Programming Language :: Python :: 3.8',
        'Topic :: Utilities'
    ],
-    setup_requires=[],
-    test_suite='runtests.collector',
+    cmdclass={
+        'test': PyTest,
+        'lint': LintCommand,
+    },
 )
--- a/tests.py
+++ b/tests.py
@@ -1,123 +0,0 @@
-import unittest
-from markdownify import markdownify as md
-
-
-class BasicTests(unittest.TestCase):
-
-    def test_single_tag(self):
-        self.assertEqual(md('<span>Hello</span>'), 'Hello')
-
-    def test_soup(self):
-        self.assertEqual(md('<div><span>Hello</div></span>'), 'Hello')
-
-    def test_whitespace(self):
-        self.assertEqual(md(' a  b \n\n c '), ' a b c ')
-
-
-class ArgTests(unittest.TestCase):
-
-    def test_strip(self):
-        self.assertEqual(
-            md('<a href="https://github.com/matthewwithanm">Some Text</a>', strip=['a']),
-            'Some Text')
-
-    def test_do_not_strip(self):
-        self.assertEqual(
-            md('<a href="https://github.com/matthewwithanm">Some Text</a>', strip=[]),
-            '[Some Text](https://github.com/matthewwithanm)')
-
-    def test_convert(self):
-        self.assertEqual(
-            md('<a href="https://github.com/matthewwithanm">Some Text</a>', convert=['a']),
-            '[Some Text](https://github.com/matthewwithanm)')
-
-    def test_do_not_convert(self):
-        self.assertEqual(
-            md('<a href="https://github.com/matthewwithanm">Some Text</a>', convert=[]),
-            'Some Text')
-
-
-class EscapeTests(unittest.TestCase):
-
-    def test_underscore(self):
-        self.assertEqual(md('_hey_dude_'), '\_hey\_dude\_')
-
-    def test_xml_entities(self):
-        self.assertEqual(md('&amp;'), '&')
-
-    def test_named_entities(self):
-        self.assertEqual(md('&raquo;'), u'\xbb')
-
-    def test_hexadecimal_entities(self):
-        # This looks to be a bug in BeautifulSoup (fixed in bs4) that we have to work around.
-        self.assertEqual(md('&#x27;'), '\x27')
-
-    def test_single_escaping_entities(self):
-        self.assertEqual(md('&amp;amp;'), '&amp;')
-
-
-class ConversionTests(unittest.TestCase):
-
-    def test_a(self):
-        self.assertEqual(
-            md('<a href="http://google.com">Google</a>'),
-            '[Google](http://google.com)'
-        )
-
-    def test_a_with_title(self):
-        self.assertEqual(
-            md('<a href="http://google.com" title="The &quot;Goog&quot;">Google</a>'),
-            r'[Google](http://google.com "The \"Goog\"")'
-        )
-
-    def test_b(self):
-        self.assertEqual(md('<b>Hello</b>'), '**Hello**')
-
-    def test_blockquote(self):
-        self.assertEqual(md('<blockquote>Hello</blockquote>').strip(), '> Hello')
-
-    def test_nested_blockquote(self):
-        self.assertEqual(
-            md('<blockquote>And she was like <blockquote>Hello</blockquote></blockquote>').strip(),
-            '> And she was like \n> > Hello'
-        )
-
-    def test_br(self):
-        self.assertEqual(md('a<br />b<br />c'), 'a  \nb  \nc')
-
-    def test_em(self):
-        self.assertEqual(md('<em>Hello</em>'), '*Hello*')
-
-    def test_h1(self):
-        self.assertEqual(md('<h1>Hello</h1>'), 'Hello\n=====\n\n')
-
-    def test_h2(self):
-        self.assertEqual(md('<h2>Hello</h2>'), 'Hello\n-----\n\n')
-
-    def test_hn(self):
-        self.assertEqual(md('<h3>Hello</h3>'), '### Hello\n\n')
-        self.assertEqual(md('<h6>Hello</h6>'), '###### Hello\n\n')
-
-    def test_i(self):
-        self.assertEqual(md('<i>Hello</i>'), '*Hello*')
-
-    def test_ol(self):
-        self.assertEqual(md('<ol><li>a</li><li>b</li></ol>'), '1. a\n2. b\n')
-
-    def test_p(self):
-        self.assertEqual(md('<p>hello</p>'), 'hello\n\n')
-
-    def test_strong(self):
-        self.assertEqual(md('<strong>Hello</strong>'), '**Hello**')
-
-    def test_ul(self):
-        self.assertEqual(md('<ul><li>a</li><li>b</li></ul>'), '* a\n* b\n')
-
-
-class AdvancedTests(unittest.TestCase):
-
-    def test_nested(self):
-        self.assertEqual(
-            md('<p>This is an <a href="http://example.com/">example link</a>.</p>'),
-            'This is an [example link](http://example.com/).\n\n'
-        )
--- a/tests/__init__.py
+++ b/tests/__init__.py
--- a/tests/test_advanced.py
+++ b/tests/test_advanced.py
@@ -0,0 +1,16 @@
+from markdownify import markdownify as md
+
+
+def test_nested():
+    text = md('<p>This is an <a href="http://example.com/">example link</a>.</p>')
+    assert text == 'This is an [example link](http://example.com/).\n\n'
+
+
+def test_ignore_comments():
+    text = md("<!-- This is a comment -->")
+    assert text == ""
+
+
+def test_ignore_comments_with_other_tags():
+    text = md("<!-- This is a comment --><a href='http://example.com/'>example link</a>")
+    assert text == "[example link](http://example.com/)"
--- a/tests/test_args.py
+++ b/tests/test_args.py
@@ -0,0 +1,25 @@
+"""
+Test whitelisting/blacklisting of specific tags.
+
+"""
+from markdownify import markdownify as md
+
+
+def test_strip():
+    text = md('<a href="https://github.com/matthewwithanm">Some Text</a>', strip=['a'])
+    assert text == 'Some Text'
+
+
+def test_do_not_strip():
+    text = md('<a href="https://github.com/matthewwithanm">Some Text</a>', strip=[])
+    assert text == '[Some Text](https://github.com/matthewwithanm)'
+
+
+def test_convert():
+    text = md('<a href="https://github.com/matthewwithanm">Some Text</a>', convert=['a'])
+    assert text == '[Some Text](https://github.com/matthewwithanm)'
+
+
+def test_do_not_convert():
+    text = md('<a href="https://github.com/matthewwithanm">Some Text</a>', convert=[])
+    assert text == 'Some Text'
--- a/tests/test_basic.py
+++ b/tests/test_basic.py
@@ -0,0 +1,13 @@
+from markdownify import markdownify as md
+
+
+def test_single_tag():
+    assert md('<span>Hello</span>') == 'Hello'
+
+
+def test_soup():
+    assert md('<div><span>Hello</div></span>') == 'Hello'
+
+
+def test_whitespace():
+    assert md(' a  b \t\t c ') == ' a b c '
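Note on ``test_whitespace``: the input changed from newlines to tabs, presumably because ``whitespace_re`` was narrowed from ``[\r\n\s\t ]+`` to ``[\t ]+`` in this diff, so newlines in text nodes are no longer collapsed. The sketch below compares the two patterns using only the standard ``re`` module; the variable names are illustrative.

```python
import re

# Pattern before this change: collapses every whitespace run, newlines included.
old_whitespace_re = re.compile(r'[\r\n\s\t ]+')
# Pattern after this change: collapses only runs of tabs and spaces.
new_whitespace_re = re.compile(r'[\t ]+')

with_newlines = ' a  b \n\n c '
with_tabs = ' a  b \t\t c '

old_result = old_whitespace_re.sub(' ', with_newlines)
new_result = new_whitespace_re.sub(' ', with_newlines)
tab_result = new_whitespace_re.sub(' ', with_tabs)

print(repr(old_result))
print(repr(new_result))
print(repr(tab_result))
```

With the new pattern, the newline-based input keeps its line breaks, while the tab-based input still collapses to single spaces.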
--- a/tests/test_conversions.py
+++ b/tests/test_conversions.py
@@ -0,0 +1,311 @@
+from markdownify import markdownify as md, ATX, ATX_CLOSED, BACKSLASH, UNDERSCORE
+import re
+
+
+nested_uls = re.sub(r'\s+', '', """
+    <ul>
+        <li>1
+            <ul>
+                <li>a
+                    <ul>
+                        <li>I</li>
+                        <li>II</li>
+                        <li>III</li>
+                    </ul>
+                </li>
+                <li>b</li>
+                <li>c</li>
+            </ul>
+        </li>
+        <li>2</li>
+        <li>3</li>
+    </ul>""")
+
+
+table = re.sub(r'\s+', '', """
+<table>
+    <tr>
+        <th>Firstname</th>
+        <th>Lastname</th>
+        <th>Age</th>
+    </tr>
+    <tr>
+        <td>Jill</td>
+        <td>Smith</td>
+        <td>50</td>
+    </tr>
+    <tr>
+        <td>Eve</td>
+        <td>Jackson</td>
+        <td>94</td>
+    </tr>
+</table>
+""")
+
+
+table_head_body = re.sub(r'\s+', '', """
+<table>
+    <thead>
+            <tr>
+            <th>Firstname</th>
+            <th>Lastname</th>
+            <th>Age</th>
+            </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td>Jill</td>
+            <td>Smith</td>
+            <td>50</td>
+        </tr>
+        <tr>
+            <td>Eve</td>
+            <td>Jackson</td>
+            <td>94</td>
+        </tr>
+    </tbody>
+</table>
+""")
+
+table_missing_text = re.sub(r'\s+', '', """
+<table>
+    <thead>
+            <tr>
+            <th></th>
+            <th>Lastname</th>
+            <th>Age</th>
+            </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td>Jill</td>
+            <td></td>
+            <td>50</td>
+        </tr>
+        <tr>
+            <td>Eve</td>
+            <td>Jackson</td>
+            <td>94</td>
+        </tr>
+    </tbody>
+</table>
+""")
+
+
+def test_chomp():
+    assert md(' <b></b> ') == '  '
+    assert md(' <b> </b> ') == '  '
+    assert md(' <b>  </b> ') == '  '
+    assert md(' <b>   </b> ') == '  '
+    assert md(' <b>s </b> ') == ' **s**  '
+    assert md(' <b> s</b> ') == '  **s** '
+    assert md(' <b> s </b> ') == '  **s**  '
+    assert md(' <b>  s  </b> ') == '  **s**  '
+
+
+def test_a():
+    assert md('<a href="https://google.com">Google</a>') == '[Google](https://google.com)'
+    assert md('<a href="https://google.com">https://google.com</a>', autolinks=False) == '[https://google.com](https://google.com)'
+    assert md('<a href="https://google.com">https://google.com</a>') == '<https://google.com>'
+    assert md('<a href="https://community.kde.org/Get_Involved">https://community.kde.org/Get_Involved</a>') == '<https://community.kde.org/Get_Involved>'
+    assert md('<a href="https://community.kde.org/Get_Involved">https://community.kde.org/Get_Involved</a>', autolinks=False) == '[https://community.kde.org/Get\\_Involved](https://community.kde.org/Get_Involved)'
+
+
+def test_a_spaces():
+    assert md('foo <a href="http://google.com">Google</a> bar') == 'foo [Google](http://google.com) bar'
+    assert md('foo<a href="http://google.com"> Google</a> bar') == 'foo [Google](http://google.com) bar'
+    assert md('foo <a href="http://google.com">Google </a>bar') == 'foo [Google](http://google.com) bar'
+    assert md('foo <a href="http://google.com"></a> bar') == 'foo  bar'
+
+
+def test_a_with_title():
+    text = md('<a href="http://google.com" title="The &quot;Goog&quot;">Google</a>')
+    assert text == r'[Google](http://google.com "The \"Goog\"")'
+
+
+def test_a_shortcut():
+    text = md('<a href="http://google.com">http://google.com</a>')
+    assert text == '<http://google.com>'
+
+
+def test_a_no_autolinks():
+    text = md('<a href="http://google.com">http://google.com</a>', autolinks=False)
+    assert text == '[http://google.com](http://google.com)'
+
+
+def test_b():
+    assert md('<b>Hello</b>') == '**Hello**'
+
+
+def test_b_spaces():
+    assert md('foo <b>Hello</b> bar') == 'foo **Hello** bar'
+    assert md('foo<b> Hello</b> bar') == 'foo **Hello** bar'
+    assert md('foo <b>Hello </b>bar') == 'foo **Hello** bar'
+    assert md('foo <b></b> bar') == 'foo  bar'
+
+
+def test_blockquote():
+    assert md('<blockquote>Hello</blockquote>') == '\n> Hello\n\n'
+
+
+def test_blockquote_with_paragraph():
+    assert md('<blockquote>Hello</blockquote><p>handsome</p>') == '\n> Hello\n\nhandsome\n\n'
+
+
+def test_nested_blockquote():
+    text = md('<blockquote>And she was like <blockquote>Hello</blockquote></blockquote>')
+    assert text == '\n> And she was like \n> > Hello\n> \n> \n\n'
+
+
+def test_br():
+    assert md('a<br />b<br />c') == 'a  \nb  \nc'
+
+
+def test_em():
+    assert md('<em>Hello</em>') == '*Hello*'
+
+
+def test_em_spaces():
+    assert md('foo <em>Hello</em> bar') == 'foo *Hello* bar'
+    assert md('foo<em> Hello</em> bar') == 'foo *Hello* bar'
+    assert md('foo <em>Hello </em>bar') == 'foo *Hello* bar'
+    assert md('foo <em></em> bar') == 'foo  bar'
+
+
+def test_h1():
+    assert md('<h1>Hello</h1>') == 'Hello\n=====\n\n'
+
+
+def test_h2():
+    assert md('<h2>Hello</h2>') == 'Hello\n-----\n\n'
+
+
+def test_hn():
+    assert md('<h3>Hello</h3>') == '### Hello\n\n'
+    assert md('<h6>Hello</h6>') == '###### Hello\n\n'
+
+
+def test_hn_chained():
+    assert md('<h1>First</h1>\n<h2>Second</h2>\n<h3>Third</h3>', heading_style=ATX) == '# First\n\n\n## Second\n\n\n### Third\n\n'
+    assert md('X<h1>First</h1>', heading_style=ATX) == 'X# First\n\n'
+
+
+def test_hn_nested_tag_heading_style():
+    assert md('<h1>A <p>P</p> C </h1>', heading_style=ATX_CLOSED) == '# A P C #\n\n'
+    assert md('<h1>A <p>P</p> C </h1>', heading_style=ATX) == '# A P C\n\n'
+
+
+def test_hn_nested_simple_tag():
+    tag_to_markdown = [
+        ("strong", "**strong**"),
+        ("b", "**b**"),
+        ("em", "*em*"),
+        ("i", "*i*"),
+        ("p", "p"),
+        ("a", "a"),
+        ("div", "div"),
+        ("blockquote", "blockquote"),
+    ]
+
+    for tag, markdown in tag_to_markdown:
+        assert md('<h3>A <' + tag + '>' + tag + '</' + tag + '> B</h3>') == '### A ' + markdown + ' B\n\n'
+
+    assert md('<h3>A <br>B</h3>', heading_style=ATX) == '### A B\n\n'
+
+    # Lists nested inside headings are not supported
+    # assert md('<h3>A <ul><li>li1</li><li>li2</li></ul> B</h3>', heading_style=ATX) == '### A li1 li2 B\n\n'
+
+
+def test_hn_nested_img():
+    assert md('<img src="/path/to/img.jpg" alt="Alt text" title="Optional title" />') == '![Alt text](/path/to/img.jpg "Optional title")'
+    assert md('<img src="/path/to/img.jpg" alt="Alt text" />') == '![Alt text](/path/to/img.jpg)'
+    image_attributes_to_markdown = [
+        ("", ""),
+        ("alt='Alt Text'", "Alt Text"),
+        ("alt='Alt Text' title='Optional title'", "Alt Text"),
+    ]
+    for image_attributes, markdown in image_attributes_to_markdown:
+        assert md('<h3>A <img src="/path/to/img.jpg " ' + image_attributes + '/> B</h3>') == '### A ' + markdown + ' B\n\n'
+
+
+def test_hr():
+    assert md('Hello<hr>World') == 'Hello\n\n---\n\nWorld'
+    assert md('Hello<hr />World') == 'Hello\n\n---\n\nWorld'
+    assert md('<p>Hello</p>\n<hr>\n<p>World</p>') == 'Hello\n\n\n\n\n---\n\n\nWorld\n\n'
+
+
+def test_head():
+    assert md('<head>head</head>') == 'head'
+
+
+def test_atx_headings():
+    assert md('<h1>Hello</h1>', heading_style=ATX) == '# Hello\n\n'
+    assert md('<h2>Hello</h2>', heading_style=ATX) == '## Hello\n\n'
+
+
+def test_atx_closed_headings():
+    assert md('<h1>Hello</h1>', heading_style=ATX_CLOSED) == '# Hello #\n\n'
+    assert md('<h2>Hello</h2>', heading_style=ATX_CLOSED) == '## Hello ##\n\n'
+
+
+def test_i():
+    assert md('<i>Hello</i>') == '*Hello*'
+
+
+def test_ol():
+    assert md('<ol><li>a</li><li>b</li></ol>') == '\n1. a\n2. b\n\n'
+    assert md('<ol start="3"><li>a</li><li>b</li></ol>') == '\n3. a\n4. b\n\n'
+
+
+def test_p():
+    assert md('<p>hello</p>') == 'hello\n\n'
+
+
+def test_strong():
+    assert md('<strong>Hello</strong>') == '**Hello**'
+
+
+def test_ul():
+    assert md('<ul><li>a</li><li>b</li></ul>') == '\n* a\n* b\n\n'
+
+
+def test_inline_ul():
+    assert md('<p>foo</p><ul><li>a</li><li>b</li></ul><p>bar</p>') == 'foo\n\n\n* a\n* b\n\nbar\n\n'
+
+
+def test_nested_uls():
+    """
+    Nested ULs should alternate bullet characters.
+
+    """
+    assert md(nested_uls) == '\n* 1\n\t+ a\n\t\t- I\n\t\t- II\n\t\t- III\n\t+ b\n\t+ c\n* 2\n* 3\n\n'
+
+
+def test_bullets():
+    assert md(nested_uls, bullets='-') == '\n- 1\n\t- a\n\t\t- I\n\t\t- II\n\t\t- III\n\t- b\n\t- c\n- 2\n- 3\n\n'
+
+
+def test_img():
+    assert md('<img src="/path/to/img.jpg" alt="Alt text" title="Optional title" />') == '![Alt text](/path/to/img.jpg "Optional title")'
+    assert md('<img src="/path/to/img.jpg" alt="Alt text" />') == '![Alt text](/path/to/img.jpg)'
+
+
+def test_div():
+    assert md('Hello</div> World') == 'Hello World'
+
+
+def test_table():
+    assert md(table) == '| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |'
+    assert md(table_head_body) == '| Firstname | Lastname | Age |\n| --- | --- | --- |\n| Jill | Smith | 50 |\n| Eve | Jackson | 94 |'
+    assert md(table_missing_text) == '|  | Lastname | Age |\n| --- | --- | --- |\n| Jill |  | 50 |\n| Eve | Jackson | 94 |'
+
+
+def test_strong_em_symbol():
+    assert md('<strong>Hello</strong>', strong_em_symbol=UNDERSCORE) == '__Hello__'
+    assert md('<b>Hello</b>', strong_em_symbol=UNDERSCORE) == '__Hello__'
+    assert md('<em>Hello</em>', strong_em_symbol=UNDERSCORE) == '_Hello_'
+    assert md('<i>Hello</i>', strong_em_symbol=UNDERSCORE) == '_Hello_'
+
+
+def test_newline_style():
+    assert md('a<br />b<br />c', newline_style=BACKSLASH) == 'a\\\nb\\\nc'
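Note on the whitespace cases in test_chomp, test_a_spaces, test_b_spaces and test_em_spaces above: they all expect padding inside an inline tag to collapse to at most one space on each side, placed outside the Markdown markers, and an empty or whitespace-only tag to produce nothing. A minimal sketch of that rule follows. The names `chomp` and `wrap_inline` are hypothetical helpers written for this note, and markdownify's real implementation may differ.

```python
def chomp(text):
    """Split text into (prefix, suffix, core).

    prefix/suffix are a single space if the text started/ended with one,
    otherwise empty; core is the text with surrounding whitespace removed.
    """
    prefix = ' ' if text[:1] == ' ' else ''
    suffix = ' ' if text[-1:] == ' ' else ''
    return prefix, suffix, text.strip()


def wrap_inline(text, markup='**'):
    """Wrap the inner text of an inline tag in Markdown markers.

    Spaces are moved outside the markers; whitespace-only content
    yields an empty string, as in the `<b> </b>` test cases.
    """
    prefix, suffix, core = chomp(text)
    if not core:
        return ''
    return f'{prefix}{markup}{core}{markup}{suffix}'


# Mirrors md(' <b>s </b> '): the outer spaces come from the surrounding text.
print(repr(' ' + wrap_inline('s ') + ' '))  # ' **s**  '
```

Keeping the markers tight against the text matters because Markdown renderers generally do not treat `** s **` as strong emphasis.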
--- a/tests/test_escaping.py
+++ b/tests/test_escaping.py
@@ -0,0 +1,22 @@
+from markdownify import markdownify as md
+
+
+def test_underscore():
+    assert md('_hey_dude_') == r'\_hey\_dude\_'
+
+
+def test_xml_entities():
+    assert md('&amp;') == '&'
+
+
+def test_named_entities():
+    assert md('&raquo;') == u'\xbb'
+
+
+def test_hexadecimal_entities():
+    # This looks to be a bug in BeautifulSoup (fixed in bs4) that we have to work around.
+    assert md('&#x27;') == '\x27'
+
+
+def test_single_escaping_entities():
+    assert md('&amp;amp;') == '&amp;'
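Note on the escaping tests above: they expect HTML entities to be decoded exactly once (so `&amp;amp;` becomes `&amp;`, not `&`) and literal underscores to be backslash-escaped. The sketch below reproduces those expectations with the standard library only. It substitutes `html.unescape` for the HTML parser the library uses, and `decode_once` and `escape_underscores` are hypothetical names, not part of markdownify.

```python
import html


def decode_once(text):
    """Decode HTML entities a single time, as an HTML parser would."""
    return html.unescape(text)


def escape_underscores(text):
    """Backslash-escape underscores so Markdown does not read them as emphasis."""
    return text.replace('_', r'\_')


print(decode_once('&amp;amp;'))          # &amp;
print(escape_underscores('_hey_dude_'))  # \_hey\_dude\_
```

Decoding only once preserves text that was deliberately double-encoded in the source HTML, for example a page that is displaying the literal string `&amp;`.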
Author	SHA1	Message	Date
AlexVonB	21c0d034d0	Merge branch 'develop'	2021-05-02 10:51:00 +02:00
AlexVonB	f59f9f9a54	bump to v0.7.1	2021-05-02 10:50:49 +02:00
AlexVonB	bd22a16c9e	Merge pull request #40 from jiulongw/jiulongw/hr Add conversion for hr element	2021-05-02 10:47:32 +02:00
AlexVonB	55fb96e3c0	fix hr tests	2021-05-02 10:45:52 +02:00
Jiulong Wang	5f102d5223	Add conversion for hr element	2021-04-29 13:41:28 -07:00
AlexVonB	e3ddc789a2	Merge branch 'develop'	2021-04-22 12:43:27 +02:00
AlexVonB	651d5f00e8	bump to v0.7.0	2021-04-22 12:43:17 +02:00
AlexVonB	3cf324d03d	Merge pull request #36 from BrunoMiguens/add-basic-support-for-tables Add basic support for tables	2021-04-22 12:41:54 +02:00
AlexVonB	96f7e7d307	Merge branch 'develop' into add-basic-support-for-tables	2021-04-22 12:40:16 +02:00
AlexVonB	e1dbbfad42	guard table lines with pipes, resolves the empty header problem	2021-04-22 12:36:11 +02:00
AlexVonB	2d0cd97323	Merge branch 'develop'	2021-04-22 12:13:03 +02:00
AlexVonB	d4882b86b9	bump to v0.6.6	2021-04-22 12:12:51 +02:00
AlexVonB	b47d5f11c8	Merge pull request #37 from andredelft/develop Add `strong_em_symbol` and `newline` options to the converter	2021-04-18 21:35:16 +02:00
André van Delft	29c794e17d	Introduce OPTIONs for `strong_em_symbol`	2021-04-18 18:13:29 +02:00
André van Delft	e877602a5e	Separate the strong_em_symbol and newline style tests	2021-04-05 11:28:42 +02:00
André van Delft	5580b0b51d	Update README.rst	2021-04-05 11:13:52 +02:00
André van Delft	650f377b64	Fix linting	2021-04-05 11:13:19 +02:00
André van Delft	7ee87b1d32	Use .lower() on _style option fetching	2021-04-05 10:50:23 +02:00
André van Delft	16dbc471b9	Test newline_style	2021-04-05 10:47:55 +02:00
André van Delft	c04ec855dd	Change option to newline_style and use variables like heading_style does	2021-04-05 10:44:20 +02:00
André van Delft	8da0bdf998	Test strong_em_symbol	2021-04-05 10:28:46 +02:00
AlexVonB	ec185e2e9c	Merge branch 'develop'	2021-02-21 23:09:55 +01:00
AlexVonB	a59e4b9f48	bump to v0.6.5	2021-02-21 23:09:44 +01:00
AlexVonB	fd293a9714	use python 3.8 instead of 3.6	2021-02-21 23:08:49 +01:00
AlexVonB	99365de669	upgrading code for python 3.x closes #38	2021-02-21 23:06:21 +01:00
AlexVonB	079d1721aa	Merge branch 'develop'	2021-02-21 20:58:34 +01:00
AlexVonB	ed406d3206	bump to v0.6.4	2021-02-21 20:57:57 +01:00
AlexVonB	f320cf87ff	closing #25 and #18 Adds newlines after blockquotes, allowing for paragraphs after a blockquote. Due to merging problems with @lucafrance 's code I had to quickly copy and paste their code. Thanks for the contribution!	2021-02-21 20:53:44 +01:00
André van Delft	a79ed44ec3	Fix code ticks in README	2021-02-15 16:51:20 +01:00
André van Delft	29a4e551f7	Update README with the two new options	2021-02-15 16:37:13 +01:00
André van Delft	b3ac4606a6	Allow for the use of backslash for newlines	2021-02-15 16:29:14 +01:00
André van Delft	f093843f40	Allow for a custom strong or emphasis symbol	2021-02-15 16:19:19 +01:00
Bruno Miguens	de6f91af0e	Revert header validation and leave possibility to empty column	2021-02-08 20:56:18 +00:00
Bruno Miguens	8c28ade348	Remove empty header validation to allow empty header	2021-02-08 20:50:15 +00:00
Bruno Miguens	a152c5b706	Fix lint	2021-02-08 19:32:35 +00:00
Bruno Miguens	292d64bbf4	Remove unnecessary tests	2021-02-08 19:26:27 +00:00
Bruno Miguens	db96eeb785	Add tests for basic and thead/tbody tables	2021-02-08 17:00:09 +00:00
Bruno Miguens	73f7644c0d	Add basic support for HTML tables	2021-02-08 17:00:09 +00:00
AlexVonB	a4d134df97	Merge pull request #34 from BrunoMiguens/add-ignore-comment-tags Add ignore comment tags	2021-02-07 19:46:49 +01:00
Bruno Miguens	457454c713	Add new line at the end of file	2021-02-05 19:49:57 +00:00
Bruno Miguens	321e9eb5f6	Add ignore comment tags	2021-02-05 19:40:43 +00:00
AlexVonB	bf24df3e2e	bump to v0.6.3	2021-01-12 22:43:18 +01:00
AlexVonB	15329588b1	Merge branch 'develop'	2021-01-12 22:42:58 +01:00
AlexVonB	77d1e99bd5	satisfy linter	2021-01-12 22:42:06 +01:00
AlexVonB	34ad8485fa	bump to v0.6.2	2021-01-12 22:40:03 +01:00
AlexVonB	f0ce934bf8	Merge branch 'develop'	2021-01-12 22:39:47 +01:00
AlexVonB	97c78ef55b	Merge branch 'fix-extra-headline-whitespace' into develop	2021-01-12 22:38:59 +01:00
AlexVonB	99cd237f27	Merge branch 'develop'	2021-01-04 10:22:02 +01:00
AlexVonB	b7e1ab889d	bump to v0.6.1	2021-01-04 10:21:27 +01:00
AlexVonB	29e86aec55	Merge branch 'fix-link-underscores' into develop	2021-01-04 10:18:05 +01:00
AlexVonB	453b604096	Fixing autolinks When checking a links href and text for equality, first un-escape the underscores in the text -- because six escapes them. This should fix #29.	2021-01-02 17:22:36 +01:00
AlexVonB	2bde8d3e8e	Merge branch 'develop'	2021-01-02 16:49:28 +01:00
AlexVonB	4f8937810b	dont replace newlines and tabs with spaces this should fix #17, as all leading new lines were replaced with a single space, which in turn was rendered before the # of a headline	2020-12-29 10:28:50 +01:00
AlexVonB	3544322ed2	Bump Version 0.6.0	2020-12-13 23:41:56 +01:00
AlexVonB	c4d0a14ce5	Merge pull request #26 from idvorkin/develop Add support for headings that include nested divs	2020-12-13 23:39:34 +01:00
Igor Dvorkin	05ea8dc58a	Add many tests and support image tag	2020-12-13 17:40:53 +00:00
Igor Dvorkin	7780f82c30	Using a regexp to determine if a tag is a heading.	2020-12-11 16:54:14 -08:00
Igor Dvorkin	d558617cd7	Add support for headings that include nested block elements	2020-11-20 06:03:51 -08:00
AlexVonB	8c9b029756	Merge branch 'develop'	2020-09-01 18:10:07 +02:00
AlexVonB	25d68b4265	Bump version 0.5.3	2020-09-01 18:09:24 +02:00
AlexVonB	5561106991	Merge pull request #24 from SimonIT/fix-corrupt-html Fix parsing corrupt html	2020-09-01 18:04:17 +02:00
SimonIT	1b3136ad04	Fix parsing corrupt html	2020-08-31 13:15:10 +02:00
AlexVonB	987a2a9cae	Merge pull request #20 from SimonIT/badges Add some fancy badges	2020-08-19 10:32:30 +02:00
SimonIT	a4461161bc	Make badges inline	2020-08-19 10:06:21 +02:00
AlexVonB	ae50065872	Merge branch 'develop'	2020-08-18 18:53:10 +02:00
AlexVonB	19e2c3db0d	Bump version 0.5.2	2020-08-18 18:52:53 +02:00
AlexVonB	ba51bbee12	Merge pull request #22 from SimonIT/ol-start-attribute Support the start attribute for ordered lists	2020-08-18 18:44:59 +02:00
AlexVonB	9f3d497053	use python3.6 for linting	2020-08-18 18:41:46 +02:00
AlexVonB	d2fc689b66	set max flake8 version again3	2020-08-18 18:39:20 +02:00
AlexVonB	ab78385b56	set max flake8 version again2	2020-08-18 18:38:17 +02:00
AlexVonB	9ebf726e78	set max flake8 version again	2020-08-18 18:37:39 +02:00
AlexVonB	3f8403aa7a	set max flake8 version	2020-08-18 18:35:31 +02:00
AlexVonB	5b6e76f984	Create python-app.yml	2020-08-18 18:30:55 +02:00
SimonIT	04711027e6	Replace downloads badge	2020-08-13 20:11:18 +02:00
SimonIT	ca98892953	Support the start attribute for ordered lists	2020-08-11 11:43:02 +02:00
AlexVonB	0dc281e6ea	Bump version 0.5.1	2020-08-11 09:51:04 +02:00
AlexVonB	4e6e20e756	Merge pull request #21 from matthewwithanm/python-publish Create python-publish.yml	2020-08-11 09:49:29 +02:00
Matthew Dapena-Tretter	9358522c73	Create python-publish.yml Add workflow for publishing to PyPI.	2020-08-10 19:42:48 -07:00
SimonIT	28d7a22da3	Remove alt because it makes some trouble	2020-08-10 17:42:18 +02:00
SimonIT	8b882ca3c9	Add some fancy badges	2020-08-10 16:24:00 +02:00
AlexVonB	1078610066	ignore build folder	2020-08-10 13:03:12 +02:00
AlexVonB	d23dbc77e4	Merge branch 'master' into develop	2020-08-10 13:01:34 +02:00
AlexVonB	0c4b856b9c	Bump to 0.5.0	2020-08-09 21:22:15 +02:00
AlexVonB	e9cc01938a	Merge branch 'develop'	2020-08-09 21:20:44 +02:00
AlexVonB	aceced68eb	cleaning up changes with help of linter	2020-08-09 21:17:39 +02:00
AlexVonB	3b049cdb9c	added egg dirs to gitignore	2020-08-09 21:13:33 +02:00
AlexVonB	b747378b52	fixed nested lists and wrote correct tests nested lists did not work: after a nested list was over, a new line was inserted. this leads to a large gap before the rest of the parent list. lists are prefixed and suffixed with a single newline, this is now represented in the tests.	2020-08-09 21:11:16 +02:00
AlexVonB	ee73d89879	Merge pull request #14 from AlexVonB/fix-inline-spaces remove prefixed and suffixed spaces from inline tags	2020-08-09 20:24:23 +02:00
AlexVonB	5563161c86	remove needless checks for emtpy text	2019-07-12 10:23:17 +02:00
AlexVonB	28e447d9ae	remove prefixed and suffixed spaces from inline tags fixes matthewwithanm#13	2019-07-11 23:27:52 +02:00
Matthew Dapena-Tretter	89d14f4487	Merge pull request #11 from AlexVonB/AlexVonB-patch-1 Add newline before and after a markdown list	2019-07-04 08:53:25 -07:00
AlexVonB	5f9243d91d	added tests for matthewwithanm#11	2019-07-04 16:32:21 +02:00
AlexVonB	d0f688d2e4	Add newline before and after a markdown list Fixes matthewwithanm#5 as well as an issue where `<p>foo<p><ul><li>bar</li></ul>` gets converted to `foo * bar` which is not correct	2019-07-04 16:26:09 +02:00
Jonathan Vanasco	5ac08522be	updating classifer to mit license issue #9	2019-06-19 16:17:47 -07:00
Thomas Lange	78afcc173e	Adding MIT license file	2018-10-16 19:11:02 -07:00
Steven Skoczen	b132a6f5b3	Updates to 0.4.1, pkgmeta included directly in setup.	2017-11-28 12:07:31 +13:00
Steven Skoczen	0abe0a29e8	Merge pull request #2 from crhallberg/html-parser Suppress BeautifulSoup warning	2017-11-13 08:48:45 +13:00
Steven Skoczen	4932df631f	Merge pull request #1 from dmpayton/develop Fixes to get tests passing in Python 3.	2017-11-13 08:48:38 +13:00
Chris Hallberg	8696e2bde1	Suppress BeautifulSoup warning by explicitly passing in the default parser as recommended by the error message: ``` /home/challberg/.local/lib/python2.7/site-packages/bs4/__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. The code that caused this warning is on line 35 of the file unroll.py. To get rid of this warning, change code that looks like this: BeautifulSoup(YOUR_MARKUP}) to this: BeautifulSoup(YOUR_MARKUP, "html.parser") markup_type=markup_type)) ```	2017-06-12 16:03:04 -04:00
dmpayton	ee53d85c41	Fixes to get tests passing in Python 3.	2016-02-23 15:15:29 -08:00
Matthew Tretter	53ba0daa77	Document options	2013-07-31 23:23:44 -04:00
Matthew Tretter	fb98e9878f	Bump to 0.4.0	2013-07-31 23:12:53 -04:00
Matthew Tretter	aa10053fbb	Test custom bullets	2013-07-31 23:11:39 -04:00
Matthew Tretter	253a34c2d7	Test nested unordered lists	2013-07-31 23:08:39 -04:00
Matthew Tretter	3ea09609e6	Add support for "bullets" option	2013-07-31 23:08:36 -04:00
Matthew Tretter	1cd8e56c47	Test ATX and ATX_CLOSED style headings	2013-07-31 22:19:41 -04:00
Matthew Tretter	891a4a8d08	Add "heading_style" option Allow the user to specify a heading style.	2013-07-31 22:17:22 -04:00
Matthew Tretter	e5a1784f30	Remove unneeded raw string	2013-07-31 21:59:35 -04:00
Matthew Tretter	f60d910335	Add "autolinks" option This option allows you to disable the creation of "autolink" style links.	2013-07-31 21:58:48 -04:00
Matthew Tretter	d707d107f6	Support inner Options class	2013-07-31 21:55:30 -04:00
Matthew Tretter	1ef4dd1468	Add shortcut link syntax	2013-07-31 19:23:39 -04:00
Matthew Tretter	934c97b342	Test img tag conversion	2013-07-31 19:23:38 -04:00
Matthew Tretter	8a1e2d9403	Add simple img conversion	2013-07-31 19:23:36 -04:00
Matthew Tretter	5563723cbc	Bump to 0.3.0	2013-07-31 18:16:02 -04:00
Matthew Tretter	a9c13a56da	Identify and single out HTML fragment	2013-07-31 18:13:50 -04:00
Matthew Tretter	7bdeb15b18	Use bs4 This causes a lot more tests to fail. But it'll be worth it in the end.	2013-07-31 18:01:52 -04:00
Matthew Tretter	87c8f3bd5e	Add development notes to README	2013-07-31 17:20:36 -04:00
Matthew Tretter	0211ac6619	Lint code	2013-07-31 17:20:36 -04:00
Matthew Tretter	2515e9e107	Add lint command	2013-07-31 17:20:32 -04:00
Matthew Tretter	ece61a5b1f	Bump to 0.2.0	2013-07-31 17:11:12 -04:00
Matthew Tretter	f46fb8ebbb	Add short description to README	2013-07-31 17:05:37 -04:00
Matthew Tretter	e521fd402f	Add manifest template	2013-07-31 16:55:53 -04:00
Matthew Tretter	fd6f8db132	Add gitignore	2013-07-31 16:55:30 -04:00
Matthew Tretter	c2f32b8049	Switch to pytest	2013-07-31 16:54:37 -04:00
Matthew Tretter	b92428466d	Change name to markdownify	2013-07-31 16:41:08 -04:00
Matthew Tretter	7f75b0bbce	Update package meta	2013-07-31 16:40:56 -04:00