259 Commits

Author SHA1 Message Date
Joseph Myers
c2ffe46e85 Fix whitespace issues around wrapping
This fixes various issues relating to how input whitespace is handled
and how wrapping handles whitespace resulting from hard line breaks.

This PR uses a branch based on that for #120 to avoid conflicts with
the fixes and associated test changes there.  My suggestion is thus
first to merge #120 (which fixes two open issues), then to merge the
remaining changes from this PR.

Wrapping paragraphs has the effect of losing all newlines including
those from `<br>` tags, contrary to HTML semantics (wrapping should be
a matter of pretty-printing the output; input whitespace from the HTML
input should be normalized, but `<br>` should remain as a hard line
break).  To fix this, we need to wrap the portions of a paragraph
between hard line breaks separately.  For this to work, ensure that
when wrapping, all input whitespace is normalized at an early stage,
including turning newlines into spaces.  (Only ASCII whitespace is
handled this way; `\s` is not used as it's not clear Unicode
whitespace should get such normalization.)
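
As an illustration of the approach described above, here is a minimal sketch using only the standard library. It is not the project's actual code: the helper name `wrap_paragraph`, the `segments` argument (the text pieces found between `<br>` tags) and the two-spaces-plus-newline hard break marker are assumptions made for this example.

```python
import re
import textwrap

# ASCII whitespace only; `\s` is deliberately avoided so that Unicode
# whitespace such as U+00A0 (no-break space) is left untouched.
ASCII_WHITESPACE = re.compile(r'[\t\n\r ]+')

HARD_BREAK = '  \n'  # assumed Markdown hard line break marker


def wrap_paragraph(segments, width=80):
    """Wrap each segment between hard line breaks separately (hypothetical helper)."""
    wrapped = []
    for segment in segments:
        # Normalize input whitespace early, turning newlines into spaces.
        normalized = ASCII_WHITESPACE.sub(' ', segment).strip(' ')
        wrapped.append(textwrap.fill(normalized, width=width,
                                     break_long_words=False,
                                     break_on_hyphens=False))
    # Rejoin with hard breaks so that `<br>` survives the wrapping.
    return HARD_BREAK.join(wrapped)
```

Because every segment is wrapped on its own, the soft newlines introduced by wrapping never replace the hard breaks between segments.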

When not wrapping, there is still too much input whitespace
preservation.  If the input contains a blank line, that ends up as a
paragraph break in the output, or breaks the header formatting when
appearing in a header tag, though in terms of HTML semantics such a
blank line is no different from a space.  In the case of an ATX
header, even a single newline appearing in the output breaks the
Markdown.  Thus, when not wrapping, arrange for input whitespace
containing at least one `\r` or `\n` to be normalized to a single
newline, and in the ATX header case, normalize to a space.
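
The non-wrapping rule can be sketched as follows. This is an assumption-laden illustration rather than the committed implementation: the function name `normalize_newlines` and the `in_atx_heading` flag are hypothetical, and whitespace runs without any `\r` or `\n` are simply left alone here.

```python
import re

# A run of ASCII whitespace that contains at least one `\r` or `\n`.
NEWLINE_RUN = re.compile(r'[\t ]*[\r\n][\t\r\n ]*')


def normalize_newlines(text, in_atx_heading=False):
    """Collapse newline-containing whitespace runs (hypothetical helper)."""
    # Inside an ATX header even a single newline breaks the Markdown,
    # so the run becomes a space there and a single newline elsewhere.
    replacement = ' ' if in_atx_heading else '\n'
    return NEWLINE_RUN.sub(replacement, text)
```

With this rule a blank line in the input no longer turns into a paragraph break in the output.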

Fixes #130
(probably, not sure exactly what the HTML input there is)

Fixes #88
(a related case, anyway; the actual input in #88 has already been fixed)
2024-10-03 00:30:50 +00:00
Joseph Myers
4399ee75db Merge branch 'develop' into para-newlines-92-98 2024-09-30 18:05:32 +00:00
AlexVonB
964d89fa8a bump to version v0.13.1 2024-07-14 22:40:02 +02:00
AlexVonB
46dc1a002d Migrated the metadata into PEP 621-compliant pyproject.toml (#138)
* Move the metadata from `setup.py` into `setup.cfg`.
Added `pyproject.toml`.
Removed `setup.py` - it is no longer needed.
Got rid of tests erroneously finding their way into the wheel.

* Started populating version automatically from git tags using `setuptools_scm`.

* Migrated the metadata into `PEP 621`-compliant `pyproject.toml`, got rid of `setup.cfg`.

* test build in develop and pull requests

* use static version instead of dynamic git tag info

---------

Co-authored-by: KOLANICH <kolan_n@mail.ru>
2024-07-14 22:38:29 +02:00
AlexVonB
f6c8daf8a5 bump to v0.13.0 2024-07-14 21:19:35 +02:00
AlexVonB
75a678dab9 fix pytest version to 8 2024-07-14 21:02:49 +02:00
AlexVonB
0a5c89aa49 added test for ol start check 2024-06-23 14:30:07 +02:00
microdnd
51390d7389 handle ol start value that is not a number (#127)
Co-authored-by: Mico <mico_wu@trendmicro.com>
2024-06-23 14:28:53 +02:00
AlexVonB
50b4640db2 better naming for markup variables 2024-06-23 13:30:08 +02:00
Joseph Myers
7861b330cd Special-case use of HTML tags for converting <sub> / <sup> (#119)
Allow different strings before / after `<sub>` / `<sup>` content

In particular, this allows setting `sub_symbol='<sub>'`,
`sup_symbol='<sup>'`, to use raw HTML in the output when
converting subscripts and superscripts.
2024-06-23 13:28:05 +02:00
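
One way to picture the `sub_symbol` / `sup_symbol` behaviour from the commit above is the sketch below. It is only an illustration: the helper `wrap_inline` is hypothetical, and deriving the closing tag from the opening one is an assumption about how "different strings before / after" could be produced.

```python
def wrap_inline(text, symbol):
    """Surround text with an inline marker or an HTML tag pair (hypothetical helper)."""
    if symbol.startswith('<') and symbol.endswith('>'):
        # An HTML-style symbol such as '<sub>' needs a distinct closing tag.
        closing = '</' + symbol[1:]
    else:
        # A plain Markdown-style symbol is the same before and after.
        closing = symbol
    return symbol + text + closing
```

For example, `wrap_inline('2', '<sub>')` gives `<sub>2</sub>`, while `wrap_inline('2', '~')` gives `~2~`.
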
AlexVonB
2ec33384de handle un-parsable colspan values
fixes #126
2024-06-23 13:17:20 +02:00
samypr100
c1672aee44 Update MANIFEST.in to exclude tests during packaging (#125) 2024-06-23 12:59:14 +02:00
Joseph Myers
60d86663d7 More carefully separate inline text from block content
There are various cases in which inline text fails to be separated by
(sufficiently many) newlines from adjacent block content.  A paragraph
needs a blank line (two newlines) separating it from prior text, as
does an underlined header; an ATX header needs a single newline
separating it from prior text.  A list needs at least one newline
separating it from prior text, but in general two newlines (for an
ordered list starting other than at 1, which will only be recognized
given a blank line before).

To avoid accumulation of more newlines than necessary, take care when
concatenating the results of converting consecutive tags to remove
redundant newlines (keeping the greater of the number ending the prior
text and the number starting the subsequent text).
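
The concatenation rule in the paragraph above could look roughly like this. It is a sketch under stated assumptions, not the code from the commit; the name `join_chunks` is made up for the example.

```python
def join_chunks(prior, following):
    """Join two converted chunks without accumulating newlines (hypothetical helper)."""
    prior_stripped = prior.rstrip('\n')
    following_stripped = following.lstrip('\n')
    trailing = len(prior) - len(prior_stripped)          # newlines ending the prior text
    leading = len(following) - len(following_stripped)   # newlines starting the next text
    # Keep the greater of the two counts instead of their sum.
    return prior_stripped + '\n' * max(trailing, leading) + following_stripped
```

Joining `'text\n'` with `'\n\npara\n\n'` this way yields `'text\n\npara\n\n'`: the paragraph still gets its blank line, but no third newline piles up.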

This is thus an alternative to #108 that tries to avoid the excess
newline accumulation that was a concern there, as well as fixing more
cases than just paragraphs, and updating tests.

Fixes #92

Fixes #98
2024-04-09 16:54:33 +00:00
AlexVonB
43dbe20aaf fixed github action badges
see https://github.com/badges/shields/issues/8671
2024-04-04 21:50:02 +02:00
Joseph Myers
46af45bb3c Escape all characters with Markdown significance (#118)
* Escape all characters with Markdown significance

There are many punctuation characters that sometimes have significance
in Markdown; more systematically escape them all (based on a new
escape_misc configuration option).

An attempt is made to restrict the escaping of '.' and ')' to the
context where they might have Markdown significance (after a number,
where they can indicate an ordered list item); no such attempt is made
for the other characters.  Even that restriction of '.' and ')' may
not be entirely safe in all cases, as the HTML could have the number
outside the block being escaped in one go, e.g. `<span>1</span>.`.
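
A context-limited escape of '.' and ')' might be sketched as below. The regular expression and the name `escape_ordered_list_markers` are assumptions for illustration and may differ from what the commit actually does.

```python
import re

# A number of up to nine digits at the start of the text or after
# whitespace, followed by '.' or ')' and then whitespace or the end.
LIST_MARKER = re.compile(r'(^|\s)(\d{1,9})([.)])(?=\s|$)')


def escape_ordered_list_markers(text):
    """Backslash-escape '.' / ')' only where they could start a list item."""
    return LIST_MARKER.sub(r'\1\2\\\3', text)
```

The caveat from the message shows up here too: if the number and the punctuation arrive in separate text chunks, as with `<span>1</span>.`, the chunk `.` contains no number and is left unescaped.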

---------

Co-authored-by: AlexVonB <AlexVonB@users.noreply.github.com>
2024-04-04 21:42:58 +02:00
Joseph Myers
2bd0772685 Avoid inline styles inside <code> / <pre> conversion (#117)
* Avoid inline styles inside `<code>` / `<pre>` conversion

The check used for this is analogous to that used to avoid escaping
potential markup characters inside such tags.

Fixes #103

---------

Co-authored-by: AlexVonB <AlexVonB@users.noreply.github.com>
2024-04-04 20:55:54 +02:00
AlexVonB
74ddc408cc bump to v0.12.1 2024-03-26 21:56:00 +01:00
Eric Xu
3b4a014f25 Table merge cell horizontally (#110)
* Fix #109 Table merge cell horizontally

* Add test case for colspan

---------

Co-authored-by: AlexVonB <AlexVonB@users.noreply.github.com>
2024-03-26 21:50:54 +01:00
AlexVonB
57d4f37923 fixed tests for table caption 2024-03-26 21:43:25 +01:00
Chris Papademetrious
d5fb0fbb85 make sure there are blank lines around table/figure captions (#114)
Signed-off-by: chrispy <chrispy@synopsys.com>
Co-authored-by: AlexVonB <AlexVonB@users.noreply.github.com>
2024-03-26 21:41:56 +01:00
huuya
e4df41225d Support conversion of header rows in tables without th tag (#83)
* Fixed support for header row conversion for tables without th tag
2024-03-26 21:32:36 +01:00
AlexVonB
804a3f8f07 added further readme for custom converters 2024-03-26 21:21:45 +01:00
Chris Papademetrious
7d0bf46057 revert workaround example in README.rst for <script> and <style> now that it is properly fixed (#115)
Signed-off-by: chrispy <chrispy@synopsys.com>
2024-03-26 21:15:22 +01:00
André van Delft
2f9a42d3b8 Strip text before adding blockquote markers (#76) 2024-03-26 21:07:28 +01:00
AlexVonB
96a25cfbf3 added tests for linebreaks in table cells 2024-03-26 21:05:31 +01:00
Carina de Oliveira Antunes
0477a0c8a0 convert_td: strip text (#91) 2024-03-26 20:49:50 +01:00
Veronika Butkevich
f33ccd7c1a Fix newline start in header tags (#89)
* Fix newline start in header tags
2024-03-26 20:46:30 +01:00
G
a2f82678f7 Add no css example to readme (#111)
* Add no css example

---------

Co-authored-by: G <17325189+Chichilele@users.noreply.github.com>
2024-03-11 21:10:08 +01:00
Thomas L. Kjeldsen
60967c1c95 ignore script and style content (such as css and javascript) (#112) 2024-03-11 21:07:24 +01:00
Chris Papademetrious
c7718b6d81 Merge pull request #104 from chrispy-snps/fix/97-101-102
improve text normalization/escaping for preformatted/code contexts
2024-01-15 15:46:51 -05:00
chrispy
2b22d239ad avoid text normalization/escaping in any preformatted/code context
Signed-off-by: chrispy <chrispy@synopsys.com>
2024-01-15 10:53:14 -05:00
AlexVonB
e6e23fd512 bump to v0.11.6 2022-09-02 10:10:27 +02:00
Alex
433fad2dec added nix shell file 2022-09-02 08:50:45 +02:00
Alex
4fb451ffa6 fixed cli parameters
closes #75
2022-09-02 08:44:41 +02:00
AlexVonB
e8d041c251 bump to v0.11.5 2022-08-31 21:45:24 +02:00
AlexVonB
f729c3ba43 first test, then lint 2022-08-31 21:44:53 +02:00
AlexVonB
eddfdae4ca fix cli options: default heading, em symbols 2022-08-31 21:44:42 +02:00
AlexVonB
50b3b73a8f bump to v0.11.4 2022-08-28 22:03:14 +02:00
AlexVonB
0310216877 fixed readme and added linter to detect this earlier 2022-08-28 22:02:49 +02:00
AlexVonB
9914474828 bump to v0.11.3 2022-08-28 21:42:46 +02:00
AlexVonB
6263f0e5f0 Switch to tox for tests (#73) 2022-08-28 21:40:52 +02:00
Adam Bambuch
17d8586843 don't escape text in pre tag (Fenced Code Blocks) (#67)
don't escape text in pre tag (Fenced Code Blocks)
2022-08-28 20:58:54 +02:00
AlexVonB
59eb069700 added readme for cli 2022-08-28 20:56:23 +02:00
Daniel J. Perry
e79971a7eb Add console entry point (#72)
* Add console entry point

* Make entry point conform to linter settings.
2022-08-28 20:53:15 +02:00
AlexVonB
5adda130b8 bump to v0.11.2 2022-04-24 11:01:29 +02:00
AlexVonB
5f1b98e25d added wrap option
closes #66
2022-04-24 11:00:04 +02:00
AlexVonB
16acd2b763 typo in readme 2022-04-24 10:59:22 +02:00
AlexVonB
207d0f4ec6 bump to v0.11.1 2022-04-14 10:25:25 +02:00
Mikko Korpela
ebb9ea713d Fix detection of "first row, not headline" (#63)
Improved handling of "first row, not headline".

Works for tables with
1) neither thead nor tbody
2) tbody but no thead
2022-04-14 10:24:32 +02:00
AlexVonB
87b9f6c88e bump to v0.11.0 2022-04-13 20:47:30 +02:00