Commit Graph

124 Commits

Author SHA1 Message Date
Joseph Myers
a369e07211 More selective escaping of -#.) (alternative approach)
This is a partial alternative to #122 (open since April) for more
selective escaping of some special characters.

Here, we fix the test function naming (as noted in that PR) so the
tests are actually run (and fix some incorrect test assertions so they
pass).  We also make escaping of `-#.)` (the most common cases of
unnecessary escaping in my use case) more selective, while still being
conservatively safe in escaping all cases of those characters that
might have Markdown significance (including in the presence of
wrapping, unlike in #122).  (Being conservatively safe doesn't include
the cases where `.` or `)` start a fragment, where the existing code
already was not conservatively safe.)

There are certainly more cases where the code could also be made more
selective while remaining conservatively safe (including in the
presence of wrapping), so this is not a complete replacement for #122,
but by fixing some of the most common cases in a safe way, and getting
the tests actually running, I hope this allows progress to be made
where the previous attempt appears to have stalled, while still
allowing further incremental progress with appropriately safe logic
for other characters where useful.
2024-10-02 21:59:39 +00:00
microdnd
51390d7389 handle ol start value is not number (#127)
Co-authored-by: Mico <mico_wu@trendmicro.com>
2024-06-23 14:28:53 +02:00
AlexVonB
50b4640db2 better naming for markup variables 2024-06-23 13:30:08 +02:00
Joseph Myers
7861b330cd Special-case use of HTML tags for converting <sub> / <sup> (#119)
Allow different strings before / after `<sub>` / `<sup>` content

In particular, this allows setting `sub_symbol='<sub>'`,
`sup_symbol='<sup>'`, to use raw HTML in the output when
converting subscripts and superscripts.
2024-06-23 13:28:05 +02:00
AlexVonB
2ec33384de handle un-parsable colspan values
fixes #126
2024-06-23 13:17:20 +02:00
Joseph Myers
46af45bb3c Escape all characters with Markdown significance (#118)
* Escape all characters with Markdown significance

There are many punctuation characters that sometimes have significance
in Markdown; more systematically escape them all (based on a new
escape_misc configuration option).

A limited attempt is made to limit the escaping of '.' and ')' to the
context where they might have Markdown significance (after a number,
where they can indicate an ordered list item); no such attempt is made
for the other characters (and even that limiting of '.' and ')' may
not be entirely safe in all cases, as it's possible the HTML could
have the number outside the block being escaped in one go,
e.g. `<span>1</span>.`.

---------

Co-authored-by: AlexVonB <AlexVonB@users.noreply.github.com>
2024-04-04 21:42:58 +02:00
Joseph Myers
2bd0772685 Avoid inline styles inside <code> / <pre> conversion (#117)
* Avoid inline styles inside `<code>` / `<pre>` conversion

The check used for this is analogous to that used to avoid escaping
potential markup characters inside such tags.

Fixes #103

---------

Co-authored-by: AlexVonB <AlexVonB@users.noreply.github.com>
2024-04-04 20:55:54 +02:00
Eric Xu
3b4a014f25 Table merge cell horizontally (#110)
* Fix #109 Table merge cell horizontally

* Add test case for colspan

---------

Co-authored-by: AlexVonB <AlexVonB@users.noreply.github.com>
2024-03-26 21:50:54 +01:00
Chris Papademetrious
d5fb0fbb85 make sure there are blank lines around table/figure captions (#114)
Signed-off-by: chrispy <chrispy@synopsys.com>
Co-authored-by: AlexVonB <AlexVonB@users.noreply.github.com>
2024-03-26 21:41:56 +01:00
huuya
e4df41225d Support conversion of header rows in tables without th tag (#83)
* Fixed support for header row conversion for tables without th tag
2024-03-26 21:32:36 +01:00
André van Delft
2f9a42d3b8 Strip text before adding blockquote markers (#76) 2024-03-26 21:07:28 +01:00
AlexVonB
96a25cfbf3 added tests for linebreaks in table cells 2024-03-26 21:05:31 +01:00
Carina de Oliveira Antunes
0477a0c8a0 convert_td: strip text (#91) 2024-03-26 20:49:50 +01:00
Veronika Butkevich
f33ccd7c1a Fix newline start in header tags (#89)
* Fix newline start in header tags
2024-03-26 20:46:30 +01:00
Thomas L. Kjeldsen
60967c1c95 ignore script and style content (such as css and javascript) (#112) 2024-03-11 21:07:24 +01:00
chrispy
2b22d239ad avoid text normalization/escaping in any preformatted/code context
Signed-off-by: chrispy <chrispy@synopsys.com>
2024-01-15 10:53:14 -05:00
Alex
4fb451ffa6 fixed cli parameters
closes #75
2022-09-02 08:44:41 +02:00
AlexVonB
eddfdae4ca fix cli options: default heading, em symbols 2022-08-31 21:44:42 +02:00
Adam Bambuch
17d8586843 don't escape text in pre tag (Fenced Code Blocks) (#67)
don't escape text in pre tag (Fenced Code Blocks)
2022-08-28 20:58:54 +02:00
Daniel J. Perry
e79971a7eb Add console entry point (#72)
* Add console entry point

* Make entry point conform to linter settings.
2022-08-28 20:53:15 +02:00
AlexVonB
5f1b98e25d added wrap option
closes #66
2022-04-24 11:00:04 +02:00
Mikko Korpela
ebb9ea713d Fix detection of "first row, not headline" (#63)
Improved handling of "first row, not headline".

Works for tables with
1) neither thead nor tbody
2) tbody but no thead
2022-04-14 10:24:32 +02:00
AlexVonB
35479d2d3b Merge branch 'code_language_callback' of https://github.com/tdgroot/python-markdownify into tdgroot-code_language_callback 2022-04-13 20:25:37 +02:00
AlexVonB
b589863715 add escaping of asterisks and option to disable it
closes #62
2022-04-13 20:04:12 +02:00
AlexVonB
423b7e948c add option to allow inline images in selected tags
fixes #61
2022-04-13 19:55:34 +02:00
Timon de Groot
0ea95de4d0 Add code language callback 2022-04-09 13:22:28 +02:00
AlexVonB
0a1343a538 allow BeautifulSoup objects to be converted 2022-01-23 11:00:19 +01:00
AlexVonB
bd6b581122 add option to not escape underscores
closes #59
2022-01-18 08:51:44 +01:00
AlexVonB
cb2646cd93 differentiated between text and code language 2021-11-17 17:03:31 +01:00
Umberto Grando
ac68c53a7d added language for multiline code 2021-11-01 21:19:35 +01:00
Viktor Hozhyi
044615eff1 Fixed issue #52 - added stripping of text to list 2021-09-04 12:39:30 +03:00
AlexVonB
0fdeb1ff6e convert tags inside table cells as inline
in part resolves #49
2021-08-25 08:48:30 +02:00
AlexVonB
16d8a0e1f7 Revert "add figure/figcaption"
This reverts commit 828e116530.
2021-07-11 13:12:16 +02:00
AlexVonB
4aa6cf2a24 rewrote text processing to not escape _ in code
fixes #47
2021-07-11 13:10:59 +02:00
AlexVonB
828e116530 add figure/figcaption
for #46
2021-06-30 13:02:42 +02:00
AlexVonB
a6a31624ad add options for sub and sup tags
fixes #44
2021-05-30 19:07:43 +02:00
AlexVonB
8f6d7e500d add option 'default_title' to links
fixes #39
2021-05-30 18:40:40 +02:00
AlexVonB
129c4ef060 ignore doctype tag, test cdata tag
fixes #45
2021-05-30 11:18:18 +02:00
AlexVonB
70ef9b6e48 added pre tag
closes #15
2021-05-21 14:15:41 +02:00
AlexVonB
91d53ddd5a refactor simple inline conversions 2021-05-21 13:53:00 +02:00
AlexVonB
079f32f6cd added del and s tags 2021-05-21 12:27:49 +02:00
AlexVonB
89b577e91e ordering functions alphabetically 2021-05-21 12:21:21 +02:00
AlexVonB
77797ebb79 Merge branch 'andrewcrichards/add_code_samp_kbd_tags' of https://github.com/AndrewCRichards/python-markdownify into AndrewCRichards-andrewcrichards/add_code_samp_kbd_tags 2021-05-21 12:11:59 +02:00
AlexVonB
ea81407b87 implemented table parsing correctly
instead of manually walking down the dom tree
in a table, we now rely on the main descent loop
and just implement conversion for rows and cells
correctly. this enables the use of html inside a
table cell.
2021-05-17 14:00:00 +02:00
AlexVonB
e6da15c173 allow tables with headers in first (or any) column 2021-05-17 12:36:48 +02:00
AlexVonB
7dac92e85e Allow for tables without header row
fixes #42
2021-05-16 19:02:04 +02:00
Jiulong Wang
ddfbf6a364 Keep important spaces in <li> element 2021-05-10 16:07:54 -07:00
Jiulong Wang
91a64e3cd4 Fix missing whitespaces in <li> node 2021-05-10 14:42:05 -07:00
AlexVonB
73800ced36 fixed whitespace issues at nested lists 2021-05-02 13:44:09 +02:00
AlexVonB
1538cacb94 Merge branch 'develop' into ordere-list-update 2021-05-02 10:58:13 +02:00