:imagesdir: png
:source-highlighter: rouge
:toc:
:toclevels: 5

# Some lwIP Notes

lwIP is used to give YAPicoprobe a network interface, which avoids
an uncountable number of additional CDC COM ports.

Currently a Segger SysView server is listening on port 19111.
But I guess there is more to come.

## Pitfalls

lwIP had some traps waiting for me. Despite reading the official
https://www.nongnu.org/lwip/2_1_x/pitfalls.html[Common Pitfalls],
I fell into some of them.

### MAC

Do not use a random MAC! At least not for the first byte.
I was "lucky" and chose one with an odd first byte. Unfortunately
bit 0 of the first byte marks a group address. At least Linux rejects such MAC
addresses with a more or less useless error message in the syslog.
It took a while until I found out that the MAC address was the culprit
for the missing communication. +
Finally I used 0xfe as the first byte, and the remaining five
bytes of the MAC address are copied from the last bytes of the Pico's serial number
(which actually is the serial number of the external flash).

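
The MAC rule above can be sketched in a few lines of C. This is only a minimal illustration, not the probe's actual code: the function names `mac_from_serial()` and `mac_is_group()` are made up for this example, and the serial number is assumed to be available as a plain byte array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Build a MAC address from a device serial number (hypothetical helper).
 * The first byte is fixed to 0xfe: bit 0 is clear (no group address),
 * bit 1 is set (locally administered address).
 * The remaining five bytes are the last five bytes of the serial number.
 * Returns 0 on success, -1 if the arguments are unusable. */
static int mac_from_serial(const uint8_t *serial, size_t serial_len,
                           uint8_t mac[MAC_LEN])
{
    if (serial == NULL || mac == NULL || serial_len < MAC_LEN - 1) {
        return -1;
    }
    mac[0] = 0xfe;
    memcpy(&mac[1], &serial[serial_len - (MAC_LEN - 1)], MAC_LEN - 1);
    return 0;
}

/* Returns non-zero if bit 0 of the first byte marks a group address. */
static int mac_is_group(const uint8_t mac[MAC_LEN])
{
    return (mac[0] & 0x01u) != 0;
}
```

A check like `mac_is_group()` at startup would have caught my "lucky" random address long before Linux complained about it.
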
### OS Mode

*Really* take seriously what they write about the "raw" APIs:
use only the TCPIP thread for any calls to them. +
Use `tcpip_callback_with_block(<func>, <void-arg>, 0)` for
non-blocking invocations of a function. This also saves you
from creating extra simple threads for communication tasks like
in the TinyUSB/lwIP glue code. That was my first idea and it took
a while until I found out how bad that idea was. +
The same was true for the thread(!) which stuffed data for SysView into
its server: bad idea! +
Wrong API handling resulted in random crashes or connection
disruptions.

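
The idea behind the rule can be shown with a small self-contained model. This is not lwIP code: `demo_post()` is a made-up stand-in for `tcpip_callback_with_block(<func>, <void-arg>, 0)`, `demo_drain()` stands for the loop of the TCPIP thread, and the locking a real RTOS port needs around the queue is left out on purpose.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*demo_fn_t)(void *arg);

typedef struct {
    demo_fn_t fn;
    void     *arg;
} demo_msg_t;

#define DEMO_QUEUE_LEN 8

static demo_msg_t demo_queue[DEMO_QUEUE_LEN];
static size_t demo_head;   /* index of the oldest queued message */
static size_t demo_count;  /* number of queued messages          */

/* Hypothetical stand-in for a non-blocking post to the owner thread:
 * never waits, reports failure if the queue is full. */
static bool demo_post(demo_fn_t fn, void *arg)
{
    if (fn == NULL || demo_count == DEMO_QUEUE_LEN) {
        return false;
    }
    size_t tail = (demo_head + demo_count) % DEMO_QUEUE_LEN;
    demo_queue[tail].fn  = fn;
    demo_queue[tail].arg = arg;
    demo_count++;
    return true;
}

/* Runs in the single owner context: executes all queued functions
 * in FIFO order and returns how many were executed. */
static size_t demo_drain(void)
{
    size_t ran = 0;
    while (demo_count > 0) {
        demo_msg_t m = demo_queue[demo_head];
        demo_head = (demo_head + 1) % DEMO_QUEUE_LEN;
        demo_count--;
        m.fn(m.arg);
        ran++;
    }
    return ran;
}

/* Placeholder for work that would touch the raw API (e.g. sending data). */
static void demo_send_sysview(void *arg)
{
    int *sent = arg;
    (*sent)++;
}
```

Any other task only calls `demo_post(demo_send_sysview, &counter)`; the function itself runs later in the owner context, so the raw API is never entered from two contexts at once.
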
### RNDIS/ECM/NCM

To be honest, I'm very confused about the system behaviour of RNDIS/ECM/NCM.
Sometimes the host gets "disconnected" because the DNS entries in `/etc/resolv.conf`
are changed, sometimes not. Sometimes the probe needs `dhclient` to get
the TCP/IP connection, sometimes not. Sometimes the probe has an IPv6 address, sometimes
not. And all this just on my Linux host. Interoperation with Windows
makes things even worse. +
And `/etc/network/interfaces` generates error
messages even when the device has `allow-hotplug`.

* *RNDIS*: this was my former favorite, because it is supported by all
relevant OSs. Throughput also seemed to be good.
Unfortunately RNDIS seems to manipulate routing in such a way that the
default route on my Linux host wants to go through the probe. Not
really what I want...

[NOTE]
====
RNDIS on Win10 works only if RNDIS is the only USB class selected on the probe.
So it is possible to create a special probe firmware which provides only features
like SystemView, but you cannot have a probe which does SystemView and CMSIS-DAP. +
This is not a fault of lwIP, it is a bug in the Win10 driver(?).
====

* *ECM*: works well, packets are transferred continuously, throughput
also seems to be ok. So this is the way to go. +
Unfortunately there is no driver integrated into Win10, so
extra trouble is to be expected. Yes... extra trouble: I cannot find a driver
for Win10.

* *NCM*: is said to be the best choice. And in fact it is.
At least after creation of an `ncm_device_simple.c` driver, which is a
stripped down version of `ncm_simple.c`, which turned out to be very buggy. +
Now throughput under Linux and Windows is ok. Operation with SystemView
works without glitches, but `iperf` tests sometimes crash the probe.
So consider this driver as beta and work in progress.

## Performance

To measure performance, `iperf` is used (which implies that `OPT_NET_IPERF_SERVER`
must be set at build time). A good command line for measurement:

    iperf -c 192.168.14.1 -e -i 1 -l 1024

## Testing

Good test cases are the following command lines:

    for MSS in 90 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1459 1460 1500; do iperf -c 192.168.14.1 -e -i 1 -l 1024 -M $MSS; sleep 10; done

    for LEN in 40000 16000 1400 1024 800 200 80 22 8 2 1; do for MSS in 90 93 100 150 200 255 256 300 400 500 511 512 600 700 800 900 1000 1100 1200 1300 1400 1450 1459 1460 1500; do iperf -c 192.168.14.1 -e -i 1 -l $LEN -M $MSS; sleep 2; done; done

Monitor performance/errors with Wireshark.

## Some words about...

### net_glue.c

I'm trying really hard to switch context between lwIP and TinyUSB correctly. This leads
to some kind of delayed call chains and makes the code neither nice nor
very maintainable.

### NCM

The TinyUSB NCM driver implementation is more or less buggy, so I'm doing my best
to implement a driver of my own.

Current work consists of debug versions of `ncm_device` and an almost
working `ncm_device_simple`.

Link:

* link:extern/NCM10-20101124-track.pdf[NCM Specification]

#### ncm_device_simple.c

`ncm_device_simple.c` is actually a mixture of `ecm_rndis_device.c` and `ncm_device.c`.
The structure has been inherited from `ecm_rndis_device.c` and the
functionality from `ncm_device.c`. +
The driver can be considered work in progress, because crashes sometimes happen
in conjunction with `iperf`. But for operation with SystemView quality seems to be good enough.

WARNING: There must be a nasty bug in `ncm_device_simple`. It shows up on startup,
because startup time with `ncm_device_simple` is much longer compared to ECM/RNDIS and even
`ncm_device`.

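
For orientation, this is roughly what an NTB (NCM Transfer Block) with exactly one ethernet frame looks like when the 16-bit header format of the NCM specification is used. It is a sketch under assumptions, not the code of `ncm_device_simple.c`: the function name `ntb16_build_single()` is invented, a datagram alignment of 4 bytes is assumed, and the fields are written byte by byte to stay independent of struct packing and endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NTH16_LEN        12u  /* NTB header: signature + four 16-bit fields  */
#define NDP16_LEN        16u  /* datagram pointer table: 1 entry + terminator */
#define NTB_DATA_OFFSET  (NTH16_LEN + NDP16_LEN)

/* Store a 16-bit value in little-endian byte order. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xffu);
    p[1] = (uint8_t)(v >> 8);
}

/* Build an NTB16 holding a single datagram (hypothetical helper).
 * Returns the total block length, or 0 if the frame does not fit. */
static size_t ntb16_build_single(uint8_t *buf, size_t cap, uint16_t seq,
                                 const uint8_t *frame, uint16_t frame_len)
{
    size_t total = (size_t)NTB_DATA_OFFSET + frame_len;
    if (buf == NULL || frame == NULL || frame_len == 0 ||
        total > cap || total > 0xffffu) {
        return 0;
    }

    /* NTH16 */
    memcpy(&buf[0], "NCMH", 4);              /* dwSignature       */
    put_le16(&buf[4], NTH16_LEN);            /* wHeaderLength     */
    put_le16(&buf[6], seq);                  /* wSequence         */
    put_le16(&buf[8], (uint16_t)total);      /* wBlockLength      */
    put_le16(&buf[10], NTH16_LEN);           /* wNdpIndex         */

    /* NDP16, placed directly behind the header */
    uint8_t *ndp = &buf[NTH16_LEN];
    memcpy(&ndp[0], "NCM0", 4);              /* dwSignature, no CRC */
    put_le16(&ndp[4], NDP16_LEN);            /* wLength             */
    put_le16(&ndp[6], 0);                    /* wNextNdpIndex       */
    put_le16(&ndp[8], NTB_DATA_OFFSET);      /* wDatagramIndex[0]   */
    put_le16(&ndp[10], frame_len);           /* wDatagramLength[0]  */
    put_le16(&ndp[12], 0);                   /* terminating entry   */
    put_le16(&ndp[14], 0);

    /* the ethernet frame itself */
    memcpy(&buf[NTB_DATA_OFFSET], frame, frame_len);
    return total;
}
```

With only one datagram per block the pointer table has a fixed size, which is what makes such a driver so much simpler than one that collects several datagrams per NTB.
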
#### possible bugs in ncm_device.c

This is more or less obsoleted by `ncm_device_simple.c`. But as a short summary: the original
driver is very buggy. Perhaps it works in certain scenarios, but certainly not together with
SystemView.

* not sure, but perhaps it is best to call all functions within `ncm_device` in the FreeRTOS
context of TinyUSB
* `wNtbOutMaxDatagrams` must be set to 1 [2023-06-27]
** iperf runs then
** SystemView still has problems
** `wNtbOutMaxDatagrams == 0` generates a lot of retries with iperf
* I guess that the *major problem* lies within `handle_incoming_datagram()`, because it changes values
for an incoming packet although `tud_network_recv_renew()` is still handling the old one
* is multicore a problem!? (2023-07-14: no!) I have seen retries with multicore even with
`wNtbOutMaxDatagrams = 1`
* I think it is assumed that TinyUSB and lwIP are running in the same task (but in my scenario they are not)
* when debug messages are removed, the receive path seems to work better, which
indicates a race condition somewhere

There is an open issue in the TinyUSB repo for this: https://github.com/hathach/tinyusb/issues/2068

## TinyUSB Driver API

### TinyUSB -> Driver

The following API is for calls from TinyUSB to the driver.
The calls are all initiated from within the TinyUSB stack. Thus all are done in the context of TinyUSB.

[%autowidth]
[%header]
|===
|Name | Comment

|netd_init()
|Initialization of the driver on startup. Called several times.

|netd_reset(rhport)
|Called several times on startup. `rhport` seems to be zero in all calls.

|netd_open(rhport, *itf_desc, max_len)
|Connects the USB endpoints. This is called once when the host driver
connects with the device.

|netd_control_xfer_cb(rhport, stage, *request)
|Called after `netd_open()`. Only `stage==CONTROL_STAGE_SETUP` seems to be
of interest.

|netd_xfer_cb(rhport, ep_addr, result, xferred_bytes)
a|Depending on `ep_addr` the driver is told here that a

* packet can be fetched from the stack for further processing within the driver
* packet transmission can be started
* notification packet should be transmitted (that's about communication parameters)
|===

### Driver -> TinyUSB

The driver has a whole bunch of available API calls. The most important are:

[%autowidth]
[%header]
|===
|Name | Comment

|tud_control_status()
|Send STATUS (zero length) packet. Called in `netd_control_xfer_cb()`.

|tud_control_xfer()
|Carry out Data and Status stage of control transfer. Called in `netd_control_xfer_cb()`.

|usbd_edpt_busy()
|Check whether an endpoint is busy or ready for the next `usbd_edpt_xfer()`.

|usbd_edpt_open()
|Used during `netd_open()`.

|usbd_edpt_xfer()
|Submit a USB transfer. For receive operation, the specified buffer must be empty.
For transmit operation, the buffer may not be touched until the corresponding
`netd_xfer_cb()` is received.

|usbd_open_edpt_pair()
|Used during `netd_open()`.
|===

### Glue Logic -> Driver

The following API is for calls from the glue logic to the driver. The glue logic tries hard to issue
the calls in the TinyUSB context as well. But I'm afraid this is not guaranteed (other developers).

[%autowidth]
[%header]
|===
|Name | Comment

|tud_network_can_xmit(size)
|Checks if the driver buffer allows another datagram with the specified size.
If the driver tells the glue logic that there is enough space for the datagram, the glue logic
calls `tud_network_xmit()` in the next step. +
Not sure how recovery works if there is no space left. So at the moment the glue logic
is responsible for retries.

|[.line-through]#tud_network_link_state_cb(state)#
|[.line-through]#seems to be obsolete. No call found within the stack. So do not implement.
PR at TinyUSB pending to remove the call.#

|tud_network_recv_renew()
|Called when the glue logic is of the opinion that the driver should check if it
can enable receive logic. The function has to check if the USB channel
and receive buffer are available. Another option (for NCM) is that there are
still buffered datagrams which can then be transferred via `tud_network_recv_cb()`.

|tud_network_xmit(*ref, arg)
|The glue logic requests a datagram transfer into the driver. The driver may then
prepare for the actual copy operation from the glue logic, which is performed via
`tud_network_xmit_cb(dst, ref, arg)`. Transmission does not have to take place immediately. E.g.
the NCM driver should be capable of buffering multiple datagrams into one
big NCM transfer block.
The call must succeed.
|===

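
The transmit handshake from the table can be modelled in plain C. All `demo_*` names are invented stand-ins that only mirror the calls above; the driver is reduced to a single transmit buffer, and `ref`/`arg` are simply used as frame pointer and frame length, which is an assumption of this sketch and not necessarily how the real glue code uses them.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MTU 1514u

/* Hypothetical single-buffer driver state. */
static uint8_t  demo_tx_buf[DEMO_MTU];
static uint16_t demo_tx_len;   /* 0 means: transmit buffer is free */

/* Glue side, modelled on tud_network_xmit_cb(dst, ref, arg):
 * performs the actual copy into the driver buffer. */
static uint16_t demo_network_xmit_cb(uint8_t *dst, void *ref, uint16_t arg)
{
    memcpy(dst, ref, arg);     /* in this sketch: ref = frame, arg = length */
    return arg;
}

/* Driver side, modelled on tud_network_can_xmit(size). */
static bool demo_network_can_xmit(uint16_t size)
{
    return demo_tx_len == 0 && size <= DEMO_MTU;
}

/* Driver side, modelled on tud_network_xmit(ref, arg).
 * May only be called after demo_network_can_xmit() returned true,
 * therefore it has no way to fail. */
static void demo_network_xmit(void *ref, uint16_t arg)
{
    demo_tx_len = demo_network_xmit_cb(demo_tx_buf, ref, arg);
    /* a real driver would now submit the buffer to the USB stack */
}

/* Models the "transfer done" notification for the transmit endpoint. */
static void demo_tx_complete(void)
{
    demo_tx_len = 0;
}

/* Glue logic: bounded retries, because the driver offers no recovery
 * mechanism when there is no space left. */
static bool demo_glue_send(uint8_t *frame, uint16_t len, unsigned max_tries)
{
    for (unsigned i = 0; i < max_tries; i++) {
        if (demo_network_can_xmit(len)) {
            demo_network_xmit(frame, len);
            return true;
        }
        /* real glue code would yield or reschedule itself here */
    }
    return false;
}
```

The split into "ask first, then transmit" is what allows `tud_network_xmit()` to be a call that must succeed.
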
### Driver -> Glue Logic

The glue logic also provides some API which has to be used by the driver. The driver always
calls the glue logic in the TinyUSB context.

[%autowidth]
[%header]
|===
|Name | Comment

|tud_network_recv_cb(*src, size)
|Transfer a single datagram from the driver to the glue logic. When the layer above the glue logic (lwIP)
has handled the datagram, it issues a `tud_network_recv_renew()` so the process of datagram reception
does not die.

|tud_network_xmit_cb(*dst, *ref, arg)
|The driver calls this function from `tud_network_xmit()` to perform the actual copy operation
of the datagram from the glue logic into the driver. The two parameters `ref` and `arg` are passed
through unchanged by the driver; it only adds the copy destination `dst`.
|===

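
The receive side can be modelled the same way. Again all `demo_*` names are invented; the sketch assumes that the receive callback reports whether the glue logic accepted the datagram and that the driver may hold a few buffered datagrams (as an NCM driver can).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PENDING 4

typedef struct {
    const uint8_t *data;
    uint16_t       len;
} demo_dgram_t;

static demo_dgram_t demo_pending[DEMO_MAX_PENDING]; /* buffered datagrams */
static size_t       demo_pending_rd;
static size_t       demo_pending_cnt;
static bool         demo_glue_busy;   /* glue still owns the last datagram */
static unsigned     demo_delivered;   /* datagrams handed to the glue      */

/* Glue side, modelled on tud_network_recv_cb(src, size). */
static bool demo_network_recv_cb(const uint8_t *src, uint16_t size)
{
    (void)src;
    (void)size;
    if (demo_glue_busy) {
        return false;          /* previous datagram not yet handled */
    }
    demo_glue_busy = true;     /* real glue would pass a copy to lwIP here */
    demo_delivered++;
    return true;
}

/* Driver side, modelled on tud_network_recv_renew(). */
static void demo_network_recv_renew(void)
{
    if (demo_pending_cnt == 0) {
        /* nothing buffered: a real driver would re-arm USB reception here */
        return;
    }
    demo_dgram_t *d = &demo_pending[demo_pending_rd];
    if (demo_network_recv_cb(d->data, d->len)) {
        demo_pending_rd = (demo_pending_rd + 1) % DEMO_MAX_PENDING;
        demo_pending_cnt--;
    }
}

/* Glue side: lwIP has consumed the datagram, keep reception alive. */
static void demo_glue_datagram_done(void)
{
    demo_glue_busy = false;
    demo_network_recv_renew();
}

/* Driver side: a datagram arrived and is buffered until the glue takes it. */
static bool demo_driver_enqueue(const uint8_t *data, uint16_t len)
{
    if (demo_pending_cnt == DEMO_MAX_PENDING) {
        return false;
    }
    size_t wr = (demo_pending_rd + demo_pending_cnt) % DEMO_MAX_PENDING;
    demo_pending[wr].data = data;
    demo_pending[wr].len  = len;
    demo_pending_cnt++;
    return true;
}
```

If the "done" step is ever forgotten, the model stops delivering datagrams, which is exactly the "reception dies" situation mentioned in the table.
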
## The ncm_device_new driver & comparison

The following table holds a comparison between the several network drivers. The first seven bars are
created with
`for MSS in 100 200 400 800 1200 1450 1500; do iperf -c 192.168.14.1 -e -i 1 -M $MSS -l 8192 -P 1; sleep 2; done`.
After that SystemView is started with almost maximum load (~85000 events/s, 325 KByte/s) and after a while
iperf is started again in parallel.

The images are recorded with Wireshark. The red bars are "TCP Window Full" if not otherwise noted.

[%header]
|===
|Driver |

|**ECM** +
The driver shows expected behavior, nothing actually special.
a|image::benchmark-ecm.png[ECM]

|**RNDIS** +
Again nothing special.
a|image::benchmark-rndis.png[RNDIS]

|**Original NCM** +
The original NCM driver is very buggy, as said. The red bars in the graph are not caused by "TCP Window Full".
Obscure messages in Wireshark show that the protocol is more or less garbage.
a|image::benchmark-ncm_device.png[Original NCM]

|**Simple NCM** +
The simple NCM driver behaves much better, but revealed some weaknesses in parallel operation (it also had
some overflows in SystemView without iperf in parallel). It had some stability issues and it also had bugs
on startup of the probe, which were the actual reason to create `ncm_device_new`.
a|image::benchmark-ncm_device_simple.png[Simple NCM]

|**New NCM** +
`ncm_device_new` clearly shows the best behavior. Throughput is best and during parallel operation there was
no packet loss when iperf used large packets. Also no obscure Wireshark messages in parallel operation.
a|image::benchmark-ncm_device_new.png[New NCM]
|===

So obviously `ncm_device_new` is the clear winner: best in performance, best in functionality, best in compatibility.
What else is needed?

## Log

### 2023-05-12

* for unknown reasons the probe is in stutter mode even with ECM. I don't know
why, it worked smoothly before. Transfer rate is bad
* SystemView test program (NoOS) on the target:
** it already worked with around 10000 events/s, now the limit is ~3000
** if there is a SysTick ISR, then SystemView is completely messed up.
Checked that locking is included. Seems to be so.

### 2023-06-26

* after some changes to `rtt_console.c`, `net_sysview.c` and `net_glue.c`
ECM is working again as expected

### 2023-06-30

* for debugging purposes reimplemented `ncm_device_simple.c` which can hold only
one ethernet frame per NTB (NCM Transfer Block). This unfortunately requires
that the original `ncm_device.c` is disabled via `#if` at the top of the file.

### 2023-07-14

* did some performance tuning with lwIP and TinyUSB
* stripped sources
* BUG: `ncm_device_simple` sometimes crashes with `iperf`

### 2023-08-11

* BUG: with `ncm_device_simple` startup time of the probe is much longer compared
to ECM/RNDIS or even `ncm_device`. By startup time I mean the time until
something is visible on the probe's debug output. For ECM/RNDIS/ncm_device this is almost
instant, with `ncm_device_simple` it takes ~10s! +
-> reverted to `ncm_device` because SystemView runs without problems with it +
-> solved with `ncm_device_new`

### 2023-08-16

* new driver: `ncm_device_new`
** works (better than `ncm_device_simple`), but
*** [x] problems if `wNtbOutMaxDatagrams!=1` -> see comment
*** [x] iperf also shows problems if `-P` is > 1. I guess this is an iperf problem, because iperf
and SystemView are running in parallel without such errors
*** [ ] surprisingly `iperf` performance is much better with actual firmware.
`cmake-create-debugEE` has just half the performance
*** [x] but all these problems also exist with `ncm_device`. Is it in the glue code?
Possible, because the effect also shows with the ECM driver. No, not in the glue code, because
iperf and SystemView work in parallel
*** [x] packets/s is changing heavily, setting `wNtbOutMaxDatagrams==0` helped to prevent the packet rate
from rising (sometimes there are two datagrams in one NTB even with SystemView)
** how to continue?
*** [x] need a test case where `tud_network_can_xmit()` collects datagrams. Currently
there is always only one active xmit datagram, perhaps `iperf` with `-P 4` does it. +
-> this all happens under load testing
*** [x] check if there is a problem in the glue code for datagram reception. Glue buffer freed too soon?
No, I doubt it. But examples are few.

### 2023-08-18

* the new driver is ready. Had some optimization loops, but now it seems to work pretty well