Bug 218375 - [GTK] imported WPT tests are very flaky on Ubuntu 20.04 (xdg-desktop-portal 1.6)
Summary: [GTK] imported WPT tests are very flaky on Ubuntu 20.04 (xdg-desktop-portal 1.6)
Status: RESOLVED CONFIGURATION CHANGED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-29 23:58 PDT by Fujii Hironori
Modified: 2020-11-05 06:16 PST
CC List: 3 users

See Also:


Attachments
debug log (5.50 MB, application/gzip)
2020-11-01 12:54 PST, Fujii Hironori
dbus session log (34.98 KB, application/gzip)
2020-11-04 17:40 PST, Fujii Hironori
dbus-session-2.log.gz (68.49 KB, application/gzip)
2020-11-04 18:34 PST, Fujii Hironori
dbus-system-2.log.gz (5.03 KB, application/gzip)
2020-11-04 18:35 PST, Fujii Hironori

Description Fujii Hironori 2020-10-29 23:58:30 PDT
[GTK] wpt is very flaky on my Linux box

wpt reports a lot of flaky failures on my Linux box.

  ./Tools/Scripts/run-webkit-tests --gtk --release imported/w3c/web-platform-tests

I'm using Ubuntu 20.04 on VirtualBox on Windows.
Comment 1 Fujii Hironori 2020-10-30 00:00:22 PDT
This is reproducible by running a single test case repeatedly.

  ./Tools/Scripts/run-webkit-tests --gtk --release imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html --repeat-each=3000

> Running 1 test
> 
> Running 1 WebKitTestRunner.     
> 
> [181/3000] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)
> [363/3000] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)
> [548/3000] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)
> [727/3000] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)
> [914/3000] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)
> [1110/3000] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)
> [1306/3000] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)
> [1496/3000] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)
> [1678/3000] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)
> [1860/3000] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)
> [2045/3000] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)
> [2220/3000] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)
> [2395/3000] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)
> [2571/3000] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)
> [2754/3000] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)
> [2941/3000] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)
>                                                                                     
> Retrying 1 unexpected failure ...
> 
> Running 1 WebKitTestRunner.
> 
>                                                                        
> 2984 tests ran as expected, 16 didn't:

Hmm, it fails regularly, roughly once every 180-190 iterations.
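To check that interval more precisely, the gaps between failing iterations can be computed from a saved run-webkit-tests log. This is a minimal sketch, assuming the output was captured to a file (the file name rwt.log and the helper name failure_gaps are hypothetical) and that each failure appears on its own line starting with a [N/TOTAL] counter, as in the output quoted above.

```shell
# Read run-webkit-tests output on stdin and print the number of iterations
# between consecutive "failed unexpectedly" lines.
failure_gaps() {
  grep 'failed unexpectedly' |
    # Keep only N from the leading "[N/TOTAL]" counter (any "> " quote prefix is dropped).
    sed -E 's|^[^[]*\[([0-9]+)/[0-9]+].*|\1|' |
    # Print the difference between each iteration number and the previous one.
    awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}
```

Usage: `failure_gaps < rwt.log`. For the sixteen failures quoted above it would print fifteen gaps, all between 175 and 196.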
Comment 2 Fujii Hironori 2020-10-30 00:09:04 PDT
If I copy all-prop-001.html and all-prop-001-expected.html into the LayoutTests/fast and LayoutTests/http/tests/css directories, the flaky failures don't reproduce:

  ./Tools/Scripts/run-webkit-tests --gtk --release fast/all-prop-001.html --repeat-each=3000 -f
  ./Tools/Scripts/run-webkit-tests --gtk --release http/tests/css/all-prop-001.html --repeat-each=3000 -f
Comment 3 Fujii Hironori 2020-10-30 00:17:27 PDT
This issue can be reproduced by invoking WebKitTestRunner manually.

./Tools/Scripts/run-webkit-httpd --no-httpd
./Tools/Scripts/webkit-flatpak --gtk --release -c /usr/bin/bash
export TEST_RUNNER_TEST_PLUGIN_PATH=$PWD/WebKitBuild/GTK/Release/lib
yes http://localhost:8800/css/css-cascade/all-prop-001.html | head -3000 | WebKitBuild/GTK/Release/bin/WebKitTestRunner - | tee wpt.log

WebKitTestRunner renders blank pages intermittently.

> Content-Type: text/plain
> layer at (0,0) size 800x600
>   RenderView at (0,0) size 800x600
> layer at (0,0) size 800x158
>   RenderBlock {HTML} at (0,0) size 800x158
>     RenderBody {BODY} at (8,16) size 784x134
>       RenderBlock {P} at (0,0) size 784x18
>         RenderText {#text} at (0,0) size 294x17
>           text run at (0,0) width 294: "Test passes if there is a filled green square and "
>         RenderInline {STRONG} at (0,0) size 45x17
>           RenderText {#text} at (293,0) size 45x17
>             text run at (293,0) width 45: "no red"
>         RenderText {#text} at (337,0) size 5x17
>           text run at (337,0) width 5: "."
> layer at (8,50) size 784x100
>   RenderBlock (relative positioned) {DIV} at (0,34) size 784x100
>     RenderBlock {DIV} at (684,0) size 100x100 [bgcolor=#FF0000]
> layer at (692,50) size 100x100
>   RenderBlock (positioned) {DIV} at (684,0) size 100x100 [bgcolor=#008000]
> #EOF
> #EOF
> Content-Type: text/plain
> layer at (0,0) size 800x600
>   RenderView at (0,0) size 800x600
> layer at (0,0) size 800x600
>   RenderBlock {HTML} at (0,0) size 800x600
>     RenderBody {BODY} at (8,8) size 784x584
> #EOF
> #EOF

Numbering the RenderBody lines and grepping for the blank-page body size (784x584):

grep RenderBody wpt.log | cat -n | grep 784x584
   293      RenderBody {BODY} at (8,8) size 784x584
   586      RenderBody {BODY} at (8,8) size 784x584
   886      RenderBody {BODY} at (8,8) size 784x584
  1163      RenderBody {BODY} at (8,8) size 784x584
  1442      RenderBody {BODY} at (8,8) size 784x584
  1716      RenderBody {BODY} at (8,8) size 784x584
  1995      RenderBody {BODY} at (8,8) size 784x584
  2263      RenderBody {BODY} at (8,8) size 784x584
  2545      RenderBody {BODY} at (8,8) size 784x584
  2827      RenderBody {BODY} at (8,8) size 784x584

The blank pages recur at a roughly constant interval.
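The grep and cat -n pipeline above can also be folded into a single awk pass that prints the index of each blank dump together with its distance from the previous blank dump, which makes the periodicity easier to see. A sketch, assuming wpt.log was produced by the WebKitTestRunner command above; the helper name blank_page_gaps is hypothetical.

```shell
# Number every RenderBody dump in the log; for each blank one (body size
# 784x584) print its index and the gap since the previous blank dump
# ("-" for the first one).
blank_page_gaps() {
  awk '/RenderBody/ {
         n++
         if (/784x584/) { print n, (last ? n - last : "-"); last = n }
       }' "${1:-wpt.log}"
}
```

Usage: `blank_page_gaps wpt.log`. For the ten blank pages listed above, the gaps would be 293, 300, 277, 279, 274, 279, 268, 282 and 282.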
Comment 4 Carlos Alberto Lopez Perez 2020-10-30 05:32:47 PDT
Strange.

I wonder if it can be related to bug 212622 ?
Perhaps you have a left-over http server running from a previous run?

Can you try rebooting the Linux box and check whether the issue still reproduces after a fresh boot?
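Before rebooting, one quick way to look for a leftover server is to check whether anything is still listening on port 8800, the port used by the WPT URLs in comment 3. A minimal sketch using ss from iproute2; the helper name leftover_listener is hypothetical, and -p only shows process names for processes you own unless it is run as root.

```shell
# List TCP listeners on the given port (default 8800) without a header line.
# Empty output means nothing is bound to that port.
leftover_listener() {
  ss -ltnpH "sport = :${1:-8800}"
}
```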
Comment 5 Fujii Hironori 2020-10-30 12:49:55 PDT
Thanks, but it still reproduces after a fresh boot. I'm going to enable debug logging.
Comment 6 Fujii Hironori 2020-11-01 12:54:26 PST
Created attachment 412869 [details]
debug log
Comment 7 Fujii Hironori 2020-11-01 12:56:07 PST
Every time a blank page is shown, the following error messages are reported:

< HTTP/1.1 7 GDBus.Error:org.freedesktop.DBus.Error.NoReply: Message recipient disconnected from message bus without replying
< Soup-Debug-Timestamp: 1604263197
< Soup-Debug: SoupMessage 0 (0x5597b84550b0)
  
(WebProcess) WebResourceLoader::didFailResourceLoad for 'http://localhost:8800/css/css-cascade/all-prop-001.html'
Failed to load 'http://localhost:8800/css/css-cascade/all-prop-001.html'.
Comment 8 Fujii Hironori 2020-11-01 13:15:14 PST
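I don't think this is an issue with wpt.py, because I see no problems when requesting a URL from the same server 30,000 times with curl (30 requests in parallel) and comparing the checksums of the downloads: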

seq 30000 | xargs -n 1 -P 30 curl http://localhost:8800/apng/supported-in-source-type.html -o
md5sum * | sort
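Since every request fetches the same static file, all downloaded copies should be identical. A small sketch that reduces the md5sum listing to a count of distinct checksums, assuming it is run in the directory the curl loop wrote into; the helper name distinct_checksums is hypothetical. Any value greater than 1 would point to a response that differed.

```shell
# Print how many distinct MD5 checksums the files in the current directory have.
distinct_checksums() {
  md5sum -- * | awk '{ print $1 }' | sort -u | wc -l | tr -d ' '
}
```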
Comment 9 Michael Catanzaro 2020-11-04 15:32:08 PST
Ouch.

You can try using bustle (recommended) or dbus-monitor to figure out which message is being sent. Be sure to check both the session bus and the system bus. If you don't see anything, then I guess the next step is to figure out how to run xdg-dbus-proxy in some debugging mode to see whether it's blocking anything.
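A minimal sketch of recording both buses with dbus-monitor while the tests run. The log file names and the helper name start_bus_monitors are hypothetical, and monitoring the system bus usually requires root because of the default bus policy.

```shell
# Start one monitor per bus in the background. Run this in an interactive
# shell and stop both monitors with `kill %1 %2` once run-webkit-tests finishes.
start_bus_monitors() {
  # Ask for the sudo password up front so the background sudo does not prompt.
  sudo -v || return
  dbus-monitor --session > dbus-session.log 2>&1 &
  sudo dbus-monitor --system > dbus-system.log 2>&1 &
}
```

To keep the logs small, a match rule such as `"type='error'"` can be passed as an extra argument to dbus-monitor so that only error messages are recorded.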
Comment 10 Fujii Hironori 2020-11-04 17:40:53 PST
Created attachment 413232 [details]
dbus session log

I recorded the session bus with dbus-monitor while running run-webkit-tests.

> ./Tools/Scripts/run-webkit-tests --gtk --debug imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html --repeat-each=300 --no-retry-failures --no-show-results

run-webkit-tests reported three flaky failures:

> [27/300] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)
> [98/300] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)
> [171/300] imported/w3c/web-platform-tests/css/css-cascade/all-prop-001.html failed unexpectedly (reference mismatch)

dbus-monitor reported three org.freedesktop.DBus.Error.NoReply errors during the same run.
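To compare the two counts, the NoReply errors in a saved monitor log can be counted directly. A sketch; zgrep is used so the same helper also works on the gzipped attachments, and the helper name count_noreply is hypothetical.

```shell
# Count NoReply errors in a dbus-monitor log (plain or gzip-compressed).
count_noreply() {
  zgrep -c 'error_name=org.freedesktop.DBus.Error.NoReply' "$1"
}
```

Usage: `count_noreply dbus-session.log`. The result can be compared with the number of unexpected failures run-webkit-tests reported for the same run.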
Comment 11 Fujii Hironori 2020-11-04 18:34:44 PST
Created attachment 413239 [details]
dbus-session-2.log.gz

I recorded the system bus and the session bus at the same time while running run-webkit-tests.
This time, run-webkit-tests reported two flaky failures.

The session bus reported two NoReply errors:

> error time=1604542788.420320 sender=org.freedesktop.DBus -> destination=:1.1175 error_name=org.freedesktop.DBus.Error.NoReply reply_serial=310
> error time=1604542849.392397 sender=org.freedesktop.DBus -> destination=:1.1175 error_name=org.freedesktop.DBus.Error.NoReply reply_serial=591

The system bus shows several signals at around the same times.
Comment 12 Fujii Hironori 2020-11-04 18:35:05 PST
Created attachment 413240 [details]
dbus-system-2.log.gz
Comment 13 Fujii Hironori 2020-11-04 19:11:35 PST
dbus-system-2.log.gz contains the word 'coredump', and coredumpctl lists many xdg-desktop-portal core dumps:

$ coredumpctl -r | head
TIME                            PID   UID   GID SIG COREFILE  EXE
Thu 2020-11-05 12:07:59 JST  104497  1000  1000  11 present   /usr/libexec/xdg-desktop-portal
Thu 2020-11-05 12:07:28 JST  103814  1000  1000  11 present   /usr/libexec/xdg-desktop-portal
Thu 2020-11-05 12:06:58 JST  103100  1000  1000  11 present   /usr/libexec/xdg-desktop-portal
Thu 2020-11-05 12:06:28 JST  100552  1000  1000  11 present   /usr/libexec/xdg-desktop-portal
Thu 2020-11-05 11:20:49 JST   99870  1000  1000  11 present   /usr/libexec/xdg-desktop-portal
Thu 2020-11-05 11:20:19 JST   99220  1000  1000  11 present   /usr/libexec/xdg-desktop-portal
Thu 2020-11-05 11:19:48 JST   98562  1000  1000  11 present   /usr/libexec/xdg-desktop-portal
Thu 2020-11-05 11:19:18 JST   97888  1000  1000  11 present   /usr/libexec/xdg-desktop-portal
Thu 2020-11-05 11:15:33 JST   95785  1000  1000  11 present   /usr/libexec/xdg-desktop-portal

I confirmed that running run-webkit-tests produces new xdg-desktop-portal core dumps.
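To line the NoReply errors from comment 11 up with these crashes, their Unix timestamps can be converted to the time zone coredumpctl uses here (JST, UTC+9). A sketch assuming GNU date; the POSIX TZ string JST-9 avoids depending on installed zoneinfo data, and the helper name noreply_times is hypothetical.

```shell
# Print the time of each NoReply error in a dbus-monitor log (plain or gzip)
# in JST, so it can be compared with the TIME column of `coredumpctl -r`.
noreply_times() {
  zgrep 'error_name=org.freedesktop.DBus.Error.NoReply' "$1" |
    # Keep the whole seconds of "time=SECONDS.MICROSECONDS".
    sed -E 's/.*time=([0-9]+)\..*/\1/' |
    while read -r secs; do
      TZ=JST-9 date -d "@$secs" '+%F %T %Z'
    done
}
```

For the two errors quoted in comment 11 this prints 2020-11-05 11:19:48 JST and 2020-11-05 11:20:49 JST, the same times as two of the xdg-desktop-portal crashes in the coredumpctl listing above.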
Comment 14 Fujii Hironori 2020-11-04 19:55:03 PST
This is the backtrace (without debug symbols):

#0  0x00007f711bc4c494 in g_str_hash () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#1  0x00007f711bc4b5dc in g_hash_table_lookup () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x0000556f8b75f282 in ?? ()
#3  0x0000556f8b75f778 in ?? ()
#4  0x00007f711bc87931 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x00007f711bbf2609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007f711bb19293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

It looks like this upstream issue:

segfault when running org.freedesktop.Platform/x86_64/19.08 · Issue #433 · flatpak/xdg-desktop-portal · GitHub
https://github.com/flatpak/xdg-desktop-portal/issues/433
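A backtrace like the one above can also be collected without an interactive gdb session. A sketch assuming systemd's coredumpctl and gdb are installed; the helper name portal_backtrace is hypothetical. Resolving the ?? frames would additionally need debug symbols, for example the dbgsym packages for xdg-desktop-portal and libglib2.0-0 from Ubuntu's ddebs archive.

```shell
# Extract the most recent xdg-desktop-portal core dump into a temporary file,
# print a backtrace for every thread, then remove the temporary file.
portal_backtrace() {
  local exe=/usr/libexec/xdg-desktop-portal
  local core
  core=$(mktemp) || return
  if coredumpctl dump -o "$core" "$exe" >/dev/null; then
    gdb -batch -ex 'thread apply all bt' "$exe" "$core"
  fi
  rm -f "$core"
}
```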
Comment 15 Fujii Hironori 2020-11-04 20:31:25 PST
I upgraded to Ubuntu 20.10 (xdg-desktop-portal 1.8). It works fine now.
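To confirm which xdg-desktop-portal version is installed before and after such an upgrade, the package database can be queried. A sketch for Debian/Ubuntu systems; the helper name portal_version is hypothetical.

```shell
# Print the installed version of the xdg-desktop-portal package, or a note
# if dpkg does not know the package.
portal_version() {
  dpkg-query -W -f='${Version}\n' xdg-desktop-portal 2>/dev/null ||
    echo 'xdg-desktop-portal is not installed'
}
```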
Comment 16 Michael Catanzaro 2020-11-05 06:16:31 PST
Wow.