March 2023 Android Update Post-mortem
or, “My 3 weeks of hell”
Brenton Bostick
Thursday Apr 6 2023
Overview
Timeline
mon Feb 27 2023:
finished work
https://bostick.github.io/zerotier/android-report/report.htm
fri Mar 3 2023:
1.10.3-1 tag
mon Mar 6 2023:
initial release
all hell broke loose
tues Mar 7 2023:
got Samsung A13 from Micro Center
1.10.3-2 tag
wed Mar 8 2023:
actual fix for ANDROID-56: crash inside newNetworkConfig
cast all arguments to varargs functions as good style
The source for the crashes of the Android is fundamentally due to calling JNI functions like `NewObject()` with incorrect sized arguments. Functions like `NewObject()` are necessarily vararg, and there is no way for C to know that the programmer is providing the correct arguments.
Arguments that were being treated as booleans on the Java side were being passed in as ints on the C side, and the crucial part is that JNI assumes booleans being passed in are only 1 byte.
In the case of calling `NewObject()` for `VirtualNetworkConfig`, the bad booleans were then causing subsequent vararg arguments to be corrupted, i.e., the 0th byte of the boolean argument was treated as the boolean, but then the 1st, 2nd, and 3rd bytes of the boolean argument were being treated as the 0th, 1st, and 2nd bytes of the next argument. So absolute chaos.
Why was this so hard to find?
All of my devices were ARMv8 and believe this shows up on ARMv7 devices. The Samsung A13 I got on Tuesday is ARMv7. I haven’t checked deeper on ABI differences to be definitive.
Why did this not happen before my changes?
A part of my changes was the move to using single calls to `NewObject()` to construct objects. Before my changes, objects tended to be constructed empty, and then have the fields set individually with e.g. `SetBooleanField()`. So no varargs problem.
I’m changing my JNI style to now always cast arguments to the JNI types when calling vararg functions like `NewObject()`, even it is completely redundant, just to be completely safe.
thurs Mar 9 2023:
fixed fromInt exception being thrown:
//
// Before the 1.10.3-1 release,
// NetworkType had these values:
// UNKNOWN(0),
// PRIVATE(1),
// PUBLIC(2);
//
// After 1.10.3-1,
// NetworkType had these values:
// PRIVATE(0),
// PUBLIC(1);
//
// There was a bug:
// when older versions were updated to newer versions,
// the serialized version of NetworkType in the database was sent to fromInt,
// and if the NetworkType had been PUBLIC(2), then this would now throw an exception
//
// This was not originally tested because the NetworkType property of NetworkConfiguration
// is now deprecated in preference to using VirtualNetworkType in VirtualNetworkConfig directly
//
// So make sure to handle the old value of PUBLIC(2)
//
// This is ok to do because this value is not used anywhere, we just need to make sure to not
// throw an exception.
//
1.10.4-1 tag
mon Mar 13 2023:
1.10.4-2 tag
tues Mar 14 2023:
actual fix for ANDROID-63:
do not keep VNC objects as property of Network
GC causes Network objects to be collected and lose VNC objects
Add a map to ZeroTierOneService for keeping VNC objects
Also, NetworkDetailsStatusFragment now needs to bind to ZeroTierOneService
to get access to VNC map
1.10.5-1 tag
wed Mar 15 2023:
fix ANDROID-66: addressToNetmask for IPv6
Xiaomi phone delivered
1.10.5-2 tag
thurs Mar 16 2023:
1.10.5-3 tag
fri Mar 17 2023:
fix ANDROID-67: ANRs due to no network connectivity
//
// First, completely disconnect from previous service
//
// There was a bug:
// Even with no network connectivity, NetworkInfoUtils.getNetworkInfoCurrentConnection()
// was not returning CurrentConnection.CONNECTION_NONE
//
// This was because a previous service may still be connected and that VPN connection is treated
// as a network connection (even if not on WiFi or Cell)
//
// So the gate-keeping for network connectivity failed here.
// But then the service is actually destroyed here before creating a new one.
// So no VPN connection any more.
// So the gate-keeping for network connectivity in ZeroTierOneService then causes a
// 30-second hang on main thread while waiting for connectivity.
//
sat Mar 18 2023:
Motorola Moto G from Micro Center
tues Mar 21 2023:
1.10.5-4 tag
wed Mar 22 2023:
fix ANDROID-79: handle non-empty networks.d after a crash
fix ANDROID-75: race causing NPE in TunTapAdapter$1.run
ThreadMode.BACKGROUND:
EventBus uses a single background thread that will deliver all its events sequentially.
This is important.
hreadMode.ASYNC works from a thread pool and may end up executing on different threads
thurs Mar 23 2023:
1.10.6-1 tag
fri Mar 24 2023:
fix ANDROID-73: relay through main thread to guarantee posting to “the” background thread
1.10.6-2 tag
discuss threads
https://discuss.zerotier.com/t/android-zerotier-keep-stop-working-after-1-10-3-update/12184
https://discuss.zerotier.com/t/zerotier-android-client-keeps-disconnecting/12230
https://discuss.zerotier.com/t/android-1-10-3-problems-continued/12260
https://discuss.zerotier.com/t/android-app-battery-use-wifi-vs-cell/12535
https://discuss.zerotier.com/t/zerotier-is-connected-on-android-device-no-vpn/12601
Biggest Issues
Could not reproduce problems
ARMv7
32-bit
garbage collection
not fully understanding that greenDAO keeps WeakReferences, and Transient fields may be lost at any time
EventBus BACKGROUND
not fully understanding EventBus BACKGROUND
from:
https://greenrobot.org/eventbus/documentation/delivery-threads-threadmode/#ThreadMode_BACKGROUND
Subscribers will be called in a background thread. On Android, if the posting thread is not the main thread, event handler methods will be called directly in the posting thread. If the posting thread is the main thread, EventBus uses a single background thread that will deliver all its events sequentially. Event handlers using this mode should try to return quickly to avoid blocking the background thread.
Future steps
releases: smaller audience, spread over time
more dogfooding
reduce number of moving parts
ONLY update Android code or ONLY update ZeroTierOne code
more QA
testing upgrading from previous versions
more devices
various tricks for forcing GC (Battery Saver)
Beta program