/* $NetBSD: TODO.modules,v 1.24 2021/08/09 20:49:08 andvar Exp $ */

Some notes on the limitations of our current (as of 7.99.35) module
subsystem.  This list was triggered by an Email exchange between
christos and pgoyette.

 1. Builtin drivers can't depend on modularized drivers (an attempt is
    made to load the modularized drivers as builtins).

	The assumption is that dependencies are loaded before those
	modules which depend on them.  At load time, a module's
	undefined global symbols are resolved;  if any symbols can't
	be resolved, the load fails.  Similarly, if a module is
	included in (built into) the kernel, all of its symbols must
	be resolvable by the linker, otherwise the link fails.

	There are ways around this (such as having the parent
	module's initialization command recursively call the module
	load code), but they're often gross hacks.

	Another alternative (which is used by ppp) is to provide a
	"registration" mechanism for the "child" modules, and then when
	the need for a specific child module is encountered, use
	module_autoload() to load the child module.  Of course, this
	requires that the parent module know about all potentially
	loadable children.
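
	As a rough illustration of the registration approach, here is a
	user-space model only.  parent_register_child(),
	parent_use_child() and model_autoload() are invented names, and
	model_autoload() merely stands in for module_autoload();  the
	"deflate" child is likewise just an example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define MAX_CHILDREN 8

struct child_ops {
	const char *name;		/* name the parent looks up */
	int (*handle)(int input);	/* work done by the child */
};

static const struct child_ops *children[MAX_CHILDREN];
static size_t nchildren;

/* Called by a child from its init code to announce itself. */
int
parent_register_child(const struct child_ops *ops)
{
	if (nchildren == MAX_CHILDREN)
		return -1;
	children[nchildren++] = ops;
	return 0;
}

static const struct child_ops *
parent_lookup(const char *name)
{
	for (size_t i = 0; i < nchildren; i++)
		if (strcmp(children[i]->name, name) == 0)
			return children[i];
	return NULL;
}

/* The only child this model knows how to "load". */
static int deflate_handle(int input) { return input / 2; }
static const struct child_ops deflate_ops = { "deflate", deflate_handle };

/* Stand-in for module_autoload(): "loading" runs the child's init
 * code, which registers the child with the parent. */
static int
model_autoload(const char *name)
{
	if (strcmp(name, "deflate") == 0)
		return parent_register_child(&deflate_ops);
	return -1;			/* no such module */
}

/* Parent entry point: find the child, loading it on first use. */
int
parent_use_child(const char *name, int input, int *output)
{
	const struct child_ops *ops = parent_lookup(name);

	if (ops == NULL) {
		if (model_autoload(name) != 0)
			return -1;
		ops = parent_lookup(name);
		assert(ops != NULL);	/* a successful load registered it */
	}
	*output = ops->handle(input);
	return 0;
}
```

	Note that the parent has to name the child it wants, which is
	the limitation mentioned above.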

 2. Currently, config(1) has no way to "no define" drivers
	XXX: I don't think this is true anymore. I think we can
	undefine drivers now, see MODULAR in amd64, which does
	no ath* and no select sppp*

 3. It is not always obvious by their names which drivers/options
    correspond to which modules.

 4. Right now critical drivers that would need to be pre-loaded (ffs,
    exec_elf64) are still built-in so that we don't need to alter the boot
    blocks to boot.

	This was a conscious decision by core@ some years ago.  It is
	not a requirement that ffs or exec_* be built-in.  The only
	requirement is that the root file-system's module must be
	available when the module subsystem is initialized, in order
	to load other modules.  This can be accomplished by having the
	boot loader "push" the module at boot time.  (It used to do
	this in all cases; currently the "push" only occurs if the
	booted filesystem is not ffs.)

 5. Not all parent bus drivers are capable of rescan, so some drivers
    just have to be built-in.

 6. Many (most?) drivers are not yet modularized.

 7. There are currently no provisions for autoconfig to figure out which
    modules are needed, and thus to load the required modules.

	In the "normal" built-in world, autoconfigure can only ask
	existing drivers if they're willing to manage (ie, attach) a
	device.  Removing the built-in drivers tends to limit the
	availability of possible managers.  There's currently no
	mechanism for identifying and loading drivers based on what
	devices might be found.

 8. Even for existing modules, there are "surprise" dependencies with
    code that has not yet been modularized.

	For example, even though the bpf code has been modularized,
	there is some shared code in bpf_filter.c which is needed by
	both ipfilter and ppp.  ipf is already modularized, but ppp
	is not.  Thus, even though bpf_filter is modular, it MUST be
	included as a built-in module if you also have ppp in your
	configuration.

	Another example is the sysmon_taskq module.  It is required by
	other parts of the sysmon subsystem, including the
	"sysmon_power" module.  Unfortunately, even though the
	sysmon_power code is modularized, it is referenced by the
	acpi code which has not been modularized.  Therefore, if your
	configuration has acpi, then you must build the "sysmon_power"
	module into the kernel.  And therefore you also need to
	have "sysmon_taskq" and "sysmon" built-in since "sysmon_power"
	references them.
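
	The chain of forced built-ins can be modelled as a closure over
	"references" edges.  This is only a sketch:  the edge table
	holds just the sysmon relationships named above, BPF is present
	only as an unrelated component, and close_builtin() is a
	made-up helper, not part of config(1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Components in this model. */
enum { ACPI, SYSMON_POWER, SYSMON_TASKQ, SYSMON, BPF, NCOMP };

/* needs[a][b] is true when component a references symbols in b. */
static const bool needs[NCOMP][NCOMP] = {
	[ACPI]         = { [SYSMON_POWER] = true },
	[SYSMON_POWER] = { [SYSMON_TASKQ] = true, [SYSMON] = true },
};

/*
 * Given the components that are already built-in, mark everything
 * they reference (directly or indirectly) as built-in too, since the
 * kernel link would otherwise fail on unresolved symbols.
 */
void
close_builtin(bool builtin[NCOMP])
{
	bool changed = true;

	assert(builtin != NULL);
	while (changed) {
		changed = false;
		for (size_t a = 0; a < NCOMP; a++) {
			if (!builtin[a])
				continue;
			for (size_t b = 0; b < NCOMP; b++) {
				if (needs[a][b] && !builtin[b]) {
					builtin[b] = true;
					changed = true;
				}
			}
		}
	}
}
```

	Starting from a set containing only ACPI, the closure also marks
	SYSMON_POWER, SYSMON_TASKQ and SYSMON, and leaves BPF alone.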

 9. As a corollary to #8 above, having dependencies on modules from code
    which has not been modularized makes it extremely difficult to test
    the module code adequately.  Testing of module code should include
    both testing-as-a-built-in module and testing-as-a-loaded-module, and
    all dependencies need to be identified.

10. The current /stand/$ARCH/$VERSION/modules/ hierarchy won't scale as
    we get more and more modules.  There are hundreds of potential device
    driver modules.

11. There currently isn't any good way to handle attachment-specific
    modules.  The build infrastructure (ie, sys/modules/Makefile) doesn't
    readily lend itself to bus-specific modules irrespective of $ARCH,
    and maintaining distrib/sets/lists/modules/* is awkward at best.

    Furthermore, devices such as ld(4), which can attach to a large set
    of parent devices, need to be modified.  The parent devices need to
    provide a common attribute (for example, ld_bus), and the ld driver
    should attach to that attribute rather than to each parent.  But
    currently, config(1) doesn't handle this - it doesn't allow an
    attribute to be used as the device tree's pseudo-root. The current
    directory structure where driver foo is split between ic/foo.c
    and bus1/foo_bus1.c ... busn/foo_busn.c is annoying. It would be
    better to switch to the FreeBSD model which puts all the driver
    files in one directory.

12. Item #11 gets even murkier when a particular parent can provide more
    than one attribute.

13. It seems that we might want some additional sets-lists "attributes"
    to control contents of distributions.  As an example, many of our
    architectures have PCI bus capabilities, but not all.  It is rather
    painful to need to maintain individual architectures' modules/md_*
    sets lists, especially when we already have to conditionalize the
    build of the modules based on architecture.  If we had a single
    "attribute" for PCI-bus-capable, the same attribute could be used to
    select which modules to build and which modules from modules/mi to
    include in the release.  (This is not limited to PCI;  recently we
    encountered similar issues with the spkr aka spkr_synth module.)

14. As has been pointed out more than once, the current method of storing
    modules in a version-specific subdirectory of /stand is sub-optimal
    and leads to much difficulty and/or confusion.  A better mechanism of
    associating a kernel and its modules needs to be developed.  Some
    have suggested having a top-level directory (say, /netbsd) with a
    kernel and its modules at /netbsd/kernel and /netbsd/modules/...
    Whatever new mechanism we arrive at will probably require changes to
    installation procedures and bootstrap code, and will need to handle
    both the new and old mechanisms for compatibility.

    One additional option mentioned is to be able to specify, at boot
    loader time, an alternate value for the os-release portion of the
    default module path,  i.e. /stand/$MACHINE/$ALT-RELEASE/modules/

    The following statement regarding this issue was previously issued
    by the "core" group:

    Date: Fri, 27 Jul 2012 08:02:56 +0200
    From: <redacted>
    To: <redacted>
    Subject: Core statement on directory naming for kernel modules

    The core group would also like to see the following changes in
    the near future:

       Implementation of the scheme described by Luke Mewburn in
       <http://mail-index.NetBSD.org/current-users/2009/05/10/msg009372.html>
       to allow a kernel and its modules to be kept together.

       Changes to config(1) to extend the existing notion of whether or not
       an option is built-in to the kernel, to three states: built-in, not
       built-in but loadable as a module, entirely excluded and not even
       loadable as a module.

15. The existing config(5) framework provides an excellent mechanism
    for managing the content of kernels.  Unfortunately, this mechanism
    does not apply for modules, and instead we need to manually manage
    a list of files to include in the module, the set of compiler
    definitions with which to build those files, and also the set of
    other modules on which a module depends.  We really need a common
    mechanism to define and build modules, whether they are included as
    "built-in" modules or as separately-loadable modules.

    (From John Nemeth) Some sort of mechanism for a (driver) module
    to declare the list of vendor/product/other tuples that it can
    handle would be nice.  Perhaps this would go in the module's .plist
    file?  (See #17 below.)  Then drivers that scan for children might
    be able to search the modules directory for an "appropriate" module
    for each child, and auto-load.
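
    A minimal sketch of the tuple idea, assuming a flat table of
    vendor/product pairs.  The struct layout, the IDs and the module
    names are all invented for illustration;  nothing here reflects an
    existing .plist format or kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record a driver module might publish. */
struct modmatch {
	uint16_t vendor;
	uint16_t product;
	const char *module;	/* module to auto-load for this device */
};

/* Made-up IDs and module names. */
const struct modmatch example_table[] = {
	{ 0x1234, 0x0001, "examplenic" },
	{ 0x1234, 0x0002, "examplenic" },
	{ 0xabcd, 0x0010, "exampledisk" },
};
#define EXAMPLE_TABLE_LEN (sizeof(example_table) / sizeof(example_table[0]))

/*
 * Return the name of the module claiming this vendor/product pair,
 * or NULL if no module in the table declares it.
 */
const char *
module_for_device(const struct modmatch *tbl, size_t n,
    uint16_t vendor, uint16_t product)
{
	assert(tbl != NULL);
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].vendor == vendor && tbl[i].product == product)
			return tbl[i].module;
	}
	return NULL;
}
```

    A bus driver scanning for children could pass each child's IDs to
    such a lookup and auto-load the module named in the result, if any.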

16. PR kern/52821 exposes another limitation of config(1) WRT modules.
    Here, an explicit device attachment is required, because we cannot
    rely on all kernel configs to contain the attribute at which the
    modular driver wants to attach.  Unfortunately, the explicit
    attachment causes conflicts with built-in drivers.  (See the PR for
    more details.)

17. (From John Nemeth) It would be potentially useful if a "push" from
    the bootloader could also load-and-push a module's .plist (if it
    exists).

18. (From John Nemeth) Some sort of schema for a module to declare the
    options (or other things?) that the module understands.  This could
    result in a module-options editor to manipulate the .plist.

19. (From John Nemeth) Currently, the order of module initialization is
    based on module classes and declared dependencies.  It might be
    useful to have additional classes (or sub-classes) with additional
    invocations of module_class_init(), and it might be useful to have a
    non-dependency mechanism to provide "IF module-A and module-B are
    BOTH present, module-A needs to be initialized before module-B".
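
    One possible shape for such a non-dependency ordering rule is
    sketched below.  It is a user-space model:  struct soft_order and
    soft_init_order() are hypothetical names, modules are plain array
    indices, and nothing here is tied to module_class_init().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NMOD 4

/* "first is initialized before second, but only if both are present" */
struct soft_order {
	int first;
	int second;
};

/*
 * Fill order[] with the indices of the present modules so that every
 * applicable soft constraint is honoured.  Returns the number of
 * modules ordered, or -1 if the constraints form a cycle.
 */
int
soft_init_order(const bool present[NMOD], const struct soft_order *c,
    size_t nc, int order[NMOD])
{
	bool done[NMOD] = { false };
	int n = 0, total = 0;

	for (int m = 0; m < NMOD; m++)
		if (present[m])
			total++;

	while (n < total) {
		bool progressed = false;

		for (int m = 0; m < NMOD; m++) {
			bool blocked = false;

			if (!present[m] || done[m])
				continue;
			for (size_t i = 0; i < nc; i++) {
				assert(c[i].first >= 0 && c[i].first < NMOD);
				/* An absent "first" imposes nothing. */
				if (c[i].second == m && present[c[i].first] &&
				    !done[c[i].first])
					blocked = true;
			}
			if (!blocked) {
				done[m] = true;
				order[n++] = m;
				progressed = true;
			}
		}
		if (!progressed)
			return -1;	/* cyclic constraints */
	}
	return n;
}
```

    Unlike a declared dependency, a constraint whose "first" module is
    absent is simply ignored, so neither module forces the other to be
    loaded.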

20. (Long-ago memory rises to the surface) Note that currently there is
    nothing that requires a module's name to correspond in any way with
    the name of the file from which the module is loaded.  Thus, it is
    possible to attempt to access device /dev/x, discover that there is
    no such device so we autoload /stand/.../x/x.kmod and initialize
    the module loaded, even if the loaded module is for some other
    device entirely!
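
    A check along the following lines could catch the mismatch.  It
    assumes the convention that a module named x lives in a file whose
    last path component is x.kmod;  kmod_name_matches() is a
    hypothetical helper, not an existing kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true only if the last component of path is "<declared>.kmod",
 * where declared is the name the loaded module reports for itself.
 */
bool
kmod_name_matches(const char *path, const char *declared)
{
	static const char suffix[] = ".kmod";
	const size_t slen = sizeof(suffix) - 1;
	const char *base;
	size_t len;

	assert(path != NULL && declared != NULL);

	/* Strip any leading directories. */
	base = strrchr(path, '/');
	base = (base != NULL) ? base + 1 : path;

	/* The file name must be non-empty and end in ".kmod". */
	len = strlen(base);
	if (len <= slen || strcmp(base + len - slen, suffix) != 0)
		return false;
	len -= slen;

	/* What remains must equal the declared module name exactly. */
	return strlen(declared) == len && strncmp(base, declared, len) == 0;
}
```

    The autoload path could refuse to initialize a module for which
    this returns false.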

21. We currently do not support "weak" symbols in the in-kernel linker.
    It would take some serious thought to get such support right.  For
    example, consider module A with a weak reference to symbol S which
    is defined in module B.  If module B is loaded first, and then
    module A, the symbol gets resolved.  But if module A is loaded first,
    the symbol won't be resolved.  If we subsequently load module B, we
    would have to "go back" and re-run the linker for module A.

    Additional difficulties arise when the module which defines the
    weak symbol gets unloaded.  Then, you would need to re-run the
    linker and _unresolve_ the weak symbol which is no longer defined.
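
    The "go back" and "unresolve" steps can be modelled in user space
    as follows.  This is a toy:  struct model_module and
    model_set_loaded() are invented, each module has at most one
    export and one weak reference, and symbols are plain strings
    rather than anything the in-kernel linker actually operates on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct model_module {
	const char *defines;	/* symbol exported, or NULL */
	const char *weak_ref;	/* weakly referenced symbol, or NULL */
	bool loaded;
	int provider;		/* index resolving weak_ref, -1 if none */
};

/* Recompute every weak binding from scratch. */
static void
relink_weak(struct model_module *mods, size_t n)
{
	for (size_t a = 0; a < n; a++) {
		mods[a].provider = -1;
		if (!mods[a].loaded || mods[a].weak_ref == NULL)
			continue;
		for (size_t b = 0; b < n; b++) {
			if (mods[b].loaded && mods[b].defines != NULL &&
			    strcmp(mods[b].defines, mods[a].weak_ref) == 0) {
				mods[a].provider = (int)b;
				break;
			}
		}
	}
}

/*
 * Load or unload one module.  Either event can change how the weak
 * references of other, already loaded modules resolve, so all of them
 * are revisited.
 */
void
model_set_loaded(struct model_module *mods, size_t n, size_t idx,
    bool loaded)
{
	assert(idx < n);
	mods[idx].loaded = loaded;
	relink_weak(mods, n);
}
```

    With A holding a weak reference to S and B defining S, loading A
    alone leaves the reference unresolved, loading B afterwards binds
    it, and unloading B unbinds it again.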

22. A fairly large number of modules still require a maximum warning
    level of WARNS=3 due to signed-vs-unsigned integer comparisons.  We
    really ought to clean these up.  (I haven't looked at them in any
    detail, but I have to wonder how code that compiles cleanly in a
    normal kernel has these issues when compiled in a module, when both
    are done with WARNS=5).
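
    For reference, the usual shape of such a comparison and one way to
    clean it up.  index_in_range() is an example function, not taken
    from any module;  GCC reports this class of problem under
    -Wsign-compare.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 1 if index lies in [0, len), else 0.
 *
 * Writing plain "index < len" compares an int with a size_t.  The int
 * is converted to unsigned first, so -1 turns into a huge value and
 * the compiler flags the comparison.  Handling the negative case
 * explicitly and then casting makes the intent clear and silences
 * the warning.
 */
int
index_in_range(int index, size_t len)
{
	if (index < 0)
		return 0;
	return (size_t)index < len;
}
```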

23. The current process of "load all the emulation/exec modules in case
    one of them might handle the image currently being exec'd" isn't
    really cool.  (See sys/kern/kern_exec.c?)  It ends up auto-loading
    a whole bunch of modules, involving file-system access, just to have
    most of the modules getting unloaded a few seconds later.  We don't
    have any way to identify which module is needed for which image (ie,
    we can't determine that an image needs compat_linux vs some other
    module).

24. Details are no longer remembered, but there are some issues with
    building xen-variant modules (on amd64, and likely i386).  In some
    cases, wrong headers are included (because a XEN-related #define
    is missing), but even if you add the definition some headers get
    included in the wrong order.  One particular fallout from this is
    the inability to have a compat version of x86_64 cpu-microcode
    module.  PR port-xen/53130

    This was likely fixed by a change from Chuck Silvers on 2020-07-04,
    which removed the differences between the xen and non-xen module
    ABIs.  As of 2021-05-28 the cpu-microcode functionality has once
    again been enabled for i386 and amd64 compat_60 modules.
