$FreeBSD: head/sys/geom/notes 110592 2003-02-09 17:04:57Z phk $

For lack of a better place to put them, this file contains notes
on some of the more intricate details of GEOM.

-----------------------------------------------------------------------
Locking of bio_children and bio_inbed

bio_children and bio_inbed are used by g_clone_bio() and g_std_done()
to keep track of the children cloned off a request.  g_clone_bio()
increments the parent's bio_children counter each time it is called,
and g_std_done() increments the parent's bio_inbed counter on every
call; when the two counters are equal, g_std_done() calls
g_io_deliver() on the parent bio.

The general assumption is that g_clone_bio() is called only in
the g_down thread and g_std_done() only in the g_up thread, and
therefore the two fields do not generally need locking.  These
restrictions are not enforced by the code, but they should be
violated only with great care.

It is the responsibility of the class implementation to avoid the
following race condition: a class intends to split a bio into two
children.  It clones the bio and requests I/O on the first child.
This I/O operation completes before the second child is cloned,
so g_std_done() sees both counters equal to 1 and finishes off
the parent bio prematurely.

There is no race in the common case where the bio is split into
multiple parts in the class start method and the I/O is requested
on another GEOM class below: there is only one g_down thread, and
the class below will not have its start method run until we return
from our start method, so the I/O cannot complete prematurely.

In all other cases, this race needs to be mitigated, for instance
by cloning all children before I/O is requested on any of them.

Notice that cloning an "extra" child and calling g_std_done() on
it directly opens another race, since g_std_done() is assumed to
be called only in the g_up thread.

-----------------------------------------------------------------------
Statistics collection

Statistics collection can run at three levels controlled by the
"kern.geom.collectstats" sysctl.

At level zero, only the numbers of transactions started and completed
are counted, and only because GEOM internally uses the difference
between these two as a sanity check.

At level one we collect the full statistics.  Higher levels are
reserved for future use.  Statistics are collected independently
on both the provider and the consumer, because multiple consumers
can be active against the same provider at the same time.

The statistics collection falls into two parts:

The first and simpler part consists of g_io_request() timestamping
the struct bio when the request is first started, and g_io_deliver()
updating the consumer's and the provider's statistics based on fields
in the bio when it is completed.  There are no concurrency or locking
concerns in this part.  The statistics collected consist of the number
of requests, the number of bytes, the number of ENOMEM errors, the
number of other errors and the duration of the requests for each of
the three major request types: BIO_READ, BIO_WRITE and BIO_DELETE.

The second part tracks the "busy%": the fraction of time during
which at least one request is outstanding.

If in g_io_request() we find that there are no outstanding requests
(based on the counters for scheduled and completed requests being
equal), we set a timestamp in the "wentbusy" field.  Since there
are no outstanding requests, and as long as there is only one thread
pushing the g_down queue, we cannot possibly conflict with
g_io_deliver() until we ship the current request down.

In g_io_deliver() we calculate the delta-T from wentbusy, add it
to the "bt" field, and set wentbusy to the current timestamp.  We
take care to do this before we increment the "requests completed"
counter: as long as that counter lags behind, g_io_request() sees
an outstanding request and will not touch the "wentbusy" timestamp
concurrently.

The statistics data is made available to userland through a special
allocator (in geom_stats.c) which, through a device, allows userland
to mmap(2) the pages containing the statistics data.  To indicate to
userland when the data in a statistics structure might be
inconsistent, g_io_deliver() atomically sets an "updating" flag and
resets it when the structure is consistent again.
