$NetBSD: storage,v 1.10 2015/11/20 07:20:21 dholland Exp $

NetBSD Storage Roadmap
======================

This is a small roadmap document, and deals with the storage and file
systems side of the operating system. It discusses elements, projects,
and goals that are under development or under discussion; and it is
divided into three categories based on perceived priority.

The following elements, projects, and goals are considered strategic
priorities for the project:

 1. Improving iscsi
 2. nfsv4 support
 3. A better journaling file system solution
 4. Getting zfs working for real
 5. Seamless full-disk encryption

The following elements, projects, and goals are not strategic
priorities but are still important undertakings worth doing:

 6. lfs64
 7. Per-process namespaces
 8. lvm tidyup
 9. Flash translation layer
 10. Shingled disk support
 11. ext3/ext4 support
 12. Port hammer from Dragonfly
 13. afs maintenance
 14. execute-in-place

The following elements, projects, and goals are perhaps less pressing;
this doesn't mean one shouldn't work on them but the expected payoff
is perhaps less than for other things:

 15. coda maintenance


Explanations
============

1. Improving iscsi
------------------

Both the existing iscsi target and initiator are fairly bad code, and
neither works terribly well. Fixing this is fairly important as iscsi
is where it's at for remote block devices. Note that there appears to
be no compelling reason to move the target to the kernel or otherwise
make major architectural changes.

 - As of November 2015 nobody is known to be working on this.
 - There is currently no clear timeframe or release target.
 - Contact agc for further information.


2. nfsv4 support
----------------

nfsv4 is at this point the de facto standard for FS-level (as opposed
to block-level) network volumes in production settings.
The legacy nfs
code currently in NetBSD only supports nfsv2 and nfsv3.

The intended plan is to port FreeBSD's nfsv4 code, which also includes
nfsv2 and nfsv3 support, and eventually transition to it completely,
dropping our current nfs code. (Which is kind of a mess.) So far the
only step that has been taken is to import the code from FreeBSD. The
next step is to update that import (since it was done a while ago now)
and then work on getting it to configure and compile.

 - As of November 2015 nobody is working on this, and a volunteer to
   take charge is urgently needed.
 - There is no clear timeframe or release target, although having an
   experimental version ready for -8 would be great.
 - Contact dholland for further information.


3. A better journaling file system solution
-------------------------------------------

WAPBL, the journaling FFS that NetBSD rolled out some time back, has a
critical problem: it does not address the historic ffs behavior of
allowing stale on-disk data to leak into user files in crashes. And
because it runs faster, this happens more often and with more data.
This situation is both a correctness and a security liability. Fixing
it has turned out to be difficult. It is not really clear what the
best option at this point is:

+ Fixing WAPBL (e.g. to flush newly allocated/newly written blocks to
disk early) has been examined by several people who know the code base
and judged difficult. Still, it might be the best way forward.

+ There is another journaling FFS; the Harvard one done by Margo
Seltzer's group some years back. We have a copy of this, but as it was
written in BSD/OS circa 1999 it needs a lot of merging, and then will
undoubtedly also need a certain amount of polishing to be ready for
production use. It does record-based rather than block-based
journaling and does not share the stale data problem.

+ We could bring back softupdates (in the softupdates-with-journaling
form found today in FreeBSD) -- this code is even more complicated
than the softupdates code we removed back in 2009, and it's not clear
that it's any more robust either. However, it would solve the stale
data problem if someone wanted to port it over. It isn't clear that
this would be any less work than getting the Harvard journaling FFS
running... or than writing a whole new file system either.

+ We could write a whole new journaling file system. (That is, not
FFS. Doing a new journaling FFS implementation is probably not
sensible relative to merging the Harvard journaling FFS.) This is a
big project.

Right now it is not clear which of these avenues is the best way
forward. Given the general manpower shortage, it may be that the best
way is whatever looks best to someone who wants to work on the
problem.

 - As of November 2015 nobody is working on fixing WAPBL. There has
   been some interest in the Harvard journaling FFS but no significant
   progress. Nobody is known to be working on or particularly
   interested in porting softupdates-with-journaling. And, while
   dholland has been mumbling for some time about a plan for a
   specific new file system to solve this problem, there isn't any
   realistic prospect of significant progress on that in the
   foreseeable future, and nobody else is known to have or be working
   on even that much.
 - There is no clear timeframe or release target; but given that WAPBL
   has been disabled by default for new installs in -7 this problem
   can reasonably be said to have become critical.
 - Contact joerg or martin regarding WAPBL; contact dholland regarding
   the Harvard journaling FFS.


4. Getting zfs working for real
-------------------------------

ZFS has been almost working for years now. It is high time we got it
really working.
One of the things this entails is updating the ZFS code, as what we
have is rather old. The Illumos version is probably what we want for
this.

 - There has been intermittent work on zfs, but as of November 2015
   nobody is known to be actively working on it.
 - There is no clear timeframe or release target.
 - Contact riastradh or ?? for further information.


5. Seamless full-disk encryption
--------------------------------

(This is only sort of a storage issue.) We have cgd, and it is
believed to still be cryptographically suitable, at least for the time
being. However, we don't have any of the following things:

+ An easy way to install a machine with full-disk encryption. It
should really just be a checkbox item in sysinst, or not much more
than that.

+ Ideally, also an easy way to turn on full-disk encryption for a
machine that's already been installed, though this is harder.

+ A good story for booting off a disk that is otherwise encrypted;
obviously one cannot encrypt the bootblocks, but it isn't clear where
in boot the encrypted volume should take over, or how to make a best
effort at protecting the unencrypted elements needed to boot. (At
least, in the absence of something like UEFI secure boot combined with
a cryptographic oracle to sign your bootloader image so UEFI will
accept it.) There's also the question of how one runs cgdconfig(8) and
where the cgdconfig binary comes from.

+ A reasonable way to handle volume passphrases. MacOS apparently uses
login passwords for this (or as passphrases for secondary keys, or
something) and this seems to work well enough apart from the somewhat
surreal experience of sometimes having to log in twice. However, it
will complicate the bootup story.
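
The "secondary keys" idea amounts to key wrapping: the volume is
encrypted under a random volume key that never changes, and the
passphrase only protects a wrapped copy of that key, so changing the
passphrase never requires re-encrypting the volume. A toy sketch of
the mechanism follows (the PBKDF2 call is the real Python stdlib
function; the XOR "wrap" merely stands in for a proper authenticated
cipher, and none of this is cgd's actual on-disk format):

```python
import hashlib
import os

def derive_kek(passphrase: bytes, salt: bytes) -> bytes:
    # Derive a 32-byte key-encrypting key from the passphrase.
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, 100_000, dklen=32)

def wrap(volume_key: bytes, kek: bytes) -> bytes:
    # Toy wrap: XOR. A real design would use an authenticated cipher here.
    return bytes(a ^ b for a, b in zip(volume_key, kek))

unwrap = wrap  # XOR is its own inverse

# The volume key is random and fixed for the life of the volume; only
# the wrapped copy stored in the volume header depends on the passphrase.
salt = os.urandom(16)
volume_key = os.urandom(32)
stored_blob = wrap(volume_key, derive_kek(b"login password", salt))

# At boot/login time, the same passphrase recovers the same volume key.
assert unwrap(stored_blob, derive_kek(b"login password", salt)) == volume_key
```

Because only the wrapped blob depends on the passphrase, a login
password and a separate recovery key can each wrap the same volume key
independently.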

Given the increasing regulatory-level importance of full-disk
encryption, this is at least a de facto requirement for using NetBSD
on laptops in many circumstances.

 - As of November 2015 nobody is known to be working on this.
 - There is no clear timeframe or release target.
 - Contact dholland for further information.


6. lfs64
--------

LFS currently only supports volumes up to 2 TB. As LFS is of interest
for use on shingled disks (which are larger than 2 TB) and also for
use on disk arrays (ditto) this is something of a problem. A 64-bit
version of LFS for large volumes is in the works.

 - As of November 2015 dholland is working on this.
 - It is close to being ready for at least experimental use and is
   expected to be in 8.0.
 - Responsible: dholland


7. Per-process namespaces
-------------------------

Support for per-process variation of the file system namespace enables
a number of things: more flexible chroots, for example, and also
potentially more efficient pkgsrc builds. dholland thought up a
somewhat hackish but low-footprint way to implement this.

 - As of November 2015 dholland is working on this.
 - It is scheduled to be in 8.0.
 - Responsible: dholland


8. lvm tidyup
-------------

[agc says someone should look at our lvm stuff; XXX fill this in]

 - As of November 2015 nobody is known to be working on this.
 - There is no clear timeframe or release target.
 - Contact agc for further information.


9. Flash translation layer
--------------------------

SSDs ship with firmware called a "flash translation layer" that
arbitrates between the block device software expects to see and the
raw flash chips. FTLs handle wear leveling, lifetime management, and
also internal caching, striping, and other performance concerns.
While NetBSD has a file system for raw flash (chfs), it seems that,
given the things NetBSD is often used for, it ought to come with a
flash translation layer as well.

Note that this is an area where writing your own is probably a bad
plan; it is a complicated area with a lot of prior art that's also
reportedly full of patent mines. There are a couple of open FTL
implementations that we might be able to import.

 - As of November 2015 nobody is known to be working on this.
 - There is no clear timeframe or release target.
 - Contact dholland for further information.


10. Shingled disk support
-------------------------

Shingled disks (or more technically, disks with "shingled magnetic
recording" or SMR) can only write whole tracks at once. Thus, to
operate effectively they require translation support similar to the
flash translation layers found in SSDs. The nature and structure of
shingle translation layers is still being researched; however, at some
point we will want to support these things in NetBSD.

 - As of November 2015 one of dholland's coworkers is looking at this.
 - There is no clear timeframe or release target.
 - Contact dholland for further information.


11. ext3/ext4 support
---------------------

We would like to be able to read and write Linux ext3fs and ext4fs
volumes. (We can already read clean ext3fs volumes as they're the same
as ext2fs, modulo volume features our ext2fs code does not support;
but we can't write them.)

Ideally someone would write ext3 and/or ext4 code, whether integrated
with or separate from the ext2 code we already have. It might also
make sense to port or wrap the Linux ext3 or ext4 code so it can be
loaded as a GPL'd kernel module; it isn't clear if that would be more
or less work than doing an implementation.
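
For reference, the feature-bitmap mechanism that makes clean ext3
volumes readable by ext2 code can be sketched in a few lines. The
superblock offsets, magic number, and feature bits below are the real
ext2/ext3 on-disk values; the "supported" masks are hypothetical
stand-ins for whatever a given ext2 driver actually implements:

```python
import struct

SB_OFFSET = 1024          # a driver reads the superblock 1 KB into the volume
EXT2_MAGIC = 0xEF53       # s_magic, little-endian u16 at superblock offset 56

# Real ext2/ext3 feature bits:
COMPAT_HAS_JOURNAL = 0x0004   # ext3 journal present; safe to ignore
INCOMPAT_FILETYPE = 0x0002    # file type stored in directory entries
INCOMPAT_RECOVER = 0x0004     # journal needs replay; unsafe to touch

# Hypothetical masks for an ext2-only driver:
SUPPORTED_INCOMPAT = INCOMPAT_FILETYPE
SUPPORTED_RO_COMPAT = 0x0000

def mount_mode(superblock: bytes) -> str:
    magic, = struct.unpack_from("<H", superblock, 56)
    if magic != EXT2_MAGIC:
        return "not ext2"
    incompat, = struct.unpack_from("<I", superblock, 96)    # s_feature_incompat
    ro_compat, = struct.unpack_from("<I", superblock, 100)  # s_feature_ro_compat
    # compat features (e.g. HAS_JOURNAL) may always be ignored; unknown
    # incompat features forbid mounting; unknown ro_compat features
    # forbid writing but allow reading.
    if incompat & ~SUPPORTED_INCOMPAT:
        return "cannot mount"
    if ro_compat & ~SUPPORTED_RO_COMPAT:
        return "read-only"
    return "read-write"
```

This is why a cleanly unmounted ext3 volume (HAS_JOURNAL set, RECOVER
clear) is accessible to ext2 code, while one needing journal replay is
not; it does not capture the NetBSD-specific reasons we currently
cannot write such volumes.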

Note however that implementing ext3 has already defeated several
people; this is a harder project than it looks.

 - As of November 2015 nobody is known to be working on this.
 - There is no clear timeframe or release target.
 - Contact ?? for further information.


12. Port hammer from Dragonfly
------------------------------

While the motivation for and role of hammer isn't perhaps super
persuasive, it would still be good to have it. Porting it from
Dragonfly is probably not that painful (compared to, say, zfs), but as
the Dragonfly and NetBSD VFS layers have diverged in different
directions from the original 4.4BSD, it may not be entirely trivial
either.

 - As of November 2015 nobody is known to be working on this.
 - There is no clear timeframe or release target.
 - There probably isn't any particular person to contact; for VFS
   concerns contact dholland or hannken.


13. afs maintenance
-------------------

AFS needs periodic care and feeding to continue working as NetBSD
changes, because the kernel-level bits aren't kept in the NetBSD tree
and don't get updated with other things. This is an ongoing issue that
always seems to need more manpower than it gets. It might make sense
to import some of the kernel AFS code, or maybe even just some of the
glue layer that it uses, in order to keep it more current.

 - jakllsch sometimes works on this.
 - We would like every release to have working AFS by the time it's
   released.
 - Contact jakllsch or gendalia about AFS; for VFS concerns contact
   dholland or hannken.


14. execute-in-place
--------------------

It is likely that the future includes non-volatile storage (so-called
"nvram") that looks like RAM from the perspective of software. Most
importantly: the storage is memory-mapped rather than looking like a
disk controller.
There are a number of things NetBSD ought to have to be ready for
this, of which probably the most important is "execute-in-place": when
an executable is run from such storage, and mapped into user memory
with mmap, the storage hardware pages should be able to appear
directly in user memory. Right now they get gratuitously copied into
RAM, which is slow and wasteful. There are also other reasons (e.g.
embedded device ROMs) to want execute-in-place support.

Note that at the implementation level this is a UVM issue rather than
strictly a storage issue.

Also note that one does not need access to nvram hardware to work on
this issue; given the performance profiles touted for nvram
technologies, a plain RAM disk like md(4) is sufficient both
structurally and for performance analysis.

 - As of November 2015 nobody is known to be working on this. Some
   time back, uebayasi wrote some preliminary patches, but they were
   rejected by the UVM maintainers.
 - There is no clear timeframe or release target.
 - Contact dholland for further information.


15. coda maintenance
--------------------

Coda only sort of works. [And I think it's behind relative to
upstream, or something of the sort; XXX fill this in.] Also the code
appears to have an ugly incestuous relationship with FFS. This should
really be cleaned up. That or maybe it's time to remove Coda.

 - As of November 2015 nobody is known to be working on this.
 - There is no clear timeframe or release target.
 - There isn't anyone in particular to contact.


Alistair Crooks, David Holland
Fri Nov 20 02:17:53 EST 2015