History log of /freebsd-current/sys/geom/eli/g_eli.c
Revision Date Author Comments
# 838d5ae6 19-May-2024 Mariusz Zaborski <oshogbo@FreeBSD.org>

geli: fix indentation

no functional changes


# 4b3141f5 19-May-2024 Mariusz Zaborski <oshogbo@FreeBSD.org>

geli: allocate a UMA pool earlier

The functions g_eli_init_uma and g_eli_fini_uma are used to trace
the number of devices in GELI. There is an issue where the g_eli_create
function may fail before g_eli_init_uma is called, however
g_eli_fini_uma is still executed in the fail path. This can
incorrectly decrease the device count to zero, potentially leading to
the UMA pool being freed. Accessing the device after the pool has been
freed causes a system panic.

This commit resolves the issue by ensuring the device count is increased
earlier.

PR: 278828
Reported by: Andre Albsmeier <mail@fbsd2.e4m.org>
Reviewed by: asomers
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D45225
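
The ordering problem described above can be sketched in userspace C. This is a minimal model, not the kernel code: devcnt, pool_alive, init_uma, fini_uma and create_device are hypothetical stand-ins that only mirror the names in the commit message. The point it illustrates is that the reference is taken before any step that can fail, so the shared fail path never drops a reference it does not hold.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; names only echo the commit message. */
static int  devcnt;       /* number of devices using the pool */
static bool pool_alive;   /* stands in for the UMA pool */

static void
init_uma(void)
{
	if (devcnt++ == 0)
		pool_alive = true;      /* first user creates the pool */
}

static void
fini_uma(void)
{
	assert(devcnt > 0);             /* must pair with an earlier init */
	if (--devcnt == 0)
		pool_alive = false;     /* last user frees the pool */
}

/*
 * Fixed ordering: count the device first, then run the steps that may
 * fail. A failure releases exactly the reference taken here.
 */
static int
create_device(bool fail_early)
{
	init_uma();
	if (fail_early) {
		fini_uma();
		return (-1);
	}
	return (0);
}
```

With this ordering, a failed create leaves the count and the pool of an already attached device untouched.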


# 3acf3fea 22-Apr-2024 Alan Somers <asomers@FreeBSD.org>

geli: add a read-only kern.geom.eli.use_uma_bytes sysctl

It reports the value of the g_eli_alloc_sz variable. Allocations of
this size or less will use UMA. Larger allocations will use malloc.
Since malloc is slower, it is useful for users to know this variable so
they can avoid such allocations. For example, ZFS users can set
vfs.zfs.vdev.aggregation_limit to this value.

MFC after: 1 week
Sponsored by: Axcient
Reviewed by: markj, imp
Differential Revision: https://reviews.freebsd.org/D44904
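
The size rule stated above (allocations of this size or less use UMA, larger ones use malloc) can be sketched as follows. The threshold value and all names here are hypothetical; the real value is whatever the sysctl reports on a given system.

```c
#include <assert.h>
#include <stddef.h>

enum alloc_path { ALLOC_UMA, ALLOC_MALLOC };

/* Hypothetical threshold for illustration only. */
static const size_t use_uma_bytes = 65536;

/* Pick the allocation path for a request of len bytes. */
static enum alloc_path
choose_alloc_path(size_t len)
{
	assert(len > 0);        /* zero-length requests are outside this model */
	return (len <= use_uma_bytes ? ALLOC_UMA : ALLOC_MALLOC);
}
```

An upper layer that caps its I/O size at the threshold stays on the faster path.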


# fdafd315 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# bd5d9037 28-Dec-2022 Zhenlei Huang <zlei@FreeBSD.org>

GEOM: Remove redundant NULL pointer check before g_free()

Reviewed by: melifaro, pjd, imp
Approved by: kp (mentor)
Differential Revision: https://reviews.freebsd.org/D37779


# 081b4452 18-Apr-2022 Mark Johnston <markj@FreeBSD.org>

geli: Add a chicken switch for unmapped I/O

We have a report of a panic in GELI that appears to go away when
unmapped I/O is disabled. Add a tunable to make such investigations
easier in the future. No functional change intended.

PR: 262894
Reviewed by: asomers
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34944


# 8f7878e3 04-Apr-2022 Robert Wing <rew@FreeBSD.org>

geom_eli: fix set but not used warning


# c9048120 09-Dec-2021 Mateusz Guzik <mjg@FreeBSD.org>

geom_eli: mostly plug set-but-not-used vars

The remaining case is an ignored error.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 627d5d19 31-Oct-2021 Mateusz Guzik <mjg@FreeBSD.org>

geli: eli data -> eli_data for consistency with other geom classes

PR: 259392
Reported by: dewayne@heuristicsystems.com.au
MFC after: 1 week


# b984d153 01-Oct-2021 Gleb Smirnoff <glebius@FreeBSD.org>

Don't set GELI UMA zone as UMA_ZONE_NOFREE.

That fixes a memory leak when the last GELI provider is destroyed, introduced
in 2dbc9a388ee. This patch was originally developed late 2019 and
the flag was necessary to prevent zone drainage under memory pressure.
Today, with f09cbea31a3f the UMA is fixed not to drain into reserves.

Discussed with: jtl, markj
Fixes: 2dbc9a388ee
PR: 258787


# 2dbc9a38 28-Sep-2021 Gleb Smirnoff <glebius@FreeBSD.org>

Fix memory deadlock when GELI partition is used for swap.

When we get low on memory, the VM system tries to free some by swapping
pages. However, if we are so low on free pages that GELI allocations block,
then the swapout operation cannot complete. This keeps the VM system from
being able to free enough memory so the allocation can complete.

To alleviate this, keep a UMA pool at the GELI layer which is used for data
buffer allocation in the fast path, and reserve some of that memory for swap
operations. If an IO operation is a swap, then use the reserved memory. If
the allocation still fails, return ENOMEM instead of blocking.

For non-swap allocations, change the default to using M_NOWAIT. In general,
this *should* be better, since it gives upper layers a signal of the memory
pressure and a chance to manage their failure strategy appropriately. However,
a user can set the kern.geom.eli.blocking_malloc sysctl/tunable to restore
the previous M_WAITOK strategy.

Submitted by: jtl
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D24400
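
The reserve policy described above can be sketched as a small userspace model. It is an illustration under stated assumptions, not the GELI implementation: struct pool, its fields and pool_alloc are hypothetical, and the sketch only models the non-blocking behaviour (ordinary requests may not dip into the reserve, swap requests may, and both get ENOMEM instead of waiting).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a buffer pool with a swap-only reserve. */
struct pool {
	int free_items;   /* buffers currently available */
	int reserve;      /* buffers only swap I/O may consume */
};

/* Returns 0 on success or ENOMEM; it never blocks. */
static int
pool_alloc(struct pool *p, bool is_swap)
{
	/* Ordinary I/O must leave the reserve untouched. */
	int floor = is_swap ? 0 : p->reserve;

	assert(p->free_items >= 0 && p->reserve >= 0);
	if (p->free_items <= floor)
		return (ENOMEM);
	p->free_items--;
	return (0);
}
```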


# bc683a89 07-Oct-2020 Warner Losh <imp@FreeBSD.org>

Move kernel env global variables, etc to sys/kenv.h

The kernel globals for kenv are confined to 2 files that need them and
a few that likely shouldn't (but as written the code does). Move them
from sys/systm.h to sys/kenv.h. This removes an XXX from systm.h and
cleans it up a little bit...


# d40bc607 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

geom: clean up empty lines in .c and .h files


# 7d874f0f 25-Aug-2020 Alan Somers <asomers@FreeBSD.org>

geli: use unmapped I/O

Use unmapped I/O for geli. Unlike most geom providers, geli needs to
manipulate data on every read or write. Previously it would always map bios.

On my 16-core, dual socket server using geli atop md(4) devices, with 512B
sectors, this change increases geli IOPs by about 3x.

Note that geli still can't use unmapped I/O when data integrity verification
is enabled (but it could, with a little more work). And it can't use
unmapped I/O in combination with ZFS, because ZFS uses mapped bios.

Reviewed by: markj, kib, jhb, mjg, mat, bcr (manpages)
MFC after: 1 week
Sponsored by: Axcient
Differential Revision: https://reviews.freebsd.org/D25671


# 6f818c1f 08-Jul-2020 Alan Somers <asomers@FreeBSD.org>

geli: enable direct dispatch

geli does all of its crypto operations in a separate thread pool, so
g_eli_start, g_eli_read_done, and g_eli_write_done don't actually do very
much work. Enabling direct dispatch eliminates the g_up/g_down bottlenecks,
doubling IOPs on my system. This change does not affect the thread pool.

Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Axcient
Differential Revision: https://reviews.freebsd.org/D25587


# 6572e5ff 25-Jun-2020 John Baldwin <jhb@FreeBSD.org>

Use explicit_bzero() instead of bzero() for sensitive data.

Reviewed by: delphij
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25441


# b172f23d 25-Jun-2020 John Baldwin <jhb@FreeBSD.org>

Use zfree() instead of bzero() and free().

These bzero's should have been explicit_bzero's.

Reviewed by: cem, delphij
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25437


# 4a711b8d 25-Jun-2020 John Baldwin <jhb@FreeBSD.org>

Use zfree() instead of explicit_bzero() and free().

In addition to reducing lines of code, this also ensures that the full
allocation is always zeroed avoiding possible bugs with incorrect
lengths passed to explicit_bzero().

Suggested by: cem
Reviewed by: cem, delphij
Approved by: csprng (cem)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25435
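
Why zeroing the full allocation avoids length bugs can be shown with a userspace stand-in. This is not the kernel zfree(); sized_alloc, sized_wipe and sized_zfree are hypothetical helpers that record the allocation size in a header, so the caller never passes a length that could be wrong. The payload is treated as raw bytes only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header stored in front of every payload. */
struct hdr {
	size_t size;    /* payload size in bytes */
};

static void *
sized_alloc(size_t size)
{
	struct hdr *h = malloc(sizeof(*h) + size);

	if (h == NULL)
		return (NULL);
	h->size = size;
	return (h + 1);
}

/* Zero the whole payload; volatile keeps the stores from being elided. */
static void
sized_wipe(void *p)
{
	struct hdr *h = (struct hdr *)p - 1;
	volatile unsigned char *b = p;

	for (size_t i = 0; i < h->size; i++)
		b[i] = 0;
}

/* Wipe, then free: the length always comes from the allocation itself. */
static void
sized_zfree(void *p)
{
	if (p == NULL)
		return;
	sized_wipe(p);
	free((struct hdr *)p - 1);
}
```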


# a3d565a1 09-Jun-2020 John Baldwin <jhb@FreeBSD.org>

Add a crypto capability flag for accelerated software drivers.

Use this in GELI to print out a different message when accelerated
software such as AESNI is used vs plain software crypto.

While here, simplify the logic in GELI a bit for determining which type
of crypto driver was chosen the first time by examining the
capabilities of the matched driver after a single call to
crypto_newsession rather than making separate calls with different
flags.

Reviewed by: delphij
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25126


# 2a230609 27-May-2020 Alan Somers <asomers@FreeBSD.org>

geli: fix a livelock during panic

During any kind of shutdown, kern_reboot calls geli's pre_sync event hook,
which tries to destroy all unused geli devices. But during a panic, geli
can't destroy any devices, because the scheduler is stopped, so it can't
switch threads. A livelock results, and the system never dumps core.

This commit fixes the problem by refusing to destroy any devices during
panic, used or otherwise.

PR: 246207
Reviewed by: jhb
MFC after: 2 weeks
Sponsored by: Axcient
Differential Revision: https://reviews.freebsd.org/D24697


# ae1cce52 13-May-2020 Warner Losh <imp@FreeBSD.org>

Reimplement aliases in geom

The alias needs to be part of the provider instead of the geom to work
properly. To bind the DEV geom, we need to look at the provider's names and
aliases and create the dev entries from there. If this lives in the GEOM, then
it won't propagate down the tree properly. Remove it from the geom, add it to the provider.

Update geli, gmountver, gnop, gpart, and guzip to use it, which handles the bulk
of the uses in FreeBSD. I think this is all the providers that create a new name
based on their parent's name.


# e2b99193 14-Apr-2020 John Baldwin <jhb@FreeBSD.org>

Remove support for geli(4) algorithms deprecated in r348206.

This removes support for reading and writing volumes using the
following algorithms:

- Triple DES
- Blowfish
- MD5 HMAC integrity

In addition, this commit adds an explicit whitelist of supported
algorithms to give a better error message when an invalid or
unsupported algorithm is used by an existing volume.

Reviewed by: cem
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24343
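
An explicit whitelist check of the kind mentioned above might look like the sketch below. The commit message does not list the allowed set, so the algorithm names in the table are illustrative assumptions, and algo_supported is a hypothetical helper rather than a function from g_eli.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative names only; the commit does not enumerate the list. */
static const char *const allowed_algos[] = {
	"aes-xts",
	"aes-cbc",
	"camellia-cbc",
};

/* True only for names on the whitelist; everything else is rejected. */
static bool
algo_supported(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < sizeof(allowed_algos) / sizeof(allowed_algos[0]); i++) {
		if (strcmp(name, allowed_algos[i]) == 0)
			return (true);
	}
	return (false);
}
```

Checking against a list of known-good names, rather than a list of removed ones, lets unknown or corrupt algorithm identifiers fail with the same clear error.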


# c0341432 27-Mar-2020 John Baldwin <jhb@FreeBSD.org>

Refactor driver and consumer interfaces for OCF (in-kernel crypto).

- The linked list of cryptoini structures used in session
initialization is replaced with a new flat structure: struct
crypto_session_params. This session includes a new mode to define
how the other fields should be interpreted. Available modes
include:

- COMPRESS (for compression/decompression)
- CIPHER (for simple encryption/decryption)
- DIGEST (computing and verifying digests)
- AEAD (combined auth and encryption such as AES-GCM and AES-CCM)
- ETA (combined auth and encryption using encrypt-then-authenticate)

Additional modes could be added in the future (e.g. if we wanted to
support TLS MtE for AES-CBC in the kernel we could add a new mode
for that. TLS modes might also affect how AAD is interpreted, etc.)

The flat structure also includes the key lengths and algorithms as
before. However, code doesn't have to walk the linked list and
switch on the algorithm to determine which key is the auth key vs
encryption key. The 'csp_auth_*' fields are always used for auth
keys and settings and 'csp_cipher_*' for cipher. (Compression
algorithms are stored in csp_cipher_alg.)

- Drivers no longer register a list of supported algorithms. This
doesn't quite work when you factor in modes (e.g. a driver might
support both AES-CBC and SHA2-256-HMAC separately but not combined
for ETA). Instead, a new 'crypto_probesession' method has been
added to the kobj interface for symmetric crypto drivers. This
method returns a negative value on success (similar to how
device_probe works) and the crypto framework uses this value to pick
the "best" driver. There are three constants for hardware
(e.g. ccr), accelerated software (e.g. aesni), and plain software
(cryptosoft) that give preference in that order. One effect of this
is that if you request only hardware when creating a new session,
you will no longer get a session using accelerated software.
Another effect is that the default setting to disallow software
crypto via /dev/crypto now disables accelerated software.

Once a driver is chosen, 'crypto_newsession' is invoked as before.

- Crypto operations are now solely described by the flat 'cryptop'
structure. The linked list of descriptors has been removed.

A separate enum has been added to describe the type of data buffer
in use instead of using CRYPTO_F_* flags to make it easier to add
more types in the future if needed (e.g. wired userspace buffers for
zero-copy). It will also make it easier to re-introduce separate
input and output buffers (in-kernel TLS would benefit from this).

Try to make the flags related to IV handling less insane:

- CRYPTO_F_IV_SEPARATE means that the IV is stored in the 'crp_iv'
member of the operation structure. If this flag is not set, the
IV is stored in the data buffer at the 'crp_iv_start' offset.

- CRYPTO_F_IV_GENERATE means that a random IV should be generated
and stored into the data buffer. This cannot be used with
CRYPTO_F_IV_SEPARATE.

If a consumer wants to deal with explicit vs implicit IVs, etc. it
can always generate the IV however it needs and store partial IVs in
the buffer and the full IV/nonce in crp_iv and set
CRYPTO_F_IV_SEPARATE.

The layout of the buffer is now described via fields in cryptop.
crp_aad_start and crp_aad_length define the boundaries of any AAD.
Previously with GCM and CCM you defined an auth crd with this range,
but for ETA your auth crd had to span both the AAD and plaintext
(and they had to be adjacent).

crp_payload_start and crp_payload_length define the boundaries of
the plaintext/ciphertext. Modes that only do a single operation
(COMPRESS, CIPHER, DIGEST) should only use this region and leave the
AAD region empty.

If a digest is present (or should be generated), its starting
location is marked by crp_digest_start.

Instead of using the CRD_F_ENCRYPT flag to determine the direction
of the operation, cryptop now includes an 'op' field defining the
operation to perform. For digests I've added a new VERIFY digest
mode which assumes a digest is present in the input and fails the
request with EBADMSG if it doesn't match the internally-computed
digest. GCM and CCM already assumed this, and the new AEAD mode
requires this for decryption. The new ETA mode now also requires
this for decryption, so IPsec and GELI no longer do their own
authentication verification. Simple DIGEST operations can also do
this, though there are no in-tree consumers.

To eventually support some refcounting to close races, the session
cookie is now passed to crypto_getop() and clients should no longer
set crp_session directly.

- Asymmetric crypto operation structures should be allocated via
crypto_getkreq() and freed via crypto_freekreq(). This permits the
crypto layer to track open asym requests and close races with a
driver trying to unregister while asym requests are in flight.

- crypto_copyback, crypto_copydata, crypto_apply, and
crypto_contiguous_subsegment now accept the 'crp' object as the
first parameter instead of individual members. This makes it easier
to deal with different buffer types in the future as well as
separate input and output buffers. It's also simpler for driver
writers to use.

- bus_dmamap_load_crp() loads a DMA mapping for a crypto buffer.
This understands the various types of buffers so that drivers that
use DMA do not have to be aware of different buffer types.

- Helper routines now exist to build an auth context for HMAC IPAD
and OPAD. This reduces some duplicated work among drivers.

- Key buffers are now treated as const throughout the framework and in
device drivers. However, session key buffers provided when a session
is created are expected to remain alive for the duration of the
session.

- GCM and CCM sessions now only specify a cipher algorithm and a cipher
key. The redundant auth information is not needed or used.

- For cryptosoft, split up the code a bit such that the 'process'
callback now invokes a function pointer in the session. This
function pointer is set based on the mode (in effect) though it
simplifies a few edge cases that would otherwise be in the switch in
'process'.

It does split up GCM vs CCM which I think is more readable even if there
is some duplication.

- I changed /dev/crypto to support GMAC requests using CRYPTO_AES_NIST_GMAC
as an auth algorithm and updated cryptocheck to work with it.

- Combined cipher and auth sessions via /dev/crypto now always use ETA
mode. The COP_F_CIPHER_FIRST flag is now a no-op that is ignored.
This was actually documented as being true in crypto(4) before, but
the code had not implemented this before I added the CIPHER_FIRST
flag.

- I have not yet updated /dev/crypto to be aware of explicit modes for
sessions. I will probably do that at some point in the future as well
as teach it about IV/nonce and tag lengths for AEAD so we can support
all of the NIST KAT tests for GCM and CCM.

- I've split up the existing crypto.9 manpage into several pages
of which many are written from scratch.

- I have converted all drivers and consumers in the tree and verified
that they compile, but I have not tested all of them. I have tested
the following drivers:

- cryptosoft
- aesni (AES only)
- blake2
- ccr

and the following consumers:

- cryptodev
- IPsec
- ktls_ocf
- GELI (lightly)

I have not tested the following:

- ccp
- aesni with sha
- hifn
- kgssapi_krb5
- ubsec
- padlock
- safe
- armv8_crypto (aarch64)
- glxsb (i386)
- sec (ppc)
- cesa (armv7)
- cryptocteon (mips64)
- nlmsec (mips64)

Discussed with: cem
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D23677
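
The driver selection rule described in this commit (probe returns a negative value on success, and hardware is preferred over accelerated software over plain software) can be sketched as below. The constant names and numeric values are hypothetical; only their relative order matters. Treating every non-negative return as "cannot handle this session" is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values: closer to zero means more preferred. */
enum {
	PROBE_HARDWARE       = -100,
	PROBE_ACCEL_SOFTWARE = -200,
	PROBE_SOFTWARE       = -500,
};

struct driver {
	const char *name;
	int         probe;  /* negative: can handle; otherwise: cannot */
};

/* Return the usable driver with the highest probe value, or NULL. */
static const struct driver *
pick_driver(const struct driver *d, size_t n)
{
	const struct driver *best = NULL;

	assert(d != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (d[i].probe >= 0)
			continue;       /* driver declined the session */
		if (best == NULL || d[i].probe > best->probe)
			best = &d[i];
	}
	return (best);
}
```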


# 53a6215c 24-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (12 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Approved by: kib (mentor, blanket)
Differential Revision: https://reviews.freebsd.org/D23637


# c81929d3 07-Feb-2020 Kyle Evans <kevans@FreeBSD.org>

geli taste: allow GELIBOOT tagged providers as well

Currently the installer will tag geliboot partitions with both BOOT and
GELIBOOT; the former allows the kernel to taste it at boot, while the latter
is what the loaders key off of.

However, it seems reasonable to assume that if a provider's been tagged with
GELIBOOT that the kernel should also take that as a hint to taste/attach at
boot. This would allow us to stop tagging GELIBOOT partitions with BOOT in
bsdinstall, but I'm not sure that there's a compelling reason to do so any
time soon.

Reviewed by: oshogbo
Differential Revision: https://reviews.freebsd.org/D23387
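
The taste rule after this change can be sketched as a single flag test. The bit values and the helper name are hypothetical and are not taken from the GELI headers; the sketch only shows that either tag is now sufficient.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for illustration. */
enum {
	FLAG_BOOT     = 0x1,
	FLAG_GELIBOOT = 0x2,
};

/* A provider is tasted at boot if it carries either tag. */
static bool
taste_at_boot(unsigned int flags)
{
	return ((flags & (FLAG_BOOT | FLAG_GELIBOOT)) != 0);
}
```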


# 8b522bda 16-Jan-2020 Warner Losh <imp@FreeBSD.org>

Pass BIO_SPEEDUP through all the geom layers

While some geom layers pass unknown commands down, not all do. For the ones that
don't, pass BIO_SPEEDUP down to the providers that constitute the geom, as
applicable. No changes to vinum or virstor because I was unsure how to add this
support, and I'm also unsure how to test these. gvinum doesn't implement
BIO_FLUSH either, so it may just be poorly maintained. gvirstor is for testing
and not supporting BIO_SPEEDUP is fine.

Reviewed by: chs
Differential Revision: https://reviews.freebsd.org/D23183


# 0aabbeff 02-Jan-2020 Alexander Motin <mav@FreeBSD.org>

Remove extra check for provider being closed.

We already checked for that earlier, and since we hold topology lock
it could not change.

MFC after: 1 week


# ac03832e 07-Aug-2019 Conrad Meyer <cem@FreeBSD.org>

GEOM: Reduce unnecessary log interleaving with sbufs

Similar to what was done for device_printfs in r347229.

Convert g_print_bio() to a thin shim around g_format_bio(), which acts on an
sbuf; documented in g_bio.9.

Reviewed by: markj
Discussed with: rlibby
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21165


# 3bb6e0f0 01-Jul-2019 Ryan Libby <rlibby@FreeBSD.org>

g_eli_create: only dec g_access acw if we inc'd it

Reviewed by: cem, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20743


# 49ee0fce 19-Jun-2019 Alexander Motin <mav@FreeBSD.org>

Use sbuf_cat() in GEOM confxml generation.

When it comes to megabytes of text, difference between sbuf_printf() and
sbuf_cat() becomes substantial.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.


# 5c420aae 23-May-2019 John Baldwin <jhb@FreeBSD.org>

Add deprecation warnings for weaker algorithms to geli(4).

- Triple DES has been formally deprecated in Kerberos (RFC 8429)
and is soon to be deprecated in IPsec (RFC 8221).
- Blowfish is deprecated. FreeBSD doesn't support its successor
(Twofish).
- MD5 is generally considered a weak digest that has known attacks.

geli refuses to create new volumes using these algorithms via 'geli
init'. It also warns when attaching to existing volumes or creating
temporary volumes via 'geli onetime' . The plan is to fully remove
support for these algorithms in FreeBSD 13.

Note that none of these algorithms have ever been the default
algorithm used by geli(8). Users would have had to explicitly select
these algorithms when creating volumes in the past.

Reviewed by: cem, delphij
MFC after: 3 days
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D20344


# 2f07cdf8 03-Apr-2019 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Implement automatic online expansion of GELI providers - if the underlying
provider grows, GELI will expand automatically and will move the metadata
to the new location of the last sector.

This functionality is turned on by default. It can be turned off with the
-R flag, but it is not recommended - if the underlying provider grows and
automatic expansion is turned off, it won't be possible to attach this
provider again, as the metadata is no longer located in the last sector.

If the automatic expansion is turned off and the underlying provider grows,
GELI will only log a message with the previous size of the provider, so
recovery can be easier.

Obtained from: Fudo Security
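
Why a grown provider can no longer be attached without expansion follows from where the metadata lives. A minimal sketch, assuming the metadata occupies exactly the last sector as the message states; metadata_offset is a hypothetical helper and the sizes in the usage note are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of the last sector of a provider. */
static uint64_t
metadata_offset(uint64_t mediasize, uint32_t sectorsize)
{
	assert(sectorsize > 0);
	assert(mediasize >= sectorsize && mediasize % sectorsize == 0);
	return (mediasize - sectorsize);
}
```

For a 512-byte-sector provider that grows from 1 MiB to 2 MiB, the expected metadata location moves from byte 1048064 to byte 2096640, so metadata left at the old offset is not found unless it is moved.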


# 1b0909d5 17-Jul-2018 Conrad Meyer <cem@FreeBSD.org>

OpenCrypto: Convert sessions to opaque handles instead of integers

Track session objects in the framework, and pass handles between the
framework (OCF), consumers, and drivers. Avoid redundancy and complexity in
individual drivers by allocating session memory in the framework and
providing it to drivers in ::newsession().

Session handles are no longer integers with information encoded in various
high bits. Use of the CRYPTO_SESID2FOO() macros should be replaced with the
appropriate crypto_ses2foo() function on the opaque session handle.

Convert OCF drivers (in particular, cryptosoft, as well as myriad others) to
the opaque handle interface. Discard existing session tracking as much as
possible (quick pass). There may be additional code ripe for deletion.

Convert OCF consumers (ipsec, geom_eli, krb5, cryptodev) to handle-style
interface. The conversion is largely mechanical.

The change is documented in crypto.9.

Inspired by
https://lists.freebsd.org/pipermail/freebsd-arch/2018-January/018835.html .

No objection from: ae (ipsec portion)
Reported by: jhb


# 78f79a9a 15-Jul-2018 Mariusz Zaborski <oshogbo@FreeBSD.org>

Let geli deal with lost devices without crashing.

PR: 162036
Submitted by: Fabian Keil <fk@fabiankeil.de>
Obtained from: ElectroBSD
Discussed with: pjd@


# 31f7586d 09-May-2018 Mariusz Zaborski <oshogbo@FreeBSD.org>

Introduce the 'n' flag for the geli attach command.

If the 'n' flag is provided, the given key number will be used to
decrypt the device. This can be combined with dryrun to verify that the key
is set correctly. It can also be used to determine which key slot we want to
change on an already attached device.

Reviewed by: allanjude
Differential Revision: https://reviews.freebsd.org/D15309


# 74d6c131 10-Apr-2018 Kyle Evans <kevans@FreeBSD.org>

Annotate geom modules with MODULE_VERSION

GEOM ELI may ask for the password twice during boot: once at loader time and
once at init time.

This happens due to a module loading bug. By default GEOM ELI caches the
password in the kernel, but without the MODULE_VERSION annotation, the
kernel loads over the kernel module, even if the GEOM ELI was compiled into
the kernel. In this case, the newly loaded module
purges/invalidates/overwrites the GEOM ELI's password cache, which causes
the double asking.

MFC Note: There's a pc98 component to the original submission that is
omitted here due to pc98 removal in head. This part will need to be revived
upon MFC.

Reviewed by: imp
Submitted by: op
Obtained from: opBSD
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D14992


# 0bab7fa8 14-Feb-2018 Alan Somers <asomers@FreeBSD.org>

geli: append "/eli" to the underlying provider's physical path

If the underlying provider's physical path is null, then the geli device's
physical path will be, too. Otherwise, it will append "/eli". This will make
geli work better with zfsd(8).

PR: 224962
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D13979
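
The path rule above can be sketched in userspace C. This is an illustration, not the g_eli.c code: eli_physpath is a hypothetical helper, a NULL parent stands in for "no physical path", and the example path in the usage note is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<parent>/eli" in out. Returns 0 on success, or -1 when the
 * parent has no physical path or out is too small for the result.
 */
static int
eli_physpath(const char *parent, char *out, size_t outlen)
{
	int n;

	assert(out != NULL && outlen > 0);
	if (parent == NULL)
		return (-1);    /* no path below, so none here either */
	n = snprintf(out, outlen, "%s/eli", parent);
	if (n < 0 || (size_t)n >= outlen)
		return (-1);    /* encoding error or truncated result */
	return (0);
}
```

A parent path such as "enc@n5/slot@1" would yield "enc@n5/slot@1/eli".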


# 3728855a 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/geom: adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.


# 3453dc72 26-Aug-2017 Mariusz Zaborski <oshogbo@FreeBSD.org>

Hide length of geli passphrase during boot.

Introduce an additional geli flag that allows restoring the previous
behavior.

Reviewed by: AllanJude@, cem@ (previous version)
MFC: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D11751


# ec5c0e5b 31-Mar-2017 Allan Jude <allanjude@FreeBSD.org>

Implement boot-time encryption key passing (keybuf)

This patch adds a general mechanism for providing encryption keys to the
kernel from the boot loader. This is intended to enable GELI support at
boot time, providing a better mechanism for passing keys to the kernel
than environment variables. It is designed to be extensible to other
applications, and can easily handle multiple encrypted volumes with
different keys.

This mechanism is currently used by the pending GELI EFI work.
Additionally, this mechanism can potentially be used to interface with
GRUB, opening up options for coreboot+GRUB configurations with completely
encrypted disks.

Another benefit over the existing system is that it does not require
re-deriving the user key from the password at each boot stage.

Most of this patch was written by Eric McCorkle. It was extended by
Allan Jude with a number of minor enhancements and extending the keybuf
feature into boot2.

GELI user keys are now derived once, in boot2, then passed to the loader,
which reuses the key, then passes it to the kernel, where the GELI module
destroys the keybuf after decrypting the volumes.

Submitted by: Eric McCorkle <eric@metricspace.net> (Original Version)
Reviewed by: oshogbo (earlier version), cem (earlier version)
MFC after: 3 weeks
Relnotes: yes
Sponsored by: ScaleEngine Inc.
Differential Revision: https://reviews.freebsd.org/D9575


# 4e2732b5 20-May-2016 Konstantin Belousov <kib@FreeBSD.org>

Removal of Giant dropping wrappers for GEOM classes.

Sponsored by: The FreeBSD Foundation


# 9a6844d5 19-May-2016 Kenneth D. Merry <ken@FreeBSD.org>

Add support for managing Shingled Magnetic Recording (SMR) drives.

This change includes support for SCSI SMR drives (which conform to the
Zoned Block Commands or ZBC spec) and ATA SMR drives (which conform to
the Zoned ATA Command Set or ZAC spec) behind SAS expanders.

This includes full management support through the GEOM BIO interface, and
through a new userland utility, zonectl(8), and through camcontrol(8).

This is now ready for filesystems to use to detect and manage zoned drives.
(There is no work in progress that I know of to use this for ZFS or UFS, if
anyone is interested, let me know and I may have some suggestions.)

Also, improve ATA command passthrough and dispatch support, both via ATA
and ATA passthrough over SCSI.

Also, add support to camcontrol(8) for the ATA Extended Power Conditions
feature set. You can now manage ATA device power states, and set various
idle time thresholds for a drive to enter lower power states.

Note that this change cannot be MFCed in full, because it depends on
changes to the struct bio API that break compatibility. In order to
avoid breaking the stable API, only changes that don't touch or depend on
the struct bio changes can be merged. For example, the camcontrol(8)
changes don't depend on the new bio API, but zonectl(8) and the probe
changes to the da(4) and ada(4) drivers do depend on it.

Also note that the SMR changes have not yet been tested with an actual
SCSI ZBC device, or a SCSI to ATA translation layer (SAT) that supports
ZBC to ZAC translation. I have not yet gotten a suitable drive or SAT
layer, so any testing help would be appreciated. These changes have been
tested with Seagate Host Aware SATA drives attached to both SAS and SATA
controllers. Also, I do not have any SATA Host Managed devices, and I
suspect that it may take additional (hopefully minor) changes to support
them.

Thanks to Seagate for supplying the test hardware and answering questions.

sbin/camcontrol/Makefile:
Add epc.c and zone.c.

sbin/camcontrol/camcontrol.8:
Document the zone and epc subcommands.

sbin/camcontrol/camcontrol.c:
Add the zone and epc subcommands.

Add auxiliary register support to build_ata_cmd(). Make sure to
set the CAM_ATAIO_NEEDRESULT, CAM_ATAIO_DMA, and CAM_ATAIO_FPDMA
flags as appropriate for ATA commands.

Add a new get_ata_status() function to parse ATA result from SCSI
sense descriptors (for ATA passthrough over SCSI) and ATA I/O
requests.

sbin/camcontrol/camcontrol.h:
Update the build_ata_cmd() prototype

Add get_ata_status(), zone(), and epc().

sbin/camcontrol/epc.c:
Support for ATA Extended Power Conditions features. This includes
support for all features documented in the ACS-4 Revision 12
specification from t13.org (dated February 18, 2016).

The EPC feature set allows putting a drive into a power mode
immediately, or setting timeouts so that the drive will
automatically enter progressively lower power states after various
idle times.

sbin/camcontrol/fwdownload.c:
Update the firmware download code for the new build_ata_cmd()
arguments.

sbin/camcontrol/zone.c:
Implement support for Shingled Magnetic Recording (SMR) drives
via SCSI Zoned Block Commands (ZBC) and ATA Zoned Device ATA
Command Set (ZAC).

These specs were developed in concert, and are functionally
identical. The primary differences are due to SCSI and ATA
differences. (SCSI is big endian, ATA is little endian, for
example.)

This includes support for all commands defined in the ZBC and
ZAC specs.

sys/cam/ata/ata_all.c:
Decode a number of additional ATA command names in ata_op_string().

Add a new CCB building function, ata_read_log().

Add ata_zac_mgmt_in() and ata_zac_mgmt_out() CCB building
functions. These support both DMA and NCQ encapsulation.

sys/cam/ata/ata_all.h:
Add prototypes for ata_read_log(), ata_zac_mgmt_out(), and
ata_zac_mgmt_in().

sys/cam/ata/ata_da.c:
Revamp the ada(4) driver to support zoned devices.

Add four new probe states to gather information needed for zone
support.

Add a new adasetflags() function to avoid duplication of large
blocks of flag setting between the async handler and register
functions.

Add new sysctl variables that describe zone support and parameters.

Add support for the new BIO_ZONE bio, and all of its subcommands:
DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP,
DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS.

sys/cam/scsi/scsi_all.c:
Add command descriptions for the ZBC IN/OUT commands.

Add descriptions for ZBC Host Managed devices.

Add a new function, scsi_ata_pass() to do ATA passthrough over
SCSI. This will eventually replace scsi_ata_pass_16() -- it
can create the 12, 16, and 32-byte variants of the ATA
PASS-THROUGH command, and supports setting all of the
registers defined as of SAT-4, Revision 5 (March 11, 2016).

Change scsi_ata_identify() to use scsi_ata_pass() instead of
scsi_ata_pass_16().

Add a new scsi_ata_read_log() function to facilitate reading
ATA logs via SCSI.

sys/cam/scsi/scsi_all.h:
Add the new ATA PASS-THROUGH(32) command CDB. Add extended and
variable CDB opcodes.

Add Zoned Block Device Characteristics VPD page.

Add ATA Return SCSI sense descriptor.

Add prototypes for scsi_ata_read_log() and scsi_ata_pass().

sys/cam/scsi/scsi_da.c:
Revamp the da(4) driver to support zoned devices.

Add five new probe states, four of which are needed for ATA
devices.

Add five new sysctl variables that describe zone support and
parameters.

The da(4) driver supports SCSI ZBC devices, as well as ATA ZAC
devices when they are attached via a SCSI to ATA Translation (SAT)
layer. Since ZBC -> ZAC translation is a new feature in the T10
SAT-4 spec, most SATA drives will be supported via ATA commands
sent via the SCSI ATA PASS-THROUGH command. The da(4) driver will
prefer the ZBC interface, if it is available, for performance
reasons, but will use the ATA PASS-THROUGH interface to the ZAC
command set if the SAT layer doesn't support translation yet.
As I mentioned above, ZBC command support is untested.

Add support for the new BIO_ZONE bio, and all of its subcommands:
DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP,
DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS.

Add scsi_zbc_in() and scsi_zbc_out() CCB building functions.

Add scsi_ata_zac_mgmt_out() and scsi_ata_zac_mgmt_in() CCB/CDB
building functions. Note that these have return values, unlike
almost all other CCB building functions in CAM. The reason is
that they can fail, depending upon the particular combination
of input parameters. The primary failure case is if the user
wants NCQ, but fails to specify additional CDB storage. NCQ
requires using the 32-byte version of the SCSI ATA PASS-THROUGH
command, and the current CAM CDB size is 16 bytes.

sys/cam/scsi/scsi_da.h:
Add ZBC IN and ZBC OUT CDBs and opcodes.

Add SCSI Report Zones data structures.

Add scsi_zbc_in(), scsi_zbc_out(), scsi_ata_zac_mgmt_out(), and
scsi_ata_zac_mgmt_in() prototypes.

sys/dev/ahci/ahci.c:
Fix SEND / RECEIVE FPDMA QUEUED in the ahci(4) driver.

ahci_setup_fis() previously set the top bits of the sector count
register in the FIS to 0 for FPDMA commands. This is okay for
read and write, because the PRIO field is the only thing in
those bits, and we don't implement that further up the stack.

But, for SEND and RECEIVE FPDMA QUEUED, the subcommand is in that
byte, so it needs to be transmitted to the drive.

In ahci_setup_fis(), always set the top 8 bits of the
sector count register. We need it in both the standard
and NCQ / FPDMA cases.

sys/geom/eli/g_eli.c:
Pass BIO_ZONE commands through the GELI class.

sys/geom/geom.h:
Add g_io_zonecmd() prototype.

sys/geom/geom_dev.c:
Add new DIOCZONECMD ioctl, which allows sending zone commands to
disks.

sys/geom/geom_disk.c:
Add support for BIO_ZONE commands.

sys/geom/geom_disk.h:
Add a new flag, DISKFLAG_CANZONE, that indicates that a given
GEOM disk client can handle BIO_ZONE commands.

sys/geom/geom_io.c:
Add a new function, g_io_zonecmd(), that handles execution of
BIO_ZONE commands.

Add permissions check for BIO_ZONE commands.

Add command decoding for BIO_ZONE commands.

sys/geom/geom_subr.c:
Add DDB command decoding for BIO_ZONE commands.

sys/kern/subr_devstat.c:
Record statistics for REPORT ZONES commands. Note that the
number of bytes transferred for REPORT ZONES won't quite match
what is received from the hardware. This is because we're
necessarily counting bytes coming from the da(4) / ada(4) drivers,
which are using the disk_zone.h interface to communicate up
the stack. The structure sizes it uses are slightly different
than the SCSI and ATA structure sizes.

sys/sys/ata.h:
Add many bit and structure definitions for ZAC, NCQ, and EPC
command support.

sys/sys/bio.h:
Convert the bio_cmd field to a straight enumeration. This will
yield more space for additional commands in the future. After
change r297955 and other related changes, this is now possible.
Converting to an enumeration will also prevent use as a bitmask
in the future.

sys/sys/disk.h:
Define the DIOCZONECMD ioctl.

sys/sys/disk_zone.h:
Add a new API for managing zoned disks. This is very close to
the SCSI ZBC and ATA ZAC standards, but uses integers in native
byte order instead of big endian (SCSI) or little endian (ATA)
byte arrays.

This is intended to offer the complete feature set of the ZBC
and ZAC disk management without requiring the application developer
to include SCSI or ATA headers. We also use one set of headers
for ioctl consumers and kernel bio-level consumers.
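
To illustrate the byte-order point above, here is a minimal sketch of turning an 8-byte big-endian (SCSI-style) or little-endian (ATA-style) array into a native integer. The helper names are hypothetical and are not taken from disk_zone.h or from CAM.

```c
#include <stdint.h>

/* Hypothetical helper: decode an 8-byte big-endian array (SCSI style). */
static uint64_t
be64_to_native(const uint8_t b[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | b[i];	/* most significant byte first */
	return (v);
}

/* Hypothetical helper: decode an 8-byte little-endian array (ATA style). */
static uint64_t
le64_to_native(const uint8_t b[8])
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | b[i];	/* most significant byte last */
	return (v);
}
```

Once both wire formats are decoded this way, a consumer of the native-integer API never needs to know which transport the value came from.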

sys/sys/param.h:
Bump __FreeBSD_version for sys/bio.h command changes, and inclusion
of SMR support.

usr.sbin/Makefile:
Add the zonectl utility.

usr.sbin/diskinfo/diskinfo.c:
Add disk zoning capability to the 'diskinfo -v' output.

usr.sbin/zonectl/Makefile:
Add zonectl makefile.

usr.sbin/zonectl/zonectl.8:
zonectl(8) man page.

usr.sbin/zonectl/zonectl.c:
The zonectl(8) utility. This allows managing SCSI or ATA zoned
disks via the disk_zone.h API. You can report zones, reset write
pointers, get parameters, etc.

Sponsored by: Spectra Logic
Differential Revision: https://reviews.freebsd.org/D6147
Reviewed by: wblock (documentation)


# fdce57a0 14-May-2016 John Baldwin <jhb@FreeBSD.org>

Add an EARLY_AP_STARTUP option to start APs earlier during boot.

Currently, Application Processors (non-boot CPUs) are started by
MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until
SI_SUB_SMP at which point they are released to run kernel threads.
SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter
the scheduler and start running threads until fairly late in the
boot.

This change moves SI_SUB_SMP up to just before software interrupt
threads are created allowing the APs to start executing kernel
threads much sooner (before any devices are probed). This allows
several initialization routines that need to perform initialization
on all CPUs to now perform that initialization in one step rather
than having to defer the AP initialization to a second SYSINIT run
at SI_SUB_SMP. It also permits all CPUs to be available for
handling interrupts before any devices are probed.

This last feature fixes a problem with interrupt vector exhaustion.
Specifically, in the old model all device interrupts were routed
onto the boot CPU during boot. Later after the APs were released at
SI_SUB_SMP, interrupts were redistributed across all CPUs.

However, several drivers for multiqueue hardware allocate N interrupts
per CPU in the system. In a system with many CPUs, just a few drivers
doing this could exhaust the available pool of interrupt vectors on
the boot CPU as each driver was allocating N * mp_ncpu vectors on the
boot CPU. Now, drivers will allocate interrupts on their desired CPUs
during boot meaning that only N interrupts are allocated from the boot
CPU instead of N * mp_ncpu.
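
As a rough illustration of the arithmetic above for a single multiqueue driver, the sketch below compares the number of vectors taken from the boot CPU in the two models. The function names are made up for this example and do not exist in the kernel.

```c
/* Old model: every interrupt is first routed to the boot CPU. */
static unsigned int
boot_cpu_vectors_old(unsigned int n_per_cpu, unsigned int ncpu)
{
	return (n_per_cpu * ncpu);
}

/* EARLY_AP_STARTUP model: each CPU hosts its own N interrupts. */
static unsigned int
boot_cpu_vectors_early_ap(unsigned int n_per_cpu)
{
	return (n_per_cpu);
}
```

For instance, a driver with 4 interrupts per CPU on a 64-CPU machine would take 256 boot-CPU vectors in the old model and 4 in the new one.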

Some other bits of code can also be simplified as smp_started is
now true much earlier and will now always be true for these bits of
code. This removes the need to treat the single-CPU boot environment
as a special case.

As a transition aid, the new behavior is available under a new kernel
option (EARLY_AP_STARTUP). This will allow the option to be turned off
if need be during initial testing. I plan to enable this on x86 by
default in a followup commit in the next few days and to have all
platforms moved over before 11.0. Once the transition is complete,
the option will be removed along with the !EARLY_AP_STARTUP code.

These changes have only been tested on x86. Other platform maintainers
are encouraged to port their architectures over as well. The main
things to check for are any uses of smp_started in MD code that can be
simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in
the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).

PR: kern/199321
Reviewed by: markj, gnn, kib
Sponsored by: Netflix


# d8736625 07-Apr-2016 Allan Jude <allanjude@FreeBSD.org>

Create the GELIBOOT GEOM_ELI flag

This flag indicates that the user wishes to use the GELIBOOT feature to boot from a fully encrypted root file system.
Currently, GELIBOOT does not support key files, and in the future when it does, they will be loaded differently.
Due to the design of GELI, and the desire for secrecy, the GELI metadata does not record whether key files are used; it just adds the key material (if any) to the HMAC before the optional passphrase, so there is no way to tell whether a GELI partition requires key files.

Since the GELIBOOT code in boot2 and the loader does not support key files, they will now only attempt to attach if this flag is set. This stops GELIBOOT from prompting for passwords to GELI providers that it cannot decrypt and disrupting the boot process.

PR: 208251
Reviewed by: ed, oshogbo, wblock
Sponsored by: ScaleEngine Inc.
Differential Revision: https://reviews.freebsd.org/D5867


# 4332feca 06-Jan-2016 Allan Jude <allanjude@FreeBSD.org>

Make additional parts of sys/geom/eli more usable in userspace

The upcoming GELI support in the loader reuses parts of this code.
Some ifdefs are added, and some code is moved outside of existing ifdefs.

The HMAC parts of GELI are broken out into their own file, to separate
them from the kernel crypto/openssl dependent parts that are replaced
in the boot code.

Passed the GELI regression suite (tools/regression/geom/eli)
Files=20 Tests=14996
Result: PASS

Reviewed by: pjd, delphij
MFC after: 1 week
Sponsored by: ScaleEngine Inc.
Differential Revision: https://reviews.freebsd.org/D4699


# 2dc7e36b 05-Nov-2015 Steven Hartland <smh@FreeBSD.org>

Fix g_eli error loss conditions

* Ensure that error information isn't lost.
* Log the error code in all cases.
* Don't overwrite bio_completed set to 0 from the error condition.

MFC after: 2 weeks
Sponsored by: Multiplay


# 46e34470 08-Aug-2015 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Enable BIO_DELETE passthru in GELI, so TRIM/UNMAP can work as expected when
GELI is used on a SSD or inside virtual machine, so that guest can tell
host that it is no longer using some of the storage.

Enabling BIO_DELETE passthru comes with a small security consequence - an
attacker can tell how much space is being really used on encrypted device and
has less data to analyse then. This is why the -T option can be given to the
init subcommand to turn off this behaviour and -t/T options for the configure
subcommand can be used to adjust this setting later.

PR: 198863
Submitted by: Matthew D. Fuller fullermd at over-yonder dot net

This commit also includes a fix from Fabian Keil freebsd-listen at
fabiankeil.de for 'configure' on onetime providers which is not strictly
related, but is entangled in the same code, so would cause conflicts if
separated out.


# 4273d412 10-Jul-2015 Pawel Jakub Dawidek <pjd@FreeBSD.org>

A spoil event can happen for some time now even on providers opened exclusively
(on the media change event). Update GELI to handle that situation.

PR: 201185
Submitted by: Matthew D. Fuller


# fefb6a14 02-Jul-2015 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Properly propagate errors in metadata reading.

PR: 198860
Submitted by: Matthew D. Fuller


# edaa9008 02-Jul-2015 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Allow omitting the keyfile number for the first keyfile.


# 66427784 22-Oct-2014 Colin Percival <cperciva@FreeBSD.org>

Populate the GELI passphrase cache with the kern.geom.eli.passphrase
variable (if any) provided in the boot environment. Unset it from
the kernel environment after doing this, so that the passphrase is
no longer present in kernel memory once we enter userland.

This will make it possible to provide a GELI passphrase via the boot
loader; FreeBSD's loader does not yet do this, but GRUB (and PCBSD)
will have support for this soon.

Tested by: kmoore


# 835c4dd4 16-Sep-2014 Colin Percival <cperciva@FreeBSD.org>

Cache GELI passphrases entered at the console during the boot process,
in order to improve user-friendliness when a system has multiple disks
encrypted using the same passphrase.

When examining a new GELI provider, the most recently used passphrase
will be attempted before prompting for a passphrase; and whenever a
passphrase is entered, it is cached for later reference. When the root
disk is mounted, the cached passphrase is zeroed (triggered by the
"mountroot" event), in order to minimize the possibility of leakage
of passphrases. (After root is mounted, the "taste and prompt for
passphrases on the console" code path is disabled, so there is no
potential for a passphrase to be stored after the zeroing takes place.)
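
The store, try-before-prompting, and zero-on-mountroot steps described above can be sketched as follows. This is only an illustration under assumed names and an assumed buffer size; it is not the code from g_eli.c.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PASS_MAX	256	/* hypothetical buffer size */

static char sketch_cached_pass[SKETCH_PASS_MAX];

/* Remember the most recently entered passphrase (truncating if too long). */
static void
sketch_cache_store(const char *pass)
{
	strncpy(sketch_cached_pass, pass, SKETCH_PASS_MAX - 1);
	sketch_cached_pass[SKETCH_PASS_MAX - 1] = '\0';
}

/* Nonzero if a cached passphrase can be tried before prompting. */
static int
sketch_cache_available(void)
{
	return (sketch_cached_pass[0] != '\0');
}

/* Wipe the cache; the volatile pointer keeps the stores from being elided. */
static void
sketch_cache_zero(void)
{
	volatile char *p = sketch_cached_pass;

	for (size_t i = 0; i < SKETCH_PASS_MAX; i++)
		p[i] = 0;
}
```

In this sketch, sketch_cache_zero() stands in for whatever handler runs on the "mountroot" event.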

This behaviour can be disabled by setting kern.geom.eli.boot_passcache=0.

Reviewed by: pjd, dteske, allanjude
MFC after: 7 days


# af3b2549 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 37a107a4 27-Jun-2014 Glen Barber <gjb@FreeBSD.org>

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 3da1cf1e 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating children of a static/extern SYSCTL
node. This operation should probably be factored out into a
common macro, since some device drivers use it. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading kludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, since malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# ebd05ada 05-Jun-2014 Brad Davis <brd@FreeBSD.org>

- Fix the keyfile being cleared prematurely after r259428

PR: 185084
Submitted by: fk@fabiankeil.de
Reviewed by: pjd@


# 2a3237c8 15-Dec-2013 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Clear content of keyfiles loaded by the loader after processing them.

Pointed out by: rwatson
MFC after: 1 week


# 743437c4 11-Nov-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Add missing line breaks.

PR: 181900
MFC after: 1 week


# 19351a14 02-Sep-2013 Alexander Motin <mav@FreeBSD.org>

Make ELI destruction (including orphanization) less aggressive, making it
always wait for provider close. Old algorithm was reported to cause NULL
dereference panic on attempt to close provider after softc destruction.
If not for the global workaround in GEOM, that could even cause destruction with
requests still in flight.


# 457bbc4f 04-Jul-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Use correct part of the Master-Key for generating encryption keys.
Before this change the IV-Key was used to generate encryption keys,
which was incorrect, but safe - for the XTS mode this key was unused
anyway and for CBC mode it was used differently to generate IV
vectors, so there is no risk that IV vector collides with encryption
key somehow.

Bump version number and keep compatibility for older versions.

MFC after: 2 weeks


# f6ce353e 17-Dec-2011 Andriy Gapon <avg@FreeBSD.org>

replace uses of libkern gets with cngets

MFC after: 2 months


# 0c879bd9 27-Oct-2011 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Before this change, when GELI detected hardware crypto acceleration it would
start only one worker thread. For software crypto it would start by default
N worker threads, where N is the number of available CPUs.

This is not optimal if hardware crypto is AES-NI, which uses CPU for AES
calculations.

Change that to always start one worker thread for every available CPU.
Number of worker threads per GELI provider can be easily reduced with
kern.geom.eli.threads sysctl/tunable and even for software crypto it
should be reduced when using more providers.

While here, when number of threads exceeds number of CPUs available don't
reduce this number, assume the user knows what he is doing.

Reported by: Yuri Karaban <dev@dev97.com>
MFC after: 3 days


# 1f8c92e6 25-Oct-2011 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add support for creating GELI devices with older metadata version for use
with older FreeBSD versions:
- Add -V option to 'geli init' to specify version number. If no -V is given
the most recent version is used.
- If -V is given, don't allow the use of features not supported by this version.
- Print version in 'geli list' output.
- Update manual page and add table describing which GELI version is
supported by which FreeBSD version, so one can use it when preparing GELI
device for older FreeBSD version.

Inspired by: Garrett Cooper <yanegomi@gmail.com>
MFC after: 3 days


# 0e236b6c 25-Oct-2011 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Prefer G_ELI_VERSION_* defines for version numbers over plain digits.

MFC after: 3 days


# 038c55ad 25-Oct-2011 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Fit lines into 80 chars.

MFC after: 3 days


# 5d807a0e 10-Jul-2011 Andrey V. Elsukov <ae@FreeBSD.org>

Include sys/sbuf.h directly.

Reviewed by: pjd


# a1f4a8c4 08-May-2011 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Export GELI class version via sysctl kern.geom.eli.version.

MFC after: 1 week


# ad0a5236 08-May-2011 Pawel Jakub Dawidek <pjd@FreeBSD.org>

When support for multiple encryption keys was committed, GELI integrity mode
was not updated to pass CRD_F_KEY_EXPLICIT flag to opencrypto. This resulted in
always using first key.

We need to support providers created with this bug, so set special
G_ELI_FLAG_FIRST_KEY flag for GELI provider in integrity mode with version
smaller than 6 and pass the CRD_F_KEY_EXPLICIT flag to opencrypto only if
G_ELI_FLAG_FIRST_KEY doesn't exist.

Reported by: Anton Yuzhaninov <citrin@citrin.ru>
MFC after: 1 week


# 71a19bdc 05-May-2011 Attilio Rao <attilio@FreeBSD.org>

Commit the support for removing cpumask_t and replacing it directly with
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).

Right now, cpumask_t is an int, 32 bits on all our supported architectures.
cpuset_t, on the other hand, is implemented as an array of longs, and is
easily extensible by definition.

The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN

while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.

Some technical notes:
- This commit may be considered an ABI nop for all the architectures
different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
accessed avoiding migration, because the size of cpuset_t should be
considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
primarily done in order to leave some more space in userland to cope
with KBI extensions). If you need to access kernel cpuset_t from the
userland please refer to example in this patch on how to do that
correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now

The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to come soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.

Tested by: pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by: jeff, jhb, sbruno


# c211af03 04-May-2011 Andrey V. Elsukov <ae@FreeBSD.org>

Remove "for a moment" assignment. struct g_geom zeroed when allocated.

MFC after: 1 week


# 1e09ff3d 21-Apr-2011 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Instead of allocating memory for all the keys at device attach,
create reasonably large cache for the keys that is filled when
needed. The previous version was problematic for very large providers
(hundreds of terabytes or several petabytes). Every terabyte of data
needs around 256kB for keys. Make the default cache limit big enough
to fit all the keys needed for 4TB providers, which will eat at most
1MB of memory.
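
The figures above can be checked with a small worked calculation. The sketch assumes 512-byte sectors, one data key per 2^20 sectors (as in the key-switching change c6a26d4c further down this log), and 128 bytes of memory per cached key; that last number is not stated in the commit, it is simply what 256kB per terabyte implies under these assumptions. All names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_CACHE_KEY_SHIFT	20	/* one key per 2^20 sectors */

/* Number of data keys needed to cover a provider of the given size. */
static uint64_t
keys_needed(uint64_t provider_bytes, uint64_t sector_bytes)
{
	uint64_t bytes_per_key = sector_bytes << SKETCH_CACHE_KEY_SHIFT;

	/* Round up so a partial trailing range still gets a key. */
	return ((provider_bytes + bytes_per_key - 1) / bytes_per_key);
}

/* Memory needed to hold every key, given an assumed per-key cost. */
static uint64_t
key_memory_bytes(uint64_t provider_bytes, uint64_t sector_bytes,
    uint64_t bytes_per_cached_key)
{
	return (keys_needed(provider_bytes, sector_bytes) *
	    bytes_per_cached_key);
}
```

Under these assumptions 1TB needs 2048 keys (256kB) and 4TB needs 8192 keys (1MB), which matches the default cache limit described above.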

MFC after: 2 weeks


# 90574b0a 03-Apr-2011 Mikolaj Golub <trociny@FreeBSD.org>

In g_eli_read_done() and g_eli_write_done(), for a bio with
bio_children > 1, g_destroy_bio() is never called and the bio
leaks. Fix this by calling g_destroy_bio() earlier, before the check.

Submitted by: Victor Balada Diaz <victor@bsdes.net> (initial version)
Approved by: pjd (mentor)
MFC after: 1 week


# cb08c2cc 25-Feb-2011 Alexander Leidinger <netchild@FreeBSD.org>

Add some FEATURE macros for various GEOM classes.

No FreeBSD version bump, the userland application to query the features will
be committed last and can serve as an indication of the availability if
needed.

Sponsored by: Google Summer of Code 2010
Submitted by: kibab
Reviewed by: silence on geom@ during 2 weeks
X-MFC after: to be determined in last commit with code from this project


# 1e189c08 13-Feb-2011 Marcel Moolenaar <marcel@FreeBSD.org>

Use the preload_fetch_addr() and preload_fetch_size() convenience
functions to obtain the address and size of the preloaded key files.

Sponsored by: Juniper Networks.


# eb4c31fd 14-Nov-2010 Ed Schouten <ed@FreeBSD.org>

Add support for asterisk characters when filling in the GELI password
during boot.

Change the last argument of gets() to indicate a visibility flag and add
definitions for the numerical constants. Except for the value 2, gets()
will behave exactly the same, so existing consumers shouldn't break. We
only use it in two places, though.

Submitted by: lme (older version)


# d8d61ef8 22-Oct-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add State tag, so 'geli status' will report active/suspended status, eg:

# geli status
Name Status Components
da0.eli SUSPENDED da0
da1.eli ACTIVE da1


# 4f294e12 22-Oct-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Encryption keys array might be NULL if device is suspended. Check for this, so
we don't panic when we detach suspended device.


# 1d021441 22-Oct-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Move sc_akeyctx and sc_ivctx initialization to the g_eli_mkey_propagate()
function which eliminates code duplication and will ensure proper order
of operation.


# 3ac01bc2 21-Oct-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Free opencrypto sessions on suspend, as they also might keep encryption keys.


# 738ffa97 20-Oct-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Fix a bug introduced in r213067 where we use authentication key before
initializing it.


# 5ad4a7c7 20-Oct-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Bring in geli suspend/resume functionality (finally).

Before this change if you wanted to suspend your laptop and be sure that your
encryption keys are safe, you had to stop all processes that use file system
stored on encrypted device, unmount the file system and detach geli provider.

This isn't very handy. If you are a lucky user of a laptop where suspend/resume
actually works with FreeBSD (I'm not!) you most likely want to suspend your
laptop, because you don't want to start everything over again when you turn
your laptop back on.

And this is where geli suspend/resume steps in. When you execute:

# geli suspend -a

geli will wait for all in-flight I/O requests, suspend new I/O requests, remove
all geli sensitive data from the kernel memory (like encryption keys) and will
wait for either 'geli resume' or 'geli detach'.

Now with no keys in memory you can suspend your laptop without stopping any
processes or unmounting any file systems.

When you resume your laptop you have to resume geli devices using 'geli resume'
command. You need to provide your passphrase, etc. again so the keys can be
restored and suspended I/O requests released.

Of course you need to remember that 'geli suspend' won't clear file system
cache and other places where data from your geli-encrypted file system might be
present. But to get rid of those stopping processes and unmounting file system
won't help either - you have to turn your laptop off. Be warned.

Also note that suspending a geli device which contains the file system with the geli
utility (or anything used by 'geli resume') is not a very good idea, as you won't
be able to resume it - when you execute geli(8), the kernel will try to read it
and this read I/O request will be suspended.


# 056638c4 20-Oct-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Add missing comments.
- Make a comment consistent with others.


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# f95168e0 25-Sep-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Change g_eli_debug to int, so one can turn off any GELI output by setting
kern.geom.eli.debug sysctl to -1.

MFC after: 2 weeks


# 9839c97b 22-Sep-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Update copyright years.

MFC after: 1 week


# 9a5a1d1e 23-Sep-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add support for AES-XTS. This will be the default now.

MFC after: 1 week


# c6a26d4c 23-Sep-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Implement switching of data encryption key every 2^20 blocks.
This ensures the same encryption key won't be used for more than
2^20 blocks (sectors). This will be the default now.
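
A minimal sketch of the mapping described above, assuming the key index is simply the sector number shifted right by 20. The function and macro names are hypothetical, and the real GELI code may compute this differently.

```c
#include <stdint.h>

#define SKETCH_KEY_SHIFT	20	/* switch keys every 2^20 sectors */

/* Index of the data key covering the sector at the given byte offset. */
static uint64_t
data_key_index(uint64_t byte_offset, uint64_t sector_bytes)
{
	return ((byte_offset / sector_bytes) >> SKETCH_KEY_SHIFT);
}
```

With 512-byte sectors, every offset below 2^29 bytes maps to key 0 and offset 2^29 is the first one that maps to key 1.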

MFC after: 1 week


# b35bfe7e 23-Sep-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Define default overwrite count, so that userland can use it.

MFC after: 1 week


# efb46508 28-Aug-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Correct offset conversion to little endian. It was implemented in version 2,
but because of a bug it was a no-op, so we were still using offsets in native
byte order for the host. Do it properly this time, bump version to 4 and set
the G_ELI_FLAG_NATIVE_BYTE_ORDER flag when version is under 4.

MFC after: 2 weeks


# 48ed64d0 18-Apr-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

MFC r206665:

Use lower priority for GELI worker threads. This improves system
responsiveness under heavy GELI load.


# 31c4cef7 15-Apr-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Use lower priority for GELI worker threads. This improves system
responsiveness under heavy GELI load.

MFC after: 3 days


# c5d387d0 16-Mar-2009 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Detach GELI providers on shutdown/reboot, which will allow providers underneath
to close properly.

Reported, reviewed and tested by: guido
MFC after: 1 week


# 921eec26 13-Mar-2009 Guido van Rooij <guido@FreeBSD.org>

Backout this commit while a better solution is developed.


# c5f79858 10-Mar-2009 Guido van Rooij <guido@FreeBSD.org>

When attaching a geli on boot make sure that it is detached
upon last close. (needed for a gmirror to properly shutdown
upon reboot when a geli is on top the gmirror)


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# ed6c3e47 12-Aug-2008 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Style(9).


# 5527ecd9 20-Jul-2008 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Clear passphrase buffer after use.

Submitted by: Fabian Keil <fk@fabiankeil.de> (a bit different version)


# 3745c395 20-Oct-2007 Julian Elischer <julian@FreeBSD.org>

Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
This makes way for us to add REAL kthread_create() and friends
that actually make threads. It turns out that most of these
calls actually end up being moved back to the thread version
when it's added, but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0 so that we can eventually MFC the
new kthread_xxx() calls.


# 982d11f8 04-Jun-2007 Jeff Roberson <jeff@FreeBSD.org>

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# df3aed4f 08-Apr-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Use root_mounted().


# 6810ad6f 20-Mar-2007 Sam Leffler <sam@FreeBSD.org>

Overhaul driver/subsystem api's:
o make all crypto drivers have a device_t; pseudo drivers like the s/w
crypto driver synthesize one
o change the api between the crypto subsystem and drivers to use kobj;
cryptodev_if.m defines this api
o use the fact that all crypto drivers now have a device_t to add support
for specifying which of several potential devices to use when doing
crypto operations
o add new ioctls that allow user apps to select a specific crypto device
to use (previous ioctls maintained for compatibility)
o overhaul crypto subsystem code to eliminate lots of cruft and hide
implementation details from drivers
o bring in numerous fixes from Michael Richardson/hifn; mostly for
795x parts
o add an optional mechanism for mmap'ing the hifn 795x public key h/w
to user space for use by openssl (not enabled by default)
o update crypto test tools to use new ioctl's and add cmd line options
to specify a device to use for tests

These changes will also enable much future work on improving the core
crypto subsystem; including proper load balancing and interposing code
between the core and drivers to dispatch small operations to the s/w
driver as appropriate.

These changes were instigated by the work of Michael Richardson.

Reviewed by: pjd
Approved by: re


# b9420939 02-Mar-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Fix geli after the last commit for UP systems that are running an SMP kernel.

Submitted by: Hyo geol, Lee <hyogeollee@gmail.com>
MFC after: 1 week


# a1ea1a22 28-Jan-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

It is possible that GEOM tastes a provider before SMP is started.
We can't bind to a CPU which is not yet on-line, so add code that waits for
CPUs to go on-line before binding to them.

Reported by: Alin-Adrian Anton <aanton@spintech.ro>
MFC after: 2 weeks


# 1506db21 02-Nov-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

I want CPU number here.

Noticed by: ru


# eba8f137 01-Nov-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Skip disabled CPU, because after we sched_bind() to a disabled CPU,
we won't be able to exit from the thread.

Function g_eli_cpu_is_disabled() stolen from kern_pmc.c.

PR: 104669
Reported by: Nikolay Mirin <nik@optim.com.ru>
MFC after: 1 week


# 42461fba 31-Oct-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Implement BIO_FLUSH handling by simply passing it down to the components.

Sponsored by: home.pl


# 469e9520 30-Sep-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove trailing spaces.


# 2bd4ade6 11-Aug-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Before using the byte offset for IV creation, convert it to little endian.
This way one will be able to use a provider encrypted on, e.g., i386 on,
e.g., sparc64. This doesn't really buy us much today, because UFS isn't
endian agnostic.

We retain backward compatibility by setting G_ELI_FLAG_NATIVE_BYTE_ORDER
flag on devices with version number less than 2 and not converting the
offset.
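
A minimal userland sketch of the idea described above, not the actual GELI
code: the helper names (offset_to_le, offset_native, iv_seed) and the plain
int flag are hypothetical stand-ins for illustration. It shows a 64-bit byte
offset being serialized in little-endian order unless a "native byte order"
compatibility flag asks for the old host-order behaviour.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: store a 64-bit byte offset in little-endian
 * order, independent of the host's endianness.
 */
static void
offset_to_le(uint64_t offset, uint8_t out[8])
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(offset >> (8 * i));
}

/*
 * Hypothetical helper: legacy behaviour, copy the offset in whatever
 * byte order the host uses.
 */
static void
offset_native(uint64_t offset, uint8_t out[8])
{
	memcpy(out, &offset, sizeof(offset));
}

/*
 * Pick the encoding based on a compatibility flag (assumed here to be
 * a simple boolean standing in for G_ELI_FLAG_NATIVE_BYTE_ORDER).
 */
static void
iv_seed(uint64_t offset, int native_byte_order, uint8_t out[8])
{
	if (native_byte_order)
		offset_native(offset, out);
	else
		offset_to_le(offset, out);
}
```

With the flag clear, the same offset yields the same eight bytes on any
architecture, which is what makes the IV input portable.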


# 85059016 09-Aug-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Allow geli to operate on read-only providers.

Initial patch from: vd
MFC after: 2 weeks


# f6829a05 27-Jul-2006 Yaroslav Tykhiy <ytykhiy@gmail.com>

Fix what looks like a typo: MODULE_DEPEND() takes module names,
not KLD file names; and GELI module's name is g_eli, not geom_eli.

Approved by: pjd (silence)
MFC after: 5 days


# eaa3b919 05-Jun-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Implement data integrity verification (data authentication) for geli(8).

Supported by: Wheel Sp. z o.o. (http://www.wheel.pl)


# 05bf5e8a 05-Jun-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Make kern.geom.eli.overwrites sysctl a tunable as well.


# 5af2ae28 20-Apr-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

geli(8) provides keys on newsession time, so remove CRD_F_KEY_EXPLICIT flag
as HW crypto drivers don't support it.


# cd0d707e 15-Apr-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Correct debug: we are sending child bio here, not parent bio.

MFC after: 1 week


# d3a1be90 11-Apr-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Pass BIO_GETATTR requests down.

MFC after: 1 week


# 39d92f5f 05-Apr-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Typos.


# 9af2131b 11-Feb-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Teach geli how to load keyfiles before the root file system is mounted.
Example entries for loader.conf that make this possible:

geli_da0_keyfile0_load="YES"
geli_da0_keyfile0_type="da0:geli_keyfile0"
geli_da0_keyfile0_name="/boot/keys/da0.key0"
geli_da0_keyfile1_load="YES"
geli_da0_keyfile1_type="da0:geli_keyfile1"
geli_da0_keyfile1_name="/boot/keys/da0.key1"
geli_da0_keyfile2_load="YES"
geli_da0_keyfile2_type="da0:geli_keyfile2"
geli_da0_keyfile2_name="/boot/keys/da0.key2"

geli_da1s3a_keyfile0_load="YES"
geli_da1s3a_keyfile0_type="da1s3a:geli_keyfile0"
geli_da1s3a_keyfile0_name="/boot/keys/da1s3a.key"

Thanks to jhb and kan, who pointed me in the right direction.

MFC after: 3 days


# a80f82a4 10-Feb-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Check rootvnode variable to see if we still want to ask for passphrase on
boot. Other methods just don't work properly.

MFC after: 3 days


# 98645006 07-Feb-2006 Christian Brueffer <brueffer@FreeBSD.org>

Clean up some sysctl descriptions, debug messages etc.

Approved by: pjd
MFC after: 3 days


# 38ea96ac 31-Jan-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove trailing spaces.


# 7192f621 17-Jan-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove dead code.

Found by: Coverity Prevent(tm)
MFC after: 3 days


# 4ec04907 17-Jan-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove unused value.

Found by: Coverity Prevent(tm)
MFC after: 3 days


# 8a4a44b5 30-Nov-2005 Maxim Sobolev <sobomax@FreeBSD.org>

Check for g_read_data(9) errors properly:

o The only indication of error condition is NULL value returned by
the function;

o value pointed to by error argument is undefined in the case when
operation completes successfully.
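
The rule above can be sketched in userland C. This is an illustration only:
mock_read_data() is a hypothetical stand-in that mimics the described
contract (NULL on failure, error value meaningless on success) and is not
the kernel's g_read_data(9); read_checked() is likewise an assumed name.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a g_read_data(9)-style reader.
 * On failure it returns NULL and stores an errno value in *error.
 * On success it returns a buffer and leaves junk in *error, to model
 * "undefined when the operation completes successfully".
 */
static void *
mock_read_data(int fail, size_t length, int *error)
{
	if (fail) {
		*error = EIO;
		return (NULL);
	}
	*error = -1;	/* junk: callers must not look at this */
	return (calloc(1, length));
}

/*
 * Correct pattern: decide success or failure from the returned
 * pointer only, and read the error value only when it is NULL.
 */
static int
read_checked(int fail, size_t length, void **bufp)
{
	int error;
	void *buf;

	buf = mock_read_data(fail, length, &error);
	if (buf == NULL)
		return (error);
	*bufp = buf;
	return (0);
}
```

A caller that tested the error value instead of the pointer would wrongly
treat the successful read in this sketch as a failure.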

Discussed with: phk


# 71270ca6 10-Sep-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Fix copy&paste typo.

MFC after: 3 days


# cf479540 10-Sep-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Don't forget to initialize crp_etype field.

Reported by: Nick Evans <nevans@syphen.net>
MFC after: 3 days


# dd549194 21-Aug-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

By default, when doing crypto work in software, start as many threads
as we have active CPUs and bind each thread to its own CPU.

MFC after: 3 days


# b8db9f58 21-Aug-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove stale comment (we now always start worker thread).

MFC after: 3 days


# dddd1d53 17-Aug-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Always run a dedicated kernel thread (even when we have hardware support).
There is no performance impact, but it allows us to allocate memory with
the M_WAITOK flag.
As a side effect, this simplifies the code a bit.

MFC after: 3 days


# bf71eaac 17-Aug-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

We should now return 0.


# d1dca8a8 17-Aug-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Even if crypto_dispatch() returns an error, the request is not canceled and
our callback will still be called, just to tell us that the request
failed...

Reported by: Mike Tancsa <mike@sentex.net>
MFC after: 3 days


# 2be2b2ea 17-Aug-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

We don't need to clear allocated memory. This will speed-up things a bit.

MFC after: 3 days


# bb30fea6 13-Aug-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Because code paths for I/O requests are quite complex, add comments above
the functions which participate in I/O paths.

MFC after: 1 day


# 6985decf 11-Aug-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

GELI doesn't need cryptodev.

MFC after: 3 days


# ea35a2ec 27-Jul-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

MFp4: Export more information about encrypted providers.

MFC after: 1 week


# 76254298 27-Jul-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Reduce default debug level to 0.

MFC after: 1 week


# c58794de 27-Jul-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add the GEOM_ELI class, which provides encryption of GEOM providers.
For the feature list and usage, see the manual page: geli(8).

Sponsored by: Wheel Sp. z o.o.
http://www.wheel.pl
MFC after: 1 week