History log of /freebsd-10.1-release/sbin/hastd/
Revision Date Author Comments
272461 03-Oct-2014 gjb

Copy stable/10@r272459 to releng/10.1 as part of
the 10.1-RELEASE process.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


270911 01-Sep-2014 ngie

MFC r270433:

Garbage collect libl dependency

The application links and runs without libl

Approved by: rpaulo (mentor)
Phabric: D673
Submitted by: trociny


270910 01-Sep-2014 ngie

MFC r270117:

Add -ll to LDADD to fix "make checkdpadd"

Phabric: D622
Approved by: rpaulo (mentor)


262192 18-Feb-2014 jhb

MFC 261517,261520:
Convert the license on files where I am the sole copyright holder to
2 clause BSD licenses.


260006 28-Dec-2013 trociny

MFC r257155, r257582, r259191, r259192, r259193, r259194, r259195, r259196:

r257155:

Make hastctl list command output current queue sizes.

Reviewed by: pjd

r257582 (pjd):

Correct alignment.

r259191:

For memsync replication, hio_countdown is used not only as an
indication when a request can be moved to done queue, but also for
detecting the current state of memsync request.

This approach has problems, e.g. leaking a request if the memsync ack from
the secondary failed, or racy usage of write_complete, which should be
called only once per write request, but for memsync can be entered by
local_send_thread and ggate_send_thread simultaneously.

So the following approach is implemented instead:

1) Use hio_countdown only for counting the components we are waiting
on to complete, i.e. initially it is always 2 for any replication mode.

2) To distinguish between "memsync ack" and "memsync fin" responses
from the secondary, add and use hio_memsyncacked field.

3) write_complete() in component threads is called only before
releasing hio_countdown (i.e. before the hio may be returned to the
done queue).

4) Add and use hio_writecount refcounter to detect when
write_complete() can be called in memsync case.
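A minimal sketch of points 1, 3 and 4 for a single memsync write. It assumes C11 <stdatomic.h> for the counters (a choice for the sketch only; entries further down this log note that hastd uses its own atomics API), and the struct and helper names are hypothetical. Only hio_countdown and hio_writecount come from the text; the hio_memsyncacked bookkeeping that tells "memsync ack" from "memsync fin" is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified request. */
struct hio_sketch {
	atomic_uint	hio_countdown;	/* components still running */
	atomic_uint	hio_writecount;	/* holders that gate write_complete() */
	unsigned	completions;	/* how often write_complete() ran */
};

static void
hio_sketch_init(struct hio_sketch *hio)
{
	/* Point 1: always 2 components (local and remote). */
	atomic_init(&hio->hio_countdown, 2);
	/* Memsync: the local write and the remote "memsync ack". */
	atomic_init(&hio->hio_writecount, 2);
	hio->completions = 0;
}

/* Stand-in for acknowledging the write to the kernel. */
static void
write_complete(struct hio_sketch *hio)
{
	hio->completions++;
}

/* Drop one write reference; only the last holder calls write_complete(). */
static void
hio_write_ready(struct hio_sketch *hio)
{
	unsigned old = atomic_fetch_sub(&hio->hio_writecount, 1);

	assert(old > 0);
	if (old == 1)
		write_complete(hio);
}

/*
 * Drop one component reference (point 3: a thread does this only after
 * its hio_write_ready() call). Returns true for the caller that must move
 * the request to the done queue.
 */
static bool
hio_release(struct hio_sketch *hio)
{
	unsigned old = atomic_fetch_sub(&hio->hio_countdown, 1);

	assert(old > 0);
	return (old == 1);
}
```

The local thread would call hio_write_ready() and then hio_release(); the remote thread would call hio_write_ready() on the memsync ack and hio_release() on the memsync fin. Whatever order the threads run in, write_complete() runs exactly once and before the request can reach the done queue.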

Reported by: Pete French petefrench ingresso.co.uk
Tested by: Pete French petefrench ingresso.co.uk

r259192:

Add some macros to make the code more readable (no functional changes).

r259193:

Fix compiler warnings.

r259194:

In remote_send_thread, if sending a request fails don't take the
request back from the receive queue -- it might already have been processed
by remote_recv_thread, which led to crashes like the one below:

(primary) Unable to receive reply header: Connection reset by peer.
(primary) Unable to send request (Connection reset by peer):
WRITE(954662912, 131072).
(primary) Disconnected from kopusha:7772.
(primary) Increasing localcnt to 1.
(primary) Assertion failed: (old > 0), function refcnt_release,
file refcnt.h, line 62.

Taking the request back was not necessary (it would properly be
processed by the remote_recv_thread) and only complicated things.

r259195:

Send wakeup to threads waiting on empty queue before releasing the
lock to decrease spurious wakeups.

Submitted by: davidxu

r259196:

Check remote protocol version only for the first connection (when it
is actually sent by the remote node).

Otherwise it generated confusing "Negotiated protocol version 1" debug
messages when processing the second connection.


259073 07-Dec-2013 peter

Hoist all the mergeinfo up to the root in preparation for enforcing merges
to the root only. All MFC's were rerecorded to the root.

Going forward, if an MFC includes mergeinfo, it will need to be made to
the root and committed from the root. Merges with --ignore-ancestry
or diff | patch can go anywhere.

The mergeinfo in HEAD is in a bad state from years of neglect and manual
tampering and this was branched into 10.x. This confuses the coalescing
code and prevents it from doing its job.

Approved by: re (gjb, implicit)


257468 31-Oct-2013 trociny

MFC r257154:

Merging local and remote bitmaps must be protected by hr_amp lock.

This is believed to fix hastd crashes, which might occur during
synchronization, triggered by the failed assertion:

Assertion failed: (amp->am_memtab[ext] > 0),
function activemap_write_complete, file activemap.c, line 351.

Approved by: re (glebius)


256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


255717 19-Sep-2013 trociny

Fix comments.

Approved by: re (marius)
MFC after: 3 days


255716 19-Sep-2013 trociny

When updating the map of dirty extents, most recently used extents are
kept dirty to reduce the number of on-disk metadata updates. The
sequence of operations is:

1) acquire the activemap lock;
2) update in-memory map;
3) if the list of keepdirty extents is changed, update on-disk metadata;
4) release the lock.

On-disk updates are infrequent compared with in-memory updates, but
take much more time. So one thread may be updating on-disk metadata
while another one is waiting for the activemap lock just to update the
in-memory map.

Improve this by introducing an additional on-disk map lock: when the
in-memory map is updated and it is detected that the on-disk map needs
an update too, the on-disk map lock is acquired and the in-memory map
lock is released before flushing the map.
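A minimal sketch of the two-lock scheme, assuming POSIX threads. The struct, the lock names and amap_mark_dirty() are hypothetical, and the keepdirty logic is replaced by a simplification: the on-disk map is treated as changed only when a clean extent becomes dirty.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define NEXTENTS	8

/* Hypothetical activemap with one lock per representation. */
struct amap_sketch {
	pthread_mutex_t	mem_lock;		/* protects memmap */
	pthread_mutex_t	disk_lock;		/* serializes on-disk flushes */
	unsigned char	memmap[NEXTENTS];	/* in-memory dirty map */
	unsigned char	diskmap[NEXTENTS];	/* last map "written" to disk */
	unsigned	flushes;		/* protected by disk_lock */
};

static void
amap_init(struct amap_sketch *am)
{
	pthread_mutex_init(&am->mem_lock, NULL);
	pthread_mutex_init(&am->disk_lock, NULL);
	memset(am->memmap, 0, sizeof(am->memmap));
	memset(am->diskmap, 0, sizeof(am->diskmap));
	am->flushes = 0;
}

/* Stand-in for the slow pwrite(2) of on-disk metadata. */
static void
amap_flush(struct amap_sketch *am, const unsigned char *snap)
{
	memcpy(am->diskmap, snap, NEXTENTS);
	am->flushes++;
}

static void
amap_mark_dirty(struct amap_sketch *am, unsigned ext)
{
	unsigned char snap[NEXTENTS];
	bool changed;

	assert(ext < NEXTENTS);
	pthread_mutex_lock(&am->mem_lock);
	/* Simplification: the on-disk map changes only if ext was clean. */
	changed = (am->memmap[ext] == 0);
	am->memmap[ext] = 1;
	if (!changed) {
		pthread_mutex_unlock(&am->mem_lock);
		return;
	}
	memcpy(snap, am->memmap, NEXTENTS);
	/*
	 * Take the on-disk lock before dropping the in-memory one, so that
	 * snapshots reach the disk in the order they were taken, then let
	 * other threads update the in-memory map during the slow write.
	 */
	pthread_mutex_lock(&am->disk_lock);
	pthread_mutex_unlock(&am->mem_lock);
	amap_flush(am, snap);
	pthread_mutex_unlock(&am->disk_lock);
}
```

Both locks are always taken in the same order (in-memory first, then on-disk), so the hand-over cannot deadlock.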

Reported by: Yamagi Burmeister yamagi.org
Tested by: Yamagi Burmeister yamagi.org
Reviewed by: pjd
Approved by: re (marius)
MFC after: 2 weeks


255714 19-Sep-2013 trociny

Use cv_broadcast() instead of cv_signal() when waking up threads
waiting on an empty queue as the queue may have several consumers.

Before the fix the following scenario was possible: 2 threads are
waiting on an empty queue and 2 threads are inserting simultaneously.
The first inserting thread detects that the queue is empty and is going
to send the signal, but before it does, the second thread inserts too.
When the first thread sends the signal, only one of the waiting threads
receives it, while the other one may wait forever.

The scenario above is believed to be the cause of the observed cases
where ggate_recv_thread() got stuck taking a free request while the
free queue was not empty.
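A minimal sketch of the fix using POSIX threads and a hypothetical fixed-size queue (not hastd's actual queue code). Only the insert into an empty queue wakes anybody, so that wakeup must be a broadcast. The sketch also sends the wakeup while still holding the lock, as r259195 later does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QSIZE	16

/* Hypothetical bounded FIFO shared by several producers and consumers. */
struct queue_sketch {
	int		items[QSIZE];
	size_t		head;		/* index of the oldest item */
	size_t		count;		/* number of queued items */
	pthread_mutex_t	lock;
	pthread_cond_t	cond;		/* signalled when the queue becomes non-empty */
};

static void
queue_init(struct queue_sketch *q)
{
	q->head = q->count = 0;
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->cond, NULL);
}

static void
queue_put(struct queue_sketch *q, int item)
{
	bool wakeup;

	pthread_mutex_lock(&q->lock);
	assert(q->count < QSIZE);
	wakeup = (q->count == 0);
	q->items[(q->head + q->count) % QSIZE] = item;
	q->count++;
	/*
	 * A concurrent second insert sees a non-empty queue and wakes nobody,
	 * so the insert into the empty queue must wake every waiter.
	 */
	if (wakeup)
		pthread_cond_broadcast(&q->cond);
	pthread_mutex_unlock(&q->lock);
}

static int
queue_take(struct queue_sketch *q)
{
	int item;

	pthread_mutex_lock(&q->lock);
	/* Re-check after every wakeup: another consumer may have won. */
	while (q->count == 0)
		pthread_cond_wait(&q->cond, &q->lock);
	item = q->items[q->head];
	q->head = (q->head + 1) % QSIZE;
	q->count--;
	pthread_mutex_unlock(&q->lock);
	return (item);
}
```

With pthread_cond_signal() in place of pthread_cond_broadcast(), the scenario above leaves the second waiter asleep even though an item is queued for it.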

Reviewed by: pjd
Tested by: Yamagi Burmeister yamagi.org
Approved by: re (marius)
MFC after: 2 weeks


255219 05-Sep-2013 pjd

Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

struct cap_rights {
uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain the total
number of elements in the array minus 2. This means that if those two bits are
equal to 0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define a new right, the CAPRIGHT() macro must be used. The macro takes two
arguments, an array index and a bit to set, eg.

#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine a few rights, but the rights have to belong
to the same array element, eg:

#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)

#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)

There is a new API to manage the new cap_rights_t structure:

cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
void cap_rights_set(cap_rights_t *rights, ...);
void cap_rights_clear(cap_rights_t *rights, ...);
bool cap_rights_is_set(const cap_rights_t *rights, ...);

bool cap_rights_is_valid(const cap_rights_t *rights);
void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

cap_rights_t rights;

cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

#define cap_rights_set(rights, ...) \
__cap_rights_set((rights), __VA_ARGS__, 0ULL)
void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belong to the same array element this way is
correct, but not advised. It should only be used for defining aliases.
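A minimal sketch of the bit layout and of the sentinel-terminated variadic setter described above. It uses hypothetical SK_ names rather than the real <sys/capsicum.h> definitions, and it assumes version 0 (two elements) with the five index bits starting at bit 57, i.e. just below the two top bits of each 64-bit element. It also shows why a right that mixes element indices, like CAP_LOOKUP | CAP_PDKILL above, can be rejected.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names: bits 62-63 hold the size, bits 57-61 the index. */
#define SK_INDEX_SHIFT	57
#define SK_CAPRIGHT(idx, bit) \
	((uint64_t)((UINT64_C(1) << (SK_INDEX_SHIFT + (idx))) | (bit)))

/* Version 0: two array elements. */
typedef struct {
	uint64_t cr_rights[2];
} sk_rights_t;

/* Array index encoded in a right, or -1 unless exactly one index bit is set. */
static int
sk_right_index(uint64_t right)
{
	uint64_t idxbits = (right >> SK_INDEX_SHIFT) & 0x1F;
	int i;

	for (i = 0; i < 5; i++) {
		if (idxbits == (UINT64_C(1) << i))
			return (i);
	}
	return (-1);
}

/* Walks the list until the 0 terminator; false on a mixed or bad right. */
static bool
sk_rights_set_va(sk_rights_t *rights, ...)
{
	va_list ap;
	uint64_t right;
	bool ok = true;
	int idx;

	va_start(ap, rights);
	while ((right = va_arg(ap, uint64_t)) != 0) {
		idx = sk_right_index(right);
		if (idx < 0 || idx > 1) {
			ok = false;
			break;
		}
		rights->cr_rights[idx] |= right;
	}
	va_end(ap);
	return (ok);
}

/* Like cap_rights_set(): the macro appends the terminating 0. */
#define sk_rights_set(rights, ...) \
	sk_rights_set_va((rights), __VA_ARGS__, (uint64_t)0)
```

Because every right carries exactly one index bit, OR-ing rights from two different elements leaves two index bits set, which sk_right_index() reports as -1.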

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by: The FreeBSD Foundation


252472 01-Jul-2013 trociny

Make hastctl(1) ('list' command) output a worker pid.

Reviewed by: pjd
MFC after: 3 days


252421 30-Jun-2013 schweikh

Correct some grammar.


252386 29-Jun-2013 ed

Don't let hastd use C11 atomics.

Due to possible concerns about the stability of C11 atomics, use our
existing atomics API instead.

Requested by: pjd


251796 15-Jun-2013 ed

Let hastd use C11 atomics.

C11 atomics now work on all the architectures. Have at least a single
piece of software in our base system that uses C11 atomics. This
somewhat makes it less likely that we break it because of LLVM imports,
etc.


250914 22-May-2013 jkim

Improve compatibility with old flex and fix build with GCC.


250503 11-May-2013 trociny

Get rid of libl dependency. We needed it only to provide yywrap. But
yywrap is not necessary when parsing a single hast.conf file.

Suggested by: kib
Reviewed by: pjd


249970 27-Apr-2013 ed

Partially revert my last change.

I forgot that I still had a locally applied patch to my copy of Clang
that needs to be pushed in before we should use C11 atomics.


249969 27-Apr-2013 ed

Use C11 <stdatomic.h> instead of our non-standard <machine/atomic.h>.

Reviewed by: pjd


249657 19-Apr-2013 ed

Add the Clang specific -Wmissing-variable-declarations to WARNS=6.

This compiler flag enforces that people either mark variables
static or use an external declaration for the variable, similar to how
-Wmissing-prototypes works for functions.

Due to the fact that Yacc/Lex generate code that cannot trivially be
changed to not warn because of this (lots of yy* variables), add a
NO_WMISSING_VARIABLE_DECLARATIONS that can be used to turn off this
specific compiler warning.

Announced on: toolchain@


248297 14-Mar-2013 pjd

Now that ioctl(2) is allowed in capability mode and we can limit ioctls for the
given descriptors, use Capsicum sandboxing for hastd in primary and secondary
modes. Allow for DIOCGDELETE and DIOCGFLUSH ioctls on provider descriptor and
for G_GATE_CMD_MODIFY, G_GATE_CMD_START, G_GATE_CMD_DONE and G_GATE_CMD_DESTROY
on GEOM Gate descriptor.

Sponsored by: The FreeBSD Foundation


248296 14-Mar-2013 pjd

Minor corrections.


248294 14-Mar-2013 pjd

Delete requests can be larger than MAXPHYS.


247281 25-Feb-2013 trociny

Add i/o error counters to hastd(8) and make hastctl(8) display
them. This may be useful for detecting problems with HAST disks.

Discussed with and reviewed by: pjd
MFC after: 1 week


246922 17-Feb-2013 pjd

- Add support for 'memsync' mode. This is the fastest replication mode, which
is why it will now be the default.
- Bump protocol version to 2 and add backward compatibility for version 1.
- Allow hosts to be specified by kern.hostid as well (in addition to hostname
and kern.hostuuid) in the configuration file.

Sponsored by: Panzura
Tested by: trociny


244538 21-Dec-2012 kevlo

Fix socket calls on error post-r243965.

Submitted by: Garrett Cooper


242593 05-Nov-2012 pjd

Revert r228695. We use __func__ here as a format to distinguish between
abort and assert. It would be cleaner to use NULL or "" here, but gcc
complains in both cases.


238538 16-Jul-2012 trociny

Metaflush on/off values don't need quotes.

Reviewed by: pjd
MFC after: 3 days


238120 04-Jul-2012 pjd

Make use of the GEOM Gate direct reads feature. This allows HAST to serve
reads at the native speed of the underlying provider.
There are three situations when direct reads are not used:
1. Data is being synchronized and synchronization source is the secondary
node, which means secondary node has more recent data and we should read
from it.
2. Local read failed and we have to try to read from the secondary node.
3. Local component is unavailable and all I/O requests are served from the
secondary node.

Sponsored by: Panzura, http://www.panzura.com
MFC after: 1 month


237931 01-Jul-2012 pjd

Check if there is cmsg at all.

MFC after: 3 days


236919 11-Jun-2012 hselasky

Revert: r236909

Pointyhat: me


236909 11-Jun-2012 hselasky

Use the correct clock source when computing timeouts.

MFC after: 1 week


236507 03-Jun-2012 pjd

Simplify the code by using snprlcat().

MFC after: 3 days


235873 24-May-2012 wblock

Fixes to man8 groff mandoc style, usage mistakes, or typos.

PR: 168016
Submitted by: Nobuyuki Koganemaru
Approved by: gjb
MFC after: 3 days


235789 22-May-2012 bapt

Fix world after byacc import:
- the old yacc(1) used to magically add stdlib.h, while the new one doesn't
- the new yacc(1) declares yyparse by itself, so fix the redundant declaration
of 'yyparse'

Approved by: des (mentor)


235337 12-May-2012 gjb

General mdoc(7) and typo fixes.

PR: 167804
Submitted by: Nobuyuki Koganemaru (kogane!jp.freebsd.org)
MFC after: 3 days


233679 29-Mar-2012 trociny

If hastd is invoked with "-P pidfile" option always create pidfile
regardless of whether -F (foreground) option is set or not.

Also, if the -P option is specified, ignore the pidfile setting from the
configuration not only on start but on reload too. This fixes an issue where,
for hastd run with the -P option, a reload changed the pidfile.

Reviewed by: pjd
MFC after: 1 week


233392 23-Mar-2012 trociny

Fix typo.

MFC after: 3 days


231525 11-Feb-2012 pjd

Nice range comparison.

MFC after: 3 days


231016 05-Feb-2012 trociny

If a local write request is from the synchronization thread, when it
is synchronizing data that is out of date on the local component, we
should not send G_GATE_CMD_DONE acknowledge to the kernel.

This fixes the issue, observed in async mode, when on synchronization
from the remote component the worker terminated with "G_GATE_CMD_DONE
failed" error.

Reported by: Artem Kajalainen <artem kayalaynen ru>
Reviewed by: pjd
MFC after: 1 week


231015 05-Feb-2012 trociny

Fix the regression introduced in r226859: if the local component is
out of date BIO_READ requests got lost instead of being sent to the
remote component.

Reviewed by: pjd
MFC after: 1 week


230976 04-Feb-2012 pjd

Fix typo in comment.

MFC after: 3 days


230515 24-Jan-2012 pjd

- Fix documentation to note that /etc/hast.conf is the default configuration
file for hastd(8) and hastctl(8) and not hast.conf.
- In copyright statement correct that this file is documentation, not software.
- Bump date.

MFC after: 3 days


230457 22-Jan-2012 pjd

Free memory that won't be used in child.

MFC after: 1 week


230436 21-Jan-2012 pjd

Fix minor memory leak.

MFC after: 3 days


230396 20-Jan-2012 pjd

Remove another unused token.

MFC after: 3 days


230395 20-Jan-2012 pjd

Remove unused token 'port'.

MFC after: 3 days


230092 13-Jan-2012 pjd

Style cleanups.

MFC after: 3 days


229946 10-Jan-2012 pjd

- Fix a bug where the pidfile was removed on SIGHUP when it hadn't changed in
the configuration file.
- Log the fact that pidfile has changed.

MFC after: 3 days


229945 10-Jan-2012 pjd

For functions that return -1 on failure check exactly for -1 and not for
any negative number.

MFC after: 3 days


229944 10-Jan-2012 pjd

Don't touch pidfiles when running in foreground. Before that change we
would create an empty pidfile on start and check if it changed on SIGHUP.

MFC after: 3 days


229778 07-Jan-2012 uqs

Spelling fixes for sbin/


229744 06-Jan-2012 pjd

fork(2) returns -1 on failure, not some random negative number.

MFC after: 3 days


229699 06-Jan-2012 pjd

Constify argument.

MFC after: 3 days


228712 19-Dec-2011 dim

Use NO_WCAST_ALIGN for sbin/hastctl and sbin/hastd; the alignment
warnings in sbin/hastd/lzf.c are only emitted for i386 and amd64, and
there they can be safely ignored.

MFC after: 1 week


228696 18-Dec-2011 pjd

Use lex's standard way of not generating unused functions.

Inspired by: r228555
MFC after: 1 week


228695 18-Dec-2011 pjd

Don't use function name as format string.

Detected by: clang
MFC after: 1 week


228544 15-Dec-2011 pjd

Remove redundant assignment.

Found by: Clang Static Analyzer
MFC after: 1 week


228543 15-Dec-2011 pjd

Simplify code by changing the functions' return types from int to void, as the
functions always return 0.

Found by: Clang Static Analyzer
MFC after: 1 week


228542 15-Dec-2011 pjd

Remove redundant setting of the error variable.

Found by: Clang Static Analyzer
MFC after: 1 week


226861 27-Oct-2011 pjd

Remove redundant space.

MFC after: 3 days


226859 27-Oct-2011 pjd

Implement 'async' mode for HAST.

MFC after: 3 days


226857 27-Oct-2011 pjd

Minor cleanups.

MFC after: 3 days


226856 27-Oct-2011 pjd

Reduce indentation.

MFC after: 3 days


226855 27-Oct-2011 pjd

Improve comment so it doesn't suggest race is possible, but that we handle
the race.

MFC after: 3 days


226854 27-Oct-2011 pjd

- Eliminate the need for hio_nv.
- Introduce hio_clear() function for clearing hio before returning it
onto free queue.

MFC after: 3 days


226852 27-Oct-2011 pjd

Minor cleanups.

MFC after: 3 days


226851 27-Oct-2011 pjd

Delay resuid generation until first connection to secondary, not until first
write. This way on first connection we will synchronize only the extents that
were modified during the lifetime of primary node, not entire GEOM provider.

MFC after: 3 days


226842 27-Oct-2011 pjd

Correct comments.

MFC after: 3 days


226463 17-Oct-2011 pjd

Allow the pidfile to be specified in the HAST configuration file.

MFC after: 1 week


226462 17-Oct-2011 pjd

Remove redundant space.

MFC after: 1 week


226461 17-Oct-2011 pjd

When path to the configuration file is relative, obtain full path,
so we can always find the file, even after daemonizing and changing
working directory to /.

MFC after: 1 week


225835 28-Sep-2011 pjd

Correct typo.

MFC after: 3 days


225832 28-Sep-2011 pjd

If the underlying provider doesn't support BIO_FLUSH, log it only once
and don't bother trying in the future.

MFC after: 3 days


225831 28-Sep-2011 pjd

Break a bit earlier.

MFC after: 3 days


225830 28-Sep-2011 pjd

After every activemap change, flush the disk's write cache, so that write
reordering won't cause the actual write to be committed before the
corresponding extent is marked as dirty.

It can be disabled in configuration file.

If BIO_FLUSH is not supported by the underlying file system we log a warning
and never send BIO_FLUSH again to that GEOM provider.
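A minimal sketch of the "warn once, never retry" logic. A hypothetical function pointer stands in for the DIOCGFLUSH ioctl(2) so that the example runs without a GEOM provider, and it assumes the unsupported case is reported as EOPNOTSUPP.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-resource state. */
struct res_sketch {
	bool	flush_supported;	/* cleared after the first EOPNOTSUPP */
	int	(*do_flush)(void);	/* returns 0, or -1 with errno set */
};

/* Flush the disk's write cache unless we already know it can't be done. */
static int
res_flush(struct res_sketch *res)
{
	if (!res->flush_supported)
		return (0);		/* never ask the provider again */
	if (res->do_flush() == -1) {
		if (errno != EOPNOTSUPP)
			return (-1);	/* a real I/O error */
		fprintf(stderr, "BIO_FLUSH not supported, disabling.\n");
		res->flush_supported = false;
	}
	return (0);
}

/* Stub provider that never supports flushing; counts how often it is asked. */
static unsigned stub_calls;

static int
stub_flush_unsupported(void)
{
	stub_calls++;
	errno = EOPNOTSUPP;
	return (-1);
}
```

The warning is printed on the first attempt only; later calls return immediately without touching the provider.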

MFC after: 3 days


225787 27-Sep-2011 pjd

Use PJDLOG_ASSERT() and PJDLOG_ABORT() everywhere instead of assert().

MFC after: 3 days


225786 27-Sep-2011 pjd

No need to wrap pjdlog functions with the KEEP_ERRNO() macro.

MFC after: 3 days


225784 27-Sep-2011 pjd

- Convert some impossible conditions into assertions.
- Add missing 'if' in comment.

MFC after: 3 days


225783 27-Sep-2011 pjd

Correct two mistakes when converting asserts to PJDLOG_ASSERT()/PJDLOG_ABORT().

MFC after: 3 days


225782 27-Sep-2011 pjd

Prefer PJDLOG_ASSERT() and PJDLOG_ABORT() over assert() and abort().
The pjdlog versions will also log the problem to syslog when the application
is running in the background.

MFC after: 3 days


225781 27-Sep-2011 pjd

No need to use KEEP_ERRNO() macro around pjdlog functions, as they don't
modify errno.

MFC after: 3 days


225773 27-Sep-2011 pjd

Ensure that pjdlog functions don't modify errno.

MFC after: 3 days


223974 13-Jul-2011 trociny

Fix indentation.

Approved by: pjd (mentor)


223780 05-Jul-2011 trociny

Remove useless initialization.

Approved by: pjd (mentor)
MFC after: 3 days


223655 28-Jun-2011 trociny

Check the returned value of activemap_write_complete() and update metadata on
disk if needed. This should fix a potential case when extents are cleared in
activemap but metadata is not updated on disk.

Suggested by: pjd
Approved by: pjd (mentor)


223654 28-Jun-2011 trociny

Make activemap_write_start/complete check the keepdirty list when
deciding whether we need to update the activemap on disk. This makes keepdirty
serve its purpose -- to reduce number of metadata updates.

Discussed with: pjd
Approved by: pjd (mentor)


223586 27-Jun-2011 pjd

Compile hastd and hastctl with capsicum support.

X-MFC after: capsicum merge


223585 27-Jun-2011 pjd

Compile capsicum support only if HAVE_CAPSICUM is defined.

MFC after: 3 days


223584 27-Jun-2011 pjd

Log a warning if we cannot sandbox using capsicum, but only under debug level 1.
It would be too noisy to log it as a proper warning as CAPABILITIES are not
compiled into GENERIC by default.

MFC after: 3 days


223181 17-Jun-2011 trociny

In HAST we use two sockets - one for only sending the data and one for
only receiving the data. In r220271 the unused directions were
disabled using shutdown(2).

Unfortunately, this broke automatic receive buffer sizing, which
currently works only for connections in ESTABLISHED state. It was the
root cause of the issue reported by users, when the connection between
primary and secondary could get stuck.

Disable the code introduced in r220271 until the issue with automatic
buffer sizing is resolved.

Reported by: Daniel Kalchev <daniel@digsys.bg>, danger, sobomax
Tested by: Daniel Kalchev <daniel@digsys.bg>, danger
Approved by: pjd (mentor)
MFC after: 1 week


223143 16-Jun-2011 sobomax

Revert r222688.

Requested by: Mikolaj Golub


222688 04-Jun-2011 sobomax

Read from the socket using the same max buffer size as we use while
sending. Otherwise the sender splits all the traffic into 32k chunks,
while the receiver waits for the whole packet. Then for certain packet
sizes, particularly 66607 bytes in my case, the communication gets
stuck: the secondary expects to read one chunk of 66607 bytes, while
the primary sends two chunks of 32768 bytes and a third chunk of 1071
bytes. Probably due to TCP windowing and buffering, the final chunk gets
stuck somewhere, so neither server nor client can make any progress.

This patch also protects against short reads, as according to the
manual page there are some cases when MSG_WAITALL can return less data
than expected.
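This change was later reverted (r223143), but the short-read guard is a general technique. A minimal sketch of it with hypothetical helper names: recv_full() keeps calling recv(2) with MSG_WAITALL until the whole buffer is filled, and send_full() deliberately sends at most `chunk` bytes per call to mimic the chunked sender described above. Reporting early EOF as ENOTCONN is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Receive exactly len bytes; a single MSG_WAITALL recv(2) may return fewer. */
static int
recv_full(int fd, void *buf, size_t len)
{
	unsigned char *p = buf;
	ssize_t n;

	while (len > 0) {
		n = recv(fd, p, len, MSG_WAITALL);
		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return (-1);
		}
		if (n == 0) {
			errno = ENOTCONN;	/* peer closed early (assumed mapping) */
			return (-1);
		}
		p += n;
		len -= (size_t)n;
	}
	return (0);
}

/* Send len bytes in pieces of at most chunk bytes, handling partial sends. */
static int
send_full(int fd, const void *buf, size_t len, size_t chunk)
{
	const unsigned char *p = buf;
	ssize_t n;

	while (len > 0) {
		n = send(fd, p, len < chunk ? len : chunk, 0);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			return (-1);
		}
		p += n;
		len -= (size_t)n;
	}
	return (0);
}
```

The receiver no longer cares how the sender split the data; it simply loops until the expected number of bytes has arrived.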

MFC after: 3 days


222467 29-May-2011 trociny

If READ from the local node failed we send the request to the remote
node. There is no use in doing this for synchronization requests.

Approved by: pjd (mentor)
MFC after: 1 week


222228 23-May-2011 pjd

Keep statistics on number of BIO_READ, BIO_WRITE, BIO_DELETE and BIO_FLUSH
requests as well as number of activemap updates.

Number of BIO_WRITEs and activemap updates are especially interesting, because
if those two are too close to each other, it means that your workload needs
a bigger number of dirty extents. Activemap should be updated as rarely as
possible.

MFC after: 1 week


222224 23-May-2011 pjd

To handle BIO_FLUSH and BIO_DELETE requests in secondary worker we need
to use ioctl(2). This is why we can't use capsicum for now to sandbox
secondary. Capsicum is still used to sandbox hastctl.

MFC after: 1 week


222164 21-May-2011 pjd

Recognize HIO_FLUSH requests.

MFC after: 1 week


222121 20-May-2011 pjd

Document IPv6 support.

MFC after: 3 weeks


222120 20-May-2011 pjd

If no listen address is specified, bind by default to:

tcp4://0.0.0.0:8457
tcp6://[::]:8457

MFC after: 3 weeks


222119 20-May-2011 pjd

Rename ipv4/ipv6 to tcp4/tcp6.

MFC after: 3 weeks


222118 20-May-2011 pjd

Now that hell is fully frozen it is good time to add IPv6 support to HAST.

MFC after: 3 weeks


222117 20-May-2011 pjd

Allow [ ] characters in strings. They might be used in IPv6 addresses.

MFC after: 3 weeks


222116 20-May-2011 pjd

Rename tcp4 to tcp in preparation for IPv6 support.

MFC after: 3 weeks


222115 20-May-2011 pjd

Rename proto_tcp4.c to proto_tcp.c in preparation for IPv6 support.

MFC after: 2 weeks


222108 19-May-2011 pjd

In preparation for IPv6 support, allow specifying multiple addresses to
listen on.

MFC after: 3 weeks


222087 18-May-2011 pjd

- Add support for AF_INET6 sockets for %S format character.
- Use inet_ntop(3) instead of reimplementing it.
- Use %hhu for unsigned char instead of casting it to unsigned int and
using %u.

MFC after: 1 week


221899 14-May-2011 pjd

Currently we are unable to use capsicum for the primary worker process,
because we need to do ioctl(2)s, which are not permitted in the capability
mode. What we do now is to chroot(2) to /var/empty, which restricts access
to file system name space and we drop privileges to hast user and hast
group.

This still allows access to other name spaces, like the list of processes,
the network and sysvipc.

To address that, use jail(2) instead of chroot(2). Using jail(2) will restrict
access to process table, network (we use ip-less jails) and sysvipc (if
security.jail.sysvipc_allowed is turned off). This provides much better
separation.

MFC after: 1 week


221898 14-May-2011 pjd

When using capsicum to sandbox, still use other methods first, just in case
one of them has some problems.


221643 08-May-2011 pjd

Allow remote to be specified as 'none' again, which was broken by r219351,
where 'none' was defined as a value for checksum.

Reported by: trasz
MFC after: 1 week


221632 08-May-2011 trociny

Fix isitme(), which is used to check if node-specific configuration
belongs to our node, and was returning a false positive if the first
part of a node name matches short hostname.

Approved by: pjd (mentor)


221078 26-Apr-2011 trociny

Add missing ifdef. This fixes build with NO_OPENSSL.

Reported by: Pawel Tyll <ptyll@nitronet.pl>
Approved by: pjd (mentor)
MFC after: 1 week


221076 26-Apr-2011 trociny

Rename HASTCTL_ defines, which are used for conversation between the main
hastd process and workers, remove an unused one and set a different range
of numbers. This is done in order not to confuse them with HASTCTL_CMD
defines, used for conversation between hastctl and hastd, and to avoid
bugs like the one fixed in r221075.

Approved by: pjd (mentor)
MFC after: 1 week


221075 26-Apr-2011 trociny

For conversation between hastctl and hastd we should use HASTCTL_CMD
defines.

Approved by: pjd (mentor)
MFC after: 1 week


220899 20-Apr-2011 pjd

Correct comment.

MFC after: 1 week


220898 20-Apr-2011 pjd

When we become primary, we connect to the remote and expect it to be in
secondary role. It is possible that the remote node is primary, but only
because there was a role change and it didn't finish cleaning up (unmounting
file systems, etc.). If we detect such situation, wait for the remote node
to switch the role to secondary before accepting I/Os. If we don't wait for
it in that case, we will most likely cause split-brain.

MFC after: 1 week


220890 20-Apr-2011 pjd

If we act in different role than requested by the remote node, log it
as a warning and not an error.

MFC after: 1 week


220889 20-Apr-2011 pjd

Timeout must be positive.

MFC after: 1 week


220865 19-Apr-2011 pjd

Scenario:
- We have two nodes connected and synchronized (local counters on both sides
are 0).
- We take secondary down and recreate it.
- Primary connects to it and starts synchronization (but local counters are
still 0).
- We switch the roles.
- Synchronization restarts but data is synchronized now from new primary
(because local counters are 0) that doesn't have new data yet.

To fix this issue, we bump the local counter on the primary when we discover
that the connected secondary was recreated and has no data yet.

Reported by: trociny
Discussed with: trociny
Tested by: trociny
MFC after: 1 week


220744 17-Apr-2011 trociny

Remove hast_proto_recv(). It was used only in one place, where
hast_proto_recv_hdr() may be used. This also fixes the issue
(introduced by r220523) with hastctl, which crashed on assert in
hast_proto_recv_data().

Suggested and approved by: pjd (mentor)


220573 12-Apr-2011 pjd

The replication mode that is currently supported is fullsync, not memsync.
Correct this and print a warning if a different replication mode is
configured.

MFC after: 1 week


220523 10-Apr-2011 trociny

In hast_proto_recv() remove unnecessary check. The size is checked
later in hast_proto_recv_data().

Approved by: pjd (mentor)
MFC after: 1 week


220522 10-Apr-2011 trociny

In hast_proto_recv_data() check that the size of the data to be
received does not exceed the buffer size.

Approved by: pjd (mentor)
MFC after: 1 week


220521 10-Apr-2011 trociny

Fix a typo in comments.

Approved by: pjd (mentor)
MFC after: 3 days


220274 02-Apr-2011 pjd

Increase default timeout from 5 seconds to 20 seconds. 5 seconds is definitely
too short under heavy load and I was experiencing those timeouts in my recent
tests.

MFC after: 1 week


220273 02-Apr-2011 pjd

Handle ENOBUFS on send(2) by retrying for a while and logging the problem.

MFC after: 1 week


220272 02-Apr-2011 pjd

When we are operating on a blocking socket and get EAGAIN on send(2) or recv(2),
this means that the request timed out. Translate the meaningless EAGAIN to
ETIMEDOUT to give the administrator a hint that the timeout in the configuration
file might need to be increased.
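A minimal sketch of the translation, assuming the timeout is applied with the SO_RCVTIMEO socket option; the helper names are hypothetical and this is not hastd's proto code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Give a blocking socket a receive timeout, in milliseconds. */
static int
set_recv_timeout(int fd, long ms)
{
	struct timeval tv;

	tv.tv_sec = ms / 1000;
	tv.tv_usec = (ms % 1000) * 1000;
	return (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)));
}

/* recv(2) that reports an expired SO_RCVTIMEO as ETIMEDOUT, not EAGAIN. */
static ssize_t
recv_timed(int fd, void *buf, size_t len)
{
	ssize_t n;

	n = recv(fd, buf, len, MSG_WAITALL);
	if (n == -1 && errno == EAGAIN)
		errno = ETIMEDOUT;	/* the socket is blocking: EAGAIN means timeout */
	return (n);
}
```

A log message built from strerror(errno) then reads "Operation timed out" (or the platform's equivalent) instead of "Resource temporarily unavailable".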

MFC after: 1 month


220271 02-Apr-2011 pjd

Declare directions for sockets between primary and secondary.
In HAST we use two sockets - one for only sending the data and one for only
receiving the data.

MFC after: 1 month


220270 02-Apr-2011 pjd

Allow disabling sends or receives on a socket using shutdown(2) by
interpreting a NULL 'data' argument passed to proto_common_send() or
proto_common_recv() as a request to do so.
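A minimal sketch of that convention. It uses a hypothetical sketch_send() rather than the real proto_common_send() signature: a NULL data pointer becomes shutdown(2) with SHUT_WR, and a receive-side counterpart would use SHUT_RD the same way. Returning 0 or an errno value is an assumption of the sketch. Note that r223181 (above) later disabled this use because it broke automatic receive buffer sizing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical sender: returns 0 or an errno value.
 * A NULL data pointer means "we will never send on this socket again".
 */
static int
sketch_send(int fd, const void *data, size_t size)
{
	const unsigned char *p = data;
	ssize_t n;

	if (data == NULL)
		return (shutdown(fd, SHUT_WR) == -1 ? errno : 0);
	while (size > 0) {
		n = send(fd, p, size, 0);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			return (errno);
		}
		p += n;
		size -= (size_t)n;
	}
	return (0);
}
```

After the NULL call the peer still receives any data already sent, followed by end-of-file.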

MFC after: 1 month


220266 02-Apr-2011 pjd

Handle the problem described in r220264 by using GEOM GATE queue of unlimited
length. This should fix deadlocks reported by HAST users.

MFC after: 1 week


220007 25-Mar-2011 pjd

Add mapsize to the header just before sending the packet.
Previously it could change afterwards and we were sending an invalid mapsize.
Some time ago I added an optimization where, when nodes are connected for the
first time and there were no writes to them yet, there is no initial full
synchronization. This bug prevented it from working.

MFC after: 1 week


220006 25-Mar-2011 pjd

Use timeout from configuration file not only when sending and receiving,
but also when establishing connection.

MFC after: 1 week


220005 25-Mar-2011 pjd

Use role2str() when setting process title.

MFC after: 1 week


219900 23-Mar-2011 pjd

Don't create socketpair for connection forwarding between parent and secondary.
Secondary doesn't need to connect anywhere.

MFC after: 1 week


219887 22-Mar-2011 pjd

Add my copyright.

MFC after: 1 week


219882 22-Mar-2011 trociny

After synchronization is complete we should make primary counters be
equal to secondary counters:

primary_localcnt = secondary_remotecnt
primary_remotecnt = secondary_localcnt

Previously it was done wrong and split-brain was observed after
primary had synchronized up-to-date data from secondary.

Approved by: pjd (mentor)
MFC after: 1 week


219879 22-Mar-2011 trociny

For requests that are sent only to remote component use the
error from remote.

Approved by: pjd (mentor)
MFC after: 1 week


219873 22-Mar-2011 pjd

The proto API is a general purpose API, so don't use 'hast' in structures or
function names. It can now be used outside of HAST.

MFC after: 1 week


219864 22-Mar-2011 pjd

White space cleanups.

MFC after: 1 week


219847 21-Mar-2011 pjd

When dropping privileges prefer capsicum over chroot+setgid+setuid.
We can use capsicum for secondary worker processes and hastctl.
When working as primary we still drop privileges using chroot+setgid+setuid,
as we need to send ioctl(2)s to the ggate device, which capsicum doesn't
allow (yet).

X-MFC after: capsicum is merged to stable/8


219844 21-Mar-2011 pjd

Initialize localcnt on first write. This fixes an assertion failure when we
create a resource, set its role to primary, do no writes, then switch it to
secondary and accept a connection from the primary.

MFC after: 1 week


219843 21-Mar-2011 pjd

Fix typo.

MFC after: 1 week


219837 21-Mar-2011 pjd

Before handling any events on descriptors check signals so we can update
our info about worker processes if any of them was terminated in the meantime.

This fixes the problem with 'hastctl status' running from a hook called on
split-brain:
1. Secondary calls a hook and terminates.
2. Hook asks for resource status via 'hastctl status'.
3. The main hastd handles the status request by sending it to the secondary
worker, which is already dead, but because signals weren't checked yet it
doesn't know that and we get EPIPE.

MFC after: 1 week


219833 21-Mar-2011 pjd

Remove stale comment. Yes, it is valid to set role back to init.

MFC after: 1 week


219832 21-Mar-2011 pjd

Increase debug level of "Checking hooks." message.

MFC after: 1 week


219831 21-Mar-2011 pjd

Be pedantic and free nvout before exiting.

MFC after: 1 week


219830 21-Mar-2011 pjd

Detect situation where resource internal identifier differs.
This means that both nodes have separately managed resources that don't
have the same data.

MFC after: 1 week


219818 21-Mar-2011 pjd

In hast.conf we define the other node's address in 'remote' variable.
This way we know how to connect to secondary node when we are primary.
The same variable is used by the secondary node - it only accepts
connections from the address stored in 'remote' variable.
In cluster configurations it is common that each node has its individual
IP address and there is one additional shared IP address which is assigned
to the primary node. It seems that if the shared IP address is from the same
network as the individual IP address, the kernel might choose it as the
source address for the connection with the secondary node.
Such connection will be rejected by secondary, as it doesn't come from
primary node individual IP.

Add a 'source' variable that allows specifying the source IP address we want
to bind to before connecting to the secondary node.

MFC after: 1 week
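
The 'source' option boils down to calling bind(2) before connect(2). A minimal IPv4-only sketch; connect_from() is a hypothetical helper, not hastd code, and port 0 in the source address lets the kernel pick an ephemeral port.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Connect to 'dst' from the explicit source address 'src'. Without the
 * bind(2) the kernel picks the source address itself, which may be the
 * shared cluster IP. Returns a connected descriptor or -1 with errno set.
 */
static int
connect_from(const struct sockaddr_in *src, const struct sockaddr_in *dst)
{
	int fd, serrno;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd == -1)
		return (-1);
	if (bind(fd, (const struct sockaddr *)src, sizeof(*src)) == -1 ||
	    connect(fd, (const struct sockaddr *)dst, sizeof(*dst)) == -1) {
		serrno = errno;		/* close(2) may clobber errno. */
		close(fd);
		errno = serrno;
		return (-1);
	}
	return (fd);
}
```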


219817 21-Mar-2011 pjd

Log when we start hooks checking and when we execute a hook.

MFC after: 1 week


219816 21-Mar-2011 pjd

Use snprlcat() instead of two strlcat(3)s.

MFC after: 1 week


219815 21-Mar-2011 pjd

Add snprlcat() and vsnprlcat() - the functions I'm always missing.
They work as a combination of snprintf(3) and strlcat(3) - the caller
can append a string built from the given format.

MFC after: 1 week
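
One possible implementation of the pair, written as hypothetical *_sketch functions rather than HAST's own code: format at the current end of the string and return the would-be total length, as strlcat(3) does, so callers can detect truncation.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Append a printf(3)-style formatted string to 'str', whose buffer holds
 * 'size' bytes. Assumes 'str' is already NUL-terminated within 'size'.
 * Returns the length the result would have without truncation.
 */
static size_t
vsnprlcat_sketch(char *str, size_t size, const char *fmt, va_list ap)
{
	size_t len;
	int n;

	len = strlen(str);
	n = vsnprintf(str + len, size - len, fmt, ap);
	return (n < 0 ? len : len + (size_t)n);
}

static size_t
snprlcat_sketch(char *str, size_t size, const char *fmt, ...)
{
	va_list ap;
	size_t ret;

	va_start(ap, fmt);
	ret = vsnprlcat_sketch(str, size, fmt, ap);
	va_end(ap);
	return (ret);
}
```

A return value >= size means the output was truncated, exactly as with strlcat(3).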


219814 21-Mar-2011 pjd

When creating connection on behalf of primary worker, set pjdlog prefix
to resource name and role, so that any logs related to that can be identified
properly.

MFC after: 1 week


219813 21-Mar-2011 pjd

If there was any traffic on one of our descriptors, we were not checking for
long-running hooks. Fix it by not using the select(2) timeout to decide
whether to check hooks.

MFC after: 1 week


219721 17-Mar-2011 trociny

For secondary, set 2 * HAST_KEEPALIVE seconds timeout for incoming
connection so the worker will exit if it does not receive packets from
the primary during this interval.

Reported by: Christian Vogt <Christian.Vogt@haw-hamburg.de>
Tested by: Christian Vogt <Christian.Vogt@haw-hamburg.de>
Approved by: pjd (mentor)
MFC after: 1 week


219669 15-Mar-2011 pjd

Remove #include needed for debugging.

MFC after: 1 week


219482 11-Mar-2011 trociny

Make workers inherit debug level from the main process.

Approved by: pjd (mentor)
MFC after: 1 week


219385 07-Mar-2011 pjd

Unbreak the build.

MFC after: 2 weeks


219372 07-Mar-2011 pjd

- Log size of data to synchronize in human readable form (using %N).
- Log synchronization time (using %T).
- Log synchronization speed in human readable form (using %N).

MFC after: 2 weeks


219371 07-Mar-2011 pjd

Use %S to print IP address and port number.

MFC after: 2 weeks


219370 07-Mar-2011 pjd

- Turn on printf extensions.
- Load support for %T for printing time.
- Add support for %N for printing numbers in human readable form.
- Add support for %S for printing sockaddr structure (currently only AF_INET
family is supported, as this is all we need in HAST).
- Disable gcc compile-time format checking as this will no longer work.

MFC after: 2 weeks


219369 07-Mar-2011 pjd

Provide three states for pjdlog_initialized, so we can also tell that
this is the first initialization ever.

MFC after: 2 weeks


219354 06-Mar-2011 pjd

Allow compressing on-the-wire data using two algorithms:
- HOLE - it simply turns all-zero blocks into a few-byte header;
it is extremely fast, so it is turned on by default;
it is mostly intended to speed up initial synchronization
where we expect many zeros;
- LZF - very fast algorithm by Marc Alexander Lehmann, which shows
very decent compression ratio and has BSD license.

MFC after: 2 weeks
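
To illustrate the HOLE idea, a sketch that replaces an all-zero block with a 4-byte length header. The header layout (a host-order 32-bit length) and the function names are assumptions, not HAST's wire format; LZF is not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if every byte of 'buf' is zero. */
static bool
is_all_zero(const unsigned char *buf, size_t size)
{
	size_t i;

	for (i = 0; i < size; i++) {
		if (buf[i] != 0)
			return (false);
	}
	return (true);
}

/*
 * If 'buf' is a hole, write a 4-byte header holding its length to 'out'
 * and return 4; otherwise return 0 and let the caller send it as-is.
 */
static size_t
hole_compress(const unsigned char *buf, size_t size, unsigned char out[4])
{
	uint32_t len;

	if (size > UINT32_MAX || !is_all_zero(buf, size))
		return (0);
	len = (uint32_t)size;
	memcpy(out, &len, sizeof(len));
	return (sizeof(len));
}

/* Expand a header from hole_compress() into zeros; 0 if it won't fit. */
static size_t
hole_decompress(const unsigned char in[4], unsigned char *dst, size_t dstsize)
{
	uint32_t len;

	memcpy(&len, in, sizeof(len));
	if (len > dstsize)
		return (0);
	memset(dst, 0, len);
	return (len);
}
```

The check is a single linear scan, which is why this kind of compression costs almost nothing on blocks that contain data.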


219351 06-Mar-2011 pjd

Allow checksumming on-the-wire data using either CRC32 or SHA256.

MFC after: 2 weeks
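
For reference, a table-free bitwise CRC-32 (the common IEEE 802.3/zlib variant, reflected polynomial 0xEDB88320). Whether HAST's CRC32 option uses exactly this variant is an assumption here, and the SHA256 path is not shown. In such a scheme the sender stores the checksum in the packet header and the receiver recomputes it over the received data and compares.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32 (IEEE). Slow but compact; production code would use a
 * 256-entry lookup table instead of the inner loop.
 */
static uint32_t
crc32_ieee(const void *data, size_t size)
{
	const unsigned char *p = data;
	uint32_t crc = 0xFFFFFFFFu;
	int bit;

	while (size-- > 0) {
		crc ^= *p++;
		/* XOR in the polynomial whenever the low bit shifts out set. */
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return (crc ^ 0xFFFFFFFFu);
}
```

The standard check value for this variant is crc32_ieee("123456789", 9) == 0xCBF43926.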


218474 09-Feb-2011 pjd

When we decide to unlink socket file, sun_path must be set. If it is set,
but there is problem unlinking the file, log a warning.

MFC after: 1 week


218465 08-Feb-2011 pjd

Explicitly include <sys/types.h> as suggested by getpid(2) and don't rely on
<sys/un.h> including what's needed.

MFC after: 1 week


218464 08-Feb-2011 pjd

Unlink UNIX domain socket file only if:
1. The descriptor is the one we are listening on (not the one when we connect
as a client and not the one which is created on accept(2)).
2. Descriptor was created by us (PID matches with the PID stored on bind(2)).

Reported by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 1 week
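
The two unlink conditions can be sketched as follows. struct uds_ctx and uds_close() are hypothetical; recording the PID right after bind(2) is what keeps a forked child that inherited the listening descriptor from removing the parent's socket file.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical per-connection state for a UNIX domain socket. */
struct uds_ctx {
	int			fd;
	bool			listening;	/* bind(2)+listen(2) by us. */
	pid_t			owner;		/* getpid() after bind(2). */
	struct sockaddr_un	addr;		/* sun_path set for listeners. */
};

static void
uds_close(struct uds_ctx *ctx)
{
	/*
	 * Remove the socket file only for a listening socket created by this
	 * very process: never for a client socket, an accept(2)ed socket, or
	 * a listener inherited across fork(2).
	 */
	if (ctx->listening && ctx->owner == getpid() &&
	    ctx->addr.sun_path[0] != '\0')
		(void)unlink(ctx->addr.sun_path);
	(void)close(ctx->fd);
	ctx->fd = -1;
}
```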


218376 06-Feb-2011 pjd

Now that we break the loop on fstat(2) failure we no longer need to satisfy
gcc's imperfections.

MFC after: 1 week


218375 06-Feb-2011 pjd

Add (void) cast before snprintf(3)s for which we are not interested in return
values.

MFC after: 1 week


218374 06-Feb-2011 pjd

Treat fstat(2) failure (different than EBADF) as fatal error.

Reported by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 1 week


218373 06-Feb-2011 pjd

Open syslog when logging sysconf(3) failure.

Reported by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 1 week


218370 06-Feb-2011 pjd

Close more descriptors that can be open if the worker process for the given
resource is already running.

Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 1 week


218218 03-Feb-2011 pjd

Set up another socketpair between parent and child, so that the sandboxed
primary worker can ask the main privileged process to connect on the worker's
behalf, and then we can migrate the descriptor to the worker using this socketpair.
This is not really needed now, but will be needed once we start to use
capsicum for sandboxing.

MFC after: 1 week


218217 03-Feb-2011 pjd

Add missing locking after moving keepalive_send() to remote send thread
in r214692.

MFC after: 1 week


218214 03-Feb-2011 pjd

Let the caller log info about successful privilege drop.
We don't want to log this in hastctl.

MFC after: 1 week


218194 02-Feb-2011 pjd

- Rename proto_descriptor_{send,recv}() functions to
proto_connection_{send,recv} and change them to return proto_conn
structure. We don't operate directly on descriptors, but on
proto_conns.
- Add wrap method to wrap descriptor with proto_conn.
- Remove methods to send and receive descriptors and implement this
functionality as additional argument to send and receive methods.

MFC after: 1 week


218193 02-Feb-2011 pjd

Add proto_connect_wait() to wait for connection to finish.
If timeout argument to proto_connect() is -1, then the caller needs to use
this new function to wait for connection.

This change is in preparation for capsicum, where sandboxed worker wants
to ask the main process to connect on the worker's behalf and pass the descriptor
to the worker. Because we don't want the main process to wait for the
connection, it will start async connection and pass descriptor to the
worker who will be responsible for waiting for the connection to finish.

MFC after: 1 week


218192 02-Feb-2011 pjd

Allow the caller to specify the connection timeout.

MFC after: 1 week


218191 02-Feb-2011 pjd

Move protocol allocation and deallocation to separate functions.

MFC after: 1 week


218185 02-Feb-2011 pjd

Be prepared that hp_client or hp_server might be NULL now.

MFC after: 1 week


218158 01-Feb-2011 pjd

Do not set socket send and receive buffer. It will be auto-tuned.

Confirmed by: rwatson
MFC after: 1 week


218148 31-Jan-2011 pjd

Fix build on ia64.

I found no way to use the CMSG_NXTHDR() macro on ia64 without alignment
warnings.

MFC after: 1 week


218147 31-Jan-2011 pjd

Until I fix the build on ia64 comment out problematic lines.
Those lines are part of the (for now) unused functions.


218139 31-Jan-2011 pjd

Implement two new functions for sending and receiving descriptors
over UNIX domain sockets and socket pairs.
This is in preparation for capsicum.

MFC after: 1 week
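
Descriptor passing over UNIX domain sockets is done with an SCM_RIGHTS control message. A minimal single-descriptor sketch; send_fd() and recv_fd() are hypothetical names, and the union keeps the control buffer suitably aligned for struct cmsghdr.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <errno.h>
#include <string.h>

/* Send descriptor 'fd' over the UNIX domain socket 'sock'. */
static int
send_fd(int sock, int fd)
{
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cmsg;
	union {
		char		buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr	align;	/* Forces cmsghdr alignment. */
	} ctl;
	unsigned char dummy = 0;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	/* At least one byte of real data must carry the control message. */
	iov.iov_base = &dummy;
	iov.iov_len = sizeof(dummy);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return (sendmsg(sock, &msg, 0) == -1 ? errno : 0);
}

/* Receive a descriptor sent by send_fd(); returns it, or -1 on failure. */
static int
recv_fd(int sock)
{
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cmsg;
	union {
		char		buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr	align;
	} ctl;
	unsigned char dummy;
	int fd;

	memset(&msg, 0, sizeof(msg));
	iov.iov_base = &dummy;
	iov.iov_len = sizeof(dummy);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return (-1);
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return (-1);
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return (fd);
}
```

The received descriptor is a new number in the receiving process that refers to the same open file as the sender's.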


218138 31-Jan-2011 pjd

- Use pjdlog for assertions and aborts as this will log assert/abort message
to syslog if we run in background.
- Assert in proto.c that the method we want to call is implemented, and
remove dummy methods from protocol implementations that are only there to
abort the program with a nice message.

MFC after: 1 week


218132 31-Jan-2011 pjd

Rename pjdlog_verify() to pjdlog_abort() as it better describes what the
function does, and mark it with __dead2.

MFC after: 1 week


218049 28-Jan-2011 pjd

Drop privileges in worker processes.

Accepting connections and handshaking in secondary is still done before
dropping privileges. It should be implemented by only accepting connections in
privileged main process and passing connection descriptors to the worker, but
is not implemented yet.

MFC after: 1 week


218048 28-Jan-2011 pjd

Implement function that drops privileges by:
- chrooting to /var/empty (user hast home directory),
- setting groups to 'hast' (user hast primary group),
- setting real group id, effective group id and saved group id to 'hast',
- setting real user id, effective user id and saved user id to 'hast'.
At the end verify that those operations were successful.

MFC after: 1 week


218045 28-Jan-2011 pjd

Use newly added descriptors_assert() function to ensure only expected
descriptors are open.

MFC after: 1 week


218044 28-Jan-2011 pjd

Add function to assert that the only descriptors we have open are the ones
we expect to be open. Also assert that they point at expected type.

Because openlog(3) API is unable to tell us descriptor number it is using, we
have to close syslog socket, remember assert message in local buffer and if we
fail on assertion, reopen syslog socket and log the message.

MFC after: 1 week


218043 28-Jan-2011 pjd

Close all unneeded descriptors after fork(2).

MFC after: 1 week


218042 28-Jan-2011 pjd

Add comments to places where we treat errors as critical, but it is possible
to handle them more gracefully.

MFC after: 1 week


218041 28-Jan-2011 pjd

Add function to close all unneeded descriptors after fork(2).

MFC after: 1 week


218040 28-Jan-2011 pjd

Initialize all global variables on pjdlog_init().

MFC after: 1 week


217969 27-Jan-2011 pjd

Remember created control connection so on fork(2) we can close it in child.

Found with: procstat(1)
MFC after: 1 week


217967 27-Jan-2011 pjd

Close the control socket before exiting, so it will be unlinked.

MFC after: 1 week


217966 27-Jan-2011 pjd

Extend pjdlog_verify() to support the following additional macros:
PJDLOG_RVERIFY() - always check expression and on false log the given message
and exit.
PJDLOG_RASSERT() - check expression when NDEBUG is not defined and on false log
given message and exit.
PJDLOG_ABORT() - log the given message and exit.

MFC after: 1 week
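
A sketch of how the three macros can relate to each other. It logs to stderr instead of syslog and uses hypothetical SKETCH_* names so as not to suggest this is pjdlog's actual code. RVERIFY always evaluates its expression; RASSERT compiles away under NDEBUG.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Print "file:line: message" to stderr and abort (stand-in for syslog). */
static _Noreturn void
sketch_abort(const char *file, int line, const char *fmt, ...)
{
	va_list ap;

	fprintf(stderr, "%s:%d: ", file, line);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	abort();
}

/* Log the given message and abort. */
#define	SKETCH_ABORT(...)	sketch_abort(__FILE__, __LINE__, __VA_ARGS__)

/* Always evaluate 'expr'; on false log the message and abort. */
#define	SKETCH_RVERIFY(expr, ...) do {					\
	if (!(expr))							\
		SKETCH_ABORT(__VA_ARGS__);				\
} while (0)

/* Same as SKETCH_RVERIFY(), but compiled out when NDEBUG is defined. */
#ifdef NDEBUG
#define	SKETCH_RASSERT(expr, ...)	do { } while (0)
#else
#define	SKETCH_RASSERT(expr, ...)	SKETCH_RVERIFY(expr, __VA_ARGS__)
#endif
```

Because RASSERT may vanish, only RVERIFY is safe for expressions with side effects.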


217965 27-Jan-2011 pjd

Add functions to initialize/finalize pjdlog. This allows opening/closing the
log file at will.

MFC after: 1 week


217964 27-Jan-2011 pjd

Use my copyright for 2011 work.

MFC after: 1 week


217962 27-Jan-2011 pjd

Add LOG_NDELAY flag to openlog(3) - we want descriptor to be immediately open
so there are no surprises once we start chrooting or using capsicum.

MFC after: 1 week


217961 27-Jan-2011 pjd

- Remove obvious NOTREACHED comment after abort() call.
- Remove redundant newline at the end of the file.

MFC after: 1 week


217958 27-Jan-2011 pjd

Remove __dead2 from pjdlog_verify() prototype, it does return sometimes.

MFC after: 1 week


217784 24-Jan-2011 pjd

Don't open configuration file from worker process. Handle SIGHUP in the
master process only and pass changes to the worker processes over control
socket. This removes access to global namespace in preparation for capsicum
sandboxing.

MFC after: 2 weeks


217737 22-Jan-2011 pjd

Add missing logs.

MFC after: 1 week


217732 22-Jan-2011 pjd

Add nv_assert() which allows to assert that the given name exists.

MFC after: 1 week


217731 22-Jan-2011 pjd

Use more consistent function name with the others (pjdlogv_prefix_set()
instead of pjdlog_prefix_setv()).

MFC after: 1 week


217730 22-Jan-2011 pjd

Use int16 for error.

MFC after: 1 week


217729 22-Jan-2011 pjd

- On primary worker reload, update hr_exec field.
- Update comment.

MFC after: 1 week


217312 12-Jan-2011 pjd

execve(2), not fork(2) resets signal handler to the default value (if it isn't
ignored). Correct comment talking about that.

Pointed out by: kib
MFC after: 3 days


217308 12-Jan-2011 pjd

Add a note that when custom signal handler is installed for a signal,
signal action is restored to default in child after fork(2).
In this case there is no need to do anything with dummy SIGCHLD handler,
because after fork(2) it will be automatically reverted to SIG_IGN.

Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
MFC after: 3 days


217307 12-Jan-2011 pjd

Install default signal handlers before masking signals we want to handle.
It is possible that the parent process ignores some of them and sigtimedwait()
will never see them, even though they are masked.

The most common situation for this to happen is boot process where init(8)
ignores SIGHUP before starting to execute /etc/rc. This in turn caused
hastd(8) to ignore SIGHUP.

Reported by: trasz
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
MFC after: 3 days


216722 26-Dec-2010 pjd

Detect when resource is configured more than once.

MFC after: 3 days


216721 26-Dec-2010 pjd

When node-specific configuration is missing in resource section, provide
more useful information. Instead of:

hastd: remote address not configured for resource foo

Print the following:

No resource foo configuration for this node (acceptable node names: freefall, freefall.freebsd.org, 44333332-4c44-4e31-4a30-313920202020).

MFC after: 3 days


216494 16-Dec-2010 pjd

The 'ret' variable is of type ssize_t and we use proper format for it (%zd), so
no (bogus) cast is needed.

MFC after: 3 days


216479 16-Dec-2010 pjd

Improve problems logging.

MFC after: 3 days


216478 16-Dec-2010 pjd

Don't ignore errors from remote requests.

MFC after: 3 days


216477 16-Dec-2010 pjd

Log the fact of launching and include protocol version number.

MFC after: 3 days


215676 22-Nov-2010 brucec

Don't generate input() since it's not used.


215332 15-Nov-2010 pjd

Move timeout.tv_sec initialization outside the loop - sigtimedwait(2) won't
modify it.

Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days


215331 15-Nov-2010 pjd

1. Exit when we cannot create incoming connection.
2. Improve logging to inform which connection can't be created.

Submitted by: [1] Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days


214692 02-Nov-2010 pjd

Send packets to remote node only via the send thread to avoid possible
races - in this case a keepalive packet was sent from the wrong thread, which
led to connection dropping because of a corrupted packet.

Fix it by sending keepalive packets directly from the send thread.
As a bonus we now send keepalive packets only when connection is idle.

Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days


214284 24-Oct-2010 pjd

Before this change, on first connect between primary and secondary we
initialized all the data. This is a huge waste of time and resources if
there were no writes yet, as there is no real data to synchronize.

Optimize this by sending "virgin" argument to secondary, which gives it a hint
that synchronization is not needed.

In the common case (where both nodes are configured at the same time) instead
of synchronizing everything, we don't synchronize at all.

MFC after: 1 week


214283 24-Oct-2010 pjd

Implement nv_exists() function that returns true if argument of the given
name exists.

MFC after: 3 days


214282 24-Oct-2010 pjd

Move all NV defines into nv.c, they are not used externally thus there is
no need to make them visible from outside.

MFC after: 3 days


214276 24-Oct-2010 pjd

Simplify code a bit.

MFC after: 3 days


214275 24-Oct-2010 pjd

Plug memory leak.

MFC after: 3 days


214274 24-Oct-2010 pjd

Plug memory leaks.

Found with: valgrind
MFC after: 3 days


214273 24-Oct-2010 pjd

Load geom_gate.ko module after parsing arguments.

MFC after: 3 days


214119 20-Oct-2010 pjd

Use closefrom(2) instead of close(2) in a loop.

MFC after: 1 week


213981 17-Oct-2010 pjd

Log correct connection when canceling half-open connection.

Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days


213939 16-Oct-2010 pjd

Use one fprintf() instead of two.

MFC after: 3 days


213938 16-Oct-2010 pjd

Clear signal mask before executing a hook.

Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days


213580 08-Oct-2010 pjd

We can't zero out ggio request, as we have some fields in there we initialize
once during start-up.

Reported by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days


213579 08-Oct-2010 pjd

We close the event socketpair early in the mainloop to prevent spamming with
error messages, so when we clean up after child process, we have to check if
the event socketpair is still there.

Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days


213533 07-Oct-2010 pjd

Clear ggate structures before using them. We don't initialize all the fields
and there can be some garbage from the stack.

MFC after: 1 week


213531 07-Oct-2010 pjd

Log error message when we fail to destroy ggate provider.

MFC after: 3 days


213530 07-Oct-2010 pjd

Start the guard thread first, so we can handle signals from the very beginning.

Reported by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 1 week


213529 07-Oct-2010 pjd

Don't close local component on exit as we can hang waiting on g_waitidle.
I'm unable to reproduce the race described in comment anymore and also the
comment is incorrect - localfd represents local component from configuration
file, eg. /dev/da0 and not HAST provider.

Reported by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 1 week


213430 04-Oct-2010 pjd

Decrease report interval to 5 seconds, as this also means we will check for
signals every 5 seconds and not every 10 seconds as before.

MFC after: 3 days


213429 04-Oct-2010 pjd

hook_check() is now only used to report about long-running hooks, so the
argument is redundant, remove it.

MFC after: 3 days


213428 04-Oct-2010 pjd

We can't mask an ignored signal, so install a dummy signal handler for SIGCHLD
before masking it.

This fixes bogus reports about hooks running for too long and other problems
related to garbage-collecting child processes.

Reported by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days


213183 26-Sep-2010 pjd

Plug memory leak on fork(2) failure.

Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days


213009 22-Sep-2010 pjd

Switch to sigprocmask(2) API also in the main process and secondary process.
This way the primary process inherits signal mask from the main process,
which fixes a race where signal is delivered to the primary process before
configuring signal mask.

Reported by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days


213008 22-Sep-2010 pjd

Assert that descriptor numbers are sane.

MFC after: 3 days


213007 22-Sep-2010 pjd

Fix possible deadlock where worker process sends an event to the main process
while the main process sends control message to the worker process, but worker
process hasn't started control thread yet, because it waits for reply from the
main process.

The fix is to start the control thread before sending any events.

Reported and fix suggested by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days


213006 22-Sep-2010 pjd

Fix descriptor leaks: when child exits, we have to close control and event
socket pairs. We did that only in one case out of three.

MFC after: 3 days


213004 22-Sep-2010 pjd

If we are unable to receive a control message, it is most likely because the
main process died. Instead of entering an infinite loop, terminate.

MFC after: 3 days


213003 22-Sep-2010 pjd

Sort includes.

MFC after: 3 days


212899 20-Sep-2010 pjd

Add __dead2 to functions that we know are going to exit.

MFC after: 3 days


212052 31-Aug-2010 pjd

Include process PID in log messages.

Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 2 weeks


212051 31-Aug-2010 pjd

Correct error message.

Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 2 weeks


212049 31-Aug-2010 pjd

Forgot to add event.c and event.h in r212038.

Pointed out by: pluknet <pluknet@gmail.com>
MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


212046 31-Aug-2010 pjd

Mask only those signals that we want to handle.

Suggested by: jilles
MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


212038 30-Aug-2010 pjd

Because it is very hard to make fork(2) from threaded process safe (we are
limited to async-signal safe functions in the child process), move all hooks
execution to the main (non-threaded) process.

Do it by maintaining connection (socketpair) between child and parent
and sending events from the child to parent, so it can execute the hook.

This is a step in the right direction for other reasons too. For example there is
one less problem to drop privs in worker processes.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
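
A minimal sketch of the event channel: the threaded worker only writes a one-byte event code to its end of the socketpair, and the single-threaded main process reads it and would then fork(2) and exec the hook. The event codes and function names are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical event codes a worker could report to the main process. */
enum worker_event { EV_CONNECT = 1, EV_DISCONNECT = 2, EV_SPLITBRAIN = 3 };

/* Worker side: report an event instead of running the hook itself. */
static int
event_send(int fd, enum worker_event ev)
{
	uint8_t code = (uint8_t)ev;

	return (write(fd, &code, 1) == 1 ? 0 : -1);
}

/*
 * Main process side: read one event. This is where the hook would be
 * started, safely, since this process has no other threads. Returns the
 * event code, 0 on EOF (the worker exited), or -1 on error.
 */
static int
event_recv(int fd)
{
	uint8_t code;
	ssize_t n;

	do {
		n = read(fd, &code, 1);
	} while (n == -1 && errno == EINTR);
	if (n <= 0)
		return ((int)n);
	return (code);
}
```

EOF on the socketpair doubles as a cheap notification that the worker has gone away.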


212037 30-Aug-2010 pjd

We only want to know if descriptors are ready for reading.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


212036 30-Aug-2010 pjd

When someone gives NULL as data, assume this is because he wants to declare
connection side only.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


212034 30-Aug-2010 pjd

Use pjdlog_exit() before fork().

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


212033 30-Aug-2010 pjd

Constify arguments we can constify.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211984 30-Aug-2010 pjd

Execute hook when connection between the nodes is established or lost.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211983 30-Aug-2010 pjd

Execute hook when split-brain is detected.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211982 30-Aug-2010 pjd

Use sigtimedwait(2) for signals handling in primary process.
This fixes various races and eliminates use of pthread* API in signal handler.

Pointed out by: kib
With help from: jilles
MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211981 29-Aug-2010 pjd

- Move functionality responsible for checking one connection to separate
function to make code more readable.
- Be sure not to reconnect too often in case of signal delivery, etc.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211979 29-Aug-2010 pjd

Disconnect after logging errors.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211978 29-Aug-2010 pjd

- Call hook on role change.
- Document new event.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211977 29-Aug-2010 pjd

Allow running hooks from the main hastd process.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211976 29-Aug-2010 pjd

- Add hook_fini() which should be called after fork() from the main hastd
process, once it starts to use hooks.
- Add hook_check_one() in case the caller expects different child processes
and once it can recognize it, it will pass pid and status to hook_check_one().

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211975 29-Aug-2010 pjd

Implement mtx_destroy() and rw_destroy().

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211899 27-Aug-2010 pjd

When SIGTERM or SIGINT is received, terminate worker processes.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211898 27-Aug-2010 pjd

When logging to stdout/stderr, flush after each log.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211897 27-Aug-2010 pjd

Correct when we log interrupted synchronization.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211896 27-Aug-2010 pjd

Check if no signals were delivered just before going to sleep.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211895 27-Aug-2010 pjd

Add hooks execution.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211887 27-Aug-2010 pjd

Document new 'exec' parameter.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211886 27-Aug-2010 pjd

Allow executing a specified program on various HAST events.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211885 27-Aug-2010 pjd

- Run hooks in background - don't block waiting for them to finish.
- Keep all hooks we're running in a global list, so we can report when
they finish and also report when they are running for too long.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211884 27-Aug-2010 pjd

When logging to stdout/stderr don't close those descriptors after fork().

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211883 27-Aug-2010 pjd

Reduce indent where possible.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211882 27-Aug-2010 pjd

Implement keepalive mechanism inside HAST protocol so we can detect secondary
node failures quickly for HAST resources that are rarely modified.

Remove XXX from a comment now that the guard thread never sleeps infinitely.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211881 27-Aug-2010 pjd

- Remove redundant and incorrect 'old' word from debug message.
- Log disconnects as warnings.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211880 27-Aug-2010 pjd

Don't increase the number of synchronized bytes in case of an error.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211879 27-Aug-2010 pjd

Log that synchronization was interrupted in a proper place.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211878 27-Aug-2010 pjd

We have sync_start() function to start synchronization, introduce sync_stop()
function to stop it.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211877 27-Aug-2010 pjd

Add QUEUE_INSERT() and QUEUE_TAKE() macros that simplify the code a bit.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211876 27-Aug-2010 pjd

Add mtx_owned() implementation.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211875 27-Aug-2010 pjd

Make comment more readable.

MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com


211452 18-Aug-2010 pjd

For some setups sending data in 128kB chunks makes communication very slow. No
idea why. 32kB on the other hand seems to work properly everywhere.

Reported by: Thomas Steen Rasmussen <thomas@gibfest.dk>
MFC after: 3 weeks


211407 16-Aug-2010 pjd

The 'size' variable is there to limit how many bytes we want to copy from
'addr'. It is very likely that size of 'addr' is larger than 'size', so checking
strlcpy() return value is bogus.

MFC after: 3 weeks


211397 16-Aug-2010 joel

Fix typos, spelling, formatting and mdoc mistakes found by Nobuyuki while
translating these manual pages. Minor corrections by me.

Submitted by: Nobuyuki Koganemaru <n-kogane@syd.odn.ne.jp>


210892 05-Aug-2010 pjd

Document 'none' value for remote.

Reviewed by: dougb
MFC after: 1 month


210886 05-Aug-2010 pjd

Implement configuration reload on SIGHUP. This includes:
- Load added resources.
- Stop and forget removed resources.
- Update modified resources in least intrusive way, ie. don't touch
/dev/hast/<name> unless path to local component or provider name were
modified.

Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
MFC after: 1 month


210883 05-Aug-2010 pjd

Prepare configuration parsing code to be called multiple times:
- Don't exit on errors if not requested.
- Don't keep configuration in global variable, but allocate memory for
configuration.
- Call yyrestart() before yyparse() so that on an error in the configuration file
we will start from the beginning next time and not from the place we left off.

MFC after: 1 month


210882 05-Aug-2010 pjd

Make control_set_role() more public. We will need it soon.

MFC after: 1 month


210881 05-Aug-2010 pjd

Allow using the 'none' keyword as remote address in case the second cluster
node is not set up yet.

MFC after: 1 month


210880 05-Aug-2010 pjd

Reset signal handlers after fork().

MFC after: 1 month


210879 05-Aug-2010 pjd

- Use pjdlog_exitx() to log errors and exit instead of errx().
- Use 'unable to' (instead of 'cannot') consistently.

MFC after: 1 month


210876 05-Aug-2010 pjd

Assert that various buffers we use are large enough.

MFC after: 1 month


210875 05-Aug-2010 pjd

The problem with assert(3) is that it logs to stderr. Add two macros:
PJDLOG_ASSERT() and PJDLOG_VERIFY() that will check the given condition
and log the problem where appropriate. The difference between those
two is that PJDLOG_VERIFY() always works and PJDLOG_ASSERT() can be
turned off by defining NDEBUG.

MFC after: 1 month


210873 05-Aug-2010 pjd

Keep $FreeBSD$ in __FBSDID() only for C files.

MFC after: 1 month


210872 05-Aug-2010 pjd

Mark two more places that we won't reach.

MFC after: 1 month


210870 05-Aug-2010 pjd

Now that TCP will be checked last we don't need any knowledge about other
protocols.

MFC after: 1 month


210869 05-Aug-2010 pjd

Add an argument to the proto_register() function which allows a protocol
to declare that it is the default, so that it is placed at the end of the
queue and checked last.

MFC after: 1 month
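
One way to get "default protocol last" ordering is a tail queue from <sys/queue.h>. Ordinary protocols are inserted at the head and the default one at the tail, so a lookup tries the specific matchers first. The sketch below is hypothetical. struct proto, register_proto(), proto_lookup(), the "uds://" prefix and the match functions are assumptions, and the code does not reproduce hastd's proto_register() signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical protocol descriptor with only the fields needed here. */
struct proto {
	const char	*prt_name;
	/* Returns 0 if the address belongs to this protocol, -1 otherwise. */
	int		(*prt_match)(const char *addr);
	TAILQ_ENTRY(proto) prt_next;
};

static TAILQ_HEAD(protohead, proto) protos = TAILQ_HEAD_INITIALIZER(protos);

/*
 * Register a protocol.  A default protocol goes to the end of the
 * queue so that every more specific protocol is tried before it.
 */
static void
register_proto(struct proto *proto, bool isdefault)
{
	if (isdefault)
		TAILQ_INSERT_TAIL(&protos, proto, prt_next);
	else
		TAILQ_INSERT_HEAD(&protos, proto, prt_next);
}

/* Return the first registered protocol that accepts addr, or NULL. */
static struct proto *
proto_lookup(const char *addr)
{
	struct proto *proto;

	TAILQ_FOREACH(proto, &protos, prt_next) {
		if (proto->prt_match(addr) == 0)
			return (proto);
	}
	return (NULL);
}

/* Specific protocol: only addresses with an explicit prefix. */
static int
match_uds(const char *addr)
{
	return (strncmp(addr, "uds://", 6) == 0 ? 0 : -1);
}

/* Default protocol: accepts anything, e.g. a bare "host:port". */
static int
match_tcp(const char *addr)
{
	(void)addr;
	return (0);
}

static struct proto uds_proto = { .prt_name = "uds", .prt_match = match_uds };
static struct proto tcp_proto = { .prt_name = "tcp", .prt_match = match_tcp };
```

With this ordering the registration sequence no longer matters. Even if the catch-all TCP protocol registers first, a "uds://" address is still claimed by the more specific protocol. That property is what lets the TCP code drop its knowledge of the other protocols (r210870).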


210702 31-Jul-2010 joel

Spelling fixes.


210368 22-Jul-2010 pjd

Actually, only the fullsync mode is implemented, not memsync mode.
Correct manual page.

MFC after: 3 days


209185 14-Jun-2010 pjd

Correct various log messages.

Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days


209184 14-Jun-2010 pjd

Fix typos.

MFC after: 3 days


209183 14-Jun-2010 pjd

Initialize gctl_seq for synchronization requests.

Reported by: hiroshi@soupacific.com
Analysed by: Mikolaj Golub <to.my.trociny@gmail.com>
Tested by: hiroshi@soupacific.com, Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days


209182 14-Jun-2010 pjd

Plug memory leak.

Found by: Coverity Prevent
CID: 7057
MFC after: 3 days


209181 14-Jun-2010 pjd

Plug memory leak.

Found by: Coverity Prevent
CID: 7056
MFC after: 3 days


209180 14-Jun-2010 pjd

Plug memory leak.

Found by: Coverity Prevent
CID: 7051
MFC after: 3 days


209179 14-Jun-2010 pjd

Plug memory leaks.

Found by: Coverity Prevent
CID: 7052, 7053, 7054, 7055
MFC after: 3 days


209177 14-Jun-2010 pjd

Remove macros that are not really needed. The idea was to have them in case
we grow more descriptors, but I'll reconsider re-adding them once we get there.

Passing an (a = b) expression to FD_ISSET() is a bad idea, as FD_ISSET()
evaluates its argument twice.

Found by: Coverity Prevent
CID: 5243
MFC after: 3 days
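
Any macro that names its argument twice reproduces the double-evaluation problem. The sketch below uses a hypothetical BIT_ISSET() macro, shaped like a typical FD_ISSET() implementation, and a hypothetical get_fd() helper that counts its calls. It is not FreeBSD's <sys/select.h>.

```c
#include <assert.h>
#include <limits.h>

/* A bitset test macro that, like many FD_ISSET() versions, names n twice. */
#define	BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define	BIT_ISSET(n, set)						\
	(((set)[(n) / BITS_PER_WORD] >> ((n) % BITS_PER_WORD)) & 1UL)

static int calls;

/* Hypothetical helper with a side effect: counts how often it runs. */
static int
get_fd(void)
{
	calls++;
	return (3);
}

/* Risky: the expression with side effects is expanded twice. */
static int
check_inline(const unsigned long *set)
{
	return (BIT_ISSET(get_fd(), set) != 0);
}

/* Safe: evaluate once into a variable, then pass the plain variable. */
static int
check_assigned(const unsigned long *set)
{
	int fd;

	fd = get_fd();
	return (BIT_ISSET(fd, set) != 0);
}
```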


209175 14-Jun-2010 pjd

Eliminate dead code.

Found by: Coverity Prevent
CID: 5158
MFC after: 3 days


208028 13-May-2010 uqs

mdoc: move remaining sections into consistent order

This pertains mostly to FILES, HISTORY, EXIT STATUS and AUTHORS sections.

Found by: mdocml lint run
Reviewed by: ru


207390 29-Apr-2010 pjd

The default connection timeout is way too long. To make it shorter we have
to make the socket non-blocking, call connect() and, if we get EINPROGRESS,
wait using select(). Very complex, but I know no other way to define a
connection timeout for a given socket.

Reported by: hiroshi@soupacific.com
MFC after: 3 days
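
The sequence described above can be sketched as follows: make the socket non-blocking, call connect(2), expect EINPROGRESS, then wait with select(2). This is a generic POSIX sketch rather than hastd's TCP code. connect_timeout() is a hypothetical name, and reporting a timeout as ETIMEDOUT is a convention chosen here. Once select(2) reports the socket writable, SO_ERROR still has to be read to learn whether the connection actually succeeded.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Connect fd to sa, giving up after timeout_sec seconds.
 * Returns 0 on success (socket restored to its original flags), or -1
 * with errno set (ETIMEDOUT on timeout); on failure the caller closes fd.
 */
static int
connect_timeout(int fd, const struct sockaddr *sa, socklen_t salen,
    int timeout_sec)
{
	struct timeval tv;
	fd_set wfds;
	socklen_t len;
	int flags, error, ret;

	flags = fcntl(fd, F_GETFL);
	if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
		return (-1);

	if (connect(fd, sa, salen) == 0)
		goto done;		/* Connected immediately. */
	if (errno != EINPROGRESS)
		return (-1);

	/* Wait for writability; an EINTR retry restarts the full timeout. */
	do {
		FD_ZERO(&wfds);
		FD_SET(fd, &wfds);
		tv.tv_sec = timeout_sec;
		tv.tv_usec = 0;
		ret = select(fd + 1, NULL, &wfds, NULL, &tv);
	} while (ret == -1 && errno == EINTR);
	if (ret == -1)
		return (-1);
	if (ret == 0) {
		errno = ETIMEDOUT;
		return (-1);
	}

	/* Writable: fetch the result of the asynchronous connect. */
	len = sizeof(error);
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &error, &len) == -1)
		return (-1);
	if (error != 0) {
		errno = error;
		return (-1);
	}
done:
	/* Restore the original (blocking) mode. */
	if (fcntl(fd, F_SETFL, flags) == -1)
		return (-1);
	return (0);
}
```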


207372 29-Apr-2010 pjd

- Check if the worker process was killed by a signal and restart it.
- Improve logging.

Pointed out by: Garrett Cooper <yanefbsd@gmail.com>
MFC after: 3 days


207371 29-Apr-2010 pjd

Fix a problem where hastd would get stuck in recv(2) after sending a request
to a secondary that died between send(2) and recv(2). Do it by adding a
timeout to recv(2) for the primary's incoming and outgoing sockets and the
secondary's outgoing socket.

Reported by: Mikolaj Golub <to.my.trociny@gmail.com>
Tested by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days
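
A per-socket receive timeout of the kind described can be set with the SO_RCVTIMEO socket option. After that, a blocked recv(2) returns -1 with EAGAIN or EWOULDBLOCK instead of waiting forever. The sketch below assumes that mechanism and is not taken from hastd. The helpers set_recv_timeout() and recv_all() are hypothetical, and translating the timeout into ETIMEDOUT is this sketch's own convention.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Make recv(2) on fd give up after timeout_sec seconds without data. */
static int
set_recv_timeout(int fd, int timeout_sec)
{
	struct timeval tv;

	tv.tv_sec = timeout_sec;
	tv.tv_usec = 0;
	return (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)));
}

/*
 * Receive exactly size bytes.  Returns 0 on success, -1 on error with
 * errno set; a timeout is reported as ETIMEDOUT (our convention).
 */
static int
recv_all(int fd, void *buf, size_t size)
{
	unsigned char *p = buf;
	ssize_t done;

	while (size > 0) {
		done = recv(fd, p, size, MSG_WAITALL);
		if (done == -1) {
			if (errno == EINTR)
				continue;
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				errno = ETIMEDOUT;
			return (-1);
		}
		if (done == 0) {
			errno = ENOTCONN;	/* Peer closed the connection. */
			return (-1);
		}
		p += done;
		size -= (size_t)done;
	}
	return (0);
}
```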


207348 28-Apr-2010 pjd

Restart worker thread only if the problem was temporary.
In case of a persistent problem we don't want to loop forever.

MFC after: 3 days


207347 28-Apr-2010 pjd

Mark temporary issues as such.

MFC after: 3 days


207345 28-Apr-2010 pjd

Use WEXITSTATUS() to obtain real exit code.

MFC after: 3 days
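
The status returned by waitpid(2) is an encoded value. It must be decoded with WIFEXITED()/WEXITSTATUS() and WIFSIGNALED() before it can drive a restart decision. The minimal sketch below combines the three entries above: restart on a signal, restart only on a temporary failure, and decode with WEXITSTATUS(). It assumes, for illustration only, that a temporary failure is reported with EX_TEMPFAIL from <sysexits.h>. should_restart() and run_child() are hypothetical names.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sysexits.h>
#include <unistd.h>

/*
 * Decide whether a worker with wait status `status` should be restarted.
 * Assumed policy: restart if it was killed by a signal or exited with
 * EX_TEMPFAIL; any other exit is treated as a persistent problem.
 */
static bool
should_restart(int status)
{
	if (WIFSIGNALED(status))
		return (true);
	if (WIFEXITED(status))
		return (WEXITSTATUS(status) == EX_TEMPFAIL);
	return (false);
}

/*
 * Fork a child that raises `sig` (if non-zero) or exits with `code`,
 * and return its raw wait status, or -1 on error.
 */
static int
run_child(int code, int sig)
{
	pid_t pid;
	int status;

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		if (sig != 0)
			(void)raise(sig);
		_exit(code);
	}
	if (waitpid(pid, &status, 0) == -1)
		return (-1);
	return (status);
}
```

Comparing the raw status directly against an exit code, without WEXITSTATUS(), is the kind of mistake the entry above fixes: the exit code is stored in an encoded form inside the status word.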


207343 28-Apr-2010 pjd

Don't assume that "resource" property is in metadata.

Reported by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days


207070 22-Apr-2010 pjd

Fix compilation with WITHOUT_CRYPT or WITHOUT_OPENSSL options.

Reported by: Andrei V. Lavreniyuk <andy.lavr@reactor-xg.kiev.ua>
MFC after: 3 days


206697 16-Apr-2010 pjd

Fix log size calculation which caused message truncation.

Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days


206696 16-Apr-2010 pjd

Fix control socket leak when worker process exits.

Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days


206669 15-Apr-2010 pjd

Increase the ggate queue size to the maximum value.
HAST was not able to withstand heavy random load.

Reported by: Hiroyuki Yamagami
MFC after: 3 days


205738 27-Mar-2010 pjd

Don't hold the connection lock when doing reconnects, as it makes I/Os wait
for connection timeouts.

Reported by: Kevin Day <toasty@dragondata.com>


204596 02-Mar-2010 uqs

Remove redundant WARNS?=6 overrides and inherit the WARNS setting from
the toplevel directory.

This does not change any WARNS level and survives a make universe.

Approved by: ed (co-mentor)


204352 26-Feb-2010 ru

Fixed static linkage.


204177 21-Feb-2010 pjd

Changing the proto_socketpair.c compilation and linking order revealed
a problem - we should simply ignore proto_server() if the address
doesn't start with socketpair://, and not abort.


204076 18-Feb-2010 pjd

Please welcome HAST - Highly Available Storage.

HAST allows data to be transparently stored on two physically separated
machines connected over a TCP/IP network. HAST works in a Primary-Secondary
(Master-Backup, Master-Slave) configuration, which means that only one of the
cluster nodes can be active at any given time. Only the Primary node is able
to handle I/O requests to HAST-managed devices. Currently HAST is limited to
two cluster nodes in total.

HAST operates at the block level - it provides disk-like devices in the
/dev/hast/ directory for use by file systems and/or applications. Working at
the block level makes it transparent to file systems and applications. There
is no difference between using a HAST-provided device and a raw disk,
partition, etc. All of them are just regular GEOM providers in FreeBSD.

For more information please consult hastd(8), hastctl(8) and hast.conf(5)
manual pages, as well as http://wiki.FreeBSD.org/HAST.

Sponsored by: FreeBSD Foundation
Sponsored by: OMCnet Internet Service GmbH
Sponsored by: TransIP BV