272461 |
03-Oct-2014 |
gjb |
Copy stable/10@r272459 to releng/10.1 as part of the 10.1-RELEASE process.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
272407 |
02-Oct-2014 |
hselasky |
MFC r272027:
Hardware driver update from Mellanox Technologies, including: - improved performance - better stability - new features - bugfixes
Supported HCAs: - ConnectX-2 - ConnectX-3 - ConnectX-3 Pro
NOTE: - TSO feature needs r271946, which is not yet merged.
Sponsored by: Mellanox Technologies Approved by: re, glebius
|
271127 |
04-Sep-2014 |
hselasky |
MFC r270710 and r270821: - Update the OFED Linux Emulation layer as a preparation for a hardware driver update from Mellanox Technologies. - Remove empty files from the OFED Linux Emulation layer. - Fix compile warnings related to printf() and the "%lld" and "%llx" format specifiers. - Add some missing 2-clause BSD copyrights. - Add "Mellanox Technologies, Ltd." to list of copyright holders. - Add some new compatibility files. - Fix order of uninit in the mlx4ib module to avoid crash at unload using the new module_exit_order() function.
Sponsored by: Mellanox Technologies
|
270166 |
19-Aug-2014 |
hselasky |
MFC r269859: Fix for memory leak.
Sponsored by: Mellanox Technologies
|
269862 |
12-Aug-2014 |
hselasky |
MFC r268316: Fix OFED startup order: All SYSINIT()'s and modules should be loaded prior to starting "/sbin/init" which will run all the "/etc/rc.d/xxx" scripts. Else there can be a race configuring the interfaces via "/etc/rc.conf".
Sponsored by: Mellanox Technologies
|
269861 |
12-Aug-2014 |
hselasky |
MFC r268315: Fix compile warning.
Sponsored by: Mellanox Technologies
|
269860 |
12-Aug-2014 |
hselasky |
MFC r268314: Fix some compile warnings.
Sponsored by: Mellanox Technologies
|
267517 |
15-Jun-2014 |
hselasky |
MFC r267395: - Fix out of range shifting bug in bitops.h. - Make code a bit easier to read by adding parenthesis.
|
261455 |
04-Feb-2014 |
eadler |
MFC r258779,r258780,r258787,r258822:
Fix undefined behavior: (1 << 31) is not defined as 1 is an int and this shifts into the sign bit. Instead use (1U << 31) which gets the expected result.
Similar to the (1 << 31) case it is not defined to do (2 << 30).
This fix is not ideal as it assumes a 32 bit int, but does fix the issue for most cases.
A similar change was made in OpenBSD.
|
260495 |
09-Jan-2014 |
dim |
MFC r260102:
Similar to r260020, only use -fms-extensions with gcc, for all other modules which require this flag to compile. Use a GCC_MS_EXTENSIONS variable, defined in kern.pre.mk, which can be used to easily supply the flag (or not), depending on the compiler type.
MFC r260322:
In addition to r260102, also define GCC_MS_EXTENSIONS in bsd.sys.mk, since kernel module builds do not use kern.pre.mk.
|
260321 |
05-Jan-2014 |
dim |
Revert MFC of r260102 for now, until I can merge the required fix from head. This should fix building modules which require -fms-extensions to compile them with gcc.
|
260268 |
04-Jan-2014 |
dim |
MFC r260020:
For sys/dev/drm2/radeon, only use -fms-extensions with gcc. This flag is only to stop gcc complaining about anonymous unions, which clang does not do. For clang 3.4 however, -fms-extensions enables the Microsoft __wchar_t type, which clashes with our own types.h.
MFC r260102:
Similar to r260020, only use -fms-extensions with gcc, for all other modules which require this flag to compile. Use a GCC_MS_EXTENSIONS variable, defined in kern.pre.mk, which can be used to easily supply the flag (or not), depending on the compiler type.
|
259608 |
19-Dec-2013 |
alfred |
Defer start/stop port to workqueues.
MFC: 259411
|
258280 |
17-Nov-2013 |
alfred |
MFC: 258276
Fix creating a vlan over lagg over mlxen crash.
PR: 181931 Submitted by: Shahar Klein (shahark mellanox.com)
Approved by: re
|
258242 |
17-Nov-2013 |
alfred |
MFC: 257542
Fix API mismatch exposed by lagg.
When destroying a lagg the driver tries to restore the old mac and fails due to API mismatch.
Submitted by: Shahar Klein (shahark at mellanox.com) Approved by: re
|
257867 |
08-Nov-2013 |
alfred |
MFC: r257862, r257863, r257864
r257862:
Use explicit long cast to avoid overflow in bitopts.
This was causing problems with the buddy allocator inside of ofed.
r257863:
Fix for bad performance when mtu is increased.
Update the auto moderation behavior in the mlxen driver to match the new LINUX OFED code.
r257864:
Do not use a sleep lock when protecting the driver flags.
This was causing a locking issue with lagg.
Approved by: re
|
256810 |
20-Oct-2013 |
alfred |
Fix resource free.
The order of releasing resources in mlxen was wrong, which caused panic on reload of the module.
MFC: 256682
Submitted by: Shahar Klein (shahark at mellanox.com) Approved by: re
|
256686 |
17-Oct-2013 |
alfred |
Fix __free_pages() in the linux shim.
__free_pages() is actaully supposed to take a "struct page *" not an address.
MFC: 256546
Approved by: re
|
256281 |
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
256269 |
10-Oct-2013 |
alfred |
Fix for When more than one NIC is present.
The device name was incorrect due to a specific function we ported from the Linux driver that is not FBSD compatible. This resulted with a false sysctl registration and some more problematic issues.
The patch basically revokes it all together.
Submitted by: Meny Yossefi (menyy mellanox.com)
Approved by: re
|
256179 |
09-Oct-2013 |
dim |
Remove redundant declaration of cmclass in sys/ofed/drivers/infiniband/core/ucm.c, to silence a gcc warning.
Approved by: re (kib) X-MFC-With: r255932
|
256116 |
07-Oct-2013 |
dim |
Give an unnamed union in sys/ofed/include/rdma/ib_verbs.h a name, to silence a gcc warning.
Approved by: re (gjb) MFC after: 3 days
|
255973 |
01-Oct-2013 |
alfred |
Fixed kernel crash when running devinfo
When calling to ib_uverbs_cleanup_ucontext, there is a call to mutex_lock of xrcd_table_mutex, which was not initialized. Added missing initialization for xrcd_table_mutex.
Submitted by: Orit Moskovich (oritm mellanox.com)
Approved by: re
|
255972 |
01-Oct-2013 |
alfred |
Enable ib_dev.mmap function
Removed the ifdef linux from this function. Added stub function for contiguous pages to avoid compilation errors.
Submitted by: Orit Moskovich (oritm mellanox.com) Approved by: re
|
255970 |
01-Oct-2013 |
alfred |
Fixed 'Couldn't Create QP' issue when running rc_pingpong, uc_pingpong, srq_pingpong IBverbs
Removed refrences using 'ifdef __linux__' to qpg functions and related fields in struct ib_qp_init_attr.
Submitted by: Orit Moskovich (oritm mellanox.com)
Approved by: re
|
255969 |
01-Oct-2013 |
alfred |
Fixed kernel crash when removing IPOIB_CM option from configuration file
Changed module init from module_init() to module_init_order() with SI_ORDER_MIDDLE flag Submitted by: Orit Moskovich (oritm mellanox.com) Approved by: re
|
255968 |
01-Oct-2013 |
alfred |
Fix mis-merge of upstream fix.
We would accidentally make the string one byte too short.
Submitted by: Orit Moskovich (oritm mellanox.com)
Approved by: re
|
255932 |
29-Sep-2013 |
alfred |
Update OFED to Linux 3.7 and update Mellanox drivers.
Update the OFED Infiniband core to the version supplied in Linux version 3.7.
The update to OFED is nearly all additional defines and functions with the exception of the addition of additional parameters to ib_register_device() and the reg_user_mr callback.
In addition the ibcore (Infiniband core) and ipoib (IP over Infiniband) have both been made into completely loadable modules to facilitate testing of the OFED stack in FreeBSD.
Finally the Mellanox Infiniband drivers are now updated to the latest version shipping with Linux 3.7.
Submitted by: Mellanox FreeBSD driver team: Oded Shanoon (odeds mellanox.com), Meny Yossefi (menyy mellanox.com), Orit Moskovich (oritm mellanox.com)
Approved by: re
|
255240 |
05-Sep-2013 |
pjd |
Handle cases where capability rights are not provided.
Reported by: kib
|
254832 |
25-Aug-2013 |
andre |
Change m->pkthdr.header to m->pkthdr.PH_loc.ptr after r254804 to transiently store pointers to packet headers.
Sponsored by: The FreeBSD Foundation
|
254734 |
23-Aug-2013 |
np |
Fix implementation of sock_getname.
MFC after: 1 week
|
254576 |
20-Aug-2013 |
jhb |
Stop an ipoib interface before detaching it.
PR: kern/181225 Submitted by: Shahar Klein Obtained from: Mellanox MFC after: 1 week
|
254523 |
19-Aug-2013 |
andre |
Add m_clrprotoflags() to clear protocol specific mbuf flags at up and downwards layer crossings.
Consistently use it within IP, IPv6 and ethernet protocols.
Discussed with: trociny, glebius
|
254356 |
15-Aug-2013 |
glebius |
Make sendfile() a method in the struct fileops. Currently only vnode backed file descriptors have this method implemented.
Reviewed by: kib Sponsored by: Nginx, Inc. Sponsored by: Netflix
|
254122 |
09-Aug-2013 |
jeff |
- Reserve a special AF for SDP. The one we were incorrectly using before was taken by another AF.
Sponsored by: EMC / Isilon Storage Division
|
254121 |
09-Aug-2013 |
jeff |
- Correctly handle various edge cases in sysfs emulation.
Sponsored by: EMC / Isilon Storage Division
|
254120 |
09-Aug-2013 |
jeff |
- Use the correct type in the linux bitops emulation.
Submitted by: Maxim Ignatenko <gelraen.ua@gmail.com>
|
254065 |
07-Aug-2013 |
kib |
Split the pagequeues per NUMA domains, and split pageademon process into threads each processing queue in a single domain. The structure of the pagedaemons and queues is kept intact, most of the changes come from the need for code to find an owning page queue for given page, calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is rather arbitrary, the multithreaded daemon could be allowed for the single-domain machines, or one domain might be split into several page domains, to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target, precalculated at the start of the pass. This is not optimal, since it could cause excessive page deactivation and freeing. The code should be changed to re-check the global page deficit state in the loop after some number of iterations.
The pagedaemons reach the quorum before starting the OOM, since one thread inability to meet the target is normal for split queues. Only when all pagedaemons fail to produce enough reusable pages, OOM is started by single selected thread.
Launder is modified to take into account the segments layout with regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon Storage Division.
Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation
|
254025 |
07-Aug-2013 |
jeff |
Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation.
- Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem.
Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division
|
253785 |
29-Jul-2013 |
jhb |
Add a missing prototype.
Pointy hat: me
|
253774 |
29-Jul-2013 |
jhb |
Various fixes to the mlxen(4) driver: - Remove an incorrect assertion that can trigger when downing an interface. - Stop the interface during detach to avoid panics when unloading the driver. - A few locking fixes to be more consistent with other FreeBSD drivers: - Protect if_drv_flags with the driver lock, not atomic ops - Hold the driver lock when adjusting multicast state. - Hold the driver lock while adjusting if_capenable.
PR: kern/180791 [1,2] Submitted by: Shakar Klein @ Mellanox [1,2] MFC after: 3 days
|
253653 |
25-Jul-2013 |
jhb |
Avoid trashing IP fragments: - Only enable UDP/TCP hardware checksums if CSUM_UDP or CSUM_TCP is set. - Only enable IP hardware checksums if CSUM_IP is set.
PR: kern/180430 Submitted by: Meny Yossefi <menyy@mellanox.com> MFC after: 1 week
|
253604 |
24-Jul-2013 |
avg |
rename scheduler->swapper and SI_SUB_RUN_SCHEDULER->SI_SUB_LAST
Also directly call swapper() at the end of mi_startup instead of relying on swapper being the last thing in sysinits order.
Rationale:
- "RUN_SCHEDULER" was misleading, scheduling already takes place at that stage - "scheduler" was misleading, the function swaps in the swapped out processes - another SYSINIT(SI_SUB_RUN_SCHEDULER, SI_ORDER_ANY) could never be invoked depending on its relative order with scheduler; this was not obvious and the bug actually used to exist
Reviewed by: kib (ealier version) MFC after: 14 days
|
253449 |
18-Jul-2013 |
jhb |
Rework the previous fix for the IB vs Ethernet sysctl handler to be more generic and apply to all sysfs attributes: - Use sysctl_handle_string() instead of reimplementing it. - Remove trailing newline from the current value before passing it to userland and append a newline to the new string value before passing it to the attribute's store function. - Don't leak the temporary buffer if the first error check triggers. - Revert earlier change to mlx4 port mode handler.
PR: kern/174213 Submitted by: Garrett Cooper Reviewed by: Shakar Klein @ Mellanox MFC after: 1 week
|
253423 |
17-Jul-2013 |
jhb |
Remove check forbidding requests that would result in one port being set to Ethernet and the subsequent port being set to IB.
Submitted by: Shakar Klein @ Mellanox Tested by: Morgan Robertson <morganrobertson@gmail.com> MFC after: 1 week
|
253048 |
08-Jul-2013 |
jhb |
Allow mlx4 devices to switch from Ethernet to Infiniband (and vice versa): - Fix sysctl wrapper for sysfs attributes to properly handle new string values similar to sysctl_handle_string() (only copyin the user's supplied length and nul-terminate the string). - Don't check for a trailing newline when evaluating the desired operating mode of a mlx4 device.
PR: kern/179999 Submitted by: Shahar Klein <shahark@mellanox.com> MFC after: 1 week
|
251617 |
11-Jun-2013 |
jhb |
Store a reference to the vnode associated with a file descriptor in the linux_file structure and use it instead of directly accessing td_fpop when destroying the linux_file structure. The td_fpop pointer is not valid when a cdevpriv destructor is run, and the type-specific close method has already been called, so f_vnode may not be valid (and the vnode might have been recycled without our own reference).
Tested by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de> MFC after: 1 week
|
250460 |
10-May-2013 |
eadler |
Fxi a bunch of typos.
PR: misc/174625 Submitted by: Jeremy Chadwick <jdc@koitsu.org>
|
250374 |
08-May-2013 |
delphij |
According to the documentation, on Linux, cancel_delayed_work() does not do drain (flush_workqueue() in Linux terms) but instead returns true if the work was removed before it is run, or false otherwise.
Simulate this by removing the taskqueue_drain() and return the value derived from taskqueue_cancel()'s return value.
This would solve a witness warning caused by calling taskqueue_drain() with a non-sleepable lock held, like:
taskqueue_drain with the following non-sleepable locks held: exclusive rw lle (lle) r = 0 (0xfffffe001450b410) locked @ /usr/src/sys/netinet/in.c:1484 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffff848d4f7690 kdb_backtrace() at kdb_backtrace+0x39/frame 0xffffff848d4f7740 witness_warn() at witness_warn+0x4a8/frame 0xffffff848d4f7800 taskqueue_drain() at taskqueue_drain+0x3a/frame 0xffffff848d4f7840 set_timeout() at set_timeout+0x4a/frame 0xffffff848d4f7860 netevent_callback() at netevent_callback+0x16/frame 0xffffff848d4f7870 arpintr() at arpintr+0x9b5/frame 0xffffff848d4f7930
This do not affect kernel without OFED compiled in.
Reported by: Garrett Cooper <yaneurabeya gmail com> (who also tested an earlier version of this patch, but bugs are mine) MFC after: 2 weeks
|
249976 |
27-Apr-2013 |
glebius |
Add const qualifier to the dst parameter of the ifnet if_output method.
|
249066 |
03-Apr-2013 |
jhb |
Check for SS_NBIO in the socket state field rather than socket buffer flags.
Submitted by: Vijay Singh MFC after: 1 week
|
248084 |
09-Mar-2013 |
attilio |
Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes.
The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs.
The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example).
Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
|
247675 |
02-Mar-2013 |
mav |
Add protective parentheses for macro argument, missed in r247671.
|
247671 |
02-Mar-2013 |
mav |
MFcalloutng: Give OFED Linux wrapper own "expires" field instead of abusing callout's c_time, which will change its type and units with calloutng commit.
|
247602 |
02-Mar-2013 |
pjd |
Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights.
- The cap_new(2) system call is left, but it is no longer documented and should not be used in new code.
- The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one.
- The cap_getrights(2) syscall is renamed to cap_rights_get(2).
- If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall.
- If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2).
- To support ioctl and fcntl white-listing the filedesc structure was heavly modified.
- The audit subsystem, kdump and procstat tools were updated to recognize new syscalls.
- Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below:
CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT.
Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2).
Added CAP_SYMLINKAT: - Allow for symlinkat(2).
Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2).
Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory.
Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall.
Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call.
Removed CAP_MAPEXEC.
CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE.
Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).
Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT.
CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required).
CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required).
Added convinient defines:
#define CAP_PREAD (CAP_SEEK | CAP_READ) #define CAP_PWRITE (CAP_SEEK | CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ) #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE
#define CAP_SOCK_CLIENT \ (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \ CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \ CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \ CAP_SETSOCKOPT | CAP_SHUTDOWN)
Added defines for backward API compatibility:
#define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER)
Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib
|
246581 |
09-Feb-2013 |
delphij |
Fix LINT build on amd64.
|
246482 |
07-Feb-2013 |
rrs |
This fixes a out-of-order problem with several of the newer drivers. The basic problem was that the driver was pulling the mbuf off the drbr ring and then when sending with xmit(), encounting a full transmit ring. Thus the lower layer xmit() function would return an error, and the drivers would then append the data back on to the ring. For TCP this is a horrible scenario sure to bring on a fast-retransmit.
The fix is to use drbr_peek() to pull the data pointer but not remove it from the ring. If it fails then we either call the new drbr_putback or drbr_advance method. Advance moves it forward (we do this sometimes when the xmit() function frees the mbuf). When we succeed we always call advance. The putback will always copy the mbuf back to the top of the ring. Note that the putback *cannot* be used with a drbr_dequeue() only with drbr_peek(). We most of the time, in putback, would not need to copy it back since most likey the mbuf is still the same, but sometimes xmit() functions will change the mbuf via a pullup or other call. So the optimial case for the single consumer is to always copy it back. If we ever do a multiple_consumer (for lagg?) we will need a test and atomic in the put back possibly a seperate putback_mc() in the ring buf.
Reviewed by: jhb@freebsd.org, jlv@freebsd.org
|
243882 |
05-Dec-2012 |
glebius |
Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys.
Exceptions:
- sys/contrib not touched - sys/mbuf.h edited manually
|
242933 |
12-Nov-2012 |
dim |
Redo r242842, now actually fixing the warnings, as follows: - In sys/ofed/drivers/infiniband/core/cma.c, an enum struct member is interpreted as an int, so cast it to an int. - In sys/ofed/drivers/infiniband/core/ud_header.c, initialize the packet_length variable in ib_ud_header_init(), to prevent undefined behaviour. - In sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c, call rdma_notify() with the correct enum type and value. - In sys/ofed/include/linux/pci.h, change the PCI_DEVICE and PCI_VDEVICE macros to use C99 struct initializers, so additional members can be overridden.
Reviewed by: delphij, Garrett Cooper <yanegomi@gmail.com> MFC after: 1 week
|
242841 |
09-Nov-2012 |
delphij |
Use %s when calling make_dev with a string pointer. This makes clang happy.
MFC after: 2 weeks
|
241844 |
22-Oct-2012 |
eadler |
remove duplicate semicolons where possible.
Approved by: cperciva MFC after: 1 week
|
241697 |
18-Oct-2012 |
jhb |
Take advantage of if_baudrate_pf and calculate an effective baud rate on all platforms (not just amd64) to compute an equivalent IB rate.
|
241696 |
18-Oct-2012 |
jhb |
Use if_initbaudrate().
|
241037 |
28-Sep-2012 |
glebius |
The drbr(9) API appeared to be so unclear, that most drivers in tree used it incorrectly, which lead to inaccurate overrated if_obytes accounting. The drbr(9) used to update ifnet stats on drbr_enqueue(), which is not accurate since enqueuing doesn't imply successful processing by driver. Dequeuing neither mean that. Most drivers also called drbr_stats_update() which did accounting again, leading to doubled if_obytes statistics. And in case of severe transmitting, when a packet could be several times enqueued and dequeued it could have been accounted several times.
o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between ALTQ queueing or buf_ring(9) queueing. - It doesn't touch the buf_ring stats any more. - It doesn't touch ifnet stats anymore. - drbr_stats_update() no longer exists.
o buf_ring(9) handles its stats itself: - It handles br_drops itself. - br_prod_bytes stats are dropped. Rationale: no one ever reads them but update of a common counter on every packet negatively affects performance due to excessive cache invalidation. - buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since we no longer account bytes.
o Drivers handle their stats theirselves: if_obytes, if_omcasts.
o mlx4(4), igb(4), em(4), vxge(4), oce(4) and ixv(4) no longer use drbr_stats_update(), and update ifnet stats theirselves.
o bxe(4) was the most correct driver, it didn't call drbr_stats_update(), thus it was the only driver accurate under moderate load. Now it also maintains stats itself.
o ixgbe(4) had already taken stats from hardware, so just - drop software stats updating. - take multicast packet count from hardware as well.
o mxge(4) just no longer needs NO_SLOW_STATS define.
o cxgb(4), cxgbe(4) need no change, since they obtain stats from hardware.
Reviewed by: jfv, gnn
|
240680 |
18-Sep-2012 |
gavin |
Align the PCI Express #defines with the style used for the PCI-X #defines. This also has the advantage that it makes the names more compact, iand also allows us to correct the non-uniform naming of the PCIM_LINK_* defines, making them all consistent amongst themselves.
This is a mostly mechanical rename: s/PCIR_EXPRESS_/PCIER_/g s/PCIM_EXP_/PCIEM_/g s/PCIM_LINK_/PCIEM_LINK_/g
When this is MFC'd, #defines will be added for the old names to assist out-of-tree drivers.
Discussed with: jhb MFC after: 1 week
|
240082 |
04-Sep-2012 |
melifaro |
Remove unneeded ipfw headers introduced in r213447 from Infiniband code.
MFC after: 2 weeks
|
239303 |
15-Aug-2012 |
hselasky |
Streamline use of cdevpriv and correct some corner cases.
1) It is not useful to call "devfs_clear_cdevpriv()" from "d_close" callbacks, hence for example read, write, ioctl and so on might be sleeping at the time of "d_close" being called and then then freed private data can still be accessed. Examples: dtrace, linux_compat, ksyms (all fixed by this patch)
2) In sys/dev/drm* there are some cases in which memory will be freed twice, if open fails, first by code in the open routine, secondly by the cdevpriv destructor. Move registration of the cdevpriv to the end of the drm open routines.
3) devfs_clear_cdevpriv() is not called if the "d_open" callback registered cdevpriv data and the "d_open" callback function returned an error. Fix this.
Discussed with: phk MFC after: 2 weeks
|
239065 |
05-Aug-2012 |
kib |
After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason to pull vm_param.h was removed. Other big dependency of vm_page.h on vm_param.h are PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages.
Stop including vm_param.h into vm_page.h. Include vm_param.h explicitely for the kernel code which needs it.
Suggested and reviewed by: alc MFC after: 2 weeks
|
237563 |
25-Jun-2012 |
np |
Fix clang warning when compiling iw_cxgb.
Reported by: rene, dim
|
237263 |
19-Jun-2012 |
np |
- Updated TOE support in the kernel.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs. These are available as t3_tom and t4_tom modules that augment cxgb(4) and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as usual with or without these extra features.
- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the works and will follow soon.
Build-tested with make universe.
30s overview ============ What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the capabilities of an interface: # ifconfig -m | grep TOE
Enable/disable TCP offload on an interface (just like any other ifnet capability): # ifconfig cxgbe0 toe # ifconfig cxgbe0 -toe
Which connections are offloaded? Look for toe4 and/or toe6 in the output of netstat and sockstat: # netstat -np tcp | grep toe # sockstat -46c | grep toe
Reviewed by: bz, gnn Sponsored by: Chelsio communications. MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)
|
234946 |
03-May-2012 |
melifaro |
Revert r234834 per luigi@ request.
Cleaner solution (e.g. adding another header) should be done here.
Original log: Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h. Remove ipfw/ip_fw_private.h header from non-ipfw code.
Requested by: luigi Approved by: kib(mentor)
|
234834 |
30-Apr-2012 |
melifaro |
Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h. Remove ipfw/ip_fw_private.h header from non-ipfw code.
Approved by: ae(mentor) MFC after: 2 weeks
|
234618 |
23-Apr-2012 |
bz |
Do not announce IPv6 TSO support yet. The driver seems to make assumptions based on IPv4 header parsing only.
MFC after: 1 week
|
234183 |
12-Apr-2012 |
jhb |
Add OFED and the associated options and drivers to x86 LINT builds: - Mark 'sdp' as requiring 'inet'. - Always include "opt_inet.h" and "opt_inet6.h" and modify the IB driver Makefiles to honor WITH/WITHOUT_INET/INET6/_SUPPORT options to determine what should be enabled during a module build. - Fix the mlxen(4) driver and the core IB code to compile without if INET is disabled (including when both INET and INET6 are disabled).
Reviewed by: bz MFC after: 2 weeks
|
234182 |
12-Apr-2012 |
jhb |
Don't update if_obytes when transmitting packets. That is already done in IFQ_HANDOFF() when the packet is passed to the start routine, so doing it here resulted in double counting.
Reported by: Alex Tutubalin lexa lexa ru MFC after: 1 week
|
234099 |
10-Apr-2012 |
jhb |
Properly parse 40G media types from newer Mellanox adapters that are 40G capable. For now, map all 40G links to 40GBase-CR4.
MFC after: 2 weeks
|
233870 |
04-Apr-2012 |
jhb |
Fix build on i386.
|
233547 |
27-Mar-2012 |
jhb |
Use VM_MEMATTR_UNCACHEABLE instead of VM_MEMATTR_UNCACHED for UC mappings. VM_MEMATTR_UNCACHED is actually the x86-specific UC- mode (where a WC MTRR can override the PAT setting).
|
233198 |
19-Mar-2012 |
jhb |
Fix build of OFED bits with debugging options enabled.
|
233040 |
16-Mar-2012 |
jhb |
Fix build with INET6 disabled.
|
230135 |
15-Jan-2012 |
uqs |
Remove spurious 8bit chars, turning files into plain ASCII.
|
228469 |
13-Dec-2011 |
ed |
Replace __signed by signed.
The signed keyword is an integral part of the C syntax. There's no need to use __signed.
|
228443 |
12-Dec-2011 |
mdf |
Do not define bool/true/false if the symbols already exist.
MFC after: 2 weeks Sponsored by: Isilon Systems, LLC
|
227309 |
07-Nov-2011 |
ed |
Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
|
227293 |
07-Nov-2011 |
ed |
Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
|
226436 |
16-Oct-2011 |
eadler |
- change "is is" to "is" or "it is" - change "the the" to "the"
Approved by: lstewart Approved by: sahil (mentor) MFC after: 3 days
|
224914 |
16-Aug-2011 |
kib |
Add the fo_chown and fo_chmod methods to struct fileops and use them to implement fchown(2) and fchmod(2) support for several file types that previously lacked it. Add MAC entries for chown/chmod done on posix shared memory and (old) in-kernel posix semaphores.
Based on the submission by: glebius Reviewed by: rwatson Approved by: re (bz)
|
222813 |
07-Jun-2011 |
attilio |
etire the cpumask_t type and replace it with cpuset_t usage.
This is intended to fix the bug where cpu mask objects are capped to 32. MAXCPU, then, can now arbitrarely bumped to whatever value. Anyway, as long as several structures in the kernel are statically allocated and sized as MAXCPU, it is suggested to keep it as low as possible for the time being.
Technical notes on this commit itself: - More functions to handle with cpuset_t objects are introduced. The most notable are cpusetobj_ffs() (which calculates a ffs(3) for a cpuset_t object), cpusetobj_strprint() (which prepares a string representing a cpuset_t object) and cpusetobj_strscan() (which creates a valid cpuset_t starting from a string representation). - pc_cpumask and pc_other_cpus are target to be removed soon. With the moving from cpumask_t to cpuset_t they are now inefficient and not really useful. Anyway, for the time being, please note that access to pcpu datas is protected by sched_pin() in order to avoid migrating the CPU while reading more than one (possible) word - Please note that size of cpuset_t objects may differ between kernel and userland. While this is not directly related to the patch itself, it is good to understand that concept and possibly use the patch as a reference on how to deal with cpuset_t objects in userland, when accessing kernland members. - KTR_CPUMASK is changed and now is represented through a string, to be set as the example reported in NOTES.
Please additively note that no MAXCPU is bumped in this patch, but private testing has been done until to MAXCPU=128 on a real 8x8x2(htt) machine (amd64).
Please note that the FreeBSD version is not yet bumped because of the upcoming pcpu changes. However, note that this patch is not targeted for MFC.
People to thank for the time spent on this patch: - sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested several revision of the patches and really helped in improving stability of this work. - marius fixed several bugs in the sparc64 implementation and reviewed patches related to ktr. - jeff and jhb discussed the basic approach followed. - kib and marcel made targeted review on some specific part of the patch. - marius, art, nwhitehorn and andreast reviewed MD specific part of the patch. - marius, andreast, gonzo, nwhitehorn and jceel tested MD specific implementations of the patch. - Other people have made contributions on other patches that have been already committed and have been listed separately.
Companies that should be mentioned for having participated at several degrees: - Yahoo! for having offered the machines used for testing on big count of CPUs. - The FreeBSD Foundation for having sponsored my devsummit attendance, which has been instrumental. - Sandvine for having offered offices and infrastructure during development.
(I really hope I didn't forget anyone, if it happened I apologize in advance).
|
222330 |
26-May-2011 |
delphij |
In ipoib_cm_handle_rx_wc(): Count incoming packets and bytes toward incoming counters.
Reviewed by: jeff
|
221055 |
26-Apr-2011 |
jeff |
- Catch up to falloc() changes. - PHOLD() before using a task structure on the stack. - Fix a LOR between the sleepq lock and thread lock in _intr_drain().
|
220555 |
12-Apr-2011 |
bz |
Even though this block is not compiled currently, properly assign CSUM_TSO to if_hwassist rather than if_capabilities to avoid future errors.
Reviewed by: jeff
|
220016 |
26-Mar-2011 |
jeff |
- Implement wake-on-lan support in mlxen.
|
219902 |
23-Mar-2011 |
jhb |
Do a sweep of the tree replacing calls to pci_find_extcap() with calls to pci_find_cap() instead.
|
219893 |
23-Mar-2011 |
jeff |
- Correct the vlan filter programming. The device filter is built in reverse order. - Name the cq taskqueues according to whether they handle rx or tx. - Default LRO to on.
|
219859 |
22-Mar-2011 |
jeff |
- Don't use a separate set of rx queues for UDP, hash them into the same set as TCP. - Eliminate the fully linear non-scatter/gather rx path, there is no harm in using arrays of clusters for both TCP and UDP. - Implement support for enabling/disabling per-vlan priority pause and queues via sysctl.
|
219846 |
21-Mar-2011 |
kib |
Allow the ofed modules to be compiled on i386.
Reviewed by: jeff
|
219820 |
21-Mar-2011 |
jeff |
- Merge in OFED 1.5.3 from projects/ofed/head
|