linux

Commit Graph

Author	SHA1	Message	Date
kbuild test robot	9886ce2b9d	net: encx24j600_exit() can be static Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 04:02:43 -07:00
Jon Ringle	04fbfce7a2	net: Microchip encx24j600 driver This ethernet driver supports the Microchip enc424j600/626j600 Ethernet controller over a SPI bus interface. This driver makes use of the regmap API to optimize access to registers by caching registers where possible. Datasheet: http://ww1.microchip.com/downloads/en/DeviceDoc/39935b.pdf Signed-off-by: Jon Ringle <jringle@gridpoint.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 04:02:41 -07:00
Jon Ringle	7741c373cf	regmap: Allow installing custom reg_update_bits function This commit allows installing a custom reg_update_bits function for cases where the hardware provides a mechanism to set or clear register bits without a read/modify/write cycle. Such is the case with the Microchip ENCX24J600. Signed-off-by: Jon Ringle <jringle@gridpoint.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 04:02:40 -07:00
Govindarajulu Varadarajan	937317c7c1	enic: do hang reset only in case of tx timeout The current code invokes hang reset in case of error interrupt. We should hang reset only in case of tx timeout. This because of the way hang reset is implemented in firmware. Hang reset takes more firmware resources than soft reset. Adaptor does not generate error interrupt in case of tx timeout. Hang reset only in case of tx timeout, in .ndo_tx_timeout. Do soft reset otherwise. Introduce deferred work, enic_tx_hang_reset, to do hang reset. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:51:35 -07:00
Govindarajulu Varadarajan	cc809237e1	enic: handle spurious error interrupt Some of the enic adaptors are known to generate spurious interrupts. When error interrupt is generated, driver just resets the device. This patch resets the device only when an error has occurred. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:51:33 -07:00
David S. Miller	2905f5bb1c	Merge branch 'cxgb4-next' Hariprasad Shenai says: ==================== cxgb4: Trivial fixes for cxgb4 Fixes the following issues Don't read non existent T4/T5/T6 adapter registers for ethtool dump. For T4, dont read mailbox control registers. Adds new devlog faility and report correct link speed for unsupported ones. This patch series has been created against net-next tree and includes patches on cxgb4 driver. We have included all the maintainers of respective drivers. Kindly review the change and let us know in case of any review comments. ==================== Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:48:45 -07:00
Hariprasad Shenai	85412255ef	cxgb4: Report correct link speed for unsupported ones When we get garbage from the firmware with weird Port Speeds, etc. we should emit a warning regarding unsupported speeds rather than use the bogus default of "10Mbps" which isn't even an option in the firmware Port Information message Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:48:41 -07:00
Hariprasad Shenai	da4976e17b	cxgb4: Adds a new Device Log Facility FW_DEVLOG_FACILITY_CF The firmware team added a new Device Log Facility FW_DEVLOG_FACILITY_CF, but the driver has been decoding Device Log messages with that Facility as "(NULL)", fixing it. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:48:41 -07:00
Hariprasad Shenai	b3695540ba	cxgb4: For T4, don't read the Firmware Mailbox Control register T4 doesn't have the Shadow copy of the register which we can read without side effect. So don't read mbox control register for T4 adapter Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:48:40 -07:00
Hariprasad Shenai	8119c01800	cxgb4 : Update T4/T5/T6 register ranges Update T4/T5/T6 adapter register ranges so that it doesn't read non existent registers when dumped using ethtool Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:48:39 -07:00
David S. Miller	40e106801e	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/net-next Eric W. Biederman says: ==================== net: Pass net through ip fragmention This is the next installment of my work to pass struct net through the output path so the code does not need to guess how to figure out which network namespace it is in, and ultimately routes can have output devices in another network namespace. This round focuses on passing net through ip fragmentation which we seem to call from about everywhere. That is the main ip output paths, the bridge netfilter code, and openvswitch. This has to happend at once accross the tree as function pointers are involved. First some prep work is done, then ipv4 and ipv6 are converted and then temporary helper functions are removed. ==================== Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:39:31 -07:00
David S. Miller	7e2832f17f	Merge branch 'rds-perf' Sowmini Varadhan says: ==================== RDS: RDS-TCP perf enhancements A 3-part patchset that (a) improves current RDS-TCP perf by 2X-3X and (b) refactors earlier robustness code for better observability/scaling. Patch 1 is an enhancment of earlier robustness fixes that had used separate sockets for client and server endpoints to resolve race conditions. It is possible to have an equivalent solution that does not use 2 sockets. The benefit of a single socket solution is that it results in more predictable and observable behavior for the underlying TCP pipe of an RDS connection Patches 2 and 3 are simple, straightforward perf bug fixes that align the RDS TCP socket with other parts of the kernel stack. v2: fix kbuild-test-robot warnings, comments from Sergei Shtylov and Santosh Shilimkar. ==================== Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:35:29 -07:00
Sowmini Varadhan	76b29ef120	RDS-TCP: Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_xmit For the same reasons as commit `2f53384424` ("tcp: allow splice() to build full TSO packets") and commit `35f9c09fe9` ("tcp: tcp_sendpages() should call tcp_push() once"), rds_tcp_xmit may have multiple pages to send, so use the MSG_MORE and MSG_SENDPAGE_NOTLAST as hints to tcp_sendpage() Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:34:53 -07:00
Sowmini Varadhan	1edd6a14d2	RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune Using the value of RDS_TCP_DEFAULT_BUFSIZE (128K) clobbers efficient use of TSO because it inflates the size_goal that is computed in tcp_sendmsg/tcp_sendpage and skews packet latency, and the default values for these parameters actually results in significantly better performance. In request-response tests using rds-stress with a packet size of 100K with 16 threads (test parameters -q 100000 -a 256 -t16 -d16) between a single pair of IP addresses achieves a throughput of 6-8 Gbps. Without this patch, throughput maxes at 2-3 Gbps under equivalent conditions on these platforms. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:34:53 -07:00
Sowmini Varadhan	3b20fc3897	RDS: Use a single TCP socket for both send and receive. Commit `f711a6ae06` ("net/rds: RDS-TCP: Always create a new rds_sock for an incoming connection.") modified rds-tcp so that an incoming SYN would ignore an existing "client" TCP connection which had the local port set to the transient port. The motivation for ignoring the existing "client" connection in `f711a6ae` was to avoid race conditions and an endless duel of reconnect attempts triggered by a restart/abort of one of the nodes in the TCP connection. However, having separate sockets for active and passive sides is avoidable, and the simpler model of a single TCP socket for both send and receives of all RDS connections associated with that tcp socket makes for easier observability. We avoid the race conditions from `f711a6ae` by attempting reconnects in rds_conn_shutdown if, and only if, the (new) c_outgoing bit is set for RDS_TRANS_TCP. The c_outgoing bit is initialized in __rds_conn_create(). A side-effect of re-using the client rds_connection for an incoming SYN is the potential of encountering duelling SYNs, i.e., we have an outgoing RDS_CONN_CONNECTING socket when we get the incoming SYN. The logic to arbitrate this criss-crossing SYN exchange in rds_tcp_accept_one() has been modified to emulate the BGP state machine: the smaller IP address should back off from the connection attempt. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:34:51 -07:00
David S. Miller	393159e917	Merge branch 'xgbe-next' Tom Lendacky says: ==================== amd-xgbe: AMD XGBE driver updates 2015-09-30 The following patches are included in this driver update series: - Remove unneeded semi-colon - Follow the DT/ACPI precedence used by the device_ APIs - Add ethtool support for getting and setting the msglevel - Add ethtool support error and debug messages - Simplify the hardware FIFO assignment calculations - Add receive buffer unavailable statistic - Use the device workqueue instead of the system workqueue - Remove the use of a link state bit This patch series is based on net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:23:40 -07:00
Lendacky, Thomas	50789845cf	amd-xgbe: Remove the XGBE_LINK state bit The XGBE_LINK bit is used just to determine whether to call the netif_carrier_on/off functions. Rather than define and use this bit, just call the functions. The netif_carrier_ok function can be used in place of checking the XGBE_LINK bit in the future. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:23:27 -07:00
Lendacky, Thomas	afb43e8a0a	amd-xgbe: Use device workqueue instead of system workqueue The driver creates, flushes and destroys a device workqueue but queues work to the system workqueue. Switch from using the system workqueue to the device workqueue. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:23:26 -07:00
Lendacky, Thomas	72c9ac4e1f	amd-xgbe: Add receive buffer unavailable statistic Add a statistic that tracks how many times an interrupt is generated for a receive buffer not being available to the hardware which prevents the hardware from being able to DMA the received data. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:23:26 -07:00
Lendacky, Thomas	9c439e4b73	amd-xgbe: Simplify calculation and setting of queue fifos The calculation of the Tx and Rx fifo sizes can be calculated rather than hardcoded in a switch statement. Additionally, the per-queue fifo sizes can be calculated rather than hardcoded using if/else if statements that can possibly underutilize the available fifo area. Change the code to calculate the fifo sizes and the per-queue fifo sizes to simplify the code and make best use of the available fifo. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:23:25 -07:00
Lendacky, Thomas	e5dd8b8110	amd-xgbe: Add ethtool error and debug messages Add error and dynamic debug messages to various ethtool functions in the driver while also removing the DBGPR debug print calls. Also, change the message level for some error messages from alert to err. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:23:25 -07:00
Lendacky, Thomas	349fb2d700	amd-xgbe: Add ethtool support for setting the msglevel Provide the ethtool functions to support getting and setting the msglevel for the driver. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:23:23 -07:00
Lendacky, Thomas	47f2e6c275	amd-xgbe: Use proper DT / ACPI precedence checking Device tree presence takes precedence over ACPI in the device_* APIs. The amd-xgbe driver should follow the same precedence. Update the check on whether to use DT / ACPI to follow this. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:23:22 -07:00
Lendacky, Thomas	3947d78a54	amd-xgbe: Remove an unneeded semicolon on a switch statement Remove an unneeded semicolon at the end of a switch statement block. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:23:22 -07:00
Eric Dumazet	ac8cfc7bb8	tcp: restore fastopen operations I accidentally cleared fastopenq.max_qlen in reqsk_queue_alloc() while max_qlen can be set before listen() is called, using TCP_FASTOPEN socket option for example. Fixes: `0536fcc039` ("tcp: prepare fastopen code for upcoming listener changes") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:19:06 -07:00
David S. Miller	77946de51b	Merge branch 'net-y2038' Arnd Bergmann says: ==================== net: assorted y2038 changes This is a set of changes for network drivers and core code to get rid of the use of time_t and derived data structures. I have a longer set of patches that enables me to build kernels with the time_t definition removed completely as a help to find y2038 overflow issues. This is the subset for networking that contains all code that has a reasonable way of fixing at the moment and that is either commonly used (in one of the defconfigs) or that blocks building a whole subsystem. Most of the patches in this series should be noncontroversial, but the last two that I marked [RFC] are a bit tricky and need input from people that are more familiar with the code than I am. All 12 patches are independent of one another and can be applied in any order, so feel free to pick all that look good. Patches that are not included here are: - disabling less common device drivers that I don't have a fix for yet, this includes drivers/net/ethernet/brocade/bna/bfa_ioc.c drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c drivers/net/ethernet/tile/tilegx.c drivers/net/hamradio/baycom_ser_fdx.c drivers/net/wireless/ath/ath10k/core.h drivers/net/wireless/ath/ath9k/ drivers/net/wireless/ath/ath9k/ drivers/net/wireless/atmel.c drivers/net/wireless/prism54/isl_38xx.c drivers/net/wireless/rt2x00/rt2x00debug.c drivers/net/wireless/rtlwifi/ drivers/net/wireless/ti/wlcore/ drivers/staging/ozwpan/ net/atm/mpoa_caches.c net/atm/mpoa_proc.c net/dccp/probe.c net/ipv4/tcp_probe.c net/netfilter/nfnetlink_queue_core.c net/netfilter/nfnetlink_queue_core.c net/netfilter/xt_time.c net/openvswitch/flow.c net/sctp/probe.c net/sunrpc/auth_gss/ net/sunrpc/svcauth_unix.c net/vmw_vsock/af_vsock.c We'll get there eventually, or we an add a dependency to ensure they are not built on 32-bit kernels that need to survive beyond 2038. Most of these should be really easy to fix. 
- recvmmsg/sendmmsg system calls: patches have been sent out as part of the syscall series, need a little more work and review - SIOCGSTAMP/SIOCGSTAMPNS/ ioctl calls: tricky, need to discuss with some folks at kernel summit - SO_RCVTIMEO/SO_SNDTIMEO/SO_TIMESTAMP/SO_TIMESTAMPNS socket opt: similar and related to the ioctl - mmapped packet socket: need to create v4 of the API, nontrivial - pktgen: sends 32-bit timestamps over network, need to find out if using unsigned stamps is good enough - af_rxpc: similar to pktgen, uses 32-bit times for deadlines - ppp ioctl: patch is being worked on, nontrivial but doable ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:16:49 -07:00
Arnd Bergmann	3ef0a25bf9	net: sctp: avoid incorrect time_t use We want to avoid using time_t in the kernel because of the y2038 overflow problem. The use in sctp is not for storing seconds at all, but instead uses microseconds and is passed as 32-bit on all machines. This patch changes the type to u32, which better fits the use. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: linux-sctp@vger.kernel.org Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:16:48 -07:00
Arnd Bergmann	3dd7669f1f	ipv6: use ktime_t for internal timestamps The ipv6 mip6 implementation is one of only a few users of the skb_get_timestamp() function in the kernel, which is both unsafe on 32-bit architectures because of the 2038 overflow, and slightly less efficient than the skb_get_ktime() based approach. This converts the function call and the mip6_report_rate_limiter structure that stores the time stamp, eliminating all uses of timeval in the ipv6 code. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:16:47 -07:00
Arnd Bergmann	f6389ecbc5	nfnetlink: use y2038 safe timestamp The __build_packet_message function fills a nfulnl_msg_packet_timestamp structure that uses 64-bit seconds and is therefore y2038 safe, but it uses an intermediate 'struct timespec' which is not. This trivially changes the code to use 'struct timespec64' instead, to correct the result on 32-bit architectures. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:16:47 -07:00
Arnd Bergmann	70ba07b675	atm: remove 'struct zatm_t_hist' The zatm_t_hist structure is not used anywhere in the kernel, but is exported to user space. As we are trying to eliminate uses of time_t in the kernel for y2038 compatibility, the current definition triggers checking tools because it contains 'struct timeval'. As pointed out by Chas Williams, the only user of this structure was the ZATM_GETHIST ioctl command that has been removed a long time ago, and we can remove the structure as well without breaking any user space. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Chas Williams <3chas3@gmail.com> Cc: linux-atm-general@lists.sourceforge.net Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:16:46 -07:00
Arnd Bergmann	84b00607ae	mac80211: use ktime_get_seconds The mac80211 code uses ktime_get_ts to measure the connected time. As this uses monotonic time, it is y2038 safe on 32-bit systems, but we still want to deprecate the use of 'timespec' because most other users are broken. This changes the code to use ktime_get_seconds() instead, which avoids the timespec structure and is slightly more efficient. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: linux-wireless@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:16:45 -07:00
Arnd Bergmann	52f4f91893	mwifiex: avoid gettimeofday in ba_threshold setting mwifiex_get_random_ba_threshold() uses a complex homegrown implementation to generate a pseudo-random number from the current time as returned from do_gettimeofday(). This currently requires two 32-bit divisions plus a couple of other computations that are eventually discarded as only eight bits of the microsecond portion are used at all. We could replace this with a call to get_random_bytes(), but that might drain the entropy pool too fast if this is called for each packet. Instead, this patch converts it to use ktime_get_ns(), which is a bit faster than do_gettimeofday(), and then uses a similar algorithm as before, but in a way that takes both the nanosecond and second portion into account for slightly-more-but-still-not-very-random pseudorandom number. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Amitkumar Karwar <akarwar@marvell.com> Cc: Nishant Sarmukadam <nishants@marvell.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: linux-wireless@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:16:44 -07:00
Arnd Bergmann	e253fb74d6	mwifiex: use ktime_get_real for timestamping The mwifiex_11n_aggregate_pkt() function creates a ktime_t from a timeval returned by do_gettimeofday, which is slow and causes an overflow in 2038 on 32-bit architectures. This solves both problems by using the appropriate ktime_get_real() function. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Amitkumar Karwar <akarwar@marvell.com> Cc: Nishant Sarmukadam <nishants@marvell.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: linux-wireless@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:16:43 -07:00
Arnd Bergmann	40c9b0796d	net: igb: avoid using timespec We want to deprecate the use of 'struct timespec' on 32-bit architectures, as it will overflow in 2038. The igb driver uses it to read the current time, and can simply be changed to use ktime_get_real_ts64() instead. Because of hardware limitations, there is still an overflow in year 2106, which we cannot really avoid, but this documents the overflow. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: intel-wired-lan@lists.osuosl.org Reviewed-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:16:42 -07:00
Arnd Bergmann	0a6241551d	net: stmmac: avoid using timespec We want to deprecate the use of 'struct timespec' on 32-bit architectures, as it will overflow in 2038. The stmmac driver uses it to read the current time, and can simply be changed to use ktime_get_real_ts64() instead. Because of hardware limitations, there is still an overflow in year 2106, which we cannot really avoid, but this documents the overflow. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Richard Cochran <richardcochran@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:16:41 -07:00
Arnd Bergmann	be7ccdc36b	net: fec: avoid timespec use The fec_ptp_enable_pps uses an open-coded implementation of ns_to_timespec, which will be removed eventually as it is not y2038-safe on 32-bit architectures. Two more instances of the same code in this file were already converted to use the safe ns_to_timespec64 in commit `6630514fce` ("ptp: fec: use helpers for converting ns to timespec"), this changes the last one as well. The seconds portion here is actually unused and we could just remove the timespec variable, but using ns_to_timespec64 can still be better as the implementation can be hand-optimized in the future. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Fugang Duan <b38611@freescale.com> Cc: Luwei Zhou <b45643@freescale.com> Cc: Frank Li <Frank.Li@freescale.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:16:39 -07:00
David S. Miller	07355737a8	Merge branch 'ipv4-multipath-hash' Peter Nørlund says: ==================== ipv4: Hash-based multipath routing When the routing cache was removed in 3.6, the IPv4 multipath algorithm changed from more or less being destination-based into being quasi-random per-packet scheduling. This increases the risk of out-of-order packets and makes it impossible to use multipath together with anycast services. This patch series replaces the old implementation with flow-based load balancing based on a hash over the source and destination addresses. Distribution of the hash is done with thresholds as described in RFC 2992. This reduces the disruption when a path is added/remove when having more than two paths. To futher the chance of successful usage in conjuction with anycast, ICMP error packets are hashed over the inner IP addresses. This ensures that PMTU will work together with anycast or load-balancers such as IPVS. Port numbers are not considered since fragments could cause problems with anycast and IPVS. Relying on the DF-flag for TCP packets is also insufficient, since ICMP inspection effectively extracts information from the opposite flow which might have a different state of the DF-flag. This is also why the RSS hash is not used. These are typically based on the NDIS RSS spec which mandates TCP support. Measurements of the additional overhead of a two-path multipath (p_mkroute_input excl. 
__mkroute_input) on a Xeon X3550 (4 cores, 2.66GHz): Original per-packet: ~394 cycles/packet L3 hash: ~76 cycles/packet Changes in v5: - Fixed compilation error Changes in v4: - Functions take hash directly instead of func ptr - Added inline hash function - Added dummy macros to minimize ifdefs - Use upper 31 bits of hash instead of lower Changes in v3: - Multipath algorithm is no longer configurable (always L3) - Added random seed to hash - Moved ICMP inspection to isolated function - Ignore source quench packets (deprecated as per RFC 6633) Changes in v2: - Replaced 8-bit xor hash with 31-bit jenkins hash - Don't scale weights (since 31-bit) - Avoided unnecesary renaming of variables - Rely on DF-bit instead of fragment offset when checking for fragmentation - upper_bound is now inclusive to avoid overflow - Use a callback to postpone extracting flow information until necessary - Skipped ICMP inspection entirely with L4 hashing - Handle newly added sysctl ignore_routes_with_linkdown ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:00:26 -07:00
Peter Nørlund	79a131592d	ipv4: ICMP packet inspection for multipath ICMP packets are inspected to let them route together with the flow they belong to, minimizing the chance that a problematic path will affect flows on other paths, and so that anycast environments can work with ECMP. Signed-off-by: Peter Nørlund <pch@ordbogen.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:00:04 -07:00
Peter Nørlund	0e884c78ee	ipv4: L3 hash-based multipath Replaces the per-packet multipath with a hash-based multipath using source and destination address. Signed-off-by: Peter Nørlund <pch@ordbogen.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 02:59:21 -07:00
David S. Miller	2472186f58	Merge branch 'tcp-listener-fixes-and-improvement' Eric Dumazet says: ==================== tcp: lockless listener fixes and improvement This fixes issues with TCP FastOpen vs lockless listeners, and SYNACK being attached to request sockets. Then, last patch brings performance improvement for syncookies generation and validation. Tested under a 4.3 Mpps SYNFLOOD attack, new perf profile looks like : 12.11% [kernel] [k] sha_transform 5.83% [kernel] [k] tcp_conn_request 4.59% [kernel] [k] __inet_lookup_listener 4.11% [kernel] [k] ipt_do_table 3.91% [kernel] [k] tcp_make_synack 3.05% [kernel] [k] fib_table_lookup 2.74% [kernel] [k] sock_wfree 2.66% [kernel] [k] memcpy_erms 2.12% [kernel] [k] tcp_v4_rcv ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 02:46:26 -07:00
Eric Dumazet	a1a5344ddb	tcp: avoid two atomic ops for syncookies inet_reqsk_alloc() is used to allocate a temporary request in order to generate a SYNACK with a cookie. Then later, syncookie validation also uses a temporary request. These paths already took a reference on listener refcount, we can avoid a couple of atomic operations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 02:45:27 -07:00
Eric Dumazet	004a5d0140	net: use sk_fullsock() in __netdev_pick_tx() SYN_RECV & TIMEWAIT sockets are not full blown, they do not have a sk_dst_cache pointer. Fixes: `ca6fb06518` ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 02:45:25 -07:00
Eric Dumazet	e7eadb4de9	ipv6: inet6_sk() should use sk_fullsock() SYN_RECV & TIMEWAIT sockets are not full blown, they do not have a pinet6 pointer. Fixes: `ca6fb06518` ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 02:45:25 -07:00
Eric Dumazet	caf3f2676a	inet: ip_skb_dst_mtu() should use sk_fullsock() SYN_RECV & TIMEWAIT sockets are not full blown, do not even try to call ip_sk_use_pmtu() on them. Fixes: `ca6fb06518` ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 02:45:24 -07:00
Eric Dumazet	7656d842de	tcp: fix fastopen races vs lockless listener There are multiple races that need fixes : 1) skb_get() + queue skb + kfree_skb() is racy An accept() can be done on another cpu, data consumed immediately. tcp_recvmsg() uses __kfree_skb() as it is assumed all skb found in socket receive queue are private. Then the kfree_skb() in tcp_rcv_state_process() uses an already freed skb 2) tcp_reqsk_record_syn() needs to be done before tcp_try_fastopen() for the same reasons. 3) We want to send the SYNACK before queueing child into accept queue, otherwise we might reintroduce the ooo issue fixed in commit `7c85af8810` ("tcp: avoid reorders for TFO passive connections") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 02:45:24 -07:00
David S. Miller	3e087caa23	Merge branch 'bridge-netlink' Nikolay Aleksandrov says: ==================== bridge: complete netlink support This set completes the bridge device's netlink support and makes it possible to view and configure everything that can be configured via sysfs. I have tested all of these (setting and getting). There're a few longer line warnings about the br_get_size() ifla comments but I think we should have them to know what has been accounted for. I have used the sysfs interface as a guide of what and how to set. As usual I'll send the corresponding iproute2 patches later. The bridge port's netlink interface will be completed after this set gets applied in some form. This patch-set is on top of my last vlan cleanups set: http://www.spinics.net/lists/netdev/msg346005.html ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:46:14 -07:00
Nikolay Aleksandrov	0f963b7592	bridge: netlink: add support for default_pvid Add IFLA_BR_VLAN_DEFAULT_PVID to allow setting/getting bridge's default_pvid via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:46:07 -07:00
Nikolay Aleksandrov	93870cc02a	bridge: netlink: add support for netfilter tables config Add support to allow getting/setting netfilter tables settings. Currently these are IFLA_BR_NF_CALL_IPTABLES, IFLA_BR_NF_CALL_IP6TABLES and IFLA_BR_NF_CALL_ARPTABLES. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:46:07 -07:00
Nikolay Aleksandrov	7e4df51eb3	bridge: netlink: add support for igmp's intervals Add support to set/get all of the igmp's configurable intervals via netlink. These currently are: IFLA_BR_MCAST_LAST_MEMBER_INTVL IFLA_BR_MCAST_MEMBERSHIP_INTVL IFLA_BR_MCAST_QUERIER_INTVL IFLA_BR_MCAST_QUERY_INTVL IFLA_BR_MCAST_QUERY_RESPONSE_INTVL IFLA_BR_MCAST_STARTUP_QUERY_INTVL Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:46:06 -07:00
Nikolay Aleksandrov	b89e6babad	bridge: netlink: add support for multicast_startup_query_count Add IFLA_BR_MCAST_STARTUP_QUERY_CNT to allow setting/getting br->multicast_startup_query_count via netlink. Also align the ifla comments. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:46:06 -07:00

548055 Commits

All Branches