Its better to place this inside of kore_worker_privsep(), this
way it'll be called for each process still and we can do it
before we sandbox the processes completely.
When trying to pin a worker to a certain CPU, Kore will log
if it fails but still continue.
The problem is that it tried to do it a bit early and the logging
facilities were not yet setup, causing it to be unable to continue
if kore_log() was called too early.
By moving it to kore_worker_started() we are certain all facilities
are up and running correctly.
The worker_count is incremented by 2 earlier to account for keymgr/acme
but aren't actually workers that should count towards CPU pinning.
So adjust the count when comparing to cpu_count when logging that there
are more workers than cpus.
The parent process never differentiated between a worker process
asking for a shutdown or a worker process calling fatalx() when
it came to its exit code.
I made some changes here so the parent process will exit with
an exit code 1 if anything worker related went wrong (fatalx/death policy).
This work moves all TLS / crypto related code into a tls_openssl.c
file and adds a tls_none.c which contains just stubs.
Allows compilation of Kore with TLS_BACKEND=none to remove building
against OpenSSL.
Also adds code for SHA1/SHA2 taken from openssh-portable so we don't
depend on those being present anymore in libcrypto.
It wasn't possible for the parent process to send messages
directly via kore_msg_send() to other worker processes.
This is now rectified to from the parent process one can call
kore_msg_send() with a worker destination and it'll work.
Wait for any process in our process group only instead of WAIT_ANY.
This allows the parent process to start subprocesses that end up
in different process groups which are handled in user code instead
completely (using signalfd for example).
This commit adds improved hooks for Python and a new signal delivery hook.
For the Python API kore_worker_configure() and kore_worker_teardown() had
to be implemented before this commit. Now one can create a workerstart
and workerend method in their koreapp as those will be called when
they exist.
The new signal hook is either kore_worker_signal() or koreapp.signal.
This new hook is called after the worker event code handles the received
signal itself first.
With this commit there is also a new kore_signal_trap() API call allowing
you to more easily trap new signals. This API also also exported to the
Python part of the code under kore.sigtrap()
- Make sure we drain the worker log channel if it dies
so we can flush out any lingering log messages.
- Get rid of the raise() in the parent to signal ourselves
we should terminate. Instead depend on the new kore_quit.
- Always attempt to reap children one way or the other.
Routes are now configured in a context per route:
route /path {
handler handler_name
methods get post head
validate qs:get id v_id
}
All route related configurations are per-route, allowing multiple
routes for the same path (for different methods).
The param context is removed and merged into the route context now
so that you use the validate keyword to specify what needs validating.
With the new process startup code we must handle the SIGSTOP
from the processes if seccomp_tracing is enabled. Otherwise
they just hang indefinitely and we assume they failed to start,
which is somewhat true.
Starting with the privsep config, this commit changes the following:
- Removes the root, runas, keymgr_root, keymgr_runas, acme_root and
acme_runas configuration options.
Instead these are now configured via a privsep configuration context:
privsep worker {
root /tmp
runas nobody
}
This is also configurable via Python using the new kore.privsep() method:
kore.privsep("worker", root="/tmp", runas="nobody", skip=["chroot"])
Tied into this we also better handle worker startup:
- Per worker process, wait until it signalled it is ready.
- If a worker fails at startup, display its last log lines more clearly.
- Don't start acme process if no domain requires acme.
- Remove each process its individual startup log message in favour
of a generalized one that displays its PID, root and user.
- At startup, log the kore version and built-ins in a nicer way.
- The worker processes now check things they need to start running
before signaling they are ready (such as access to CA certs for
TLS client authentication).
Before each worker process would either directly print to stdout if
Kore was running in foreground mode, or syslog otherwise.
With this commit the workers will submit their log messages to the
parent process who will either put it onto stdout or syslog.
This change in completely under the hood and users shouldn't care about it.
If a coroutine is woken up from another coroutine running from an
http request we can end up in a case where the call path looks like:
0 kore_worker_entry
1 epoll wait <- bound to pending timers
2 http_process <- first coro sleep
3 kore_python_coro_run <- wakes up request
4 http_process <- wakes up another coroutine
5 return to kore_worker_entry
In the case where 4 wakes up another coroutine but 1 is bound to a timer
and no io activity occurs the coroutine isn't run until the timer fires.
Fix this issue by always checking for pending coroutines even if the
netwait isn't INFINITE.
Otherwise in certain scenarios it could mean that workers
unsuccessfully grabbed the lock, reset accept_avail and
no longer attempt to grab the lock afterwards.
This can cause a complete stall in workers processing requests.
- Remove the edge trigger io hacks we had in place.
- Use level triggered io for the libcurl fds instead.
- Batch all curl events together and process them at the end
of our worker event loop.
If waitpid() returns -1 check if errno is ECHILD, just mark the worker
process as exited.
This could happen if Kore starts without keymgr/acme but those would still
be accounted for.
A new acme process is created that communicates with the acme servers.
This process does not hold any of your private keys (no account keys,
no domain keys etc).
Whenever the acme process requires a signed payload it will ask the keymgr
process to do the signing with the relevant keys.
This process is also sandboxed with pledge+unveil on OpenBSD and seccomp
syscall filtering on Linux.
The implementation only supports the tls-alpn-01 challenge. This means that
you do not need to open additional ports on your machine.
http-01 and dns-01 are currently not supported (no wildcard support).
A new configuration option "acme_provider" is available and can be set
to the acme server its directory. By default this will point to the
live letsencrypt environment:
https://acme-v02.api.letsencrypt.org/directory
The acme process can be controlled via the following config options:
- acme_root (where the acme process will chroot/chdir into).
- acme_runas (the user the acme process will run as).
If none are set, the values from 'root' and 'runas' are taken.
If you want to turn on acme for domains you do it as follows:
domain kore.io {
acme yes
}
You do not need to specify certkey/certfile anymore, if they are present
still
they will be overwritten by the acme system.
The keymgr will store all certificates and keys under its root
(keymgr_root), the account key is stored as "/account-key.pem" and all
obtained certificates go under "certificates/<domain>/fullchain.pem" while
keys go under "certificates/<domain>/key.pem".
Kore will automatically renew certificates if they will expire in 7 days
or less.
If set to "yes" then Kore will trace its child processes and properly
notify you of seccomp violations while still allowing the syscalls.
This can be very useful when running Kore on new platforms that have
not been properly tested with seccomp, allowing me to adjust the default
policies as we move further.
In cases where a request is immediately completed in libcurl its multi
handle and no additional i/o is happening a coro can get stuck waiting
to be run.
Prevent this by lowering netwait from KORE_WAIT_INFINITE if there
are pending python coroutines.
Before kore needed to be built with NOTLS=1 to be able to do non TLS
connections. This has been like this for years.
It is time to allow non TLS listeners without having to rebuild Kore.
This commit changes your configuration format and will break existing
applications their config.
Configurations now get listener {} contexts:
listen default {
bind 127.0.0.1 8888
}
The above will create a listener on 127.0.0.1, port 8888 that will serve
TLS (still the default).
If you want to turn off TLS on that listener, specify "tls no" in that
context.
Domains now need to be attached to a listener:
Eg:
domain * {
attach default
}
For the Python API this kills kore.bind(), and kore.bind_unix(). They are
replaced with:
kore.listen("name", ip=None, port=None, path=None, tls=True).
With this commit all Kore processes (minus the parent) are running
under seccomp.
The worker processes get the bare minimum allowed syscalls while each module
like curl, pgsql, etc will add their own filters to allow what they require.
New API functions:
int kore_seccomp_filter(const char *name, void *filter, size_t len);
Adds a filter into the seccomp system (must be called before
seccomp is enabled).
New helpful macro:
define KORE_SYSCALL_ALLOW(name)
Allow the syscall with a given name, should be used in
a sock_filter data structure.
New hooks:
void kore_seccomp_hook(void);
Called before seccomp is enabled, allows developers to add their
own BPF filters into seccomp.
- decouple pgsql from the HTTP request allowing it to be used in other
contexts as well (such as a task, etc).
- change names to dbsetup() and dbquery().
eg:
result = kore.dbquery("db", "select foo from bar")
In case libcurl instructs us to call the timeout function as soon
as possible (timeout == 0 in curl_timeout), don't try to be clever
with a timeout value of 10ms.
Instead call the timeout function once we get back in the worker
event loop. This makes things a lot snappier as we don't depend
on epoll/kqueue waiting for io for 10ms (which actually isn't 10ms...).