diff --git a/docs/admin/command-engine.rst b/docs/admin/command-engine.rst
new file mode 100644
index 00000000..c1b43948
--- /dev/null
+++ b/docs/admin/command-engine.rst
@@ -0,0 +1,129 @@
+=====================================
+Run shell commands from your instance
+=====================================
+
+Command line engines are custom engines that run commands in the shell of the
+host. In this article you can learn how to create a command engine and how to
+customize the result display.
+
+The command
+===========
+
+When specifying commands, you must make sure the commands are available on the
+searx host. Searx will not install anything for you. Also, make sure that the
+``searx`` user on your host is allowed to run the selected command and has
+access to the required files.
+
+Access control
+==============
+
+Be careful when creating command engines if you are running a public
+instance. Do not expose any sensitive information. You can restrict access by
+configuring a list of access tokens under ``tokens`` in your ``settings.yml``.
+
+Available settings
+==================
+
+* ``command``: A comma separated list of the elements of the command. A special
+  token ``{{QUERY}}`` tells searx where to put the search terms of the
+  user. Example: ``['ls', '-l', '-h', '{{QUERY}}']``
+* ``query_type``: The expected type of user search terms. Possible values:
+  ``path`` and ``enum``. ``path`` checks if the user-provided path is inside the
+  working directory. If not, the query is not executed. ``enum`` is a list of
+  allowed search terms. If the user submits something which is not included in
+  the list, the query returns an error.
+* ``delimiter``: A dict containing a delimiter char and the "titles" of each
+  element in keys.
+* ``parse_regex``: A dict containing the regular expressions for each result
+  key.
+* ``query_enum``: A list containing allowed search terms if ``query_type`` is
+  set to ``enum``.
+* ``working_dir``: The directory where the command has to be executed. Default:
+  ``.``
+* ``result_separator``: The character that separates results. Default: ``\n``
+
+Customize the result template
+=============================
+
+There is a default result template for displaying key-value pairs coming from
+command engines. If you want something more tailored to your result types, you
+can design your own template.
+
+Searx relies on `Jinja2 <https://jinja.palletsprojects.com/>`_ for
+templating. If you are familiar with Jinja, you will not have any issues
+creating templates. You can access the result attributes with ``{{
+result.attribute_name }}``.
+
+In the example below the result has two attributes: ``header`` and ``content``.
+To customize their display, you need the following template (you must define
+these classes yourself):
+
+.. code:: html
+
+   <div class="result">
+       <div class="result-header">
+           {{ result.header }}
+       </div>
+       <div class="result-content">
+           {{ result.content }}
+       </div>
+   </div>
+
+Then put your template under ``searx/templates/{theme-name}/result_templates``
+named ``your-template-name.html``. You can select your custom template with the
+option ``result_template``.
+
+.. code:: yaml
+
+  - name: your engine name
+    engine: command
+    result_template: your-template-name.html
+
+Examples
+========
+
+Find files by name
+------------------
+
+The first example is to find files on your searx host. It uses the command
+``find`` available on most Linux distributions. It expects a path type query. The
+path in the search request must be inside the ``working_dir``.
+
+The results are displayed with the default ``key-value.html`` template. A result
+is displayed in a single row table with the key "line".
+
+.. code:: yaml
+
+  - name : find
+    engine : command
+    command : ['find', '.', '-name', '{{QUERY}}']
+    query_type : path
+    shortcut : fnd
+    tokens : []
+    disabled : True
+    delimiter :
+        chars : ' '
+        keys : ['line']
+
+
+Find files by contents
+-----------------------
+
+In the second example, we define an engine that searches in the contents of the
+files under the ``working_dir``. The search type is not defined, so the user can
+input any string they want. To restrict the input, you can set the ``query_type``
+to ``enum`` and only allow a set of search terms to protect
+yourself. Alternatively, make the engine private, so no one malevolent accesses
+the engine.
+
+.. code:: yaml
+
+  - name : regex search in files
+    engine : command
+    command : ['grep', '{{QUERY}}']
+    shortcut : gr
+    tokens : []
+    disabled : True
+    delimiter :
+        chars : ' '
+        keys : ['line']
diff --git a/docs/admin/engines.rst b/docs/admin/engines.rst
index ff25a7ed..ed6f27bd 100644
--- a/docs/admin/engines.rst
+++ b/docs/admin/engines.rst
@@ -86,3 +86,60 @@ Show errors **DE**
 
      {% endfor %}
 
+   .. flat-table:: Additional engines (commented out in settings.yml)
+      :header-rows: 1
+      :stub-columns: 2
+
+      * - Name
+        - Base URL
+        - Host
+        - Port
+        - Paging
+
+      * - elasticsearch
+        - localhost:9200
+        -
+        -
+        - False
+
+      * - meilisearch
+        - localhost:7700
+        -
+        -
+        - True
+
+      * - mongodb
+        -
+        - 127.0.0.1
+        - 27017
+        - True
+
+      * - mysql_server
+        -
+        - 127.0.0.1
+        - 3306
+        - True
+
+      * - postgresql
+        -
+        - 127.0.0.1
+        - 5432
+        - True
+
+      * - redis_server
+        -
+        - 127.0.0.1
+        - 6379
+        - False
+
+      * - solr
+        - localhost:8983
+        -
+        -
+        - True
+
+      * - sqlite
+        -
+        -
+        -
+        - True
diff --git a/docs/admin/index.rst b/docs/admin/index.rst
index c708c4ff..660ed5fe 100644
--- a/docs/admin/index.rst
+++ b/docs/admin/index.rst
@@ -19,5 +19,9 @@ Administrator documentation
    filtron
    morty
    engines
+   private-engines
+   command-engine
+   indexer-engines
+   no-sql-engines
    plugins
    buildhosts
diff --git a/docs/admin/indexer-engines.rst b/docs/admin/indexer-engines.rst
new file mode 100644
index 00000000..fcb814d4
--- /dev/null
+++ b/docs/admin/indexer-engines.rst
@@ -0,0 +1,89 @@
+==================
+Search in indexers
+==================
+
+Searx supports three popular indexer search engines:
+
+* Elasticsearch
+* Meilisearch
+* Solr
+
+Elasticsearch
+=============
+
+Make sure that the Elasticsearch user has access to the index you are querying.
+If you are not using TLS during your connection, set ``enable_http`` to ``True``.
+
+.. code:: yaml
+
+  - name : elasticsearch
+    shortcut : es
+    engine : elasticsearch
+    base_url : http://localhost:9200
+    username : elastic
+    password : changeme
+    index : my-index
+    query_type : match
+    enable_http : True
+
+Available settings
+------------------
+
+* ``base_url``: URL of the Elasticsearch instance. By default it is set to ``http://localhost:9200``.
+* ``index``: Name of the index to query. Required.
+* ``query_type``: Elasticsearch query method to use. Available: ``match``,
+  ``simple_query_string``, ``term``, ``terms``, ``custom``.
+* ``custom_query_json``: If you selected ``custom`` for ``query_type``, you must
+  provide the JSON payload in this option.
+* ``username``: Username in Elasticsearch
+* ``password``: Password for the Elasticsearch user
+
+Meilisearch
+===========
+
+If you are not using TLS during connection, set ``enable_http`` to ``True``.
+
+.. code:: yaml
+
+  - name : meilisearch
+    engine : meilisearch
+    shortcut: mes
+    base_url : http://localhost:7700
+    index : my-index
+    enable_http: True
+
+Available settings
+------------------
+
+* ``base_url``: URL of the Meilisearch instance. By default it is set to http://localhost:7700
+* ``index``: Name of the index to query. Required.
+* ``auth_key``: Key required for authentication.
+* ``facet_filters``: List of facets to search in.
+
+Solr
+====
+
+If you are not using TLS during connection, set ``enable_http`` to ``True``.
+
+.. code:: yaml
+
+  - name : solr
+    engine : solr
+    shortcut : slr
+    base_url : http://localhost:8983
+    collection : my-collection
+    sort : asc
+    enable_http : True
+
+Available settings
+------------------
+
+* ``base_url``: URL of the Solr instance. By default it is set to http://localhost:8983
+* ``collection``: Name of the collection to query. Required.
+* ``sort``: Sorting of the results. Available: ``asc``, ``desc``.
+* ``rows``: Maximum number of results from a query. Default value: 10.
+* ``field_list``: List of fields returned from the query.
+* ``default_fields``: Default fields to query.
+* ``query_fields``: List of fields with a boost factor. The bigger the boost
+  factor of a field, the more important the field is in the query. Example:
+  ``qf="field1^2.3 field2"``
diff --git a/docs/admin/no-sql-engines.rst b/docs/admin/no-sql-engines.rst
new file mode 100644
index 00000000..5da19df5
--- /dev/null
+++ b/docs/admin/no-sql-engines.rst
@@ -0,0 +1,170 @@
+===========================
+Query SQL and NoSQL servers
+===========================
+
+SQL
+===
+
+SQL servers are traditional databases with a predefined data schema. Furthermore,
+modern versions also support BLOB data.
+
+You can search in the following servers:
+
+* `PostgreSQL`_
+* `MySQL`_
+* `SQLite`_
+
+The configurations of the new database engines are similar. You must put a valid
+SELECT SQL query in ``query_str``. At the moment you can bind at most
+one parameter in your query.
+
+Do not include LIMIT or OFFSET in your SQL query as the engines
+rely on these keywords during paging.
+
+PostgreSQL
+----------
+
+Required PyPI package: ``psycopg2``
+
+You can find an example configuration below:
+
+.. code:: yaml
+
+  - name : postgresql
+    engine : postgresql
+    database : my_database
+    username : searx
+    password : password
+    query_str : 'SELECT * from my_table WHERE my_column = %(query)s'
+    shortcut : psql
+
+
+Available options
+~~~~~~~~~~~~~~~~~
+* ``host``: IP address of the host running PostgreSQL. By default it is ``127.0.0.1``.
+* ``port``: Port number PostgreSQL is listening on. By default it is ``5432``.
+* ``database``: Name of the database you are connecting to.
+* ``username``: Name of the user connecting to the database.
+* ``password``: Password of the database user.
+* ``query_str``: Query string to run. Keywords like ``LIMIT`` and ``OFFSET`` are not allowed. Required.
+* ``limit``: Number of returned results per page. By default it is 10.
+
+MySQL
+-----
+
+Required PyPI package: ``mysql-connector-python``
+
+This is an example configuration for querying a MySQL server:
+
+.. code:: yaml
+
+  - name : mysql
+    engine : mysql_server
+    database : my_database
+    username : searx
+    password : password
+    limit : 5
+    query_str : 'SELECT * from my_table WHERE my_column=%(query)s'
+    shortcut : mysql
+
+
+Available options
+~~~~~~~~~~~~~~~~~
+* ``host``: IP address of the host running MySQL. By default it is ``127.0.0.1``.
+* ``port``: Port number MySQL is listening on. By default it is ``3306``.
+* ``database``: Name of the database you are connecting to.
+* ``auth_plugin``: Authentication plugin to use. By default it is ``caching_sha2_password``.
+* ``username``: Name of the user connecting to the database.
+* ``password``: Password of the database user.
+* ``query_str``: Query string to run. Keywords like ``LIMIT`` and ``OFFSET`` are not allowed. Required.
+* ``limit``: Number of returned results per page. By default it is 10.
+
+SQLite
+------
+
+You can read from your database ``my_database`` using this example configuration:
+
+.. code:: yaml
+
+  - name : sqlite
+    engine : sqlite
+    shortcut: sq
+    database : my_database
+    query_str : 'SELECT * FROM my_table WHERE my_column=:query'
+
+
+Available options
+~~~~~~~~~~~~~~~~~
+* ``database``: Name of the database you are connecting to.
+* ``query_str``: Query string to run. Keywords like ``LIMIT`` and ``OFFSET`` are not allowed. Required.
+* ``limit``: Number of returned results per page. By default it is 10.
+
+NoSQL
+=====
+
+NoSQL data stores are used for storing arbitrary data without first defining their
+structure. To query the supported servers, you must install their drivers from PyPI.
+
+You can search in the following servers:
+
+* `Redis`_
+* `MongoDB`_
+
+Redis
+-----
+
+Required PyPI package: ``redis``
+
+Example configuration:
+
+.. code:: yaml
+
+  - name : mystore
+    engine : redis_server
+    exact_match_only : True
+    host : 127.0.0.1
+    port : 6379
+    password : secret-password
+    db : 0
+    shortcut : rds
+    enable_http : True
+
+Available options
+~~~~~~~~~~~~~~~~~
+
+* ``host``: IP address of the host running Redis. By default it is ``127.0.0.1``.
+* ``port``: Port number Redis is listening on. By default it is ``6379``.
+* ``password``: Password if required by Redis.
+* ``db``: Number of the database you are connecting to.
+* ``exact_match_only``: Enable if you need exact matching. By default it is ``True``.
+
+
+MongoDB
+-------
+
+Required PyPI package: ``pymongo``
+
+Below is an example configuration for using a MongoDB collection:
+
+.. code:: yaml
+
+  - name : mymongo
+    engine : mongodb
+    shortcut : icm
+    host : '127.0.0.1'
+    port : 27017
+    database : personal
+    collection : income
+    key : month
+    enable_http: True
+
+
+Available options
+~~~~~~~~~~~~~~~~~
+
+* ``host``: IP address of the host running MongoDB. By default it is ``127.0.0.1``.
+* ``port``: Port number MongoDB is listening on. By default it is ``27017``.
+* ``password``: Password if required by MongoDB.
+* ``database``: Name of the database you are connecting to.
+* ``collection``: Name of the collection you want to search in.
+* ``exact_match_only``: Enable if you need exact matching. By default it is ``True``.
diff --git a/docs/admin/prefernces-private.png b/docs/admin/prefernces-private.png
new file mode 100644
index 00000000..6ff69ca8
Binary files /dev/null and b/docs/admin/prefernces-private.png differ
diff --git a/docs/admin/private-engines.rst b/docs/admin/private-engines.rst
new file mode 100644
index 00000000..36636143
--- /dev/null
+++ b/docs/admin/private-engines.rst
@@ -0,0 +1,44 @@
+=============================
+How to create private engines
+=============================
+
+If you are running a public searx instance, you might want to restrict access
+to some engines. Maybe you are afraid that bots might abuse the engine. Or the
+engine might return private results you do not want to share with strangers.
+
+Server side configuration
+=========================
+
+You can make any engine private by setting a list of tokens in your ``settings.yml``
+file. In the following example, we set two different tokens that provide access
+to the engine.
+
+.. code:: yaml
+
+  - name: my-private-google
+    engine: google
+    shortcut: pgo
+    tokens: ['my-secret-token-1', 'my-secret-token-2']
+
+
+To access the private engine, you must distribute the tokens to your searx
+users. It is up to you how you let them know the access tokens you
+created.
+
+Client side configuration
+=========================
+
+As a searx instance user, you can add any number of access tokens on the
+Preferences page. You have to enter a comma separated list of strings in the "Engine
+tokens" input, then save your new preferences.
+
+.. image:: prefernces-private.png
+    :width: 600px
+    :align: center
+    :alt: location of token textarea
+
+Once the Preferences page is loaded again, you can see the information of the
+private engines you got access to. If you cannot see the expected engines in the
+engines list, double check your token. If there is no issue with the token,
+contact your instance administrator.
+
diff --git a/searx/engines/bing.py b/searx/engines/bing.py
index 61abd466..7d9d8549 100644
--- a/searx/engines/bing.py
+++ b/searx/engines/bing.py
@@ -68,11 +68,13 @@ def response(resp):
     for result in eval_xpath(dom, '//div[@class="sa_cc"]'):
         link = eval_xpath(result, './/h3/a')[0]
         url = link.attrib.get('href')
+        pretty_url = extract_text(eval_xpath(result, './/cite'))
         title = extract_text(link)
         content = extract_text(eval_xpath(result, './/p'))
 
         # append result
         results.append({'url': url,
+                        'pretty_url': pretty_url,
                         'title': title,
                         'content': content})
 
diff --git a/searx/webapp.py b/searx/webapp.py
index 9626ab87..2027e72d 100755
--- a/searx/webapp.py
+++ b/searx/webapp.py
@@ -647,7 +647,7 @@ def search():
             # removing html content and whitespace duplications
             result['title'] = ' '.join(html_to_text(result['title']).strip().split())
 
-        if 'url' in result:
+        if 'url' in result and 'pretty_url' not in result:
             result['pretty_url'] = prettify_url(result['url'])
 
         # TODO, check if timezone is calculated right
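
The bing.py and webapp.py hunks work together: the Bing engine can now supply its own `pretty_url` (read from the `cite` element), and the web app only derives one from `url` when the engine did not. The fallback can be sketched roughly as follows. This is an illustration under stated assumptions, not searx's code: `prettify_url_stub`, its truncation rule, `add_pretty_url`, and the sample results are all hypothetical stand-ins.

```python
def prettify_url_stub(url, max_length=74):
    """Hypothetical stand-in for searx's prettify_url: shorten long URLs."""
    if len(url) > max_length:
        chunk = max_length // 2
        return url[:chunk] + '[...]' + url[-chunk:]
    return url


def add_pretty_url(result):
    """Keep an engine-provided pretty_url; otherwise derive one from url."""
    # Same condition as the patched webapp.py line.
    if 'url' in result and 'pretty_url' not in result:
        result['pretty_url'] = prettify_url_stub(result['url'])
    return result


# The engine supplied its own display URL, as the patched Bing engine does.
from_engine = add_pretty_url({'url': 'https://example.org/a?id=1',
                              'pretty_url': 'example.org/a'})

# The engine supplied none, so the display URL is derived from url.
derived = add_pretty_url({'url': 'https://example.org/b'})
```

The condition tests for the presence of the key, not for a non-empty value, so an engine that sets `pretty_url` to an empty string would also suppress the fallback.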