searx/blog/intro-offline.html

173 lines
9.7 KiB
HTML

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Preparation for offline engines &#8212; Searx Documentation (Searx-1.1.0.tex)</title>
<link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="../_static/searx.css" />
<link rel="stylesheet" type="text/css" href="../_static/tabs.css" />
<script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
<script src="../_static/jquery.js"></script>
<script src="../_static/underscore.js"></script>
<script src="../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../_static/doctools.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Searx admin interface" href="admin.html" />
<link rel="prev" title="Limit access to your searx engines" href="private-engines.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="../genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="../py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="admin.html" title="Searx admin interface"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="private-engines.html" title="Limit access to your searx engines"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="../index.html">Searx Documentation (Searx-1.1.0.tex)</a> &#187;</li>
<li class="nav-item nav-item-1"><a href="index.html" accesskey="U">Blog</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Preparation for offline engines</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<section id="preparation-for-offline-engines">
<h1>Preparation for offline engines<a class="headerlink" href="#preparation-for-offline-engines" title="Permalink to this heading"></a></h1>
<section id="offline-engines">
<h2>Offline engines<a class="headerlink" href="#offline-engines" title="Permalink to this heading"></a></h2>
<p>To extend the functionality of searx, offline engines are going to be
introduced. An offline engine is an engine which does not need Internet
connection to perform a search and does not use HTTP to communicate.</p>
<p>Offline engines can be configured as online engines, by adding those to the
<cite>engines</cite> list of <a class="reference external" href="https://github.com/searx/searx/blob/master/searx/settings.yml">settings.yml</a>. Thus, searx
finds the engine file and imports it.</p>
<p>Example skeleton for the new engines:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">subprocess</span> <span class="kn">import</span> <span class="n">PIPE</span><span class="p">,</span> <span class="n">Popen</span>
<span class="n">categories</span> <span class="o">=</span> <span class="p">[</span><span class="s1">&#39;general&#39;</span><span class="p">]</span>
<span class="n">offline</span> <span class="o">=</span> <span class="kc">True</span>
<span class="k">def</span> <span class="nf">init</span><span class="p">(</span><span class="n">settings</span><span class="p">):</span>
<span class="k">pass</span>
<span class="k">def</span> <span class="nf">search</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">params</span><span class="p">):</span>
<span class="n">process</span> <span class="o">=</span> <span class="n">Popen</span><span class="p">([</span><span class="s1">&#39;ls&#39;</span><span class="p">,</span> <span class="n">query</span><span class="p">],</span> <span class="n">stdout</span><span class="o">=</span><span class="n">PIPE</span><span class="p">)</span>
<span class="n">return_code</span> <span class="o">=</span> <span class="n">process</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
<span class="k">if</span> <span class="n">return_code</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">RuntimeError</span><span class="p">(</span><span class="s1">&#39;non-zero return code&#39;</span><span class="p">,</span> <span class="n">return_code</span><span class="p">)</span>
<span class="n">results</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">line</span> <span class="o">=</span> <span class="n">process</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">readline</span><span class="p">()</span>
<span class="k">while</span> <span class="n">line</span><span class="p">:</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">parse_line</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
<span class="n">results</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">results</span><span class="p">)</span>
<span class="n">line</span> <span class="o">=</span> <span class="n">process</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">readline</span><span class="p">()</span>
<span class="k">return</span> <span class="n">results</span>
</pre></div>
</div>
</section>
<section id="development-progress">
<h2>Development progress<a class="headerlink" href="#development-progress" title="Permalink to this heading"></a></h2>
<p>First, a proposal has been created as a Github issue. Then it was moved to the
wiki as a design document. You can read it here: <a class="reference external" href="https://github.com/searx/searx/wiki/Offline-engines">Offline-engines</a>.</p>
<p>In this development step, searx core was prepared to accept and perform offline
searches. Offline search requests are scheduled together with regular offline
requests.</p>
<p>As offline searches can return arbitrary results depending on the engine, the
current result templates were insufficient to present such results. Thus, a new
template is introduced which is caplable of presenting arbitrary key value pairs
as a table. You can check out the pull request for more details see
<a class="reference external" href="https://github.com/searx/searx/pull/1700">PR 1700</a>.</p>
</section>
<section id="next-steps">
<h2>Next steps<a class="headerlink" href="#next-steps" title="Permalink to this heading"></a></h2>
<p>Today, it is possible to create/run an offline engine. However, it is going to be publicly available for everyone who knows the searx instance. So the next step is to introduce token based access for engines. This way administrators are able to limit the access to private engines.</p>
</section>
<section id="acknowledgement">
<h2>Acknowledgement<a class="headerlink" href="#acknowledgement" title="Permalink to this heading"></a></h2>
<p>This development was sponsored by <a class="reference external" href="https://nlnet.nl/discovery">Search and Discovery Fund</a> of <a class="reference external" href="https://nlnet.nl/">NLnet Foundation</a> .</p>
<div class="line-block">
<div class="line">Happy hacking.</div>
<div class="line">kvch // 2019.10.21 17:03</div>
</div>
</section>
</section>
<div class="clearer"></div>
</div>
</div>
</div>
<span id="sidebar-top"></span>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<p class="logo"><a href="../index.html">
<img class="logo" src="../_static/searx_logo_small.png" alt="Logo"/>
</a></p>
<h3>Project Links</h3>
<ul>
<li><a href="https://searx.github.io/searx/blog/index.html">Blog</a>
<li><a href="https://github.com/searx/searx">Source</a>
<li><a href="https://github.com/searx/searx/wiki">Wiki</a>
<li><a href="https://twitter.com/Searx_engine">Twitter</a>
<li><a href="https://github.com/searx/searx/issues">Issue Tracker</a>
</ul><h3>Navigation</h3>
<ul>
<li><a href="../index.html">Overview</a>
<ul>
<li><a href="index.html">Blog</a>
<ul>
<li>Previous: <a href="private-engines.html" title="previous chapter">Limit access to your searx engines</a>
<li>Next: <a href="admin.html" title="next chapter">Searx admin interface</a></ul>
</li>
</ul>
</li>
</ul>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="../search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2015-2022, Adam Tauber, Noémi Ványi.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.1.1.
</div>
<script src="../_static/version_warning_offset.js"></script>
</body>
</html>