auto merge of #13431 : lifthrasiir/rust/rustdoc-smaller-index, r=alexcrichton

This is a series of inter-related commits which depend on #13402 (Prune the paths that do not appear in the index). Please consider this an early review request; I'll rebase it once the parent PR gets merged, if a rebase is required.

----

This PR aims to reduce the size of the search index without removing any actual information. In my measurements with both the library and compiler docs, the search index is 52% smaller before gzip and 16% smaller after gzip:

```
 1719473 search-index-old.js
 1503299 search-index.js (after #13402, 13% gain)
  724955 search-index-new.js (after this PR, 52% gain w.r.t. #13402)

  262711 search-index-old.js.gz
  214205 search-index.js.gz (after #13402, 18.5% gain)
  179396 search-index-new.js.gz (after this PR, 16% gain w.r.t. #13402)
```
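The percentages quoted above can be re-derived from the byte counts. The following is only a small sketch in present-day Rust (not the 2014 dialect used in this PR); `percent_gain` is a hypothetical helper name, not part of rustdoc.

```rust
/// Percentage by which `new_size` is smaller than `old_size`.
/// Hypothetical helper for checking the figures in the table above.
fn percent_gain(old_size: u64, new_size: u64) -> f64 {
    (1.0 - new_size as f64 / old_size as f64) * 100.0
}
```

For instance, `percent_gain(1503299, 724955)` is the uncompressed gain of this PR relative to #13402, and `percent_gain(214205, 179396)` is the corresponding gzipped gain.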

Both the uncompressed and compressed sizes of the search index have been taken into account. While the former will be less relevant once #12597 (Web site should be transferring data compressed) is resolved, the uncompressed index will be around for a while anyway and directly affects the UX of the docs. Moreover, LZ77 (and thus gzip) can only remove *some* repeated strings (since its search window is limited in size), so optimizing for the uncompressed size often has a positive effect on the compressed size as well.

Each commit represents one of the following incremental improvements, in order:

1. Parent paths were referred to by their AST `NodeId`s, which tend to be large. We don't need the actual node IDs, so we remap them to smaller sequential numbers. This also means that the list of paths can be a flat array instead of an object.
2. We remap each item type to small predefined numbers. This is strictly intended to reduce the uncompressed size of the search index.
3. We use arrays instead of objects and reconstruct the original objects in the JavaScript code. Since this removes a lot of boilerplate, it affects both the uncompressed and compressed size.
4. (I've found that a centralized `searchIndex` is easier to handle in JS, so I eliminated one global variable, `allPaths`.)
5. Finally, repeated paths in consecutive items are omitted (replaced by an empty string). This also greatly affects both the uncompressed and compressed size.
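
The remapping in step 1 can be sketched roughly as follows. This is written in present-day Rust rather than the dialect of this PR, and `remap_parents` and its signature are hypothetical names chosen for illustration; the actual change builds `nodeid_to_pathid` and `pathid_to_nodeid` inline while walking the index.

```rust
use std::collections::HashMap;

/// Remaps large, sparse parent ids to small sequential indices,
/// assigned in first-seen order. Returns the remapped parent of each
/// item plus a flat table mapping each new index back to its old id.
/// (Illustrative sketch only; names are assumptions, not rustdoc API.)
fn remap_parents(parents: &[Option<u32>]) -> (Vec<Option<usize>>, Vec<u32>) {
    let mut id_to_index: HashMap<u32, usize> = HashMap::new();
    let mut index_to_id: Vec<u32> = Vec::new();
    let mut remapped = Vec::with_capacity(parents.len());

    for &parent in parents {
        remapped.push(parent.map(|id| {
            // Reuse the index if this id was seen before; otherwise
            // append it to the flat table and use its position.
            *id_to_index.entry(id).or_insert_with(|| {
                index_to_id.push(id);
                index_to_id.len() - 1
            })
        }));
    }
    (remapped, index_to_id)
}
```

Because the new indices are dense and start at zero, the path table can be emitted as a plain array whose position is the index, and ids that never appear as a parent are never added to it.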

There were several unsuccessful attempts to reduce the search index. In particular, I explicitly avoided complex optimizations such as encoding paths in a compressed form, and only applied an optimization when its gain was substantial relative to the size of the change. Also, while I've tried to be careful, the lack of proper (non-smoke) tests worries me a bit; any advice on testing the search indices would be appreciated.
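
One possible way to gain some confidence in the path omission is a round-trip check: encode a list of paths the way the index writer does, decode it the way `buildIndex` does (`rawRow[2] || lastPath`), and compare with the input. The sketch below is in present-day Rust, and both function names are hypothetical; it is not an existing rustdoc test.

```rust
/// Replaces a path with "" when it equals the previous item's path.
/// (Hypothetical helper mirroring the writer side of the omission.)
fn elide_repeated_paths(paths: &[&str]) -> Vec<String> {
    let mut last = "";
    paths
        .iter()
        .map(|&p| {
            if p == last {
                String::new()
            } else {
                last = p;
                p.to_string()
            }
        })
        .collect()
}

/// Restores omitted paths: an empty entry means "same as the last
/// non-empty path", as in the JavaScript decoder.
fn restore_paths(encoded: &[String]) -> Vec<String> {
    let mut last = String::new();
    encoded
        .iter()
        .map(|p| {
            if !p.is_empty() {
                last = p.clone();
            }
            last.clone()
        })
        .collect()
}
```

Note that this scheme cannot represent a genuinely empty path that follows a non-empty one; the sketch assumes, as the encoding itself does, that this case does not occur in the index.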
bors 2014-04-14 08:36:56 -07:00
commit 2f41a85d8e
5 changed files with 229 additions and 76 deletions

View File

@ -1,4 +1,4 @@
// Copyright 2013 The Rust Project Developers. See the COPYRIGHT
// Copyright 2013-2014 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
@ -24,6 +24,8 @@ use syntax::ast;
use syntax::ast_util;
use clean;
use html::item_type;
use html::item_type::ItemType;
use html::render;
use html::render::{cache_key, current_location_key};
@ -172,17 +174,17 @@ fn external_path(w: &mut io::Writer, p: &clean::Path, print_all: bool,
},
|_cache| {
Some((Vec::from_slice(fqn), match kind {
clean::TypeStruct => "struct",
clean::TypeEnum => "enum",
clean::TypeFunction => "fn",
clean::TypeTrait => "trait",
clean::TypeStruct => item_type::Struct,
clean::TypeEnum => item_type::Enum,
clean::TypeFunction => item_type::Function,
clean::TypeTrait => item_type::Trait,
}))
})
}
fn path(w: &mut io::Writer, path: &clean::Path, print_all: bool,
root: |&render::Cache, &[~str]| -> Option<~str>,
info: |&render::Cache| -> Option<(Vec<~str> , &'static str)>)
info: |&render::Cache| -> Option<(Vec<~str> , ItemType)>)
-> fmt::Result
{
// The generics will get written to both the title and link
@ -252,12 +254,12 @@ fn path(w: &mut io::Writer, path: &clean::Path, print_all: bool,
url.push_str("/");
}
match shortty {
"mod" => {
item_type::Module => {
url.push_str(*fqp.last().unwrap());
url.push_str("/index.html");
}
_ => {
url.push_str(shortty);
url.push_str(shortty.to_static_str());
url.push_str(".");
url.push_str(*fqp.last().unwrap());
url.push_str(".html");

View File

@ -0,0 +1,97 @@
// Copyright 2014 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.
//! Item types.
use std::fmt;
use clean;
/// Item type. Corresponds to `clean::ItemEnum` variants.
///
/// The search index uses item types encoded as smaller numbers which equal to
/// discriminants. JavaScript then is used to decode them into the original value.
/// Consequently, every change to this type should be synchronized to
/// the `itemTypes` mapping table in `static/main.js`.
#[deriving(Eq, Clone)]
pub enum ItemType {
Module = 0,
Struct = 1,
Enum = 2,
Function = 3,
Typedef = 4,
Static = 5,
Trait = 6,
Impl = 7,
ViewItem = 8,
TyMethod = 9,
Method = 10,
StructField = 11,
Variant = 12,
ForeignFunction = 13,
ForeignStatic = 14,
Macro = 15,
}
impl ItemType {
pub fn to_static_str(&self) -> &'static str {
match *self {
Module => "mod",
Struct => "struct",
Enum => "enum",
Function => "fn",
Typedef => "typedef",
Static => "static",
Trait => "trait",
Impl => "impl",
ViewItem => "viewitem",
TyMethod => "tymethod",
Method => "method",
StructField => "structfield",
Variant => "variant",
ForeignFunction => "ffi",
ForeignStatic => "ffs",
Macro => "macro",
}
}
}
impl fmt::Show for ItemType {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
self.to_static_str().fmt(f)
}
}
impl fmt::Unsigned for ItemType {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
(*self as uint).fmt(f)
}
}
pub fn shortty(item: &clean::Item) -> ItemType {
match item.inner {
clean::ModuleItem(..) => Module,
clean::StructItem(..) => Struct,
clean::EnumItem(..) => Enum,
clean::FunctionItem(..) => Function,
clean::TypedefItem(..) => Typedef,
clean::StaticItem(..) => Static,
clean::TraitItem(..) => Trait,
clean::ImplItem(..) => Impl,
clean::ViewItemItem(..) => ViewItem,
clean::TyMethodItem(..) => TyMethod,
clean::MethodItem(..) => Method,
clean::StructFieldItem(..) => StructField,
clean::VariantItem(..) => Variant,
clean::ForeignFunctionItem(..) => ForeignFunction,
clean::ForeignStaticItem(..) => ForeignStatic,
clean::MacroItem(..) => Macro,
}
}

View File

@ -52,6 +52,8 @@ use rustc::util::nodemap::NodeSet;
use clean;
use doctree;
use fold::DocFolder;
use html::item_type;
use html::item_type::{ItemType, shortty};
use html::format::{VisSpace, Method, FnStyleSpace};
use html::layout;
use html::markdown;
@ -138,7 +140,7 @@ pub struct Cache {
/// URLs when a type is being linked to. External paths are not located in
/// this map because the `External` type itself has all the information
/// necessary.
pub paths: HashMap<ast::NodeId, (Vec<~str> , &'static str)>,
pub paths: HashMap<ast::NodeId, (Vec<~str> , ItemType)>,
/// This map contains information about all known traits of this crate.
/// Implementations of a crate should inherit the documentation of the
@ -193,7 +195,7 @@ struct Sidebar<'a> { cx: &'a Context, item: &'a clean::Item, }
/// Struct representing one entry in the JS search index. These are all emitted
/// by hand to a large JS file at the end of cache-creation.
struct IndexItem {
ty: &'static str,
ty: ItemType,
name: ~str,
path: ~str,
desc: ~str,
@ -262,6 +264,9 @@ pub fn run(mut krate: clean::Crate, dst: Path) -> io::IoResult<()> {
});
cache.stack.push(krate.name.clone());
krate = cache.fold_crate(krate);
let mut nodeid_to_pathid = HashMap::new();
let mut pathid_to_nodeid = Vec::new();
{
let Cache { search_index: ref mut index,
orphan_methods: ref meths, paths: ref mut paths, ..} = cache;
@ -283,48 +288,67 @@ pub fn run(mut krate: clean::Crate, dst: Path) -> io::IoResult<()> {
}
};
// Prune the paths that do not appear in the index.
let mut unseen: HashSet<ast::NodeId> = paths.keys().map(|&id| id).collect();
// Reduce `NodeId` in paths into smaller sequential numbers,
// and prune the paths that do not appear in the index.
for item in index.iter() {
match item.parent {
Some(ref pid) => { unseen.remove(pid); }
Some(nodeid) => {
if !nodeid_to_pathid.contains_key(&nodeid) {
let pathid = pathid_to_nodeid.len();
nodeid_to_pathid.insert(nodeid, pathid);
pathid_to_nodeid.push(nodeid);
}
}
None => {}
}
}
for pid in unseen.iter() {
paths.remove(pid);
}
assert_eq!(nodeid_to_pathid.len(), pathid_to_nodeid.len());
}
// Publish the search index
let index = {
let mut w = MemWriter::new();
try!(write!(&mut w, "searchIndex['{}'] = [", krate.name));
try!(write!(&mut w, r#"searchIndex['{}'] = \{"items":["#, krate.name));
let mut lastpath = ~"";
for (i, item) in cache.search_index.iter().enumerate() {
// Omit the path if it is same to that of the prior item.
let path;
if lastpath == item.path {
path = "";
} else {
lastpath = item.path.clone();
path = item.path.as_slice();
};
if i > 0 {
try!(write!(&mut w, ","));
}
try!(write!(&mut w, "\\{ty:\"{}\",name:\"{}\",path:\"{}\",desc:{}",
item.ty, item.name, item.path,
try!(write!(&mut w, r#"[{:u},"{}","{}",{}"#,
item.ty, item.name, path,
item.desc.to_json().to_str()));
match item.parent {
Some(id) => {
try!(write!(&mut w, ",parent:'{}'", id));
Some(nodeid) => {
let pathid = *nodeid_to_pathid.find(&nodeid).unwrap();
try!(write!(&mut w, ",{}", pathid));
}
None => {}
}
try!(write!(&mut w, "\\}"));
try!(write!(&mut w, "]"));
}
try!(write!(&mut w, "];"));
try!(write!(&mut w, "allPaths['{}'] = \\{", krate.name));
for (i, (&id, &(ref fqp, short))) in cache.paths.iter().enumerate() {
try!(write!(&mut w, r#"],"paths":["#));
for (i, &nodeid) in pathid_to_nodeid.iter().enumerate() {
let &(ref fqp, short) = cache.paths.find(&nodeid).unwrap();
if i > 0 {
try!(write!(&mut w, ","));
}
try!(write!(&mut w, "'{}':\\{type:'{}',name:'{}'\\}",
id, short, *fqp.last().unwrap()));
try!(write!(&mut w, r#"[{:u},"{}"]"#,
short, *fqp.last().unwrap()));
}
try!(write!(&mut w, "\\};"));
try!(write!(&mut w, r"]\};"));
str::from_utf8(w.unwrap().as_slice()).unwrap().to_owned()
};
@ -360,7 +384,7 @@ pub fn run(mut krate: clean::Crate, dst: Path) -> io::IoResult<()> {
}
}
let mut w = try!(File::create(&dst));
try!(writeln!(&mut w, r"var searchIndex = \{\}; var allPaths = \{\};"));
try!(writeln!(&mut w, r"var searchIndex = \{\};"));
for index in all_indexes.iter() {
try!(writeln!(&mut w, "{}", *index));
}
@ -613,12 +637,13 @@ impl DocFolder for Cache {
} else {
let last = self.parent_stack.last().unwrap();
let path = match self.paths.find(last) {
Some(&(_, "trait")) =>
Some(&(_, item_type::Trait)) =>
Some(self.stack.slice_to(self.stack.len() - 1)),
// The current stack not necessarily has correlation for
// where the type was defined. On the other hand,
// `paths` always has the right information if present.
Some(&(ref fqp, "struct")) | Some(&(ref fqp, "enum")) =>
Some(&(ref fqp, item_type::Struct)) |
Some(&(ref fqp, item_type::Enum)) =>
Some(fqp.slice_to(fqp.len() - 1)),
Some(..) => Some(self.stack.as_slice()),
None => None
@ -678,7 +703,7 @@ impl DocFolder for Cache {
clean::VariantItem(..) => {
let mut stack = self.stack.clone();
stack.pop();
self.paths.insert(item.id, (stack, "enum"));
self.paths.insert(item.id, (stack, item_type::Enum));
}
_ => {}
}
@ -836,7 +861,7 @@ impl Context {
}
title.push_str(" - Rust");
let page = layout::Page {
ty: shortty(it),
ty: shortty(it).to_static_str(),
root_path: cx.root_path.as_slice(),
title: title.as_slice(),
};
@ -890,27 +915,6 @@ impl Context {
}
}
fn shortty(item: &clean::Item) -> &'static str {
match item.inner {
clean::ModuleItem(..) => "mod",
clean::StructItem(..) => "struct",
clean::EnumItem(..) => "enum",
clean::FunctionItem(..) => "fn",
clean::TypedefItem(..) => "typedef",
clean::StaticItem(..) => "static",
clean::TraitItem(..) => "trait",
clean::ImplItem(..) => "impl",
clean::ViewItemItem(..) => "viewitem",
clean::TyMethodItem(..) => "tymethod",
clean::MethodItem(..) => "method",
clean::StructFieldItem(..) => "structfield",
clean::VariantItem(..) => "variant",
clean::ForeignFunctionItem(..) => "ffi",
clean::ForeignStaticItem(..) => "ffs",
clean::MacroItem(..) => "macro",
}
}
impl<'a> Item<'a> {
fn ismodule(&self) -> bool {
match self.item.inner {
@ -1000,7 +1004,7 @@ impl<'a> fmt::Show for Item<'a> {
fn item_path(item: &clean::Item) -> ~str {
match item.inner {
clean::ModuleItem(..) => *item.name.get_ref() + "/index.html",
_ => shortty(item) + "." + *item.name.get_ref() + ".html"
_ => shortty(item).to_static_str() + "." + *item.name.get_ref() + ".html"
}
}
@ -1086,13 +1090,13 @@ fn item_module(w: &mut Writer, cx: &Context,
indices.sort_by(|&i1, &i2| cmp(&items[i1], &items[i2], i1, i2));
debug!("{:?}", indices);
let mut curty = "";
let mut curty = None;
for &idx in indices.iter() {
let myitem = &items[idx];
let myty = shortty(myitem);
let myty = Some(shortty(myitem));
if myty != curty {
if curty != "" {
if curty.is_some() {
try!(write!(w, "</table>"));
}
curty = myty;
@ -1695,8 +1699,9 @@ impl<'a> fmt::Show for Sidebar<'a> {
};
try!(write!(w, "<div class='block {}'><h2>{}</h2>", short, longty));
for item in items.iter() {
let curty = shortty(cur).to_static_str();
let class = if cur.name.get_ref() == item &&
short == shortty(cur) { "current" } else { "" };
short == curty { "current" } else { "" };
try!(write!(w, "<a class='{ty} {class}' href='{curty, select,
mod{../}
other{}
@ -1707,7 +1712,7 @@ impl<'a> fmt::Show for Sidebar<'a> {
ty = short,
tysel = short,
class = class,
curty = shortty(cur),
curty = curty,
name = item.as_slice()));
}
try!(write!(w, "</div>"));
@ -1726,7 +1731,7 @@ impl<'a> fmt::Show for Sidebar<'a> {
fn build_sidebar(m: &clean::Module) -> HashMap<~str, Vec<~str> > {
let mut map = HashMap::new();
for item in m.items.iter() {
let short = shortty(item);
let short = shortty(item).to_static_str();
let myname = match item.name {
None => continue,
Some(ref s) => s.to_owned(),

View File

@ -9,7 +9,7 @@
// except according to those terms.
/*jslint browser: true, es5: true */
/*globals $: true, rootPath: true, allPaths: true */
/*globals $: true, rootPath: true */
(function() {
"use strict";
@ -135,7 +135,7 @@
function execQuery(query, max, searchWords) {
var valLower = query.query.toLowerCase(),
val = valLower,
typeFilter = query.type,
typeFilter = itemTypeFromName(query.type),
results = [],
split = valLower.split("::");
@ -156,7 +156,7 @@
for (var i = 0; i < nSearchWords; i += 1) {
if (searchWords[i] === val) {
// filter type: ... queries
if (!typeFilter || typeFilter === searchIndex[i].ty) {
if (typeFilter < 0 || typeFilter === searchIndex[i].ty) {
results.push({id: i, index: -1});
}
}
@ -174,7 +174,7 @@
searchWords[j].replace(/_/g, "").indexOf(val) > -1)
{
// filter type: ... queries
if (!typeFilter || typeFilter === searchIndex[j].ty) {
if (typeFilter < 0 || typeFilter === searchIndex[j].ty) {
results.push({id: j, index: searchWords[j].replace(/_/g, "").indexOf(val)});
}
}
@ -258,7 +258,7 @@
var result = results[i],
name = result.item.name.toLowerCase(),
path = result.item.path.toLowerCase(),
parent = allPaths[result.item.crate][result.item.parent];
parent = result.item.parent;
var valid = validateResult(name, path, split, parent);
if (!valid) {
@ -405,7 +405,7 @@
shown.push(item);
name = item.name;
type = item.ty;
type = itemTypes[item.ty];
output += '<tr class="' + type + ' result"><td>';
@ -422,12 +422,12 @@
'/index.html" class="' + type +
'">' + name + '</a>';
} else if (item.parent !== undefined) {
var myparent = allPaths[item.crate][item.parent];
var myparent = item.parent;
var anchor = '#' + type + '.' + name;
output += item.path + '::' + myparent.name +
'::<a href="' + rootPath +
item.path.replace(/::/g, '/') +
'/' + myparent.type +
'/' + itemTypes[myparent.ty] +
'.' + myparent.name +
'.html' + anchor +
'" class="' + type +
@ -505,28 +505,76 @@
showResults(results);
}
// This mapping table should match the discriminants of
// `rustdoc::html::item_type::ItemType` type in Rust.
var itemTypes = ["mod",
"struct",
"enum",
"fn",
"typedef",
"static",
"trait",
"impl",
"viewitem",
"tymethod",
"method",
"structfield",
"variant",
"ffi",
"ffs",
"macro"];
function itemTypeFromName(typename) {
for (var i = 0; i < itemTypes.length; ++i) {
if (itemTypes[i] === typename) return i;
}
return -1;
}
function buildIndex(rawSearchIndex) {
searchIndex = [];
var searchWords = [];
for (var crate in rawSearchIndex) {
if (!rawSearchIndex.hasOwnProperty(crate)) { continue }
var len = rawSearchIndex[crate].length;
var i = 0;
// an array of [(Number) item type,
// (String) name,
// (String) full path or empty string for previous path,
// (String) description,
// (optional Number) the parent path index to `paths`]
var items = rawSearchIndex[crate].items;
// an array of [(Number) item type,
// (String) name]
var paths = rawSearchIndex[crate].paths;
// convert `paths` into an object form
var len = paths.length;
for (var i = 0; i < len; ++i) {
paths[i] = {ty: paths[i][0], name: paths[i][1]};
}
// convert `items` into an object form, and construct word indices.
//
// before any analysis is performed lets gather the search terms to
// search against apart from the rest of the data. This is a quick
// operation that is cached for the life of the page state so that
// all other search operations have access to this cached data for
// faster analysis operations
for (i = 0; i < len; i += 1) {
rawSearchIndex[crate][i].crate = crate;
searchIndex.push(rawSearchIndex[crate][i]);
if (typeof rawSearchIndex[crate][i].name === "string") {
var word = rawSearchIndex[crate][i].name.toLowerCase();
var len = items.length;
var lastPath = "";
for (var i = 0; i < len; i += 1) {
var rawRow = items[i];
var row = {crate: crate, ty: rawRow[0], name: rawRow[1],
path: rawRow[2] || lastPath, desc: rawRow[3],
parent: paths[rawRow[4]]};
searchIndex.push(row);
if (typeof row.name === "string") {
var word = row.name.toLowerCase();
searchWords.push(word);
} else {
searchWords.push("");
}
lastPath = row.path;
}
}
return searchWords;

View File

@ -41,6 +41,7 @@ pub mod fold;
pub mod html {
pub mod highlight;
pub mod escape;
pub mod item_type;
pub mod format;
pub mod layout;
pub mod markdown;