In-between release for LinuxTag. This will also be the last release supporting Java 1.6.
Commit | Description |
---|---|
Mon May 05 23:16:01 CEST 2014 by Marc Nause | Improved Blacklist API: *) added JSON support *) fixed Exception in case of missing parameters *) renamed parameter for items in "add entry" and "delete entry" from "entry" to "item" to match term in XML Changed Files: htroot/api/blacklists/add_entry_p.java, htroot/api/blacklists/add_entry_p.json, htroot/api/blacklists/delete_entry_p.java, htroot/api/blacklists/delete_entry_p.json, htroot/api/blacklists/get_list_p.java, htroot/api/blacklists/get_list_p.json, htroot/api/blacklists/get_metadata_p.java, htroot/api/blacklists/get_metadata_p.json |
Wed Apr 30 00:48:38 CEST 2014 by Marc Nause | First draft of a blacklist API. Changed Files: htroot/Blacklist_p.java, htroot/api/blacklists/add_entry_p.java, htroot/api/blacklists/add_entry_p.xml, htroot/api/blacklists/delete_entry_p.java, htroot/api/blacklists/delete_entry_p.xml, htroot/api/blacklists/get_list_p.java, htroot/api/blacklists/get_list_p.xml, htroot/api/blacklists/get_metadata_p.java, htroot/api/blacklists/get_metadata_p.xml, htroot/api/blacklists_p.java, source/net/yacy/repository/BlacklistHelper.java |
Sun Apr 20 01:41:30 CEST 2014 by reger | refactored URIMetadataNode to further unify interaction with index - URIMetadataNode extending SolrDocument - use language as stored (String), reducing conversion to string - optimize debug code in transferIndex Changed Files: htroot/api/yacydoc.java, htroot/yacy/crawlReceipt.java, htroot/yacy/transferURL.java, source/net/yacy/data/ymark/YMarkMetadata.java, source/net/yacy/kelondro/data/meta/URIMetadataNode.java, source/net/yacy/kelondro/data/word/WordReferenceVars.java, source/net/yacy/peers/Protocol.java, source/net/yacy/repository/Blacklist.java, source/net/yacy/search/query/SearchEvent.java, source/net/yacy/search/ranking/ReferenceOrder.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/snippet/ResultEntry.java |
Thu Apr 17 13:21:43 CEST 2014 by Michael Peter Christen | added crawl depth for failed documents Changed Files: htroot/Crawler_p.java, htroot/HostBrowser.java, htroot/yacy/crawlReceipt.java, source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/crawler/data/CrawlQueues.java, source/net/yacy/crawler/retrieval/FTPLoader.java, source/net/yacy/crawler/retrieval/HTTPLoader.java, source/net/yacy/repository/LoaderDispatcher.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/DocumentIndex.java, source/net/yacy/search/index/ErrorCache.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/snippet/MediaSnippet.java |
Wed Apr 16 22:16:20 CEST 2014 by Michael Peter Christen | removed clickdepth_i field and related postprocessing. This information is now available in the crawldepth_i field which is identical to clickdepth_i because of a specific crawler strategy. Changed Files: defaults/solr.collection.schema, defaults/solr.webgraph.schema, defaults/yacy.init, htroot/HostBrowser.java, htroot/RankingSolr_p.java, source/net/yacy/cora/document/id/DigestURL.java, source/net/yacy/cora/federate/solr/ProcessType.java, source/net/yacy/cora/federate/solr/SchemaConfiguration.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/CollectionSchema.java, source/net/yacy/search/schema/WebgraphConfiguration.java, source/net/yacy/search/schema/WebgraphSchema.java |
Wed Apr 16 21:34:28 CEST 2014 by Michael Peter Christen | - added a new Crawler Balancer: HostBalancer and HostQueues: This organizes all urls to be loaded in separate queues for each host. Each host separates the crawl depth into its own queue. The primary rule for urls taken from any queue is that the crawl depth is minimal. This produces a crawl depth which is identical to the clickdepth. Furthermore, the crawl is able to create a much better balancing over all hosts which is fair to all hosts that are in the queue. This process will create a very large number of files for wide crawls in the QUEUES folder: for each host a directory, for each crawl depth a file inside the directory. A crawl with maxdepth = 4 will be able to create tens of thousands of files. To be able to use that many file readers, it was necessary to implement a new index data structure which opens the file only if an access is wanted (OnDemandOpenFileIndex). The usage of such an on-demand file reader should keep the number of open file pointers below the system limit, which is usually about 10,000 open files. Some parts of YaCy had to be adapted to handle the crawl depth number correctly. The logging and the IndexCreateQueues servlet had to be adapted to show the crawl queues differently, because the host name is attached to the port on the host to differentiate between http, https, and ftp services. Changed Files: defaults/yacy.logging, htroot/ConfigPortal.java, htroot/Crawler_p.java, htroot/IndexCreateQueues_p.html, htroot/IndexCreateQueues_p.java, source/net/yacy/cora/document/id/DigestURL.java, source/net/yacy/crawler/Balancer.java, source/net/yacy/crawler/HostBalancer.java, source/net/yacy/crawler/HostQueue.java, source/net/yacy/crawler/LegacyBalancer.java, source/net/yacy/crawler/data/CrawlQueues.java, source/net/yacy/crawler/data/Latency.java, source/net/yacy/crawler/data/NoticedURL.java, source/net/yacy/crawler/retrieval/Response.java, source/net/yacy/document/Document.java, source/net/yacy/document/TextParser.java, source/net/yacy/document/importer/MediawikiImporter.java, source/net/yacy/document/parser/bzipParser.java, source/net/yacy/document/parser/gzipParser.java, source/net/yacy/document/parser/sevenzipParser.java, source/net/yacy/document/parser/tarParser.java, source/net/yacy/document/parser/zipParser.java, source/net/yacy/kelondro/index/OnDemandOpenFileIndex.java, source/net/yacy/kelondro/table/ChunkIterator.java, source/net/yacy/kelondro/table/Table.java, source/net/yacy/repository/LoaderDispatcher.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/DocumentIndex.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/query/QueryModifier.java, source/net/yacy/search/schema/CollectionConfiguration.java |
Thu Apr 10 18:58:03 CEST 2014 by Michael Peter Christen | strong redesign of html parser: object recursion is now made using a stack on html tag objects, not using a recursive parse-again method which may cause bad performance and huge memory allocation. The new method also produced better parsed image objects with exact anchor text references. Changed Files: source/net/yacy/document/parser/html/AbstractScraper.java, source/net/yacy/document/parser/html/AbstractTransformer.java, source/net/yacy/document/parser/html/ContentScraper.java, source/net/yacy/document/parser/html/ContentTransformer.java, source/net/yacy/document/parser/html/Scraper.java, source/net/yacy/document/parser/html/Transformer.java, source/net/yacy/document/parser/html/TransformerWriter.java, source/net/yacy/document/parser/htmlParser.java, source/net/yacy/document/parser/images/genericImageParser.java, source/net/yacy/search/schema/HyperlinkGraph.java, source/net/yacy/search/schema/WebgraphConfiguration.java |
Wed Apr 09 12:45:04 CEST 2014 by Michael Peter Christen | new structure and enhancements for link graph computation: - added order option to solr queries to be able to retrieve document lists in specific order, here: link length - added HyperlinkEdge class which manages the link structure - integrated the HyperlinkEdge class into clickdepth computation - extended the linkstructure.json servlet to show also the clickdepth and other statistic information Changed Files: htroot/HostBrowser.java, htroot/IndexDeletion_p.java, htroot/api/citation.java, htroot/api/linkstructure.java, htroot/api/linkstructure.json, htroot/js/hypertree.js, source/net/yacy/cora/federate/opensearch/OpenSearchConnector.java, source/net/yacy/cora/federate/solr/SchemaConfiguration.java, source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/CachedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/ConcurrentUpdateSolrConnector.java, source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/MirrorSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrConnector.java, source/net/yacy/search/index/ErrorCache.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/index/ReindexSolrBusyThread.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/HyperlinkEdge.java, source/net/yacy/search/schema/HyperlinkGraph.java |
Sun Apr 06 10:45:03 CEST 2014 by Michael Peter Christen | replaced solr 4.6.1 with solr 4.7.1 and added index migration to lucene_47 Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, defaults/solr/solrconfig.xml, lib/lucene-analyzers-common-4.7.1.jar, lib/lucene-analyzers-phonetic-4.7.1.jar, lib/lucene-classification-4.7.1.jar, lib/lucene-codecs-4.7.1.jar, lib/lucene-core-4.7.1.jar, lib/lucene-facet-4.7.1.jar, lib/lucene-grouping-4.7.1.jar, lib/lucene-highlighter-4.7.1.jar, lib/lucene-join-4.7.1.jar, lib/lucene-memory-4.7.1.jar, lib/lucene-misc-4.7.1.jar, lib/lucene-queries-4.7.1.jar, lib/lucene-queryparser-4.7.1.jar, lib/lucene-spatial-4.7.1.jar, lib/lucene-suggest-4.7.1.jar, lib/solr-core-4.7.1.jar, lib/solr-solr-4.7.1.License, lib/solr-solrj-4.7.1.License, lib/solr-solrj-4.7.1.jar, lib/spatial4j-0.4.1.jar, source/net/yacy/search/index/Fulltext.java |
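
The HostBalancer entry above describes one queue per host and crawl depth, with the rule that the URL taken next has minimal crawl depth. The following is a minimal in-memory sketch of that rule only, not the actual HostBalancer or HostQueue code: the class name `DepthFirstHostQueues`, its methods, and the host keys are hypothetical. It also leaves out the files in the QUEUES folder, politeness delays and robots.txt handling, and it is written for a current JDK rather than Java 1.6.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not YaCy's HostBalancer: one queue per host and crawl depth. */
class DepthFirstHostQueues {
    // host key (for example "a.example:80") -> crawl depth -> URLs waiting at that depth
    private final Map<String, TreeMap<Integer, Deque<String>>> hosts = new LinkedHashMap<>();

    /** Appends a URL to the queue of the given host and crawl depth. */
    void push(String hostKey, int depth, String url) {
        hosts.computeIfAbsent(hostKey, k -> new TreeMap<>())
             .computeIfAbsent(depth, d -> new ArrayDeque<>())
             .addLast(url);
    }

    /** Returns the next URL with minimal crawl depth over all hosts, or null if nothing is queued. */
    String pop() {
        String bestHost = null;
        int bestDepth = Integer.MAX_VALUE;
        for (Map.Entry<String, TreeMap<Integer, Deque<String>>> e : hosts.entrySet()) {
            int depth = e.getValue().firstKey(); // shallowest non-empty queue of this host
            if (depth < bestDepth) {
                bestDepth = depth;
                bestHost = e.getKey();
            }
        }
        if (bestHost == null) return null;

        TreeMap<Integer, Deque<String>> byDepth = hosts.remove(bestHost);
        Deque<String> queue = byDepth.get(bestDepth);
        String url = queue.pollFirst();
        if (queue.isEmpty()) byDepth.remove(bestDepth);
        // Re-append the host at the end so that hosts with equal depth take turns.
        if (!byDepth.isEmpty()) hosts.put(bestHost, byDepth);
        return url;
    }
}
```

Moving a host to the end of the map after each pop is one simple way to get the fair rotation over hosts that the entry mentions; the real balancer may well use a different strategy.
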
Commit | Description |
---|---|
Wed Apr 30 06:21:53 CEST 2014 by Michael Peter Christen | enhanced HostBrowser buttons and fixed text input alignment Changed Files: htroot/HostBrowser.html, htroot/env/base.css |
Wed Apr 30 05:14:01 CEST 2014 by Michael Peter Christen | fix for strange fail reason Changed Files: htroot/IndexCreateParserErrors_p.java |
Tue Apr 29 19:50:33 CEST 2014 by Michael Peter Christen | fix for slow crawling and better logging in balancer Changed Files: source/net/yacy/crawler/HostBalancer.java, source/net/yacy/crawler/HostQueue.java |
Tue Apr 29 19:24:05 CEST 2014 by Michael Peter Christen | npe fix Changed Files: source/net/yacy/crawler/CrawlSwitchboard.java |
Tue Apr 29 19:13:54 CEST 2014 by Michael Peter Christen | fix to menu colours Changed Files: skins/pdbootstrap.css |
Tue Apr 29 16:24:21 CEST 2014 by Michael Peter Christen | fix for result display Changed Files: htroot/yacysearchtrailer.html |
Tue Apr 29 16:24:01 CEST 2014 by Michael Peter Christen | design fixes to better use the new colours Changed Files: htroot/Network.html, htroot/js/yacyinteractive.js |
Sun Apr 27 20:52:06 CEST 2014 by reger | optimize and fix lat / lon assignment Changed Files: source/net/yacy/kelondro/data/meta/URIMetadataNode.java |
Fri Apr 25 09:26:20 CEST 2014 by orbiter | npe fix Changed Files: source/net/yacy/kelondro/blob/HeapReader.java |
Fri Apr 25 09:23:10 CEST 2014 by orbiter | npe fix Changed Files: source/net/yacy/kelondro/blob/Tables.java |
Wed Apr 23 23:13:07 CEST 2014 by orbiter | fixed a situation where finished crawls had not been detected. Changed Files: source/net/yacy/search/Switchboard.java |
Thu Apr 17 16:58:17 CEST 2014 by Michael Peter Christen | fix for deadlocks in crawler Changed Files: source/net/yacy/crawler/HostBalancer.java, source/net/yacy/crawler/HostQueue.java, source/net/yacy/crawler/data/CrawlQueues.java, source/net/yacy/crawler/data/NoticedURL.java |
Wed Apr 16 22:24:04 CEST 2014 by Michael Peter Christen | fix for display bug Changed Files: htroot/HostBrowser.java |
Fri Apr 11 15:12:34 CEST 2014 by Michael Peter Christen | fix for virtual root nodes Changed Files: source/net/yacy/search/schema/HyperlinkGraph.java |
Fri Apr 11 09:56:44 CEST 2014 by Michael Peter Christen | fix for maximum tag length in parser Changed Files: source/net/yacy/document/parser/html/ContentScraper.java |
Thu Apr 10 09:08:59 CEST 2014 by Michael Peter Christen | fix for wrong status codes of error pages Changed Files: htroot/Crawler_p.java, source/net/yacy/crawler/data/CrawlQueues.java, source/net/yacy/crawler/retrieval/HTTPLoader.java, source/net/yacy/repository/LoaderDispatcher.java, source/net/yacy/search/index/ErrorCache.java |
Commit | Description |
---|---|
Tue May 06 18:54:56 CEST 2014 by Michael Peter Christen | Release 1.72 Changed Files: build.properties |
Tue May 06 16:48:50 CEST 2014 by Michael Peter Christen | enhanced snippets: remove lines which are identical to the title and choose longer versions if possible. Prefer the description part. Changed Files: source/net/yacy/cora/federate/solr/responsewriter/GSAResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/OpensearchResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/YJsonResponseWriter.java, source/net/yacy/http/servlets/GSAsearchServlet.java, source/net/yacy/http/servlets/SolrSelectServlet.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/query/SearchEvent.java |
Tue May 06 05:58:51 CEST 2014 by orbiter | fix for navigation steering / p2p mode see also: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5198&p=29958#p29958 Changed Files: htroot/env/templates/submenuAccessTracker.template, htroot/env/templates/submenuCrawlMonitor.template, htroot/env/templates/submenuIndexControl.template |
Mon May 05 13:24:41 CEST 2014 by sixcooler | do not check for segments-count on optimize: this is also done in Solr and our getSegmentsCount() does not return up-to-date values Changed Files: source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java |
Sun May 04 09:29:07 CEST 2014 by reger | content of surrogates/out never accessed (remove). After import the content is never accessed but may take up a lot of disk space; also the getLoadedOAIServer (which lists the files in surrogate out) is not used, making the surrogate.out obsolete. Removed keeping of xmls after import. Changed Files: source/net/yacy/document/importer/OAIPMHImporter.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java |
Sat May 03 21:57:06 CEST 2014 by reger | Merge origin/master Changed Files: source/net/yacy/kelondro/table/Table.java |
Sat May 03 21:55:10 CEST 2014 by reger | fix input-group layout on index.html see bug http://mantis.tokeek.de/view.php?id=391 Changed Files: htroot/index.html |
Fri May 02 22:55:47 CEST 2014 by sixcooler | remove tables from tabletracker on close to avoid lots of dead entries in /PerformanceMemory_p.html Changed Files: source/net/yacy/kelondro/table/Table.java |
Fri May 02 19:32:09 CEST 2014 by reger | fix NPE on continuing crawls after YaCy restart (Agent is then null) Changed Files: source/net/yacy/crawler/HostBalancer.java |
Fri May 02 14:18:52 CEST 2014 by Marc Nause | Key for parameter "blacklist name" is "list" in all servlets now. Changed Files: htroot/api/blacklists/add_entry_p.java, htroot/api/blacklists/delete_entry_p.java, htroot/api/blacklists/get_list_p.java |
Fri May 02 01:15:03 CEST 2014 by reger | adjust search page layout - search box to current style Changed Files: htroot/ConfigSearchPage_p.html |
Fri May 02 00:35:54 CEST 2014 by reger | remove obsolete css class bookmarkfieldset Changed Files: htroot/Bookmarks.html |
Wed Apr 30 13:26:32 CEST 2014 by Michael Peter Christen | added configuration option for maximum load and minimum ram for postprocessing Changed Files: defaults/yacy.init, source/net/yacy/search/Switchboard.java |
Wed Apr 30 06:46:06 CEST 2014 by Michael Peter Christen | input-group for main search input window Changed Files: htroot/index.html |
Wed Apr 30 05:05:02 CEST 2014 by Michael Peter Christen | use submitted default userAgent if cloning a crawl Changed Files: htroot/CrawlStartExpert.html, htroot/CrawlStartExpert.java |
Tue Apr 29 22:51:01 CEST 2014 by reger | add display filter (active/disabled) to IndexSchema_p.html config for easier overview of schema fields Changed Files: htroot/IndexSchema_p.html, htroot/IndexSchema_p.java |
Tue Apr 29 18:46:50 CEST 2014 by Michael Peter Christen | small changes to search headline colour Changed Files: defaults/yacy.init, skins/pdbootstrap.css |
Tue Apr 29 16:23:42 CEST 2014 by Michael Peter Christen | new default skin pdbootstrap which keeps the design shapes but slightly changes the colours to match with bootstrap colours Changed Files: defaults/yacy.init, skins/pdbootstrap.css |
Tue Apr 29 16:22:31 CEST 2014 by Michael Peter Christen | better buttons Changed Files: htroot/CrawlResults.html, htroot/Crawler_p.html, htroot/Table_API_p.html |
Tue Apr 29 00:41:29 CEST 2014 by reger | add html5 audio/video <source> tag to html content scraper - <source src=.. type=..> tag content is added to embed collection Changed Files: source/net/yacy/document/parser/html/ContentScraper.java |
Mon Apr 28 11:52:13 CEST 2014 by Michael Peter Christen | bootstrap update Changed Files: htroot/env/bootstrap/css/bootstrap-rtl.css, htroot/env/bootstrap/css/bootstrap-rtl.min.css, htroot/env/bootstrap/css/bootstrap.css, htroot/env/bootstrap/css/bootstrap.css.map, htroot/env/bootstrap/css/bootstrap.min.css, htroot/env/bootstrap/js/bootstrap.js, htroot/env/bootstrap/js/bootstrap.min.js |
Mon Apr 28 04:59:47 CEST 2014 by reger | fix contentscraper img height/width parsing prevent numberformat exception on common "100px" property - include in test case Changed Files: source/net/yacy/document/parser/html/ContentScraper.java, test/net/yacy/cora/document/id/DigestURLTest.java, test/net/yacy/document/parser/htmlParserTest.java |
Sun Apr 27 23:54:34 CEST 2014 by malykhin.dmitry | Update russian translation Changed Files: locales/ru.lng |
Sun Apr 27 22:22:00 CEST 2014 by reger | remove redundant javascript & id in index.html to set focus to query field in IE11 Changed Files: htroot/index.html |
Sun Apr 27 18:20:33 CEST 2014 by reger | reimplement tighter lat/lon calc in URIMetadataNode from old MetadataRow, considering http://mantis.tokeek.de/view.php?id=272 Changed Files: source/net/yacy/kelondro/data/meta/URIMetadataNode.java |
Sat Apr 26 22:27:59 CEST 2014 by reger | add exit proxy link to UrlProxy: on proxied pages a link to exit the proxy is added to the top of the page. The link text can be configured in the web.xml init-parameter (see default/web.xml). If it is missing, no link is displayed. Changed Files: defaults/web.xml, source/net/yacy/http/servlets/UrlProxyServlet.java |
Sat Apr 26 01:30:51 CEST 2014 by reger | throw MalformedURLException on unknown protocol on other than the supported http https ftp file smb \\ mailto Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java |
Fri Apr 25 20:15:55 CEST 2014 by reger | fix: resolve url without path but with searchpart, e.g. http://yacy.net?q=test was resolved as host "yacy.net?q=test", now host="yacy.net" path="/"; fixes http://mantis.tokeek.de/view.php?id=47; added test case for getHost Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java, test/net/yacy/cora/document/id/MultiProtocolURLTest.java |
Fri Apr 25 01:05:28 CEST 2014 by reger | recover sax fatal error on OAI-PMH import of xml with entity error this allows to continue loading next resumptionToken even if import file caused sax parser error fix http://mantis.tokeek.de/view.php?id=63 Changed Files: htroot/IndexImportOAIPMHList_p.java, source/net/yacy/document/importer/OAIListFriendsLoader.java, source/net/yacy/document/importer/OAIPMHImporter.java, source/net/yacy/document/importer/OAIPMHLoader.java, source/net/yacy/document/importer/ResumptionToken.java |
Wed Apr 23 23:41:10 CEST 2014 by reger | add current css to HTMLResponseWriter to fix metadata view (using css from metas.template except js links) Changed Files: htroot/env/templates/metas.template, source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java |
Wed Apr 23 23:12:08 CEST 2014 by orbiter | better removal of stored urls when doing a crawl start Changed Files: htroot/Crawler_p.java |
Wed Apr 23 23:11:37 CEST 2014 by orbiter | enhanced Host Balancer strategy: fair round robin Changed Files: source/net/yacy/crawler/HostBalancer.java |
Wed Apr 23 08:41:36 CEST 2014 by orbiter | do not apply lazy value instantiation for numeric or boolean values because that is misleading and confusing in case of 0- or false-values and may cause NPEs in retrieval functions. Changed Files: source/net/yacy/cora/federate/solr/SchemaConfiguration.java |
Wed Apr 23 08:37:19 CEST 2014 by orbiter | in case of short memory, do not cut down robinson peers to 1, just reduce by 50% Changed Files: source/net/yacy/peers/RemoteSearch.java |
Wed Apr 23 00:55:16 CEST 2014 by reger | exclude html tags in in/outboundlinks_anchortext_txt parsed text - some outboundlinks_anchortext_txt in index contain e.g. <span>text</span> or more tags, remove all tags for text property (inline img tags are still parsed) - added test case for above (to htmlParserTest) - fix solr test case Changed Files: source/net/yacy/document/parser/html/ContentScraper.java, test/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnectorTest.java, test/net/yacy/document/parser/htmlParserTest.java |
Tue Apr 22 23:14:54 CEST 2014 by orbiter | added new button to terminate all crawls Changed Files: htroot/Crawler_p.html, htroot/Crawler_p.java |
Tue Apr 22 23:14:05 CEST 2014 by orbiter | catch IllegalArgumentException for wrong process types (that is needed for migrations when new process types are introduced or disappear) Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java |
Tue Apr 22 19:48:49 CEST 2014 by orbiter | fix for NPE in IndexCreateParserErrors_p.html caused by bad handling of lazy value instantiation of 0-value in crawldepth_i Changed Files: htroot/HostBrowser.java, source/net/yacy/search/schema/CollectionConfiguration.java |
Tue Apr 22 19:35:15 CEST 2014 by orbiter | removed warnings Changed Files: source/net/yacy/kelondro/data/meta/URIMetadataNode.java |
Mon Apr 21 17:28:21 CEST 2014 by reger | add custom Jetty errorhandler to provide custom error page footer line - remove redundant mime check in UrlProxyServlet Changed Files: source/net/yacy/http/Jetty8HttpServerImpl.java, source/net/yacy/http/YaCyErrorHandler.java, source/net/yacy/http/servlets/UrlProxyServlet.java |
Mon Apr 21 17:16:06 CEST 2014 by reger | defer creation of new ArrayList after possible early return (to skip not used object allocation) Changed Files: source/net/yacy/peers/Protocol.java |
Fri Apr 18 22:03:16 CEST 2014 by reger | - remove empty http0_9 status text array and unused default_charset = ISO-8859-1 Changed Files: source/net/yacy/cora/protocol/HeaderFramework.java, source/net/yacy/cora/protocol/ResponseHeader.java, source/net/yacy/server/http/HTTPDemon.java |
Fri Apr 18 19:57:35 CEST 2014 by reger | - remove unused manual http KeepAlive config (reducing references to obsolete httpdemon) - add port info to settings_http Changed Files: defaults/yacy.init, htroot/SettingsAck_p.html, htroot/SettingsAck_p.java, htroot/Settings_Http.inc, htroot/Settings_p.java, source/net/yacy/server/http/HTTPDemon.java |
Fri Apr 18 06:51:46 CEST 2014 by Michael Peter Christen | add canonical links to the same crawldepth, not the next crawldepth Changed Files: source/net/yacy/document/Document.java, source/net/yacy/search/Switchboard.java |
Fri Apr 18 06:51:10 CEST 2014 by Michael Peter Christen | increased runtime for postprocessing query job Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java |
Fri Apr 18 06:50:07 CEST 2014 by Michael Peter Christen | special strategy for balancer: do not remove targets with zero wait time from the queue Changed Files: source/net/yacy/crawler/HostBalancer.java, source/net/yacy/crawler/HostQueue.java, source/net/yacy/crawler/LegacyBalancer.java |
Thu Apr 17 16:19:38 CEST 2014 by Michael Peter Christen | increased resource.disk.used.max.steadystate and resource.disk.used.max.overshot by 4 times because first users reached that limit and wondered why the crawler was paused automatically :) The crawler will now stop at 2TB disk usage :) Changed Files: defaults/yacy.init |
Thu Apr 17 12:54:18 CEST 2014 by Michael Peter Christen | - better subgraph handling, less overhead for crawls without the webgraph - usage of crawler crawldepth cache for the linkgraph target depth computation Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/WebgraphConfiguration.java |
Thu Apr 17 12:52:54 CEST 2014 by Michael Peter Christen | new Strategies in Balancer: - doublecheck cache now records the crawl depth as well - doublecheck cache is available from the outside (made static) - no more need to crawl hosts with lowest depth first, instead all hosts which have only singleton entries are preferred to reduce the number of files. Changed Files: source/net/yacy/crawler/HostBalancer.java, source/net/yacy/crawler/HostQueue.java, source/net/yacy/crawler/robots/RobotsTxt.java |
Thu Apr 17 12:44:05 CEST 2014 by Michael Peter Christen | fix for Table in case that requested file does not exist and paths also do not exist Changed Files: source/net/yacy/kelondro/table/Table.java |
Thu Apr 17 03:20:29 CEST 2014 by reger | implement gzip input handling directly in defaultservlet (making reference to legacy httpdemon obsolete) Changed Files: source/net/yacy/http/servlets/YaCyDefaultServlet.java, source/net/yacy/server/http/HTTPDemon.java |
Mon Apr 14 13:32:35 CEST 2014 by Michael Peter Christen | refactoring of the crawl balancer: the balancer is turned into an interface and the old balancer class is moved into LegacyBalancer to make room for a fresh implementation of a crawl balancer. Changed Files: source/net/yacy/crawler/Balancer.java, source/net/yacy/crawler/LegacyBalancer.java, source/net/yacy/crawler/data/NoticedURL.java, source/net/yacy/search/Switchboard.java |
Sun Apr 13 07:32:32 CEST 2014 by reger | autoupdate fails to download latest release (1.71) due to default release blacklist - removed the default version blacklist regex from init (for future versions) !!! left existing update blacklist setting untouched !!! (existing installation wanting autoupdate for 1.71 need to change blacklist in ConfigUpdate_p.html) - moved old blacklist patch to migration.java Changed Files: defaults/yacy.init, source/net/yacy/migration.java, source/net/yacy/peers/operation/yacyRelease.java |
Fri Apr 11 12:27:21 CEST 2014 by Michael Peter Christen | find depth-matches also for edge targets Changed Files: source/net/yacy/search/schema/HyperlinkEdges.java |
Fri Apr 11 12:09:33 CEST 2014 by Michael Peter Christen | introduction of a data structure for HyperlinkEdges which should use less memory as it does no double-storage of source links for each edge of the graph. Changed Files: htroot/api/linkstructure.java, source/net/yacy/search/schema/HyperlinkEdge.java, source/net/yacy/search/schema/HyperlinkEdges.java, source/net/yacy/search/schema/HyperlinkGraph.java |
Fri Apr 11 10:58:37 CEST 2014 by Michael Peter Christen | using MultiProtocolURL for edge data which is faster (hash computation is now much easier) and smaller in size Changed Files: source/net/yacy/search/schema/HyperlinkEdge.java, source/net/yacy/search/schema/HyperlinkGraph.java |
Fri Apr 11 10:23:48 CEST 2014 by Michael Peter Christen | enhanced hashcode computation for MultiProtocolURL Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java |
Fri Apr 11 09:25:18 CEST 2014 by Michael Peter Christen | refactoring of SystemLoad calls (only one backend tool) Changed Files: source/net/yacy/kelondro/util/MemoryControl.java, source/net/yacy/kelondro/workflow/AbstractBusyThread.java |
Thu Apr 10 23:46:35 CEST 2014 by Michael Peter Christen | refactoring Changed Files: htroot/api/linkstructure.java, source/net/yacy/search/schema/HyperlinkEdge.java, source/net/yacy/search/schema/HyperlinkGraph.java, source/net/yacy/search/schema/HyperlinkType.java |
Wed Apr 09 21:59:54 CEST 2014 by Michael Peter Christen | also delete the robots.txt file from the cache when a new crawl is started Changed Files: htroot/Crawler_p.java, source/net/yacy/crawler/robots/RobotsTxt.java |
Wed Apr 09 18:33:48 CEST 2014 by Michael Peter Christen | fix for robots.txt handling: delete old entry before starting a new crawl. Changed Files: htroot/Crawler_p.java, source/net/yacy/crawler/robots/RobotsTxt.java, source/net/yacy/search/schema/CollectionConfiguration.java |
Wed Apr 09 17:52:51 CEST 2014 by orbiter | linkstructure refactoring to get more options for clickdepth analysis Changed Files: htroot/api/linkstructure.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/HyperlinkGraph.java |
Sun Apr 06 22:31:22 CEST 2014 by reger | fix: typo in default charset in metadata2solr update pom and NB build to Solr 4.7.1 libs Changed Files: nbproject/project.xml, pom.xml, source/net/yacy/search/schema/CollectionConfiguration.java |
Sun Apr 06 11:04:23 CEST 2014 by Michael Peter Christen | do solr optimization independently from memory and load constraints: - not doing an optimization will likely cause a too many files exception - without optimization performance will be even worse which would prevent optimization in the future as well (prevent a deadlock situation) Changed Files: source/net/yacy/search/Switchboard.java |
Sun Apr 06 03:59:11 CEST 2014 by reger | update commons-compress.jar to 1.8 Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/commons-compress-1.8.License, lib/commons-compress-1.8.jar, nbproject/project.xml, pom.xml |
Sun Apr 06 01:20:03 CEST 2014 by Michael Peter Christen | different algorithm to test checkalive as it depends less on the existence of wget (or curl) on the OS. Changed Files: bin/checkalive.sh |
Sun Apr 06 01:00:09 CEST 2014 by Michael Peter Christen | Emergency bugfix for killYACY.sh: when a too many open files error occurs, the file yacy00.log does not exist but only the file yacy00.log.lck. In the long term a different solution should be found. Changed Files: killYACY.sh |
Sun Apr 06 00:35:35 CEST 2014 by Michael Peter Christen | test using compound file format, see UseCompoundFile in https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig This appears to be necessary as many times a java.io.FileNotFoundException: (Too many open files) appears. See also: https://issues.apache.org/jira/browse/SOLR-4 and desperate users at http://stackoverflow.com/questions/3828343/too-many-open-file-exception-while-indexin-using-solr We cannot force users to do a "ulimit -n 1000000", so this action seems to be required. Changed Files: defaults/solr/solrconfig.xml |
Sun Apr 06 00:32:10 CEST 2014 by Michael Peter Christen | next development version 1.71 It's nowhere explained or declared, but for some time we have followed the scheme that odd version numbers are used for development versions and even numbers for release versions. That concept may change sometime, but it is used at this time to distinguish development from main. Changed Files: build.properties |
Sun Apr 06 00:20:12 CEST 2014 by reger | upd version in pom Changed Files: pom.xml |
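
The MultiProtocolURL fix listed above concerns URLs that carry a search part but no path, such as http://yacy.net?q=test. As an illustration of the intended result (host "yacy.net", path "/"), here is a small standalone sketch. It is not the MultiProtocolURL implementation: the class `HostPathSplitter` and its `split` method are hypothetical, and the sketch returns the authority as-is without separating a port or user info.

```java
/** Hypothetical sketch, not the MultiProtocolURL code: split the host from path and query. */
class HostPathSplitter {

    /**
     * Splits a URL of the form scheme://host[/path][?query][#fragment].
     * Returns {host, path, query}; path defaults to "/", query is null if absent.
     */
    static String[] split(String url) {
        int start = url.indexOf("://");
        if (start < 0) throw new IllegalArgumentException("no scheme separator in " + url);
        String rest = url.substring(start + 3);

        // The host ends at the first '/', '?' or '#', whichever comes first.
        int end = rest.length();
        for (char c : new char[] {'/', '?', '#'}) {
            int p = rest.indexOf(c);
            if (p >= 0 && p < end) end = p;
        }
        String host = rest.substring(0, end);
        String tail = rest.substring(end);

        // Drop the fragment, then separate path and query.
        int hash = tail.indexOf('#');
        if (hash >= 0) tail = tail.substring(0, hash);
        int q = tail.indexOf('?');
        String path = q >= 0 ? tail.substring(0, q) : tail;
        String query = q >= 0 ? tail.substring(q + 1) : null;
        if (path.isEmpty()) path = "/"; // a missing path is treated as the root path
        return new String[] {host, path, query};
    }
}
```

Calling `HostPathSplitter.split("http://yacy.net?q=test")` yields the host `yacy.net`, the path `/` and the query `q=test`, which matches the behaviour the commit message describes.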