<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"><channel><title>Matasano Chargen - Latest Comments in Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://matasanochargen.disqus.com/</link><description></description><language>en</language><lastBuildDate>Tue, 30 Jun 2009 03:32:38 -0000</lastBuildDate><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-11927320</link><description>.I always use the Internet For Hosting the Website In the server.Sometimes My Connection Gets Slow At that Time I used the site &lt;a href="http://www.ip-details.com/" rel="nofollow"&gt;IP-Address&lt;/a&gt; For the Speed Checking</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">venkat2009</dc:creator><pubDate>Tue, 30 Jun 2009 03:32:38 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323543</link><description>FYI: aguri in greek means cucumber.. :oS</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">none</dc:creator><pubDate>Fri, 02 May 2008 13:33:00 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323542</link><description>I could burn 8 more blog posts talking about the different theories behind packet filtering --- pcap takes a pretty idiosyncratic approach, which is to frame the problem compiler-theoretically (though the performance of pcap is dominated by the IO channel you use to get packets; by lines of code, pcap performance is overwhelmingly addressed by IR optimizers).&lt;br&gt;&lt;br&gt;I say that because tries are a classical approach to speeding up packet filtering. 
You can consider routing a special case of filtering, of course, in a single dimension. As you add dimensions, you start cross-producting multiple tries.&lt;br&gt;&lt;br&gt;And, of course, radix tries (and edge-labelled PATRICIA tries in particular) are just a specialization of DFAs, which brings us back to pcap and optimizers and...</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Thomas Ptacek</dc:creator><pubDate>Wed, 30 Jan 2008 18:30:57 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323594</link><description>Good summary on trees and tries!&lt;br&gt;I wonder if radix tries could be used also to improve the performance on keyword searches in PCAP files or sniffed traffic (like an Echelon functionality). This type of searches can be performed with NetworkMiner, see:&lt;br&gt;&lt;a href="http://networkminer.wiki.sourceforge.net/Keyword+Search" rel="nofollow"&gt;http://networkminer.wiki.sourceforge.net/Keywor...&lt;/a&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Erik</dc:creator><pubDate>Tue, 29 Jan 2008 08:09:19 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323593</link><description>Speaking of regex and DFAs, I saw an interesting presentation from a couple of Google researchers at IT Defense this week. They have found a bunch of implementation problems in various regex engines:&lt;br&gt;&lt;a href="http://www.it-defense.de/itdefense2008_com/pages/presentations.html" rel="nofollow"&gt;http://www.it-defense.de/itdefense2008_com/page...&lt;/a&gt;&lt;br&gt;(Scroll down to "Regular Exceptions".)&lt;br&gt;&lt;br&gt;No slides online or anything yet, I don't think. 
If you're interested, I'll come back when they are.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Ryan Russell</dc:creator><pubDate>Sat, 26 Jan 2008 23:05:09 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323541</link><description>/me sighs&lt;br&gt;&lt;br&gt;What if you have to write the in_array() function, jackass?</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">sigsegv</dc:creator><pubDate>Fri, 25 Jan 2008 09:30:20 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323592</link><description>"I hand you an array of random integers. How can you tell if the number three is in it?"&lt;br&gt;&lt;br&gt;Maybe you should read the manual.&lt;br&gt;&lt;br&gt;if (in_array(3, $array)) {&lt;br&gt;    echo "3 is in the array";&lt;br&gt;} else {&lt;br&gt;    echo "3 is not in the array";&lt;br&gt;}&lt;br&gt;&lt;br&gt;Problem solved. I didn't even have to read the rest of the post about aguri. That is a kind of cactus, right?</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Charles</dc:creator><pubDate>Thu, 24 Jan 2008 14:35:02 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323591</link><description>The FPP, Fast Pattern Processor, portion of the Agere Network Processor is Patricia Tree based. The same practice of using Patricia Trees can apply to generic TCP/IP packet processes and "pseudo" regular expression searching. 
You can leverage them in really neat ways.&lt;br&gt;&lt;br&gt;I first used Patricia Trees in creating silicon for routers [NP based Cisco routers] and later on IPS systems. The downfall is the changing of rules. Meaning, reorganizing the tree(s) is an art form [in a sense] and I had to fall back on a genetic algorithm to load the tree based on user settings when it came to 220 bit and beyond Patricia Trees. This seemed to solve the rule switching issue. To show you how bad the rule switching was: on a 7200 NP based system (Cisco) it would take up to 120 seconds to reload the BGP table (back when that was hip). Later in 2001 we all switched to creating devices with dual banked memory for Patricia Tree silicon. &lt;br&gt;&lt;br&gt;Great topic by the way - would love to see more articles like this.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Dennis Cox</dc:creator><pubDate>Wed, 23 Jan 2008 23:12:46 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323590</link><description>Good article!&lt;br&gt;&lt;br&gt;Correct me if I'm wrong though (it's been a while since my algorithms study), but in response to the original question:&lt;br&gt;&lt;br&gt;"I hand you an array of random integers. How can you tell if the number three is in it?"&lt;br&gt;&lt;br&gt;Yes, linear search is bad, but isn't it the best way to answer the original question? If you're sorting the array, you're introducing additional complexity that still has to touch each element O(n). From what I remember, there can't be any sorting algorithm better than O(n).&lt;br&gt;&lt;br&gt;If we sort the given array we've already spent at least O(n) but we now have to search the array (providing we didn't search while we were sorting). 
This assumes we're only looking for one element.&lt;br&gt;&lt;br&gt;Now, if we can choose to get a sorted array of random integers, that all of course changes.&lt;br&gt;&lt;br&gt;Anyway, that's all beside the point of Aguri. Never heard of it but I'll definitely check it out!</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">RPS</dc:creator><pubDate>Wed, 23 Jan 2008 19:00:57 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323589</link><description>Hi Thomas,&lt;br&gt;&lt;br&gt;nice to see you writing again. This data structure you're describing sounds awfully similar to &lt;a href="http://wikipedia.org/wiki/Binary_decision_diagram" rel="nofollow"&gt;Binary Decision Diagrams&lt;/a&gt;. Basically if I did a binary expansion of that 32-bit IPv4 address and then gave each of those bits a variable name, I could do exactly the same thing you've described with a reduced, ordered BDD. The merging operation you have described is the reduction operation that transforms a binary decision tree into a decision diagram.&lt;br&gt;&lt;br&gt;Cheers,&lt;br&gt;Ralf</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Ralf</dc:creator><pubDate>Wed, 23 Jan 2008 18:30:03 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323588</link><description>Don't tell him I told you about it; he may regret leaving that out there. 
=)&lt;br&gt;&lt;br&gt;I've used code derived from Kneel's PATRICIA tree in production, so I'm pretty comfortable with it.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Thomas Ptacek</dc:creator><pubDate>Wed, 23 Jan 2008 13:12:42 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323587</link><description>"I first heard the complaint from Danny Dulai when we and Kneel Fachan wrote the Patricia library for libishiboo, which you may still be able to find online."&lt;br&gt;&lt;br&gt;Ah cool - I found the libishiboo library at his (Dulai's) site. Looks like the code actually implements removal of nodes from the patricia trie too which is neat since I've never been able to find any good explanation of how to go about doing this (sedgewick leaves it as an exercise in his book).&lt;br&gt;&lt;br&gt;Todd H</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Todd Hayton</dc:creator><pubDate>Wed, 23 Jan 2008 13:09:17 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323586</link><description>"none of you better have a BSCS or higher"&lt;br&gt;&lt;br&gt;Never fear, dude :^)</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Chris</dc:creator><pubDate>Wed, 23 Jan 2008 10:22:33 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323585</link><description>To be honest, without digging out my copy of Sedgewick, I can't actually back that statement up. 
In my defense, I first heard the complaint from Danny Dulai when we and Kneel Fachan wrote the Patricia library for libishiboo, which you may still be able to find online. &lt;br&gt;&lt;br&gt;You can check the errata on Sedgewick's site, but he only has errata for the third edition on; third edition is from, I think, 2000? and my apocryphal claim about his error would be from around 1995. So, long story short, if you crib your trie code from the current Sedgewick book, you might be aces.&lt;br&gt;&lt;br&gt;The graph algorithms volume to new Sedgewick is also excellent.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Thomas Ptacek</dc:creator><pubDate>Wed, 23 Jan 2008 09:52:16 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323584</link><description>Hey there, interesting post. I was wondering - I've seen two references now stating that Sedgewick got the implementation for patricia tries wrong (the other reference being &lt;a href="http://cr.yp.to/critbit.html" rel="nofollow"&gt;http://cr.yp.to/critbit.html&lt;/a&gt;) - however I've never seen an explanation on just what exactly was wrong with his implementation. &lt;br&gt;&lt;br&gt;Todd H</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Todd Hayton</dc:creator><pubDate>Wed, 23 Jan 2008 08:51:44 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323583</link><description>i have no idea how the 4 in the binary tree can be where it seems to be; maybe this has already been pointed out or sumthin but 4 comes from 6 which comes from 5 so 4 is greater than 5? 
maybe i just didnt get it but 4 shud be a left-wing descendant of 5 instead of being a left wing descendant of 5's right-wing descendant. at least thats how i wudve done it.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">desentizised</dc:creator><pubDate>Wed, 23 Jan 2008 04:48:51 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323582</link><description>For a disgusting example of what I'm talking about that Dug Song just mentioned, consider Judy arrays.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Thomas Ptacek</dc:creator><pubDate>Wed, 23 Jan 2008 01:17:17 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323581</link><description>I'm not talking about the storage hierarchy. We agree, external search, different problem from in-memory search. &lt;br&gt;&lt;br&gt;I'm specifically talking about tree structure layouts that minimize cache misses and overhead for links. &lt;br&gt;&lt;br&gt;You don't even need to store links directly; you can use minimum-bit or succinct encodings. If you're concerned about optimizing a specific memory access pattern --- for instance, increasing the chance that all the traversal fetches are going to be in loaded cache lines --- you can tailor the memory layout to that as well. &lt;br&gt;&lt;br&gt;It's true that "struct treenode { void *key; void *data; struct treenode *left; struct treenode *right; };" is not a particularly memory-efficient or cache-efficient layout. I'm just strongly objecting to tarring the whole concept of a tree with that naive implementation. 
Realtime network production code running at n mpps wouldn't use "struct treenode", or a hash table.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Thomas Ptacek</dc:creator><pubDate>Wed, 23 Jan 2008 01:13:33 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323580</link><description>Obviously we're both just miscommunicating here, so please bear with me a little longer.&lt;br&gt;&lt;br&gt;My initial statement was pretty simple. You said "Binary trees are a better default table implementation than hash tables." and I said it depends. Take the below with that context, okay?&lt;br&gt;&lt;br&gt;I don't understand what you mean by "insistent on defining 'tree structures' as 'structs with two child pointers' and 'memory' as 'what malloc returns'." Nothing I said is dependent upon a specific representation. Quite the contrary, I gave the serialized-to-disk example of a ternary tree to counter that possible issue. And as for memory being what malloc returns, all I can say is "huh?". You seem to think that RAM has different characteristics whether it's heap or stack? I'm horribly, horribly confused by what you are trying to get across here.&lt;br&gt;&lt;br&gt;So...?&lt;br&gt;&lt;br&gt;From my point of view a tree is something that requires a set of traversals. I would define a single "traversal step" as a load from memory and a branch decision (based on a comparison). Whether memory comes from heap or stack or is even purely on disk is immaterial to my argument. Whether it's a b-tree, radix tree, binary tree, or some hybrid is also immaterial.&lt;br&gt;&lt;br&gt;So, I guess I just flat out don't understand your point. I'm saying in certain circumstances, a hash is a far better data structure than a tree, whatever kind of tree that may be. Are you disagreeing with that? I just can't fathom how. 
Are you saying that a tree of some sort is *always* better than a hash? If so, why do you think ComSci profs teach it? Merely to confuse students?&lt;br&gt;&lt;br&gt;I mean, in none of my comments have I been strident about this. I'm just trying to say that in your otherwise nice writeup, you gloss over a point that needs to be addressed more carefully for those who are not as versed in data structures as others. And by reading the comments, there are many who just aren't as well trained as we may hope. Don't mislead them. That's all I'm saying. Okay?</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Ray Lee</dc:creator><pubDate>Wed, 23 Jan 2008 00:39:01 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323579</link><description>Judy is disgusting.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Thomas Ptacek</dc:creator><pubDate>Wed, 23 Jan 2008 00:21:47 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323578</link><description>hey tom. going to explain judy trees next? ;-)</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">dugsong</dc:creator><pubDate>Tue, 22 Jan 2008 23:05:22 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323577</link><description>Ray, all you're pointing out is that loading whole 64 bit pointers out of memory and following them to some random other region of memory causes cache misses. 
You seem insistent on defining "tree structures" as "structs with two child pointers", and "memory" as "what malloc returns".&lt;br&gt;&lt;br&gt;You're obviously a smart guy. Why am I having such a hard time getting you past that idea?</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Thomas Ptacek</dc:creator><pubDate>Tue, 22 Jan 2008 19:22:01 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323540</link><description>&lt;i&gt;You’re really comparing external search to in-memory search? Come on, Ray.&lt;/i&gt;&lt;br&gt;&lt;br&gt;Wow. I must be &lt;b&gt;really&lt;/b&gt; miscommunicating here if that's all you took away from that. It was an example to motivate your intuition.&lt;br&gt;&lt;br&gt;But to go with it for a moment, yes, I am, as they have parallels. A cache miss in memory is akin to a head seek on a drive. To see why, just think about how the L2 cache is the same as the track cache. The L1 is equivalent to whether a sector is in the RAM cache or not. But this was merely an example. Discard it if it offends you so deeply.&lt;br&gt;&lt;br&gt;The point is that cache misses are &lt;i&gt;expensive&lt;/i&gt; on modern processors. Back when I started on the 6502 et al (1979), there was no difference. Nowadays, there's a &lt;i&gt;massive&lt;/i&gt; difference between code that makes many memory accesses or few, and whether those memory accesses are spread out randomly or in order.&lt;br&gt;&lt;br&gt;Try it. Make a large 2-d array of integers (say 1024x1024), and try walking the array in column order first versus row order first, and timing it. 
&lt;br&gt;&lt;br&gt;Anyway, if you don't want your intuition motivated, well then, I guess I'm just going to have to beg you to measure the speed difference between walking a tree and looking up a key in hash.&lt;br&gt;&lt;br&gt;And for the third time, there are times when hashes are completely worthless -- such as partial path lookups and needing to traverse neighbors (L/R/Parent) in an ordered collection. But for many things you're generally going to be better off with a hash.&lt;br&gt;&lt;br&gt;Don't take my word for it. &lt;i&gt;Go measure it.&lt;/i&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Ray Lee</dc:creator><pubDate>Tue, 22 Jan 2008 17:28:57 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323576</link><description>You're really comparing external search to in-memory search? Come on, Ray.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Thomas Ptacek</dc:creator><pubDate>Tue, 22 Jan 2008 16:21:12 -0000</pubDate></item><item><title>Re: Aguri: Coolest Data Structure You&amp;#8217;ve Never Heard Of</title><link>http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/#comment-2323575</link><description>&lt;i&gt;(2) See N comments ago where I addressed the received wisdom that trees are necessarily hard on memory and cache performance. No, your trees are.&lt;/i&gt;&lt;br&gt;&lt;br&gt;Sigh. &lt;b&gt;If&lt;/b&gt; you're not doing prefix or neighbor lookups, then a tree is never a win in memory due to cache effects. It's &lt;i&gt;especially&lt;/i&gt; not a win on a disk drive as you're causing more seeks.&lt;br&gt;&lt;br&gt;This isn't wisdom whispered in halls, this is just plain measurable on any test.&lt;br&gt;&lt;br&gt;And yes, I know about optimizing trees to deal with cache effects. 
I wrote a bone-head boggle solver that creates a ternary tree of all the words, so that I could quickly prune search paths. In part of the optimization of the algorithm, I measured cache misses (to disk, as I had serialized the tree and mapped it directly into memory from disk so that it wouldn't have to be re-run on every invocation). Part of what came up immediately was it was important to create the tree in a nice order so that after the first few comparisons, everything was under a page size. This is your point that you make above.&lt;br&gt;&lt;br&gt;But, bottom line, if you have to traverse a tree structure to merely see whether or not a key exists (or just acquire the associated leaf data), it will always be slower, and worse on the entire system (due to cache effects) than a hash.&lt;br&gt;&lt;br&gt;I like tries. I use tries. But they aren't always appropriate and you're doing your readers a disservice if you tell them any differently.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Ray Lee</dc:creator><pubDate>Tue, 22 Jan 2008 15:30:20 -0000</pubDate></item></channel></rss>