Introduction by Jim Koch -
Eric Brewer is the co-founder and chief scientist at Inktomi, one of the recent great success stories in the internet arena.  He received his Ph.D. in parallel computing from MIT in 1994.  He’s currently an assistant professor of electrical engineering and computer science at the University of California at Berkeley.  His studies include a focus on internet infrastructure, internet security and mobile computing.  Certainly the work that Eric is involved with in the internet area will provide us with some considerable insight on what we’re going to do with all this storage capacity we have, in markets that are yet to be created.



ERIC BREWER


 
    I actually have very little experience in the disk drive industry.  I’m a user, that’s about it.  But I’m a user of disk drives in an unusual way, I think, in a way that will become more and more relevant to the evolution of the industry.  I’ll talk about how we use the disk in the internet, and also what I would like to see out of disk drives in the future, and it’s not areal density.  So a little background: this chart shows the growth of the Internet measured in backbone traffic.  For reference, the black line at the bottom is 100% annual growth.  That’s a very interesting graph; it means that Moore's law is below the black line.  If you say we’re gonna build the internet and we’re gonna ride Moore's law to keep up, there’s no chance.  Which means several things.  It means that over time you need more processors on the internet, because processors aren’t going to get faster fast enough.  The same is true for disk drives.  Even at a 200% or 100% per year increase in density, it won’t keep up.  Which is good, since that means selling more disk drives.
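[Editor's note: the arithmetic behind "Moore's law is below the black line" can be sketched roughly as follows.  The 18-month doubling period is an assumption, since the talk does not give one, and the function names are made up for illustration.  Demand is set to the 100% reference line, which the chart suggests is a lower bound on actual traffic growth.]

```python
def annual_growth(doubling_months: float) -> float:
    """Yearly growth factor for a quantity that doubles every `doubling_months`."""
    return 2.0 ** (12.0 / doubling_months)


def units_needed(demand_growth: float, per_unit_growth: float, years: int) -> float:
    """Relative number of processors (or drives) needed after `years`
    when demand grows faster than the capability of a single unit."""
    return (demand_growth / per_unit_growth) ** years


# Assumption: Moore's law taken as a doubling every 18 months.
MOORE_DOUBLING_MONTHS = 18
moore = annual_growth(MOORE_DOUBLING_MONTHS)

# The chart's black reference line: 100% annual growth, i.e. a factor of 2 per year.
reference_line = 2.0

print(f"Moore's law annual growth: {100 * (moore - 1):.0f}%")
for years in (1, 3, 5):
    ratio = units_needed(reference_line, moore, years)
    print(f"after {years} year(s): {ratio:.2f}x as many units needed")
```

Under these assumptions a single processor improves by less than 100% a year, so the number of units has to grow every year just to match the reference line, and faster still to match traffic that sits above it.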
    So we’ll talk about that, but Inktomi exists because of this picture.  Inktomi’s strength basically is cluster computing.  So we know how to make very large computers, and I’ll show you one in a little bit.  They act as one simple giant computer that can grow at a pace that’s faster than Moore’s law.
    So a short company overview.  It is INKT on NASDAQ.  It came out of UC Berkeley.  It was founded by myself and one of my graduate students, actually my first graduate student.  I found out today that we are now 260 employees, a sign of fast growth.  We were at 80 earlier in the year, so it’s kind of scary.  We have 3 applications, and we are well known mostly for search engines.  If you use HotBot or Yahoo or Snap, those are all Inktomi search engines.  We provide the search capacity they actually use to deliver their products.  So we are an OEM search provider.  Actually an OEM in general for all these products.
    Network caching, which I will talk about, has to do with making the internet faster.  Online shopping, which you keep hearing about in the news, allows you to buy things online, which fundamentally will be more efficient than any other distribution mechanism.  It’s a very nice use of the internet.  So we have lots of partners.  I’ll talk about a few of them a little bit.  But that’s kind of the big view.
    I’ll talk about the two applications that use lots of disk.  The first one is the search engine, and this is actually the most telling picture.  The way the search works, if you do a search, starts out pretty normal.  You would go to, say, Yahoo, and type in your query.  Then what actually happens to it is not intuitive.  So it goes to Yahoo, which I believe is from around here, in Mountain View.  Their office and their computers are in different places.  Then from Yahoo you actually currently go to Virginia.
    So Yahoo gets your query and they send it to Inktomi in Virginia, even though there is a cluster in Santa Clara that we will talk about in a little bit.  The cluster gives them back the answer in real time and they then do a presentation, which is to say they convert it to HTML, and they insert whatever advertisement they’d like and whatever other things, little icons and such, that show up on the page.  Maybe it’s Yahoo to get your stock quotes or something like that.  So they actually do the presentation and then you get your answer.  So there are actually quite a few steps in that.  There’s a step to Yahoo, then a step from Yahoo to Inktomi.  It turns out most search engines work this way.  In fact what is interesting is that there is one big cluster that does most of the searches on the internet.  It’s actually not very far from here, it’s off the Great America highway, I don’t know exactly, but about a mile or 2 miles from here.
    That main cluster now has 166 nodes, and each of those has 2 CPUs.  When I say a large virtual machine, what I mean is a machine that’s got 166 nodes in it, more than 300 processors that solve the search engine queries.  It works as a virtual computer.  It also has a lot of disks; this is where I am gonna go with this.  But I think what is interesting is that this picture implies that there is an infrastructure being built in the background that actually does what we call the heavy lifting.  And it’s important because it is not only a centralization of CPU resources but a centralization of disk resources.  In fact I think that the most important trend from the internet for the disk drive industry is that most storage that end users own won’t be in their house, it won’t be in their laptops.  It’s gonna be in the infrastructure.  This means that the market is going to have to shift a little bit.
    So how do we do this?  I actually wanted to use this old picture to talk about the difference.  So what we do is take a cluster of commodity nodes (I’ll show you a picture in a minute), typically workstations like a 2-processor Sun workstation.  We spread the database across the cluster.  That is actually the interesting part, because that has been historically very difficult to do.  So what happens is, when a query comes in it can go to any of the nodes in the cluster, and that node will give the query to all the nodes, in this case 166 nodes.  They will in parallel perform that query.  They will return their partial answers to the node that started the query, which collates them, picks the top 10 and actually returns that to the user.
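[Editor's note: the query flow described above (one node fans the query out to every node, each node searches its own slice of the database, and the starting node collates the partial answers and keeps the top 10) can be sketched roughly as follows.  This is an illustration only, not Inktomi's implementation: the in-memory shards, the word-count scoring and the function names are all assumptions, and threads stand in for separate machines.]

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

TOP_K = 10  # the talk mentions returning the top 10 results


def search_shard(shard, query, k=TOP_K):
    """One node's work: score only its own documents and return its local top k.

    `shard` is a hypothetical dict of doc_id -> text; the score is simply the
    number of times the query word occurs (an assumption for illustration).
    """
    word = query.lower()
    scored = ((text.lower().split().count(word), doc_id)
              for doc_id, text in shard.items())
    return heapq.nlargest(k, (hit for hit in scored if hit[0] > 0))


def search_cluster(shards, query, k=TOP_K):
    """The node that received the query: fan out, then collate partial answers."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, query, k), shards))
    # Merge every node's partial answer and keep only the global top k.
    merged = heapq.nlargest(k, (hit for partial in partials for hit in partial))
    return [doc_id for _score, doc_id in merged]


if __name__ == "__main__":
    # Three tiny made-up shards standing in for the 166 nodes.
    shards = [
        {"a": "disk disk disk"},
        {"b": "disk cpu"},
        {"c": "cpu cluster"},
    ]
    print(search_cluster(shards, "disk"))
```

Because each node returns at most its own top 10, the collating node only has to merge a small list per node, however large each node's slice of the database is.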