
Oct 29, 2019

Welcome to the History of Computing Podcast, where we explore the history of information technology. Because understanding the past prepares us for the innovations of the future! Today we’re going to cover one of the most important and widely distributed server platforms ever: the Apache Web Server.

Today, Apache servers account for around 44% of the 1.7 billion web sites on the Internet. And this is crazy: that’s down from over 70% in 2010. But at one point it was zero. Tim Berners-Lee had put the first website up in 1991 and what we now know as the web was slowly growing.

Our story begins in 1994 at the National Center for Supercomputing Applications at the University of Illinois, Urbana-Champaign. Yup, NCSA is also the organization that gave us telnet and Mosaic, the web browser that would evolve into Netscape. NCSA also had a web server, NCSA HTTPd, written by Robert McCool. After Rob left NCSA, development of the HTTP daemon went a little, um, dormant. The distributions had forked, and the extensions and bug fixes needed to get merged into a common distribution.

Apache is a free and open source web server written in C. It was first released in 1995, about five years after Berners-Lee coined the term World Wide Web, and it grew out of the HTTPd code initially created by Robert McCool. You can’t make that name up. I’d always pictured him as a cheetah wearing sunglasses. Who knew that he’d build a tool that would host half of the web sites in the world? A tool that would go on to be built into plenty of computers so they can spin up sharing services. Times have changed since 1995.

Originally the name was supposedly a cute reference to “a patchy server,” given that it was based on lots of existing patches of craptastic code from NCSA. That NCSA HTTPd heritage is still alive and well, all the way up to the configuration files. For example, on a Mac these are stored at /private/etc/apache2/httpd.conf.

The original Apache group consisted of:
* Brian Behlendorf
* Roy T. Fielding
* Rob Hartill
* David Robinson
* Cliff Skolnick
* Randy Terbush
* Robert S. Thau
* Andrew Wilson

And there were additional contributions from Eric Hagberg, Frank Peters, and Nicolas Pioch.

Within a year of that first shipping, Apache had become the most popular web server on the internet. The distributions and sites continued to grow to the point that the group formed the Apache Software Foundation, which would give financial, legal, and organizational support for Apache. They even started bringing other open source projects under that umbrella. Projects like Tomcat. And the distributions of Apache grew. Mod_ssl, which brought the first SSL functionality to Apache 1.3, was released in 1998. And it grew. The Apache Foundation came in 1999 to make sure the project outlived the participants and to bring other tools under the umbrella. The first conference, ApacheCon, came in 2000. Douglas Adams was there. I was not. There were 17 million web sites at the time. The number of web sites hosted on Apache servers continued to rise. Apache 2 was released in 2002. And the number kept rising. By 2009, Apache was hosting over 100 million websites.

By 2013 the project had added to its story that it was named “out of a respect for the Native American Indian tribe of Apache”. The history isn’t the only thing that was rewritten. Apache itself was rewritten and is now distributed as Apache 2. There were over 670 million web sites by then. And we hit 1 billion sites in 2014. I can’t help but wonder what percentage are collections of fart jokes. Probably not nearly enough. But an estimated 75% are inactive sites.

The job of a web server is to serve web pages on the internet. Those were initially flat HTML files but have gone on to include CGI, PHP, Python, Java, JavaScript, and others. A web browser is then used to interpret those files. It requests the .html or .htm file (or one of the many other file types that now exist), opens the page, and then loads the text, images, and included files and processes any scripts.
Both use the HTTP protocol; thus the URL begins with http, or https if the site is being hosted over SSL. Apache is responsible for providing access to those pages over that protocol. The way the scripts are interpreted is through mods. These include mod_php, mod_python, mod_perl, etc. The modular nature of Apache makes it infinitely extensible. OK, maybe not infinitely. Nothing’s really infinite. But the dynamically loadable modules do make the system more extensible. For example, you can easily get TLS/SSL using mod_ssl. The great thing about Apache and its mods is that anyone can adapt the server for generic uses, and they also let you address some pretty specific needs. And the server as well as each of those mods has its source code available on the Interwebs. So if it doesn’t do exactly what you want, you can conform the server to your specific needs. For example, if you wanna’ hate life, there’s a mod for FTP.

Out of the box, Apache logs connections, includes a generic expression parser, supports WebDAV and CGI, can support embedded Perl, PHP, and Lua scripting, can be configured for per-user public_html web pages, supports .htaccess to limit access to various directories as one of a few authorization access controls, and allows for very in-depth custom logging and log rotation. Those logs include things like the name and IP address of a host as well as geolocations. It can rewrite headers, URLs, and content. It’s also simple to enable proxies.

Apache, along with MySQL, PHP, and Linux, became so popular that the term LAMP was coined, short for those products. The prevalence allowed the web development community to build hundreds or thousands of tools on top of Apache through the 90s and 2000s, including popular Content Management Systems, or CMS for short, such as WordPress, Mambo, and Joomla.
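To make the basic job of a web server a little more concrete, here is a minimal sketch in Python, using only the standard library. It is not how Apache is implemented; it just shows the two basics mentioned above: handing a flat HTML page to a client over HTTP, and logging each connection. The names PAGE, TinyHandler, and fetch_once are made up for illustration, and the server listens on a throwaway local port.

```python
import http.client
import http.server
import threading

# Hypothetical flat HTML page; a real server would read this from disk.
PAGE = b"<html><body><h1>Hello from a tiny web server</h1></body></html>"


class TinyHandler(http.server.BaseHTTPRequestHandler):
    """Serve one flat HTML page and keep a simple in-memory access log."""

    access_log = []  # entries are (client_ip, path, status)

    def do_GET(self):
        status = 200 if self.path in ("/", "/index.html") else 404
        body = PAGE if status == 200 else b"Not Found"
        # Record the connection before replying, like an access log line.
        self.access_log.append((self.client_address[0], self.path, status))
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default stderr logging; we keep our own log


def fetch_once(path="/index.html"):
    """Start the server on a free local port, request one path, shut down."""
    server = http.server.HTTPServer(("127.0.0.1", 0), TinyHandler)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)  # the browser's side of the conversation
        response = conn.getresponse()
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = fetch_once("/index.html")
    print(status, len(body), TinyHandler.access_log)
```

Everything Apache adds on top of this, such as mods, virtual hosts, access controls, and log rotation, is what makes it a production web server rather than a toy.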
Other features include:
* Auto-indexing and content negotiation
* Reverse proxy with caching
* Multiple load balancing mechanisms
* Fault tolerance and failover with automatic recovery
* WebSocket, FastCGI, SCGI, AJP and uWSGI support with caching
* Dynamic configuration
* Name- and IP address-based virtual servers
* gzip compression and decompression
* Server Side Includes
* User and session tracking
* Generic expression parser
* Real-time status views
* XML support

Today we have several web servers to choose from. Engine-X, spelled Nginx, is a newer web server that was initially released in 2004. Apache uses a thread per connection and so can only process as many connections as there are threads available; by default 10,000 in Linux and macOS. Nginx uses an event-driven design rather than a thread per connection, so it scales differently, and is used by companies like Airbnb, Hulu, Netflix, and Pinterest. That 10,000 limit is easily controlled using concurrent connection limiting, request processing rate limiting, or bandwidth throttling. You can also scale with some serious load balancing, using in-band health checks and one of the many load balancing mechanisms. Having said that, Baidu.com, Apple.com, Adobe.com, and PayPal.com: all Apache. We also have other web servers provided by cloud services like Cloudflare and Google slowly increasing in popularity. Tomcat is another web server. But Tomcat is almost exclusively used to run various Java technologies: JavaServer Pages, servlets, EL, WebSockets, etc.

Today, each of the open source projects under the Apache Foundation has a Project Management Committee. These provide direction and management of the projects. New members are added when someone who contributes a lot to the project gets nominated to be a contributor and then a vote is held requiring unanimous support. Commits require three yes votes with no no votes. It’s all ridiculously efficient in a very open source hacker kinda’ way.

The Apache server’s impact on the open-source software community has been profound.
It is partly explained by the unique license from the Apache Software Foundation. The license was in fact written to protect the creators of Apache while giving others access to the source code to hack away at it. The Apache License 1.1 was approved in 2000 and removed the requirement to include attribution in advertising materials for software that used it. Version two of the license came in 2004, which made the license easier to use for projects that weren’t from the Apache Foundation. It also improved GPL compatibility and allowed a single reference for the whole project rather than attributing software in every file.

The open source nature of Apache was critical to the growth of the web as we know it today. There were other projects to build web servers for sure. Heck, there were other protocols, like Gopher. But many died because of stringent licensing policies. Gopher did great until the University of Minnesota decided to charge for it. Then everyone realized it didn’t have nearly as good of graphics as the web did.

Today the web is one of the single largest growth engines of the global economy. And much of that is owed to Apache. So thanks Apache, for helping us to alleviate a little of the suffering of the human condition for all creatures of the world. By the way, did you know you can buy hamster wheels on the web? Or cat food. Or flea meds for the dog. Speaking of which, I better get back to my chores. Thanks for taking time out of your busy schedule to listen! You probably need to get to your chores as well, though. Sorry if I got you in trouble. But hey, thanks for tuning in to another episode of the History of Computing Podcast. We’re lucky to have you. Have a great day!