The 10th anniversary of the Google search incident that incorrectly classified the entire World Wide Web as malware is another opportunity to reflect upon computer system defects, human error, process flaws, organizational mistakes, and the best principles and practices for solutions in the IT industry. In this blog and my upcoming book, Bugs: A Short History of Computer System Failure, I will chronicle some important system failures of the past and discuss ideas for improving the future of system quality. As information technology becomes increasingly woven into Life, the quality of hardware and software impacts our commerce, health, infrastructure, military, politics, science, security, and transportation. The Big Idea is that we have no choice but to get better at delivering technology solutions because our lives depend on it.

On 31 January 2009, a Google engineer manually updated the search engine’s blacklist of sites classified as malware to include the URL ‘/’; this change meant that every organic Google search result for the entire World Wide Web (WWW or Web) was incorrectly classified as malware. Fortunately, Google’s on-call Site Reliability Engineering (SRE) team quickly identified the problem and fixed it within an hour. Besides affecting organic search results, the system error also impacted Google’s email service, Gmail, in which users reported genuine messages routed to spam folders; interestingly, advertised or promoted search results were not affected by the error. This essay explores some of the business and technology factors that contributed to the system defect, the incident’s timely resolution, and the wider implications for the Web, search, and malware classification.

[Figure. Source: VisualCapitalist.com]

According to multiple sources including JumpShot, Netmarketshare.com, and Statista.com, Google has 60–80% of the market share for web search traffic depending on the country. Google is also the default search engine on most smartphones running the Android operating system; according to Gartner Research and Statista.com, Android has held about 85% market share since 2017. If one also accounts for its sister properties such as Google Images, Maps, and YouTube, then Google holds an impressive 90% market share of web, mobile, and in-app searches. There are some potential threats on the horizon to Google’s dominance in Search; they range from Amazon’s Alexa and Echo devices used to search for and buy products, to users spending more time on Facebook, and even to some users opting out of data sharing entirely through ad- and cookie-blocking browser plugins. In the end though, Google handles 3.5 billion searches per day, has more than 1.5 billion unique users, and earns about $32B annually in advertising revenue from search.

Malware is software designed to intentionally cause harm to an individual user, a computing device, or a larger network of nodes by attacking the system’s availability, confidentiality, or integrity. There are different types of malware such as computer viruses, worms, spam, Trojan horses, ransomware, spyware, adware, and others. What began out of curiosity and fun when the Internet was an academic computing environment has now turned into malice and profit because malware means big business and serious trouble for corporations, governments, and individuals across the world. According to various computer security reports from McAfee, the Center for Strategic and International Studies (CSIS), IBM, the Ponemon Institute, and Symantec, there are several cybercrime statistics one should be concerned about:

The cost of cybercrime was estimated by McAfee and CSIS at $600B in 2018, almost 0.8% of global GDP and up from $500B in 2014.

Mobile malware attacks increased by 54% in 2017 according to Symantec, and 3rd party app stores (i.e. not Apple or Google) are the source of 99.9% of discovered mobile malware.

Nearly 60M Americans have been affected by identity theft according to a 2018 online survey conducted by the Harris Poll, and about 140M people worldwide (about 2% of all people around the world) according to ENISA in Europe. These identities are used to perpetrate various crimes and frauds of impersonation involving credit cards, utilities, banking, loans, and government documents. Since 2014, almost 3 billion internet credentials and other PII have been stolen by hackers.

The largest sources of cyber attacks in 2017 were China (20%), USA (11%), and Russia (6%); however, Iran and North Korea are growing state sponsors of cyberterrorism.

Cryptocurrency mining is a huge growth area in cybercrime, with detections of cryptojacking on endpoint computers surging 8500% according to Symantec as criminal botnets try to add new machines such as your computer to their resource pool. Tor and Bitcoin have facilitated the growth of the Dark Web and are some of the preferred tools for cybercriminals.

The cost of the average global corporate data breach is almost $2M and the average time to detect and react to a breach is 196 days according to IBM and Ponemon. The costs include funds to help victims with losses, notification expenses, as well as business disruption, customer turnover, revenue forfeiture, and reputation damage.

There are over 300,000 new malware variants, 33,000 phishing attacks, and 80 billion malicious scans detected every day according to data compiled by AV-Test and McAfee.

Microsoft Office file formats such as Word, PowerPoint, and Excel are the most prevalent group of malicious file extensions, comprising 38% of the total.

[Figure: Google Search Warning for Malware]

So with great power comes great responsibility. Through the Stopbadware.org initiative since 2006, Google has partnered with the likes of Consumer Reports, Mozilla, PayPal, Verisign, Verizon, and others to prevent, mitigate, and remediate malware websites.
StopBadware receives data from different content and hosting providers, defines criteria for classifying malware sites, maintains a common clearinghouse of URLs blacklisted by community members, aggregates malware statistics, manages the appeal process if a site is blocked by providers, and publishes advisory documents and best practices to reduce the incidence of malware.

Although Google supports StopBadware through data sharing, participates in its working groups, and contributes financially to the organization, Google’s Safe Browsing Initiative and Secure Web APIs are separate services that use Google’s own private blacklist curated by both man and machine. This list is periodically updated, and on 31 January 2009, a Google engineer accidentally added and committed the “/” URL to the blacklist, and Google’s system interpreted this URL to match all Web URLs. Twitter was briefly ablaze and abuzz with people reporting the error using the hashtags #googmayharm and #googmayhem. The warning message in Google’s organic search results also linked to Stopbadware.org, and the torrent of users clicking the link amounted to a DDoS on its website. Users could still copy and paste links into the URL field and visit the sites manually, but the widespread perception on that Saturday morning was that the Web was experiencing a malware catastrophe.

The good news for Google and the Web was that the Google SRE team was on call, actively monitoring and supporting its cloud services. The SRE team was notified of user complaints, identified the root cause, communicated a response to the global community through its blog and Twitter, reverted the blacklist change, and deployed the updated configuration to its services. Google’s search services, like much of its cloud platform, are distributed on servers located across the world, so the blacklist configuration update was released in a staggered and rolling fashion. The search errors began appearing between 6:27 and 6:40 AM PST when the blacklist was initially changed and then began disappearing between 7:10 and 7:25 AM when that change was reverted.

While this story is about a negative incident involving Google, there are several positive lessons to be learned for IT professionals.

Customer Service matters in a crisis, and investing in a strong support and operations organization for your IT solutions can make the difference when problems inevitably unfold per Murphy’s Law. Often the operations capability is treated as overhead by accounting and IT departments; however, Google’s SRE team is first class, a key ingredient in the secret sauce of success for the company as a leading cloud service provider, and it worked fast to identify the root cause, to communicate acknowledgement of the issue as well as ongoing updates, and to resolve the problem.

DevOps automation is a major competitive advantage for IT departments and companies that are willing to invest in their personnel, platforms, and processes. In less than thirty minutes, Google was able to reliably deploy a production configuration update to its servers distributed across 16 data centers. If you and your organization are new to the DevOps capability, then focus on two things: first on one-touch automation of the deployment pipeline and second on production environment monitoring. IT processes can execute faster while reducing costs, defects, and risks. The business can be more agile, which means faster time-to-market as well as higher potential revenue growth and market share. Sure, there are initial and ongoing costs to a DevOps capability in terms of new skills and tools, but on balance the long-term ROI comes from the ability to reliably deliver new features and fixes to customers at a higher frequency, and from the productive hours saved in deployments and in reacting to production issues. For more information about the multifaceted benefits of DevOps, check out DORA’s State of DevOps research report and read The DevOps Handbook, co-written by Gene Kim, Jez Humble, Patrick Debois, and John Willis.

Quality Assurance is not so simple. The “/” URL is syntactically and semantically valid, but it did not make sense in the specific context of the malware blacklist. It is a configuration data entry error that is obscure and subtle, and while Google surely had automated tests for the core components of the Safe Browsing Services API, the company did not test these blacklist configuration changes in a staging environment. Having visited Google’s headquarters in Mountain View for work, I can attest to the cultural importance of QA: written articles about Quality Assurance and best practices for verifying system integrity are published on the walls of cafeterias, hallways, and yes, even the bathrooms. The lesson here is one about humility and separation of concerns. It motivates the business need to have distinct teams developing and testing software. The former wants things to work and sometimes believes that truth to a fault, while the latter wants to break things to make things better.

In subsequent articles, I will discuss specific system incidents involving malware that resulted in security breaches as well as strategies and tactics for preventing and reacting to these events.

Enjoy the article? Follow me on Medium and Twitter for more updates.
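
To make the Quality Assurance lesson above a little more concrete, here is a minimal Python sketch of how a lone “/” entry can swallow every URL, and of a pre-commit sanity check that would reject it. Google has not published its matching logic, so the prefix-matching rule, the function names `is_flagged` and `validate_entry`, the `KNOWN_GOOD` sample, and the `max_ratio` threshold are all assumptions made purely for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical sample of URLs that must never be flagged (illustrative assumption).
KNOWN_GOOD = [
    "https://www.example.com/",
    "https://example.org/docs/index.html",
    "http://example.net",
]


def is_flagged(url, blacklist):
    """Return True if any entry is a prefix of the URL's host+path or of its path.

    This prefix rule is an assumed model, not Google's actual matcher.
    """
    parts = urlsplit(url)
    path = parts.path or "/"  # treat a bare host as the root path
    host_and_path = parts.netloc.lower() + path
    return any(
        host_and_path.startswith(entry) or path.startswith(entry)
        for entry in blacklist
    )


def validate_entry(entry, known_good=KNOWN_GOOD, max_ratio=0.0):
    """Raise ValueError if a candidate entry flags more known-good URLs than allowed."""
    if not entry.strip():
        raise ValueError("empty blacklist entry")
    if not known_good:
        raise ValueError("need at least one known-good URL to test against")
    hits = sum(is_flagged(url, [entry]) for url in known_good)
    ratio = hits / len(known_good)
    if ratio > max_ratio:
        raise ValueError(
            f"entry {entry!r} flags {hits} of {len(known_good)} known-good URLs"
        )
    return ratio
```

Under these assumptions, `validate_entry("/")` raises a `ValueError` because every URL path begins with “/”, while a narrowly scoped entry such as `validate_entry("malware.example/downloads")` passes. The design choice is to test a configuration change against data that must stay clean before it ships, which is the kind of staging check the paragraph above argues for.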
References

https://googleblog.blogspot.com/2009/01/this-site-may-harm-your-computer-on.html
https://www.stopbadware.org/blog/2009/01/31/google-glitch-causes-confusion
https://www.theregister.co.uk/2009/01/31/google_malware_snafu/
https://arstechnica.com/information-technology/2009/01/google-broke-the-internet-malware-detector-went-haywire/
https://techcrunch.com/2009/01/31/google-flags-whole-internet-as-malware/
https://www.visualcapitalist.com/this-chart-reveals-googles-true-dominance-over-the-web/
https://www.consumerwatchdog.org/blog/googles-dominance
https://www.netmarketshare.com/search-engine-market-share.aspx
https://www.av-test.org/en/statistics/malware/
https://www.datacenterknowledge.com/archives/2017/03/16/google-data-center-faq
https://www.varonis.com/blog/cybersecurity-statistics/
https://sparktoro.com/blog/new-jumpshot-2018-data-where-searches-happen-on-the-web-google-amazon-facebook-beyond/
https://www.accenture.com/t20170926T072837Z__w__/us-en/_acnmedia/PDF-61/Accenture-2017-CostCyberCrimeStudy.pdf
https://www.mcafee.com/enterprise/en-us/assets/reports/restricted/rp-economic-impact-cybercrime.pdf