What Dark Web markets are still up

Dr.AlbaNathan, United States, Researcher
Published Date: 09-08-2017
Dark Web Attribute System

1 Introduction

The weekly news coverage of excerpts from messages and videos produced and webcast by terrorists/extremists has shown that terrorists and extremists have become exploiters of the Internet beyond routine communication operations. The Internet has dramatically increased their ability to influence the outside world (Arquilla and Ronfeldt, 1993). Several virtues of the Internet, such as ease of access, anonymity of posting, huge audience, and lack of regulations, have enabled terrorists to speak directly to millions of people, both supporters and adversaries, with little chance of being detected. As posited by Jenkins (2004), through operating their own web sites and online forums, terrorists have effectively created their own “terrorist news network.”

Terrorist/extremist organizations have generated thousands of web sites that support psychological warfare, fundraising, recruitment, coordination, and distribution of propaganda materials. From those terrorist/extremist web sites, supporters can download multimedia training materials, buy games, T-shirts, and music CDs, and access forums and chat services such as PalTalk (Elison, 2000; Tekwani, 2002; Bowers, 2004; Muriel, 2004; Weimann, 2004). Some web sites, such as those associated with the jihad terrorist/extremist movement, are extremely dynamic in that they emerge overnight, frequently modify their contents, and then swiftly “disappear” by changing their URLs, which are later announced via online forums (Weimann, 2004). They are often hosted on free web space servers or by unsecured and poorly maintained commercial servers. Such web sites are technically supported by those who are Internet savvy to provide sophisticated propaganda images and videos via proxy servers to mask ownerships (Armstrong and Forde, 2003; El Deeb, 2004).
The level of technical sophistication of the Islamic terrorist/extremist organizations’ web sites has increased according to Katz, who monitors Islamic fundamentalist Internet activities (Internet Haganah, 2005). The rapid proliferation and increased sophistication of web sites and online forums run by terrorist/extremist organizations are indications of the growing popularity of the Internet in terrorism campaigns. They also indicate that there is a vast pool of sympathizers that such organizations have attracted, with some applying their IT expertise as contributions to the cause (Jesdanun, 2004).

Although this alternate side of the Internet, referred to as the “Dark Web,” has received extensive government and media attention, there is a dearth of empirical studies that examine the sophistication of terrorist/extremist organizations’ web sites and how they support strategic and tactical information operations. Therefore, some basic questions about terrorist/extremist organizations’ Internet usage remain unanswered. For example, what are the major Internet technologies that they have used on their web sites? How sophisticated and effective are the technologies in terms of supporting communications and propaganda activities?

In this chapter, we explore an integrated approach for collecting and monitoring terrorist-created web contents and propose a systematic content analysis approach to enable quantitative assessment of the technical sophistication of terrorist/extremist organizations’ Internet usage.

The rest of this chapter is organized as follows: In Sect. 2, we briefly review previous works on terrorists’ use of the Internet. In Sect. 3, we present our research questions and the proposed methodologies to study those questions. In Sect. 4, we describe the findings obtained from a case study of the analysis of technical sophistication, content richness, and web interactivity features of major Middle Eastern terrorist/extremist organizations’ web sites and a benchmark comparison of Middle Eastern terrorist/extremist web sites and web sites from the US government. In the last section, we provide conclusions and discuss the future directions of this research.

2 Literature Review

2.1 Terrorism and the Internet

Previous research showed that terrorists/extremists mainly utilize the Internet to enhance their information operations surrounding propaganda, communication, and psychological warfare (Thomas, 2003; Denning, 2004; Weimann, 2004). To achieve their goals, terrorists/extremists often need to maintain a certain level of publicity for their causes and activities to attract more supporters. Prior to the Internet era, terrorists/extremists maintained publicity mainly by catching the attention of traditional media such as television, radio, or print media. This was difficult for them because terrorists/extremists often could not meet the editorial selection criteria of those public media (Weimann, 2004). With the Internet, terrorists/extremists can bypass the requirements of traditional media and directly reach hundreds of millions of people globally, 24/7.

Terrorist/extremist groups have sought to replicate or supplement the communication, fundraising, propaganda, recruitment, and training functions on the Internet by building web sites with massive and dynamic online libraries of speeches, training manuals, and multimedia resources that are hyperlinked to other sites that share similar beliefs (Coll and Glasser, 2005; Weimann, 2004). The web sites are designed to communicate with diverse global audiences of members, sympathizers, media, enemies, and the public (Weimann, 2004).
Table 8.1 summarizes terrorist/extremist groups’ objectives and tasks that are supported by web sites.

2.2 Existing Dark Web Studies

In recent years, there have been studies of how terrorists/extremists use the web to facilitate their activities (Zhou et al., 2005; Chen et al., 2004; ISTS, 2004; Thomas, 2003; Tsfati and Weimann, 2002; Weimann, 2004). For example, researchers at the Institute for Security Technology Studies (ISTS) have analyzed dozens of terrorist/extremist organizations’ web sites and identified five categories of terrorists’ use of the web: propaganda, recruitment and training, fundraising, communications, and targeting. These usage categories are supported by other studies such as those by Thomas (2003), Katz at SITE Institute (2004), and Weimann (2004).

Since the late 1990s, several organizations, such as SITE Institute, the Anti-Terrorism Coalition (ATC), and the Middle East Media Research Institute (MEMRI), started to monitor contents from selected terrorist/extremist web sites for research and intelligence purposes. Tsfati and Weimann (2002) studied the content types and target audiences of terrorist/extremist organizations’ web sites by analyzing the content of 29 Middle Eastern web sites. Table 8.2 lists some of the organizations that capture and analyze terrorists/extremists’ web sites (and the collection start dates) grouped into three functional categories: archive, research center, and vigilante community.

Except for the Artificial Intelligence (AI) Lab, none of the enumerated organizations seem to use automated methodologies for both collection building and analysis of the web sites. Due to the enormous size and the dynamic nature of the web, the manual collection and analysis approaches have limited the comprehensiveness of their analyses.
Furthermore, none of the studies have provided empirical evidence of the levels of technical sophistication or compared terrorist/extremist organizations’ cyber capabilities with those of mainstream organizations. Since technical knowledge required to maintain web sites provides an indication of terrorist/extremist organizations’ technology adoption strategies (Jackson, 2001), we believe it is important to analyze the technologies required to maintain terrorist/extremists’ web sites from the perspectives of technical sophistication, content richness, and web interactivity.

Table 8.1 How web sites support objectives of terrorist/extremist groups

Objective: Enhance communication (Becker, 2005; Weimann, 2004)
  Tasks supported by web sites:
    • Composing, sending, and receiving messages
    • Searching for messages, information, and people
    • One-to-one and one-to-many communications
    • Maintaining anonymity
  Web features (Preece, 2000):
    • Synchronous (chat, video conferencing, MUDs, and MOOs) and asynchronous (e-mail, bulletin board, forum, and usenet newsgroup)
    • GUI
    • Help function
    • Feedback form
    • Login
    • E-mail address for webmaster and organization contact

Objective: Increase fundraising (Weimann, 2004)
  Tasks supported by web sites:
    • Publicizing need for funds
    • Providing options for collecting funds
  Web features (Preece, 2000):
    • Payment instruction and facility
    • E-commerce application
    • Hyperlinks to other resources

Objective: Diffuse propaganda (Weimann, 2004)
  Tasks supported by web sites:
    • Posting resources in multiple languages
    • Providing links to forums, videos, and other groups’ web sites
    • Using web sites as an online clearinghouse for statements from leaders
  Web features (Preece, 2000):
    • Content management
    • Hyperlinks
    • Directory for documents
    • Navigation support
    • Search, browsable index
    • Free web site hosting
    • Accessible

Objective: Increase publicity (Coll and Glasser, 2005; Jenkins, 2004)
  Tasks supported by web sites:
    • Advertising groups’ events, martyrs, history, and ideologies
    • Providing groups’ interpretation of the news
  Web features (Preece, 2000):
    • Downloadable files
    • Animated and flashy banner, logo, and slogan
    • Clickable maps
    • Information resources (e.g., international news)

Objective: Overcome obstacles from law enforcement and military (Coll and Glasser, 2005; Kelley, 2001)
  Tasks supported by web sites:
    • Send encrypted messages via e-mail, forums, or post on web sites
    • Move web sites to different servers so that they are protected
  Web features (Preece, 2000):
    • Anonymous e-mail accounts
    • Password-protected or encrypted services
    • Downloadable encryption software
    • E-mail security
    • Steganography

Objective: Provide recruitment and training (Weimann, 2004)
  Tasks supported by web sites:
    • Hosting martyrs’ stories, speeches, and multimedia that are used for recruitment
    • Using flashy logos, banners, and cartoons to appeal to sympathizers with specialized skills and similar views
    • Build massive and dynamic online libraries of training resources
  Web features (Preece, 2000):
    • Interactive services (e.g., games, cartoons, and maps)
    • Online registration process
    • Directory
    • Multimedia (e.g., videos, audios, and images)
    • FAQs, alerts
    • Virtual community

Table 8.2 Organizations that capture and analyze terrorists’ web sites

Archives
  1. Internet Archive (IA)
     Description: 1996 – Collect open access HTML pages (every 2 months)
     Access: Via http://www.archive.org

Research centers
  2. Anti-Terrorism Coalition (ATC)
     Description: 2003 – Jihad watch. Has 448 terrorist web sites and forums
     Access: Via http://www.atcoalition.net
  3. Artificial Intelligence (AI) Lab, University of Arizona
     Description: 2003 – Spidering (every 2 months) to collect terrorist web sites. Has thousands of web sites: US domestic, Latin America, and Middle Eastern web sites
     Access: Via test bed portal called Dark Web Portal (ai.arizona.edu)
  4. MEMRI
     Description: 2003 – Jihad and terrorism studies project
     Access: Access reports via http://www.memri.org
  5. SITE Institute
     Description: 2003 – Capture web sites every 24 h. Extensive collection of thousands of files
     Access: Access reports and fee-based intelligence services http://siteinstitute.org
  6. Weimann (University of Haifa, Israel)
     Description: 1998 – Capture web sites daily. Extensive collection of thousands of files
     Access: Closed collection

Vigilante community
  7. Internet Haganah
     Description: 2001 – Confronting the global Jihad project. Has hundreds of links to web sites
     Access: Provides snapshots of terrorist web sites http://haganah.us

2.3 Dark Web Collection Building

The first step toward studying the terrorist/extremist web presence is to capture terrorist web sites and store them in a repository for further analysis. Web collection building is the process of gathering and organizing unstructured information from pages and data on the web. Previous studies have suggested three types of approaches to collecting web contents in specific domains: manual approach, automatic approach, and semiautomatic approach.

In order to build the September 11 and Election 2002 Web Archives (Schneider et al., 2003), the Library of Congress collected seed URLs for a given theme. The seeds and their close neighbors (distance 1) were then downloaded. The limitation of such a manual approach is that it is time consuming and inefficient.

Albertsen (2003) used an automatic approach in the “Paradigma” project. The goal of Paradigma is to archive Norwegian legal deposit documents on the web. It employed a focused web crawler (Kleinberg et al., 1998), an automatic program that discovers and downloads web sites in particular domains by following web links found in the HTML pages of a starting set of web pages. Metadata was then extracted and used to rank the web sites in terms of relevance. The automatic approach is more efficient than the manual approach; however, due to the limitations of current focused crawling techniques, automatic approaches often introduce noise (off-topic web pages) into the collection.

The “Political Communications Web Archiving” group employed a semiautomatic approach to collecting domain-specific web sites (Reilly et al., 2003). Domain experts provided seed URLs as well as typologies for constructing metadata that can be used in the crawling process.
Their project’s goal is to develop a methodology for constructing an archive of broad-spectrum political communications over the web. We believe that the semiautomatic approach is most suitable for collecting terrorist/extremist web sites because it combines the high accuracy of the manual approach with the high efficiency of the automatic approach.

2.4 Dark Web Content Analysis

In order to reach an understanding of the various facets of terrorist/extremist web usage and communications, a systematic analysis of the web sites’ content is required. Researchers in the terrorism domain have used observation and content analysis to analyze web site data. In Bunt’s (2003) overview of jihadi movements’ presence on the web, he described the reaction of the global Muslim community to the content of jihadi terrorist web sites. His assessment of the influence such content had on Muslims and Westerners was based on a qualitative analysis of message contents extracted from Taliban and al-Qaeda web sites. Tsfati and Weimann (2002) conducted a content analysis of the characteristics of terrorist groups’ communications. They said that the small size of their collection and the descriptive nature of their research questions made a quantitative analysis infeasible.

Demchak et al. (2000) provided a well-defined methodology for analyzing communicative content in government web sites. Their work focused on measuring “openness” of government web sites. To achieve this goal, they developed a Web Site Attribute System (WAES) tool that is basically composed of a set of high-level attributes such as transparency and interactivity. Each high-level attribute is associated with a second layer of attributes at a more refined level of granularity. For example, the increase of “operational information” and “responses” on a given web page can induce an increase in the openness level of a government web site.
This WAES system is an example of a well-structured and systematic content analysis methodology.

Demchak et al.’s work provides guidance for this chapter. However, the “openness” attributes used in their work were designed specifically for e-government studies. We surveyed research in e-commerce, e-government, and e-education domains and identified several sets of attributes that could be used to study the technical advancement and effectiveness of terrorists/extremists’ use of the Internet.

Palmer and David’s (1998) study identified a set of 15 attributes (called “technical characteristics” in the original work) to evaluate two aspects of e-commerce web sites: technical sophistication and media richness. More specifically, the technical sophistication attributes measure the level of advancement of the techniques used in the design of web sites, e.g., “use of HTML frames,” “use of Java scripts,” etc. The media richness attributes measure how well the web sites use multimedia to deliver information to their users, e.g., “hyperlinks,” “images,” “video/audio files,” etc.

Another set of attributes called web interactivity has been widely adopted by researchers in e-government and e-education domains to evaluate how well web sites facilitate the communication among web site owners and users. Two organizations, the United Nations Online Network in Public Administration and Finance (UNPAN; www.unpan.org) and the European Commission’s IST program (www.cordis.lu/ist/), have conducted large-scale studies to evaluate the interactivity of government web sites of major countries in the world. The web interactivity attributes can be summarized into three categories: one-to-one-level interactivity, community-level interactivity, and transaction-level interactivity.
The one-to-one-level interactivity attributes measure how well the web sites support individual users to give feedback to the web site owners (e.g., provide e-mail contact, provide guest book functions, etc.). The community-level interactivity attributes measure how well the web sites support the two-way interaction between site owners and multiple users (e.g., use of forums, online chat rooms, etc.). The transaction-level interactivity measures how well users are allowed to finish tasks electronically on the web sites (e.g., online purchasing, online donation, etc.).

Chou’s (2003) study proposed a detailed four-level framework to analyze e-education web sites’ level of advancement and effectiveness. Attributes in the first level (called learner-interface interaction) of Chou’s framework are very similar to the technical sophistication attributes used in Palmer and David’s (1998) study. Attributes in the other three levels (learner-content interaction, learner-instructor interaction, and learner-learner interaction) of Chou’s framework are similar to the three-level web interactivity attributes used in the e-government evaluation projects as mentioned above.

To date, no study has employed the technical sophistication, media richness, and web interactivity attributes as well as the WAES framework in the terrorism domain. We believe that these web content analysis metrics can be applied in terrorist/extremist web site analysis to deepen our understanding of the terrorists’ tactical use of the web.

3 Proposed Methodology: Dark Web Collection and Analysis

The research questions postulated in this chapter are:

1. What design features and attributes are necessary to build a highly relevant and comprehensive Dark Web collection for intelligence and analysis purposes?
2. For terrorist/extremist web sites, what are the levels of technical sophistication in their system design?
3. For terrorist/extremist web sites, what are the levels of richness in their online content?
4. For terrorist/extremist web sites, what are the levels of web interactivity to support individual, community, and transaction interactions?

To study the research questions, we propose a Dark Web analysis tool which contains several components: a systematic procedure for collecting and monitoring Dark Web contents and a Dark Web Attribute System to enable quantitative analysis of Dark Web content (see Fig. 8.1).

[Fig. 8.1 The Dark Web collection-building and content analysis framework. Dark Web Collection Building: identify terrorist groups from authoritative sources; identify seed terrorist URLs; expand seed URLs; automatically collect terrorist web documents into the Dark Web collection. Dark Web Content Analysis, using the Dark Web Attribute System with Technical Sophistication (TS), Media Richness (MR), and Web Interactivity (WI): identify presence of TS, MR, and WI attributes from Dark Web sites through automatic and manual coding approaches; calculate Dark Web TS, MR, and WI scores; conduct benchmark comparison between the Dark Web collection and a U.S. government web collection.]

3.1 Dark Web Collection Building

The first step toward studying terrorists’ tactical use of the web is to build a high-quality Dark Web collection. To ensure the quality of our collection, based on our review of web collection-building methodologies, we propose to use a semiautomated approach to collecting Dark Web contents (Reid et al., 2004). Our collection-building approach contains the following steps (see Fig. 8.2 for graphical depiction):

1. Identify terrorist/extremist groups: Defining terrorism is complicated by the fact that people almost never define themselves as terrorists, and the use of the label by others often has political overtones.
We start the collection-building process by identifying the groups that are considered by authoritative sources as terrorist/extremist groups. The sources include government agency reports (e.g., US State Department reports, FBI reports, government reports from United Kingdom, Australia, Japan, and P. R. China, etc.), authoritative organization reports (e.g., Counter-Terrorism Committee of the UN Security Council, US Committee for A Free Lebanon, etc.), and studies published by terrorism research centers such as the Anti-Terrorism Coalition (ATC), the Middle East Media Research Institute (MEMRI), Dartmouth College, etc. Information such as terrorist group names, leaders’ names, and terrorist jargon is identified from the sources to create a terrorism keyword lexicon for use in the next step.

[Fig. 8.2 The Dark Web collection building approach. (1) Identify terrorist groups: government reports (FBI, US State Department, UN Security Council, etc.) and research centers (ATC, MEMRI, Dartmouth, Norwegian Research, etc.) feed a terrorism lexicon (organization names, leader names, slogans, special keywords). (2) Identify terrorist group URLs: research centers, government reports, and search engines (Google, Yahoo, etc.) yield the initial seed URLs. (3) Expand terrorist group URLs through link and forum analysis: in/out-link expansion, terrorist forum analysis, and URL filtering yield the expanded URLs. (4) Download terrorist site contents: an automatic web crawler downloads multilingual, multimedia web contents into the Dark Web collection.]

2. Identify terrorist/extremist group URLs: We manually identify a set of seed terrorist group URLs from two sources. First, terrorist group URLs can be directly identified from the authoritative sources and literature used in the first step. Second, terrorist group URLs can be identified by using the terrorism keyword lexicon to query major search engines on the web. The identified set of terrorist group URLs will serve as the seed URLs for the next step.

3. Expand terrorist/extremist URL sets through link and forum analysis: After identifying the seed URLs, out-links and in-links of the seed URLs are automatically extracted using link analysis programs. The out-links are extracted from the HTML contents of “favorite link” pages under the seed web sites. The in-links are extracted from the Google in-link search service through the Google API. Automatic out-link and in-link expansion is an effective way to expand the scope of our collection. We also have language experts who browse the contents of terrorist-supporting forums and extract the terrorist/extremist URLs posted by terrorist supporters. Because bogus or unrelated web sites can make their way into our collection through the expansion, we have developed a robust filtering process based on evidence and clues from the web sites. Aside from sites which explicitly identify themselves as the official sites of a terrorist organization or one of its members, a web site that contains even minor praise of or adopts ideologies espoused by a terrorist group is included in our collection.

4. Download terrorist/extremist web site contents: Once the terrorist/extremist web sites are identified, a program is used to automatically download all their contents. Unlike the tools used in previous studies, our program was designed to download not only the textual files (e.g., HTML, TXT, PDF, etc.) but also multimedia files (e.g., images, video, audio, etc.) and dynamically generated web files (e.g., PHP, ASP, JSP, etc.). Moreover, because terrorist organizations set up forums within their web sites whose contents are of special value to research communities, our program also can automatically log into the forums and download the dynamic forum contents.
The automatic downloading method allows us to effectively build Dark Web collections with millions of documents. This greatly increases the comprehensiveness of our Dark Web study.

To keep the Dark Web collection comprehensive and up-to-date, steps 2 to 4 are periodically repeated. Collections built using such a recursive procedure can also provide information about the evolution and diffusion of the Dark Web.

3.2 The Dark Web Attribute System (DWAS)

Instead of using observation-based qualitative analysis approaches (Thomas, 2003), we propose a systematic approach to enable the quantitative study of terrorist/extremist groups’ use of the web. The proposed Dark Web Attribute System is similar to the WAES framework in Demchak et al.’s study (2000). However, instead of the openness attributes used in WAES, our framework focuses on the attributes that could help us better understand the level of advancement and effectiveness of terrorists’ web usage, namely, technical sophistication attributes, content richness attributes (an extension of the traditional media richness attributes), and web interactivity attributes. Based on previous literature in e-commerce (Palmer and David, 1998), e-government (Demchak et al., 2000), and e-education domains (Chou, 2003), we selected 13 technical sophistication attributes, 5 content richness attributes, and 11 web interactivity attributes for our DWAS framework. A list of these attributes is summarized in Tables 8.3a to 8.3c.

Table 8.3 (a) Technical sophistication attributes

TS attributes                                Weights
Basic HTML techniques
  Use of lists                               1
  Use of tables                              2
  Use of frames                              2
  Use of forms                               1.5
Embedded multimedia
  Use of background image                    1
  Use of background music                    2
  Use of stream audio/video                  3.5
Advanced HTML
  Use of DHTML/SHTML                         2.5
  Use of predefined script functions         2
  Use of self-defined script functions       4.5
Dynamic web programming
  Use of CGI                                 2.5
  Use of PHP                                 4.5
  Use of JSP/ASP                             5.5

Table 8.3 (b) Content richness attributes

CR attributes                                Scores
Hyperlink                                    Hyperlinks
File/software download                       Downloadable documents
Image                                        Images
Video/audio file                             Video/audio files

Table 8.3 (c) Web interactivity attributes

WI attributes                                Weights
One-to-one interactivity
  E-mail feedback                            1.75
  E-mail list                                2.25
  Contact address                            1.25
  Feedback form                              2.75
  Guest book                                 1.5
Community-level interactivity
  Private message                            4.25
  Online forum                               4.25
  Chat room                                  4.75
Transaction-level interactivity
  Online shop                                4
  Online payment                             4
  Online application form                    4

1. Technical sophistication (TS) attributes: The technical sophistication attributes can be grouped into four categories as shown in Table 8.3a. The first category of four attributes, called the basic HTML technique attributes, measures how well the basic HTML layout techniques (i.e., lists, tables, frames, and forms) are applied in web sites to organize web contents. The second category, called the embedded media attributes, measures how well the web sites deliver their information to the user in multimedia formats such as images, animations, and audio/video clips. The third category of three attributes, called the advanced HTML attributes, measures how well advanced HTML techniques, such as DHTML and SHTML, and predefined and self-defined script functions (e.g., JavaScript, VBScript, etc.) are applied to implement security and dynamic functionalities. The last category, called the dynamic web programming attributes, measures how well dynamic web programming languages such as PHP, ASP, and JSP are utilized to implement dynamic interaction functionalities such as user login, online request or application, and online transaction processing. The four technical sophistication attribute categories and their associated subattributes are present in most of the Dark Web sites we collected. The presence of different attributes indicates different levels of technical sophistication.
For example, a web site which uses JSP techniques should be considered more technically sophisticated than a site which only uses static HTML. Different weights should be assigned to the attributes to reflect the differences (Chou, 2003). We determined the weights based on web experts’ opinions collected through an e-mail survey. Surveys were sent to webmasters and network administrators of several web sites belonging to the University of Arizona, and they were encouraged to forward the survey to their webmaster colleagues. In the survey, we asked the experts to give each of our attributes a weight of 1–10 (1 is the least advanced/sophisticated). Six experts sent their responses back to us. For each attribute, the average weight assigned by the experts was used in the final framework. Among the six experts, two are webmasters of academia web sites, two are webmasters of commercial web sites, one is a web developer in a commercial company, and the last one is a professor teaching web development courses in a university. On average, they have 7 years of professional experience in web technology. To ensure the reliability of the weights, we conducted a reliability test on the experts’ answers. The reliability score (Cronbach’s alpha) calculated for the experts’ answers was 0.89, which was well above the 0.70 required for acceptable scale reliability (Nunnally, 1978). The TS attributes and their weights are summarized in Table 8.3a.

2. Content richness (CR) attributes: In traditional media richness studies, researchers only focused on the variety of media used to deliver information (Trevino et al., 1987; Palmer and Griffith, 1998). However, to have a deep understanding of the richness of Dark Web contents, we would like to measure not only the variety of the media but also the amount of information delivered by each type of media. In this chapter, we expand the media richness concept by taking the volume of information into consideration.
More specifically, as shown in Table 8.3b, we calculated the average number of four types of web elements as the indication of Dark Web content richness: hyperlinks, downloadable documents, images, and video/audio files.

3. Web interactivity (WI) attributes: For the web interactivity attributes (see Table 8.3c), we followed the standard built by the UNPAN and the European Commission’s IST program as well as Chou’s (2003) work to group the attributes into three levels: the one-to-one-level interactivity, the community-level interactivity, and the transaction-level interactivity. The one-to-one-level interactivity contains five attributes (i.e., e-mail feedback, e-mail list, contact address, feedback form, and guest book) that provide basic one-to-one communication channels for Dark Web users to contact the terrorist web site owners (see Table 8.3c). The community-level interactivity contains three attributes (i.e., private message, online forum, and chat room) that allow Dark Web site owners and users to engage in synchronized many-to-many communications with each other. The transaction-level interactivity contains three attributes (i.e., online shop, online payment, and online application form) that allow Dark Web users to complete tasks such as donating to terrorist/extremist groups, applying for group membership, etc. The presence of these attributes in the Dark Web sites indicates how well terrorists/extremists utilize Internet technology to facilitate their communication with their supporters.

Similar to the TS attributes, different weights should be assigned to the WI attributes to indicate their different levels of support on communications. We asked web experts to assign weights of 1 to 10 to the WI attributes in the same e-mail survey where the TS attributes’ weights were determined. The WI attributes and their weights are summarized in Table 8.3c.
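As a rough illustration of the weighting scheme described above, the following Python sketch averages expert ratings per attribute, checks their consistency with Cronbach's alpha, and scores a site from the attributes it exhibits. The TS weights are copied from Table 8.3a. Everything else is an assumption made for illustration only: the function names, the attribute keys, the choice to treat each expert as one "item" when computing alpha, and the choice to sum the weights of the attributes present (the chapter does not spell out the exact scoring formula).

```python
from statistics import mean, variance

# Weights copied from Table 8.3a; the dictionary keys are hypothetical labels.
TS_WEIGHTS = {
    "lists": 1.0, "tables": 2.0, "frames": 2.0, "forms": 1.5,
    "background_image": 1.0, "background_music": 2.0,
    "stream_audio_video": 3.5,
    "dhtml_shtml": 2.5, "predefined_script": 2.0,
    "self_defined_script": 4.5,
    "cgi": 2.5, "php": 4.5, "jsp_asp": 5.5,
}


def average_weights(ratings):
    """Average the experts' 1-10 ratings for each attribute.

    ratings maps an attribute name to the list of ratings it received.
    """
    return {attr: mean(values) for attr, values in ratings.items()}


def cronbach_alpha(rows):
    """Cronbach's alpha for a table of ratings.

    rows holds one list per rated case; each list has one value per item.
    Assumption: here a case is an attribute and an item is an expert.
    """
    k = len(rows[0])                       # number of items (columns)
    columns = list(zip(*rows))             # regroup ratings by item
    item_variance = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance / total_variance)


def weighted_score(present, weights):
    """Sum the weights of the attributes found on a site.

    Assumption: a plain sum; the source only says that scores are
    calculated from the presence of the attributes.
    """
    return sum(weights[attr] for attr in present)
```

Under these assumptions, a site that only uses lists scores 1.0 on technical sophistication, while a site that uses lists and JSP/ASP scores 6.5, which matches the intuition that JSP-based sites are more sophisticated than static HTML ones. The same `weighted_score` function could be applied to the WI weights in Table 8.3c.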
We developed strategies to identify the presence of the DWAS attributes in Dark Web sites efficiently and accurately. The TS and CR attributes are marked by HTML tags in page contents or by file extensions in page URL strings. For example, the HTML tag “img” indicates that an image is inserted into the page content, and a URL string ending in “.jsp” indicates that the page uses JSP technology. We developed programs that automatically analyze Dark Web page contents and URL strings to extract the presence of the TS and CR attributes. Because there are no clear indications or rules that a program could follow to identify WI attributes in Dark Web contents with a high degree of accuracy, we developed a set of coding schemes that allow human coders to identify their presence in Dark Web sites.

Technical sophistication, content richness, and web interactivity scores are calculated for each web site from the presence of the attributes, indicating how advanced and effective the site is in supporting terrorist/extremist groups’ communications and interactions.

4 Case Study: Understanding Middle Eastern Terrorist Groups

To test our proposed approach, we conducted a case study to collect and analyze the web presence of major Middle Eastern terrorist groups. We also conducted a benchmark comparison between the terrorist/extremist web sites and US federal and state government web sites to evaluate the terrorist/extremist organizations’ online capabilities. The terrorist/extremist groups we studied are mainly Islamic terrorist groups rooted in Middle Eastern countries, for example, al-Qaeda, Palestinian Islamic Jihad, and Hamas. These terrorist/extremist groups are the focus of most current counterterrorism studies.
We chose US government web sites as benchmarks because government web sites and terrorist/extremist web sites share a common overall objective: to inform the public about their goals, programs, and strategies. To achieve this objective, similar web features must be implemented in both government and terrorist/extremist web sites. Furthermore, the US government was ranked first in the world by the CyPRG group (http://www.cyprg.arizona.edu/) in terms of web technical sophistication and interactivity. With the US government web sites as high-standard benchmarks, we can better understand the terrorist/extremist web sites’ levels of technical advancement and effectiveness.

4.1 Building Dark Web Research Test Bed

Following the collection-building procedure discussed in Sect. 3.1, we created a Middle Eastern terrorist/extremist web site collection and a US government web site collection as the test beds for this study.

The Middle Eastern terrorist/extremist web collection was created in June of 2004. We identified 36 Middle Eastern terrorist/extremist groups from the authoritative sources mentioned in Sect. 3.1. Based on information about these groups, we constructed a lexicon of Middle Eastern terrorism keywords with the help of Arabic language experts. Examples of relevant keywords include terrorist leaders’ names such as “الشيخ المجاهد بن لادن” (“Sheikh Mujahid bin Laden”), terrorist groups’ names such as “ايران خلق” (“Khalq Iran”), and special words used by terrorists/extremists such as “حرب صليبية” (“Crusader’s War”) and “الكفار” (“Infidels”). This lexicon was used to query major search engines to identify and retrieve terrorist/extremist groups’ URLs. The URLs identified from the search engines, together with the terrorist/extremist URLs listed in the terrorism literature and reports, served as seed URLs for the out-link and in-link expansion process.
We performed a one-level-deep in-link expansion using Google’s in-link search tool and a one-level-deep out-link expansion. After carefully filtering the expansion results, we obtained the URLs of 86 Middle Eastern terrorist/extremist web sites. Using SpidersRUs, a digital library building toolkit developed by our group, we collected about 222,000 multimedia web documents from the identified terrorist/extremist web sites.

Table 8.4 Middle Eastern terrorist/extremist web collection file types

Terrorist/extremist collection     Files    Volume (bytes)
Grand total                      222,687    12,362,050,865
Indexable files total            179,223     4,854,971,043
  HTML files                      44,334     1,137,725,685
  Word files                         278        16,371,586
  PDF files                        3,145       542,061,545
  Dynamic files                  130,972     3,106,537,495
  Text files                         390        45,982,886
  PowerPoint files                     6         6,087,168
  XML files                           98           204,678
Multimedia files total            35,164     5,915,442,276
  Image files                     31,691       525,986,847
  Audio files                      2,554     3,750,390,404
  Video files                        919     1,230,046,468
Archive files                      1,281       483,138,149
Nonstandard files                  7,019     1,108,499,397

Table 8.4 summarizes the detailed file-type breakdown of the terrorist/extremist collection; 179,223 of the 222,687 documents in the collection are indexable files. These are textual files such as HTML files, plain text files, PDF/Word documents, and dynamic files generated by web applications (e.g., ASP, JSP, etc.). Interestingly, the majority of indexable files (130,972 of 179,223) in the terrorist/extremist collection are dynamic files. We conducted a preliminary analysis of the contents of these dynamic files and found that most of them were forum postings. This indicates that online forums play an important role in terrorists/extremists’ web usage. Besides indexable files, multimedia files also have a significant presence in the terrorist/extremist collection.
While the quantity of multimedia files is not as large as that of the indexable files, multimedia files are the largest category in the collection in terms of their volume. This indicates heavy use of multimedia technologies in terrorist/extremist web sites. The last two categories, archive files (1,281 files) and nonstandard files (7,019 files), made up less than 5% of the collection. Archive files are compressed file packages such as .zip files and .rar files. They could be password protected. Nonstandard files are files that cannot be recognized by the Windows operating system. These files may be of special interest to terrorism researchers and experts because they could be encrypted information created by terrorists/extremists. Further analysis is needed to study the contents of these two types of files.

The benchmark US government web collection was built in July of 2004. All 92 federal and state government URLs under Yahoo’s “Government” category were selected as seed URLs. Around 277,000 web documents were automatically collected from these government web sites using the SpidersRUs toolkit. The detailed file-type breakdown of the US government web collection is summarized in Table 8.5. The file-type distribution of the government collection is similar to that of the terrorist/extremist collection. Indexable files (221,684 files) are the largest category, the majority of which are dynamic files (145,590 files). However, in the government collection, we did not find as many forum postings as in the terrorist/extremist collection. Many dynamic files in the government collection are articles dynamically retrieved from large-document databases at users’ requests. Multimedia files also have a significant presence in the government collection, indicating heavy multimedia usage in government web sites.
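The count-versus-volume observations made above for the terrorist/extremist collection can be checked directly from the category totals in Table 8.4. The short Python sketch below is only an illustration of that arithmetic; the dictionary layout and function name are assumptions introduced here, while the figures are copied from the table.

```python
# Category totals copied from Table 8.4: (number of files, volume in bytes).
TABLE_8_4 = {
    "indexable": (179_223, 4_854_971_043),
    "multimedia": (35_164, 5_915_442_276),
    "archive": (1_281, 483_138_149),
    "nonstandard": (7_019, 1_108_499_397),
}


def category_shares(table):
    """Return {category: (share of files, share of bytes)} as fractions."""
    total_files = sum(files for files, _ in table.values())
    total_bytes = sum(volume for _, volume in table.values())
    return {
        name: (files / total_files, volume / total_bytes)
        for name, (files, volume) in table.items()
    }


if __name__ == "__main__":
    for name, (file_share, byte_share) in category_shares(TABLE_8_4).items():
        print(f"{name:12s} files {file_share:6.1%}  volume {byte_share:6.1%}")
```

With these figures, multimedia files make up roughly 16% of the files but about 48% of the bytes, and archive plus nonstandard files together account for under 4% of the file count, consistent with the statements in the text.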
Table 8.5 US government web collection file types

US government collection           Files    Volume (bytes)
Grand total                      277,274    19,341,345,384
Indexable files total            221,684     6,502,288,302
  HTML files                      71,518     2,632,912,620
  Word files                         298       210,906,045
  PDF files                          841       663,293,376
  Dynamic files                  145,590     2,071,734,849
  Text files                       2,878       555,403,447
  Excel files                          4            98,560
  PowerPoint files                     5           725,017
  XML files                          554       367,214,389
Multimedia files total            49,582    10,835,029,216
  Image files                     45,707       850,011,712
  Audio files                      3,429     8,153,419,931
  Video files                        449     1,831,597,573
Archive files                        538       286,312,990
Nonstandard files                  5,471     1,717,714,876

4.2 Collection Analysis and Benchmark Comparison

Following the DWAS approach, the presence of technical sophistication and content richness attributes was extracted from the collections automatically by programs. The presence of web interactivity attributes was extracted from each web site by language experts following the coding scheme in DWAS. Because of time limitations, the language experts examined only the top two levels of web pages in each web site. For each web site in the two collections, three scores (technical sophistication, content richness, and web interactivity) were calculated from the presence of the attributes and their corresponding weights in DWAS. Statistical analysis was conducted to compare the advancement/effectiveness scores achieved by the terrorist/extremist collection and the US government collection.

4.2.1 Benchmark Comparison Results: Technical Sophistication

The technical sophistication comparison results are shown in Table 8.6. The results showed that:

• The US government web sites are significantly more advanced than the terrorist web sites in terms of basic HTML techniques (p < 0.0001). Government agencies paid a great deal of attention to the design of their web sites, and they used many of the HTML features to organize their web contents.
  Terrorists/extremists, on the other hand, did not organize the contents of their web sites very well.
• The US government web sites are significantly more advanced than the terrorist web sites in terms of utilizing dynamic web programming languages (p = 0.0066). Most government web sites employed web programming technologies (e.g., PHP, ASP, JSP, etc.) to implement functionalities such as user login, online application, and online purchase. Few terrorist/extremist web sites implemented such dynamic functionalities.
• There is no significant difference between the terrorist web sites and the US government web sites in terms of applying advanced HTML techniques at a significance level of 0.05 (p = 0.139).
• The terrorist web sites have a significantly higher level of embedded media usage than the US government web sites (p = 0.0027). This unique characteristic of terrorist/extremist web sites is discussed in detail below.
• When all four sets of attributes are taken into consideration, there is no significant difference between the technical sophistication of the Middle Eastern terrorist web sites and that of the US government web sites at a significance level of 0.05 (p = 0.06).

Table 8.6 Technical sophistication comparison results

                            Weighted average score
TS attributes               US           Terrorists    t-Test result
Basic HTML techniques       0.9130434    0.710526      p < 0.0001
Embedded multimedia         0.565217     0.833333      p = 0.0027
Advanced HTML               1.789855     1.771929      p = 0.139
Dynamic web programming     2.159420     1.407894      p = 0.0066
Average                     1.356884     1.180921      p = 0.06

Significance level is 0.05.

The extensive use of media in terrorist/extremist groups’ web sites is of special interest.
While the terrorist/extremist groups are not as good as the US government at organizing their web pages into clear layouts or implementing dynamic web functionalities, they employed a significantly higher level of embedded multimedia techniques, especially images and audio/video clips, to catch the interest of their target audience. In the terrorist/extremist groups’ collection, 46% of the web sites embedded audio/video clips into their pages, while only 29% of the US government web sites provided audio/video clips. Multimedia content is more attractive and tends to leave a stronger impression on people than pure textual content. For example, the militant Islamic group Hamas foments violent resistance to its “enemies” by disseminating graphic posters on its web sites (see Fig. 8.3). Moreover, terrorists often post images, audio, or video clips of their leaders or martyrs to boost the spirits of their members and supporters. For example, Osama bin-Laden’s portrait appears on the homepages of many Middle Eastern terrorist/extremist web sites. More recently, posters of the Iraqi terrorist leader Abu Mus’ab Zarqawi, who is suspected of being responsible for the beheading of several Western hostages, can also be found on Middle Eastern terrorist web sites (see Fig. 8.4). These posters explicitly describe Abu Mus’ab Zarqawi as a “beheader” and praise his brutal killing of innocents as a way to protect Iraq. Terrorists/extremists also post images and audio/video clips of their “martyrdom operations” as a way to demonstrate their resolve to fight their enemies and inspire
