Baidu's and Don'ts: Privacy and Security Issues in Baidu Browser - The Citizen Lab


Read the report's key findings

Key Findings

Introduction and Overview

Baidu Browser is a free web browser for the Windows and Android platforms, produced by Baidu, one of China’s largest technology companies. The browser offers a number of features beyond those found in standard browsers, including video and audio download tools and built-in torrent support.

This report provides a detailed analysis of how Baidu Browser manages and transmits user data during its operation. The report identifies security concerns in both the Windows and Android versions of the browser that may expose personal user data, including a user’s geolocation, hardware identifiers, nearby wireless networks, web browsing data and search terms. Such user data is transmitted, in both the Windows and Android versions, unencrypted or with easily decryptable encryption, which means that any in-path actor could acquire this data by collecting the traffic and performing any necessary decryption. In addition, neither version of the application secures its software update process with a digital signature, which means that a malicious in-path actor could cause the browser to download and execute arbitrary code.

This report is a continuation of our prior work examining the security and privacy of popular mobile applications in Asia. Our previous research includes a report on similar concerns with UC Browser, a popular mobile web browser owned by China-based e-commerce giant Alibaba. That report documented UC Browser’s unencrypted transmission of sensitive user information, including IMSI, IMEI, Android ID, Wi-Fi MAC Address, geolocation data and user search queries. The security issues in UC Browser were identified in documents leaked by Edward Snowden that indicated the Five Eyes intelligence alliance, consisting of intelligence agencies from Canada, the United States, the United Kingdom, Australia and New Zealand, had used these vulnerabilities as a means of identifying users.

In previous work, we have analyzed the auto-update mechanisms in popular third party software. The vulnerabilities that we found in Baidu Browser’s auto-update mechanisms, which allow remote code execution via a man-in-the-middle attack, are consistent with vulnerabilities common in other third party software.

In addition, we have conducted research into keyword censorship and surveillance in TOM-Skype and keyword censorship in messaging platform Sina UC, as well as a comparative analysis of mobile chat applications popular in Asia, including WeChat, LINE and KakaoTalk.

We have also published an overview of privacy and security in mobile communications, entitled The Many Identifiers in Our Pockets. This primer on mobile technology identifiers is useful background for some of the technical issues raised in this report. Additionally, we have published, in collaboration with Open Effect, an analysis of privacy and security concerns in fitness trackers.

Responsible Disclosure and Notification

On November 26, 2015, we notified Baidu of our findings and our intent to publish this report. We indicated that we would not publish sooner than 45 days after notification, in line with international standards on vulnerability disclosure. Baidu initially stated that the issues we identified would be resolved in updates released by January 24, 2016. However, after Baidu identified that these security issues affected additional products, they requested we delay publication until after February 14, 2016. We agreed to not publish before February 14, 2016, in order to give Baidu sufficient time to fix all the vulnerabilities that we identified.

Following our security disclosure, Baidu indicated that they would release updated versions of both the Windows and Android clients by February 14, 2016. We performed an analysis of both updated versions to determine if the issues we identified had been resolved. The results of that analysis are described in the “Update: Analysis of updated versions of Baidu Browser” section at the end of this report.

On February 16, 2016 we emailed a set of questions regarding Baidu Browser’s security and privacy practices to Baidu’s Director of International Communications, and on February 22, 2016 we received their responses.

We have documented all correspondence with Baidu related to these security issues in an Appendix at the end of this report.

Baidu Browser: Brief Background

Baidu Browser (百度浏览器) is a web browser produced for Windows and Android, developed by China-based Internet giant Baidu. First released in 2011 and based on Google’s Chromium platform, the web browser offers a number of features, including integrated video and audio downloading tools, a built-in torrent client and mouse gesture support. The browser is one of many service offerings from Baidu, which include its marquee search engine, a massive advertising platform, and Baike, a Wikipedia-like collaborative encyclopedia. In 2015, the browser was estimated by China Internet Watch to have had a penetration rate amongst Chinese users of 29.2%.

Baidu has become one of the dominant tech companies in China, and shielded from competition from the censored Google search engine, it has become the most used search engine in China. The Baidu search engine ranks fourth on the Alexa list of most visited websites worldwide, and is the most visited website in China. The company earned USD$7.96 billion in revenue in 2014.

In July 2014, Baidu formed a partnership with U.S.-based Internet traffic management company CloudFlare, creating a service that leverages Baidu’s Chinese data centres with CloudFlare’s traffic management services to increase traffic speeds across China’s border. The service, called Baidu Yunjiasu (百度云加速) or “Cloud acceleration,” is primarily targeted at businesses seeking to speed up the flow of traffic across China’s inefficient, censorship-heavy network. Part 2 of our analysis below describes a feature of Baidu Browser that proxies traffic to certain websites hosted outside of China to improve performance.

Technical Analysis

We analyzed both the Android and Windows versions of Baidu Browser using reverse engineering techniques. To analyze program behavior, we used machine code and bytecode disassemblers, decompilers, and debuggers including JD, JADX, and IDA. To capture and analyze network traffic, we used tcpdump and Wireshark.

Our analysis is split into three parts. The first part describes how both the Android and Windows versions of the Chinese language Baidu Browser send unencrypted and easily decryptable personal information to Baidu servers. The second part describes a feature in the Chinese Windows version of Baidu Browser that proxies requests for certain websites hosted outside of China to increase performance. The third part discusses the shared vulnerabilities between the Chinese and global versions of the browser and how many of these shared problems exist due to the use of a Baidu software development kit that exists in other Baidu and third-party apps.

“Easily decryptable” encryption

In this report we sometimes utilize the phrase “easily decryptable” in referring to the encryption used by Baidu Browser. Here we discuss what we mean by this phrase, and how Baidu Browser’s encryption could be properly implemented.

When we say that encryption is “easily decryptable,” we do not mean that the encryption algorithm used is itself flawed or insecure (although in some cases the algorithms Baidu Browser uses are). Instead, we mean that the algorithm is used improperly. Namely, it is used in such a way that an analyst examining Baidu Browser could write a tool capable of decrypting these algorithms’ encryption.

Figure 1: Symmetric encryption.

There are two basic ways of encrypting data: using either symmetric encryption or asymmetric encryption. The advantage of symmetric encryption (illustrated in Figure 1) is that it is significantly faster than asymmetric encryption. The disadvantage is that with symmetric encryption, when you know the algorithm used to encrypt and, if it uses one, its key, then you also know how to decrypt anything it encrypts. With simple algorithms, this is as easy as reversing the steps of that algorithm. When encryption uses only symmetric algorithms, someone analyzing a program can write a tool to decrypt anything that the program encrypts by analyzing the algorithms it is using and/or finding its hard-coded encryption keys (fixed encryption keys that appear inside the program’s code).
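To make this weakness concrete, here is a minimal Python sketch of a toy symmetric cipher (repeating-key XOR) with a hard-coded key. The cipher, the key, and the function name are illustrative assumptions and are not taken from Baidu Browser; the sketch only shows that anyone who extracts the key from a program can reverse what that program encrypts.

```python
from itertools import cycle

# Hypothetical hard-coded key, standing in for a key embedded in an app binary.
HARD_CODED_KEY = b"example-key-1234"

def xor_cipher(data: bytes, key: bytes = HARD_CODED_KEY) -> bytes:
    """Toy symmetric cipher: XOR each byte with the repeating key.

    Applying the function twice with the same key returns the original data,
    so the encryption routine is also the decryption routine.
    """
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

# The "app" encrypts a value before sending it.
ciphertext = xor_cipher(b"imei=000000000000000")

# An analyst who has recovered the key from the program can decrypt it.
recovered = xor_cipher(ciphertext)
```

Real symmetric ciphers such as AES are far stronger than XOR, but the same reasoning applies whenever the key ships inside the program.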

Figure 2: Asymmetric encryption.

Asymmetric algorithms (illustrated in Figure 2) such as RSA were developed to address this weakness. While they are significantly slower than symmetric algorithms, they have the advantage that a different key is used to decrypt data than the one used to encrypt. These keys must be mathematically related, but the algorithms are designed such that it is easy to generate a key pair but computationally intractable to derive the decryption key from the encryption key. This means that a program can use a hard-coded encryption key but keep the decryption key a secret by storing it only at the recipient. Although the recipient of the data, using the secret decryption key, will still be able to decrypt the data, a third party cannot write a tool to do so since the decryption key is not present in the program.

To ameliorate the performance disadvantage of asymmetric encryption, it is typically combined with symmetric encryption using the following technique. To encrypt data, a symmetric encryption key is randomly generated and used to encrypt that data. Then the randomly generated symmetric key is encrypted using an asymmetric key. The asymmetrically encrypted symmetric key and the data encrypted with that symmetric key are then sent to the recipient. The recipient can then decrypt the symmetric key using her private decryption key and use the decrypted symmetric key to decrypt the data. Since only the key, which is typically much smaller than the data, is encrypted using asymmetric encryption, this technique is much faster than using asymmetric encryption to encrypt all of the data and thus combines the best properties of both types of encryption. It is the foundation of common encryption protocols on the Internet such as SSL.

We describe Baidu Browser’s encryption as easily decryptable because it is entirely symmetric and uses hard-coded keys. We demonstrate that their communications encrypted using this method can be easily decrypted. We recommend that Baidu and anyone else desiring to securely send sensitive information over the Internet use a well-known and well-tested protocol utilizing asymmetric cryptography (such as SSL) and not attempt to implement their own “homebrew” encryption protocols. SSL in particular is a well-analyzed protocol that addresses many security issues not likely to be considered by a nonprofessional cryptographer.

Part 1: Insecure transmission of personal data

Android Version

We analyzed version 6.2.18.0 of the browser, which we downloaded from http://mb.baidu.com/. We identified a number of security and privacy concerns regarding transmission of personal data through insecure methods. Table 1 summarizes the personal data that is collected and transmitted by the application with either no encryption or with easily decryptable encryption methods.

Table 1: Summary of personal data collected and level of encryption of Android version of Baidu Browser

Personal Data | Level of Encryption
User Operating System | Not encrypted
GPS coordinates plus last GPS update time | Not encrypted
IMEI | Easily decryptable
Nearby wireless networks including MACs | Easily decryptable
Search terms entered into address bar | Not encrypted
URLs visited | Not encrypted

We identified the following security flaws related to the collection and transmission of personal data in a number of features of the Android version of the application:

a. Leaks sensitive data on startup

Upon application launch, we observed Baidu Browser sending an HTTP POST request to

The body of this HTTP request is a gzipped JSON file. The JSON file contains a list of fields with various details about the phone and the user, some in plain text and others encrypted.
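As a rough illustration of how little effort an in-path observer would need to read such a body, the following Python sketch decompresses a gzipped payload and parses it as JSON. The function name and the sample field names are hypothetical and invented for this example; they are not the fields Baidu Browser sends.

```python
import gzip
import json

def decode_startup_body(body: bytes) -> dict:
    """Decompress a gzipped HTTP body and parse the result as JSON."""
    return json.loads(gzip.decompress(body).decode("utf-8"))

# Hypothetical sample body; these field names are invented for illustration.
sample = gzip.compress(
    json.dumps({"os": "Android", "enc_field": "QUJD"}).encode("utf-8")
)
fields = decode_startup_body(sample)
```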

Unencrypted fields in the JSON file include:

Some fields are encrypted using AES+ECB with the hard-coded ASCII-encoded key

h9YLQoINGWyOBYYk

and then Base64 encoded. These fields include:

With knowledge of the hard-coded key, these fields can easily be decrypted. The source code for a python script for decrypting these fields is available here.

b. Leaks sensitive data and address bar contents when inputting into address bar

Like other browsers, users can enter text into the address bar of Baidu Browser in order to either visit a given URL or to perform a search. When text is entered into the address bar in such a manner, it is sent without encryption as an HTTP GET request with multiple GET parameters to the following URL:

For example, inputting the text “some address bar contents” into the address bar generates:

http://uil.cbs.baidu.com/sug/rich?wd=some+address+bar+contents&ua=I4Ly8_OLL8_lPvC0tpwbqkrywN0sCFzKkhF6q9pvANIr5wj0_hHQNgCcvCgnhvId_OXNiyJuvNvrCUdsB&cuid=ga2Pfgal2u0ca28Yg8vkugu0-uYBiSiAlP2Nf_8ZS88Pa28g_a2q8_aq28_qa28qA&cfrom=1200a&from=1200a&crp=0&it=0&ctv=2&st=00000000&nw=3g&cen=ua_cuid
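The following Python sketch shows how anyone observing this request could recover the address bar contents with standard URL parsing. The ua and cuid parameters are omitted from the URL here for brevity; the variable names are our own.

```python
from urllib.parse import urlsplit, parse_qs

# The example request from above, with the ua and cuid parameters omitted.
url = (
    "http://uil.cbs.baidu.com/sug/rich?wd=some+address+bar+contents"
    "&cfrom=1200a&from=1200a&crp=0&it=0&ctv=2&st=00000000&nw=3g&cen=ua_cuid"
)

# parse_qs returns a list of values per parameter; keep the first of each.
params = {name: values[0] for name, values in parse_qs(urlsplit(url).query).items()}
address_bar_text = params["wd"]  # "+" is decoded back to a space
```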

These GET parameters include

The wd parameter value is sent without encryption; however, the ua and cuid parameter values are encrypted with a nonstandard, easily decryptable algorithm described as follows. Each 32-bit word of the string’s UTF8-encoded bytes is interpreted as a little-endian integer, then circularly bit-shifted to the right by three bits and XOR’d with the hard-coded mask 0x2D382324. Finally, the resulting 32-bit words are together Base64-encoded with the following custom 64-character alphabet:

qogjOuCRNkfil5p4SQ3LAmxGKZTdesvB6z_YPahMI9t80rJyHW1DEwFbc7nUVX2-

Note that the typical Base64 alphabet is as follows:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

With knowledge of the algorithm and the custom alphabet, these fields can be easily decrypted, and the source code for a python script for decrypting these fields is available here. This algorithm is implemented by Baidu Browser in native ARM code in the implementation of the nativeB64Encode(byte[]) function in libchiperencoder_v1_3_1_browser.so (“chiper” in “libchiperencoder” is likely a misspelling of “cipher”).
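The algorithm described above can be sketched in Python as follows. This is our own reconstruction from the description, not Baidu's code and not the script linked above. It assumes that input is zero-padded to a multiple of four bytes, that the transformed words are serialized little-endian before Base64 encoding, and that any trailing "=" padding may be absent from transmitted values; the function names are hypothetical.

```python
import base64
import struct

CUSTOM = "qogjOuCRNkfil5p4SQ3LAmxGKZTdesvB6z_YPahMI9t80rJyHW1DEwFbc7nUVX2-"
STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
MASK = 0x2D382324

TO_CUSTOM = str.maketrans(STANDARD, CUSTOM)
TO_STANDARD = str.maketrans(CUSTOM, STANDARD)

def ror32(x: int, n: int) -> int:
    """Rotate a 32-bit value right by n bits."""
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF

def rol32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def encode(text: str) -> str:
    data = text.encode("utf-8")
    data += b"\x00" * (-len(data) % 4)  # assumption: zero-pad the last word
    words = struct.unpack("<%dI" % (len(data) // 4), data)
    mixed = [ror32(w, 3) ^ MASK for w in words]
    raw = struct.pack("<%dI" % len(mixed), *mixed)  # assumption: little-endian output
    return base64.b64encode(raw).decode("ascii").translate(TO_CUSTOM)

def decode(token: str) -> str:
    std = token.translate(TO_STANDARD)
    std += "=" * (-len(std) % 4)  # restore Base64 padding if it was stripped
    raw = base64.b64decode(std)
    raw = raw[: len(raw) - len(raw) % 4]  # keep whole 32-bit words only
    words = struct.unpack("<%dI" % (len(raw) // 4), raw)
    plain = struct.pack("<%dI" % len(words), *[rol32(w ^ MASK, 3) for w in words])
    return plain.rstrip(b"\x00").decode("utf-8", errors="replace")
```

Decoding simply undoes each step in reverse order: translate the alphabet back, Base64-decode, XOR with the mask, and rotate left by three bits.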

c. Phones home with sensitive data about every page view

After a page is viewed in the browser, a GET request is sent to the following URL:

The HTTP Referer header contains, unencrypted, the full URL of the page visited. The URL is reported unencrypted even if the page was retrieved via HTTPS, which would normally encrypt the URL in the page request. The value of the request’s etag GET parameter is encrypted with RC4 using the ASCII-encoded five-byte key:

HR2ER

When decrypted, the tag contains a number of other values including the cuid, which contains the phone’s IMEI number written backwards, and various timing information related to how long various steps of loading the page took, including DNS lookup, creating the connection, and DOM loading.
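Because RC4 is a symmetric stream cipher, the same routine both encrypts and decrypts once the key is known. Below is a minimal pure-Python RC4 sketch using the key above. How the etag value is encoded on the wire (for example hex or Base64) is not covered here, so the sample data is hypothetical and only demonstrates the round trip.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4: key scheduling followed by keystream XOR."""
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)

ETAG_KEY = b"HR2ER"  # the hard-coded key reported above

# Hypothetical plaintext, used only to show that decryption is the same call.
sample_plain = b"hypothetical etag contents"
sample_cipher = rc4(ETAG_KEY, sample_plain)
sample_decrypted = rc4(ETAG_KEY, sample_cipher)

# The cuid holds the IMEI written backwards, so reversing the string restores it.
def unreverse_imei(reversed_imei: str) -> str:
    return reversed_imei[::-1]
```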

d. Insecurely checks for software updates

When the application makes a request for updates, it makes an HTTP GET request to the following URL:

This request includes multiple GET parameters including the encrypted ua and cuid parameters described in the previous section. The server returns a response in an unencrypted but zlib-compressed custom binary format. If an update is available, the server will include the description of the update and an APK URL in the response. Baidu Browser will display the description, inquiring whether the user wants to upgrade. If the user confirms the upgrade, the APK is downloaded and automatically opened, prompting the user with the typical Android system user interface for the install or upgrade of apps (see Figure 3).

The APK is not verified by Baidu Browser with any digital signature, and so any man-in-the-middle can perform an active attack by sending specially crafted responses. A man-in-the-middle is thus capable of sending a URL to any APK along with a textual description of the update that will be presented to the user. Since Android will not allow an APK to upgrade an app if the APK is signed with a different digital signature than that of the currently installed app, this technique cannot be used to replace Baidu Browser with an arbitrary APK. However, it may still be used to install a new app, and a properly crafted APK (for example, using the name and logo of Baidu Browser) could be used to deceive a user into installing a malicious APK (see Figure 3).


Figure 3: Example man-in-the-middle attack on Baidu Browser’s updater. On the left, we injected a custom update description. On the right, after the update is downloaded, the browser prompts the user to install the Angry Birds APK (an actual attacker might instead craft an app called “Baidu Browser” with an icon similar to that of Baidu Browser to further convince the user to install it).

Windows Version

We analyzed version 7.6.100.2089 of the Windows version of Baidu Browser, which we downloaded from http://liulanqi.baidu.com/. This version also contained a number of vulnerabilities presenting privacy and security concerns. Table 2 summarizes the personal data points collected and transmitted with either no encryption or easily decryptable encryption methods:

Table 2: Summary of personal data collection and level of encryption of Windows version of Baidu Browser

Personal Data | Level of Encryption
Search terms entered into address bar | Not encrypted
Hard disk drive serial number, model number and controller version number | Easily decryptable
MAC address | Easily decryptable
URLs visited including page title | Easily decryptable
Machine CPU model | Easily decryptable
Hard disk drive serial number | Easily decryptable
File system volume serial number | Easily decryptable

We identified the following security flaws related to the collection and transmission of personal data in a number of features of the Windows version of the application:

a. Leaks address bar contents when inputting into address bar

Like the Android version, when a user enters text into the address bar in order to retrieve search suggestions, that text is sent without encryption as an HTTP GET request to the following URL:

http://uil.cbs.baidu.com/sug/rich

with the address bar contents in the value of the wd parameter.

b. Communicates with Baidu servers via an easily decryptable protocol

Upon application launch and while browsing, we also observed that Baidu Browser sends multiple HTTP POST requests to various subdomains of the *.br.baidu.com domain. The bodies of these POST requests always adhere to a certain format: a header followed by an encrypted payload. Two strings appear, unencrypted, in the header. The first is the browser’s GUID, which is the MD5 hash of

  1. Hard drive disk serial number
  2. Hard drive disk model number
  3. Hard drive disk controller version number
  4. Network MAC address
  5. The string “BDM”

joined by “#” characters, e.g.,

md5(“VBf952409b-973833b1#VBOX HARDDISK#1.0#080027f2c8cf#BDM”).
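The GUID construction can be sketched in Python as follows. The function name is our own, and we assume the joined string is hashed as ASCII/UTF-8 bytes and that the GUID is the hexadecimal form of the digest.

```python
import hashlib

def browser_guid(disk_serial: str, disk_model: str,
                 controller_version: str, mac: str) -> str:
    """Join the hardware identifiers and "BDM" with '#' and take the MD5."""
    joined = "#".join([disk_serial, disk_model, controller_version, mac, "BDM"])
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

# Values from the example above (a VirtualBox test machine).
guid = browser_guid("VBf952409b-973833b1", "VBOX HARDDISK", "1.0", "080027f2c8cf")
```

Since every input is a stable hardware identifier, the resulting GUID stays the same across browser sessions and acts as a persistent identifier for the machine.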

The second string is the browser’s SupplyID, a value retrieved from the

HKEY_LOCAL_MACHINE\SOFTWARE\Baidu\BaiduBrowser\SupplyID

registry key, which appears to be related to the version of the browser.

The encrypted payload, when decrypted, contains data serialized using Google Protocol Buffers, or protobuf. The serialized data is encrypted using a modified TEA cipher that we call MTEA. The block cipher mode Baidu uses with MTEA is a nonstandard modification of CBC (see Figure 4) that we call MCBC.


Figure 4: MCBC, the nonstandard modification of CBC Baidu uses for MTEA. Additional XOR operations of MCBC over CBC are in dashed-blue in the above figure.

For encrypting and decrypting all protobuf messages, Baidu Browser uses MTEA+MCBC with the following hard-coded ASCII-encoded key:

vb%,J^d@2B1l'Abn

An all-zero-byte initialization vector is always used. The source code for a python script for decrypting these requests is available here.

c. Phones home information about every page view that includes hardware serial numbers

We decrypted the protobuf requests made by the browser. We found that one such request, which we call the Page Report request, is sent for every page that the user views, including both HTTP and HTTPS pages, and includes the following information about the page and the user:

  1. The page’s full URL
  2. The page’s HTTP status code
  3. The page’s HTML title
  4. Every domain for which the browser has stored a cookie named “BAIDUID” and that cookie’s contents (Baidu uses a cookie with this name as a tracking cookie)
  5. Machine’s CPU model
  6. Machine’s hard drive serial number
  7. File system’s volume serial number
  8. Machine’s network MAC address
  9. Browser’s GUID

d. Insecurely checks for software updates

When the browser checks for updates, it makes a protobuf request, which we call an Update Info request, for information pertaining to the latest version of Baidu Browser. The decrypted response includes the version number of the latest release, a description of its improvements, a URL for the browser to download an executable to update to the latest version, and that file’s MD5 hash. The executable is not protected with any digital signature, only the MD5 hash in the encrypted protobuf request. By encrypting and sending its own protobuf replies, any man-in-the-middle performing an active attack can send a URL to any executable and its MD5 hash to cause the browser to download and execute arbitrary code.
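The following Python sketch illustrates why an MD5 hash delivered over the same attacker-controllable channel provides no authenticity. The function `updater_accepts` is a hypothetical model of a hash-only check, not Baidu's code: an attacker who can forge the response simply computes the hash of their own payload.

```python
import hashlib

def updater_accepts(executable: bytes, advertised_md5: str) -> bool:
    """Hypothetical model of an integrity check that only compares MD5 hashes."""
    return hashlib.md5(executable).hexdigest() == advertised_md5

# The attacker chooses the payload and computes the matching hash themselves.
attacker_payload = b"arbitrary attacker-chosen program bytes"
forged_md5 = hashlib.md5(attacker_payload).hexdigest()

accepted = updater_accepts(attacker_payload, forged_md5)
```

A digital signature differs because the verifying key is shipped with the application while the signing key stays with the vendor, so an in-path attacker cannot produce a valid signature for a substituted file.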


Figure 5: Example man-in-the-middle attack on Baidu Browser’s self-updater. A benign program that displays “Oh Hai There” was used as the payload, but any arbitrary program could be injected.

e. Updates list of website domains triggering proxying

When the browser starts, it makes a protobuf request we call a Proxy Info request that, when decrypted, contains the version numbers of different resources related to Baidu’s automatic proxying for foreign websites, explained in detail in Part 2 of our report. If the server determines that one of these resources requires an update, depending on the resource, either an up-to-date version of the resource will be included in the protobuf response or a link to the latest version will be provided with its MD5 hash and an encryption key to decrypt it.

Part 2: Proxying of foreign-hosted websites

Web users in China face one of the world’s strictest regimes of Internet censorship. Perhaps the most well known form of censorship in China is the “Great Firewall” – the comprehensive system of web filtering that blocks attempts from within China to access banned content. As this system limits the access Chinese Internet users have to content hosted outside of the country, many users seek out methods of circumventing this censorship. One such method is through the use of internationally-hosted proxies, which mask and redirect web traffic in order to evade censorship.

In addition to the limits that China’s censorship imposes on access to information, the Great Firewall also introduces significant inefficiencies that slow the transmission of data into and out of the country. Proxies can serve to improve performance by bypassing these network bottlenecks.

Our analysis of the Windows version of Baidu Browser shows that the software contains a feature to automatically proxy requests to certain websites hosted outside of China. Baidu advertises such a service on its website and describes the potential performance improvements (see Figure 6).


Figure 6: “We took millions in advertising earnings to buy an overseas tunnel to accelerate access to overseas websites.” (Retrieved November 13, 2015)

In addition to improving performance, we have also found that the proxy provides access to some websites that are normally blocked by the Great Firewall, such as www.wordpress.com. Only certain websites, predetermined by Baidu, are tunneled through the proxy according to rules determined by the following three resources: kv_auth, fg_pac, and kv_report.

kv_auth resource

The kv_auth resource contains information about different proxy servers. If the browser’s version of kv_auth is not up to date, then a newer version will be included inside the response to the Proxy Info request. The resource contains MD5 hashes of proxy address domain:port pairs. For instance, one hash entry is 6d4c0ce00565ca6e6ec8f7bff5ab7619, which corresponds to md5(“0.wacc.baidu.com:80”). Each of these hash entries is associated with a username salt and a password salt, which will be used to compute the username and password to access the proxy, as explained later. For instance, the username salt and password salt currently provided in the entry for 6d4c0ce00565ca6e6ec8f7bff5ab7619 are “2” and “345,” respectively.

The list currently contains 13 entries; however, since only hashes are provided, we cannot use this list to acquire the entire list of proxies. We have a decrypted version of this resource here.

fg_pac resource

The fg_pac resource contains rules for determining which websites are proxied. If the browser’s version of fg_pac is not up to date, then a link to download a newer version of the file will be included in the response to the Proxy Info request, along with that file’s MD5 hash and encryption key. The encryption key, when zero-padded to 16 bytes, can be used to decrypt the file using the MTEA+MCBC algorithm described in an earlier section. At present, the key provided in the protobuf response is “test”; thus when zero-padded it is “test” followed by 12 zero bytes. The URL presently provided is

We have a decrypted version of the above file available here, and the source code for a python script to decrypt it is available here. Note that since the filename in the URL is the MD5 hash of the file’s contents, any update to the file would be at a different URL, and so the above URL cannot be tracked for updates.
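Two of the small steps described above can be sketched in Python: zero-padding the provided key to 16 bytes, and checking that a downloaded file's MD5 hash matches the hash used as its filename. The helper names are hypothetical, and the MTEA+MCBC decryption itself is not reproduced here.

```python
import hashlib

def pad_key(key: bytes, size: int = 16) -> bytes:
    """Right-pad a key with zero bytes to the given size."""
    if len(key) > size:
        raise ValueError("key is longer than the target size")
    return key.ljust(size, b"\x00")

def filename_matches_contents(filename_hash: str, contents: bytes) -> bool:
    """Check whether a hex MD5 taken from a filename matches the file contents."""
    return hashlib.md5(contents).hexdigest() == filename_hash.lower()

fg_pac_key = pad_key(b"test")  # "test" followed by 12 zero bytes
```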

When decrypted, this file is a proxy auto-config, or PAC file, which is Javascript code for determining which websites are proxied, which proxy is used for each website, and whether to use HTTP or HTTPS to communicate with that proxy.

The PAC file presently associates websites with one of two different proxy servers, 0.wacc.baidu.com and out.wacc.baidu.com, including whether the browser should connect to the proxy via either HTTP or HTTPS. We found no obvious pattern as to why some websites use one proxy versus the other or why some websites connect to the proxy with HTTP versus HTTPS. If a site does not match the rules for any proxy or matches the rules for a blacklist included in the PAC file, then no proxy will be used. We found that the MD5 hashes of 0.wacc.baidu.com:80, 0.wacc.baidu.com:443, out.wacc.baidu.com:80, and out.wacc.baidu.com:443 (each proxy server on HTTP and HTTPS ports) account for four of the hashes currently provided by the kv_auth file, but the other nine are presently unknown.
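Matching known proxy names against the kv_auth hashes amounts to a small dictionary attack, sketched below in Python. The function name is our own; the single hash shown is the one quoted earlier in this section, and the candidate hosts and ports are those named in the PAC file.

```python
import hashlib
from itertools import product

def match_proxy_hashes(known_hashes, hosts, ports):
    """Return {md5_hex: "host:port"} for every candidate whose hash is known."""
    found = {}
    for host, port in product(hosts, ports):
        candidate = f"{host}:{port}"
        digest = hashlib.md5(candidate.encode("ascii")).hexdigest()
        if digest in known_hashes:
            found[digest] = candidate
    return found

# One hash entry quoted in the kv_auth discussion; the full list has 13 entries.
known = {"6d4c0ce00565ca6e6ec8f7bff5ab7619"}
matches = match_proxy_hashes(
    known, ["0.wacc.baidu.com", "out.wacc.baidu.com"], [80, 443]
)
```

The remaining hashes stay unknown only because we lack candidate names to try; hashing hides an entry solely from someone who cannot guess it.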

kv_report resource

The kv_report resource is used to determine if visiting a website sends additional reporting information to Baidu. If the browser’s version of kv_report is not up to date, then a newer version will be included inside the response to the Proxy Info request. This resource contains a list of domains. If a viewed page matches one of these domains, then the browser sends another encrypted protobuf request in addition to the Page Report request that we describe earlier about the page and its request. This information includes:

  1. The page’s full URL
  2. The page’s HTTP “Referer” field
  3. The site’s IP address
  4. The list of HTTP redirects taken to arrive at the page
  5. The start and end times of the page’s DNS lookup (if performed), TCP connection, SSL handshake (if performed), and the total request time
  6. Whether any proxying was used and, if so, which proxy

We have a decrypted version of this resource here. The domains listed in this resource largely overlap with the sites in the fg_pac resource, and it may be intended to debug performance of proxied sites and other sites perhaps under consideration for being proxied.

Analysis

We found that both proxies referenced in the fg_pac resource, 0.wacc.baidu.com and out.wacc.baidu.com, are modified Squid proxies. Username and password authentication is required to use the servers. The proxy authentication scheme is consistent with the digest scheme described in RFC 2617 with authentication quality of protection enabled. The username and password used for authentication are dynamically computed; the password is a function of the value of a nonstandard ip field communicated by the Squid proxy in an HTTP header. The username and password can be computed as follows:

1. Set key to be the proxy’s password salt (specified by the kv_auth resource) padded to 16 bytes
2. Set user1 to be the user salt (specified by the kv_auth resource)
3. Set user2 to be: browser’s GUID + “|0|” + fg_pac version number
4. Then encrypt user2 using MTEA+MCBC with key key and Base64 encode it
5. The username is: user1 + “|” + user2
6. Let ip be the value of the nonstandard ip field returned by the proxy in the HTTP header
7. Then the password is: md5(browser’s GUID + “|” + ip + “|” + password salt)
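The steps above can be sketched in Python as follows. Since the MTEA+MCBC cipher is not reproduced here, the sketch takes the encryption routine as a caller-supplied function. We also assume that the key is zero-padded, that standard Base64 is used, and that the password is the hexadecimal MD5 digest; the function and parameter names are hypothetical.

```python
import base64
import hashlib
from typing import Callable, Tuple

def proxy_credentials(guid: str, fg_pac_version: str, user_salt: str,
                      password_salt: str, ip: str,
                      encrypt: Callable[[bytes, bytes], bytes]) -> Tuple[str, str]:
    """Compute (username, password) following steps 1-7 above.

    `encrypt(plaintext, key)` stands in for MTEA+MCBC and must be supplied.
    """
    key = password_salt.encode("ascii").ljust(16, b"\x00")       # step 1 (assumed zero padding)
    user1 = user_salt                                            # step 2
    user2_plain = f"{guid}|0|{fg_pac_version}".encode("ascii")   # step 3
    user2 = base64.b64encode(encrypt(user2_plain, key)).decode("ascii")  # step 4
    username = f"{user1}|{user2}"                                # step 5
    # Steps 6-7: `ip` is the value of the proxy's nonstandard ip header field.
    password = hashlib.md5(f"{guid}|{ip}|{password_salt}".encode("ascii")).hexdigest()
    return username, password
```

For example, with the salts "2" and "345" quoted earlier, the username would begin with "2|" and the key would be "345" followed by 13 zero bytes under the padding assumption.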

We wrote a python script that would authenticate with the proxy and request to download arbitrary URLs. First, we used it to test whether the proxies would allow access to sites not listed in the fg_pac file. We found that both proxy servers displayed error pages (illustrated in Figure 7) for such sites, demonstrating server-side access controls that ensure the proxies cannot be used to access sites for which they are not intended. From a Chinese VPS, we tested access to every domain in the Alexa Top One Million list, and we found 46 additional domains (see Table 3) that both proxy servers allow even though they do not appear in the fg_pac resource used by the browser. These domains include google.com (which is still mentioned in fg_pac but is presently commented out and thus disabled), facebook.com, github.com, and stackoverflow.com, and many of these 46 domains are presently blocked in China. It is unclear why the proxy servers allow domains in addition to the ones for which the browser uses the proxy.


Figure 7: Attempting to manually use the proxy to access unapproved sites such as www.citizenlab.org results in a server-side error.

Table 3: Forty-six additional domains from the Alexa Top One Million list that proxy servers allow even though they do not appear in the fg_pac resource used by the browser.

angularjs.org blogblog.com blogger.com blogspot.com blogspot.jp chromium.org crossrider.com dropbox.com dropboxusercontent.com ebaystatic.com facebook.com fbcdn.net flickr.com freebase.com fsdn.com github.com githubusercontent.com gmail.com golang.org google.co.jp google.co.th google.com google.com.br google.com.eg google.com.hk google.fr googlecode.com hhvm.com livefyre.com llvm.org magnumphotos.com mitbbs.com nytimes.com schema.org slideshare.net sourceforge.net sstatic.net stackexchange.com stackoverflow.com t.co twimg.com twitter.com webex.com wikimedia.org wikimediafoundation.org wikipedia.org

We next used the script to determine the endpoints of each proxy. The addresses of both proxies 0.wacc.baidu.com and out.wacc.baidu.com (111.206.37.99 and 111.206.37.225, respectively) appeared to be in mainland China; however, we observed sites accessed via 0.wacc.baidu.com were receiving requests from 180.76.14.8, 180.76.14.131, 180.76.14.132, 180.76.14.138, and 180.76.14.142, all addresses in Hong Kong. We were able to observe these addresses by visiting http://www.reddit.com/account-activity, since www.reddit.com is one of the sites allowed by the proxies’ access controls. Moreover, we observed out.wacc.baidu.com had additional endpoints in 180.76.14.0/24. This suggests that both of these proxy servers tunnel HTTP requests from mainland China to Hong Kong.
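As a small illustration, the observed exit addresses can be checked against the 180.76.14.0/24 block with Python's standard ipaddress module. The variable names are our own; the addresses are those listed above.

```python
import ipaddress

EXIT_BLOCK = ipaddress.ip_network("180.76.14.0/24")

# Exit addresses observed for requests sent through 0.wacc.baidu.com.
observed = ["180.76.14.8", "180.76.14.131", "180.76.14.132",
            "180.76.14.138", "180.76.14.142"]

# Map each observed address to whether it falls inside the /24 block.
in_block = {addr: ipaddress.ip_address(addr) in EXIT_BLOCK for addr in observed}

# The proxies' own front-end addresses, for comparison.
front_ends = ["111.206.37.99", "111.206.37.225"]
front_end_in_block = [ipaddress.ip_address(a) in EXIT_BLOCK for a in front_ends]
```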

We used our Python script on a Chinese VPS to test if the proxy would improve download performance of sites and found that many of the sites listed in fg_pac consistently took less time to download. We thought that this could be explained solely by Squid’s caching behavior, but when we appended random query parameters to the URLs to ensure that they would not hit the Squid proxy’s cache, some sites still downloaded more quickly. For instance, on the Chinese VPS, live.com downloaded on average in 1.55 seconds without the proxy, but in 0.39 seconds using it. We found that all sites with a *.hk domain loaded in approximately the same time whether using the proxy or not, suggesting that the tunnel to Hong Kong may not necessarily be faster than ordinary routing; however, we found that routes from Hong Kong to overseas servers are often faster than routes from mainland China. For instance, by doing a traceroute from the VPS to live.com, we found that packets do not enter Microsoft’s autonomous system until North America, whereas by doing the same traceroute from Hong Kong we found that they entered Microsoft’s autonomous system before leaving Hong Kong via the AMS-IX Hong Kong Internet exchange.
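The cache-busting step described above can be sketched as follows, assuming Python's standard library. The parameter name `_cb` and the function names are hypothetical, and the timing helper measures one complete download rather than reproducing the averaging used for the figures above.

```python
import secrets
import time
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted(url, param="_cb"):
    """Return url with a random query parameter appended.

    A fresh random value makes each URL unique, so a caching proxy
    cannot answer the request from a previously stored copy.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, secrets.token_hex(8)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def timed_fetch(url, opener=None, timeout=30):
    """Download a cache-busted copy of url and return elapsed seconds."""
    opener = opener or urllib.request.build_opener()
    start = time.monotonic()
    with opener.open(cache_busted(url), timeout=timeout) as resp:
        resp.read()
    return time.monotonic() - start
```

Passing an opener configured with a proxy, and then one without, would give the two timings being compared.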

In our testing, we also found that some of the sites could not be downloaded from our Chinese VPS unless the proxy was used. In Table 4, we list all sites from the fg_pac file that were only accessible from our VPS by using a proxy. We say that a site was DNS poisoned if the DNS request received at least two responses, the first being anomalous and the final being correct. We say that the connection to a site timed out when the TCP handshake to that site never completed. We say that the connection to a site was reset when a TCP reset prematurely aborted our HTTP connection to that site. We found that all of the sites in this last category trigger resets when their domain is in the HTTP host field of any HTTP request.

Table 4: Sites accessible using the proxy but otherwise inaccessible from our Chinese VPS, categorized by reason for inaccessibility.

Reason for inaccessibility Sites
DNS poisoning edgecastcdn.net
Connection timeout android.com, apis.google.com, googleapis.com, gstatic.com, html5rocks.com
Connection reset expedia.com, expedia.com.hk, wordpress.com
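The three categories defined above can be expressed as a small decision procedure. This is an illustrative sketch, not the tooling used for the measurements: the `Observation` fields are hypothetical, each DNS response is simplified to a single address, and the order in which the checks are applied is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Observation:
    """What was seen when fetching a site directly (illustrative fields)."""
    dns_answers: List[str]        # one address per DNS response, in arrival order
    correct_addresses: List[str]  # addresses known to be legitimate for the site
    handshake_completed: bool     # did the TCP handshake finish?
    reset_during_http: bool       # did a TCP reset abort the HTTP exchange?


def classify(obs: Observation) -> Optional[str]:
    """Return the reason for inaccessibility, or None if none applies."""
    answers = obs.dns_answers
    # DNS poisoning: at least two responses, first anomalous, final correct.
    if (len(answers) >= 2
            and answers[0] not in obs.correct_addresses
            and answers[-1] in obs.correct_addresses):
        return "DNS poisoning"
    # Connection timeout: the TCP handshake never completed.
    if not obs.handshake_completed:
        return "Connection timeout"
    # Connection reset: a TCP reset prematurely aborted the HTTP connection.
    if obs.reset_during_http:
        return "Connection reset"
    return None
```

The addresses used to exercise this function would come from a packet capture; none are supplied here because the report does not list them.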

We found that proxied traffic is also not subject to HTTP keyword filtering. For instance, both the keywords freenet and hrichina were not censored in HTTP requests to foreign sites when using the proxy, whereas when not using the proxy, they triggered injected TCP resets.

Despite Baidu Browser’s fg_pac file no longer including rules to proxy them, the domains translate.google.com.hk, translate.googleusercontent.com, and webcache.googleusercontent.com are presently still allowed by the proxies’ server-side access control rules. Thus, someone directly using the proxies without using Baidu Browser (for instance, using our Python script) can still view almost any webpage through the proxy via either Google’s translation service or Google’s web cache.

Part 3: Vulnerabilities in other Baidu products and third-party apps

We conducted a preliminary investigation to determine if the leaks of sensitive information discovered in Baidu Browser also existed in other Baidu products, via a shared code mechanism or otherwise. Baidu produces global editions of both the Windows and Android versions of its browser, which we examined in detail, and we describe the results below. We found that many of the leaks of sensitive information in the Android edition of the browser exist as part of an analytics software development kit (SDK) used not only in other Baidu Android apps but also in a large number of third-party apps as well.

Global editions of Baidu Browser

We investigated both the Windows and Android global editions of the browser. We obtained version 43.22.1000.452 of the global edition of the Windows browser downloaded from Baidu’s English-language site. We analyzed this version and did not find the same information leaks as were found in the Chinese version. Search terms entered into the address bar were sent encrypted over SSL. We found that the browser does send additional information back to Baidu’s servers via HTTP during startup and as triggered by other operations of the browser, but the data itself in the HTTP payload appears to be encrypted using a randomly generated 128-bit AES key encrypted with a 1024-bit RSA key. The encryption is thus asymmetric and not as easily decryptable by an eavesdropper; however, since the encryption does not appear to be implemented via a well-tested protocol such as SSL, it may contain less obvious flaws, and so further investigation is still warranted to determine its security.

We also analyzed version 5.1.0.1 of the global edition of the Android browser available from the Google Play Store. Unlike what we observed when comparing the global and Chinese Windows versions, we found that the information that the Chinese Android version leaks at startup is also leaked by the global Android version and is sent to the same servers using the same easily decryptable cryptography. We discovered that the shared leaks are related to a common software development kit employed by both browsers, which we discuss in further detail below. We also found that the global Android browser sends information about pageviews via a different mechanism than the Chinese Android browser, but it is still encrypted using a symmetric, easily decryptable algorithm. For brevity we do not describe this algorithm in this report, but we make a Python script for decrypting it available here. The browser also sends sensitive information to an additional server encrypted with a 1024-bit RSA key, but, as with the global Windows version, the use of asymmetric encryption does not guarantee secure transmission. Further investigation is required to determine if this instance of encryption is secure, although the browser is already leaking this sensitive information through more obvious means anyway.

We analyzed neither the global Windows nor global Android versions of the browser for other vulnerabilities such as those that may exist in their self-update processes or otherwise.

Sensitive data leaks in Baidu Mobile Tongji (Analytics) SDK

We found that the shared sensitive information leaks in both the Chinese and global versions of the Android browser are due to use of a common SDK, the Baidu Mobile Tongji (Analytics) SDK. The SDK is available as a download for use by third parties at http://mtj.baidu.com. We have found two variants of the SDK:

  1. com.baidu.mobstat.*
  2. com.baidu.mtjstatsdk.*

Using data provided by Lookout, a mobile security company, we were able to determine a list of apps containing either of the above SDK variants in other Baidu and third-party apps. Lookout searched their app database for the presence of each of the above SDK variant names and the hard-coded key “h9YLQoINGWyOBYYk”. For com.baidu.mobstat.*, they found 11,636 unique Android app package names, and for com.baidu.mtjstatsdk.*, they found 12,482 unique app package names. Together, there were 22,548 unique app package names (a small number of apps contained both SDK variants).

Many of these apps appeared to have automatically generated package names, which inflated the actual number of legitimate apps containing the SDK. Therefore, we filtered the apps by whether they were present in the Google Play Store. We found that only 454 of the 22,548 were in the Google Play Store. Since Google and the Google Play Store are presently inaccessible in mainland China, third-party app stores in China are very common. On hiapk.com, one popular app store in China, we found that 6,672 of the 22,548 apps were present.
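The counts above imply how many packages contained both SDK variants and what share of the packages appeared in each store. A short worked calculation in Python, using only the figures stated above (the variable and function names are illustrative):

```python
MOBSTAT = 11_636        # packages containing com.baidu.mobstat.*
MTJSTATSDK = 12_482     # packages containing com.baidu.mtjstatsdk.*
UNION = 22_548          # unique packages containing either variant
ON_GOOGLE_PLAY = 454    # of those, present in the Google Play Store
ON_HIAPK = 6_672        # of those, present on hiapk.com

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
both_variants = MOBSTAT + MTJSTATSDK - UNION


def share(count, total=UNION):
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)
```

With these figures, `both_variants` evaluates to 1,570, `share(ON_GOOGLE_PLAY)` to 2.0, and `share(ON_HIAPK)` to 29.6.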

We analyzed the popularity of the apps that were in the Google Play Store. For each app that was free to download and that the store showed as having been installed a million or more times, we downloaded the app and manually verified that the SDK was still present in it. In Table 5, we list these apps ordered by popularity.

Table 5: Popular apps in the Google Play Store containing the SDK and the number of their Google Play Store installs.

Google Play Store installs Android app title and package name
100,000,000 – 500,000,000 ES File Explorer File Manager [com.estrongs.android.pop]
50,000,000 – 100,000,000 Photo Wonder-Collage Maker [cn.jingling.motu.photowonder]
10,000,000 – 50,000,000 Azar-Video Chat & Call, Messenger [com.azarlive.android]; ES Task Manager (Task Killer) [com.estrongs.android.taskmanager]; 愛奇藝PPS [tv.pps.mobile]
5,000,000 – 10,000,000 Meipai [com.meitu.meipaimv]
1,000,000 – 5,000,000 百度地图 [com.baidu.BaiduMap]; 手机百度 [com.baidu.searchbox]; Well File Manager [com.fihtdc.filemanager]; SingPlay: Karaoke your MP3s [com.nexstreaming.app.singplay]; Kwai, the best short video App [com.smile.gifmaker]; Mydol (STAR LOCKSCREEN) [com.wacompany.mydol]; Speedometer GPS [luo.speedometergps]
500,000 – 1,000,000 ES App Locker [com.estrongs.locker]; 爱奇艺视频HD [com.qiyi.video.pad]

Since the Google Play Store is inaccessible in mainland China, these installs are likely to represent users outside of China. More research is required to determine how many users are affected by the SDK through apps in Chinese app stores, which our initial analysis indicates contain a much larger number of the affected apps and would better represent the number of these apps’ Chinese users.

Any app that uses this SDK for statistics and event tracking sends messages to Baidu’s servers containing the same sensitive information, in unencrypted or easily decryptable form, that we saw the Baidu Android browser sending when it starts up. This includes the phone’s unique IMEI number, current GPS location, and nearby wireless networks.

Google provides a competing analytics SDK, whose users it requires to follow multiple policies. However, in contrast to the Baidu SDK, which tracks and insecurely sends sensitive information in every transmission, we found that the Google SDK does not and in fact prohibits third parties from using it to “upload any data that allows Google to personally identify an individual (such as certain names, Social Security Numbers, email addresses, or any similar data), or data that permanently identifies a particular device (such as a unique device identifier if such an identifier cannot be reset), even in hashed form.”

Discussion

Mobile devices generate, collect, and transmit a wide variety of personal identifiers and user data, in many cases without providing any notification to users. Our primer on this topic, The Many Identifiers in Our Pockets, provides an overview of this data collection and highlights some of the risks associated with the widespread collection and transmission of these identifiers.

The issues identified in this report raise a number of concerns surrounding the privacy and security of personal user data for users of Baidu Browser. Numerous identifiers, including a user’s search terms, GPS coordinates, URLs of visited websites, and MAC address, are sent without encryption from the user’s device to Baidu-hosted servers. A number of other data points, including device IMEI, nearby Wi-Fi networks and their MAC addresses, hard drive serial number, and file-system volume serial number, are transmitted using easily decryptable encryption.

The transmission of personal data without properly implemented encryption can expose a user’s data to surveillance. Any in-path actor, which could include a user’s ISP, wireless network operator (such as a coffee shop Wi-Fi connection), mobile carrier, or a malicious actor with network visibility, would have visibility into the unencrypted data transmitted from this application. Further, an in-path actor would be able to decrypt the encrypted communications sent by this application with relative ease as a result of the methods used to encrypt this traffic. Such interception would permit the discovery of a user’s physical location, the terms for which they are searching, nearby wireless networks, and a number of digital fingerprints of their physical hardware. Users would have no way of knowing their data was surveilled in such a manner, and most would be unaware that such data was transmitted by the application at all.

The leakage of such user data is particularly problematic for individuals who use these applications and their devices to engage in politically-sensitive communications. For example, documents released by Edward Snowden indicate that members of the Five Eyes intelligence alliance have used similar personal data leaked through UC’s mobile web browser to identify targeted individuals and track their communications.

Further concerns are raised regarding the collection and storage of this personal user data by Baidu. Baidu, like all other companies offering Internet services and hosting user data in China, is required by Chinese law to permit law enforcement and intelligence services access to this data. Dissidents and activists based in China have previously raised concerns about authorities’ use of personal data gathered from surveillance of mobile applications produced by China-based companies. As this report has shown, the Windows version of Baidu Browser collects and transmits the URL and title of all websites visited by a user and sends it alongside the serial number of the user’s hard drive as well as their MAC address. While Internet companies often collect personal user data for the normal and efficient provision of services, it is unclear why Baidu Browser collects and transmits such an extensive range of sensitive user data points.

In addition to the leakage of personal user data, the lack of code signatures means that both versions, Windows and Android, are susceptible to a malicious in-path actor forcing the application to download and execute arbitrary code. While the Android version of the application would prompt a user to authorize the installation of such code, an effectively crafted APK could easily deceive a user into authorizing such an installation. The Windows version does not prompt the user in any way before downloading and executing code, leaving the user wholly susceptible to a malicious actor.

The advertised benefits of the proxy feature in Baidu Browser appear similar to those motivating Baidu’s recent partnership with CloudFlare, specifically, improved performance for Chinese users accessing Internet services outside of the country. We are uncertain, however, whether the proxy feature documented herein is technologically related to the CloudFlare partnership.

While the advertised purpose of the proxy feature is to improve performance when accessing sites outside of China, our research demonstrates that this feature also permits users to access some otherwise banned websites from the browser. However, the pervasive collection and insecure transmission of personal user data by Baidu Browser means that using the browser to bypass censorship would require exposing a significant amount of personal data to Baidu and all in-path agents. In announcing their CloudFlare partnership, Baidu noted that it was “in contact with Chinese regulators from the beginning” of negotiations. It is unclear whether detailed user monitoring is required as a condition to providing access to a more efficient path to the wider Internet.

Questions for Baidu

On February 16, 2016, we sent a letter to Baidu with additional questions about the security vulnerabilities we identified, and committed to publishing their response in full. Read the letter here [PDF].

Baidu responded to our questions on February 22, 2016. Read their full response here [PDF].

Update: Analysis of updated versions of Baidu Browser

We analyzed version 6.4.14.0 of the Android version and version 8.2.100.3090 of the Windows version to assess if the issues we identified and reported to Baidu have been fixed.

Analysis of Android client version 6.4.14.0

Our initial analysis of version 6.2.18.0 of the Android client identified four general security and privacy issues. Our analysis of version 6.4.14.0 of the Android client, released in late January 2016, shows that some of these issues have been resolved and some remain unresolved:

a. Leaks sensitive data on startup and Phones home with sensitive data about every page view

These issues appear to have been resolved insofar as the same information appears to be communicated by the application to Baidu servers but now it is encrypted using SSL.

b. Leaks sensitive data and address bar contents when inputting into address bar

This issue remains unresolved. In our communications with Baidu, they indicated they would not be fixing this issue. However, in addition to the contents of user searches, the browser still also includes sensitive data such as a user’s IMEI in an easily decryptable format in the request URL.

c. Insecurely checks for software updates

This issue has been resolved. Software updates are now checked using HTTPS.

Analysis of Windows client version 8.2.100.3090

Our initial analysis of version 7.6.100.2089 of the Windows client identified four general security and privacy issues. Our analysis of version 8.2.100.3090 of the client, released January 21, 2016, shows that only one of these issues has been addressed.

a. Leaks address bar contents when inputting into address bar

This issue remains unresolved. In our communications with Baidu, they indicated they would not be fixing this issue.

b. Communicates with Baidu servers via an easily decryptable protocol and Phones home information about every page view that includes hardware serial numbers

These issues remain unresolved. Our analysis indicates that data is still transmitted with easily decryptable encryption. In addition, every protobuf request sent to the dr.br.baidu.com domain now includes the user’s hard drive serial number and MAC address unencrypted in the header, a behavior not identified in the earlier version 7.6.100.2089 of the application that we analyzed in this report.

c. Insecurely checks for software updates

The application still checks for software updates unencrypted over HTTP; however, it now verifies that the downloaded update carries an Authenticode digital signature from Baidu.

Acknowledgments

The Citizen Lab would like to thank Seth Hardy from Lookout for assistance with this report. Jeffrey Knockel’s research for this project was supported by the Open Technology Fund’s Information Control Fellowship Program. Sarah McKune’s research was supported by a grant from the Open Society Foundations (Ronald J. Deibert, Principal Investigator), and Adam Senft’s from the John D. and Catherine T. MacArthur Foundation (Ronald J. Deibert, Principal Investigator).

Appendix

The following table lists our communications with Baidu related to the security and privacy issues we identified in Baidu Browser:

Date Contact
November 25, 2015 We submit a general inquiry by email to Baidu about how to submit a disclosure.
November 25, 2015 A Baidu representative responds with details on their disclosure process.
November 26, 2015 We formally emailed Baidu to notify them of the findings in this report and indicated we would not publish sooner than 45 days (January 10) after this notification.
November 26, 2015 Baidu confirmed receipt of this notification.
December 3, 2015 Baidu responded to our notification, “confirm[ing] all the below security issues” and stating they “have taken appropriate solutions to prepare for fixing them. It will take at least three weeks to fix all the flaws in the next new version.”
January 4, 2016 We emailed Baidu inquiring as to the status of the resolution of the reported vulnerabilities and to remind them that we intend to publish January 10.
January 6, 2016 Baidu responded, stating that they intend to fix all reported vulnerabilities except for search terms being sent in the clear. They stated that the other vulnerabilities in the Android version will be fixed in version 6.4 released January 10 and the vulnerabilities in the Windows version will be fixed in a release on January 24.
January 9, 2016 Baidu representatives email us about a possible delay in resolving the issues under the initially proposed timeline. They suggested having a phone call with their security engineers to discuss the relevant issues.
January 12, 2016 We confirm to Baidu we would be interested in discussing the issues by phone.
January 13, 2016 Baidu emails us requesting to have a conversation over email or phone.
January 13, 2016 We indicate a preference for phone and say that we would like to talk as soon as possible.
January 14, 2016 A Baidu security researcher closer to China contacts us requesting a call.
January 14, 2016 We respond with a time offering to Skype.
January 14, 2016 A Baidu security researcher agrees to a Skype call to discuss the timeline for fixing the vulnerabilities.
January 14, 2016 We Skyped with a Baidu security engineer who requested delaying publication until February 14 because of the additional time required to fully fix the vulnerabilities due to them existing in multiple applications.
January 14, 2016 We emailed Baidu and agreed to delay publication until February 14.
February 16, 2016 We emailed a set of questions regarding Baidu Browser’s security and privacy practices to Baidu’s Director of International Communications.
February 17, 2016 We emailed the Baidu security team with our understanding of the status of the issues we reported in both browsers and inquired whether there would be future updates and, if so, their timeline.
February 22, 2016 Baidu sends back responses to our questions, which we have published in full here [PDF].

Footnote

1 A software development kit is a set of tools used for developing software applications.

Media Mentions

Reuters, The Globe and Mail, BBC News, Boing Boing, ZDNet, SBS Australia, SC Magazine UK, The Independent, Bangkok Post, iTWire, Betanews, Nasdaq, TechWeek Europe, Slate.

Listen to Citizen Lab Director Ron Deibert’s interview with BBC Newshour, which begins at 18:45 in the podcast.