NotRandomThoughts

Wednesday, March 9, 2016

ALERT: Reject the DHS Proposal

As part of the rulemaking process, the Department of Homeland Security (DHS) has proposed to exempt its information systems in the National Insider Threat Program from the Privacy Act requirements for criminal, civil, and administrative enforcement. (http://1.usa.gov/1QuPpom)

We should reject it in total.

What does the proposal mean?

The U.S. government has been moving swiftly to implement the National Insider Threat Program under a Presidential Executive Order since 2010. Under this mandate, federal departments and agencies with classified networks are directed to establish insider threat detection and prevention programs. Insider threats include attempted or actual espionage, subversion, sabotage, terrorism, or extremist activities; unauthorized use or hacking of information systems; unauthorized disclosure of classified or proprietary information or technology; and indicators of potential insider threats (which are left undefined).

Who are the “insiders”?

Current or former federal employees, contractors, or detailees with access to secured systems and classified information are ostensibly within the scope. So are other authorized individuals who access related facilities, equipment, and information. According to the DHS docket (http://1.usa.gov/1paWlRl), the systems also cover family members, dependents, relatives, and individuals with a personal association to an individual under investigation, as well as witnesses and other individuals who provide statements or information related to an inquiry. In total, tens of millions of individuals are potentially covered as insiders under the national program.

What are “secured systems and classified information”?

The answer is elusive because there is no central control or consistent rule in the government. A simple statistic such as the total number of ongoing economic espionage investigations is a national secret according to the FBI. What is classified as secret by one federal agency may be simultaneously circulated widely and openly by another agency. Furthermore, information that has been unclassified for many years can be retroactively reclassified to be secret without explanation, as exemplified recently by the emails of former Secretaries of State.

What is collected and contained in the National Insider Threat Program information systems?

According to the DHS docket, the categories of records are extensive on each individual, including but not limited to personal and biometric data, ethnicity and race, letters and emails, social media accounts, logs of computer activities, travel records and foreign contacts, and information provided by individuals who report known or suspected insider threats.

On the last point, the U.S. government has reportedly been requiring “federal employees to keep closer tabs on their co-workers and exhorts managers to punish those who fail to report their suspicions” under the National Insider Threat Program (http://bit.ly/1i3VTzA). Others have observed that the same unfettered reliance on unreliable sources was tried during the Cold War to search for Soviet spies. It did not work, but it led to investigations of hundreds of loyal government workers, mostly of Eastern European origin, and ruined the careers of many (http://bit.ly/1MLfTj9). A similar approach by a U.S. senator, who accused anyone “un-American” of subversion or treason without proper regard for evidence, is now known as “McCarthyism.”

The Privacy Act of 1974 provides fair principles to govern the government’s collection, maintenance, use and dissemination of personally identifiable individual records. With possible exceptions, such as for law enforcement or statistical purposes, the Privacy Act safeguards individual privacy from the misuse of federal records by requiring written consent of an individual before the government agency may disclose the personal record, even if it is to share with another federal agency. It also grants an individual access to his or her own federal records.

The DHS has already been collecting and maintaining individual data under the National Insider Threat Program. Citing criminal, civil, and administrative enforcement needs, the DHS proposes exemptions from the Privacy Act so that it can avoid accounting for disclosures, deny an individual access to his or her own records, collect and retain information about an individual regardless of relevancy or accuracy, and waive the requirement to serve notice to the individual when such information is collected or used.

The Story of Sherry Chen

Sherry Chen is a naturalized U.S. citizen and a federal employee. She was an exemplary, award-winning hydrologist at the National Weather Service until a co-worker in the U.S. Army Corps of Engineers identified her as a “Chinese National” attempting to access confidential information, which was in fact publicly available (http://bit.ly/1Mr5kHN, page 7).

Sherry was arrested and indicted in October 2014, accused of spying for China, the nation of her birth. Without credible evidence to proceed, the government dropped her case in March 2015 before her trial was to begin.

Whether it was coincidence or not, the informer was promoted into the National Oceanic and Atmospheric Administration which oversees the National Weather Service. Sherry was not allowed to return to her job and has been placed on administrative leave at taxpayers’ expense for the past year. To add insult to injury, the National Weather Service initiated the process to terminate Sherry’s employment in September 2015, using the same allegations in the failed prosecution. Her appeal is still pending after six months.

The government has so far refused to provide an explanation of what happened or an apology for its action, despite numerous media editorials, congressional inquiries, and petitions led by Nobel laureates and community and professional organizations (http://bit.ly/AAProfiling).

Reject the DHS Proposal

The story of Sherry Chen is not an isolated incident.

Racial discrimination and ethnic profiling have been a large part of American history. They have not disappeared. In its current zeal to find and prosecute insider threats, the government seems to consider the protection of some innocent Americans to be secondary. A lack of accountability permits a rush to judgment and the potential misuse and abuse of authority without consequences.

The Federation of American Scientists has already submitted a comment on the DHS proposal stating that, in case of adverse action, an accused individual should be given at least a summary of the information used against him or her and be allowed to challenge the allegations as a matter of due process.

Whereas

· Tens of millions of Americans may be covered as insiders under the National Insider Threat Program

· Massive amounts of data and information are being collected on each of the individuals that may be inaccurate, unreliable, or retroactively modified

· Federal investigations are subject to human mistakes, errors from using unreliable information, misunderstanding, misguided direction, and illegal profiling

· Present safeguards have failed and allowed flawed investigations to proceed to wrongful prosecutions

· There is no statistical and objective third-party monitoring in place to provide accountability and prevent misuse and abuse of authority

The DHS proposal, as it stands, presents high risks that innocent individuals will be falsely accused and subjected to unjust and damaging investigations and prosecutions with no recourse. These risks are even higher in today’s turbulent political climate, where traditional American values are questioned or even refuted.

Therefore, the DHS proposal should be rejected in total in its present form.

For an alternative proposal to be considered potentially acceptable:

· An individual should be allowed to review at least a summary of his or her security file upon request

· An individual should be allowed full access to his or her security file as part of due process upon investigation or when accused of wrongdoing

· Irrelevant and inaccurate records must be purged from the individual’s records when their status becomes clear

· The government must produce publicly available statistical summaries on the status and trends of the information systems, including but not limited to the number of individuals covered and the number of ongoing investigations with breakdowns by protected civil rights factors

· Regular third-party monitoring and review of the inherent policies and practices, such as Congressional hearings or public-private commissions, must be fully established

Comments on the DHS proposal can be submitted online by individuals or organizations at http://1.usa.gov/1QuPpom. The comment period ends on March 28, 2016.

This is a personal blog not associated with any organizations.

Saturday, July 4, 2015

To Advance China’s Smart Cities, Small Statistics Are Imperative

胡善庆, 王琼, 刘真

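China’s modern city system derives from Article 30 of Chapter 1 of the Constitution of China. Cities fall mainly into three tiers: municipalities directly under the central government, prefecture-level cities, and county-level cities. Prefecture-level administrative divisions also include 30 autonomous prefectures; 22 of them have their seats in county-level cities, and the remaining 8 have their seats in counties.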

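Since the reform and opening-up, various sub-prefecture-level cities and cities administered directly by a province have emerged, but these are not yet formal administrative levels.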

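According to the National New-type Urbanization Plan (2014-2020), China had 658 cities in 2010. As of June 2015, the number of documented Chinese cities had risen to 670: 4 municipalities, 291 prefecture-level cities, and 375 county-level cities. Shandong has the most cities of any province, with 48, followed by Guangdong with 44.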

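A smart city uses advanced information technology in a people-centered way to manage and run the city intelligently. It is an ideal that many countries are pursuing in the 21st century, and it is also a major goal in China’s plan to build a moderately prosperous society by 2020.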

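According to reports, planned investment in smart cities nationwide during the 12th Five-Year Plan period was expected to exceed 1.6 trillion yuan. Some estimates put the future size of China’s smart city market at as much as 4 trillion yuan.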

1. The Distribution of Smart Cities Varies Markedly

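From January 2013 to April 2015, China’s Ministry of Housing and Urban-Rural Development and Ministry of Science and Technology separately announced three batches of smart city pilots totaling 291, including one batch of pilots with expanded scope.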

The 291 smart city pilots do not correspond to 291 cities. For example, China’s four municipalities together have 24 pilots.

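Among the three tiers, municipalities have the highest participation rate and pilot density, followed by prefecture-level cities. On the other hand, 5 pilots currently belong to 4 county-level administrative divisions (Wenchuan County in Sichuan Province, Fuyun County in the Xinjiang Autonomous Region, Pingtan County in Fujian Province, and Hainan Prefecture in Qinghai Province).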

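Of China’s 670 cities, 210 take part in the smart city pilots, a participation rate of 31.3%. More than half of the prefecture-level cities (53.3%) participate.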
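The overall participation rate can be checked directly from the counts reported in this post. The short Python sketch below is only an illustration of that arithmetic; the variable names are my own, and the figures are the ones quoted above (4 municipalities, 291 prefecture-level cities, 375 county-level cities, and 210 participating cities).

```python
# City counts by tier as quoted in this post (as of June 2015).
cities_by_tier = {
    "municipality": 4,
    "prefecture-level": 291,
    "county-level": 375,
}

# Number of cities with at least one smart city pilot, as quoted in this post.
participating_cities = 210

total_cities = sum(cities_by_tier.values())
participation_rate = participating_cities / total_cities

print(f"{participating_cities} of {total_cities} cities participate: "
      f"{participation_rate:.1%}")
```

The same calculation could be repeated for each tier or province once the participating counts for that tier or province are published.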

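The four municipalities differ the most in their smart city pilots: Beijing has 11 pilots, while Shanghai has only 1. In Anhui Province, nearly 60% of cities participate, but each city basically has only one pilot. In Zhejiang Province, fewer than one quarter of the cities participate, but most participating cities have more than one pilot.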

2. Six Development Directions Are Clear, but Monitoring and Analysis Are Lacking

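Because smart city pilots have a start-up period of 3 to 5 years, some of the first-batch pilots should be approaching the turning point from concept to reality.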

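The Plan (2014-2020) actively promotes smart city construction and sets out six development directions, the most important of which is broadband information networks. Can residents and businesses get online? Can they do so quickly, at high volume, and at low cost? Put simply, without Internet access there is no smart city. The other five directions can be assessed only after access is in place: only then can one measure whether residents and businesses receive timely, reliable, and high-quality government information and services.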

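Chapter 31 of the Plan (2014-2020) stresses the importance of sound monitoring and evaluation. Besides strengthening statistical work on urbanization, it calls for dynamic monitoring and tracking analysis, along with a mid-term evaluation of the Plan and special-topic monitoring, to keep the Plan’s implementation on course.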

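Unfortunately, so far not one of the 291 pilots has carried out systematic, or even any special-topic, dynamic statistical monitoring and tracking analysis of its smart city. Nor have the central ministries announced any plan to issue regular or real-time statistical monitoring reports on the successive batches of pilots.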

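Meanwhile, problems such as information silos, duplicated construction, wasted resources, and vanity projects have been surfacing one after another. Some “ghost cities” reported to be in crisis are also among the 291 smart city pilots. The need for a sound monitoring system seems increasingly evident.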

3. Standardizing Smart City Oversight Can Start with Small Statistics

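Getting started on smart city management does not require big data. A few small statistics can come first, progressing from shallow to deep and from simple to complex. For example, the targets for broadband information networks are already well defined, and routine processing of administrative records should be able to supply these small statistics for regular or even real-time publication online.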

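Last September, the Ministry of Industry and Information Technology and the Ministry of Science and Technology announced 39 cities (city clusters) as “Broadband China” demonstration cities, at least 32 of which are already smart city pilots. The “Broadband China” indicators are nearer-term and more detailed.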

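2014-2015 is the promotion and popularization stage of “Broadband China.” The focus is to keep raising broadband network speeds while accelerating the expansion of broadband coverage and scale and deepening the adoption of applications. The optimization and upgrade stage should begin in 2016.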

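By the end of 2015, the quantitative “Broadband China” targets include more than 270 million fixed broadband subscribers, with household fixed broadband penetration reaching 65% in urban areas and 30% in rural areas. 3G/LTE users are to exceed 450 million, a user penetration rate of 32.5%. The share of administrative villages with broadband is to reach 95%. Broadband access capacity is to basically reach 20 Mbps for urban households, with 100 Mbps in some developed cities, and 4 Mbps for rural households. 3G networks are to basically cover urban and rural areas, LTE is to be in large-scale commercial use, and wireless LANs are to provide full hotspot coverage of public areas. The number of Internet users is to reach 850 million.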

How many of the current smart city pilots will meet each of these targets by the end of this year?

If a statistical evaluation system is not launched now to begin dynamic monitoring and tracking analysis, then when?
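As a rough illustration of how little is needed to get started, the Python sketch below compares each pilot’s reported broadband indicators against five of the end-of-2015 “Broadband China” targets quoted in this post. It is a minimal sketch under stated assumptions, not an official evaluation method: the indicator names, the function names, and above all the figures for “Pilot A” and “Pilot B” are hypothetical placeholders, since no pilot has published such data.

```python
# End-of-2015 "Broadband China" targets quoted in this post.
# The dictionary keys are illustrative names chosen for this sketch.
TARGETS = {
    "urban_household_penetration_pct": 65.0,
    "rural_household_penetration_pct": 30.0,
    "villages_with_broadband_pct": 95.0,
    "urban_access_mbps": 20.0,
    "rural_access_mbps": 4.0,
}


def unmet_targets(reported):
    """Return the indicators a pilot has not reported or has not met."""
    unmet = []
    for name, target in TARGETS.items():
        value = reported.get(name)
        if value is None or value < target:
            unmet.append(name)
    return unmet


def pilots_meeting_each_target(pilots):
    """Count, for each indicator, how many pilots meet the target."""
    counts = {}
    for name, target in TARGETS.items():
        counts[name] = sum(
            1
            for reported in pilots.values()
            if reported.get(name) is not None and reported[name] >= target
        )
    return counts


# HYPOTHETICAL placeholder figures for illustration only.
# These are NOT real data for any smart city pilot.
pilots = {
    "Pilot A": {
        "urban_household_penetration_pct": 70.0,
        "rural_household_penetration_pct": 28.0,
        "villages_with_broadband_pct": 96.0,
        "urban_access_mbps": 20.0,
        "rural_access_mbps": 4.0,
    },
    "Pilot B": {
        "urban_household_penetration_pct": 61.0,
        "rural_household_penetration_pct": 35.0,
        "villages_with_broadband_pct": 90.0,
        "urban_access_mbps": 50.0,
        # rural_access_mbps not reported, so it counts as unmet
    },
}

for pilot_name, reported in pilots.items():
    print(pilot_name, "unmet:", unmet_targets(reported))

print(pilots_meeting_each_target(pilots))
```

Treating an unreported indicator as unmet is a deliberate choice in this sketch: it makes gaps in reporting as visible as shortfalls in performance, which is the point of publishing small statistics regularly.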

Saturday, August 23, 2014

2014 Workshop on Big Data and Urban Informatics

After more than a year of preparation, the Workshop on Big Data and Urban Informatics was held at the University of Illinois at Chicago on August 11-12, 2014.

More than 150 persons from at least 10 countries (Australia, Canada, China, Greece, Israel, Italy, Japan, Portugal, United Kingdom, and the U.S.) attended the forum sponsored by the National Science Foundation.

Piyushimita (Vonu) Thakuriah, co-chair for the workshop, reported on the funding of the Urban Big Data Center at the University of Glasgow in Scotland (http://bit.ly/1kXG2Uh). Its mission is to “support research for improved understanding of urban challenges and to provide data, technology and services to manage, make policy, and innovate in cities.” The Urban Big Data Center partners with five other universities, including the University of Illinois at Chicago. Vonu, a transportation expert, is the director of the center.

Over two full days, 68 excellent presentations were made, far exceeding what the organizers had expected a year earlier. The papers will be posted on the web in the near future.

Two luncheon keynote speakers highlighted the workshop.

Carlo Ratti presented the state-of-the-art work of the MIT SENSEable City Lab, which specializes in the deployment of sensors and hand-held electronics to study the environment. Since conventional measures of air quality tend to be collected at stationary locations, they do not always represent the exposure of a mobile individual. In one project titled “One Country, Two Lungs” (http://bit.ly/1nbSBXi), a team of human probes travelled between Shenzhen and Hong Kong to detect urban air pollution. The video revealed the divisions in atmospheric quality and individual exposure between these two cities.

Paul Waddell of the University of California at Berkeley presented his work on urban simulation and dynamic 3-D visualization of land use and transportation. Some of his impressive images can be found at http://bit.ly/1rn9hmj. His video and examples reminded me of their potential applicability to creating the “Three Districts and Four Lines” in China’s National Urbanization Plan. I also learned about a somewhat similar set of products from China’s supermap.com, a Geographic Information System software company based in Beijing.

One of the 68 presentations described the use of smart card data to study commuting patterns and volume in Beijing subways during rush hours. Another compared the characteristics of big data and statistics and raised the question of whether big data is a supplement to or a substitute for statistics.

The issue of data quality was seldom volunteered in the sessions, but questions about it came up frequently. Whether it was called editing, filtering, cleaning, scrubbing, imputing, curating, or re-structuring, it was clear that some presenters spent an enormous amount of time and effort just getting the data ready for very basic use.

Perhaps data quality is considered secondary in exploratory work. However, there are good quality big data and bad quality big data. When other options are available, spending too much time and effort on bad quality big data seems unwise because it does not project a practical, future purpose.

There were also few presentations that discussed the importance of data structure, whether it is already built in as design or created through metadata. Structured data contain far more potential information content than unstructured ones and tend to be more efficient and optimal in information extraction, especially if they have the capability to be linked across multiple sources.

For the purpose of governance, I was somewhat surprised that use of administrative records has not yet caught on in this workshop. Accessibility and confidentiality appeared to be barriers. It would seem helpful for future workshops to include city administrators and public officials to help bridge the gap between research and practical needs for day-to-day operations.

Nations and cities share a common goal in urban planning and urban informatics – improve the quality of city life and service delivery to constituents and businesses alike. On the other hand, there are drastic differences in their current standing and approach.

China is experiencing the largest human migration in history. It has established goals and direction for urban development, but has little reliable, quantitative research or experience to support and execute its plans. The West is transitioning from its century-old urban living to a future that is filled with exciting creativity and energy, but does not seem to have as clear a vision or direction.

Confidentiality is an issue on which China and the West contrast sharply. The Chinese plans show a strong commitment to collecting and merging linkable individual records extensively. If implemented successfully, they will generate an unprecedented amount of detailed information that can also be abused and misused. The same approach would likely face much scrutiny and opposition in the West, which has to consider less reliable but more costly alternatives in order to meet the same needs.

There is perhaps no absolute right or wrong approach to these issues. The workshop and the international community being created offer a valuable opportunity to observe, discuss, and make comparisons on many topics of common global interest.

Selected papers from the workshop will now undergo additional peer review. They will be published in an edited volume titled “See Cities Through Big Data – Research, Methods and Applications in Urban Informatics.”