Sep 06, 2016
Posted at 12:27 PM
Post by Justin Antonipillai
The rapid evolution of technology and the rise of all things digital have unleashed a stunning amount of data through federal agencies, and with it an unprecedented opportunity to improve how we share, analyze, interpret and put data to work. To that end, I’d like to share a few stories that demonstrate how data science, agile development methods, user-centered design principles and simple visualizations can unlock new levels of productivity and insight from data at the Commerce Department.
The background: Commerce Data Services (CDS) launched its Data Academy last spring to teach basic data science skills to interested Commerce employees. In addition, CDS offered selected students the opportunity to participate in a four-week intensive data boot camp led by Natassja Linzau, who leads our education initiatives. Graduates of the boot camp then served a three-month residency with CDS through its newly established In-Residence program to learn how to bring a modern, data-science approach to their work.
Chief Data Scientist Jeff Chen oversaw the residents during their details with the CDS. Projects for each resident were selected by their individual bureaus to answer a fundamental challenge: “What agency problem can I solve, and what value can I create, by putting our data to work in new ways?” The overarching goal was to demonstrate how we can make Commerce data more consumable, and bring a data-driven approach to modernizing government.
The results show what is possible for individuals and agencies that are able to put data to work in new ways. In a longer post than usual, let me highlight just a few projects and the individuals who made them happen:
- EDA Grants Viewer, Stephen Devine, Economic Development Administration
Stephen’s project consolidates EDA grant data from disconnected databases and gives EDA staff a single, user-friendly interface to quickly query, download, and visualize data on grant recipients and the impact of each grant. This allows EDA staff to be more responsive to internal and external stakeholders, including Congress, the White House Office of Management and Budget, local officials, and EDA managers. The Grants Viewer saves EDA time and money, and allows managers and leadership to access and employ EDA data to sharpen decisions.
- “beaR” Library, Andrea Julca, Bureau of Economic Analysis
Andrea’s project uses “R,” an open-source programming language for statistical computing and one of the fastest-growing tools of choice for data scientists globally.
BEA’s data is widely used by economists, academics and researchers, data journalists and countless others. Today, BEA publishes many complex and valuable datasets as spreadsheets with thousands, if not millions, of rows of data. To sort, clean, and use the data, users have to pull it out of the spreadsheets and feed it into scripts they write themselves to process it.
To help BEA’s data users and broaden the impact of the data, Andrea created an R library that lets users call our API to pull from BEA’s multiple datasets and obtain, query, and visualize BEA data more easily and quickly. By automating these steps, we make it easier for others to use our data and draw valuable insights from it.
- Expertise-based Patent Matching, Karl Skowroneck, US Patent and Trademark Office
Today, when patent managers take in patent applications, the task of matching each application and sending it to the right patent examiner is still too much of a manual, gut-sense process.
To make this process more efficient and objective, and to match each patent application to an examiner’s expertise and workload, Karl developed his Patent Matching system. It uses a “recommendation engine” that harnesses data to automate sending the right applications to the right examiners.
Karl’s project helps streamline the patent review workflow, improve the quality of reviews, and help examiners do their best work, which makes everyone happier.
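To make the idea of matching on expertise and workload concrete, here is a minimal sketch in Python. It is not Karl’s system: the `Examiner` class, the `score` and `recommend` functions, the topic tags, and the `workload_penalty` weight are all hypothetical, chosen only to illustrate how a recommendation might trade off subject-matter overlap against an examiner’s open caseload.

```python
from dataclasses import dataclass


@dataclass
class Examiner:
    # Hypothetical record: a name, a set of expertise tags, and open cases.
    name: str
    expertise: set
    open_cases: int = 0


def score(app_topics, examiner, workload_penalty=0.1):
    """Share of the application's topics the examiner covers,
    minus an assumed penalty for each case already on their docket."""
    if not app_topics:
        return 0.0
    overlap = len(app_topics & examiner.expertise) / len(app_topics)
    return overlap - workload_penalty * examiner.open_cases


def recommend(app_topics, examiners, workload_penalty=0.1):
    """Return the examiner with the highest score for this application."""
    return max(examiners, key=lambda e: score(app_topics, e, workload_penalty))


# Illustrative data only.
examiners = [
    Examiner("A", {"battery", "electrode"}, open_cases=5),
    Examiner("B", {"battery", "polymer"}, open_cases=1),
    Examiner("C", {"antenna"}, open_cases=0),
]

best = recommend({"battery"}, examiners)
print(best.name)
```

In this toy example, examiners A and B both cover the application’s topic, so the lighter caseload tips the recommendation toward B. A real system would presumably learn expertise from past examination records rather than hand-written tags.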
- Potential Client Prediction Tool, Pri Oberoi, CDS
Pri’s tool uses predictive modeling to combine internal Commerce data with commercial data to build an algorithm that can sift through millions of businesses and reveal new, untapped clients for Commerce to reach and serve.
The tool involves two Commerce bureaus so far: 1) The International Trade Administration and its “New Exporters Project” that allows ITA to identify and reach out to help potential exporters; and 2) The National Institute of Standards and Technology as it seeks to identify manufacturers that could benefit from partnering with Commerce to push cutting-edge manufacturing practices and innovation.
- Search String Analysis Project, Star Ying, CDS
When reviewing patent applications, examiners need better patent search tools to ensure the innovations they’re reviewing are really new and different. In particular, they need better search language tools, especially with the tech-driven explosion in new terms, acronyms and synonyms. A patent application referring to a “Christmas tree” could mean either the one with tinsel and candy canes, or an oil and gas drilling device.
Star Ying, a CDS lead data scientist, developed an algorithm that standardizes search terms and provides patent examiners with a Google-like search capability, allowing quicker discovery of the most relevant terms (and more consistent actions).
Faster and better search is crucial, since USPTO has 226 years of patent data, dating back to the first patent, issued in 1790 for a fertilizer ingredient, and only 8,000 examiners reviewing 600,000 patent applications a year. Star’s tool also offers modern visualization, displaying search data graphically in word clouds to offer a variety of perspectives.
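As a rough illustration of what standardizing search terms can mean, the sketch below maps acronyms and spelling variants to a single canonical form before a search is run. This is not Star’s algorithm: the `SYNONYMS` table, its entries, and the `standardize` and `standardize_query` functions are hypothetical, and a hand-written lookup like this does not address ambiguous terms such as “Christmas tree,” which need context to resolve.

```python
import re

# Hypothetical lookup table: variant or acronym -> canonical search term.
SYNONYMS = {
    "xmas tree": "christmas tree",
    "led": "light emitting diode",
    "light-emitting diode": "light emitting diode",
}


def standardize(term):
    """Lowercase, collapse whitespace, then map known variants
    to their canonical form; unknown terms pass through cleaned."""
    cleaned = re.sub(r"\s+", " ", term.strip().lower())
    return SYNONYMS.get(cleaned, cleaned)


def standardize_query(terms):
    """Standardize each term and drop duplicates, keeping first-seen order."""
    seen = []
    for term in terms:
        canonical = standardize(term)
        if canonical not in seen:
            seen.append(canonical)
    return seen


query = standardize_query(["LED", "Light-Emitting  Diode", "Xmas Tree"])
print(query)
```

Here the three input terms collapse to two canonical ones, so two examiners who phrase the same search differently would end up running the same query.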
- Travel Tracker, Negar Kalbasi, CDS
Negar’s cloud-based travel itinerary and event-tracking tool would bring modern functionality to our legacy system, including simplified data entry, better records management, real-time reporting on multiple dimensions, new search functions, and visual dashboards. Together, these features would allow more cohesive coordination and tracking of where our senior officers are representing Commerce, and more strategic use of time and other resources.
As CDS developed Travel Tracker, it tested the tool with ITA's Global Markets Office and incorporated user feedback into design and development. The tool already holds hundreds of travel records and has the potential to help track and coordinate travel of all Commerce leaders across different bureaus.
- Patent Data Visualization, April Blair, PTO
April’s project would help patent review managers see, at a glance, the output of their patent examiners.
Currently, managers need to analyze and decipher spreadsheets of numbers, some of them out of date. April’s tool will provide managers with real-time data in an easily digestible format that shows how team members are doing and supports quicker, more effective management action.
Going forward, there will only be more data and more of a need for advanced technologies and data science skills to find meaning in it. These (and many other) CDS projects validate a key tenet of the agency’s strategic plan: that data and the ability to manage it more effectively are the fuel for innovation.