Published on November 19, 2019 by Archana Anumula
Data is being generated faster than we can keep up with
Estimates suggest the world already holds several quintillion bytes of data, and that by 2020 there will be 40x more bytes of data than stars in the observable universe[1]. This staggering volume brings a number of challenges, particularly in the research industry, where time is of the essence. More often than not, analysts face issues across the whole pipeline – from data mining to collation to cleaning to validation. These are necessary tasks, yet they consume a large share (40-50%) of analysts' research time.
Increasingly necessary data checks
For instance, at the heart of macroeconomic research is a reliable historical data series. Increasing globalization and ever-changing geopolitics mean that most research involves cross-country analysis and thematic reports. Preparing the background data files for such reports is neither simple nor straightforward, and can be time-consuming. If an analyst spent 20-30% of their time on data collection and analysis 10 years ago, they would have to spend twice that now, relying more heavily on technology solutions and larger teams to handle big data efficiently and arrive at meaningful results.
A not-so-trivial issue in the big data business
Many macro research analysts source trade-related data from UN websites (such as the UN Comtrade Database[2] and the UNCTAD database). The UN Comtrade Database hosts a repository of official international trade statistics and relevant analytical tables. It is a complex web of data, a mini universe in itself, with data for over 170 countries detailed by commodity/service category and partner country. With time series dating back to the early 1960s, the website contains well over 3 billion data records.
Usually, finding fish/meat exports of a single developed market such as the US or UK via national statistics offices is relatively easy, while performing the same exercise for a number of countries could be cumbersome. This is where the UN Comtrade Database comes in. A regular user would agree that multiple downloads are needed to collate the dataset, as there are limits on how much data can be downloaded at one time – data can be downloaded only for five years, five reporting countries and five trading partners. In addition, retrieving data for complex queries is possible only in JSON or XML format, and bulk datasets can be downloaded only via public APIs. This means that a user with a non-IT background would have to break the parameters into smaller sets.
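To make the scale of that manual work concrete, the splitting can be sketched in a few lines of Python. This is an illustrative helper, not the official Comtrade client: the function and parameter names are hypothetical, and it only enumerates the parameter sets a user would need to cover a large query under the guest limits described above (five years, five reporting countries, five partners per download).

```python
from itertools import product

def chunk(items, size=5):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def build_requests(years, reporters, partners, limit=5):
    """Enumerate the smaller parameter sets needed to cover a full query.

    Hypothetical sketch: keys below are illustrative, not the official
    UN Comtrade API schema.
    """
    requests = []
    for ys, rs, ps in product(chunk(years, limit),
                              chunk(reporters, limit),
                              chunk(partners, limit)):
        requests.append({
            "years": ys,        # e.g. 2010..2019, split into blocks of 5
            "reporters": rs,    # reporting-country codes
            "partners": ps,     # partner-country codes
            "fmt": "json",      # complex queries return JSON or XML
        })
    return requests

# A 10-year, 8-reporter, 3-partner query already needs 2 x 2 x 1 = 4 downloads.
reqs = build_requests(list(range(2010, 2020)),
                      ["US", "GB", "DE", "FR", "JP", "CN", "IN", "BR"],
                      ["WLD", "US", "GB"])
print(len(reqs))  # 4
```

Even this modest example requires four separate downloads; a 50-country, 30-year study multiplies the count quickly, which is why non-IT users find the manual process so cumbersome.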
After data cleaning, it is important to validate the data. With trade-related data, for example, data checks can become onerous due to increased volatility of the series. As such, checks and tests need to be carried out to ensure that the resultant series is accurate. For instance, missing data points in the historical series would have to be computed and currencies/numbers converted.
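The two validation steps mentioned above – computing missing data points and converting currencies – can be sketched as simple Python helpers. This is a minimal illustration assuming interior gaps only (the series starts and ends with known values) and a single local-currency-per-USD exchange rate; real series often need more careful treatment.

```python
def fill_gaps(series):
    """Linearly interpolate None gaps in an ordered numeric series.

    Assumes the first and last values are known (interior gaps only).
    """
    filled = list(series)
    for i, v in enumerate(filled):
        if v is None:
            lo = i - 1                      # nearest known point before the gap
            while filled[lo] is None:
                lo -= 1
            hi = i + 1                      # nearest known point after the gap
            while filled[hi] is None:
                hi += 1
            step = (filled[hi] - filled[lo]) / (hi - lo)
            filled[i] = filled[lo] + step * (i - lo)
    return filled

def to_usd(values, rate):
    """Convert a local-currency series to USD at a single rate
    (rate = units of local currency per USD)."""
    return [v / rate for v in values]

# Two missing points in an annual export series, filled linearly:
exports = [100.0, None, 120.0, None, None, 150.0]
print(fill_gaps(exports))  # [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]
```

A simple spot check like this (does the filled series stay between its neighbours? do converted totals still sum correctly?) is exactly the kind of test that becomes onerous when repeated across dozens of volatile trade series.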
In this age of convenience, where everything is needed at the click of a button, data sourcing and maintenance should be a simple process. While the data available on the UN Comtrade Database is invaluable and probably irreplaceable, it is still "raw" and requires fine-tuning. The download limits also mean users need an easier-to-use platform.
Demand for alternative data cleaning and validation sources
To address such challenges, we offer a range of data management services such as data support through platforms or managed services. One example is Data Bridge – a platform to manage data across multiple source files in a database or a data-pull process.
How some are tackling the problem
Integrating Data Bridge with a third-party platform such as the UN Comtrade Database enables sourcing data through multiple channels, be it web scraping or bulk downloads via APIs. Through Data Bridge, the user can procure data in pre-defined formats and define the analysis required (for example, calculate growth in the UK's fish exports, or calculate the UK's fish exports as a percentage of its total exports to the US). This saves a significant amount of time otherwise spent on data aggregation, while ensuring accuracy – the data is pre-checked for errors and inconsistencies, with the option to raise queries if needed. Other functions such as the 'Scheduler' option enable the user to run periodic file updates, easing the data updating process, while the 'Dashboard' option provides a clear picture of data availability at any given point in time or by other parameters. The 'Data Visualization' feature of Data Bridge completes the picture, providing additional information and data search functionality for the well-informed user. In summary, Data Bridge helps transform data from multiple channels and formats into a ready-to-use, customizable format, saving time, improving efficiency and reducing costs.
About the Author
Assistant Director, Quantitative Services
Archana Anumula has more than 12 years of experience in economic research. She is proficient in writing country-specific economic reports and short notes on macroeconomic releases, and in building and maintaining large databases, among other areas. She has been with Acuity Knowledge Partners since 2011 and currently manages the Economics Research Support team, which carries out sell-side research for a global investment banking firm. She is adept at managing accounts, teams and pilot projects. Prior to joining Acuity Knowledge Partners, Archana worked in the Research division at UBS ISC (Cognizant) and at Infosys BPM.
Archana Anumula holds a Bachelor of Commerce and Master of Economics from Bangalore University, India.