Category Archives: Big Data

Display data for trading


Hey there,

I'm currently working at a trading shop that hosts its historical data in a SQL database, along with real-time data. My goal is to help them display this data in an easy-to-use platform (web or app) so they can visualize it more easily.

I was thinking of using Python, but I'm open to other ideas. At the moment, simplicity would be ideal, since I don't want to spend too much time figuring things out.
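One minimal way to start in Python, sketched below with hypothetical table and column names and SQLite standing in for whatever SQL database the shop actually runs: query the price history into plain Python structures and expose it as JSON, which any charting front end (Plotly, Chart.js, etc.) can then render.

```python
# Minimal sketch: pull a price series out of a SQL store and shape it as
# JSON for a web front end. The `ticks` table and its columns are invented
# for illustration; an in-memory SQLite database stands in for the real one.
import json
import sqlite3

def fetch_series(conn, symbol):
    """Return the time/price history for one symbol as chart-ready lists."""
    rows = conn.execute(
        "SELECT ts, price FROM ticks WHERE symbol = ? ORDER BY ts",
        (symbol,),
    ).fetchall()
    return {
        "symbol": symbol,
        "ts": [r[0] for r in rows],
        "price": [r[1] for r in rows],
    }

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ticks (ts TEXT, symbol TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO ticks VALUES (?, ?, ?)",
        [("2024-01-01T09:30", "ABC", 101.5),
         ("2024-01-01T09:31", "ABC", 101.7)],
    )
    print(json.dumps(fetch_series(conn, "ABC")))
```

From here, a lightweight framework such as Flask or Dash could serve `fetch_series` output behind an HTTP endpoint, keeping the front end a simple page that polls for updates.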


submitted by /u/mshiddensecret



Big data driven insights into tomorrow's marketing environments


The business of Experian, a global leader in credit reporting and marketing services with annual revenues exceeding US$4.3 billion (for 2017), is all about data.

Experian has four main business units: Credit Information Services, Decision Analytics, Business Information Services, and Marketing Services. Experian Marketing Services (EMS) helps marketers connect with customers through relevant communications across a variety of channels, driven by advanced analytics on an extensive database of geographic, demographic, and lifestyle data. EMS has built its business on the effective collection, analysis, and use of data.

Quadrillions of records

The company has always handled large amounts of data: billions, even quadrillions, of records on who consumers are, how they're connected, and how they interact. With today's proliferation of digital channels and the flood of information from social media likes, web interactions, and email responses, older systems no longer have the capacity to handle the data volumes.

In the past, there was no requirement to provide data in real time. Experian sent customer database updates to clients once a month for campaign adjustments, which allowed Experian to process large volumes of data through a number of diverse, mostly mainframe-based platforms.

That’s changing. Today’s consumers leave a digital trail of behaviors and preferences for marketers to leverage so they can enhance the customer experience. Experian’s clients, which include many of the top retail companies in the world, are asking for more frequent updates on consumers’ latest purchasing behaviors, online browsing patterns and social media activity so they can respond in real time. They are increasingly looking for a single, integrated view of their customer.

Technology infrastructure for real-time reporting

Meeting the need for immediacy of information and customisation of data in real time for clients would require a technological infrastructure that can accommodate rapid processing, large-scale storage, and flexible analysis of multi-structured data. Experian’s mainframes were hitting their limits in terms of performance, flexibility and scalability.

EMS set an internal goal to process more than 100 million records of data per hour, translating to roughly 28,000 records per second.

The team decided to look for new architectures that could handle the new volumes of data. About 30 criteria were identified for the new platform, ranging from depth and breadth of offering to support capabilities, price, and unique distribution features. Two criteria were prioritized: both batch and real-time data processing capabilities, and scalability to accommodate large and growing data volumes.

The North America Experian Marketing Services group led the evaluation of NoSQL technologies within Experian. Hadoop and HBase quickly surfaced as a natural fit for Experian’s needs. EMS engineers downloaded raw Apache Hadoop.

They saw certain gaps that could be filled by a commercial distribution. EMS evaluated several distributions and selected Cloudera to meet EMS’ enterprise-level Hadoop needs, such as meeting client SLAs (service level agreements) and having 24×7 reliability.

Experian invested in Cloudera Enterprise, which is comprised of three things: Cloudera’s open source Hadoop stack (CDH), a management toolkit (Cloudera Manager), and expert technical support.

A production version of Experian’s Cross-Channel Identity Resolution (CCIR) engine was launched. CCIR is a linkage engine that is used to keep a persistent repository of client touch points. CCIR runs on HBase, a high-performance, distributed data store that integrates with Cloudera’s platform to deliver a secure and easy-to-manage NoSQL database.

EMS’ HBase system spanned five billion rows of data, as of 2017, and the number is expected to grow tenfold in the near future. HBase offers a shared architecture that is distributed, fault tolerant, and optimised for storage. In addition, HBase enables both batch and real-time data processing.

Experian feeds data into the CDH-powered CCIR engine using custom extract, transform, load (ETL) scripts from in-house mainframes and relational databases including IBM DB2, Oracle, SQL Server, and Sybase IQ.
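To make the linkage idea concrete, the sketch below models an HBase-style layout (a row key per consumer identity, touch points stored as column-family:qualifier cells) with a plain Python dict standing in for the real HBase client. The schema and identifiers are invented for illustration, not Experian's actual CCIR design.

```python
# Hypothetical sketch of an HBase-style linkage record: one row key per
# consumer identity, with each touch point stored under a
# "family:qualifier" column key. A defaultdict stands in for HBase.
from collections import defaultdict

store = defaultdict(dict)  # row_key -> {"family:qualifier": value}

def link_touchpoint(consumer_id, channel, identifier):
    """Record that `identifier` on `channel` belongs to `consumer_id`."""
    store[consumer_id][f"touch:{channel}"] = identifier

def touchpoints(consumer_id):
    """Return all linked touch points for a consumer as {channel: id}."""
    return {
        key.split(":", 1)[1]: value
        for key, value in store[consumer_id].items()
        if key.startswith("touch:")
    }

link_touchpoint("cust-001", "email", "a@example.com")
link_touchpoint("cust-001", "web", "cookie-9f3a")
print(touchpoints("cust-001"))
```

The row-key-per-identity layout is what lets a store like HBase answer "all touch points for this consumer" with a single row read, which suits both the batch updates and the real-time lookups described above.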

Processing performance accelerated by 50x

The new platform is delivering operational efficiency to Experian by accelerating processing performance by 50x, at a fraction of the cost of the legacy environment. The new system can process 100 million records per hour, compared with the legacy environment’s 50 million matches per day.

Cloudera Enterprise allows Experian to get maximum operational efficiency out of its Hadoop clusters. Because customer use cases vary widely, the team had to do extensive tuning on the platform to get the performance it needed. Cloudera Enterprise provides the ability to store different configuration settings and to version those settings.

Experian’s McCullough said, “Not only has Cloudera Manager simplified our process, but it’s made it possible at all. Without a Linux background, I would not have been able to deploy Hadoop across a cluster and configure it and have anything up and running in nearly the timeframe that we had.”

Cloudera Manager monitors services running on the cluster and reports when servers are unhealthy, services have stopped, or nodes fail. It automates distribution across the cluster, monitors CPU usage and data storage availability across applications, and provides a single portal into all cluster details.

The deployment allowed Experian to process orders of magnitude more information through its systems. Experian’s platform is the first data management platform of its kind that accepts data, links information together across an entire marketing ecosystem, and puts it into a usable format for an enhanced customer experience. These data processing capabilities combined with Experian’s expertise in bringing together data assets provided new insights into tomorrow’s marketing environments.

Further collaboration

In January 2017, it was announced that Experian was integrating Cloudera Enterprise into its cloud environment for its Credit Information Services, Decision Analytics and Business Information Services business lines, with the aim of improving credit data processing speeds for clients. Thus, Cloudera continues to transform the way Experian provides consumer and business credit data to its clients.

All content from customer success story and case study on



Global Big Data in Power Sector Market Analysis, Growth, Trends & Forecast 2018-2023 –


DUBLIN–(BUSINESS WIRE)–The “Global Big Data in Power Sector Market – Growth, Trends, and Forecast (2018 – 2023)” report has been added to the publisher’s offering.

The imbalance between electricity demand and supply is driving demand for smart solutions like big data. Big data has helped utility companies track consumption patterns and forecast demand, so that supply can be shifted in both space and time, resulting in more efficient use of assets.

The smart grid market is maturing. In 2016, utility companies invested more than USD 10 billion to deploy smart meters, pushing the total installed base to 700 million. These meters generate more than 100 petabytes of data per year. More than half of the 2016 deployments were in China, while the rest of the world installed more than 50 million meters that year.

In the US, companies in the electricity sector had installed 65 million smart meters by 2015, covering more than 50% of households. Deployment reached approximately 70 million smart meters by the end of 2016 and is projected to reach 90 million by 2020. Utilities in the country are increasingly using big data for better decision making.

The benefits of big data technologies in large wind farms currently lag those in the demand management, energy storage, and distributed generation sectors, but big data adoption in the power sector is expected to increase in the coming years. The incremental yield improvement is expected to be on the order of 1-5%.

Companies Mentioned

  • Microsoft
  • Teradata
  • International Business Machines Corporation (IBM)
  • SAP SE
  • Palantir Technologies Inc.
  • Oracle Corp.
  • EnerNoc Inc.
  • Siemens AG
  • C3 Inc.
  • Accenture PLC

Key Topics Covered:

1. Executive Summary

2. Research Methodology

3. Market Overview

4. Market Dynamics

5. Supply Chain Analysis

6. Industry Attractiveness – Porter’s Five Forces Analysis

7. Market Segmentation and Analysis

8. Regional Market Analysis

9. Key Company Analysis

10. Competitive Landscape

11. Disclaimer

For more information about this report visit



Big Data Market 2018: Growing and Moving to the Cloud


The big data market is strong and thriving — although it isn’t always called “big data” these days.

The term “big data” first became part of the tech lexicon in the late 1990s, when people like John Mashey at SGI began using the phrase to describe the enormous and growing stores of enterprise data that were difficult to store and analyze using the technology available at the time.

In 2001, analyst Doug Laney suggested a definition of big data that included three Vs: volume, velocity and variety. Over the next few years, Laney’s definition became something of an industry standard, and some people added a fourth V — variability — to the definition.

In 2005, big data technology took a dramatic step forward when Yahoo debuted the Hadoop open source distributed data store. The project became the lynchpin for an entire ecosystem of commercial and open source data storage and analytics solutions.

In 2014, IDC and EMC released their most recent digital universe study, which revealed that the amount of data stored by the world’s digital systems is growing by 40 percent per year. The companies predicted that by 2020, the digital universe would include 44 zettabytes of information. That’s nearly as many bits as there are stars in the universe, and it’s enough information to fill a stack of 2014-era tablets stretching to the moon 6.6 times.

Today, big data certainly hasn’t become any smaller, but the size of growing data stores no longer gets as much attention as it once did. Instead, most organizations are focused on analytics, data science and machine learning. They have accepted that managing big data is simply a part of doing business; if they want to compete and succeed, they need to find ways to turn those big data stores into valuable insights.

Big Data Market Overview

Enterprise spending on big data technologies continues to climb as it has for the past decade. According to IDC, worldwide revenues for big data and business analytics are likely to grow from $150.8 billion in 2017 to $210 billion in 2020. That’s a compound annual growth rate of 11.9 percent.

“After years of traversing the adoption S-curve, big data and business analytics solutions have finally hit mainstream,” said Dan Vesset, an IDC group vice president. “BDA as an enabler of decision support and decision automation is now firmly on the radar of top executives. This category of solutions is also one of the key pillars of enabling digital transformation efforts across industries and business processes globally.”

And organizations are reporting that their big data initiatives are having a positive impact on their bottom line. In the NewVantage Partners Big Data Executive Survey, 80.7 percent of respondents said that their big data investments had been successful, and 48.4 percent said that they had realized measurable benefits as a result of their big data initiatives.

Those sorts of results are likely to encourage enterprises to continue investing in big data, but the types of big data solutions they are adopting are shifting. According to Forrester Research, “The shift to the cloud for big data is on. In fact, global spending on big data solutions via cloud subscriptions will grow almost 7.5 times faster than on-premise subscriptions.” The firm added, “Furthermore, public cloud was the number one technology priority for big data according to our 2016 and 2017 surveys of data analytics professionals.”

The cloud is particularly popular for big data analytics that rely on machine learning technologies. Machine learning requires advanced — and expensive — computing hardware, but running machine learning in the cloud makes it possible for organizations to access this technology at a fraction of the cost of what it would take to install it in their own data centers. Although organizations face some challenges related to cloud analytics, experts say this cloud analytics trend is likely to accelerate in coming years.

Big Data Technologies: Market Breakdown

As the big data market has matured, vendors have developed a wide variety of different big data technologies to meet enterprises’ needs. This is a very broad market, but most big data solutions fall into one of the following categories:

  • Business intelligence (BI): Business intelligence solutions provide analytics and reporting capabilities on business data typically stored in a data warehouse. According to Gartner, the BI and analytics market is forecast to increase from $18.3 billion in 2017 to $22.8 billion in 2020. However, this is slower growth than in the past.
  • Data mining: Data mining is a broad category that encompasses a wide variety of techniques for finding patterns in big data. While many big data solutions still offer data mining capabilities, the term has fallen somewhat out of favor as vendors instead are using terms like “predictive analytics” and “machine learning” to describe their solutions.
  • Data integration: One of the big challenges with big data analytics is gathering all the relevant data from disparate sources and converting it into a format that can be analyzed easily. This has led to a whole crop of data integration solutions, which are sometimes also called ETL (short for “extract, transform, load”) solutions. According to Markets and Markets, data integration revenues could be worth $12.4 billion by 2022.
  • Data management: This category of solutions includes tools that help organizations integrate, clean, store, secure and assure the quality of their digital data. Markets and Markets predicted that this category of big data tools could generate $105.2 billion in revenue by 2022.
  • Open source technologies: Many of the most widely used big data technologies are available under open source licenses. In particular, technologies like Hadoop and Spark, which are managed by the Apache Foundation, have become very popular. Many vendors offer commercially supported versions of these open source big data technologies.
  • Data lakes: A data lake is a repository that ingests data from a wide variety of sources and stores it in its native format. This is a little different than a data warehouse, which stores data that has been cleaned and formatted for analytics. Data lakes are popular with organizations that want to perform analytics on both structured and unstructured data.
  • NoSQL databases: Unlike relational database management systems (RDBMSes), NoSQL databases don’t store information in traditional tables with rows and columns. Instead, they use other models, such as columns, documents or graphs for tracking data. Many enterprises use NoSQL databases for storing unstructured data for analytics.
  • Predictive analytics: Currently one of the most popular forms of big data analytics, predictive analytics looks at historical trends in order to offer a good estimate about what might happen in the future. Many modern predictive analytics solutions incorporate machine learning capabilities so that their forecasts become more accurate over time. A Zion Market Research report said spending on predictive analytics could climb from $3.49 billion in 2016 to $10.95 billion by 2022.
  • Prescriptive analytics: Prescriptive analytics goes a step farther than predictive analytics. In addition to telling organizations what is likely to happen in the future, these solutions also offer suggested courses of action in order to achieve desired results. Experts say few (if any) big data analytics solutions currently on the market have true prescriptive capabilities, but this is an area of intense research for vendors.
  • In-memory databases: In-memory technology makes big data analytics much, much faster. In any computer system, accessing data in memory (also sometimes called RAM) is much faster than accessing stored data on a hard drive or solid state drive. In-memory databases allow users to store vast quantities of data in memory, yielding dramatic speed boosts.
  • Artificial intelligence and machine learning: Many next-generation big data analytics tools incorporate machine learning, which is a subcategory of artificial intelligence (AI). Machine learning uses algorithms to help systems get better at tasks over time without explicit programming. This is one of the fastest-growing areas of the big data market.
  • Data science platforms: Many vendors have begun labelling their big data analytics solutions as “data science platforms.” Products in this category typically incorporate many different capabilities in a unified platform. Nearly all the products in this category have some analytics and machine learning features, and many also have data integration or data management features as well.
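As a toy illustration of the predictive-analytics idea described above, extrapolating a historical trend into a forecast, here is a minimal ordinary-least-squares sketch. The quarterly revenue figures are synthetic and not drawn from any product or report.

```python
# Toy predictive analytics: fit y = a + b*x by ordinary least squares on
# historical points, then extrapolate to forecast the next period.
def fit_line(xs, ys):
    """Return intercept a and slope b of the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope

def forecast(xs, ys, x_next):
    """Predict the y value at x_next from the fitted trend line."""
    a, b = fit_line(xs, ys)
    return a + b * x_next

if __name__ == "__main__":
    quarters = [1, 2, 3, 4]
    revenue = [10.0, 12.0, 14.0, 16.0]      # perfectly linear toy data
    print(forecast(quarters, revenue, 5))   # -> 18.0
```

Real predictive analytics products layer machine learning, seasonality handling, and confidence intervals on top of this basic idea, but the core is the same: learn a pattern from history, then project it forward.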

Big Data Companies

Given that the market includes so many different types of big data solutions, it should be no surprise that an extremely long list of companies offer big data products. The list below includes some of the best-known big data companies, but there are many others.

  • Amazon Web Services — offers cloud storage, databases, data warehouse, analytics and machine learning services
  • Alpine Data Labs — now owned by Tibco; offers a data science and machine learning platform
  • Alteryx — offers a self-service big data analytics platform
  • BigPanda — offers analytics for monitoring and managing IT event data
  • Cloudera — offers a Hadoop distribution, plus data science and big data analytics tools
  • Databricks — founded by the team behind Apache Spark; offers a unified analytics platform powered by Spark
  • Dataiku — offers a collaborative data science platform
  • Datameer — offers an agile data pipeline management platform
  • DataStax — founded by the team behind the Apache Cassandra database; offers a distributed cloud database based on Cassandra
  • Domino — offers a data science platform
  • FICO — offers data analytics tools, including AI and machine learning software and solutions for fighting fraud and cybercrime
  • Google Cloud — offers cloud-based storage, data warehouse, analytics, machine learning, and more
  • GridGain — offers an in-memory computing platform based on Apache Ignite
  • — offers data science and machine learning platforms based on open source technology
  • Hitachi Vantara — formed by the merger of Hitachi Data Systems, Hitachi Insight Group and Pentaho; offers data integration, big data analytics, storage and related products
  • Hortonworks — offers a popular Hadoop distribution, as well as other big data tools and services
  • HPCC — offers a distributed big data platform that is an alternative to Hadoop
  • HPE — offers big data hardware and services
  • IBM — offers big data cloud services, as well as database, data warehouse, analytics and machine learning software
  • Informatica — offers a cloud-based data management platform with a wide variety of big data solutions
  • KNIME — offers data mining and analytics software
  • MapR — offers a converged data platform, plus big data storage, analytics, machine learning and NoSQL database
  • MarkLogic — offers a NoSQL database and data integration tools
  • Microsoft Azure — offers cloud-based storage, big data analytics, machine learning, data warehouse, data lake and more
  • MongoDB — offers a NoSQL database and a cloud service based on the same technology
  • Mu Sigma — offers big data analytics and decision science solutions
  • Oracle — offers cloud-based and on-premise database, data integration, data management, analytics and more
  • Palantir — offers data integration and data management solutions
  • Pivotal — offers in-memory technology and a multi-cloud analytics platform
  • Qlik — offers business intelligence and analytics software
  • RapidMiner — offers data mining, data science, predictive analytics and machine learning solutions
  • SAP — offers in-memory data management, analytics, artificial intelligence and machine learning tools
  • SAS — offers analytics, business intelligence and data management solutions
  • SiSense — offers business intelligence and analytics
  • Splice Machine — offers a combination database, data warehouse and machine learning platform
  • Splunk — offers analytics for log and security data
  • Striim — offers streaming analytics
  • Sumo Logic — offers analytics for log and security data
  • Tableau — offers business intelligence and big data analytics
  • Talend — offers big data integration tools
  • Tibco Jaspersoft — offers business intelligence and analytics
  • Teradata — offers data warehouse, data lake and business analytics