Big Data International Conference 2018
Wikipedia defines Big Data as data sets so voluminous and complex that traditional data-processing application software is inadequate to deal with them. This includes both fixed- and variable-schema structured data, and unstructured data such as text, audio, image and video. All of it now seems to be flowing into the enterprise as demand grows to analyse it to improve customer engagement, reduce risk or cut cost. While much of this data was initially ingested into on-premises Big Data systems like Hadoop, the last twelve months in particular have seen a huge move to capturing it in Cloud storage as the battle for Big Data Analytics shifts to the Cloud. Today, everything from Hadoop, Spark, NoSQL databases, data warehouses, master data management and streaming data is available on the Cloud.
Data integration tools, information catalogs and analytical tools are also available there, and while this trend is set to continue, most companies now find that Big Data and Big Data processing run in a hybrid computing environment, with data spread across multiple Cloud and on-premises data stores. This complexity means that analytical systems (including Big Data platforms) must now be managed and accessible across the firewall, not just on-premises.
Given this complexity, how can you reduce time to value in a hybrid Big Data environment when your data is no longer held in a single central data store but is spread across multiple data stores on-premises and in one or more Clouds?
Tools are also changing to help us be more productive. Workflow tools now let us build analytical pipelines without the need to write code.
Machine Learning automation tools are now available to accelerate model development, and models can be deployed as services in containers so that they can run in-Hadoop, in-Spark, in-stream, and at the edge. Data Science workbenches and interactive notebooks accelerate model development in Python, R, and Scala. Also, SQL can now be used to access structured and semi-structured, schema-free data.
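To illustrate that last point, here is a minimal sketch of querying schema-free JSON documents with ordinary SQL. It uses SQLite's built-in JSON functions purely as a stand-in for the Big Data SQL engines (Spark SQL, Drill, Hive and the like) the conference covers; the table and field names are invented for the example.

```python
import json
import sqlite3

# Store semi-structured JSON documents in a single text column -- no
# fixed schema is declared for the fields inside each document.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (doc TEXT)")
docs = [
    {"user": "alice", "action": "click", "meta": {"page": "home"}},
    {"user": "bob", "action": "purchase", "meta": {"amount": 42.0}},
]
conn.executemany(
    "INSERT INTO events VALUES (?)", [(json.dumps(d),) for d in docs]
)

# Plain SQL extracts fields from the schema-free documents at query time.
rows = conn.execute(
    "SELECT json_extract(doc, '$.user'), json_extract(doc, '$.action') "
    "FROM events"
).fetchall()
print(rows)  # [('alice', 'click'), ('bob', 'purchase')]
```

The design point is "schema on read": the structure is imposed by the query (`$.user`, `$.action`) rather than by the table definition, which is what lets SQL reach semi-structured data.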
But with all these tools and data in Cloud and on-premises data stores, how are you going to govern this environment now that GDPR is active legislation? Is there anything to help with this? How can you exploit all this to accelerate time to value? What key skills do you need and what do you need to avoid in your data engineering teams to succeed?
In addition, with the arrival of the Internet of Things in manufacturing, healthcare, insurance, energy, logistics, government and other industries, high-velocity streaming data is increasing rapidly. Are you ready for streaming data? What should you do about real-time? Is this now the future? What is your strategy for IoT, and how will fast data impact your architecture and your current methodologies?
This Conference aims to provide an update on Big Data, showing the latest advances in technology and addressing all these areas. The intention is to help you be successful with Big Data, integrate new technologies into your existing environment and reduce time to value.
- Introduction - Accelerating Time to Value in a Hybrid Big Data Computing Environment
- Organising for success - The five dysfunctions of a data engineering team and how to get it right
- The Rise of the Information Catalog - Finding and organizing your data and analytical assets in a hybrid Big Data environment
- Interactive Notebooks - What are they, why do you need them, and how should you use them to analyse Big Data?
- Governance and Compliance - Recipes for GDPR-friendly Data Science
- Accelerating time to value using analytical pipelines and SQL in a Big Data environment
- Big Data - Why Real-time is now the future, and how it will impact your Big Data architecture
- IoT Data Strategies - Managing the next frontier in fast data
- New analytical techniques - Delivering Business Value using Graph Analytics and Visualisation