Querying multiple distributed storage systems with Apache Hive robustly

Ashish Singh | Software Engineer, Cloudera

© Cloudera, Inc. All rights reserved.

Programming Model: SQL

Storage Handler

Hive + Kafka = HiveKa

Project available on GitHub (https://github.com/HiveKa)

Demo Time

• Strict code review policies
• ~7600 upstream tests
• End-to-end tests: qTests
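As an illustration of the end-to-end style mentioned above: a qTest pairs a query file with a stored expected-output file and fails when the actual output differs. The sketch below shows that golden-output idea only. It uses Python's built-in `sqlite3` as a stand-in engine rather than Hive, and the names `run_query`, `check_golden`, and the `src` table are hypothetical, not part of Hive's test framework.

```python
import sqlite3

def run_query(conn, sql):
    """Run one query and render its rows as tab-separated lines."""
    rows = conn.execute(sql).fetchall()
    return "\n".join("\t".join(str(v) for v in row) for row in rows)

def check_golden(conn, sql, golden):
    """Return True when the rendered output matches the golden text exactly."""
    return run_query(conn, sql) == golden.strip("\n")

# Stand-in engine and data (assumption: sqlite3 instead of Hive).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (key INTEGER, value TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

# The query and its previously recorded ("golden") output.
query = "SELECT key, value FROM src WHERE key > 1 ORDER BY key"
golden_output = "2\tb\n3\tc\n"

passed = check_golden(conn, query, golden_output)
```

The `ORDER BY` matters in this style of test: without a deterministic row order, an exact text comparison can fail even when the result set is correct.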

@Cloudera

• Believe in the open source community
• Invest heavily in improving upstream test infrastructure
• Ptests to reduce turnaround time

But is that enough?

Integration Testing

Compatibility Testing

Scale Testing

Upgrade Testing

Random Query Generator
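One common way to use a random query generator is differential testing: generate many queries from a seeded random source, run each against a reference engine and the engine under test, and report any query whose results differ. The sketch below illustrates that idea under stated assumptions. Both engines are in-memory `sqlite3` databases standing in for real systems, and the table `t`, its columns, and the helper names are hypothetical, not taken from the talk.

```python
import random
import sqlite3

COLUMNS = ["id", "amount", "qty"]
OPERATORS = ["<", "<=", "=", ">=", ">"]
AGGREGATES = ["COUNT(*)", "SUM(amount)", "MAX(qty)"]

def random_query(rng):
    """Build one random aggregate query over the hypothetical table t."""
    agg = rng.choice(AGGREGATES)
    col = rng.choice(COLUMNS)
    op = rng.choice(OPERATORS)
    bound = rng.randint(0, 10)
    return f"SELECT {agg} FROM t WHERE {col} {op} {bound}"

def make_engine(rows):
    """Create a stand-in engine loaded with the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, amount INTEGER, qty INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    return conn

def find_mismatches(engine_a, engine_b, n, seed):
    """Run n seeded random queries on both engines; return those that disagree."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(n):
        sql = random_query(rng)
        if engine_a.execute(sql).fetchall() != engine_b.execute(sql).fetchall():
            mismatches.append(sql)
    return mismatches

rows = [(i, i * 3 % 7, i % 4) for i in range(10)]
reference = make_engine(rows)
under_test = make_engine(rows)
mismatches = find_mismatches(reference, under_test, n=50, seed=0)
```

Seeding the generator keeps a failing run reproducible: the same seed regenerates the same query sequence, so a reported mismatch can be replayed and minimized.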

Thank you

Ashish Singh
[email protected]
@singhasdev