Big data ciat april_2014_dj_et_slideshare

description

Presenters: Daniel Jiménez (Leader of the Big Data expert group, DAPA) and Edgar Torres (Leader, Rice Program, Agrobiodiversity). Title: BIG DATA: BIG DATA ANALYSIS: is it a solution to understand Big Problems? The case of yield variation of rice in Colombia. Cukier and Mayer-Schönberger (2013) stated: “As the telescope enabled us to comprehend the universe and the microscope allowed us to understand germs, the new techniques for collecting and analyzing information will help us to make sense of our world in ways we are just starting to appreciate”. We subscribe to this view: in agriculture we now have the capacity to capture, analyze, store and share agricultural information in ways that 10 years ago were considered science fiction. The amount and variety of agricultural data generated by multiple individuals and organizations, using a huge range of techniques and technologies, is growing exponentially. We believe that the next agricultural (r)evolution will come from the development of innovation systems that harness agricultural data from multiple sources to generate new knowledge that will increase agricultural productivity, moving beyond blanket technological solutions towards a system of dynamic site-specific management that is sensitive and responsive to climate, soil and local socio-economic conditions. In this seminar, CIAT's researchers will share how several databases, collected for different purposes and shared by FEDEARROZ (the country-wide association of rice growers in Colombia), have been used to obtain important insights that help FEDEARROZ manage rice more efficiently at the site-specific level. Recording: http://marafris.ciat.cgiar.org:8080/Webinars/Bluejeans/2014-24-04%20-%20Daniel_Jimenez_Edgar_Torres.mp4 Reference: Mayer-Schönberger, V., Cukier, K., 2013. Big Data: A Revolution That Will Transform How We Live, Work and Think.

Transcript of Big data ciat april_2014_dj_et_slideshare

  • BIG DATA: BIG DATA ANALYSIS: is it a solution to understand big problems? Rice Program (Agrobiodiversity) & Big Data expert group (DAPA)
  • Computational models are tailored to the analysis of the data rather than data to a particular methodology, as researchers have done for over a century. Applying the principles of Big Data to research in agriculture. Big Data refers to things that one can do at a large scale, to extract new insights, that cannot be done at a smaller one. Sometimes to inform is better than to explain. Looking for patterns or associations. Approaching N = All. Adding value to secondary databases. Big Data (Foreign Affairs magazine / McKinsey's High Tech). Cukier and Mayer-Schönberger (2013).
  • How? By using ICTs to collect (Android apps), analyze (traditional and machine learning techniques) and share data (in a way that facilitates decision making at different levels and for different users). Analytical approaches are tailored to the analysis of the data, rather than the data to a particular methodology as researchers have done for over a century. Tools are developed as part of a close dialogue with end-users.
  • How? Climate + Soil + Crop management (including varieties) = productivity/ha. % ? + % ? + % ? = to explain (100 %). Maximizing productivity in agricultural systems. Working with secondary databases to identify the combination of factors that lead to high and low productivities (empirical approaches, machine learning). Within the framework of the Convenio MADR-CIAT climate change project. Adaptation strategy.
  • The problem: in Colombia, since 2009 there has been a significant reduction in yields at the farm level. [Figure: Trends in Rice Production, Harvested Area and Yield in Colombia, 1990-2012. Series: Area, Production, Yield. Axes: t/ha; thousands of tons or ha. Source: USDA-PSD]
  • And what are the causes of this yield reduction? We can see similar problems in Central America, Ecuador, Peru and Venezuela. Reductions in yield are causing heavy losses to rice farmers. Not a single factor is involved: drought, high minimum temperatures, low light, high humidity, bacteria, mites, fungi, lack of adaptation, etc. Low yields are caused by Burkholderia glumae!
  • Misdiagnosis, wrong treatments and excessive pesticide applications are causing other problems (Hoja Blanca). Not eco-efficient. And to worsen the problem, farmers want a magical cure.
  • Reducing stress because of lack of water. Water harvest. Better agronomy. Key points: crop rotation and regulations. Improved cultivars. Increasing yield potential. Protecting yield. Adding value. Is there something missing here? How can we manage this problem?
  • AMTEC: Massive Adoption of Technology. OBJECTIVES: To jointly transfer the technology available for crop management. To increase productivity and reduce production costs, with the least environmental impact, in a context of social responsibility. To aim for competitiveness and profitability of rice farmers in Colombia. TECHNOLOGY TRANSFER: Field days. Planning and good management practices. Visits to research centers. Demonstration trials. Cost reduction.
  • AMTEC results from 2012 and 2013. Source: Fedearroz. Agronomy helps a lot! Yield in t/ha, cost in US$/t; the last two columns are AMTEC minus Farmer.

    County          AMTEC yield  AMTEC cost  Farmer yield  Farmer cost  Yield diff.  Cost diff.
    El Juncal              6.50       417          5.30        614         1.20       -197
    Ibagué                 7.96       338          6.90        456         1.06       -118
    Norte Tolima           7.48       366          6.29        485         1.19       -119
    Montería               6.38       323          4.68        470         1.70       -147
    Zulia                  6.56       328          5.79        370         0.77        -42
    Pompeya                5.70       309          4.30        503         1.40       -194
    María La Baja          8.75       248          6.13        333         2.62        -85
    Pompeya                4.30       475          3.36        600         0.94       -125
    Ibagué                 8.66       322          7.23        406         1.43        -84
    Fundación              6.53       299          5.60        384         0.93        -85
    Casanare               5.90       319          5.20        434         0.70       -115
    Average                6.79       340.4        5.52        459.5       1.27       -119.1
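The "AMTEC vs Farmer" columns in the table above are plain differences between the two management systems. As a minimal sketch of that arithmetic, the snippet below recomputes them from the per-county figures on the slide. The names `ROWS`, `amtec_vs_farmer` and `column_mean` are illustrative and not part of the presentation, and the decimal commas of the slide are written as decimal points.

```python
# Per-county figures copied from the AMTEC slide:
# (county, AMTEC yield, AMTEC cost, farmer yield, farmer cost).
# Yields are in t/ha and costs in US$/t.
ROWS = [
    ("El Juncal", 6.50, 417, 5.30, 614),
    ("Ibagué", 7.96, 338, 6.90, 456),
    ("Norte Tolima", 7.48, 366, 6.29, 485),
    ("Montería", 6.38, 323, 4.68, 470),
    ("Zulia", 6.56, 328, 5.79, 370),
    ("Pompeya", 5.70, 309, 4.30, 503),
    ("María La Baja", 8.75, 248, 6.13, 333),
    ("Pompeya", 4.30, 475, 3.36, 600),
    ("Ibagué", 8.66, 322, 7.23, 406),
    ("Fundación", 6.53, 299, 5.60, 384),
    ("Casanare", 5.90, 319, 5.20, 434),
]


def amtec_vs_farmer(rows):
    """Return (county, yield gain in t/ha, cost change in US$/t) for each row."""
    return [
        (county, round(a_yield - f_yield, 2), a_cost - f_cost)
        for county, a_yield, a_cost, f_yield, f_cost in rows
    ]


def column_mean(rows, index):
    """Unweighted mean of one numeric column across all rows."""
    return sum(row[index] for row in rows) / len(rows)


if __name__ == "__main__":
    for county, gain, cost_change in amtec_vs_farmer(ROWS):
        print(f"{county:<15} {gain:+.2f} t/ha  {cost_change:+d} US$/t")
    print("Mean AMTEC yield:", round(column_mean(ROWS, 1), 2))
```

Means recomputed this way can differ from the slide's "Average" row in the last digit, since the slide may have been built from unrounded data.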
  • Gene discovery. Emerging pathogen: Burkholderia glumae, producing grain sterility. Sources of tolerance identified. Tolerant genotype showing 60% less damage than susceptible genotypes. Molecular markers are being developed to speed up the transfer of this trait into elite germplasm. [Photo labels: Susceptible; Tolerant (field evaluation)]
  • Breeding pipeline: Trait Discovery; Gene Discovery & Marker Applications; Germplasm Enhancement; Elite Breeding. Activities listed: QTL mapping; QTL validation; functional marker identification; MABC; recurrent selection; genomic selection; inbred (FLAR, CIRAD) & hybrids (HIAAL); MET; trait value characterization; screening methods; donor identification; population development; sequencing; gene validation.
  • TECHNOLOGY TRANSFER (25 agronomists). RESEARCH, BREEDING AND AGRONOMY (45 researchers): Breeding (Conventional 7), Agronomy (Physiology 3, Phytopathology 1, Soils 2, Water 2, Crop Management 26, Biotech 3, Weeds 1). ECONOMICS (7 officials): updated socio-economic studies. Our strategic partner for rice research in Colombia.
  • Databases: plenty of information. National Survey. Purpose: keep the crop sector updated. N = 738 cropping events. Harvesting records. Purpose: technical research (crop management, soils, breeding, biotechnology, physiology). N = 3193 cropping events. Data is no longer regarded as static, with its usefulness finished once the purpose for which it was collected is achieved. Information on: planting and harvesting date, productivity, grain humidity, variety, cropping system. Zones: Caribbean, Andean (Tolima), Plains (Llanos).
  • Adding value to secondary databases. The case of information on cropping events of rice in Colombia. Planting date experiments (field trials). Purpose: technical research on the best sowing date. N = 272 cropping events. Climate: about 27 weather stations. Adding value to secondary databases, but first, merging databases: a challenging task!
  • Letting the data speak. Before Big Data, our analyses were usually limited to testing a small number of hypotheses that we defined well before we even collected the data. When we let the data speak, we can make connections that we had never thought existed. Cukier and Mayer-Schönberger (2013).
  • From sowing to harvest: a cropping event in rice = 120 days (crop time). Climate series for all variables. Hypothesis: yield variation is associated with climate.
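Linking each cropping event to the climate it experienced means cutting a 120-day window out of a daily weather series, starting at the sowing date. The following is a minimal sketch of that step and not the authors' pipeline. The function names, the `tmin` and `rain` keys and the choice to count the sowing day as day 1 are assumptions made for illustration.

```python
from datetime import date, timedelta
from statistics import mean

CYCLE_DAYS = 120  # length of a rice cropping event, as stated on the slide


def crop_window(sowing, cycle_days=CYCLE_DAYS):
    """Dates covered by one cropping event (assumption: sowing day is day 1)."""
    return [sowing + timedelta(days=i) for i in range(cycle_days)]


def summarize_climate(sowing, daily, cycle_days=CYCLE_DAYS):
    """Summarize a daily weather series over one cropping event.

    `daily` maps a date to a dict with hypothetical keys:
    'tmin' (minimum temperature, deg C) and 'rain' (rainfall, mm).
    Returns None when the station has no records inside the window.
    """
    days = [d for d in crop_window(sowing, cycle_days) if d in daily]
    if not days:
        return None
    return {
        "days_with_data": len(days),
        "mean_tmin": mean(daily[d]["tmin"] for d in days),
        "total_rain": sum(daily[d]["rain"] for d in days),
    }


if __name__ == "__main__":
    # Made-up station records, for demonstration only.
    start = date(2012, 1, 1)
    records = {
        start + timedelta(days=i): {"tmin": 22.0, "rain": 2.0} for i in range(200)
    }
    print(summarize_climate(date(2012, 1, 10), records))
```

The same windowing can be repeated per phenological stage by passing a shorter `cycle_days` and a later start date, provided stage dates are available.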
  • Letting the data speak. Multivariate analysis for Saldaña (research station, Andean zone): cropping events (2007 to 2012). FEDEARROZ 733: 27% of productivity variation explained. Lagunas: 47% of productivity variation explained. [Figure labels: FEDEARROZ 733; N = 189; N = 63; Cimarrón Barinas]
  • Letting the data speak. Climate and analysis based on phenological stages in Saldaña (research station), Andean zone, 2007 to 2012 (N = about 800 cropping events, irrigated rice). Climate accounts for 30% to 40% of production variability in irrigated rice. The crop sector can suggest the best planting date to farmers. By assessing the same approach in other stations (environments): new insights for future breeding; adaptation strategy for climate change.
  • Letting the data speak. Climate and analysis based on phenological stages. Zone: Colombian Plains, 2007 to 2012 (N = about 500 cropping events, upland rice). Machine learning (MLP). Rainfall is a critical driving factor for upland rice during grain filling and panicle initiation. Again, climate accounts for 30% to 40% of production variability in upland rice.
  • Letting the data speak. Climate and analysis based on phenological stages. Zone: Colombian Plains, 2007 to 2012 (N = about 200 cropping events, upland rice, variety F174). Machine learning (MLP). Temperature is a critical driving factor for variety F174 (upland rice) during grain filling. This time climate explained more than 40% of production variability in upland rice variety F174!
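Statements such as "climate explained more than 40% of production variability" are usually read as a coefficient of determination (R²), although the slides do not say which metric was used. The sketch below shows that metric for the simplest possible case, one climate variable and an ordinary least-squares line. It is not the multivariate or MLP analysis reported in the slides, and `r_squared` is an illustrative name.

```python
def r_squared(x, y):
    """Share of the variance in y explained by a least-squares line on x.

    x: one climate variable per cropping event (e.g. mean temperature).
    y: yield of the same cropping events.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must both vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up numbers, for demonstration only.
    temperature = [1, 2, 3, 4]
    yield_t_ha = [1, 3, 2, 4]
    print(f"R^2 = {r_squared(temperature, yield_t_ha):.2f}")
```

An R² of 0.40 under this reading would mean that 40% of the yield variance is accounted for by the model, leaving the remainder to soil, management and unmeasured factors.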
  • Case study: working with secondary databases. Seasonal forecast, niñ@s & Big Data. Rice in Colombia (Pompeya, Llanos). What is likely to happen in March-April-May 2014? We generated 24 clusters based on more than 500 cropping events. Seasonal forecast + (data) best technologies + Big Data analysis = better adaptive responses to climate change (CC) and climate variability (CV). Cluster 7:

    Rice variety   Productivity (kg/ha)   Cropping events
    F174                  4,564                 31
    FORTALEZA             3,543                 17
    F2000                 4,977                  8
    LAGUNAS               5,052                  6
    MOCARI                4,604                  6
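Once a cluster of similar seasons has been matched to the forecast, the Cluster 7 figures can be turned into a variety ranking. The sketch below is one hedged way to do that with the numbers from the slide. It assumes the productivity column is a per-variety mean within the cluster, and the `min_events` threshold is a hypothetical safeguard against ranking on very few cropping events, not something stated in the presentation.

```python
# Cluster 7 figures copied from the slide:
# (variety, productivity in kg/ha, number of cropping events).
CLUSTER_7 = [
    ("F174", 4564, 31),
    ("FORTALEZA", 3543, 17),
    ("F2000", 4977, 8),
    ("LAGUNAS", 5052, 6),
    ("MOCARI", 4604, 6),
]


def rank_varieties(rows, min_events=1):
    """Sort varieties by productivity, skipping those with too few events."""
    eligible = [row for row in rows if row[2] >= min_events]
    return sorted(eligible, key=lambda row: row[1], reverse=True)


def weighted_mean_productivity(rows):
    """Cluster-wide productivity, weighting each variety by its event count."""
    total_events = sum(events for _, _, events in rows)
    return sum(kg_ha * events for _, kg_ha, events in rows) / total_events


if __name__ == "__main__":
    for variety, kg_ha, events in rank_varieties(CLUSTER_7, min_events=10):
        print(f"{variety:<10} {kg_ha} kg/ha over {events} events")
    print(f"Cluster mean: {weighted_mean_productivity(CLUSTER_7):.0f} kg/ha")
```

Raising `min_events` trades a possibly higher-yielding recommendation for one backed by more observations, which matters here because the top two varieties in the table rest on 6 and 8 cropping events.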