Whitepaper: Agricultural Systems + Data Outlook 2Q14


Whitepaper: Agricultural Systems + Data Outlook

The Data Guild, 2014-02-20

How can data be leveraged to make food production and distribution systems more responsive, resilient, and efficient?

An ecosystem of agricultural data has been quietly evolving, and is rapidly becoming a vital component of global food security. The data rates and variety are vast: remote sensing via small satellites, sensor networks in the fields, tractors-as-drones, and more. Many issues implied by this category of data, however, are quite subtle and in some cases counter-intuitive. Given that this field is relatively new and not particularly organized yet, key learnings may be adapted from other sectors where large-scale data and analytics have already played a transformational role: finance, intelligence, e-commerce, telecom, energy, etc.

We examine both key questions and the evolving vendor landscape for agricultural data in the context of supply chain analysis, defining nomenclature for components of the ecosystem and identifying key issues for consideration. Ultimately, this paper is at best an early draft for a much longer and more comprehensive study: it provides a rubric for analyzing the complexities of agricultural data, along with examples for the identified categories.

Impact

Farming represents the single largest employer globally, as the primary livelihood for 40% of the world’s population. There are more than 500 million small farms worldwide [1], most of which are family farms that rely on rainfed agriculture. The global domestic product for agriculture was nearly $15 trillion in 2013 [2] and the agricultural real estate in the U.S. alone is valued at over $2 trillion [3]. The impact of these figures needs to be considered in the context of two factors: resource consumption and production asymmetries.

In terms of resource consumption, recognize that 70% of the world’s freshwater resources go toward agriculture [4]. This figure is estimated to reach 89% by 2050 [5]. Meanwhile, soils are being depleted at a 10-40% faster rate than they are replenished [6]. Within the United States, 90% of cropland is currently losing soil faster than its sustainable replacement rate [7], and that represents a very large capital loss. High annual rates of soil depletion and salinization, together with increasing cycles of flooding and drought, place enormous stresses on the entire agricultural system. The stakes are high, but much can be accomplished to mitigate looming issues by the effective use of data and analytics.

[1] “Small farms: Current Status and Key Trends”, Oksana Nagayets, Future of Small Farms (2005), p. 355
[2] Agriculture, value added (% of GDP), The World Bank (2014)
[3] National Agricultural Statistics Service, USDA (2014)
[4] FAO Aquastat
[5] UN Water Facts and Figures (2013)

Agricultural Data (Q2 2014) The Data Guild Page 1

In terms of production asymmetries, recognize that more than 80% of all agricultural holdings measure less than two hectares: these are smallholder and family farms [8]. Overall, family farms account for more than 98% of all farms, and more than 56% of global agricultural production [9]. While corporate farms tend to predominate in areas of high potential yield, the smallholder farms are stewards in marginal lands [10]. Their highly specialized knowledge sustains production as resource challenges escalate. For example, smallholder farmers typically use innovative technologies to conserve resources – ranging up to 30-60% water use efficiencies in some regions [11]. Moreover, there are cascading economic effects: each US$1 of farming income in Asia creates an additional US$0.80 in non-farming sectors. Along with that, microfinance should not be overlooked as a driver for local acceptance of new technologies.

The wealthy nations tend to maintain or increase their consumption of natural resources, while exporting their footprints to producer nations which are typically poorer. For example, European and North American populations consume a considerable amount of virtual water embedded in their food imports, by more than a 200% multiple [12].

Key Trends

Trends within the ecosystem can be identified by exploring the following questions:

Who are the stakeholders in this system?

Certainly the farmers and their vendors and buyers play central roles. Which other actors have substantial impact on the flows of data?

[6] “Soil Erosion: A Food and Environmental Threat”, David Pimentel, Environment, Development, and Sustainability (2006)
[7] Changes in Average Annual Soil Erosion by Water on Cropland and CRP Land, 1992–1997, Natural Resources Conservation Service, USDA (2000)
[8] 2000 World Census of Agriculture, FAO (2010)
[9] “Food Tank By The Numbers: Family Farming”, Danielle Nierenberg, et al., Food Tank (2014)
[10] ibid., Pimentel (2006)
[11] ibid., Nierenberg (2014)
[12] UN Water Facts and Figures (2013)


Are there potential feedback loops within the “supply chain” of agricultural data – spanning from farm sensors to aggregate market metrics – that could be leveraged by new business models?

For example, the emergence of feedback loops involving machine data plus algorithmic modeling [13] in the late 1990s propelled early successes in e-commerce to evolve increasingly sophisticated web apps for improving customer experience online: Amazon’s product recommendations, Google’s search results, eBay’s product auctions. In farming, more high-quality, granular telemetry could provide an opportunity to address risk in new ways and allow insurers to offer new, more affordable products. Better risk mitigation, in turn, can open the door to new credit and capital investment in local markets.

Are there strategic points within the system where open standards could substantially improve interoperability and problem-solving?

Lack of agreed-on standards and protocols in agricultural settings has hampered the pace of innovation. Examples from other domains are instructive. For example, the emergence of the HTTP protocol and the HTML markup language in the early 1990s greatly accelerated applications on the Internet. A more recent example is the explosion of Arduino-based sensors and Smart Home devices built to integrate with protocols such as Z-Wave and Zigbee.

Are there niches within the data ecosystem that are noticeably under- or over-subscribed in terms of investor and/or vendor attention?

On the one hand, the over-subscribed portions of the system will most likely undergo consolidation where some firms acquire competitors, as others get rolled up. On the other hand, the under-subscribed portions – particularly around key friction points in the food production system – indicate opportunities for new businesses to emerge.

What are the relationships between natural and data ecosystems?

Consider how data flows relate to energy flow, since farming is essentially a form of highly optimized energy capture and storage.

At which points in data workflows does human decision-making break down?

[13] Statistical Modeling: The Two Cultures, Leo Breiman, UC Berkeley (2001)


Instead of replacing human intuition, how can machine learning help augment human judgement? For example, people who have domain experience can make expert decisions when the input dimensions are limited to 4-7 variables. In higher dimensional spaces, human intuition fails and machine learning techniques become essential for generalized insights.

Also, are there cultural constraints to consider, e.g. advising hog farmers to grow popcorn or suggesting that farmers of one village should cooperate with the next? What is the social context to be understood and leveraged?

Are there ways in which linked data-driven food production and distribution can assist smaller farms to compete with large corporate farms? Are there perhaps new kinds of co-ops possible here? Could we imagine essentially bespoke farming at small scale, but focused on newly profitable niches and better-aggregated local demand – enabled via large-scale analytics?

There are analogies with how Google (AdWords, etc.) disrupted entrenched advertising giants to create entire new markets for a wide variety of e-commerce firms.

To what extent do attitudes and cultural norms among farmers themselves affect how new technologies are integrated into practice? How will this affect the pace of innovation in the space?

Too often, the techno-centric, almost utopian, view of technologists assumes that key stakeholders will welcome new technologies with open arms. In farming, as in many other corners of society, this presumption of adoption is naive at best. And this is not just among farmers themselves, but seed companies, wholesalers, and a host of other key constituents in the food system who have optimized their businesses and livelihoods to the system as it exists today. Disruption here is not simply a matter of replacing techniques and technologies. Technologists and investors interested in this space would do well to think of how the points discussed here fit into specific cultural landscapes – and how those landscapes are changing.

A Systemic Perspective

Taking a perspective of the system as a whole, there are clearly points where data is typically produced, transacted, consumed, exchanged, aggregated, reported, etc. While the data interactions and flows between various vendors are complex, some generalizations can serve to categorize the vendor landscape.


In a simple case, a cascade through six stages describes the larger ecosystem for data, spanning from farm sensors to aggregate market metrics:

Data Collection

At the first stage, which we label as data collection, a variety of sensors and processors collect high-resolution data at the finest level of granularity. There are many categories of data collection and an ever-growing field of vendors at this stage. The following list shows examples for each category of data collection:

remote sensing
    orbital: space station imaging
        UrtheCast
    orbital: small satellites
        Planet Labs, Skybox Imaging, Satshot
    high-altitude: atmosats, airships
        JP Aerospace, Titan Aerospace
    low-altitude: aerial imaging, aerostats, drone orchestration, etc.
        TerrAvion, HoneyComb, PrecisionHawk, Raven Aerostar, Skycatch

tractor telemetry
    John Deere FarmSight, Apex
    AgLeader SMS
    Trimble FarmWorks

farm robotics
    Blue River

sensor networks
    localized weather
        WeatherHawk, Ambient Weather
    water usage
        Hortau, PowWow Energy, Agronode
    nutrient testing
        Solum / Monsanto
    pest management
        Semios, Dolphin Engineering

direct data entry
    much of the most relevant data is entered manually by farmers

inferential sensors
    as in soft sensors [14]

import from other sources (government, semi-public agencies, etc.)
    weather
    local rainfall
    soil distribution
    pest/disease spread
    water allocations
    snowpack variance/evaporation
    water cycle
    soil compaction
    hazards: pipelines, cables, underground irrigation

[14] “Design of inferential sensors in the process industry: A review of Bayesian methods”, Shima Khatibisepehr, Biao Huang, Swanand Khare, Journal of Process Control (Nov 2013)

A number of issues beleaguer this data collection stage, including:

Poor communications infrastructure in rural areas
    lack of adequate cell coverage in rural areas (depends on the region)
    satellite upload temporarily blocked by cloud cover and other weather events
    co-ops among neighboring properties share towers, where overlap is possible

Serious privacy concerns
    see below in “Drivers: Privacy and Security Issues”

Data quality
    lack of calibration, high variance on devices (need for maintenance, etc.)
    additional factors that explain variance in yield map results [15]

Data Silos: Vendors must surface metadata to help overcome problems of data silos on farms. Here, standards could play a key role in spurring innovation broadly, and creating new synergies between players in the market. Specifically, support for popular geospatial formats, data import/export, and effective licensing that does not impede data aggregation downstream are all key needs.
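As one illustration of what exporting sensor data with its metadata in a popular geospatial format could look like, the sketch below wraps point readings as a GeoJSON FeatureCollection. The function name, the property keys (timestamp, value, units, source), and the sample coordinates are all hypothetical, chosen only for illustration and not drawn from any vendor's product.

```python
import json


def readings_to_geojson(readings, source, units):
    """Wrap point sensor readings in a GeoJSON FeatureCollection.

    `readings` is a list of dicts with hypothetical keys:
    lon, lat, timestamp, value.
    """
    features = []
    for r in readings:
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
            # Metadata travels with each reading, so a downstream
            # aggregator does not have to guess units or provenance.
            "properties": {
                "timestamp": r["timestamp"],
                "value": r["value"],
                "units": units,
                "source": source,
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample readings from an imaginary soil moisture probe.
sample = [
    {"lon": -121.50, "lat": 38.55, "timestamp": "2014-02-20T08:00:00Z", "value": 0.23},
    {"lon": -121.51, "lat": 38.56, "timestamp": "2014-02-20T08:00:00Z", "value": 0.31},
]
collection = readings_to_geojson(sample, source="example-soil-probe", units="m3/m3")
print(json.dumps(collection, indent=2))
```

Carrying units and source alongside each value is one simple way to keep a reading interpretable after it leaves the device that produced it.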

The field of sensor design in general is undergoing a rapid evolution. For example, self-powered sensors from Piezonix can function continuously by scavenging energy. Arduino and other hardware platforms have opened up new capabilities for rapid prototyping and small form-factors, even into the hands of hobbyists – which is a particular boost for entrepreneurs. Meanwhile, National Instruments has a large market share for production of sensors, and much of its market among design engineers is outside of the U.S.

Mobile, low-altitude data collection methods such as drones and aerostats may help augment the remote sensing from higher altitudes – in other words, fill in gaps on demand, provide high resolution baseline measures, etc. These could help augment communications where cell coverage is sparse. Then again, use of such equipment may create negative reactions.

Increasingly, consumer-grade mobile devices provide substantial platforms for the data collection required in agriculture. Examples include Project Tango from Google, used for high resolution 3D mapping – or for that matter, the widespread use of smartphones and tablets by farmers out in the fields. This trend is expected to continue, as specialized devices converge into the general category of consumer mobile.

[15] Yield Monitors and Maps: Making Decisions, Larry Lotz, Ohio State (1997)

Clearly, this part of the vendor landscape is becoming crowded. There are definite needs for a wide variety of data sources and sensor types. Even so, the tendencies of early vendors do not support a wide playing field in the long term. On the one hand, many if not most of these vendors attempt to “own” data and push their feature sets far up the technology stack. On the other hand, without effective interoperability those vendors are creating data silos. Farmers must focus on operations, of which data+analytics comprises only a portion. Demand will compel interoperability to some extent: a similar effect was observed among early Internet vendors as Web 2.0 practices drove adoption of open standards. Meanwhile consolidation is inevitable, with the larger Ag players such as Monsanto and John Deere and more traditional IT vendors such as IBM, Cisco, and arguably also Google in a good position to carve up market share.

Addressing the telecom connectivity issues specifically, note the tension emerging from multiple drivers:

Increasing pervasiveness of low-cost sensors
Increasing instrumentation of almost all equipment
Demand for near real-time data collection and analytics

These factors will continue to push the data collection issues that are already stressed. Note that sophisticated sensor networks (e.g., from NI) tend to have embedded prognostics, such that computation can be pushed out to the edge – using partial aggregates and other computational techniques. A general trend in large-scale data analytics for the Internet of Things (IoT) will be to push as much processing out to the field as possible [16]. That becomes necessary for real-time stream processing, and also to help reduce data rates – which meanwhile continue to grow substantially.

Probabilistic approximation techniques for data streaming (compressed sensing) may also become quite useful here. Potentially, much can be adapted from state of the art open source projects that address mission-critical data infrastructure for low-latency use cases. See two examples from Twitter:

Summingbird by Twitter engineers: Oscar Boykin, Sam Ritchie, et al.
Add ALL the Things: Abstract Algebra Meets Analytics, Avi Bryant (formerly Twitter)
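The idea of partial aggregates computed at the edge can be sketched with a small mergeable summary. This is a minimal illustration, not the design of Summingbird or of any sensor vendor: the Summary class, its fields, and the device readings are hypothetical. The key property is that merging is associative and has an empty identity value, so summaries can be combined in any grouping, on the device, at a gateway, or in the cloud.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Partial aggregate of readings; the default instance is the empty summary."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    @staticmethod
    def of(value):
        # Summary of a single reading.
        return Summary(1, value, value, value)

    def merge(self, other):
        # Associative combine: grouping of merges does not change the result.
        return Summary(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self):
        return self.total / self.count if self.count else None


def summarize(values):
    """Fold a sequence of readings into one Summary."""
    result = Summary()
    for v in values:
        result = result.merge(Summary.of(v))
    return result


# Each hypothetical field device summarizes its own readings locally...
device_a = summarize([21.0, 22.5, 23.0])
device_b = summarize([19.5, 20.0])
# ...and only the small summaries need to be uploaded and combined.
combined = device_a.merge(device_b)
print(combined.count, combined.minimum, combined.maximum, combined.mean)
```

Uploading a four-field summary instead of every raw reading is one way such a design could reduce data rates over a constrained rural link, at the cost of losing the individual samples.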

[16] Industrial Internet: Pushing the Boundaries of Minds and Machines, Peter C. Evans and Marco Annunziata, GE (2012-11-26)


Elastic Infrastructure

At the second stage, which we label as elastic infrastructure, the collected data gets uploaded and stored – somewhere – so that it can be prepared for analysis. From a high-level standpoint, better connectivity in rural areas is a key enabler of increased farm efficiency and productivity. Opportunities are emerging for new kinds of mixed initiatives, including public-private partnerships, that in improving these communications networks can also provide multilateral benefits to farming communities. Such an infrastructure is akin to laying the foundation for the next generation of precision farm production.

Vendors include:

Google Earth Engine
Marinexplore
AmigoCloud
SpaceCurve
Agralogics

These vendors tend to have cloud-based implementations, and use SaaS or subscription models for pricing, which in turn leverage more general IT vendors in the cloud space:

Amazon AWS
Google Compute Platform
Rackspace Public Cloud
Microsoft Azure
HP Public Cloud

Additionally, wireless networks represent another kind of infrastructure vendor:

Wise Networks
Sensoraide

Ultimately these vendors will likely confront competitive pressure from more traditional network vendors, e.g., Cisco Systems, Juniper Networks, etc.

Infrastructure vendors inherit the downstream issues generally encountered by data aggregator businesses: privacy, metadata alignment, curation needs, effective licensing, tracking lineage, import/export, etc. In regulated industries (Finance, Health Care, etc.) these issues must be addressed directly – whereas in newer industries (e-commerce, social networks, etc.) such controls are still in relative infancy. Ideally, good practices should be built in before regulation appears.


Note that infrastructure businesses tend to bundle data enrichment and analytics services in addition to a core function of data transport and storage. So far, vendors tend to differentiate with particular value-added specialties:

time-series analysis and geospatial analysis
metadata alignment / schema / lineage for a wide variety of data sets
blending farm data with other external data (Open Data from gov sources)
support for curation, addressing data quality issues introduced during collection
allowing customers to create data products for resale
managing interfaces (aka “app stores”) for third-party data products
integrating mobile devices with service fleets

As data products continue to leverage machine learning, other important issues for elastic infrastructure include:

contingencies to upload data at scale via alternative channels
data preparation at scale: imputing missing values, feature engineering, etc.
ultimately, provide for queries, approximated metrics, etc., to feed analytics
compression technologies
data processing and computation at the edge (as noted above)
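To make the data preparation item concrete, here is a minimal sketch of one common way to impute missing values in a sensor time series: linear interpolation between neighboring known samples. It assumes evenly spaced samples with gaps marked as None; the function name and the soil moisture values are hypothetical, and a production pipeline would likely need something more careful (timestamps, gap length limits, sensor-specific models).

```python
def interpolate_gaps(values):
    """Fill None entries by linear interpolation between known neighbors.

    Leading or trailing gaps are filled with the nearest known value.
    Assumes evenly spaced samples.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no known values to impute from")
    filled = list(values)
    # Edge gaps: carry the nearest known value outward.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    # Interior gaps: interpolate between the bracketing known samples.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical hourly soil moisture readings with dropped samples.
soil_moisture = [0.30, None, None, 0.24, None, 0.20, None]
print(interpolate_gaps(soil_moisture))
```

Interpolation is only one option; whether it is appropriate depends on how quickly the measured quantity can change between samples.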

Backlash based on privacy concerns from farmers could ostensibly change infrastructure strategies significantly. Also, privacy laws in different regions (e.g., EU) will have impact on data policies. Both factors indicate eventual regulations in this stage of data infrastructure. The traditional IT vendors have addressed these kinds of issues before many times; however, start-ups may encounter difficult challenges advocating at that level of policy and government.

Focusing on the core problems of elastic infrastructure, most of the vendors do not pay enough attention to the needs of data preparation (curation, cleaning, metadata alignment) prior to serving data to the analytics downstream. Experience from other domains (e.g., ad-tech, social networks) shows that the bulk of the work performed in data infrastructure at scale is in clean-up prior to analytics use cases [17]. Marinexplore is an exception in this case, providing metadata alignment across a wide variety of data sources.

Another issue concerns data workflows on a farm. Generally there are teams of people, working concurrently at different locations. There are important requirements for data to be updated across the team in real-time. Given the scale of the data, this will require effective use of tiered architectures, balancing what data preparation gets handled in the cloud versus on a mobile device. AmigoCloud is an exception in this case, providing real-time updates among the mobile devices used by a team on a farm.

[17] Data Jujitsu: The art of turning data into product, DJ Patil, O’Reilly Radar (2012)


Meanwhile, far too many of the data collection and analytics vendors attempt to provide their own services for elastic infrastructure. Ultimately these get run on shared infrastructure (public clouds) anyway. Note there has been a tendency for the large IT infrastructure vendors to move up the technology stack, particularly in lucrative verticals. Again, agriculture represents a $15T annual market globally. It is inconceivable that start-ups at this stage will not experience significant economic pressure from the more traditional IT vendors for networking and storage. Expect much consolidation at this stage, but also opportunities for disruption in key areas such as compression and local computation.

Analytics

At the third stage, which we label as analytics, metrics get assessed or predicted for multiple components that feed into agronomic models downstream. Note that the word “analytics” has a variety of usages, ranging from reporting/dashboards out to real-time algorithmic modeling and optimization. The usage in this stage in particular is more for predictive analytics.

Some examples include specialized analytics that are currently bundled with the corresponding specialized sensors:

water stress
integrated pest management
nutrient analysis
localized weather

Again, most vendors currently attempt to own the data and the full technology stack. Specialized sensors and analytics get bundled with infrastructure services – and yet without effective interoperability and import/export this implies (and indeed, produces) much duplication. In turn, this creates unnecessary extra costs for farmers and acts to impede innovation in the larger ecosystem. We see similar cases downstream, where analytics for agronomic models have overlap with market analytics, e.g. Agronometrics. Overall, the industry term "seed to sale" has more than one connotation; however, in the context of data vendors it tends to describe features to manage some aspect of analytics from planting through harvest and sales – likely at the expense of features for interoperability.

Duplication of resources will tend to drive mergers and acquisitions over time, as vendors consolidate their shareable components (elastic infrastructure) while combining their specializations (sensor, analytics) into complementary packages. It is interesting that the large vendors in agriculture data analytics – e.g., Monsanto and IBM – have apparently avoided bundling infrastructure and instead focus on specific data products such as risk metrics used for crop insurance:

Climate Corp / Monsanto
IBM / Deep Thunder

An interesting targeted offering in this space is OlaSmarts, which bundles sensors, processing and analytics into multiple verticals. One subsidiary focuses on precision agriculture, e.g., helps farmers cut irrigation costs, particularly in water-constrained environments. It appears to be positioned counter to most other strategies, which focus on “seed to sale” and one-size-fits-all (OSFA) products.

Another kind of vendor emerging in this category is represented by imaging analytics services, e.g., Ceres Imaging, which leverage data from multiple sources to produce multiple kinds of data products. Perhaps the realities of remote sensing have helped temper vendors’ speculations: remote sensing data rates are very high, imaging algorithms are complex, and so this specialty presents a highly skewed buy-to-build ratio. Based on the trajectories of data services in other domains – ad-tech, security, finance, etc. – this is likely to become a more viable business model than the “seed to sale” product attempts.

Keep in mind that effective analytics for agriculture data often implies integrating a wide range of data sources. For an excellent discussion of this topic, see Sensor Fusion for Precision Agriculture, Viacheslav I. Adamchuk, Raphael A. Viscarra Rossel, Kenneth A. Sudduth and Peter Schulze Lammers (2011). Lessons from e-commerce reinforce this point. On the one hand, data silos are not effective in the long run. On the other hand, multiple data sources get leveraged to mitigate missing data, data quality issues, etc. Silos will tend to conflict with the more effective agronomic modeling approaches that emerge over time. It stands to reason that as the field matures, vendors will focus less on “seed to sale” product attempts and more on improving predictive power by leveraging multiple data sources. Again, the larger vendors appear to have recognized this point already.

The accuracy of analytics is sensitive to calibration issues [18]. Calibration sites (i.e., research farms) needed for technology development tend to require large-scale capital investments, aggressive partnering, etc. In the near term, this implies more acquisitions by the large players: Monsanto, IBM, etc. In the long term, services for “crowdsourcing” calibration sites will likely emerge to allow more cost-effective R&D for start-ups.
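As a simple illustration of what calibration can mean in practice, the sketch below fits a linear correction (gain and offset) that maps raw device readings onto trusted reference measurements using ordinary least squares. This is an assumed, generic approach rather than the procedure of any particular yield monitor or vendor; the function name and the paired measurements are hypothetical.

```python
def fit_linear_calibration(raw, reference):
    """Least-squares fit of reference = gain * raw + offset."""
    n = len(raw)
    if n != len(reference) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_x = sum(raw) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in raw)
    if sxx == 0:
        raise ValueError("raw readings must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(raw, reference))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return gain, offset


def apply_calibration(reading, gain, offset):
    """Correct a single raw reading with a fitted gain and offset."""
    return gain * reading + offset


# Hypothetical paired measurements: device readings vs. certified scale weights.
raw_readings = [98.0, 151.0, 203.0, 247.0]
scale_weights = [100.0, 155.0, 210.0, 255.0]
gain, offset = fit_linear_calibration(raw_readings, scale_weights)
print(gain, offset, apply_calibration(180.0, gain, offset))
```

The quality of such a fit depends on how many trusted reference measurements are available, which is one reason access to calibration sites matters.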

Lessons from finance show that predictive modeling which has impact on large-scale capital investments tends to be carefully audited and controlled. Concerns about data provenance and model transparency get emphasized in analytics products. In machine learning there is a perennial tension between model interpretability and predictive power. In other words, there are design challenges for satisfying the common need to answer the “why” question for a particular decision. Product managers insist on interpreting results from automation. Auditors require accountability and controls for any modifications to critical decision-making systems. Executives wish to have complex analytics summarized as a short list of rules for their organizational learnings. Meanwhile, there is a constellation of reasons for feedback from analytics into engineering in general, e.g., for improved feature engineering, or resource trade-offs – with the latter being increasingly crucial to embedded use cases such as mobile devices and robotics.

[18] “Yield Monitor Calibration Tips: Making The Most From Your Data”, Nathan Watermeier, Ohio State (2004)

Overall, the knowledge discovery process requires a measure of rightful skepticism about what the machine is doing in making a particular decision. This is not an intuitive response in most businesses, currently, and will increasingly become a pain point as the data rates and dependence on analytics escalate. Consequently, the capital structure of corporate farms may impede adoption of more advanced analytics. That could open opportunities for competition from smaller farms and new kinds of data-intensive co-ops.

Other issues likely to be faced by analytics products include:

model portability vs. lock-in (e.g., use of the PMML standard to migrate models)
batch vs. real-time/streaming
automation for feature engineering and model evaluation

That last point is particularly salient. Without interoperability, open standards, model portability, etc., the proprietary analytics products and services effectively become static and anti-competitive. Farmers and ag analysts lack the ability to conduct effective A/B tests or other means of evaluation (e.g., tournaments) to compare analytics. That is (we hope) likely a near-term artifact, and will change as vendors recognize the benefit of proving their analytics directly with individual farmers’ data.
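A tournament of the kind mentioned above could, in its simplest form, score several candidate models against the same held-out observations from a farmer's own fields and rank them by error. The sketch below assumes each model is exposed as a plain prediction function; the model names, their formulas, and the data are hypothetical placeholders, and mean absolute error is just one possible scoring choice.

```python
def mean_absolute_error(predicted, observed):
    """Average absolute difference between predictions and observations."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def run_tournament(models, inputs, observed):
    """Score every candidate model on the same held-out data, best first."""
    scores = {
        name: mean_absolute_error([predict(x) for x in inputs], observed)
        for name, predict in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical yield models (bushels/acre as a function of seasonal rainfall in mm).
candidate_models = {
    "vendor_a": lambda rainfall_mm: 0.20 * rainfall_mm + 60.0,
    "vendor_b": lambda rainfall_mm: 0.10 * rainfall_mm + 110.0,
}
# Hypothetical held-out observations from one farm.
rainfall = [400.0, 500.0, 600.0]
observed_yield = [142.0, 158.0, 181.0]

for name, error in run_tournament(candidate_models, rainfall, observed_yield):
    print(name, round(error, 2))
```

A comparison like this is only possible when models can be run against data the vendor does not control, which is the practical argument for portability.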

Another area of analytics that is potentially quite important is the effort to support subsistence farmers in other geographic regions, e.g., Agrepedia in Ethiopia. The point is that aggregate data can be pulled off to the cloud, where predictive models can be run cost-effectively (e.g., pricing models at harvest time) then provided to subsistence farmers via SMS. Of course, given the normative level of connectivity in the rural areas of the U.S. today, such an approach could be equally viable there.

Telecom providers tend to be powerful in many areas of the world, and services such as SMS are relatively available and cheap. SMS can be adapted as data portals for cloud-based infrastructure and simple, but effective, analytics services. This approach has excellent implications, given that harvests in these regions get sold to intermediaries who tend to be aggressively extractive [19]. Cloud-based services outside of a distressed region could help disintermediate entire layers of corruption.

Farm Operations

At the fourth stage, which we label as farm operations, there are a variety of different functions to manage, including:

- seed catalog selection: DuPont Pioneer, Yield Pop
- activity calendar
- weather – both short and long term predictions
- asset inventory: Trecker
- yield maps: Farmers Edge, Solapa 4, VitalFields
- livestock management: FarmerOn
- commodity price monitoring: FarmLogs, Agronometrics, HarvestMap
- accounting workflow: Granular
- contracts, deliveries
- harvest storage

An essential point of precision agriculture is that by combining analytics based on a variety of data collected from sensors (including satellites, drones, etc.) along with field topography, farm operation history, etc., the variability of crops at specific locations can be leveraged to improve overall yield: modify the seed density, modify the inputs, etc. For example, consider the statement from the acquisition of Climate Corporation by Monsanto:[20]

19 Many intermediaries buying harvests are effectively “loan sharks” who charge 50% interest rates, etc.

20 “Monsanto to Acquire The Climate Corporation, Combination to Provide Farmers with Broad Suite of Tools Offering Greater On-Farm Insights”, Business Wire (2013-10-02)


The companies estimate the majority of farmers have an untapped yield opportunity of up to 30 bushels to 50 bushels in their corn fields, and they believe that advancements in data science can help further unlock that additional value for the farm.

Given current prices, the delta in yield would be approximately US$200/acre for corn.[21] Without question, yield is the key performance indicator (KPI) on which corporate farms in the U.S. rely most. Even so, yield is not necessarily the best metric of success for farmers overall. Outside of the U.S., the best interests of smallholder and family farms are arguably served by optimizing for return on investment (ROI). While many point to the role of scientific advances in increasing crop yields dramatically, it is important to note that these advances over the past two centuries[22] have come at the cost of disproportionately increased inputs (water, nitrogen, phosphorus, etc.), which are the critical resources. Viewed through that lens, precision estimates for aggregate yield are perhaps most acutely relevant in the context of financial traders.
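The arithmetic behind the yield and ROI argument can be sketched as follows. The 30 to 50 bushel range comes from the statement quoted above; the corn price of US$5.00/bushel and the per-acre input costs are illustrative assumptions rather than sourced figures, and the two farm plans are hypothetical.

```python
# Illustrative only: the price and cost figures below are assumptions,
# not values taken from the sources cited in this paper.

ASSUMED_PRICE = 5.00  # US$ per bushel of corn (hypothetical)


def yield_delta_value(extra_bushels_per_acre: float, price_per_bushel: float) -> float:
    """Gross value per acre of an increase in yield."""
    return extra_bushels_per_acre * price_per_bushel


def roi(revenue_per_acre: float, input_cost_per_acre: float) -> float:
    """Return on investment: net gain divided by the amount spent on inputs."""
    return (revenue_per_acre - input_cost_per_acre) / input_cost_per_acre


# Value of the "untapped" yield at the low, middle, and high end of the range.
low = yield_delta_value(30, ASSUMED_PRICE)
mid = yield_delta_value(40, ASSUMED_PRICE)
high = yield_delta_value(50, ASSUMED_PRICE)

# Two hypothetical plans: the intensive plan raises yield by 25 percent
# but raises input costs by 60 percent.
plans = {
    "baseline": {"bushels": 160, "input_cost": 400.0},
    "intensive": {"bushels": 200, "input_cost": 640.0},
}

results = {}
for name, plan in plans.items():
    revenue = plan["bushels"] * ASSUMED_PRICE
    results[name] = roi(revenue, plan["input_cost"])
    print(f"{name}: {plan['bushels']} bu/acre, ROI = {results[name]:.4f}")
```

Under these assumed numbers the intensive plan wins on yield and loses on ROI, which is the distinction drawn above between the two metrics.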

Another issue is that farmers tend to want immediate access to operations data – analytics, history, etc. – via their mobile devices. Even when the data is not needed immediately, having it at hand eases the adoption/learning curve as these technologies move into the field. That heightens the need for multi-tier architectures and for weighing trade-offs between cloud-based infrastructure and mobile device capabilities.

While some of those vendors focus on specific operational concerns, other vendors pursue One-Size-Fits-All (OSFA) strategies,[23] attempting to encompass all the operations of a farm. This strategy is akin to “seed to sale”, with related risks. Examples include:

- OnFarm
- Farm at Hand

Lessons from e­commerce indicate that OSFA approaches tend to collapse after the early adopter phase wanes. Compelling solutions for farm operations will focus on interoperability, well­defined interfaces, and the ability to accommodate “plug­ins” from alternative analytics sources. Again, a similar effect was observed among early Internet vendors as Web 2.0 practices drove adoption of open standards.

21 http://www.quotecorn.com/

22 “Nasa-funded study: industrial civilisation headed for 'irreversible collapse’?”, Nafeez Ahmed, The Guardian (2014-03-14)

23 “‘One Size Fits All’: An Idea Whose Time Has Come and Gone”, Michael Stonebraker, Uğur Çetintemel, ICDE Proceedings (2005)


Overall, farm operations software is a likely point to emphasize recommendation services, i.e., feedback loops for consuming analytics products based on aggregated data. For example, successful outcomes in one region may imply generalized learnings (recommendation systems) in other similar regions. This value proposition may drive greater levels of interoperability, as vendors attempt to monetize their aggregate data.

Also, this stage of farm operations provides a good point for evaluation of analytics, such as model tournaments. History of predictions must be managed and leveraged. This implies even greater concerns about data privacy and security – much more so than sensor data collection – since aggregate data becomes more valuable to bad actors.
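A model tournament over a managed history of predictions can be sketched as below. This is a minimal illustration, assuming each analytics source has logged one yield prediction per field and that observed yields are later available; the vendor names and numbers are hypothetical, and root-mean-square error is only one of several reasonable scoring rules.

```python
import math
from typing import Dict, List, Tuple


def rmse(predicted: List[float], observed: List[float]) -> float:
    """Root-mean-square error between predictions and observed outcomes."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("predictions and observations must align and be non-empty")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def run_tournament(history: Dict[str, List[float]],
                   observed: List[float]) -> List[Tuple[str, float]]:
    """Score every entrant on the same observations; lowest error ranks first."""
    scores = [(name, rmse(preds, observed)) for name, preds in history.items()]
    return sorted(scores, key=lambda item: item[1])


# Hypothetical per-field yields (bushels/acre) recorded after harvest.
observed_yield = [150.0, 162.0, 171.0, 158.0]

# Hypothetical prediction history kept by the farm operations software.
prediction_history = {
    "vendor_a": [148.0, 165.0, 170.0, 160.0],
    "vendor_b": [140.0, 150.0, 180.0, 170.0],
}

ranking = run_tournament(prediction_history, observed_yield)
for name, error in ranking:
    print(f"{name}: RMSE = {error:.2f}")
```

The point of the design is that every entrant is scored on the same held-back observations, so the prediction history has to be stored before outcomes are known.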

Distribution

The fifth stage, which we label as distribution, is also sometimes called “procurement”. More traditional forms of supply chain analysis can begin to be applied at this stage:

- traceability (tracking containers, logistics, etc., via pallet RFID, etc.): Opara, TransVoyant
- direct market sales (disintermediation): Plovgh
- shipping and storage costs
- accountability/sustainability reporting

Data becomes more “refined” as it aggregates downstream. Other forms of analytics at this stage feed downstream: geospatial estimates of required inputs, aggregate yield, etc.

One poignant issue at this stage is that food processing (post­harvest) tends to use more fresh water and energy resources than the farming. An emerging business model for traceability: end­uses (Whole Foods, high­end restaurants, etc.) place a premium on accountable products, so there's a pricing differential. Processors which are less accountable become suspect – largely because insiders already know they are the major source of the problem re: massive water waste, contaminants, etc. Effective monitoring, data collection, elastic infrastructure, analytics, etc., are needed to ensure that the distribution stage is accountable for what gets consumed. That implies another kind of feedback loop in the data flows: what are the water+energy implications of a harvest after it leaves the farm?

Market Aggregation


The sixth stage, which we label as market aggregation, concerns the kind of agriculture data that most people are already familiar with:

- global market analytics: Mercaris, Gro-Ventures (Africa focus)
- commodities trading: AgFlow
- market intelligence: Cleantech Group, Food Tank, Praescient Analytics, Palantir, Stratfor
- public policy: USDA, GODAN, FAO AQUASTAT

Traditionally this stage of agricultural data has been focused either on short-term opportunities (commodities trading) or very high-level concerns from qualitative perspectives (policy making,[24] global food security, natural resource management).

Opportunities abound for leveraging feedback loops in the data, algorithmic modeling, aggregate data services driving hyperlocal (per­block) recommender systems, etc. This is especially the case as sensor networks become more pervasive and as remote sensing services continue to provide better, higher­resolution data. A key point is to focus the data services so that markets steer away from short­term extractive practices (hedge funds) and toward opportunities to apply data to make food production and distribution systems more responsive, resilient, and efficient.

General Insights

An earlier question asked, who are the stakeholders in this system? We find a number of actors who represent stakeholders in the flow of agricultural data, each of whom represents diverse, sometimes conflicting, interests in the larger value chain:

- farmers
- corporate farms
- co-ops
- public/private partnerships, e.g., water districts
- technology vendors
- shippers/storage
- wholesalers/distributors
- food processors
- end uses: groceries, restaurants, etc.
- public policy makers: USDA, CAP, etc.
- financial markets/traders

24 Of course, the impact of policy changes should be modeled and considered prior to implementation.

Which of these stakeholders require more transparency into the data flows? For example, do end uses such as restaurants require traceability at the level of individual pallets, all the way through food processors, shippers, etc., back to the origin at a farm? Does that need for traceability conflict with legitimate concerns about data privacy, or could it open the door for data security concerns and other abuses? In any case, we can use these identified stakeholders to analyze the different issues identified for agricultural data.

Overall, the point of data flowing across these several stages is to generate actionable insights, at very large scale, and in many cases with relatively low latency. That is a tall order, and voices within agriculture lament the volume/velocity/variety of the data, and the “needle in a haystack” effect of attempting to draw actionable insights from mountains of raw data.

Even so, it’s important to keep in mind that other verticals – e.g., finance and telecom – have achieved this already for their own specific needs. Agriculture is known for relatively conservative practices, with perhaps a 10-year cycle for adopting new technologies. To change that aspect in any way, one must understand the root causes, chief among them the uncertainty and enormous risks that dominate whole markets, local communities, and families. Farmers earn 40 paychecks in a lifetime, and there is little margin for error.

Even so, as the following drivers indicate, there are good reasons to accelerate key areas of technology adoption. Some of the more conservative bias against new technologies may need to be adjusted due to other looming priorities.

Driver: Drought Outlook

Circa 2014, the predominant issue being discussed in California (and hence, proximate to many of the technology vendors) is drought. Variance in snowpack levels causes serious shortfalls in water resource allocations via aqueducts – with obvious impact on farm operations now in crisis. In addition, variance in the timing of the water cycle stresses natural resources and infrastructure throughout these connections, from snowpack to farm or food processor usage: reservoirs, river ways, aquifers, levees, seawater incursion, etc. Attempts to control nature usually fail sooner or later. For further details, see “The Emerging Science of Environmental Applications”, The Fourth Paradigm, Jeff Dozier, William Gail (2009). In particular, two crucial factors have been missed in the related science: mountain hydrology, and measuring/modeling the evaporation cycle.

Another complex issue is how applications of surface water (e.g., aqueducts, diverted rivers, etc.) interact with groundwater usage in the context of aquifers. For example, in the California Central Valley, there have been widespread examples of land subsidence. In conditions where groundwater pumpage rates exceed the recharge and surface water inflows, the structure of the aquifer collapses:[25] the rock falls in on itself, leading to sinkholes, damage to infrastructure, and less water holding capacity.
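The overdraft condition described above (pumpage exceeding recharge plus surface water inflows) amounts to a simple mass balance, sketched here. This is a bookkeeping illustration only, not a hydrological or subsidence model; the yearly figures are invented and use an arbitrary common volume unit.

```python
from typing import List, Tuple

# (year, recharge, surface_inflow, pumpage), all in the same volume unit.
# The numbers are hypothetical and chosen only to show the calculation.
YearRecord = Tuple[int, float, float, float]


def net_storage_change(recharge: float, surface_inflow: float, pumpage: float) -> float:
    """Change in aquifer storage for one period; negative means overdraft."""
    return recharge + surface_inflow - pumpage


def overdraft_years(records: List[YearRecord]) -> List[int]:
    """Years in which more water was pumped out than flowed in."""
    return [year for year, r, s, p in records if net_storage_change(r, s, p) < 0]


records = [
    (2011, 5.0, 3.0, 6.0),
    (2012, 3.0, 2.0, 6.0),
    (2013, 2.0, 1.0, 7.0),
]

cumulative = sum(net_storage_change(r, s, p) for _, r, s, p in records)
print("overdraft years:", overdraft_years(records))
print("cumulative storage change:", cumulative)
```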

Without better modeling of the water cycle, the impact of these variable factors on agronomic models at scale causes serious problems with the effective use of agricultural data. It also implies opportunities for researchers and entrepreneurs, as well as public/private partnerships involving the larger vendors.

During an extended drought, much of the economics of agriculture shifts. The vendor landscape will experience many changes. For example, growers of leafy greens will sell their water to orchards. Both parties win: the vegetable farms realize greater revenue streams by arbitraging water rights, and the orchards preserve their capital investments. Therefore the priorities for data use change dramatically. Also, reluctance to adopt new technologies is lowered as farmers recognize existential risks to their businesses and scramble for any potential remedy. This is currently a major driver for data use at scale, and technology vendors may benefit from the California drought, since many investors recognize the immediate business use cases for technology solutions. Meanwhile, throughout the world there are cycles of drought and flooding which must be addressed. Hopefully the experience in California, now prompting attention from technology firms and investors in Silicon Valley, will have benefits globally.

An audience remark[26] at a recent From Farms to Foodies industry forum summarized the essential problem of water and other natural resources: “We must design systems to help regenerate the soil, not be extractive: using technology from a regenerative standpoint, that is the bigger challenge.” This has been a pervasive problem in agricultural data at the market aggregate stage where emphasis has been placed on extractive use cases, e.g., hedge funds.

Driver: Privacy and Security Issues

25 “Ground­water availability in the United States”, Thomas Reilly, et al., USGS circular 1323 (July 2008)

26 Paul Dolan, Mendocino Wine Company


Entirely separate from the drought/flooding issues is another constellation of concerns, which are almost as challenging. The accelerating use of agricultural data at scale – and with that, the increased use of wireless networking, cloud computing, analytics vendors, etc. – brings into question a number of privacy and data security issues. Two recent articles express these themes succinctly:

Big ag companies could now control a data trove that presents privacy and business risks to farmers who don’t want to share the secrets of their trade with rivals or the government. – Businessweek [27]

At this point, digging into data could represent the next big step forward for U.S. agriculture, but only if farmers feel safe taking the plunge. That’s why the Farm Bureau, the country’s largest farm group, hashed out a new policy for sharing farm data in January. It includes the right for farmers to delete their data whenever they want. – Iowa Public Radio [28]

Brad Martin of Paia Corporation, an expert in embedded systems and hardware data security, noted several points to consider carefully about data privacy:

The self-motivated innate wisdom of farmers is apparent in their unwillingness to donate their data for “big data” programs in which they are not certain to benefit in proportion to their offering. The past poor-neighbor behavior of industrial partners (cf. seed patents) has led to a general distrust, among those who own the data, of the motives of those who would profit from using the data (i.e., the industry).

Without the benefits of data sharing, each farmer would likely have lower yield and higher production cost than an optimized analysis would tend to provide ("commons dilemma"). It is somewhat less clear whether well­managed farms will themselves benefit on balance from the overall improvement in yields and certainly unclear if they will obtain any benefit from increased yields on other, competitive fields in the same market.

There may be a “slippery slope” option in which certain aspects of the farmer's data are willingly exchanged for services rendered: a “free” crop analysis tool that keeps (and sells) related records to third parties (anonymized, of course!). There may be a willing limit to the amount of sharing.

27 “Farmers Press Agribusiness Giants for Data Security”, Shruti Singh, Jack Kaskey, Businessweek (2014-01-23)

28 “Farmers Worry About Sharing Big Data”, Grant Gerlock, Iowa Public Radio (2014-02-18)


Once a farmer's information is merged into an aggregated database, it becomes part of a very desirable target. We know that, given sufficient motivation, virtually no data system is immune to compromise. This is not unlikely: several countries who are major agriculture trading partners with the U.S. have substantial IT espionage programs that have proven effective against even the world's most savvy network companies.

It's also not clear whether such large databases would be protected from government interference. There's no current indication that USDA self­interest would lead to micro­management of farm operations, but it's hard to imagine that compelling uses wouldn't be found for databases of this type, leading to substantial risk to personal privacy for farms of all sizes. Vendors will not knowingly allow uninformed compromise of their customer's/client's databases by third parties.

More troubling is the possibility that these vendors – most of which provide cloud­based aggregation/analytics services to farms – themselves become the targets of cyber attacks. Other industries have seen cases in which innocuous vectors (common web exploits, spear phishing) are used to gain access to a large population of special interest. If data services become dominated by a small handful of software vendors, a targeted exploit surreptitiously downloaded to many farms could be engineered to attack whichever software it encountered. Once compromised, a farmer's data could then be uploaded to the Internet for the attacker's own use.

It's simple enough to say, "keep your data safe, keep your systems updated, be careful about viruses" but it's extremely difficult to do so in a comprehensive fashion. A highly­sophisticated attack will not use any of the known and detectable mechanisms. Antivirus programs and web filters won't work against them. Absent any specialized protection, the farmer who keeps data on a computer that is not directly connected to the Internet will be best protected from attack and loss.

Regardless of the extent of security precautions taken, it is clear that there will be bad actors and there will be security breaches. The consequences will be significant, and the risk grows with the scale of networked, semi­autonomous systems – as in every other industry in the world. Therefore the industry must prepare for backlash from farmers, from the public, and ultimately for increased regulation as a consequence. Those vendors who can take action proactively by building security and privacy controls into their offerings will benefit.

This driver of privacy issues reinforces the inherently conservative nature of agriculture. As a John Deere executive explained, “For the sake of individual data pieces, we are not going to trade in a relationship we spent 175 years building.” Even so, the priorities of drought and other major drivers in agricultural data pose a dilemma: where are the appropriate risks for adopting new technologies? Ultimately the vendors need to become viewed as “honest brokers” by the growers; for example, by providing decision support software that recommends another vendor’s seeds.

Is this an argument for privacy­preserving, distributed data mining? In other words, for farmers exercising greater control of their data and getting paid for access to their data by outside parties? Imagine if standards could be created that support distributed data mining where one would be permitted to derive certain kinds of results from the data, but could not access the raw data directly. Analogies exist in e­commerce, where ComScore, QuantCast, DataLogix, and others aggregate data about consumers browsing on the Internet. On the one hand, our browsers aren't exactly gold lodes, except for phishing or passwords, whereas e­commerce firms have become prime targets. On the other hand, individual farm operations store enough aggregate data and analytics to be valuable for attacks. Plus, the value grows with each stage downstream.
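One way to picture "results without raw data" is a scheme in which each farm computes a small summary locally and only the summaries are pooled. The sketch below is a minimal illustration of that idea under stated assumptions: the class and function names are hypothetical, the yields are invented, and the minimum-participant rule is a simple disclosure control, not a formal privacy guarantee such as differential privacy.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LocalSummary:
    """The only thing a farm shares: a count and a total, never raw rows."""
    n: int
    total: float


def summarize_locally(raw_yields: List[float]) -> LocalSummary:
    """Runs on the farmer's own system; raw values stay there."""
    return LocalSummary(n=len(raw_yields), total=sum(raw_yields))


def pooled_mean(summaries: Iterable[LocalSummary], min_farms: int = 3) -> float:
    """Combine summaries into a regional mean, refusing small groups
    where a single farm's contribution would be easy to back out."""
    summaries = list(summaries)
    if len(summaries) < min_farms:
        raise ValueError("too few participating farms to release an aggregate")
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no observations to aggregate")
    return sum(s.total for s in summaries) / n


# Hypothetical per-field yields held privately by three farms.
farm_a = summarize_locally([150.0, 160.0])
farm_b = summarize_locally([170.0])
farm_c = summarize_locally([140.0, 180.0, 160.0])

regional_mean = pooled_mean([farm_a, farm_b, farm_c])
print("regional mean yield:", regional_mean)
```

An outside party querying such a service would see the regional mean and nothing else, which is also the natural point at which access could be metered and paid for.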

Driver: Open Standards and Interoperability

The span of the agricultural data flowing through several stages – from sensor to market aggregates – implies that no one player could possibly own all of the data, or all of the tech stack. Consequently, there are distinct needs for interoperability to allow this field to grow. In other words, unless vendors can find effective ways to exchange data at critical points there will be deadlocks in the overall system: data silos, limited degrees of sensor fusion, less predictive modeling possible, etc.

As has been demonstrated in a variety of other verticals, interoperability is best achieved when vendors can agree to adhere to open standards. For example, the open standards HTTP network protocol and HTML markup format allow for a wide variety of Internet browsers, web servers, content, web services, etc. As it stands currently there is far too much fragmentation in the flows of agricultural data. Platforms and open standards are needed to accelerate innovation and help the field mature.

An open standards body was recently established for agricultural data, the Open Ag Data Alliance (OADA).[29] Their initial work includes a presence on GitHub with an open source code repository. The stated principle of OADA is that “each farmer owns data generated or entered by the farmer, their employees, or by machines performing activities on their farm.” Part of the approach is to establish a common set of APIs on different cloud providers: ostensibly, farmers could migrate their data between different providers. That is a start, and though it does not yet begin to address many of the data security issues, the OADA is providing a forum for community discussions.

29 “Group to Promote Open Data Standard”, Willie Vogt, Farm Futures (2014-03-12)

Moreover, OADA has substantial support from Monsanto, which has stated its aims to integrate and assure data privacy:[30]

The data created by a farmer, or generated from equipment the farmer owns or leases, is owned by that farmer and should be easily managed.

Other interesting efforts toward these ends include:

- Standardized Precision Ag Data Exchange (SPADE) Project
- “Spatial data infrastructures for precision farming data - standards and system design criteria”, Martin Weis (2007)

Compelling solutions for farm operations will focus on interoperability, well­defined interfaces, and the ability to accommodate “plug­ins” from alternative analytics sources. Similar effects were observed among early Internet vendors as Web 2.0 practices drove adoption of open standards. In particular, it will be important for vendors to:

- surface their products’ metadata to help avoid potential data silos
- allow for data import/export between vendors, while propagating schema and lineage
- support popular geospatial formats, datetime formats, etc.
- use effective data licensing that does not impede data aggregation downstream
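To make "propagating schema and lineage" concrete, the sketch below wraps exported records in an envelope that carries a field-level schema and a lineage trail, and has the importing side validate against the schema and extend the trail. The envelope layout, field names, and vendor identifiers are hypothetical and do not correspond to OADA or any other published format; ISO 8601 timestamps and WGS 84 coordinates are used simply as widely supported conventions.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List


def export_records(records: List[Dict[str, Any]],
                   schema: Dict[str, Any],
                   source: str,
                   derived_from: Iterable[str] = ()) -> str:
    """Serialize records together with their schema and lineage."""
    envelope = {
        "schema": schema,
        "lineage": {
            "source": source,
            "derived_from": list(derived_from),
            "exported_at": datetime.now(timezone.utc).isoformat(),
        },
        "records": records,
    }
    return json.dumps(envelope, sort_keys=True)


def import_records(payload: str, importer: str) -> Dict[str, Any]:
    """Load an export, check records against the declared schema,
    and extend the lineage so downstream users can see the full path."""
    envelope = json.loads(payload)
    declared = {field["name"] for field in envelope["schema"]["fields"]}
    for record in envelope["records"]:
        unknown = set(record) - declared
        if unknown:
            raise ValueError(f"fields not declared in schema: {sorted(unknown)}")
    envelope["lineage"]["derived_from"].append(envelope["lineage"]["source"])
    envelope["lineage"]["source"] = importer
    return envelope


# Hypothetical schema and one soil-moisture reading.
schema = {
    "fields": [
        {"name": "field_id", "type": "string"},
        {"name": "observed_at", "type": "datetime", "format": "ISO 8601, UTC"},
        {"name": "lat", "type": "number", "unit": "degrees (WGS 84)"},
        {"name": "lon", "type": "number", "unit": "degrees (WGS 84)"},
        {"name": "soil_moisture", "type": "number", "unit": "percent by volume"},
    ]
}
records = [{
    "field_id": "F-12",
    "observed_at": "2014-04-01T14:30:00+00:00",
    "lat": 38.54,
    "lon": -121.74,
    "soil_moisture": 23.5,
}]

payload = export_records(records, schema, source="vendor-a.example")
received = import_records(payload, importer="vendor-b.example")
print(received["lineage"]["derived_from"], "->", received["lineage"]["source"])
```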

Even so, the capital structure of corporate farms may conflict with adoption of more advanced analytics and interoperability. If so, that could open opportunities for competition from smaller farms and new kinds of data­intensive co­ops.

One of the open standards that is becoming quite important for agricultural data is RFID, in terms of traceability of farm products, accountability of processing and procurement, etc. From a public policy perspective, this also provides the capability to reverse-engineer the procurement chain[31] in case of illness or contamination.
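Walking the procurement chain backward from a tagged unit can be sketched as a simple lookup over custody records. The record layout, tag identifiers, and holder names below are hypothetical; a real deployment would draw on whatever event data the RFID readers and logistics systems actually capture.

```python
from typing import Dict, List, Optional

# Hypothetical custody log keyed by RFID tag. Each entry names the party that
# held the tagged unit and the upstream tag it was received under
# (None marks the origin).
CUSTODY_LOG: Dict[str, Dict[str, Optional[str]]] = {
    "PALLET-0042": {"holder": "Grocery distribution center", "received_as": "SHIP-0817"},
    "SHIP-0817": {"holder": "Cold-chain shipper", "received_as": "LOT-0305"},
    "LOT-0305": {"holder": "Food processor", "received_as": "BIN-0009"},
    "BIN-0009": {"holder": "Farm (origin)", "received_as": None},
}


def trace_to_origin(tag: str, log: Dict[str, Dict[str, Optional[str]]]) -> List[str]:
    """Follow custody records upstream from a tag to the origin."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = tag
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in custody log at {current}")
        if current not in log:
            raise KeyError(f"no custody record for {current}")
        seen.add(current)
        entry = log[current]
        chain.append(entry["holder"])
        current = entry["received_as"]
    return chain


chain = trace_to_origin("PALLET-0042", CUSTODY_LOG)
print(" <- ".join(chain))
```

A gap in the log surfaces as a `KeyError`, which is itself useful: it identifies the point in the chain where accountability breaks down.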

Arguably, there is an open standard used by analytics vendors in general, called PMML, which could readily be used in agriculture. It provides for model portability, guards against vendor lock­in, allows analytics to scale (e.g., on cloud­based infrastructure) independent of where the models are trained, etc. Vendors providing analytics products and services would need to agree on PMML for model import/export. That is likely to occur over time anyway as more traditional IT vendors move into this space.
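As an illustration of model portability, the sketch below reads a small PMML regression document and scores a record using only the Python standard library, so the scoring side needs no knowledge of the tool that trained the model. It handles just the `RegressionTable` / `NumericPredictor` subset of PMML 4.2; the model, field names, and coefficients are hypothetical, and a production system would rely on a full PMML engine rather than this hand-rolled reader.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

PMML_NS = {"p": "http://www.dmg.org/PMML-4_2"}

# A toy linear yield model; the coefficients are invented for illustration.
PMML_DOC = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="Toy yield model (hypothetical coefficients)"/>
  <DataDictionary numberOfFields="3">
    <DataField name="seed_density" optype="continuous" dataType="double"/>
    <DataField name="nitrogen" optype="continuous" dataType="double"/>
    <DataField name="yield" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="seed_density"/>
      <MiningField name="nitrogen"/>
      <MiningField name="yield" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="50.0">
      <NumericPredictor name="seed_density" coefficient="2.0"/>
      <NumericPredictor name="nitrogen" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

Model = Tuple[float, List[Tuple[str, float, int]]]


def load_linear_model(pmml_text: str) -> Model:
    """Extract the intercept and numeric terms from a PMML regression table."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", PMML_NS)
    if table is None:
        raise ValueError("no RegressionTable found")
    intercept = float(table.get("intercept"))
    terms = [
        (t.get("name"), float(t.get("coefficient")), int(t.get("exponent", "1")))
        for t in table.findall("p:NumericPredictor", PMML_NS)
    ]
    return intercept, terms


def predict(model: Model, row: Dict[str, float]) -> float:
    """Score one record: intercept plus coefficient * value ** exponent per term."""
    intercept, terms = model
    return intercept + sum(coef * row[name] ** exp for name, coef, exp in terms)


model = load_linear_model(PMML_DOC)
prediction = predict(model, {"seed_density": 30.0, "nitrogen": 100.0})
print("predicted yield:", prediction)
```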

30 “Guiding Principles on Data and Privacy”, David Friedberg, The Climate Corporation

31 “RFID's Role in Food Safety”, Mark Roberti, RFID Journal (2013-07-29)


Driver: Funding Analysis

As recently as Q1 2012, the outlook for clean tech investments had been receding.[32] Many of the Silicon Valley venture capital firms backed away from agriculture. One notable exception is Khosla Ventures, which has been consistently engaged in this area. They funded two companies recently acquired by Monsanto Growth Ventures: Climate Corp and Solum.

Other predominant sources of capital investment include:

- family office investments
- strategic funds: Monsanto, Dow, BASF, etc.
- investment bankers
- challenge funds/incubators: Start-Up Chile, Africa Enterprise Challenge Fund
- crowdfunding: Kiva, AgFunder, Angel List

One geopolitical aspect becomes apparent in an analysis of the agricultural data vendors: a large cohort of Ag data startups are based in the Southern Hemisphere, and most of these have been involved with Start-Up Chile.[33][34] So there is a competitive tension emerging between Monsanto Growth Ventures and other strategic funds in the Northern Hemisphere (mostly Silicon Valley since 2013) and incubators in the Southern Hemisphere.

This regional economic tension will likely be shadowed by public policy. For example, Mexico recently ruled against allowing use of Monsanto GMO products.[35] This tension echoes among the influential buyers, e.g., Whole Foods has announced its intent to require GMO labeling by 2018.[36] Mexico plays a unique role in the borderlands: it is within the Northern Hemisphere and obviously sells much of its output to the United States, and yet politically and culturally it finds resonance with other Latin American countries in the Southern Hemisphere.

This raises two questions. On the one hand, will other Silicon Valley venture capital firms rush back into clean tech investments following the two recent (circa 2014Q2) successes of Khosla/Monsanto? On the other hand, will national governments tilt the geopolitical playing field by subsidizing incubators following the success of Start-Up Chile?

32 “The state of cleantech venture capital: what lies ahead”, Matthew Nordan, GigaOm (2013-03-27)

33 “Avance Proporción de Países Seleccionados Top 5”, slide 9, Start-Up Chile (2014)

34 AngelList “Agriculture Startups” (2014)

35 “Mexico Judge Bans Monsanto’s GMO Corn”, Devon Pena, Environmental and Food Justice (2013-10-11)

36 “Our Commitment to GMO Labeling”, Whole Foods Market


Conclusions

The following trends are in progress for each of the six stages:

Stage 1: Data Collection

- the needs of this stage are complex, but the vendor landscape is becoming crowded
  - implies much consolidation among startup vendors
  - new entrants face headwinds in the face of market fatigue
- remote sensing products tend to augment sensor networks
  - implies large-scale use cases for data fusion, i.e., cloud-based apps
  - startups that focus too much on one data source are probably doomed
- farm robotics and tractor instrumentation (mobile) will augment static sensors
- absent key learnings from other verticals, startups tend to repeat critical mistakes
  - creating data silos
  - desire to "own" the data and the tech stack
  - lack of promoting standards for interoperability
- data quality, communication, and privacy issues beleaguer vendors
  - implies that regulatory policy will emerge, enterprise incumbents may dominate
  - as public policy fails to respond, private solutions emerge
- computation and decisions/alerts are pushed to the edge of new sensor networks
  - sensors become extended computational resources that can take action
  - compression techniques, coupled with computational resources in low-power packages, create new opportunities for pervasive sensor nets that rely less on always-on network connectivity
- farm operations use cases will drive toward more real-time processing
  - implies pushing computation out to the edge, as a truism throughout the larger Internet of Things space
  - other verticals (finance, telecom, search) confronted this need already
  - in terms of open source strategy, look to Twitter for leadership

Stage 2: Elastic Infrastructure

- traditional IT infrastructure vendors will move into the space, edging out the startups:
  - economies of scale for networking, storage, cloud services, etc.
  - incumbents can navigate regulatory policy more effectively
  - their business tendency to move up the stack for lucrative verticals
- startups attempting a “seed to sale” strategy are mostly doomed
- successful startups will differentiate by focusing on specialized use of infrastructure:
  - emphasize features that address ongoing pain-points, such as data preparation at scale prior to analytics, e.g., data federation, clean-up, curation, metadata alignment, etc.
  - edge their way into the subsequent stage of analytics by specialized use of elastic infrastructure: time-series, geospatial, imaging, etc.
  - position themselves for acquisition by IT infrastructure incumbents
- demand for better communications infrastructure grows
  - new opportunities for both established vendors and new entrants to fill gaps, especially through strategic partnerships
  - progressive communities establish public-private partnerships that include tax incentives, financing, and other components that make buildout more cost-effective
- North/South Hemisphere tension emerges
  - IT incumbents from the industrialized North displace business models for startups predominantly based in the developing South
  - long product cycles in the North may benefit the relatively nimble startups in the South, if local politics do not interfere
  - this does not imply a clear “winner” between the two

Stage 3: Analytics

- analytics products are not an end in themselves; they feed metrics into farm operations
  - misplaced emphasis at this stage poses additional risks for siloed strategies
  - analytics offerings that are tightly coupled to feedback loops with users in specific workflows will edge out static dashboards
- machine learning becomes a key factor
  - use cases where local optimization and customization provides measurable benefit
  - platforms leverage the coming generation of low-power, high-computation sensor networks
  - present new opportunities for efficient, highly-targeted analytics that rely less on constant connection to the cloud
- "seed to sale" strategies drive startups to bundle infrastructure services with data collection analytics
  - implies significant duplication of resources and extra costs to farmers
  - duplication costs drive acquisitions and mergers, plus “seed to sale” aversion
  - on the other hand, bundled services of multiple vendors (through strategic partnerships) could succeed if the value of such a bundle is obvious
- large analytics vendors may avoid infrastructure plays
  - left to IT incumbents who typically pursue up the stack in lucrative verticals
- calibration is a major issue in practice, huge downsides for analytics innovation
  - requires large capital investments, aggressive partnering, etc.
  - will drive acquisitions near-term by large analytics vendors
  - may drive "crowdsource" calibration services long-term
- as regulatory policy emerges, predictive analytics come under pressure
  - implies trade-offs in favor of accountability at the cost of accuracy
  - corporate farms may be too conservative to navigate those issues
  - potential opportunities for new kinds of data-intensive co-ops
- without attention to interoperability and standards, analytics products become too static and limit adoption
  - crucial needs being missed: feature engineering, model portability, tournaments

Stage 4: Farm Operations

- One Size Fits All, related to "seed to sale", represents an anti-pattern for startup viability
  - the strategy tends to collapse after the early-adopter phase wanes
- farmers demand immediate access to data via mobile devices
  - natural response to grappling with technology learning curves
  - accentuates needs for cloud-based infrastructure and real-time processing
- business needs for recommendation services at this stage drive the need for feedback loops and data products from aggregate stages
  - will tend to track a similar market evolution in e-commerce
- technology giants are focused on yield optimization
  - ROI is a better metric for most farmers’ success (outside of the U.S.); however, that is even more difficult to model
  - yield increases have come at the expense of disproportionately larger increases for inputs
  - focus on the precision estimates of aggregate yield presumably serves the interests of financial traders more so than farmers
- overall, data at this stage presents a more lucrative target for bad actors
  - this heightens concerns about privacy/security
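
The contrast between yield and ROI in the list above can be made concrete with a small calculation. The following is a minimal sketch in Python, not a model taken from this paper: the `Season` record, its field names, and every figure are hypothetical values chosen only to illustrate how yield can rise while the return on input spending falls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Season:
    """Hypothetical per-hectare figures for one growing season."""
    yield_tonnes: float      # harvested output per hectare
    price_per_tonne: float   # farm-gate price
    input_cost: float        # seed, fertilizer, water, energy, etc.


def roi(season: Season) -> float:
    """Return on investment: (revenue - input cost) / input cost."""
    revenue = season.yield_tonnes * season.price_per_tonne
    return (revenue - season.input_cost) / season.input_cost


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction."""
    return (after - before) / before


# Illustrative numbers only: yield rises 10% while input spending rises 40%.
baseline = Season(yield_tonnes=5.0, price_per_tonne=200.0, input_cost=500.0)
intensive = Season(yield_tonnes=5.5, price_per_tonne=200.0, input_cost=700.0)

yield_change = pct_change(baseline.yield_tonnes, intensive.yield_tonnes)
input_change = pct_change(baseline.input_cost, intensive.input_cost)

print(f"yield change: {yield_change:+.0%}, input cost change: {input_change:+.0%}")
print(f"ROI baseline: {roi(baseline):.2f}, ROI intensive: {roi(intensive):.2f}")
```

In this made-up case a yield-only metric would report an improvement even though the return on input spending declines, which is the gap the ROI item above points to. A realistic model would also need price volatility, financing costs, and local conditions, which is part of why ROI is harder to model than yield.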

Stage 5: Distribution

- traceability is a driver at this stage
  - implies new kinds of business opportunities
  - surfaces new issues for privacy/security and accountability
- processing consumes more water+energy resources than farms do
  - must ensure accountability for consumed resources
  - accentuates needs for monitoring, data collection, elastic infrastructure, analytics, etc., in parallel to farm sensors
  - potentially a big sustainability win and a big economic opportunity as the costs of water+energy increase
  - what are the water+energy implications of a harvest after it leaves the farm?
- consumers will demand transparency
  - implies opportunities for feedback loops and data products

Stage 6: Market Aggregation

- the traditional focus at this stage has been ineffective
  - short-term opportunities (commodities trading)
  - very high-level qualitative concerns (policy)
- major opportunities for leveraging feedback loops in the data
  - algorithmic modeling, aggregate data services, etc.
- refocus data services to steer market opportunities
  - steer away from short-term extractive practices (hedge funds)
  - apply data to make food production responsive, resilient, efficient

Outlook: Forced Asymmetries, Tail Wagging the Dog

Increasing variance in snowpack levels and rising rates of anthropogenic evaporation in wealthy regions (e.g., California) will stress local infrastructure that is already at crisis levels (e.g., aqueducts and transportation). This will force even greater asymmetries in production as well as in technology innovation between relatively wealthy and poor regions, with the latter increasingly gaining the upper hand in technology and expertise. Of course, niches will persist, such as local organic farms near metro areas in the U.S. Even so, some of the large stakeholders have vested interests in undermining even these: technology giants (for political momentum, homogenizing toward their agenda) and financial traders (surfacing risk through exposed data, extracting capital).

In the wealthy regions: available water resources will be redirected to capital-intensive, high-margin crops such as orchards, vineyards, and premium livestock to preserve capital. Meanwhile, the productivity and long-term commercial viability of these properties are decreasing. Some crops will push north as grow zones change. Other crops will be pushed toward imports (e.g., from the Southern Hemisphere) or will incentivize urban agriculture at scale. Extensive monocultures (e.g., grain) become increasingly subject to systemic risks on several fronts.

As risks increase for capital-intensive crops in wealthy regions, this segment of farmers will become more averse to technologies that open the door to potential privacy and security breaches. They have far too much to lose, while multiple bad actors have too much to gain. In particular, financial interests could engage in aggressively extractive practices at a scale that would make the 2008 credit default swap crisis look small by comparison.

An essential tension is that technology giants will insist on owning the data collection, analytics, operations, etc., required for precision agriculture, either through product features (the likely short-term approach, and the least likely to succeed) or through mergers and acquisitions (the most likely route to long-term success). However, prerequisites for their long-term commercial success are diametrically opposed to the realities of farmers’ concerns about security breaches. Large-scale security breaches will occur, and there will be political backlash in the wealthy nations. Corporate farms already at risk due to water shortages, soil depletion, etc., will become likely targets for hostile actions from the financial sector, particularly in the U.S.

Another inherent tension is that the more vital agricultural production continually gets pushed to poor regions, which are predominantly family farms and smallholder farmers. Increasingly, these become the technology innovators and over time may develop the best practices for efficient natural resource management [37].

An outlook of forced asymmetries emerges. Effectively, the relatively wealthy regions will promote conditions ripe for highly volatile financial problems locally, while exporting their most vital dependencies to poor nations, which must in turn make increasingly effective use of natural resources through technology innovation. Likely losers in this equation include corporate farms (legacy practices, inefficient processes, capital risks) and incumbent vendors who misunderstand the role of data at scale. Likely winners in this equation include technology giants who leverage interoperability in lieu of owning the technology stack (e.g., Monsanto) and technology centers in the relatively poor regions (e.g., Start-Up Chile), which by definition must now produce the true innovators and in turn will tend to have some of the greatest financial leverage.

The authors acknowledge contributions by Brad Martin and Bill Worzel in the development of portions of this paper, and also acknowledge many helpful suggestions by other members of The Data Guild.

[37] Trade and Environment Review 2013: Wake Up Before it is Too Late, UNCTAD (2013): “This implies a rapid and significant shift from conventional, monoculture-based and high-external-input-dependent industrial production toward mosaics of sustainable, regenerative production systems that also considerably improve the productivity of small-scale farmers.”
