
ABBIF Proposal: Architecture (Draft March 14, 2006)

Index

Introduction
DarwinCore
ABCD – Access to Biological Collection Data
Protocols for Data Exchange
DiGIR
BioCASe
TAPIR
GBIF Architecture
Network Infrastructure
Latin America
Brazil
Analysis
Peru
Questionnaires
Information System
Venezuela
Bolivia
Colombia
Questionnaire
Information System
French Guyana
Ecuador
Brazil
Collections
Information Systems
Strategic Plan
Strategy: Proposed Network
Elements of the Architecture
ABBIF coordination
Data Providers
Portal
Resource Registry & Discovery
Tools
Data archive
Proposal
Participants (to be confirmed)
Workshop Program
Annex 1: Answers from Collections of Colombia

Introduction

There are a number of possibilities to design an information system when its data is actually produced and shared by different parties. Basically, a system can be centralized, distributed, or combined (mixed), with a number of variations.

A centralized system (figure 1) is recommended when data providers do not have the necessary infrastructure (hardware, software, connectivity) or expertise or even when data will only be produced for that particular system.

Figure 1. Diagram of a Centralized Information System (elements shown: Data Providers, Central System, User)

By adopting this architecture, data providers do not need to store any data locally, and they usually interact with an administrative interface to manage everything remotely. They also have to agree on a common format and content to be implemented in the central database. The great advantages are the low demand for informatics capacity imposed on data providers and the fact that developers have a tightly controlled system to work on. The challenge is to keep data providers actively validating and updating their data.

A distributed architecture is a system where the data is distributed but the query is centralized (figure 2) or where both data and query are distributed (figure 3).

Figure 2. Distributed data: centralized query (elements shown: collections Col 1 to Col 5, program, central repository, query interface)

Figure 3. Distributed data and query (elements shown: collections Col 1 to Col 5, program, query interface)


Advantages of a distributed architecture include “real time” updating, clarity as to who the data provider is, and the possibility of a closer interaction between data providers and users. Disadvantages include the greater demand on the infrastructure and expertise of each data provider and the complexity of developing and maintaining a distributed system.
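As a rough illustration of distributed data with a centralized query, the Python sketch below keeps each collection's records with its provider and lets a single query function ask every provider in turn. The provider names, the records, and the functions `query_provider` and `centralized_query` are invented for this example; a real network would replace the in-memory dictionary with remote calls.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical local databases, one per collection; in a real network each
# would live on the provider's own server.
PROVIDERS: Dict[str, List[Record]] = {
    "Col 1": [{"ScientificName": "Puma concolor", "Country": "Peru"}],
    "Col 2": [
        {"ScientificName": "Puma concolor", "Country": "Brazil"},
        {"ScientificName": "Tapirus terrestris", "Country": "Brazil"},
    ],
}


def query_provider(provider: str, scientific_name: str) -> List[Record]:
    """Stand-in for a remote call answered by one provider from its own data."""
    return [
        dict(record, Provider=provider)  # tag each hit with its source
        for record in PROVIDERS[provider]
        if record["ScientificName"] == scientific_name
    ]


def centralized_query(scientific_name: str) -> List[Record]:
    """Single query interface: ask every provider and merge the answers."""
    results: List[Record] = []
    for provider in sorted(PROVIDERS):
        results.extend(query_provider(provider, scientific_name))
    return results


hits = centralized_query("Puma concolor")
```

Because every hit carries the name of the provider that returned it, the sketch also shows why a distributed design keeps the origin of each record clear.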

The proposal is that ABBIF focuses on species and specimen data. Data will include specimen records in biological collections, observation data of field surveys, and taxonomic names. A strategy for each data component must be established.

The choice of the best architecture depends on the existing infrastructure and expertise of each data provider and custodian. Besides that, biological collections hold their data using different software on different operating systems, in different formats, and recording different data elements (figure 4).

Figure 4. Diagram showing the complexity of integrating data from biological collections (elements shown: collections Col 1 to Col 5 running on heterogeneous platforms such as Win2000/Brahms, Linux/MySQL, Win98/Access, Win98/Biota, and FreeBSD/PostgreSQL, linked to a program and search interface through a common communication protocol and data model)

In order to integrate these systems it is necessary that data providers agree to use a common data exchange model.
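A minimal sketch of what agreeing on a common data exchange model means in practice: each provider keeps its own local field names and supplies a mapping to the shared ones. The source identifiers and local field names below are hypothetical; only the target names (ScientificName, Country, CatalogNumber) are DarwinCore elements.

```python
from typing import Dict

# Hypothetical local field names for two collections. Only the target
# names on the right-hand side come from the common model (DarwinCore).
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "herbarium_access": {
        "especie": "ScientificName",
        "pais": "Country",
        "num": "CatalogNumber",
    },
    "museum_mysql": {
        "taxon_name": "ScientificName",
        "country": "Country",
        "cat_no": "CatalogNumber",
    },
}


def to_common_model(source: str, local_record: Dict[str, str]) -> Dict[str, str]:
    """Rename the fields of one local record to the common element names."""
    mapping = FIELD_MAPS[source]
    return {
        common: local_record[local]
        for local, common in mapping.items()
        if local in local_record  # fields the provider lacks are skipped
    }


a = to_common_model("herbarium_access",
                    {"especie": "Cedrela odorata", "pais": "Peru", "num": "42"})
b = to_common_model("museum_mysql",
                    {"taxon_name": "Cedrela odorata", "country": "Peru", "cat_no": "42"})
```

Once both records are expressed in the common model they can be compared or merged regardless of the software each collection uses.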

To determine the best architecture to be proposed for the ABBIF network, it is important to study:
- what standards and protocols are available;
- what standards and protocols the existing networks of direct interest to ABBIF are adopting; and
- what the situation of local data providers and custodians is concerning infrastructure and expertise.

Standards

The adoption of standards and protocols for the exchange of data and information about biodiversity is fundamental for the development of interoperable systems. In general, one can define a standard as “something established by authority, custom, or general consent as a model or example” [1]. A communication protocol can be defined as a formal description of the rules and message formats that two systems must adopt to communicate and interact. Perhaps the most important and best-known protocols are TCP/IP (Transmission Control Protocol / Internet Protocol), SMTP (Simple Mail Transfer Protocol), POP (Post Office Protocol) and IMAP (Internet Message Access Protocol). TCP/IP is the basis for data transmission through the Internet, and the other three support electronic mail. Standard languages such as HTML (Hyper Text Markup Language) and XML (eXtensible Markup Language) are also important, as they define rules for formatting the vast majority of documents on the Internet.

[1] Merriam-Webster Online Dictionary (www.webster.com)

An important group that is discussing and developing standards and protocols for data on species and specimens is TDWG (International Working Group on Taxonomic Databases) [2].

TDWG’s mission is to:
- provide an international forum for biological data projects;
- develop and promote the use of standards; and
- facilitate data exchange.

A number of working groups have been established within TDWG to develop and promote the use of standards and protocols. Those of immediate interest to ABBIF are DarwinCore; ABCD – Access to Biological Collection Data; DiGIR; BioCASe; and TAPIR.

DarwinCore [3]

DarwinCore (DwC) is a standard that began to be developed within the scope of the Species Analyst network based at the University of Kansas Natural History Museum and Biodiversity Research Center. The idea was to define data fields common to all taxonomic groups and in this way standardize the integration of primary data from biological collections. This standard uses XML (defined by an XML Schema) and is being used by most networks, such as GBIF [4], MaNIS (Mammal Networked Information System) [5], OBIS (Ocean Biogeographic Information System) [6], and speciesLink [7] in Brazil, among others.

It is based on a non-hierarchical set of data elements which include: InstitutionCode, CollectionCode, CatalogNumber, ScientificName, BasisOfRecord, Kingdom, Phylum, Class, Order, Family, Genus, Species, Subspecies, ScientificNameAuthor, IdentifiedBy, YearIdentified, MonthIdentified, DayIdentified, TypeStatus, CollectorNumber, FieldNumber, Collector, YearCollected, MonthCollected, DayCollected, JulianDay, TimeOfDay, ContinentOcean, Country, StateProvince, County, Locality, Longitude, Latitude, CoordinatePrecision, BoundingBox, MinimumElevation, MaximumElevation, MinimumDepth, MaximumDepth, Sex, PreparationType, IndividualCount, PreviousCatalogNumber, RelatedCatalogNumber, RelatedCatalogItem, RelationshipType, Notes, DateLastModified. The standard accepts extensions that have been proposed for geospatial, curatorial, paleontology, microbial, and observation data [8].
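To make the idea of a flat, non-hierarchical record concrete, the sketch below serializes a handful of the elements listed above as XML using Python's standard library. The wrapper element name `record`, the sample values, and the omission of XML namespaces are simplifying assumptions for illustration; the actual document structure is defined by the DarwinCore XML Schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def darwin_core_record(fields: Dict[str, str]) -> str:
    """Serialize one specimen as a flat list of child elements.

    The wrapper element 'record' is a placeholder, not a name taken from
    the DarwinCore schema.
    """
    record = ET.Element("record")
    for name, value in fields.items():
        ET.SubElement(record, name).text = value
    return ET.tostring(record, encoding="unicode")


# Invented sample values; the element names come from the list above.
xml_text = darwin_core_record({
    "InstitutionCode": "XYZ",
    "CollectionCode": "Birds",
    "CatalogNumber": "12345",
    "ScientificName": "Ramphastos toco",
    "Country": "Brazil",
    "YearCollected": "1998",
})
```

Every element sits directly under the record, with no nesting, which is what distinguishes this model from a highly structured one.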

ABCD – Access to Biological Collection Data [9]

ABCD is a highly structured standard for data about objects in biological collections. Its objective is the same as that of DarwinCore, but it is much more detailed, with around 500 elements against the 50 elements of DarwinCore. There are specific elements for observational data sets and for the following types of collections:
- Herbaria and Botanical Gardens
- Zoological Collections
- Culture Collections
- Mycological Collections
- Plant Genetic Resources
- Paleontological Collections

[2] http://www.tdwg.org/
[3] http://darwincore.calacademy.org
[4] http://www.gbif.net
[5] http://elib.cs.berkeley.edu/manis/
[6] http://www.iobis.org/
[7] http://splink.cria.org.br
[8] http://darwincore.calacademy.org/Extensions/
[9] http://www.codata.org/taskgroups/TGbiocollection/

This data model is being used by the Biological Collection Access Service for Europe, BioCASE [10]. Like DarwinCore, it uses XML (defined through an XML Schema). ABCD version 2.06 [11] was recommended by the TDWG meeting in St. Petersburg as the adopted version of the standard and has since been ratified by TDWG members.

Protocols for Data Exchange

Networks that serve data from biological collections, besides using a standard data model (such as DarwinCore and ABCD), also require a protocol for transferring data.

DiGIR [12]

One of the first networks of biological collections to be developed as a distributed system was The Species Analyst (TSA), at the end of the 1990s. TSA used the ANSI/NISO Z39.50 protocol, which was first adopted in 1988 and was used to interconnect libraries. It defines a communication standard between computers to retrieve information. An important characteristic is that it supports a client-server environment, which allows the separation of the user interface from the data server. Z39.50 has also been implemented on a range of platforms. Whilst Z39.50 was an effective solution, there were some issues with the protocol that convinced the Species Analyst network developers to study another solution. At the time, the protocol was found to have a complicated specification, which meant a very steep learning curve for developers; conceptual schemas were not defined with a formal language such as XML Schema; and there was limited support for XML and Unicode.

In order to address these issues, developers of the Species Analyst network and a number of people involved with the TDWG [13] held a small workshop in Santa Barbara to start discussing a solution to replace Z39.50 for the biodiversity informatics community. The goal was to develop a protocol that was based entirely on the use of XML documents for messaging between clients and data providers, with a data transport mechanism that was predominantly based on HTTP. DiGIR was designed to offer the same capabilities as Z39.50 except using simpler technologies and a more formal specification for description of information resources. The result is a distributed information retrieval solution that provides an easy entry for participation in distributed information networks.

DiGIR became operational in 2003 and was adopted by a number of networks such as The Mammal Networked Information System (MaNIS), the Ocean Biogeographic Information System (OBIS), the Global Biodiversity Information Facility (GBIF), and the speciesLink Network in Brazil.
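The general pattern of XML documents exchanged between a client and a data provider can be sketched as follows. This is not the DiGIR message schema: the element names `request`, `filter`, `equals`, `response`, and `record` are invented for the example, and the HTTP transport is replaced by a direct function call so that the sketch runs without a network.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_search_request(field: str, value: str) -> str:
    """Client side: express 'field equals value' as an XML document."""
    request = ET.Element("request", {"type": "search"})
    equals = ET.SubElement(ET.SubElement(request, "filter"), "equals")
    ET.SubElement(equals, field).text = value
    return ET.tostring(request, encoding="unicode")


def handle_request(request_xml: str, records: List[Dict[str, str]]) -> str:
    """Provider side: parse the request, filter local records, reply in XML."""
    condition = ET.fromstring(request_xml).find("filter/equals")[0]
    response = ET.Element("response")
    for rec in records:
        if rec.get(condition.tag) == condition.text:
            item = ET.SubElement(response, "record")
            for name, value in rec.items():
                ET.SubElement(item, name).text = value
    return ET.tostring(response, encoding="unicode")


# Invented provider data; in a real network the request would travel over HTTP.
local_records = [
    {"ScientificName": "Puma concolor", "Country": "Peru"},
    {"ScientificName": "Tapirus terrestris", "Country": "Brazil"},
]
reply = handle_request(build_search_request("ScientificName", "Puma concolor"),
                       local_records)
```

Since both the request and the reply are plain XML text, either side can be implemented in any language or platform that can parse XML and speak HTTP.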

BioCASe [14]

The Biological Collection Access Service for Europe (BioCASE), a network of biological collections, adopted ABCD as the concept schema, and for this purpose modified the DiGIR protocol to meet its needs. This modified protocol is known as the BioCASE data transmission protocol or just simply BioCASE. The protocol is based on the DiGIR protocol, but was forced to incorporate some BioCASE-specific changes that unfortunately make the two incompatible.

[10] http://www.biocase.org/
[11] http://www.bgbm.org/TDWG/CODATA/Schema/
[12] http://www.digir.net/
[13] www.tdwg.org/
[14] http://www.biocase.org/dev/protocol/index.shtml


TAPIR [15]

In 2004 GBIF promoted a study to develop a new merged protocol that would meet the needs of both the DiGIR and BioCASE networks (Döring & Giovanni, 2004). This protocol was named TAPIR (TDWG Access Protocol for Information Retrieval) and is to be tested in 2006. It is expected that both BioCASE and the networks that have adopted DiGIR will migrate to the new protocol. The new protocol is being tested by implementing it in two data provider software packages, one for each of the existing network communities: BioCASe (the BioCASe PyWrapper software) and DiGIR (a new Java provider package currently named DiGIR2). A detailed TAPIR specification document is also being developed.

GBIF Architecture

We have discussed possible architectures (centralized, distributed, and combined or mixed) and the standards and protocols that are being adopted internationally. Another important part of this analysis is to observe what GBIF, which is openly serving species and specimen data on the Internet, is using. GBIF plays a fundamental role as it is the global initiative that is integrating species and specimen data worldwide. Whatever architecture and strategy is adopted by ABBIF must be compatible with this initiative.

In 2003 GBIF established its “architecture fundamentals”, which are important and relevant when designing an information facility (see GBIF Biodiversity Data Architecture, 2003 [16]). The basic principle was not to impose any specific software or technology, but to have access to biodiversity data as the key goal.

The document presents as basic principles:
- Free access to data: this implies that any restrictions must be carried out at the data provider level; the system would not control user access to data;
- Support for global users: the idea is to enable the implementation of different human languages in presentation services;
- Consider human and machine users: the system would be implemented to be accessed by web browsers and web services;
- Consider structured and unstructured data: the document acknowledges the importance of defining both structure and content of data (fundamental for interoperability and machine analysis) but also states that it is important to make unstructured data available;
- Reusable, replaceable, and redundant components: the idea is to develop a framework where new data providers can be rapidly added; to promote the maintenance of persistent data sources, as opposed to databases whose lifetimes are tied to a project; to plan for redundancy, replicating working components to different locations across the globe; and to adopt an open technology framework, where operating systems, database management systems, web servers, programming languages, and other tools are a choice to be made by each participant according to existing needs and skills.

GBIF has developed a network based on nodes (figure 5).

[15] http://ww3.bgbm.org/tapir
[16] http://circa.gbif.net/irc/DownLoad/kjeFA-J1mmGHrfOtAyTZ74s8jUwq9HoJ/p6hpeSGHkYZQWMiF42pMFYPs7fCtNHv-/GBIFBiodiversityDataArchitecture-v0.7-draft.pdf


Figure 5. GBIF Network: major classes of nodes

GBIF is responsible for running the network, establishing standards, and developing tools. The portal is the hub for the development of any service that must be centralized such as the registry of metadata and for serving data from the biodiversity data index to the end user. GBIF participants’ nodes are established to share biodiversity data. They may be gateways to data nodes or data nodes themselves. They may also provide services such as mapping, analysis, and hosting of orphaned data sets. Data nodes are primary providers of data.

When GBIF was first designed, key elements of the Portal were the Biodiversity Data Index and the Taxonomic Name service (figure 6).

Figure 6. Diagram of the GBIF portal

The Biodiversity Data Index holds a subset of the data held by the data nodes and includes specimen identifiers associated with identification, geospatial and temporal information.


Centralization of these subsets of data supports a much more rapid response to user queries and minimizes network traffic. Although taxonomic names provide the primary organizational structure for biodiversity data, no complete catalogue of names is available today. Building one is an ever-evolving task which requires international collaboration. GBIF is also involved in a number of initiatives to create web services such as mapping, georeferencing, and data cleaning. The portal is now much more complex, and figure 7 presents a diagram of how the future portal is expected to operate.
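The role of the Biodiversity Data Index can be sketched as a central table that copies only a few fields from each provider record and keeps a pointer back to the provider. The choice of indexed fields, the provider URLs, and the function names below are assumptions made for illustration, not GBIF's actual index design.

```python
from typing import Dict, List

# Assumed subset: identifiers plus identification, geospatial and temporal data.
INDEX_FIELDS = ("InstitutionCode", "CollectionCode", "CatalogNumber",
                "ScientificName", "Country", "YearCollected")


def build_index(provider_records: Dict[str, List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Copy the indexed subset of every record, remembering where it came from."""
    index = []
    for provider, records in provider_records.items():
        for rec in records:
            entry = {f: rec[f] for f in INDEX_FIELDS if f in rec}
            entry["provider"] = provider  # pointer back to the full record
            index.append(entry)
    return index


def search_index(index: List[Dict[str, str]], scientific_name: str) -> List[Dict[str, str]]:
    """Answer a query centrally, without contacting any provider."""
    return [e for e in index if e.get("ScientificName") == scientific_name]


# Hypothetical provider endpoints and records.
index = build_index({
    "http://example.org/museum-a": [
        {"CatalogNumber": "1", "ScientificName": "Puma concolor",
         "Country": "Peru", "Notes": "long free-text field kept only locally"},
    ],
    "http://example.org/herbarium-b": [
        {"CatalogNumber": "7", "ScientificName": "Cedrela odorata", "Country": "Brazil"},
    ],
})
matches = search_index(index, "Puma concolor")
```

A user who needs the full record (here, the Notes field) follows the stored provider pointer; the central index alone answers the quick question of who holds what.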


Figure 7. GBIF’s data portal deployment model

The central column represents functions which should be executed centrally (marked as GBIF Secretariat). The components involved in delivery of services to end users and portals are shown as replicated to a number of mirror sites. The Master Data Store needs to be implemented in a single location (and should at least be associated with a "Master" instance of the Despatcher component), but the Crawler and Validation Chain components could also be mirrored for efficiency.

The existing GBIF UDDI registry would need significant enhancement before it could properly support the process illustrated here.

The Schema Repository should be developed in close conjunction with the TDWG Technical Architecture Group and can initially be represented by a small stub implementation that offers equivalent function to the rest of the Data Portal.

The Crawler corresponds largely to the Indexer component of the existing prototype Data Portal. It includes a scheduler which identifies data resources which should be indexed or checked for updates and develops an appropriate strategy in each case for accessing modified data. It should maintain a "map" monitoring the progress made in indexing any resource so that the process can be interrupted and restarted, and also so that data providers can be notified of any records from their resource which could not be accessed for any reason. The data offered by the Service Registry will provide the basis for the Crawler's activity (including endpoint URLs, protocols and data standards supported, acceptable times and days for crawling each provider's data, any agreements made with providers as to how much data the Data Portal should cache in the Master Data Store, etc.). The Crawler should process the data retrieved by placing an object into the Validation Chain for each record found (new and modified records; also objects indicating the completion of an indexer operation for a given provider to allow for clean-up of obsolete records, etc.).
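A minimal sketch of the resumable behaviour described for the Crawler is given below. The function name `crawl`, the page-based access, and the shape of the objects placed into the chain are all assumptions for illustration; the source does not specify how progress is tracked or how providers are paged.

```python
from typing import Callable, Dict, List, Tuple


def crawl(resource_id: str,
          fetch_page: Callable[[str, int], List[dict]],
          progress: Dict[str, int],
          chain: List[dict],
          failures: List[Tuple[str, int, str]]) -> bool:
    """Index one resource page by page, resuming from progress[resource_id].

    Returns True when the resource was fully indexed, False if interrupted.
    """
    page = progress.get(resource_id, 0)
    while True:
        try:
            records = fetch_page(resource_id, page)
        except IOError as err:
            # Remember what could not be accessed so the provider can be told.
            failures.append((resource_id, page, str(err)))
            return False
        if not records:
            # Completion marker, so obsolete records can be cleaned up later.
            chain.append({"type": "completed", "resource": resource_id})
            return True
        for rec in records:
            chain.append({"type": "record", "resource": resource_id, "data": rec})
        page += 1
        progress[resource_id] = page  # the "map" of progress made so far
```

Because `progress` is updated after every page, a second call with the same dictionary continues where the first one stopped instead of starting again.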

The Validation Chain corresponds largely to the Data Validation Services described in the GBIF Data Portal Strategy, but also includes some other functions from the Indexer component of the current prototype Data Portal. This is a configurable workflow component that allows a range of processing steps to be applied to each object placed into the chain. The exact steps will vary according to the nature of the record concerned. It will include the generation of a series of annotations to the object based on routines to validate or interpret the data in the record. The aim is to reach the end of the Validation Chain with a clear understanding of what the record represents in as much detail as possible, including an evaluation of whether there are ambiguities or problems with any of the data elements. By the end of the chain, all objects should be in a form that can readily be stored in the Master Data Store.
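The idea of a configurable chain of steps that annotate each object can be sketched as follows. The two checks and the wording of the annotations are hypothetical examples, not the validation routines GBIF actually uses.

```python
from typing import Callable, List


def check_name(obj: dict) -> None:
    """Annotate records that lack a scientific name."""
    if not obj["data"].get("ScientificName", "").strip():
        obj["annotations"].append("missing scientific name")


def check_coordinates(obj: dict) -> None:
    """Annotate records with absent or impossible coordinates."""
    lat = obj["data"].get("Latitude")
    lon = obj["data"].get("Longitude")
    if lat is None or lon is None:
        obj["annotations"].append("no coordinates")
    elif not (-90 <= float(lat) <= 90 and -180 <= float(lon) <= 180):
        obj["annotations"].append("coordinates out of range")


def run_chain(obj: dict, steps: List[Callable[[dict], None]]) -> dict:
    """Apply each configured step in order; steps only add annotations."""
    obj.setdefault("annotations", [])
    for step in steps:
        step(obj)
    return obj


checked = run_chain(
    {"data": {"ScientificName": "Puma concolor", "Latitude": "95", "Longitude": "-50"}},
    [check_name, check_coordinates],
)
```

Since the steps are passed in as a list, a different list can be used for each kind of record, which is one simple way to make the workflow configurable.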

The Despatcher is a new addition to the model to ensure the greatest possible flexibility in how the Data Portal may operate. The key role of this component is to forward the objects from the Validation Chain into the Master Data Store. It will, however, be the natural point to process information which should be included in a report to each data provider at the end of each visit to index their data. Upon further review and discussion with GBIF stakeholders (including data providers), a range of other notification services could be implemented at this point (e.g. forwarding objects or notifications to thematic and regional portals whenever records appear which are of interest to those portals; management of notifications to users of the addition of data relating to their taxa of interest). Such extensions would be a future option, but the development of a generic Despatcher will make this easy.
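A generic Despatcher of the kind described could look roughly like the sketch below: it stores each object, keeps per-provider counts for the end-of-visit report, and calls any subscribers whose condition matches. The class layout, the report fields, and the `subscribe` mechanism are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Callable, List, Tuple


class Despatcher:
    """Forward validated objects to the master store and notify subscribers."""

    def __init__(self, master_store: List[dict]) -> None:
        self.master_store = master_store
        # One running report per data resource, used at the end of a visit.
        self.reports = defaultdict(lambda: {"stored": 0, "flagged": 0})
        self.listeners: List[Tuple[Callable[[dict], bool], Callable[[dict], None]]] = []

    def subscribe(self, predicate: Callable[[dict], bool],
                  callback: Callable[[dict], None]) -> None:
        """Register interest, e.g. a regional portal watching one country."""
        self.listeners.append((predicate, callback))

    def despatch(self, obj: dict) -> None:
        self.master_store.append(obj)
        report = self.reports[obj["resource"]]
        report["stored"] += 1
        if obj.get("annotations"):
            report["flagged"] += 1  # records with validation annotations
        for predicate, callback in self.listeners:
            if predicate(obj):
                callback(obj)
```

Adding a new notification service then only requires another `subscribe` call, which is the sense in which a generic component keeps future extensions easy.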

The Master Data Store (Data Index) is implemented as a database used solely for managing the best possible overview of the data in the GBIF network and does not itself support requests from users or remote portals. All such requests will be made against Slave Data Stores maintained by MySQL replication.

The Access Portal is a layered application making use of Hibernate to access data from a Slave Data Store and including a Service Layer implementing all logic associated with the Data Portal's processing of data for display. Axis will be used to provide an XML access interface to the methods offered by the Service Layer. These methods will be those required to develop an HTML User Portal based on the GBIF data. Axis will allow these same methods to be exposed easily as SOAP web services for use by other portals. This interface will represent a "GBIF Native Portal Interface" which will not always map directly to TDWG standards (since frequently only a tiny number of data elements are needed and these should be combined in different ways from the standards). Additional access interfaces (TAPIR, WFS, etc.) can also be implemented and exposed from the Service Layer. The Data Portal's own HTML User Portal and User Services will be implemented by a JSP layer based on the XML Data Services (the "GBIF Native Portal Interface").

Mirroring will be implemented by a combination of multiple DNS records and Apache redirection.

But GBIF is more than just a portal. Figure 8 shows GBIF’s data-exchange architecture.

Figure 8. Diagram of GBIF’s data exchange architecture

This diagram emphasizes GBIF’s basic layers, as follows (from the bottom up):

Resources – There is an increasingly large number of digital resources relating to biodiversity. These may be in just about any format (various databases with all kinds of data models; human readable text documents in various formats; images; etc.) and may or may not yet be connected to the Internet.

Access – To make these resources accessible in a practical way, it is important to select a limited number of agreed transfer protocols and formats to expose them on the Internet. GBIF has adopted various TDWG data standards and protocols for this purpose (DiGIR/BioCASe/TAPIR, Darwin Core, ABCD, Taxon Concept Schema) and also expects to handle access through plain URLs where appropriate or via Globally Unique Identifier (GUID) resolution services as these are agreed and implemented.

Discovery – Once these resources are available on the Internet, it is important to advertise them to potential users. GBIF has established a (UDDI) registry for this purpose to store information describing the content and access interfaces for resources and to allow GBIF and others to find resources of interest for various purposes. GBIF has been operating its registry for over two years and plans soon to replace the existing implementation with one that offers richer function for describing resources and for searching for resources of interest. Other registries may be developed to meet the interests of different networks and communities. These may still benefit from the use of the same protocols and data standards adopted by the GBIF network (access to reusable software components; ability for some resources to be part of both networks; etc.).
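The discovery function of a registry can be illustrated with a very small in-memory sketch. It does not use the UDDI API; the entry fields, the example endpoints, and the functions `register` and `find` are invented here simply to show how describing content and access interfaces lets a client find resources of interest.

```python
from typing import Dict, List, Optional

registry: List[Dict[str, str]] = []


def register(name: str, endpoint: str, protocol: str, data_standard: str) -> None:
    """Advertise one resource together with its access interface."""
    registry.append({
        "name": name,
        "endpoint": endpoint,
        "protocol": protocol,
        "data_standard": data_standard,
    })


def find(protocol: Optional[str] = None,
         data_standard: Optional[str] = None) -> List[Dict[str, str]]:
    """Return the entries matching every criterion that was given."""
    return [
        entry for entry in registry
        if (protocol is None or entry["protocol"] == protocol)
        and (data_standard is None or entry["data_standard"] == data_standard)
    ]


# Hypothetical resources and endpoints.
register("Herbarium A", "http://example.org/herbarium-a", "DiGIR", "DarwinCore")
register("Museum B", "http://example.org/museum-b", "BioCASe", "ABCD")
digir_resources = find(protocol="DiGIR")
```

A crawler or portal would typically start from such a lookup to learn which endpoints exist and which protocol and data standard each one speaks.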


Indexing – Within a large distributed network, it is important to maintain a dynamic map of the content of the network at a finer level of detail than is possible with the metadata stored in a service registry. GBIF is therefore developing a central index of biodiversity data by crawling the contents of resources registered in the UDDI registry. This index will itself be exposed through a range of web services to allow users to get rapid answers to many basic questions and to provide pointers to relevant data records throughout the network. Again it is likely that other groups may develop their own special-purpose indexes based on the underlying infrastructure, benefiting from the common core of standard access mechanisms and discovery services.

Presentation & analysis – These underlying layers should provide a common set of core services suitable for GBIF and others to build a wide range of applications and portals. GBIF will continue to develop a central portal for rapid discovery of basic information, but other groups may develop more specialized portals which integrate information from the central GBIF index and all the underlying network resources with other information managed by the groups concerned. Since the interfaces to the GBIF index and other resources will be exposed as web services, it will also be possible to include these data within workflow applications of various kinds.

In general, GBIF expects ultimately to see increasing diversification at the higher levels in this diagram, but strongly encourages the shared use of as many of the lower layers as makes sense in each case. Its goal is to support the replication of the GBIF data services on a regional basis to ensure that the information from the GBIF registry and index is available for inclusion within local applications and portals.

We believe that ABBIF must follow GBIF’s general concept of a Network Portal with data nodes or data providers and participant nodes that encourage local participation and may themselves act as data nodes. It is important to analyze the answers to the questionnaires to identify local institutions that already are GBIF nodes or that may contribute to the network.

Network Infrastructure

Another important element that helps define the architecture is the existing or potential communication infrastructure. The present analysis is based on the document Redes Nacionais de Educação e Pesquisa: Situação no Brasil e América Latina [17], written to inform Brazil’s national strategy for biological collections.

Latin America

The digital divide is a concern for scientific research because of our ever-increasing dependency on network infrastructure and on information and communication services that are often not available in developing countries. Latin American countries present a very heterogeneous and fragile situation, especially when compared to more developed regions. Ten Latin American countries have operational academic networks, the best being located in Mexico, Brazil, and Chile. In Colombia, El Salvador, Guatemala, Honduras, Nicaragua, Panama, Paraguay and Peru the academic networks are still being organized.

The Americas Path (AMPATH) project led by Florida International University in 2001 established a high performance exchange point in Miami, Florida to facilitate peering between U.S. and international research and education networks (figure 9).

[17] http://www.cria.org.br/cgee/documentos/redesALC310505.doc


Figure 9. Diagram of the international, high-performance research connection point in Miami, Florida (AMPATH)

Recently, in 2004, with the support of the European Commission (@LIS program) another network, RedCLARA, began to operate and will include 18 countries. This is certainly a milestone in Internet connectivity in Latin America. Besides facilitating the development of new networks this certainly is an opportunity to build common research agendas of regional and global interest.

Figure 10 presents a diagram of the network.


Figure 10. ALICE Project and Red CLARA

The topology for RedCLARA includes connections of 155 Mbps with the main national networks (Argentina, Chile, Brazil and Mexico) and of 10 to 45 Mbps to the other South American countries. Peru and Uruguay were recently connected and the next will be Costa Rica, El Salvador, Nicaragua, Guatemala, Panama and Ecuador. Connections to Bolivia, Ecuador and Colombia are planned. A connection of 622 Mbps leaves Brazil and interconnects RedCLARA to the GÉANT network of research and education in Europe.

Tables 1 and 2 present a synthesis of the situation of national research and education networks (NRENs). When comparing data from more developed with developing countries it is clear that the situation of Latin America is not good, especially when one considers new developments and applications that are adequate in environments with good infrastructure but may be prohibitive in less developed countries. Table 2 also shows what countries will gain with RedCLARA.


Table 1. National Research and Education Networks of some countries

Country | Organization | Status | Backbone | External Capacity | Connected Institutions
Germany | G-WIN | operating | 2.5 – 10 Gbps | US - 2 x 2.5 Gbps; EU - 5 Gbps | 550
Korea | KREONET2 / KOREN | operating | 2.5 – 10 Gbps | US - 2 Gbps; EU - 155 Mbps; Japan - 2 Gbps | 277
Holland | SURFnet5 | operating | 10 Gbps | US - 10 Gbps; EU - 10 Gbps | 150
Poland | PIONER | operating | 10 Gbps | US - 2 Gbps; EU - 10 Gbps | 21 Metropolitan Networks; 5 High Performance Computing Centers
France | RENATER | operating | 2.5 Gbps | EU – 10 Gbps; US – 4 x 2.5 Gbps; CA – 2 x 1 Gbps | 50
USA | Internet2 / Abilene | operating | 155 Mbps – 10 Gbps | EU – 10 Gbps; Asia – 10 Gbps | 220

Source: ICFA SCIC Report – Networking for High Energy and Nuclear Physics, February, 2004

Table 2. National Research and Education Networks of Latin America

Country | Organization | Situation | Backbone 2004 | Backbone 2005 | External capacity before CLARA | External capacity after CLARA | Connected institutions
Argentina | RETINA | operating | 256 Kbps to 34 Mbps | 45M | 59 Mbps | +45 Mbps | 56
Bolivia | under development | 64 to 128 Kbps | 1.5 Mbps | In negotiation | 18
Brazil | RNP | operating | 34M to 622 Mbps | Up to 10 Gbps to 10 states | 555Mbps | +155 Mbps | 220
Chile (*) | REUNA | operating | 155 Mbps | 1 Gbps | 45 Mbps | 90 Mbps | 14
Colombia | Universidad de Cauca | Under development | 2 Mbps-34Mbps | 34 Mbps | In negotiation | 43
Costa Rica | CR2 Net | operating | 45 Mbps | 45 Mbps | 8 Mbps | 45 Mbps | 8
Cuba | REDUNIV | Under development | 64Kbps-2 Mbps | 6 Mbps | In negotiation | 23
Ecuador | REICYT | operating | 128 Kbps-5Mbps | 45 Mbps | 8 Mbps | 16 Mbps | 20
El Salvador | RAICES | Planning phase | 10 Mbps | 9
Guatemala | RAGIE | Planning phase | In negotiation | 7
Honduras | RHUTA | Planning phase | In negotiation | -
México | CUDI | operating | 2Mbps to 155 Mbps | 2*1 Gbps 3*155 Mbps | 45 Mbps | 60
Nicaragua | RENIE | Planning phase | In negotiation | 8
Panama | REDCYT | operating | 2-5 Mbps | 45 Mbps | +10 Mbps | 8
Paraguay | ARANDU | Planning phase | 128Kbps-10Mbps | Up to 155 Mbps to 2 sites | 2 Mbps | 12 Mbps | 37
Peru | RAAP | operating | 10 Mbps | 45 Mbps | 45 Mbps | 45Mbps | 8
Uruguay | RAU | operating | 64 Kbps to 1 Mbps | Up to 100 Mbps to 12 sites | 6 Mbps | 18 Mbps | 46
Venezuela | REACCIUN | operating | 26 Mbps | 53 Mbps | + 45 Mbps | 78

Source: CLARA.

It is important to observe that in the global scenario NRENs are constantly evolving and achieving higher levels of connectivity to meet the requirements of new applications developed by research and education institutes worldwide. Countries that are catching up, such as Korea, have their backbones at the Gbps level. In these countries one can also see a greater level of investment to guarantee good connectivity at the edges of the network (end-to-end). Latin American countries are still at the Mbps level (one thousand times less), with the exception of Brazil, Mexico and Chile. Compared to developed countries, Latin America is in the situation those countries were in 5 years ago. This is certainly a constraint to international cooperation in the field of science and technology.

Brazil

Brazil's Research and Education Network (RNP) has been in operation since 1989. RNP integrates all 26 Brazilian states and the national capital through a backbone of up to 10 gigabits per second. São Paulo, Rio de Janeiro, Minas Gerais and Brasília are on a backbone of 10 Gbps, while Rio Grande do Sul, Santa Catarina, Paraná, Bahia, Pernambuco and Ceará are connected at 2.5 Gbps. The rest of the states are connected through links of up to 34 Mbps. It is expected that the whole network will be operating at gigabit speeds by the year 2007 (figure 11).

Figure 11. RNP Backbone

The national network (RNP) links about 300 universities, research institutions and federal agencies. Integrated with the national network are the state networks, which distribute connectivity from RNP's point of presence in each state. The most important state networks are those of Santa Catarina, Paraná, Rio de Janeiro, and São Paulo. In the case of São Paulo, there is also a Research and Development Program (TIDIA)18 covering different areas of information and communications technology, telecommunications and computer networks associated with the advanced Internet.

18 Tecnologia da Informação no Desenvolvimento da Internet Avançada (www.tidia.fapesp.br/portal)


Analysis

The questionnaire that was sent out to evaluate the situation of data providers and custodians of the region included questions of relevance for the definition of the best architecture, such as:

Standards Used

Data Model: Darwin core, ABCD, CABRI, Others (specify)

Protocol: DiGIR, BioCASE, Z39.50, http, xml, Others (specify)

Existing infra-structure:

Hardware and software:

Staff: Adequate, Insufficient (specify)

Internet Access:

Type of Internet Access: None, Modem, dedicated line.

Data and Information Access Policy: Unrestricted access, Restricted access

Willingness to participate in this project.

All questionnaires from collections from countries located in the Amazonian region were analyzed. Collections that don’t wish to participate in ABBIF or that don’t want to share their data were not included. Institutions that don’t have specimen data were also not included.

Peru

Questionnaires

CRIA sent out the ABBIF questionnaires to 5 institutions and 15 individuals in Peru and received 7 answers. The answers from 6 institutions were sent by Siamazonia and 1 answer came from a private collection. Table 3 shows the result of the questionnaire concerning standards, protocols, infrastructure, Internet access and information policy.

Table 3. Answers to the questionnaire from institutions in Peru

Collection | Total records | Digitized (no. & % of total) | Georeferenced (no. & % of digitized) | Total records Amazon | Digitized Amazon (no. and % of total) | Georeferenced Amazon (no. and % of digitized)
Siamazonia | 60.000 | 60.000 | 30.000 | 60.000 | 60.000 | 30.000
Herbário MOL-FCF | 11.428 | 6.857 | 5.000 | | |
Herbário Amazonense | 130.000 | 32.500 | 22.750 | | |
Herbário Regional de Ucayali | | | | | |
Herbário Herrerensi | 6.000 | 3.300 | 2.640 | 5.000 | 3.200 | 0 - 2500
UNMSM (11 collections) | 1.500.000 | 200.000 | | 400.000 | 40.000 |
Personal collection of leaf beetles and their host plants | 100.000 | 5.000 | | 5.000 | 5.000 |
Total (without Siamazonia) | 1.747.428 | 247.657 (14%) | 30.390 (12%) | 415.000 | 100.000 (24%) | 30.000 (30%)

Collection | Standards & Protocols | Infrastructure | Internet access | Information Policy | Observation
Siamazonia | DarwinCore, DiGIR, http, xml | Sufficient hardware, require software for data analysis, mirroring, Arc IMS, sufficient staff | 512 Kbps | unrestricted | GBIF node
Herbário MOL-FCF | | Require more disk space, sufficient staff | dedicated line | unrestricted |
Herbário Amazonense | DarwinCore, http, xml | computers, disk space, camera, require staff for the collection | 256 Kbps | unrestricted |
Herbário Regional de Ucayali | No answers | | | |
Herbário Herrerensi | DarwinCore, http | computers, disk space, camera, scanner and software for collection management, insufficient staff | 512 Kbps | unrestricted |
UNMSM (11 collections) | | computers, camera, scanner, memory (servers and software for image editing), require people for digitization | dedicated line | unrestricted | the museum has not adopted standards
Personal collection of leaf beetles and their host plants | | | | |

Information System

Peru is in a very good situation as it has developed Siamazonia19, the information system for biological and environmental diversity of the Peruvian Amazon (Sistema de Información de la Diversidad Biológica y Ambiental de la Amazonía Peruana). Siamazonia was created in 2001 through the BIODAMAZ project (Proyecto Diversidad Biológica de la Amazonía Peruana), an agreement between Peru and Finland, and was developed by the Instituto de Investigaciones de la Amazonía Peruana (IIAP). IIAP is a GBIF node and therefore is a natural partner of the ABBIF network.

Its structure is based on nodes, similar to GBIF. The following diagram was taken from its website:

Figure 12. Structure of the Siamazonia Network

In the diagram, the facilitating node is IIAP, which has committed itself to the long-term development and maintenance of the secretarial, technical, and administrative tasks of the system. Principal nodes are universities or their museums, research institutes, and other institutions with valuable information resources and an interest in participating in the development of the system. Representatives of IIAP and of the principal nodes constitute the Steering Committee, which is the main decision-making body of the system. Additional nodes may include a broad category of institutions that are of interest to the network but do not fulfill the requirements of principal nodes.

19 www.siamazonia.org.pe/

IIAP produced a technical document presenting an overview of the architecture for the planned Peruvian Amazonian Biodiversity and Environmental Information System (IIAP, 2004)20. This document is a result of five regional workshops held during the months of March and April, 2001. Besides its proposed node structure, the document also presents a diagram of the information system (figure 13) where it includes a linkage to GBIF.

Figure 13. General Structure of the Information System (IIAP, 2004)

The document also states that the databases are, in general, freely accessible.

Siamazonia is already serving data to GBIF using DiGIR. One resource is Observations of flora y fauna of Peruvian Amazon by BIODAMAZ project, with 477 records and 112 taxa, and the other is Information of Flora and Fauna in Varzeas (Peruvian Amazon), with 11.009 records and 3.218 taxa.

20 Sistema de Información de la Diversidad Biológica y Ambiental de la Amazonía Peruana (SIAMAZONIA), Serie IIAP-BIODAMAZ, ISBN N° 9972-667-10-3, 2004. http://www.iiap.org.pe/biodamaz/faseii/download/literatura_gris/2.pdf


Venezuela

Questionnaires were sent to 21 institutions and 28 individuals, and 11 answers were received. Tables 4 and 5 present the answers to the questionnaire as to available data, digitization, facilities, and policy.

Table 4. Answers to the questionnaire concerning no. of records and digitization

Acronym | Checklists | Group | No. records (total) | Georef. | Digitized | No. rec. Amazon | Georef. Amazon | Digit. Amazon | Software
COP | | birds | 80.000 | 80.000 | 80.000 | | | | BIOTA database
EBRG | Reports to MARN on diverse vertebrates | vertebrates | 61.529 | 6.000 | 6.000 | 7.404 | 740 | 7.404 | Excel
PORT (BioCentro) | Checklists of Amazonian phanerogams | phanerogams | 100.000 | | 7.500 | | | | Visual Basic, Access
ecoSIG - interested in participating as data custodians | Databases on Amazonian amphibians, birds and mammals | maps, lists of species (amphibian, birds and mammals) | | | | | | | Geographic Information System, Arc-View
GUYN | | phanerogams, cryptogams | 18.625 | 13.000 | | | | | Access
BioCentro - Museo de Zoologia | | fish | 53.000 | 40.000 | 53.000 | 10.000 | 10.000 | 10.000 | Specify
VEN | | plants, fungi, algae | 350.000 | 35.000 | 113.000 | 27.200 | | 27.200 | Access
MHNLS | | phanerogams, vertebrates, invertebrates | 190.000 | 190.000 | 190.000 | 5.399 | 5.399 | 5.399 | WinISIS
MBUCV | | terrestrial vertebrates | 14.898 | 10.000 | 14.898 | 1.628 | 1.628 | 1.628 | Excel
MIZA | | insecta | 2.500.000 | 5.289 | 5.289 | 500.000 | 768 | 768 | PostgreSQL, PHP, Excel
ULABG | | amphibian, reptiles | 200 | 0 | 0 | 200 | | |
Total | | | 3.368.252 | 379.289 | 469.687 | 551.831 | 18.535 | 52.399 |
% | | | 100,0% | 11,3% | 13,9% | 16,4% | 3,4% | 9,5% |

An important feature is that, with the exception of one institution, all are in the process of digitizing their holdings. About 14% of the more than 3 million specimens are digitized, with a very high rate of georeferencing (over 80% of the digitized records). If MIZA’s insect collection is set aside (2.5 million specimens, of which only about 0.2% are digitized), the remaining collections hold over 850 thousand specimens, of which more than 460 thousand (over 50%) are digitized.
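The percentages quoted above can be checked against the totals in Table 4. The following is a minimal sketch of that arithmetic; the variable names and the `pct` helper are illustrative and not part of the original report, and the figures are copied from the table's total row and the MIZA row with the thousands separators removed.

```python
# Totals from Table 4 (Venezuela); thousands separators removed.
total_records = 3_368_252
georeferenced = 379_289
digitized = 469_687

# MIZA's insect collection, taken from its own row in Table 4.
miza_total = 2_500_000
miza_digitized = 5_289


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


digitized_pct = pct(digitized, total_records)            # share of all records digitized
georef_of_digitized_pct = pct(georeferenced, digitized)  # georeferenced share of digitized records
miza_digitized_pct = pct(miza_digitized, miza_total)     # digitized share of MIZA's holdings

# The same figures with MIZA set aside.
rest_total = total_records - miza_total
rest_digitized = digitized - miza_digitized
rest_digitized_pct = pct(rest_digitized, rest_total)

print(digitized_pct, georef_of_digitized_pct, miza_digitized_pct)
print(rest_total, rest_digitized, rest_digitized_pct)
```

Separating MIZA from the rest makes it visible that one very large, barely digitized collection dominates the overall rate.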


Table 5. Answers to the questionnaire concerning on-line data

Acronym | Available online | Standards & Protocols | Adequate Hardware & Software | Adequate Staff | Internet Access | Restricted access | Willingness to participate
COP | no | | yes | yes | ABA 256 | yes | not before knowing ABBIF conditions and aims
EBRG | no | | no | no | none | | yes
PORT (BioCentro) | no | | no | no | none | yes | yes
ecoSIG - interested in participating as data custodians | http://ecosig.ivic.ve | | yes | yes | 100 Mbps | no | yes
GUYN | no | | no | no | dedicated line | yes | yes
BioCentro - Museo de Zoologia | no | DIGIR | yes | no | none | yes | yes
VEN | no | | no | no | | no | yes
MHNLS | no | | yes | yes | 192/128 | no | yes
MBUCV | no | | no | no | none | | yes
MIZA | http://www.miza-fpolar.info.ve | Darwin Core, http | no | no | none | yes | yes
ULABG | no | | no | no | dedicated line | yes | yes, if of mutual benefit

Although the digitizing process seems to be in place, the same does not apply to on-line availability of data. Venezuela, unlike Peru, does not have a GBIF node or a local organization working on an information system to integrate biodiversity data. Two institutions indicated that they have data on-line: Ecosig, a geographic information system; and the Museo del Instituto de Zoología Agrícola (MIZA). MIZA is the collection with the largest holding (2.5 million insect specimens), of which only about 0.2% is digitized.

Venezuela shows a need for resources to digitize data and also to develop a system to make biodiversity data available on the Internet. There is a project that was recently approved on the development of an integrated information system for vertebrate collections in Venezuela. The project involves the following institutions: Museo de Historia Natural La Salle (MHNLS); Museo de Biología de la Universidad Central de Venezuela (MBUCV); Museo Estación Biológica Rancho Grande (EBRG); and the Colección Ornitológica Phelps (COP), all of which answered the ABBIF questionnaire and together are responsible for 350 thousand specimens. The focal point of this project is MHNLS. We believe that this project can learn from experiences such as GBIF, CRIA and Siamazonia and use all open source developments that are available.

Bolivia

The questionnaire was sent to 14 institutions and 13 individuals, and CRIA received answers from one herbarium and two zoological collections, all from the Museo de Historia Natural Noel Kempff Mercado (MHNNKM)21.

21 http://www.museonoelkempff.org


Collection | No. specimens | Software | Data online | Protocol | Staff, Software & Hardware | Internet | Information | Willingness to participate
Herbario del Oriente Boliviano | 65.000 | Excel | no | http | Inadequate | ADSL | Restricted | yes
Zoological collection | 92.970 | Excel | no | http | Inadequate | ADSL | yes | yes
Entomological collection | 500.000 | Excel | no | http | Inadequate | ADSL | yes | yes
Total | 657.970 | | | | | | |

There was no information on the percentage of digitized or georeferenced data. The Museum’s website holds information on ongoing research and on maps, and also presents lists of species from the project of the Fundación para la Conservación del Bosque Seco Chiquitano, Cerrado y Pantanal Boliviano (FCBC). Data on the collections’ holdings are not available on-line.

Colombia

Questionnaire

Based on the web survey of possible data providers that was carried out at the beginning of the project, 124 questionnaires were sent out to Colombian institutions and 133 to individuals. Only 9 answers were received, but, as is the case of Peru, Colombia has a GBIF node, the Alexander von Humboldt Biological Research Institute. They were contacted directly and carried out a very good survey on 93 institutions involving 29 herbaria, 60 zoological and 9 microbial collections. These institutions together hold a total of 4.071.632 records, more than 50% digitized and about 10% of the digitized records georeferenced. Only approximately 2% of the records are from the Amazon region, but this is undoubtedly an important initiative to be sponsored (tables in annex 1).

Information System

The Alexander von Humboldt Biological Research Institute is a natural partner of ABBIF, as it is a GBIF node and is responsible for the Biodiversity Information System SIB (Sistema de Información sobre Biodiversidad22) that is being developed in Colombia.

SIB is being implemented as a distributed network. Humboldt is the leading institution, and is officially23 responsible for its design, implementation and general coordination. The structure includes a Technical Committee that is responsible for:

defining general aspects of a national policy for biodiversity data and information management;

validating technical elements and providing recommendations as to the implementation of SIB at a local, regional, and national level;

establishing a line of capacity building, replicating the SIB model and promoting expertise in other entities; and,

facilitating the articulation of SIB with other information initiatives at the national, regional, and global level.

The technical committee today is composed of members of the following institutions:

Instituto Amazónico de Investigaciones Científicas – SINCHI
Instituto Alexander von Humboldt
Instituto de Hidrología Meteorología y Estudios Ambientales – IDEAM
Instituto de Investigaciones Marinas y Costeras José Benito Vives de Andréis – INVEMAR
Instituto de Investigaciones Ambientales del Pacífico – IIAP
Instituto de Ciencias Naturales de la Universidad Nacional – ICN
Ministerio de Ambiente, Vivienda y Desarrollo Territorial

22 http://www.siac.net.co/Home.php
23 Law 99 of 1993 (Ley 99 de 1993) and regulatory decrees 1600 and 1603 of 1994

SIB is also composed of regional and thematic networks.

The data model in use combines the DarwinCore V2 standard (as the minimum acceptable content) with the Estándar para intercambiar información al nivel de organismo (standard for exchanging organism-level information) designed by the project team24. The interface for distributed searches is not available. The SIB communication protocol was being developed concurrently with DiGIR, but it now seems clear that sharing data with other initiatives requires a common protocol. GBIF recommended that SIB use TAPIR, which should be ready for testing in the near future.

SIB (Dec 2005) has four datasets publicly available: Butterflies of the Schmidt-Mumm Biological Collection (Humboldt), Pteridophyta of the FMB biological collection (Humboldt), Leguminosae of the FMB biological collection (Humboldt) and Selected records from the National Herbarium of Colombia (ICN-UN). According to Ángela M. Suárez-Mayorga25, standardization and proper documentation of biological data and/or metadata are ongoing in nearly 30 organizations and four networks of data administrators: Red Nacional de Observadores de Aves, Red Nacional de Jardines Botánicos, Red de Colecciones Biológicas de los Andes and SIRAP-Eje Cafetero. All are also documenting metadata for biological datasets, following the "Estándar para la documentación de metadatos de conjuntos de datos relacionados con biodiversidad"26. One problem that Humboldt faces, as do many other data custodians, is convincing data providers to openly share their data on the Internet.

French Guiana

Although French Guiana is an overseas department of France and, consequently, is politically a part of Europe, it is located in South America and for this reason will be included in this report as an “Amazonian country”.

The “Herbier de Guyane (CAY)”, a center of the Institut de Recherche pour le Développement (IRD) in Cayenne, answered the questionnaire. The herbarium houses approximately 160,000 vascular plant, bryophyte, and fungal specimens collected in the Guianas, mainly in French Guiana, more than 125,600 of which are digitized in the AUBLET2 database (2,500 with digitized images), ca. 90,000 georeferenced. Of the 160,000 specimens, 452 are nomenclatural types. IRD is a member of the “Flora of the Guianas” consortium. Moreover, the Herbarium contributes to the “Flora Neotropica” program (New York Botanical Garden) and to the “Checklist of the vascular plants of the Guyana Shield” (Smithsonian Institution, Washington DC).

Of the 160 thousand specimens, 125,600 (78.5%) are digitized and 90,000 (56%) are georeferenced. The database software used is Oracle and the data model is RIHA (Réseau Informatique des Herbiers Africains), which is compatible with ABCD. BioCASe is used as the communication protocol.

The data is freely available on-line at http://www.cayenne.ird.fr/aublet2/ and the herbarium also serves data through GBIF (123,634 records).

The herbarium requires scanning equipment for digitizing images of the specimens, especially the types. They are also interested in establishing a digitizing program of the non vascular plants and of the Guiana Shield collections and would require additional staff.

The herbarium is open to collaboration with new partnerships of the Amazonian countries to share experience and their data is available without restrictions. This data could therefore be immediately available to ABBIF.

24 http://www.siac.net.co/sib_descargas.php
25 personal email January 10, 2006
26 http://www.siac.net.co/sib_descargas.php


Ecuador

The questionnaire was sent to 16 institutions and 23 individuals, based on the Internet survey. Three answers were received, including 2 collections that are interested in participating in ABBIF.

Name coll. | Acronym | Checklists | No. Specimens | % Digitalized | No. records Amazon | Software
Unión Mundial para la Naturaleza | UICN | UICN databases | | | | Access, Cold Fusion
Escuela Superior Politecnica de Chimborazo (ESPOCH), Herbarium | CHEP | TROPICOS database | 8.700 | 5.000 | 1.500 | TROPICOS (pick)
Pontificia Universidad Católica del Ecuador | Herbario QCA | Checklists for Ecuadorian Angiosperms and Pteridophytes | 250.000 | 30.000 | 10.000 | Filemaker Pro

Acronym | Data available online? URL | Standards & Protocols | Hardware & Software | Staff | Access to Internet | Information policy access | Willingness to participate
IUCN | www.sur.iucn.org | http | Adequate | Inadequate | dedicated 512 | Restricted | Yes
CHEP | no | Tropicos (Pick) | Inadequate | Inadequate | Modem | Unrestricted | Yes
Herbario QCA | no | | Inadequate | Inadequate | Ethernet, 54kbp | Unrestricted | Yes

These collections require support to digitize their holdings and to make data available on the Internet. IUCN, although it leads the conservation commons initiative27, does not have a clear data sharing policy in Ecuador, nor does it have species data readily available.

The Herbarium of the Pontificia Universidad Católica del Ecuador collaborates with the Missouri Botanical Garden, the Herbario Nacional at the Museo Ecuatoriano de Ciencias Naturales and the Department of Systematic Botany of Aarhus University in the Catalogue of the Vascular Plants of Ecuador project28. A possible strategy may be the establishment of a regional server with a portal at QCA to begin structuring local data and to serve data to the ABBIF network.

Brazil

Collections

Brazil has two very important projects underway that are of direct interest to ABBIF: the speciesLink network29 and PPBio – MCT30, the biodiversity research program of the Ministry of Science and Technology.

The speciesLink network involves 40 collections, one centralized information system of observation data from São Paulo State (SinBiota31) and one centralized network with 9 microbial collections (SICol32):

27 http://www.conservationcommons.org 28 http://www.mobot.org/mobot/research/ecuador/welcome.shtml 29 http://splink.cria.org.br/ 30 http://ppbio.inpa.gov.br/ 31 http://sinbiota.cria.org.br/atlas/


Collections | No. of records | Digitized (no. and % of total) | Georeferenced (no. and % of digitized)
Plants (herbaria, algae, wood) | 1.430.250 | 289.487 (20%) | 70.544 (24%)
Zoological collections | 935.523 | 374.208 (40%) | 191.101 (51%)
Microbial Collections | 8.724 | 8.724 (100%) | 0 (0%)
Observation Data | 71.866 | 71.866 (100%) | 71.866 (100%)
Total records | 2.446.363 | 744.285 (30%) | 333.511 (45%)

The total number of records from the Amazon region in Brazil in the speciesLink network is 127.473, and from other Amazon Basin countries 122.624. Of the total (250.097), 128.097 are georeferenced. It is important to stress that some very specialized collections in São Paulo have important holdings from the Amazon region, such as:

the fish collection of the Museum of Zoology of the University of São Paulo (MZUSP); and,

the bee collection (RPSP) of the biology department FFCLRP/USP.

PPBio in its first phase is concentrating on the Amazon and the semi-arid regions. Partner institutions include INPA – Instituto Nacional de Pesquisas da Amazônia; MPEG – Museu Paraense Emílio Goeldi; and INSA-CF – Instituto Nacional do Semi-Árido Celso Furtado. These institutions will have support to digitize and make their data available on-line.

The tables that follow present a summary of the status of the institutions that answered the questionnaire.

Collection | Plants / Animals / Micro. | Total records | Digitized (no. and % of total) | Georeferenced (no. and % of digitized)
DZSJRP - pisces | 1 | 7.500 | 7.500 | 4.684
UNIR - Fish | 1 | 23.229 | 23.229 | 23.229
UNIR - Mammals (CRM) | 1 | | |
MEFEIS | 1 | 10.200 | 1.000 |
UFRR | 1 | 2.751 | 2.751 |
UFAM Coleção Zoológica | 1 | 191.992 | 0 | 0
MIRR | 1 | 5.914 | 4.666 |
INPA - Peixes | 1 | 24.536 | 17.000 |
INPA - CMIM | 1 | 7.459 | 7.459 | 0
INPA - Mammals | 1 | 4.819 | 4.819 |
INPA - invert. | 1 | 303.015 | 1.022 |
INPA - Amphi | 1 | 13.500 | 13.500 | 6.750
INPA - Herbaria | 1 | 215.000 | 200.000 | 86.000
INPA - Aves | 1 | 631 | 631 | 400
JBRJ | 1 | 410.000 | 40.000 |
SPF - USP | 1 | 145.000 | 15.000 | 7.000
Herbário MG | 1 | 174.000 | 165.000 |
Instituto Butantan - IBSP | 1 | 9.298 | 2.295 | 593
HRCB | 1 | 40.000 | 5.000 | 0
MPEG - Invert | 1 | 2.000.000 | 20.000 |
MPEG - Fish | 1 | 11.000 | 8.500 |
MPEG - Herp. | 1 | 60.000 | 58.000 | 2.000
IPT - BCTw (xiloteca) | 1 | 19.500 | 7.600 | 0
MPEG - Masto | 1 | 34.000 | 16.000 | 1.000
MPEG - Coleção Ornitológica | 1 | 74.965 | 71.200 |
INPA - xiloteca | 1 | 10.392 | 3.100 |
Total | 9 / 16 / 1 | 3.798.701 | 695.272 (18%) | 131.656 (19%)

Records from the Amazon Basin:

Collection | Plants / Animals / Micro. | Total records Amazon | Digitized (no. and % of total) | Georeferenced (no. and % of digitized) | On-line
DZSJRP - pisces | 1 | 390 | 309 | 343 | splink.cria.org.br
UNIR - Fish | 1 | 23.229 | 23.229 | 23.229 | no
UNIR - Mammals (CRM) | 1 | | | | no (100% digitized - Word)
MEFEIS | 1 | | | | splink.cria.org.br
UFRR | 1 | 2.751 | 2.751 | | no
UFAM Coleção Zoológica | 1 | 191.992 | 0 | 0 | no
MIRR | 1 | | | | no
INPA - Peixes | 1 | | | | no
INPA - CMIM | 1 | | | | no
INPA - Mammals | 1 | 4.819 | 4.819 | | no
INPA - invert. | 1 | | | | no
INPA - Amphi | 1 | 13.500 | 13.500 | 6.750 | no
INPA - Herbaria | 1 | | | | no
INPA - Aves | 1 | 631 | 631 | 400 |
JBRJ | 1 | | | | splink.cria.org.br
SPF - USP | 1 | | | | splink.cria.org.br
Herbário MG | 1 | | | | no
Instituto Butantan - IBSP | 1 | | | | splink.cria.org.br
HRCB | 1 | | | | splink.cria.org.br
MPEG - Invert | 1 | | | | no
MPEG - Fish | 1 | 8.000 | 8.000 | | no
MPEG - Herp. | 1 | 59.000 | 58.000 | 2.000 | no
IPT - BCTw (xiloteca) | 1 | | | | splink.cria.org.br
MPEG - Masto | 1 | 32.000 | 16.000 | | no
MPEG - Coleção Ornitológica | 1 | 63.720 | 60.500 | | no
INPA - xiloteca | 1 | 8.300 | | | no
Total | 9 / 16 / 1 | 408.332 | 187.739 (46%) | 32.722 (17%) |

32 http://sicol.cria.org.br/cv/

Information Systems

Brazil has been actively involved in the discussions of the clearing-house mechanism of the Convention on Biological Diversity and in the discussion of the establishment of biodiversity information systems. Although the country is not a GBIF member, CRIA has been participating in a number of international initiatives collaborating in the establishment of standards, protocols and information systems. This experience and the products of this work will certainly contribute to the establishment of ABBIF.

CRIA developed an information network to link data from biological collections located in the State of São Paulo called speciesLink and a centralized information system called SinBiota to receive data from surveys carried out by researchers financed through the Biota/Fapesp Program33. Both of these developments are project based and were financed by The State of São Paulo Research Foundation (Fapesp). Another system developed by CRIA is SICol, a centralized information system with data from microbial collections of biotechnological interest.

SinBiota

SinBiota34 adopted a centralized model (figure 14). Data providers are individual researchers or groups working in the field. It is unrealistic to expect each researcher to maintain his/her data in a private information system on the Internet that is interoperable with the databases of other researchers. The natural strategy was therefore to develop a centralized database that each researcher can feed through a password-controlled web interface. A common format for field records was adopted for all taxa, and all researchers use the same web interface to enter, alter, or delete data in the database. Associated with each field record is a list of species. The web server, besides freely and openly serving data to any Internet user, integrates the database with maps through the mapCRIA35 web service.
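The centralized model described above can be illustrated with a small in-memory sketch: one shared record format, password-controlled writes, and open reads. This is only an illustration under stated assumptions. The class names, field names and the plain-text credential table are hypothetical and do not describe SinBiota's actual implementation.

```python
import hmac
from dataclasses import dataclass, field


@dataclass
class FieldRecord:
    """One survey record in a common format shared by all taxa (hypothetical fields)."""
    researcher: str
    locality: str
    latitude: float
    longitude: float
    species: list[str] = field(default_factory=list)  # associated list of species


class CentralDatabase:
    """In-memory stand-in for the centralized database behind the web interface."""

    def __init__(self, credentials: dict[str, str]):
        # researcher -> password; stored in plain text only to keep the sketch short
        self._credentials = credentials
        self._records: dict[int, FieldRecord] = {}
        self._next_id = 1

    def _check(self, researcher: str, password: str) -> None:
        expected = self._credentials.get(researcher, "")
        if not expected or not hmac.compare_digest(expected.encode(), password.encode()):
            raise PermissionError("invalid credentials")

    def submit(self, record: FieldRecord, password: str) -> int:
        """Writes require the submitting researcher's password."""
        self._check(record.researcher, password)
        record_id = self._next_id
        self._records[record_id] = record
        self._next_id += 1
        return record_id

    def delete(self, record_id: int, researcher: str, password: str) -> None:
        """Only the researcher who owns a record may delete it."""
        self._check(researcher, password)
        if self._records[record_id].researcher != researcher:
            raise PermissionError("record belongs to another researcher")
        del self._records[record_id]

    def public_view(self) -> list[FieldRecord]:
        """Reading is open to any user; no password is needed."""
        return list(self._records.values())
```

The asymmetry between `submit`/`delete` and `public_view` mirrors the text: data entry is restricted to the contributing researchers, while the resulting data is served openly.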

33 www.biota.org.br 34 http://sinbiota.cria.org.br/atlas/ 35 http://www.cria.org.br/mapcria/doc/


[Figure 14 diagram labels: researcher of the Biota/Fapesp Program; web interface; database (surveys and associated lists of species); web server; map service; maps; user]

Figure 14. Diagram of SinBiota’s architecture

SICol

Before defining the architecture for the microbial culture collection information system, a survey was carried out to determine what infrastructure and expertise was available. Holdings of microbial collections are small when compared to herbaria and most zoological collections. The survey showed that most collections use spreadsheets or text files to organize their data and have problems such as lack of local expertise in informatics and inadequate Internet access. For these reasons the option chosen was the development of a centralized system (figure 15) where collections could “deposit” their data. This system was named SICol36.

[Figure 15 diagram labels: data providers (culture collections); administrative interface (updates); virtual catalog (queries); users; Perl & Apache; relational database (PostgreSQL); HTTP; SQL]

Figure 15. Diagram of SICol’s Architecture

36 http://sicol.cria.org.br/


A user-friendly interface was developed to allow curators to simply upload a flat file with the data of their holdings. An extension of DarwinCore for microbial data was developed based on the CABRI37 guidelines for minimum, recommended and full data sets for catalogue production.
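As an illustration of the flat-file upload, the sketch below reads a tab-delimited file and renames its columns to DarwinCore-style element names. The spreadsheet headers, the `COLUMN_MAP` table and the `load_flat_file` function are assumptions made for this example; they are not SICol's actual format or code, and the microbial extension fields are not shown.

```python
import csv
import io

# Hypothetical mapping from a curator's spreadsheet headers to
# DarwinCore-style element names used for illustration only.
COLUMN_MAP = {
    "strain_no": "CatalogNumber",
    "species": "ScientificName",
    "collection": "CollectionCode",
    "country": "Country",
}


def load_flat_file(text, delimiter="\t"):
    """Parse a delimited flat file and return records keyed by the mapped names.

    Columns without a mapping are dropped, and rows without a catalog
    number are skipped because they cannot be identified in the catalogue.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    records = []
    for row in reader:
        record = {
            COLUMN_MAP[column]: (value or "").strip()
            for column, value in row.items()
            if column in COLUMN_MAP
        }
        if record.get("CatalogNumber"):
            records.append(record)
    return records
```

Keeping the mapping in a table means a collection with different column headers only needs a different `COLUMN_MAP`, not different code.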

speciesLink

The third system developed by CRIA was speciesLink38. The aim was to integrate data from biological collections located in the State of São Paulo that were willing to share their data. The system also had to be interoperable with SinBiota and The Species Analyst Network. This clearly called for a distributed architecture, but one that would also have to accommodate problems such as lack of expertise and poor Internet connectivity.

speciesLink is based on a DiGIR network, which typically involves 3 components:

Presentation layer: the software that interacts with the user, offering a friendly interface for queries and presentation of the results. This layer also interacts with the next layer, the portal.

Portal: the portal is responsible for the distribution of messages. It is the software responsible for receiving queries from the presentation layer and distributing them to each data provider connected to the network. Communication with the providers is carried out using the DiGIR protocol.

Provider: is the software responsible for receiving queries from the portal and translating them to the query language used by the local database. The translation process includes mapping of the local fields according to the conceptual schema used by the network.
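The division of labour between the three components can be illustrated with a minimal in-memory sketch. It does not use the DiGIR protocol or its message format; the class names, the concept names ("ScientificName", "Collector") and the local column names are all hypothetical and chosen only to show the field mapping done by a provider and the fan-out done by the portal.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Provider layer: maps network concepts to local fields (hypothetical names)."""
    name: str
    field_map: dict   # network concept -> local column name
    records: list     # local records keyed by local column names

    def search(self, concept, value):
        """Translate a network-level query to the local field and run it."""
        local_field = self.field_map.get(concept)
        if local_field is None:
            return []  # this provider does not map the requested concept
        reverse = {local: concept_name for concept_name, local in self.field_map.items()}
        hits = [r for r in self.records if r.get(local_field) == value]
        # Answer in network concepts, not in local column names.
        return [{reverse[k]: v for k, v in r.items() if k in reverse} for r in hits]


@dataclass
class Portal:
    """Portal layer: distributes each query to every registered provider."""
    providers: list = field(default_factory=list)

    def query(self, concept, value):
        return {p.name: p.search(concept, value) for p in self.providers}


def present(results):
    """Presentation layer: summarise the hits per provider for the user."""
    return [f"{name}: {len(hits)} record(s)" for name, hits in sorted(results.items())]


herbarium = Provider(
    "Herbarium A",
    {"ScientificName": "taxon", "Collector": "coletor"},
    [{"taxon": "Inga edulis", "coletor": "Silva"}],
)
museum = Provider(
    "Museum B",
    {"ScientificName": "species_name"},
    [{"species_name": "Panthera onca"}],
)
portal = Portal([herbarium, museum])
print(present(portal.query("ScientificName", "Inga edulis")))
```

Each provider keeps its own column names; only the mapping table has to agree with the conceptual schema shared by the network.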

The original idea was to connect each collection directly to the portal through this protocol. However, because many collections lacked good connectivity, infrastructure and/or expertise, the solution adopted was to develop regional servers that mirror the data held by these collections (figure 16).

Figure 16. Diagram of a combined system (diagram labels: users; queries; virtual catalog; DiGIR Portal; DiGIR Providers; Regional Server; collections (data providers); links labelled HTTP / XML and SOAP)

For this architecture other interfaces were developed to read records and update the databases held at the regional server. Filters that allow the curator to omit sensitive data and have full control over the data he/she wishes to make freely available were also developed. Figure 17 presents the diagram of the architecture adopted by speciesLink.
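A curator-controlled filter of this kind could look roughly like the sketch below. It is only an illustration of the idea: the function name, the record layout and the field names ("id", "locality", "latitude", "longitude") are assumptions, not part of the speciesLink software.

```python
def apply_curator_filter(records, hidden_fields=(), withheld_ids=()):
    """Return the copy of the records that the curator agrees to publish.

    hidden_fields: field names stripped from every record (hypothetical names).
    withheld_ids:  identifiers of records that are not published at all.
    """
    published = []
    for record in records:
        if record["id"] in withheld_ids:
            continue  # the whole record stays private
        published.append({k: v for k, v in record.items() if k not in hidden_fields})
    return published


local_records = [
    {"id": 1, "name": "Inga edulis", "locality": "Campinas", "latitude": -22.9, "longitude": -47.1},
    {"id": 2, "name": "Cattleya labiata", "locality": "Withheld site", "latitude": -8.0, "longitude": -35.0},
]
# Hide exact coordinates everywhere and withhold record 2 completely.
mirror_copy = apply_curator_filter(
    local_records, hidden_fields=("latitude", "longitude"), withheld_ids=(2,)
)
print(mirror_copy)
```

Because the filter is applied before the data reach the regional server, the mirrored copy never contains what the curator chose to omit.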

37 CABRI (Common Access to Biological Resources and Information) – www.cabri.org 38 http://splink.cria.org.br/

Figure 17. Diagram of the speciesLink Architecture (diagram labels: speciesLink site; presentation layer; DiGIR Portal; LibDiGir; Collection A with its collection management system, SQL data and a PHP data provider; Regional Server with Postgres data, a PHP provider and a mirror SOAP server; Collections B and C, each with a collection management system, SQL data, a data repository and the Java spLinker; link types: fast and stable connectivity, slow or unstable connectivity)

Another important feature of the speciesLink network was the development of a number of tools for mapping, monitoring and data cleaning39. This increased the interaction between participating collections and CRIA’s staff.

Figure 18 presents a diagram of the data cleaning process which is initiated every night. The system identifies collections that have updated their databases and then runs the process. A report for each collection is generated and made available on the web. Suspect records for names and lat/long are highlighted and a number of diagrams and charts with the collection’s profile and data cleaning progress are presented. All information is made publicly available40 so that users can also evaluate the quality of each collection’s data.
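The nightly cycle described above can be sketched as follows. The text does not specify how suspect records are detected, so the two checks used here (a coordinate range test and a lookup against a list of accepted names) are stand-in assumptions, as are all function and field names.

```python
from datetime import datetime


def suspect_coordinates(record):
    """Assumed check: flag coordinates outside the valid lat/long ranges."""
    lat, lon = record.get("latitude"), record.get("longitude")
    if lat is None or lon is None:
        return False  # nothing to check
    return not (-90 <= lat <= 90 and -180 <= lon <= 180)


def suspect_name(record, known_names):
    """Assumed check: flag names missing from a reference list."""
    return record.get("scientific_name") not in known_names


def nightly_run(collections, last_run, known_names):
    """Build one report per collection updated since the previous run."""
    reports = {}
    for coll in collections:
        if coll["updated_at"] <= last_run:
            continue  # unchanged collections are skipped
        records = coll["records"]
        reports[coll["name"]] = {
            "total": len(records),
            "suspect_names": [r["id"] for r in records if suspect_name(r, known_names)],
            "suspect_coordinates": [r["id"] for r in records if suspect_coordinates(r)],
        }
    return reports


collections = [
    {
        "name": "Col 1",
        "updated_at": datetime(2004, 10, 2),
        "records": [
            {"id": "a", "scientific_name": "Inga edulis", "latitude": -22.9, "longitude": -47.1},
            {"id": "b", "scientific_name": "Inga edullis", "latitude": -229.0, "longitude": -47.1},
        ],
    },
    {"name": "Col 2", "updated_at": datetime(2004, 9, 1), "records": []},
]
print(nightly_run(collections, datetime(2004, 10, 1), {"Inga edulis"}))
```

Keeping the checks as separate functions makes it easy to add or replace tests without touching the loop that decides which collections need a new report.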

39 See the speciesLink data & tools page at http://splink.cria.org.br/tools 40 http://splink.cria.org.br/dc

Figure 18. Diagram of the data cleaning process (diagram labels: collections of São Paulo (Col 1 ... Col n) and international collections; spLink Portal (Java); daily import of updates; local PostgreSQL database with dc_tax and dc_geo; tables with suspect records; chart.pm (Perl); preparation of diagrams & profiles; web)

All developments carried out by CRIA run on Intel hardware and use free and open source software: the Red Hat Linux operating system; the Apache web server; the Perl, PHP and Java programming languages; and the HTTP, SOAP, XML and DiGIR protocols.

Strategic Plan

Another important study of relevance to this project, contracted by the Brazilian Ministry of Science and Technology through the CGEE (Center for Strategic Management and Studies on Science, Technology and Innovation), was the definition of a national strategy for the modernization of Brazilian biological collections and the development of an integrated information system about biodiversity.

The Brazilian Societies of Botany, Zoology, and Microbiology were invited to coordinate this process together with CRIA. A number of documents were produced by specialists and were presented and discussed at a workshop held in June. The proposed strategy was then presented at a workshop held in July with approximately 80 participants, including visiting specialists from abroad41. All documents are available on-line42 and present the state of the art of biological collections, information systems, and the Internet in Brazil.

The strategy for the establishment of a program for the next 10 years was presented and is being discussed within the Ministry. There already are some concrete results of this work, such as:

A call for proposals sent out by the Ministry of Science and Technology for biological collections that includes setting up on-line information systems, with a total budget of R$ 5 million for 2 years.

41 http://www.cria.org.br/cgee/col 42 http://www.cria.org.br/cgee/col/documentos

A Taxonomy Program established by the National Council for Scientific and Technological Development (CNPq)

Another interesting development is the replication of the speciesLink experience in other states. The following networks shall be developed in 2006 using the same standards, protocols, and decentralized architecture, all integrated with speciesLink:

Parana Network of Biological Collections: this is one of the 8 projects approved in the recent call for proposals of the Ministry of Science and Technology and involves 8 collections from Parana State;

Biota do Espírito Santo: with 16 collections from 3 institutions (Universidade Federal do Espírito Santo, Museu de Biologia Mello Leitão, and INCAPER – Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural).

Discussions have also begun with the state of Bahia and with collections of the semi-arid region of the Northeast of Brazil.

Strategy: Proposed Network

The analysis of existing experiences (GBIF, Siamazonia, Humboldt, and CRIA) with standards, protocols, tools, and architecture indicates that there is no universal solution for all situations. Technology for a truly distributed system exists and the speed of the Internet is increasing, but the choice of architecture (centralized, distributed, or combined) for each situation will depend on an evaluation of the data provider and user, and on the available resources (expertise, hardware, software, communication). Two requirements hold regardless of the architecture: the data provider must have full control over his/her data/information, and target users must have complete access to the information they require in a format that they can use.

It is clear that the proposed architecture for ABBIF must reflect the aims of the project which include:

the establishment of an integrated regional information system for the Amazonian region, based on free and open access to taxonomic information and specimen data;

the development of a system where each data/information provider or custodian will be fully responsible for his/her own data/information;

the development of a system where each provider can undertake frequent updating;

the development of a system that will help promote data validation;

the development of a system where full attribution to data/information sources is given;

strengthening of local stakeholders – biological collections and data custodians;

strengthening and integration of existing information systems at local, national, and regional levels; and,

integration of ABBIF with GBIF.

In order to propose a strategy it is important to think about the different actors that will compose the network. The actors of ABBIF are:

data providers; data custodians; users; and financing agencies

Data providers can be biological collections, researchers carrying out inventories, taxonomic studies, etc., and researchers of other fields with complementary data such as climate, vegetation, satellite images, etc. They have a series of responsibilities within the network, which include following certain standards in registering data and metadata and attesting the quality of their data. Biological collections as data providers must also have a clear data and information policy, allowing free and open access to data that is not confidential or sensitive.

Data custodians or administrators of databases and/or information systems have an important role to play. Developing, running, and maintaining information systems is a highly professional activity. It is not for amateurs. Data custodians therefore must be trustworthy and competent and must participate in the development of internationally accepted standards, or at least adopt them. They have an important role to play in supporting data providers in the use of standards and must promote the interoperability and integration of systems. Data custodians must guarantee data integrity and respect any restrictions indicated by each data provider, protecting property rights, confidentiality and other restrictions where necessary or pertinent. Data custodians are also responsible for system backup, migration to new technologies and maintenance in general. It is desirable that they have a team highly specialized in databases and qualified to develop tools of interest to data providers and users.

Users also have an important role to play. They must adhere to adopted standards and respect restrictions and limits of data use, acknowledging authorship and credits. They must also offer feedback to authors and to custodians, indicating possible errors and discussing the possibility of implementing new services.

Financing Agencies must also prepare themselves for this new digital age and have a clear data and information policy especially for data that is already born digital. Public funding in activities of public interest should generate systems that provide free and open access to data and information that are not confidential or sensitive. There must also be a policy to digitize historical data, such as biological collection records, and a long term policy to maintain information systems. In a regional information facility such as ABBIF, it is also important that an inter-agency policy be established to maximize resources and better integrate activities.

Elements of the Architecture

The aim is the establishment of a data infrastructure open to all interested, where the data provider has complete control over his/her data.

ABBIF coordination

A distributed coordinating effort is perhaps the greatest challenge to be faced. The whole concept proposed is the strengthening of local data providers, offering the necessary infrastructure for open and free dissemination of data, but without their losing control of and responsibility for the data.

In the case of ABBIF we believe that local data custodians such as Siamazonia, Humboldt and CRIA have a significant role to play. It is important that the project strengthens these initiatives at the country level and, at the same time, is able to use these capacities at a regional level. Country data custodians should act as facilitating nodes and should be part of an ABBIF development council together with GBIF. At the same time it is important that there is a “secretariat” in place, responsible for the network, monitoring activities and promoting ABBIF, identifying new country or regional partners.

An ABBIF secretariat should work on coordinating and strengthening these efforts and capacities and, at the same time, offer services to countries and institutions that want to share data but don’t have the necessary local expertise and infrastructure.

The coordination structure should be further discussed at a workshop with country representation.

Data Providers

We think it is important to determine the target data providers of ABBIF’s initial phase. In our opinion, focus should be given to specimen and species data, so biological collections and observation data (inventories) would be our first targets. We also believe that the organization of data providers must be country driven, meaning that the articulation and involvement of different providers will be carried out nationally.

Biological collections, due to the nature of their activities, are information centers. They must have sufficient infrastructure and expertise to set up their own information system for internal purposes. Those that also have the necessary infrastructure and expertise to hold an internet information system available 24 hrs a day can serve their data directly to the network. Those that don’t have or don’t want to maintain dynamic links should have a mechanism to submit, alter, and delete their data at a regional server (or cache node).

Figure 19 shows a diagram of the network.

Figure 19. Component data provider: biological collections (diagram labels: ABBIF Portal; collections with dynamic links; regional servers; collections mirroring their data in regional servers)

Collections with dynamic links and regional servers must adopt compatible standards and protocols and must be held in institutions capable of maintaining the system and serving data through fast Internet connections.

Observation data and taxonomic descriptions represent two other groups of data providers, individuals or research groups. This is the case where facilities must be offered by data custodians where researchers may deposit their data for full and open access on the internet. This is not a task for amateurs. There must be a highly specialized staff that has as its main activity the development and maintenance of information systems that guarantee the preservation and dissemination of data.

Based on the concept of open and free access to data, this element of the network will be called the digital data commons space43. The network may have more than one server to guarantee the necessary infrastructure for preservation, maintenance, retrieval, and dissemination of the data. Internet connectivity must be stable and fast (figure 20).

43 see National Science Board. Draft Report: Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. NSB-05-40. March 30, 2005. http://www.nsf.gov/nsb/meetings/2005/LLDDC_draftreport.pdf

Figure 20. Architecture element: digital data commons space (diagram labels: portal; data commons space; “data commons” nodes; observation data; taxonomic data; other data; Internet 2)

We believe that this element could involve the conservation community, which holds important observation data that are normally disseminated through books and reports.

Portal

GBIF today has a data index that serves data to the system. A subset of over 85 million records, with name and locality data, is harvested from 152 data providers and maintained in a centralized database. This makes the basic search much quicker and avoids problems such as slow or unstable connectivity. After carrying out the basic search, the user obtains a list of providers with the number of records found. Users can then display the list of records corresponding to each provider and download the selected records. At this point, he/she may choose to download the data directly from the data providers or from the GBIF index (faster), and may choose the format of the downloaded file. A map illustrating the distribution of the requested records can also be produced dynamically.

CRIA developed a fully distributed system. When a query is processed, it is sent out to the providers, which search their databases and return the results dynamically. At the moment, the speciesLink Network has 6 regional servers (mirroring data from 38 collections), 2 collections with dynamic links, one centralized database with observation data (at CRIA), and one centralized information system of microbial collections (with 9 collections). This architecture is attractive for advanced users, who can search any field and retrieve the full data set as a file. Its disadvantages are speed and the “fragility” of the network: if a server is offline for any reason, that “branch” of the network will be unavailable. Maps are also produced dynamically.

CRIA also developed an indexing service for a subset of the data, which is used for data cleaning. At the moment CRIA is considering letting users search this index to obtain faster results from a more stable system. The distributed search system will continue to be offered, however, as we believe it is very powerful and important to advanced users.

The Internet connectivity study that was carried out for Latin America (RedCLARA) shows that some links of Amazonian countries are still not in place. At the same time, it is important to develop a truly distributed network, helping countries “catch up” with both the technology and the infrastructure. For this reason we believe that the ABBIF portal should have both an index system that will harvest data from all regional providers and a distributed search service.

The index system will be used to quickly serve a data subset to users and for data cleaning. The dynamic search system will be available for advanced users.
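The trade-off between the two search paths can be made concrete with a small sketch. It assumes, purely for illustration, that each provider is represented by a callable returning its records and that an unreachable provider raises a hypothetical ProviderOffline exception; the field names are also invented.

```python
class ProviderOffline(Exception):
    """Hypothetical error raised when a provider cannot be reached."""


def build_index(providers, indexed_fields=("name", "locality")):
    """Harvest a subset of fields from every reachable provider."""
    index = []
    for provider_name, fetch in providers.items():
        try:
            records = fetch()
        except ProviderOffline:
            continue  # harvested again on a later run
        for record in records:
            entry = {f: record.get(f) for f in indexed_fields}
            entry["provider"] = provider_name
            index.append(entry)
    return index


def index_search(index, name):
    """Fast path: number of indexed records per provider for a given name."""
    counts = {}
    for entry in index:
        if entry["name"] == name:
            counts[entry["provider"]] = counts.get(entry["provider"], 0) + 1
    return counts


def distributed_search(providers, field, value):
    """Advanced path: query any field live; offline branches are reported."""
    results, unavailable = {}, []
    for provider_name, fetch in providers.items():
        try:
            results[provider_name] = [r for r in fetch() if r.get(field) == value]
        except ProviderOffline:
            unavailable.append(provider_name)
    return results, unavailable


def fetch_collection_a():
    return [{"name": "Inga edulis", "locality": "Manaus", "collector": "Silva"}]


providers = {"Collection A": fetch_collection_a}
index = build_index(providers)
print(index_search(index, "Inga edulis"))
print(distributed_search(providers, "collector", "Silva"))
```

The index answers even when a provider is temporarily offline, but only for the harvested fields; the distributed search reaches every field at the cost of depending on each provider being available at query time.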

Resource Registry & Discovery

In a distributed environment with many data providers it is desirable to have a central registry defining, at the software level, who the network participants are and how to interact with them (species-level services might use different protocols from specimen-level services, and even specimen-level services could potentially use different protocols or different protocol versions). As the number of data providers may increase over time, a means for automatic discovery will certainly be necessary. GBIF’s UDDI registry seems to be the most reasonable alternative, since it is already available for the whole biodiversity community and ABBIF resources should also be integrated with the GBIF network. UDDI offers a simple mechanism to enable configuration of thematic networks (through service categories) that could easily be used to distinguish ABBIF participants from other resources.
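The kind of information such a registry has to hold, and how service categories can single out a thematic network, can be sketched with a plain in-memory structure. This is not the UDDI API or GBIF's registry schema; the class names, attributes and example.org URLs below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    provider: str
    access_point: str        # URL where the service answers (placeholder values)
    protocol: str            # e.g. "DiGIR", "BioCASe", "TAPIR"
    protocol_version: str
    level: str               # "specimen" or "species"
    categories: frozenset    # thematic networks the service belongs to


class Registry:
    """Minimal stand-in for a central registry with category-based discovery."""

    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def discover(self, category=None, level=None):
        """Return the entries matching the given category and/or service level."""
        found = []
        for entry in self._entries:
            if category is not None and category not in entry.categories:
                continue
            if level is not None and entry.level != level:
                continue
            found.append(entry)
        return found


registry = Registry()
registry.register(ServiceEntry(
    "Herbarium A", "http://example.org/digir", "DiGIR", "1.0",
    "specimen", frozenset({"ABBIF"}),
))
registry.register(ServiceEntry(
    "Checklist B", "http://example.org/tapir", "TAPIR", "1.0",
    "species", frozenset({"ABBIF", "OtherNetwork"}),
))
registry.register(ServiceEntry(
    "Museum C", "http://example.org/biocase", "BioCASe", "1.0",
    "specimen", frozenset({"OtherNetwork"}),
))

for entry in registry.discover(category="ABBIF"):
    print(entry.provider, entry.protocol, entry.access_point)
```

A portal that reads the registry at start-up learns which protocol and version to speak to each participant, so new providers can join without changes to the portal's configuration.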

Tools

Another important activity is the development of tools for data providers and users. These tools should preferably be developed as web services so that they can be used more freely at all levels: local, country, and regional.

Data archive

As a last element of the network, it would also be important to address the problem of long-term data archiving. This may also be a task for country data custodians or their partners. It is important that the scientific council discusses this issue to determine priorities as to what data should be added to a permanent archive and to identify an institution or a pool of institutions responsible for this activity.

Figure 21 below presents a diagram of the system.

Figure 21. Diagram of the system (diagram labels: portal; long term data archive; data commons spaces; biological collections; observation data; taxonomic data; regional server; web services: maps, modeling, data cleaning, automatic georeferencing, other services)

Annex 1: Answers from Collections of Colombia

Collection name | Acronym | Institution | General Group | No. total records | % Georeferenced | % Digitalized | No. records Amazon | % Georeferenced Amazon | % Digitalized Amazon

Herbario Amazónico Colombiano COAH

COAH Instituto Amazónico de Investigaciones Científicas - SINCHI

Plants 107.150   1 160.725   1

CALT CAL Animals 18.464

Instituto Alexander von Humboldt

IAvH Instituto de Investigación de Recursos Biológicos "Alexander von Humboldt"

Animals 349.054 Birds and butterflies: 100%; other insecta: 50%; mammals: 10%; amphibia and reptilia: 0%

Birds and butterflies: 100%; other insecta: 50%; mammals: 10%; amphibia and reptilia: 0%

Birds: 1817; butterflies: 550; fishes: 730 coll.; amphibia, mammals and reptilia: pending

Birds and butterflies: 100%; other insecta: 50%; mammals: 10%; amphibia and reptilia: 0%

Birds and butterflies: 100%; other insecta: 50%; mammals: 10%; amphibia and reptilia: 0%

Herbario Federico Medem IAvH

FMB Instituto de Investigación de Recursos Biológicos "Alexander von Humboldt"

Plants 159.000 0,85 0,5 8.742 100 100

Colección de Zoología ICN Instituto de Ciencias Naturales Universidad Nacional de Colombia

Animals 948.368   Birds: 100%; amphibia: 80%; general: 60%

Amphibia: 4000

   

Herbario Nacional Colombiano

COL Instituto de Ciencias Naturales Universidad Nacional de Colombia

Plants 1.180.000 0,1 0,2 Many spec. Quantity not specified

   

Museo Micológico - Hongos fitoparásitos

MMUNM Universidad Nacional de Colombia

Microorganisms 2.970          

Museo Entomológico "Francisco Luis Gallego"

MEFLG Universidad Nacional de Colombia sede Medellín

Animals 342.340   0,1      

Herbario Pontificia Universidad Javeriana

HPUJ Pontificia Universidad Javeriana

Plants 71.000   0,375      

Museo Javeriano de Historia Natural Lorenzo Uribe s.j

MPUJ Pontificia Universidad Javeriana

Animals 2.864.808 72.5% of the amphibia, 92.2% of the fishes, 94.7% of the reptilia, 96.7% of the birds, 80.7% of mammals and a low -not specified- percentage of the Orthoptera

1      

Herbario Nacional de Malezas

HNM Corporación Colombiana de Investigación Agropecuaria Corpoica

Plants 1.420   0,05

Herbario Gabriel Gutierrez Villegas (MEDEL)

MEDEL Universidad Nacional de Colombia sede Medellín

Plants, Microorganisms

100.960   0,5      

Herbario de la Orinoquía Colombiana

Llanos Universidad de los Llanos

Plants 19.320   0      

Herbario Ciat CIAT Centro Internacional de Agricultura Tropical - CIAT

Plants 32.018   0,98      

Jardín Botánico José Celestino Mutis

JBJCM Jardín Botánico de Bogotá J.C.M.

Plants 7.036   100% of the Herbarium and 20% of fruit collection

     

Herbario Forestal Universidad Distrital Francisco José de Caldas

UDBC Universidad Distrital Francisco José de Caldas

Plants 35.000   0,9      

Herbario Universidad de Antioquia

HUA Universidad de Antioquia

Plants 138.000 0,5 0,38 20% aprox.    

Herbario José Cuatrecasas Arumi (VALLE)

VALLE Universidad Nacional de Colombia Sede Palmira

Plants 37.438   1      

Herbario Jardín Botánico "Joaquín Antonio Uribe"

JAUM Fundación Jardín Botánico Joaquín Antonio Uribe

Plants 118.276 0,5 33772 Unknown    

Herbario CUVC CUVC Universidad del Valle Plants 106.800   0,25

Colección Laboratorio de Limnología Universidad de Antioquia

CLUA Laboratorio de Limnología - Universidad de Antioquia

Animals 5.692   0,5      

Colección Entomológica Universidad de Antioquia

CEUA Laboratorio Colecciones Entomológicas - Universidad de Antioquia

Animals 86.498   1      

Vectores y Huéspedes Intermediarios de Enfermedades Tropicales

VHET Universidad de Antioquia

Animals 25.470   0,5      

Museo del laboratorio de Entomología

MENT-UT Universidad del Tolima

Animals 24.646          

Laboratorio de Investigación de Abejas - Labun

LABUN Universidad Nacional de Colombia

Animals 40.080   0,15      

Museo de Historia Natural Universidad del Cauca

MHN-UC Universidad del Cauca

Plants 83.742   0,6      

Entomológica Forestal Universidad Distrital Francisco José de Caldas

EF-UDFJC Universidad Distrital Francisco José de Caldas

Animals 6.200   1      

Museo Historia Natural Universidad Distrital

MUD Universidad Distrital Proyecto Curricular Licenciatura Biología

Plants, Animals 3.274   1      

Colección de Artrópodos de Importancia Médica

UVS Universidad del Valle , Facultad de Salud

Animals 10.720 0 0 2.000 0 0

Museo de Historia Natural Universidad Pedagógica Nacional

MHNUPN Universidad Pedagógica Nacional

Animals 58.378          

Colección Biológica U.D.C.A.

UDCA Corporación Universitaria de Ciencias Aplicadas y Ambientales, U.D.C.A.

Plants 22.018   1      

Colección Zoológica de Referencia Científica "IMCN"

IMCN Instituto para la Investigación y Preservación del Patrimonio Cultural y Natural del Valle del Cauca- INCIVA

Animals 40.448   0,85      

Colección de Insectos Acuáticos de Colombia CIB

CIACIB Corporación para Investigaciones Biológicas (CIB)

Animals 29.246          

Colección de Mosquitos de Colombia (CIB)

CMCCIB Corporación para Investigaciones Biológicas (CIB)

Animals 8.882          

Instituto Nacional de Salud INS Instituto Nacional de Salud

Animals 21.088   0,5      

Colección Taxonómica Nacional de Insectos "Luis María Murillo"

CTNI Corporación Colombiana de Investigación Agropecuaria Corpoica

Animals 37.568   0,2      

Museo Entomológico "Marcial Benavides"

ME"MB" Federación Nacional de Cafeteros- Cenicafé

Animals 17.070   0,9      

Colección Familia Constantino-CFC

CFC   Animals 9.000   0      

Colección Entomología: Hemiptera Acuáticos

PC   Animals 104          

Hans W. Dahners PETALUDA   Animals 14.200   1

Colección Piéridos de Colombia Rodrigo Torres Nuñez

CPCRTN Animals 1.834          

Colección Jean Francois le Crom

JFLC   Animals 60.000          

Colección Personal Angela Amarillo

CPAA   Animals 8.200

Colección Personal Carlos Sarmiento

CPCS   Animals 3.200          

Colección "Da Ros" C"DR" Fundación Ciencia Ecología, Arte e Historia Fundación C.E.A.H. (Museo Vittoriano)

Animals 4.980   0,4      

Colección Entomológica Universidad Nacional Sede Palmira

CEUNP Universidad Nacional de Colombia sede Palmira

Animals 57.400   1      

Colección de Insectos asociados a plantaciones forestales de Colombia

CONIF Corporacion Nacional de Investigación y Fomento Forestal -CONIF

Animals 62.800   0,9      

Herbario TULV - Jardín Botánico Juan María Céspedes

TULV Instituto para la Investigación y Preservación del Patrimonio Cultural y Natural del Valle del Cauca- INCIVA

Plants 34.000 0,45 1      

Serpentario de la Universidad de Antioquia

SUA Universidad de Antioquia

Animals 5.064          

Museo de Historia Natural "Luis Gonzalo Andrade"

UPTC Universidad Pedagógica y Tecnológica de Colombia, Facultad de Ciencias, Escuela de Ciencias Biológicas

Animals 2.202          

Museo de Entomología de la Universidad del Valle

MUSENUV Universidad del Valle Animals 101.864       occasional  

Vertebrados-Aves   Universidad del Valle - Biología

Animals 12.914   1   occasional  

Museo Entomológico Facultad de Agronomía

UNAB Universidad Nacional de Colombia

Animals 200          

Colección de vertebrados, anfibios y reptiles

UV-C Universidad del Valle - Biología

Animals 28.774       occasional  

Herbario "Armando Dugand Gnecco"

DUGAND Universidad del Atlántico

Plants 6.070          

Colección Efraín Henao CEH   Animals 10.760   0,1

Colección de Vertebrados e Invertebrados

MHN-Uca Universidad de Caldas

Animals 13.416          

Colección Familia Pardo Locarno

CFPL   Animals 11.100          

Vertebrados e Invertebrados

MHNCC Comunidad Hermanos Maristas

Animals 10.360   0,9

Banco de Cepas y Genes, Instituto de Biotecnología, Universidad Nacional de Colombia

IBUN Instituto de Biotecnología, Universidad Nacional de Colombia

Plants, Microorganisms

3.689   0,48      

Colección Microorganismos de CENICAFE

CEN Federación Nacional de Cafeteros - Centro Nacional de Investigaciones de Café - CENICAFE

Microorganisms 689   0,5      

Colección de Referencia (Moluscos)

CRM-UV Universidad del Valle - Biología

Animals 13.060          

Procedencias de Trichanthera Gigantea (H. & B.) Nees

FCISSPA Fundación Centro para la Investigación en Sistemas Sostenibles de Producción Agropecuaria- CIPAV

Plants 98   0,9      

Jardín Botánico Juan María Cespedes

JBJMC Instituto para la Investigación y Preservación del Patrimonio Cultural y Natural del Valle del Cauca- INCIVA

Plants 8.824   0,5      

Fundación Zoológico Santacruz

FZS Fundación Zoológico Santacruz

Animals 728          

Piscilago Zoo PZ Caja Colombiana de Subsidio Familiar - Colsubsidio

Animals 1.558   0,5      

Jardín Botánico "Alejandro von Humboldt"

JBAVH Universidad del Tolima

Plants 1.346   0,1      

Hongos, Univalle UV-mico Universidad del Valle - Facultad de Salud

Microorganisms 3.740          

Colección de Microorganismos

M-UBCB Corporación para Investigaciones Biológicas (CIB)

Microorganisms 4.069   1      

Secretaria de Agricultura - Antioquia

SA.A Departamento de Antioquia

Animals 9.854   0,2      

Parque Zoológico Santa Fe PZSF Sociedad de Mejoras Públicas de Medellín

Animals 3.046   0,5      

Jardín Botánico "Joaquín Antonio Uribe"

JAUM -JB Fundación Jardín Botánico "Joaquín Antonio Uribe"

Plants 19.000   0,9      

Colección de Ciencias Naturales

MUA Universidad de Antioquia - Museo Universitario

Animals 33.224   0,3      

Xiloteca

X-UNCM Universidad Nacional de Colombia sede Medellín

Plants 5.836   1

Zoológico de Barranquilla ZOOBAQ Fundación Botánica y Zoológica de Barranquilla

Animals 848   1      

Banco de Germoplasma de Microorganismos de Interés en Agricultura

CCoM Corporación Colombiana de Investigación Agropecuaria Corpoica

Microorganisms 4.555          

Jardín Botánico José Celestino Mutis

JBJCM Jardín Botánico de Bogotá J.C.M.

Plants 31.200   0,57      

Colección de Microbiología - CIMIC

CIMIC Universidad de los Andes- Centro de Investigaciones Biológicas - CIMIC

Microorganisms 278   0,7      

Museo de la Salle M.L.S. EN ZOOLOGIA - B.O.G EN BOTANICA

Congregación Hermanos Escuelas Cristianas

Plants, Animals 164.370   0,25      

Jardín Botánico de Popayan JBP Fundación Universitaria de Popayán

Plants 2.028   0,3      

Cepario Corpogen CG Corporción Corpogen

Microorganisms 3.634   0,07      

Colección Malacofaunica Terrestre de la Facultad de Ciencias de la UNMG

UMNG-MT Universidad Militar Nueva Granada

Animals 5.480   1      

Colección Entomológica de la Facultad de Ciencias de la Universidad Militar Nueva Granada

UMNG-Ins Universidad Militar Nueva Granada

Animals 6.148   1      

Zoólogico de Cali FZC Fundación Zoológica de Cali

Animals 3.818   1      

Fundación Centro de Primates

FCP Fundación Centro de Primates, FUCEP

Animals 682   1      

Museo Entomológico Piedras Blancas

MEPB Caja de Compensación Familiar - COMFENALCO ANTIOQUIA

Animals 16.000   1      

Colección Viva Programa de Ofidismo/Aracnidismo Universidad de Antioquía: Ofidios - Reptiles

COLVIOFAR Universidad de Antioquia

Animals 520   1      

Colección Viva Programa de Ofidismo/Aracnidismo Universidad de Antioquía: Escorpiones - Artropodos

Animals 236   1

Jardín Botánico de Plantas Medicinales del C.E.A.

JB-Medicinales -CEA

Corpoamazonia Plants 1.028   0,97      

Museo de Historia Natural Universidad de la Amazonía

UAM Universidad de la Amazonía

Animals 396   0,3      

Colección de Insectos Universidad del Quindio

ICQ Universidad del Quindío

Animals 6.010   0,3      

Colección Zoológica Viva Centro Estación de Biología Tropical "Roberto Franco" CEBTRF

CEBTRF Centro Estación de Biología Tropical "Roberto Franco" Facultad de Ciencias, Universidad Nacional

Animals 616   1      

Collection name Acronym Observational records

Data systematization

Data available Online? URL

Standards & Protocols

Hardware and Software

Staff Internet Access Access to Information

Herbario Amazónico Colombiano COAH

COAH   Access 97, ArcView 3.2, CDS-ISIS ver. 3.07

http://www.sinchi.org.co/herbario.php?page=servicios&opcion=herbario&subopcion=coleccion

Compatible with standard RRBB SIB

       

CALT CAL   Biota

Instituto Alexander von Humboldt

IAvH Systematized in the same database

Access-VBA, SQL Server. Fishes and insects (except butterflies) are still in Excel and may be incorporated soon

Butterflies: http://www.siac.net.co/sib_descargas.php?ArchivoDesplegado=635

Standard for documentation of biological records, version 5.0, XML

  insufficient dedicated Unrestricted (except for sensitive data on endangered spp.)

Herbario Federico Medem IAvH

FMB   Access-VBA, SQL Server

Not yet. Soon at www.siac.net.co/sib

Standard for documentation of biological records, version 5.0, XML

  insufficient dedicated Unrestricted (except for sensitive data on endangered spp.)

Colección de Zoología

ICN   Spica         dedicated restricted

Herbario Nacional Colombiano

COL   Spica http://aplicaciones.virtual.unal.edu.co/colecciones/datos/herbario/consultasHerbario.jsp

Compatible with standard RRBB SIB

  Understaffed. Lacking professionals and assistants to process the data. Investments needed

dedicated unrestricted

Museo Micológico - Hongos fitoparásitos

MMUNM   Data not systematized

           

Museo Entomológico "Francisco Luis Gallego"

MEFLG   Specify http://www.unalmed.edu.co/%7Ementomol/

         

Herbario Pontificia Universidad Javeriana

HPUJ   Excel     Tools required for systematization not yet defined

Scarce economic resources. Curators need more time

  restricted

Museo Javeriano de Historia Natural Lorenzo Uribe s.j

MPUJ   Excel and ArcView

           

Herbario Nacional de Malezas

HNM   Not specified            

Herbario Gabriel Gutierrez Villegas (MEDEL)

MEDEL   FoxPro 4.0; BRAHMS 5 to be implemented

           

Herbario de la Orinoquía Colombiana

Llanos                

Herbario Ciat CIAT   Oracle

Jardín Botánico José Celestino Mutis

JBJCM   Access http://www.jbb.gov.co/web/home.php?pag=products

         

Herbario Forestal Universidad Distrital Francisco José de Caldas

UDBC   Access platform 50% and ArcView 3.2 (12000 spec. in 6 books)

           

Herbario Universidad de Antioquia

HUA   Excel   Compatible with standard RRBB SIB

adequate insufficient    

Herbario José Cuatrecasas Arumi (VALLE)

VALLE   Access            

Herbario Jardín Botánico "Joaquín Antonio Uribe"

JAUM   Arkas, Excel, Biotica

    adequate insufficient 100 Kbps  

Herbario CUVC CUVC   Excel

Colección Laboratorio de Limnología Universidad de Antioquia

CLUA   Excel            

Colección Entomológica Universidad de Antioquia

CEUA   Excel            

Vectores y Huéspedes Intermediarios de Enfermedades Tropicales

VHET   Excel, Word            

Museo del laboratorio de Entomología

MENT-UT   In progress            

Laboratorio de Investigación de Abejas - Labun

LABUN   FileMaker, Excel

           

Museo de Historia Natural Universidad del Cauca

MHN-UC   Excel            

Entomológica Forestal Universidad Distrital Francisco José de Caldas

EF-UDFJC   Excel            

Museo Historia Natural Universidad Distrital

MUD   Excel            

Colección de Artrópodos de Importancia Médica

UVS Birds: Threskiornithidae: Mesembrinibis cayennensis in Putumayo

  No   PC with adequate programs

Staff to digitalize the data

  unrestricted

Museo de Historia Natural Universidad Pedagógica Nacional

MHNUPN   The collection manager

           

Colección Biológica U.D.C.A.

UDCA   Excel            

Colección Zoológica de Referencia Científica "IMCN"

IMCN   Excel   Standard RRBB SIB

insufficient insufficient   unrestricted

Colección de Insectos Acuáticos de Colombia CIB

CIACIB   FileMaker (MAC)

           

Colección de Mosquitos de Colombia (CIB)

CMCCIB   FileMaker (MAC)

           

Instituto Nacional de Salud

INS   Access            

Colección Taxonómica Nacional de Insectos "Luis María Murillo"

CTNI   FoxPro            

Museo Entomológico "Marcial Benavides" ME"MB"   Excel

Colección Familia Constantino-CFC

CFC   Catalog, listing by families

           

Colección Entomología: Hemiptera Acuáticos

PC                

Hans W. Dahners PETALUDA   Excel

Colección Piéridos de Colombia Rodrigo Torres Nuñez

CPCRTN   The collection manager

           

Colección Jean Francois le Crom

JFLC                

Colección Personal Angela Amarillo

CPAA                

Colección Personal Carlos Sarmiento

CPCS                

Colección "Da Ros" C"DR"   Access, Word, Excel

           

Colección Entomológica Universidad Nacional Sede Palmira

CEUNP   Access            

Colección de Insectos asociados a plantaciones forestales de Colombia

CONIF   Shared database

           

Herbario TULV - Jardín Botánico Juan María Céspedes

TULV   Access            

Serpentario de la Universidad de Antioquia

SUA   Access 100%

Museo de Historia Natural "Luis Gonzalo Andrade"

UPTC                

Museo de Entomología de la Universidad del Valle

MUSENUV   Arkas 2000            

Vertebrados-Aves     Access

Museo Entomológico Facultad de Agronomía

UNAB                

Colección de vertebrados, anfibios y reptiles

UV-C   Arkas, Amphibia 70%, Reptilia 40%

           

Herbario "Armando Dugand Gnecco"

DUGAND   Access            

Colección Efraín Henao

CEH   WinISIS

Colección de Vertebrados e Invertebrados MHN-Uca

Colección Familia Pardo Locarno

CFPL                

Vertebrados e Invertebrados

MHNCC   Excel            

Banco de Cepas y Genes, Instituto de Biotecnología, Universidad Nacional de Colombia

IBUN   Access            

Colección Microorganismos de CENICAFE

CEN   Excel            

Colección de Referencia (Moluscos)

CRM-UV   Excel            

Procedencias de Trichanthera gigantea (H. & B.) Nees

FCISSPA   Excel            

Jardín Botánico Juan María Cespedes

JBJMC   BGRecorder            

Fundación Zoológico Santacruz

FZS   ICOZOO (clinical histories)

           

Piscilago Zoo PZ   Excel, ICOZOO

Jardín Botánico "Alejandro von Humboldt"

JBAVH   Access            

Hongos, Univalle UV-mico

Colección de Microorganismos

M-UBCB   Access            

Secretaria de Agricultura - Antioquia

SA.A   Excel            

Parque Zoológico Santa Fe

PZSF   Excel 40%, Word 10%

           

Jardín Botánico "Joaquín Antonio Uribe"

JAUM -JB   BGRecorder            

Colección de Ciencias Naturales

MUA   Not specified            

Xiloteca X-UNCM   Access

Zoológico de Barranquilla

ZOOBAQ   ARKS 4.0-ISIS: records; ZOOTRITION: nutritional information; ICOZOO: medical reports

           

Banco de Germoplasma de Microorganismos de Interés en Agricultura

CCoM   Excel and Access

           

Jardín Botánico José Celestino Mutis

JBJCM   BGRecorder            

Colección de Microbiología - CIMIC

CIMIC   Word            

Museo de la Salle M.L.S. EN ZOOLOGIA - B.O.G EN BOTANICA

  Access            

Jardín Botánico de Popayan

JBP   Excel, Access, BGRecorder 2. J.B.P.

           

Cepario Corpogen CG   FileMaker

Colección Malacofaunica Terrestre de la Facultad de Ciencias de la UNMG

UMNG-MT   Excel            

Colección Entomológica de la Facultad de Ciencias de la Universidad Militar Nueva Granada

UMNG-Ins   Excel            

Zoológico de Cali FZC   ARKS-ISIS and others to be implemented

           

Fundación Centro de Primates

FCP   Excel            

Museo Entomológico Piedras Blancas

MEPB   Excel            

Colección Viva Programa de Ofidismo/Aracnidismo Universidad de Antioquía: Ofidios - Reptiles

COLVIOFAR

               

Colección Viva Programa de Ofidismo/Aracnidismo Universidad de Antioquía: Escorpiones - Artropodos

                 

Jardín Botánico de Plantas Medicinales del C.E.A.

JB-Medicinales -CEA

  BGRecorder            

Museo de Historia Natural Universidad de la Amazonía

UAM   Access, Excel            

Colección de Insectos Universidad del Quindio

ICQ   Access            

Colección Zoológica Viva Centro Estación de Biología Tropical "Roberto Franco" CEBTRF

CEBTRF   Tracker Ce Brain Id (includes a tool for automatic identification) 100%

           
