Evaluating the Sustainability Performance of Typical Conventional and Certified Coffee Production Systems in Brazil and Ethiopia Based on Expert Judgements

Materials and Methods

The first two subsections describe the procedure for selecting typical production systems. The third introduces the SMART-Farm Tool and how it was applied in this study. We then explain the selection of experts and the data collection process. Finally, the last subsection lays out the applied uncertainty analysis.

Typical Farm Theory

We define a typical farm as a farm type that represents the mode of the distribution of farms according to defined classification criteria. The farm representing the mode is the one found most often on the ground. It therefore differs from an average farm, which represents the mean of all farms and rarely exists in reality. We consider defining a typical farm through the mode to be more meaningful in the context of sustainability assessments, and we give a more extensive justification in the discussion of methods. Classification criteria are the defined geographical area, the relevant farm enterprises, and resource endowments (Feuz and Skold, ). Further classification criteria relevant for this study are defined and discussed in the following subsection. The concept of typical farms has been used in many studies as a basis for, e.g., policy assessments (Häring, ; Thünen Institute, 2016; Reidsma et al., 2019).
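The difference between the two definitions can be illustrated with a short Python sketch. The farm sizes below are invented for illustration and are not data from this study. The sketch only shows that in a right-skewed distribution the mean need not correspond to any farm that actually exists, whereas the mode always does.

```python
from collections import Counter
from statistics import mean

# Hypothetical farm sizes in hectares, invented for illustration only:
# many small farms and a few very large ones (right-skewed distribution).
farm_sizes_ha = [0.5, 0.5, 0.5, 0.5, 1.0, 1.0, 1.5, 5.0, 20.0, 150.0]


def typical_farm_size(sizes):
    """Return the mode: the farm size that occurs most often."""
    value, _count = Counter(sizes).most_common(1)[0]
    return value


average = mean(farm_sizes_ha)               # the "average farm"
typical = typical_farm_size(farm_sizes_ha)  # the "typical farm"

print(f"mean farm size: {average:.2f} ha")
print(f"modal farm size: {typical} ha")
print("a farm of average size exists in the sample:", average in farm_sizes_ha)
```

In practice, the classification criteria are often categorical (region, enterprise, certification status). `Counter.most_common` works the same way on tuples of such categories.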

Definition of Typical Coffee Production Systems in Selected Regions in Ethiopia and Brazil

The authors defined typical organic and non-certified coffee production systems and the system boundaries of Ethiopian and Brazilian coffee production according to the following classification criteria. The geographical areas of interest where the typical systems are situated in the two countries were defined as the sub-national region(s) where most coffee is produced. These were first determined from the literature (sources are indicated in subsection Typical farms in Ethiopia and Brazil) and then supplemented and verified by experts (the definition and selection of experts are described in subsection Selection of experts). Furthermore, we identified typical characteristics regarding technology constraints, resource access, and management, and validated these through literature and expert interviews as well. The following decision criteria were chosen to identify an organic and a non-certified coffee production system:

SMART-Farm Tool

The SMART-Farm Tool was used as follows: all SMART indicators relevant to the respective typical systems were rated with a performance score reflecting the degree of goal achievement. This was done with the help of scientific literature and expert judgements (see subsection Selection of experts). Each indicator has an assigned weight according to its importance for the subtheme. The weights were defined by a panel of 67 experts from 21 countries using a Delphi process (Schader et al., 2019). Most indicators are relevant for several subthemes, with varying importance. Weights are expressed as percentages and can be either positive or negative. The indicator achievements were assessed on different qualitative (binary and ordinal) and quantitative (numerical and percentage) benchmarks, which were then translated into percentages. For example, the indicator “Does the farm have adequate savings to cater for its cash needs?” is answered qualitatively (No, Partly, Yes), and the answer is translated into a percentage rating (0, 50, 100%). An example of a quantitative indicator is “What proportion of the agricultural area does not receive synthetic chemical fungicide applications?”, which is rated directly in percent of agricultural area.
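The translation step can be sketched in Python as follows. The sketch assumes a fixed three-level scale as in the savings example. The dictionary and function names are hypothetical and are not part of the SMART-Farm Tool.

```python
# Hypothetical encoding of a three-level ordinal answer scale
# (No / Partly / Yes -> 0 / 50 / 100 %), as in the savings example above.
ORDINAL_TO_PERCENT = {"No": 0.0, "Partly": 50.0, "Yes": 100.0}


def indicator_score(answer):
    """Translate one indicator answer into a percentage score (0-100).

    Strings are treated as ordinal answers; numbers are treated as
    quantitative indicators already expressed in percent, e.g. the share of
    agricultural area without synthetic fungicide applications.
    """
    if isinstance(answer, str):
        return ORDINAL_TO_PERCENT[answer]
    value = float(answer)
    if not 0.0 <= value <= 100.0:
        raise ValueError(f"percentage out of range: {value}")
    return value


print(indicator_score("Partly"))  # ordinal answer
print(indicator_score(85))        # quantitative answer, % of area
```

A binary indicator fits the same scheme using only the "No" and "Yes" entries.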

To calculate the degree of goal achievement of each subtheme in percent, the indicator achievements were multiplied by their weights, summed up, and divided by the sum of all possible achievements. The following notation is used:

x: index of farms

i: index of subthemes

n: index of indicators

DGA: Degree of goal achievement

IM: Impact (weight) of indicator n on subtheme i

IS: Performance (score) of farm x with respect to indicator n
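Using this notation, the aggregation described above can be sketched as the following equation. It is a reconstruction from the verbal description, not the originally typeset formula. It assumes that the scores IS are expressed in percent (0 to 100%). Under that assumption, dividing the weighted sum by the maximum achievable weighted sum and rescaling to percent reduces to a weighted mean over the indicators n relevant to farm x. How negative weights enter the denominator is not specified here.

```latex
DGA_{x,i} = \frac{\sum_{n} IM_{n,i} \cdot IS_{x,n}}{\sum_{n} IM_{n,i} \cdot 100\,\%} \cdot 100\,\%
          = \frac{\sum_{n} IM_{n,i} \cdot IS_{x,n}}{\sum_{n} IM_{n,i}}
```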

Results are categorized as follows: 0–20% insufficient, 20–40% limited, 40–60% moderate, 60–80% good, and 80–100% best sustainability performance (FAO, ).
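The calculation and the categorization can be combined in a minimal Python sketch. It assumes non-negative weights that do not sum to zero and scores in percent. It treats a value lying exactly on a category limit (e.g., 20%) as belonging to the lower category, a convention assumed here because the text lists overlapping limits. The weights and scores in the example are hypothetical.

```python
def degree_of_goal_achievement(weights, scores):
    """Weighted mean of indicator scores for one farm x and one subtheme i.

    weights: {indicator_id: IM_{n,i}}, impact of indicator n on subtheme i (%)
    scores:  {indicator_id: IS_{x,n}}, performance of farm x on indicator n (%)
    Only indicators rated for the farm (keys of `scores`) are included.
    Assumes the relevant weights do not sum to zero.
    """
    relevant = [n for n in weights if n in scores]
    achieved = sum(weights[n] * scores[n] for n in relevant)
    # Dividing by the summed weights equals dividing by the maximum achievable
    # sum (every score at 100 %) and rescaling the result to percent.
    possible = sum(weights[n] for n in relevant)
    if possible == 0:
        raise ValueError("no relevant indicators with non-zero total weight")
    return achieved / possible


# Upper limits of the performance categories given in the text; a value on a
# limit (e.g. exactly 20 %) is assigned to the lower category (assumption).
CATEGORIES = [(20, "insufficient"), (40, "limited"), (60, "moderate"),
              (80, "good"), (100, "best")]


def categorize(dga):
    """Map a degree of goal achievement (0-100 %) to its category."""
    if not 0 <= dga <= 100:
        raise ValueError(f"degree of goal achievement outside 0-100 %: {dga}")
    for upper, label in CATEGORIES:
        if dga <= upper:
            return label


# Hypothetical weights (IM, in %) and scores (IS, in %) for one subtheme.
weights = {"A": 60, "B": 30, "C": 10}
scores = {"A": 100.0, "B": 50.0, "C": 0.0}
dga = degree_of_goal_achievement(weights, scores)
print(dga, categorize(dga))
```

Indicators that are not relevant to a farm are simply absent from `scores`. This mirrors the principle that a subtheme's goal achievement is always set in relation to the indicators included.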

Selection of Experts

According to Mieg and Näf (2005), an expert is a specialist in a certain area of knowledge with several years of experience, whose knowledge is not transferable to another area and does not depend predominantly on personal skills such as intelligence or memory. In this study, an expert was further defined as an advisor or researcher with a sufficient overview of the heterogeneity of farms in the respective country or production area to be able to rate the SMART indicators. Experts were identified through relevant scientific literature, i.e., as authors of scientific publications on coffee production. Extension services and development agencies working with coffee producers were also included. Experts were further asked to name other experts in their field (snowball sampling). This is a crucial part of the methodology: it provides first-hand information on who belongs to the circle of experts in the respective area and minimizes the chance of leaving out relevant interview partners (Bogner et al., ).
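Snowball sampling can be pictured as a breadth-first traversal of a referral network. Starting from seed experts, each newly named expert is added to the list of contacts once. The Python sketch below is a hypothetical illustration of that idea. The referral network and names are invented, and the optional cap on the number of contacts is an added assumption, not part of the study's procedure.

```python
from collections import deque


def snowball_sample(seeds, referrals, max_experts=None):
    """Breadth-first snowball sampling over an expert referral network.

    seeds:     experts identified from literature, extension services, etc.
    referrals: {expert: [experts they name]} (hypothetical input format).
    Returns experts in the order they are reached; each is contacted once.
    """
    contacted = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and (max_experts is None or len(contacted) < max_experts):
        expert = queue.popleft()
        contacted.append(expert)
        for named in referrals.get(expert, []):
            if named not in seen:  # avoid contacting the same person twice
                seen.add(named)
                queue.append(named)
    return contacted


# Hypothetical referral network; names are placeholders, not study participants.
referrals = {
    "author_A": ["advisor_B", "researcher_C"],
    "advisor_B": ["researcher_C", "advisor_D"],
    "researcher_C": ["author_A"],
}
print(snowball_sample(["author_A"], referrals))
```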

The section of the study that involved human participants was performed in accordance with all relevant institutional and national ethical guidelines. Approval by an ethics committee was not required in accordance with Swiss law. Informed consent was obtained from respondents in accordance with section 32 of REGULATION (EU) 2016/679 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data (General Data Protection Regulation).

Data Collection Through Literature Review and Expert Interviews

A literature review of scientific and gray literature, as well as governmental reports and databases, was carried out to obtain an overview of existing data relating to SMART indicators. This review provided most of the information on the typical production systems and some SMART indicator ratings. All information not obtained in this way was collected through expert interviews. In total, 26 experts were interviewed. For Ethiopia, one livestock expert, seven natural scientists, one social scientist, two economists, one coffee exporter, and three agricultural advisors were interviewed. In the case of Brazil, six agronomists, two economists, and three agricultural advisors were interviewed. Each expert rated only the indicators and farm types within his or her field of expertise.

To assess the overall sustainability of the chosen typical production systems, the SMART-Farm Tool Version 4.0 was used. As typical production systems were assessed with the help of expert interviews rather than through individual farm assessments, indicator ratings were in some cases expressed as distributions rather than as precise values. This accounted for variation within the defined typical production systems. For example, an expert might judge that the proportion of a coffee farmer's arable land devoted to legumes realistically lies between 10 and 30%, with all values in this range equally probable. The underlying distribution of this rating is thus uniform. Additionally, some indicators were rated by more than one expert, which can also result in a rating expressed as a distribution. An example is a discrete distribution if one expert rates an indicator at 40% and another at 50%.
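In a Monte Carlo setting, such ratings can be represented as sampling functions. The following Python sketch uses only the standard library to draw from the two kinds of distribution described above. The function names and the equal weighting of experts are assumptions for illustration, not a description of the @RISK set-up used in the study.

```python
import random


def sample_uniform_rating(low, high, rng):
    """One expert gives a range: every value in [low, high] is equally likely."""
    return rng.uniform(low, high)


def sample_discrete_rating(expert_ratings, rng):
    """Several experts give point ratings: each rating is drawn with equal
    probability (equal weighting of experts is an assumption of this sketch)."""
    return rng.choice(expert_ratings)


rng = random.Random(42)  # fixed seed so the draws are reproducible
legume_share = [sample_uniform_rating(10.0, 30.0, rng) for _ in range(1000)]
two_experts = [sample_discrete_rating([40.0, 50.0], rng) for _ in range(1000)]

print(f"legume share: {min(legume_share):.1f}-{max(legume_share):.1f} %")
print("two-expert rating takes the values:", sorted(set(two_experts)))
```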

Uncertainty Analysis

In addition to the uncertainty ranges provided by the expert scores (termed “basic uncertainty”), the uncertainty of the underlying scores themselves was also estimated (termed “data uncertainty”). This uncertainty was quantified and analyzed for each indicator score. To do this, we evaluated the quality of each rating, whether obtained from experts or from literature, and defined an uncertainty distribution for each rating. The parameters of these distributions were based on a pedigree matrix approach. The term pedigree matrix is used because the data quality indicators describe the origin of the information, much as a genealogical table documents the pedigree of a person. The matrix comprises five independent criteria: “reliability,” “completeness,” “temporal correlation,” “geographical correlation,” and “further technological correlation.” Each criterion is divided into five quality levels, scored from one to five. For each score, a normal uncertainty distribution with a mean of zero and a variance based on expert judgement is assigned. This distribution can then be added to the basic uncertainty of each indicator (Weidema et al., 2013). As SMART ratings lie on a scale from 0 to 100%, the normal distributions were truncated at these boundaries. In this study, only the criteria “reliability,” “temporal correlation,” and “geographical correlation,” and within these only a selection of quality levels, were relevant. The other criteria were omitted to avoid unnecessarily high variation (see Table 1).

Quality score (uncertainty parameter) | 1 (0) | 2 (0.019) | 4 (0.089) | 5 (0.2)
Reliability | Verified data based on measurements | N/A | Qualified estimate | N/A
Temporal correlation | Less than 3 years of difference to the time period of the dataset | Less than 6 years of difference to the time period of the dataset | N/A | Age of data unknown or more than 15 years of difference to the time period of the dataset
Geographical correlation | Data from area under study | Average data from larger area in which the area under study is included | N/A | N/A

Adapted pedigree matrix from Weidema et al. (2013).
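The following Python sketch shows how a pedigree-based data-uncertainty term might be layered on top of a basic-uncertainty draw. It makes three assumptions. First, the tabulated parameters are treated as standard deviations on a 0–1 score scale; if they denote variances, their square roots would be used instead. Second, the relevant criteria act as independent zero-mean normal terms whose variances add. Third, truncation at 0 and 1 is implemented by resampling. It is an illustration, not the study's @RISK model.

```python
import math
import random

# Uncertainty parameter per quality score as listed in Table 1. Treating these
# values as standard deviations on a 0-1 score scale is an assumption of this
# sketch; if they denote variances, use their square roots instead.
SIGMA_BY_SCORE = {1: 0.0, 2: 0.019, 4: 0.089, 5: 0.2}


def combined_sigma(quality_scores):
    """Standard deviation of the summed data-uncertainty terms.

    quality_scores maps each relevant criterion to its quality score, e.g.
    {"reliability": 4, "temporal": 2, "geographical": 1}. Independent
    zero-mean normal terms add up to a normal term whose variance is the
    sum of the individual variances.
    """
    return math.sqrt(sum(SIGMA_BY_SCORE[s] ** 2 for s in quality_scores.values()))


def sample_with_data_uncertainty(basic_value, quality_scores, rng, max_tries=10_000):
    """Add zero-mean normal noise to one basic-uncertainty draw (0-1 scale)
    and truncate the result to [0, 1] by resampling rather than clipping."""
    sigma = combined_sigma(quality_scores)
    for _ in range(max_tries):
        value = basic_value + rng.gauss(0.0, sigma)
        if 0.0 <= value <= 1.0:
            return value
    raise RuntimeError("no draw inside [0, 1]; check basic_value and sigma")


rng = random.Random(7)
quality = {"reliability": 4, "temporal": 2, "geographical": 1}
# Basic uncertainty: a uniform 10-30 % rating, expressed on a 0-1 scale.
draws = [sample_with_data_uncertainty(rng.uniform(0.10, 0.30), quality, rng)
         for _ in range(1000)]
print(f"combined sigma: {combined_sigma(quality):.4f}")
print(f"range of draws: {min(draws):.3f}-{max(draws):.3f}")
```

Resampling keeps the shape of the normal distribution inside the bounds. Clipping would instead pile probability mass onto exactly 0% or 100%.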

The distribution of the degree of goal achievement of each subtheme was calculated with Monte Carlo uncertainty analyses (Rubinstein and Kroese, 2011) with 1,000 iterations, using the @RISK add-in for Excel. This procedure shows the range of possible outcomes of a scenario together with the probability of their occurrence (Palisade, 2016). Unless otherwise specified, section Results and discussion reports only differences in favor of one scenario that occur in at least 950 of the 1,000 simulation runs (i.e., p < 0.05).
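The reporting rule can be sketched in Python as follows. The subtheme distributions are hypothetical normal distributions chosen for illustration. The sketch also simplifies by drawing the two systems independently in each run. In a full model, both systems would be computed from the indicator draws of each iteration, which may introduce correlation between them.

```python
import random


def count_wins(sample_a, sample_b, n_runs=1000, seed=0):
    """Monte Carlo comparison of two systems for one subtheme.

    sample_a, sample_b: functions taking a random.Random and returning one
    simulated degree of goal achievement (%) for system A or B.
    Returns (runs where A > B, runs where B > A); ties count for neither.
    """
    rng = random.Random(seed)
    a_wins = b_wins = 0
    for _ in range(n_runs):
        a, b = sample_a(rng), sample_b(rng)
        if a > b:
            a_wins += 1
        elif b > a:
            b_wins += 1
    return a_wins, b_wins


def reported_difference(a_wins, b_wins, threshold=950):
    """Reporting rule from the text: one system must be ahead in at least
    950 of the 1,000 runs for a difference to be reported."""
    if a_wins >= threshold:
        return "A > B"
    if b_wins >= threshold:
        return "B > A"
    return "no reported difference"


# Hypothetical subtheme results (mean and spread in %), for illustration only.
clear = count_wins(lambda r: r.gauss(70, 5), lambda r: r.gauss(50, 5))
unclear = count_wins(lambda r: r.gauss(60, 10), lambda r: r.gauss(58, 10))
print(clear, reported_difference(*clear))
print(unclear, reported_difference(*unclear))
```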

Results and Discussion

Typical Farms in Ethiopia and Brazil

In the case of Ethiopia, four types of coffee production systems were distinguished: forest, semi-forest, garden, and plantation coffee production systems (Tefera and Tefera, 2013). The semi-forest and garden coffee cropping systems are the most relevant in Ethiopia, accounting for 50 and 40 percent of overall production, respectively (Gole, ). These two systems are also the most prevalent under organic certification, and their certified and conventional variants differ only slightly. Only these two systems were therefore considered here; they are described in detail in Table 2.

Types Classification criteria Type 1 and 3: Ethiopian garden conventional and certified small producer Type 2 and 4: Ethiopian semi-forest conventional and certified small producer Type 5: Brazilian conventional large producer Type 6: Brazilian certified small producer
VSS • None, member of non-certified cooperative (Type 1 and 2) • Organic and Fairtrade, member of certified cooperative (Minten et al., 2015) (Type 3 and 4) • Conventional • Most are organized in cooperatives (Carvalho, ) • Organic and Fairtrade certified • Organized in certified cooperatives (Pedini, 2016)
Coffee species Coffea arabica: • Wild growing seedlings are collected and redistributed (Gole, ) • Landraces, partly coffee berry disease resistant (Hylander, ) • Seedlings grown in own nursery, bought or received from government (Jena et al., ) • Coffea arabica • Propagation: Often own seedling nursery if 100 hectares or bigger (Resende, 2016; Carvalho, ) • Coffea arabica • Propagation: Bought-in seedlings from local markets (Pedini, 2016)
Specifications of the geographical area of interest • Oromia and the Southern Nations, Nationalities, and Peoples' Region (SNNPR) in the South and West of the country (Tefera and Tefera, 2014) • Precipitation between 1,500 and 2,500 mm and temperatures between 15–25°C (Tefera and Tefera, 2014) • Altitude: 1,200–2,750 meters above sea level (Tefera and Tefera, 2014) • Minas Gerais: Sul de Minas (Vilela and Rufino, 2010) • Hilly area, very good growing conditions, good soil, enough rain (Secretaria de Produção e Agroenergia, 2009) • Altitude: 559–2,362 meters above sea level (Vilela and Rufino, 2010)
Farm size • 0.5 ha land (Tefera and Tefera, 2013) • 70% coffee area, 5–20 fruit trees and Ensete, 10–25% vegetable garden (Merdassa, 2016) • Coffee sales are the primary source of income (Merdassa, 2016) • 1–1.5 ha coffee forest, 0.5 ha arable land (Merdassa, 2016) • Plot in the forest, often not directly next to the homestead/agricultural plots, but clear ownership structure (unlike forest coffee) (Minten et al., 2014) • Coffee sales are the primary source of income (Merdassa, 2016) • Big farms (> 50 ha), 150 ha is a typical size (Carvalho, ) • Family farms (<10 ha), 5 ha is a typical size (Pedini, 2016)
Labor • Family labor (Hylander, ) • Common: Informal, reciprocal, unpaid help of neighbors in harvest time (Merdassa, 2016) • Some family child labor (Minten et al., 2015) • Family labor (Merdassa, 2016) • 1–2 hired workers for the harvest season of 2–4 months (Merdassa, 2016) • Some family child labor (Minten et al., 2015) • 20–30 full time employees, 10–70 seasonal workers (Carvalho, ) • Family labor • One to two temporary harvest workers (Pedini, 2016)
Yield • 400–500 kg/ha (Gole, ) • 300–400 kg/ha (Gole, ) • 1,800 kg/ha (Carvalho, ) • 1,200 kg/ha (Moreira, 2016)
Crops and agricultural practices • Mixed agroforestry system (Gole, ) • Shade trees: Fruit trees, Ensete ventricosum, and others (Gole, ) • Manual weeding 2–3 times a year (Gole, ) • Tree species: 5–10 (Gole, ) • Local cereal crops and vegetables as part of agroforestry system (Merdassa, 2016) • Low to no use of chemical pesticides and fertilizers irrespective of certification (Jena et al., ; Minten et al., 2015; Tefera, 2015) • Complete reliance on manual labor (Mekonnen, 2016) • Reduction of natural forest composition to tall tree canopy of few, mostly leguminous shade trees, and coffee layer with limited number of intermediate canopy layers (Gole, ) • Clearing 2 times per year (Gole, ) • Tree species: ~19 (Gole, ) • Arable land: Separate cultivation of local cereals and vegetables (Merdassa, 2016) • Plantation • Specialization in coffee (Pedini, 2016; Resende, 2016) • High level of mechanization as far as possible (Carvalho, ) • Weed control: Herbicides twice a year (Resende, 2016) • Inputs: Fungicides 2–3 times a year against coffee rust, as well as insecticides against coffee berry borer (Resende, 2016) • Mineral fertilizers (NPK) applied 3 times a year (Resende, 2016) • Plantation (Pedini, 2016; Resende, 2016) • Specialization in coffee (Pedini, 2016; Resende, 2016) • Medium level of mechanization: portable harvesting and mowing machines (Moreira, 2016) • Weed control: Mechanical (Moreira, 2016) • Bought-in seedlings, rock phosphate, residues (cake) from Ricinus communis L. oil production, applied 2–3 times per year together with organic material from grasses and coffee pulps (Moreira, 2016) • 30–40% use copper fungicides (Moreira, 2016)
Livestock • Different kinds of livestock (cattle, sheep, goats, equines, and poultry) are kept in the system (Abebe, ) • Included in assessment: 70–80% extensive beef cattle free roaming on communal land (Dermauw, ) • Main use: Cultural reasons, meat for home consumption, and occasionally for sale (Dermauw, ) • Average of 0.25 ha per animal of communal grazing area (Mengistu, 2006) • 2–10 animals, main breeds: Zebu: e.g., Boran, Sheko, and Abigat (Mengistu, 2006) • None • Only a few small animals (chicken, pigs) (Pedini, 2016), not assessed

Typical coffee production systems.

In Brazil, the State of Minas Gerais accounts for about 50% of the country's coffee production (Barbosa et al., ), which is why we chose it as an exemplary study area. In Minas Gerais, coffee from the Cerrado region accounts for 30% of green coffee production and Montanha coffee for around 70% (Vilela and Rufino, 2010). The Montanha farms can be further divided into three groups: small (<10 hectares), medium (10–50 hectares), and large producers (>50 hectares). They are situated in the Minas Gerais highlands, where the terrain makes mechanization and intensification difficult. Nevertheless, the large producers in particular are highly mechanized insofar as the terrain allows. The large-farm type is likely to have the highest production share (an assumption based on the share of area each farm type manages according to Vilela and Rufino, 2010). As a second model system, we assessed a typical organic farm in Minas Gerais. Personal correspondence indicated that this would be represented by a Fairtrade and organic certified Montanha small producer (Pedini, 2016). The two chosen systems are defined in further detail in Table 2.

When these definitions of typical production systems were applied to the SMART-Farm Tool, the number of relevant indicators per production system was as follows: Ethiopian garden systems, 227; Ethiopian semi-forest systems, 241; Brazilian conventional system, 204; Brazilian certified system, 174.

Sustainability Performance in Dimension “Environmental Integrity” at the Subtheme Level

In this and the following three subsections, the degrees of goal achievement of the subthemes are presented. These were calculated with Monte Carlo simulations based on the expert judgements of single indicators. The goal achievement of a subtheme is always set in relation to the indicators included. The values described are the means of the degrees of goal achievement of the subthemes in percent, and the error bars show the standard deviations of the Monte Carlo simulations. To give an overview of the results, the lowest- and highest-scoring subthemes are presented here, and the greatest differences and similarities between the typical systems are highlighted and discussed. A high score indicates good performance, whereas a low score indicates unsatisfactory performance. Additional discussion of subthemes can be found in Appendix I, a detailed description of each subtheme and its goals in Appendix II, and the means of the indicator ratings in Appendix III. In the following, the indicator identity number (ID) is given in parentheses when an indicator rating is described. If no literature source is indicated, the statement is based on expert judgement.

Typical Ethiopian Systems

This subsection first presents the overall performance of the different systems in this dimension and then discusses the highest and lowest scores. Literature suggests that certification can have positive effects on environmental outcomes (Hardt et al., ; Vanderhaegen et al., 2018). In several countries, organic coffee certification reduced the use of chemical inputs and increased the adoption of some environmentally friendly management practices, such as increasing tree cover and habitat conservation (Blackman and Naranjo, ; Jurjonas et al., ; Giuliani et al., ). However, our assessment shows an overall moderate to good performance for all four Ethiopian systems, not only for the certified ones (Figure 2). The four assessed Ethiopian production systems are diversified and mostly extensively managed, with hardly any use of external inputs that could cause contamination, regardless of certification status (Tefera, 2015). Literature describes around 95% of Ethiopian smallholder coffee as organically managed, even though only a small part is formally certified (Tefera, 2015). The use of synthetic pesticides and fertilizers is reported only in very few cases (Jena et al., ; Minten et al., 2015). Fairtrade has only minor requirements regarding the environmental dimension and therefore does not lead to better performance of the certified systems either (FLO, ).

This paragraph explores some exemplary subthemes further. For example, some of the major driving forces (indicators with an impact of over 50% on the subtheme outcome) behind the good performance of all four systems in the subtheme “Genetic Diversity” are the following SMART-Farm Tool indicators: no use of synthetic insecticides or fungicides (Indicator IDs 233, 234), locally adapted livestock breeds (245), no use of genetically modified seeds (519) or hybrids (247), and the cultivation of some rare crop species (223).

However, there are also some low-scoring subthemes. The subtheme “Greenhouse Gases” achieved one of the lowest scores of the Ethiopian systems, indicating that emissions are comparatively high. Positively rated SMART indicators include, for example, no use of synthetic fertilizers or pesticides (Indicator IDs 233, 234), no use of electricity (332), and extensive pasture management (253). Negatively rated indicators with a high impact concern practices prevalent in the typical systems. Examples are the conversion of grassland to arable land (601), burning of bushes and household waste (788), and problematic practices on arable land, such as regular plowing (182), little or no mulching (237), and generally insufficient erosion prevention measures (700). Westengen et al. (2019) have also pointed out insufficient climate change mitigation actions.

Typical Brazilian Systems

In contrast to the Ethiopian systems, the Brazilian systems show large differences in sustainability performance (Figure 3), and this subsection therefore focuses mainly on these differences. Agribusiness is much further developed in Brazil than in Ethiopia (Boddey et al., ). In Brazil, organic certification considerably influences the choice of inputs, and a substantial sustainability improvement is therefore visible in the environmental dimension for the typical certified system. Farm size also plays a role, as the certified farms are smaller and thus less mechanized.

According to expert judgements, the certified Brazilian farmers maintain a green cover throughout the year and apply only organic fertilizers and crop residues. For pest management, they partly rely on coffee rust-resistant varieties and use copper fungicides only in rare cases. These aspects lead to positive indicator ratings and, consequently, good sustainability performance in the environmental dimension.

This stands in contrast to the scores of the Brazilian conventional system, which mostly remained in the moderate category. One of the major reasons for this discrepancy mentioned by experts is the use of synthetic pesticides and fertilizers in the conventional system. Performance is further reduced because the PAN Pesticide Database classifies some of the regularly used substances as chronically toxic, toxic to bees and aquatic organisms, and persistent in soil and water. Experts further noted that heavy machinery, which can cause severe soil compaction, is likely to be used.

This paragraph explores some exemplary subthemes further. The conventional Brazilian system scores lowest in “Waste Reduction and Disposal,” mostly due to generally low recycling rates in the area (335.1, 334, 334.1–334.5) and the occurrence of rather high-risk wastes related to synthetic pesticides and fertilizers (327). Boddey et al. () confirm the excessive use of such substances in conventional Brazilian coffee production, especially where they are applied by machine. Carvalho () point out the environmental risks related to the use of such substances in coffee production, such as pollution of natural resources, and the resulting negative long- and short-term health effects for humans and other living beings. Boddey et al. () present organic agriculture as a solution to these problems in Brazilian coffee production. However, the certified system also has some low-scoring subthemes. One of the lowest scores of the certified Brazilian system is in “Greenhouse Gases.” Indicator ratings with a high impact (40% and more) that cause this moderate rating include no areas of permanent grassland (222) or agroforestry (202) in the system, and no use of fuel made from renewable sources (348) or home-produced fuel (188). In fact, this is the only subtheme in this dimension in which the conventional Brazilian system outperforms the certified system. This is mostly due to positively rated indicators relating to the seedling nurseries used by the conventional systems, which are unlikely to use peat (733). This result is discussed further in section Conclusions.

Sustainability Performance in Dimension “Economic Resilience” on Subtheme Level

Typical Ethiopian Systems

This subsection first discusses the overall performance of the Ethiopian systems (Figure 4). It then highlights some contradictory trends and the major differences between certified and non-certified systems. Some empirical studies (Milford, 2004; Philpott et al., 2007; Dörr, ; Kodama, ) showed that certification could improve returns to smallholder coffee farmers. However, several somewhat later studies indicate that income increases through certification are generally modest (Valkila, 2009; Valkila and Nygren, 2010; Jena et al., ; Ruben and Fort, 2012). In many subthemes, such as “Internal Investment,” “Profitability,” and “Liquidity,” the performance of all Ethiopian systems is not particularly high. This reflects the vulnerability to poverty of Ethiopian smallholders, who are highly dependent on coffee and easily affected by price changes, as described, e.g., in Woubie et al. (2015) and Jena et al. (). In most subthemes, certification does not increase this performance, a finding confirmed by several studies (Jena et al., ; Minten et al., 2015; Woubie et al., 2015). The literature gives three main reasons for these findings. Firstly, cooperatives can only buy a limited amount of coffee due to cash constraints, so farmers sell most of their coffee to private buyers as conventional produce. Secondly, only around a third of the price premium obtained by selling organic produce is passed on to the farmers, because the rest is captured at the cooperative level. This amount is too low to significantly improve the farmers' socio-economic situation. Thirdly, yields of cash crops in Sub-Saharan Africa remain low (Morel et al., 2019).

This paragraph explores some exemplary subthemes further. For example, the driving forces (indicators with a high impact on the subtheme outcome) behind the low score in the subtheme “Liquidity” are the following: the farmers are not able to cater for their cash needs (Indicator ID 770), they have no insurance against natural disasters (156), and coffee is their only income source (158). At the same time, not all subthemes of the Ethiopian systems score low. Four subthemes score well, with “Stability of Supply” scoring highest. The good performance in this subtheme is mostly due to the systems' independence from external inputs (231, 233, 234, 323, 324, 626, 199, 712). Generally, only seeds and seedlings are occasionally bought on local markets or provided by the government or cooperatives; otherwise, the systems are self-sufficient. Jena et al. () corroborate these findings, as the coffee farmers in their study area also use very few inputs.

“Product Information” scores lowest for all conventional systems. The indicators causing this low score concern the lack of certification of sales products (63, 65) and of inputs (4, 5), no direct marketing (141), and generally low transparency (175). Here, the certified systems clearly score higher, mostly due to the certification of the coffee and the resulting improvement in traceability. Minten et al. (2014) elaborate on traceability problems caused by the centralized trading system in Ethiopia. Certified coffee is mostly traded through cooperatives, which can bypass the Ethiopia Commodity Exchange (ECX), so that traceability can be ensured (Minten et al., 2015).

Additionally, “Food Safety” and “Food Quality” show improved sustainability performance in all certified systems compared to the conventional systems. According to experts, this is because measures are more likely to be taken in case of reported contamination (169). Moreover, improved traceability (4, 5, 63, 65) again contributes to better performance.

Lastly, in the subtheme “Stability of Market,” the Ethiopian certified systems score higher than their conventional counterparts. The reasons are improved transparency (175) and somewhat better access to advisory services (703). However, this difference does not occur in at least 950 of the 1,000 simulation runs and is not backed up by literature. Minten et al. (2015) do not find a difference in access to training between organic and Fairtrade certified farmers in the coffee-growing regions of South-West Ethiopia. Jena et al. () report only an insignificant improvement in training access for Fairtrade and/or organic certified farmers compared to non-certified farmers in their study in the Jimma region in South-West Ethiopia. In addition, they highlight the existence of a cooperative effect in Ethiopia rather than a certification effect: if a cooperative functions well, its members profit in terms of access to extension services, credit, and price premiums.

Typical Brazilian Systems

As regards the Brazilian systems, some effects of certification can be seen in subthemes where the certified system scores better than the conventional one (Figure 5). Effects of farm type can be observed in some subthemes in which the large-scale, intensified conventional Brazilian farm type scores best, presumably profiting from economies of scale. For example, “Profitability” shows the best performance for the Brazilian conventional system, which ranks in the category “Good.” It is also the only subtheme that scores substantially higher in the conventional system than in the Brazilian certified system. According to the expert indicator ratings, this higher performance results mainly from the intensive use of synthetic fertilizers and pesticides in the conventional system (e.g., 233, 234), which makes a high yield likely, in contrast to the certified system. De Almeida and Zylbersztajn () show that conventional farmers in Brazilian large-scale coffee production have a profit-oriented farming approach and are likely to obtain good prices for their produce. Smaller farmers are not as well organized with respect to profit optimization. However, the certified system also shows a satisfactory performance in the economic dimension. A study on the socio-economic sustainability of organic coffee farms in Brazil shows that family-run organic farms are the most likely to be socio-economically sustainable. For larger organic farms, by contrast, the high cost of hired labor causes severe economic constraints (Wegner et al., 2013).

As with the Ethiopian systems, the certified Brazilian system scores substantially higher in the subthemes “Product Information,” “Food Safety,” and “Food Quality.” Some of the reasons mentioned by experts are the absence of harmful substances, good storage facilities, good traceability structures (4, 5, 63, 65), and few cases of contamination (169). Lastly, in contrast to the Ethiopian systems, no substantial differences in performance can be seen between the Brazilian systems in the subtheme “Stability of Market.” For both Brazilian systems, indicator ratings show that extension services are sufficiently available (703). Both therefore perform as well as the Ethiopian certified systems, with less variation in the outcome. However, there is evidence that coffee producers in Minas Gerais choose to be certified in order to get better access to extension services (Lemeilleur et al., ).

Sustainability Performance in Dimension “Social Well-Being” on Subtheme Level

Typical Ethiopian Systems

As regards the Ethiopian systems, sustainability performance generally ranges from “insufficient” to “moderate” (Figure 6). Some subthemes score very low, and these are often related to labor conditions. According to expert judgements, labor relations are casual and there is no social security for family workers or hired workers. The subtheme “Freedom of Association and Right to Bargaining” illustrates how this affects performance. The Ethiopian garden coffee system performs lowest in this subtheme. As there is no external labor in the garden system, the indicators relevant to this subtheme relate only to working conditions at suppliers. Because the farmers have no socially responsible procurement strategy (5) and no social certification (65), and source inputs from Ethiopia, where social conditions are potentially problematic according to the International Labor Organization (ILO) (514), this subtheme scores very low. The certified garden coffee system achieves a better score due to the certification of the coffee (65). The semi-forest coffee systems also achieve a substantially improved performance because they include positively rated indicators concerning the bargaining rights of hired workers (442, 442.1).

With regard to “Non-Discrimination,” “Gender Equality,” “Forced Labor,” and “Support to Vulnerable People,” the situation is similar. In the case of “Non-Discrimination,” for example, the negative indicator ratings affecting the performance of this subtheme are the same for all systems, such as the lack of clear ownership rights (456.5) and potentially socially problematic inputs (514). However, both semi-forest systems score higher, as their indicator ratings include a few additional positively influencing indicators regarding external labor [e.g., no harassment of employees or forced external labor on the farm (445)]. The overall low performance of the Ethiopian systems in “Gender Equality” reflects the fact that inequality between men and women in developing countries is particularly high in cash crops such as coffee (Tavenner et al., 2019).

Although the overall performance of the Ethiopian systems is not convincing, some subthemes score well. For example, all Ethiopian systems score highest in the subtheme “Public Health.” Reasons for this good performance include indicator ratings showing little or no use of synthetic pesticides (232, 233, 234), fertilizers, antibiotics (352, 295), or technologies such as GMOs (519), as well as low waste production (e.g., 327).

Typical Brazilian Systems

Regarding the typical Brazilian systems, the certified system scores better in subthemes that relate to measures against discrimination, whereas the non-certified system shows a better performance with regard to labor relations (Figure 7). The good performance of the certified Brazilian system in the subthemes “Support to Vulnerable People,” “Non-Discrimination,” and “Gender Equality” stands in contrast to the conventional Brazilian system, which performs substantially lower in these subthemes. According to expert judgments, this is due to awareness-raising measures by the certified cooperatives that effectively influence farmers' attitudes toward gender equality (455, 456). According to the experts, the cooperatives are likely to take measures against discrimination of women, such as unequal wages. Conventional farms do not have similar measures in place. The Coffee Barometer 2014 highlights the importance of women in coffee production, who make up about 50% of the work force (Panhuysen and Pierrot, 2014). Waltz (2016) emphasizes that there is still a lack of empowerment of women in Southern Brazil, especially in a family farm context. The improvement in sustainability performance of the certified system in this subtheme is supported by evidence from Minas Gerais in Brazil and from Mesoamerica. Nelson and Pound (2009) report in a case study that a Fairtrade certified cooperative in Minas Gerais significantly improved women's participation in decision making at cooperative level through anti-discrimination measures. However, this effect was not observed in some other cases in the area. In addition, a study assessing gender equity among conventional and Fairtrade/organic double certified coffee farmers in Mesoamerica shows that certification brings significant improvements to women's control over farm practices, cash access, and access to network benefits (Lyon et al., 2010).

Like the Ethiopian systems, the certified Brazilian system scores highest in the subtheme “Public Health.” The substantially worse performance of the conventional Brazilian system is mostly a result of indicator ratings indicating its intensive use of synthetic pesticides and fertilizers.

In some labor-related subthemes, the Brazilian conventional system scores either as well as or better than the Brazilian certified system. The latter is true for “Employment Relations,” where the conventional system scores highest. According to expert judgments, conventional large-scale farms in the South of Minas Gerais often employ a number of workers permanently (463.1) and may pay salaries above the minimum wage to skilled laborers (410). By contrast, in the certified Brazilian system, most work is done by the family, and harvest workers are occasionally employed on a temporary basis (463.1). Their labor conditions are also regulated, but less well than on the large farms, because these workers are not permanently employed. All employees in both systems have legally binding contracts (423), regulated working hours (437, 490), bargaining rights (442), and the right to join a union (442.1). De Almeida and Zylbersztajn () support these findings in their study on success factors in the Brazilian coffee agri-chain with a focus on Minas Gerais. They point out that highly mechanized, large-scale coffee farms in Minas Gerais use skilled labor, invest in training, and offer differentiated salaries. They also emphasize that such large farms are more likely to employ workers permanently. This stands in contrast to their characterization of Brazilian small-scale producers, who are reported to rely mostly on family labor and temporarily hired, low-skilled labor. Nevertheless, employees always have legally binding contracts and social insurance, whether permanently employed or not.

Sustainability Performance in Dimension “Good Governance” on Subtheme Level

Typical Ethiopian Systems

The performance of the Ethiopian systems in this dimension mostly ranges between the categories “Insufficient” and “Limited,” as can be seen in Figure 8. As can be seen in Figure 5, the subtheme “Holistic Audits” ranks lowest for the Ethiopian non-certified systems, as almost all of its indicators are rated negatively. According to expert indicator ratings, conventional Ethiopian coffee farmers use none of the options for monitoring the sustainability performance of the farm. For example, no soil samples are taken to determine fertilizer requirements (290), the farmers do not source inputs in a socially or environmentally responsible way (4, 5), and the produce is not certified according to a social or environmental standard (63, 65). The certified Ethiopian systems perform substantially better, as they are inspected internally and occasionally externally according to their certifications (63, 65).

Looking at “Remedy, Restoration & Prevention,” the conventional Ethiopian systems score insufficiently, as minor infringements of the law may occur (53) and no communication or conflict resolution procedures are in place (22, 28). The Ethiopian certified systems perform better owing to somewhat closer supervision through certification procedures. This is reflected in indicator ratings showing that measures to prevent infringements of the law (53), such as extending the agricultural area into a nearby forest, and measures against contamination of produce (169) are in place. According to expert judgements, the Ethiopian semi-forest systems score lowest of all systems in the subtheme “Legitimacy,” as the employment situation in these systems is precarious (e.g., 423, 410).

All Ethiopian systems score the same in the subtheme “Mission Statement,” falling into the category “Limited.” Typical Ethiopian coffee farmers are partly committed to sustainability topics (8) but cannot name specific planned improvements (750). No difference can be found between the certified and the non-certified systems, as farmers are often not aware of their cooperative's certification and its meaning (Jena et al., ).

Typical Brazilian Systems

There are several other subthemes in which the certified system outperforms the non-certified system. For example, “Transparency” scores rather low for the conventional system (Figure 9). Through certification and the resulting requirements for better traceability (165, 63, 65), the certified system is rated substantially better. Another example is the subtheme “Mission Statement,” where the conventional Brazilian system is outperformed by the certified system. This is partly due to the different sizes of the farm types. The SMART-Farm Tool omits some indicators regarding written commitments to sustainability and the publication of such material for smallholders. According to expert indicator ratings, typical certified smallholders do commit themselves verbally to sustainability issues (8). The SMART-Farm Tool assumes that the typical conventional farm, as a larger enterprise, has the resources to issue something like a sustainability report (6, 35). However, the typical conventional farm is not likely to have any such documentation, which explains its very low performance in “Mission Statement.”

The two Brazilian systems score similarly in the subtheme “Remedy, Restoration & Prevention,” and substantially better than the Ethiopian systems. According to expert indicator ratings, both are unlikely to be involved in infringements of the law (53) regarding labor regulations or in conflicts with neighbors (22). Expert indicator ratings also suggest that they take measures in cases of product contamination (169). This also largely explains the performance of the Brazilian systems in the subtheme “Legitimacy,” where both systems obtain the highest degree of goal achievement of all subthemes in this dimension (Category “Good” in the conventional system and even “Best” in the certified system). It shows that, according to expert judgements, both systems are mostly compliant with the applicable national laws and international human rights standards. The certified system scores better because it is unlikely to cause any negative social or environmental impacts, whereas the conventional system may cause some environmental pollution through intensive practices such as pesticide use (21).

Discussion of the Method and Study Limitations

Typical coffee production systems at the country level in Ethiopia and Brazil could be distinguished and successfully assessed with the SMART-Farm Tool. One of the main criteria for selecting the typical farm types was the amount of coffee produced and exported. In Ethiopia, certification status and farm size can be separated well, so certification effects could be estimated. In Brazil, however, farm size and certification are correlated, so we had to choose a certified smallholder system and a conventional large-scale farm. The results should be viewed in this context, and any certification effect should be treated with caution. An advantage of this approach, however, is that it draws a realistic picture of an existing farm type instead of attempting to create an artificial counterfactual that does not reflect the typical situation.

The method of collecting data through expert interviews is generally considered an easy and efficient way to obtain information (Mieg and Näf, 2005; Glaser et al., ; Bogner et al., ).

However, in this study some difficulties arose when discussing the SMART indicators for the typical systems we defined. As the questionnaire was originally designed for the assessment of real farms, experts occasionally found it difficult to give an estimate for a typical farm. In general, extracting data at the level of detail required by the SMART-Farm Tool was challenging. Examples include indicators such as “Settings of combustion motors [How often are the settings of combustion motors of vehicles (e.g., tractor, stapler) and other machineries checked and adjusted (engine, air filter etc.)?]” or “Infringements of the law (In the last 5 years, have there been any cases in which the farm has broken the law? If yes, how serious were they?).” Such questions could be omitted or replaced by more general indicators in future expert judgements.

On the other hand, single farm assessments are often prone to biased answers by farmers, particularly where sensitive issues such as pesticide use, child labor, forced labor, infringements of the law, and gender equality are concerned. Independent experts may have a more objective view. An example is a recently published study by Ssebunya et al. (2019), in which no child labor was found in coffee production systems during sustainability assessments of individual farms using SMART. However, according to Akoyi et al. (), child labor is likely to be prevalent. Interviewing experts can thus yield complementary information for sustainability assessments in which numerous sensitive topics are analyzed, and issues such as social desirability bias, conformity bias, and prestige bias can be circumvented.

Apart from its content, we see a methodological contribution of this paper, as so far very few studies follow such a structured expert-based approach and then validate it with accounts from the literature. Furthermore, the concept of looking at typical farms that represent the mode of a distribution rather than the mean has several advantages. Firstly, the combination of farm characteristics of the typical farms can be observed in the field, so specific recommendations can be made. This stands in contrast to studying an “artificial” average farm, whose modeled combination of farm characteristics cannot be found in the field. In that case, an aggregation bias is likely, as the average farm is assumed to have a larger option space than farms in the field really have (Feuz and Skold, ; Häring, ). Secondly, this approach economizes on the resources used in the study while still yielding relevant results (Feuz and Skold, ). This is especially relevant for sustainability assessments, where a large amount of data in the different dimensions needs to be collected per farm.
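The difference between a modal (typical) farm and an average farm can be illustrated with a minimal sketch. The farm records below are hypothetical and chosen only for illustration; they are not data from this study, and the helper names `typical_farm` and `average_size` are likewise hypothetical. The sketch shows that the most frequent combination of characteristics is an observed farm, whereas the mean farm size need not correspond to any farm that exists.

```python
from collections import Counter
from statistics import mean

# Hypothetical farm records: (farm size in ha, certification status).
# Illustrative only; not data collected in this study.
farms = [
    (2, "certified"), (2, "certified"), (2, "certified"),
    (3, "non-certified"), (2, "non-certified"),
    (150, "non-certified"), (200, "non-certified"),
]

def typical_farm(records):
    """Return the most frequent combination of characteristics (the mode)."""
    combination, count = Counter(records).most_common(1)[0]
    return combination, count

def average_size(records):
    """Return the mean farm size, i.e. the 'average farm' view."""
    return mean(size for size, _ in records)

mode_farm, frequency = typical_farm(farms)
avg = average_size(farms)
observed_sizes = {size for size, _ in farms}

print(mode_farm, frequency)                   # (2, 'certified') 3
print(round(avg, 1), avg in observed_sizes)   # 51.6 False
```

In this toy distribution the modal farm is a 2 ha certified farm that occurs three times, while the "average farm" of about 51.6 ha matches none of the observed farms, which is the aggregation problem described above.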

A promising approach and thus a topic for further research could also be a combination of farm assessments and expert interviews to apply SMART at the sectoral level. Here, the advantages of both approaches could be capitalized on and a robustness check of results could be conducted. In order to efficiently coordinate this process, some farms fulfilling the description of the defined typical farm could be assessed individually.

During the process of data collection, the question of bias caused by the manner of interviewing or the background of the experts arose several times. Bogner et al. () confirm that complete neutrality in expert interviews is methodologically impossible. They argue, e.g., that the interviewer can be perceived by the interviewee in different ways, causing him or her to give differing answers. The measures taken in this study to avoid potential bias were: (1) the number of interviewees from groups with strong political or commercial interests was kept to a minimum, (2) the interview procedure described in the Methods section was followed as an attempt at standardization, and (3) uncertainties in indicator ratings were accounted for through Monte-Carlo simulations. Nevertheless, the neutrality of the answers cannot be entirely ensured due to the nature of the data.

As mentioned in Feuz and Skold (), the selected farm types are not representative. In addition, the data for the indicator ratings were collected qualitatively. This means that the results of this study cannot claim any statistical representativeness, but rather give an overview of perceived typical situations in the field. This needs to be kept in mind when interpreting the results.

The choice of an appropriate sustainability assessment tool is crucial. A wide variety of sustainability assessment methods and tools is now available (FAO, ). The SMART-Farm Tool is based on a multi-criteria assessment approach designed to assess the sustainability performance of a farm at relatively low cost, using data readily available at farm level. Schader et al. (2016) mention that for some subthemes, such as “Energy Use,” “Greenhouse Gases,” and “Profitability,” more quantitative methods such as a life cycle assessment or the calculation of gross margins may be alternative assessment methods. However, they argue that these approaches are costlier, as the data may not be available or farmers may be hesitant to disclose them, especially in the case of economic data. Nevertheless, they also point out that a further in-depth, case-by-case adaptation of the pool of indicators in the SMART-Farm Tool may be advisable. During this research, several areas were identified where improvements would be helpful for future assessments of similar production systems. Examples include indicators that address agroforestry and perennial crop production characteristics in more depth, as well as price spreads and volatility. Additionally, extending the tool from farm level to organizational or processing level may be useful in the coffee production context.

Finally, there is some distortion of results caused by the number of indicators relevant to each system. For example, the conventional Brazilian system scores better than the certified system in the subtheme “Greenhouse Gases.” This is because the former includes indicators accounting for practices in seedling nurseries. However, practices are likely to be similar in the external seedling nurseries from which the certified system buys. This apparent gain in sustainability is thus not real, but rather an artifact of how the method sets the system boundaries. A similar case can be observed when evaluating the Ethiopian systems with and without hired labor. For example, in the subtheme “Freedom of Association and Right to Bargaining,” both systems score the same except for two additional indicators relevant only to the system using hired labor. As a result, the two systems are not easily comparable. This is a model-inherent issue that will need to be addressed in the future, for example by selecting generic indicators relevant to each system for comparison.
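How the number of applicable indicators can shift a subtheme score is shown in the minimal sketch below. It assumes, for simplicity, that a subtheme score is the unweighted mean of indicator ratings on a 0 to 1 scale; the actual SMART aggregation uses indicator weights, and all rating values here are hypothetical. The indicator IDs are taken from the text, but the helpers `subtheme_score` and `common_indicator_score` are illustrative names, not part of the SMART-Farm Tool.

```python
from statistics import mean

# Hypothetical ratings on a 0-1 scale (1 = best). Indicator IDs follow the text;
# values are assumptions. An unweighted mean stands in for SMART's weighted
# aggregation purely for illustration.
shared = {"5": 0.0, "65": 0.0, "514": 0.2}
without_hired_labor = dict(shared)
# The system with hired labor has two extra applicable indicators
# (bargaining rights, right to join a union), both rated positively.
with_hired_labor = {**shared, "442": 0.8, "442.1": 0.8}

def subtheme_score(ratings):
    """Aggregate all applicable indicator ratings of a system into one score."""
    return mean(ratings.values())

def common_indicator_score(ratings, common_ids):
    """Score only on indicators that apply to every compared system."""
    return mean(ratings[i] for i in common_ids)

common = without_hired_labor.keys() & with_hired_labor.keys()

print(round(subtheme_score(without_hired_labor), 2))  # 0.07
print(round(subtheme_score(with_hired_labor), 2))     # 0.36
```

Although both hypothetical systems are rated identically on the shared indicators, the system with two extra positively rated indicators scores much higher. Restricting the comparison to the common indicators (`common_indicator_score`) removes this difference, which corresponds to the idea of comparing systems on generic indicators.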

Overall, it remains a challenge to find a balance between a more individualized approach that reflects the specific characteristics of the described system well and a method that still ensures comparability across many different systems. Here, trade-offs depending on the intended use of the respective sustainability assessment in a specific context cannot be avoided entirely.

Lastly, Monte-Carlo simulations have already been used successfully in a similar context to calculate uncertainty distributions of the SAFA subthemes when the indicator weights are uncertain (Schader et al., 2019). In this study, the method proved helpful for taking into account the variation within the defined typical systems as well as the uncertainty resulting from the respective data source. With regard to this additional uncertainty, Weidema et al. (2013) mention that the uncertainties they estimated may have been understated. Hence, the variation within the subthemes may be even larger than depicted in this study.
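The general idea of propagating uncertain indicator ratings into a subtheme score distribution can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure used in this study or in SMART: each rating is sampled uniformly within a hypothetical uncertainty half-width and clipped to [0, 1], weights are held fixed, and the subtheme score is the weighted mean of the sampled ratings. All values and the helpers `draw_score` and `simulate` are hypothetical; `statistics.quantiles` requires Python 3.8 or later.

```python
import random
from statistics import mean, quantiles

# Hypothetical indicators: (point rating on 0-1 scale, uncertainty half-width, weight).
# Uniform sampling within +/- half-width, clipped to [0, 1], is an illustrative
# assumption, not the distribution used in SMART or in this study.
indicators = [
    (0.8, 0.1, 1.0),
    (0.4, 0.3, 2.0),
    (0.6, 0.2, 1.0),
]

def draw_score(rng, data):
    """One Monte-Carlo draw: sample each rating, return the weighted mean score."""
    total_weight = sum(w for _, _, w in data)
    sampled = (min(1.0, max(0.0, rng.uniform(r - h, r + h))) * w for r, h, w in data)
    return sum(sampled) / total_weight

def simulate(data, n_draws=10_000, seed=42):
    """Return the mean and the 5th/95th percentiles of the simulated scores."""
    rng = random.Random(seed)
    scores = [draw_score(rng, data) for _ in range(n_draws)]
    cuts = quantiles(scores, n=20)  # 19 cut points: first = 5th, last = 95th percentile
    return mean(scores), cuts[0], cuts[-1]

avg, p5, p95 = simulate(indicators)
print(f"mean={avg:.2f}, 90% interval=[{p5:.2f}, {p95:.2f}]")
```

The resulting interval, rather than a single point score, is what allows statements about the probability of variation in a subtheme's outcome. Sampling the weights as well, as in the SAFA application cited above, would be a straightforward extension of `draw_score`.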