Add limit to total number of fields in mapping by yanjunh · Pull Request #17357 · elastic/elasticsearch (original) (raw)

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

@yanjunh

This is to prevent mapping explosion when dynamic keys such as UUID are used as field names. index.mapping.total_fields.limit specifies the total number of fields an index can have. An exception will be thrown when the limit is reached. The default limit is 1000. Value 0 means no limit. This setting is runtime adjustable

@yanjunh

This is to prevent mapping explosion when dynamic keys such as UUID are used as field names. index.mapping.total_fields.limit specifies the total number of fields an index can have. An exception will be thrown when the limit is reached. The default limit is 1000. Value 0 means no limit. This setting is runtime adjustable

@yanjunh

@s1monw

I wonder if we really need to make 0 a special value, why can't folks just use Long.MAX_VALUE or any high number? I also wonder if this PR can reject documents that have been accepted previously on the primary and trigger a mapping update on the replica due to some cluster state delays. This might also reject mapping merges coming from the master due to concurrent indexing? @jpountz should know more here, in any case I guess we need to make this check higher up the stack I suspect.

jpountz

private void checkTotalFieldsLimit(long totalMappers) {
long allowedTotalFields = indexSettings.getValue(INDEX_MAPPING_TOTAL_FIELDS_LIMIT_SETTING);
if (allowedTotalFields > 0 && allowedTotalFields < totalMappers) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like Simon suggested, I would just do: if (allowedTotalFields < totalMappers) {. If someone needs to have many fields anyway and understands the issue, (s)he can still set index.mapping.total_fields.limit=1000000 for instance.

@jpountz

@simonw The code has the check under the following if statement: if (reason == MergeReason.MAPPING_UPDATE). This means that the mapping is being merged due to a call to the update mapping API. So only the master node performs these checks, and only when there is a call to the update mapping API. In all other cases, like on data nodes or on the master node when it restores mappings from disk, the MergeReason would be MergeReason.MAPPING_RECOVERY so the check will be skipped. So I think we are fine?

@s1monw

@jpountz good can we add a comment there?

@jpountz

I will do it in a separate commit.

@jpountz

@s1monw

jpountz pushed a commit that referenced this pull request

Mar 29, 2016

@yanjunh @jpountz

This is to prevent mapping explosion when dynamic keys such as UUID are used as field names. index.mapping.total_fields.limit specifies the total number of fields an index can have. An exception will be thrown when the limit is reached. The default limit is 1000. Value 0 means no limit. This setting is runtime adjustable

Closes #11443

@jpountz

@yanjunh It was so close that I applied the changes I suggested and then merged. I hope you don't mind. 361adcf

@yanjunh

@jpountz Not at all. Thanks for merging this feature.

imotov added a commit that referenced this pull request

Mar 29, 2016

@imotov

…onIT#testDynamicUpdates

With restriction for the total number of fields introduced in #17357 this test can fail if a large number of records is randomly selected for indexing.

@segalziv

@clintongormley

@segalziv no, this is a breaking change so we'll keep it in 5.0 only

@segalziv

@thundercrawl

@yanjunh Is this limitation is a hard value for ES usage ? because the setting will let us think it a red line, and in current USE CASE ,we have documents exceed 3000 fields, so we have to divide the document to different index? is there any bad things to exceed 10000 fields?

@yanjunh

@thundercrawl The default limit is rather conservative. Depending on the resource (especially memory) ES nodes have, you can config much large limit. In practice, I have seen people using 10000 without problem.

@thundercrawl

@yanjunh Great thanks, that helpful for us to make a design

Labels