TS23.007 17.4.1
19A PFCP based restart procedures
After a PFCP entity has restarted, it shall immediately update all local Recovery Time Stamps and shall clear all remote
Recovery Time Stamps. When peer PFCP entities information is available, i.e. when the PFCP Association is still alive,
the restarted PFCP entity shall send its updated Recovery Time Stamps in a Heartbeat Request message to the peer
PFCP entities before initiating any PFCP session signalling.
Follow-up on [#2048](https://github.com/open5gs/open5gs/pull/2048)
AMF crashes when 'skipInd' field is missing:
```
amf | 03/21 07:45:04.092: [amf] FATAL: [imsi-001010000000000] No skipInd (../src/amf/namf-handler.c:392)
amf | 03/21 07:45:04.092: [amf] FATAL: amf_namf_comm_handle_n1_n2_message_transfer: should not be reached. (../src/amf/namf-handler.c:393)
```
In case of CM_CONNECTED skipInd is not important.
In case of CM_IDLE the proper release would contain skipInd.
<nf>/init.c:<nf>_main() :
ogs_pollset_poll() receives the time of the expiration of next timer as
an argument. If this timeout is in very near future (1 millisecond),
and if there are multiple events that need to be processed by
ogs_pollset_poll(), these could take more than 1 millisecond to
process, so the timer has already expired by the time they are done.
In case that another NF is under heavy load and responds to an SBI
request with some delay of a few seconds, it can happen that
ogs_pollset_poll() adds SBI responses to the event list for further
processing, then ogs_timer_mgr_expire() is called which will add an
additional event for timer expiration. When all events are processed
one-by-one, the SBI xact would get deleted twice in a row, resulting in
a crash.
0 __GI_abort () at ./stdlib/abort.c:107
1 0x00007f9de91693b1 in ?? () from /lib/x86_64-linux-gnu/libtalloc.so.2
2 0x00007f9de9a21745 in ogs_talloc_free (ptr=0x7f9d906c2c70, location=0x7f9de960bf41 "../lib/sbi/message.c:2423") at ../lib/core/ogs-memory.c:107
3 0x00007f9de95dbf31 in ogs_sbi_discovery_option_free (discovery_option=0x7f9d9090e670) at ../lib/sbi/message.c:2423
4 0x00007f9de95f7c47 in ogs_sbi_xact_remove (xact=0x7f9db630b630) at ../lib/sbi/context.c:1702
5 0x000055a482784846 in amf_state_operational (s=0x7f9d9488bbb0, e=0x7f9d90aecf20) at ../src/amf/amf-sm.c:604
6 0x00007f9de9a33cf0 in ogs_fsm_dispatch (fsm=0x7f9d9488bbb0, event=0x7f9d90aecf20) at ../lib/core/ogs-fsm.c:127
7 0x000055a48275b32e in amf_main (data=0x0) at ../src/amf/init.c:149
8 0x00007f9de9a249eb in thread_worker (arg=0x55a483d41d90) at ../lib/core/ogs-thread.c:67
9 0x00007f9de8fd2b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
10 0x00007f9de9063bb4 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
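One way to make a second removal harmless is to let queued events carry
a transaction ID instead of a raw pointer, and to look the ID up before
acting on it. The sketch below is only an illustration of that idea:
xact_t, xact_add(), xact_find() and xact_remove_by_id() are hypothetical
names, not the open5gs SBI API, and this is not necessarily how the
crash was fixed.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_XACT 8

/* Hypothetical transaction record; not the open5gs ogs_sbi_xact_t. */
typedef struct {
    bool in_use;
    unsigned id;
} xact_t;

static xact_t xact_pool[MAX_XACT];
static unsigned next_xact_id = 1;

/* Allocate a transaction and return its ID, or 0 if the pool is full. */
unsigned xact_add(void)
{
    for (size_t i = 0; i < MAX_XACT; i++) {
        if (!xact_pool[i].in_use) {
            xact_pool[i].in_use = true;
            xact_pool[i].id = next_xact_id++;
            return xact_pool[i].id;
        }
    }
    return 0;
}

/* Look up a live transaction by ID; NULL if it was already removed. */
xact_t *xact_find(unsigned id)
{
    if (id == 0)
        return NULL;
    for (size_t i = 0; i < MAX_XACT; i++) {
        if (xact_pool[i].in_use && xact_pool[i].id == id)
            return &xact_pool[i];
    }
    return NULL;
}

/* Remove at most once: a second caller gets false, not a double free. */
bool xact_remove_by_id(unsigned id)
{
    xact_t *xact = xact_find(id);
    if (!xact)
        return false;
    xact->in_use = false;
    return true;
}
```

With this shape, both the SBI response event and the timer expiration
event can call xact_remove_by_id(); whichever runs second is a no-op.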
Other NF instances are obtained through NRF
or created directly through configuration files.
Other NFs created from the config file should not go through
NRF discovery or any similar procedure.
Since self-created NF instances do not have an ID,
they are excluded from NRF discovery.
A buffer overflow occurred in ALPINE
because the size of the PFCP message structure was increased by adding:
ogs_pfcp_tlv_framed_route_t framed_route[8];
ogs_pfcp_tlv_framed_ipv6_route_t framed_ipv6_route[8];
Metric 'bearers_active' was incremented in only one code path
(smf_bearer_add() for 4G only), while it was decremented from two paths
(smf_bearer_remove() for both 4G and 5G).
Increment the metric in the 5G path as well (smf_qos_flow_add()), so it
won't get decremented into negative values.
The current load percentage of the NF Service Consumer is provided
in the payload body of the PATCH request when periodically
contacting the NRF (heart-beat).
AMF: ratio between currently connected ran_ue and maximum number of them
SMF: ratio between current PDU sessions and maximum available
PCF: ratio between current AM+SM policy associations and maximum available
or ratio between currently connected UEs and maximum number of them
(the load which is higher)
AUSF, UDM: ratio between currently connected UEs and maximum number of them
BSF: ratio between current sessions and maximum available
NSSF: ratio between currently used NSIs and maximum number of them
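The ratios above can be sketched as an integer percentage. This is a
minimal illustration, not the open5gs code: load_percent() and
pcf_load_percent() are hypothetical helpers, and clamping to 100 and
treating a zero maximum as zero load are assumptions.

```c
#include <stdint.h>

/* Load as an integer percentage in 0..100.
 * Assumption: a maximum of 0 is reported as 0 load. */
int load_percent(uint64_t current, uint64_t max)
{
    if (max == 0)
        return 0;
    if (current >= max)
        return 100;
    return (int)((current * 100) / max);
}

/* PCF reports whichever of its two ratios is higher. */
int pcf_load_percent(uint64_t policy_cur, uint64_t policy_max,
                     uint64_t ue_cur, uint64_t ue_max)
{
    int policy_load = load_percent(policy_cur, policy_max);
    int ue_load = load_percent(ue_cur, ue_max);
    return policy_load > ue_load ? policy_load : ue_load;
}
```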
NRF currently doesn't determine that the NF Profile has changed.
Expose SM metrics with labels according to ETSI TS 128 552 V16.13.0 in
SMF by using hash.
The metrics are named respecting the rule:
<generation>_<measurement_object_class>_<measurement_family_name>_<metric_name_as_in_TS_128_552>
Existing gauge sessions_active is renamed!
Since slice itself is not unique, the plmnid label is exposed in
addition to snssai.
Exposed metrics example:
-standard gauges:
fivegs_smffunction_sm_sessionnbr{plmnid="00101",snssai="1000009"} 0
fivegs_smffunction_sm_qos_flow_nbr{plmnid="00101",snssai="1000009",fiveqi="9"} 0
-nonstandard counters:
fivegs_smffunction_sm_n4sessionestabfail{cause="71"} 68
fivegs_smffunction_sm_n4sessionreport 1
fivegs_smffunction_sm_n4sessionreportsucc 1
fivegs_smffunction_sm_n4sessionestabreq 1
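The naming rule can be sketched with a small formatter. The helper
metric_name_build() is hypothetical and exists only to illustrate how
the four parts are joined; it is not part of libogsmetrics.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Join the four name parts with '_'.
 * Returns false if the result did not fit into buf. */
bool metric_name_build(char *buf, size_t len,
                       const char *generation, const char *object_class,
                       const char *family, const char *name)
{
    int n = snprintf(buf, len, "%s_%s_%s_%s",
                     generation, object_class, family, name);
    return n > 0 && (size_t)n < len;
}
```

For example, the parts "fivegs", "smffunction", "sm" and "sessionnbr"
produce the gauge name fivegs_smffunction_sm_sessionnbr shown above.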
If the database is (manually) corrupted for a specific UE, so that the
SSC mode and ARP preemption vulnerability fields are not set correctly,
SMF will crash when trying to build a request to create a PCF association.
Function smf_npcf_smpolicycontrol_build_create() ends prematurely,
and when cleaning up resources it tries to free() an invalid pointer
that was not set to 0 at the beginning of the function.
[smf] ERROR: SSCMode is not allowed (../src/smf/nudm-handler.c:165)
[sbi] DEBUG: STATUS [201] (../lib/sbi/nghttp2-server.c:443)
[sbi] DEBUG: SENDING...: 3 (../lib/sbi/nghttp2-server.c:451)
[sbi] DEBUG: {
} (../lib/sbi/nghttp2-server.c:452)
[sbi] DEBUG: STREAM closed [1] (../lib/sbi/nghttp2-server.c:962)
[smf] ERROR: No Arp.preempt_cap (../src/smf/npcf-build.c:132)
<crash>
0 __GI_abort () at ./stdlib/abort.c:107
1 0x00007f9348fe43b1 in ?? () from /lib/x86_64-linux-gnu/libtalloc.so.2
2 0x00007f9349aef745 in ogs_talloc_free (ptr=0x7f9348e38dab <_int_free+1675>,
location=0x5591b8675d27 "../src/smf/npcf-build.c:181") at ../lib/core/ogs-memory.c:107
3 0x00005591b8653c45 in smf_npcf_smpolicycontrol_build_create (sess=0x7f9343070010, data=0x0)
at ../src/smf/npcf-build.c:181
4 0x00007f9349abc2b4 in ogs_sbi_xact_add (sbi_object=0x7f9343070010,
service_type=OGS_SBI_SERVICE_TYPE_NPCF_SMPOLICYCONTROL, discovery_option=0x7f9338006d90,
build=0x5591b86531d0 <smf_npcf_smpolicycontrol_build_create>, context=0x7f9343070010, data=0x0)
at ../lib/sbi/context.c:1699
5 0x00005591b86580be in smf_sbi_discover_and_send (service_type=OGS_SBI_SERVICE_TYPE_NPCF_SMPOLICYCONTROL,
discovery_option=0x0, build=0x5591b86531d0 <smf_npcf_smpolicycontrol_build_create>, sess=0x7f9343070010,
stream=0x7f9344fce0a0, state=0, data=0x0) at ../src/smf/sbi-path.c:110
6 0x00005591b864e9da in smf_nudm_sdm_handle_get (sess=0x7f9343070010, stream=0x7f9344fce0a0,
recvmsg=0x7f933f52d5a0) at ../src/smf/nudm-handler.c:290
7 0x00005591b8600c96 in smf_gsm_state_wait_5gc_sm_policy_association (s=0x7f9343070610, e=0x7f9338076730)
at ../src/smf/gsm-sm.c:523
...
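The general pattern behind this kind of fix is to initialise every
pointer that the cleanup path may free before the first early exit.
The sketch below uses hypothetical names (request_t, build_request(),
copy_string()) and plain malloc()/free(); it is not the SMF code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical request with two owned strings. */
typedef struct {
    char *supi;
    char *dnn;
} request_t;

static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p)
        memcpy(p, s, n);
    return p;
}

request_t *build_request(const char *supi, const char *dnn)
{
    /* Initialised up front so that cleanup never sees garbage. */
    request_t *req = NULL;
    char *supi_copy = NULL;
    char *dnn_copy = NULL;

    if (!supi)
        goto cleanup;
    supi_copy = copy_string(supi);
    if (!supi_copy)
        goto cleanup;

    if (!dnn)
        goto cleanup;           /* early exit: dnn_copy is still NULL */
    dnn_copy = copy_string(dnn);
    if (!dnn_copy)
        goto cleanup;

    req = malloc(sizeof(*req));
    if (!req)
        goto cleanup;
    req->supi = supi_copy;
    req->dnn = dnn_copy;
    return req;

cleanup:
    free(dnn_copy);             /* free(NULL) is a no-op */
    free(supi_copy);
    return NULL;
}

void request_free(request_t *req)
{
    if (!req)
        return;
    free(req->supi);
    free(req->dnn);
    free(req);
}
```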
Without this change, using metrics with core setup configurations
(configs/vonr.yaml for example) would not be possible. Having one
metrics section for the whole config file causes every NF to start a
metrics server on the same port, causing an abort.
When UE would send a request to release PDU session, AMF would
eventually send "PDU Session Resource Release Command" downlink to both
UE (N1) and gNB (N2). Each UE and gNB would then reply with "PDU Session
Resource Release Response" indicating they released their own resources.
Usually the first one to respond would be gNB. SMF made an assumption
that this would always be the case. And it would wait for signal that UE
resources were freed, before releasing session resources. But
occasionally the situation is that UE responds first, and SMF releases
resources prematurely.
This situation does not normally occur, but under high stress (hundreds
of UE PDU releases at the same time) it happens occasionally.
According to the standard, this situation is perfectly normal.
3GPP TS 23.502 Rel. 16
4.3.4.2 UE or network requested PDU Session Release for Non-Roaming and
Roaming with Local Breakout
...
Steps 8-10 may happen before steps 6-7.
...
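An order-independent release can be sketched by recording each response
as a flag and freeing the session only once both have arrived. The
names below (session_t, RELEASE_N1_DONE, RELEASE_N2_DONE,
session_on_release_response()) are hypothetical and only illustrate the
idea; they are not the SMF state machine.

```c
#include <stdbool.h>

/* One bit per side that has to confirm the release. */
enum {
    RELEASE_N1_DONE = 1 << 0,   /* UE confirmed */
    RELEASE_N2_DONE = 1 << 1    /* gNB confirmed */
};

typedef struct {
    unsigned release_flags;
    bool released;
} session_t;

/* Record one response; release only when both sides have answered.
 * Returns true once the session resources may be freed. */
bool session_on_release_response(session_t *sess, unsigned flag)
{
    const unsigned both = RELEASE_N1_DONE | RELEASE_N2_DONE;

    sess->release_flags |= flag;
    if ((sess->release_flags & both) == both)
        sess->released = true;
    return sess->released;
}
```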
== Known limitation ==
Placing npcf-smpolicycontrol and pcf-policyauthorization
in different NFs is not supported. Both npcf-smpolicycontrol
and pcf-policyauthorization should be placed in the same NF.
- Gy instead of Gx AVP was used.
- Use correct avp position and avp variables.
Fixes: 657eef9169 ("[SMF] send 3GPP-Charging-Characteristics on Gx if received on S5/8c")
The 3GPP-Charging-Characteristics is an operator specific AVP
(optional). The 3GPP-Charging-Characteristics can be filled by the HSS
and forwarded by the MME towards the SMF.
In case that SMF was configured to run without Diameter, it would crash
on application exit due to uninitialized variables/pointers.
ERROR pid:unnamed in fd_sess_handler_destroy@sessions.c:324: ERROR: Invalid parameter '(handler && ( ((*handler) != ((void *)0)) && ( ((struct session_handler *)(*handler))->eyec == 0x53554AD1) ))', 22
[smf] FATAL: smf_gx_final: Assertion `ret == 0' failed. (../src/smf/gx-path.c:1353)
As per 3GPP TS 29.060 version 15.3.0, section 7.3.3, 7.3.4, 7.3.5 and 7.3.6
Only if PCO IE is included in Update/Delete PDP Context Request then it
must be present in Update/Delete PDP Context Response.
In order to reflect on whether the request contained PCO IE or not
the SMF context containing the GTP request needs to be updated
i.e. update if present else clear the contents
As per 3GPP TS 29.274 version 10.5.0, section 7.2.9 and 7.2.10,
Only if PCO IE is included in Delete Session Request then it
must be present in Delete Session Response.
In order to reflect on whether the request contained PCO IE or not
the SMF context containing the GTP request needs to be updated
i.e. update if present else clear the contents
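The "update if present, else clear" rule can be sketched as follows.
pco_t, PCO_MAX_LEN, store_request_pco() and response_needs_pco() are
hypothetical names chosen for the example, and the buffer size is an
assumption; this is not the SMF context structure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCO_MAX_LEN 255   /* assumed upper bound for the example */

typedef struct {
    bool presence;
    size_t len;
    uint8_t data[PCO_MAX_LEN];
} pco_t;

/* Mirror the PCO IE of the latest request into the session context:
 * copy it when present, otherwise wipe whatever an earlier request left. */
void store_request_pco(pco_t *ctx, const uint8_t *data, size_t len)
{
    if (data && len > 0 && len <= PCO_MAX_LEN) {
        ctx->presence = true;
        ctx->len = len;
        memcpy(ctx->data, data, len);
    } else {
        memset(ctx, 0, sizeof(*ctx));
    }
}

/* The response carries PCO only if the matching request did. */
bool response_needs_pco(const pco_t *ctx)
{
    return ctx->presence;
}
```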
In case there are multiple AMFs registered to NRF, SMF would pick only
the first AMF from the list.
In the case of sending PDU Session Establishment Accept from SMF to
AMF, this would mean a high chance of failure since the AMF might
be different than the original requester, and would not know about a
particular UE.
Modify SMF to use the ServingNfId field from the original
SmContextCreateData request from AMF to determine which AMF it should
send the PDU Session Establishment Accept message to.
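Selecting the AMF by ID instead of taking the first list entry can be
sketched like this. nf_instance_t and find_amf_by_id() are hypothetical
names for illustration only, not the open5gs NF instance API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, minimal view of a discovered NF instance. */
typedef struct {
    const char *id;
} nf_instance_t;

/* Return the AMF whose instance ID equals serving_nf_id, or NULL.
 * The caller decides what to do when no match is found. */
const nf_instance_t *find_amf_by_id(const nf_instance_t *list, size_t count,
                                    const char *serving_nf_id)
{
    if (!serving_nf_id)
        return NULL;
    for (size_t i = 0; i < count; i++) {
        if (list[i].id && strcmp(list[i].id, serving_nf_id) == 0)
            return &list[i];
    }
    return NULL;
}
```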
3GPP TS 29.244 7.2.2.4.2 documents that the peer will set SEID=0 in the
response when we request something for a session not existing at the peer.
If that's the case, we still want to locate the local session which
originated the request, so let's store the local SEID in the xact when
submitting the message, so that we can retrieve the related SEID and
find the session if we receive SEID=0.
It was spotted that if DeleteSessionReq sent by SMF is answered by UPF
with cause="Session context not found", then it contains SEID=0 (this is
correct as per specs). Hence, since a SEID=0 session is not looked up,
sess=NULL.
A follow up commit improves the situation by looking up the SEID in the
originating request message in that case.
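The fallback can be sketched as below. pfcp_xact_t and resolve_seid()
are hypothetical names used only for the example; they are not the
ogs_pfcp_xact_t API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transaction that remembers the SEID of the local
 * session which originated the request. */
typedef struct {
    uint64_t local_seid;
} pfcp_xact_t;

/* Prefer the SEID from the response header; if the peer answered with
 * SEID=0, fall back to the SEID stored when the request was submitted.
 * Returns 0 when neither is available. */
uint64_t resolve_seid(uint64_t header_seid, const pfcp_xact_t *xact)
{
    if (header_seid != 0)
        return header_seid;
    return xact ? xact->local_seid : 0;
}
```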
* [SMF] Avoid abort() if gtp_node mempool becomes full
Related: https://github.com/open5gs/open5gs/issues/1621
* [SMF] metrics: Add new ctr tracking gtp_node allocation failures
This metric is useful to track whether at some point the mempool became
full, so that the config can be updated to increase the mempool size.
Gy (3GPP TS 32.299) refers to AVP in DCCA (RFC4006).
RFC4006 5.1.2:
"[...] by including the Multiple-Services-Indicator AVP in the first
interrogation."
Nokia's infocenter documentation also states it's sent during Initial CCR
only: "(CCR-I only)".
smf_gtp_node_pool was properly freed.
However, the sequence was wrong, so we got a warning message.
To solve this problem, I've moved smf_gtp_node_alloc/free
from gtp_path.[ch] to context.[ch]
* Initial metrics support based on Prometheus
This commit introduces initial support for metrics in open5gs.
The metrics code is added as libogsmetrics (lib/metrics/), with a well
defined opaque API to manage different types of metrics, allowing for
different implementations for different technologies to scrape the
metrics (placed as lib/metrics/<impl>/). The implementation is right now
selected at build time, in order to be able to opt out of the related
dependencies for users not interested in the features. 2 implementations
are already provided in this commit to start with:
* void: Default implementation. Empty stubs, acts as a NOOP.
* prometheus: open5gs processes become Prometheus servers, offering
states through an http server to the Prometheus scrapers. Relies on
libprom (prometheus-client-c [1] project) to track the metrics and format
them during export, and libmicrohttpd to make the export possible through
HTTP.
[1] https://github.com/digitalocean/prometheus-client-c
The prometheus-client-c is not well maintained nowadays in upstream, and
furthermore it uses a quite peculiar mixture of build systems (autolib
on the main dir, cmake for libprom in a subdir). This makes it difficult
to have it widely available in distros, and difficult to find it if it
is installed in the system. Hence, the best is to include it as a
meson subproject like we already do for freeDiameter. An open5gs fork is
required in order to have an extra patch adding a top-level
CMakeLists.txt in order to be able to include it from open5gs's meson
build. Furthermore, this allows adding bugfixes to the subproject if any
are found in the future.
* [SMF] Initial metrics support
* [SMF] Add metrics at gtp_node level
* docs: Add tutorial documenting metrics with Prometheus
* [SMF] Gn: Drop unreachable return line
* [SMF] Avoid crash if Create{Session,PdpContext}Resp fails to be sent
Crash spotted in a running open5gs-smfd process, triggered by:
ERROR: ogs_gtp_sendto() failed (1:Operation not permitted) (../lib/gtp/path.c:119)
ERROR: ogs_gtp_xact_commit: Expectation `rv == OGS_OK' failed. (../lib/gtp/xact.c:730)
ERROR: smf_gtp2_send_create_session_response: Expectation `rv == OGS_OK' failed. (../src/smf/gtp-path.c:451)
FATAL: smf_gsm_state_wait_pfcp_establishment: Assertion `OGS_OK == smf_gtp2_send_create_session_response( sess, gtp_xact)' failed. (../src/smf/gsm-sm.c:676)
* [SMF] Avoid crash if Delete{Session,PdpContext}Resp fails to be sent
Let's simply continue with release of the session, there's not much we
can do about it. Peer will eventually realize the conn is no longer
there.
According to the following standards the response to the endpoint
/nudm-sdm/${supi}/sm-data should be an array of
SessionManagementSubscriptionData objects, instead of only one object.
TS 29.503 version 16.6.0
TS 29.505 version 16.4.0
UDR now responds to the request with only one item in the array.
UDM copies all items as is.
SMF uses only the first item in the array, even if there are more
present.