A trace includes too many flows or IP addresses for tools that try to keep per-flow state. You will probably need to extract the packets of interest or split the trace with a stateless tool (e.g., tcpdump).
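As a rough illustration of stateless extraction, the Python sketch below streams a classic pcap file record by record and copies only the packets that satisfy a predicate, so memory use does not grow with the number of flows. The function names and the ICMP predicate are our own choices for the example, and the sketch assumes untagged Ethernet framing and the classic pcap format (not pcapng).

```python
import struct
from typing import BinaryIO, Callable

# Byte order implied by the first four bytes of a classic pcap file.
PCAP_MAGIC = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def extract_packets(src: BinaryIO, dst: BinaryIO,
                    keep: Callable[[bytes], bool]) -> int:
    """Copy records whose captured bytes satisfy keep(); return the count."""
    global_header = src.read(24)
    endian = PCAP_MAGIC.get(global_header[:4])
    if endian is None:
        raise ValueError("not a classic pcap file")
    dst.write(global_header)
    copied = 0
    while True:
        record_header = src.read(16)
        if len(record_header) < 16:
            break  # end of file
        # Record header: ts_sec, ts_frac, captured length, original length.
        _, _, incl_len, _ = struct.unpack(endian + "IIII", record_header)
        frame = src.read(incl_len)
        if len(frame) < incl_len:
            break  # truncated final record
        if keep(frame):
            dst.write(record_header)
            dst.write(frame)
            copied += 1
    return copied


def is_ipv4_icmp(frame: bytes) -> bool:
    """Example predicate: untagged Ethernet, IPv4, protocol 1 (ICMP)."""
    return len(frame) >= 24 and frame[12:14] == b"\x08\x00" and frame[23] == 1
```

With files opened in binary mode, a call such as extract_packets(infile, outfile, is_ipv4_icmp) would write a smaller trace containing only ICMP packets; any other per-packet predicate can be substituted.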
Use the MAC addresses in the pcap files; they belong to the 2 routers at the two ends of the monitored link. You can use "tcpdump -e -n -r <tracefile>" to identify the MAC addresses, and "tcpdump -n -r <tracefile> ether [src|dst] <mac-addr>" to filter packets by MAC address.
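The identification step can also be sketched in Python. The function below (a hypothetical helper, not part of any WIDE tooling) tallies the source MAC address of every frame in a classic pcap file, assuming Ethernet framing; on a trace of a point-to-point link one would expect two source addresses to dominate, one per direction.

```python
import struct
from collections import Counter
from typing import BinaryIO


def count_source_macs(src: BinaryIO) -> Counter:
    """Return a Counter mapping source MAC (aa:bb:cc:dd:ee:ff) to packet count."""
    header = src.read(24)
    magic = header[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    counts: Counter = Counter()
    while True:
        record_header = src.read(16)
        if len(record_header) < 16:
            break  # end of file
        # The captured length is the third 32-bit field of the record header.
        incl_len = struct.unpack(endian + "I", record_header[8:12])[0]
        frame = src.read(incl_len)
        if len(frame) < incl_len:
            break  # truncated final record
        if len(frame) >= 12:
            # Ethernet: bytes 0-5 are the destination, bytes 6-11 the source.
            counts[frame[6:12].hex(":")] += 1
    return counts
```

Calling count_source_macs(open(path, "rb")).most_common(2) on a trace would list the two most frequent source addresses, which can then be passed to the "ether src" filter shown above.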
It is probably due to asymmetric routing; one direction of a flow goes through the monitored link but the other direction through another link.
We use the port-mirroring feature of a router to copy packets, and a commodity PC server with a NIC to capture them. As a result, we may fail to capture all packets.
Our priority is to sustain the system over the long term within our limited budget rather than to pursue high precision. So, we use commodity hardware and a commodity OS to capture packets, and the clock is synchronized through normal NTP. Thus, the timestamps have only limited precision (typically sub-millisecond, but possibly worse under congestion).
WIDE accommodates various research experiments, and one such experiment is USC's ANT project, which tries to probe the entire IPv4 address space using ICMP echo requests/replies. Note that these ICMP requests are sent to a huge number of IP addresses, which inflates the number of IP addresses appearing in a trace.
The traffic in the WIDE backbone is a mixture of commodity traffic and research experiments. Most Japanese universities use SINET(AS2907) for their upstream, and the WIDE member universities are multi-homed with both SINET and WIDE. So, WIDE doesn't carry all traffic from these universities (one exception is NAIST, whose upstream is WIDE only). As for international connectivity, WIDE is connected to Internet2 via TransPAC, so traffic to/from US/EU/Asia-Pacific academic sites traverses this path.
WIDE provides connectivity to its member institutions and other related projects. See "http://two.wide.ad.jp/" for a topology map.
Regarding BGP, WIDE(AS2500) has one upstream, NTTC GIN(AS2914), and provides (partial) transit to its members and other projects, including Keio University(AS38635), University of Tokyo(AS2501), JAIST(AS17932,AS55384), NAIST(AS131158), NDA(AS23799), AI3(AS4717), M-ROOT-DNS(AS7500), E-DNS-JP(AS23634), JPNIC(AS2515), CKP(AS4718), APAN-JP(AS7660), ISC(AS24047). Also, WIDE has peering arrangements with many ASes through IXes.
The IP addresses appearing in traces are anonymized using a prefix-preserving method. For daily traces, the mapping is consistent only within a single trace; for a DITL dataset (a 24-hour or longer dataset), it is consistent across all traces in the dataset.
Crypto-PAn is used for prefix-preserving anonymization. The anonymization method was switched from the original tcpdpriv to Crypto-PAn in August 2015.
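To make the prefix-preserving property concrete, here is a minimal Python sketch of the general construction: each output bit is the input bit XORed with a keyed pseudorandom function of the bits that precede it. Note that this is not Crypto-PAn itself. Crypto-PAn derives the function from a block cipher, whereas this sketch substitutes HMAC-SHA256 from the standard library, so its output will not match Crypto-PAn's. The function names and the key are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Prefix-preserving mapping of one IPv4 address (illustrative only)."""
    a = int(ipaddress.IPv4Address(addr))
    out = 0
    for i in range(32):
        prefix = a >> (32 - i)  # the i most significant bits (0 when i == 0)
        # Include the prefix length so that prefixes "0" and "00" differ.
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] >> 7
        bit = (a >> (31 - i)) & 1
        out = (out << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(out))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses."""
    x = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - x.bit_length()
```

Under a fixed key, two addresses that share exactly k leading bits map to anonymized addresses that also share exactly k leading bits, which is why subnet structure survives anonymization. With a different key the mapping is unrelated, so addresses cannot be matched between traces anonymized under different keys.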
samplepoint-F monitors the 1Gbps transit link in Tokyo to an upstream provider, NTTC GIN(AS2914). The traffic is port-mirrored to a 10Gbps link at the router and then captured using a commodity PC server and a 10GbE NIC via BPF on FreeBSD. The traffic is between WIDE and its non-peer ASes (mostly foreign commercial ASes and small domestic ASes, as WIDE has other paths to academic networks and peers with major domestic ASes).
samplepoint-G monitors the 10Gbps link to DIX-IE, an experimental IX in Tokyo operated by WIDE. The traffic is port-mirrored to a 10Gbps link at the router and then captured using a commodity PC server and a 10GbE NIC via BPF on FreeBSD. The traffic is between WIDE and its peer ASes. (Note that this is the main IX for WIDE, but there are other IXes, and most academic ASes are connected through other academic networks.)
We can provide such versions only for research purposes. However, it requires some work on our side, so you need to convince us that it is worth the effort (e.g., by having done your homework before asking for a favor). Further, if you are a student, we will ask for an endorsement from your adviser. Contact a WIDE member for further information.
Kenjiro Cho, Koushirou Mitsuya and Akira Kato.
"Traffic Data Repository at the WIDE Project".
USENIX 2000 FREENIX Track, San Diego, CA, June 2000.